Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-8480: Cleanup before finished. 0 out of 1 streams have finished #2897

Merged
merged 1 commit into from
May 12, 2024

Conversation

rymarm
Copy link
Member

@rymarm rymarm commented Mar 28, 2024

DRILL-8480: Make Nested Loop Join operator properly process empty batches and batches with new schema

Description

Nested Loop Join operator (NestedLoopJoinBatch, NestedLoopJoin) unproperly handles batch iteration outcome OK with 0 records. Drill design of the processing of batches involves 5 states:

  • NONE (batch can have only 0 records)
  • OK (batch can have 0+ records)
  • OK_NEW_SCHEMA (batch can have 0+ records)
  • NOT_YET (undefined)
  • EMIT (batch can have 0+ records)
    The Nested Loop Join operator in some circumstances could receive OK outcome with 0 records, and instead of requesting the next batch, the operator stops data processing and returns NONE outcome to upstream batches(operators) without freeing resources of underlying batches.

Solution

Make the Nested Loop Join operator properly handle OK and OK_NEW_SCHEMA outcomes with 0 records and keep processing until NONE and NOT_YET outcomes are received.

Make the Nested Loop Join operator keep processing even if OK_NEW_SCHEMA outcome is received, but the schema wasn’t changed(yes, I know it sounds wild, but it’s possible and it is expected behavior).

Documentation

Testing

Manual testing with a file from the Jira ticket DRILL-8480

@cgivre cgivre requested a review from jnturton March 28, 2024 17:22
@rymarm rymarm added the bug label Mar 28, 2024
Copy link
Contributor

@cgivre cgivre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1 Thanks @rymarm for this.

I pre-emptively reviewed this, as @jnturton is getting ready to cut a release candidate this weekend.

@cgivre
Copy link
Contributor

cgivre commented Apr 10, 2024

@rymarm. What is the status of this PR? Is it ready for merging?

@rymarm
Copy link
Member Author

rymarm commented Apr 10, 2024

@cgivre Actually, there is one more thing, that I would fix in the scope of this PR:

// if left batch is empty, fetch next
if (leftUpstream != IterOutcome.NONE && left.getRecordCount() == 0) {
leftUpstream = next(LEFT_INPUT, left);
}

These few lines seem to be kludge which may not work in some very rare cases, but I forgot what issue occurs if remove it.
I would like to take a look at this, but I can do this in a separate PR because the current issue is completely fixed with the current changes.

@rymarm rymarm marked this pull request as ready for review April 10, 2024 17:14
Copy link
Contributor

@jnturton jnturton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks also for the detailed repro steps in the ticket.

@jnturton jnturton merged commit 613769d into apache:master May 12, 2024
8 checks passed
@jnturton jnturton added the backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there label May 12, 2024
jnturton pushed a commit to jnturton/drill that referenced this pull request May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants