Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pagination to query Execution #141

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

matthewmturner
Copy link
Collaborator

@matthewmturner matthewmturner commented Sep 13, 2024

This PR adds pagination to query execution.

  • Adds a new PaginatingRecordBatchStream
  • Adds new UI and key handlers for going to the next and previous pagaes

@matthewmturner
Copy link
Collaborator Author

@alamb not done yet, but thought you may be interested.

@matthewmturner matthewmturner linked an issue Sep 13, 2024 that may be closed by this pull request
@matthewmturner matthewmturner changed the title Start setting up pagination Add pagination to query Execution Sep 13, 2024
current_batch: Option<usize>,
}

impl PaginatingRecordBatchStream {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet if this is the final api signatures - I will know once I plug into the display code

@@ -108,3 +110,179 @@ impl AppExecution {
Ok(())
}
}

/// A stream of [`RecordBatch`]es that can be paginated for display in the TUI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that isn't entirely clear to me is if an entire RecordBatch is shown at a time or just a slice (like only rows 100-200)

This might be an important distinction in terms of what "current batch" means and if you want this structure to pagninate based on batch index or logical row number

I think either could work -- but if you pagniate on batch, you'll have to implement logic somewhere else to translate that into logical row number for display

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that could depend on the UI that we expose. I have two options in mind:

  1. Incrementing a page completely replaces the displayed records with the next page of records
  2. Incrementing a page adds the records to the existing view (i think getting the UI correct for this may not be straight forward as i believe the naive expectation could be that the next page is triggered while scrolling and reaching the bottom of the table - like how it works in DBeaver. But happy to hear other views on this)

One thing to note - is that the Table widget we use automatically provides some scrolling capabilities (it tracks the selected row automatically - so if we continue to use that, which i think is beneficial at our stage, we at least dont have to do everything from scratch).

One potentially terrible idea I had, that would at least enable a very simple v1 (at the cost of worse query performance) is default batch size for the TUI to some relatively small amount (say 200 rows) and then all rows for that batch would be added to the Table. Next page just gets the next batch and replaces the Table records. (I.e. option 1). We wouldnt need to track rows or figure out how to stitch records together between record batch boundaries. I believe this is the simplest approach and would at least make the app usable for larger queries, then we could add a todo for something more user friendly / that doesnt impact query performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incrementing a page completely replaces the displayed records with the next page of records

This is how I (naively) as a user would expect things to behave

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potentially terrible idea I had, that would at least enable a very simple v1 (at the cost of worse query performance) is default batch size for the TUI to some relatively small amount (say 200 rows) and then all rows for that batch would be added to the Table. Next page just gets the next batch and replaces the Table records. (I.e. option 1). We wouldnt need to track rows or figure out how to stitch records together between record batch boundaries. I believe this is the simplest approach and would at least make the app usable for larger queries, then we could add a todo for something more user friendly / that doesnt impact query performance.

This seems reasonable to me.

I also think all the APIs to stitch rows together are in arrow-rs (like concat_batches and slice) so we could also make some sort of adapter stream (another wrapper!) that took the incoming stream and reformatted it to smaller sized record batches

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewmturner
Copy link
Collaborator Author

Good news - i got query results to display that are plugged into the new paginated stream. However, this is a quite invasive change. Theres still a lot to cleanup and im not sure yet whether this is the desired structure. That being said, im quite happy to have an end to end flow working.

image

src/app/mod.rs Outdated
@@ -61,6 +62,8 @@ pub enum AppEvent {
Resize(u16, u16),
ExecuteDDL(String),
QueryResult(Query),
// PaginatedQueryResult(PaginatingRecordBatchStream),
QueryResultsNextPage,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be new AppEvent's for paginating

.title(Title::from(" Page ").alignment(Alignment::Right));

let results = app.execution().results();
let locked = tokio::task::block_in_place(|| results.blocking_lock());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we didn't need to do this while rendering, but without it the app panics.

/// A stream of [`RecordBatch`]es that can be paginated for display in the TUI.
pub struct PaginatingRecordBatchStream {
// currently executing stream
inner: SendableRecordBatchStream,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SendableRecordBatchStream is not Sync this struct doesnt work well with our AppEvent which requires Send and Sync. I think we can restructure so that the batches / current idx are managed separate from the batch stream so they can be sent as AppEvent and state can be updated in the existing pattern (i.e in our handlers) and then we dont need to lock during rendering.

@matthewmturner
Copy link
Collaborator Author

matthewmturner commented Sep 15, 2024

So I ended up going with a different approach on paginating but i think it aligns better with the rendering model.

Now, the sql tab state will store the record batches and current index to display and those will be updated synchronously in AppEvent handlers. And AppExecution will just have Arc<Mutex<Option<SendableRecordBatchStream>>>. Pagination on this will be done in tokio tasks and when batch is available will be sent in AppEvent::QueryResultNextPage(batch). This means that rendering will no longer require a lock to get access to the current batch as was required in the previous model. I still need to update several places to work with with this model but I did successfully test that it worked correctly for a single query to display results and im pretty sure it will work for the remaining places.

@alamb
Copy link
Contributor

alamb commented Sep 15, 2024

. This means that rendering will no longer require a lock to get access to the current batch as was required in the previous model. I

This makes sense to me

@matthewmturner
Copy link
Collaborator Author

matthewmturner commented Sep 17, 2024

Making good progress

DFT.Paginating.mov

Edit: Sorry its kind of blurry but im pressing the right arrow and youll see the results updating and page number changing.

Still more to clean up and improve but its all coming together now

@matthewmturner
Copy link
Collaborator Author

Might end up needing those arrow utilities sooner than expected because the FlightSQL client doesnt let us set the record batch size.

@alamb
Copy link
Contributor

alamb commented Sep 19, 2024

BTW I am hoping to next contribute:

  1. An upgrade to datafusion 42 (https://github.com/datafusion-contrib/datafusion-functions-json is updated but I haven't checked the others)
  2. Then local file support
  3. Some sort of test for S3 integration (likely based on localstack) / Delta integration

matthewmturner added a commit that referenced this pull request Sep 23, 2024
(#141) was
becoming hard for me to reason about because FlightSQL pagination has to
be handled differently than SQL tab pagination (because on the SQL tab
we can control batch size directly on the context but we cant do that
with FlightSQL). So I would like to split pagination on each of those
tabs into their own PRs to keep them more focused and easier to review /
reason about / etc.

So this PR makes pagination work and adds integration tests for testing
pagination. There is still room for improvement in the end to end
testing (right now we test some of the implementation details) but at
least we have some coverage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Querys that return large # of results can freeze / crash the app
2 participants