Add streaming support for arrow batches #44

Open
wants to merge 3 commits into
base: main


sgrebnov
Contributor

@sgrebnov sgrebnov commented May 8, 2024

This PR adds an exec_streamed method that returns Arrow batches as a stream while the response is still being downloaded and converted. This reduces memory usage for large datasets, since records can be processed in chunks, and improves performance by giving access to records that have already been loaded.

  • Similar to the Snowflake Go driver, MAX_CHUNK_DOWNLOAD_WORKERS (10) download workers are used: https://github.com/snowflakedb/gosnowflake/blob/master/rows.go#L22
  • I originally made RawQueryResult always return its result via a stream, but then realized that the polars dependency requires RawQueryResult as bytes: it can't use async functionality to convert a stream to bytes (it defines TryFrom, which is always sync).
  • With this change I was finally able to run queries against the very large snowflake_sample_data.tpch_sf100 dataset.
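
For readers unfamiliar with the pattern, here is a rough, std-only sketch of the idea: a bounded pool of download workers feeding batches to the consumer as they complete. It is not the code in this PR, which is async and returns real Arrow batches. Batch, fetch_chunk and exec_streamed_sketch are hypothetical names made up for illustration, and threads plus a bounded channel stand in for the async stream. Only the worker count of 10 is taken from the description above.

```rust
use std::sync::mpsc::{self, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Worker cap named in the PR description (10, following the Go driver).
const MAX_CHUNK_DOWNLOAD_WORKERS: usize = 10;

/// Hypothetical stand-in for one downloaded and converted chunk
/// (a real implementation would hold an Arrow record batch).
#[derive(Debug)]
pub struct Batch {
    pub chunk_index: usize,
    pub rows: Vec<u64>,
}

/// Hypothetical stand-in for "download chunk i and convert it".
/// No network access; it just fabricates three rows per chunk.
fn fetch_chunk(chunk_index: usize) -> Batch {
    let start = (chunk_index * 3) as u64;
    Batch { chunk_index, rows: (start..start + 3).collect() }
}

/// Starts at most MAX_CHUNK_DOWNLOAD_WORKERS workers and returns a receiver
/// that yields batches as soon as they are ready. Batches arrive in
/// completion order, which is not necessarily chunk order.
pub fn exec_streamed_sketch(chunk_count: usize) -> Receiver<Batch> {
    // Bounded channel: workers block when the consumer falls behind,
    // so only a limited number of batches is held in memory at once.
    let (tx, rx) = mpsc::sync_channel(MAX_CHUNK_DOWNLOAD_WORKERS);
    // Shared counter handing out the next chunk index to fetch.
    let next = Arc::new(Mutex::new(0usize));

    for _ in 0..MAX_CHUNK_DOWNLOAD_WORKERS.min(chunk_count) {
        let tx = tx.clone();
        let next = Arc::clone(&next);
        thread::spawn(move || loop {
            // Claim the next chunk index, or stop when none are left.
            let i = {
                let mut guard = next.lock().unwrap();
                if *guard >= chunk_count {
                    break;
                }
                let i = *guard;
                *guard += 1;
                i
            };
            // A send error means the consumer dropped the receiver.
            if tx.send(fetch_chunk(i)).is_err() {
                break;
            }
        });
    }
    // The original sender is dropped here, so the receiver ends
    // once every worker has finished and dropped its clone.
    rx
}
```

A caller would iterate over the returned receiver (for example `for batch in exec_streamed_sketch(25) { ... }`) and handle each batch as it arrives instead of waiting for the whole result set. The bounded channel is what keeps memory flat in this sketch; an async version would typically get the same effect from a buffered stream.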
