Reduce memory usage in delta format code paths #723

Open: wants to merge 2 commits into base: main

PatrickJin-db (Collaborator) commented on May 13, 2025

Reduces memory consumption when using delta-kernel-rs by converting each batch to a pandas DataFrame and concatenating the results, rather than building one pyarrow Table from all the batches and converting that table in a single step. The pyarrow Table usually remains in memory during the conversion to pandas, so converting smaller batches avoids holding the entire Arrow table alongside its pandas copy.

Existing tests should be sufficient.

PatrickJin-db force-pushed the PatrickJin-db/kernel-explicit-iterator-to_pandas-args branch from 0a141b3 to ade4ee3 on May 13, 2025 00:03
PatrickJin-db reopened this on May 13, 2025
PatrickJin-db marked this pull request as draft on May 13, 2025 00:04
PatrickJin-db changed the title from "Patrick jin db/kernel explicit iterator to pandas args" to "Reduce memory usage in delta format code path" on May 13, 2025
PatrickJin-db force-pushed the PatrickJin-db/kernel-explicit-iterator-to_pandas-args branch 2 times, most recently from d2748d0 to d99d480 on May 23, 2025 00:07
PatrickJin-db marked this pull request as ready for review on May 23, 2025 00:09
PatrickJin-db changed the title from "Reduce memory usage in delta format code path" to "Reduce memory usage in delta format code paths" on May 23, 2025
PatrickJin-db requested a review from linzhou-db on May 24, 2025 02:52
PatrickJin-db force-pushed the PatrickJin-db/kernel-explicit-iterator-to_pandas-args branch from d99d480 to 09e1a38 on May 24, 2025 02:58