Improve way parquet metadata size is handled #9

Open
rdettai opened this issue Jan 22, 2021 · 0 comments
Labels
bug Something isn't working

rdettai commented Jan 22, 2021

The Parquet metadata size is not known from the catalog. Issuing a dedicated request just to read the footer, which contains the metadata size, would also be quite inefficient. This is why the first request currently downloads the last 1MB of the file and hopes that the entire metadata falls within this range:

  • on one side, 1MB is fairly large and the download duration is not negligible
  • at the same time, Parquet metadata can be larger than that for files with many row groups

Solutions might be:

  • reduce the default size to 256KB and implement logic that fetches the rest of the metadata when it does not fit in the initial download
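
A minimal sketch of that fallback logic, in Python for illustration only. It relies on the standard Parquet file tail (a 4-byte little-endian metadata length followed by the `PAR1` magic). The `read_range(start, length)` callable and the `fetch_metadata_bytes` name are hypothetical stand-ins for whatever range-download function the project actually uses, and encrypted footers are not handled:

```python
import struct
from typing import Callable

FOOTER_LEN = 8      # 4-byte little-endian metadata length + 4-byte magic
MAGIC = b"PAR1"     # magic for files with a plaintext footer


def fetch_metadata_bytes(
    read_range: Callable[[int, int], bytes],  # hypothetical: (start, length) -> bytes
    file_size: int,
    initial_size: int = 256 * 1024,
) -> bytes:
    """Return the serialized metadata, using at most two range reads."""
    if file_size < FOOTER_LEN:
        raise ValueError("file too small to be a Parquet file")

    # First read: speculatively grab the tail of the file.
    tail_len = min(max(initial_size, FOOTER_LEN), file_size)
    tail_start = file_size - tail_len
    tail = read_range(tail_start, tail_len)

    if tail[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    (meta_len,) = struct.unpack("<I", tail[-8:-4])

    needed = meta_len + FOOTER_LEN
    if needed > file_size:
        raise ValueError("metadata length exceeds file size")

    if needed <= tail_len:
        # The whole metadata was already in the first download.
        return tail[tail_len - needed : tail_len - FOOTER_LEN]

    # Second read: fetch only the part that did not fit in the first one.
    missing_start = file_size - needed
    missing = read_range(missing_start, tail_start - missing_start)
    return missing + tail[:-FOOTER_LEN]
```

With this shape, files whose metadata fits in the initial window still cost a single request, and larger ones cost exactly one extra request for the missing bytes rather than a re-download of the whole tail.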
rdettai added the bug label Jan 22, 2021