Skip to content

chore: Improve performance of Parquet statistics conversion #10932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 17, 2024

Conversation

Weijun-H
Copy link
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Refactor the code get_statistics! in statistics.rs to improve the performance of converting Parquet statistics to Arrow arrays. Instead of using conditional checks and try_from conversions, use the ok() method to handle the conversion and filtering of values. This change simplifies the code and improves readability.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Weijun-H -- I think this is a nice code impeovement ❤️

I am not sure this will improve the performance -- but you could run the benchmarks if you wanted to measure (cargo bench --bench parquet_statistic)

I filed #10934 to track adding a benchmark for data pages

@alamb
Copy link
Contributor

alamb commented Jun 16, 2024

cc @xinlifoobar and @marvinlanhenke

@marvinlanhenke
Copy link
Contributor

I think this is a nice code impeovement ❤️

It is indeed. Thanks @Weijun-H really nice refactor.

I am not sure this will improve the performance

I'll guess in terms of performance it won't make any difference.
However, it is more readable and thats always great, so thanks again.

Copy link
Contributor

@xinlifoobar xinlifoobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor, thanks :)

@alamb alamb merged commit 378b9ee into apache:main Jun 17, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

Thanks again everyone

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants