Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer reading CSV and issue warning #203

Merged
merged 2 commits into from
Feb 1, 2024
Merged

Conversation

delucchi-cmu
Copy link
Contributor

@delucchi-cmu delucchi-cmu commented Jan 31, 2024

Change Description

Close astronomy-commons/lsdb#107

After considering alternative storage strategies, there isn't a significant enough performance gain to warrant adding a new file. So this change make the partition info (and partition join info) reading prefer using the partition_info.csv file. We throw a warning to say that this operation is slow (which the user will notice when it's slow, but can get a hint that there's a faster way to proceed instead).

See analysis with numpy binary files as example. https://github.com/lincc-frameworks/notebooks_lf/blob/main/sprints/2024/02_08/npy_stuff.ipynb

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Copy link

codecov bot commented Jan 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3a77c01) 100.00% compared to head (133745b) 100.00%.
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #203   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        54    +1     
  Lines         1831      1848   +17     
=========================================
+ Hits          1831      1848   +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Jan 31, 2024

Before [22467ac] After [5c0212b] Ratio Benchmark (Parameter)
278±2ms 287±2ms 1.03 benchmarks.MetadataSuite.time_load_partition_info_order6
1.13±0s 1.15±0s 1.02 benchmarks.MetadataSuite.time_load_partition_info_order7
96.1±0.5ms 97.4±0.3ms 1.01 benchmarks.time_test_cone_filter_multiple_order
1.87±0.01s 1.87±0.01s 1 benchmarks.MetadataSuite.time_load_partition_join_info
643±1ms 644±4ms 1 benchmarks.Suite.time_pixel_tree_creation
125±0.6ms 126±1ms 1 benchmarks.time_test_alignment_even_sky

Click here to view all benchmarks.

Copy link
Contributor

@camposandro camposandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @delucchi-cmu!

@delucchi-cmu delucchi-cmu merged commit fb575dd into main Feb 1, 2024
12 checks passed
@delucchi-cmu delucchi-cmu deleted the issue/lsdb/107/csv branch February 1, 2024 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up reading partition info
2 participants