Replace pandas IO in margin generation #500
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #500 +/- ##
==========================================
- Coverage 94.97% 94.82% -0.15%
==========================================
Files 28 28
Lines 1811 1818 +7
==========================================
+ Hits 1720 1724 +4
- Misses 91 94 +3
Looks great!
In the margin generation pipeline, read and write parquet files using `pyarrow.parquet` instead of pandas's `read_parquet`. This allows us to manipulate the margin schema directly with pyarrow tables, and it solves the schema issues we were facing when generating margins for catalogs with nested columns. I added a unit test where I created `small_sky_nested_catalog` from `small_sky_object` and `small_sky_source`, which covers this use case. Closes #465.