Big speedup for ObjID checking #1042

Merged
merged 2 commits into from
Oct 22, 2024

Conversation

astronomerritt
Contributor

@astronomerritt astronomerritt commented Oct 19, 2024

Fixes #1040.

  • Changed the way the ObjID columns in auxiliary input files are checked against each other in CombinedDataReader.check_aux_object_ids().
  • Tweaked the error handling a bit so it works properly again.

Initially I implemented this in the silliest way possible, by checking each individual element in one ObjID list to make sure it exists in the other list. This is O(n^2) complexity -- runtime scales quadratically with the size of the data -- and I should have known better. I plead temporary insanity.

It now uses collections.Counter() to compare how many times each ObjID occurs in one list against the other. This is O(n), so runtime scales linearly with data size.
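
For anyone curious, the difference between the two approaches can be sketched roughly as below. This is a minimal illustration, not the actual Sorcha code: the function names `ids_match_quadratic` and `ids_match_counter` are hypothetical, and the real check lives in CombinedDataReader.check_aux_object_ids() and may differ in its details.

```python
from collections import Counter


def ids_match_quadratic(ids_a, ids_b):
    """Hypothetical sketch of the slow approach.

    Each `in` test scans a list from the start, so checking every
    element of one list against the other is O(n^2) overall.
    """
    return (
        len(ids_a) == len(ids_b)
        and all(obj_id in ids_b for obj_id in ids_a)
        and all(obj_id in ids_a for obj_id in ids_b)
    )


def ids_match_counter(ids_a, ids_b):
    """Hypothetical sketch of the fast approach.

    Counter builds a hash-based tally of each list in a single pass,
    so building and comparing the two tallies is O(n) on average.
    Order is ignored, but the number of occurrences of each ID is not.
    """
    return Counter(ids_a) == Counter(ids_b)


# Same IDs in a different order: both checks pass.
print(ids_match_quadratic(["a", "b", "c"], ["c", "a", "b"]))
print(ids_match_counter(["a", "b", "c"], ["c", "a", "b"]))

# A mismatched ID: both checks fail.
print(ids_match_quadratic(["a", "b"], ["a", "c"]))
print(ids_match_counter(["a", "b"], ["a", "c"]))
```

One thing to be aware of: the two sketches are not strictly equivalent when IDs are duplicated. Comparing Counter objects also compares the per-ID counts, so `["a", "a", "b"]` and `["a", "b", "b"]` do not match, whereas the membership-based sketch above would accept them.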

On an input file of 250,000 rows it takes 150ms.

(Not twelve minutes.)

Review Checklist for Source Code Changes

  • Does pip install still work?
  • Have you written a unit test for any new functions?
  • Do all the unit tests run successfully?
  • Does Sorcha run successfully on a test set of input files/databases?
  • Have you used black on the files you have updated to confirm that they follow the Python style guide?

codecov bot commented Oct 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.17%. Comparing base (bbece40) to head (844dc1d).
Report is 315 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1042      +/-   ##
==========================================
+ Coverage   81.14%   81.17%   +0.02%     
==========================================
  Files          70       70              
  Lines        3166     3171       +5     
==========================================
+ Hits         2569     2574       +5     
  Misses        597      597              

@mschwamb mschwamb merged commit 1c5e0bc into dirac-institute:main Oct 22, 2024
9 checks passed
Development

Successfully merging this pull request may close these issues.

reader.check_aux_object_ids() chokes on very large input files
2 participants