
Bright Neighbor Validation #2

Closed
wants to merge 16 commits into from

Conversation

@imedan (Contributor) commented Apr 2, 2024

@albireox I wanted to start a draft pull request so we can comment on what I have so far for the bright neighbor validation. The current implementation relies on having the bright neighbor HEALPix indices saved in a file that is pointed to by the environment variable BN_HEALPIX. It then uses the R.A. and Decl. of the ToOs to check whether they fall in the bright neighbor regions on the sky for a given design_mode. Currently, for the sample file docs/sample.csv, the check for one design_mode takes ~1 minute.

A couple things that may need to be added are:

  • Account for proper motions
  • Account for offsets
  • Speed up index search
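
The membership test described above amounts to checking whether each target's HEALPix index is in the set loaded from the BN_HEALPIX file. A minimal sketch, assuming a plain array of pixel indices (in practice the target indices would come from something like healpy.ang2pix on the ToO coordinates):

```python
import numpy as np

def flag_bright_neighbors(target_pix, bn_pix):
    """True where a target's HEALPix index falls in a bright-neighbor pixel."""
    return np.isin(target_pix, bn_pix)

# bn_pix would be loaded from the file pointed to by $BN_HEALPIX;
# target_pix would come from e.g. healpy.ang2pix(nside, ra, dec, lonlat=True).
bn_pix = np.array([10, 42, 97])
target_pix = np.array([5, 42, 100, 10])
print(flag_bright_neighbors(target_pix, bn_pix).tolist())  # [False, True, False, True]
```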

Let me know what you think.

codecov bot commented Apr 2, 2024

Codecov Report

Attention: Patch coverage is 0%, with 24 lines in your changes missing coverage. Please review.

Project coverage is 88.7%. Comparing base (0912bce) to head (cd7056a).
Report is 1 commit behind head on main.

❗ Current head cd7056a differs from the pull request's most recent head 764d437. Consider uploading reports for the commit 764d437 to get more accurate results.

Files                 Patch %   Lines
src/too/validate.py   0.0%      24 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main      #2     +/-   ##
=======================================
- Coverage   95.2%   88.7%   -6.5%     
=======================================
  Files          8       9      +1     
  Lines        327     351     +24     
  Branches      64      66      +2     
=======================================
  Hits         311     311             
- Misses         4      28     +24     
  Partials      12      12             


@imedan (Contributor, Author) commented Apr 4, 2024

I have now added offsetting and proper motion correction to the bright neighbor check. A couple things that have changed as a result of this:

  • I have added can_offset to the data model and to the sample.csv.
  • It should be noted I am working under the assumption that if pmra is not Null, then the epoch value must be set. I have updated this in sample.csv.
  • The code now depends on mugatu, but I cannot add it as a dependency. This is because mugatu depends on a tagged version of sdssdb, so poetry does not allow combining that requirement with the sdssdb branch this project requires. This should be fixable down the road, but for now you need to install mugatu separately for the code to run.
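
A rough sketch of the proper-motion correction, under the assumption stated above (if pmra is not null then epoch is set) and taking pmra to already include the cos(dec) factor, i.e. mu_alpha*:

```python
import numpy as np

def epoch_propagate(ra, dec, pmra, pmdec, epoch, epoch_obs):
    """Propagate ICRS coordinates (deg) by proper motion (mas/yr).

    Assumes pmra includes the cos(dec) factor (mu_alpha*), and that
    `epoch` is always set whenever pmra is not null.
    """
    dt = epoch_obs - epoch  # years between catalog epoch and observation
    dec_new = dec + pmdec * dt / 3.6e6  # mas/yr * yr -> deg
    ra_new = ra + pmra * dt / 3.6e6 / np.cos(np.radians(dec))
    return ra_new, dec_new

# 3600 mas/yr over 10 years shifts R.A. by 0.01 deg at the equator.
print(epoch_propagate(180.0, 0.0, 3600.0, 0.0, 2016.0, 2026.0))  # (180.01, 0.0)
```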

One other thing to figure out. The offsetting function relies on knowing what observatory the observation will be at. Is this something we will know at the validation stage? Or is it simpler to just run the validation for each observatory?

@albireox (Member)

Hi @imedan,

I have finally gotten to the point in testing the validation where I have some relevant comments. Instead of merging this branch directly, I merged/copy-pasted/modified your additions into this file in the dev branch:

https://github.com/sdss/too/blob/dev/src/too/validate.py

The main reason for this is that I wanted to avoid depending on Mugatu for now, as it requires some additional dependencies to build Kaiju and only works with Python <= 3.10 (which is what Kaiju allows). Happy to revisit that if we can make Mugatu not depend on Kaiju and work with the latest Pythons.

In any case, what I did was to copy some functions from Mugatu along with a trimmed-down version of DesignMode. I also changed the inputs a bit, mostly for my own convenience when calling these functions, and because in too I have been trying not to use the default connection that sdssdb provides and to allow it to be overridden (probably not very relevant for production, but important for development). To make this work I also had to make two changes:

  • Here I had to comment out this block, as I think it does the same as magnitude_array (although it seems you already deleted that?)
  • Here I think ev_boss should be ev_apogee; that was causing a shape mismatch.

The non-critical but useful change I made is to use polars lazy query options to speed things up. The BN validation was taking some 10 minutes to run, which is not terrible. I tried to parallelise it but ran into issues and ended up giving up. The slowdown is 90% due to reading the large pickle files for some design modes. On one side, I think pickle is not ideal here because it's uncompressed, and the HEALPix arrays in those files are quite compressible. I created Parquet files from the pickle files and the size decreased by a factor of 4-5 (something like numpy.savez_compressed also works at the same level). But if one does not load the entire file into memory and instead opens it with polars.scan_parquet and then queries the pixels that match, that is done in a few seconds (because of whatever ultra-efficient magic the polars query planner does in the background).

Currently the version in dev seems to work fine and I can validate targets (see my implementation of the call here).

Let me know what you think and whether we should just merge those changes to main, or you want to port them back here or to Mugatu.

@imedan (Contributor, Author) commented Apr 23, 2024

So, I think it does make sense (for now) to just copy over the mugatu functions to the validate.py file for development purposes, for the reasons you stated. I do think we should not do this for the production version of the code, though, because if these functions need to change for some reason, it will be annoying to update them in two separate places. But, we can cross that bridge when we get there...

Other than this, the changes you made all look good to me! For the two errors you caught, I had corrected those locally but forgot to push them to the remote branch, so thanks for catching them! Also, the polars query seems nice if it speeds things up. Similarly, the polars.scan_parquet magic sounds great to me; the large size of those pickle files was annoying...

I also looked at your implementation of validating the targets. It looks mostly good to me. I guess it is still missing differentiating between the observatories for the targets? This is a change we will have to make, as the bright neighbor radius is not the same at APO and LCO.

Other than that last comment, this looks good to merge to main to me. I think it makes the most sense to just merge the dev branch to main and we can work out the mugatu dependency issues later.

@albireox (Member)

Sounds good, thanks for looking at the dev branch. And yes, the observatory is hardcoded right now. Since each field is associated with an observatory, I think the easiest is to match targets to the FoV of each field and then use that observatory.

That reminds me that I wanted to ask about a detail of how the function should be called. Is the idea (as I wrote it right now) that we check every target against all the design modes that match its bright/dark sky brightness mode, and that if any fail we reject that target? Or was your idea that we only check the design mode of the design in which the ToO target is replacing a science target? I think the latter cannot be done except on the mountain.

@imedan (Contributor, Author) commented Apr 24, 2024

Since each field is associated with an observatory, I think the easiest is to match targets to the FoV of each field and then use that observatory.

Yes, I think this would make the most sense and we should be able to do this ahead of time. Should I add a function to validate.py that is able to do this check? I think I would know how to go about doing this.
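
The FoV check being proposed could be sketched as follows. The FoV radii and the field tuple layout here are placeholder assumptions, not the actual APO/LCO values:

```python
import numpy as np

def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    h = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(h)))

FOV_RADIUS = {"APO": 1.5, "LCO": 1.0}  # deg; placeholder values

def match_observatory(ra, dec, fields):
    """Return the observatory of the first field whose FoV contains the target.

    `fields` is an iterable of (field_ra, field_dec, observatory) tuples.
    """
    for fra, fdec, obs in fields:
        if angular_sep(ra, dec, fra, fdec) <= FOV_RADIUS[obs]:
            return obs
    return None  # target is outside every field's FoV
```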

Is the idea (as I wrote it right now) that we check every target for all the design modes that match their bright/dark sky brightness mode, and that if any fail we reject that target?

This is close to my intention. You are right that you need to check every target against the design modes that match its bright/dark sky brightness mode (so if "bright", you only need to check design modes with "bright" in them). The second half isn't quite correct, though. For example, it is possible that a ToO could be valid in dark_plane but not dark_faint; the intention would be that the ToO could be observed in dark_plane designs but not dark_faint ones. So, I really think we need to store the validation results for each target and for each design mode. Then, on the mountain, for a specific design we should be able to select 1) targets that are within the FOV of the field and 2) targets that are valid for the design mode of that specific design. Does that sound right to you?

@albireox (Member)

Ah yes, I had not thought of storing those results. That makes total sense.

@albireox (Member) commented May 7, 2024

This is all integrated in the main branch now and I think working fine, so I'm going to close this PR.

@albireox closed this May 7, 2024