Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

converting panstarrs read_hipscat to read_hats #382

Merged
merged 3 commits into from
Mar 5, 2025
Merged

Conversation

jkrick
Copy link
Contributor

@jkrick jkrick commented Feb 27, 2025

panstarrs module in light_curve_generator was broken maybe because read_hipscat is gone, or the hipscat files themselves were gone. So, the switch to hats format is in this PR.

I mostly just changed lsdb.read_hipscat to lsdb.read_hats. Some warnings pop up now in this panstarrs module, so I have added a line of text to warn people that the warnings are expected.

This closes #380

This also closes #376

@jkrick jkrick added bug Something isn't working use case: light curves labels Feb 27, 2025
@jkrick jkrick requested a review from bsipocz February 27, 2025 22:48
@jkrick jkrick changed the title converting panstarrs read_hipsdcat to read_hats converting panstarrs read_hipscat to read_hats Feb 27, 2025
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments and questions, mostly for cleanup purposes.

# !pip install -r requirements_light_curve_generator.txt
!pip install -r requirements_light_curve_generator.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make CI happy :)

The other failures have already been fixed on the default branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything we can do about this because I need it to run the notebooks, and you need it not to be there for CI? Somehow remembering to comment it out isn't sticking in my brain. Can we wrap this in an if statement or something? It is not worth a lot of effort, but if there is a workaround that keeps you from having to remind me every single time, that might be worth it. And sorry I can't seem to remember this.

Comment on lines -9 to +11
display_name: science_demo
display_name: notebook
language: python
name: conda-env-science_demo-py
name: python3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these the new names on fornax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't know, this all happens for me in the background when I open a notebook on Fornax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we can change them, python3 was the default for CI anyway, but it didn't make Fornax happy back then.

Comment on lines 457 to 460

```{code-cell} ipython3

```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the empty cell at the end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

lsdb
lsdb==0.4.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a minimum requirement or strictly only this version works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I test this? It doesn't work without the ==0.4.5 which is the current version. Is it best practice to make this a >= and assume future versions will work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, add >= as we go with the assumption future versions work, too. If not, then we introduce an upper pin or change our code.

@@ -4,7 +4,7 @@
import lsdb
from dask.distributed import Client
from data_structures import MultiIndexDFObject

from upath import UPath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this coming from? Do we really need the extra dependency? (upath is not a direct dependency until this point)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lsdb.read_hats() will be able to deal with raw s3 strings, so no need to wrap it up as a path object, directly use the urls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now required by lsdb. Sending a raw string for s3 now throws an error:

FileNotFoundError: Failed to read HATS at location s3://stpubdata/panstarrs/ps1/public/hats/otmo

It also will not accept a filesystem object, only a UPath.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Before I commented I tried it locally and it worked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I think it's an upstream issue, their API should be user friendly enough to accept reasonable inputs like an s3 uri string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the one todo here thus is to add universal_pathlib to the requirements file as we directly use it (although it's already an indirect requirement via hats/lsdb)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

astronomy-commons/hats#466 added support for submitting the s3 uri as a string when only an anonymous connection is needed. So this kind of call should work now:

hats.read_hats('s3://stpubdata/panstarrs/ps1/public/hats/otmo')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It hasn't been released though, so I would just open a reminder issue to do the cleanup once we have it installed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, thanks. I see you opened #383.

"nStackDetections", # some other data to use
]
)
panstarrs_object = lsdb.read_hats(UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo', anon=True), margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo_10arcs', anon=True), columns=[ "objID", # PS1 ID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please cut up this very long line (and the other similar one below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good resource on how to do this? I tried cutting these up and ended up with syntax errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@jkrick
Copy link
Contributor Author

jkrick commented Mar 5, 2025

I think I have addressed all comments, so merging this now.

@jkrick jkrick merged commit 0b771f3 into main Mar 5, 2025
1 of 3 checks passed
@jkrick jkrick deleted the panstarrs_hats branch March 5, 2025 18:22
@bsipocz
Copy link
Member

bsipocz commented Mar 5, 2025

note: the failing CI doesn't seem to be related to the changes here, and at least a fraction of them is fixed in main already. (the timeout for the scale up notebook is less deterministic, maybe the traceback in https://output.circle-artifacts.com/output/job/63c2c42c-79af-4f72-b523-3c6142e8116e/artifacts/0/_build/html/light_curves/scale_up.html gives a little bit of help with debugging)

@troyraen
Copy link
Contributor

troyraen commented Mar 5, 2025

(the timeout for the scale up notebook is less deterministic, maybe the traceback in https://output.circle-artifacts.com/output/job/63c2c42c-79af-4f72-b523-3c6142e8116e/artifacts/0/_build/html/light_curves/scale_up.html gives a little bit of help with debugging)

This is #375

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working use case: light curves
Projects
None yet
3 participants