converting panstarrs read_hipscat to read_hats #382

Merged: 3 commits, Mar 5, 2025
Changes from 1 commit
29 changes: 11 additions & 18 deletions light_curves/code_src/panstarrs_functions.py
@@ -4,7 +4,7 @@
import lsdb
from dask.distributed import Client
from data_structures import MultiIndexDFObject

+from upath import UPath
Member: Where is this coming from? Do we really need the extra dependency? (upath is not a direct dependency until this point.)

Member: I think lsdb.read_hats() will be able to deal with raw s3 strings, so there's no need to wrap them up as a path object; use the URLs directly.

Contributor:
This is now required by lsdb. Sending a raw string for s3 now throws an error:

FileNotFoundError: Failed to read HATS at location s3://stpubdata/panstarrs/ps1/public/hats/otmo

It also will not accept a filesystem object, only a UPath.
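A minimal sketch of the call shape being described, assuming the lsdb 0.4.x behavior reported in this thread (UPath forwards keyword arguments such as anon=True to the underlying fsspec filesystem):

```python
import lsdb
from upath import UPath

# Wrap the S3 URI in a UPath that carries the anonymous-access flag;
# per the report above, lsdb 0.4.x rejects both the raw string and a
# plain filesystem object.
catalog = lsdb.read_hats(
    UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo', anon=True)
)
```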

Member: Interesting. Before I commented I tried it locally and it worked.

Member: Anyway, I think it's an upstream issue; their API should be user-friendly enough to accept reasonable inputs like an s3 URI string.

Member: The one todo here, then, is to add universal_pathlib to the requirements file, since we use it directly (although it's already an indirect requirement via hats/lsdb).
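In requirements terms, that would be a one-line addition along these lines:

```text
universal_pathlib
```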

Contributor:
astronomy-commons/hats#466 added support for submitting the s3 uri as a string when only an anonymous connection is needed. So this kind of call should work now:

hats.read_hats('s3://stpubdata/panstarrs/ps1/public/hats/otmo')

Member: It hasn't been released yet, though, so I would just open a reminder issue to do the cleanup once we have it installed.

Contributor: Oh right, thanks. I see you opened #383.


# panstarrs light curves from hipscat catalog in S3 using lsdb
def panstarrs_get_lightcurves(sample_table, *, radius=1):
@@ -26,28 +26,21 @@ def panstarrs_get_lightcurves(sample_table, *, radius=1):
#read in the panstarrs object table to lsdb
#this table will be used for cross matching with our sample's ra and decs
#but does not have light curve information
-    panstarrs_object = lsdb.read_hipscat(
-        's3://stpubdata/panstarrs/ps1/public/hipscat/otmo',
-        storage_options={'anon': True},
-        columns=[
-            "objID", # PS1 ID
-            "raMean", "decMean", # coordinates to use for cross-matching
-            "nStackDetections", # some other data to use
-        ]
-    )
+    panstarrs_object = lsdb.read_hats(UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo', anon=True), margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo_10arcs', anon=True), columns=[ "objID", # PS1 ID
+        "raMean", "decMean", # coordinates to use for cross-matching
+        "nStackDetections", # some other data to use
+    ]
+    )
Member: Please cut up this very long line (and the other similar one below).

Contributor Author: Is there a good resource on how to do this? I tried cutting these up and ended up with syntax errors.
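For reference, the standard technique (a sketch of how the line in question could be wrapped; the actual follow-up commit may differ): Python continues statements implicitly inside open parentheses and brackets, so no backslashes are needed as long as every break falls inside ( ) or [ ]:

```python
panstarrs_object = lsdb.read_hats(
    UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo', anon=True),
    margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo_10arcs',
                       anon=True),
    columns=[
        "objID",              # PS1 ID
        "raMean", "decMean",  # coordinates to use for cross-matching
        "nStackDetections",   # some other data to use
    ],
)
```

A common source of syntax errors when splitting lines is breaking outside the brackets; keeping each break inside an open bracket avoids that.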

Contributor Author: Done.

"raMean", "decMean", # coordinates to use for cross-matching
"nStackDetections", # some other data to use
]
)
#read in the panstarrs light curves to lsdb
#panstarrs recommendation is not to index into this table with ra and dec
#but to use object ids from the above object table
-    panstarrs_detect = lsdb.read_hipscat(
-        's3://stpubdata/panstarrs/ps1/public/hipscat/detection',
-        storage_options={'anon': True},
-        columns=[
-            "objID", # PS1 object ID
+    panstarrs_detect = lsdb.read_hats(UPath('s3://stpubdata/panstarrs/ps1/public/hats/detection', anon=True), margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/detection_10arcs', anon=True),
+        columns=[ "objID", # PS1 object ID
         "detectID", # PS1 detection ID
         # light-curve stuff
         "obsTime", "filterID", "psfFlux", "psfFluxErr",
-        ],
-    )
+    ]
+    )
#convert astropy table to pandas dataframe
#special care for the SkyCoords in the table
sample_df = pd.DataFrame({'objectid': sample_table['objectid'],
@@ -119,4 +112,4 @@ def panstarrs_get_lightcurves(sample_table, *, radius=1):
}).set_index(["objectid", "label", "band", "time"])

return MultiIndexDFObject(data=df_lc)

12 changes: 9 additions & 3 deletions light_curves/light_curve_generator.md
@@ -6,9 +6,9 @@ jupytext:
format_version: 0.13
jupytext_version: 1.16.4
kernelspec:
-  display_name: science_demo
+  display_name: notebook
   language: python
-  name: conda-env-science_demo-py
+  name: python3
Member (on lines -9 to +11): Are these the new names on Fornax?

Contributor Author: Sorry, I don't know; this all happens in the background for me when I open a notebook on Fornax.

Member: OK, we can change them; python3 was the default for CI anyway, but it didn't make Fornax happy back then.

---

# Make Multi-Wavelength Light Curves Using Archival Data
@@ -74,7 +74,7 @@ This cell will install them if needed:

```{code-cell} ipython3
# Uncomment the next line to install dependencies if needed.
-# !pip install -r requirements_light_curve_generator.txt
+!pip install -r requirements_light_curve_generator.txt
```
Member:
This doesn't make CI happy :)

The other failures have already been fixed on the default branch.

Contributor Author: Is there anything we can do about this? I need it to run the notebooks, and you need it not to be there for CI; somehow, remembering to comment it out isn't sticking in my brain. Can we wrap this in an if statement or something? It's not worth a lot of effort, but if there's a workaround that keeps you from having to remind me every single time, it might be worth it. And sorry I can't seem to remember this.

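One possible workaround (a sketch, not part of this PR; it assumes the notebook CI runs somewhere that sets the conventional CI environment variable, which GitHub Actions does automatically): gate the install so the cell runs interactively but is a no-op in CI:

```python
import os
import subprocess
import sys

# Install dependencies when running interactively (e.g. on Fornax),
# but skip under CI, where the runner sets CI=true.
if not os.environ.get("CI"):
    subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "-r", "requirements_light_curve_generator.txt"],
        check=True,
    )
```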

```{code-cell} ipython3
@@ -258,6 +258,8 @@ print('WISE search took:', time.time() - WISEstarttime, 's')
### 2.4 MAST: Pan-STARRS
The function to retrieve lightcurves from Pan-STARRS uses a version of both the object and light curve catalogs that are stored in the cloud and accessed using [lsdb](https://docs.lsdb.io/en/stable/). This function is efficient at large scale (sample sizes > ~1000).

Some warnings are expected.
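For readers of this PR, an illustrative sketch (not code from this repository) of the lsdb pattern the function follows, assuming lsdb's documented from_dataframe/crossmatch/join API; exact column names and crossmatch suffixes may differ in practice:

```python
import lsdb

# Build an lsdb catalog from the (hypothetical) sample dataframe with ra/dec columns.
sample_lsdb = lsdb.from_dataframe(sample_df, ra_column="ra", dec_column="dec")

# Positional crossmatch of the sample against the object catalog (radius in arcsec).
matched = panstarrs_object.crossmatch(sample_lsdb, radius_arcsec=1, n_neighbors=1)

# Then pull light curves by object ID rather than by position, as the
# Pan-STARRS comments in panstarrs_functions.py recommend.
lightcurves = matched.join(panstarrs_detect, left_on="objID", right_on="objID")
```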

```{code-cell} ipython3
panstarrsstarttime = time.time()

@@ -452,3 +454,7 @@ This work made use of:
* Lightkurve; Lightkurve Collaboration 2018, 2018ascl.soft12013L
* acstools; https://zenodo.org/record/7406933#.ZBH1HS-B0eY
* unWISE light curves; Meisner et al., 2023, 2023AJ....165...36M

```{code-cell} ipython3

```
Member: No need for the empty cell at the end.

Contributor Author: Done.

2 changes: 1 addition & 1 deletion light_curves/requirements_light_curve_generator.txt
@@ -14,7 +14,7 @@ astroquery>=0.4.8.dev0
acstools
lightkurve
alerce
-lsdb
+lsdb==0.4.5
Member: Is this a minimum requirement, or does strictly only this version work?

Contributor Author: How do I test this? It doesn't work without the ==0.4.5, which is the current version. Is it best practice to make this a >= and assume future versions will work?

Member: Yep, add >=; we go with the assumption that future versions work, too. If not, then we introduce an upper pin or change our code.
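That is, the requirement line would become something like:

```text
lsdb>=0.4.5
```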

# We use dask.distributed in this notebook, but installing any dask would make the [dataframe] extras required by dependencies of other notebooks.
# It feels like the cleanest solution to add the dependency here, as we don't directly use it elsewhere.
dask[distributed,dataframe]