converting panstarrs read_hipscat to read_hats #382

Merged (3 commits) on Mar 5, 2025
light_curves/code_src/panstarrs_functions.py (19 additions, 21 deletions)
@@ -4,7 +4,7 @@
 import lsdb
 from dask.distributed import Client
 from data_structures import MultiIndexDFObject
-
+from upath import UPath
Member: Where is this coming from? Do we really need the extra dependency? (upath is not a direct dependency until this point.)

Member: I think lsdb.read_hats() will be able to deal with raw s3 strings, so there should be no need to wrap them in a path object; use the URLs directly.

Contributor: This is now required by lsdb. Sending a raw string for s3 now throws an error:

FileNotFoundError: Failed to read HATS at location s3://stpubdata/panstarrs/ps1/public/hats/otmo

It also will not accept a filesystem object, only a UPath.
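For reference, a minimal sketch of the wrapped call under the behavior described here (variable names are illustrative; assumes anonymous S3 access and the lsdb version pinned below):

```python
import lsdb
from upath import UPath

# lsdb currently requires a UPath here; a raw 's3://...' string or a bare
# fsspec filesystem object fails with the FileNotFoundError quoted above.
catalog_path = UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo', anon=True)
panstarrs_object = lsdb.read_hats(catalog_path)
```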

Member: Interesting. Before I commented, I tried it locally and it worked.

Member: Anyway, I think it's an upstream issue; their API should be user-friendly enough to accept reasonable inputs like an s3 URI string.

Member: The one todo here, then, is to add universal_pathlib to the requirements file, since we use it directly (although it's already an indirect requirement via hats/lsdb).

Contributor: astronomy-commons/hats#466 added support for submitting the s3 URI as a string when only an anonymous connection is needed. So this kind of call should work now:

hats.read_hats('s3://stpubdata/panstarrs/ps1/public/hats/otmo')

Member: It hasn't been released though, so I would just open a reminder issue to do the cleanup once we have it installed.

Contributor: Oh right, thanks. I see you opened #383.


 # panstarrs light curves from hipscat catalog in S3 using lsdb
 def panstarrs_get_lightcurves(sample_table, *, radius=1):
@@ -26,28 +26,26 @@ def panstarrs_get_lightcurves(sample_table, *, radius=1):
     #read in the panstarrs object table to lsdb
     #this table will be used for cross matching with our sample's ra and decs
     #but does not have light curve information
-    panstarrs_object = lsdb.read_hipscat(
-        's3://stpubdata/panstarrs/ps1/public/hipscat/otmo',
-        storage_options={'anon': True},
-        columns=[
-            "objID",              # PS1 ID
-            "raMean", "decMean",  # coordinates to use for cross-matching
-            "nStackDetections",   # some other data to use
-        ]
-    )
+    panstarrs_object = lsdb.read_hats(
+        UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo', anon=True),
+        margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo_10arcs', anon=True),
+        columns=[
+            "objID",              # PS1 ID
+            "raMean", "decMean",  # coordinates to use for cross-matching
+            "nStackDetections",   # some other data to use
+        ]
+    )
     #read in the panstarrs light curves to lsdb
     #panstarrs recommendation is not to index into this table with ra and dec
     #but to use object ids from the above object table
-    panstarrs_detect = lsdb.read_hipscat(
-        's3://stpubdata/panstarrs/ps1/public/hipscat/detection',
-        storage_options={'anon': True},
-        columns=[
-            "objID",     # PS1 object ID
-            "detectID",  # PS1 detection ID
-            # light-curve stuff
-            "obsTime", "filterID", "psfFlux", "psfFluxErr",
-        ],
-    )
+    panstarrs_detect = lsdb.read_hats(
+        UPath('s3://stpubdata/panstarrs/ps1/public/hats/detection', anon=True),
+        margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/detection_10arcs', anon=True),
+        columns=[
+            "objID",     # PS1 object ID
+            "detectID",  # PS1 detection ID
+            # light-curve stuff
+            "obsTime", "filterID", "psfFlux", "psfFluxErr",
+        ]
+    )
     #convert astropy table to pandas dataframe
     #special care for the SkyCoords in the table
     sample_df = pd.DataFrame({'objectid': sample_table['objectid'],
@@ -119,4 +117,4 @@
     }).set_index(["objectid", "label", "band", "time"])
 
     return MultiIndexDFObject(data=df_lc)
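For context, a hedged usage sketch of the function being modified (the `coord` column name and the demo coordinates are assumptions, not taken from this PR):

```python
from astropy.coordinates import SkyCoord
from astropy.table import Table

from panstarrs_functions import panstarrs_get_lightcurves

# Hypothetical one-object sample. 'objectid' and 'label' match the index
# columns set in the function above; 'coord' is an assumed name for the
# sample table's SkyCoord column.
sample_table = Table({
    'objectid': [0],
    'label': ['demo'],
    'coord': [SkyCoord(ra=187.70593, dec=12.39112, unit='deg')],
})

df_lc = panstarrs_get_lightcurves(sample_table, radius=1)
```

The return value is a MultiIndexDFObject whose underlying dataframe is indexed by (objectid, label, band, time), per the final lines of the diff above.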

light_curves/light_curve_generator.md (6 additions, 4 deletions)
@@ -6,9 +6,9 @@ jupytext:
   format_version: 0.13
   jupytext_version: 1.16.4
 kernelspec:
-  display_name: science_demo
+  display_name: notebook
   language: python
-  name: conda-env-science_demo-py
+  name: python3
Member (on lines -9 to +11): Are these the new names on Fornax?

Contributor (author): Sorry, I don't know; this all happens in the background for me when I open a notebook on Fornax.

Member: OK, we can change them; python3 was the default for CI anyway, but it didn't make Fornax happy back then.

 ---
 
 # Make Multi-Wavelength Light Curves Using Archival Data
@@ -74,7 +74,7 @@ This cell will install them if needed:
 
 ```{code-cell} ipython3
 # Uncomment the next line to install dependencies if needed.
-# !pip install -r requirements_light_curve_generator.txt
+#!pip install -r requirements_light_curve_generator.txt
 ```

 ```{code-cell} ipython3
@@ -101,7 +101,7 @@ from tess_kepler_functions import tess_kepler_get_lightcurves
 from wise_functions import wise_get_lightcurves
 # Note: ZTF data is temporarily located in a non-public AWS S3 bucket. It is automatically available
 # from the Fornax Science Console, but otherwise will require explicit user credentials.
-from ztf_functions import ztf_get_lightcurves
+# from ztf_functions import ztf_get_lightcurves
 ```
 
 ## 1. Define the sample
@@ -258,6 +258,8 @@ print('WISE search took:', time.time() - WISEstarttime, 's')
 ### 2.4 MAST: Pan-STARRS
 The function to retrieve lightcurves from Pan-STARRS uses a version of both the object and light curve catalogs that are stored in the cloud and accessed using [lsdb](https://docs.lsdb.io/en/stable/). This function is efficient at large scale (sample sizes > ~1000).
+
+Some warnings are expected.
 
 ```{code-cell} ipython3
 panstarrsstarttime = time.time()
light_curves/requirements_light_curve_generator.txt (2 additions, 1 deletion)
@@ -14,7 +14,8 @@ astroquery>=0.4.8.dev0
 acstools
 lightkurve
 alerce
-lsdb
+lsdb>=0.4.5
+universal_pathlib
 # We use distributed in this notebook, but installing any dask would make the [dataframe] extras required by dependencies for other notebooks.
 # It feels to be the cleanest solution to add the dependency here as we don't directly use it elsewhere.
 dask[distributed,dataframe]