-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
converting panstarrs read_hipscat to read_hats #382
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,7 @@ | |
import lsdb | ||
from dask.distributed import Client | ||
from data_structures import MultiIndexDFObject | ||
|
||
from upath import UPath | ||
|
||
# panstarrs light curves from hipscat catalog in S3 using lsdb | ||
def panstarrs_get_lightcurves(sample_table, *, radius=1): | ||
|
@@ -26,28 +26,21 @@ def panstarrs_get_lightcurves(sample_table, *, radius=1): | |
#read in the panstarrs object table to lsdb | ||
#this table will be used for cross matching with our sample's ra and decs | ||
#but does not have light curve information | ||
panstarrs_object = lsdb.read_hipscat( | ||
's3://stpubdata/panstarrs/ps1/public/hipscat/otmo', | ||
storage_options={'anon': True}, | ||
columns=[ | ||
"objID", # PS1 ID | ||
"raMean", "decMean", # coordinates to use for cross-matching | ||
"nStackDetections", # some other data to use | ||
] | ||
) | ||
panstarrs_object = lsdb.read_hats(UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo', anon=True), margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/otmo_10arcs', anon=True), columns=[ "objID", # PS1 ID | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please cut up this very long line (and the other similar one below) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a good resource on how to do this? I tried cutting these up and ended up with syntax errors. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
"raMean", "decMean", # coordinates to use for cross-matching | ||
"nStackDetections", # some other data to use | ||
] | ||
) | ||
#read in the panstarrs light curves to lsdb | ||
#panstarrs recommendation is not to index into this table with ra and dec | ||
#but to use object ids from the above object table | ||
panstarrs_detect = lsdb.read_hipscat( | ||
's3://stpubdata/panstarrs/ps1/public/hipscat/detection', | ||
storage_options={'anon': True}, | ||
columns=[ | ||
"objID", # PS1 object ID | ||
panstarrs_detect = lsdb.read_hats(UPath('s3://stpubdata/panstarrs/ps1/public/hats/detection', anon=True), margin_cache=UPath('s3://stpubdata/panstarrs/ps1/public/hats/detection_10arcs', anon=True), | ||
columns=[ "objID", # PS1 object ID | ||
"detectID", # PS1 detection ID | ||
# light-curve stuff | ||
"obsTime", "filterID", "psfFlux", "psfFluxErr", | ||
], | ||
) | ||
] | ||
) | ||
#convert astropy table to pandas dataframe | ||
#special care for the SkyCoords in the table | ||
sample_df = pd.DataFrame({'objectid': sample_table['objectid'], | ||
|
@@ -119,4 +112,4 @@ def panstarrs_get_lightcurves(sample_table, *, radius=1): | |
}).set_index(["objectid", "label", "band", "time"]) | ||
|
||
return MultiIndexDFObject(data=df_lc) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,9 +6,9 @@ jupytext: | |
format_version: 0.13 | ||
jupytext_version: 1.16.4 | ||
kernelspec: | ||
display_name: science_demo | ||
display_name: notebook | ||
language: python | ||
name: conda-env-science_demo-py | ||
name: python3 | ||
Comment on lines
-9
to
+11
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. are these the new names on fornax? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I don't know, this all happens for me in the background when I open a notebook on Fornax. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, we can change them, |
||
--- | ||
|
||
# Make Multi-Wavelength Light Curves Using Archival Data | ||
|
@@ -74,7 +74,7 @@ This cell will install them if needed: | |
|
||
```{code-cell} ipython3 | ||
# Uncomment the next line to install dependencies if needed. | ||
# !pip install -r requirements_light_curve_generator.txt | ||
!pip install -r requirements_light_curve_generator.txt | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't make CI happy :) The other failures have already been fixed on the default branch. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there anything we can do about this because I need it to run the notebooks, and you need it not to be there for CI? Somehow remembering to comment it out isn't sticking in my brain. Can we wrap this in an if statement or something? It is not worth a lot of effort, but if there is a workaround that keeps you from having to remind me every single time, that might be worth it. And sorry I can't seem to remember this. |
||
``` | ||
|
||
```{code-cell} ipython3 | ||
|
@@ -258,6 +258,8 @@ print('WISE search took:', time.time() - WISEstarttime, 's') | |
### 2.4 MAST: Pan-STARRS | ||
The function to retrieve lightcurves from Pan-STARRS uses a version of both the object and light curve catalogs that are stored in the cloud and accessed using [lsdb](https://docs.lsdb.io/en/stable/). This function is efficient at large scale (sample sizes > ~1000). | ||
|
||
Some warnings are expected. | ||
|
||
```{code-cell} ipython3 | ||
panstarrsstarttime = time.time() | ||
|
||
|
@@ -452,3 +454,7 @@ This work made use of: | |
* Lightkurve; Lightkurve Collaboration 2018, 2018ascl.soft12013L | ||
* acstools; https://zenodo.org/record/7406933#.ZBH1HS-B0eY | ||
* unWISE light curves; Meisner et al., 2023, 2023AJ....165...36M | ||
|
||
```{code-cell} ipython3 | ||
|
||
``` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no need for the empty cell at the end There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,7 +14,7 @@ astroquery>=0.4.8.dev0 | |
acstools | ||
lightkurve | ||
alerce | ||
lsdb | ||
lsdb==0.4.5 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this a minimum requirement or strictly only this version works? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do I test this? It doesn't work without the ==0.4.5 which is the current version. Is it best practice to make this a >= and assume future versions will work? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeap, add |
||
# We use distributed in this notebook, but installing any dask would make the [dataframe] extras required by dependencies for other notebooks. | ||
# It feels to be the cleanest solution to add the dependency here as we don't directly use it elsewhere. | ||
dask[distributed,dataframe] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this coming from? Do we really need the extra dependency? (upath is not a direct dependency until this point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think lsdb.read_hats() will be able to deal with raw s3 strings, so no need to wrap it up as a path object, directly use the urls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now required by
lsdb
. Sending a raw string for s3 now throws an error:It also will not accept a filesystem object, only a
UPath
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Before I commented I tried it locally and it worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I think it's an upstream issue, their API should be user friendly enough to accept reasonable inputs like an s3 uri string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the one todo here thus is to add
universal_pathlib
to the requirements file as we directly use it (although it's already an indirect requirement via hats/lsdb)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened astronomy-commons/hats#462
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
astronomy-commons/hats#466 added support for submitting the s3 uri as a string when only an anonymous connection is needed. So this kind of call should work now:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It hasn't been released though, so I would just open a reminder issue to do the cleanup once we have it installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, thanks. I see you opened #383.