Science Driver: Running a Periodogram over all ZTF Lightcurves #42

Open
dougbrn opened this issue Jul 8, 2024 · 3 comments

Comments

@dougbrn
Collaborator

dougbrn commented Jul 8, 2024

Describe the desired workflow.

Including details and what needs to be run, on what system (if relevant), and any technologies that will be used alongside Nested-Dask

This driver is straightforward: we simply want to run a common analysis function at the scale of ZTF (~4.5 billion lightcurves). A periodogram (likely just a single band, via Astropy's implementation) seems like a sensible choice given its ubiquity. We'd prefer to run this on some kind of distributed system, like the PSC or Fornax, and to do the analysis with LSDB as well.

How will doing this driver create impact?

Does this enable scientific work that wasn't possible (or just difficult) before? Will this test the scalability and robustness of Nested-Dask/Nested-Pandas?

The sole impact is to assess the scalability limits of our current Nested-Dask/Nested-Pandas implementation. Will we be able to get through the full workflow, and what issues will we encounter?

Does this require any new functionality to be added to Nested-Dask?

E.g. Are there API functions needed that are not present (to the best of your knowledge)? Independent tickets should be created for these features and linked back to this issue.

We should be able to do this with the current functionality.

Should this produce documentation?

Can we capture the result of this driver in some way? For example, as a tutorial or longer-form notebook (held in a different repository)

This should be kept, at minimum, as a long-form notebook in a different repository (notebooks_lf), or directly within the Nested-Dask and/or LSDB docs. It could also inform new best practices for working at scale in our main documentation.

@dougbrn dougbrn added the Science Driver Use Cases to Drive Development label Jul 8, 2024
@dougbrn
Collaborator Author

dougbrn commented Aug 21, 2024

The notebook that will be run for this is here: https://github.com/lincc-frameworks/notebooks_lf/blob/main/ztf_periodogram/ztf_periodogram_epyc.ipynb

A spreadsheet for run tracking is here: https://docs.google.com/spreadsheets/d/19-GexwAu1TBunGKkCNMU7c3uwhbcLwZXteLLvT3Q48w/edit?gid=0#gid=0

Timing in the spreadsheet is based on how long it takes to run the histogram plot cell near the end of the notebook.

@dougbrn
Collaborator Author

dougbrn commented Aug 21, 2024

@wilsonbb will be testing on epyc with the locally available ztf_axs dr14, with a switch to dr20 when available
@hombit will be testing on psc after downloading ztf dr20 to it
@dougbrn will be testing on a local MacBook over HTTPS

@nevencaplar

Please also test with baldur (which can go "epyc").

4 participants