Science Driver: Running a Periodogram over all ZTF Lightcurves #42

dougbrn · 2024-07-08T18:58:13Z

Describe the desired workflow.

Include details on what needs to be run, on what system (if relevant), and any technologies that will be used alongside Nested-Dask.

This driver is straightforward: we simply want to run a common analysis function at the scale of ZTF (~4.5 billion lightcurves). A periodogram (likely single-band only, via Astropy's implementation) seems like a sensible choice because of its ubiquity. It's preferred to run this on some kind of distributed system, like the PSC or Fornax, and to do the analysis with LSDB.

How will doing this driver create impact?

Does this enable scientific work that wasn't possible (or just difficult) before? Will this test the scalability and robustness of Nested-Dask/Nested-Pandas?

The sole impact of this is to assess the scalability limitations of our current implementation of Nested-Dask/Nested-Pandas. Will we be able to get through the full workflow, and what issues will we encounter?

Does this require any new functionality to be added to Nested-Dask?

E.g. Are there API functions needed that are not present (to the best of your knowledge)? Independent tickets should be created for these features and linked back to this issue.

We should be able to do this with the current functionality.

Should this produce documentation?

Can we capture the result of this driver in some way? For example, as a tutorial or longer-form notebook (held in a different repository).

At minimum, this should be kept as a long-form notebook in a different repository (notebooks-lf), or directly within the Nested-Dask and/or LSDB docs. It may also inform new best practices for working at scale in our main documentation.

dougbrn · 2024-08-21T19:03:06Z

The notebook that will be run for this is here: https://github.com/lincc-frameworks/notebooks_lf/blob/main/ztf_periodogram/ztf_periodogram_epyc.ipynb

A spreadsheet for run tracking is here: https://docs.google.com/spreadsheets/d/19-GexwAu1TBunGKkCNMU7c3uwhbcLwZXteLLvT3Q48w/edit?gid=0#gid=0

Timing for the above is based on how long the histogram plot cell near the end of the notebook takes to run.

dougbrn · 2024-08-21T19:05:01Z

@wilsonbb will be testing on epyc with the locally available ztf_axs DR14, switching to DR20 when it becomes available
@hombit will be testing on PSC after downloading ZTF DR20 there
@dougbrn will be testing on a local MacBook via HTTPS

nevencaplar · 2024-08-21T22:38:55Z

Please also test with baldur (which can go "epyc").

dougbrn added the Science Driver Use Cases to Drive Development label Jul 8, 2024

dougbrn assigned wilsonbb, hombit and dougbrn Aug 21, 2024
