
Add Euclid MER HATS Parquet notebook #73


Closed
wants to merge 17 commits into from

Conversation

Contributor

@troyraen troyraen commented Mar 25, 2025

This PR adds a notebook with an introduction to the HATS version of the Euclid Q1 MER catalogs that IRSA is preparing to release. The dataset is currently in a testing bucket that is available from Fornax and IPAC networks only (see nasa-fornax/fornax-demo-notebooks#394 for details).

Note: Before release, I plan to update both the dataset and the notebook to include data from the Q1 PHZ (photo-z) catalogs alongside the MER data that is already there. Many Euclid use cases require a redshift, so bundling it into this product will give users easier access to that information because they won't have to join the tables themselves. We are also interested in adding the spectroscopy catalogs, but that may or may not happen in this first round.
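To illustrate the join that the pre-combined product would spare users from doing themselves, here is a minimal pandas sketch. The column names (`object_id`, `flux_vis`, `phz_median`) are hypothetical stand-ins for illustration, not the actual Euclid Q1 schema:

```python
import pandas as pd

# Hypothetical tables standing in for the MER photometry and
# PHZ photo-z catalogs; real Euclid column names may differ.
mer = pd.DataFrame({"object_id": [1, 2, 3],
                    "flux_vis": [10.5, 3.2, 7.8]})
phz = pd.DataFrame({"object_id": [1, 2, 3],
                    "phz_median": [0.42, 1.10, 0.73]})

# The combined HATS product would ship with this merge already done.
joined = mer.merge(phz, on="object_id", how="left")
print(joined.columns.tolist())
```

With the joined product, a user gets photometry and redshift in a single read instead of loading two catalogs and matching them by object ID.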

Member

@bsipocz bsipocz left a comment


Minor comments. If you plan to push many commits here while developing, we should consider temporarily turning off execution for the rendering, too.

@bsipocz bsipocz added content Content related issues/PRs. html rendering / skip testing Rendering related issues/PRs. Skips tests in PRs. labels Mar 25, 2025
@troyraen troyraen marked this pull request as ready for review March 25, 2025 06:24
@troyraen troyraen requested a review from afaisst March 25, 2025 06:25
@troyraen troyraen changed the title Add Euclid HATS Parquet notebook Add Euclid MER HATS Parquet notebook Mar 25, 2025
Member

bsipocz commented Mar 28, 2025

I'm not sure we should do the numpy uninstall trick in a notebook, it's bad enough that we have installs in there 😅

(That said, I wonder why the install command is not picking up the numpy upgrade; after all, the minimum dependency changed due to the lsdb/hats requirements)

Member

@bsipocz bsipocz left a comment


OK, some suggestions for swapping out the pip uninstall line.

Member

bsipocz commented Mar 29, 2025

Sorry about the conf.py conflict, you may want to rebase now.

@troyraen
Contributor Author

Rebased and force pushed.

Member

@jaladh-singhal jaladh-singhal left a comment


@troyraen this notebook was a good exercise for me to learn how to access HATS-format data.

The code looks good and was easy to follow, I mostly have comments about text. Please note that my comments are coming from a POV of someone who is new to HATS, LSDB, Dask, etc. so feel free to ignore the ones you think are too obvious for an average reader of this tutorial.

We peeked at the data but we haven't loaded all of it yet.
What we really need in order to create a CMD is the magnitudes, so let's calculate those now.
Appending `.compute()` to the commands will trigger Dask to actually load this data into memory.
It is not strictly necessary, but will allow us to look at the data repeatedly without having to re-load it each time.
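The lazy-then-materialize pattern described in the quoted cell can be sketched with plain Python generators. This is a loose analogy for Dask's behavior, not the LSDB/Dask API itself: nothing runs until you materialize, and materializing into memory lets you reuse the result without recomputing.

```python
import math

fluxes = [10.0, 100.0, 1000.0]

# "Lazy": this generator defines the magnitude calculation but does
# not execute it yet, like an uncomputed Dask task graph.
lazy_mags = (-2.5 * math.log10(f) + 23.9 for f in fluxes)

# "Compute": materialize the results into memory, analogous to
# appending .compute() in the notebook. After this, the values can
# be inspected repeatedly without re-running the calculation.
mags = list(lazy_mags)
print(mags)
```

The zero point 23.9 here is just an illustrative constant; the notebook's actual magnitude formula should be taken from the Euclid documentation.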
Member


I find it helpful when the text warns about a long-running cell, so readers don't assume they did something wrong. The following cell took ~12 min for me locally (VPN'd in from home). Maybe we can add a rough estimate here?

Contributor Author


Yes, I will add an estimate for Fornax.

You may know this already, but in case not: I would expect this to take noticeably longer with your setup (home -> IPAC VPN -> S3 bucket on the east coast) than on Fornax, due to proximity to the data, time to route through the VPN, and the fact that most home internet speeds are slower. But it's convenient! Just FYI on the tradeoffs.

Co-authored-by: Jaladh Singhal <[email protected]>
Co-authored-by: Brigitta Sipőcz <[email protected]>
Member

bsipocz commented May 6, 2025

Please rebase to sort out the conflicting file.

Member

@jaladh-singhal jaladh-singhal left a comment


From the discussion on my HATS notebook, I wonder if we can remove hats import from here and replace hats calls in your code with equivalent lsdb calls?

import os

import dask.distributed
import hats
Member


Suggested change
import hats

Contributor Author


Thanks, yes, I am in the process of removing hats from this notebook.

try:
# If running from within IPAC's network (maybe VPN'd in with "tunnel-all"),
# your IP address acts as your credentials and this should just work.
hats.read_hats(euclid_s3_path)
Member


Suggested change
hats.read_hats(euclid_s3_path)
lsdb.read_hats(euclid_s3_path)


```{code-cell}
# Load the dataset.
euclid_hats = hats.read_hats(euclid_s3_path)
Member


Suggested change
euclid_hats = hats.read_hats(euclid_s3_path)
euclid_hats = lsdb.read_hats(euclid_s3_path)

euclid_hats = hats.read_hats(euclid_s3_path)

# Visualize the on-sky distribution of objects in the Q1 MER Catalog.
hats.inspection.plot_density(euclid_hats)
Member


Not sure of the exact lsdb equivalent -- skymap_histogram()?


```{code-cell}
# Visualize the HEALPix orders of the dataset partitions.
hats.inspection.plot_pixels(euclid_hats)
Member


Suggested change
hats.inspection.plot_pixels(euclid_hats)
euclid_hats.plot_pixels()


```{code-cell}
# Fetch the pyarrow schema from hats.
euclid_hats = hats.read_hats(euclid_s3_path)
Member


Suggested change
euclid_hats = hats.read_hats(euclid_s3_path)
euclid_hats = lsdb.read_hats(euclid_s3_path)

```{code-cell}
# Fetch the pyarrow schema from hats.
euclid_hats = hats.read_hats(euclid_s3_path)
schema = euclid_hats.schema
Member


Not sure how to extract the pyarrow schema directly from an lsdb catalog.

Contributor Author


Yeah, I'm not sure there's a user-friendly way to get the whole schema with lsdb. I'll check once more, but I think pyarrow will be simpler for this.

@troyraen
Contributor Author

I have added several more Euclid Q1 tables (>3x more columns) to this dataset plus a couple of ancillary HATS products (nasa-fornax/fornax-demo-notebooks#416) since this notebook was first drafted. I am redrafting the notebook now and it will be quite a bit different in order to explain the full product and demonstrate things. So I will close this PR and open a new one in order to ease the review process. Thanks for your feedback everyone!

@troyraen troyraen closed this May 16, 2025