Update example measurement sets to a more original representation #6

iancze · 2023-12-06T21:42:12Z

Currently, we store example visibility sets in a .npy or .asdf format uploaded to Zenodo. There is some variation, but generally these are:

data (complex visibility), often with shape (nchan, nvis)
spatial frequencies $u$ and $v$ in kilo $\lambda$, with shape (nchan, nvis)
flag (Boolean), with shape (nchan, nvis)
weight, with shape (nchan, nvis)

But it would be more efficient to save these the way CASA does:

$u$, $v$ in meters, shape (nvis)
channel frequencies, shape (nchan)
data (nchan, nvis) (assuming we average over the npol dimension)
flag (nchan, nvis) (assuming we average over the npol dimension)
weight (nvis) (assuming we average over the npol dimension, and the weights are not channelized)
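As a rough check on the two layouts listed above, here is a minimal sketch that compares their in-memory sizes. The array sizes and dtypes (`complex128`, `float64`, `bool`) are assumptions chosen for illustration and are not taken from any of the actual Zenodo files.

```python
import numpy as np

# Illustrative sizes only (assumption), not from a real dataset.
nchan, nvis = 8, 1000

# Current layout: every quantity is stored per channel.
current = {
    "data": np.zeros((nchan, nvis), dtype=np.complex128),
    "uu": np.zeros((nchan, nvis), dtype=np.float64),
    "vv": np.zeros((nchan, nvis), dtype=np.float64),
    "flag": np.zeros((nchan, nvis), dtype=bool),
    "weight": np.zeros((nchan, nvis), dtype=np.float64),
}

# Proposed CASA-like layout: baselines in meters and weights are stored
# once per visibility, plus a short vector of channel frequencies.
proposed = {
    "data": np.zeros((nchan, nvis), dtype=np.complex128),
    "uu": np.zeros(nvis, dtype=np.float64),
    "vv": np.zeros(nvis, dtype=np.float64),
    "freq": np.zeros(nchan, dtype=np.float64),
    "flag": np.zeros((nchan, nvis), dtype=bool),
    "weight": np.zeros(nvis, dtype=np.float64),
}


def total_bytes(arrays):
    """Sum the raw buffer sizes of a dict of NumPy arrays."""
    return sum(a.nbytes for a in arrays.values())


bytes_current = total_bytes(current)
bytes_proposed = total_bytes(proposed)
print(bytes_current, bytes_proposed, bytes_proposed / bytes_current)
```

Under these assumptions the savings come entirely from the coordinate and weight arrays; the data and flag arrays keep their (nchan, nvis) shape in both layouts, so the complex visibilities set a floor on the file size.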

I think there are a few benefits to this.

The file size is much smaller, so this should speed up downloads for builds
Tutorials can show the user how to convert from the format their data is likely to be in into the format used by MPoL ($u$ and $v$ in $\lambda$, following MPoL #223)
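The conversion a tutorial would need to show could look roughly like the sketch below. It scales baselines in meters to units of $\lambda$ using $u_\lambda = u \nu / c$ and broadcasts against the channel frequencies. The function name `to_lambda` and its arguments are hypothetical and are not part of MPoL or CASA.

```python
import numpy as np

C_M_PER_S = 299_792_458.0  # speed of light in m/s


def to_lambda(uu_m, vv_m, freq_hz, weight):
    """Expand a compact CASA-like layout to per-channel arrays.

    uu_m, vv_m : (nvis,) baseline coordinates in meters
    freq_hz    : (nchan,) channel frequencies in Hz
    weight     : (nvis,) un-channelized weights

    Returns uu, vv in units of lambda and weight, each with shape
    (nchan, nvis). Hypothetical helper for illustration only.
    """
    uu_m = np.asarray(uu_m, dtype=np.float64)
    vv_m = np.asarray(vv_m, dtype=np.float64)
    freq_hz = np.asarray(freq_hz, dtype=np.float64)

    # u [lambda] = u [m] * nu / c, broadcast to (nchan, nvis)
    scale = freq_hz[:, np.newaxis] / C_M_PER_S
    uu_lam = scale * uu_m[np.newaxis, :]
    vv_lam = scale * vv_m[np.newaxis, :]

    # Repeat the per-visibility weights for every channel.
    weight_full = np.broadcast_to(weight, uu_lam.shape).copy()
    return uu_lam, vv_lam, weight_full


# A baseline of one wavelength at 1 GHz, observed in two channels.
uu, vv, w = to_lambda(
    uu_m=[0.299792458, 10.0],
    vv_m=[0.0, -5.0],
    freq_hz=[1.0e9, 2.0e9],
    weight=[1.0, 0.5],
)
print(uu.shape, uu[:, 0])
```

Dividing the result by 1e3 would recover the k$\lambda$ units used in the current files.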

@jeffjennings we might want to discuss this as part of a larger redesign for MPoL #223 and tutorials (#63).

jeffjennings · 2023-12-07T16:33:35Z

Sounds good, can cover it Monday. The commit history of the large versions of the files will also have to be removed, else the file size won't decrease in downloads. https://github.com/newren/git-filter-repo might be useful for this.

iancze · 2023-12-07T16:41:26Z

I'm not sure if the commit history matters for the Zenodo repo, since we're downloading these files directly from the Zenodo repo links?

Separately, it is a good idea to scan the MPoL repo to see what large binary files may be lurking in commits, and whether we can safely remove them.
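One way to do such a scan is sketched below. It pipes `git rev-list --objects --all` into `git cat-file --batch-check` and keeps blobs above a size threshold. The helper names and the 1 MB default are assumptions, and this only lists candidates; it does not remove anything from history.

```python
import subprocess


def parse_batch_check(text, min_bytes=1_000_000):
    """Parse lines of the form '<type> <sha> <size> <path>' and return
    (size, sha, path) tuples for blobs of at least min_bytes, largest first."""
    blobs = []
    for line in text.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, tags, and malformed lines
        size = int(parts[2])
        path = parts[3] if len(parts) == 4 else ""
        if size >= min_bytes:
            blobs.append((size, parts[1], path))
    return sorted(blobs, reverse=True)


def large_blobs(repo=".", min_bytes=1_000_000):
    """List large blobs reachable from any ref in the repository at `repo`."""
    # Every object reachable from any ref, as '<sha> <path>' lines.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path.
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes, min_bytes)
```

Calling `large_blobs(".")` from a checkout of the repo would return the candidates to review before rewriting history with a tool such as git-filter-repo.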

jeffjennings · 2023-12-07T16:44:11Z

Ah right mpoldatasets isn't a dependency of MPoL.

But yes that's a good idea.

iancze · 2023-12-07T16:50:17Z

The mpoldatasets git repo should be pretty lightweight, since it's just source code and Makefiles. The code downloads the relevant datasets, does cleaning / reweighting / averaging etc. to produce large datasets in .npy or .asdf, and then uses the Zenodo API to upload them to the Zenodo repository. I think we'll need to update several elements of this package to point to the new repo (possibly with an updated API token) and to implement the other changes suggested in this issue.
