
Identify channel and correlation-like dimensions in non-standard MS columns #329

Merged

merged 7 commits into master from post-process-ms-like-columns on May 13, 2024

Conversation

@sjperkins (Member) commented May 13, 2024

@sjperkins (Member, Author)

I also want to try to address #268 in this PR.

@landmanbester (Collaborator)

I'm gonna rerun a convert command that used to fail to confirm that the fix works, but it looks pretty uncontroversial to me. I'll get back to you as soon as the convert is done.

@JSKenyon (Collaborator) left a comment

Looks good! Am I correct in saying that this doesn't store this information - it simply presumes that any non-standard column will probably be visibility shaped? This will work in 99.99% of cases, but if the sizes of chan and corr are the same it may fail/assign (row, chan, chan).

@sjperkins (Member, Author)

> Looks good! Am I correct in saying that this doesn't store this information - it simply presumes that any non-standard column will probably be visibility shaped? This will work in 99.99% of cases, but if the sizes of chan and corr are the same it may fail/assign (row, chan, chan).

It's slightly more intelligent than that: it checks whether the dimension name is something like {varname}-{d}, where varname is the variable name and d is the dimension number, and then checks whether that dimension matches chan or corr.
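
Roughly, the naming check looks like this (an illustrative sketch only, not the actual dask-ms internals; the regex and helper name are made up):

import re

# Auto-generated dimension names look like "{varname}-{d}", e.g. "TEST_DATA-1".
GENERATED_DIM = re.compile(r"^(?P<varname>.+)-(?P<d>\d+)$")

def is_generated_dim(varname, dim):
    """Return True if `dim` is an auto-generated placeholder for `varname`."""
    match = GENERATED_DIM.match(dim)
    return match is not None and match.group("varname") == varname

# is_generated_dim("TEST_DATA", "TEST_DATA-1") -> True
# is_generated_dim("TEST_DATA", "chan")        -> False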

@sjperkins (Member, Author)

> Looks good! Am I correct in saying that this doesn't store this information - it simply presumes that any non-standard column will probably be visibility shaped? This will work in 99.99% of cases, but if the sizes of chan and corr are the same it may fail/assign (row, chan, chan).

Yep, I think this should try to handle the chan == corr case by prioritising chan, and then corr.
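
Something along these lines (a sketch of the idea, not the merged code; the helper name and generic fallback are made up):

def assign_chan_then_corr(dim_sizes, nchan, ncorr):
    """Assign 'chan' to the first matching size and 'corr' thereafter,
    so that nchan == ncorr yields (chan, corr) rather than (chan, chan)."""
    names = []
    chan_taken = False
    for i, size in enumerate(dim_sizes):
        if size == nchan and not chan_taken:
            names.append("chan")
            chan_taken = True
        elif size == ncorr:
            names.append("corr")
        else:
            names.append(f"dim-{i}")  # unknown size, keep a generic name
    return tuple(names)

# assign_chan_then_corr((64, 4), nchan=64, ncorr=4) -> ('chan', 'corr')
# assign_chan_then_corr((4, 4), nchan=4, ncorr=4)   -> ('chan', 'corr')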

@JSKenyon (Collaborator)

One extra note - this doesn't work if you only select the column with unknown dimensions:

from daskms import xds_from_ms

print(xds_from_ms("msdir/C147_minimal.MS", columns=("TEST_DATA",))[0])

prints

<xarray.Dataset> Size: 3GB
Dimensions:    (row: 1374420, TEST_DATA-1: 64, TEST_DATA-2: 4)
Coordinates:
    ROWID      (row) int32 5MB dask.array<chunksize=(10000,), meta=np.ndarray>
Dimensions without coordinates: row, TEST_DATA-1, TEST_DATA-2
Data variables:
    TEST_DATA  (row, TEST_DATA-1, TEST_DATA-2) complex64 3GB dask.array<chunksize=(10000, 64, 4), meta=np.ndarray>
Attributes:
    __daskms_partition_schema__:  (('FIELD_ID', 'int32'), ('DATA_DESC_ID', 'i...
    FIELD_ID:                     0
    DATA_DESC_ID:                 0

@sjperkins (Member, Author)

> One extra note - this doesn't work if you only select the column with unknown dimensions:

Yes. Unfortunately, because dask-ms was designed to handle arbitrary tables produced by the CTDS, it doesn't reason about the MS structure very deeply, so the returned datasets will only have chan or corr dimensions if standard columns are read to provide this information.

Is this a critical blocker to your current workflows or could this be left to a future PR? I'd imagine that dask-ms convert should work with this change as it bulk exports MS columns.
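
In the meantime, a possible workaround (using the path and column names from the example above) is to read a standard visibility-shaped column alongside the non-standard one, so the chan/corr sizes are available to the inference:

from daskms import xds_from_ms

# Including DATA (a standard column) should provide the chan/corr information,
# so TEST_DATA comes back labelled (row, chan, corr) rather than
# (row, TEST_DATA-1, TEST_DATA-2).
ds = xds_from_ms("msdir/C147_minimal.MS", columns=("DATA", "TEST_DATA"))[0]
print(ds)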

@JSKenyon (Collaborator)

> Is this a critical blocker to your current workflows or could this be left to a future PR? I'd imagine that dask-ms convert should work with this change as it bulk exports MS columns.

Not really, as the schema option still exists (QC always used it for extra model/weight columns). As you say, this should be good enough for the convert. Just thought it worth mentioning, as it was the first thing I stumbled over when trying out the branch.
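
For reference, the schema route looks roughly like this (I'm assuming a table_schema mapping of column name to a dict with a "dims" entry; check the dask-ms docs for the exact format):

from daskms import xds_from_ms

# Explicitly declare the non-row dimensions of the non-standard column.
# The exact table_schema format here is an assumption.
ds = xds_from_ms(
    "msdir/C147_minimal.MS",
    columns=("TEST_DATA",),
    table_schema={"TEST_DATA": {"dims": ("chan", "corr")}},
)[0]
print(ds)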

@landmanbester (Collaborator)

> Is this a critical blocker to your current workflows or could this be left to a future PR? I'd imagine that dask-ms convert should work with this change as it bulk exports MS columns.

This will work perfectly almost every time. Thanks @sjperkins

@sjperkins sjperkins merged commit 747408f into master May 13, 2024
@sjperkins sjperkins deleted the post-process-ms-like-columns branch May 13, 2024 17:42