Dimension names not saving correctly when adding non-standard columns #334

Closed
chrisfinlay opened this issue Jun 21, 2024 · 10 comments

@chrisfinlay
Contributor

  • dask-ms version: 0.2.21
  • Python version: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
  • Operating System: Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-112-generic x86_64)

Description

I would like to add a non-standard column with dims=("row", "chan", "corr") and save it to an MS.
When I do this and read the MS again, the dims come out as ('row', 'COLNAME-1', 'COLNAME-2') instead of ('row', 'chan', 'corr').

What I Did

from daskms import xds_from_table, xds_to_table
from daskms.example_data import example_ms
import dask.array as da
import dask

ms = example_ms()
datasets = xds_from_table(ms)
# Add BITFLAG data to datasets
for i, ds in enumerate(datasets):
    datasets[i] = ds.assign(BITFLAG=(("row", "chan", "corr"),
                                          da.zeros_like(ds.DATA.data)))
# Write data back to ms
writes = xds_to_table(datasets, ms, ["BITFLAG"])
dask.compute(writes)

datasets = xds_from_table(ms)
print(datasets[0].BITFLAG.dims)

Output

('row', 'BITFLAG-1', 'BITFLAG-2')
Data type: 24, SORT_COLUMNSnot handled
Data type: 24, SORT_ORDERnot handled

Sidenote

Is Data type: 24, SORT_COLUMNSnot handled an issue?

@landmanbester
Collaborator

This may have been fixed in the latest release. You can get around it by passing a schema when reading, e.g.:

from daskms import xds_from_ms

# Tell dask-ms the non-row dimension names of the non-standard column
schema = {}
schema['BITFLAG'] = {'dims': ('chan', 'corr')}
datasets = xds_from_ms(ms, table_schema=schema)

Note the use of xds_from_ms instead of xds_from_table; I'm not sure whether the latter accepts schemas.

@sjperkins
Member

Try replacing the xds_from_table calls with xds_from_ms?

datasets = xds_from_ms(ms)

Interestingly, a heuristic was recently added to try and plug this gap:

@sjperkins
Member

This may have been fixed in the latest release. You can get around it by passing a schema when reading, e.g.:

schema = {}
schema['BITFLAG'] = {'dims': ('chan', 'corr')}
datasets = xds_from_ms(ms, table_schema=schema)

Note the use of xds_from_ms instead of xds_from_table; I'm not sure whether the latter accepts schemas.

Thanks @landmanbester :-)

@sjperkins
Member

Sidenote
Is Data type: 24, SORT_COLUMNSnot handled an issue?

IIRC this means that SORT_COLUMNS has a relatively unused data type like short or unsigned short that isn't handled everywhere in the casacore code base. Do you need this column?

@chrisfinlay
Contributor Author

chrisfinlay commented Jun 21, 2024

Thanks for the quick response. xds_from_ms does fix this, however, in my actual problem I am creating an MS from scratch with daskms so I do not use xds_from_{ms|table}. Can I define the schema in xds_to_table somehow?

@chrisfinlay
Contributor Author

chrisfinlay commented Jun 21, 2024

IIRC this means that SORT_COLUMNS has a relatively unused data type like short or unsigned short that isn't handled everywhere in the casacore code base. Do you need this column?

Not that I know of but I wanted to check in case it causes issues that I am unaware of.

@sjperkins
Member

Thanks for the quick response. xds_from_ms does fix this, however, in my actual problem I am creating an MS from scratch with daskms so I do not use xds_from_{ms|table}. Can I define the schema in xds_to_table somehow?

The schema is implicit in the xarray datasets you pass to xds_to_table, since the dimension names must be specified for each variable. I would guess you'll assign (row, chan, corr) to the appropriate variables.

xds_from_ms should pick these cases up when opening the written Measurement Set, given the heuristic applied in:

@chrisfinlay
Contributor Author

Ok I see now. Using xds_from_ms doesn't matter for the initial read; it matters for the second read, when non-standard columns are present.

@chrisfinlay
Contributor Author

@sjperkins @landmanbester Thanks again for the quick response!

@sjperkins
Member

Ok I see now. Using xds_from_ms doesn't matter for the initial read; it matters for the second read, when non-standard columns are present.

Yes, another way of understanding this is that those dimension names are inferred from the MSv2 spec. For non-standard columns, dask-ms does some "inference".
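
To make the naming behaviour discussed in this thread concrete, here is a minimal pure-Python sketch of how dimension names might be resolved when a column is read. It is not dask-ms code. The function resolve_dims, the STANDARD_DIMS table and the data_shape argument are all hypothetical names introduced for illustration. The thread does not say what the heuristic actually checks, so the shape-matching rule below is only an assumption. The 'COLNAME-N' fallback and the {'dims': ...} schema entry are the only parts taken from the thread itself.

```python
# Illustrative subset of spec-defined dimension names (excluding "row").
# This is an assumption for the sketch, not the table dask-ms actually uses.
STANDARD_DIMS = {
    "DATA": ("chan", "corr"),
    "FLAG": ("chan", "corr"),
    "UVW": ("uvw",),
}


def resolve_dims(column, shape, user_schema=None, data_shape=None):
    """Return dimension names for a column with full shape `shape` (row first).

    Hypothetical resolution order:
      1. an explicit user schema entry, e.g. {"BITFLAG": {"dims": ("chan", "corr")}}
      2. a known standard column
      3. an assumed heuristic: same trailing shape as DATA means (chan, corr)
      4. fallback names of the form COLNAME-1, COLNAME-2, ...
    """
    trailing = tuple(shape[1:])

    if user_schema and column in user_schema:
        dims = tuple(user_schema[column]["dims"])
    elif column in STANDARD_DIMS:
        dims = STANDARD_DIMS[column]
    elif (
        data_shape is not None
        and len(trailing) == 2
        and trailing == tuple(data_shape[1:])
    ):
        # Assumed heuristic, not confirmed by the thread.
        dims = ("chan", "corr")
    else:
        dims = tuple(f"{column}-{i}" for i in range(1, len(trailing) + 1))

    if len(dims) != len(trailing):
        raise ValueError(f"{column}: {len(dims)} names for {len(trailing)} dims")
    return ("row",) + dims


# No schema and no heuristic: the fallback names from the original report.
print(resolve_dims("BITFLAG", (10, 16, 4)))
# Explicit schema, as in the suggested workaround.
print(resolve_dims("BITFLAG", (10, 16, 4),
                   user_schema={"BITFLAG": {"dims": ("chan", "corr")}}))
```

In this sketch the explicit schema wins over everything else, which mirrors the workaround above: naming the dims on read avoids relying on any inference at all.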
