Dimension names not saving correctly when adding non-standard columns #334

chrisfinlay · 2024-06-21T11:39:56Z

dask-ms version: 0.2.21
Python version: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Operating System: Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-112-generic x86_64)

Description

I would like to add a non-standard column with dims=("row", "chan", "corr") and save it to an MS.
When I do this and read the MS back, the dims come out as ('row', 'COLNAME-1', 'COLNAME-2') instead of ('row', 'chan', 'corr').

What I Did

from daskms import xds_from_table, xds_to_table
from daskms.example_data import example_ms
import dask.array as da
import dask

ms = example_ms()
datasets = xds_from_table(ms)
# Add BITFLAG data to datasets
for i, ds in enumerate(datasets):
    datasets[i] = ds.assign(BITFLAG=(("row", "chan", "corr"),
                                          da.zeros_like(ds.DATA.data)))
# Write data back to ms
writes = xds_to_table(datasets, ms, ["BITFLAG"])
dask.compute(writes)

datasets = xds_from_table(ms)
print(datasets[0].BITFLAG.dims)

Output

('row', 'BITFLAG-1', 'BITFLAG-2')
Data type: 24, SORT_COLUMNSnot handled
Data type: 24, SORT_ORDERnot handled
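For context, the dimension names in the output follow a simple pattern: "row" is kept, and every other axis is named after the column plus its axis index. The sketch below only reproduces that observed pattern. The function name `fallback_dims` is hypothetical, and this is not dask-ms's actual code.

```python
def fallback_dims(column, ndim):
    """Mimic the generic names seen in the output above (hypothetical helper):
    'row' for axis 0, then '<COLUMN>-<axis>' for every further axis."""
    return ("row",) + tuple(f"{column}-{axis}" for axis in range(1, ndim))


print(fallback_dims("BITFLAG", 3))  # ('row', 'BITFLAG-1', 'BITFLAG-2')
```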

Sidenote

Is the "Data type: 24, SORT_COLUMNSnot handled" message an issue?


landmanbester · 2024-06-21T11:48:35Z

This may have been fixed in the latest release. You can get around it by passing a schema when reading, e.g.

from daskms import xds_from_ms

schema = {}
schema['BITFLAG'] = {'dims': ('chan', 'corr')}
datasets = xds_from_ms(ms, table_schema=schema)

Note the use of xds_from_ms instead of xds_from_table; I'm not sure whether the latter accepts schemas.
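The table_schema dict maps a column name to the names of its non-row axes. A minimal sketch of how such an entry could take precedence over generic names when resolving dims on read follows. `resolve_dims` is a hypothetical helper, and the length check is an assumption about sensible behaviour, not a description of dask-ms internals.

```python
def resolve_dims(column, ndim, table_schema=None):
    """Return dimension names for a row-based column of rank ndim.

    A schema entry such as {'BITFLAG': {'dims': ('chan', 'corr')}} names
    the non-row axes; without one, fall back to generic per-axis names.
    """
    entry = (table_schema or {}).get(column)
    if entry is not None:
        dims = tuple(entry["dims"])
        # The schema omits 'row', so it must name exactly ndim - 1 axes
        if len(dims) != ndim - 1:
            raise ValueError(
                f"{column}: schema gives {len(dims)} dims, "
                f"column has {ndim - 1} non-row axes"
            )
        return ("row",) + dims
    return ("row",) + tuple(f"{column}-{axis}" for axis in range(1, ndim))


schema = {"BITFLAG": {"dims": ("chan", "corr")}}
print(resolve_dims("BITFLAG", 3, schema))  # ('row', 'chan', 'corr')
print(resolve_dims("BITFLAG", 3))          # ('row', 'BITFLAG-1', 'BITFLAG-2')
```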

sjperkins · 2024-06-21T11:49:34Z

Try replacing the xds_from_table calls with xds_from_ms?

from daskms import xds_from_ms
datasets = xds_from_ms(ms)

Interestingly, a heuristic was recently added to try and plug this gap:

Identify channel and correlation-like dimensions in non-standard MS columns #329
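The idea behind such a heuristic can be illustrated with a small sketch: compare a column's per-row shape against the known channel and correlation counts and label matching axes accordingly. This is an assumed approach for illustration only; `guess_dims` and its parameters are hypothetical, and the logic actually merged in #329 may differ.

```python
def guess_dims(column, row_shape, nchan, ncorr):
    """Hypothetical heuristic: label axes whose sizes match the channel and
    correlation counts; anything else gets a generic per-axis name.

    row_shape is the column's shape excluding the row axis.
    """
    row_shape = tuple(row_shape)
    if row_shape == (nchan, ncorr):
        return ("row", "chan", "corr")
    if row_shape == (nchan,):
        return ("row", "chan")
    if row_shape == (ncorr,):
        return ("row", "corr")
    # No match: generic names, as seen in the original report
    return ("row",) + tuple(f"{column}-{axis}" for axis in range(1, len(row_shape) + 1))


print(guess_dims("BITFLAG", (16, 4), nchan=16, ncorr=4))  # ('row', 'chan', 'corr')
```

A shape-matching approach is inherently ambiguous when, for example, nchan equals ncorr and the column has a single non-row axis; this sketch simply prefers "chan" in that case.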

sjperkins · 2024-06-21T11:50:37Z

> This may have been fixed in the latest release. You can get around it by passing a schema when reading eg. [...]
> Note the use of xds_from_ms instead of xds_from_table, not sure if the latter accepts schemas

Thanks @landmanbester :-)

sjperkins · 2024-06-21T11:53:56Z

> Sidenote
> Is Data type: 24, SORT_COLUMNSnot handled an issue?

IIRC this means that SORT_COLUMNS has a relatively unused data type like short or unsigned short that isn't handled everywhere in the casacore code base. Do you need this column?

chrisfinlay · 2024-06-21T11:55:39Z

Thanks for the quick response. xds_from_ms does fix this; however, in my actual problem I am creating an MS from scratch with daskms, so I do not use xds_from_{ms|table}. Can I define the schema in xds_to_table somehow?

chrisfinlay · 2024-06-21T11:58:13Z

> IIRC this means that SORT_COLUMNS has a relatively unused data type like short or unsigned short that isn't handled everywhere in the casacore code base. Do you need this column?

Not that I know of but I wanted to check in case it causes issues that I am unaware of.

sjperkins · 2024-06-21T11:59:20Z

> Thanks for the quick response. xds_from_ms does fix this; however, in my actual problem I am creating an MS from scratch with daskms, so I do not use xds_from_{ms|table}. Can I define the schema in xds_to_table somehow?

The schema is implicit in the xarray datasets you pass to xds_to_table, since the dimension names must be specified for each variable. I'd guess you'll assign (row, chan, corr) to the appropriate variables.

xds_from_ms should pick up these cases when opening the written Measurement Set, given the heuristic applied in:

Identify channel and correlation-like dimensions in non-standard MS columns #329
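Since the dims are already known when writing from scratch, one option is to turn them into a table_schema dict and pass it on the second read, mirroring the workaround suggested earlier in the thread. A minimal sketch follows; `schema_from_dims` is a hypothetical helper, and it assumes every written column has "row" as its first dimension.

```python
def schema_from_dims(column_dims):
    """Build a table_schema-style dict from {column: dims} pairs, dropping
    the leading 'row' dimension that every row-based column shares."""
    schema = {}
    for column, dims in column_dims.items():
        dims = tuple(dims)
        if not dims or dims[0] != "row":
            raise ValueError(f"{column}: expected 'row' as the first dimension, got {dims}")
        schema[column] = {"dims": dims[1:]}
    return schema


# With an xarray dataset, the pairs could be collected via
# {name: var.dims for name, var in ds.data_vars.items()}
print(schema_from_dims({"BITFLAG": ("row", "chan", "corr")}))
# {'BITFLAG': {'dims': ('chan', 'corr')}}
```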

chrisfinlay · 2024-06-21T12:05:41Z

OK, I see now. Using xds_from_ms doesn't matter for the initial read, but it does for the second read, where non-standard columns are present.

chrisfinlay · 2024-06-21T12:06:41Z

@sjperkins @landmanbester Thanks again for the quick response!

sjperkins · 2024-06-21T12:08:32Z

> Ok I see now. It doesn't matter about using xds_from_ms for the initial read but specifically the second read where there are non-standard columns present.

Yes. Another way of understanding this is that, for standard columns, those dimension names are taken from the MSv2 spec. For non-standard columns, dask-ms has to do some "inference".

chrisfinlay closed this as completed Jun 21, 2024