Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Required attribute table type not checked consistently #903

Open
peterjc opened this issue Mar 9, 2023 · 6 comments
Open

Required attribute table type not checked consistently #903

peterjc opened this issue Mar 9, 2023 · 6 comments

Comments

@peterjc
Copy link
Contributor

peterjc commented Mar 9, 2023

Reproducible example

Example code based on https://biom-format.org/documentation/table_objects.html#examples

import numpy as np
from biom.table import Table

data = np.arange(40).reshape(10, 4)
sample_ids = ["S%d" % i for i in range(4)]
observ_ids = ["O%d" % i for i in range(10)]
table = Table(
    data,
    observ_ids,
    sample_ids,
    # observ_metadata,
    # sample_metadata,
    # table_id='Example Table'
)

from biom.util import biom_open

with biom_open("example-hdf5.biom", "w") as handle:
    table.to_hdf5(handle, generated_by="BIOM Pycode", compress=True)

with open("example-json.biom", "w") as handle:
    handle.write(table.to_json(generated_by="BIOM Pycode"))

Follow this by command line validation of the output:

$ biom validate-table -i example-hdf5.biom 
$ biom validate-table -i example-json.biom 

Actual behaviour:

Python script runs without error (bad).

HDF5 file passess validation (bad):

$ biom validate-table -i example-hdf5.biom 

The input file is a valid BIOM-formatted file.

JSON file fails validation (good):

$ biom validate-table -i example-json.biom 
Unknown table type, however that is likely okay.
The input file is not a valid BIOM-formatted file.

Expected behaviour:

Runtime error during Table.__init__ since defaults include type=None and validate=True by default, and for all BIOM formats to date, type is a required top level attribute.

https://biom-format.org/documentation/format_versions/biom-1.0.html
https://biom-format.org/documentation/format_versions/biom-2.0.html
https://biom-format.org/documentation/format_versions/biom-2.1.html

Furthermore, using the example BIOM files as generated without the table type, both the JSON and the HDF5 ought to fail consistently.

@wasade
Copy link
Member

wasade commented Mar 9, 2023

Hm, this is going to be a delicate one. If type=None raises, it at a minimum triggers a minor release as this is breaking behavior for the current API and it will create headaches for many users of the library. I'm inclined to, if None, to default to "OTU table" and raise a warning with a deprecation notice that this behavior will be unsupported in the future.

That is correct the validation should be consistent -- good catch.

@peterjc
Copy link
Contributor Author

peterjc commented Mar 9, 2023

I agree that's a good plan - add a deprecation warning in a revision release of the tool, and upgrade to an exception in a minor release.

At least fixing the HDF5 validation to be stricter shouldn't be so complicated 😃

I found this bug because (following the examples), I'd not set the table type myself. In my case "OUT table" is the best match from the options defined.

@wasade
Copy link
Member

wasade commented Mar 9, 2023

Ah yes the classic "OUT table", which coincidentally is a common interpretation by Word :)

@gregcaporaso
Copy link
Contributor

In some offline discussion with @wasade, he suggested that maybe we could grandfather type=None in as allowable by the format. I'm in favor of that as it hasn't caused any problems to-date (that I'm aware of). Changing the behavior would cause other tools to need updating (e.g., QIIME 2, which doesn't set this value - see below), and would make old biom-formatted tables not work with new versions of the software.

In [1]: import qiime2
In [2]: import biom
In [3]: t = qiime2.Artifact.load('table.qza').view(biom.Table)
In [5]: print(t.type)
None

@wasade
Copy link
Member

wasade commented Apr 13, 2023

Thank you, @gregcaporaso!

@peterjc
Copy link
Contributor Author

peterjc commented Mar 8, 2024

Also using biom.Table.from_tsv(...) does not have a type= argument to set the table type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants