OMERO table space in column name #57

Open
jburel opened this issue May 14, 2021 · 8 comments

Comments

@jburel
Member

jburel commented May 14, 2021

Tables in IDR have spaces in most of the column names. As a result, it is not possible to retrieve rows by specifying a value in a given column, e.g. "give me the rows with Remdesivir in the Compound Name column".
To filter, one needs to load the full table (~15 min loading time) to retrieve a few relevant rows. In the Remdesivir example, 24 of 9792 rows are relevant.

@sbesson
Member

sbesson commented May 18, 2021

The issue with spaces in column names has been mentioned several times. As far as I understand, the investigation seemed to indicate that the limitation comes from PyTables, i.e. the underlying storage mechanism for OMERO.tables.

Trying to find a few pointers from the source code: do we know if the querying issue is related to the NaturalNameWarning thrown in:

https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L21
https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L47-L49
https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L87-L90

If this is the underlying problem, other characters commonly used in column headers like () or [] would also suffer from the same issue.
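
To make the point about other characters concrete, here is a minimal sketch that flags column headers which are not plain Python identifiers. The regular expression is an assumption that approximates the "natural name" rule behind the warning, and every header except "Compound Name" is hypothetical.

```python
import re

# Assumption: a "natural name" is a plain Python identifier
# (letters, digits and underscores, not starting with a digit).
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def non_natural_columns(column_names):
    """Return the column names that are not plain identifiers."""
    return [name for name in column_names if not NATURAL_NAME.match(name)]


# Hypothetical headers, apart from "Compound Name" which is quoted in this issue.
headers = ["Well", "Compound Name", "IC50 (uM)", "Plate[1]", "plate_id"]
print(non_natural_columns(headers))
```

Under that assumption, spaces, parentheses and square brackets all put a header in the same category.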

/cc @will-moore

@jburel
Member Author

jburel commented May 18, 2021

An option could be to also add the CSV alongside the table.
In some cases it is good to have all the data at hand.

@jburel
Member Author

jburel commented May 18, 2021

@joshmoore
Member

I can definitely see having the CSV attached as a workaround, but to some extent, it's saying that the tables service does not suffice.

@jburel
Member Author

jburel commented May 18, 2021

The CSV is a workaround, but it can be a valid option depending on the language used to access the data, e.g. R, because of the Java <-> R data manipulation involved.
As it stands, the service is not enough, so we need to revisit it.

@sbesson
Member

sbesson commented May 27, 2021

ome/omero-py#287 starts exploring solutions for searching tables using columns with spaces in their names.

The underlying problem is that you cannot write a valid PyTables condition that refers to such a column directly, e.g. table.where('my column == "foo"') is not valid.
ome/omero-py#287 contains a proof of concept showing that these queries are possible by using a substitution variable in the condition and condvars to map that variable to the appropriate column in the table, retrieved using getattr.

Currently blocked on passing this condvars mapping through the remote API. Up for discussion, but I suspect one way forward would be to define an API that passes the mapping as a simple <variable name>: <column name> dictionary and internalizes the logic that retrieves the column using getattr.
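
For illustration, the substitution-variable approach can be sketched against plain PyTables as below. This is not the code from ome/omero-py#287: the helper `where_with_mapping`, its `column_map` and `values` parameters, the file name and the sample rows are all hypothetical, and only the general idea (a `<variable name>: <column name>` dictionary resolved with getattr and passed as condvars) comes from the comment above.

```python
import os
import tempfile
import warnings

import tables


def where_with_mapping(table, condition, column_map, values=None):
    """Hypothetical helper: run `condition` after resolving variables to columns.

    column_map: {variable name: column name}, e.g. {"x": "Compound Name"}.
    values: optional {variable name: scalar} used for comparison values.
    """
    # getattr works even when the column name is not a valid identifier.
    condvars = {var: getattr(table.cols, name) for var, name in column_map.items()}
    condvars.update(values or {})
    return [row.nrow for row in table.where(condition, condvars=condvars)]


def demo():
    """Build a small throwaway table and query a column with a space in its name."""
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    with warnings.catch_warnings():
        # Column names with spaces trigger NaturalNameWarning; silence it here.
        warnings.simplefilter("ignore", tables.NaturalNameWarning)
        with tables.open_file(path, mode="w") as h5:
            table = h5.create_table(
                "/",
                "measurements",
                description={
                    "Compound Name": tables.StringCol(32),
                    "IC50": tables.Float64Col(),
                },
            )
            row = table.row
            # Made-up sample data, for illustration only.
            for name, ic50 in [(b"Remdesivir", 0.5), (b"Other", 2.0), (b"Remdesivir", 0.7)]:
                row["Compound Name"] = name
                row["IC50"] = ic50
                row.append()
            table.flush()
            # "x" stands in for the column; "wanted" carries the value (as bytes).
            return where_with_mapping(
                table, "x == wanted", {"x": "Compound Name"}, {"wanted": b"Remdesivir"}
            )


print(demo())
```

Passing the comparison value through condvars as well avoids quoting it inside the condition string; string columns hold bytes, so the value is given as bytes here.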

@jburel
Member Author

jburel commented May 27, 2021

The CSV workaround is not really needed. I have opted to use the Web API to load the table data and it works nicely. It has been used in https://github.com/IDR/idr0094-ellinger-sarscov2/blob/master/notebooks/idr0094-ic50.ipynb and https://github.com/IDR/idr0094-ellinger-sarscov2/blob/master/apps/app.R
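
Once the table data is loaded on the client, filtering on a column with a space in its name is straightforward. The sketch below is not taken from the linked notebook or app: the `filter_rows` helper is hypothetical, and the payload shape (a `columns` list of names plus a `rows` list of lists) is an assumption made for illustration, as are the sample values.

```python
def filter_rows(payload, column_name, wanted):
    """Hypothetical helper: keep the rows whose `column_name` equals `wanted`.

    Assumes `payload` has a "columns" list of names and a "rows" list of lists.
    """
    columns = payload["columns"]
    index = columns.index(column_name)  # raises ValueError for an unknown column
    return [dict(zip(columns, row)) for row in payload["rows"] if row[index] == wanted]


# Made-up sample data in the assumed shape.
payload = {
    "columns": ["Well", "Compound Name"],
    "rows": [["A1", "Remdesivir"], ["A2", "Other"], ["B1", "Remdesivir"]],
}
for match in filter_rows(payload, "Compound Name", "Remdesivir"):
    print(match["Well"])
```

Because the column is looked up by its string name, spaces or brackets in the header do not matter at this stage.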

@sbesson
Member

sbesson commented Apr 5, 2022

The corresponding change has been merged upstream in OMERO.py. https://github.com/ome/openmicroscopy/pull/6283/files brings a proof of concept of how to write a query against a column with a space in its name. I have not retested in the IDR context, but I assume this issue can be closed (as we decided it was not an issue specific to the metadata plugin) and/or moved as a documentation issue?
