Csv to tables scripts #5

will-moore · 2021-05-27T14:24:12Z

Adds a script used in OMERO.figure workshop prep. See ome/omero-figure#431

Script csv_to_roi_table.py creates OMERO.tables on Images, corresponding to the ROIs on each image. It would be useful to have these tables in IDR.

It uses the *_other_measurements.tsv file for each Image. But there are 30 other .tsv files for each Image (see https://github.com/IDR/idr0079-hartmann-lateralline/tree/master/experimentA/idr0079_experimentA_extracted_measurements/00E41C184C) so we'd need to decide which ones to pick.

NB: This script loads the ROIs on each image using roi_service.findByImage() and assumes that these are in the same order as the ROIs in the .tsv table. This has been verified experimentally for 1 or 2 images during workshop prep above.
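A minimal sketch of the order-based pairing described above, with a count check added as a guard. This is not the code in csv_to_roi_table.py: the function name and its inputs are hypothetical, and the ROI IDs are assumed to have been loaded already (e.g. from roi_service.findByImage()) in the order the server returned them.

```python
def pair_rows_with_rois(roi_ids, rows):
    """Attach an ROI ID to each table row purely by position.

    roi_ids: ROI IDs in the order the server returned them (assumed input).
    rows: list of dicts, one per line of the .tsv table, in file order.
    """
    if len(roi_ids) != len(rows):
        # A count mismatch means positional pairing cannot be trusted.
        raise ValueError(
            "Expected one row per ROI, got %d ROIs and %d rows"
            % (len(roi_ids), len(rows))
        )
    # dict(row, Roi=...) copies the row and adds the ROI ID column.
    return [dict(row, Roi=roi_id) for roi_id, row in zip(roi_ids, rows)]
```

The check only catches a different number of ROIs and rows. It cannot detect ROIs returned in a different order, so the ordering assumption still needs to be verified separately.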

NB: This script re-names columns, replacing white-space with underscore, to avoid issues with queries. But this may not be needed with ome/omero-py#287?
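For illustration, the renaming could look roughly like this. The function name is hypothetical, and stripping leading/trailing white-space and collapsing runs of white-space into one underscore are assumptions, not necessarily what the script does.

```python
import re


def sanitize_column_name(name):
    """Replace each run of white-space in a column name with an underscore."""
    # strip() first so leading/trailing spaces do not become underscores.
    return re.sub(r"\s+", "_", name.strip())


print(sanitize_column_name("Cell ID"))
```

With this, a header such as "Cell ID" becomes "Cell_ID", which can be used in a table query without quoting.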

Don't need to use coordinates to look up each ROI

sbesson · 2021-06-24T15:47:24Z

Briefly tested on pilot-idr0104 using the csv_to_table_on_project.py. A few notes:

- the script requires a recent version of omero-metadata with the allow_nans functionality. I think there was an issue with deploying this to production but I will dig it up
- the script executes quickly and generates two file annotations attached at the project level
- the new bulk annotation table can be inspected individually using the Web UI
- at the image level, an issue is that the Web client does not handle multiple tables at the same container level, so only the last one is displayed and it overrides the bulk annotation table

From my side, the major question is whether these extra columns should be appended to the single bulk annotation of the project, i.e. update the annotation.csv with the extra columns and replace it on production. This would be more amenable to the standard workflow that has been used previously, where library/assay files and processed files (e.g. containing features) are combined into a single annotation CSV/bulk annotation.

Agreed that the column renaming could be made unnecessary by the API extension proposed in ome/omero-py#287. We still need to define how these queries could be passed as a URL and converted into the relevant getWhereList column.

will-moore · 2021-06-24T15:58:34Z

Ah, apologies @sbesson I meant to remove csv_to_table_on_project.py since that was just a sample of columns that could be summarised for the OMERO.parade demo (which needs a table on the Project). I updated the PR description but didn't remove the code.
But I'm not sure that has so much value for IDR?

The bulk of this PR is the csv_to_roi_table.py which creates a Table on each Image.

sbesson

With the inline modification to csv_to_roi_table.py, the script worked as expected, iterating through images and creating tables.

A few comments:

- generally 💯 for augmenting the value of datasets and exposing cell-level features via OMERO.tables more systematically
- I would remove the csv_to_table_on_project.py script if it is unused
- the CSV file attached to the image feels redundant with the table. Could the intermediate CSV be stored in a temporary folder and passed to the metadata population parser?
- there are definitely several measurement tables per image. From a quick look, all of these tables are consistent in terms of rows, i.e. one row per ROI. It should be possible to concatenate multiple (or all) of them and prefix the column names. I have no intuition on which are the most valuable. Would that be something the submitter would be able to help us with?
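A rough sketch of the concatenate-and-prefix idea, using only the standard library. Everything here is illustrative: merge_tables is a hypothetical helper, the prefixes are chosen by the caller, and "Source Name" / "Cell ID" are assumed to be the key columns shared by every table (they are named as such later in this thread).

```python
import csv
import io

# Assumed key columns present in every per-image table.
KEY_COLUMNS = ("Source Name", "Cell ID")


def merge_tables(tables):
    """Merge TSV tables row by row, prefixing every non-key column.

    tables: dict mapping a prefix to the TSV text of one table.
    Returns a list of dicts, one per row (i.e. one per ROI).
    """
    merged = None
    for prefix, text in tables.items():
        rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
        if merged is None:
            # First table: start each merged row with the key columns only.
            merged = [{k: row[k] for k in KEY_COLUMNS} for row in rows]
        elif len(rows) != len(merged):
            raise ValueError("Table %r has a different number of rows" % prefix)
        for target, row in zip(merged, rows):
            if any(target[k] != row[k] for k in KEY_COLUMNS):
                raise ValueError("Table %r has rows in a different order" % prefix)
            for column, value in row.items():
                if column not in KEY_COLUMNS:
                    target["%s_%s" % (prefix, column)] = value
    return merged or []
```

Prefixing keeps columns apart even when two tables use identical column names, at the cost of a merged table that no longer matches the files as submitted.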

scripts/csv_to_roi_table.py

will-moore · 2021-08-12T15:25:15Z

@sbesson Those commits should address your points. Also e-mailed Jonas for feedback on what tables to include.

will-moore · 2021-08-18T14:53:47Z

Feedback from Jonas:

- "shape_CFOR_pca_measured.tsv" and "shape_TFOR_pca_measured.tsv" -- unbiased cell shape descriptors; only the first 5-10 columns (principal components) are of interest, as they explain most of the cell population's variance.
- "pea3smFISH_RNAcounts_predicted.tsv" -- predicted expression of pea3 mRNA; an example of multi-modal data integration.
- "archetype_TFOR_classifications.tsv" -- classification of cells into different morphological archetypes; as this is categorical data, it can be used to stratify analyses/plots of the other measurements.

Number of columns for tables (other than Source Name and Cell ID):

- other_measurements.tsv: 26 (see above)
- shape_CFOR_pca_measured.tsv and shape_TFOR_pca_measured.tsv: 60 each (PC 1 -> PC 60)
- pea3smFISH_RNAcounts_predicted.tsv: 1 (pea3 spot count)
- archetype_TFOR_classifications.tsv: 1 (Predicted Archetype)

@seb, I think we could add each of those 5 tables to each image.
I could look at updating the script to combine tables but we wouldn't want to combine the 2 *pca_measured tables as they have identical column names. It might be possible to combine the other tables that have a single column of data, to add that into the other_measurements but the effort is probably not worth it and it distorts the submitted data.

I'll update the script to take a table name filter, then run it on a pilot somewhere?
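One way such a filter could work, as a sketch only: select files whose names end with any of the given suffixes. The function name, the comma-separated form and the suffix matching are assumptions for illustration, not the script's actual behaviour; the file names below are made up.

```python
def matching_tables(filenames, name_filter):
    """Return the filenames ending with any suffix in a comma-separated filter."""
    # Ignore empty entries so a trailing comma does not match every file.
    suffixes = [s.strip() for s in name_filter.split(",") if s.strip()]
    return [f for f in filenames if any(f.endswith(s) for s in suffixes)]


files = [
    "imgA_other_measurements.tsv",
    "imgA_shape_CFOR_pca_measured.tsv",
    "imgA_shape_TFOR_pca_measured.tsv",
]
print(matching_tables(files, "_shape_TFOR_pca_measured.tsv"))
```

Matching on the full suffix rather than a substring keeps the two *pca_measured tables distinct.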

sbesson · 2021-08-18T14:55:01Z

> I'll update the script to take a table name filter, then run it on a pilot somewhere?

👍 sounds like the best plan of action

will-moore · 2021-08-18T16:19:28Z

One limitation of populate_metadata is that it always names the files bulk_annotations regardless of the name of the table.
So if we are relying on the name to know in which table to find a particular column, or (as above) column names are identical between different tables, we are in trouble.

I think I'll open an omero-metadata PR to add a table_name option...

sbesson · 2021-08-18T18:36:34Z

From a quick look at the code, the omero_metadata.DEFAULT_TABLE_NAME is only used for the CSV -> OMERO.tables population workflow and nothing else depends on it at the level of the plugin. The OMERO.tables -> map annotations workflow selection is based on the bulk annotation namespace.

Looking into adding support for custom bulk annotation OMERO.table names makes sense to me at least in the context of this work.

will-moore · 2021-08-20T11:31:18Z

Running locally (using populate-metadata branch above) connected to pilot-idr0101:

$ python scripts/csv_to_roi_table.py _other_measurements.tsv
... Successfully attached 165 tables to images

# 2 tables at once:
$ python scripts/csv_to_roi_table.py _pea3smFISH_RNAcounts_predicted.tsv,_archetype_TFOR_classifications.tsv
# connection failed - a few images not processed with archetype_TFOR_classifications.tsv

$ python scripts/csv_to_roi_table.py _shape_CFOR_pca_measured.tsv
...Successfully attached 165 tables to images

$ python scripts/csv_to_roi_table.py _shape_TFOR_pca_measured.tsv
...Successfully attached 165 tables to images

will-moore · 2021-08-20T11:41:51Z

With the tables added, you can filter by query. E.g. to find the cells with the least negative X coordinates, open the other_measurements table linked to an Image and add a query like:

/webclient/omero_table/ID/?query=Centroids_TFOR_X>-10

Clicking on the ROI links will show the ROIs in iviewer, which confirms that they do have the expected X coordinate (to the right of the image).

sbesson

Tested on a few sample raw images from idr0079-hartmann-lateralline/experimentA:

- the images contain several attachments: a zip with the original sources (already present in the DB) and several tables with the openmicroscopy.org/omero/bulk_annotations namespace
- the tables all have different names reflecting the original source
- opening the tables gives a Roi column which links to the OMERO.iviewer ROI URLs

From my side this feels like a useful addition for querying masks. Leaving @francesw @pwalczysko and others a chance to comment but my vote would be for deploying this in prod100.

While testing the table filtering in the client, I noticed an issue with complex queries. The one suggested by the text, i.e. ?query=(Cell_ID>11)&(Cell_ID<14), works but ?query=(Centroids_RAW_Z>10)&(Centroids_RAW_Z<11) only applies the first part of the query. I do not know whether it is an existing limitation; if not, I can turn this into an omero-web issue.

will-moore · 2021-08-23T12:21:22Z

@sbesson - This is an issue with the URL encoding. If you use ?query=(Centroids_RAW_Z>10)%26(Centroids_RAW_Z<11) it works. And this is the form used when you click on the suggested query.
Not sure of the best way to explain that and avoid this issue on the page. Any ideas?
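For anyone building these URLs in code rather than by hand, a small sketch of the encoding using the standard library. The helper name and the table ID are hypothetical; the URL pattern is the one used earlier in this thread. The safe characters are chosen so the output matches the working form above, which leaves the brackets and comparison operators literal.

```python
from urllib.parse import quote


def table_query_url(table_id, query):
    """Build an omero_table URL with the query percent-encoded."""
    # '&' is not in the safe set, so it becomes %26 instead of
    # being read as a separator between URL parameters.
    encoded = quote(query, safe="()<>=")
    return "/webclient/omero_table/%s/?query=%s" % (table_id, encoded)


print(table_query_url(123, "(Centroids_RAW_Z>10)&(Centroids_RAW_Z<11)"))
```

A raw & ends the query parameter, which is why only the first condition was applied.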

sbesson · 2021-08-23T12:46:09Z

Understood, I had missed that the ampersand was encoded when clicking on the sample query. Then I agree it's only a usability issue. I suspect one extreme solution would be a more advanced UI allowing you to construct these queries via boxes. In the meantime, could we detect a query fragment with a decoded & and suggest the encoded version instead?
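A sketch of what that detection might look like, written in Python for illustration only (in omero-web this could equally live in the page's JavaScript). The function is hypothetical, and the heuristic is an assumption: any fragment that directly follows query=... and starts with "(" is treated as a piece of the query that was cut off by a raw &.

```python
def suggest_encoded_query(query_string):
    """Return a corrected query string if a raw '&' split the query, else None.

    query_string: the part of the URL after '?', still percent-encoded.
    """
    parts = query_string.split("&")
    for i, part in enumerate(parts):
        if not part.startswith("query="):
            continue
        # Collect the fragments that look like cut-off conditions.
        stray = []
        for following in parts[i + 1:]:
            if not following.startswith("("):
                break
            stray.append(following)
        if not stray:
            return None
        # Re-join the condition fragments with an encoded ampersand.
        parts[i:i + 1 + len(stray)] = ["%26".join([part] + stray)]
        return "&".join(parts)
    return None


print(suggest_encoded_query("query=(Centroids_RAW_Z>10)&(Centroids_RAW_Z<11)"))
```

Queries that are already encoded return None, so a suggestion would only be shown when something was actually dropped.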

will-moore added 6 commits May 20, 2021 07:39

Add csv_to_roi_table.py

14855f9

Add csv_to_table_on_project.py

1766954

Update to csv_to_roi_table.py

7b192f6

csv_to_roi_table.py assumes ROI ordered

a3519e1

Don't need to use coordinates to look up each ROI

Use relative path to .tsv. Add TFOR cols to table

09f6802

Tweak usage info on csv_to_roi_table.py

815a0aa

will-moore mentioned this pull request May 27, 2021

ome 2021 workshop ome/training-scripts#88

Merged

will-moore added 2 commits June 22, 2021 10:43

Remove duplicate rows for every Shape

336d6e5

csv_to_roi_table.py doesn't edit roi.name

3857d18

sbesson requested changes Jul 1, 2021

will-moore added 4 commits August 12, 2021 15:23

Remove unused csv_to_table_on_project.py

ada9d50

Tweak usage comment in csv_to_roi_table.py

b5e8474

Don't create csv file annotation. Just pass csv to create table

7b58357

Remove column type for previous 'shape' column

1f218d2

Add table_name parameter to csv_to_roi_table.py

4bd7dc6

Set bulk annotations table_name

57e507f

will-moore mentioned this pull request Aug 18, 2021

Add table_name option to ParsingContext() ome/omero-metadata#61

Merged

Remove csv after populate_metadata

dc9d9fc

sbesson approved these changes Aug 23, 2021

sbesson merged commit 62b7656 into IDR:master Aug 30, 2021
