Csv to tables scripts #5

Merged: 15 commits, Aug 30, 2021

Member

@will-moore will-moore commented May 27, 2021

Adds a script used in OMERO.figure workshop prep. See ome/omero-figure#431

Script csv_to_roi_table.py creates OMERO.tables on Images, corresponding to the ROIs on each image. It would be useful to have these tables in IDR.

It uses the *_other_measurements.tsv file for each Image. But there are 30 other .tsv files for each Image (see https://github.com/IDR/idr0079-hartmann-lateralline/tree/master/experimentA/idr0079_experimentA_extracted_measurements/00E41C184C) so we'd need to decide which ones to pick.

NB: This script loads the ROIs on each image using roi_service.findByImage() and assumes that these are in the same order as the ROIs in the .tsv table. This has been verified experimentally for 1 or 2 images during workshop prep above.
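
The positional pairing described in this note can be sketched as below. This is only an illustration and not the code in csv_to_roi_table.py: `pair_rois_with_rows` is a hypothetical helper, the ROI IDs and row values are invented, and the ROI IDs are assumed to have already been read, in order, from the result of roi_service.findByImage(). The sketch adds a row-count check, which catches a mismatch in length but cannot detect a different ordering.

```python
def pair_rois_with_rows(roi_ids, rows):
    """Attach one ROI ID to each table row, purely by position.

    roi_ids: ROI IDs in the order returned by the server (assumed).
    rows: list of dicts, one per line of the .tsv table.
    """
    if len(roi_ids) != len(rows):
        raise ValueError(
            "%d ROIs but %d table rows" % (len(roi_ids), len(rows)))
    # Copy each row and add a "Roi" column holding the matching ROI ID.
    return [dict(row, Roi=roi_id) for roi_id, row in zip(roi_ids, rows)]


# Invented sample values, for illustration only.
rows = [{"Cell ID": "1"}, {"Cell ID": "2"}]
print(pair_rois_with_rows([101, 102], rows))
```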

NB: This script re-names columns, replacing white-space with underscore, to avoid issues with queries. But this may not be needed with ome/omero-py#287?
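
A minimal sketch of the renaming described in this note, assuming each run of white-space is collapsed into a single underscore and leading/trailing white-space is dropped (the script itself may handle these cases differently). The function names are hypothetical, and "Centroids TFOR X" is an assumed original spelling of a column.

```python
import re


def normalise_column_name(name):
    """Replace each run of white-space in a column name with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


def normalise_header(columns):
    """Apply the renaming to every column name in a header row."""
    return [normalise_column_name(column) for column in columns]


print(normalise_header(["Source Name", "Cell ID", "Centroids TFOR X"]))
```

Names without white-space can be used directly in a table query expression, which is the motivation given above.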

@sbesson
Member

sbesson commented Jun 24, 2021

Briefly tested on pilot-idr0104 using the csv_to_table_on_project.py. A few notes:

  • the script requires a recent version of omero-metadata with the allow_nans functionality. I think there was an issue with deploying this to production but I will dig it up
  • the script executes quickly and generates two file annotations attached at the project level
    Screenshot 2021-06-24 at 16 14 44
  • the new bulk annotation table can be inspected individually using the Web UI
    Screenshot 2021-06-24 at 16 14 51
  • at the image level, an issue is that the Web client does not handle multiple tables at the same container level, so only the last one is displayed and it overrides the bulk annotation table
    Screenshot 2021-06-24 at 16 15 22

From my side, the major question is whether these extra columns should be appended to the single bulk annotation of the project, i.e. update the annotation.csv with the extra columns and replace it on production. This would be more amenable to the standard workflow that has been used previously, where library/assay files and processed files (e.g. containing features) are combined into a single annotation CSV/bulk annotation.

Agreed that the column renaming could be made unnecessary by the API extension proposed in ome/omero-py#287. We still need to define how these queries could be passed as a URL and converted into the relevant getWhereList column.

@will-moore
Member Author

Ah, apologies @sbesson I meant to remove csv_to_table_on_project.py since that was just a sample of columns that could be summarised for the OMERO.parade demo (which needs a table on the Project). I updated the PR description but didn't remove the code.
But I'm not sure that has so much value for IDR?

The bulk of this PR is the csv_to_roi_table.py which creates a Table on each Image.

Member

@sbesson sbesson left a comment

With the inline modification to the csv_to_roi_table.py, the script worked as expected iterating through images and creating tables:

Screenshot 2021-07-01 at 15 33 16

A few comments:

  • generally 💯 for augmenting the value of datasets and exposing cell-level features via OMERO.tables more systematically
  • I would remove the csv_to_table_on_project.py script if it is unused
  • the CSV file attached to the image feels redundant with the table. Could the intermediate CSV be stored in a temporary folder and passed to the metadata population parser?
  • there are definitely several measurement tables per image. From a quick look, all of these tables are consistent in terms of rows i.e. one row per ROI. It should be possible to concatenate multiple (or all) of them and prefix the columns names. I have no intuition on the most valuable. Would that be something the submitter would be able to help us with?
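
The concatenation idea in the last bullet can be sketched with the standard library alone. This is not part of the PR: `read_tsv` and `merge_tables` are hypothetical helpers, the sample values are invented, and the sketch assumes every table has one row per ROI in the same order, with "Source Name" and "Cell ID" as shared key columns.

```python
import csv
import io

KEY_COLUMNS = ("Source Name", "Cell ID")


def read_tsv(text):
    """Parse tab-separated text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_tables(tables):
    """Merge tables row by row, prefixing every non-key column name.

    tables: dict mapping a prefix to a list of row dicts.
    """
    if len({len(rows) for rows in tables.values()}) > 1:
        raise ValueError("tables do not have the same number of rows")
    merged = []
    for rows in zip(*tables.values()):
        combined = {key: rows[0][key] for key in KEY_COLUMNS}
        for prefix, row in zip(tables, rows):
            for column, value in row.items():
                if column in KEY_COLUMNS:
                    # The key columns must agree across all tables.
                    if value != combined[column]:
                        raise ValueError("rows are not aligned on " + column)
                else:
                    combined[prefix + "_" + column] = value
        merged.append(combined)
    return merged


# Invented sample values, for illustration only.
cfor = read_tsv("Source Name\tCell ID\tPC 1\nimg\t1\t0.5\n")
tfor = read_tsv("Source Name\tCell ID\tPC 1\nimg\t1\t-0.2\n")
print(merge_tables({"CFOR": cfor, "TFOR": tfor}))
```

Prefixing keeps columns apart even when two source tables use the same column names.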

Inline review comment on scripts/csv_to_roi_table.py (outdated, resolved)
@will-moore
Member Author

@sbesson Those commits should address your points. Also e-mailed Jonas for feedback on what tables to include.

@will-moore
Member Author

Feedback from Jonas:

  • "shape_CFOR_pca_measured.tsv" and "shape_TFOR_pca_measured.tsv" -- unbiased cell shape descriptors; only the first 5-10 columns (principal components) are of interest, as they explain most of the cell population's variance.
  • "pea3smFISH_RNAcounts_predicted.tsv" -- predicted expression of pea3 mRNA; an example of multi-modal data integration.
  • "archetype_TFOR_classifications.tsv" -- classification of cells into different morphological archetypes; as this is categorical data, it can be used to stratify analyses/plots of the other measurements.

Number of columns for tables (other than Source Name and Cell ID):

  • other_measurements.tsv: 26 (see above)
  • shape_CFOR_pca_measured.tsv and shape_TFOR_pca_measured.tsv: 60 each (PC 1 -> PC 60)
  • pea3smFISH_RNAcounts_predicted.tsv: 1 pea3 spot count
  • archetype_TFOR_classifications.tsv: 1 Predicted Archetype

@seb, I think we could add each of those 5 tables to each image.
I could look at updating the script to combine tables but we wouldn't want to combine the 2 *pca_measured tables as they have identical column names. It might be possible to combine the other tables that have a single column of data, to add that into the other_measurements but the effort is probably not worth it and it distorts the submitted data.

I'll update the script to take a table name filter, then run it on a pilot somewhere?

@sbesson
Member

sbesson commented Aug 18, 2021

I'll update the script to take a table name filter, then run it on a pilot somewhere?

👍 sounds like the best plan of action

@will-moore
Member Author

One limitation of populate_metadata is that it always names the files bulk_annotations regardless of the name of the table.
So if we are relying on the name to know which table to find a particular column, or (as above) column names are identical between different tables, we are in trouble:

Screenshot 2021-08-18 at 17 10 29

I think I'll open an omero-metadata PR to add a table_name option...

@sbesson
Member

sbesson commented Aug 18, 2021

From a quick look at the code, the omero_metadata.DEFAULT_TABLE_NAME is only used for the CSV -> OMERO.tables population workflow and nothing else depends on it at the level of the plugin. The OMERO.tables -> map annotations workflow selection is based on the bulk annotation namespace.

Looking into adding support for custom bulk annotation OMERO.table names makes sense to me at least in the context of this work.

@will-moore
Member Author

Running locally (using populate-metadata branch above) connected to pilot-idr0101:

$ python scripts/csv_to_roi_table.py _other_measurements.tsv
... Successfully attached 165 tables to images

# 2 tables at once:
$ python scripts/csv_to_roi_table.py _pea3smFISH_RNAcounts_predicted.tsv,_archetype_TFOR_classifications.tsv
# connection failed - a few images not processed with archetype_TFOR_classifications.tsv

$ python scripts/csv_to_roi_table.py _shape_CFOR_pca_measured.tsv
...Successfully attached 165 tables to images

$ python scripts/csv_to_roi_table.py _shape_TFOR_pca_measured.tsv
...Successfully attached 165 tables to images
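
The commands above pass one or more file-name suffixes, comma-separated, as the table name filter. How the script handles that argument is not shown here, so the following is only a sketch of one way it could work: both function names are hypothetical, and the file names assume each .tsv is prefixed with the image's folder ID.

```python
def parse_table_filters(argument):
    """Split a comma-separated argument into a list of file-name suffixes."""
    return [part.strip() for part in argument.split(",") if part.strip()]


def group_files_by_filter(filenames, filters):
    """Map each suffix to the file names that end with it."""
    return {
        suffix: [name for name in filenames if name.endswith(suffix)]
        for suffix in filters
    }


filters = parse_table_filters(
    "_pea3smFISH_RNAcounts_predicted.tsv,_archetype_TFOR_classifications.tsv")
# Assumed file names, for illustration only.
files = [
    "00E41C184C_other_measurements.tsv",
    "00E41C184C_pea3smFISH_RNAcounts_predicted.tsv",
    "00E41C184C_archetype_TFOR_classifications.tsv",
]
for suffix, matches in group_files_by_filter(files, filters).items():
    print(suffix, matches)
```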

@will-moore
Member Author

With the tables added, you can filter by query. For example, to find the cells with the least negative X coordinates, open the other_measurements table linked to an Image and add a query like:

/webclient/omero_table/ID/?query=Centroids_TFOR_X>-10

Clicking on the ROI links will show the ROIs in iviewer, where you can confirm that they do have the expected X coordinate (to the right of the image).

Member

@sbesson sbesson left a comment

Tested on a few sample raw images from idr0079-hartmann-lateralline/experimentA:

  • the images contain several attachments: a zip with the original sources (already present in the DB) and several tables with the openmicroscopy.org/omero/bulk_annotations namespace
  • the tables all have different names reflecting the original source
  • opening the tables gives a Roi column which links to the OMERO.iviewer ROI URLs

From my side this feels like a useful addition for querying masks. Leaving @francesw @pwalczysko and others a chance to comment but my vote would be for deploying this in prod100.

While testing the table filtering in the client, I noticed an issue with complex queries. The one suggested by the text, i.e. ?query=(Cell_ID>11)&(Cell_ID<14), works but ?query=(Centroids_RAW_Z>10)&(Centroids_RAW_Z<11) only applies the first part of the query. I do not know whether this is a known limitation; if it is not, I can turn this into an omero-web issue.

@will-moore
Member Author

will-moore commented Aug 23, 2021

@sbesson - This is an issue with the URL encoding. If you use ?query=(Centroids_RAW_Z>10)%26(Centroids_RAW_Z<11) it works. And this is the form used when you click on the suggested query.
Not sure of the best way to explain that and avoid this issue on the page. Any ideas?
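
The encoding difference can be reproduced with Python's standard urllib.parse module. This sketch only shows generic query-string parsing, which is assumed to match how the web client splits the URL; `build_table_query` is a hypothetical helper, not an omero-web function.

```python
from urllib.parse import parse_qs, quote


def build_table_query(expression):
    """Return a '?query=...' string with '&' percent-encoded as %26."""
    # Brackets and comparison operators are kept readable; '&' is not in
    # `safe`, so it is encoded.
    return "?query=" + quote(expression, safe="()<>")


# A raw '&' ends the 'query' parameter, so the second condition is lost.
raw = "query=(Centroids_RAW_Z>10)&(Centroids_RAW_Z<11)"
print(parse_qs(raw))

# With %26 the whole expression stays inside the 'query' parameter.
encoded = build_table_query("(Centroids_RAW_Z>10)&(Centroids_RAW_Z<11)")
print(encoded)
print(parse_qs(encoded.lstrip("?")))
```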

@sbesson
Member

sbesson commented Aug 23, 2021

Understood, I had missed that the ampersand was encoded when clicking on the sample query. Then I agree it's only a usability issue. I suspect one extreme solution would be a more advanced UI allowing you to construct these queries via boxes. In the meantime, could we detect a query fragment with a decoded & and suggest the encoded version instead?
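
One possible shape for that detection, as a sketch only: `suggest_encoded_query` is a hypothetical helper, not something in omero-web. It relies on a heuristic, namely that any fragment following the `query` parameter that does not start with `name=` is a stray piece of the query expression. A condition written without brackets, such as `Cell_ID==12`, would be mistaken for a new parameter.

```python
import re

# A fragment that starts a new URL parameter looks like "name=...".
PARAM_START = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def suggest_encoded_query(query_string, param="query"):
    """Return the query string with stray '&' re-joined as %26, or None."""
    fragments = query_string.lstrip("?").split("&")
    rebuilt = []
    in_query = False
    changed = False
    for fragment in fragments:
        if PARAM_START.match(fragment):
            in_query = fragment.startswith(param + "=")
            rebuilt.append(fragment)
        elif in_query and fragment:
            # Stray piece of the expression: glue it back on with %26.
            rebuilt[-1] += "%26" + fragment
            changed = True
        else:
            rebuilt.append(fragment)
    return "?" + "&".join(rebuilt) if changed else None


print(suggest_encoded_query("?query=(Centroids_RAW_Z>10)&(Centroids_RAW_Z<11)"))
```

The function returns None when nothing needs fixing, so a caller could show the suggestion only when one exists.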

@sbesson sbesson merged commit 62b7656 into IDR:master Aug 30, 2021