Bulk annotations and streaming #5

kkoz · 2018-08-20T19:06:07Z

Added changes to support bulk annotations for projects and datasets, and to change csv reading to process the data line by line instead of buffering the data in memory and writing to the table all at once.

…roscopy/openmicroscopy/pull/5691/commits

…ather than store everything in memory

sbesson · 2018-08-21T07:04:12Z

test/integration/tablestest/test_populate_metadata.py

@@ -24,12 +24,16 @@

 """

+import csv


This import causes the Python code compliance step to fail in Travis

sbesson · 2018-08-21T07:09:31Z

@kkoz thanks for your first contribution to this plugin. I commented on one import which causes the Travis build to fail. Once CI passes, we can look into testing these changes functionally. A couple of immediate questions:

I assume this change should be tested against examples of P/D as well as S/P/W for non-regression. We have certainly examples of bulk annotations for both types of hierarchies in IDR
briefly scanning through the changes, the API changes in populate_metadata probably mean this would be released as 0.3.0
it looks like info argument is now largely a no-op as write_to_omero is no longer conditional. From the perspective of the CLI plugin this is entirely fine as this argument was never exposed.

kkoz · 2018-08-21T16:05:47Z

@sbesson Thanks for the notes. I will be making a commit shortly with the unnecessary import removed. For future reference, is there any way to check if my code will pass a travis-ci check without doing a commit? I've responded to your questions/comments below.

The code should be tested against P/D and S/P/W data.
That's fine - I will update CHANGES.rst
Removed info from command line args. Please let me know if there's anything else that needs to be done on this front.

sbesson · 2018-08-22T09:37:28Z

test/integration/tablestest/test_populate_metadata.py

@@ -29,7 +29,9 @@

 from omero.model import PlateI, WellI, WellSampleI, OriginalFileI
 from omero.rtypes import rint, rstring, unwrap
-from omero.util.populate_metadata import ParsingContext
+from omero.util.populate_metadata import (ParsingContext,


Just realizing that this still imports the legacy OMERO.py classes rather than the class in the package. Largely, something I missed in #1 as I did not activate these tests. Can you try to update this line to

from populate_metadata import (ParsingContext, ... `` and similarly in `test/integration/metadata/test_populate.py` so that your new factory class is picked up? NB: in a follow-up PR, we'll probably want to look into properly namespacing these packages.

sbesson · 2018-08-22T09:52:07Z

@kkoz for pre-testing, assuming you have Docker installed, you should be able to test your changes locally by running the same calls as the ones executed in .travis.yml:

$ git clone git://github.com/openmicroscopy/omero-test-infra .omero
$ .omero/cli-docker

This is going to consume the omero-test-infra infrastructure to 1- create a Docker-based OMERO environment (DB, server, web), 2- install the plugin and run Python compliance checks, 3- execute the integration tests.

kkoz · 2018-08-22T16:30:22Z

@sbesson Thanks for the info. I ran .omero/cli-docke but the output is mostly flake8 notes like ./omero-metadata/venv/lib/python2.7/distutils/__init__.py:100:1: E305 expected 2 blank lines after class or function definition, found 0
There are a lot of these: flake8.main.application MainProcess 130168 INFO Found a total of 108169 violations and reported 104755
Is there some way for me to disable the flake8 checks so I can see the output of the tests?
Also, which version of populate_metadata.py is being used in the container? How can I make sure it's the one I want to test? I'll perform another commit with the import changes in the meantime. Thanks again for your help.

…gacy versions

sbesson · 2018-08-22T17:15:27Z

@kkoz looks great now. Re flake8, you can disable it by updating .omero/py-check and commenting the flake8 line. Is the venv folder your local development virtual environment? If so, it should be possible to add a file excluding the check of known folders.

The version of populate_metadata.py tested should be the local version. The local directory is copied in the docker container via docker cp and installed there via python setup.py install before executing the tests.

sbesson · 2018-08-23T10:57:45Z

src/populate_metadata.py

 class ParsingContext(object):
    """Generic parsing context for CSV files."""

-    def __init__(self, client, target_object, file=None, fileid=None,
+    def __init__(self, client, target_object,
+                 parsing_util_factory, file=None, fileid=None,


Is parsing_util_factory a new mandatory argument in ParsingContext as proposed here or an optional key/value pair as suggested in the test updates. Trying to consume this branch using the CLI wrapper against a project, I get the following error:

bash-4.2$ pip install git+git://github.com/kkoz/omero-metadata.git@bulk-annotations-and-streaming#egg=omero-metadata --user Collecting omero-metadata from git+git://github.com/kkoz/omero-metadata.git@bulk-annotations-and-streaming#egg=omero-metadata ... bash-4.2$ OMERO-server/OMERO.server/bin/omero metadata populate --report --batch=10000 --file IDR-import/idr0021-lawo-pericentriolarmaterial/experimentA/idr0021-experimentA-annotation.csv Project:1 Using session for root@localhost:4064. Idle timeout: 10 min. Current group: system Traceback (most recent call last): File "OMERO-server/OMERO.server/bin/omero", line 125, in <module> rv = omero.cli.argv() File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1620, in argv cli.invoke(args[1:]) File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1095, in invoke stop = self.onecmd(line, previous_args) File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1172, in onecmd self.execute(line, previous_args) File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1254, in execute args.func(args) File "/home/omero/.local/lib/python2.7/site-packages/omero_cli_metadata.py", line 482, in populate options=localcfg) TypeError: __init__() takes at least 4 arguments (9 given)

NB: this also reveals the absence of integration test for the CLI metadata populate subcommand which would have picked up this error. This is independent of this PR but I will try and produce an extra commit to cover this use case.

The intention was to have it be mandatory. This was probably a bit of premature refactoring on my part. I will see if I can change the code to restore the signature of __init__ in ParsingContext. My eventual goal was to refactor the code such that it allowed for unit testing. But that is outside the scope of this PR so I'll try to change it.

…ignature the same as before

sbesson · 2018-08-24T14:28:39Z

A couple of notes from the discussion with @emilroz @kkoz and @joshmoore. Generally, in the absence of any other outstanding requirement, we are starting to define the roadmap for a 0.3.0 release of omero-metadata including:

the support of large CSV (multiple 10-100G)
better support/fixes of the behavior for various column types
additional API cleanup/refactoring including =the usage of the new factory class
more unit and integration tests (CLI metadata populate integration tests #8)

In terms of functional testing perspective, while the latest omero-metadata:0.2.2 can keep been used in production, different types of IDR studies e.g. idr0021 and idr0008 should be used to confirm the various population contexts (bulk annotation population, map annotation population, map annotation deletion) remain functional.

Testing the last commit, the default metadata populate command failed as the CLI wrapper is expecting ctx.write_to_omero which is removed by this PR:

bin/omero metadata populate --report --batch=1000 --file=../../IDR-import/screenA/idr0037-screenA-annotation.csv Screen:52
Using session for root@localhost:4064. Idle timeout: 10 min. Current group: system
DEBUG:omero.util.populate_metadata:Loading Screen:52
...
Traceback (most recent call last):
  File "bin/omero", line 125, in <module>
    rv = omero.cli.argv()
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1620, in argv
    cli.invoke(args[1:])
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1095, in invoke
    stop = self.onecmd(line, previous_args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1172, in onecmd
    self.execute(line, previous_args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1254, in execute
    args.func(args)
  File "/home/omero/.local/lib/python2.7/site-packages/omero_cli_metadata.py", line 492, in populate
    ctx.write_to_omero(batch_size=args.batch, loops=loops, ms=ms)

As discussed, no major objection to removing the write_to_omero API as long as the CLI options (--batch, --loops, --ms) remain functional. The simplest option might be to add these to the context constructor and handle them internally depending on the type of operation.

…() for all contexts. Updated omero_cli_metadata.py and integration tests to no longer call write_to_omero.

kkoz · 2018-08-27T23:10:42Z

@sbesson I made changes to solve the problem, but I'm having trouble testing them. When I run:
bin/omero metadata populate --report --batch=1000 --file=/home/kevin/testdata/test.csv Plate:1
I get:
bin/omero: error: argument <subcommand>: invalid choice: 'metadata'
Also it might be good to have an IDR csv file to test with. How can I get one?

sbesson · 2018-08-28T11:59:49Z

@kkoz about the invalid choice issue, immediate question: does pip freeze list you the omero-metadata module?
All of the IDR CSV files are located in https://github.com/IDR/idr-metadata and its submodules - we are moving towards 1 repository per IDR study. For testing, two "small" studies we have been using are:

https://github.com/IDR/idr0021-lawo-pericentriolarmaterial/blob/master/experimentA/idr0021-experimentA-annotation.csv for an example of P/D/I bulk annotation
https://github.com/IDR/idr0008-rohn-actinome/blob/master/screenB/idr0008-screenB-annotation.csv for an example of HCS bulk annotation

For both of these, the raw data is also available if needed.

kkoz · 2018-08-29T18:00:54Z

@sbesson I successfully imported the idr0021datasets into omero. I added them to a project I created and ran ./bin/omero metadata populate --report --batch=1000 --file=/home/kevin/code/idr0021-lawo-pericentriolarmaterial/experimentA/idr0021-experimentA-annotation.csv Project:201 without any issues. The project now has a "bulk_annotations" attachment but none of the datasets or images have Tables with data associated with them. Did I do use the populate metadata command incorrectly, or is this unexpected behavior? I am using OMERO.web 5.4.6-ice36-b87.

sbesson · 2018-08-30T14:25:49Z

@kkoz excellent. I'll have a go at retesting the annotation workflow (might be tomorrow).
I think the absence of Tables in P/D/I hierarchies in OMERO.web 5.4.6 is expected and this is what ome/openmicroscopy#5832 is trying to address. I will try to use a Web client deployed with these changes to confirm from the Web UI that the bulk annotation works as expected.

sbesson · 2018-08-31T08:24:04Z

Tested these changes against the idr0021 study imported into a test server. After import, went through the three population commands:

$ bin/omero metadata populate Project:101 --file /home/omero/workspace/IDR-import/experimentA/idr0021-experimentA-annotation.csv --report
$ bin/omero metadata populate --context bulkmap --cfg /home/omero/workspace/IDR-import/experimentA/idr0021-experimentA-bulkmap-config.yml --report Project:101
$ bin/omero metadata populate --context deletemap --batch 100 --wait 120 --cfg /home/omero/workspace/IDR-import/experimentA/idr0021-experimentA-bulkmap-config.yml --report Project:101
$ bin/omero metadata populate --context bulkmap --cfg /home/omero/workspace/IDR-import/experimentA/idr0021-experimentA-bulkmap-config.yml --report Project:101

For this type of annotation i.e. P/D/I with Dataset Name and Image Name columns, everything worked as expected - see https://web-proxy.openmicroscopy.org/east-web/webclient/?show=project-101 for the output (including the webclient table support from ome/openmicroscopy#5832).

sbesson · 2018-08-31T20:28:53Z

Executed a similar workflow to #5 (comment) against a screen (idr0008/screenB). Both table population and table to map annotation conversion worked without issue. I was not able to fully test the deletion operation due to time issue. However we know this is one for which can be improved.

I will perform a final round of code review on Monday.

sbesson

A few inline comments. Proposing to merge this as it is especially as further refactoring is planned down the line /cc @joshmoore

sbesson · 2018-09-03T14:16:55Z

src/omero_cli_metadata.py

-                            options=localcfg)
-        ctx.parse()
+        loops = 0
+        ms = 0
        if not args.dry_run:


If I understand correctly the --dry-run argument is now largely a no-op, similarly to the -info command. Primary question is whether we might want to resurrect it at some point via some constructor option of equivalent.

sbesson · 2018-09-03T14:17:08Z

src/populate_metadata.py

+                      WELL_NAME_COLUMN,
+                      DATASET_NAME_COLUMN,
+                      IMAGE_NAME_COLUMN,
+                      'image']


I'll update this line in my #10 once merged

sbesson · 2018-09-03T14:23:19Z

src/populate_metadata.py

+
+        original_file = table.getOriginalFile()
+        file_annotation = FileAnnotationI()
+        file_annotation.ns = rstring(


Unrelated to this PR, it should be possible to consume omero.constants.namespaces.NSBULKANNOTATIONS

Dataset Project only supported for omero-metadata external plugin. Deal with breaking change in omero-metadata: ome/omero-metadata#5

kkoz added 3 commits August 16, 2018 13:01

Added dataset bulk annotation changes. See https://github.com/openmic…

4331cd7

…roscopy/openmicroscopy/pull/5691/commits

Changed parsing context to add data to tables in a streaming method r…

957f32b

…ather than store everything in memory

Fixed formatting and one problem with tabletests

1ede450

sbesson reviewed Aug 21, 2018

View reviewed changes

Updated version, removed unused import, removed info cmd line arg

99c06e8

sbesson reviewed Aug 22, 2018

View reviewed changes

sbesson mentioned this pull request Aug 22, 2018

Populate dataset webclient ome/openmicroscopy#5832

Merged

Changed imports in tests to import from the package instead of the le…

fb9823a

…gacy versions

sbesson reviewed Aug 23, 2018

View reviewed changes

kkoz added 2 commits August 23, 2018 10:12

Create ParsingUtilFactory inside ParsingContext to keep constructor s…

91bc2a6

…ignature the same as before

Fixed formatting issues after refactor

6b47e3d

sbesson added this to the 0.3.0 milestone Aug 24, 2018

sbesson mentioned this pull request Aug 24, 2018

CLI metadata populate integration tests #8

Open

Removed explicit calls to write_to_omero in favor of writing in parse…

ef91b15

…() for all contexts. Updated omero_cli_metadata.py and integration tests to no longer call write_to_omero.

sbesson mentioned this pull request Aug 28, 2018

Image column name #10

Merged

sbesson approved these changes Sep 3, 2018

View reviewed changes

sbesson merged commit 782d9ed into ome:master Sep 3, 2018

sbesson mentioned this pull request Mar 11, 2019

Bulk annotations on Dataset ome/omero-parade#56

Closed

manics added a commit to manics/scripts that referenced this pull request Jun 21, 2019

populate_metadata: deal with breaking changes

838d184

Dataset Project only supported for omero-metadata external plugin. Deal with breaking change in omero-metadata: ome/omero-metadata#5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bulk annotations and streaming #5

Bulk annotations and streaming #5

kkoz commented Aug 20, 2018

sbesson Aug 21, 2018

sbesson commented Aug 21, 2018

kkoz commented Aug 21, 2018

sbesson Aug 22, 2018

sbesson commented Aug 22, 2018

kkoz commented Aug 22, 2018

sbesson commented Aug 22, 2018 •

edited

Loading

sbesson Aug 23, 2018 •

edited

Loading

kkoz Aug 23, 2018

sbesson commented Aug 24, 2018

kkoz commented Aug 27, 2018

sbesson commented Aug 28, 2018

kkoz commented Aug 29, 2018

sbesson commented Aug 30, 2018 •

edited

Loading

sbesson commented Aug 31, 2018

sbesson commented Aug 31, 2018

sbesson left a comment

sbesson Sep 3, 2018

sbesson Sep 3, 2018

sbesson Sep 3, 2018

Bulk annotations and streaming #5

Bulk annotations and streaming #5

Conversation

kkoz commented Aug 20, 2018

sbesson Aug 21, 2018

Choose a reason for hiding this comment

sbesson commented Aug 21, 2018

kkoz commented Aug 21, 2018

sbesson Aug 22, 2018

Choose a reason for hiding this comment

sbesson commented Aug 22, 2018

kkoz commented Aug 22, 2018

sbesson commented Aug 22, 2018 • edited Loading

sbesson Aug 23, 2018 • edited Loading

Choose a reason for hiding this comment

kkoz Aug 23, 2018

Choose a reason for hiding this comment

sbesson commented Aug 24, 2018

kkoz commented Aug 27, 2018

sbesson commented Aug 28, 2018

kkoz commented Aug 29, 2018

sbesson commented Aug 30, 2018 • edited Loading

sbesson commented Aug 31, 2018

sbesson commented Aug 31, 2018

sbesson left a comment

Choose a reason for hiding this comment

sbesson Sep 3, 2018

Choose a reason for hiding this comment

sbesson Sep 3, 2018

Choose a reason for hiding this comment

sbesson Sep 3, 2018

Choose a reason for hiding this comment

sbesson commented Aug 22, 2018 •

edited

Loading

sbesson Aug 23, 2018 •

edited

Loading

sbesson commented Aug 30, 2018 •

edited

Loading