EVA-1795 — Update docs & scripts for Open Targets 2020/02 release #87

tskir · 2020-01-28T12:02:49Z

This includes changes to the code & documentation which I had to make for the new Open Targets batch submission:

- Migrate to Open Targets schema version 1.6.3
- Iron out documentation throughout the pipeline
- Handle occasional OxO errors when querying for cross-links (a known issue, EBISPOT/OXO#26: "Mapping of Orphanet:121715 results in 500 Internal Server Error")
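
The description does not show how the script copes with the intermittent HTTP 500 responses. One common pattern, sketched here as an assumption rather than as the fix actually made in this PR, is to retry a failing lookup a bounded number of times and then give up on that term. `retry` and `flaky_lookup` are hypothetical names; `flaky_lookup` only simulates a request that fails twice before succeeding.

```bash
# Hypothetical helper: run a command up to <attempts> times,
# pausing <delay> seconds between tries.
# The function body runs in a subshell, so its variables do not leak out.
retry() (
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "${i}" -le "${attempts}" ]; do
    if "$@"; then
      exit 0
    fi
    echo "attempt ${i}/${attempts} failed: $*" >&2
    if [ "${i}" -lt "${attempts}" ]; then
      sleep "${delay}"
    fi
    i=$((i + 1))
  done
  exit 1
)

# Stand-in for a cross-link query: fails twice, then succeeds.
counter_file="$(mktemp)"
echo 0 > "${counter_file}"
flaky_lookup() {
  count=$(( $(cat "${counter_file}") + 1 ))
  echo "${count}" > "${counter_file}"
  [ "${count}" -ge 3 ]
}

if retry 5 0 flaky_lookup; then
  echo "lookup succeeded after $(cat "${counter_file}") attempts"
else
  echo "lookup still failing; skipping this term" >&2
fi
```

In real use the wrapped command could be an HTTP client invoked so that server errors yield a non-zero exit status, for example `curl --fail`.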

coveralls · 2020-01-28T12:04:48Z

Coverage remained the same at 75.099% when pulling f57cb09 on tskir:eva-1795-updates-2020-02 into 17cfe20 on EBIvariation:master.

docs/build.md

jmmut · 2020-01-28T16:06:49Z

docs/submit-opentargets-batch.md


-Generated evidence strings must be additionally validated using tool provided by OpenTargets. _Note: as of August 2019, there is a problem running `opentargets_validator` module using Python 3; as a workaround you can install and run it locally using Python 2.__
+Generated evidence strings must be additionally validated using tool provided by Open Targets. _Note: as of August 2019, there is a problem running `opentargets_validator` module using Python 3; as a workaround you can install and run it locally using Python 2. To solve this, Python version needs to be updated to at least 3.6.__


So it would work for us if we used Python 3.6 in this pipeline? Do you know if there are breaking changes?

Yes, if we used Python 3.6 or newer, then it wouldn't be an issue. I don't know if there are breaking changes—this will need to be tested. I created an issue to address this: https://www.ebi.ac.uk/panda/jira/browse/EVA-1797
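
Until the upgrade is tested, a setup script could at least warn when the configured interpreter is older than the 3.6 mentioned above. The following is a minimal sketch; `version_at_least` is a hypothetical helper, and the comparison is numeric per component so that a version such as 3.10 counts as newer than 3.6.

```bash
# Hypothetical helper: succeed if <actual> (major.minor[.patch]) is at least
# <required> (major.minor). Runs in a subshell to keep its variables private.
version_at_least() (
  actual="$1"
  required="$2"
  actual_major="${actual%%.*}"
  actual_rest="${actual#*.}"
  actual_minor="${actual_rest%%.*}"
  required_major="${required%%.*}"
  required_minor="${required#*.}"
  if [ "${actual_major}" -ne "${required_major}" ]; then
    [ "${actual_major}" -gt "${required_major}" ]
  else
    [ "${actual_minor}" -ge "${required_minor}" ]
  fi
)

# 3.5.6 is the version currently set in the submission instructions.
PYTHON_VERSION=3.5.6
if version_at_least "${PYTHON_VERSION}" 3.6; then
  echo "Python ${PYTHON_VERSION} is new enough"
else
  echo "Python ${PYTHON_VERSION} is older than 3.6" >&2
fi
```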

sundarvenkata-EBI · 2020-01-28T16:25:43Z

docs/submit-opentargets-batch.md

 export CODE_ROOT=/nfs/production3/eva/software/eva-cttv-pipeline

-# Setting up Python version
+# Setting up Python version (the same one which you installed using build instructions)
 PYTHON_VERSION=3.5.6
 INSTALL_PATH=/nfs/production3/eva/software/python-${PYTHON_VERSION}


Sorry to be a pain here, but I am really antsy about having production file system paths in a public repo. Is it possible to work around this at all (perhaps by passing the root path as a command line argument or similar)?

Agreed. I removed the production paths in f57cb09 and added them to the private "configuration" repository.
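
One way to keep such paths out of the public docs, sketched here under assumptions, is to derive them from a root that the operator supplies at run time. `python_install_path` and the environment variable `EVA_SOFTWARE_ROOT` are hypothetical names, not taken from the repository; the real value would live in the private configuration.

```bash
# Hypothetical helper: build the Python install path from a caller-supplied root.
# Fails loudly instead of falling back to a hard-coded production path.
python_install_path() (
  root="$1"
  version="$2"
  if [ -z "${root}" ]; then
    echo "error: installation root not provided" >&2
    exit 1
  fi
  echo "${root}/python-${version}"
)

PYTHON_VERSION=3.5.6
# EVA_SOFTWARE_ROOT (hypothetical) is expected to be exported beforehand,
# for example by sourcing a file from the private configuration repository.
INSTALL_PATH="$(python_install_path "${EVA_SOFTWARE_ROOT:-}" "${PYTHON_VERSION}")" \
  || echo "EVA_SOFTWARE_ROOT is not set; load the private configuration first" >&2
```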

tcezard · 2020-01-28T13:21:44Z

docs/submit-opentargets-batch.md

+echo "</ReleaseSet>" >> ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml
+gzip -c \
+  <${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml \
+  >${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz


This could be rewritten as a single command:

zcat ${BATCH_ROOT}/clinvar/ClinVarFullRelease_${CLINVAR_RELEASE}.xml.gz \
  | awk 'BEGIN {RS="</ClinVarSet>\n\n"; ORS=RS} {print} NR==10 {exit} END{print "</ReleaseSet>"}' \
  | gzip -c > ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz

Yes, that's much nicer, thank you for the suggestion! The only adjustment I had to make was adding `tee`, because the current test suite design requires both files (the compressed and the uncompressed one). Addressed in 5fd796f, and now it looks like this:

zcat ${BATCH_ROOT}/clinvar/ClinVarFullRelease_${CLINVAR_RELEASE}.xml.gz \
  | awk 'BEGIN {RS="</ClinVarSet>\n\n"; ORS=RS} {print} NR==10 {exit} END {print "</ReleaseSet>"}' \
  | tee ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml \
  | gzip -c >${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz
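
To see what the record-separator trick does, here is a small self-contained sketch on toy data. It assumes an awk that accepts a multi-character `RS` (gawk and mawk do) and an input with at least as many records as requested. `first_n_clinvar_sets` is a hypothetical helper. One deliberate difference from the command above: the `END` rule uses `printf`, because `print` appends `ORS`, which here would write an extra `</ClinVarSet>` after the closing `</ReleaseSet>` tag.

```bash
# Hypothetical helper: keep the first N <ClinVarSet> records of a
# ClinVar-style XML stream and close the root element.
first_n_clinvar_sets() {
  awk -v n="$1" '
    BEGIN {RS="</ClinVarSet>\n\n"; ORS=RS}
    {print}
    NR==n {exit}
    END {printf "</ReleaseSet>\n"}'
}

# Toy input standing in for the real release file.
workdir="$(mktemp -d)"
printf '<ReleaseSet>\n\n' > "${workdir}/full.xml"
for id in 1 2 3; do
  printf '<ClinVarSet ID="%s">\n</ClinVarSet>\n\n' "${id}" >> "${workdir}/full.xml"
done
printf '</ReleaseSet>\n' >> "${workdir}/full.xml"

# One pass writes both the uncompressed and the gzipped copy.
first_n_clinvar_sets 2 < "${workdir}/full.xml" \
  | tee "${workdir}/example.xml" \
  | gzip -c > "${workdir}/example.xml.gz"
```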

sundarvenkata-EBI · 2020-01-28T16:31:38Z

docs/manual-curation.md

-  + New mappings for previously unmapped traits
-
-The resulting file must be named `trait_names_to_ontology_mappings.tsv` and saved to `${BATCH_ROOT}/trait_mapping` directory as well.
+Once the manual curation is completed, apply a spreadsheet filter so that only traits with Status = DONE are visible. Copy data for all non-empty rows from three columns: “ClinVar label”; “URI of selected mapping”; “Label of selected mapping”, in that order. Do not include header lines. Save the data to a file `${BATCH_ROOT}/trait_mapping/finished_mappings_curation.tsv`.


I suggest using bold font for the instruction to exclude the header lines.

Good suggestion, addressed in 47056aa
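
The documented procedure is manual (filter and copy in the spreadsheet). For illustration only, the same selection can be sketched with awk, assuming the sheet has been exported as a TSV file whose header row contains the three column names above plus a "Status" column. `extract_finished_mappings`, the export file name, and the toy rows are hypothetical.

```bash
# Hypothetical helper: read a TSV export with a header row on stdin and
# print the three mapping columns for rows whose Status is DONE.
extract_finished_mappings() {
  awk -F '\t' -v OFS='\t' '
    # Header row: remember each column position by name, then skip the row.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Keep finished rows that have a non-empty ClinVar label.
    $col["Status"] == "DONE" && $col["ClinVar label"] != "" {
      print $col["ClinVar label"], $col["URI of selected mapping"], $col["Label of selected mapping"]
    }'
}

# Toy export with made-up rows.
demo_dir="$(mktemp -d)"
{
  printf 'ClinVar label\tURI of selected mapping\tLabel of selected mapping\tStatus\n'
  printf 'trait one\thttp://example.org/TERM_1\tterm one\tDONE\n'
  printf 'trait two\t\t\tIN PROGRESS\n'
  printf 'trait three\thttp://example.org/TERM_3\tterm three\tDONE\n'
} > "${demo_dir}/curation_export.tsv"

extract_finished_mappings < "${demo_dir}/curation_export.tsv" \
  > "${demo_dir}/finished_mappings_curation.tsv"
```

Looking columns up by header name, rather than by fixed position, keeps the sketch working if columns are reordered in the spreadsheet.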

sundarvenkata-EBI · 2020-01-28T16:44:20Z

docs/submit-opentargets-batch.md

 ```bash
-cd ${CODE_ROOT} && ${BSUB_CMDLINE} -n 8 -M 4G \
+cd ${CODE_ROOT} && \
+${BSUB_CMDLINE} -K -n 8 -M 16G \
  -o ${BATCH_ROOT}/logs/convert_clinvar_files.out \
  -e ${BATCH_ROOT}/logs/convert_clinvar_files.err \
  java -jar ${CODE_ROOT}/clinvar-xml-parser/target/clinvar-parser-1.0-SNAPSHOT-jar-with-dependencies.jar \


We are providing a memory requirement of 16G at the bsub level. Shouldn't we also provide that for the java command through -Xmx?

Addressed in 357b306

(I pass -Xmx15G on purpose so that Java doesn't get too close to the bsub limit)
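
The reasoning is that a JVM uses memory beyond its heap (thread stacks, class metadata, native buffers), so a heap limit equal to the job limit leaves no room for the rest. A minimal sketch of deriving the flag from the job limit follows; `java_heap_flag` is a hypothetical helper, and the default 1 GB headroom simply mirrors the 16G/15G values in this thread rather than a measured requirement.

```bash
# Hypothetical helper: derive a Java heap flag from the scheduler memory limit,
# leaving some headroom (in GB) for non-heap JVM memory.
java_heap_flag() (
  job_mem_gb="$1"
  headroom_gb="${2:-1}"
  heap_gb=$((job_mem_gb - headroom_gb))
  if [ "${heap_gb}" -lt 1 ]; then
    echo "error: no memory left for the Java heap" >&2
    exit 1
  fi
  echo "-Xmx${heap_gb}G"
)

JOB_MEM_GB=16
# Shows the two values side by side; the rest of the command line is elided.
echo "bsub -M ${JOB_MEM_GB}G ... java $(java_heap_flag "${JOB_MEM_GB}") -jar ..."
```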

sundarvenkata-EBI · 2020-01-28T16:50:03Z

docs/submit-opentargets-batch.md

+4. https://www.ebi.ac.uk/panda/jira/browse/EVA-1777
+5. https://www.ebi.ac.uk/panda/jira/browse/EVA-1778
+6. https://www.ebi.ac.uk/panda/jira/browse/EVA-1779
+7. https://www.ebi.ac.uk/panda/jira/browse/EVA-1780


Judgement call: maybe my attention span for scanning documents isn't what it used to be, but is this preferable to having the template links beside the individual sections?

I see your point, and originally the template links were scattered throughout the document, each near the header line of the corresponding section. However, this turned out to be inconvenient in practice, because the only time you want to look at those templates is when creating the new tickets for the current batch, and this is done for all issues at once, at the very start of processing the new batch.

sundarvenkata-EBI

Looks good for the most part. Left a few comments...

tskir requested review from jmmut and sundarvenkata-EBI January 28, 2020 12:02

jmmut approved these changes Jan 28, 2020

sundarvenkata-EBI reviewed Jan 28, 2020

tcezard reviewed Jan 28, 2020

sundarvenkata-EBI reviewed Jan 28, 2020

tskir added 5 commits January 29, 2020 16:12

Handle OxO errors inside create_efo_table script

276cec5

Ironing out documentation

fd83f95

Migrate to Open Targets schema 1.6.3

c050cc8

Add instructions for OT schema version changes

476d1bf

Final style updates

1a77831

tskir force-pushed the eva-1795-updates-2020-02 branch from 4a207f8 to 1a77831 Compare January 29, 2020 16:12

tskir added 3 commits January 29, 2020 16:33

Review: remove production paths

f57cb09

Review: simplify command to update Java parser test files

0757c5a

Review: emphasis on not including header lines

47056aa

tskir force-pushed the eva-1795-updates-2020-02 branch from 5fd796f to 47056aa Compare January 29, 2020 16:48

Review: Pass memory limits to Java via -Xmx

357b306

tskir requested review from sundarvenkata-EBI and tcezard January 29, 2020 16:56

sundarvenkata-EBI approved these changes Jan 30, 2020

tcezard approved these changes Jan 30, 2020

tskir merged commit fbd83e7 into EBIvariation:master Jan 31, 2020

tskir deleted the eva-1795-updates-2020-02 branch January 31, 2020 09:14
