Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVA-1795 — Update docs & scripts for Open Targets 2020/02 release #87

Merged
merged 9 commits into from
Jan 31, 2020

Conversation

tskir
Copy link
Member

@tskir tskir commented Jan 28, 2020

This includes changes to the code & documentation which I had to make for the new Open Targets batch submission:

@coveralls
Copy link

coveralls commented Jan 28, 2020

Coverage Status

Coverage remained the same at 75.099% when pulling f57cb09 on tskir:eva-1795-updates-2020-02 into 17cfe20 on EBIvariation:master.

docs/build.md Show resolved Hide resolved

Generated evidence strings must be additionally validated using tool provided by OpenTargets. _Note: as of August 2019, there is a problem running `opentargets_validator` module using Python 3; as a workaround you can install and run it locally using Python 2.__
Generated evidence strings must be additionally validated using tool provided by Open Targets. _Note: as of August 2019, there is a problem running `opentargets_validator` module using Python 3; as a workaround you can install and run it locally using Python 2. To solve this, Python version needs to be updated to at least 3.6.__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it would work for us if we used python3.6 in this pipeline? Do you know if there's breaking changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we used Python 3.6 or newer, then it wouldn't be an issue. I don't know if there are breaking changes—this will need to be tested. I created an issue to address this: https://www.ebi.ac.uk/panda/jira/browse/EVA-1797

export CODE_ROOT=/nfs/production3/eva/software/eva-cttv-pipeline

# Setting up Python version
# Setting up Python version (the same one which you installed using build instructions)
PYTHON_VERSION=3.5.6
INSTALL_PATH=/nfs/production3/eva/software/python-${PYTHON_VERSION}
Copy link
Member

@sundarvenkata-EBI sundarvenkata-EBI Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be a pain here but I am really antsy about having production file system paths in a public repo. Is it possible to work around this at all (perhaps passing the root path as a command line argument or such)?....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I removed production paths: f57cb09 and added them into the private "configuration" repository

echo "</ReleaseSet>" >> ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml
gzip -c \
<${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml \
>${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be rewritten as a single command

zcat ${BATCH_ROOT}/clinvar/ClinVarFullRelease_${CLINVAR_RELEASE}.xml.gz \
  | awk 'BEGIN {RS="</ClinVarSet>\n\n"; ORS=RS} {print} NR==10 {exit} END{print "</ReleaseSet>"}' \
  | gzip -c > ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much nicer, thank you for the suggestion! The only adjustment I had to make to this command is to add tee, because the current test suite design requires both of those files (the compressed and the uncompressed one). Addressed in 5fd796f, and now it looks like this:

zcat ${BATCH_ROOT}/clinvar/ClinVarFullRelease_${CLINVAR_RELEASE}.xml.gz \
  | awk 'BEGIN {RS="</ClinVarSet>\n\n"; ORS=RS} {print} NR==10 {exit} END {print "</ReleaseSet>"}' \
  | tee ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml \
  | gzip -c >${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz

+ New mappings for previously unmapped traits

The resulting file must be named `trait_names_to_ontology_mappings.tsv` and saved to `${BATCH_ROOT}/trait_mapping` directory as well.
Once the manual curation is completed, apply a spreadsheet filter so that only traits with Status = DONE are visible. Copy data for all non-empty rows from three columns: “ClinVar label”; “URI of selected mapping”; “Label of selected mapping”, in that order. Do not include header lines. Save the data to a file `${BATCH_ROOT}/trait_mapping/finished_mappings_curation.tsv`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest bold font in the text to exclude the header lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, addressed in 47056aa

```bash
cd ${CODE_ROOT} && ${BSUB_CMDLINE} -n 8 -M 4G \
cd ${CODE_ROOT} && \
${BSUB_CMDLINE} -K -n 8 -M 16G \
-o ${BATCH_ROOT}/logs/convert_clinvar_files.out \
-e ${BATCH_ROOT}/logs/convert_clinvar_files.err \
java -jar ${CODE_ROOT}/clinvar-xml-parser/target/clinvar-parser-1.0-SNAPSHOT-jar-with-dependencies.jar \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are providing a memory requirement of 16G at the bsub level. Shouldn't we also provide that for the java command through -Xmx?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 357b306

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I pass -Xmx15G on purpose so that Java doesn't get too close to the bsub limit)

4. https://www.ebi.ac.uk/panda/jira/browse/EVA-1777
5. https://www.ebi.ac.uk/panda/jira/browse/EVA-1778
6. https://www.ebi.ac.uk/panda/jira/browse/EVA-1779
7. https://www.ebi.ac.uk/panda/jira/browse/EVA-1780
Copy link
Member

@sundarvenkata-EBI sundarvenkata-EBI Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judgement call: May be my attention span to scan documents isn't what it used to be but is this preferable to having the template links beside the individual sections?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, and originally the template links were scattered throughout the document, each near the header line of the corresponding section. However, this turned out to be not convenient in real life, because the only time you want to look at those templates is when creating the new tickets for the current batch. And this is done simultaneously for all issues, at the very start of processing the new batch.

Copy link
Member

@sundarvenkata-EBI sundarvenkata-EBI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for the most part. Left a few comments...

@tskir tskir force-pushed the eva-1795-updates-2020-02 branch from 4a207f8 to 1a77831 Compare January 29, 2020 16:12
@tskir tskir force-pushed the eva-1795-updates-2020-02 branch from 5fd796f to 47056aa Compare January 29, 2020 16:48
@tskir tskir merged commit fbd83e7 into EBIvariation:master Jan 31, 2020
@tskir tskir deleted the eva-1795-updates-2020-02 branch January 31, 2020 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants