-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EVA-1795 — Update docs & scripts for Open Targets 2020/02 release #87
Conversation
|
||
Generated evidence strings must be additionally validated using tool provided by OpenTargets. _Note: as of August 2019, there is a problem running `opentargets_validator` module using Python 3; as a workaround you can install and run it locally using Python 2.__ | ||
Generated evidence strings must be additionally validated using tool provided by Open Targets. _Note: as of August 2019, there is a problem running `opentargets_validator` module using Python 3; as a workaround you can install and run it locally using Python 2. To solve this, Python version needs to be updated to at least 3.6.__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it would work for us if we used python3.6 in this pipeline? Do you know if there's breaking changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we used Python 3.6 or newer, then it wouldn't be an issue. I don't know if there are breaking changes—this will need to be tested. I created an issue to address this: https://www.ebi.ac.uk/panda/jira/browse/EVA-1797
docs/submit-opentargets-batch.md
Outdated
export CODE_ROOT=/nfs/production3/eva/software/eva-cttv-pipeline | ||
|
||
# Setting up Python version | ||
# Setting up Python version (the same one which you installed using build instructions) | ||
PYTHON_VERSION=3.5.6 | ||
INSTALL_PATH=/nfs/production3/eva/software/python-${PYTHON_VERSION} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be a pain here but I am really antsy about having production file system paths in a public repo. Is it possible to work around this at all (perhaps passing the root path as a command line argument or such)?....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I removed production paths: f57cb09 and added them into the private "configuration" repository
docs/submit-opentargets-batch.md
Outdated
echo "</ReleaseSet>" >> ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml | ||
gzip -c \ | ||
<${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml \ | ||
>${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be rewritten as a single command
zcat ${BATCH_ROOT}/clinvar/ClinVarFullRelease_${CLINVAR_RELEASE}.xml.gz \
| awk 'BEGIN {RS="</ClinVarSet>\n\n"; ORS=RS} {print} NR==10 {exit} END{print "</ReleaseSet>"}' \
| gzip -c > ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's much nicer, thank you for the suggestion! The only adjustment I had to make to this command is to add tee
, because the current test suite design requires both of those files (the compressed and the uncompressed one). Addressed in 5fd796f, and now it looks like this:
zcat ${BATCH_ROOT}/clinvar/ClinVarFullRelease_${CLINVAR_RELEASE}.xml.gz \
| awk 'BEGIN {RS="</ClinVarSet>\n\n"; ORS=RS} {print} NR==10 {exit} END {print "</ReleaseSet>"}' \
| tee ${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml \
| gzip -c >${CODE_ROOT}/clinvar-xml-parser/src/test/resources/ClinvarExample.xml.gz
docs/manual-curation.md
Outdated
+ New mappings for previously unmapped traits | ||
|
||
The resulting file must be named `trait_names_to_ontology_mappings.tsv` and saved to `${BATCH_ROOT}/trait_mapping` directory as well. | ||
Once the manual curation is completed, apply a spreadsheet filter so that only traits with Status = DONE are visible. Copy data for all non-empty rows from three columns: “ClinVar label”; “URI of selected mapping”; “Label of selected mapping”, in that order. Do not include header lines. Save the data to a file `${BATCH_ROOT}/trait_mapping/finished_mappings_curation.tsv`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest bold font in the text to exclude the header lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, addressed in 47056aa
docs/submit-opentargets-batch.md
Outdated
```bash | ||
cd ${CODE_ROOT} && ${BSUB_CMDLINE} -n 8 -M 4G \ | ||
cd ${CODE_ROOT} && \ | ||
${BSUB_CMDLINE} -K -n 8 -M 16G \ | ||
-o ${BATCH_ROOT}/logs/convert_clinvar_files.out \ | ||
-e ${BATCH_ROOT}/logs/convert_clinvar_files.err \ | ||
java -jar ${CODE_ROOT}/clinvar-xml-parser/target/clinvar-parser-1.0-SNAPSHOT-jar-with-dependencies.jar \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are providing a memory requirement of 16G at the bsub level. Shouldn't we also provide that for the java command through -Xmx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 357b306
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I pass -Xmx15G
on purpose so that Java doesn't get too close to the bsub limit)
4. https://www.ebi.ac.uk/panda/jira/browse/EVA-1777 | ||
5. https://www.ebi.ac.uk/panda/jira/browse/EVA-1778 | ||
6. https://www.ebi.ac.uk/panda/jira/browse/EVA-1779 | ||
7. https://www.ebi.ac.uk/panda/jira/browse/EVA-1780 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judgement call: May be my attention span to scan documents isn't what it used to be but is this preferable to having the template links beside the individual sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, and originally the template links were scattered throughout the document, each near the header line of the corresponding section. However, this turned out to be not convenient in real life, because the only time you want to look at those templates is when creating the new tickets for the current batch. And this is done simultaneously for all issues, at the very start of processing the new batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for the most part. Left a few comments...
4a207f8
to
1a77831
Compare
5fd796f
to
47056aa
Compare
This includes changes to the code & documentation which I had to make for the new Open Targets batch submission: