-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
host_filter.wdl modernization #70
Conversation
short-read-mngs/host_filter.wdl
Outdated
File human_bowtie2_index_tar | ||
File human_hisat2_index_tar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the host genome is not a human? would I still provide the custom host genome file path as a value to human_bowtie2_index_tar
?
I'm trying to figure out how the inputs here change depending on human vs non-human hosts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ovalenzuela19 Good question. If the host genome is non-human then we filter against both the host and human genomes. This protects privacy in case for example, we're sequencing a mosquito carrying some blood from a human it recently bit. So the pipeline always takes in the human index files in addition to the host index files (human or not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh this makes sense! Thanks so much!
short-read-mngs/host_filter.wdl
Outdated
String host_genome | ||
String genome_dir = "STAR_genome/part-0/" | ||
# Adapter trimming and QC filtering | ||
call fastp_qc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we rename the steps slightly? I believe our step names are in camelCase and start with a verb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. fastp_qc
=> RunFastpQc
489c116
to
acdff09
Compare
@ovalenzuela19 @morsecodist @rzlim08 @katrinakalantar Please have a look over this PR before merging at last. I took out the indexing from this since that's subsumed in #182. Note that this replaces the existing host_filter.wdl, not sure if there might be value in keeping both versions around (on HEAD)? |
@ovalenzuela19 @morsecodist @rzlim08 @katrinakalantar In today's pipeline meeting we discussed that indeed, it seems like a good idea to keep both versions of the WDL alive on the main branch, since we expect to continue supporting the original version for some time. One way to do this would be to keep the current diff that replaces host_filter.wdl and rewires other parts that refers to it; and simply add Thoughts? would this need some change to whatever part of the system actually launches the original version if so requested? |
@@ -31,7 +31,7 @@ def test_bench3_viral(short_read_mngs_bench3_viral_outputs): | |||
taxon_counts = json.load(infile)["pipeline_output"]["taxon_counts_attributes"] | |||
|
|||
taxa = set(entry["tax_id"] for entry in taxon_counts) | |||
assert len(taxa) == 177 | |||
assert abs(len(taxa) - 184) < 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we expecting to introduce some variability into the host filtering? Or was this just a holdover from the old assert?
Thanks Mike and sorry for the late review. One option is to just use I'm not sure if it's possible to have wdl expect different inputs given a flag, but that would be nice. Otherwise I guess we'd have to made almost all of the differing inputs optional. |
@rzlim08 I think it would be closer to the second case unfortunately. Would you still want to see that approach in view of that potential awkwardness? (Thx- I'm just not familiar enough with the pieces of code actually invoking these workflows to understand the constraints here) |
I think the other option would be to run with 2 WDL files and the webapp could choose between them. This would likely mean one of the wdl's would be set as a "default" for e.g. local testing/ benchmarking, but the webapp could have the logic to run one vs the other. This might be the easiest solution for now. |
@ovalenzuela19 The tentative solution here is to merge with the modern host filtering WDL replacing the original one as Question for you- is that going to work for the webapp as far as how it invokes the original pipeline when asked to do so? Or does it need the zip to have Thanks, we're obviously trying to avoid breaking anything that depends on the structure of this repo that aren't necessarily obvious when making these kinds of changes. |
@mlin so would I think from a webapp perspective that solution should be fine |
@ovalenzuela19
Mostly yes, for reference. The unanswered bit is what would we do if a user absolutely needs us to make some further change/bugfix to the original pipeline they're still depending on. How would we roll that out as a tagged release & zip file the webapp can use, once it's been renamed to If we're optimists we would say that's fairly unlikely to happen, to the extent we can punt now and figure it out later if it does, but, thought we should at least air it out here. cc @rzlim08 @morsecodist @katrinakalantar for visibility |
Yeah I think the main question is how to support both pipelines at the same time indefinitely. If we have both WDLs we'd at least be able to patch either. I'll see if I can make a small customization to pin a release to a major/minor version. That being said I don't think we should be running with this forever and would like to drop support for the old version eventually if we're comfortable enough with the new one. |
What if we just create a new workflow that is just the old short-read-mngs pipeline with the old host filtering wdl? That way we can still maintain it & package it like normal and it's up to the web app which wdl will be used? By default it'll use the existing short-read-mngs workflow which uses the modern host filtering stage. That way if we need to run the old v7 short-read-mngs pipeline, the web app can just have some conditional in place to fetch the docker image for the short-read-mngs old version. LMK if this approach sounds absurd |
This reverts commit aeb234f.
* fix bowtie2 counts for single file * fix extra expansions * relieve hisat2 dependency * single sample hisat2 * fix hisat2 * fix dockerfile for hisat2 --------- Co-authored-by: Omar Valenzuela <[email protected]>
* legacy-host-filter-inital-commit * linting * add stage io map * remove stage io map swp file
Whoa. Just noticed this! Huge step forward for czid. Congrats on landing this! |
copilot:summary