Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update ncbi-scrub task (version 2.2.1) #202

Closed
wants to merge 8 commits into from
Closed

Conversation

cimendes
Copy link
Member

@cimendes cimendes commented Oct 2, 2023

Closes

🛠️ Changes Being Made

  • ncbi_scrub
    • Update docker container to us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1
    • Update task so paired-end reads are processed as interleaved files

🧠 Context and Rationale

For ncbi_scrub, the latest version solves the issue of paired reads not being correctly masked, according to their documentation . The reads are still processed individually in the task, so the phenomenon might still persist if we do not change the tasks to first interleave the reads, then process them using ncbi-scrub, and then split them again.

I've also altered the ncbi_scrub_pe task to first interleave the reads for processing with HRRT, and then split the resulting file back to forward and reverse read files.

📋 Workflow/Task Steps

N/A

Inputs

N/A

Outputs

  • read1_human_spots_removed and read2_human_spots_removed were removed (this isn't actually a final output of any of the workflows)
  • human_spots_removed was added with the stats containing the number of spots removed for the interleaved files

🧪 Testing

Locally

Tests passed locally with miniwdl for the individual task.

Terra

Tests under way...

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@cimendes cimendes marked this pull request as draft October 11, 2023 09:32
Copy link
Contributor

@frankambrosio3 frankambrosio3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrotieno jrotieno marked this pull request as ready for review November 28, 2023 08:11
@jrotieno jrotieno requested a review from andrewjpage November 30, 2023 15:58
read2_unzip=~{read2}
fi
# unzip read files as scrub tool does not take in .gz fastq files, and interleave them
paste <(zcat ~{read1} | paste - - - -) <(zcat ~{read2} | paste - - - -) | tr '\t' '\n' > interleaved.fastq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work if there are uneven reads in ~{read1} and ~{read2}?

@kevinlibuit
Copy link
Contributor

Seeing failures in workflows utilizing the task ncbi_scrub_se. Unclear from GHA log files as to why this is occurring. Also seeing many failures in the Terra runs posted by @frankambrosio3 above

* A new hostile task, version 0.3.0

* making hostile an optional dehosting tool to ncbi-scrubber

* replacing ncbi-scrub with hostile for human reads removal (dehosting)

* reverting to v0.2.0 to test terra errors

* updated hostile output

* making hostie ouput optional

* Increased RAM for hostile task
@cimendes cimendes marked this pull request as draft December 11, 2023 13:28
@sage-wright sage-wright linked an issue Dec 20, 2023 that may be closed by this pull request
@sam-baird
Copy link
Contributor

Seeing failures in workflows utilizing the task ncbi_scrub_se. Unclear from GHA log files as to why this is occurring. Also seeing many failures in the Terra runs posted by @frankambrosio3 above

This error may be occurring in ncbi_scrub_se because the -n option for scrub.sh was removed as of 2.0.0 and its functionality made the default. I'm using code from this pull request to add read scrubbing to a workflow, and removing -n (and adding -i) fixed it for me.

@cimendes
Copy link
Member Author

Closing PR as stale

@cimendes cimendes closed this Feb 14, 2024
@cimendes
Copy link
Member Author

⚠️ Please do not delete the branch! Thank you! ⚠️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants