[New Utility] Workflow to rename FASTQ files (non-destructive) #267
Tested successfully on 10 single-end and 10 paired-end SARS-CoV-2 read files in the renaming_test table in this Terra workspace. Files were renamed according to the corresponding GISAID accession, per the original request.
Looks good, but I would drop the single-end version, since r2 is already optional and you're checking for its existence in the PE version of the task anyway.
The reason I introduced the SE task was that I couldn't find a smart way to have the output be *.fastq.gz or *_1.fastq.gz for the forward read... Maybe we can think of something smart together? :)
Well done! ⭐
Closes #266
🛠️ Changes Being Made
This PR implements a new utility workflow as requested in #266. The workflow receives a read file or a pair of read files (FASTQ), compressed or uncompressed, and returns new, renamed, and compressed FASTQ file(s) for submission to GISAID.
Impacted Workflows/Tasks
None, this is a new implementation
🧠 Context and Rationale
Requested by SFPHL
📋 Workflow/Task Steps
This is a sample-level workflow. If a reverse read (read2) is provided, the files are renamed using the provided new_filename input, with the notation <new_filename>_R1.fastq.gz and <new_filename>_R2.fastq.gz. If only read1 is provided, the file is renamed to <new_filename>.fastq.gz. If an uncompressed file is provided, the workflow compresses it automatically.
Inputs
read1: Mandatory input; forward-facing or single-end reads, compressed or uncompressed
read2: Optional input; reverse-facing reads, compressed or uncompressed
new_filename: Mandatory input; string with the new name for the read files
Outputs
rename_fastq_files_version: version of PHB used to run this workflow
rename_fastq_files_analysis_date: date of renaming
read1_renamed: Forward-facing or single-end reads. Always present
read2_renamed: Reverse-facing reads. Only present if read2 is provided
Impacted Outputs
None, new workflow
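For reviewers who want to reason about the naming rules without reading the task itself, here is a minimal Python sketch of the behaviour described in the workflow steps. It is not the code in this PR: the function names (`rename_fastq_files`, `write_gzipped_copy`, `is_gzipped`) and the `out_dir` parameter are made up for illustration, and gzip detection via the two magic bytes is an assumption about how compressed input could be recognised. It also shows one way a single code path with an optional `read2` could produce either naming scheme.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path: Path) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def write_gzipped_copy(src: Path, dest: Path) -> None:
    """Copy src to dest, compressing on the way if src is not gzipped.

    The source file is never modified or removed (non-destructive).
    """
    if is_gzipped(src):
        shutil.copyfile(src, dest)
    else:
        with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)


def rename_fastq_files(
    read1: Path,
    new_filename: str,
    read2: Optional[Path] = None,
    out_dir: Path = Path("."),
) -> Tuple[Path, Optional[Path]]:
    """Hypothetical stand-in for the workflow's renaming rules."""
    out_dir.mkdir(parents=True, exist_ok=True)

    if read2 is None:
        # Single-end: <new_filename>.fastq.gz
        read1_renamed = out_dir / f"{new_filename}.fastq.gz"
        write_gzipped_copy(read1, read1_renamed)
        return read1_renamed, None

    # Paired-end: <new_filename>_R1.fastq.gz and <new_filename>_R2.fastq.gz
    read1_renamed = out_dir / f"{new_filename}_R1.fastq.gz"
    read2_renamed = out_dir / f"{new_filename}_R2.fastq.gz"
    write_gzipped_copy(read1, read1_renamed)
    write_gzipped_copy(read2, read2_renamed)
    return read1_renamed, read2_renamed
```

Calling `rename_fastq_files(Path("reads.fastq"), "sample_a")` (both names are placeholders) would write `sample_a.fastq.gz` to the current directory and return `None` for the second output, mirroring `read2_renamed` being present only when `read2` is provided.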
🧪 Testing
Locally
Paired-end reads (compressed):
Single-end reads (compressed):
Single-end reads (uncompressed):
Terra
Underway
22 samples, PE, compressed: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/ff1cda0d-1c92-4fc0-8b5e-7001d520cfe6
Scenarios for Reviewer to Test
_R1.fastq.gz output name or change it to something else?
🔬 Quality checks
Pull Request (PR) checklist: