[New Utility] Workflow to rename FASTQ files (non-destructive) #267

Merged
merged 7 commits into from
Dec 20, 2023


@cimendes cimendes commented Dec 13, 2023

Closes #266

🛠️ Changes Being Made

This PR implements a new utility workflow, as requested in #266. The workflow takes a single read file or a pair of read files (FASTQ), compressed or uncompressed, and returns new, renamed, and compressed FASTQ files for submission to GISAID.

Impacted Workflows/Tasks

None, this is a new implementation

🧠 Context and Rationale

Requested by SFPHL

📋 Workflow/Task Steps

This is a sample-level workflow. If a reverse read (read2) is provided, the files are renamed using the new_filename input as <new_filename>_R1.fastq.gz and <new_filename>_R2.fastq.gz. If only read1 is provided, the file is renamed to <new_filename>.fastq.gz. Uncompressed input files are compressed automatically by the workflow.
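The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the workflow's actual task code; the function name `renamed_fastq_names` and the `has_read2` flag are hypothetical.

```python
from typing import Optional, Tuple


def renamed_fastq_names(new_filename: str, has_read2: bool) -> Tuple[str, Optional[str]]:
    """Return (read1_name, read2_name) following the naming rule in this PR.

    Hypothetical helper: paired-end input gets the _R1/_R2 suffixes,
    single-end input gets no read suffix and no second name.
    """
    if has_read2:
        return f"{new_filename}_R1.fastq.gz", f"{new_filename}_R2.fastq.gz"
    return f"{new_filename}.fastq.gz", None


# Paired-end and single-end cases
print(renamed_fastq_names("sample01", has_read2=True))
print(renamed_fastq_names("sample01", has_read2=False))
```

Keeping the suffix decision in one place like this is one possible way to handle both read layouts with a single code path.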

Inputs

  input {
    File read1
    File? read2
    String new_filename
  }

read1: Mandatory input; Forward-facing or single-end reads, compressed or uncompressed
read2: Optional input; Reverse-facing reads, compressed or uncompressed
new_filename: Mandatory input; String with new name for read files
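Because read1 and read2 may arrive compressed or uncompressed, the rename has to handle both cases without touching the originals. A minimal sketch of that idea follows. It assumes compression is detected from the gzip magic bytes, which is an assumption and not necessarily how the workflow's task does it; `is_gzipped` and `copy_as_gzip` are hypothetical names.

```python
import gzip
import shutil
from pathlib import Path

# First two bytes of every gzip stream (assumed detection method)
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: Path) -> bool:
    """Check whether a file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def copy_as_gzip(src: Path, dest: Path) -> Path:
    """Write a gzip-compressed copy of src at dest, leaving src untouched."""
    if is_gzipped(src):
        # Already compressed: a plain copy under the new name is enough
        shutil.copyfile(src, dest)
    else:
        # Uncompressed: stream the contents through gzip into the new file
        with open(src, "rb") as raw, gzip.open(dest, "wb") as compressed:
            shutil.copyfileobj(raw, compressed)
    return dest
```

Copying instead of moving is what keeps the operation non-destructive: the input file still exists under its original name afterwards.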

Outputs

  output {
    String rename_fastq_files_version = version_capture.phb_version
    String rename_fastq_files_analysis_date = version_capture.date
    File read1_renamed = select_first([rename_PE_files.read1_renamed, rename_SE_files.read1_renamed])
    File? read2_renamed = rename_PE_files.read2_renamed
  }

rename_fastq_files_version: version of PHB used to run this workflow
rename_fastq_files_analysis_date: date of renaming
read1_renamed: Forward-facing or single-end reads. Always present
read2_renamed: Reverse-facing reads. Only present if read2 is provided
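The read1_renamed output relies on WDL's `select_first`, which returns the first defined value in an array. The Python sketch below mimics that behavior to show why read1_renamed is always present. It assumes that only one of the rename_PE_files and rename_SE_files tasks runs for a given sample, which is inferred from the output block and not shown in this PR description.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Python stand-in for WDL's select_first: return the first defined value."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value in the array")


# Single-end run (assumed): the PE task did not run, so its output is undefined
pe_read1 = None
se_read1 = "sample01.fastq.gz"
read1_renamed = select_first([pe_read1, se_read1])
print(read1_renamed)
```

In a paired-end run the first element would be defined instead, so the same expression resolves to the `_R1` file.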

Impacted Outputs

None, new workflow

🧪 Testing

Locally

Paired-end reads (compressed):
image

Single-end reads (compressed):
image

Single-end reads (uncompressed):
image

Terra

Underway

22 Samples PE, Compressed: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/ff1cda0d-1c92-4fc0-8b5e-7001d520cfe6

Scenarios for Reviewer to Test

  • Weird filenames
  • Should we keep the _R1.fastq.gz output name or change it to something else?

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@cimendes cimendes marked this pull request as ready for review December 14, 2023 09:58

@emily-smith1 emily-smith1 left a comment

Tested successfully on 10 single-end and 10 paired-end SARS-CoV-2 read files in the renaming_test table in this Terra workspace. Files were renamed according to their corresponding GISAID accessions, per the original request.

@cimendes cimendes requested a review from sage-wright December 15, 2023 12:48

@sage-wright sage-wright left a comment

Looks good, but would drop the single end version, since r2 is already optional and you're checking for its existence in the PE version of the task anyways.

@cimendes

> Looks good, but would drop the single end version, since r2 is already optional and you're checking for its existence in the PE version of the task anyways.

I introduced the SE task because I couldn't find a smart way to have the forward-read output be either *.fastq.gz or *_1.fastq.gz... Maybe we can think of something smart together? :)


@sage-wright sage-wright left a comment

Well done! ⭐

@sage-wright sage-wright merged commit b49fd16 into main Dec 20, 2023
5 checks passed
@cimendes cimendes deleted the im-utilities-rename-files branch January 4, 2024 15:23
Development

Successfully merging this pull request may close these issues.

[Utilities] Add workflow to rename FASTQ files in a non-destructive manner
3 participants