[TheiaCoV_Fasta_Batch] Substitute FASTA concatenating task to ensure proper sample_id propagation #274

Merged 7 commits on Dec 21, 2023

@cimendes cimendes commented Dec 19, 2023

Closes #261

🛠️ Changes Being Made

This PR introduces a new task that concatenates FASTA files and takes the array of samplenames alongside the FASTA files, so that each sample_id is propagated correctly. This matters when the FASTA header doesn't match the assigned sample_id, as with GISAID FASTA files.
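
For illustration only, here is a minimal Python sketch of the idea, not the WDL task added in this PR. It assumes one record per FASTA file and that propagating the sample_id means replacing each header with the matching samplename; the function name `cat_fastas_with_sample_ids` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Sequence


def cat_fastas_with_sample_ids(
    fasta_paths: Sequence[Path],
    sample_names: Sequence[str],
    out_path: Path,
) -> None:
    """Concatenate FASTA files, replacing each header with its sample name.

    Assumption: every input file holds a single record, so every header
    line in a file maps to the sample name at the same index.
    """
    if len(fasta_paths) != len(sample_names):
        raise ValueError("fasta_paths and sample_names must have the same length")

    with open(out_path, "w") as out:
        for fasta, name in zip(fasta_paths, sample_names):
            for line in Path(fasta).read_text().splitlines():
                if line.startswith(">"):
                    # Swap the original header for the assigned sample name.
                    out.write(f">{name}\n")
                elif line.strip():
                    # Re-emit sequence lines with an explicit newline so a file
                    # without a trailing newline cannot run into the next header.
                    out.write(line + "\n")
```

Pairing the two arrays by index keeps each samplename tied to its own file, which is what makes the output independent of whatever the original header contained.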

Impacted Workflows/Tasks

TheiaCoV_FASTA_Batch

🧠 Context and Rationale

None to be considered.

📋 Workflow/Task Steps

The new cat_files_fasta task has been integrated into the TheiaCoV_Fasta_Batch workflow. It is not used by any other workflows.

Inputs

None added.

Outputs

No outputs were altered.

Impacted Outputs

No outputs were altered.

🧪 Testing

Underway!

Locally

image

Terra

https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/ad02cccf-63e7-4bc6-9eb6-c4b430113260

Scenarios for Reviewer to Test

  • GISAID FASTA files.
  • FASTA files where headers match the sample_id.
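
To sort test inputs into these two scenarios, a reviewer could use something like the Python sketch below. It is a hypothetical helper, not part of the workflow, and it assumes the identifier is the first whitespace-delimited token of the first header line.

```python
def header_matches_sample_id(fasta_text: str, sample_id: str) -> bool:
    """Return True if the first FASTA header's identifier equals sample_id."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            tokens = line[1:].split()
            return bool(tokens) and tokens[0] == sample_id
    raise ValueError("no FASTA header found")
```

A GISAID-style header (strain name, accession, and date joined by `|`) would return False for a typical sample_id, which puts the file in the first scenario.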

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@cimendes cimendes requested a review from sage-wright December 19, 2023 10:38
@cimendes cimendes marked this pull request as ready for review December 20, 2023 11:48

@sage-wright sage-wright left a comment

This works great! One thing we might want to consider is removing the long Array[Pair[String, File]] object, since it's no longer necessary, and just passing in the array of samplenames instead. That would simplify the code a bit, and I think it would be worth it.

Once that's done, I'll approve & merge. Well done! ⭐

@cimendes

All done! :D And thank you for finding and squishing the newline bug! 😍

@sage-wright sage-wright left a comment

@sage-wright sage-wright merged commit 53ffec9 into main Dec 21, 2023
6 checks passed
@cimendes cimendes deleted the im-theiacov-fasta-batch-fix branch January 9, 2024 15:39

Successfully merging this pull request may close these issues.

[TheiaCoV_FASTA_Batch] FASTA header not matching sample name