Updating import yaml files to fix fastq.gz.md5 import issues #426


AmitBinf
Contributor

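Changed import_suffix for data_object_type: Metagenome Raw Reads in import.yaml and import-mt.yaml:
from --> import_suffix: .[A,C,G,T]+-[A,C,G,T]+.fastq.gz
to --> import_suffix: \.[ACGT]+-[ACGT]+\.fastq\.gz$
This should resolve the issue. The old pattern had no end anchor, so it could also match file names ending in .fastq.gz.md5. Its unescaped dots matched any character, and the commas inside the brackets were treated as literal characters to match.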

\. matches a literal dot (.).

[ACGT]+ matches one or more occurrences of the letters A, C, G, or T.

- matches a literal hyphen (-).

[ACGT]+ matches one or more occurrences of the letters A, C, G, or T again.

\.fastq\.gz matches the literal string .fastq.gz.

$ anchors the match to the end of the string, so a file name ending in .fastq.gz.md5 no longer matches.
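
To make the difference between the two suffixes concrete, here is a minimal Python sketch. The file names are made up for illustration, and applying the pattern with re.search is an assumption about how the import code uses import_suffix, not something confirmed in this thread.

```python
import re

# Patterns copied from the PR description.
OLD_SUFFIX = r".[A,C,G,T]+-[A,C,G,T]+.fastq.gz"
NEW_SUFFIX = r"\.[ACGT]+-[ACGT]+\.fastq\.gz$"

# Hypothetical file names, for illustration only.
names = [
    "52614.1.394702.GCACTAAC-GTTAGTGC.fastq.gz",
    "52614.1.394702.GCACTAAC-GTTAGTGC.fastq.gz.md5",
]


def matches(pattern, name):
    """Return True if the pattern is found anywhere in the name (assumed usage)."""
    return re.search(pattern, name) is not None


# The old pattern has no end anchor, so the .md5 file is picked up too.
old_hits = [n for n in names if matches(OLD_SUFFIX, n)]

# The new pattern must end exactly at .fastq.gz, so the .md5 file is skipped.
new_hits = [n for n in names if matches(NEW_SUFFIX, n)]

print("old:", old_hits)
print("new:", new_hits)
```

If the real import code uses re.match or re.fullmatch instead, the results for the old pattern could differ, so it is worth checking against the actual matching call.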

aclum self-requested a review on March 11, 2025 at 16:53
Contributor

mbthornton-lbl left a comment

Substitute print statement with assertion(s)

@@ -31,6 +31,8 @@ def mock_minted_ids():

def test_update_do_mappings_from_import_files(import_mapper_instance):
    import_mapper_instance.update_do_mappings_from_import_files()
    for fm_all in import_mapper_instance.mappings:
        print(fm_all, "\n\n")
Contributor

What is the purpose of this print statement? Can we have an assert statement that confirms that the correct files are being imported?

Contributor Author

I have updated the PR to include an assert statement that checks that no .md5 files were picked up. The print statement is useful during testing for quickly skimming the files that are being imported.
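
A minimal sketch of the kind of assertion discussed here, under stated assumptions: select_raw_reads and the sample file names are hypothetical stand-ins. The real test would inspect import_mapper_instance.mappings, whose structure is not shown in this conversation, so this is not the assertion that was actually merged.

```python
import re

# Suffix pattern from the PR description.
RAW_READS_SUFFIX = r"\.[ACGT]+-[ACGT]+\.fastq\.gz$"


def select_raw_reads(file_names, import_suffix=RAW_READS_SUFFIX):
    """Hypothetical helper: keep only the names that match the import suffix."""
    pattern = re.compile(import_suffix)
    return [name for name in file_names if pattern.search(name)]


def test_no_md5_files_selected():
    # Hypothetical directory listing containing a reads file and its checksum.
    candidates = [
        "11809.7.220021.TCCTGAGC-ACTGCATA.fastq.gz",
        "11809.7.220021.TCCTGAGC-ACTGCATA.fastq.gz.md5",
        "README.txt",
    ]
    selected = select_raw_reads(candidates)

    # Something must be selected, otherwise the next check passes trivially.
    assert selected, "expected at least one raw reads file"
    # No checksum file may slip through.
    assert not any(name.endswith(".md5") for name in selected)


test_no_md5_files_selected()
```

Checking that the selection is non-empty first keeps the .md5 assertion from passing vacuously when nothing matches at all.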

AmitBinf merged commit 5f5229e into main on Mar 13, 2025
1 check passed
AmitBinf deleted the 421-import-automation-in-some-cases-picks-up-the-md5-sum-file-instead-of-the-fastq-file branch on March 14, 2025 at 16:33
Development

Successfully merging this pull request may close these issues.

import automation in some cases picks up the md5 sum file instead of the fastq file
3 participants