Updating import yaml files to fix fastq.gz.md5 import issues #426

Merged
74 changes: 37 additions & 37 deletions configs/import-mt.yaml
@@ -128,7 +128,7 @@ Data Objects:
- data_object_type: Metagenome Raw Reads
description: Metagenome Raw Reads for {id}
name: Raw sequencer read data
import_suffix: .[A-Z]+-[A-Z]+.fastq.gz
import_suffix: \.[ACGT]+-[ACGT]+\.fastq\.gz$
nmdc_suffix: .fastq.gz
input_to: [nmdc:ReadQcAnalysis]
output_of: nmdc:NucleotideSequencing
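The hunk above is the one the PR title refers to. A minimal sketch of why the anchored pattern helps, assuming the importer applies `import_suffix` with `re.search` (the diff does not show this) and using made-up file names:

```python
import re

# Old and new import_suffix values for Metagenome Raw Reads, copied from the hunk above.
OLD_PATTERN = r".[A-Z]+-[A-Z]+.fastq.gz"
NEW_PATTERN = r"\.[ACGT]+-[ACGT]+\.fastq\.gz$"

# Hypothetical file names, for illustration only.
FILES = [
    "52614.1.394702.GCACTAAC-GTTAGTGC.fastq.gz",
    "52614.1.394702.GCACTAAC-GTTAGTGC.fastq.gz.md5",
]


def matching_files(pattern, names):
    """Return the names that the pattern matches anywhere (re.search semantics)."""
    return [name for name in names if re.search(pattern, name)]


# The unanchored pattern also picks up the .md5 checksum file.
print(matching_files(OLD_PATTERN, FILES))
# The trailing "$" restricts the match to names that end in .fastq.gz.
print(matching_files(NEW_PATTERN, FILES))
```

If the importer uses `re.match` or `re.fullmatch` instead, the patterns would need a leading `.*`, so the matching function is worth confirming before relying on this sketch.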
@@ -137,7 +137,7 @@ Data Objects:
- data_object_type: Annotation Amino Acid FASTA
description: FASTA Amino Acid File for {id}
name: FASTA amino acid file for annotated proteins
import_suffix: _proteins.faa
import_suffix: "^(?!.*_(cds|genemark|prodigal)_proteins\\.faa$).*proteins\\.faa$"
nmdc_suffix: _proteins.faa
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
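The new `_proteins.faa` pattern uses a negative lookahead to exclude the `cds`, `genemark`, and `prodigal` protein files. In the YAML it is double-quoted, so `\\.` reaches the regex engine as `\.`. A small sketch of its behaviour, again assuming `re.search` semantics and hypothetical file names:

```python
import re

# Pattern from the hunk above, after YAML double-quote unescaping ("\\." becomes "\.").
PROTEINS_PATTERN = r"^(?!.*_(cds|genemark|prodigal)_proteins\.faa$).*proteins\.faa$"

# Hypothetical file names, for illustration only.
CANDIDATES = [
    "sample1_proteins.faa",
    "sample1_cds_proteins.faa",
    "sample1_genemark_proteins.faa",
    "sample1_prodigal_proteins.faa",
    "sample1_proteins.faa.md5",
]


def select_proteins(names):
    """Keep only names accepted by the lookahead-guarded pattern."""
    return [name for name in names if re.search(PROTEINS_PATTERN, name)]


print(select_proteins(CANDIDATES))
```

Only the plain `_proteins.faa` file should be selected; the tool-specific variants are rejected by the lookahead and the `.md5` file by the `$` anchor.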
@@ -146,7 +146,7 @@ Data Objects:
- data_object_type: Contig Mapping File
description: Contig mapping file for {id}
name: Contig mappings between old and new contig names
import_suffix: _contig_names_mapping.tsv
import_suffix: "_contig_names_mapping\\.tsv$"
nmdc_suffix: _contig_names_mapping.tsv
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -155,7 +155,7 @@ Data Objects:
- data_object_type: Structural Annotation GFF
description: Structural Annotation for {id}
name: GFF3 format file with structural annotations
import_suffix: _structural_annotation.gff
import_suffix: _structural_annotation\.gff$
nmdc_suffix: _structural_annotation.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -164,7 +164,7 @@ Data Objects:
- data_object_type: Functional Annotation GFF
description: Functional Annotation for {id}
name: GFF3 format file with functional annotations
import_suffix: _functional_annotation.gff
import_suffix: _functional_annotation\.gff$
nmdc_suffix: _functional_annotation.gff
input_to: [nmdc:MetatranscriptomeExpressionAnalysis]
output_of: nmdc:MetatranscriptomeAnnotation
@@ -173,7 +173,7 @@ Data Objects:
- data_object_type: Annotation KEGG Orthology
description: KEGG Orthology for {id}
name: Tab delimited file for KO annotation
import_suffix: _ko.tsv
import_suffix: _ko\.tsv$
nmdc_suffix: _ko.tsv
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -182,7 +182,7 @@ Data Objects:
- data_object_type: Annotation Enzyme Commission
description: EC Annotations for {id}
name: Tab delimited file for EC annotation
import_suffix: _ec.tsv
import_suffix: _ec\.tsv$
nmdc_suffix: _ec.tsv
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -191,15 +191,15 @@ Data Objects:
- data_object_type: Scaffold Lineage tsv
description: Scaffold Lineage tsv for {id}
name: Phylogeny at the scaffold level
import_suffix: _scaffold_lineage.tsv
import_suffix: _scaffold_lineage\.tsv$
nmdc_suffix: _scaffold_lineage.tsv
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
multiple: false
- data_object_type: Clusters of Orthologous Groups (COG) Annotation GFF
description: COGs for {id}
name: GFF3 format file with COGs
import_suffix: _cog.gff
import_suffix: _cog\.gff$
nmdc_suffix: _cog.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -208,7 +208,7 @@ Data Objects:
- data_object_type: Pfam Annotation GFF
description: Pfam Annotation for {id}
name: GFF3 format file with Pfam
import_suffix: _pfam.gff
import_suffix: _pfam\.gff$
nmdc_suffix: _pfam.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -217,7 +217,7 @@ Data Objects:
- data_object_type: TIGRFam Annotation GFF
description: TIGRFam for {id}
name: GFF3 format file with TIGRfam
import_suffix: _tigrfam.gff
import_suffix: _tigrfam\.gff$
nmdc_suffix: _tigrfam.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -226,7 +226,7 @@ Data Objects:
- data_object_type: SMART Annotation GFF
description: SMART Annotations for {id}
name: GFF3 format file with SMART
import_suffix: _smart.gff
import_suffix: _smart\.gff$
nmdc_suffix: _smart.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -235,7 +235,7 @@ Data Objects:
- data_object_type: SUPERFam Annotation GFF
description: SUPERFam Annotations for {id}
name: GFF3 format file with SUPERFam
import_suffix: _supfam.gff
import_suffix: _supfam\.gff$
nmdc_suffix: _supfam.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -244,7 +244,7 @@ Data Objects:
- data_object_type: CATH FunFams (Functional Families) Annotation GFF
description: CATH FunFams for {id}
name: GFF3 format file with CATH FunFams
import_suffix: _cath_funfam.gff
import_suffix: _cath_funfam\.gff$
nmdc_suffix: _cath_funfam.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -253,7 +253,7 @@ Data Objects:
- data_object_type: CRT Annotation GFF
description: CRT Annotations for {id}
name: GFF3 format file with CRT
import_suffix: _crt.gff
import_suffix: _crt\.gff$
nmdc_suffix: _crt.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -262,7 +262,7 @@ Data Objects:
- data_object_type: Genemark Annotation GFF
description: Genemark Annotations for {id}
name: GFF3 format file with Genemark
import_suffix: _genemark.gff
import_suffix: _genemark\.gff$
nmdc_suffix: _genemark.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -271,7 +271,7 @@ Data Objects:
- data_object_type: Prodigal Annotation GFF
description: Prodigal Annotations {id}
name: GFF3 format file with Prodigal
import_suffix: _prodigal.gff
import_suffix: _prodigal\.gff$
nmdc_suffix: _prodigal.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -280,7 +280,7 @@ Data Objects:
- data_object_type: TRNA Annotation GFF
description: TRNA Annotations {id}
name: GFF3 format file with TRNA
import_suffix: _trna.gff
import_suffix: _trna\.gff$
nmdc_suffix: _trna.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -289,7 +289,7 @@ Data Objects:
- data_object_type: RFAM Annotation GFF
description: RFAM Annotations for {id}
name: GFF3 format file with RFAM
import_suffix: _rfam.gff
import_suffix: _rfam\.gff$
nmdc_suffix: _rfam.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -298,7 +298,7 @@ Data Objects:
- data_object_type: KO_EC Annotation GFF
description: KO_EC Annotations for {id}
name: GFF3 format file with KO_EC
import_suffix: _ko_ec.gff
import_suffix: _ko_ec\.gff$
nmdc_suffix: _ko_ec.gff
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -307,7 +307,7 @@ Data Objects:
- data_object_type: Product Names
description: Product names for {id}
name: Product names file
import_suffix: _product_names.tsv
import_suffix: _product_names\.tsv$
nmdc_suffix: _product_names.tsv
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -316,7 +316,7 @@ Data Objects:
- data_object_type: Gene Phylogeny tsv
description: Gene Phylogeny for {id}
name: Gene Phylogeny file
import_suffix: _gene_phylogeny.tsv
import_suffix: _gene_phylogeny\.tsv$
nmdc_suffix: _gene_phylogeny.tsv
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -325,7 +325,7 @@ Data Objects:
- data_object_type: Crispr Terms
description: Crispr Terms for {id}
name: Crispr Terms
import_suffix: _crt.crisprs
import_suffix: _crt\.crisprs$
nmdc_suffix: _crt.crisprs
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -334,7 +334,7 @@ Data Objects:
- data_object_type: Annotation Statistics
description: Annotation Stats for {id}
name: Annotation statistics report
import_suffix: _stats.tsv
import_suffix: _stats\.tsv$
nmdc_suffix: _stats.tsv
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
@@ -343,23 +343,23 @@ Data Objects:
- data_object_type: Annotation Info File
description: Annotation Info File for {id}
name: File containing annotation info
import_suffix: _imgap.info
import_suffix: _imgap\.info$
nmdc_suffix: _imgap.info
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
multiple: false
action: rename
- data_object_type: Assembly Contigs
description: Assembly contigs (remapped) for {id}
import_suffix: _contigs.fna
import_suffix: _contigs\.fna$
nmdc_suffix: _renamed_contigs.fna
input_to: []
output_of: nmdc:MetatranscriptomeAnnotation
multiple: false
- data_object_type: Filtered Sequencing Reads
description: Reads QC for {id}
name: Reads QC result fastq (clean data)
import_suffix: filter-MTF.fastq.gz
import_suffix: filter-MTF\.fastq\.gz$
nmdc_suffix: _filtered.fastq.gz
input_to: [nmdc:MetatranscriptomeAssembly]
output_of: nmdc:ReadQcAnalysis
@@ -368,7 +368,7 @@ Data Objects:
- data_object_type: rRNA Filtered Sequencing Reads
description: Reads QC rRNA reads file for {id}
name: Reads QC rRNA reads result fastq (clean data)
import_suffix: .rRNA.fastq.gz
import_suffix: \.rRNA\.fastq\.gz$
nmdc_suffix: _rRNA.fastq.gz
input_to: []
output_of: nmdc:ReadQcAnalysis
@@ -377,7 +377,7 @@ Data Objects:
- data_object_type: QC Statistics
description: Reads QC summary for {id}
name: Reads QC summary statistics
import_suffix: .filtered-report.txt
import_suffix: \.filtered-report\.txt$
nmdc_suffix: _filterStats.txt
input_to: []
output_of: nmdc:ReadQcAnalysis
@@ -386,7 +386,7 @@ Data Objects:
- data_object_type: Read Filtering Info File
description: Read Filtering Info File for {id}
name: File containing read filtering information
import_suffix: .filter_cmd-MTF.sh
import_suffix: \.filter_cmd-MTF\.sh$
nmdc_suffix: _readsQC.info
input_to: []
output_of: nmdc:ReadQcAnalysis
@@ -395,7 +395,7 @@ Data Objects:
- data_object_type: Assembly Contigs
description: Assembly contigs for {id}
name: Final assembly contigs fasta
import_suffix: assembly.contigs.fasta
import_suffix: assembly\.contigs\.fasta$
nmdc_suffix: _contigs.fna
input_to: [nmdc:MetatranscriptomeAnnotation]
output_of: nmdc:MetatranscriptomeAssembly
@@ -404,7 +404,7 @@ Data Objects:
- data_object_type: Assembly Info File
description: Assembly info file for {id}
name: File containing assembly information
import_suffix: README.txt
import_suffix: README\.txt$
nmdc_suffix: _metaAsm.info
input_to: []
output_of: nmdc:MetatranscriptomeAssembly
@@ -414,15 +414,15 @@ Data Objects:
description: Coverage Stats for {id}
name: Assembled contigs coverage information
import_suffix: pairedMapped_sorted.bam.cov
nmdc_suffix: _covstats.txt
nmdc_suffix: _covstats\.txt$
input_to: []
output_of: nmdc:MetatranscriptomeAssembly
multiple: false
action: rename
- data_object_type: Assembly Coverage BAM
description: Sorted Bam for {id}
name: Sorted bam file of reads mapping back to the final assembly
import_suffix: pairedMapped.bam.gz
import_suffix: pairedMapped\.bam\.gz$
nmdc_suffix: _pairedMapped_sorted.bam.gz
input_to: [nmdc:MetatranscriptomeExpressionAnalysis]
output_of: nmdc:MetatranscriptomeAssembly
@@ -431,7 +431,7 @@ Data Objects:
- data_object_type: BAI File
description: Alignment index file for {id}
name: BAM index file
import_suffix: _pairedMapped_sorted.bam.bai
import_suffix: _pairedMapped_sorted\.bam\.bai$
nmdc_suffix: _pairedMapped_sorted.bam.bai
input_to: []
output_of: nmdc:MetatranscriptomeAssembly
@@ -440,7 +440,7 @@ Data Objects:
- data_object_type: Metatranscriptome Expression
description: Expression counts for {id}
name: Expression counts file
import_suffix: .rnaseq_gea.txt
import_suffix: \.rnaseq_gea\.txt$
nmdc_suffix: _rnaseq_gea.txt
input_to: []
output_of: nmdc:MetatranscriptomeExpressionAnalysis
@@ -449,7 +449,7 @@ Data Objects:
- data_object_type: Metatranscriptome Expression Intergenic
description: Expression intergenic counts for {id}
name: Expression intergenic counts file
import_suffix: .rnaseq_gea.intergenic.txt
import_suffix: \.rnaseq_gea\.intergenic\.txt$
nmdc_suffix: _rnaseq_gea.intergenic.txt
input_to: []
output_of: nmdc:MetatranscriptomeExpressionAnalysis
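Most of the changes in this diff follow one recipe: escape the literal dots and append `$`. As a rough illustration of that recipe (not code from this repository), the helper below derives an anchored pattern from a literal suffix with `re.escape`. The helper name and file names are hypothetical.

```python
import re


def anchored_suffix_pattern(literal_suffix):
    """Build a regex that matches names ending in exactly this literal suffix."""
    return re.escape(literal_suffix) + "$"


def has_suffix(name, literal_suffix):
    """True if the name ends with the literal suffix, using re.search semantics."""
    return re.search(anchored_suffix_pattern(literal_suffix), name) is not None


# An unescaped dot matches any character, so the old style pattern is too permissive.
print(bool(re.search(r"_ko.tsv", "sample1_koXtsv")))   # unescaped dot matches "X"
print(has_suffix("sample1_ko.tsv", "_ko.tsv"))          # literal suffix at end of name
print(has_suffix("sample1_ko.tsv.md5", "_ko.tsv"))      # checksum file is rejected
print(has_suffix("sample1_koXtsv", "_ko.tsv"))          # escaped dot no longer matches "X"
```

Hand-written patterns, as in the diff, remain necessary wherever the suffix is not a fixed literal, such as the barcode pair in the raw reads entry.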