dragen-transcriptome-pipeline/4.2.4__20240803074622
Overview
MD5Sum: c142f00004a02ac7d7247c0767ee9ff2
Documentation
Documentation for dragen-transcriptome-pipeline v4.2.4
Dockstore
ICAv2
Tenant: umccr-prod
Bundles Generated
Bundle Name: dragen_transcriptome_pipeline_with_validation_data__4_2_4__20240803074622 / Bundle Version v9_r3__20240803074622
Description
This bundle has been generated by the release of workflows/dragen-transcriptome-pipeline/4.2.4/dragen-transcriptome-pipeline__4.2.4.cwl. The pipeline can be found at https://github.com/umccr/cwl-ica/releases/tag/dragen-transcriptome-pipeline/4.2.4__20240803074622.
Version Description
The bundle version description is currently redundant because versions cannot be appended to bundles. Regardless, the bundle version is v9_r3.
Bundle ID: bed469f2-06f0-4f20-a03f-d6ed18dd4ab7
- Bundle Link
Pipeline Project ID: 5844391a-69db-4b52-86b5-6a0d55c2386f
Pipeline Project Name: pipelines
Pipeline ID: 66c89437-ec33-4138-8a92-9c018ee533af
Pipeline Code: dragen-transcriptome-pipeline__4_2_4__20240803074622
Projects
- development
- staging
Datasets
- dragen_hash_table_v9_r3_alt_masked_cnv_hla_rna
- hg38_fasta
- arriba_2_4_0
- hg38_v39_gencode_annotation
- wts_validation_fastq__SBJ00480
- wts_validation_fastq__SBJ00028
- wts_validation_fastq__SBJ00061
- wts_validation_fastq__SBJ00188
- wts_validation_fastq__SBJ00199
- wts_validation_fastq__SBJ00236
- wts_validation_fastq__SBJ00238
- wts_multiqc__2023_07_21__4_2_4__Ref_1_Good__SBJ01563
- wts_multiqc__2023_07_21__4_2_4__Ref_2_Good__SBJ01147
- wts_multiqc__2023_07_21__4_2_4__Ref_3_Good__SBJ01620
- wts_multiqc__2023_07_21__4_2_4__Ref_4_Bad__SBJ01286
- wts_multiqc__2023_07_21__4_2_4__Ref_5_Bad__SBJ01673
Bundle Name: dragen_transcriptome_pipeline_prod__4_2_4__20240803074622 / Bundle Version v9_r3__20240803074622
Description
This bundle has been generated by the release of workflows/dragen-transcriptome-pipeline/4.2.4/dragen-transcriptome-pipeline__4.2.4.cwl. The pipeline can be found at https://github.com/umccr/cwl-ica/releases/tag/dragen-transcriptome-pipeline/4.2.4__20240803074622.
Version Description
The bundle version description is currently redundant because versions cannot be appended to bundles. Regardless, the bundle version is v9_r3.
Bundle ID: b7fa9d82-907d-43f0-9062-ba684d951a5f
- Bundle Link
Pipeline Project ID: 5844391a-69db-4b52-86b5-6a0d55c2386f
Pipeline Project Name: pipelines
Pipeline ID: 66c89437-ec33-4138-8a92-9c018ee533af
Pipeline Code: dragen-transcriptome-pipeline__4_2_4__20240803074622
Projects
- production
Datasets
- dragen_hash_table_v9_r3_alt_masked_cnv_hla_rna
- hg38_fasta
- arriba_2_4_0
- hg38_v39_gencode_annotation
- wts_multiqc__2023_07_21__4_2_4__Ref_1_Good__SBJ01563
- wts_multiqc__2023_07_21__4_2_4__Ref_2_Good__SBJ01147
- wts_multiqc__2023_07_21__4_2_4__Ref_3_Good__SBJ01620
- wts_multiqc__2023_07_21__4_2_4__Ref_4_Bad__SBJ01286
- wts_multiqc__2023_07_21__4_2_4__Ref_5_Bad__SBJ01673
Visual Overview
Inputs Template
Yaml
# yaml-language-server: $schema=https://github.com/umccr/cwl-ica/releases/download/dragen-transcriptome-pipeline%2F4.2.4__20240803074622/dragen-transcriptome-pipeline__4.2.4__20240803074622.schema.json
# algorithm (Optional)
# Default value: proportional
# Docs: Counting algorithm:
# uniquely-mapped-reads(default) or proportional.
algorithm: "proportional"
# annotation file (Required)
# Docs: Path to annotation transcript file.
annotation_file:
  class: File
  location: icav2://project_id/path/to/file
# bam input (Optional)
# Docs: Input a BAM file for WTS analysis
bam_input:
  class: File
  location: icav2://project_id/path/to/file
# blacklist (Required)
# Docs: File with blacklist range
blacklist:
  class: File
  location: icav2://project_id/path/to/file
# cl config (Optional)
# Docs: command line config to supply additional config values on the command line.
cl_config: string
# contigs (Optional)
# Docs: Optional - List of interesting contigs
# If not specified, defaults to 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
contigs: string
# cytobands (Required)
# Docs: Coordinates of the Giemsa staining bands.
cytobands:
  class: File
  location: icav2://project_id/path/to/file
# enable duplicate marking (Required)
# Docs: Mark identical alignments as duplicates
enable_duplicate_marking: false
# enable map align (Optional)
# Docs: Enabled by default.
# Set this value to false if using bam_input AND tumor_bam_input
enable_map_align: false
# enable map align output (Required)
# Docs: Do you wish to have the output bam files present
enable_map_align_output: false
# enable rna gene fusion (Optional)
# Docs: Optional - Enable the DRAGEN Gene Fusion module - defaults to true
enable_rna_gene_fusion: false
# enable rna quantification (Optional)
# Docs: Optional - Enable the quantification module - defaults to true
enable_rna_quantification: false
# enable sort (Optional)
# Docs: True by default, only set this to false if using --bam-input as input parameter
enable_sort: false
# fastq list (Optional)
# Docs: CSV file that contains a list of FASTQ files
# to process. read_1 and read_2 components in the CSV file must be presigned urls.
fastq_list:
  class: File
  location: icav2://project_id/path/to/file
# Row of fastq lists (Optional)
# Docs: The row of fastq lists.
# Each row has the following attributes:
# * RGID
# * RGLB
# * RGSM
# * Lane
# * Read1File
# * Read2File (optional)
fastq_list_rows:
  - rgid: string
    rglb: string
    rgsm: string
    lane: string
    read_1:
      class: File
      location: icav2://project_id/path/to/file
    read_2:
      class: File
      location: icav2://project_id/path/to/file
# java mem (Optional)
# Default value: 20G
# Docs: Set desired Java heap memory size
java_mem: "20G"
# license instance id location (Optional)
# Docs: You may wish to place your own in.
# Optional value, default set to /opt/instance-identity
# which is a path inside the dragen container
lic_instance_id_location:
  class: File
  location: icav2://project_id/path/to/file
# output directory (Required)
# Docs: The directory where all output files are placed
output_directory: string
# output directory name arriba (Optional)
# Default value: arriba
# Docs: Name of the directory to collect arriba outputs in.
output_directory_name_arriba: "arriba"
# output file prefix (Required)
# Docs: The prefix given to all output files
output_file_prefix: string
# protein domains (Required)
# Docs: GFF3 file containing the genomic coordinates of protein domains.
protein_domains:
  class: File
  location: icav2://project_id/path/to/file
# qc reference samples (Required)
# Docs: Reference samples for multiQC report
qc_reference_samples:
  - class: Directory
    location: icav2://project_id/path/to/dir/
# read trimming (Optional)
# Docs: To enable trimming filters in hard-trimming mode, set to a comma-separated list of the trimmer tools
# you would like to use. To disable trimming, set to none. During mapping, artifacts are removed from all reads.
# Read trimming is disabled by default.
read_trimmers: string
# reference Fasta (Required)
# Docs: FastA file with genome sequence
reference_fasta:
  class: File
  location: icav2://project_id/path/to/file
# reference tar (Required)
# Docs: Path to ref data tarball
reference_tar:
  class: File
  location: icav2://project_id/path/to/file
# soft read trimming (Optional)
# Docs: To enable trimming filters in soft-trimming mode, set to a comma-separated list of the trimmer tools
# you would like to use. To disable soft trimming, set to none. During mapping, reads are aligned as if trimmed,
# and bases are not removed from the reads. Soft-trimming is enabled for the polyg filter by default.
soft_read_trimmers: string
# trim adapter r1 5prime (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 1.
# NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.
trim_adapter_r1_5prime:
  class: File
  location: icav2://project_id/path/to/file
# trim adapter read1 (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 1.
trim_adapter_read1:
  class: File
  location: icav2://project_id/path/to/file
# trim adapter read2 (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 2.
trim_adapter_read2:
  class: File
  location: icav2://project_id/path/to/file
# trim adapter stringency (Optional)
# Docs: Specify the minimum number of adapter bases required for trimming
trim_adapter_stringency: string
# trim adapter r2 5prime (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 2.
# NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.
trim_dapter_r2_5prime:
  class: File
  location: icav2://project_id/path/to/file
# trim r1 3prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 3' end of Read 1 (default: 0).
trim_r1_3prime: string
# trim r1 5prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 5' end of Read 1 (default: 0).
trim_r1_5prime: string
# trim r2 3prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 3' end of Read 2 (default: 0).
trim_r2_3prime: string
# trim r2 5prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 5' end of Read 2 (default: 0).
trim_r2_5prime: string
Json
{
  "algorithm": "proportional",
  "annotation_file": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "bam_input": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "blacklist": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "cl_config": "string",
  "contigs": "string",
  "cytobands": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "enable_duplicate_marking": false,
  "enable_map_align": false,
  "enable_map_align_output": false,
  "enable_rna_gene_fusion": false,
  "enable_rna_quantification": false,
  "enable_sort": false,
  "fastq_list": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "fastq_list_rows": [
    {
      "rgid": "string",
      "rglb": "string",
      "rgsm": "string",
      "lane": "string",
      "read_1": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
      },
      "read_2": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
      }
    }
  ],
  "java_mem": "20G",
  "lic_instance_id_location": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "output_directory": "string",
  "output_directory_name_arriba": "arriba",
  "output_file_prefix": "string",
  "protein_domains": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "qc_reference_samples": [
    {
      "class": "Directory",
      "location": "icav2://project_id/path/to/dir/"
    }
  ],
  "read_trimmers": "string",
  "reference_fasta": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "reference_tar": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "soft_read_trimmers": "string",
  "trim_adapter_r1_5prime": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "trim_adapter_read1": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "trim_adapter_read2": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "trim_adapter_stringency": "string",
  "trim_dapter_r2_5prime": {
    "class": "File",
    "location": "icav2://project_id/path/to/file"
  },
  "trim_r1_3prime": "string",
  "trim_r1_5prime": "string",
  "trim_r2_3prime": "string",
  "trim_r2_5prime": "string"
}
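As a rough illustration of how an inputs document in the shape above might be assembled programmatically, the following Python sketch builds one containing only the inputs marked Required in the template. The helper names (`icav2_file`, `icav2_dir`, `build_minimal_inputs`) are hypothetical, the URIs are the template's own placeholders, and the naming of `output_directory` is an assumption. A real run would presumably also need one of the read inputs (`fastq_list`, `fastq_list_rows` or `bam_input`).

```python
import json


def icav2_file(location: str) -> dict:
    """Return a File object in the shape used by the inputs template."""
    return {"class": "File", "location": location}


def icav2_dir(location: str) -> dict:
    """Return a Directory object in the shape used by the inputs template."""
    return {"class": "Directory", "location": location}


def build_minimal_inputs(output_file_prefix: str) -> dict:
    """Build an inputs document holding only the inputs marked Required."""
    # Placeholder URIs copied from the template; replace with real locations.
    file_uri = "icav2://project_id/path/to/file"
    dir_uri = "icav2://project_id/path/to/dir/"
    return {
        "annotation_file": icav2_file(file_uri),
        "blacklist": icav2_file(file_uri),
        "cytobands": icav2_file(file_uri),
        "enable_duplicate_marking": False,
        "enable_map_align_output": False,
        # Assumption: the output directory is named after the prefix.
        "output_directory": f"{output_file_prefix}_dragen_transcriptome",
        "output_file_prefix": output_file_prefix,
        "protein_domains": icav2_file(file_uri),
        "qc_reference_samples": [icav2_dir(dir_uri)],
        "reference_fasta": icav2_file(file_uri),
        "reference_tar": icav2_file(file_uri),
    }


if __name__ == "__main__":
    print(json.dumps(build_minimal_inputs("sample"), indent=2))
```

Optional inputs can then be added to the returned dictionary before it is serialised.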
Outputs Template
{
  "arriba_output_directory": {
    "class": "Directory",
    "location": "icav2://project_id/path/to/dir/"
  },
  "dragen_transcriptome_output_directory": {
    "class": "Directory",
    "location": "icav2://project_id/path/to/dir/"
  },
  "multiqc_output_directory": {
    "class": "Directory",
    "location": "icav2://project_id/path/to/dir/"
  },
  "qualimap_output_directory": {
    "class": "Directory",
    "location": "icav2://project_id/path/to/dir/"
  }
}
Overrides Template
Zipped workflow
[
  "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/arriba_drawing_step",
  "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/arriba_fusion_step",
  "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/create_arriba_output_directory",
  "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/create_dummy_file_step",
  "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/dragen_qc_step",
  "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/run_dragen_transcriptome_step",
  "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/run_qualimap_step"
]
Packed workflow
[
  "#main/arriba_drawing_step",
  "#main/arriba_fusion_step",
  "#main/create_arriba_output_directory",
  "#main/create_dummy_file_step",
  "#main/dragen_qc_step",
  "#main/run_dragen_transcriptome_step",
  "#main/run_qualimap_step"
]
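The two lists of override keys differ only in their prefix: the zipped form uses `workflow.cwl#<workflow id>/<step>`, while the packed form uses `#main/<step>`. The Python sketch below converts one into the other based purely on that observed pattern. The function name `to_packed_step_id` is hypothetical, and the assumption that the step name is always the last path segment is inferred from the lists here rather than from any specification.

```python
def to_packed_step_id(zipped_id: str) -> str:
    """Map 'workflow.cwl#<workflow-id>/<step>' to '#main/<step>'."""
    # Everything after the first '#' is the fragment holding the step path.
    _, _, fragment = zipped_id.partition("#")
    if not fragment or "/" not in fragment:
        raise ValueError(f"unexpected step id: {zipped_id!r}")
    # Assumption: the step name is the final '/'-separated segment.
    step_name = fragment.rsplit("/", 1)[1]
    return f"#main/{step_name}"


if __name__ == "__main__":
    print(to_packed_step_id(
        "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/run_qualimap_step"
    ))
```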
Inputs
algorithm
ID: algorithm
Optional: True
Type: string
Docs:
Counting algorithm:
uniquely-mapped-reads(default) or proportional.
annotation file
ID: annotation_file
Optional: False
Type: File
Docs:
Path to annotation transcript file.
bam input
ID: bam_input
Optional: True
Type: File
Docs:
Input a BAM file for WTS analysis
blacklist
ID: blacklist
Optional: False
Type: File
Docs:
File with blacklist range
cl config
ID: cl_config
Optional: True
Type: string
Docs:
command line config to supply additional config values on the command line.
contigs
ID: contigs
Optional: True
Type: string
Docs:
Optional - List of interesting contigs
If not specified, defaults to 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
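For reference, the default contig list quoted above can be generated rather than typed out. This is only a sketch: the function names `default_contigs` and `contigs_argument` are hypothetical, and the comma-separated string format is assumed from the default shown in the docs.

```python
def default_contigs() -> str:
    """Return the documented default: autosomes 1-22 followed by X and Y."""
    autosomes = [str(n) for n in range(1, 23)]
    return ",".join(autosomes + ["X", "Y"])


def contigs_argument(contigs: list) -> str:
    """Join a custom selection of contigs into the same comma-separated form."""
    return ",".join(str(c) for c in contigs)


if __name__ == "__main__":
    print(default_contigs())
    print(contigs_argument([1, 2, "X"]))
```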
cytobands
ID: cytobands
Optional: False
Type: File
Docs:
Coordinates of the Giemsa staining bands.
enable duplicate marking
ID: enable_duplicate_marking
Optional: False
Type: boolean
Docs:
Mark identical alignments as duplicates
enable map align
ID: enable_map_align
Optional: True
Type: boolean
Docs:
Enabled by default.
Set this value to false if using bam_input AND tumor_bam_input
enable map align output
ID: enable_map_align_output
Optional: False
Type: boolean
Docs:
Do you wish to have the output bam files present
enable rna gene fusion
ID: enable_rna_gene_fusion
Optional: True
Type: boolean
Docs:
Optional - Enable the DRAGEN Gene Fusion module - defaults to true
enable rna quantification
ID: enable_rna_quantification
Optional: True
Type: boolean
Docs:
Optional - Enable the quantification module - defaults to true
enable sort
ID: enable_sort
Optional: True
Type: boolean
Docs:
True by default, only set this to false if using --bam-input as input parameter
fastq list
ID: fastq_list
Optional: True
Type: File
Docs:
CSV file that contains a list of FASTQ files
to process. read_1 and read_2 components in the CSV file must be presigned urls.
Row of fastq lists
ID: fastq_list_rows
Optional: True
Type: fastq-list-row[]
Docs:
The row of fastq lists.
Each row has the following attributes:
- RGID
- RGLB
- RGSM
- Lane
- Read1File
- Read2File (optional)
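The attributes listed above map onto the lower-case keys used in the inputs template (`rgid`, `rglb`, `rgsm`, `lane`, `read_1`, `read_2`). The Python sketch below converts a fastq list CSV with those column headers into `fastq_list_rows` entries. The function name `rows_from_fastq_list_csv` is hypothetical, and the lane is kept as text as in the template placeholder; convert it if the schema expects a number.

```python
import csv
import io


def rows_from_fastq_list_csv(csv_text: str) -> list:
    """Turn fastq list CSV text into a list of fastq_list_rows entries."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {
            "rgid": record["RGID"],
            "rglb": record["RGLB"],
            "rgsm": record["RGSM"],
            # Assumption: lane is kept as text, as in the template placeholder.
            "lane": record["Lane"],
            "read_1": {"class": "File", "location": record["Read1File"]},
        }
        # Read2File is optional, so only add read_2 when a value is present.
        read_2 = (record.get("Read2File") or "").strip()
        if read_2:
            row["read_2"] = {"class": "File", "location": read_2}
        rows.append(row)
    return rows


if __name__ == "__main__":
    example = (
        "RGID,RGLB,RGSM,Lane,Read1File,Read2File\n"
        "id1,lib1,sample1,1,icav2://project_id/path/to/file,\n"
    )
    print(rows_from_fastq_list_csv(example))
```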
java mem
ID: java_mem
Optional: False
Type: string
Docs:
Set desired Java heap memory size
license instance id location
ID: lic_instance_id_location
Optional: True
Type: ['File', 'string']
Docs:
Optionally supply your own license instance ID file.
Defaults to /opt/instance-identity,
which is a path inside the dragen container.
output directory
ID: output_directory
Optional: False
Type: string
Docs:
The directory where all output files are placed
output directory name arriba
ID: output_directory_name_arriba
Optional: True
Type: string
Docs:
Name of the directory to collect arriba outputs in.
output file prefix
ID: output_file_prefix
Optional: False
Type: string
Docs:
The prefix given to all output files
protein domains
ID: protein_domains
Optional: False
Type: File
Docs:
GFF3 file containing the genomic coordinates of protein domains.
qc reference samples
ID: qc_reference_samples
Optional: False
Type: Directory[]
Docs:
Reference samples for multiQC report
read trimming
ID: read_trimmers
Optional: True
Type: string
Docs:
To enable trimming filters in hard-trimming mode, set to a comma-separated list of the trimmer tools
you would like to use. To disable trimming, set to none. During mapping, artifacts are removed from all reads.
Read trimming is disabled by default.
reference Fasta
ID: reference_fasta
Optional: False
Type: File
Docs:
FastA file with genome sequence
reference tar
ID: reference_tar
Optional: False
Type: File
Docs:
Path to ref data tarball
soft read trimming
ID: soft_read_trimmers
Optional: True
Type: string
Docs:
To enable trimming filters in soft-trimming mode, set to a comma-separated list of the trimmer tools
you would like to use. To disable soft trimming, set to none. During mapping, reads are aligned as if trimmed,
and bases are not removed from the reads. Soft-trimming is enabled for the polyg filter by default.
trim adapter r1 5prime
ID: trim_adapter_r1_5prime
Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 1.
NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.
trim adapter read1
ID: trim_adapter_read1
Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 1.
trim adapter read2
ID: trim_adapter_read2
Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 2.
trim adapter stringency
ID: trim_adapter_stringency
Optional: True
Type: int
Docs:
Specify the minimum number of adapter bases required for trimming
trim adapter r2 5prime
ID: trim_dapter_r2_5prime
Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 2.
NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.
trim r1 3prime
ID: trim_r1_3prime
Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 3' end of Read 1 (default: 0).
trim r1 5prime
ID: trim_r1_5prime
Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 5' end of Read 1 (default: 0).
trim r2 3prime
ID: trim_r2_3prime
Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 3' end of Read 2 (default: 0).
trim r2 5prime
ID: trim_r2_5prime
Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 5' end of Read 2 (default: 0).
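Before submitting a run, it can help to check an inputs document against the inputs marked Required in the inputs template. The Python sketch below does only that. The names `REQUIRED_INPUTS` and `missing_required_inputs` are hypothetical, and `java_mem` is deliberately left out on the assumption that its documented default of 20G applies when it is omitted.

```python
# Inputs marked (Required) in the inputs template.
REQUIRED_INPUTS = (
    "annotation_file",
    "blacklist",
    "cytobands",
    "enable_duplicate_marking",
    "enable_map_align_output",
    "output_directory",
    "output_file_prefix",
    "protein_domains",
    "qc_reference_samples",
    "reference_fasta",
    "reference_tar",
)


def missing_required_inputs(inputs: dict) -> list:
    """Return the required input IDs that are absent or set to None."""
    # A value of False is a legitimate setting, so only None counts as missing.
    return [
        name for name in REQUIRED_INPUTS
        if name not in inputs or inputs[name] is None
    ]


if __name__ == "__main__":
    print(missing_required_inputs({"output_directory": "out"}))
```

This checks presence only; it does not validate types or that the referenced files exist.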
Steps
arriba drawing step
ID: dragen-transcriptome-pipeline--4.2.4/arriba_drawing_step
Step Type: tool
Docs:
Run Arriba drawing script for fusions predicted by previous step.
arriba fusion step
ID: dragen-transcriptome-pipeline--4.2.4/arriba_fusion_step
Step Type: tool
Docs:
Runs Arriba fusion calling on the bam file produced by Dragen.
create arriba output directory
ID: dragen-transcriptome-pipeline--4.2.4/create_arriba_output_directory
Step Type: tool
Docs:
Create an output directory to contain the arriba files
Create dummy file
ID: dragen-transcriptome-pipeline--4.2.4/create_dummy_file_step
Step Type: tool
Docs:
Intermediate step for letting multiqc-interop be placed in stream mode
dragen qc step
ID: dragen-transcriptome-pipeline--4.2.4/dragen_qc_step
Step Type: tool
Docs:
The dragen qc step - this takes in an array of dirs
run dragen transcriptome step
ID: dragen-transcriptome-pipeline--4.2.4/run_dragen_transcriptome_step
Step Type: tool
Docs:
Runs the dragen transcriptome workflow on the FPGA.
Takes in a fastq list and corresponding mount paths from the predefined_mount_paths.
All other options available at the top of the workflow
run qualimap step
ID: dragen-transcriptome-pipeline--4.2.4/run_qualimap_step
Step Type: tool
Docs:
Run qualimap step to generate additional QC metrics
Outputs
arriba output directory
ID: dragen-transcriptome-pipeline--4.2.4/arriba_output_directory
Optional: False
Output Type: Directory
Docs:
The directory containing output files from arriba
dragen transcriptome output directory
ID: dragen-transcriptome-pipeline--4.2.4/dragen_transcriptome_output_directory
Optional: False
Output Type: Directory
Docs:
The output directory containing all transcriptome output files
multiqc output directory
ID: dragen-transcriptome-pipeline--4.2.4/multiqc_output_directory
Optional: False
Output Type: Directory
Docs:
The output directory for multiqc
qualimap output directory
ID: dragen-transcriptome-pipeline--4.2.4/qualimap_output_directory
Optional: False
Output Type: Directory
Docs:
The output directory containing all qualimap output files