update filenames #60

Merged (13 commits) on Mar 23, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -20,6 +20,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
### Changed
- Update CI/CD workflow to use current image
- Update samtools depth default output options
- Update filenames to standardized format

## [v1.0.0-rc.2] - 2024-02-14
### Changed
12 changes: 6 additions & 6 deletions README.md
@@ -119,12 +119,12 @@ A directed acyclic graph of your pipeline.
| Output and Output Parameter/Flag | Description |
| ------------ | ------------------------ |
| `output_dir` | Location where generated output should be saved. |
| `.target-with-enriched-off-target-intervals.bed` | New target file including original target intervals and intervals encompassing coverage-enriched off-target dbSNP sites. |
|`.off-target-dbSNP_depth-per-base.bed`|Per-base read depth at dbSNP loci outside of targeted regions.|
| `.collapsed_coverage.bed` | Per-base read depth at specified target intervals, collapsed by interval. (OPTIONAL) Set `target_depth` in config file. |
|`.target-depth-per-base.bed`|Per-base read depth at target intervals (not collapsed). (OPTIONAL) set `save_raw_target_bed` in config file.|
|`.genome-wide-dbSNP_depth-per-base.bed`| Per-base read depth at all dbSNP loci. (OPTIONAL) Set `save_all_dbSNP` in config file.|
| `.HsMetrics.txt` | QC output from CollectHsMetrics()|
| `*target-with-enriched-off-target-intervals.bed` | New target file including original target intervals and intervals encompassing coverage-enriched off-target dbSNP sites. |
|`*off-target-dbSNP-depth-per-base.bed`|Per-base read depth at dbSNP loci outside of targeted regions.|
| `*collapsed_coverage.bed` | Per-base read depth at specified target intervals, collapsed by interval. (OPTIONAL) Set `target_depth` in config file. |
|`*target-depth-per-base.bed`|Per-base read depth at target intervals (not collapsed). (OPTIONAL) Set `save_raw_target_bed` in config file.|
|`*genome-wide-dbSNP-depth-per-base.bed`| Per-base read depth at all dbSNP loci. (OPTIONAL) Set `save_all_dbSNP` in config file.|
| `*HsMetrics.txt` | QC output from CollectHsMetrics()|
| `.tsv`,`.bed` | Intermediate outputs of unformatted and unmerged depth files. (OPTIONAL) Set `save_intermediate_files` in config file. |
| `.interval_list` | Intermediate output of target bed file converted to picard's interval list format. (OPTIONAL) Set `save_interval_list` in config file. |
| `report.html`, `timeline.html` and `trace.txt` | Nextflow report, timeline, and trace files |
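
The standardized names follow the pattern `<Tool>-<version>_<dataset_id>_<sample_id>_<info>`, matching the paths asserted in `nftest.yml` further down. Below is a minimal Groovy sketch of that assumed pattern; `standardizedName` is an illustrative stand-in for the pipeline's `generate_standard_filename` module, and the tool version shown is a placeholder.

```groovy
// Illustrative stand-in for generate_standard_filename (external/pipeline-Nextflow-module);
// the real module may differ in details. This only demonstrates the assumed naming pattern.
def standardizedName(String toolWithVersion, String datasetId, String sampleId, String info) {
    return "${toolWithVersion}_${datasetId}_${sampleId}_${info}".toString()
}

// Tool version is a placeholder; dataset/sample IDs are the ones asserted in nftest.yml.
assert standardizedName('BEDtools-2.31.0', 'TWGSAMIN000001', 'TWGSAMIN000001-T002-S02-F', 'collapsed-coverage.bed') ==
       'BEDtools-2.31.0_TWGSAMIN000001_TWGSAMIN000001-T002-S02-F_collapsed-coverage.bed'
```
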
8 changes: 4 additions & 4 deletions docs/calculate-targeted-coverage-flow.puml
@@ -25,9 +25,9 @@ rectangle "Target-Focused Workflow" {
rectangle "Convert to BED" as ConvtoBED1 <<ProcessStep>>
rectangle "Calculate Coverage Metrics" as CalcCovMetrics <<ProcessStep>>
' output files
rectangle ".target-depth-per-base.bed" as TargetReadDepthBED <<OutputFile>>
rectangle "~*target-depth-per-base.bed" as TargetReadDepthBED <<OutputFile>>
' QC files
rectangle ".HSMetrics.txt" as HSMetrics <<QCFile>>
rectangle "~*HSMetrics.txt" as HSMetrics <<QCFile>>
}


@@ -50,8 +50,8 @@ rectangle "Off-Target Workflow" {
rectangle "Add Slop to Target Intervals" as AddSlopTarget <<ProcessStep>>

' output files
rectangle ".target-with-enriched-off-target-intervals.bed" as TargetPlusOffTargetReadDepthBED <<OutputFile>>
rectangle ".off-target-dbSNP_depth-per-base.bed" as OffTargetReadDepth <<OutputFile>>
rectangle "~*target-with-enriched-off-target-intervals.bed" as TargetPlusOffTargetReadDepthBED <<OutputFile>>
rectangle "~*off-target-dbSNP-depth-per-base.bed" as OffTargetReadDepth <<OutputFile>>

' end node
rectangle "recalibrate-BAM" as RecalBAM #White
94 changes: 47 additions & 47 deletions docs/calculate-targeted-coverage-flow.svg
13 changes: 12 additions & 1 deletion module/depth_to_bed.nf
@@ -1,3 +1,4 @@
include { generate_standard_filename } from '../external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'
/*
* Module/process description here
*
@@ -32,12 +33,22 @@ process convert_depth_to_bed {
path ".command.*"

script:

output_filename = generate_standard_filename(
"SAMtools-${params.samtools_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "${tag}-depth-per-base.bed"
]
)

"""
set -euo pipefail

cat ${input_tsv} | \
awk 'BEGIN {OFS="\t"} {chr = \$1; start=\$2-1; stop=\$2; depth=\$3; print chr,start,stop,depth}' \
| sort -k1,1 -k2,2n \
> ${params.sample_id}.${tag}_depth-per-base.bed
> ${output_filename}
"""
}
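
As context for the conversion above: `samtools depth` emits 1-based `chrom/pos/depth` rows, and the awk one-liner rewrites each row as a 0-based, half-open BED record (start = pos - 1, stop = pos). The following Groovy sketch of that coordinate shift is purely illustrative and not part of the pipeline.

```groovy
// Convert one samtools-depth line (1-based: chrom, pos, depth)
// into a BED line (0-based, half-open: chrom, start, stop, depth).
def depthLineToBed = { String line ->
    def (chrom, pos, depth) = line.tokenize('\t')
    int p = pos as int
    return [chrom, p - 1, p, depth].join('\t')
}

assert depthLineToBed('chr1\t1000\t37') == 'chr1\t999\t1000\t37'
```
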
38 changes: 34 additions & 4 deletions module/filter_off_target_depth.nf
@@ -1,4 +1,4 @@

include { generate_standard_filename } from '../external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'
/*
* Filter for sites with read depth above a minimum threshold.
* Important for excluding near-target regions from off-target calculations.
@@ -28,14 +28,24 @@ process run_depth_filter {
path ".command.*"

script:

output_filename = generate_standard_filename(
"SAMtools-${params.samtools_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "depth-filtered.bed"
]
)

"""
set -euo pipefail

awk \
-v min_depth="${params.min_read_depth}" \
'\$4 >= min_depth' \
${input} \
> ${params.sample_id}.depth-filtered.bed
> ${output_filename}
"""
}

@@ -73,6 +83,16 @@ process run_slop_BEDtools {
path ".command.*"

script:

output_filename = generate_standard_filename(
"BEDtools-${params.bedtools_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "${tag}_slop-${slop}.bed"
]
)

"""
set -euo pipefail

@@ -81,7 +101,7 @@
-i ${target_bed} \
-g ${genome_sizes} \
-b ${slop} \
> ${params.sample_id}.${tag}_slop-${slop}.bed
> ${output_filename}
"""
}

@@ -113,6 +133,16 @@ process run_intersect_BEDtools {
path ".command.*"

script:

output_filename = generate_standard_filename(
"BEDtools-${params.bedtools_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "off-target-dbSNP_depth-per-base.bed"
]
)

"""
set -euo pipefail

@@ -121,6 +151,6 @@
-a ${off_target_bed} \
-b ${target_bed} \
-v \
> ${params.sample_id}.off-target-dbSNP_depth-per-base.bed
> ${output_filename}
"""
}
11 changes: 10 additions & 1 deletion module/get_depth_samtools.nf
@@ -1,3 +1,4 @@
include { generate_standard_filename } from '../external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'
/*
* Module/process description here
*
@@ -28,6 +29,14 @@ process run_depth_SAMtools {
path ".command.*"

script:
output_filename = generate_standard_filename(
"SAMtools-${params.samtools_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "${tag}-depth-per-base.tsv"
]
)
"""
set -euo pipefail

@@ -38,7 +47,7 @@
-aa \
--min-BQ ${params.min_base_quality} \
--min-MQ ${params.min_mapping_quality} \
-o ${params.sample_id}.${tag}.depth_per_base.tsv \
-o ${output_filename} \
${params.samtools_depth_extra_args}
"""
}
13 changes: 12 additions & 1 deletion module/merge_bedfiles_bedtools.nf
@@ -1,3 +1,4 @@
include { generate_standard_filename } from '../external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'
/*
* Module/process description here
*
@@ -26,13 +27,23 @@ process merge_bedfiles_BEDtools {
path ".command.*"

script:

output_filename = generate_standard_filename(
"BEDtools-${params.bedtools_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "target-with-enriched-off-target-intervals.bed"
]
)

"""
set -euo pipefail

cat ${target_bed} ${off_target_bed} | \
sort -k1,1 -k2,2n | \
awk '{OFS = "\t"}{print \$1, \$2, \$3}' | \
bedtools merge \
> ${params.sample_id}.target_with_enriched_off-target_intervals.bed
> ${output_filename}
"""
}
13 changes: 12 additions & 1 deletion module/merge_intervals_bedtools.nf
@@ -1,3 +1,4 @@
include { generate_standard_filename } from '../external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'
/*
* Module/process description here
*
@@ -25,6 +26,16 @@ process run_merge_BEDtools {
path ".command.*"

script:

output_filename = generate_standard_filename(
"BEDtools-${params.bedtools_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "collapsed-coverage.bed"
]
)

"""
set -euo pipefail

@@ -33,6 +44,6 @@
-i ${input_depth_bed} \
-c 4 \
-o ${params.merge_operation} \
> ${params.sample_id}.collapsed_coverage.bed
> ${output_filename}
"""
}
27 changes: 25 additions & 2 deletions module/run_HS_metrics.nf
@@ -1,3 +1,4 @@
include { generate_standard_filename } from '../external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'
/*
* Module/process description here
*
@@ -34,14 +35,26 @@ process run_BedToIntervalList_picard {
path ".command.*"

script:

output_filename_base = generate_standard_filename(
"Picard-${params.picard_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "${tag}"
]
)

output_filename = "${output_filename_base}.interval_list"

"""
set -euo pipefail

java \"-Xmx${(task.memory - params.gatk_command_mem_diff).getMega()}m\" \
-jar /usr/local/share/picard-slim-${params.picard_version}-0/picard.jar \
BedToIntervalList \
--INPUT $input_bed \
--OUTPUT ${params.sample_id}.${tag}.interval_list \
--OUTPUT ${output_filename} \
--SEQUENCE_DICTIONARY $reference_dict \
--SORT false
"""
@@ -70,6 +83,16 @@ process run_CollectHsMetrics_picard {
path ".command.*"

script:

output_filename = generate_standard_filename(
"Picard-${params.picard_version}",
params.dataset_id,
params.sample_id,
[
'additional_information': "HsMetrics.txt"
]
)

"""
set -euo pipefail

@@ -79,7 +102,7 @@
--BAIT_INTERVALS $bait_interval_list \
--INPUT $input_bam \
--TARGET_INTERVALS $target_interval_list \
--OUTPUT ${params.sample_id}.HsMetrics.txt \
--OUTPUT ${output_filename} \
--COVERAGE_CAP ${params.coverage_cap} \
${params.picard_CollectHsMetrics_extra_args} \
--NEAR_DISTANCE ${params.near_distance} \
15 changes: 3 additions & 12 deletions nftest.yml
@@ -12,21 +12,12 @@ cases:
skip: false
verbose: true
asserts:
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/TWGSAMIN000001-T002-S02-F.collapsed_coverage.bed
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/BEDtools-*_TWGSAMIN000001_TWGSAMIN000001-T002-S02-F_collapsed-coverage.bed
expect: /hot/software/pipeline/pipeline-calculate-targeted-coverage/Nextflow/development/output/TWGSAMIN000001-T002-S02-F.collapsed_coverage.bed
method: md5
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/TWGSAMIN000001-T002-S02-F.collapsed_coverage.bed.sha512
expect: /hot/software/pipeline/pipeline-calculate-targeted-coverage/Nextflow/development/output/TWGSAMIN000001-T002-S02-F.collapsed_coverage.bed.sha512
method: md5
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/TWGSAMIN000001-T002-S02-F.off-target-dbSNP_depth-per-base.bed
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/BEDtools-*_TWGSAMIN000001_TWGSAMIN000001-T002-S02-F_off-target-dbSNP-depth-per-base.bed
expect: /hot/software/pipeline/pipeline-calculate-targeted-coverage/Nextflow/development/output/TWGSAMIN000001-T002-S02-F.off-target-dbSNP_depth-per-base.bed
method: md5
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/TWGSAMIN000001-T002-S02-F.off-target-dbSNP_depth-per-base.bed.sha512
expect: /hot/software/pipeline/pipeline-calculate-targeted-coverage/Nextflow/development/output/TWGSAMIN000001-T002-S02-F.off-target-dbSNP_depth-per-base.bed.sha512
method: md5
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/TWGSAMIN000001-T002-S02-F.target_with_enriched_off-target_intervals.bed
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/BEDtools-*_TWGSAMIN000001_TWGSAMIN000001-T002-S02-F_target-with-enriched-off-target-intervals.bed
expect: /hot/software/pipeline/pipeline-calculate-targeted-coverage/Nextflow/development/output/TWGSAMIN000001-T002-S02-F.target_with_enriched_off-target_intervals.bed
method: md5
- actual: calculate-targeted-coverage-*/TWGSAMIN000001-T002-S02-F/SAMtools-*/output/TWGSAMIN000001-T002-S02-F.target_with_enriched_off-target_intervals.bed.sha512
expect: /hot/software/pipeline/pipeline-calculate-targeted-coverage/Nextflow/development/output/TWGSAMIN000001-T002-S02-F.target_with_enriched_off-target_intervals.bed.sha512
method: md5