Sfitz by readgroup and add FastQC #62

Merged 46 commits on Jul 19, 2024
Changes from 42 commits
Commits
d0d3a7b
process by readgroup in progress
sorelfitzgibbon Apr 19, 2024
adb0591
standardize algorithms to algorithm
sorelfitzgibbon Jun 10, 2024
ad00e21
samtools stats by readgroup
sorelfitzgibbon Jun 11, 2024
046ce02
update changelog
sorelfitzgibbon Jun 12, 2024
07b5458
fix mislabel
sorelfitzgibbon Jun 12, 2024
eb422cc
revert unintentional resource changes
sorelfitzgibbon Jun 12, 2024
ae43ccd
add fastqc
sorelfitzgibbon Jun 5, 2024
68cf4b4
add fastqc module
sorelfitzgibbon Jun 5, 2024
b738e51
use process_afterscript
sorelfitzgibbon Jun 5, 2024
27f1581
update changelog
sorelfitzgibbon Jun 5, 2024
1c922f9
update nftest for fastqc
sorelfitzgibbon Jun 5, 2024
dff3a7a
fix nftest path
sorelfitzgibbon Jun 5, 2024
dbf3a01
merge with sfitz-by-readgroup complete and tested
sorelfitzgibbon Jun 12, 2024
9fd01e9
use fastqc docker with samtools
sorelfitzgibbon Jun 13, 2024
c5407f8
fastqc by readgroup
sorelfitzgibbon Jun 13, 2024
8653bbc
nftest paths updated
sorelfitzgibbon Jun 13, 2024
20085c3
refactor channels
sorelfitzgibbon Jun 14, 2024
b34451e
add hg003 to NFTest
sorelfitzgibbon Jun 14, 2024
d049c14
update samtools
sorelfitzgibbon May 29, 2024
53632de
pull main - samtools update
sorelfitzgibbon Jun 5, 2024
19b68a2
use fastqc docker with samtools
sorelfitzgibbon Jun 13, 2024
0e2c3df
add fastqc resource allocations
sorelfitzgibbon Jun 14, 2024
f95d47e
add fastqc threading and adjust resources
sorelfitzgibbon Jun 16, 2024
da059b8
add fastqc
sorelfitzgibbon Jun 5, 2024
8f92e36
add fastqc to metadata.yaml
sorelfitzgibbon Jun 17, 2024
cc38b86
Bump the pipeline-submodules group with 2 updates
dependabot[bot] Jun 15, 2024
9c9e207
Merge branch 'main' into sfitz-by-readgroup
sorelfitzgibbon Jun 20, 2024
198b40c
add final newline
sorelfitzgibbon Jun 20, 2024
91bf337
update readme
sorelfitzgibbon Jun 24, 2024
4e2b724
update nftest.yml
sorelfitzgibbon Jun 24, 2024
01b50bf
parameterize fastqc and add max gps to stats
sorelfitzgibbon Jun 25, 2024
e61b1ac
sanitize library ID
sorelfitzgibbon Jun 25, 2024
e6d0200
update test configs
sorelfitzgibbon Jun 25, 2024
91f47d3
change process input variable names
sorelfitzgibbon Jun 25, 2024
a2f0489
add slurm logs and extra test files to .gitignore
sorelfitzgibbon Jun 25, 2024
fae512d
update readme
sorelfitzgibbon Jun 26, 2024
ecf44dd
update comments
sorelfitzgibbon Jun 26, 2024
206c7e9
adjust run level triggers
sorelfitzgibbon Jun 26, 2024
8587520
adjust log filename
sorelfitzgibbon Jun 26, 2024
137874f
update nftests
sorelfitzgibbon Jun 26, 2024
6ad240e
fix test name
sorelfitzgibbon Jun 26, 2024
86cc4b2
remove out of date comments
sorelfitzgibbon Jun 27, 2024
8fd7119
change process names
sorelfitzgibbon Jun 27, 2024
9bb4d2f
rename bamqc_outformat to bamqc_output_format
sorelfitzgibbon Jun 27, 2024
2a63bb4
rename bamqc_outformat to bamqc_output_format
sorelfitzgibbon Jun 27, 2024
4e1e2fd
remove fastqc as default
sorelfitzgibbon Jun 27, 2024
5 changes: 5 additions & 0 deletions .gitignore
@@ -80,3 +80,8 @@ work/
 *.gz
 *.tar
 *.zip
+
+# Other
+test/*
+test/*/*
+slurm-*.out
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -10,10 +10,14 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
## [Unreleased]

### Added
- Add FastQC workflow
- Add per readgroup and per library functionality
- Add `process_afterscript`
- Add Nextflow version requirement to `README`

### Changed
- Update SAMtools 1.18 to 1.20
- Update NFTest for FastQC and new test sample
- Update repository/pipeline description
- Update Nextflow configuration test workflows

14 changes: 11 additions & 3 deletions README.md
@@ -71,7 +71,7 @@ input:

 | Field | Type | Required | Description |
 | ----- | ---- | ------------ | ------------------------ |
-| `algorithms` | list | no | List of tools to be run: ['stats', 'collectwgsmetrics', 'bamqc'], default = ['stats', 'collectwgsmetrics'] |
+| `algorithm` | list | no | List of tools to be run: ['stats', 'collectwgsmetrics', 'bamqc'], default = ['stats', 'collectwgsmetrics'] |
 | `reference` | path | yes/no | Reference fasta is required only for `CollectWgsMetrics` |
 | `output_dir` | path | yes | Not required if `blcds_registered_dataset` = `true` |
 | `blcds_registered_dataset` | boolean | no | Default is `false`. Only `uclahs_cds` users should change this. When `true`, BLCDS folder structure is used |
@@ -80,8 +80,16 @@ input:
 #### SAMtools specific configuration
 | Field | Type | Required | Description |
 | ----- | ---- | ------------ | ------------------------ |
-| remove_duplicates | boolean | no | Ignore reads marked as duplicate. default = `false` |
-| samtools_stats_additional_options | string | no | Any additional options recognized by `samtools stats` |
+| stats_max_rgs_per_sample | integer | no | If a sample has more than this number of readgroups, `SAMtools stats` will not run per readgroup analysis. Default = 20 |
+| stats_max_libs_per_sample | integer | no | If a sample has more than this number of libraries, `SAMtools stats` will not run per library analysis. Default = 20 |
+| stats_remove_duplicates | boolean | no | Ignore reads marked as duplicate. default = `false` |
+| stats_additional_options | string | no | Any additional options recognized by `samtools stats` |
+
+#### FastQC specific configuration
+| Field | Type | Required | Description |
+| ----- | ---- | ------------ | ------------------------ |
+| fastqc_level | string | yes | 'readgroup', 'library' or 'sample' |
+| fastqc_additional_options | string | no | Any additional options recognized by `FastQC` |

 #### Picard specific configuration
 | Field | Type | Required | Description |
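The `stats_max_rgs_per_sample` and `stats_max_libs_per_sample` rows in the README diff describe a simple gate on which levels `SAMtools stats` analyzes. A minimal Python sketch of that rule follows. The helper name `stats_levels` is hypothetical, and the assumption that sample-level stats always run is not stated in the README; the pipeline itself is written in Nextflow and may implement this differently.

```python
from typing import List

# Defaults as documented in the README table.
DEFAULT_MAX_RGS_PER_SAMPLE = 20
DEFAULT_MAX_LIBS_PER_SAMPLE = 20


def stats_levels(
    n_readgroups: int,
    n_libraries: int,
    max_rgs: int = DEFAULT_MAX_RGS_PER_SAMPLE,
    max_libs: int = DEFAULT_MAX_LIBS_PER_SAMPLE,
) -> List[str]:
    """Return the levels at which stats would run (hypothetical helper)."""
    levels = ["sample"]  # assumption: sample-level stats always run
    # "More than this number" skips the level, so an equal count still runs.
    if n_readgroups <= max_rgs:
        levels.append("readgroup")
    if n_libraries <= max_libs:
        levels.append("library")
    return levels


print(stats_levels(4, 2))   # ['sample', 'readgroup', 'library']
print(stats_levels(21, 2))  # ['sample', 'library']
```

Under this reading, a sample with exactly 20 readgroups is still analyzed per readgroup, because the README wording is "more than this number".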
52 changes: 51 additions & 1 deletion config/F16.config
@@ -3,7 +3,57 @@ process {
         cpus = 1
         memory = 250.MB
     }
-    withName: run_stats_SAMtools {
+    withName: assess_ReadQuality_FastQC_readgroup {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_library {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_sample {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_readgroup {

Review thread on the line above:

(question|standardization : non-blocking) Does this follow our NF standards?

Collaborator Author:
I believe so. The standards say:
"It’s ok to add an additional attribute if a process needs to be run more than once"

run_stats_SAMtools is the base process name.

Reply:
"It’s ok to add an additional attribute if a process needs to be run more than once"

Right. I think placing the tool name as the last attribute would make more sense though (e.g. run_VariantRecalibratorSNP_GATK and run_VariantRecalibratorINDEL_GATK)? I'll defer this to Nextflow WG.
https://uclahs-cds.atlassian.net/wiki/spaces/BOUTROSLAB/pages/3193890/Nextflow+pipeline+standardization

Collaborator Author:
Oh I see! What are your thoughts on whether the three sections should be maintained? run_statsReadgroups_SAMtools vs run_stats_readgroups_SAMtools

Contributor:
I'm ok with maintaining the three sections and using run_statsReadgroups_SAMtools
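
The naming pattern agreed on in this thread (three underscore-separated sections, with the extra attribute folded into the middle section and the tool name last) can be sketched as a small Python helper. The function `process_name` and its parameters are hypothetical and only illustrate the names quoted in the comments; they are not part of the pipeline or of the standardization document.

```python
def process_name(action: str, detail: str, tool: str, attribute: str = "") -> str:
    """Build an action_detail_tool process name (hypothetical helper).

    The optional attribute is appended to the middle section with its
    first letter upper-cased, so the name keeps exactly three sections.
    """
    suffix = attribute[:1].upper() + attribute[1:]
    return f"{action}_{detail}{suffix}_{tool}"


print(process_name("run", "stats", "SAMtools"))                # run_stats_SAMtools
print(process_name("run", "stats", "SAMtools", "readgroups"))  # run_statsReadgroups_SAMtools
```

The same helper reproduces the GATK example from the thread: `process_name("run", "VariantRecalibrator", "GATK", "SNP")` gives `run_VariantRecalibratorSNP_GATK`.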

+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_library {
+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_sample {
         cpus = 1
         memory = 1.GB
         retry_strategy {
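
Each `retry_strategy` block in this config pairs a starting `memory` value with an `'add'` strategy and an `operand`. A minimal Python sketch of the arithmetic this suggests is given here, assuming each retry adds one more `operand` to the starting allocation. The retry logic itself is not part of this diff, so that behavior is an assumption, and the helper name `memory_for_attempt` is hypothetical.

```python
def memory_for_attempt(base_gb: float, operand_gb: float, attempt: int) -> float:
    """Memory requested on a given attempt under an assumed 'add' strategy."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    # Attempt 1 uses the base value; every retry adds one more operand.
    return base_gb + operand_gb * (attempt - 1)


# F16 values for the FastQC processes: memory = 1.GB, operand = 4.GB
print([memory_for_attempt(1, 4, n) for n in (1, 2, 3)])  # [1, 5, 9]
```

Any upper bound imposed by the node size is left out of this sketch.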
52 changes: 51 additions & 1 deletion config/F2.config
@@ -3,7 +3,57 @@ process {
         cpus = 1
         memory = 250.MB
     }
-    withName: run_stats_SAMtools {
+    withName: assess_ReadQuality_FastQC_readgroup {
+        cpus = 1
+        memory = 1500.MB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 2000.MB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_library {
+        cpus = 1
+        memory = 1500.MB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 2000.MB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_sample {
+        cpus = 1
+        memory = 1500.MB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 2000.MB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_readgroup {
+        cpus = 1
+        memory = 1500.MB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 2000.MB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_library {
+        cpus = 1
+        memory = 1500.MB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 2000.MB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_sample {
         cpus = 1
         memory = 1500.MB
         retry_strategy {
52 changes: 51 additions & 1 deletion config/F32.config
@@ -3,7 +3,57 @@ process {
         cpus = 1
         memory = 250.MB
     }
-    withName: run_stats_SAMtools {
+    withName: assess_ReadQuality_FastQC_readgroup {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_library {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_sample {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_readgroup {
+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_library {
+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_sample {
         cpus = 1
         memory = 1.GB
         retry_strategy {
52 changes: 51 additions & 1 deletion config/F4.config
@@ -3,7 +3,57 @@ process {
         cpus = 1
         memory = 250.MB
     }
-    withName: run_stats_SAMtools {
+    withName: assess_ReadQuality_FastQC_readgroup {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_library {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_sample {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_readgroup {
+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 3.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_library {
+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 3.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_sample {
         cpus = 1
         memory = 1.GB
         retry_strategy {
52 changes: 51 additions & 1 deletion config/F72.config
@@ -3,7 +3,57 @@ process {
         cpus = 1
         memory = 250.MB
     }
-    withName: run_stats_SAMtools {
+    withName: assess_ReadQuality_FastQC_readgroup {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_library {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: assess_ReadQuality_FastQC_sample {
+        cpus = 2
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_readgroup {
+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_library {
+        cpus = 1
+        memory = 1.GB
+        retry_strategy {
+            memory {
+                strategy = 'add'
+                operand = 4.GB
+            }
+        }
+    }
+    withName: run_stats_SAMtools_sample {
         cpus = 1
         memory = 1.GB
         retry_strategy {