DSL2 - CLASSIFY_MTDNA_HAPLOGROUP #1134

Closed
wants to merge 73 commits into from
6b8c5f6
WIP
VerbalCant May 1, 2025
6ea169a
Stop being so abstracty
VerbalCant May 1, 2025
3c4e232
Add haplogrep3 classify @3.2.2 (nf-core module)
VerbalCant May 1, 2025
4b1a58b
TODO: This is an upstream bug/regression, right?
VerbalCant May 1, 2025
585971b
Rename to CLASSIFY_MTDNA_HAPLOGROUP
VerbalCant May 1, 2025
4ce4ac2
Rename command line option to `run_mtdna_haplogroup_classification`
VerbalCant May 1, 2025
1a46a31
Fix linter error
VerbalCant May 1, 2025
c9d7656
Update docs with working examples for manual testing
VerbalCant May 1, 2025
7cba844
Remove markdownlint and prettier processing for
VerbalCant May 2, 2025
689a5cf
Add general structure, and generate (not working!) snapshot for basic…
jfy133 Apr 26, 2024
24f1f35
Try adding an ignore zip system, doesn't work atm though...
jfy133 Apr 26, 2024
218d88f
Continue customisation
jfy133 Jun 28, 2024
a4f129a
More debugging
jfy133 Jul 12, 2024
435ce68
Get working function
jfy133 Jul 19, 2024
e0f5e36
Get working function properly, and start integrating into snapshot: p…
jfy133 Jul 19, 2024
707de42
Extract names properly
jfy133 Jul 19, 2024
5b9a9ab
Remove unnecesary code
jfy133 Jul 19, 2024
781cd13
Get working string check function!
jfy133 Jul 19, 2024
b5a8a92
TODO
jfy133 Jul 19, 2024
846cf35
Update MQC so works on gitpod, try to re-add original getAllFilesFrom…
jfy133 Jul 20, 2024
97ea10d
And get it back again
jfy133 Jul 20, 2024
d929a63
Sort output to ensure consistency
jfy133 Jul 22, 2024
5908afe
Sort file name only funciton
jfy133 Jul 22, 2024
79eff19
Start testing preprocessing
jfy133 Jul 26, 2024
4ce6bdd
Start adding preprocessing dir but diff not working so can't ser vari…
jfy133 Jul 26, 2024
d58f321
Backing up TODO notes
jfy133 Jul 26, 2024
520ea52
Final snapshots, need to double check all files are covered but I thi…
jfy133 Aug 16, 2024
77fd7fe
Finalise first test
jfy133 Sep 20, 2024
bf91b15
Update tests to latest dev
jfy133 Sep 20, 2024
1c7392d
Start refactoring to use nft-utils
jfy133 Oct 4, 2024
4ff981c
bump nft-utils version
TCLamnidis Oct 18, 2024
fd4b709
start refactoring to use nft-utils
TCLamnidis Oct 18, 2024
83f3c16
update snapshot. Not sure it works yet.
TCLamnidis Oct 18, 2024
fd244a8
add preprocessing
TCLamnidis Nov 8, 2024
e9685c1
fix preprocessing checks
TCLamnidis Nov 29, 2024
722c2c8
reorder and add todos
TCLamnidis Dec 3, 2024
f8ed168
remove versions.ymls from snapshot
TCLamnidis Dec 3, 2024
4a7edc6
check bam names instead of content
TCLamnidis Dec 4, 2024
03f7b4e
fix result pickup for bam filtering and deduplication
TCLamnidis Dec 4, 2024
d58c07f
add final_bams,mapstats. fix mapping. remove old code
TCLamnidis Dec 6, 2024
10f2cb3
simplify bam_input_stats snapshot
TCLamnidis Dec 6, 2024
0e6aef5
update snapshot
TCLamnidis Dec 6, 2024
e083cd6
exclude unstable qualimap results.
TCLamnidis Dec 6, 2024
be4f354
Check existence of Multiqc output files. reorder tests to match alpha…
TCLamnidis Dec 11, 2024
1014012
remove leftover todo
TCLamnidis Dec 20, 2024
b81baee
add command legend, and align a bit more
TCLamnidis Dec 20, 2024
c4dbdb9
Fix test. and snapshot.
TCLamnidis Jan 24, 2025
eb5b05a
bump nft-bam version
TCLamnidis Feb 21, 2025
93b78cc
Start over with new output directory structure
TCLamnidis Feb 21, 2025
f259820
bump nft-bams to latest
TCLamnidis Feb 21, 2025
bc13a92
remove premature snapshot
TCLamnidis Feb 28, 2025
a2feb3e
WIP test reimplementation
TCLamnidis Feb 28, 2025
1372a10
add all remaining sections. tests still fail.
TCLamnidis Mar 7, 2025
56376e0
Slightly improved docs phrasing and structure
jfy133 Mar 14, 2025
f7f2ccc
Add some TODOs
jfy133 Mar 14, 2025
9efc16a
Add ebugging prints
jfy133 Mar 21, 2025
289efda
Fix var name
jfy133 Mar 21, 2025
072d57e
Update test snapshot to exclude directories and contained files
jfy133 Mar 21, 2025
11741b1
Add missing bam and flagstat files
jfy133 Mar 25, 2025
f609d74
remove unused nft-bam plugin
TCLamnidis Apr 11, 2025
e20aa6a
cleanup unused code and debug output
TCLamnidis Apr 11, 2025
4b5be30
add bamfiltering_savefilteredbams again
TCLamnidis May 2, 2025
781c16e
remve duplicate Qualimap config
TCLamnidis May 2, 2025
787c0e2
update test. add genotyping and metagenomics
TCLamnidis May 2, 2025
fe0c6bc
update snapshot
TCLamnidis May 2, 2025
50df7f2
exclude unstable files from md5sum
TCLamnidis May 2, 2025
ad2d0f7
Stricter checking of VCF file checksums using nft-vcf.
TCLamnidis May 2, 2025
99ef48b
add to unstable file list
TCLamnidis May 8, 2025
3f7d22f
better comments
TCLamnidis May 9, 2025
ada2f9d
add 5p_misincorporation to ignore list
TCLamnidis May 9, 2025
22d22f4
Pass phylotree through to haplogrep3 module
VerbalCant May 2, 2025
b39f974
Update testing docs with new params
VerbalCant May 2, 2025
432bd67
Merge branch 'dev' into dsl2-haplogrep3
TCLamnidis May 23, 2025
1 change: 1 addition & 0 deletions .prettierignore
@@ -3,6 +3,7 @@ adaptivecard.json
slackreport.json
.nextflow*
work/
docs/manual_tests.md
data/
results/
.DS_Store
7 changes: 7 additions & 0 deletions conf/modules.config
@@ -1752,4 +1752,11 @@ process {
]
]
}

withName: 'NFCORE_EAGER:EAGER:CLASSIFY_MTDNA_HAPLOGROUP:HAPLOGREP3_CLASSIFY' {
ext.args = {
def phylotree = params.human_mtdna_phylotree ?: (params.human_mtdna_reference.toLowerCase() == 'rsrs' ? '[email protected]' : '[email protected]')
"--tree ${phylotree}"
}
}
}
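
The `ext.args` closure above chooses the value passed to `--tree`: an explicit `params.human_mtdna_phylotree` wins, otherwise the default depends on whether `params.human_mtdna_reference` equals `rsrs` (compared case-insensitively). The same fallback can be sketched in shell for illustration. This is a minimal sketch, not the pipeline's code: the function name `resolve_tree_args` is hypothetical, and the two tree identifiers are placeholders because the real defaults are not legible in the diff as rendered here.

```shell
# Placeholder identifiers (assumptions): the real defaults are obscured in the diff.
RSRS_DEFAULT_TREE="rsrs-tree-placeholder"
RCRS_DEFAULT_TREE="rcrs-tree-placeholder"

# Hypothetical helper mirroring the fallback logic of the ext.args closure.
# $1: mtDNA reference name (defaults to 'rcrs', as in nextflow.config)
# $2: optional explicit phylotree identifier
resolve_tree_args() {
    local reference="${1:-rcrs}"
    local phylotree="${2:-}"
    local reference_lc

    if [ -z "$phylotree" ]; then
        # Lowercase the reference, like toLowerCase() in the Groovy closure
        reference_lc=$(printf '%s' "$reference" | tr '[:upper:]' '[:lower:]')
        if [ "$reference_lc" = "rsrs" ]; then
            phylotree="$RSRS_DEFAULT_TREE"
        else
            # Any reference other than 'rsrs' falls through to the rCRS default
            phylotree="$RCRS_DEFAULT_TREE"
        fi
    fi

    printf '%s %s\n' '--tree' "$phylotree"
}
```

One consequence of this design is that a misspelled reference name silently selects the rCRS default rather than failing, so validating the parameter upstream may be worth considering.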
10 changes: 5 additions & 5 deletions docs/development/code_conventions.md
@@ -27,11 +27,11 @@ The alias should ideally make it intuitive to understand which subworkflow the m

- The unique module names specified above should make it possible to always configure modules without the need for a regex/glob when using `withName`. Exception to this is modules named within nf-core subworkflows, which should be configured with a regex/glob.
- The order of attributes within configuration blocks should always be the following:
- 1. tag (mandatory)
- 2. ext.args\* (optional. Followed by ext.args{2,3,...} in ascending order)
- 3. ext.prefix (optional)
- 4. publishDir (optional)
- 5. any other attributes go to the end.
+ 1. tag (mandatory)
+ 2. ext.args\* (optional. Followed by ext.args{2,3,...} in ascending order)
+ 3. ext.prefix (optional)
+ 4. publishDir (optional)
+ 5. any other attributes go to the end.
- NEVER use `meta.id` in module configuration (`tag`,`ext.*`), but instead the full explicit combination of unique attributes expected. `meta.sample_id` is fine to use and is equivalent to `meta.id`, but should be supplemented by `meta.library_id` and `meta.lane` etc, as required.
- Every process that is reference-specific MUST include `${meta.reference}` in its `tag` and `ext.prefix` attributes. This is to avoid confusion when running the pipeline with multiple references.
- Tags that include reference and sample information should be formatted as `${meta.reference}|${meta.sample_id}_*`. Reference specific attributes go on the left-hand-side of the tag, data-specific attributes on the right-hand-side.
2 changes: 1 addition & 1 deletion docs/development/dev_docs.md
@@ -16,7 +16,7 @@ To add new input files or options to the reference sheet, you have to complete a

### Multi-reference input workflow

- 1. Add new column named <SOFTWARE_FILETYPE> and test data to the test reference sheet (https://github.com/nf-core/test-datasets/blob/eager/reference/reference_sheet_multiref.csv).
+ 1. Add new column named <SOFTWARE_FILETYPE> and test data to the test reference sheet (<https://github.com/nf-core/test-datasets/blob/eager/reference/reference_sheet_multiref.csv>).
2. Read in new input via nf-validation plugin within the reference_indexing_multi local subworkflow.
1. Add new "property" <SOFTWARE_FILETYPE> to the fasta validation schema (assets/schema_fasta.json).
1. Add "type" of your object, e.g. `"type": "string"` for file paths and `"type": "integer"` for numbers.
16 changes: 16 additions & 0 deletions docs/development/manual_tests.md
@@ -1,3 +1,4 @@
<!-- markdownlint-disable -->
# Manual Tests

Here is a list of manual tests we can run, with the expected output for each command
@@ -1133,3 +1134,18 @@ nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --
## Expect: BAM input shows up in FastQC -> mapping results.
nextflow run main.nf -profile test,docker --outdir ./results -w work/ --convert_inputbam --skip_deduplication -resume -ansi-log false -dump-channels
```

### MTDNA HAPLOGROUP CLASSIFICATION

```bash
#### MTDNA HAPLOGROUP CLASSIFICATION with default settings
## Expect: Directory created 'mtdna_haplogroup/<reference>/<sample_id>' containing a .txt file for each sample with haplogroup assignments
## Expect: The haplogroup .txt file contains at minimum columns for rank, name, quality, range, and details of the haplogroup assignment
nextflow run main.nf -profile docker,test --outdir ./results/mtdna_haplogroup_test --run_genotyping --genotyping_tool ug --genotyping_source raw --run_mtdna_haplogroup_classification -resume

#### MTDNA HAPLOGROUP CLASSIFICATION with specific arguments
## Expect: Directory created 'mtdna_haplogroup/<reference>/<sample_id>' containing a .txt file for each sample with haplogroup assignments
## Expect: The haplogroup assignment may differ based on the classification settings
nextflow run main.nf -profile docker,test --outdir ./results/mtdna_haplogroup_test --run_genotyping --genotyping_tool ug --genotyping_source raw --run_mtdna_haplogroup_classification --human_mtdna_reference rsrs --human_mtdna_phylotree [email protected] -resume
```
<!-- markdownlint-enable -->
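
The "Expect" comments in the haplogroup tests above can be checked by hand or with a small helper. The following is a rough sketch and not part of the PR: the function name `check_haplogroup_outputs` is hypothetical, and it assumes the first line of each `.txt` file is a header that mentions rank, quality, and range (matched case-insensitively), which should be confirmed against real output.

```shell
# Hypothetical helper (not part of the PR): verify that a results directory
# contains haplogroup .txt files whose header mentions the expected columns.
check_haplogroup_outputs() {
    local outdir="$1"
    local hg_dir="$outdir/mtdna_haplogroup"
    local files count

    if [ ! -d "$hg_dir" ]; then
        echo "missing directory: $hg_dir" >&2
        return 1
    fi

    # Collect every .txt file below mtdna_haplogroup/<reference>/<sample_id>
    files=$(find "$hg_dir" -type f -name '*.txt' | sort)
    if [ -z "$files" ]; then
        echo "no .txt files under $hg_dir" >&2
        return 1
    fi

    # Assumption: the first line of each file is a header naming these columns.
    # The explicit subshell keeps 'exit' from terminating the calling shell.
    printf '%s\n' "$files" | (
        while IFS= read -r file; do
            header=$(head -n 1 "$file" | tr '[:upper:]' '[:lower:]')
            for column in rank quality range; do
                case "$header" in
                    *"$column"*) ;;
                    *) echo "$file: header lacks '$column'" >&2; exit 1 ;;
                esac
            done
        done
    ) || return 1

    count=$(printf '%s\n' "$files" | wc -l | tr -d ' ')
    echo "checked $count haplogroup file(s)"
}
```

After either manual test, it could be called as `check_haplogroup_outputs ./results/mtdna_haplogroup_test`; a non-zero exit status means the directory, the files, or one of the assumed header columns is missing.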
4 changes: 2 additions & 2 deletions docs/usage.md
@@ -107,7 +107,7 @@ Only the `reference_name`, and `fasta` columns are mandatory, whereas all other

Files for `fai`, `dict`, `mapper_index` will be generated by the pipeline for you if not specified.

- A real-world example could look as follows, where a user-supplied `.dict` file and `circular_target ` and `mitochondrion_header` are not specified:
+ A real-world example could look as follows, where a user-supplied `.dict` file and `circular_target` and `mitochondrion_header` are not specified:

```txt
reference_name,fasta,fai,dict,mapper_index,circular_target,mitochondrion
@@ -217,7 +217,7 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
- `apptainer`
- A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
- `wave`
- - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow ` 24.03.0-edge` or later).
+ - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later).
- `conda`
- A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.

5 changes: 5 additions & 0 deletions modules.json
@@ -180,6 +180,11 @@
"git_sha": "3a5fef109d113b4997c9822198664ca5f2716208",
"installed_by": ["modules"]
},
"haplogrep3/classify": {
"branch": "master",
"git_sha": "81880787133db07d9b4c1febd152c090eb8325dc",
"installed_by": ["modules"]
},
"kraken2/kraken2": {
"branch": "master",
"git_sha": "653218e79ffa76fde20319e9062f8b8da5cf7555",
The following generated files are not rendered by default:

2 changes: 1 addition & 1 deletion modules/nf-core/fastqc/main.nf
7 changes: 7 additions & 0 deletions modules/nf-core/haplogrep3/classify/environment.yml
47 changes: 47 additions & 0 deletions modules/nf-core/haplogrep3/classify/main.nf
45 changes: 45 additions & 0 deletions modules/nf-core/haplogrep3/classify/meta.yml
59 changes: 59 additions & 0 deletions modules/nf-core/haplogrep3/classify/tests/main.nf.test
72 changes: 72 additions & 0 deletions modules/nf-core/haplogrep3/classify/tests/main.nf.test.snap
5 changes: 5 additions & 0 deletions modules/nf-core/haplogrep3/classify/tests/nextflow.config

5 changes: 5 additions & 0 deletions nextflow.config
@@ -249,6 +249,11 @@ params {
run_sexdeterrmine = false
sexdeterrmine_bedfile = null

// mtDNA haplogroup classification
run_mtdna_haplogroup_classification = false
human_mtdna_reference = 'rcrs'
human_mtdna_phylotree = null

// Genotyping
run_genotyping = false
genotyping_tool = null