Add multiple regions analysis / 5R / SMURF / q2-sidle #702

Merged on Mar 18, 2024 (30 commits)

Commits (30):

- 117e426 move primers to meta map (d4straub, Feb 5, 2024)
- c26cbac Merge branch 'dev' of https://github.com/nf-core/ampliseq into add-mu… (d4straub, Feb 6, 2024)
- 3f16376 process regions with --input_mutiregion (d4straub, Feb 6, 2024)
- 45fefd4 fix when no primers are given (d4straub, Feb 7, 2024)
- 715ae05 produce per-region ASV tables and fasta (d4straub, Feb 7, 2024)
- 0f04676 adjust container and input (d4straub, Feb 8, 2024)
- aa1ff5a add SIDLE workflow from d4straub/pipesidle (d4straub, Feb 9, 2024)
- afbb18f add sidle reference taxonomy entries & custom input (d4straub, Feb 9, 2024)
- a8bc971 fix prettier (d4straub, Feb 9, 2024)
- 0185e0c plugin sidle output to downstream analysis (d4straub, Feb 22, 2024)
- 5b4d4d0 --sidle_ref_taxonomy greengenes works (d4straub, Feb 23, 2024)
- fdb8d68 Add multiregion test (d4straub, Mar 8, 2024)
- 74de9fb update documentation and changelog (d4straub, Mar 8, 2024)
- 2b72e2a Fix prettier (d4straub, Mar 8, 2024)
- 1617780 update README (d4straub, Mar 8, 2024)
- f68b18a add smaller test database (d4straub, Mar 8, 2024)
- c26a2c2 make sidle_ref_taxonomy entry tree_qza optional (d4straub, Mar 8, 2024)
- 6b7a5a4 Fix prettier (d4straub, Mar 8, 2024)
- 2370c40 update multiregion nf.test (d4straub, Mar 11, 2024)
- 35b0ddc update multiregion.nf.test.snap (d4straub, Mar 11, 2024)
- 5bdb033 correct multiregion.nf.test (d4straub, Mar 11, 2024)
- cd30e46 fix sidle silva ref db (d4straub, Mar 11, 2024)
- cef0875 adjust settings based on ref db (d4straub, Mar 11, 2024)
- d6a6939 silva ref db works (d4straub, Mar 13, 2024)
- 221a554 cleanup (d4straub, Mar 13, 2024)
- dd3698e check incompatible params with sidle (d4straub, Mar 13, 2024)
- 8c711a7 fix overzealous check (d4straub, Mar 13, 2024)
- 77e03b1 re-arrange param documentation and rename --input_multiregion to --mu… (d4straub, Mar 18, 2024)
- 8575090 prevent execution with conda (d4straub, Mar 18, 2024)
- dc5e8a3 remove empty lines (d4straub, Mar 18, 2024)

.github/workflows/ci.yml (1 change: 1 addition & 0 deletions)

```diff
@@ -57,6 +57,7 @@ jobs:
 - "test_pacbio_its"
 - "test_sintax"
 - "test_pplace"
+- "test_multiregion"
 profile:
 - "docker"
```

CHANGELOG.md (1 change: 1 addition & 0 deletions)

```diff
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### `Added`
 
 - [#700](https://github.com/nf-core/ampliseq/pull/700) - Optional `--save_intermediates` to publish QIIME2 data objects (.qza) and visualisation objects (.qzv)
+- [#702](https://github.com/nf-core/ampliseq/pull/702) - Add multiple regions analysis (including 5R / SMURF / q2-sidle)
 
 ### `Changed`
```

CITATIONS.md (18 changes: 18 additions & 0 deletions)

```diff
@@ -103,6 +103,24 @@
 
 > Czech, Lucas, Pierre Barbera, and Alexandros Stamatakis. “Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data.” Bioinformatics 36, no. 10 (May 1, 2020): 3263–65. https://doi.org/10.1093/bioinformatics/btaa070.
 
+### Multi region analysis (also include Greengenes 13_8 or SILVA 128)
+
+- [q2-sidle](https://doi.org/10.1101/2021.03.23.436606)
+
+> Debelius, J.W.; Robeson, M.; Lhugerth, L.W.; Boulund, F.; Ye, W.; Engstrand, L. "A comparison of approaches to scaffolding multiple regions along the 16S rRNA gene for improved resolution." Preprint in BioRxiv. doi: 10.1101/2021.03.23.436606
+
+- [SMURF](https://doi.org/10.1186/s40168-017-0396-x)
+
+> Fuks, G.; Elgart, M.; Amir, A.; Zeisel, A.; Turnbaugh, P.J., Soen, Y.; and Shental, N. (2018). "Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling." Microbiome. 6: 17. doi: 10.1186/s40168-017-0396-x
+
+- [RESCRIPt](https://doi.org/10.1371/journal.pcbi.1009581)
+
+> Robeson MS 2nd, O'Rourke DR, Kaehler BD, Ziemski M, Dillon MR, Foster JT, Bokulich NA. RESCRIPt: Reproducible sequence taxonomy reference database management. PLoS Comput Biol. 2021 Nov 8;17(11):e1009581. doi: 10.1371/journal.pcbi.1009581. PMID: 34748542; PMCID: PMC8601625.
+
+- [SEPP](https://doi.org/10.1128/msystems.00021-18)
+
+> Janssen S, McDonald D, Gonzalez A, Navas-Molina JA, Jiang L, Xu ZZ, Winker K, Kado DM, Orwoll E, Manary M, Mirarab S, Knight R. Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information. mSystems. 2018 Apr 17;3(3):e00021-18. doi: 10.1128/mSystems.00021-18. PMID: 29719869; PMCID: PMC5904434.
+
 ### Downstream analysis
 
 - [QIIME2](https://pubmed.ncbi.nlm.nih.gov/31341288/)
```

README.md (2 changes: 1 addition & 1 deletion)

```diff
@@ -21,7 +21,7 @@
 
 ## Introduction
 
-**nfcore/ampliseq** is a bioinformatics analysis pipeline used for amplicon sequencing, supporting denoising of any amplicon and supports a variety of taxonomic databases for taxonomic assignment including 16S, ITS, CO1 and 18S. Phylogenetic placement is also possible. Supported is paired-end Illumina or single-end Illumina, PacBio and IonTorrent data. Default is the analysis of 16S rRNA gene amplicons sequenced paired-end with Illumina.
+**nfcore/ampliseq** is a bioinformatics analysis pipeline used for amplicon sequencing, supporting denoising of any amplicon and supports a variety of taxonomic databases for taxonomic assignment including 16S, ITS, CO1 and 18S. Phylogenetic placement is also possible. Multiple region analysis such as 5R is implemented. Supported is paired-end Illumina or single-end Illumina, PacBio and IonTorrent data. Default is the analysis of 16S rRNA gene amplicons sequenced paired-end with Illumina.
 
 A video about relevance, usage and output of the pipeline (version 2.1.0; 26th Oct. 2021) can also be found in [YouTube](https://youtu.be/a0VOEeAvETs) and [billibilli](https://www.bilibili.com/video/BV1B44y1e7MM), the slides are deposited at [figshare](https://doi.org/10.6084/m9.figshare.16871008.v1).
```

assets/schema_input.json (2 changes: 1 addition & 1 deletion)

```diff
@@ -12,7 +12,7 @@
 "pattern": "^[a-zA-Z][a-zA-Z0-9_]+$",
 "unique": true,
 "errorMessage": "Unique sample ID must be provided: Must start with a letter, and can only contain letters, numbers or underscores; Regex: '^[a-zA-Z][a-zA-Z0-9_]+$'",
-"meta": ["id"]
+"meta": ["sample"]
 },
 "forwardReads": {
 "type": "string",
```
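
With this change the validated sample ID is stored under the meta key `sample` instead of `id`. As a rough illustration of what that schema entry enforces (the regex plus `"unique": true`), here is a hand-written Python sketch. It is not how the pipeline validates input (that happens through the schema itself); `build_meta` is a hypothetical helper, and the column name `sampleID` is an assumption because the property name lies outside the hunk.

```python
import re

# Pattern copied from the sample ID rule in assets/schema_input.json:
# a letter first, then at least one letter, digit or underscore.
SAMPLE_ID_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]+$")


def build_meta(rows):
    """Check sample IDs and return one meta dict per row, keyed by "sample".

    Hypothetical helper: `rows` are dicts from a parsed samplesheet, and the
    column name "sampleID" is an assumption (it is not shown in the hunk).
    """
    seen = set()
    metas = []
    for row in rows:
        sample = row["sampleID"]
        # fullmatch also rejects a trailing newline, which `$` alone would allow
        if not SAMPLE_ID_PATTERN.fullmatch(sample):
            raise ValueError(f"invalid sample ID: {sample!r}")
        # mirrors "unique": true in the schema
        if sample in seen:
            raise ValueError(f"duplicate sample ID: {sample!r}")
        seen.add(sample)
        metas.append({"sample": sample})
    return metas
```

For example, `build_meta([{"sampleID": "S1"}])` returns `[{"sample": "S1"}]`, while `1A` (starts with a digit) or `A` (the `+` requires at least two characters) raises `ValueError`.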

assets/schema_multiregion.json (new file: 37 additions & 0 deletions)

```json
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/ampliseq/master/assets/schema_multiregion.json",
    "title": "nf-core/ampliseq pipeline - params.multiregion schema",
    "description": "Schema for the file provided with params.multiregion",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "region": {
                "type": "string",
                "pattern": "^\\S+$",
                "unique": true,
                "errorMessage": "Region name is mandatory, cannot contain spaces, and must be unique",
                "meta": ["region"]
            },
            "region_length": {
                "type": "integer",
                "errorMessage": "Length of region must be an integer",
                "meta": ["region_length"]
            },
            "FW_primer": {
                "type": "string",
                "pattern": "^[ATUGCYRSWKMBDHVN]*$",
                "errorMessage": "FW_primer must be provided and may contain only uppercase nucleotide IUPAC code [ATUGCYRSWKMBDHVN]",
                "meta": ["fw_primer"]
            },
            "RV_primer": {
                "type": "string",
                "pattern": "^[ATUGCYRSWKMBDHVN]*$",
                "errorMessage": "RV_primer must be provided and may contain only uppercase nucleotide IUPAC code [ATUGCYRSWKMBDHVN]",
                "meta": ["rv_primer"]
            }
        },
        "required": ["region", "region_length", "FW_primer", "RV_primer"]
    }
}
```
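
The `params.multiregion` schema requires four columns per region and maps them to lower-case meta keys. The sketch below mirrors those checks by hand in Python so the rules are easy to read at a glance. Several points are assumptions rather than facts from the PR: the sheet is parsed as tab-separated with a header row (pass `delimiter=","` for CSV), an empty cell is treated as "not provided", and `parse_multiregion` is a hypothetical helper, not part of the pipeline.

```python
import csv
import io
import re

# Rules mirrored by hand from assets/schema_multiregion.json.
REGION_PATTERN = re.compile(r"\S+")                   # no whitespace
PRIMER_PATTERN = re.compile(r"[ATUGCYRSWKMBDHVN]*")  # uppercase IUPAC codes only
REQUIRED = ("region", "region_length", "FW_primer", "RV_primer")


def parse_multiregion(text, delimiter="\t"):
    """Validate a multiregion sheet and return meta-style dicts (hypothetical helper)."""
    metas = []
    seen = set()
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Assumption: an empty cell counts as "not provided".
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            raise ValueError(f"line {line_no}: missing {', '.join(missing)}")

        region = row["region"]
        if not REGION_PATTERN.fullmatch(region) or region in seen:
            raise ValueError(f"line {line_no}: region must be unique and contain no spaces")
        seen.add(region)

        try:
            region_length = int(row["region_length"])
        except ValueError:
            raise ValueError(f"line {line_no}: region_length must be an integer") from None

        for col in ("FW_primer", "RV_primer"):
            if not PRIMER_PATTERN.fullmatch(row[col]):
                raise ValueError(f"line {line_no}: {col} may only contain [ATUGCYRSWKMBDHVN]")

        # Lower-case keys follow the "meta" entries in the schema.
        metas.append({
            "region": region,
            "region_length": region_length,
            "fw_primer": row["FW_primer"],
            "rv_primer": row["RV_primer"],
        })
    return metas
```

Raising on the first bad row keeps the sketch short; a real validator would more likely collect every error before reporting.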

bin/taxref_reformat_sidle.sh (new file: 28 additions & 0 deletions)

```sh
#!/bin/sh

# Dereplication level that appears in the reference file names (first argument)
derep="$1"

# Extract the reference archive, expected as database.tar.gz in the working directory
tar xzf database.tar.gz

# Greengenes 13_8
if [ -d "gg_13_8_otus" ]; then
    mv "gg_13_8_otus/rep_set/${derep}_otus.fasta" "gg_13_8_otus_rep_set_${derep}_otus.seq.fasta"
    mv "gg_13_8_otus/rep_set_aligned/${derep}_otus.fasta" "gg_13_8_otus_rep_set_aligned_${derep}_otus.alnseq.fasta"
    mv "gg_13_8_otus/taxonomy/${derep}_otu_taxonomy.txt" "gg_13_8_otus_taxonomy_${derep}_otu_taxonomy.tax.txt"
    # remove uncompressed folder
    rm -r gg_13_8_otus
# SILVA 128
elif [ -d "SILVA_128_QIIME_release" ]; then
    mv "SILVA_128_QIIME_release/rep_set/rep_set_all/${derep}/${derep}_otus.fasta" "SILVA_128_QIIME_release_rep_set_all_${derep}_otus.seq.fasta"
    # the aligned SILVA sequences are gzipped, so decompress them straight into the target file
    gunzip -c "SILVA_128_QIIME_release/rep_set_aligned/${derep}/${derep}_otus_aligned.fasta.gz" > "SILVA_128_QIIME_release_rep_set_aligned_${derep}_otus_aligned.alnseq.fasta"
    mv "SILVA_128_QIIME_release/taxonomy/taxonomy_all/${derep}/consensus_taxonomy_7_levels.txt" "SILVA_128_QIIME_release_taxonomy_all_${derep}_consensus_taxonomy_7_levels.tax.txt"
    # remove uncompressed folder
    rm -r SILVA_128_QIIME_release
else
    # fail loudly instead of exiting 0 without producing any files
    echo "No expected directory detected" >&2
    exit 1
fi
```
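
The script flattens three files from the extracted archive into the working directory and tags them with the suffixes `.seq.fasta`, `.alnseq.fasta` and `.tax.txt`. Writing the renames out as a lookup table, as in this Python sketch, makes the naming scheme easier to check. `sidle_reference_files` is a hypothetical helper that only computes names: it does no file I/O and, unlike the script, does not decompress the gzipped SILVA alignment.

```python
def sidle_reference_files(top_dir, derep):
    """Map source paths inside the extracted archive to the flat file names
    written by bin/taxref_reformat_sidle.sh (names only, no file I/O).

    Hypothetical helper; `top_dir` is the directory name the script tests for.
    """
    if top_dir == "gg_13_8_otus":
        base = "gg_13_8_otus"
        return {
            f"{base}/rep_set/{derep}_otus.fasta":
                f"{base}_rep_set_{derep}_otus.seq.fasta",
            f"{base}/rep_set_aligned/{derep}_otus.fasta":
                f"{base}_rep_set_aligned_{derep}_otus.alnseq.fasta",
            f"{base}/taxonomy/{derep}_otu_taxonomy.txt":
                f"{base}_taxonomy_{derep}_otu_taxonomy.tax.txt",
        }
    if top_dir == "SILVA_128_QIIME_release":
        base = "SILVA_128_QIIME_release"
        return {
            f"{base}/rep_set/rep_set_all/{derep}/{derep}_otus.fasta":
                f"{base}_rep_set_all_{derep}_otus.seq.fasta",
            # The script gunzips this file while renaming it.
            f"{base}/rep_set_aligned/{derep}/{derep}_otus_aligned.fasta.gz":
                f"{base}_rep_set_aligned_{derep}_otus_aligned.alnseq.fasta",
            f"{base}/taxonomy/taxonomy_all/{derep}/consensus_taxonomy_7_levels.txt":
                f"{base}_taxonomy_all_{derep}_consensus_taxonomy_7_levels.tax.txt",
        }
    raise ValueError(f"No expected directory detected: {top_dir!r}")
```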