Dev #615

Merged
merged 10 commits into from
Aug 11, 2023
Changes from 2 commits
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -109,6 +109,10 @@

> Jari Oksanen, F. Guillaume Blanchet, Michael Friendly, Roeland Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, R. B. O’Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens, Eduard Szoecs, and Helene Wagner. vegan: Community Ecology Package. 2018. R package version 2.5-3.

- [Phyloseq](https://doi.org/10.1371/journal.pone.0061217)

> McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217.

### Non-default tools

- [ITSx](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12073)
1 change: 1 addition & 0 deletions README.md
@@ -40,6 +40,7 @@ By default, the pipeline currently performs the following:
- Taxonomical classification using DADA2, [SINTAX](https://doi.org/10.1101/074161) or [QIIME2](https://www.nature.com/articles/s41587-019-0209-9)
- Excludes unwanted taxa, produces absolute and relative feature/taxa count tables and plots, plots alpha rarefaction curves, computes alpha and beta diversity indices and plots thereof ([QIIME2](https://www.nature.com/articles/s41587-019-0209-9))
- Calls differentially abundant taxa ([ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277))
- Creates phyloseq R objects ([Phyloseq](https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html))
- Overall pipeline run summaries ([MultiQC](https://multiqc.info/))

## Usage
32 changes: 32 additions & 0 deletions bin/reformat_tax_for_phyloseq.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python3

import pandas as pd
import sys

tax_file = sys.argv[1]
out_file = sys.argv[2]

# Import tsv file
tax_df = pd.read_csv(tax_file, sep="\t")

# The second column should hold the taxonomy information
tax_col = tax_df.columns[1]

# Split the values in the tax column
split_tax = tax_df[tax_col].str.split(';', expand=True)

# Assign names to the new columns with an auto incrementing integer
new_col_names = [f'{tax_col}_{i+1}' for i in range(split_tax.shape[1])]
split_tax.columns = new_col_names

# Strip whitespace from the tax names
split_tax = split_tax.applymap(lambda x: x.strip() if isinstance(x, str) else x)

# Drop the original tax column
tax_df = tax_df.drop(columns=[tax_col])

# Add the new tax columns to the df
result = pd.concat([tax_df, split_tax], axis=1)

# Create new tsv file
result.to_csv(out_file, sep='\t', index=False)
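The effect of the script above can be checked on a small in-memory table. The following is a minimal sketch of the same split-and-strip logic; the function name `split_taxonomy`, the column names, and the example rows are hypothetical and chosen only for illustration. It uses a per-column `str.strip()` rather than `applymap`, which is an equivalent substitution for string columns and not the script's own call.

```python
import pandas as pd


def split_taxonomy(tax_df: pd.DataFrame) -> pd.DataFrame:
    """Split the second column on ';' into numbered columns (hypothetical helper)."""
    # As in the script, the second column is assumed to hold the taxonomy string
    tax_col = tax_df.columns[1]

    # One new column per taxonomic level; shorter lineages are padded with missing values
    split_tax = tax_df[tax_col].str.split(";", expand=True)
    split_tax.columns = [f"{tax_col}_{i + 1}" for i in range(split_tax.shape[1])]

    # Strip surrounding whitespace from every level name
    split_tax = split_tax.apply(lambda col: col.str.strip())

    # Replace the original taxonomy column with the split columns
    return pd.concat([tax_df.drop(columns=[tax_col]), split_tax], axis=1)


# Made-up example rows, not pipeline output
example = pd.DataFrame(
    {
        "ASV_ID": ["asv1", "asv2"],
        "Taxon": ["Bacteria; Firmicutes; Bacilli", "Bacteria; Proteobacteria"],
        "confidence": [0.99, 0.87],
    }
)
result = split_taxonomy(example)
print(result)
```

Note that the split columns are appended after the remaining original columns, so any column that followed the taxonomy column (here `confidence`) ends up before the taxonomy levels.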
8 changes: 8 additions & 0 deletions conf/modules.config
@@ -785,6 +785,14 @@ process {
]
}

withName: 'PHYLOSEQ' {
publishDir = [
path: { "${params.outdir}/phyloseq" },
mode: params.publish_dir_mode,
pattern: "*.rds"
]
}

withName: CUSTOM_DUMPSOFTWAREVERSIONS {
publishDir = [
path: { "${params.outdir}/pipeline_info" },
13 changes: 13 additions & 0 deletions docs/output.md
@@ -41,6 +41,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Diversity analysis](#diversity-analysis) - High level overview with different diversity indices
- [ANCOM](#ancom) - Differential abundance analysis
- [PICRUSt2](#picrust2) - Predict the functional potential of a bacterial community
- [Phyloseq](#phyloseq) - Phyloseq R objects
- [Read count report](#read-count-report) - Report of read counts during various steps of the pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -518,6 +519,18 @@ Most of the fields in the template will not be populated by the export process,

</details>

### Phyloseq

This directory will hold phyloseq objects for each taxonomy table produced by this pipeline. The objects will contain an ASV abundance table and a taxonomy table. If the pipeline is provided with metadata, that metadata will also be included in the phyloseq object. A phylogenetic tree will also be included if the pipeline produces a tree.

<details markdown="1">
<summary>Output files</summary>

- `phyloseq/`
- `<taxonomy>_phyloseq.rds`: Phyloseq R object.

</details>

## Read count report

This report includes information on how many reads per sample passed each pipeline step in which a loss can occur. Specifically, how many read pairs entered cutadapt, were reverse complemented, passed trimming; how many read pairs entered DADA2, were denoised, merged and non-chimeric; and how many counts were lost during excluding unwanted taxa and removing low abundance/prevalence sequences in QIIME2.
59 changes: 59 additions & 0 deletions modules/local/phyloseq.nf
@@ -0,0 +1,59 @@
process PHYLOSEQ {
tag "$prefix"
label 'process_low'

conda "bioconda::bioconductor-phyloseq=1.44.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bioconductor-phyloseq:1.44.0--r43hdfd78af_0' :
'quay.io/biocontainers/bioconductor-phyloseq:1.44.0--r43hdfd78af_0' }"

input:
tuple val(prefix), path(tax_tsv)
path otu_tsv
path sam_tsv
path tree

output:
tuple val(prefix), path("*phyloseq.rds"), emit: rds
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def sam_tsv = "\"${sam_tsv}\""
def otu_tsv = "\"${otu_tsv}\""
def tax_tsv = "\"${tax_tsv}\""
def tree = "\"${tree}\""
def prefix = "\"${prefix}\""
"""
#!/usr/bin/env Rscript

suppressPackageStartupMessages(library(phyloseq))

otu_df <- read.table($otu_tsv, sep="\\t", header=TRUE, row.names=1)
tax_df <- read.table($tax_tsv, sep="\\t", header=TRUE, row.names=1)
otu_mat <- as.matrix(otu_df)
tax_mat <- as.matrix(tax_df)

OTU <- otu_table(otu_mat, taxa_are_rows=TRUE)
TAX <- tax_table(tax_mat)
phy_obj <- phyloseq(OTU, TAX)

if (file.exists($sam_tsv)) {
Collaborator:

I am not aware of any example of conditional input that uses file.exists($file), so I am a little skeptical here. It might work fine locally and on GitHub, but we will need to see whether it also works on other systems. It is easy enough to change if it needs to be modified.

Contributor Author:

That's a good point. I did my testing on a Linux server, but I'm hoping it will work on other systems.

sam_df <- read.table($sam_tsv, sep="\\t", header=TRUE, row.names=1)
SAM <- sample_data(sam_df)
phy_obj <- merge_phyloseq(phy_obj, SAM)
}

if (file.exists($tree)) {
TREE <- read_tree($tree)
phy_obj <- merge_phyloseq(phy_obj, TREE)
}

saveRDS(phy_obj, file = paste0($prefix, "_phyloseq.rds"))

# Version information
writeLines(c("\\"${task.process}\\":", paste0(" R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0(" phyloseq: ", packageVersion("phyloseq"))), "versions.yml")
"""
}
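The review discussion above concerns detecting optional inputs through a file-existence check. As a rough illustration of that idea, here is a minimal sketch under the assumption that an omitted optional input reaches the script as an empty string; that assumption is not confirmed by the diff, and the helper name `has_optional_input` is hypothetical. The sketch makes the "nothing was provided" case explicit before looking at the file system.

```python
import os
import tempfile


def has_optional_input(path: str) -> bool:
    """Return True only when a non-empty path points at an existing file (hypothetical helper)."""
    # An empty string is treated as "input not provided" (assumption of this sketch)
    return bool(path) and os.path.isfile(path)


# Omitted input: assumed to arrive as an empty string
print(has_optional_input(""))

# Provided input: a real file on disk
with tempfile.NamedTemporaryFile(suffix=".tsv") as handle:
    print(has_optional_input(handle.name))
```

Checking for the empty string first keeps the result independent of how a given platform resolves an empty path, which is the portability concern raised in the review.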
28 changes: 28 additions & 0 deletions modules/local/phyloseq_inasv.nf
@@ -0,0 +1,28 @@
process PHYLOSEQ_INASV {
label 'process_low'

conda "conda-forge::sed=4.7"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
'nf-core/ubuntu:20.04' }"

input:
path(biom_file)

output:
path( "*.tsv" ) , emit: tsv
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
"""
tail $biom_file -n +2 | sed '1s/#OTU ID/ASV_ID/' > reformat_$biom_file

cat <<-END_VERSIONS > versions.yml
"${task.process}":
bash: \$(bash --version | sed -n 1p | sed 's/GNU bash, version //g')
END_VERSIONS
"""
}
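The shell one-liner in `PHYLOSEQ_INASV` drops the first line of the table and renames the `#OTU ID` header on the new first line. A minimal Python sketch of the same transformation follows; the function name `reformat_biom_tsv` and the sample text are hypothetical, and the sample assumes that the exported table starts with a comment line followed by the `#OTU ID` header.

```python
def reformat_biom_tsv(text: str) -> str:
    """Mimic `tail -n +2 | sed '1s/#OTU ID/ASV_ID/'` on a string (hypothetical helper)."""
    # tail -n +2: keep everything from the second line onward
    lines = text.splitlines(keepends=True)[1:]

    # sed '1s/#OTU ID/ASV_ID/': replace the first match on the new first line only
    if lines:
        lines[0] = lines[0].replace("#OTU ID", "ASV_ID", 1)

    return "".join(lines)


# Made-up sample table, not pipeline output
sample = "# Constructed from biom file\n#OTU ID\tsample1\tsample2\nasv1\t10\t0\n"
print(reformat_biom_tsv(sample))
```

The renamed header matters downstream because the R code reads the table with `row.names=1`, so the first column becomes the row names of the abundance matrix.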
29 changes: 29 additions & 0 deletions modules/local/phyloseq_intax.nf
@@ -0,0 +1,29 @@
process PHYLOSEQ_INTAX {
label 'process_low'

conda "conda-forge::pandas=1.1.5"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/pandas:1.1.5':
'biocontainers/pandas:1.1.5' }"

input:
path(tax_tsv)

output:
path( "*.tsv" ) , emit: tsv
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
"""
reformat_tax_for_phyloseq.py $tax_tsv reformat_$tax_tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(python --version 2>&1 | sed 's/Python //g')
pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
}
3 changes: 2 additions & 1 deletion tests/pipeline/iontorrent.nf.test
@@ -38,7 +38,8 @@ nextflow_pipeline {
{ assert snapshot(path("$outputDir/input/Samplesheet_it_SE_ITS.tsv")).match("input") },
{ assert snapshot(path("$outputDir/multiqc/multiqc_data/multiqc_fastqc.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_general_stats.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") }
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
)
}
}
3 changes: 2 additions & 1 deletion tests/pipeline/multi.nf.test
@@ -63,7 +63,8 @@ nextflow_pipeline {
{ assert new File("$outputDir/qiime2/representative_sequences/filtered-sequences.qza").exists() },
{ assert new File("$outputDir/qiime2/representative_sequences/rep-seq.fasta").exists() },
{ assert snapshot(path("$outputDir/qiime2/representative_sequences/descriptive_stats.tsv"),
path("$outputDir/qiime2/representative_sequences/seven_number_summary.tsv")).match("qiime2") }
path("$outputDir/qiime2/representative_sequences/seven_number_summary.tsv")).match("qiime2") },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
)
}
}
3 changes: 2 additions & 1 deletion tests/pipeline/pacbio_its.nf.test
@@ -52,7 +52,8 @@ nextflow_pipeline {
path("$outputDir/SBDI/emof.tsv"),
path("$outputDir/SBDI/event.tsv")).match("SBDI") },
{ assert new File("$outputDir/SBDI/annotation.tsv").exists() },
{ assert new File("$outputDir/SBDI/asv-table.tsv").exists() }
{ assert new File("$outputDir/SBDI/asv-table.tsv").exists() },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
)
}
}
4 changes: 3 additions & 1 deletion tests/pipeline/pplace.nf.test
@@ -55,7 +55,9 @@ nextflow_pipeline {
{ assert new File("$outputDir/pplace/test_pplace.taxonomy.per_query.tsv").exists() },
{ assert new File("$outputDir/pplace/test_pplace.graft.test_pplace.epa_result.newick").exists() },
{ assert snapshot(path("$outputDir/multiqc/multiqc_data/multiqc_general_stats.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") }
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") },
{ assert new File("$outputDir/phyloseq/pplace_phyloseq.rds").exists() },
{ assert new File("$outputDir/phyloseq/qiime2_phyloseq.rds").exists() }
)
}
}
3 changes: 2 additions & 1 deletion tests/pipeline/reftaxcustom.nf.test
@@ -43,7 +43,8 @@ nextflow_pipeline {
{ assert snapshot(path("$outputDir/input/Samplesheet.tsv")).match("input") },
{ assert snapshot(path("$outputDir/multiqc/multiqc_data/multiqc_fastqc.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_general_stats.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") }
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
)
}
}
3 changes: 2 additions & 1 deletion tests/pipeline/single.nf.test
@@ -44,7 +44,8 @@ nextflow_pipeline {
{ assert snapshot(path("$outputDir/input/Samplesheet_single_end.tsv")).match("input") },
{ assert snapshot(path("$outputDir/multiqc/multiqc_data/multiqc_fastqc.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_general_stats.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") }
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
)
}
}
3 changes: 2 additions & 1 deletion tests/pipeline/sintax.nf.test
@@ -65,7 +65,8 @@ nextflow_pipeline {
{ assert new File("$outputDir/sintax/ASV_tax_sintax.unite-fungi.tsv").exists() },
{ assert new File("$outputDir/sintax/ref_taxonomy_sintax.txt").exists() },
{ assert snapshot(path("$outputDir/multiqc/multiqc_data/multiqc_general_stats.txt"),
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") }
path("$outputDir/multiqc/multiqc_data/multiqc_cutadapt.txt")).match("multiqc") },
{ assert new File("$outputDir/phyloseq/sintax_phyloseq.rds").exists() }
)
}
}
4 changes: 3 additions & 1 deletion tests/pipeline/test.nf.test
@@ -93,7 +93,9 @@ nextflow_pipeline {
path("$outputDir/SBDI/emof.tsv"),
path("$outputDir/SBDI/event.tsv")).match("SBDI") },
{ assert new File("$outputDir/SBDI/annotation.tsv").exists() },
{ assert new File("$outputDir/SBDI/asv-table.tsv").exists() }
{ assert new File("$outputDir/SBDI/asv-table.tsv").exists() },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() },
{ assert new File("$outputDir/phyloseq/qiime2_phyloseq.rds").exists() }
)
}
}
66 changes: 63 additions & 3 deletions workflows/ampliseq.nf
@@ -165,6 +165,10 @@ include { QIIME2_INTAX } from '../modules/local/qiime2_intax'
include { PICRUST } from '../modules/local/picrust'
include { SBDIEXPORT } from '../modules/local/sbdiexport'
include { SBDIEXPORTREANNOTATE } from '../modules/local/sbdiexportreannotate'
include { PHYLOSEQ } from '../modules/local/phyloseq'
include { PHYLOSEQ_INASV } from '../modules/local/phyloseq_inasv'
include { PHYLOSEQ_INTAX as PHYLOSEQ_INTAX_PPLACE } from '../modules/local/phyloseq_intax'
include { PHYLOSEQ_INTAX as PHYLOSEQ_INTAX_QIIME2 } from '../modules/local/phyloseq_intax'

//
// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules
@@ -456,7 +460,7 @@ workflow AMPLISEQ {
}
FASTA_NEWICK_EPANG_GAPPA ( ch_pp_data )
ch_versions = ch_versions.mix( FASTA_NEWICK_EPANG_GAPPA.out.versions )

ch_pplace_tax = FORMAT_PPLACETAX ( FASTA_NEWICK_EPANG_GAPPA.out.taxonomy_per_query ).tsv
} else {
ch_pplace_tax = Channel.empty()
@@ -477,7 +481,7 @@ workflow AMPLISEQ {
ch_qiime_classifier
)
ch_versions = ch_versions.mix( QIIME2_TAXONOMY.out.versions.ifEmpty(null) ) //usually a .first() is here, dont know why this leads here to a warning
}
}

//
// SUBWORKFLOW / MODULES : Downstream analysis with QIIME2
@@ -597,7 +601,7 @@ workflow AMPLISEQ {
tax_agglom_max
)
}
}
}

//
// MODULE: Predict functional potential of a bacterial community from marker genes with Picrust2
@@ -627,6 +631,62 @@ workflow AMPLISEQ {
ch_versions = ch_versions.mix(SBDIEXPORT.out.versions.first())
}

//
// MODULE: Create phyloseq objects
//
if ( !params.skip_taxonomy ) {
if ( params.metadata ) {
ch_phyloseq_inmeta = ch_metadata.first() // The .first() is to make sure it's a value channel
} else {
ch_phyloseq_inmeta = []
}

ch_phyloseq_intax = Channel.empty()
if ( !params.skip_dada_taxonomy ) {
ch_phyloseq_intax = ch_phyloseq_intax.mix (
ch_dada2_tax.map { it = [ "dada2", file(it) ] }
)
}

if ( params.sintax_ref_taxonomy ) {
ch_phyloseq_intax = ch_phyloseq_intax.mix (
ch_sintax_tax.map { it = [ "sintax", file(it) ] }
)
}

if ( params.pplace_tree ) {
ch_phyloseq_intax = ch_phyloseq_intax.mix (
PHYLOSEQ_INTAX_PPLACE (
ch_pplace_tax
).tsv.map { it = [ "pplace", file(it) ] }
)

ch_phyloseq_intree = FASTA_NEWICK_EPANG_GAPPA.out.grafted_phylogeny.map { it = it[1] }.first()
} else {
ch_phyloseq_intree = []
}

if ( run_qiime2 ) {
ch_phyloseq_intax = ch_phyloseq_intax.mix (
PHYLOSEQ_INTAX_QIIME2 (
QIIME2_TAXONOMY.out.tsv
).tsv.map { it = [ "qiime2", file(it) ] }
)

if ( params.exclude_taxa != "none" || params.min_frequency != 1 || params.min_samples != 1 ) {
ch_phyloseq_inasv = PHYLOSEQ_INASV ( QIIME2_FILTERTAXA.out.tsv ).tsv

} else {
ch_phyloseq_inasv = ch_dada2_asv
}
} else {
ch_phyloseq_inasv = ch_dada2_asv
}

PHYLOSEQ ( ch_phyloseq_intax, ch_phyloseq_inasv, ch_phyloseq_inmeta, ch_phyloseq_intree )
ch_versions = ch_versions.mix(PHYLOSEQ.out.versions.first())
}

CUSTOM_DUMPSOFTWAREVERSIONS (
ch_versions.unique().collectFile(name: 'collated_versions.yml')
)
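The branching in the phyloseq block above decides which taxonomy tables become phyloseq objects and which ASV table feeds them. The following minimal sketch restates that decision logic in plain Python as a reading aid. The function name `select_phyloseq_inputs`, the returned labels, and the fallback values used for missing keys are all hypothetical and are not the pipeline's defaults.

```python
from typing import Optional


def select_phyloseq_inputs(params: dict, run_qiime2: bool) -> Optional[dict]:
    """Mirror the workflow's branching for phyloseq inputs (hypothetical helper)."""
    # The whole block is skipped when taxonomy is skipped
    if params.get("skip_taxonomy"):
        return None

    # One phyloseq object is produced per taxonomy source
    tax_sources = []
    if not params.get("skip_dada_taxonomy"):
        tax_sources.append("dada2")
    if params.get("sintax_ref_taxonomy"):
        tax_sources.append("sintax")
    if params.get("pplace_tree"):
        tax_sources.append("pplace")
    if run_qiime2:
        tax_sources.append("qiime2")

    # The QIIME2-filtered table is used only when QIIME2 runs and some filter is active.
    # Missing keys fall back to "no filtering" here; that fallback is an assumption of this sketch.
    filtering_active = (
        params.get("exclude_taxa", "none") != "none"
        or params.get("min_frequency", 1) != 1
        or params.get("min_samples", 1) != 1
    )
    asv_table = "qiime2_filtered" if run_qiime2 and filtering_active else "dada2"

    return {
        "tax_sources": tax_sources,
        "asv_table": asv_table,
        "with_metadata": bool(params.get("metadata")),
        "with_tree": bool(params.get("pplace_tree")),
    }


# Made-up parameter set for illustration
print(select_phyloseq_inputs({"exclude_taxa": "mitochondria", "metadata": "meta.tsv"}, run_qiime2=True))
```

In the workflow itself the tree is only passed along when `params.pplace_tree` is set, and the metadata only when `params.metadata` is set; otherwise an empty list is handed to the process.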