Home
This Wiki describes in detail how to use Tourmaline. Navigate using the sidebar on the right.
Tourmaline is an amplicon sequence processing workflow for Illumina sequence data that uses QIIME 2 and the software packages it wraps. Tourmaline manages commands, inputs, and outputs using the Snakemake workflow management system.
- QIIME 2. The core commands of Tourmaline are all commands of QIIME 2, one of the most popular amplicon sequence analysis software tools available. You can print all of the QIIME 2 and other shell commands of your workflow before or while running the workflow.
- Configuration file. Parameters are set in a single configuration file, so it's easy to see what your workflow is doing and make changes for a subsequent run.
- Directory structure. Every Tourmaline run produces the same directory structure, so you always know where your outputs are. Analyze multiple outputs programmatically using the R or Python tools of your choice, such as the Tourmaline Toolkit notebooks included in this repository.
- Parameter optimization. The configuration file and standard directory structure make it simple to test and compare different parameter sets to optimize your workflow.
- Reports. Every Tourmaline run produces an HTML report containing a summary of your metadata and outputs, with links to web-viewable QIIME 2 visualization files.
Ready to get started? If this is your first time using Tourmaline or Snakemake, you may want to browse the Wiki pages on the right. If you want to get started right away, check out the Quick Start below.
Tourmaline provides Snakemake rules for DADA2 (single-end and paired-end) and Deblur (single-end). For each type of processing, the "denoise" rule imports data and runs denoising; the "diversity" rule does representative sequence curation, core diversity analyses, and alpha and beta group significance; and the "report" rule generates the QC report.
Start by cloning the Tourmaline directory and files:
git clone https://github.com/aomlomics/tourmaline.git
If this is your first time running Tourmaline, you'll need to set up your directory. See the Wiki's Setup page for instructions. Briefly, to process the Test data:
- Put reference database taxonomy and FASTA (as imported QIIME 2 archives) in 01-imported.
- Edit FASTQ manifests manifest_se.csv and manifest_pe.csv in 00-data so file paths match the location of your tourmaline directory.
- Create a symbolic link from Snakefile_mac or Snakefile_linux (depending on your system) to Snakefile.
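One way to update the manifest paths for the Test data is a prefix substitution with sed. This is a minimal sketch, not part of Tourmaline: the function name rewrite_manifest_prefix and the example old path are hypothetical, and it assumes the manifest is a CSV whose second column holds the absolute file path.

```shell
# Rewrite the leading directory of every file path in a FASTQ manifest.
# Usage: rewrite_manifest_prefix MANIFEST OLD_PREFIX NEW_PREFIX
# Assumption: paths sit in a comma-separated column and start with OLD_PREFIX/.
rewrite_manifest_prefix() {
    manifest=$1 old=$2 new=$3
    tmp=$(mktemp) || return 1
    # "|" is the sed delimiter so slashes in paths need no escaping.
    # Assumes neither prefix contains "|", "&", or a backslash.
    if sed "s|,${old}/|,${new}/|" "$manifest" > "$tmp"; then
        cat "$tmp" > "$manifest"   # overwrite in place, keeping permissions
    fi
    rm -f "$tmp"
}

# Example (hypothetical old path), run from the tourmaline directory:
# rewrite_manifest_prefix 00-data/manifest_pe.csv /old/path/tourmaline "$PWD"
```

Opening the manifest afterwards to confirm the paths look right is still worthwhile, since the substitution only touches lines that contain the old prefix.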
Or to process Your data:
- Put reference database taxonomy and FASTA files in 00-data or imported QIIME 2 archives in 01-imported.
- Edit FASTQ manifests manifest_se.csv and manifest_pe.csv so file paths point to your .fastq.gz files (they can be anywhere on your computer) and sample names match the metadata file.
- Edit metadata file metadata.tsv to contain your sample names and any relevant metadata for your samples.
- Edit configuration file config.yaml to change PCR locus/primers, DADA2/Deblur parameters, and rarefaction depth.
- Create a symbolic link from Snakefile_mac or Snakefile_linux (depending on your system) to Snakefile.
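The symbolic link step can be scripted so that the right Snakefile variant is chosen from the operating system name. The helper below is an illustrative sketch, not part of Tourmaline; link_snakefile is a hypothetical name, and it assumes that anything other than macOS (reported as Darwin by uname) should use Snakefile_linux.

```shell
# Link the OS-appropriate Snakefile variant to "Snakefile" inside DIR.
# Usage: link_snakefile [DIR]   (DIR defaults to the current directory)
link_snakefile() {
    dir=${1:-.}
    case "$(uname -s)" in
        Darwin) variant=Snakefile_mac ;;
        *)      variant=Snakefile_linux ;;   # assumption: everything else is Linux
    esac
    [ -f "$dir/$variant" ] || { echo "missing $dir/$variant" >&2; return 1; }
    # -s makes a symbolic link; -f replaces an existing Snakefile link.
    # The link target is relative, so it survives renaming the directory.
    (cd "$dir" && ln -sf "$variant" Snakefile)
}

# Example, run from the tourmaline directory:
# link_snakefile
```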
If you've run Tourmaline on your dataset before, you can initialize a new Tourmaline directory with the files and symlinks of an existing one using the command below:
cd /PATH/TO/NEW/TOURMALINE
scripts/initialize_dir_from_existing_tourmaline_dir.sh /PATH/TO/EXISTING/TOURMALINE
# then make any changes to your configuration before running
Shown here is the DADA2 paired-end workflow. From the tourmaline directory (which you may rename), run Snakemake with the "denoise" rule as the target:
snakemake dada2_pe_denoise
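To see which QIIME 2 and other shell commands a target would run before committing to it, Snakemake's dry-run mode can be combined with command printing. The wrapper name preview_rule below is hypothetical; the -n and -p options are Snakemake's short flags for a dry run and for printing shell commands.

```shell
# Preview a Tourmaline target without executing anything.
# -n: dry run (list the jobs that would run)
# -p: print the shell command of each job
# Usage: preview_rule TARGET
preview_rule() {
    snakemake -n -p "$1"
}

# Example:
# preview_rule dada2_pe_denoise
```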
Pausing after the "denoise" step allows you to make changes before proceeding:
- Check the table summaries and representative sequence lengths to determine if DADA2 or Deblur parameters need to be modified. If so, you can rename or delete the output directories and then rerun the "denoise" rule.
- View the table visualization to decide on an appropriate subsampling (rarefaction) depth. Then modify the parameters "alpha_max_depth" and "core_sampling_depth" in config.yaml.
- Filter your biom table and representative sequences to remove unwanted sequences. For example, if your amplicon is 16S rRNA, you may want to filter out chloroplast/mitochondria sequences. Keep the same filenames so that Snakemake will recognize them; you can save the old versions under different names if you don't want to overwrite them.
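If you change the depth parameters often, the edit to config.yaml can be scripted. The sketch below assumes each parameter appears in config.yaml as a top-level "key: value" line, which may not match your copy of the file; set_config_value is a hypothetical helper and the depth of 1000 in the example is a placeholder, not a recommendation.

```shell
# Set a top-level "key: value" line in a YAML file.
# Usage: set_config_value FILE KEY VALUE
# Assumption: KEY starts at the beginning of a line; any trailing
# comment on that line is dropped by the replacement.
set_config_value() {
    file=$1 key=$2 value=$3
    grep -q "^${key}:" "$file" || { echo "key not found: $key" >&2; return 1; }
    tmp=$(mktemp) || return 1
    if sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # overwrite in place, keeping permissions
    fi
    rm -f "$tmp"
}

# Example (placeholder depth):
# set_config_value config.yaml alpha_max_depth 1000
# set_config_value config.yaml core_sampling_depth 1000
```

Editing the file by hand works just as well; the helper is mainly useful when comparing several parameter sets across runs.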
After you are satisfied with your parameters and files, run the "diversity" rule:
snakemake dada2_pe_diversity
Finally, run the "report" rule:
snakemake dada2_pe_report
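Once parameters are settled and no pause between steps is needed, the three rules can be chained so that a failure stops the sequence. This is a sketch only: run_all_steps is a hypothetical helper, and the method prefixes other than dada2_pe (for example dada2_se or deblur_se) are assumed to follow the same naming pattern.

```shell
# Run the denoise, diversity, and report rules for one method,
# stopping at the first rule that fails.
# Usage: run_all_steps METHOD   (e.g. dada2_pe)
run_all_steps() {
    method=$1
    for step in denoise diversity report; do
        echo "== snakemake ${method}_${step}"
        snakemake "${method}_${step}" || return 1
    done
}

# Example:
# run_all_steps dada2_pe
```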
If any of the above commands fail, read the error messages carefully to work out what went wrong, then fix the offending file. A common cause is file paths in your FASTQ manifest file that need to be updated.
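A quick way to rule out stale manifest paths is to check that every listed file exists. The function below is a minimal sketch with a hypothetical name, check_manifest; it assumes a CSV manifest with one header row, the sample name in the first column, and the file path in the second.

```shell
# Report manifest entries whose FASTQ file does not exist.
# Usage: check_manifest MANIFEST   (returns non-zero if any file is missing)
check_manifest() {
    missing=0
    {
        read -r _header   # skip the header row
        # "|| [ -n ... ]" also handles a last line without a trailing newline.
        while IFS=, read -r sample path _rest || [ -n "$sample" ]; do
            [ -n "$path" ] || continue   # ignore blank lines
            if [ ! -f "$path" ]; then
                echo "missing file for sample $sample: $path" >&2
                missing=1
            fi
        done
    } < "$1"
    return "$missing"
}

# Example, run from the tourmaline directory:
# check_manifest 00-data/manifest_pe.csv
```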
If you want to make a fresh run and not save the previous output, simply delete the output directories (e.g., 02-output-{method} and 03-report) generated in the previous run.
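If you would rather keep the previous output than delete it, the directories can be renamed with a timestamp instead. The helper archive_dir below is a hypothetical sketch; pass it whichever output directories your run produced.

```shell
# Rename each DIR to DIR.TIMESTAMP so the next run starts clean
# while the old output is kept.
# Usage: archive_dir DIR [DIR...]
archive_dir() {
    stamp=$(date +%Y%m%d-%H%M%S)
    for dir in "$@"; do
        [ -d "$dir" ] || continue        # nothing to archive
        mv "$dir" "${dir%/}.${stamp}"    # strip a trailing slash before renaming
    done
}

# Example:
# archive_dir 03-report
```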