Commit 686beb6

committed
metagenome
1 parent 02a9503 commit 686beb6

3 files changed

+43
-3
lines changed

README.md

Lines changed: 9 additions & 3 deletions
@@ -30,7 +30,8 @@ Additionally, I would recommend reading the following guides to bacterial genome
3030
- [Documentation](#documentation)
3131
- [Method](#method)
3232
- [Other Features](#other-features)
33-
- [Quality Control](#quality-control)
33+
- [Quality Control](#quality-control)
34+
- [Metagenomes](#metagenomes)
3435
- [Installation](#installation)
3536
- [Conda](#conda)
3637
- [Pip](#pip)
@@ -152,13 +153,18 @@ Please see [here](docs/multiple_chromosomes.md) for more details and an example.
152153
* However, if you think you have reasonably simple metagenomic samples with long reads, in theory `plassembler` could be used.
153154
* We tested `plassembler` on the R10.4 sequenced ZymoBIOMICS HMW Standard sequenced in this [paper](https://www.nature.com/articles/s41592-022-01539-7), and it successfully recovered the two plasmids present in the 7 strains present in that mock community.
154155

155-
156-
# Quality Control
156+
## Quality Control
157157

158158
* `plassembler` can also be used for quality control to test whether your long and short read sets come from the same isolate, even within the same species.
159159

160160
Please see [here](docs/quality_control.md) for more details and some examples.
161161

162+
## Metagenomes
163+
164+
* `plassembler` is not currently recommended for metagenomic datasets: their high diversity makes it difficult to recover chromosome-length contigs for bacteria. Additionally, Unicycler is not recommended for metagenomes. However, `plassembler` was tested on a simple, high-depth mock community dataset. It worked well, but we don't anticipate it will work as well on your data!
165+
166+
Please see [here](docs/metagenomics.md) for more details.
167+
162168
## Installation
163169

164170
Plassembler has been tested on Linux and MacOS machines.

docs/metagenomics.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# Metagenomics
2+
3+
We do not currently recommend `plassembler` for metagenomic datasets. Their high diversity makes it difficult to recover chromosome-length contigs for bacteria. Additionally, Unicycler (a core dependency of Plassembler) is not recommended for metagenomes.
4+
5+
However, we anticipate that as sequencing becomes more accurate and cheaper, it will be increasingly possible to assemble plasmids from metagenomes using a `plassembler`-like approach. This is a work in progress.
6+
7+
As a test, we tried assembling the ZymoBIOMICS HMW DNA Standard dataset from this [paper](https://www.nature.com/articles/s41592-022-01539-7), available under ENA accession PRJEB48692. This mock community contains 7 bacterial isolates and 1 fungal isolate. Notably, this dataset has extremely deep (all bacterial chromosomes >100x coverage) and long (N50 > 20 kbp) reads, so it is unlikely to reflect your real-world metagenomic data as of 2023.
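To judge whether your own long reads resemble this dataset, you can compute their read-length N50 and compare it with the figure quoted above. The following is a minimal sketch, not part of `plassembler` or `fastq-dl`: the function name `n50` is hypothetical, and it assumes uncompressed four-line FASTQ records arriving on standard input.

```shell
# n50 (hypothetical helper): print the N50 of read lengths from a FASTQ stream.
# Assumes four-line FASTQ records on standard input.
n50() {
  # Line 2 of every 4-line record is the sequence; emit its length.
  awk 'NR % 4 == 2 { print length($0) }' |
    sort -rn |
    awk '{ len[NR] = $1; total += $1 }
         END {
           half = total / 2
           # Walk from the longest read down until half of all bases are covered.
           for (i = 1; i <= NR; i++) {
             run += len[i]
             if (run >= half) { print len[i]; exit }
           }
         }'
}

# Usage with a gzipped read set (file name taken from the run command in this document):
# gzip -dc ERR7287988.fastq.gz | n50
```

An N50 well below 20 kbp would be one sign that your data are shallower or shorter than the mock community tested here.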
8+
9+
## Get Data
10+
11+
```
12+
# installation
13+
mamba create -n fastq-dl fastq-dl
14+
conda activate fastq-dl
15+
16+
# downloads all the read sets
17+
fastq-dl PRJEB48692
18+
19+
conda deactivate
20+
```
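Because the assembly step is long-running, it may be worth confirming that the downloaded read files are present and non-empty before starting it. A minimal sketch follows; `check_reads` is a hypothetical helper, not a `fastq-dl` or `plassembler` feature, and the file names in the usage comment are the ones passed to the `plassembler run` command in this document.

```shell
# check_reads (hypothetical helper): succeed only if every listed file
# exists and is non-empty; report each problem file on stderr.
check_reads() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
# check_reads ERR7287988.fastq.gz ERR7255689_1.fastq.gz ERR7255689_2.fastq.gz
```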
21+
22+
## Run Plassembler
23+
24+
We used `-m 10000` (minimum long-read length) because we figured that small plasmids would be missed by Flye anyway and we wanted complete chromosome assemblies. We also set `-c 500000` (lower bound on chromosome length). We used 32 threads on 16 cores and allocated 80 GB of RAM.
25+
26+
```
27+
plassembler run -d Plassembler_DB -l ERR7287988.fastq.gz -1 ERR7255689_1.fastq.gz -2 ERR7255689_2.fastq.gz \
28+
-f -t 32 -q 10 -o zymo_R10.4_flye -m 10000 -c 500000
29+
```
30+
31+
`plassembler` took around 8 hours (wall clock) to finish. We assembled all 7 bacterial chromosomes using Flye (unsurprising!), along with the 5 plasmids indicated in the ground truth (1 _E. coli_ 100 kbp, 1 _S. enterica_ 49 kbp and 3 small _S. aureus_ plasmids of 6, 2 and 2 kbp), with a genome fraction of 100% according to QUAST.
32+
33+
So, in theory, `plassembler` might work on metagenomes, but we would caution against using it for now.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -27,4 +27,5 @@ nav:
2727
- OTHER FEATURES:
2828
- Quality Control: quality_control.md
2929
- Multiple Chromosomes: multiple_chromosome.md
30+
- Metagenomes: metagenomics.md
3031
