
metagenome
gbouras13 committed Jun 2, 2023
1 parent 02a9503 commit 686beb6
Showing 3 changed files with 43 additions and 3 deletions.
12 changes: 9 additions & 3 deletions README.md
@@ -30,7 +30,8 @@ Additionally, I would recommend reading the following guides to bacterial genome
- [Documentation](#documentation)
- [Method](#method)
- [Other Features](#other-features)
- [Quality Control](#quality-control)
- [Metagenomes](#metagenomes)
- [Installation](#installation)
- [Conda](#conda)
- [Pip](#pip)
@@ -152,13 +153,18 @@ Please see [here](docs/multiple_chromosomes.md) for more details and an example.
* However, if you think you have reasonably simple metagenomic samples with long reads, `plassembler` could in theory be used.
* We tested `plassembler` on the ZymoBIOMICS HMW DNA Standard sequenced with R10.4 chemistry in this [paper](https://www.nature.com/articles/s41592-022-01539-7), and it successfully recovered the plasmids present in the 7 bacterial strains of that mock community.


## Quality Control

* `plassembler` can also be used for quality control to test whether your long and short read sets come from the same isolate, even within the same species.

Please see [here](docs/quality_control.md) for more details and some examples.

## Metagenomes

* `plassembler` is not currently recommended for metagenomic datasets, because their high diversity makes it difficult to recover chromosome-length contigs for bacteria. Additionally, Unicycler is not recommended for metagenomes. However, `plassembler` was tested on a simple, high-depth mock community dataset. It worked well there, but we don't anticipate it will work as well on your data!

Please see [here](docs/metagenomics.md) for more details.

## Installation

Plassembler has been tested on Linux and macOS machines.
33 changes: 33 additions & 0 deletions docs/metagenomics.md
@@ -0,0 +1,33 @@
##### Metagenomics

At present, we do not recommend `plassembler` for metagenomic samples, because their high diversity makes it difficult to recover chromosome-length contigs for bacteria. Additionally, Unicycler (a core dependency of Plassembler) is not recommended for metagenomes.

However, we anticipate that as sequencing becomes more accurate and cheaper, it will become increasingly possible to assemble plasmids from metagenomes using a `plassembler`-like approach. This is a work in progress.

So, as a test, we tried assembling the ZYMO HMW DNA Standard dataset from this [paper](https://www.nature.com/articles/s41592-022-01539-7) (ENA accession PRJEB48692). This mock community contains 7 bacterial isolates and 1 fungal isolate. Notably, this dataset has extremely deep (all bacterial chromosomes at >100x coverage) and long (N50 > 20 kbp) reads, so it is unlikely to reflect your real-world metagenomic data as of 2023.
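The read N50 quoted above is a quick way to judge how closely your own long reads resemble this dataset. Below is a minimal shell sketch for computing it; `read_n50` is a hypothetical helper (not part of `plassembler`) and assumes standard 4-line FASTQ records, gzipped or not.

```
# Hypothetical helper (not part of plassembler): print the read N50 of a FASTQ
# file. N50 is the length L such that reads of length >= L contain at least
# half of all sequenced bases. Assumes 4-line FASTQ records (no wrapped lines).
read_n50() {
    # Decompress if the file name ends in .gz, otherwise read it as-is.
    case "$1" in
        *.gz) gzip -dc -- "$1" ;;
        *)    cat -- "$1" ;;
    esac \
        | awk 'NR % 4 == 2 { print length($0) }' \
        | sort -rn \
        | awk '{ len[NR] = $1; total += $1 }
               END {
                   # Walk reads from longest to shortest until half the bases are covered.
                   half = total / 2; running = 0
                   for (i = 1; i <= NR; i++) {
                       running += len[i]
                       if (running >= half) { print len[i]; exit }
                   }
               }'
}
```

For example, `read_n50 ERR7287988.fastq.gz` reports the N50 of the long-read set used in the run below.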

## Get Data

```
# installation
mamba create -n fastq-dl -c conda-forge -c bioconda fastq-dl
conda activate fastq-dl
# downloads all the read sets
fastq-dl PRJEB48692
conda deactivate
```

## Run Plassembler

We decided to use `-m 10000` (a minimum long-read length of 10 kbp), because we figured that small plasmids would be missed by Flye anyway, and we wanted complete chromosome assemblies. We also set `-c 500000` (the approximate lower bound on chromosome length, in bp). We used 32 threads on 16 cores and allocated 80 GB of RAM.
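To get a feel for how much data a cut-off like `-m 10000` discards, here is a rough shell sketch. `length_filter_summary` is a hypothetical helper, not part of `plassembler`: it applies only the length cut-off (ignoring the `-q` quality filter), assumes reads at least as long as the cut-off are kept, and assumes 4-line FASTQ records, so treat its numbers as an approximation of what `plassembler` actually keeps.

```
# Hypothetical helper (not part of plassembler): summarise how many reads and
# bases survive a minimum read-length cut-off. Ignores quality filtering, so
# this only approximates plassembler's own read filtering.
# Usage: length_filter_summary <reads.fastq[.gz]> <min_length>
length_filter_summary() {
    case "$1" in
        *.gz) gzip -dc -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk -v min="$2" '
        # Sequence lines are the 2nd line of every 4-line FASTQ record.
        NR % 4 == 2 {
            n = length($0)
            total_reads++; total_bases += n
            if (n >= min) { kept_reads++; kept_bases += n }
        }
        END {
            pct = (total_bases > 0) ? 100 * kept_bases / total_bases : 0
            # %.0f rather than %d avoids integer overflow in some awks on large read sets.
            printf "reads kept: %.0f/%.0f, bases kept: %.0f/%.0f (%.1f%%)\n",
                   kept_reads, total_reads, kept_bases, total_bases, pct
        }'
}
```

For example: `length_filter_summary ERR7287988.fastq.gz 10000`.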

```
plassembler run -d Plassembler_DB -l ERR7287988.fastq.gz -1 ERR7255689_1.fastq.gz -2 ERR7255689_2.fastq.gz \
-f -t 32 -q 10 -o zymo_R10.4_flye -m 10000 -c 500000
```

`plassembler` took around 8 hours (wall clock) to finish. Flye assembled all 7 bacterial chromosomes (unsurprisingly!) and, excitingly, we also recovered all 5 plasmids indicated in the ground truth (one 100 kbp _E. coli_ plasmid, one 49 kbp _S. enterica_ plasmid and three small _S. aureus_ plasmids of 6, 2 and 2 kbp), each with a genome fraction of 100% according to QUAST.
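To compare assembled plasmid sizes against the ground truth listed above, a minimal sketch that prints each contig's length from a FASTA file may help. `fasta_lengths` is a hypothetical helper, and the output file name in the usage note is an assumption; check your own output directory for the plasmid FASTA.

```
# Hypothetical helper: print "<contig id><TAB><length>" for every record in a
# FASTA file, handling sequences that are wrapped over several lines.
fasta_lengths() {
    awk '
        /^>/ {
            # Emit the previous record before starting a new one.
            if (id != "") print id "\t" len
            id = substr($1, 2); len = 0
            next
        }
        { len += length($0) }
        END { if (id != "") print id "\t" len }
    ' "$1"
}
```

For example: `fasta_lengths zymo_R10.4_flye/plassembler_plasmids.fasta` (file name assumed).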

So in theory `plassembler` might work on metagenomes, but I would caution against using it, for now.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -27,4 +27,5 @@ nav:
- OTHER FEATURES:
- Quality Control: quality_control.md
- Multiple Chromosomes: multiple_chromosome.md
- Metagenomes: metagenomics.md
