List of ideas to improve assemblies #57
Comments
Working on Flye and Pilon! |
Filtlong can downsample to the longest/highest-quality reads, and rasusa can downsample randomly. I know there are more papers about the ideal depth for assembly, but I can only find this old one for now (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0060204). In my own experience, there are a lot more sequencing artifacts once you get above 100X. |
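To make the depth discussion concrete, here is a minimal Python sketch of the arithmetic behind a downsampling step. It is not how Filtlong or rasusa work internally; the function names and the example numbers (1.5 Gbp of reads, a 5 Mbp genome, a 100X target) are assumptions for illustration only.

```python
def coverage_depth(total_bases: int, genome_size: int) -> float:
    """Mean depth = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def keep_fraction(total_bases: int, genome_size: int, target_depth: float) -> float:
    """Fraction of bases to retain to reach target_depth (1.0 if already at or below it)."""
    depth = coverage_depth(total_bases, genome_size)
    if depth <= target_depth:
        return 1.0
    return target_depth / depth


# Hypothetical run: 1.5 Gbp of reads for a 5 Mbp genome, downsampled to 100X.
depth = coverage_depth(1_500_000_000, 5_000_000)
fraction = keep_fraction(1_500_000_000, 5_000_000, 100)
print(f"depth: {depth:.0f}X, keep {fraction:.1%} of bases")
```

A pipeline could use a check like this to skip downsampling entirely when the input is already at or below the target depth.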
Another idea I recommend adding is a rotation step. This ensures all bacterial chromosomes at least start at dnaA. A case in point: these are two chromosomes from a clonal outbreak. They are actually very similar, but one wasn't rotated correctly. There are a few tools that rotate circular sequences. I think circlator fixstart (abandonware) and dnaapler are the ones that I use most. |
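As an illustration of what a rotation step does, the Python sketch below rotates a circular sequence so that it starts at a chosen marker. It is only a toy: the exact-match search, the function names, and the short sequences are assumptions, whereas real tools locate dnaA by similarity search and also handle the reverse strand.

```python
def rotate_to_start(seq: str, start: int) -> str:
    """Rotate a circular sequence so that 0-based index `start` becomes position 0."""
    if not seq:
        return seq
    start %= len(seq)
    return seq[start:] + seq[:start]


def find_on_circle(seq: str, marker: str) -> int:
    """Return the 0-based start of `marker` in circular `seq`, or -1 if absent.

    Appending the first len(marker) - 1 bases lets a match span the
    junction between the end and the start of the linear representation.
    """
    if not marker or len(marker) > len(seq):
        return -1
    extended = seq + seq[: len(marker) - 1]
    return extended.find(marker)


# Two linear representations of the same toy circular chromosome.
marker = "ATGAAA"          # stand-in for the start of dnaA (hypothetical)
contig_a = "GGCCATGAAATT"
contig_b = "AAATTGGCCATG"  # same circle, different start; marker spans the junction

rotated_a = rotate_to_start(contig_a, find_on_circle(contig_a, marker))
rotated_b = rotate_to_start(contig_b, find_on_circle(contig_b, marker))
print(rotated_a == rotated_b)
```

Once both contigs start at the same gene, they can be compared or aligned directly, which is the point of rotating clonal outbreak chromosomes consistently.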
For Assembly QC, I'm a fan of gfastats for metrics about the created GFA files, and of NanoPlot. They have a lot of overlapping features, but gfastats does indicate if a sequence is circular. NanoPlot already has a module in MultiQC. |
I have actually had a very good experience with dragonflye for nanopore assembly (in nf-core modules: https://nf-co.re/modules/dragonflye). The results were close to identical to the Trycycler results, but execution of the former was very fast (a few minutes), while Trycycler was a chore with many manual interventions. |
Those are really good points @erinyoung and @d4straub 🙌🏾 🙌🏾.
Downsample step: Yep, downsampling is indeed necessary. We could try random subsampling with rasusa. De Maio N et al., 2019 mentioned that the random strategy generates better assemblies than the filtering strategy. But it always depends on the input data and the goal.
Rotation step: Sure, but I think that Circlator is not supported either... What do you suggest? Adding Circlator together with dnaapler, or just dnaapler?
dragonflye - long-read assembly: Interesting, I haven't tried this tool yet. But if it avoids the manual intervention of Trycycler, then it would be great to add this module. I know that Flye supports not only ONT but also PacBio. |
I have found these two papers that may help us to decide. Both include a detailed flowchart with some of the tools we already have included and additional tools/strategies: |
Trycycler will require a large effort to automate. For example, rrwick/Trycycler#47 |
Here's a blog post from Dr. Wick about depth and quality : https://rrwick.github.io/2023/11/06/accuracy-vs-depth-update.html
This is a collection of ideas that should be considered after the DSL2 conversion #56 is finished. The list is subject to change. Any ideas or discussions are welcome.
Preprocessing (check out nf-core/mag, any other examples out there?)
Assemblers:
Assembly QC:
Structural:
Defaults
--skip_kraken2 should be either removed (i.e. using --krakendb to determine whether Kraken2 is used) or a simple default (small, fast, but helpful) value should be chosen for --krakendb, e.g. "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz". This is a very small 16S database but should be sufficient to detect serious bacterial contamination.
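The two options for the Kraken2 default can be sketched as plain decision logic. This Python snippet is not pipeline code; the function names and the way parameters are passed are assumptions, and only the database URL comes from the proposal above.

```python
from typing import Optional

# Small 16S database suggested in this issue as a possible default.
DEFAULT_KRAKENDB = "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz"


def kraken2_enabled(krakendb: Optional[str]) -> bool:
    """Option 1: no skip flag; Kraken2 runs only if a database was given."""
    return bool(krakendb and krakendb.strip())


def resolve_krakendb(krakendb: Optional[str], skip_kraken2: bool = False) -> Optional[str]:
    """Option 2: keep the skip flag, but fall back to a small default database."""
    if skip_kraken2:
        return None
    if krakendb and krakendb.strip():
        return krakendb
    return DEFAULT_KRAKENDB


print(kraken2_enabled(None))            # no database given, Kraken2 is skipped
print(resolve_krakendb(None) == DEFAULT_KRAKENDB)
```

Option 1 removes a parameter at the cost of Kraken2 being off unless users opt in; option 2 keeps contamination screening on by default.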