diff --git a/docs/v0.1_r11_speedup.md b/docs/v0.1_r11_speedup.md
new file mode 100644
index 0000000..f584aab
--- /dev/null
+++ b/docs/v0.1_r11_speedup.md
@@ -0,0 +1,41 @@
+# Notes on v0.1-r11
+
+We focused on speed in `v0.1-r11`. We tried several techniques; those that worked are listed below.
+
+1. **C implementation for pileup and full-alignment feature generation.** Before r11, feature generation (tensor creation) in Clair3 was sped up by running the Python code under PyPy, which gave a ~10x speedup over native Python. This approach balanced speed and ease of coding during the development stage of Clair3. In r11, we added a C implementation, which brings a further ~2-3x speedup over PyPy. The C code is integrated with the remaining Python parts using CFFI (C Foreign Function Interface). The variants called with the new C implementation are identical to those called by the previous version.
+2. **Use longphase for phasing.** [longphase](https://github.com/twolinin/longphase) by [Lin et al.](https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btac058/6519151) is an ultra-fast chromosome-scale phasing algorithm for small and large variants. In our experiments, longphase took ~3 minutes to phase 69x Q20 ONT WGS data using 24 CPU cores and without being I/O bound, compared with 52 minutes for `whatshap`. To use longphase for phasing, enable the `--longphase_for_phasing` option. Our suggestions on when to enable longphase are given in the section below.
+3. **Haplotagging on the fly.** Whatshap `haplotag` was previously used to add an `HP` tag to each read after phasing. That process writes out a new BAM, which is I/O intensive and, in fact, unnecessary. In r11, we implemented haplotagging that feeds tagged reads directly to full-alignment calling, using the exact logic implemented in whatshap's `haplotag` module. Regardless of whether whatshap or longphase is used for phasing, this saves 10-20 minutes or more on compressing, writing and reading a new BAM.
+
+We benchmarked r11 against r10 using [69x Q20 ONT HG002 data](https://labs.epi2me.io/gm24385_q20_2021.10), running on 24 CPU cores with minimal I/O limitation. With the C implementation and longphase enabled, the total runtime was reduced from 234 to 101 minutes. The results are as follows.
+
+| Implementation | Sample | CPU cores | Inference hardware | Total runtime | Pileup runtime | Phasing runtime | Full-alignment runtime |
+| ------------------ | ----------------- | --------- | ------------------ | ------------- | -------------- | --------------- | ---------------------- |
+| c\_impl, longphase | HG002 WGS Q20 69x | 24 | CPU | 101m | 38m | 3m | 56m |
+| v0.1-r10, whatshap | HG002 WGS Q20 69x | 24 | CPU | 234m | 57m | 52m | 118m |
+
+----
+
+## When to use `longphase` (to replace `whatshap`)
+
+`longphase` is **not** enabled by default. We suggest enabling `longphase` through the `--longphase_for_phasing` option when calling variants in human samples with ≥20x depth. **Use `whatshap` for non-human samples or when depth is insufficient.**
+
+Benchmarks comparing longphase and whatshap on HG003 WGS ONT Guppy5 data at five depths from 10x to 50x are as follows.
+
+| Phasing algorithm | Depth | SNP-Precision | SNP-Recall | SNP-F1 | Indel-Precision | Indel-Recall | Indel-F1 |
+| ----------------- | ----- | ------------- | ---------- | ------ | --------------- | ------------ | -------- |
+| longphase | 10x | 96.75% | 93.94% | 95.32% | 82.86% | 47.30% | 60.22% |
+| whatshap | 10x | 95.87% | 96.64% | 96.26% | 83.37% | 47.50% | 60.52% |
+| longphase | 20x | 99.22% | 99.27% | 99.25% | 88.49% | 62.22% | 73.07% |
+| whatshap | 20x | 99.21% | 99.36% | 99.28% | 88.75% | 60.47% | 71.93% |
+| longphase | 30x | 99.50% | 99.60% | 99.55% | 90.63% | 68.39% | 77.96% |
+| whatshap | 30x | 99.50% | 99.61% | 99.56% | 90.61% | 66.52% | 76.72% |
+| longphase | 40x | 99.59% | 99.67% | 99.63% | 91.69% | 72.34% | 80.87% |
+| whatshap | 40x | 99.60% | 99.70% | 99.65% | 91.71% | 72.39% | 80.91% |
+| longphase | 50x | 99.63% | 99.70% | 99.66% | 92.17% | 75.29% | 82.88% |
+| whatshap | 50x | 99.62% | 99.70% | 99.66% | 91.59% | 73.66% | 81.65% |
+
+---
+
+## Use the old Python-based feature generation code (to disable the new C implementation)
+
+The new C implementation generates results identical to those of the previous version. However, we have retained the old Python-based feature generation code for benchmarking and backward-compatibility purposes. It can be enabled with the `--disable_c_impl` option.
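
The option guidance in these notes can be summarized as a small decision helper. This is only an illustrative sketch: the function `suggested_r11_flags`, its parameters, and the constant `LONGPHASE_MIN_DEPTH` are hypothetical names and not part of Clair3. Only the two option names and the "human sample with ≥20x depth" rule are taken from the notes.

```python
from typing import List

# Threshold from the notes: longphase is suggested for human samples at >= 20x.
LONGPHASE_MIN_DEPTH = 20.0


def suggested_r11_flags(is_human: bool, depth: float, use_c_impl: bool = True) -> List[str]:
    """Return the optional r11 flags suggested by the notes (hypothetical helper).

    is_human   -- whether the sample is human
    depth      -- approximate sequencing depth, e.g. 30.0 for 30x
    use_c_impl -- False to fall back to the old Python-based feature generation
    """
    flags: List[str] = []
    # whatshap stays the default; longphase is opt-in for human data at >= 20x.
    if is_human and depth >= LONGPHASE_MIN_DEPTH:
        flags.append("--longphase_for_phasing")
    # The C implementation is on by default; this flag turns it off.
    if not use_c_impl:
        flags.append("--disable_c_impl")
    return flags


if __name__ == "__main__":
    # A 69x human sample, keeping the default C implementation.
    print(suggested_r11_flags(is_human=True, depth=69))
```

The returned list is meant to be appended to an existing Clair3 command line; at 10x, or for a non-human sample, the helper returns no phasing flag, so `whatshap` remains in use.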