diff --git a/docs/v0.1_r11_speedup.md b/docs/v0.1_r11_speedup.md
new file mode 100644
index 0000000..f584aab
--- /dev/null
+++ b/docs/v0.1_r11_speedup.md
@@ -0,0 +1,41 @@
+# Notes on v0.1-r11
+
+We focused on speedup in `v0.1-r11`. We tried a few techniques and listed those that worked as follows.
+
+1.  **C implementation for pileup and full-alignment feature generation.**  Before r11, feature generation (tensor creation) in Clair3 was sped up using pypy on python code. The speedup was ~10x over native python. The practice balanced speed and ease of coding in the developmental stage of Clair3. In r11, we added C implementation, bringing another ~2-3 times speedup over pypy. The C code is integrated with the other python parts using CFFI (C Foreign Function Interface). The variants called with the new C implementation are identical to the previous version. 
+2. **Use longphase for phasing.**  [longphase](https://github.com/twolinin/longphase) by [Lin et al.](https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btac058/6519151) is an ultra-fast chromosome-scale phasing algorithm for small and large variants. In our experiments, longphase took ~3 minutes to phase 69x Q20 ONT WGS with 24 CPU cores and no I/O bound, faster than `whatshap` that took 52 minutes. To enable using longphase for phasing, please use the `--longphase_for_phasing` option. Our suggestions on when to enable longphase are shown in the section below.
+3. **Haplotagging on the fly.**  Whatshap `haplotag` was used to add an `HP` tag to each read after phasing. This process writes out a new BAM, which is I/O intensive and in fact, unnecessary. In r11, we implemented haplotagging to feed tagged read directly to full-alignment calling. We used the exact logic that was implemented in whatshap's haplotag module. This technique, no matter whatshap or longphase was used, saves more than 10-20 minutes on compressing, writing and reading a new BAM.
+
+We benchmarked r11 against r10 with [69x Q20 ONT HG002 data](https://labs.epi2me.io/gm24385_q20_2021.10). 24 CPU cores with minimal I/O speed limit were used. The results are as follows. With C implementation and longphase enabled, the total runtime reduced from 234 to 101 minutes.
+
+| Implementation     | Sample            | CPU cores | Inference hardware | Total runtime | Pileup runtime | Phasing runtime | Full-alignment runtime |
+| ------------------ | ----------------- | --------- | ------------------ | ------------- | -------------- | --------------- | ---------------------- |
+| c\_impl, longphase | HG002 WGS Q20 69x | 24        | CPU                | 101m          | 38m            | 3m              | 56m                    |
+| v0.1-r10, whatshap | HG002 WGS Q20 69x | 24        | CPU                | 234m          | 57m            | 52m             | 118m                   |
+
+----
+
+## When to use `longphase` (to replace `whatshap`)
+
+`longphase` is **not** enabled by default. We suggest enabling `longphase` through the `--longphase_for_phasing` option when calling variants in human with ≥20x of data. **Use `whatshap` with non-human samples or insufficient depth.**
+
+Benchmarks between using longphase and whatshap on HG003 WGS ONT Guppy5 with five depths from 10x to 50x are as follows.
+
+| Phasing algorithm | Depth | SNP-Precision | SNP-Recall | SNP-F1 | Indel-Precision | Indel-Recall | Indel-F1 |
+| ----------------- | ----- | ------------- | ---------- | ------ | --------------- | ------------ | -------- |
+| longphase         | 10x   | 96.75%        | 93.94%     | 95.32% | 82.86%          | 47.30%       | 60.22%   |
+| whatshap          | 10x   | 95.87%        | 96.64%     | 96.26% | 83.37%          | 47.50%       | 60.52%   |
+| longphase         | 20x   | 99.22%        | 99.27%     | 99.25% | 88.49%          | 62.22%       | 73.07%   |
+| whatshap          | 20x   | 99.21%        | 99.36%     | 99.28% | 88.75%          | 60.47%       | 71.93%   |
+| longphase         | 30x   | 99.50%        | 99.60%     | 99.55% | 90.63%          | 68.39%       | 77.96%   |
+| whatshap          | 30x   | 99.50%        | 99.61%     | 99.56% | 90.61%          | 66.52%       | 76.72%   |
+| longphase         | 40x   | 99.59%        | 99.67%     | 99.63% | 91.69%          | 72.34%       | 80.87%   |
+| whatshap          | 40x   | 99.60%        | 99.70%     | 99.65% | 91.71%          | 72.39%       | 80.91%   |
+| longphase         | 50x   | 99.63%        | 99.70%     | 99.66% | 92.17%          | 75.29%       | 82.88%   |
+| whatshap          | 50x   | 99.62%        | 99.70%     | 99.66% | 91.59%          | 73.66%       | 81.65%   |
+
+---
+
+## Use the old python-based feature generation code (to disable the new C implementation)
+
+The new C implementation generates results identical to the previous version. However, we retained the old python-based feature generation code for benchmarking or back-compatibility purposes. Users can use it through the `--disable_c_impl` option.