diff --git a/README.md b/README.md
index 15e9e17e69..61920be654 100644
--- a/README.md
+++ b/README.md
@@ -2,46 +2,83 @@
-## Introduction
+# Introduction
-icefall contains ASR recipes for various datasets
-using .
+The icefall project contains speech-related recipes for various datasets
+using [k2-fsa](https://github.com/k2-fsa/k2) and [lhotse](https://github.com/lhotse-speech/lhotse).
-You can use to deploy models
-trained with icefall.
+You can use [sherpa](https://github.com/k2-fsa/sherpa), [sherpa-ncnn](https://github.com/k2-fsa/sherpa-ncnn), or [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) to deploy models
+trained in icefall. These frameworks also support models not included in icefall; please refer to their respective documentation for more details.
You can try pre-trained models from within your browser without the need
-to download or install anything by visiting
-See for more details.
+to download or install anything by visiting this [Hugging Face space](https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition).
+Please refer to the [documentation](https://k2-fsa.github.io/icefall/huggingface/spaces.html) for more details.
-## Installation
+# Installation
-Please refer to
+Please refer to the [documentation](https://icefall.readthedocs.io/en/latest/installation/index.html)
for installation.
-## Recipes
+# Recipes
-Please refer to
-for more information.
+Please refer to the [documentation](https://icefall.readthedocs.io/en/latest/recipes/index.html)
+for more details.
-We provide the following recipes:
+## ASR: Automatic Speech Recognition
+### Supported Datasets
- [yesno][yesno]
- - [LibriSpeech][librispeech]
- - [GigaSpeech][gigaspeech]
- - [AMI][ami]
+
+ - [Aidatatang_200zh][aidatatang_200zh]
- [Aishell][aishell]
- [Aishell2][aishell2]
- [Aishell4][aishell4]
- - [TIMIT][timit]
- - [TED-LIUM3][tedlium3]
- - [Aidatatang_200zh][aidatatang_200zh]
- - [WenetSpeech][wenetspeech]
- [Alimeeting][alimeeting]
+ - [AMI][ami]
+ - [CommonVoice][commonvoice]
+ - [Corpus of Spontaneous Japanese][csj]
+ - [GigaSpeech][gigaspeech]
+ - [LibriCSS][libricss]
+ - [LibriSpeech][librispeech]
+ - [Libriheavy][libriheavy]
+ - [Multi-Dialect Broadcast News Arabic Speech Recognition][mgb2]
+ - [PeopleSpeech][peoplespeech]
+ - [SPGISpeech][spgispeech]
- [Switchboard][swbd]
+ - [TIMIT][timit]
+ - [TED-LIUM3][tedlium3]
- [TAL_CSASR][tal_csasr]
+ - [Voxpopuli][voxpopuli]
+ - [XBMU-AMDO31][xbmu-amdo31]
+ - [WenetSpeech][wenetspeech]
+
+More datasets will be added in the future.
+
+### Supported Models
+
+The [LibriSpeech][librispeech] recipe supports the most comprehensive set of models; you are welcome to try them out.
+
+#### CTC
+ - TDNN LSTM CTC
+ - Conformer CTC
+ - Zipformer CTC
+
+#### MMI
+ - Conformer MMI
+ - Zipformer MMI
+
+#### Transducer
+ - Conformer-based Encoder
+ - LSTM-based Encoder
+ - Zipformer-based Encoder
+ - LSTM-based Predictor
+ - [Stateless Predictor](https://research.google/pubs/rnn-transducer-with-stateless-prediction-network/)
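+
+For intuition, the stateless predictor replaces the recurrent predictor with nothing
+more than an embedding layer and a small convolution over the last few emitted symbols.
+Below is a minimal PyTorch sketch of the idea, not icefall's actual implementation; the
+class name, dimensions, and `context_size` are illustrative assumptions:
+
+```python
+import torch
+import torch.nn as nn
+
+
+class StatelessPredictor(nn.Module):
+    """Sketch only: an embedding plus a 1-D conv over previous symbols."""
+
+    def __init__(self, vocab_size: int, embed_dim: int, context_size: int = 2):
+        super().__init__()
+        self.embedding = nn.Embedding(vocab_size, embed_dim)
+        # Depthwise conv mixes the last `context_size` label embeddings;
+        # unlike an LSTM predictor, no recurrent state is carried across steps.
+        self.conv = nn.Conv1d(embed_dim, embed_dim,
+                              kernel_size=context_size, groups=embed_dim)
+
+    def forward(self, y: torch.Tensor) -> torch.Tensor:
+        # y: (batch, U) previously emitted symbols, left-padded with blanks.
+        embed = self.embedding(y).transpose(1, 2)  # (batch, embed_dim, U)
+        return self.conv(embed).transpose(1, 2)    # (batch, U - context_size + 1, embed_dim)
+```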
-### yesno
+If you would like to contribute to icefall, please refer to the [contributing guide](https://icefall.readthedocs.io/en/latest/contributing/index.html) for more details.
+
+We would like to highlight the performance of some of the recipes here.
+
+### [yesno][yesno]
This is the simplest ASR recipe in `icefall` and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER:
@@ -52,350 +89,264 @@ Training takes less than 30 seconds and gives you the following WER:
We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
-### LibriSpeech
+### [LibriSpeech][librispeech]
-Please see
+Please see [RESULTS.md](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md)
for the **latest** results.
-We provide 5 models for this recipe:
-
-- [conformer CTC model][LibriSpeech_conformer_ctc]
-- [TDNN LSTM CTC model][LibriSpeech_tdnn_lstm_ctc]
-- [Transducer: Conformer encoder + LSTM decoder][LibriSpeech_transducer]
-- [Transducer: Conformer encoder + Embedding decoder][LibriSpeech_transducer_stateless]
-- [Transducer: Zipformer encoder + Embedding decoder][LibriSpeech_zipformer]
-
-#### Conformer CTC Model
-
-The best WER we currently have is:
+#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc)
| | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42 | 5.73 |
-We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
-#### TDNN LSTM CTC Model
-
-The WER for this model is:
+#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc)
| | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59 | 17.69 |
-We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)
-#### Transducer: Conformer encoder + LSTM decoder
+#### [Transducer (Conformer Encoder + LSTM Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
-Using Conformer as encoder and LSTM as decoder.
+| | test-clean | test-other |
+|---------------|------------|------------|
+| greedy search | 3.07 | 7.51 |
-The best WER with greedy search is:
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)
-| | test-clean | test-other |
-|-----|------------|------------|
-| WER | 3.07 | 7.51 |
+#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
-We provide a Colab notebook to run a pre-trained RNN-T conformer model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)
+| | test-clean | test-other |
+|---------------------------------------|------------|------------|
+| modified_beam_search (`beam_size=4`) | 2.56 | 6.27 |
-#### Transducer: Conformer encoder + Embedding decoder
-Using Conformer as encoder. The decoder consists of 1 embedding layer
-and 1 convolutional layer.
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)
-The best WER using modified beam search with beam size 4 is:
-| | test-clean | test-other |
-|-----|------------|------------|
-| WER | 2.56 | 6.27 |
+#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer)
-Note: No auxiliary losses are used in the training and no LMs are used
-in the decoding.
+WER (modified_beam_search with `beam_size=4` unless otherwise stated)
-We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)
-
-
-#### k2 pruned RNN-T
+1. LibriSpeech-960hr
| Encoder | Params | test-clean | test-other | epochs | devices |
|-----------------|--------|------------|------------|---------|------------|
-| zipformer | 65.5M | 2.21 | 4.79 | 50 | 4 32G-V100 |
-| zipformer-small | 23.2M | 2.42 | 5.73 | 50 | 2 32G-V100 |
-| zipformer-large | 148.4M | 2.06 | 4.63 | 50 | 4 32G-V100 |
-| zipformer-large | 148.4M | 2.00 | 4.38 | 174 | 8 80G-A100 |
-
-Note: No auxiliary losses are used in the training and no LMs are used
-in the decoding.
+| Zipformer | 65.5M | 2.21 | 4.79 | 50 | 4 32G-V100 |
+| Zipformer-small | 23.2M | 2.42 | 5.73 | 50 | 2 32G-V100 |
+| Zipformer-large | 148.4M | 2.06 | 4.63 | 50 | 4 32G-V100 |
+| Zipformer-large | 148.4M | 2.00 | 4.38 | 174 | 8 80G-A100 |
-#### k2 pruned RNN-T + GigaSpeech
+2. LibriSpeech-960hr + GigaSpeech
-| | test-clean | test-other |
-|-----|------------|------------|
-| WER | 1.78 | 4.08 |
+| Encoder | Params | test-clean | test-other |
+|-----------------|--------|------------|------------|
+| Zipformer | 65.5M | 1.78 | 4.08 |
-Note: No auxiliary losses are used in the training and no LMs are used
-in the decoding.
-
-#### k2 pruned RNN-T + GigaSpeech + CommonVoice
-
-| | test-clean | test-other |
-|-----|------------|------------|
-| WER | 1.90 | 3.98 |
-Note: No auxiliary losses are used in the training and no LMs are used
-in the decoding.
+3. LibriSpeech-960hr + GigaSpeech + CommonVoice
+| Encoder | Params | test-clean | test-other |
+|-----------------|--------|------------|------------|
+| Zipformer | 65.5M | 1.90 | 3.98 |
-### GigaSpeech
-We provide three models for this recipe:
+### [GigaSpeech][gigaspeech]
-- [Conformer CTC model][GigaSpeech_conformer_ctc]
-- [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
-- [Transducer: Zipformer encoder + Embedding decoder][GigaSpeech_zipformer]
-
-#### Conformer CTC
+#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/conformer_ctc)
| | Dev | Test |
|-----|-------|-------|
| WER | 10.47 | 10.58 |
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
+#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/pruned_transducer_stateless2)
+
+Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss
| | Dev | Test |
|----------------------|-------|-------|
-| greedy search | 10.51 | 10.73 |
-| fast beam search | 10.50 | 10.69 |
-| modified beam search | 10.40 | 10.51 |
+| greedy_search | 10.51 | 10.73 |
+| fast_beam_search | 10.50 | 10.69 |
+| modified_beam_search | 10.40 | 10.51 |
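+
+The decoding methods in these tables (greedy_search, fast_beam_search,
+modified_beam_search) trade speed for accuracy. As a rough illustration of the simplest
+one, here is a hypothetical greedy-search loop for a single utterance; the `predictor`
+and `joiner` interfaces are our assumptions, not icefall's actual API:
+
+```python
+import torch
+
+
+@torch.no_grad()
+def greedy_search(encoder_out: torch.Tensor, predictor, joiner,
+                  blank_id: int = 0, context_size: int = 2) -> list:
+    """Sketch of frame-synchronous greedy decoding for one utterance.
+
+    encoder_out: (T, encoder_dim) acoustic frames from the encoder.
+    """
+    hyp = [blank_id] * context_size  # left-pad the label context with blanks
+    for t in range(encoder_out.size(0)):
+        context = torch.tensor([hyp[-context_size:]])      # (1, context_size)
+        pred_out = predictor(context)[:, -1]               # (1, predictor_dim)
+        logits = joiner(encoder_out[t : t + 1], pred_out)  # (1, vocab_size)
+        token = int(logits.argmax(dim=-1))
+        if token != blank_id:  # emit at most one symbol per frame in this sketch
+            hyp.append(token)
+    return hyp[context_size:]
+```
+
+Roughly speaking, modified_beam_search keeps the `beam_size` best hypotheses per frame
+instead of a single one, which is why it usually edges out greedy_search in these tables.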
-#### Transducer: Zipformer encoder + Embedding decoder
+#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/zipformer)
| | Dev | Test |
|----------------------|-------|-------|
-| greedy search | 10.31 | 10.50 |
-| fast beam search | 10.26 | 10.48 |
-| modified beam search | 10.25 | 10.38 |
-
-
-### Aishell
-
-We provide three models for this recipe: [conformer CTC model][Aishell_conformer_ctc],
-[TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc], and [Transducer Stateless Model][Aishell_pruned_transducer_stateless7],
+| greedy_search | 10.31 | 10.50 |
+| fast_beam_search | 10.26 | 10.48 |
+| modified_beam_search | 10.25 | 10.38 |
-#### Conformer CTC Model
-The best CER we currently have is:
+### [Aishell][aishell]
-| | test |
-|-----|------|
-| CER | 4.26 |
-
-#### TDNN LSTM CTC Model
-
-The CER for this model is:
+#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/tdnn_lstm_ctc)
| | test |
|-----|-------|
| CER | 10.16 |
-We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
-
-#### Transducer Stateless Model
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
-The best CER we currently have is:
+#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/transducer_stateless)
| | test |
|-----|------|
| CER | 4.38 |
-We provide a Colab notebook to run a pre-trained TransducerStateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
+#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/zipformer)
-### Aishell2
+CER (modified_beam_search with `beam_size=4`)
-We provide one model for this recipe: [Transducer Stateless Model][Aishell2_pruned_transducer_stateless5].
+| Encoder | Params | dev | test | epochs |
+|-----------------|--------|-----|------|---------|
+| Zipformer | 73.4M | 4.13 | 4.40 | 55 |
+| Zipformer-small | 30.2M | 4.40 | 4.67 | 55 |
+| Zipformer-large | 157.3M | 4.03 | 4.28 | 56 |
-#### Transducer Stateless Model
-The best WER we currently have is:
+### [Aishell4][aishell4]
-| | dev-ios | test-ios |
-|-----|------------|------------|
-| WER | 5.32 | 5.56 |
-
-
-### Aishell4
-
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5].
-
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)
-
-The best CER we currently have is:
+#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell4/ASR/pruned_transducer_stateless5)
+1. Trained with all subsets:
| | test |
|-----|------------|
| CER | 29.08 |
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
-
-
-### TIMIT
-
-We provide two models for this recipe: [TDNN LSTM CTC model][TIMIT_tdnn_lstm_ctc]
-and [TDNN LiGRU CTC model][TIMIT_tdnn_ligru_ctc].
-#### TDNN LSTM CTC Model
+### [TIMIT][timit]
-The best PER we currently have is:
+#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_lstm_ctc)
-||TEST|
-|--|--|
+| |TEST|
+|---|----|
|PER| 19.71% |
-We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)
-#### TDNN LiGRU CTC Model
+#### [TDNN LiGRU CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_ligru_ctc)
-The PER for this model is:
-
-||TEST|
-|--|--|
+| |TEST|
+|---|----|
|PER| 17.66% |
-We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
-
-
-### TED-LIUM3
-
-We provide two models for this recipe: [Transducer Stateless: Conformer encoder + Embedding decoder][TED-LIUM3_transducer_stateless] and [Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TED-LIUM3_pruned_transducer_stateless].
-
-#### Transducer Stateless: Conformer encoder + Embedding decoder
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
-The best WER using modified beam search with beam size 4 is:
-| | dev | test |
-|-----|-------|--------|
-| WER | 6.91 | 6.33 |
+### [TED-LIUM3][tedlium3]
-Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
+#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/transducer_stateless)
-We provide a Colab notebook to run a pre-trained Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)
+| | dev | test |
+|--------------------------------------|-------|--------|
+| modified_beam_search (`beam_size=4`) | 6.91 | 6.33 |
-#### Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
-The best WER using modified beam search with beam size 4 is:
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)
-| | dev | test |
-|-----|-------|--------|
-| WER | 6.77 | 6.14 |
+#### [Transducer (pruned_transducer_stateless)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/pruned_transducer_stateless)
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
+| | dev | test |
+|--------------------------------------|-------|--------|
+| modified_beam_search (`beam_size=4`) | 6.77 | 6.14 |
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
-### Aidatatang_200zh
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aidatatang_200zh_pruned_transducer_stateless2].
+### [Aidatatang_200zh][aidatatang_200zh]
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
+#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/aidatatang_200zh/ASR/pruned_transducer_stateless2)
| | Dev | Test |
|----------------------|-------|-------|
-| greedy search | 5.53 | 6.59 |
-| fast beam search | 5.30 | 6.34 |
-| modified beam search | 5.27 | 6.33 |
+| greedy_search | 5.53 | 6.59 |
+| fast_beam_search | 5.30 | 6.34 |
+| modified_beam_search | 5.27 | 6.33 |
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
-### WenetSpeech
+### [WenetSpeech][wenetspeech]
-We provide some models for this recipe: [Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2] and [Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless5].
-
-#### Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)
+#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless2)
| | Dev | Test-Net | Test-Meeting |
|----------------------|-------|----------|--------------|
-| greedy search | 7.80 | 8.75 | 13.49 |
-| modified beam search| 7.76 | 8.71 | 13.41 |
-| fast beam search | 7.94 | 8.74 | 13.80 |
+| greedy_search | 7.80 | 8.75 | 13.49 |
+| fast_beam_search | 7.94 | 8.74 | 13.80 |
+| modified_beam_search | 7.76 | 8.71 | 13.41 |
+
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
+
+#### [Transducer **Streaming** (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless5)
-#### Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)
-**Streaming**:
| | Dev | Test-Net | Test-Meeting |
|----------------------|-------|----------|--------------|
| greedy_search | 8.78 | 10.12 | 16.16 |
-| modified_beam_search | 8.53| 9.95 | 15.81 |
| fast_beam_search| 9.01 | 10.47 | 16.28 |
+| modified_beam_search | 8.53 | 9.95 | 15.81 |
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
-
-### Alimeeting
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Alimeeting_pruned_transducer_stateless2].
+### [Alimeeting][alimeeting]
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)
+#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/alimeeting/ASR/pruned_transducer_stateless2)
| | Eval | Test-Net |
|----------------------|--------|----------|
-| greedy search | 31.77 | 34.66 |
-| fast beam search | 31.39 | 33.02 |
-| modified beam search | 30.38 | 34.25 |
+| greedy_search | 31.77 | 34.66 |
+| fast_beam_search | 31.39 | 33.02 |
+| modified_beam_search | 30.38 | 34.25 |
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
-### TAL_CSASR
+### [TAL_CSASR][tal_csasr]
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TAL_CSASR_pruned_transducer_stateless5].
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
+#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR/pruned_transducer_stateless5)
The best results for Chinese CER(%) and English WER(%) respectively (zh: Chinese, en: English):
|decoding-method | dev | dev_zh | dev_en | test | test_zh | test_en |
|--|--|--|--|--|--|--|
|greedy_search| 7.30 | 6.48 | 19.19 |7.39| 6.66 | 19.13|
-|modified_beam_search| 7.15 | 6.35 | 18.95 | 7.22| 6.50 | 18.70 |
|fast_beam_search| 7.18 | 6.39| 18.90 | 7.27| 6.55 | 18.77|
+|modified_beam_search| 7.15 | 6.35 | 18.95 | 7.22| 6.50 | 18.70 |
+
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)
+
+## TTS: Text-to-Speech
+
+### Supported Datasets
+
+ - [LJSpeech][ljspeech]
+ - [VCTK][vctk]
+
+### Supported Models
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)
+ - [VITS](https://arxiv.org/abs/2106.06103)
-## Deployment with C++
+# Deployment with C++
-Once you have trained a model in icefall, you may want to deploy it with C++,
-without Python dependencies.
+Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.
-Please refer to the documentation
-
+Please refer to the [documentation](https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c)
for how to do this.
We also provide a Colab notebook, showing you how to run a torch scripted model in [k2][k2] with C++.
Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing)
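+
+The C++ path relies on TorchScript. A minimal sketch of the export step is shown below;
+the stand-in model and file name are placeholder assumptions (each recipe ships its own
+export script with the exact procedure):
+
+```python
+import torch
+import torch.nn as nn
+
+# Stand-in for a trained icefall model; any scriptable nn.Module exports the same way.
+model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 500))
+model.eval()
+
+scripted = torch.jit.script(model)  # compile the module to TorchScript
+scripted.save("model_jit.pt")       # load from C++ via torch::jit::load("model_jit.pt")
+```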
-[LibriSpeech_tdnn_lstm_ctc]: egs/librispeech/ASR/tdnn_lstm_ctc
-[LibriSpeech_conformer_ctc]: egs/librispeech/ASR/conformer_ctc
-[LibriSpeech_transducer]: egs/librispeech/ASR/transducer
-[LibriSpeech_transducer_stateless]: egs/librispeech/ASR/transducer_stateless
-[LibriSpeech_zipformer]: egs/librispeech/ASR/zipformer
-[Aishell_tdnn_lstm_ctc]: egs/aishell/ASR/tdnn_lstm_ctc
-[Aishell_conformer_ctc]: egs/aishell/ASR/conformer_ctc
-[Aishell_pruned_transducer_stateless7]: egs/aishell/ASR/pruned_transducer_stateless7_bbpe
-[Aishell2_pruned_transducer_stateless5]: egs/aishell2/ASR/pruned_transducer_stateless5
-[Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5
-[TIMIT_tdnn_lstm_ctc]: egs/timit/ASR/tdnn_lstm_ctc
-[TIMIT_tdnn_ligru_ctc]: egs/timit/ASR/tdnn_ligru_ctc
-[TED-LIUM3_transducer_stateless]: egs/tedlium3/ASR/transducer_stateless
-[TED-LIUM3_pruned_transducer_stateless]: egs/tedlium3/ASR/pruned_transducer_stateless
-[GigaSpeech_conformer_ctc]: egs/gigaspeech/ASR/conformer_ctc
-[GigaSpeech_pruned_transducer_stateless2]: egs/gigaspeech/ASR/pruned_transducer_stateless2
-[GigaSpeech_zipformer]: egs/gigaspeech/ASR/zipformer
-[Aidatatang_200zh_pruned_transducer_stateless2]: egs/aidatatang_200zh/ASR/pruned_transducer_stateless2
-[WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
-[WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
-[Alimeeting_pruned_transducer_stateless2]: egs/alimeeting/ASR/pruned_transducer_stateless2
-[TAL_CSASR_pruned_transducer_stateless5]: egs/tal_csasr/ASR/pruned_transducer_stateless5
[yesno]: egs/yesno/ASR
[librispeech]: egs/librispeech/ASR
[aishell]: egs/aishell/ASR
@@ -411,3 +362,15 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
[ami]: egs/ami
[swbd]: egs/swbd/ASR
[k2]: https://github.com/k2-fsa/k2
+[commonvoice]: egs/commonvoice/ASR
+[csj]: egs/csj/ASR
+[libricss]: egs/libricss/SURT
+[libriheavy]: egs/libriheavy/ASR
+[mgb2]: egs/mgb2/ASR
+[peoplespeech]: egs/peoplespeech/ASR
+[spgispeech]: egs/spgispeech/ASR
+[voxpopuli]: egs/voxpopuli/ASR
+[xbmu-amdo31]: egs/xbmu-amdo31/ASR
+
+[vctk]: egs/vctk/TTS
+[ljspeech]: egs/ljspeech/TTS
\ No newline at end of file