diff --git a/examples/aishell/asr0/README.md b/examples/aishell/asr0/README.md
index 16489992d..4459b1382 100644
--- a/examples/aishell/asr0/README.md
+++ b/examples/aishell/asr0/README.md
@@ -151,21 +151,14 @@ avg.sh best exp/deepspeech2/checkpoints 1
 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1
 ```
 ## Pretrained Model
-You can get the pretrained transformer or conformer using the scripts below:
-```bash
-Deepspeech2 offline:
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz
-
-Deepspeech2 online:
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/aishell_ds2_online_cer8.00_release.tar.gz
+You can get the pretrained models from [the released models page](../../../docs/source/released_model.md).
-```
-using the `tar` scripts to unpack the model and then you can use the script to test the model.
+Use the `tar` command to unpack the model, then use the script below to test it.
 For example:
 ```
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz
-tar xzvf ds2.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
+tar xzvf asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
 source path.sh
-# If you have process the data and get the manifest file, you can skip the following 2 steps
+# If you have processed the data and generated the manifest file, you can skip the following 2 steps
 bash local/data.sh --stage -1 --stop_stage -1
 bash local/data.sh --stage 2 --stop_stage 2
 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1
@@ -209,8 +202,8 @@ if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
 ```
 you can train the model by yourself, or you can download the pretrained model by the script below:
 ```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz
-tar xzvf ds2.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
+tar xzvf asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
 ```
 You can download the audio demo:
 ```bash
diff --git a/examples/aishell/asr1/README.md b/examples/aishell/asr1/README.md
index 5277a31eb..25b28ede8 100644
--- a/examples/aishell/asr1/README.md
+++ b/examples/aishell/asr1/README.md
@@ -143,25 +143,14 @@ avg.sh best exp/conformer/checkpoints 20
 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
 ```
 ## Pretrained Model
-You can get the pretrained transformer or conformer using the scripts below:
+You can get the pretrained transformer or conformer from [the released models page](../../../docs/source/released_model.md).
-```bash
-# Conformer:
-wget https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz
-
-# Chunk Conformer:
-wget https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz
-
-# Transformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz
-
-```
-using the `tar` scripts to unpack the model and then you can use the script to test the model.
+Use the `tar` command to unpack the model, then use the script below to test it.
 For example:
 ```
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz
-tar xzvf transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
+tar xzvf asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
 source path.sh
-# If you have process the data and get the manifest file, you can skip the following 2 steps
+# If you have processed the data and generated the manifest file, you can skip the following 2 steps
 bash local/data.sh --stage -1 --stop_stage -1
 bash local/data.sh --stage 2 --stop_stage 2
 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
@@ -206,7 +195,7 @@ In some situations, you want to use the trained model to do the inference for th
 ```
 you can train the model by yourself using ```bash run.sh --stage 0 --stop_stage 3```, or you can download the pretrained model through the script below:
 ```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz
-tar xzvf transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
+tar xzvf asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
 ```
 You can download the audio demo:
diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md
index eb1a44001..ae252a58b 100644
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -151,44 +151,22 @@ avg.sh best exp/conformer/checkpoints 20
 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
 ```
 ## Pretrained Model
-You can get the pretrained transformer or conformer using the scripts below:
-```bash
-# Conformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz
-# Transformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/transformer.model.tar.gz
-```
+You can get the pretrained transformer or conformer from [the released models page](../../../docs/source/released_model.md).
+
-using the `tar` scripts to unpack the model and then you can use the script to test the model.
+Use the `tar` command to unpack the model, then use the script below to test it.
 For example:
 ```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz
-tar xzvf transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
+tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
 source path.sh
-# If you have process the data and get the manifest file, you can skip the following 2 steps
+# If you have processed the data and generated the manifest file, you can skip the following 2 steps
 bash local/data.sh --stage -1 --stop_stage -1
 bash local/data.sh --stage 2 --stop_stage 2
 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
 ```
-The performance of the released models are shown below:
-## Conformer
-train: Epoch 70, 4 V100-32G, best avg: 20
-
-| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER |
-| --------- | ------- | ------------------- | ------------ | ---------- | ---------------------- | ----------------- | -------- |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention | 6.433612394332886 | 0.039771 |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.433612394332886 | 0.040342 |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.433612394332886 | 0.040342 |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention_rescoring | 6.433612394332886 | 0.033761 |
-## Transformer
-train: Epoch 120, 4 V100-32G, 27 Day, best avg: 10
+The performance of the released models is shown in [RESULTS.md](./RESULTS.md).
-
-| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER |
-| ----------- | ------- | --------------------- | ------------ | ---------- | ---------------------- | ----------------- | -------- |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention | 6.382194232940674 | 0.049661 |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.382194232940674 | 0.049566 |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.382194232940674 | 0.049585 |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention_rescoring | 6.382194232940674 | 0.038135 |
 
 ## Stage 4: CTC Alignment
 If you want to get the alignment between the audio and the text, you can use the ctc alignment. The code of this stage is shown below:
 ```bash
@@ -227,8 +205,8 @@ In some situations, you want to use the trained model to do the inference for th
 ```
 you can train the model by yourself using ```bash run.sh --stage 0 --stop_stage 3```, or you can download the pretrained model through the script below:
 ```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz
-tar xzvf conformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
+tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
 ```
 You can download the audio demo:
 ```bash
diff --git a/examples/librispeech/asr2/README.md b/examples/librispeech/asr2/README.md
index 7d6fe11df..5bc7185a9 100644
--- a/examples/librispeech/asr2/README.md
+++ b/examples/librispeech/asr2/README.md
@@ -1,4 +1,4 @@
-# Transformer/Conformer ASR with Librispeech Asr2
+# Transformer/Conformer ASR with Librispeech ASR2
 
 This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12) and use some functions in kaldi.
 
@@ -213,17 +213,14 @@ avg.sh latest exp/transformer/checkpoints 10
 ./local/recog.sh --ckpt_prefix exp/transformer/checkpoints/avg_10
 ```
 ## Pretrained Model
-You can get the pretrained transformer using the scripts below:
-```bash
-# Transformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/transformer.model.tar.gz
-```
+You can get the pretrained models from [the released models page](../../../docs/source/released_model.md).
+
-using the `tar` scripts to unpack the model and then you can use the script to test the model.
+Use the `tar` command to unpack the model, then use the script below to test it.
 For example:
 ```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/transformer.model.tar.gz
-tar xzvf transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/asr2_transformer_librispeech_ckpt_0.1.1.model.tar.gz
+tar xzvf asr2_transformer_librispeech_ckpt_0.1.1.model.tar.gz
 source path.sh
-# If you have process the data and get the manifest file, you can skip the following 2 steps
+# If you have processed the data and generated the manifest file, you can skip the following 2 steps
 bash local/data.sh --stage -1 --stop_stage -1
@@ -231,26 +228,7 @@
 bash local/data.sh --stage 2 --stop_stage 2
 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/transformer.yaml exp/ctc/checkpoints/avg_10
 ```
-The performance of the released models are shown below:
-### Transformer
-| Model | Params | GPUS | Averaged Model | Config | Augmentation | Loss |
-| :---------: | :----: | :--------------------: | :--------------: | :-------------------: | :----------: | :-------------: |
-| transformer | 32.52M | 8 Tesla V100-SXM2-32GB | 10-best val_loss | conf/transformer.yaml | spec_aug | 6.3197922706604 |
-
-#### Attention Rescore
-| Test Set | Decode Method | #Snt | #Wrd | Corr | Sub | Del | Ins | Err | S.Err |
-| ---------- | --------------------- | ---- | ----- | ---- | ---- | ---- | ---- | ---- | ----- |
-| test-clean | attention | 2620 | 52576 | 96.4 | 2.5 | 1.1 | 0.4 | 4.0 | 34.7 |
-| test-clean | ctc_greedy_search | 2620 | 52576 | 95.9 | 3.7 | 0.4 | 0.5 | 4.6 | 48.0 |
-| test-clean | ctc_prefix_beamsearch | 2620 | 52576 | 95.9 | 3.7 | 0.4 | 0.5 | 4.6 | 47.6 |
-| test-clean | attention_rescore | 2620 | 52576 | 96.8 | 2.9 | 0.3 | 0.4 | 3.7 | 38.0 |
-
-#### JoinCTC
-| Test Set | Decode Method | #Snt | #Wrd | Corr | Sub | Del | Ins | Err | S.Err |
-| ---------- | ----------------- | ---- | ----- | ---- | ---- | ---- | ---- | ---- | ----- |
-| test-clean | join_ctc_only_att | 2620 | 52576 | 96.1 | 2.5 | 1.4 | 0.4 | 4.4 | 34.7 |
-| test-clean | join_ctc_w/o_lm | 2620 | 52576 | 97.2 | 2.6 | 0.3 | 0.4 | 3.2 | 34.9 |
-| test-clean | join_ctc_w_lm | 2620 | 52576 | 97.9 | 1.8 | 0.2 | 0.3 | 2.4 | 27.8 |
+The performance of the released models is shown in [RESULTS.md](./RESULTS.md).
-Compare with [ESPNET](https://github.com/espnet/espnet/blob/master/egs/librispeech/asr1/RESULTS.md#pytorch-large-transformer-with-specaug-4-gpus--transformer-lm-4-gpus) we using 8gpu, but the model size (aheads4-adim256) small than it.
+Compared with [ESPNET](https://github.com/espnet/espnet/blob/master/egs/librispeech/asr1/RESULTS.md#pytorch-large-transformer-with-specaug-4-gpus--transformer-lm-4-gpus), we use 8 GPUs, but our model (aheads4-adim256) is smaller.
 
 ## Stage 5: CTC Alignment
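
For reference, every README touched by this patch documents the same end-to-end flow: download the released checkpoint tarball, unpack it, prepare the data manifests, and run the test script against the unpacked checkpoint. The sketch below assembles that flow from the AISHELL `asr0` commands in this patch; the URL, tarball, config, and checkpoint names are the ones introduced above, while starting from the repository root is an assumption.

```bash
# Minimal sketch of the download-and-test flow described in the updated
# READMEs, using the AISHELL DeepSpeech2 (asr0) example.
# Assumption: run from the repository root with PaddleSpeech installed.
cd examples/aishell/asr0

# Download and unpack the released checkpoint (names from this patch).
wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
tar xzvf asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz

# Set up the environment for the example scripts.
source path.sh

# Prepare the data and manifest files (skip if already done).
bash local/data.sh --stage -1 --stop_stage -1
bash local/data.sh --stage 2 --stop_stage 2

# Evaluate the pretrained checkpoint on CPU.
CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1
```

The other examples differ only in the tarball name, the config file, and the averaged-checkpoint prefix (e.g. `conf/conformer.yaml` with `avg_20`, or `conf/transformer.yaml` with `avg_10`).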