From ae1b22273fef250d929aef130a07115f4a1d874d Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 10:58:44 +0800 Subject: [PATCH 1/8] [Doc] update readme for aishell/asr0, test=doc --- examples/aishell/asr0/README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/examples/aishell/asr0/README.md b/examples/aishell/asr0/README.md index 16489992d..6692a6384 100644 --- a/examples/aishell/asr0/README.md +++ b/examples/aishell/asr0/README.md @@ -154,18 +154,18 @@ CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/chec You can get the pretrained transformer or conformer using the scripts below: ```bash Deepspeech2 offline: -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz Deepspeech2 online: -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/aishell_ds2_online_cer8.00_release.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz ``` using the `tar` scripts to unpack the model and then you can use the script to test the model. 
For example: ``` -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz -tar xzvf ds2.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz +tar xzvf asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz source path.sh # If you have process the data and get the manifest file, you can skip the following 2 steps bash local/data.sh --stage -1 --stop_stage -1 @@ -209,8 +209,8 @@ if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then ``` you can train the model by yourself, or you can download the pretrained model by the script below: ```bash -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz -tar xzvf ds2.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz +tar xzvf asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz ``` You can download the audio demo: ```bash From a22f29ba105bb81d4922b832caa19c2ef4f512fd Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 11:09:25 +0800 Subject: [PATCH 2/8] test=doc --- examples/aishell/asr1/README.md | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/examples/aishell/asr1/README.md b/examples/aishell/asr1/README.md index 5277a31eb..a3f30353f 100644 --- a/examples/aishell/asr1/README.md +++ b/examples/aishell/asr1/README.md @@ -143,25 +143,15 @@ avg.sh best exp/conformer/checkpoints 20 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20 ``` ## Pretrained Model -You can get the pretrained transformer or conformer using the scripts below: - -```bash -# Conformer: -wget https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz - -# Chunk Conformer: -wget https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz - -# Transformer: -wget 
https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz +You can get the pretrained transformer or conformer from [this](../../../docs/source/released_model.md) ``` using the `tar` scripts to unpack the model and then you can use the script to test the model. For example: ``` -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz -tar xzvf transformer.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz +tar xzvf asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz source path.sh # If you have process the data and get the manifest file, you can skip the following 2 steps bash local/data.sh --stage -1 --stop_stage -1 @@ -206,7 +196,7 @@ In some situations, you want to use the trained model to do the inference for th ``` you can train the model by yourself using ```bash run.sh --stage 0 --stop_stage 3```, or you can download the pretrained model through the script below: ```bash -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz tar xzvf transformer.model.tar.gz ``` You can download the audio demo: From ee96fb40f0edb17923046632291340e6276e6d48 Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 11:13:27 +0800 Subject: [PATCH 3/8] test=doc --- examples/aishell/asr1/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/examples/aishell/asr1/README.md b/examples/aishell/asr1/README.md index a3f30353f..25b28ede8 100644 --- a/examples/aishell/asr1/README.md +++ b/examples/aishell/asr1/README.md @@ -145,7 +145,6 @@ CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoi ## Pretrained Model You can get the pretrained transformer or conformer from [this](../../../docs/source/released_model.md) -``` using the `tar` scripts to unpack the 
model and then you can use the script to test the model. For example: From 88f5595bd75c71e48760a96e8dc80b587ac363ef Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 11:17:18 +0800 Subject: [PATCH 4/8] test=doc --- examples/librispeech/asr1/README.md | 36 ++++++----------------------- 1 file changed, 7 insertions(+), 29 deletions(-) diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md index eb1a44001..397f18d8a 100644 --- a/examples/librispeech/asr1/README.md +++ b/examples/librispeech/asr1/README.md @@ -151,44 +151,22 @@ avg.sh best exp/conformer/checkpoints 20 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20 ``` ## Pretrained Model -You can get the pretrained transformer or conformer using the scripts below: -```bash -# Conformer: -wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz -# Transformer: -wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/transformer.model.tar.gz -``` +You can get the pretrained transformer or conformer from [this](../../../docs/source/released_model.md). + using the `tar` scripts to unpack the model and then you can use the script to test the model. 
For example: ```bash -wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz -tar xzvf transformer.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz +tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz source path.sh # If you have process the data and get the manifest file, you can skip the following 2 steps bash local/data.sh --stage -1 --stop_stage -1 bash local/data.sh --stage 2 --stop_stage 2 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20 ``` -The performance of the released models are shown below: -## Conformer -train: Epoch 70, 4 V100-32G, best avg: 20 - -| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER | -| --------- | ------- | ------------------- | ------------ | ---------- | ---------------------- | ----------------- | -------- | -| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention | 6.433612394332886 | 0.039771 | -| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.433612394332886 | 0.040342 | -| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.433612394332886 | 0.040342 | -| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention_rescoring | 6.433612394332886 | 0.033761 | -## Transformer -train: Epoch 120, 4 V100-32G, 27 Day, best avg: 10 +The performance of the released models are shown in (here)[./RESULTS.md]. 
-| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER | -| ----------- | ------- | --------------------- | ------------ | ---------- | ---------------------- | ----------------- | -------- | -| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention | 6.382194232940674 | 0.049661 | -| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.382194232940674 | 0.049566 | -| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.382194232940674 | 0.049585 | -| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention_rescoring | 6.382194232940674 | 0.038135 | ## Stage 4: CTC Alignment If you want to get the alignment between the audio and the text, you can use the ctc alignment. The code of this stage is shown below: ```bash @@ -227,8 +205,8 @@ In some situations, you want to use the trained model to do the inference for th ``` you can train the model by yourself using ```bash run.sh --stage 0 --stop_stage 3```, or you can download the pretrained model through the script below: ```bash -wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz -tar xzvf conformer.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz +tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz ``` You can download the audio demo: ```bash From 1a670386165eaf0d0ce928de2eed6932b0d3c638 Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 11:18:37 +0800 Subject: [PATCH 5/8] test=doc --- examples/librispeech/asr1/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md index 397f18d8a..ae252a58b 100644 --- a/examples/librispeech/asr1/README.md +++ b/examples/librispeech/asr1/README.md 
@@ -165,7 +165,7 @@ bash local/data.sh --stage -1 --stop_stage -1 bash local/data.sh --stage 2 --stop_stage 2 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20 ``` -The performance of the released models are shown in (here)[./RESULTS.md]. +The performance of the released models are shown in [here](./RESULTS.md). ## Stage 4: CTC Alignment If you want to get the alignment between the audio and the text, you can use the ctc alignment. The code of this stage is shown below: From f71b9b915d04534053125b34009ad037ba93fe09 Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 11:20:48 +0800 Subject: [PATCH 6/8] test=doc --- examples/aishell/asr0/README.md | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/examples/aishell/asr0/README.md b/examples/aishell/asr0/README.md index 6692a6384..4459b1382 100644 --- a/examples/aishell/asr0/README.md +++ b/examples/aishell/asr0/README.md @@ -151,15 +151,8 @@ avg.sh best exp/deepspeech2/checkpoints 1 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1 ``` ## Pretrained Model -You can get the pretrained transformer or conformer using the scripts below: -```bash -Deepspeech2 offline: -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz +You can get the pretrained models from [this](../../../docs/source/released_model.md). -Deepspeech2 online: -wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz - -``` using the `tar` scripts to unpack the model and then you can use the script to test the model. 
For example: From 3c93953550d5a8a04382a3cc181370b6da2fe74c Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 11:23:17 +0800 Subject: [PATCH 7/8] test=doc --- examples/librispeech/asr2/README.md | 32 +++++------------------------ 1 file changed, 5 insertions(+), 27 deletions(-) diff --git a/examples/librispeech/asr2/README.md b/examples/librispeech/asr2/README.md index 7d6fe11df..209a20787 100644 --- a/examples/librispeech/asr2/README.md +++ b/examples/librispeech/asr2/README.md @@ -213,17 +213,14 @@ avg.sh latest exp/transformer/checkpoints 10 ./local/recog.sh --ckpt_prefix exp/transformer/checkpoints/avg_10 ``` ## Pretrained Model -You can get the pretrained transformer using the scripts below: -```bash -# Transformer: -wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/transformer.model.tar.gz -``` +You can get the pretrained models from [this](../../../docs/source/released_model.md). + using the `tar` scripts to unpack the model and then you can use the script to test the model. 
For example: ```bash -wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/transformer.model.tar.gz -tar xzvf transformer.model.tar.gz +wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/asr2_transformer_librispeech_ckpt_0.1.1.model.tar.gz +tar xzvf asr2_transformer_librispeech_ckpt_0.1.1.model.tar.gz source path.sh # If you have process the data and get the manifest file, you can skip the following 2 steps bash local/data.sh --stage -1 --stop_stage -1 @@ -231,26 +228,7 @@ bash local/data.sh --stage 2 --stop_stage 2 CUDA_VISIBLE_DEVICES= ./local/test.sh conf/transformer.yaml exp/ctc/checkpoints/avg_10 ``` -The performance of the released models are shown below: -### Transformer -| Model | Params | GPUS | Averaged Model | Config | Augmentation | Loss | -| :---------: | :----: | :--------------------: | :--------------: | :-------------------: | :----------: | :-------------: | -| transformer | 32.52M | 8 Tesla V100-SXM2-32GB | 10-best val_loss | conf/transformer.yaml | spec_aug | 6.3197922706604 | - -#### Attention Rescore -| Test Set | Decode Method | #Snt | #Wrd | Corr | Sub | Del | Ins | Err | S.Err | -| ---------- | --------------------- | ---- | ----- | ---- | ---- | ---- | ---- | ---- | ----- | -| test-clean | attention | 2620 | 52576 | 96.4 | 2.5 | 1.1 | 0.4 | 4.0 | 34.7 | -| test-clean | ctc_greedy_search | 2620 | 52576 | 95.9 | 3.7 | 0.4 | 0.5 | 4.6 | 48.0 | -| test-clean | ctc_prefix_beamsearch | 2620 | 52576 | 95.9 | 3.7 | 0.4 | 0.5 | 4.6 | 47.6 | -| test-clean | attention_rescore | 2620 | 52576 | 96.8 | 2.9 | 0.3 | 0.4 | 3.7 | 38.0 | - -#### JoinCTC -| Test Set | Decode Method | #Snt | #Wrd | Corr | Sub | Del | Ins | Err | S.Err | -| ---------- | ----------------- | ---- | ----- | ---- | ---- | ---- | ---- | ---- | ----- | -| test-clean | join_ctc_only_att | 2620 | 52576 | 96.1 | 2.5 | 1.4 | 0.4 | 4.4 | 34.7 | -| test-clean | join_ctc_w/o_lm | 2620 | 52576 | 97.2 | 2.6 | 0.3 | 0.4 | 3.2 | 34.9 | -| test-clean | join_ctc_w_lm | 2620 | 
52576 | 97.9 | 1.8 | 0.2 | 0.3 | 2.4 | 27.8 | +The performance of the released models is shown [here](./RESULTS.md). Compared with [ESPNET](https://github.com/espnet/espnet/blob/master/egs/librispeech/asr1/RESULTS.md#pytorch-large-transformer-with-specaug-4-gpus--transformer-lm-4-gpus), we use 8 GPUs, but our model (aheads4-adim256) is smaller than theirs. ## Stage 5: CTC Alignment From 75c9dc773bd7feac8deea8e0e0c7dfdefa1ec9ee Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Fri, 8 Apr 2022 11:23:54 +0800 Subject: [PATCH 8/8] test=doc --- examples/librispeech/asr2/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/librispeech/asr2/README.md b/examples/librispeech/asr2/README.md index 209a20787..5bc7185a9 100644 --- a/examples/librispeech/asr2/README.md +++ b/examples/librispeech/asr2/README.md @@ -1,4 +1,4 @@ -# Transformer/Conformer ASR with Librispeech Asr2 +# Transformer/Conformer ASR with Librispeech ASR2 This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12) and use some functions in kaldi.
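The recurring pattern these patches update — `wget` a versioned checkpoint archive, then `tar xzvf` the same filename — is easy to get wrong when only one of the two lines is renamed (in PATCH 2, the asr1 README's `wget` target was changed to the new checkpoint name while the neighbouring `tar xzvf transformer.model.tar.gz` context line was left pointing at the old one). A small sketch that derives the `tar` argument from the URL basename avoids that class of mismatch; the URLs are checkpoint names taken verbatim from the diffs above, and the actual download line is left commented out:

```shell
#!/usr/bin/env bash
# Derive the archive name from the download URL so wget and tar can
# never disagree. URLs are the checkpoint names introduced above.
set -euo pipefail

for url in \
  https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz \
  https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
do
  archive=${url##*/}   # basename of the URL, shared by wget -O and tar
  echo "$archive"
  # wget -q "$url" -O "$archive" && tar xzvf "$archive"   # real download, skipped here
done
```

Printing the derived basenames confirms they match the filenames the READMEs pass to `tar xzvf`, so renaming a checkpoint only ever touches one line.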