Merge branch 'develop' of github.com:SmileGoat/PaddleSpeech into add_websocketlib

pull/1751/head
Yang Zhou 2 years ago
commit 340ed3b6e0

@@ -8,8 +8,8 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER |
:-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----: | :-----: | :-----:
[Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.078 |-| 151 h | [Ds2 Online Aishell ASR0](../../examples/aishell/asr0)
[Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.064 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0)
- [Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention| 0.0534 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1)
- [Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0483 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1)
+ [Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0544 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1)
+ [Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0464 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1)
[Transformer Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz) | Aishell Dataset | Char-based | 128 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0523 |-| 151 h | [Transformer Aishell ASR1](../../examples/aishell/asr1)
[Ds2 Offline Librispeech ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr0/asr0_deepspeech2_librispeech_ckpt_0.1.1.model.tar.gz)| Librispeech Dataset | Char-based | 518 MB | 2 Conv + 3 bidirectional LSTM layers| - |0.0725| 960 h | [Ds2 Offline Librispeech ASR0](../../examples/librispeech/asr0)
[Conformer Librispeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz) | Librispeech Dataset | subword-based | 191 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring |-| 0.0337 | 960 h | [Conformer Librispeech ASR1](../../examples/librispeech/asr1)

@@ -18,7 +18,7 @@ Need to set `decoding.decoding_chunk_size=16` when decoding.
| Model | Params | Config | Augmentation| Test set | Decode method | Chunk Size & Left Chunks | Loss | CER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
- | conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | attention | 16, -1 | - | 0.0534 |
+ | conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | attention | 16, -1 | - | 0.0551 |
| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | ctc_greedy_search | 16, -1 | - | 0.0629 |
| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | ctc_prefix_beam_search | 16, -1 | - | 0.0629 |
| conformer | 47.06M | conf/chunk_conformer.yaml | spec_aug | test | attention_rescoring | 16, -1 | - | 0.0544 |

@@ -84,7 +84,7 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
exit 1
fi
python utils/format_rsl.py \
- --origin_hyp ${output_dir}/${type}.rsl
+ --origin_hyp ${output_dir}/${type}.rsl \
--trans_hyp ${output_dir}/${type}.rsl.text
python utils/compute-wer.py --char=1 --v=1 \
data/manifest.test.text ${output_dir}/${type}.rsl.text > ${output_dir}/${type}.error
@@ -100,7 +100,7 @@ if [ ${stage} -le 101 ] && [ ${stop_stage} -ge 101 ]; then
output_dir=${ckpt_prefix}
for type in attention ctc_greedy_search ctc_prefix_beam_search attention_rescoring; do
python utils/format_rsl.py \
- --origin_hyp ${output_dir}/${type}.rsl
+ --origin_hyp ${output_dir}/${type}.rsl \
--trans_hyp_sclite ${output_dir}/${type}.rsl.text.sclite
mkdir -p ${output_dir}/${type}_sclite

@@ -0,0 +1,151 @@
# ECAPA-TDNN with VoxCeleb
This example contains code used to train an ECAPA-TDNN model on the [VoxCeleb dataset](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/index.html#about).
## Overview
All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its own function.
| Stage | Function |
|:---- |:----------------------------------------------------------- |
| 0 | Process data. It includes: <br> (1) Download the VoxCeleb1 dataset <br> (2) Download the VoxCeleb2 dataset <br> (3) Convert the VoxCeleb2 m4a files to wav format <br> (4) Get the manifest files of the train, development, and test datasets <br> (5) Download the RIR Noise dataset and get the noise manifest files for augmentation |
| 1 | Train the model |
| 2 | Test the speaker verification with the VoxCeleb trial |
You can choose to run a range of stages by setting `stage` and `stop_stage`.
For example, if you want to execute the code in stage 1 and stage 2, you can run this script:
```bash
bash run.sh --stage 1 --stop_stage 2
```
Or you can set `stage` equal to `stop_stage` to run only one stage.
For example, if you only want to run `stage 0`, you can use the script below:
```bash
bash run.sh --stage 0 --stop_stage 0
```
The document below will describe the scripts in the `run.sh` in detail.
## The environment variables
The `path.sh` contains the environment variables.
```bash
source path.sh
```
This script needs to be run first.
Another script is also needed:
```bash
source ${MAIN_ROOT}/utils/parse_options.sh
```
It enables passing options to the shell scripts in the form `--variable value`.
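As a rough sketch of how this works, `parse_options.sh` overrides a variable's default whenever a matching `--variable value` pair appears on the command line; the variable must already be declared before the script is sourced. A minimal, hypothetical standalone example (assuming `path.sh` has set `MAIN_ROOT`):
```bash
#!/bin/bash
# demo.sh -- minimal sketch of Kaldi-style option parsing (hypothetical)
stage=0         # defaults; parse_options.sh only overrides variables
stop_stage=100  # that already exist when it is sourced
gpus=

source ${MAIN_ROOT}/utils/parse_options.sh

# `bash demo.sh --stage 1 --stop_stage 2` prints: stage=1 stop_stage=2 gpus=
echo "stage=${stage} stop_stage=${stop_stage} gpus=${gpus}"
```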
## The local variables
Some local variables are set in the `run.sh`.
`gpus` denotes the GPU IDs you want to use. If you set `gpus=` (empty), only the CPU is used.
`stage` denotes the number of the stage you want to start from in the experiments.
`stop_stage` denotes the number of the stage you want to stop at in the experiments.
`conf_path` denotes the config path of the model.
`exp_dir` denotes the experiment directory, e.g. "exp/ecapa-tdnn-vox12-big/"
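For reference, these variables are typically declared near the top of `run.sh` with their default values, roughly like the sketch below (the exact defaults in `run.sh` may differ):
```bash
gpus=0,1,2,3                       # GPU IDs to use; set gpus= for CPU-only runs
stage=0                            # first stage to run
stop_stage=100                     # last stage to run
conf_path=conf/ecapa_tdnn.yaml     # model config file
exp_dir=exp/ecapa-tdnn-vox12-big/  # experiment directory
```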
You can set these local variables when you run `run.sh`.
For example, you can set `gpus` on the command line:
```bash
bash run.sh --gpus 0,1
```
## Stage 0: Data processing
To use this example, you need to process the data first; stage 0 in `run.sh` does this. The code is shown below:
```bash
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
# prepare data
bash ./local/data.sh || exit -1
fi
```
Stage 0 is for processing the data. If you only want to process the data, you can run:
```bash
bash run.sh --stage 0 --stop_stage 0
```
You can also just run these scripts on the command line:
```bash
source path.sh
bash ./local/data.sh
```
After processing the data, the `data` directory will look like this:
```bash
data/
├── rir_noise
│   ├── csv
│   │   ├── noise.csv
│   │   └── rir.csv
│   ├── manifest.pointsource_noises
│   ├── manifest.real_rirs_isotropic_noises
│   └── manifest.simulated_rirs
├── vox
│   ├── csv
│   │   ├── dev.csv
│   │   ├── enroll.csv
│   │   ├── test.csv
│   │   └── train.csv
│   └── meta
│       └── label2id.txt
└── vox1
├── list_test_all2.txt
├── list_test_all.txt
├── list_test_hard2.txt
├── list_test_hard.txt
├── manifest.dev
├── manifest.test
├── veri_test2.txt
├── veri_test.txt
├── voxceleb1.dev.meta
└── voxceleb1.test.meta
```
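After the preparation finishes, you can sanity-check the result with a couple of shell commands (a quick manual check, not part of `run.sh`):
```bash
# count the entries in each split's csv file
wc -l data/vox/csv/*.csv
# confirm the noise/rir csv files used for augmentation are non-empty
ls -lh data/rir_noise/csv/
```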
## Stage 1: Model training
If you want to train the model, you can use stage 1 in `run.sh`. The code is shown below:
```bash
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
```
If you want to train the model, you can use the script below to execute stage 0 and stage 1:
```bash
bash run.sh --stage 0 --stop_stage 1
```
Or you can run these scripts on the command line (using CPU only):
```bash
source path.sh
bash ./local/data.sh ./data/ conf/ecapa_tdnn.yaml
CUDA_VISIBLE_DEVICES= ./local/train.sh ./data/ exp/ecapa-tdnn-vox12-big/ conf/ecapa_tdnn.yaml
```
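If the data is already prepared, you can also run the training stage alone and pick the GPUs explicitly, combining the options described above:
```bash
bash run.sh --gpus 0,1 --stage 1 --stop_stage 1
```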
## Stage 2: Model Testing
The test stage is to evaluate the model performance. The code of the test stage is shown below:
```bash
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
# test ckpt avg_n
CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${dir} ${exp_dir} ${conf_path} || exit -1
fi
```
If you want to train a model and test it, you can use the script below to execute stage 0, stage 1, and stage 2:
```bash
bash run.sh --stage 0 --stop_stage 2
```
Or you can run these scripts on the command line (using CPU only):
```bash
source path.sh
bash ./local/data.sh ./data/ conf/ecapa_tdnn.yaml
CUDA_VISIBLE_DEVICES= ./local/train.sh ./data/ exp/ecapa-tdnn-vox12-big/ conf/ecapa_tdnn.yaml
CUDA_VISIBLE_DEVICES= ./local/test.sh ./data/ exp/ecapa-tdnn-vox12-big/ conf/ecapa_tdnn.yaml
```
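Similarly, once the model has been trained, you can run the test stage alone:
```bash
bash run.sh --stage 2 --stop_stage 2
```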
## Pretrained Model
You can get the pretrained model from [this link](../../../docs/source/released_model.md).
Use `tar` to unpack the model, and then you can use the script below to test it.
For example:
```bash
wget https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz
tar xzvf sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz
source path.sh
# If you have processed the data and got the manifest files, you can skip the following 2 steps
CUDA_VISIBLE_DEVICES= ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0 conf/ecapa_tdnn.yaml
```
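If you want to check what the archive ships before unpacking, `tar -tzf` lists its contents (checkpoint and config files) without extracting them:
```bash
tar -tzf sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz
```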
The performance of the released models is shown in [this document](./RESULTS.md).