Aishell - Deepspeech2 Streaming
We recommend using the U2/U2++ model instead of DS2; please see here.
A C++ deployment example that uses the deepspeech2 model to recognize wav files and compute the CER. AISHELL-1 is used as the test data.
Source path.sh
. path.sh
SpeechX binaries are under the directory printed by echo $SPEECHX_BUILD; for more information, see path.sh.
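As a quick sanity check after sourcing path.sh, a small helper along these lines can confirm that the variable is usable. This is a minimal sketch: the function name check_speechx_build is hypothetical and not part of the example scripts, and it only assumes that path.sh exports SPEECHX_BUILD as described above.

```bash
# Hypothetical helper (not part of this example): returns 0 if SPEECHX_BUILD
# is set and points at an existing directory, non-zero otherwise.
check_speechx_build() {
    if [ -z "${SPEECHX_BUILD:-}" ]; then
        echo "SPEECHX_BUILD is not set; run '. path.sh' first" >&2
        return 1
    fi
    if [ ! -d "$SPEECHX_BUILD" ]; then
        echo "SPEECHX_BUILD=$SPEECHX_BUILD is not a directory; has SpeechX been built?" >&2
        return 1
    fi
    echo "SpeechX binaries: $SPEECHX_BUILD"
}

# Usage, from the example directory:
#   . path.sh && check_speechx_build
```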
Recognize with linear feature
bash run.sh
run.sh has multiple stages; for details, see run.sh:
- download dataset, model and lm
- convert cmvn format and compute feature
- decode w/o lm by feature
- decode w/ ngram lm by feature
- decode w/ TLG graph by feature
- recognize w/ TLG graph by wav input
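Multi-stage scripts of this kind are commonly gated by a pair of stage variables, so that a subset of the steps above can be rerun. The sketch below illustrates that pattern only. The variable names stage and stop_stage, the helper should_run, and the stage numbers are assumptions for illustration; check run.sh for the names and numbering it actually uses.

```bash
# Illustrative stage gating (assumed convention, not copied from run.sh).
stage=0        # first stage to run
stop_stage=1   # last stage to run

# Succeeds when stage n lies inside [stage, stop_stage].
should_run() {
    local n=$1
    [ "$stage" -le "$n" ] && [ "$stop_stage" -ge "$n" ]
}

if should_run 0; then
    echo "stage 0: download dataset, model and lm"
fi

if should_run 1; then
    echo "stage 1: convert cmvn format and compute feature"
fi

if should_run 2; then
    echo "stage 2: decode w/o lm by feature"
fi
```

With the values above, only the first two messages are printed; raising stop_stage extends the run to later stages.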
Recognize with .scp file for wav
This script uses recognizer_main to recognize wav files.
The input is an scp file, which looks like this:
# head data/split1/1/aishell_test.scp
BAC009S0764W0121 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0121.wav
BAC009S0764W0122 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0122.wav
...
BAC009S0764W0125 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0125.wav
If you want to recognize a single wav file, you can create an scp file like this:
key path/to/wav/file
Then pass it via the --wav_rspecifier= parameter of the recognizer_main binary. For the meaning of the other flags, see the help:
recognizer_main --help
For an example of using recognizer_main, see run.sh.
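The scp format shown above (one "key path" pair per line) can be generated for a whole directory with a short helper. This is a sketch under stated assumptions: the function name make_wav_scp is hypothetical, and using the file name without the .wav extension as the key is an assumption that happens to match the sample listing. The scp: prefix in the usage comment follows the Kaldi-style table specifier convention and is also an assumption here; run.sh shows the exact form and the remaining required flags.

```bash
# Hypothetical helper: write "<key> <absolute wav path>" lines for every
# .wav file under wav_dir. The key is the file name without ".wav".
make_wav_scp() {
    local wav_dir=$1 out_scp=$2
    local abs_dir
    abs_dir=$(cd "$wav_dir" && pwd) || return 1
    find "$abs_dir" -type f -name '*.wav' | sort | while IFS= read -r wav; do
        printf '%s %s\n' "$(basename "$wav" .wav)" "$wav"
    done > "$out_scp"
}

# Usage (paths are placeholders):
#   make_wav_scp path/to/wav/dir wav.scp
#   recognizer_main --wav_rspecifier=scp:wav.scp ...   # other flags: see run.sh
```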
CTC Prefix Beam Search w/o LM
Overall -> 16.14 % N=104612 C=88190 S=16110 D=312 I=465
Mandarin -> 16.14 % N=104612 C=88190 S=16110 D=312 I=465
Other -> 0.00 % N=0 C=0 S=0 D=0 I=0
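In each result line, N is the number of reference characters, C the correct ones, and S, D, I the substitutions, deletions and insertions. The reported rate is (S + D + I) / N, and C equals N - S - D. The sketch below recomputes the rate from a result line as a cross-check; the function name cer_from_line is hypothetical and the parsing assumes the "KEY=VALUE" layout used in the lines above.

```bash
# Hypothetical helper: recompute the error rate (%) from a result line such as
#   "Overall -> 16.14 % N=104612 C=88190 S=16110 D=312 I=465"
cer_from_line() {
    awk '{
        # Collect the N, C, S, D, I counts from the KEY=VALUE fields.
        for (k = 1; k <= NF; k++) {
            split($k, kv, "=")
            if (kv[1] ~ /^[NCSDI]$/) v[kv[1]] = kv[2]
        }
        if (v["N"] == 0) {
            print "0.00"            # empty category, as in the "Other" line
        } else {
            printf "%.2f\n", 100 * (v["S"] + v["D"] + v["I"]) / v["N"]
        }
    }' <<< "$1"
}

cer_from_line "Overall -> 16.14 % N=104612 C=88190 S=16110 D=312 I=465"   # prints 16.14
```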
CTC Prefix Beam Search w/ LM
LM: zh_giga.no_cna_cmn.prune01244.klm
Overall -> 7.86 % N=104768 C=96865 S=7573 D=330 I=327
Mandarin -> 7.86 % N=104768 C=96865 S=7573 D=330 I=327
Other -> 0.00 % N=0 C=0 S=0 D=0 I=0
CTC TLG WFST
LM: aishell train --acoustic_scale=1.2
Overall -> 11.14 % N=103017 C=93363 S=9583 D=71 I=1819
Mandarin -> 11.14 % N=103017 C=93363 S=9583 D=71 I=1818
Other -> 0.00 % N=0 C=0 S=0 D=0 I=1
LM: wenetspeech --acoustic_scale=1.5
Overall -> 10.93 % N=104765 C=93410 S=9780 D=1575 I=95
Mandarin -> 10.93 % N=104762 C=93410 S=9779 D=1573 I=95
Other -> 100.00 % N=3 C=0 S=1 D=2 I=0
Recognize with fbank feature
This script is the same as run.sh, but uses fbank features.
bash run_fbank.sh
CTC Prefix Beam Search w/o LM
Overall -> 10.44 % N=104765 C=94194 S=10174 D=397 I=369
Mandarin -> 10.44 % N=104762 C=94194 S=10171 D=397 I=369
Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
CTC Prefix Beam Search w/ LM
LM: zh_giga.no_cna_cmn.prune01244.klm
Overall -> 5.82 % N=104765 C=99386 S=4944 D=435 I=720
Mandarin -> 5.82 % N=104762 C=99386 S=4941 D=435 I=720
English -> 0.00 % N=0 C=0 S=0 D=0 I=0
CTC TLG WFST
LM: aishell train
Overall -> 9.58 % N=104765 C=94817 S=4326 D=5622 I=84
Mandarin -> 9.57 % N=104762 C=94817 S=4325 D=5620 I=84
Other -> 100.00 % N=3 C=0 S=1 D=2 I=0
Build TLG WFST graph
This script builds the TLG WFST graph. It depends on srilm, so please make sure srilm is installed.
For more information, see the script below.
bash ./local/run_build_tlg.sh