Aishell - Deepspeech2 Streaming
We recommend using the U2/U2++ model instead of DS2; please see here.
A C++ deployment example that uses the deepspeech2 model to recognize wav files and compute the CER. We use AISHELL-1 as test data.
Source path.sh
. path.sh
The SpeechX binaries are under the directory shown by echo $SPEECHX_BUILD; for more information, see path.sh.
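As a quick sanity check after sourcing path.sh, a small helper along these lines can confirm that SPEECHX_BUILD points to a real directory. The function name check_speechx_build is hypothetical and not part of SpeechX; this is only a sketch.

```shell
# Hypothetical helper (not part of SpeechX): verify that SPEECHX_BUILD
# is set and names an existing directory.
check_speechx_build() {
    if [ -z "${SPEECHX_BUILD:-}" ]; then
        echo "SPEECHX_BUILD is not set; run '. path.sh' first"
        return 1
    fi
    if [ ! -d "${SPEECHX_BUILD}" ]; then
        echo "SPEECHX_BUILD=${SPEECHX_BUILD} is not a directory"
        return 1
    fi
    echo "SpeechX binaries are under ${SPEECHX_BUILD}"
}
```

Calling check_speechx_build right after . path.sh gives an early, readable failure instead of a "command not found" error later on.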
Recognize with linear feature
bash run.sh
run.sh has multiple stages; for details, see run.sh:
- download dataset, model and lm
- convert cmvn format and compute feature
- decode w/o lm by feature
- decode w/ ngram lm by feature
- decode w/ TLG graph by feature
- recognize w/ TLG graph by wav input
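Multi-stage scripts of this kind usually gate each stage on a pair of variables so that a subset of stages can be re-run. The sketch below illustrates that pattern only; the variable names stage and stop_stage, the helper should_run, and the stage numbers are assumptions and may not match what run.sh actually does.

```shell
# Sketch only: stage numbers and variable names are assumptions,
# not read from run.sh.
stage=${stage:-0}
stop_stage=${stop_stage:-100}

# should_run N: succeed when N lies within [stage, stop_stage].
should_run() {
    [ "${stage}" -le "$1" ] && [ "${stop_stage}" -ge "$1" ]
}

if should_run 0; then echo "stage 0: download dataset, model and lm"; fi
if should_run 1; then echo "stage 1: convert cmvn format and compute feature"; fi
if should_run 2; then echo "stage 2: decode w/o lm by feature"; fi
```

With this pattern, setting stage=2 and stop_stage=2 before the checks would run only the third step and skip the download and feature extraction.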
Recognize with .scp file for wav
This script uses recognizer_main to recognize wav files.
The input is an scp file, which looks like this:
# head data/split1/1/aishell_test.scp
BAC009S0764W0121 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0121.wav
BAC009S0764W0122 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0122.wav
...
BAC009S0764W0125 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0125.wav
If you want to recognize a single wav, you can make an scp file like this:
key path/to/wav/file
Then pass it through the --wav_rspecifier= parameter of the recognizer_main binary. For the meaning of the other flags, see the help output:
recognizer_main --help
For an example of using recognizer_main, see run.sh.
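The scp format described above (one "key path" pair per line) can be generated for a whole directory of recordings. The helper below is a minimal sketch: make_scp is a hypothetical name, and using the file name without its .wav extension as the key is an assumption that happens to match the sample listing.

```shell
# make_scp DIR: print one "key path" line per .wav file directly under DIR.
# The key is the file name without the .wav extension (an assumption).
# Paths containing spaces are not handled, since scp lines are space-separated.
make_scp() {
    dir=$1
    for wav in "${dir}"/*.wav; do
        [ -e "${wav}" ] || continue   # skip when the glob matches nothing
        key=$(basename "${wav}" .wav)
        echo "${key} ${wav}"
    done
}
```

For example, make_scp /path/to/wavs > my.scp writes the file. Kaldi-style tools then take such a file as scp:my.scp; whether recognizer_main expects exactly that form for --wav_rspecifier should be confirmed against run.sh.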
CTC Prefix Beam Search w/o LM
Overall -> 16.14 % N=104612 C=88190 S=16110 D=312 I=465
Mandarin -> 16.14 % N=104612 C=88190 S=16110 D=312 I=465
Other -> 0.00 % N=0 C=0 S=0 D=0 I=0
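To read these lines, it helps to assume the usual scoring convention: N is the number of reference tokens, C the correct ones, and S, D and I the substitutions, deletions and insertions, so that the error rate is 100 * (S + D + I) / N. The helper name cer below is hypothetical; it only reproduces the arithmetic and is not the scoring tool used by the scripts.

```shell
# cer N S D I: print the error rate in percent, 100 * (S + D + I) / N.
# An empty reference (N = 0) is reported as 0.00, matching the "Other" rows.
cer() {
    awk -v n="$1" -v s="$2" -v d="$3" -v i="$4" \
        'BEGIN { if (n == 0) { print "0.00" } else { printf "%.2f\n", 100 * (s + d + i) / n } }'
}

cer 104612 16110 312 465   # prints 16.14
```

The same figures also satisfy N = C + S + D (88190 + 16110 + 312 = 104612), which is a convenient consistency check on a result line.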
CTC Prefix Beam Search w/ LM
LM: zh_giga.no_cna_cmn.prune01244.klm
Overall -> 7.86 % N=104768 C=96865 S=7573 D=330 I=327
Mandarin -> 7.86 % N=104768 C=96865 S=7573 D=330 I=327
Other -> 0.00 % N=0 C=0 S=0 D=0 I=0
CTC TLG WFST
LM: aishell train --acoustic_scale=1.2
Overall -> 11.14 % N=103017 C=93363 S=9583 D=71 I=1819
Mandarin -> 11.14 % N=103017 C=93363 S=9583 D=71 I=1818
Other -> 0.00 % N=0 C=0 S=0 D=0 I=1
LM: wenetspeech --acoustic_scale=1.5
Overall -> 10.93 % N=104765 C=93410 S=9780 D=1575 I=95
Mandarin -> 10.93 % N=104762 C=93410 S=9779 D=1573 I=95
Other -> 100.00 % N=3 C=0 S=1 D=2 I=0
Recognize with fbank feature
This script is the same as run.sh, but uses fbank features.
bash run_fbank.sh
CTC Prefix Beam Search w/o LM
Overall -> 10.44 % N=104765 C=94194 S=10174 D=397 I=369
Mandarin -> 10.44 % N=104762 C=94194 S=10171 D=397 I=369
Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
CTC Prefix Beam Search w/ LM
LM: zh_giga.no_cna_cmn.prune01244.klm
Overall -> 5.82 % N=104765 C=99386 S=4944 D=435 I=720
Mandarin -> 5.82 % N=104762 C=99386 S=4941 D=435 I=720
English -> 0.00 % N=0 C=0 S=0 D=0 I=0
CTC TLG WFST
LM: aishell train
Overall -> 9.58 % N=104765 C=94817 S=4326 D=5622 I=84
Mandarin -> 9.57 % N=104762 C=94817 S=4325 D=5620 I=84
Other -> 100.00 % N=3 C=0 S=1 D=2 I=0
Build TLG WFST graph
This script builds the TLG WFST graph. It depends on srilm, so please make sure it is installed.
For more information, see the script below.
bash ./local/run_build_tlg.sh
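Before starting the build, it can be worth confirming that srilm is reachable. The sketch below checks for ngram-count, the SRILM n-gram estimation tool; that run_build_tlg.sh calls this particular binary is an assumption, and check_srilm is a hypothetical helper name.

```shell
# check_srilm: succeed only if SRILM's ngram-count is on PATH.
# Assumption: the build script relies on this binary.
check_srilm() {
    if command -v ngram-count >/dev/null 2>&1; then
        echo "found ngram-count at $(command -v ngram-count)"
    else
        echo "ngram-count not found; install SRILM and add it to PATH" >&2
        return 1
    fi
}
```

A typical use would be check_srilm && bash ./local/run_build_tlg.sh, so the build is skipped with a clear message when the tool is missing.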