# Streaming DeepSpeech2 Server with WebSocket

This example shows how to use `websocket` to run a streaming DeepSpeech2 server. For DeepSpeech2 model training, please see [here](../../../../examples/aishell/asr0/).

The WebSocket protocol is the same as the one used by [PaddleSpeech Server](../../../../demos/streaming_asr_server/);
for implementation details, please see [here](../../../speechx/protocol/websocket/).

## Source path.sh

```bash
. path.sh
```

The SpeechX binaries are located under `$SPEECHX_BUILD` (run `echo $SPEECHX_BUILD` to print the path); see `path.sh` for more information.
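
A quick sanity check after sourcing `path.sh` can catch a missing build early. The sketch below assumes only that `path.sh` exports `SPEECHX_BUILD`, as described above; the helper name `check_speechx_build` is hypothetical and not part of the example scripts.

```bash
# Hypothetical helper: verify that SPEECHX_BUILD is set and points to a directory.
check_speechx_build() {
    if [ -z "${SPEECHX_BUILD:-}" ]; then
        echo "SPEECHX_BUILD is not set; did you run '. path.sh'?" >&2
        return 1
    fi
    if [ ! -d "$SPEECHX_BUILD" ]; then
        echo "SPEECHX_BUILD=$SPEECHX_BUILD is not a directory; build SpeechX first." >&2
        return 1
    fi
    echo "SpeechX binaries: $SPEECHX_BUILD"
}
```

Calling `check_speechx_build` right after `. path.sh` prints the build directory or explains what is missing.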

## Start WebSocket Server

```bash
bash websocket_server.sh
```

The output looks like this:

```text
I1130 02:19:32.029882 12856 cmvn_json2kaldi_main.cc:39] cmvn josn path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/data/mean_std.json
I1130 02:19:32.032230 12856 cmvn_json2kaldi_main.cc:73] nframe: 907497
I1130 02:19:32.032564 12856 cmvn_json2kaldi_main.cc:85] cmvn stats have write into: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
I1130 02:19:32.032579 12856 cmvn_json2kaldi_main.cc:86] Binary: 1
I1130 02:19:32.798342 12937 feature_pipeline.h:53] cmvn file: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
I1130 02:19:32.798542 12937 feature_pipeline.h:58] dither: 0
I1130 02:19:32.798583 12937 feature_pipeline.h:60] frame shift ms: 10
I1130 02:19:32.798588 12937 feature_pipeline.h:62] feature type: linear
I1130 02:19:32.798596 12937 feature_pipeline.h:80] frame length ms: 20
I1130 02:19:32.798601 12937 feature_pipeline.h:88] subsampling rate: 4
I1130 02:19:32.798606 12937 feature_pipeline.h:90] nnet receptive filed length: 7
I1130 02:19:32.798611 12937 feature_pipeline.h:92] nnet chunk size: 1
I1130 02:19:32.798615 12937 feature_pipeline.h:94] frontend fill zeros: 0
I1130 02:19:32.798630 12937 nnet_itf.h:52] subsampling rate: 4
I1130 02:19:32.798635 12937 nnet_itf.h:54] model path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdmodel
I1130 02:19:32.798640 12937 nnet_itf.h:57] param path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdiparams
I1130 02:19:32.798643 12937 nnet_itf.h:59] DS2 param:
I1130 02:19:32.798647 12937 nnet_itf.h:61] cache names: chunk_state_h_box,chunk_state_c_box
I1130 02:19:32.798652 12937 nnet_itf.h:63] cache shape: 5-1-1024,5-1-1024
I1130 02:19:32.798656 12937 nnet_itf.h:65] input names: audio_chunk,audio_chunk_lens,chunk_state_h_box,chunk_state_c_box
I1130 02:19:32.798660 12937 nnet_itf.h:67] output names: softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0
I1130 02:19:32.798664 12937 ctc_tlg_decoder.h:41] fst path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//TLG.fst
I1130 02:19:32.798669 12937 ctc_tlg_decoder.h:42] fst symbole table: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//words.txt
I1130 02:19:32.798673 12937 ctc_tlg_decoder.h:47] LatticeFasterDecoder max active: 7500
I1130 02:19:32.798677 12937 ctc_tlg_decoder.h:49] LatticeFasterDecoder beam: 15
I1130 02:19:32.798681 12937 ctc_tlg_decoder.h:50] LatticeFasterDecoder lattice_beam: 7.5
I1130 02:19:32.798708 12937 websocket_server_main.cc:37] Listening at port 8082
```
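
Because the server logs `Listening at port 8082` once it is ready, a script can wait for that port before sending audio. The following is a minimal sketch using bash's `/dev/tcp` redirection; the function name `wait_for_port`, the host, and the retry count are assumptions and not part of the example scripts. It requires `bash` and the coreutils `timeout` command.

```bash
# Hypothetical helper: poll until host:port accepts TCP connections.
# Usage: wait_for_port HOST PORT [RETRIES]   (one attempt per second)
wait_for_port() {
    local host=$1 port=$2 retries=${3:-30}
    local i=0
    while [ "$i" -lt "$retries" ]; do
        # /dev/tcp/HOST/PORT is a bash pseudo-device; opening it makes a TCP connection.
        if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "port $port on $host did not open after $retries attempts" >&2
    return 1
}
```

For example, `wait_for_port 127.0.0.1 8082` returns successfully once the server shown above is accepting connections.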

## Start WebSocket Client

```bash
bash websocket_client.sh
```

This script uses AISHELL-1 test data to call the WebSocket server.

The input is specified by `--wav_rspecifier=scp:$data/$aishell_wav_scp`.

The `scp` file looks like this:
```text
# head data/split1/1/aishell_test.scp
BAC009S0764W0121 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0121.wav
BAC009S0764W0122 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0122.wav
...
BAC009S0764W0125 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0125.wav
```

To recognize a single wav file, create an `scp` file like this:

```text
key path/to/wav/file
```
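
Writing the `scp` file by hand gets tedious with more than a few recordings. The sketch below prints one `key path` line per wav file. The helper name `make_wav_scp` is hypothetical, and using the file name (without `.wav`) as the key is an assumption; the format above only shows that each line pairs a key with a path.

```bash
# Hypothetical helper: print one "key path" line per wav file given as an argument.
# The key is the file name without the .wav suffix; the path is made absolute.
make_wav_scp() {
    for wav in "$@"; do
        key=$(basename "$wav" .wav)
        dir=$(cd "$(dirname "$wav")" && pwd)
        echo "$key $dir/$(basename "$wav")"
    done
}
```

For example, `make_wav_scp path/to/wav/file.wav > my_wav.scp` writes a one-line `scp` file (`my_wav.scp` is a placeholder name). File names containing spaces are not handled, since the key and path are separated by a space.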