You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
PaddleSpeech/speechx/examples/ds2_ol/websocket/README.md

79 lines
4.0 KiB

# Streaming DeepSpeech2 Server with WebSocket
This example is about using `websocket` as streaming deepspeech2 server. For deepspeech2 model training please see [here](../../../../examples/aishell/asr0/).
The websocket protocal is same to [PaddleSpeech Server](../../../../demos/streaming_asr_server/),
for detail of implementation please see [here](../../../speechx/protocol/websocket/).
## Source path.sh
```bash
. path.sh
```
SpeechX bins is under `echo $SPEECHX_BUILD`, more info please see `path.sh`.
## Start WebSocket Server
```bash
bash websoket_server.sh
```
The output is like below:
```text
I1130 02:19:32.029882 12856 cmvn_json2kaldi_main.cc:39] cmvn josn path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/data/mean_std.json
I1130 02:19:32.032230 12856 cmvn_json2kaldi_main.cc:73] nframe: 907497
I1130 02:19:32.032564 12856 cmvn_json2kaldi_main.cc:85] cmvn stats have write into: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
I1130 02:19:32.032579 12856 cmvn_json2kaldi_main.cc:86] Binary: 1
I1130 02:19:32.798342 12937 feature_pipeline.h:53] cmvn file: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
I1130 02:19:32.798542 12937 feature_pipeline.h:58] dither: 0
I1130 02:19:32.798583 12937 feature_pipeline.h:60] frame shift ms: 10
I1130 02:19:32.798588 12937 feature_pipeline.h:62] feature type: linear
I1130 02:19:32.798596 12937 feature_pipeline.h:80] frame length ms: 20
I1130 02:19:32.798601 12937 feature_pipeline.h:88] subsampling rate: 4
I1130 02:19:32.798606 12937 feature_pipeline.h:90] nnet receptive filed length: 7
I1130 02:19:32.798611 12937 feature_pipeline.h:92] nnet chunk size: 1
I1130 02:19:32.798615 12937 feature_pipeline.h:94] frontend fill zeros: 0
I1130 02:19:32.798630 12937 nnet_itf.h:52] subsampling rate: 4
I1130 02:19:32.798635 12937 nnet_itf.h:54] model path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdmodel
I1130 02:19:32.798640 12937 nnet_itf.h:57] param path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdiparams
I1130 02:19:32.798643 12937 nnet_itf.h:59] DS2 param:
I1130 02:19:32.798647 12937 nnet_itf.h:61] cache names: chunk_state_h_box,chunk_state_c_box
I1130 02:19:32.798652 12937 nnet_itf.h:63] cache shape: 5-1-1024,5-1-1024
I1130 02:19:32.798656 12937 nnet_itf.h:65] input names: audio_chunk,audio_chunk_lens,chunk_state_h_box,chunk_state_c_box
I1130 02:19:32.798660 12937 nnet_itf.h:67] output names: softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0
I1130 02:19:32.798664 12937 ctc_tlg_decoder.h:41] fst path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//TLG.fst
I1130 02:19:32.798669 12937 ctc_tlg_decoder.h:42] fst symbole table: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//words.txt
I1130 02:19:32.798673 12937 ctc_tlg_decoder.h:47] LatticeFasterDecoder max active: 7500
I1130 02:19:32.798677 12937 ctc_tlg_decoder.h:49] LatticeFasterDecoder beam: 15
I1130 02:19:32.798681 12937 ctc_tlg_decoder.h:50] LatticeFasterDecoder lattice_beam: 7.5
I1130 02:19:32.798708 12937 websocket_server_main.cc:37] Listening at port 8082
```
## Start WebSocket Client
```bash
bash websocket_client.sh
```
This script using AISHELL-1 test data to call websocket server.
The input is specific by `--wav_rspecifier=scp:$data/$aishell_wav_scp`.
The `scp` file which look like this:
```text
# head data/split1/1/aishell_test.scp
BAC009S0764W0121 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0121.wav
BAC009S0764W0122 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0122.wav
...
BAC009S0764W0125 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0125.wav
```
If you want to recognize one wav, you can make `scp` file like this:
```text
key path/to/wav/file
```