diff --git a/speechx/examples/ds2_ol/websocket/README.md b/speechx/examples/ds2_ol/websocket/README.md
new file mode 100644
index 000000000..3fa84135f
--- /dev/null
+++ b/speechx/examples/ds2_ol/websocket/README.md
@@ -0,0 +1,78 @@
+# Streaming DeepSpeech2 Server with WebSocket
+
+This example shows how to run a streaming DeepSpeech2 server over WebSocket. For DeepSpeech2 model training, see [here](../../../../examples/aishell/asr0/).
+
+The WebSocket protocol is the same as that of the [PaddleSpeech Server](../../../../demos/streaming_asr_server/);
+for implementation details, see [here](../../../speechx/protocol/websocket/).
+
+
+## Source path.sh
+
+```bash
+. path.sh
+```
+
+The SpeechX binaries are located under `$SPEECHX_BUILD` (print it with `echo $SPEECHX_BUILD`); see `path.sh` for details.
+
+
+## Start WebSocket Server
+
+```bash
+bash websocket_server.sh
+```
+
+The output looks like this:
+
+```text
+I1130 02:19:32.029882 12856 cmvn_json2kaldi_main.cc:39] cmvn josn path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/data/mean_std.json
+I1130 02:19:32.032230 12856 cmvn_json2kaldi_main.cc:73] nframe: 907497
+I1130 02:19:32.032564 12856 cmvn_json2kaldi_main.cc:85] cmvn stats have write into: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
+I1130 02:19:32.032579 12856 cmvn_json2kaldi_main.cc:86] Binary: 1
+I1130 02:19:32.798342 12937 feature_pipeline.h:53] cmvn file: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
+I1130 02:19:32.798542 12937 feature_pipeline.h:58] dither: 0
+I1130 02:19:32.798583 12937 feature_pipeline.h:60] frame shift ms: 10
+I1130 02:19:32.798588 12937 feature_pipeline.h:62] feature type: linear
+I1130 02:19:32.798596 12937 feature_pipeline.h:80] frame length ms: 20
+I1130 02:19:32.798601 12937 feature_pipeline.h:88] subsampling rate: 4
+I1130 02:19:32.798606 12937 feature_pipeline.h:90] nnet receptive filed length: 7
+I1130 02:19:32.798611 12937 feature_pipeline.h:92] nnet chunk size: 1
+I1130 02:19:32.798615 12937 feature_pipeline.h:94] frontend fill zeros: 0
+I1130 02:19:32.798630 12937 nnet_itf.h:52] subsampling rate: 4
+I1130 02:19:32.798635 12937 nnet_itf.h:54] model path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdmodel
+I1130 02:19:32.798640 12937 nnet_itf.h:57] param path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdiparams
+I1130 02:19:32.798643 12937 nnet_itf.h:59] DS2 param:
+I1130 02:19:32.798647 12937 nnet_itf.h:61] cache names: chunk_state_h_box,chunk_state_c_box
+I1130 02:19:32.798652 12937 nnet_itf.h:63] cache shape: 5-1-1024,5-1-1024
+I1130 02:19:32.798656 12937 nnet_itf.h:65] input names: audio_chunk,audio_chunk_lens,chunk_state_h_box,chunk_state_c_box
+I1130 02:19:32.798660 12937 nnet_itf.h:67] output names: softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0
+I1130 02:19:32.798664 12937 ctc_tlg_decoder.h:41] fst path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//TLG.fst
+I1130 02:19:32.798669 12937 ctc_tlg_decoder.h:42] fst symbole table: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//words.txt
+I1130 02:19:32.798673 12937 ctc_tlg_decoder.h:47] LatticeFasterDecoder max active: 7500
+I1130 02:19:32.798677 12937 ctc_tlg_decoder.h:49] LatticeFasterDecoder beam: 15
+I1130 02:19:32.798681 12937 ctc_tlg_decoder.h:50] LatticeFasterDecoder lattice_beam: 7.5
+I1130 02:19:32.798708 12937 websocket_server_main.cc:37] Listening at port 8082
+```
+
+## Start WebSocket Client
+
+```bash
+bash websocket_client.sh
+```
+
+This script uses AISHELL-1 test data to call the WebSocket server.
+
+The input is specified by `--wav_rspecifier=scp:$data/$aishell_wav_scp`.
+
+The `scp` file maps each utterance key to the path of its wav file and looks like this:
+```text
+# head data/split1/1/aishell_test.scp
+BAC009S0764W0121 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0121.wav
+BAC009S0764W0122 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0122.wav
+...
+BAC009S0764W0125 /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0125.wav
+```
+
+To recognize a single wav file, create an `scp` file like this:
+```text
+key path/to/wav/file
+```