Merge pull request #1891 from SmileGoat/add_demos

[speechx] add custom_streaming_asr
4 years ago · c119810664
parent 8ed8c9c161 8126ae726b
commit c119810664
6 changed files with 181 additions and 0 deletions
--- a/demos/custom_streaming_asr/README.md
+++ b/demos/custom_streaming_asr/README.md
@ -0,0 +1,64 @@
 ([简体中文](./README_cn.md)|English)
 # Customized Automatic Speech Recognition
 ## Introduction
 In some scenarios, an ASR system must recognize specific rare words with high accuracy, e.g. address recognition in navigation apps. Customized ASR can solve this problem.
 This demo is customized for a taxi expense-claim scenario, which requires recognizing rare addresses.
 * G with slot: 打车到 "address_slot" (in English: take a taxi to "address_slot").
 ![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
 * This is the address slot WFST. You can add the addresses that you want to recognize.
 ![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
 * After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
 ![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)  
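The effect of the replace operation can be pictured at the sentence level. The sketch below is only an analogy in plain bash: it substitutes each address into the template sentence, which is the set of word sequences the customized graph is meant to accept. It does not build or modify any WFST. The helper name `expand_slot` and the two addresses are hypothetical and chosen for illustration.

```shell
#!/bin/bash
# Illustrative analogy only: expands a slot at the sentence level.
# The demo itself performs the replacement on WFSTs
# (G = fstreplace(G_with_slot, address_slot)); no FST is touched here.

# expand_slot TEMPLATE SLOT ADDRESS...
# Prints one sentence per address, with SLOT replaced by that address.
expand_slot() {
    local template=$1 slot=$2
    shift 2
    local addr
    for addr in "$@"; do
        # ${var//pattern/replacement} replaces every occurrence; quoting
        # the pattern makes bash treat the slot name literally.
        printf '%s\n' "${template//"$slot"/$addr}"
    done
}

# Hypothetical addresses standing in for the entries of the address slot WFST.
expand_slot "打车到 address_slot" "address_slot" "北京南站" "首都机场"
```

Adding an address to the slot list adds one more sentence here, just as adding an arc to the address slot WFST adds one more path to the customized graph.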
 ## Usage
 ### 1. Installation
 Install the paddle:2.2.2 Docker image.
 ```
 sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
 sudo nvidia-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash 
 ```
 ### 2. Demo
 * Run websocket_server.sh. This script downloads the resources and libraries, then launches the service.
 ```
 bash websocket_server.sh
 ```
 This script runs in two steps:
 1. Download and extract resource.tar.gz. The following directories will be found in the resource directory:  
 model: acoustic model  
 graph: the decoder graph (TLG.fst)  
 lib: libraries  
 bin: binaries  
 data: audio and wav.scp
 2. Launch the service with websocket_server_main.  
 Some parameters:  
 port: the service port  
 graph_path: the decoder graph path  
 model_path: acoustic model path  
 For the other parameters, refer to these files:  
 PaddleSpeech/speechx/speechx/decoder/param.h  
 PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc  
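The directory layout from step 1 can be sanity-checked before the server is launched. This is a minimal sketch, assuming the archive was extracted into `./resource` as websocket_server.sh does. `check_resources` is a hypothetical helper and not part of the demo.

```shell
#!/bin/bash
# check_resources DIR
# Returns 0 if DIR contains the five sub-directories named in step 1
# (model, graph, lib, bin, data). Otherwise prints the missing ones to
# stderr and returns 1.
check_resources() {
    local root=$1 missing=0 sub
    for sub in model graph lib bin data; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing directory: $root/$sub" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the location used by the demo scripts.
if check_resources ./resource; then
    echo "resource layout looks complete"
else
    echo "resource layout is incomplete" >&2
fi
```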
 * In another terminal, run websocket_client.sh. The client sends the audio data and receives the results.
 ```
 bash websocket_client.sh
 ```
 websocket_client_main launches the client; wav_scp is the set of wav files to send, and port is the server's service port.
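To send your own recordings, the client needs a wav.scp that lists them. The sketch below assumes the Kaldi-style script-file format suggested by the `scp:` prefix in websocket_client.sh, i.e. one `<utterance-id> <path>` pair per line; check the wav.scp shipped in the data directory to confirm. `make_wav_scp` and the directory name in the usage comment are hypothetical.

```shell
#!/bin/bash
# make_wav_scp DIR
# Prints one "<utterance-id> <path>" line per .wav file in DIR, using the
# file name without its extension as the utterance id.
make_wav_scp() {
    local dir=$1 wav
    for wav in "$dir"/*.wav; do
        # If nothing matches, the glob stays literal; skip that entry.
        [ -e "$wav" ] || continue
        printf '%s %s\n' "$(basename "$wav" .wav)" "$wav"
    done
}

# Example (hypothetical directory name):
#   make_wav_scp "$PWD/my_wavs" > my_wav.scp
```

Passing an absolute directory keeps the listed paths valid regardless of where the client is started.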
 * Result:
 In the client log, you will see messages like the following:
 ```
 0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
 I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
 I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
 I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
 LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
 ```
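When many utterances are sent, the recognition results can be pulled out of the log. This is a minimal sketch that assumes result lines keep the `the result of <case-id> is <text>` wording shown in the sample above. `extract_results` is a hypothetical helper.

```shell
#!/bin/bash
# extract_results
# Reads log lines on stdin and prints "<case-id> <text>" for every line of
# the form "... the result of <case-id> is <text>". Other lines are dropped.
extract_results() {
    sed -n 's/.*the result of \([^ ]*\) is \(.*\)$/\1 \2/p'
}

# Example with the result line from the sample log:
echo 'LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元' \
    | extract_results
```

Because the client script sets GLOG_logtostderr=1, the log is likely written to stderr, so a live run would probably need `bash websocket_client.sh 2>&1 | extract_results`.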
--- a/demos/custom_streaming_asr/README_cn.md
+++ b/demos/custom_streaming_asr/README_cn.md
@ -0,0 +1,63 @@
 (简体中文|[English](./README.md))
 # 定制化语音识别演示
 ## 介绍
 在一些场景中，识别系统需要高精度的识别一些稀有词，例如导航软件中地名识别。而通过定制化识别可以满足这一需求。  
 这个 demo 是打车报销单的场景识别，需要识别一些稀有的地名，可以通过如下操作实现。
 * G with slot: 打车到 "address_slot"。
 ![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
 * 这是address slot wfst, 可以添加一些需要识别的地名.
 ![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
 * 通过replace 操作, G = fstreplace(G_with_slot, address_slot), 最终可以得到定制化的解码图。
 ![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)  
 ## 使用方法
 ### 1. 配置环境
 安装paddle:2.2.2 docker镜像。
 ```
 sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
 sudo nvidia-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash 
 ```
 ### 2. 演示
 * 运行如下命令，完成相关资源和库的下载和服务启动。
 ```
 bash websocket_server.sh
 ```
 上面脚本完成了如下两个功能：
 1. 完成resource.tar.gz下载，解压后,会在resource中发现如下目录：
 model: 声学模型
 graph: 解码构图
 lib: 相关库
 bin: 运行程序
 data: 语音数据
 2. 通过websocket_server_main来启动服务。
 这里简单的介绍几个参数:
 port是服务端口，
 graph_path用来指定解码图文件，
 model相关参数用来指定声学模型文件。
 其他参数说明可参见代码：
 PaddleSpeech/speechx/speechx/decoder/param.h
 PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
 * 在另一个终端中， 通过client发送数据，得到结果。运行如下命令：
 ```
 bash websocket_client.sh
 ```
 通过websocket_client_main来启动client服务，其中$wav_scp是发送的语音句子集合，port为服务端口。
 * 结果：
 client的log中可以看到如下类似的结果
 ```
 0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
 I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
 I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
 I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
 LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
 ```
--- a/demos/custom_streaming_asr/path.sh
+++ b/demos/custom_streaming_asr/path.sh
@ -0,0 +1,2 @@
 export LD_LIBRARY_PATH=$PWD/resource/lib
 export PATH=$PATH:$PWD/resource/bin
--- a/demos/custom_streaming_asr/setup_docker.sh
+++ b/demos/custom_streaming_asr/setup_docker.sh
@ -0,0 +1 @@
 sudo nvidia-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
--- a/demos/custom_streaming_asr/websocket_client.sh
+++ b/demos/custom_streaming_asr/websocket_client.sh
@ -0,0 +1,18 @@
 #!/bin/bash
 set +x
 set -e
 . path.sh
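 # input: directory with the audio data (symlinked by websocket_server.sh)
 data=$PWD/data
 # input: list of wav files to send, located in $data
 wav_scp=wav.scp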
 export GLOG_logtostderr=1
 # websocket client
 websocket_client_main \
    --wav_rspecifier=scp:$data/$wav_scp \
    --streaming_chunk=0.36 \
    --port=8881
--- a/demos/custom_streaming_asr/websocket_server.sh
+++ b/demos/custom_streaming_asr/websocket_server.sh
@ -0,0 +1,33 @@
 #!/bin/bash
 set +x
 set -e
 export GLOG_logtostderr=1
 . path.sh
 #test websocket server 
 model_dir=./resource/model
 graph_dir=./resource/graph
 cmvn=./data/cmvn.ark
 # download and extract paddle_asr_online/resource.tar.gz if the cmvn file is not present yet
 if [ ! -f "$cmvn" ]; then
    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
    tar xzfv resource.tar.gz
    ln -s ./resource/data .
 fi
 websocket_server_main \
    --cmvn_file=$cmvn \
    --streaming_chunk=0.1 \
    --use_fbank=true \
    --model_path=$model_dir/avg_10.jit.pdmodel \
    --param_path=$model_dir/avg_10.jit.pdiparams \
    --model_cache_shapes="5-1-2048,5-1-2048" \
    --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
    --word_symbol_table=$graph_dir/words.txt \
    --graph_path=$graph_dir/TLG.fst --max_active=7500 \
    --port=8881 \
    --acoustic_scale=12