diff --git a/demos/custom_streaming_asr/README.md b/demos/custom_streaming_asr/README.md
new file mode 100644
index 00000000..5d94856f
--- /dev/null
+++ b/demos/custom_streaming_asr/README.md
@@ -0,0 +1,64 @@
+([简体中文](./README_cn.md)|English)
+
+# Customized Automatic Speech Recognition
+
+## Introduction
+In some cases, we need to recognize specific rare words with high accuracy, e.g. address recognition in navigation apps. Customized ASR can solve this problem.
+
+This demo is customized for taxi expense claims, which requires recognizing rare addresses.
+
+* G with slot: 打车到 "address_slot" (打车到 means "take a taxi to").
+![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
+
+* This is the address slot WFST; you can add the addresses you want to recognize.
+![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
+
+* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
+![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
+
+## Usage
+### 1. Installation
+Install the paddle:2.2.2 Docker image.
+```
+sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. Demo
+* Run websocket_server.sh. This script downloads the resources and libraries and launches the service.
+```
+bash websocket_server.sh
+```
+This script runs in two steps:
+1. Download resource.tar.gz. After extraction, the following directories are found in the resource directory:
+model: acoustic model
+graph: the decoder graph (TLG.fst)
+lib: libraries
+bin: binaries
+data: audio and wav.scp
+
+2. Launch the service with websocket_server_main.
+Some parameters:
+port: the service port
+graph_path: the decoder graph path
+model_path: acoustic model path
+For other parameters, refer to these files:
+PaddleSpeech/speechx/speechx/decoder/param.h
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+
+* In another terminal, run websocket_client.sh. The client sends the audio data and receives the results.
+```
+bash websocket_client.sh
+```
+websocket_client_main launches the client; wav_scp is the set of wav files to send, and port is the server's service port.
+
+* Result:
+In the client log, you will see messages like the following:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
\ No newline at end of file
diff --git a/demos/custom_streaming_asr/README_cn.md b/demos/custom_streaming_asr/README_cn.md
new file mode 100644
index 00000000..209b882e
--- /dev/null
+++ b/demos/custom_streaming_asr/README_cn.md
@@ -0,0 +1,63 @@
+(简体中文|[English](./README.md))
+
+# Customized Speech Recognition Demo
+## Introduction
+In some scenarios, a recognition system needs to recognize certain rare words with high accuracy, for example place names in navigation software. Customized recognition can meet this need.
+
+This demo covers recognition for taxi expense claims, which requires recognizing some rare place names. This can be achieved as follows.
+
+* G with slot: 打车到 "address_slot" (打车到 means "take a taxi to").
+![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
+
+* This is the address slot WFST, to which you can add the place names that need to be recognized.
+![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
+
+* Through the replace operation, G = fstreplace(G_with_slot, address_slot), we finally obtain the customized decoding graph.
+![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
+
+## Usage
+### 1. Environment setup
+Install the paddle:2.2.2 Docker image.
+```
+sudo nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
+
+sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+
+### 2. Demo
+* Run the following command to download the required resources and libraries and start the service.
+```
+bash websocket_server.sh
+```
+The script above does two things:
+1. Downloads resource.tar.gz. After extraction, the following directories are found in resource:
+model: acoustic model
+graph: decoding graph
+lib: libraries
+bin: executables
+data: audio data
+
+2. Starts the service with websocket_server_main.
+A brief introduction to a few parameters:
+port is the service port,
+graph_path specifies the decoding graph file,
+the model-related parameters specify the acoustic model files.
+For other parameters, see the code:
+PaddleSpeech/speechx/speechx/decoder/param.h
+PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
+
+* In another terminal, send data with the client and get the results. Run the following command:
+```
+bash websocket_client.sh
+```
+websocket_client_main starts the client, where $wav_scp is the set of utterances to send and port is the service port.
+
+* Result:
+Results similar to the following appear in the client log:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
diff --git a/demos/custom_streaming_asr/path.sh b/demos/custom_streaming_asr/path.sh
new file mode 100644
index 00000000..47462324
--- /dev/null
+++ b/demos/custom_streaming_asr/path.sh
@@ -0,0 +1,2 @@
+export LD_LIBRARY_PATH=$PWD/resource/lib
+export PATH=$PATH:$PWD/resource/bin
diff --git a/demos/custom_streaming_asr/setup_docker.sh b/demos/custom_streaming_asr/setup_docker.sh
new file mode 100644
index 00000000..329a75db
--- /dev/null
+++ b/demos/custom_streaming_asr/setup_docker.sh
@@ -0,0 +1 @@
+sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
diff --git a/demos/custom_streaming_asr/websocket_client.sh b/demos/custom_streaming_asr/websocket_client.sh
new file mode 100755
index 00000000..ede076ca
--- /dev/null
+++ b/demos/custom_streaming_asr/websocket_client.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+set +x
+set -e
+
+. path.sh
+# input data directory
+data=$PWD/data
+
+# list of wav files to send, located in $data
+wav_scp=wav.scp
+
+export GLOG_logtostderr=1
+
+# websocket client
+websocket_client_main \
+    --wav_rspecifier=scp:$data/$wav_scp \
+    --streaming_chunk=0.36 \
+    --port=8881
diff --git a/demos/custom_streaming_asr/websocket_server.sh b/demos/custom_streaming_asr/websocket_server.sh
new file mode 100755
index 00000000..041c345b
--- /dev/null
+++ b/demos/custom_streaming_asr/websocket_server.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+set +x
+set -e
+
+export GLOG_logtostderr=1
+
+. path.sh
+# test websocket server
+
+model_dir=./resource/model
+graph_dir=./resource/graph
+cmvn=./data/cmvn.ark
+
+
+# download and unpack resource.tar.gz if the CMVN file is not present yet
+if [ ! -f "$cmvn" ]; then
+    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
+    tar xzfv resource.tar.gz
+    ln -s ./resource/data .
+fi
+
+websocket_server_main \
+    --cmvn_file=$cmvn \
+    --streaming_chunk=0.1 \
+    --use_fbank=true \
+    --model_path=$model_dir/avg_10.jit.pdmodel \
+    --param_path=$model_dir/avg_10.jit.pdiparams \
+    --model_cache_shapes="5-1-2048,5-1-2048" \
+    --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
+    --word_symbol_table=$graph_dir/words.txt \
+    --graph_path=$graph_dir/TLG.fst --max_active=7500 \
+    --port=8881 \
+    --acoustic_scale=12
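
Review note: the sample client log in the README reports each hypothesis on a line ending in `the result of <utterance-id> is <text>`. As a minimal sketch, assuming the client keeps that line format, the helper below prints one `<utterance-id> <text>` pair per result line read from standard input. The name `extract_results` is hypothetical and is not part of this change.

```shell
#!/bin/bash
# extract_results: hypothetical helper, not part of the demo in this diff.
# Reads a client log on stdin and prints "<utterance-id> <text>" for every
# line that matches the "the result of <id> is <text>" format shown in the
# README. Lines that do not match are dropped.
extract_results() {
    sed -n 's/^.*the result of \([^ ]*\) is \(.*\)$/\1 \2/p'
}
```

Because both scripts set `GLOG_logtostderr=1`, one possible use after sourcing the function is `bash websocket_client.sh 2>&1 | extract_results`, which merges stderr into the pipe so that result lines are seen regardless of which stream they are written to.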