add readme

3 years ago · 81ae5ffd72
parent a5f52d6d8e
commit 81ae5ffd72
2 changed files with 93 additions and 10 deletions
--- a/demos/custom_streaming_asr/README.md
+++ b/demos/custom_streaming_asr/README.md
@@ -3,14 +3,60 @@
 # Customized Automatic Speech Recognition

 ## Introduction
-In some cases, we need to recognize specific sentences with high accuracy, e.g. customized keyword spotting or address recognition in navigation apps. Customized ASR can solve these problems.
+In some cases, we need to recognize specific rare words with high accuracy, e.g. addresses in navigation apps. Customized ASR can solve these problems.

-This demo is customized for taxi expense reports, which need to recognize rare addresses.
+This demo is customized for expense reports, which need to recognize rare addresses.
+
+* G with a slot: 打车到 "address_slot" ("take a taxi to" followed by the address slot).
+![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
+
+* The address slot WFST: add the addresses you want to recognize to it.
+![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
+
+* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
+![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)  
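The replace step can be reproduced on a toy grammar with OpenFst's `fstcompile` and `fstreplace` command-line tools. This is a minimal sketch under stated assumptions, not the demo's own graph-building script: it assumes OpenFst is installed, uses acceptors with hypothetical integer labels instead of word symbol tables, and `build_toy_graph` is an illustrative name. How the call and return arcs in the result are labeled depends on fstreplace's options, so check `fstreplace --help` before composing the output with other transducers.

```bash
# Toy version of G = fstreplace(G_with_slot, address_slot).
# Hypothetical integer labels (no symbol tables):
#   1      = 打车到 ("take a taxi to")
#   11, 12 = two example addresses
#   1000   = the address_slot non-terminal
#   9999   = an arbitrary ID for the root FST; must not clash with other labels
build_toy_graph() {
  local out_dir="$1"
  if ! command -v fstreplace >/dev/null 2>&1; then
    echo "OpenFst tools (fstcompile, fstreplace) not found" >&2
    return 1
  fi

  # G_with_slot: 打车到 followed by the slot non-terminal.
  printf '0 1 1\n1 2 1000\n2\n' | fstcompile --acceptor > "$out_dir/G_with_slot.fst" || return 1

  # address_slot: one arc per address; add lines here to recognize more addresses.
  printf '0 1 11\n0 1 12\n1\n' | fstcompile --acceptor > "$out_dir/address_slot.fst" || return 1

  # Splice address_slot into the root wherever the non-terminal 1000 appears.
  fstreplace "$out_dir/G_with_slot.fst" 9999 \
             "$out_dir/address_slot.fst" 1000 \
             "$out_dir/G.fst"
}
```

For example, `mkdir -p /tmp/toy && build_toy_graph /tmp/toy && fstprint /tmp/toy/G.fst` prints the expanded graph for inspection.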

 ## Usage
 ### 1. Installation
-Install Docker by running the setup_docker.sh script, then install tmux (apt-get install tmux).
+Install the paddle:2.2.2 Docker image and start a container:
+```
+sudo nvidia-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash 
+```

 ### 2. demo
-* Run bash websocket_server.sh. This script downloads the resources and libraries, then sets up the server.
-* In another terminal inside the Docker container, run websocket_client.sh; the client sends data and gets the results.
+* Run websocket_server.sh. This script downloads the resources and libraries, then launches the service.
+```
+bash websocket_server.sh
+```
+The script runs in two steps:
+1. Download resources.tar.gz. The following directories are then found in the resource directory:
+   - model: acoustic model
+   - graph: the decoder graph (TLG.fst)
+   - lib: libraries
+   - bin: binaries
+   - data: audio and wav.scp
+
+2. Launch the service with websocket_server_main. Key parameters (see PaddleSpeech/speechx/speechx/decoder/param.h and PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc for the others):
+   - port: the service port
+   - graph_path: the decoder graph path
+   - model_path: the acoustic model path
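Before starting the client, it can help to confirm that the download step produced the layout listed above and that the server is accepting connections. The sketch below is an illustration, not part of the demo's scripts: the helper names `check_resources` and `wait_for_port` are hypothetical, and it assumes Bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command are available.

```bash
# Hypothetical helpers, not part of websocket_server.sh.

# Verify the layout listed above: model/, graph/TLG.fst, lib/, bin/, data/wav.scp.
check_resources() {
  local root="$1" missing=0 d f
  for d in model graph lib bin data; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      missing=1
    fi
  done
  for f in graph/TLG.fst data/wav.scp; do
    if [ ! -f "$root/$f" ]; then
      echo "missing file: $root/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Poll HOST:PORT once per second until a TCP connection succeeds,
# giving up after MAX_WAIT attempts (default 60). Needs bash for /dev/tcp;
# `timeout 1` keeps a single attempt from hanging.
wait_for_port() {
  local host="$1" port="$2" max_wait="${3:-60}" i
  for ((i = 0; i < max_wait; i++)); do
    if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, `check_resources resource && wait_for_port 127.0.0.1 "$PORT" 120` before running the client, with `PORT` set to the value websocket_server.sh passes to websocket_server_main.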
+
+* In another terminal, run websocket_client.sh; the client sends the audio data and receives the recognition results.
+```
+bash websocket_client.sh
+```
+websocket_client_main launches the client: wav_scp is the list of WAV files to send, and port is the server's service port.
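To send your own recordings, the client needs a wav_scp list. The sketch below assumes the Kaldi-style `wav.scp` format (one `<utterance-id> <path-to-wav>` pair per line, sorted by utterance ID); compare with the demo's `data/wav.scp` to confirm. The function name `make_wav_scp` is hypothetical.

```bash
# Hypothetical helper: write a Kaldi-style wav.scp ("<utt-id> <path>" per line)
# for every *.wav file in WAV_DIR, using the file name without .wav as the utterance ID.
# Paths containing spaces are not supported by this simple format.
make_wav_scp() (
  # The ( ) body runs in a subshell, so nullglob does not leak to the caller
  # and an empty directory produces an empty list instead of a literal "*.wav".
  shopt -s nullglob
  wav_dir="$(cd "$1" && pwd)" || exit 1
  out="$2"
  for f in "$wav_dir"/*.wav; do
    printf '%s %s\n' "$(basename "$f" .wav)" "$f"
  done | LC_ALL=C sort > "$out"
)
```

For example, `make_wav_scp my_wavs my_wav.scp` (both names illustrative), then point the client's wav_scp at the generated file.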
+
+* Result:
+In the client log, you will see messages like the following:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```
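When the client processes many utterances, the recognized text can be pulled out of the log. A minimal sketch, assuming result lines keep the `the result of <utt-id> is <text>` wording shown above; `extract_results` is a hypothetical helper, and its output mirrors the Kaldi `text` format (`<utt-id> <transcript>`).

```bash
# Hypothetical helper: read a client log on stdin and print "<utt-id> <transcript>"
# for every "the result of <utt-id> is <transcript>" line; other lines are dropped.
extract_results() {
  # LC_ALL=C makes sed match byte-wise, so the UTF-8 transcript passes through unchanged.
  LC_ALL=C sed -n 's/.*the result of \([^ ]*\) is \(.*\)$/\1 \2/p'
}
```

For example, if the client output was saved to `client.log`, `extract_results < client.log > hyp.txt` collects one line per utterance.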
--- a/demos/custom_streaming_asr/README_cn.md
+++ b/demos/custom_streaming_asr/README_cn.md
@@ -1,18 +1,55 @@
-(简体中文|[English](./README.md)
+(简体中文|[English](./README.md))

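 # Customized Speech Recognition Demo
 ## Introduction
 Customized speech recognition is a technique for recognizing sentences in specific scenarios.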

-See the simple tutorial:
+See the simple tutorial on the underlying principles:
 https://aistudio.baidu.com/aistudio/projectdetail/3986429

 This demo targets taxi expense reimbursement forms, with customized recognition of place names.

 ## Usage
 ### 1. Set up the environment
-Install the image with setup_docker.sh. Inside the container, install tmux (apt-get install tmux) to make the following demo easier.
+Install the paddle:2.2.2 Docker image.
+```
+sudo nvidia-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash 
+```

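 ### 2. Demo
-* Run bash websocket_server.sh to download the required resources and libraries. The service is then up and running.
-* In another terminal inside the container, run bash websocket_client.sh; the client sends data and gets the results.
+* Run the following command to download the required resources and libraries and start the service: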
+```
+bash websocket_server.sh
+```
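+The script above does two things:
+1. Downloads resource.tar.gz. After extraction, the following directories are found under resource:
+   - model: acoustic model
+   - graph: decoding graph
+   - lib: related libraries
+   - bin: executables
+   - data: speech data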
+
+2. Starts the service with websocket_server_main. A few of its parameters (see PaddleSpeech/speechx/speechx/decoder/param.h and PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc for the others):
+   - port: the service port
+   - graph_path: the decoding graph file
+   - model-related parameters: the acoustic model files
+
+* In another terminal, send data through the client and get the results by running:
+```
+bash websocket_client.sh
+```
+websocket_client_main starts the client, where $wav_scp is the set of utterances to send and port is the service port.
+
+* Result:
+In the client log, you can see results similar to the following:
+```
+0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
+I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
+I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
+I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
+LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
+```