

(简体中文|English)

Customized Automatic Speech Recognition

Introduction

In some cases, specific rare words must be recognized with high accuracy, e.g. address recognition in navigation apps. Customized ASR can solve this problem.

This demo is customized for expense reports, which require recognizing rare addresses.

The scripts are in https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx/examples/custom_asr

  • G with slot: 打车到 "address_slot" (take a taxi to "address_slot").

  • The address slot WFST: add the addresses you want to recognize to it.

  • After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
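
The slot idea can be tried out with the OpenFst command-line tools. What follows is a minimal sketch under stated assumptions, not the method used by the linked custom_asr scripts: the helper name make_slot_fst, the placeholder addresses, and all file names are hypothetical, and each address is treated as a single symbol for simplicity. It prints the slot in OpenFst text format. The compile and replace commands are left as comments because they need OpenFst and symbol tables that are not described in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an address slot in OpenFst text format.
# Each line "0 1 <address> <address>" is an arc from state 0 to state 1;
# the final line "1" marks state 1 as a final state.
make_slot_fst() {
  local addr
  for addr in "$@"; do
    printf '0 1 %s %s\n' "$addr" "$addr"
  done
  printf '1\n'
}

# Placeholder addresses, not taken from the demo resources.
make_slot_fst "PLACE_A" "PLACE_B"

# With OpenFst installed and a symbol table prepared, the remaining steps
# would look roughly like this (file names and label ids are assumptions):
#   fstcompile --isymbols=words.txt --osymbols=words.txt address_slot.txt address_slot.fst
#   fstreplace G_with_slot.fst <root_label_id> address_slot.fst <slot_label_id> G.fst
```

Adding a new address then amounts to adding one more argument to make_slot_fst and rebuilding the graph.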

Usage

1. Installation

Pull the paddle:2.2.2 Docker image and start a container.

sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2

sudo docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash 

2. Demo

  • Run websocket_server.sh. This script downloads the resources and libraries, then launches the service.
cd /paddle
bash websocket_server.sh

This script runs in two steps:

  1. Download resources.tar.gz. After extraction, the following directories are found in the resource directory:
    model: acoustic model
    graph: the decoder graph (TLG.fst)
    lib: libraries
    bin: binaries
    data: audio and wav.scp

  2. Launch the service with websocket_server_main.
    Some parameters:
    port: the service port
    graph_path: the decoder graph path
    model_path: the acoustic model path
    For the other parameters, refer to these files:
    PaddleSpeech/speechx/speechx/decoder/param.h
    PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
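
If the service fails to start, a quick check that the download step produced the directories listed above can help. The sketch below is an illustration only: the function name check_resource_dir is hypothetical, and the path "resource" is assumed from the description above rather than read from websocket_server.sh.

```shell
#!/usr/bin/env bash
# Report which expected sub-directories are missing under a resource directory.
# Returns 0 when all five are present, 1 otherwise.
check_resource_dir() {
  local root="$1" missing=0 name
  for name in model graph lib bin data; do
    if [ ! -d "$root/$name" ]; then
      echo "missing: $root/$name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: the directory name "resource" is an assumption.
if check_resource_dir resource; then
  echo "resource directory looks complete"
else
  echo "resource directory is incomplete; try re-running websocket_server.sh"
fi
```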

  • In another terminal, run websocket_client.sh. The client sends the audio data and receives the recognition results.
bash websocket_client.sh

websocket_client_main launches the client; wav_scp is the list of wav files to send, and port is the service port of the server.
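
To send your own recordings, you would need a wav.scp that lists them. The sketch below assumes the file follows the Kaldi convention of one "<utterance-id> <wav-path>" pair per line. That is an assumption about this demo's data, so compare it with the wav.scp shipped in the data directory first. The function name make_wav_scp and the directory name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Print one "<utterance-id> <wav-path>" line per .wav file in a directory.
# The utterance id is the file name without the .wav suffix.
make_wav_scp() {
  local dir="$1" wav
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # skip the unexpanded glob when no .wav files exist
    printf '%s %s\n' "$(basename "$wav" .wav)" "$wav"
  done
}

# Usage (hypothetical directory and output names):
#   make_wav_scp my_audio > my_wav.scp
```

The paths written to the file must be valid inside the container, since that is where the client reads them.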

  • Result: the client log contains messages like the following:
0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
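
The recognized text in the last line reads roughly "May 12th, 22:36, took a taxi home after overtime, 41 yuan". To pull only the transcripts out of a noisy log, a small filter such as the following may help. It is a sketch that assumes result lines keep the "the result of <case> is <text>" wording shown above; the function name extract_results is hypothetical.

```shell
#!/usr/bin/env bash
# Read log text on stdin and print "<case-id> <transcript>" for each result line.
# Lines that do not contain "the result of ... is ..." are dropped.
extract_results() {
  sed -n 's/.*the result of \([^ ]*\) is \(.*\)$/\1 \2/p'
}

# Usage (assumes the client writes its log to the terminal):
#   bash websocket_client.sh 2>&1 | extract_results
```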