Merge branch 'develop' into dev-hym

pull/1928/head
iftaken 3 years ago
commit 2938d3e49b

@ -161,6 +161,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV). - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update ### Recent Update
- 👑 2022.05.13: Release [PP-ASR](./docs/source/asr/PPASR.md), [PP-TTS](./docs/source/tts/PPTTS.md), and [PP-VPR](docs/source/vpr/PPVPR.md).
- 👏🏻 2022.05.06: `Streaming ASR` with `Punctuation Restoration` and `Token Timestamp`. - 👏🏻 2022.05.06: `Streaming ASR` with `Punctuation Restoration` and `Token Timestamp`.
- 👏🏻 2022.05.06: `Server` is available for `Speaker Verification`, and `Punctuation Restoration`. - 👏🏻 2022.05.06: `Server` is available for `Speaker Verification`, and `Punctuation Restoration`.
- 👏🏻 2022.04.28: `Streaming Server` is available for `Automatic Speech Recognition` and `Text-to-Speech`. - 👏🏻 2022.04.28: `Streaming Server` is available for `Automatic Speech Recognition` and `Text-to-Speech`.

@ -182,6 +182,7 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
<!--- <!---
2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live). 2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
---> --->
- 👑 2022.05.13: PaddleSpeech releases [PP-ASR](./docs/source/asr/PPASR_cn.md), [PP-TTS](./docs/source/tts/PPTTS_cn.md), and [PP-VPR](docs/source/vpr/PPVPR_cn.md).
- 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线! 覆盖了语音识别(标点恢复、时间戳),和语音合成。 - 👏🏻 2022.05.06: PaddleSpeech Streaming Server 上线! 覆盖了语音识别(标点恢复、时间戳),和语音合成。
- 👏🏻 2022.05.06: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、语音合成、声纹识别,标点恢复。 - 👏🏻 2022.05.06: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、语音合成、声纹识别,标点恢复。
- 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译(英译中)、语音合成,声纹验证。 - 👏🏻 2022.03.28: PaddleSpeech CLI 覆盖声音分类、语音识别、语音翻译(英译中)、语音合成,声纹验证。

@ -0,0 +1,74 @@
([简体中文](./README_cn.md)|English)
# ACS (Audio Content Search)
## Introduction
ACS, or Audio Content Search, is the task of locating keywords and their timestamps in automatically transcribed speech (speech-to-text).
This demo obtains the timestamps of keywords in the transcript of a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.
The search words currently used in the demo are:
```
```
## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
Choose either the medium or the hard installation method to install paddlespeech.
The dependencies are listed in requirements.txt.
### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`) whose sample rate matches that of the model.
A sample file for this demo can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
```
### 3. Usage
- Command Line (Recommended)
```bash
# Chinese
paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
```
Usage:
```bash
paddlespeech_client acs --help
```
Arguments:
- `input` (required): Audio file to recognize.
- `server_ip`: IP address of the server.
- `port`: Port of the server.
- `lang`: Language of the model. Default: `zh`.
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `audio_format`: Format of the input audio.
Output:
```bash
[2022-05-15 15:00:58,185] [ INFO] - acs http client start
[2022-05-15 15:00:58,185] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
[2022-05-15 15:01:03,220] [ INFO] - acs http client finished
[2022-05-15 15:01:03,221] [ INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
[2022-05-15 15:01:03,221] [ INFO] - Response time 5.036084 s.
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
acs_executor = ACSClientExecutor()
res = acs_executor(
    input='./zh.wav',
    server_ip="127.0.0.1",
    port=8490, )
print(res)
```
Output:
```bash
[2022-05-15 15:08:13,955] [ INFO] - acs http client start
[2022-05-15 15:08:13,956] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
[2022-05-15 15:08:19,026] [ INFO] - acs http client finished
{'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
```
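The result above can be post-processed with plain Python. The sketch below groups the time spans by keyword; it assumes that `w` is the matched word and that `bg` and `ed` are begin and end times in seconds, which is an interpretation of the sample output rather than something the README states. `group_hits` is a hypothetical helper, not part of `paddlespeech`.

```python
# Sample result copied from the output shown above.
result = {
    'transcription': '我认为跑步最重要的就是给我带来了身体健康',
    'acs': [
        {'w': '我', 'bg': 0, 'ed': 1.6800000000000002},
        {'w': '我', 'bg': 2.1, 'ed': 4.28},
        {'w': '康', 'bg': 3.2, 'ed': 4.92},
    ],
}


def group_hits(acs_result):
    """Group (begin, end) spans by keyword (hypothetical helper)."""
    spans = {}
    for hit in acs_result['acs']:
        # Round to 2 decimals to hide floating-point noise such as 1.68000...02.
        span = (round(hit['bg'], 2), round(hit['ed'], 2))
        spans.setdefault(hit['w'], []).append(span)
    return spans


hits = group_hits(result)
for word, times in hits.items():
    print(word, times)
```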

@ -0,0 +1,74 @@
(简体中文|[English](./README.md))
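# Audio Content Search
## Introduction
Audio content search is a technique that uses a computer program to obtain the timestamps of keywords in transcribed speech.
This demo obtains the timestamps of keywords in the transcript of a given audio file. It can be run with a single `PaddleSpeech` command or a few lines of Python.
The search words used in the current demo are: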
```
```
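## Usage
### 1. Installation
See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md).
Choose either the medium or the hard installation method.
The dependencies are listed in requirements.txt.
### 2. Prepare Input
The input of this demo should be a WAV file (`.wav`) whose sample rate matches that of the model.
A sample audio file for this demo can be downloaded: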
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
```
### 3. Usage
- Command Line (Recommended)
```bash
# Chinese
paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
```
Usage:
```bash
paddlespeech_client acs --help
```
Arguments:
- `input` (required): Audio file to recognize.
- `server_ip`: IP address of the server.
- `port`: Port of the server.
- `lang`: Language of the model. Default: `zh`.
- `sample_rate`: Audio sample rate. Default: `16000`.
- `audio_format`: Format of the audio.
Output:
```bash
[2022-05-15 15:00:58,185] [ INFO] - acs http client start
[2022-05-15 15:00:58,185] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
[2022-05-15 15:01:03,220] [ INFO] - acs http client finished
[2022-05-15 15:01:03,221] [ INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
[2022-05-15 15:01:03,221] [ INFO] - Response time 5.036084 s.
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
acs_executor = ACSClientExecutor()
res = acs_executor(
    input='./zh.wav',
    server_ip="127.0.0.1",
    port=8490, )
print(res)
```
Output:
```bash
[2022-05-15 15:08:13,955] [ INFO] - acs http client start
[2022-05-15 15:08:13,956] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
[2022-05-15 15:08:19,026] [ INFO] - acs http client finished
{'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
```

@ -0,0 +1,49 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse

from paddlespeech.cli.log import logger
from paddlespeech.server.utils.audio_handler import ASRHttpHandler


def main(args):
    logger.info("asr http client start")
    audio_format = "wav"
    sample_rate = 16000
    lang = "zh"
    handler = ASRHttpHandler(
        server_ip=args.server_ip, port=args.port, endpoint=args.endpoint)
    res = handler.run(args.wavfile, audio_format, sample_rate, lang)
    # res = res['result']
    logger.info(f"the final result: {res}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="audio content search client")
    parser.add_argument(
        '--server_ip', type=str, default='127.0.0.1', help='server ip')
    parser.add_argument('--port', type=int, default=8090, help='server port')
    parser.add_argument(
        "--wavfile",
        action="store",
        help="wav file path ",
        default="./16_audio.wav")
    parser.add_argument(
        '--endpoint',
        type=str,
        default='/paddlespeech/asr/search',
        help='server endpoint')
    args = parser.parse_args()
    main(args)
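The three connection arguments of this client (`--server_ip`, `--port`, `--endpoint`) correspond to the URL that appears in the client log of the README, `http://127.0.0.1:8490/paddlespeech/asr/search`. A minimal sketch of that composition follows. How `ASRHttpHandler` builds the URL internally is not shown in this diff, so `build_url` is a hypothetical helper written only for illustration.

```python
def build_url(server_ip: str, port: int, endpoint: str) -> str:
    """Join host, port and path into an HTTP URL (hypothetical helper)."""
    # Tolerate an endpoint given without its leading slash.
    if not endpoint.startswith("/"):
        endpoint = "/" + endpoint
    return f"http://{server_ip}:{port}{endpoint}"


url = build_url("127.0.0.1", 8490, "/paddlespeech/asr/search")
print(url)
```

Note that the script's default port is `8090`, while the ACS server config in this commit listens on `8490`, so `--port 8490` would have to be passed explicitly with that config.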

@ -0,0 +1,34 @@
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8490
# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['acs_python']
# protocol = ['http'] (only one can be selected).
# http only supports the offline engine type.
protocol: 'http'
engine_list: ['acs_python']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ACS #########################################
################### acs task: engine_type: python ###############################
acs_python:
    task: acs
    asr_protocol: 'websocket' # 'websocket'
    offset: 1.0 # second
    asr_server_ip: 127.0.0.1
    asr_server_port: 8390
    lang: 'zh'
    word_list: "./conf/words.txt"
    sample_rate: 16000
    device: 'cpu' # set 'gpu:id' or 'cpu'
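The `word_list` and `offset` settings suggest that keywords are matched in the transcript and that each matched span is widened by `offset` seconds. The sketch below shows one plausible reading of that idea. It assumes the ASR server returns a begin/end time per character, and it is not the engine's actual code; `find_keywords`, the sample sentence and the timestamps are all made up for illustration.

```python
import re


def find_keywords(transcript, char_times, keywords, offset=1.0):
    """Return padded time spans for each keyword occurrence (hypothetical).

    char_times[i] holds the begin/end time in seconds of transcript[i].
    """
    if not char_times:
        return []
    last_end = char_times[-1]["ed"]
    hits = []
    for word in keywords:
        for match in re.finditer(re.escape(word), transcript):
            first = char_times[match.start()]
            last = char_times[match.end() - 1]
            hits.append({
                "w": word,
                # Pad by `offset` seconds, clamped to the audio bounds.
                "bg": max(first["bg"] - offset, 0.0),
                "ed": min(last["ed"] + offset, last_end),
            })
    return hits


# Made-up input: five characters, each lasting 0.5 s.
text = "打车到北京"
times = [{"bg": i * 0.5, "ed": (i + 1) * 0.5} for i in range(len(text))]
hits = find_keywords(text, times, ["北京", "车"], offset=1.0)
print(hits)
```

Clamping keeps the padded window inside the audio, so a keyword near the start or end of the file never produces a negative begin time or an end time past the last timestamp.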

@ -0,0 +1,43 @@
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8390
# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
    model_type: 'conformer_online_multicn'
    am_model: # the pdmodel file of am static model [optional]
    am_params: # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path:
    decode_method: 'attention_rescoring'
    force_yes: True
    device: 'cpu' # cpu or gpu:id
    am_predictor_conf:
        device: # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False # True -> print glog
        summary: True # False -> do not show predictor config
    chunk_buffer_conf:
        window_n: 7 # frame
        shift_n: 4 # frame
        window_ms: 25 # ms
        shift_ms: 10 # ms
        sample_rate: 16000
        sample_width: 2
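The `chunk_buffer_conf` values can be turned into concrete buffer sizes with a little arithmetic. The sketch below assumes that a chunk of `window_n` frames spans `(window_n - 1) * shift_ms + window_ms` milliseconds and that consecutive chunks advance by `shift_n * shift_ms` milliseconds. That is the usual framing arithmetic, but whether the server uses exactly this formula is an assumption, and `chunk_sizes` is a hypothetical helper.

```python
def chunk_sizes(window_n, shift_n, window_ms, shift_ms, sample_rate, sample_width):
    """Return (window_bytes, shift_bytes) for a frame-based chunk buffer.

    Integer arithmetic is used throughout; it assumes that
    sample_rate * sample_width is a multiple of 1000.
    """
    chunk_ms = (window_n - 1) * shift_ms + window_ms  # duration of one chunk
    hop_ms = shift_n * shift_ms                       # advance between chunks
    bytes_per_ms = sample_rate * sample_width // 1000
    return chunk_ms * bytes_per_ms, hop_ms * bytes_per_ms


# Values taken from chunk_buffer_conf above.
window_bytes, shift_bytes = chunk_sizes(7, 4, 25, 10, 16000, 2)
print(window_bytes, shift_bytes)
```

Under these assumptions each chunk covers 85 ms of 16-bit, 16 kHz audio and the buffer advances by 40 ms per step.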

@ -0,0 +1,46 @@
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8390
# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
    model_type: 'conformer_online_wenetspeech'
    am_model: # the pdmodel file of am static model [optional]
    am_params: # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path:
    decode_method: "attention_rescoring"
    force_yes: True
    device: 'cpu' # cpu or gpu:id
    am_predictor_conf:
        device: # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False # True -> print glog
        summary: True # False -> do not show predictor config
    chunk_buffer_conf:
        window_n: 7 # frame
        shift_n: 4 # frame
        window_ms: 25 # ms
        shift_ms: 10 # ms
        sample_rate: 16000
        sample_width: 2

@ -0,0 +1,7 @@
export CUDA_VISIBLE_DEVICES=0,1,2,3
# we need the streaming asr server
nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log 2>&1 &
# start the acs server
nohup paddlespeech_server start --config_file conf/acs_application.yaml > acs.log 2>&1 &
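Both servers are started in the background with `nohup`, so the script returns before they are ready to accept requests. A small readiness check, sketched below, can poll the ports before a client is launched. `wait_for_port` is a hypothetical helper that uses only the standard library; the self-check at the end runs against a throwaway local listener rather than the real servers.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry until the deadline.
            time.sleep(interval)
    return False


# Self-check against a temporary listener on an OS-chosen port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    demo_port = listener.getsockname()[1]
    demo_ready = wait_for_port("127.0.0.1", demo_port, timeout=2.0)
    print(demo_ready)
```

For this demo one would call `wait_for_port("127.0.0.1", 8390)` for the streaming ASR server and `wait_for_port("127.0.0.1", 8490)` for the ACS server, using the ports from the two config files.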

@ -0,0 +1,65 @@
([简体中文](./README_cn.md)|English)
# Customized Automatic Speech Recognition
## Introduction
In some cases, specific rare words must be recognized with high accuracy, for example addresses in navigation apps. Customized ASR can solve this problem.
This demo is customized for taxi expense claims, which require recognizing rare addresses.
* G with slot: 打车到 "address_slot"。
![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
* This is the address slot WFST; you can add the addresses that you want to recognize.
![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
* After the replace operation, G = fstreplace(G_with_slot, address_slot), we get the customized graph.
![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
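The effect of the replace operation can be illustrated at the string level. The sketch below is only an analogy: a real `fstreplace` works on weighted finite-state transducers, not on Python lists, and `replace_slot` as well as the two addresses are made up for illustration. It shows that substituting the slot with a list of entries yields one accepted sentence per entry.

```python
def replace_slot(pattern, slot_name, slot_values):
    """Expand a token pattern by substituting each slot value for the slot token."""
    sentences = [[]]
    for token in pattern:
        choices = slot_values if token == slot_name else [token]
        sentences = [prefix + [choice] for prefix in sentences for choice in choices]
    return ["".join(tokens) for tokens in sentences]


pattern = ["打车到", "address_slot"]  # the "G with slot" pattern from the text
addresses = ["北京站", "中关村"]       # made-up slot entries
sentences = replace_slot(pattern, "address_slot", addresses)
print(sentences)
```

Adding an address to the slot list extends the set of accepted sentences without touching the surrounding pattern, which is the point of keeping the slot in a separate graph.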
## Usage
### 1. Installation
Install the paddle:2.2.2 docker image.
```
sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
```
### 2. Demo
* Run websocket_server.sh. This script downloads the resources and libraries, and launches the service.
```
cd /paddle
bash websocket_server.sh
```
This script runs in two steps:
1. Download resource.tar.gz. After extraction, the following directories are found in the resource directory:
model: acoustic model
graph: the decoder graph (TLG.fst)
lib: shared libraries
bin: binaries
data: audio files and wav.scp
2. Launch the service with websocket_server_main.
Some parameters:
port: the service port
graph_path: the decoder graph path
model_path: the acoustic model path
For the other parameters, refer to these files:
PaddleSpeech/speechx/speechx/decoder/param.h
PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc
* In another terminal, run websocket_client.sh. The client sends the audio data and receives the results.
```
bash websocket_client.sh
```
websocket_client_main launches the client; wav_scp is the list of wav files to send, and port is the server's service port.
* Result:
In the client log, you will see messages like the following:
```
0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
```
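The recognized text can be pulled out of such a log with a regular expression. The sketch below assumes that each result line contains the phrase `the result of <case> is <text>`, as in the last line of the log above; `extract_results` is a hypothetical helper, not part of the demo.

```python
import re

# Matches "... the result of <case id> is <recognized text>".
RESULT_RE = re.compile(r"the result of (\S+) is (.*)$")


def extract_results(log_text):
    """Map each case id to its recognized text (hypothetical helper)."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            results[match.group(1)] = match.group(2).strip()
    return results


log = (
    "I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240\n"
    "LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) "
    "the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元"
)
results = extract_results(log)
print(results)
```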

@ -0,0 +1,63 @@
(简体中文|[English](./README.md))
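# Customized Speech Recognition Demo
## Introduction
In some scenarios, the recognition system must recognize certain rare words with high accuracy, for example place names in navigation software. Customized recognition can meet this need.
This demo targets taxi expense claims, which require recognizing some rare place names. This can be achieved as follows.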
* G with slot: 打车到 "address_slot"。
![](https://ai-studio-static-online.cdn.bcebos.com/28d9ef132a7f47a895a65ae9e5c4f55b8f472c9f3dd24be8a2e66e0b88b173a4)
* This is the address slot WFST; you can add the place names that need to be recognized.
![](https://ai-studio-static-online.cdn.bcebos.com/47c89100ef8c465bac733605ffc53d76abefba33d62f4d818d351f8cea3c8fe2)
* Through the replace operation, G = fstreplace(G_with_slot, address_slot), we obtain the customized decoding graph.
![](https://ai-studio-static-online.cdn.bcebos.com/60a3095293044f10b73039ab10c7950d139a6717580a44a3ba878c6e74de402b)
## Usage
### 1. Set Up the Environment
Install the paddle:2.2.2 docker image.
```
sudo docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
sudo docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
```
### 2. Demo
* Run the following commands to download the required resources and libraries and start the service.
```
cd /paddle
bash websocket_server.sh
```
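The script above performs the following two steps:
1. Download resource.tar.gz. After extraction, the following directories are found in resource:
model: acoustic model
graph: decoding graph
lib: related libraries
bin: executables
data: audio data
2. Start the service with websocket_server_main.
A brief description of a few parameters:
port is the service port,
graph_path specifies the decoding graph file.
For the other parameters, see the code:
PaddleSpeech/speechx/speechx/decoder/param.h
PaddleSpeech/speechx/examples/ds2_ol/websocket/websocket_server_main.cc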
* In another terminal, send data through the client and get the results. Run the following command:
```
bash websocket_client.sh
```
websocket_client_main starts the client, where wav_scp is the set of utterances to send and port is the service port.
* Result:
Results similar to the following appear in the client log:
```
0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90) the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元
```

@ -0,0 +1,2 @@
export LD_LIBRARY_PATH=$PWD/resource/lib
export PATH=$PATH:$PWD/resource/bin

@ -0,0 +1 @@
sudo nvidia-docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/paddle --name=paddle_demo_docker registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash

@ -0,0 +1,18 @@
#!/bin/bash
set +x
set -e
. path.sh
# input
data=$PWD/data
# output
wav_scp=wav.scp
export GLOG_logtostderr=1
# websocket client
websocket_client_main \
    --wav_rspecifier=scp:$data/$wav_scp \
    --streaming_chunk=0.36 \
    --port=8881
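The `--streaming_chunk=0.36` flag sets the chunk length in seconds. As a rough sketch of what that means for one utterance, the code below computes how many chunks are needed for the 70208-sample file reported in the README log. The 16 kHz sample rate is an assumption (it matches the other configs in this commit but is not stated for this demo), and how the client actually splits the audio is not shown here; `chunk_count` is a hypothetical helper.

```python
import math


def chunk_count(num_samples, chunk_seconds, sample_rate=16000):
    """Return (samples per chunk, number of chunks) for a streaming send."""
    samples_per_chunk = round(chunk_seconds * sample_rate)
    return samples_per_chunk, math.ceil(num_samples / samples_per_chunk)


# 70208 samples is the "wav len (sample)" value from the client log.
samples_per_chunk, n_chunks = chunk_count(70208, 0.36)
print(samples_per_chunk, n_chunks)
```

Under these assumptions the last chunk is shorter than the others, since 70208 is not a multiple of the chunk size.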

@ -0,0 +1,33 @@
#!/bin/bash
set +x
set -e
export GLOG_logtostderr=1
. path.sh
#test websocket server
model_dir=./resource/model
graph_dir=./resource/graph
cmvn=./data/cmvn.ark
#paddle_asr_online/resource.tar.gz
if [ ! -f $cmvn ]; then
    wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
    tar xzfv resource.tar.gz
    ln -s ./resource/data .
fi
websocket_server_main \
    --cmvn_file=$cmvn \
    --streaming_chunk=0.1 \
    --use_fbank=true \
    --model_path=$model_dir/avg_10.jit.pdmodel \
    --param_path=$model_dir/avg_10.jit.pdiparams \
    --model_cache_shapes="5-1-2048,5-1-2048" \
    --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
    --word_symbol_table=$graph_dir/words.txt \
    --graph_path=$graph_dir/TLG.fst --max_active=7500 \
    --port=8881 \
    --acoustic_scale=12

@ -52,8 +52,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete. [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
@ -75,8 +75,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete. [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
@ -84,6 +84,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4. ASR Client Usage ### 4. ASR Client Usage
**Note:** The response time will be slightly longer when using the client for the first time **Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended) - Command Line (Recommended)
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
``` ```
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
``` ```
@ -132,6 +135,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 5. TTS Client Usage ### 5. TTS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time **Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended) - Command Line (Recommended)
If `127.0.0.1` is not accessible, you need to use the actual service IP address
```bash ```bash
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
``` ```
@ -192,6 +198,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 6. CLS Client Usage ### 6. CLS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time **Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended) - Command Line (Recommended)
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
``` ```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
``` ```
@ -242,9 +251,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
**Note:** The response time will be slightly longer when using the client for the first time **Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended) - Command Line (Recommended)
``` bash If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
``` ``` bash
paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
```
* Usage: * Usage:
@ -297,6 +308,8 @@ print(res)
- Command Line (Recommended) - Command Line (Recommended)
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
``` bash ``` bash
paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
``` ```
@ -357,6 +370,9 @@ print(res)
**Note:** The response time will be slightly longer when using the client for the first time **Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended) - Command Line (Recommended)
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
``` bash ``` bash
paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康" paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
``` ```

@ -53,8 +53,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete. [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
@ -76,39 +76,42 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete. [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit) [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
### 4. ASR 客户端使用方法 ### 4. ASR 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长 **注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用) - 命令行 (推荐使用)
```
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
``` `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
使用帮助: ```
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```bash
paddlespeech_client asr --help
```
参数: ```
- `server_ip`: 服务端ip地址默认: 127.0.0.1。
- `port`: 服务端口,默认: 8090。
- `input`(必须输入): 用于识别的音频文件。
- `sample_rate`: 音频采样率默认值16000。
- `lang`: 模型语言默认值zh_cn。
- `audio_format`: 音频格式默认值wav。
输出: 使用帮助:
```bash
paddlespeech_client asr --help
```
参数:
- `server_ip`: 服务端ip地址默认: 127.0.0.1。
- `port`: 服务端口,默认: 8090。
- `input`(必须输入): 用于识别的音频文件。
- `sample_rate`: 音频采样率默认值16000。
- `lang`: 模型语言默认值zh_cn。
- `audio_format`: 音频格式默认值wav。
输出:
```bash ```bash
[2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}} [2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
[2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s. [2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
``` ```
- Python API - Python API
```python ```python
@ -135,33 +138,35 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 5. TTS 客户端使用方法 ### 5. TTS 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长 **注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用) - 命令行 (推荐使用)
```bash
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
使用帮助:
```bash `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
paddlespeech_client tts --help
``` ```bash
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
参数: ```
- `server_ip`: 服务端ip地址默认: 127.0.0.1。 使用帮助:
- `port`: 服务端口,默认: 8090。
- `input`(必须输入): 待合成的文本。 ```bash
- `spk_id`: 说话人 id用于多说话人语音合成默认值 0。 paddlespeech_client tts --help
- `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值1.0 ```
- `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值0 参数:
- `output`: 输出音频的路径, 默认值None表示不保存音频到本地。 - `server_ip`: 服务端ip地址默认: 127.0.0.1。
- `port`: 服务端口,默认: 8090。
输出: - `input`(必须输入): 待合成的文本。
```bash - `spk_id`: 说话人 id用于多说话人语音合成默认值 0。
[2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'} - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值1.0
[2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav. - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
[2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s. - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值0
[2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s. - `output`: 输出音频的路径, 默认值None表示不保存音频到本地。
```
输出:
```bash
[2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
[2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
[2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
[2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.
```
- Python API - Python API
```python ```python
@ -197,9 +202,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
**注意:** 初次使用客户端时响应时间会略长 **注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用) - 命令行 (推荐使用)
```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
```
```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
使用帮助: 使用帮助:
@ -247,15 +255,17 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
注意: 初次使用客户端时响应时间会略长 注意: 初次使用客户端时响应时间会略长
* 命令行 (推荐使用) * 命令行 (推荐使用)
``` bash `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
``` ``` bash
paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
```
* 使用帮助: * 使用帮助:
``` bash ``` bash
paddlespeech_client vector --help paddlespeech_client vector --help
``` ```
* 参数: * 参数:
* server_ip: 服务端ip地址默认: 127.0.0.1。 * server_ip: 服务端ip地址默认: 127.0.0.1。
* port: 服务端口,默认: 8090。 * port: 服务端口,默认: 8090。
@ -299,15 +309,17 @@ print(res)
注意: 初次使用客户端时响应时间会略长 注意: 初次使用客户端时响应时间会略长
* 命令行 (推荐使用) * 命令行 (推荐使用)
``` bash `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
``` ``` bash
paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
```
* 使用帮助: * 使用帮助:
``` bash ``` bash
paddlespeech_client vector --help paddlespeech_client vector --help
``` ```
* 参数: * 参数:
* server_ip: 服务端ip地址默认: 127.0.0.1。 * server_ip: 服务端ip地址默认: 127.0.0.1。
@ -357,9 +369,12 @@ print(res)
**注意:** 初次使用客户端时响应时间会略长 **注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用) - 命令行 (推荐使用)
``` bash
paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康" `127.0.0.1` 不能访问,则需要使用实际服务 IP 地址
```
``` bash
paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
```
使用帮助: 使用帮助:
@ -409,4 +424,4 @@ print(res)
通过 `paddlespeech_server stats --task vector` 获取Vector服务支持的所有模型。 通过 `paddlespeech_server stats --task vector` 获取Vector服务支持的所有模型。
### Text支持的模型 ### Text支持的模型
通过 `paddlespeech_server stats --task text` 获取Text服务支持的所有模型。 通过 `paddlespeech_server stats --task text` 获取Text服务支持的所有模型。

@ -1,4 +1,6 @@
#!/bin/bash #!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav

@ -1,4 +1,6 @@
#!/bin/bash #!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1 paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1

@ -3,7 +3,7 @@
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 127.0.0.1 host: 0.0.0.0
port: 8090 port: 8090
# The task format in the engin_list is: <speech task>_<engine type> # The task format in the engin_list is: <speech task>_<engine type>
@ -157,4 +157,4 @@ vector_python:
sample_rate: 16000 sample_rate: 16000
cfg_path: # [optional] cfg_path: # [optional]
ckpt_path: # [optional] ckpt_path: # [optional]
device: # set 'gpu:id' or 'cpu' device: # set 'gpu:id' or 'cpu'
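The `host` change above makes the server listen on all interfaces (`0.0.0.0`), which is why the client notes say to fall back to the actual service IP address when `127.0.0.1` is not accessible. A minimal sketch of checking reachability before calling `paddlespeech_client`: `is_reachable` and `pick_server_ip` are hypothetical helpers, not part of PaddleSpeech, and they only test that a TCP connection can be opened, not that the speech service itself is healthy.

```python
import socket


def is_reachable(ip: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (ip, port) can be opened."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False


def pick_server_ip(candidates, port: int, timeout: float = 1.0):
    """Return the first candidate IP that accepts a connection, else None."""
    for ip in candidates:
        if is_reachable(ip, port, timeout):
            return ip
    return None
```

For example, `pick_server_ip(["127.0.0.1", "<actual service IP>"], 8090)` would try the loopback address first and then the real address of the machine running the server; the second entry is a placeholder to fill in.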

@ -1,3 +1,4 @@
#!/bin/bash #!/bin/bash
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav

@ -1,6 +1,6 @@
([简体中文](./README_cn.md)|English) ([简体中文](./README_cn.md)|English)
# Speech Server # Streaming ASR Server
## Introduction ## Introduction
This demo is an implementation of starting the streaming speech service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python. This demo is an implementation of starting the streaming speech service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python.
@ -15,7 +15,7 @@ It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose either the medium or the hard way to install paddlespeech. You can choose either the medium or the hard way to install paddlespeech.
### 2. Prepare config File ### 2. Prepare config File
The configuration files can be found in `conf/ws_application.yaml` and `conf/ws_conformer_application.yaml`. The configuration files can be found in `conf/ws_application.yaml` and `conf/ws_conformer_wenetspeech_application.yaml`.
At present, the models integrated into the service include DeepSpeech2 and conformer. At present, the models integrated into the service include DeepSpeech2 and conformer.
@ -32,7 +32,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
**Note:** By default the server is deployed on the `cpu` device; to deploy on a GPU, modify the `device` parameter in the service configuration file. **Note:** By default the server is deployed on the `cpu` device; to deploy on a GPU, modify the `device` parameter in the service configuration file.
```bash ```bash
# in PaddleSpeech/demos/streaming_asr_server start the service # in PaddleSpeech/demos/streaming_asr_server start the service
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml
``` ```
Usage: Usage:
@ -46,31 +46,27 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
Output: Output:
```bash ```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking... [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully. [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173] INFO: Started server process [4242]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173] [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
infos = await tasks.gather(*fs, loop=self) [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
- Python API - Python API
@ -81,37 +77,33 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
server_executor = ServerExecutor() server_executor = ServerExecutor()
server_executor( server_executor(
config_file="./conf/ws_conformer_application.yaml", config_file="./conf/ws_conformer_wenetspeech_application.yaml",
log_file="./log/paddlespeech.log") log_file="./log/paddlespeech.log")
``` ```
Output: Output:
```bash ```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking... [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully. [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173] INFO: Started server process [4242]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173] [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
infos = await tasks.gather(*fs, loop=self) [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
@ -119,9 +111,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
**Note:** The response time will be slightly longer when using the client for the first time **Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended) - Command Line (Recommended)
If `127.0.0.1` is not accessible, you need to use the actual service IP address.

```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
Usage: Usage:
@ -374,10 +369,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
### 2. Client usage ### 2. Client usage
**Note** The response time will be slightly longer when using the client for the first time **Note** The response time will be slightly longer when using the client for the first time
- Command line - Command line:
If `127.0.0.1` is not accessible, you need to use the actual service IP address.

```
paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康"
```
Output Output
``` ```
@ -419,6 +417,9 @@ bash server.sh
### 2. Call client ### 2. Call client
- Command line - Command line
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
``` ```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
``` ```
@ -494,6 +495,9 @@ bash server.sh
``` ```
- Use script - Use script
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
``` ```
python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
``` ```

@ -1,6 +1,6 @@
([English](./README.md)|中文) ([English](./README.md)|中文)
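# Speech Server # Streaming ASR Server
## Introduction ## Introduction
This demo shows how to start the streaming speech service and access it. It can be done with a single command using `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code. This demo shows how to start the streaming speech service and access it. It can be done with a single command using `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code.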
@ -19,11 +19,11 @@
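The startup and test scripts for the streaming ASR service are stored in the `PaddleSpeech/demos/streaming_asr_server` directory. The startup and test scripts for the streaming ASR service are stored in the `PaddleSpeech/demos/streaming_asr_server` directory.
After downloading `PaddleSpeech`, change into the `PaddleSpeech/demos/streaming_asr_server` directory. After downloading `PaddleSpeech`, change into the `PaddleSpeech/demos/streaming_asr_server` directory.
The configuration files can be found in this directory: `conf/ws_application.yaml` and `conf/ws_conformer_application.yaml`. The configuration files can be found in this directory: `conf/ws_application.yaml` and `conf/ws_conformer_wenetspeech_application.yaml`.
The models currently integrated into the service are DeepSpeech2 and conformer; the corresponding configuration files are as follows: The models currently integrated into the service are DeepSpeech2 and conformer; the corresponding configuration files are as follows: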
* DeepSpeech: `conf/ws_application.yaml` * DeepSpeech: `conf/ws_application.yaml`
* conformer: `conf/ws_conformer_application.yaml` * conformer: `conf/ws_conformer_wenetspeech_application.yaml`
@ -39,7 +39,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
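**Note:** The service is deployed on the `cpu` device by default; it can be deployed on `gpu` by modifying the `device` parameter in the service configuration file. **Note:** The service is deployed on the `cpu` device by default; it can be deployed on `gpu` by modifying the `device` parameter in the service configuration file.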
```bash ```bash
# start the service from the PaddleSpeech/demos/streaming_asr_server directory # start the service from the PaddleSpeech/demos/streaming_asr_server directory
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml paddlespeech_server start --config_file ./conf/ws_conformer_wenetspeech_application.yaml
``` ```
Usage: Usage:
@ -53,31 +53,27 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
Output: Output:
```bash ```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking... [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully. [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173] INFO: Started server process [4242]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173] [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
infos = await tasks.gather(*fs, loop=self) [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
- Python API - Python API
@ -88,43 +84,42 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
server_executor = ServerExecutor() server_executor = ServerExecutor()
server_executor( server_executor(
config_file="./conf/ws_conformer_application.yaml", config_file="./conf/ws_conformer_wenetspeech_application.yaml",
log_file="./log/paddlespeech.log") log_file="./log/paddlespeech.log")
``` ```
Output: Output:
```bash ```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance [2022-05-14 04:56:13,086] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu [2022-05-14 04:56:13,086] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k [2022-05-14 04:56:13,087] [ INFO] - Load the pretrained model, tag = conformer_online_wenetspeech-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking... [2022-05-14 04:56:13,087] [ INFO] - File /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,542] [ INFO] - Use pretrained model stored in: /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1. 0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams [2022-05-14 04:56:17,543] [ INFO] - /root/.paddlespeech/models/conformer_online_wenetspeech-zh-16k/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar/exp/ chunk_conformer/checkpoints/avg_10.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine [2022-05-14 04:56:17,852] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online [2022-05-14 04:56:17,863] [ INFO] - model name: conformer_online
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success [2022-05-14 04:56:22,756] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully. [2022-05-14 04:56:22,758] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173] INFO: Started server process [4242]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173] [2022-05-14 04:56:22] [INFO] [server.py:75] Started server process [4242]
INFO: Waiting for application startup. INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup. [2022-05-14 04:56:22] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete. [2022-05-14 04:56:22] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10. INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
infos = await tasks.gather(*fs, loop=self) [2022-05-14 04:56:22] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
``` ```
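### 4. ASR Client Usage ### 4. ASR Client Usage
**Note:** The response time will be slightly longer when the client is used for the first time. **Note:** The response time will be slightly longer when the client is used for the first time.
- Command line (recommended) - Command line (recommended)
If `127.0.0.1` is not accessible, use the actual service IP address.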
``` ```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
``` ```
@ -384,6 +379,9 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
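**Note:** The response time will be slightly longer when the client is used for the first time. **Note:** The response time will be slightly longer when the client is used for the first time.
- Command line (recommended) - Command line (recommended)
If `127.0.0.1` is not accessible, use the actual service IP address.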
``` ```
paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康" paddlespeech_client text --server_ip 127.0.0.1 --port 8190 --input "我认为跑步最重要的就是给我带来了身体健康"
``` ```
@ -427,6 +425,9 @@ bash server.sh
### 2. Call the service ### 2. Call the service
- Command line: - Command line:
If `127.0.0.1` is not accessible, use the actual service IP address.
``` ```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
``` ```
@ -502,6 +503,9 @@ bash server.sh
``` ```
- Use script - Use script
If `127.0.0.1` is not accessible, use the actual service IP address.
``` ```
python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
``` ```

@ -29,7 +29,8 @@ asr_online:
cfg_path: cfg_path:
decode_method: decode_method:
force_yes: True force_yes: True
device: cpu # cpu or gpu:id device: 'cpu' # cpu or gpu:id
decode_method: "attention_rescoring"
am_predictor_conf: am_predictor_conf:
device: # set 'gpu:id' or 'cpu' device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True switch_ir_optim: True
@ -42,4 +43,4 @@ asr_online:
window_ms: 25 # ms window_ms: 25 # ms
shift_ms: 10 # ms shift_ms: 10 # ms
sample_rate: 16000 sample_rate: 16000
sample_width: 2 sample_width: 2
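The `window_ms`, `shift_ms`, `sample_rate` and `sample_width` values above fix how many samples and bytes make up each analysis frame. A small worked sketch of that arithmetic, assuming `sample_width` means bytes per sample and the audio is mono (neither is stated in the config itself); `frame_sizes` is an illustrative name, not a PaddleSpeech function.

```python
def frame_sizes(window_ms: int, shift_ms: int, sample_rate: int, sample_width: int) -> dict:
    """Convert millisecond frame settings into sample and byte counts."""
    window_samples = sample_rate * window_ms // 1000
    shift_samples = sample_rate * shift_ms // 1000
    return {
        "window_samples": window_samples,
        "shift_samples": shift_samples,
        # Assumption: sample_width is the number of bytes per mono sample.
        "window_bytes": window_samples * sample_width,
        "shift_bytes": shift_samples * sample_width,
    }


sizes = frame_sizes(window_ms=25, shift_ms=10, sample_rate=16000, sample_width=2)
print(sizes)
# {'window_samples': 400, 'shift_samples': 160, 'window_bytes': 800, 'shift_bytes': 320}
```

With these settings each 25 ms window covers 400 samples and consecutive windows start 160 samples apart, so adjacent frames overlap by 240 samples.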

@ -2,9 +2,11 @@
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
# read the wav and pass it to only streaming asr service # read the wav and pass it to only streaming asr service
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
# python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --wavfile ./zh.wav # python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --wavfile ./zh.wav
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --input ./zh.wav paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --input ./zh.wav
# read the wav and call streaming and punc service # read the wav and call streaming and punc service
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
# python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav # python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --input ./zh.wav
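The script above hard-codes `127.0.0.1`. A hedged sketch of assembling the same `paddlespeech_client asr_online` command with a configurable address: it uses only the flags that appear in the script, and `build_asr_online_cmd` is a hypothetical helper, not part of PaddleSpeech.

```python
import shlex
from typing import List, Optional


def build_asr_online_cmd(server_ip: str,
                         port: int,
                         wav_path: str,
                         punc_server_ip: Optional[str]=None,
                         punc_port: Optional[int]=None) -> List[str]:
    """Build the argument list for the streaming ASR client call."""
    cmd = [
        "paddlespeech_client", "asr_online",
        "--server_ip", server_ip,
        "--port", str(port),
    ]
    # The punctuation service is optional; add its flags only when both are given.
    if punc_server_ip is not None and punc_port is not None:
        cmd += ["--punc.server_ip", punc_server_ip, "--punc.port", str(punc_port)]
    cmd += ["--input", wav_path]
    return cmd


cmd = build_asr_online_cmd("127.0.0.1", 8290, "./zh.wav", "127.0.0.1", 8190)
print(shlex.join(cmd))
```

Once the servers are running, the list could be passed to `subprocess.run(cmd, check=True)`; replacing `"127.0.0.1"` with the actual service IP address covers the case described in the script comments.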

@ -13,6 +13,9 @@
# limitations under the License. # limitations under the License.
#!/usr/bin/python #!/usr/bin/python
# -*- coding: UTF-8 -*- # -*- coding: UTF-8 -*-
# script for calc RTF: grep -rn RTF log.txt | awk '{print $NF}' | awk -F "=" '{sum += $NF} END {print "all time",sum, "audio num", NR, "RTF", sum/NR}'
import argparse import argparse
import asyncio import asyncio
import codecs import codecs
@ -40,7 +43,7 @@ def main(args):
result = result["result"] result = result["result"]
logger.info(f"asr websocket client finished : {result}") logger.info(f"asr websocket client finished : {result}")
# support to process batch audios from wav.scp # support to process batch audios from wav.scp
if args.wavscp and os.path.exists(args.wavscp): if args.wavscp and os.path.exists(args.wavscp):
logging.info(f"start to process the wavscp: {args.wavscp}") logging.info(f"start to process the wavscp: {args.wavscp}")
with codecs.open(args.wavscp, 'r', encoding='utf-8') as f,\ with codecs.open(args.wavscp, 'r', encoding='utf-8') as f,\
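The comment added at the top of this file gives a `grep`/`awk` one-liner for averaging RTF over a log. A rough Python equivalent is sketched below, assuming (as the one-liner does) that each matching line ends in a field whose text after the last `=` is a number. The actual log format is not shown here, so the sample lines are invented for illustration and `summarize_rtf` is a hypothetical helper. Unlike `awk`, which treats non-numeric text as 0, this version raises `ValueError` on a malformed value.

```python
from typing import Iterable, Tuple


def summarize_rtf(lines: Iterable[str]) -> Tuple[float, int, float]:
    """Return (sum of values, number of matching lines, mean value)."""
    values = []
    for line in lines:
        if "RTF" not in line:  # same filter as `grep RTF`
            continue
        fields = line.split()
        if not fields:
            continue
        # Last whitespace-separated field, then the part after the final "=".
        values.append(float(fields[-1].rsplit("=", 1)[-1]))
    total = sum(values)
    count = len(values)
    mean = total / count if count else float("nan")
    return total, count, mean


# Invented sample lines; the real log layout may differ.
sample = [
    "[INFO] - audio a.wav RTF=0.5",
    "[INFO] - asr websocket client finished",
    "[INFO] - audio b.wav RTF=0.25",
]
total, count, mean = summarize_rtf(sample)
print("all time", total, "audio num", count, "RTF", mean)
```

To run it on a real log, pass an open file object, for example `summarize_rtf(open("log.txt", encoding="utf-8"))`.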

@ -63,8 +63,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup. [2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete. [2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -90,8 +90,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup. [2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete. [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -101,6 +101,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
Access http streaming TTS service: Access http streaming TTS service:
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
```bash ```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
``` ```
@ -198,8 +200,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -226,8 +228,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -236,6 +238,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
Access websocket streaming TTS service: Access websocket streaming TTS service:
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
```bash ```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
``` ```
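The hunks above change the server bind address from `127.0.0.1` to `0.0.0.0`, which listens on all interfaces, while the client still has to be given a concrete address. A minimal sketch for checking whether an address and port accept TCP connections before running the client; `is_reachable` is a hypothetical helper and not part of `paddlespeech_client`, and the port `8092` is simply the one shown in the logs.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Try the loopback address first; fall back to the actual service IP if it fails.
    print(is_reachable("127.0.0.1", 8092))
```

If the check fails for `127.0.0.1`, pass the actual service IP address to `--server_ip` instead.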

@ -62,8 +62,8 @@
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup. [2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete. [2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -89,8 +89,8 @@
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup. [2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete. [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -100,6 +100,8 @@
Access http streaming TTS service: Access http streaming TTS service:
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
```bash ```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
``` ```
@ -198,8 +200,8 @@
[2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -226,8 +228,8 @@
[2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup. [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete. [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
[2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
``` ```
@ -236,6 +238,8 @@
Access websocket streaming TTS service: Access websocket streaming TTS service:
If `127.0.0.1` is not accessible, you need to use the actual service IP address.
```bash ```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
``` ```

@ -3,7 +3,7 @@
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 127.0.0.1 host: 0.0.0.0
port: 8092 port: 8092
# The task format in the engin_list is: <speech task>_<engine type> # The task format in the engin_list is: <speech task>_<engine type>

@ -1,7 +1,9 @@
#!/bin/bash #!/bin/bash
# http client test # http client test
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# websocket client test # websocket client test
#paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav # If `127.0.0.1` is not accessible, you need to use the actual service IP address.
# paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav

@ -0,0 +1,96 @@
([简体中文](./PPASR_cn.md)|English)
# PP-ASR
## Catalogue
- [1. Introduction](#1)
- [2. Characteristic](#2)
- [3. Tutorials](#3)
- [3.1 Pre-trained Models](#31)
- [3.2 Training](#32)
- [3.3 Inference](#33)
- [3.4 Service Deployment](#34)
- [3.5 Customized Auto Speech Recognition and Deployment](#35)
- [4. Quick Start](#4)
<a name="1"></a>
## 1. Introduction
PP-ASR is a tool that provides ASR (automatic speech recognition). It offers a variety of Chinese and English models, supports model training, and supports model inference from the command line. In addition, PP-ASR supports the deployment of streaming models and customized ASR.
<a name="2"></a>
## 2. Characteristic
The basic process of ASR is shown in the figure below:
<center><img src=https://user-images.githubusercontent.com/87408988/168259962-cbe2008b-47b6-443d-9566-d77a5ca2eb25.png width="800" ></center>
The main characteristics of PP-ASR are shown below:
- Provides pre-trained models on Chinese/English open source datasets: aishell (Chinese), wenetspeech (Chinese) and librispeech (English). The models include deepspeech2 and conformer/transformer.
- Supports model training on Chinese/English datasets.
- Supports model inference from the command line. You can run `paddlespeech asr --model xxx --input xxx.wav` to do inference with a pre-trained model.
- Supports deployment of a streaming ASR server. Besides ASR, the server also supports timestamps.
- Supports customized auto speech recognition and deployment.
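The command-line form mentioned in the list can also be assembled from a script. The sketch below only builds the argument list for `paddlespeech asr --model xxx --input xxx.wav`, using just the two flags shown in this document. The helper name `build_asr_command` is hypothetical, and the model and file names are placeholders.

```python
import shlex
from typing import List


def build_asr_command(model: str, wav_path: str) -> List[str]:
    """Build the argument list for the `paddlespeech asr` command line.

    Only the `--model` and `--input` flags shown in the README are used.
    """
    return ["paddlespeech", "asr", "--model", model, "--input", wav_path]


if __name__ == "__main__":
    # Placeholder values; substitute a real model name and audio file.
    cmd = build_asr_command("xxx", "xxx.wav")
    print(shlex.join(cmd))
    # To run it for real (requires `pip install paddlespeech`):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when a file name contains spaces.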
<a name="3"></a>
## 3. Tutorials
<a name="31"></a>
## 3.1 Pre-trained Models
The list of supported pre-trained models: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).
The models with better results are the Ds2 Online Wenetspeech ASR0 Model and the Conformer Online Wenetspeech ASR1 Model. Both models support streaming ASR.
For more information about model design, you can refer to the aistudio tutorial:
- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807)
- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110)
<a name="32"></a>
## 3.2 Training
The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples), organized as "examples/dataset/model". The supported datasets are mainly aishell and librispeech. The supported models are deepspeech2 and u2 (conformer/transformer).
The specific steps of executing the script are recorded in `run.sh`.
For more information, you can refer to [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1).
<a name="33"></a>
## 3.3 Inference
After installing `paddlespeech` with `pip install paddlespeech`, you can run `paddlespeech asr --model xxx --input xxx.wav` to do inference with a pre-trained model.
Specific supported functions include:
- Prediction for a single audio file
- Prediction for multiple audio files through a pipe
- RTF calculation
For specific usage, please refer to: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md)
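RTF (real-time factor) is commonly defined as processing time divided by audio duration, so a value below 1 means the audio is processed faster than real time. The sketch below illustrates that definition only. It is not PaddleSpeech's implementation, and `real_time_factor` is a hypothetical helper that times any callable standing in for the inference step.

```python
import time
from typing import Callable


def real_time_factor(process: Callable[[], None], audio_seconds: float) -> float:
    """Time `process()` and divide the elapsed time by the audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    process()  # stand-in for running inference on one audio file
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


if __name__ == "__main__":
    # Dummy workload in place of real inference on a 10-second clip.
    rtf = real_time_factor(lambda: sum(range(100_000)), audio_seconds=10.0)
    print(f"RTF: {rtf:.6f}")
```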
<a name="34"></a>
## 3.4 Service Deployment
PP-ASR supports the service deployment of streaming ASR. Speech recognition and punctuation processing can be used together.
Demo of ASR Server: [streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server)
![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png)
Demo of using the ASR server on a web page: [streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html)
For more information about service deployment, you can refer to the aistudio tutorial:
- [Streaming service - model part](https://aistudio.baidu.com/aistudio/projectdetail/3839884)
- [Streaming service](https://aistudio.baidu.com/aistudio/projectdetail/4017905)
<a name="35"></a>
## 3.5 Customized Auto Speech Recognition and Deployment
For customized auto speech recognition and deployment, PP-ASR provides C++ programs covering feature extraction (fbank) => inference model (scoring library) => TLG (WFST: token, lexicon, grammar). For specific usage, please refer to: [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)
For a quick start, you can refer to [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md)
For more information about customized auto speech recognition and deployment, you can refer to the aistudio tutorial:
- [Customized Auto Speech Recognition](https://aistudio.baidu.com/aistudio/projectdetail/4021561)
<a name="4"></a>
## 4. Quick Start
To use PP-ASR, see the [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md) guide. It provides three ways to install `paddlespeech`: **Easy**, **Medium** and **Hard**. If you want to try the inference function of paddlespeech, you can use the **Easy** installation method.

@ -0,0 +1,96 @@
(简体中文|[English](./PPASR.md))
# PP-ASR
## Catalogue
- [1. Introduction](#1)
- [2. Characteristic](#2)
- [3. Tutorials](#3)
- [3.1 Pre-trained Models](#31)
- [3.2 Training](#32)
- [3.3 Inference](#33)
- [3.4 Service Deployment](#34)
- [3.5 Customized Scenario Deployment](#35)
- [4. Quick Start](#4)
<a name="1"></a>
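## 1. Introduction
PP-ASR is a tool that provides ASR functionality. It offers a variety of Chinese and English models, supports model training, and supports model inference from the command line. PP-ASR also supports the deployment of streaming models and deployment for customized scenarios.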
<a name="2"></a>
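## 2. Characteristic
The basic process of speech recognition is shown in the figure below:
<center><img src=https://user-images.githubusercontent.com/87408988/168259962-cbe2008b-47b6-443d-9566-d77a5ca2eb25.png width="800" ></center>
The main characteristics of PP-ASR are shown below:
- Provides pre-trained models on Chinese/English open source datasets: aishell (Chinese), wenetspeech (Chinese) and librispeech (English). The models include deepspeech2 and conformer/transformer.
- Supports model training on Chinese/English datasets.
- Supports model inference from the command line. You can run `paddlespeech asr --model xxx --input xxx.wav` to call each pre-trained model for inference.
- Supports service deployment of streaming ASR, including timestamp output.
- Supports deployment for customized scenarios.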
<a name="3"></a>
## 3. Tutorials
<a name="31"></a>
## 3.1 Pre-trained Models
The list of supported pre-trained models: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).
The models with better results are the Ds2 Online Wenetspeech ASR0 Model and the Conformer Online Wenetspeech ASR1 Model. Both models support streaming ASR.
For more information about model design, you can refer to the AIStudio tutorial:
- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807)
- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110)
<a name="32"></a>
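## 3.2 Training
The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples), organized as `examples/dataset/model`. The supported datasets are mainly aishell and librispeech. The supported models are deepspeech2 and u2 (conformer/transformer).
The specific steps of executing the script are recorded in `run.sh`. For details, you can refer to: [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1)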
<a name="33"></a>
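## 3.3 Inference
After running `pip install paddlespeech`, PP-ASR supports inference with pre-trained models from the command line.
Specific supported functions include:
- Prediction for a single audio file
- Prediction for multiple audio files through a pipe
- RTF calculation
For specific usage, please refer to: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md)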
<a name="34"></a>
## 3.4 Service Deployment
PP-ASR supports the service deployment of streaming ASR. Speech recognition and punctuation processing can be used together.
Demo of the server: [streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server)
![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png)
Demo of using the asr server on a web page: [streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html)
For more information about service deployment, you can refer to the AIStudio tutorial:
- [Streaming service - model part](https://aistudio.baidu.com/aistudio/projectdetail/3839884)
- [Streaming service](https://aistudio.baidu.com/aistudio/projectdetail/4017905)
<a name="35"></a>
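## 3.5 Customized Scenario Deployment
For customized scenario deployment, PP-ASR provides C++ programs covering feature extraction (fbank) => inference model (scoring library) => TLG (WFST: token, lexicon, grammar). For details, refer to [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx).
For a quick start, you can refer to: [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md)
For more information about customized scenario deployment, you can refer to the AIStudio tutorial:
- [Customized Recognition](https://aistudio.baidu.com/aistudio/projectdetail/4021561)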
<a name="4"></a>
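## 4. Quick Start
To learn how to use PP-ASR, see the [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md) guide, which provides three installation methods: **Easy**, **Medium** and **Hard**. If you want to try the inference function of paddlespeech, you can use the **Easy** installation method.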

@ -54,6 +54,7 @@ Contents
:caption: Demos :caption: Demos
demo_video demo_video
streaming_asr_demo_video
tts_demo_video tts_demo_video
streaming_tts_demo_video streaming_tts_demo_video

@ -139,7 +139,7 @@ pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
To avoid the trouble of environment setup, running in a Docker container is highly recommended. Otherwise, if you work on `Ubuntu` with `root` privilege, you can still complete the installation. To avoid the trouble of environment setup, running in a Docker container is highly recommended. Otherwise, if you work on `Ubuntu` with `root` privilege, you can still complete the installation.
### Choice 1: Running in Docker Container (Recommend) ### Choice 1: Running in Docker Container (Recommend)
Docker is an open-source tool to build, ship, and run distributed applications in an isolated environment. A Docker image for this project has been provided in [hub.docker.com](https://hub.docker.com) with all the dependencies installed. This Docker image requires the support of NVIDIA GPU, so please make sure its availability and the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) has been installed. Docker is an open-source tool to build, ship, and run distributed applications in an isolated environment. A Docker image for this project has been provided in [hub.docker.com](https://hub.docker.com) with dependencies of cuda and cudnn installed. This Docker image requires the support of NVIDIA GPU, so please make sure its availability and the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) has been installed.
Take several steps to launch the Docker image: Take several steps to launch the Docker image:
- Download the Docker image - Download the Docker image

@ -0,0 +1,10 @@
Streaming ASR Demo Video
========================
.. raw:: html
<video controls width="1024">
<source src="https://paddlespeech.bj.bcebos.com/demos/asr_demos/streaming_ASR_slice.mp4" type="video/mp4">
Sorry, your browser doesn't support embedded videos.
</video>

@ -1,5 +1,7 @@
([简体中文](./PPTTS_cn.md)|English) ([简体中文](./PPTTS_cn.md)|English)
# PPTTS
- [1. Introduction](#1) - [1. Introduction](#1)
- [2. Characteristic](#2) - [2. Characteristic](#2)
- [3. Benchmark](#3) - [3. Benchmark](#3)

@ -0,0 +1,78 @@
([简体中文](./PPVPR_cn.md)|English)
# PP-VPR
## Catalogue
- [1. Introduction](#1)
- [2. Characteristic](#2)
- [3. Tutorials](#3)
- [3.1 Pre-trained Models](#31)
- [3.2 Training](#32)
- [3.3 Inference](#33)
- [3.4 Service Deployment](#34)
- [4. Quick Start](#4)
<a name="1"></a>
## 1. Introduction
PP-VPR is a tool that provides voiceprint feature extraction and retrieval. It offers a variety of quasi-industrial solutions that make difficult problems in complex scenarios easier to handle, and it supports model inference from the command line. PP-VPR also supports interface operations and container deployment.
<a name="2"></a>
## 2. Characteristic
The basic process of VPR is shown in the figure below:
<center><img src=https://ai-studio-static-online.cdn.bcebos.com/3aed59b8c8874046ad19fe583d15a8dd53c5b33e68db4383b79706e5add5c2d0 width="800" ></center>
The main characteristics of PP-VPR are shown below:
- Provides pre-trained models on the English open source dataset VoxCeleb. The models include ecapa-tdnn.
- Supports model training/evaluation.
- Supports model inference from the command line. You can run `paddlespeech vector --task spk --input xxx.wav` to do inference with a pre-trained model.
- Supports interface operations and container deployment.
<a name="3"></a>
## 3. Tutorials
<a name="31"></a>
## 3.1 Pre-trained Models
The list of supported pre-trained models: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).
For more information about model design, you can refer to the aistudio tutorial:
- [ecapa-tdnn](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
<a name="32"></a>
## 3.2 Training
The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples), organized as "examples/dataset/model". The supported dataset is mainly VoxCeleb. The supported model is ecapa-tdnn.
The specific steps of executing the script are recorded in `run.sh`.
For more information, you can refer to [sv0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0).
<a name="33"></a>
## 3.3 Inference
After installing `paddlespeech` with `pip install paddlespeech`, you can run `paddlespeech vector --task spk --input xxx.wav` to do inference with a pre-trained model.
Specific supported functions include:
- Prediction for a single audio file
- Scoring the similarity between two audio files
- RTF calculation
For specific usage, please refer to: [speaker_verification](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speaker_verification/README_cn.md)
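Scoring two audio files usually means comparing their speaker embeddings. A minimal sketch, assuming cosine similarity as the score (this document does not state which metric PP-VPR uses) and using short made-up vectors in place of real embeddings; `cosine_similarity` is a hypothetical helper, not a PaddleSpeech function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up 4-dimensional "embeddings"; real ones are much longer.
    emb_a = [0.1, 0.3, -0.2, 0.7]
    emb_b = [0.2, 0.25, -0.1, 0.6]
    print(cosine_similarity(emb_a, emb_b))
```

A score close to 1 suggests the same speaker under this assumption; the decision threshold depends on the model and data.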
<a name="34"></a>
## 3.4 Service Deployment
PP-VPR supports Docker containerized service deployment. It uses Milvus and MySQL for high-performance database building and retrieval.
Demo of VPR Server: [audio_searching](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/audio_searching)
![arch](https://ai-studio-static-online.cdn.bcebos.com/7b32dd0200084866863095677e8b40d3b725b867d2e6439e9cf21514e235dfd5)
For more information about service deployment, you can refer to the aistudio tutorial:
- [speaker_recognition](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
<a name="4"></a>
## 4. Quick Start
To use PP-VPR, see the [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md) guide. It provides three ways to install `paddlespeech`: **Easy**, **Medium** and **Hard**. If you want to try the inference function of paddlespeech, you can use the **Easy** installation method.

@ -0,0 +1,79 @@
(简体中文|[English](./PPVPR.md))
# PP-VPR
## Catalogue
- [1. Introduction](#1)
- [2. Characteristic](#2)
- [3. Tutorials](#3)
- [3.1 Pre-trained Models](#31)
- [3.2 Training](#32)
- [3.3 Inference](#33)
- [3.4 Service Deployment](#34)
- [4. Quick Start](#4)
<a name="1"></a>
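## 1. Introduction
PP-VPR is a tool that provides voiceprint feature extraction and retrieval. It offers a variety of quasi-industrial solutions that make difficult problems in complex scenarios easier to handle, and it supports model inference from the command line. PP-VPR also supports interface operations and containerized deployment.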
<a name="2"></a>
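## 2. Characteristic
The basic process of VPR is shown in the figure below:
<center><img src=https://ai-studio-static-online.cdn.bcebos.com/3aed59b8c8874046ad19fe583d15a8dd53c5b33e68db4383b79706e5add5c2d0 width="800" ></center>
The main characteristics of PP-VPR are shown below:
- Provides a pre-trained ecapa-tdnn model on the English open source dataset VoxCeleb.
- Supports model training and evaluation.
- Supports model inference from the command line. You can run `paddlespeech vector --task spk --input xxx.wav` to call the pre-trained model for inference.
- Supports containerized service deployment of VPR and interface operations.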
<a name="3"></a>
## 3. Tutorials
<a name="31"></a>
## 3.1 Pre-trained Models
The list of supported pre-trained models: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).
For more information about model design, you can refer to the AIStudio tutorial:
- [ecapa-tdnn](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
<a name="32"></a>
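## 3.2 Training
The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples), organized as `examples/dataset/model`. The supported dataset is mainly VoxCeleb. The supported model is ecapa-tdnn.
The specific steps of executing the script are recorded in `run.sh`. For details, you can refer to: [sv0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0)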
<a name="33"></a>
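## 3.3 Inference
After running `pip install paddlespeech`, PP-VPR supports inference with pre-trained models from the command line.
Specific supported functions include:
- Prediction for a single audio file
- Scoring two audio files against each other
- RTF calculation
For specific usage, please refer to: [speaker_verification](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speaker_verification/README_cn.md)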
<a name="34"></a>
## 3.4 Service Deployment
PP-VPR supports Docker containerized service deployment. It uses Milvus and MySQL for high-performance database building and retrieval.
Demo of the server: [audio_searching](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/audio_searching)
![arch](https://ai-studio-static-online.cdn.bcebos.com/7b32dd0200084866863095677e8b40d3b725b867d2e6439e9cf21514e235dfd5)
For more information about service deployment, you can refer to the AIStudio tutorial:
- [speaker_recognition](https://aistudio.baidu.com/aistudio/projectdetail/4027664)
<a name="4"></a>
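## 4. Quick Start
To learn how to use PP-VPR, see the [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md) guide, which provides three installation methods: **Easy**, **Medium** and **Hard**. If you want to try the inference function of paddlespeech, you can use the **Easy** installation method.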

@ -11,7 +11,7 @@ paddlespeech version: 0.2.0
| conformer | 47.07M | conf/conformer.yaml | spec_aug | test | attention_rescoring | - | 0.0464 | | conformer | 47.07M | conf/conformer.yaml | spec_aug | test | attention_rescoring | - | 0.0464 |
## Chunk Conformer ## Conformer Streaming
paddle version: 2.2.2 paddle version: 2.2.2
paddlespeech version: 0.2.0 paddlespeech version: 0.2.0
Need to set `decoding.decoding_chunk_size=16` when decoding. Need to set `decoding.decoding_chunk_size=16` when decoding.

@ -1,6 +1,6 @@
# LibriSpeech # LibriSpeech
## Deepspeech2 ## Deepspeech2 Non-Streaming
| Model | Params | release | Config | Test set | Loss | WER | | Model | Params | release | Config | Test set | Loss | WER |
| --- | --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- | --- |
| DeepSpeech2 | 42.96M | 2.2.0 | conf/deepspeech2.yaml + spec_aug | test-clean | 14.49190807 | 0.067283 | | DeepSpeech2 | 42.96M | 2.2.0 | conf/deepspeech2.yaml + spec_aug | test-clean | 14.49190807 | 0.067283 |

@ -11,7 +11,7 @@ train: Epoch 70, 4 V100-32G, best avg: 20
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention_rescoring | 6.433612394332886 | 0.033761 | | conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention_rescoring | 6.433612394332886 | 0.033761 |
## Chunk Conformer ## Conformer Streaming
| Model | Params | Config | Augmentation| Test set | Decode method | Chunk Size & Left Chunks | Loss | WER | | Model | Params | Config | Augmentation| Test set | Decode method | Chunk Size & Left Chunks | Loss | WER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- | --- | --- | --- |

@ -1,6 +1,6 @@
# WenetSpeech # WenetSpeech
## Conformer online ## Conformer Streaming
| Model | Params | Config | Augmentation| Test set | Decode method | Valid Loss | CER | | Model | Params | Config | Augmentation| Test set | Decode method | Valid Loss | CER |
| --- | --- | --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- | --- | --- |

@ -187,6 +187,7 @@ class ASRExecutor(BaseExecutor):
vocab=self.config.vocab_filepath, vocab=self.config.vocab_filepath,
spm_model_prefix=self.config.spm_model_prefix) spm_model_prefix=self.config.spm_model_prefix)
self.config.decode.decoding_method = decode_method self.config.decode.decoding_method = decode_method
else: else:
raise Exception("wrong type") raise Exception("wrong type")
model_name = model_type[:model_type.rindex( model_name = model_type[:model_type.rindex(
@ -201,6 +202,21 @@ class ASRExecutor(BaseExecutor):
model_dict = paddle.load(self.ckpt_path) model_dict = paddle.load(self.ckpt_path)
self.model.set_state_dict(model_dict) self.model.set_state_dict(model_dict)
# compute the max len limit
if "conformer" in model_type or "transformer" in model_type or "wenetspeech" in model_type:
# in transformer like model, we may use the subsample rate cnn network
subsample_rate = self.model.subsampling_rate()
frame_shift_ms = self.config.preprocess_config.process[0][
'n_shift'] / self.config.preprocess_config.process[0]['fs']
max_len = self.model.encoder.embed.pos_enc.max_len
if self.config.encoder_conf.get("max_len", None):
max_len = self.config.encoder_conf.max_len
self.max_len = frame_shift_ms * max_len * subsample_rate
logger.info(
f"The asr server limit max duration len: {self.max_len}")
def preprocess(self, model_type: str, input: Union[str, os.PathLike]): def preprocess(self, model_type: str, input: Union[str, os.PathLike]):
""" """
Input preprocess and return paddle.Tensor stored in self.input. Input preprocess and return paddle.Tensor stored in self.input.
@ -352,9 +368,10 @@ class ASRExecutor(BaseExecutor):
audio, audio_sample_rate = soundfile.read( audio, audio_sample_rate = soundfile.read(
audio_file, dtype="int16", always_2d=True) audio_file, dtype="int16", always_2d=True)
audio_duration = audio.shape[0] / audio_sample_rate audio_duration = audio.shape[0] / audio_sample_rate
max_duration = 50.0 if audio_duration > self.max_len:
if audio_duration >= max_duration: logger.error(
logger.error("Please input audio file less then 50 seconds.\n") f"Please input audio file less than {self.max_len} seconds.\n"
)
return False return False
except Exception as e: except Exception as e:
logger.exception(e) logger.exception(e)
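The hunks above replace the fixed 50-second limit with one derived from the model: frame shift times the positional-encoding `max_len` times the subsampling rate. A worked sketch of that arithmetic follows, assuming `n_shift` is given in samples and `fs` in Hz, so the quotient is in seconds even though the diff names it `frame_shift_ms`. The helper names and the sample values (160, 16000, 4) are illustrative assumptions; only the default `max_len` of 5000 appears in the diff.

```python
def max_audio_seconds(n_shift: int, fs: int, max_len: int,
                      subsample_rate: int) -> float:
    """Longest audio duration, in seconds, that fits the positional encoding."""
    # One feature frame advances n_shift samples, i.e. n_shift / fs seconds.
    frame_shift_s = n_shift / fs
    # The encoder sees one position per `subsample_rate` feature frames.
    return frame_shift_s * max_len * subsample_rate


def exceeds_limit(audio_duration: float, limit: float) -> bool:
    """Mirror the new check in the diff: reject only when strictly longer."""
    return audio_duration > limit


if __name__ == "__main__":
    # Assumed values: 10 ms shift at 16 kHz, max_len 5000, 4x subsampling.
    limit = max_audio_seconds(n_shift=160, fs=16000, max_len=5000,
                              subsample_rate=4)
    print(f"limit: {limit:.1f} s, 60 s clip rejected: {exceeds_limit(60.0, limit)}")
```

Under these assumed values the limit comes out at roughly 200 seconds, so a clip that the old 50-second check rejected would now be accepted.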

@ -62,21 +62,21 @@ class TransformerDecoder(BatchScorerInterface, nn.Layer):
False: x -> x + att(x) False: x -> x + att(x)
""" """
def __init__( def __init__(self,
self, vocab_size: int,
vocab_size: int, encoder_output_size: int,
encoder_output_size: int, attention_heads: int=4,
attention_heads: int=4, linear_units: int=2048,
linear_units: int=2048, num_blocks: int=6,
num_blocks: int=6, dropout_rate: float=0.1,
dropout_rate: float=0.1, positional_dropout_rate: float=0.1,
positional_dropout_rate: float=0.1, self_attention_dropout_rate: float=0.0,
self_attention_dropout_rate: float=0.0, src_attention_dropout_rate: float=0.0,
src_attention_dropout_rate: float=0.0, input_layer: str="embed",
input_layer: str="embed", use_output_layer: bool=True,
use_output_layer: bool=True, normalize_before: bool=True,
normalize_before: bool=True, concat_after: bool=False,
concat_after: bool=False, ): max_len: int=5000):
assert check_argument_types() assert check_argument_types()
@ -87,7 +87,8 @@ class TransformerDecoder(BatchScorerInterface, nn.Layer):
if input_layer == "embed": if input_layer == "embed":
self.embed = nn.Sequential( self.embed = nn.Sequential(
Embedding(vocab_size, attention_dim), Embedding(vocab_size, attention_dim),
PositionalEncoding(attention_dim, positional_dropout_rate), ) PositionalEncoding(
attention_dim, positional_dropout_rate, max_len=max_len), )
else: else:
raise ValueError(f"only 'embed' is supported: {input_layer}") raise ValueError(f"only 'embed' is supported: {input_layer}")

@ -112,7 +112,9 @@ class PositionalEncoding(nn.Layer, PositionalEncodingInterface):
paddle.Tensor: for compatibility to RelPositionalEncoding, (batch=1, time, ...) paddle.Tensor: for compatibility to RelPositionalEncoding, (batch=1, time, ...)
""" """
T = x.shape[1] T = x.shape[1]
assert offset + x.shape[1] < self.max_len assert offset + x.shape[
1] < self.max_len, "offset: {} + x.shape[1]: {} is larger than the max_len: {}".format(
offset, x.shape[1], self.max_len)
#TODO(Hui Zhang): using T = x.size(1), __getitem__ not support Tensor #TODO(Hui Zhang): using T = x.size(1), __getitem__ not support Tensor
pos_emb = self.pe[:, offset:offset + T] pos_emb = self.pe[:, offset:offset + T]
x = x * self.xscale + pos_emb x = x * self.xscale + pos_emb
@ -148,6 +150,7 @@ class RelPositionalEncoding(PositionalEncoding):
max_len (int, optional): [Maximum input length.]. Defaults to 5000. max_len (int, optional): [Maximum input length.]. Defaults to 5000.
""" """
super().__init__(d_model, dropout_rate, max_len, reverse=True) super().__init__(d_model, dropout_rate, max_len, reverse=True)
logger.info(f"max len: {max_len}")
def forward(self, x: paddle.Tensor, def forward(self, x: paddle.Tensor,
offset: int=0) -> Tuple[paddle.Tensor, paddle.Tensor]: offset: int=0) -> Tuple[paddle.Tensor, paddle.Tensor]:
@ -158,7 +161,9 @@ class RelPositionalEncoding(PositionalEncoding):
paddle.Tensor: Encoded tensor (batch, time, `*`). paddle.Tensor: Encoded tensor (batch, time, `*`).
paddle.Tensor: Positional embedding tensor (1, time, `*`). paddle.Tensor: Positional embedding tensor (1, time, `*`).
""" """
assert offset + x.shape[1] < self.max_len assert offset + x.shape[
1] < self.max_len, "offset: {} + x.shape[1]: {} is larger than the max_len: {}".format(
offset, x.shape[1], self.max_len)
x = x * self.xscale x = x * self.xscale
#TODO(Hui Zhang): using x.size(1), __getitem__ not support Tensor #TODO(Hui Zhang): using x.size(1), __getitem__ not support Tensor
pos_emb = self.pe[:, offset:offset + x.shape[1]] pos_emb = self.pe[:, offset:offset + x.shape[1]]
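
The reworded assertion in both `forward` methods can be exercised on its own. The following is a minimal sketch, assuming a hypothetical helper `check_position_range` that is not part of PaddleSpeech; it only mirrors the condition and message shown in the diff. Because the comparison is a strict `<`, an input that ends exactly at `max_len` is rejected as well.

```python
def check_position_range(offset: int, length: int, max_len: int = 5000) -> None:
    """Hypothetical helper mirroring the assertion added in the diff."""
    assert offset + length < max_len, (
        "offset: {} + x.shape[1]: {} is larger than the max_len: {}".format(
            offset, length, max_len))


# Well inside the table of positional encodings: passes silently.
check_position_range(offset=0, length=100)

# Ends exactly at max_len: 5000 is not strictly smaller than 5000.
try:
    check_position_range(offset=4990, length=10)
except AssertionError as err:
    message = str(err)
```

With the new `max_len` constructor argument, a model that needs longer inputs can raise the limit instead of hitting this assertion.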

@ -47,24 +47,24 @@ __all__ = ["BaseEncoder", 'TransformerEncoder', "ConformerEncoder"]
class BaseEncoder(nn.Layer): class BaseEncoder(nn.Layer):
def __init__( def __init__(self,
self, input_size: int,
input_size: int, output_size: int=256,
output_size: int=256, attention_heads: int=4,
attention_heads: int=4, linear_units: int=2048,
linear_units: int=2048, num_blocks: int=6,
num_blocks: int=6, dropout_rate: float=0.1,
dropout_rate: float=0.1, positional_dropout_rate: float=0.1,
positional_dropout_rate: float=0.1, attention_dropout_rate: float=0.0,
attention_dropout_rate: float=0.0, input_layer: str="conv2d",
input_layer: str="conv2d", pos_enc_layer_type: str="abs_pos",
pos_enc_layer_type: str="abs_pos", normalize_before: bool=True,
normalize_before: bool=True, concat_after: bool=False,
concat_after: bool=False, static_chunk_size: int=0,
static_chunk_size: int=0, use_dynamic_chunk: bool=False,
use_dynamic_chunk: bool=False, global_cmvn: paddle.nn.Layer=None,
global_cmvn: paddle.nn.Layer=None, use_dynamic_left_chunk: bool=False,
use_dynamic_left_chunk: bool=False, ): max_len: int=5000):
""" """
Args: Args:
input_size (int): input dim, d_feature input_size (int): input dim, d_feature
@ -127,7 +127,9 @@ class BaseEncoder(nn.Layer):
odim=output_size, odim=output_size,
dropout_rate=dropout_rate, dropout_rate=dropout_rate,
pos_enc_class=pos_enc_class( pos_enc_class=pos_enc_class(
d_model=output_size, dropout_rate=positional_dropout_rate), ) d_model=output_size,
dropout_rate=positional_dropout_rate,
max_len=max_len), )
self.normalize_before = normalize_before self.normalize_before = normalize_before
self.after_norm = LayerNorm(output_size, epsilon=1e-12) self.after_norm = LayerNorm(output_size, epsilon=1e-12)
@ -415,32 +417,32 @@ class TransformerEncoder(BaseEncoder):
class ConformerEncoder(BaseEncoder): class ConformerEncoder(BaseEncoder):
"""Conformer encoder module.""" """Conformer encoder module."""
def __init__( def __init__(self,
self, input_size: int,
input_size: int, output_size: int=256,
output_size: int=256, attention_heads: int=4,
attention_heads: int=4, linear_units: int=2048,
linear_units: int=2048, num_blocks: int=6,
num_blocks: int=6, dropout_rate: float=0.1,
dropout_rate: float=0.1, positional_dropout_rate: float=0.1,
positional_dropout_rate: float=0.1, attention_dropout_rate: float=0.0,
attention_dropout_rate: float=0.0, input_layer: str="conv2d",
input_layer: str="conv2d", pos_enc_layer_type: str="rel_pos",
pos_enc_layer_type: str="rel_pos", normalize_before: bool=True,
normalize_before: bool=True, concat_after: bool=False,
concat_after: bool=False, static_chunk_size: int=0,
static_chunk_size: int=0, use_dynamic_chunk: bool=False,
use_dynamic_chunk: bool=False, global_cmvn: nn.Layer=None,
global_cmvn: nn.Layer=None, use_dynamic_left_chunk: bool=False,
use_dynamic_left_chunk: bool=False, positionwise_conv_kernel_size: int=1,
positionwise_conv_kernel_size: int=1, macaron_style: bool=True,
macaron_style: bool=True, selfattention_layer_type: str="rel_selfattn",
selfattention_layer_type: str="rel_selfattn", activation_type: str="swish",
activation_type: str="swish", use_cnn_module: bool=True,
use_cnn_module: bool=True, cnn_module_kernel: int=15,
cnn_module_kernel: int=15, causal: bool=False,
causal: bool=False, cnn_module_norm: str="batch_norm",
cnn_module_norm: str="batch_norm", ): max_len: int=5000):
"""Construct ConformerEncoder """Construct ConformerEncoder
Args: Args:
input_size to use_dynamic_chunk, see in BaseEncoder input_size to use_dynamic_chunk, see in BaseEncoder
@ -464,7 +466,7 @@ class ConformerEncoder(BaseEncoder):
attention_dropout_rate, input_layer, attention_dropout_rate, input_layer,
pos_enc_layer_type, normalize_before, concat_after, pos_enc_layer_type, normalize_before, concat_after,
static_chunk_size, use_dynamic_chunk, global_cmvn, static_chunk_size, use_dynamic_chunk, global_cmvn,
use_dynamic_left_chunk) use_dynamic_left_chunk, max_len)
activation = get_activation(activation_type) activation = get_activation(activation_type)
# self-attention module definition # self-attention module definition

@ -20,6 +20,7 @@ import os
import random import random
import sys import sys
import time import time
import warnings
from typing import List from typing import List
import numpy as np import numpy as np
@ -34,6 +35,7 @@ from paddlespeech.server.utils.audio_handler import ASRWsAudioHandler
from paddlespeech.server.utils.audio_process import wav2pcm from paddlespeech.server.utils.audio_process import wav2pcm
from paddlespeech.server.utils.util import compute_delay from paddlespeech.server.utils.util import compute_delay
from paddlespeech.server.utils.util import wav2base64 from paddlespeech.server.utils.util import wav2base64
warnings.filterwarnings("ignore")
__all__ = [ __all__ = [
'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor', 'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor',
@ -752,3 +754,88 @@ class VectorClientExecutor(BaseExecutor):
logger.info(f"The vector score is: {res}") logger.info(f"The vector score is: {res}")
else: else:
logger.error(f"Sorry, we have not support such task {task}") logger.error(f"Sorry, we have not support such task {task}")
@cli_client_register(
name='paddlespeech_client.acs', description='visit acs service')
class ACSClientExecutor(BaseExecutor):
def __init__(self):
super(ACSClientExecutor, self).__init__()
self.parser = argparse.ArgumentParser(
prog='paddlespeech_client.acs', add_help=True)
self.parser.add_argument(
'--server_ip', type=str, default='127.0.0.1', help='server ip')
self.parser.add_argument(
'--port', type=int, default=8090, help='server port')
self.parser.add_argument(
'--input',
type=str,
default=None,
help='Audio file to be recognized',
required=True)
self.parser.add_argument(
'--sample_rate', type=int, default=16000, help='audio sample rate')
self.parser.add_argument(
'--lang', type=str, default="zh_cn", help='language')
self.parser.add_argument(
'--audio_format', type=str, default="wav", help='audio format')
def execute(self, argv: List[str]) -> bool:
args = self.parser.parse_args(argv)
input_ = args.input
server_ip = args.server_ip
port = args.port
sample_rate = args.sample_rate
lang = args.lang
audio_format = args.audio_format
try:
time_start = time.time()
res = self(
input=input_,
server_ip=server_ip,
port=port,
sample_rate=sample_rate,
lang=lang,
audio_format=audio_format, )
time_end = time.time()
logger.info(f"ACS result: {res}")
logger.info("Response time %f s." % (time_end - time_start))
return True
except Exception as e:
logger.error("Failed to speech recognition.")
logger.error(e)
return False
@stats_wrapper
def __call__(
self,
input: str,
server_ip: str="127.0.0.1",
port: int=8090,
sample_rate: int=16000,
lang: str="zh_cn",
audio_format: str="wav", ):
"""Python API to call an executor.
Args:
input (str): The input audio file path
server_ip (str, optional): The ASR server ip. Defaults to "127.0.0.1".
port (int, optional): The ASR server port. Defaults to 8090.
sample_rate (int, optional): The audio sample rate. Defaults to 16000.
lang (str, optional): The audio language type. Defaults to "zh_cn".
audio_format (str, optional): The audio format information. Defaults to "wav".
Returns:
str: The ACS results
"""
# we use the acs server to get the key word time stamp in audio text content
logger.info("acs http client start")
from paddlespeech.server.utils.audio_handler import ASRHttpHandler
handler = ASRHttpHandler(
server_ip=server_ip, port=port, endpoint="/paddlespeech/asr/search")
res = handler.run(input, audio_format, sample_rate, lang)
res = res['result']
logger.info("acs http client finished")
return res

@ -13,12 +13,14 @@
# limitations under the License. # limitations under the License.
import argparse import argparse
import sys import sys
import warnings
from typing import List from typing import List
import uvicorn import uvicorn
from fastapi import FastAPI from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware from starlette.middleware.cors import CORSMiddleware
from prettytable import PrettyTable from prettytable import PrettyTable
from starlette.middleware.cors import CORSMiddleware
from ..executor import BaseExecutor from ..executor import BaseExecutor
from ..util import cli_server_register from ..util import cli_server_register
@ -28,6 +30,7 @@ from paddlespeech.server.engine.engine_pool import init_engine_pool
from paddlespeech.server.restful.api import setup_router as setup_http_router from paddlespeech.server.restful.api import setup_router as setup_http_router
from paddlespeech.server.utils.config import get_config from paddlespeech.server.utils.config import get_config
from paddlespeech.server.ws.api import setup_router as setup_ws_router from paddlespeech.server.ws.api import setup_router as setup_ws_router
warnings.filterwarnings("ignore")
__all__ = ['ServerExecutor', 'ServerStatsExecutor'] __all__ = ['ServerExecutor', 'ServerStatsExecutor']
@ -40,6 +43,10 @@ app.add_middleware(
allow_credentials=True, allow_credentials=True,
allow_methods=["*"], allow_methods=["*"],
allow_headers=["*"]) allow_headers=["*"])
<<<<<<< HEAD
=======
>>>>>>> develop
@cli_server_register( @cli_server_register(
name='paddlespeech_server.start', description='Start the service') name='paddlespeech_server.start', description='Start the service')
@ -79,7 +86,7 @@ class ServerExecutor(BaseExecutor):
else: else:
raise Exception("unsupported protocol") raise Exception("unsupported protocol")
app.include_router(api_router) app.include_router(api_router)
logger.info("start to init the engine")
if not init_engine_pool(config): if not init_engine_pool(config):
return False return False

@ -3,7 +3,7 @@
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 127.0.0.1 host: 0.0.0.0
port: 8090 port: 8090
# The task format in the engin_list is: <speech task>_<engine type> # The task format in the engin_list is: <speech task>_<engine type>
@ -157,4 +157,4 @@ vector_python:
sample_rate: 16000 sample_rate: 16000
cfg_path: # [optional] cfg_path: # [optional]
ckpt_path: # [optional] ckpt_path: # [optional]
device: # set 'gpu:id' or 'cpu' device: # set 'gpu:id' or 'cpu'

@ -3,7 +3,7 @@
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 127.0.0.1 host: 0.0.0.0
port: 8092 port: 8092
# The task format in the engin_list is: <speech task>_<engine type> # The task format in the engin_list is: <speech task>_<engine type>

@ -0,0 +1,188 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import io
import json
import os
import re
import paddle
import soundfile
import websocket
from paddlespeech.cli.log import logger
from paddlespeech.server.engine.base_engine import BaseEngine
class ACSEngine(BaseEngine):
def __init__(self):
"""The ACSEngine Engine
"""
super(ACSEngine, self).__init__()
logger.info("Create the ACSEngine Instance")
self.word_list = []
def init(self, config: dict):
"""Init the ACSEngine Engine
Args:
config (dict): The server configuration
Returns:
bool: The engine instance flag
"""
logger.info("Init the acs engine")
try:
self.config = config
if self.config.device:
self.device = self.config.device
else:
self.device = paddle.get_device()
paddle.set_device(self.device)
logger.info(f"ACS Engine set the device: {self.device}")
except BaseException as e:
logger.error(
"Set device failed, please check if device is already used and the parameter 'device' in the yaml file"
)
logger.error("Initialize Text server engine Failed on device: %s." %
(self.device))
return False
self.read_search_words()
# init the asr url
self.url = "ws://" + self.config.asr_server_ip + ":" + str(
self.config.asr_server_port) + "/paddlespeech/asr/streaming"
logger.info("Init the acs engine successfully")
return True
def read_search_words(self):
word_list = self.config.word_list
if word_list is None:
logger.error(
"No word list file in config, please set the word list parameter"
)
return
if not os.path.exists(word_list):
logger.error("Please input correct word list file")
return
with open(word_list, 'r') as fp:
self.word_list = [line.strip() for line in fp.readlines()]
logger.info(f"word list: {self.word_list}")
def get_asr_content(self, audio_data):
"""Get the streaming asr result
Args:
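audio_data (io.BytesIO): file-like object holding the audio, read with soundfile
Returns:
dict: the final message from the streaming asr server, or an empty string if no asr url is set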
"""
logger.info("send a message to the server")
if self.url is None:
logger.error("No asr server, please input valid ip and port")
return ""
ws = websocket.WebSocket()
ws.connect(self.url)
# with websocket.WebSocket.connect(self.url) as ws:
audio_info = json.dumps(
{
"name": "test.wav",
"signal": "start",
"nbest": 1
},
sort_keys=True,
indent=4,
separators=(',', ': '))
ws.send(audio_info)
msg = ws.recv()
logger.info("client receive msg={}".format(msg))
# send the total audio data
samples, sample_rate = soundfile.read(audio_data, dtype='int16')
ws.send_binary(samples.tobytes())
msg = ws.recv()
msg = json.loads(msg)
logger.info(f"audio result: {msg}")
# send the end signal to finish the stream
logger.info("send the end signal")
audio_info = json.dumps(
{
"name": "test.wav",
"signal": "end",
"nbest": 1
},
sort_keys=True,
indent=4,
separators=(',', ': '))
ws.send(audio_info)
msg = ws.recv()
msg = json.loads(msg)
logger.info(f"the final result: {msg}")
ws.close()
return msg
def get_macthed_word(self, msg):
"""Get the matched info in msg
Args:
msg (dict): the asr info, including the asr result and time stamp
Returns:
acs_result, asr_result: the acs result and the asr result
"""
asr_result = msg['result']
time_stamp = msg['times']
acs_result = []
# search for each word in self.word_list
offset = self.config.offset
max_ed = time_stamp[-1]['ed']
for w in self.word_list:
# search the w in asr_result and the index in asr_result
for m in re.finditer(w, asr_result):
start = max(time_stamp[m.start(0)]['bg'] - offset, 0)
end = min(time_stamp[m.end(0) - 1]['ed'] + offset, max_ed)
logger.info(f'start: {start}, end: {end}')
acs_result.append({'w': w, 'bg': start, 'ed': end})
return acs_result, asr_result
def run(self, audio_data):
"""process the audio data in acs engine
the engine does not store any data, so every request uses the self.run api
Args:
audio_data (bytes): the base64-decoded audio data
Returns:
acs_result, asr_result: the acs result and the asr result
"""
logger.info("start to process the audio content search")
msg = self.get_asr_content(io.BytesIO(audio_data))
acs_result, asr_result = self.get_macthed_word(msg)
logger.info(f'the asr result {asr_result}')
logger.info(f'the acs result: {acs_result}')
return acs_result, asr_result
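
The keyword search in `get_macthed_word` can be tried without a running server. The following is a minimal sketch, not the engine itself: `match_keywords` and the sample data are hypothetical, and it assumes that `times[i]` carries the `bg`/`ed` stamps of character `text[i]`, as the indexing in the diff implies. Unlike the diff, it escapes each word with `re.escape` so that it is matched literally.

```python
import re
from typing import Dict, List


def match_keywords(text: str,
                   times: List[Dict],
                   words: List[str],
                   offset: float = 0.0) -> List[Dict]:
    """Return one {'w', 'bg', 'ed'} entry per keyword occurrence."""
    if not times:
        return []
    max_ed = times[-1]['ed']
    results = []
    for word in words:
        # re.escape is an addition: the diff passes the word to re.finditer as-is.
        for m in re.finditer(re.escape(word), text):
            # Widen the span by `offset` and clamp it to [0, max_ed].
            start = max(times[m.start()]['bg'] - offset, 0)
            end = min(times[m.end() - 1]['ed'] + offset, max_ed)
            results.append({'w': word, 'bg': start, 'ed': end})
    return results


# Made-up sample data, one time stamp per character.
text = "你好飞桨"
times = [
    {'w': '你', 'bg': 0.0, 'ed': 0.25},
    {'w': '好', 'bg': 0.25, 'ed': 0.5},
    {'w': '飞', 'bg': 0.5, 'ed': 0.75},
    {'w': '桨', 'bg': 0.75, 'ed': 1.25},
]
hits = match_keywords(text, times, ["飞桨"], offset=0.25)
```

Here the match starts a quarter second before the first character of the keyword, and its end is clamped to the end of the audio.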

@ -13,6 +13,7 @@
# limitations under the License. # limitations under the License.
import copy import copy
import os import os
import sys
from typing import Optional from typing import Optional
import numpy as np import numpy as np
@ -588,7 +589,7 @@ class ASRServerExecutor(ASRExecutor):
self.pretrained_models = pretrained_models self.pretrained_models = pretrained_models
def _init_from_path(self, def _init_from_path(self,
model_type: str='deepspeech2online_aishell', model_type: str=None,
am_model: Optional[os.PathLike]=None, am_model: Optional[os.PathLike]=None,
am_params: Optional[os.PathLike]=None, am_params: Optional[os.PathLike]=None,
lang: str='zh', lang: str='zh',
@ -599,6 +600,12 @@ class ASRServerExecutor(ASRExecutor):
""" """
Init model and other resources from a specific path. Init model and other resources from a specific path.
""" """
if not model_type or not lang or not sample_rate:
logger.error(
"The model type or lang or sample rate is None, please input an valid server parameter yaml"
)
return False
self.model_type = model_type self.model_type = model_type
self.sample_rate = sample_rate self.sample_rate = sample_rate
sample_rate_str = '16k' if sample_rate == 16000 else '8k' sample_rate_str = '16k' if sample_rate == 16000 else '8k'
@ -730,6 +737,8 @@ class ASRServerExecutor(ASRExecutor):
# update the ctc decoding # update the ctc decoding
self.searcher = CTCPrefixBeamSearch(self.config.decode) self.searcher = CTCPrefixBeamSearch(self.config.decode)
self.transformer_decode_reset() self.transformer_decode_reset()
return True
def reset_decoder_and_chunk(self): def reset_decoder_and_chunk(self):
"""reset decoder and chunk state for an new audio """reset decoder and chunk state for an new audio
@ -1028,20 +1037,27 @@ class ASREngine(BaseEngine):
self.device = paddle.get_device() self.device = paddle.get_device()
logger.info(f"paddlespeech_server set the device: {self.device}") logger.info(f"paddlespeech_server set the device: {self.device}")
paddle.set_device(self.device) paddle.set_device(self.device)
except BaseException: except BaseException as e:
logger.error( logger.error(
"Set device failed, please check if device is already used and the parameter 'device' in the yaml file" f"Set device failed, please check if device '{self.device}' is already used and the parameter 'device' in the yaml file"
) )
logger.error(
self.executor._init_from_path( "If all GPU or XPU is used, you can set the server to 'cpu'")
model_type=self.config.model_type, sys.exit(-1)
am_model=self.config.am_model,
am_params=self.config.am_params, if not self.executor._init_from_path(
lang=self.config.lang, model_type=self.config.model_type,
sample_rate=self.config.sample_rate, am_model=self.config.am_model,
cfg_path=self.config.cfg_path, am_params=self.config.am_params,
decode_method=self.config.decode_method, lang=self.config.lang,
am_predictor_conf=self.config.am_predictor_conf) sample_rate=self.config.sample_rate,
cfg_path=self.config.cfg_path,
decode_method=self.config.decode_method,
am_predictor_conf=self.config.am_predictor_conf):
logger.error(
"Init the ASR server occurs error, please check the server configuration yaml"
)
return False
logger.info("Initialize ASR server engine successfully.") logger.info("Initialize ASR server engine successfully.")
return True return True

@ -78,21 +78,26 @@ class ASREngine(BaseEngine):
Args: Args:
audio_data (bytes): base64.b64decode audio_data (bytes): base64.b64decode
""" """
if self.executor._check( try:
io.BytesIO(audio_data), self.config.sample_rate, if self.executor._check(
self.config.force_yes): io.BytesIO(audio_data), self.config.sample_rate,
logger.info("start run asr engine") self.config.force_yes):
self.executor.preprocess(self.config.model, io.BytesIO(audio_data)) logger.info("start run asr engine")
st = time.time() self.executor.preprocess(self.config.model,
self.executor.infer(self.config.model) io.BytesIO(audio_data))
infer_time = time.time() - st st = time.time()
self.output = self.executor.postprocess() # Retrieve result of asr. self.executor.infer(self.config.model)
else: infer_time = time.time() - st
logger.info("file check failed!") self.output = self.executor.postprocess(
self.output = None ) # Retrieve result of asr.
else:
logger.info("inference time: {}".format(infer_time)) logger.info("file check failed!")
logger.info("asr engine type: python") self.output = None
logger.info("inference time: {}".format(infer_time))
logger.info("asr engine type: python")
except Exception as e:
logger.info(e)
def postprocess(self): def postprocess(self):
"""postprocess """postprocess

@ -52,5 +52,8 @@ class EngineFactory(object):
elif engine_name.lower() == 'vector' and engine_type.lower() == 'python': elif engine_name.lower() == 'vector' and engine_type.lower() == 'python':
from paddlespeech.server.engine.vector.python.vector_engine import VectorEngine from paddlespeech.server.engine.vector.python.vector_engine import VectorEngine
return VectorEngine() return VectorEngine()
elif engine_name.lower() == 'acs' and engine_type.lower() == 'python':
from paddlespeech.server.engine.acs.python.acs_engine import ACSEngine
return ACSEngine()
else: else:
return None return None

@ -34,6 +34,7 @@ def init_engine_pool(config) -> bool:
engine_type = engine_and_type.split("_")[1] engine_type = engine_and_type.split("_")[1]
ENGINE_POOL[engine] = EngineFactory.get_engine( ENGINE_POOL[engine] = EngineFactory.get_engine(
engine_name=engine, engine_type=engine_type) engine_name=engine, engine_type=engine_type)
if not ENGINE_POOL[engine].init(config=config[engine_and_type]): if not ENGINE_POOL[engine].init(config=config[engine_and_type]):
return False return False

@ -0,0 +1,101 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
from typing import Union
from fastapi import APIRouter
from paddlespeech.cli.log import logger
from paddlespeech.server.engine.engine_pool import get_engine_pool
from paddlespeech.server.restful.request import ASRRequest
from paddlespeech.server.restful.response import ACSResponse
from paddlespeech.server.restful.response import ErrorResponse
from paddlespeech.server.utils.errors import ErrorCode
from paddlespeech.server.utils.errors import failed_response
from paddlespeech.server.utils.exception import ServerBaseException
router = APIRouter()
@router.get('/paddlespeech/asr/search/help')
def help():
"""help
Returns:
json: the audio content search result
"""
response = {
"success": "True",
"code": 200,
"message": {
"global": "success"
},
"result": {
"description": "acs server",
"input": "base64 string of wavfile",
"output": {
"asr_result": "你好",
"acs_result": [{
'w': '',
'bg': 0.0,
'ed': 1.2
}]
}
}
}
return response
@router.post(
"/paddlespeech/asr/search",
response_model=Union[ACSResponse, ErrorResponse])
def acs(request_body: ASRRequest):
"""acs api
Args:
request_body (ASRRequest): the acs request, we reuse the http ASRRequest
Returns:
json: the acs result
"""
try:
# 1. get the audio data via base64 decoding
audio_data = base64.b64decode(request_body.audio)
# 2. get single engine from engine pool
engine_pool = get_engine_pool()
acs_engine = engine_pool['acs']
# 3. the acs engine stores no per-request data, so the shared instance processes the request directly
acs_result, asr_result = acs_engine.run(audio_data)
response = {
"success": True,
"code": 200,
"message": {
"description": "success"
},
"result": {
"transcription": asr_result,
"acs": acs_result
}
}
except ServerBaseException as e:
response = failed_response(e.error_code, e.msg)
except BaseException as e:
response = failed_response(ErrorCode.SERVER_UNKOWN_ERR)
logger.error(e)
return response
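
A client request for this endpoint can be assembled as follows. This is a minimal sketch under stated assumptions: `build_acs_request` is a hypothetical helper, and only the `audio` field is confirmed by the handler above; the other field names are assumed to match `ASRRequest` and the arguments of `ASRHttpHandler.run`.

```python
import base64
import json


def build_acs_request(wav_bytes: bytes,
                      server_ip: str = "127.0.0.1",
                      port: int = 8090,
                      audio_format: str = "wav",
                      sample_rate: int = 16000,
                      lang: str = "zh_cn"):
    """Return (url, json_body) for a POST to the search endpoint."""
    url = f"http://{server_ip}:{port}/paddlespeech/asr/search"
    body = {
        # The server decodes this field with base64.b64decode.
        "audio": base64.b64encode(wav_bytes).decode("utf8"),
        # The remaining field names are assumptions about ASRRequest.
        "audio_format": audio_format,
        "sample_rate": sample_rate,
        "lang": lang,
    }
    return url, json.dumps(body)


# Placeholder bytes stand in for the content of a real wav file.
url, payload = build_acs_request(b"RIFF....WAVEfmt ")
```

The pair can then be sent the same way `ASRHttpHandler.run` does it, with `requests.post(url=url, data=payload)`.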

@ -22,6 +22,7 @@ from paddlespeech.server.restful.cls_api import router as cls_router
from paddlespeech.server.restful.text_api import router as text_router from paddlespeech.server.restful.text_api import router as text_router
from paddlespeech.server.restful.tts_api import router as tts_router from paddlespeech.server.restful.tts_api import router as tts_router
from paddlespeech.server.restful.vector_api import router as vec_router from paddlespeech.server.restful.vector_api import router as vec_router
from paddlespeech.server.restful.acs_api import router as acs_router
_router = APIRouter() _router = APIRouter()
@ -45,6 +46,8 @@ def setup_router(api_list: List):
_router.include_router(text_router) _router.include_router(text_router)
elif api_name.lower() == 'vector': elif api_name.lower() == 'vector':
_router.include_router(vec_router) _router.include_router(vec_router)
elif api_name.lower() == 'acs':
_router.include_router(acs_router)
else: else:
logger.error( logger.error(
f"PaddleSpeech has not support such service: {api_name}") f"PaddleSpeech has not support such service: {api_name}")

@ -17,7 +17,7 @@ from pydantic import BaseModel
__all__ = [ __all__ = [
'ASRResponse', 'TTSResponse', 'CLSResponse', 'TextResponse', 'ASRResponse', 'TTSResponse', 'CLSResponse', 'TextResponse',
'VectorResponse', 'VectorScoreResponse' 'VectorResponse', 'VectorScoreResponse', 'ACSResponse'
] ]
@ -231,3 +231,32 @@ class ErrorResponse(BaseModel):
success: bool success: bool
code: int code: int
message: Message message: Message
#****************************************************************************************/
#************************************ ACS response **************************************/
#****************************************************************************************/
class AcsResult(BaseModel):
transcription: str
acs: list
class ACSResponse(BaseModel):
"""
response example
{
"success": true,
"code": 200,
"message": {
"description": "success"
},
"result": {
"transcription": "你好,飞桨",
"acs": [{"w": "你好", "bg": 0.0, "ed": 0.45}]
}
}
"""
success: bool
code: int
message: Message
result: AcsResult

@ -205,7 +205,7 @@ class ASRWsAudioHandler:
class ASRHttpHandler: class ASRHttpHandler:
def __init__(self, server_ip=None, port=None): def __init__(self, server_ip=None, port=None, endpoint="/paddlespeech/asr"):
"""The ASR client http request """The ASR client http request
Args: Args:
@ -219,7 +219,7 @@ class ASRHttpHandler:
self.url = None self.url = None
else: else:
self.url = 'http://' + self.server_ip + ":" + str( self.url = 'http://' + self.server_ip + ":" + str(
self.port) + '/paddlespeech/asr' self.port) + endpoint
logger.info(f"endpoint: {self.url}") logger.info(f"endpoint: {self.url}")
def run(self, input, audio_format, sample_rate, lang): def run(self, input, audio_format, sample_rate, lang):
@ -248,7 +248,7 @@ class ASRHttpHandler:
} }
res = requests.post(url=self.url, data=json.dumps(data)) res = requests.post(url=self.url, data=json.dumps(data))
return res.json() return res.json()

@ -18,9 +18,9 @@ from fastapi import WebSocket
from fastapi import WebSocketDisconnect from fastapi import WebSocketDisconnect
from starlette.websockets import WebSocketState as WebSocketState from starlette.websockets import WebSocketState as WebSocketState
from paddlespeech.cli.log import logger
from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.engine.engine_pool import get_engine_pool from paddlespeech.server.engine.engine_pool import get_engine_pool
router = APIRouter() router = APIRouter()
@ -106,5 +106,5 @@ async def websocket_endpoint(websocket: WebSocket):
# if the engine create the vad instance, this connection will have many period results # if the engine create the vad instance, this connection will have many period results
resp = {'result': asr_results} resp = {'result': asr_results}
await websocket.send_json(resp) await websocket.send_json(resp)
except WebSocketDisconnect: except WebSocketDisconnect as e:
pass logger.error(e)

@ -63,7 +63,8 @@ class ToneSandhi():
'扫把', '惦记' '扫把', '惦记'
} }
self.must_not_neural_tone_words = { self.must_not_neural_tone_words = {
"男子", "女子", "分子", "原子", "量子", "莲子", "石子", "瓜子", "电子", "人人", "虎虎" "男子", "女子", "分子", "原子", "量子", "莲子", "石子", "瓜子", "电子", "人人", "虎虎",
"幺幺"
} }
self.punc = ":,;。?!“”‘’':,;.?!" self.punc = ":,;。?!“”‘’':,;.?!"

@ -103,7 +103,7 @@ def replace_default_num(match):
str str
""" """
number = match.group(0) number = match.group(0)
return verbalize_digit(number) return verbalize_digit(number, alt_one=True)
# number expressions # number expressions

@ -1,5 +1,6 @@
#!/bin/bash #!/bin/bash
source path.sh
stage=-1 stage=-1
stop_stage=100 stop_stage=100
MAIN_ROOT=../../.. MAIN_ROOT=../../..
@ -23,5 +24,5 @@ if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
fi fi
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
cat data/manifest.test | paddlespeech asr --model conformer_online_aishell --rtf -v cat data/manifest.test | paddlespeech asr --model conformer_online_aishell --device gpu --decode_method ctc_prefix_beam_search --rtf -v
fi fi

@ -0,0 +1,11 @@
export MAIN_ROOT=`realpath ${PWD}/../../../`
export PATH=${MAIN_ROOT}:${MAIN_ROOT}/utils:${PATH}
export LC_ALL=C
export PYTHONDONTWRITEBYTECODE=1
# Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
export PYTHONIOENCODING=UTF-8
export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/

@ -3,7 +3,7 @@
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 127.0.0.1 host: 0.0.0.0
port: 8090 port: 8090
# The task format in the engin_list is: <speech task>_<engine type> # The task format in the engin_list is: <speech task>_<engine type>

@ -3,7 +3,7 @@
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 127.0.0.1 host: 0.0.0.0
port: 8092 port: 8092
# The task format in the engin_list is: <speech task>_<engine type> # The task format in the engin_list is: <speech task>_<engine type>

@ -3,7 +3,7 @@
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 127.0.0.1 host: 0.0.0.0
port: 8092 port: 8092
# The task format in the engin_list is: <speech task>_<engine type> # The task format in the engin_list is: <speech task>_<engine type>
