Merge pull request #1813 from Honei/v0.3

[R1.0]update the paddlespeech_client asr_online cli
pull/1823/head
Hui Zhang 3 years ago committed by GitHub
commit cdb9a1b20b
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -31,7 +31,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - Command Line (Recommended)
 ```bash
-# start the service
+# in PaddleSpeech/demos/streaming_asr_server start the service
 paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
 ```
@@ -111,6 +111,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - Python API
 ```python
+# in PaddleSpeech/demos/streaming_asr_server directory
 from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
 server_executor = ServerExecutor()
@@ -186,10 +187,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ### 4. ASR Client Usage
 **Note:** The response time will be slightly longer when using the client for the first time
 - Command Line (Recommended)
 ```
-paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --protocol websocket
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
 ```
 Usage:
@@ -204,6 +206,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - `sample_rate`: Audio sampling rate, default: 16000.
 - `lang`: Language. Default: "zh_cn".
 - `audio_format`: Audio format. Default: "wav".
+- `punc.server_ip`: punctuation server ip. Default: None.
+- `punc.server_port`: punctuation server port. Default: None.
 Output:
 ```bash
@@ -275,18 +279,16 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - Python API
 ```python
-from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
+from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
-import json
-asrclient_executor = ASRClientExecutor()
+asrclient_executor = ASROnlineClientExecutor()
 res = asrclient_executor(
     input="./zh.wav",
     server_ip="127.0.0.1",
     port=8090,
     sample_rate=16000,
     lang="zh_cn",
-    audio_format="wav",
-    protocol="websocket")
+    audio_format="wav")
 print(res)
 ```
@@ -353,5 +355,4 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
 [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
 [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
-[2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
 ```
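With the trailing plain-text log line removed, the transcript is only available inside the `final receive msg=` entry. The sketch below shows one way to pull it out of a captured log line. It assumes the payload is printed as a Python dict literal, as in the sample output above; `parse_final_msg` is a hypothetical helper and not part of PaddleSpeech.

```python
import ast


def parse_final_msg(log_line: str) -> dict:
    """Extract the dict payload from a 'final receive msg=' log line.

    Assumption: the payload is a Python dict literal, as in the sample log.
    """
    marker = "final receive msg="
    _, found, payload = log_line.partition(marker)
    if not found:
        raise ValueError("not a final-message log line")
    # literal_eval only accepts literals, so no code in the log is executed.
    return ast.literal_eval(payload.strip())


sample = ("[2022-04-21 15:59:12,883] [ INFO] - final receive msg="
          "{'status': 'ok', 'signal': 'finished', "
          "'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}")
msg = parse_final_msg(sample)
print(msg["asr_results"])
```

Intermediate `receive msg=` lines do not contain the marker, so the helper rejects them with `ValueError` rather than returning a partial result.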

@@ -5,19 +5,26 @@
 ## Introduction
 This demo shows how to start a streaming speech service and access it. It can be done with a single `paddlespeech_server` or `paddlespeech_client` command, or with a few lines of Python.
-The streaming ASR service supports only the `websocket` protocol, not the `http` protocol.
+**The streaming ASR service supports only the `websocket` protocol, not the `http` protocol.**
 ## Usage
 ### 1. Installation
-See the [installation document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+For the detailed steps to install PaddleSpeech, see the [installation document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
 **paddlepaddle 2.2.1** or above is recommended.
-You can install PaddleSpeech using one of three methods: medium, hard.
+You can install PaddleSpeech using one of two methods: medium, hard.
 ### 2. Prepare the config file
-For the config files, see `conf/ws_application.yaml` and `conf/ws_conformer_application.yaml`.
-The models currently integrated in the service are DeepSpeech2 and conformer.
+The startup and test scripts for the streaming ASR service are stored in the `PaddleSpeech/demos/streaming_asr_server` directory.
+After downloading `PaddleSpeech`, change into the `PaddleSpeech/demos/streaming_asr_server` directory.
+For the config files, see `conf/ws_application.yaml` and `conf/ws_conformer_application.yaml` in that directory.
+The models currently integrated in the service are DeepSpeech2 and conformer; their config files are:
+* DeepSpeech: `conf/ws_application.yaml`
+* conformer: `conf/ws_conformer_application.yaml`
 The input of this ASR client should be a WAV file (`.wav`), and its sample rate must match the model's sample rate.
@@ -31,7 +38,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - Command line (recommended)
 ```bash
-# Start the service
+# Start the service from the PaddleSpeech/demos/streaming_asr_server directory
 paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
 ```
@@ -111,6 +118,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - Python API
 ```python
+# In the PaddleSpeech/demos/streaming_asr_server directory
 from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
 server_executor = ServerExecutor()
@@ -185,11 +193,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ```
 ### 4. ASR client usage
 **Note:** The response time will be slightly longer the first time the client is used
 - Command line (recommended)
 ```
-paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --protocol websocket
+paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
 ```
 Usage help:
@@ -205,6 +213,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - `sample_rate`: Audio sampling rate. Default: 16000.
 - `lang`: Model language. Default: zh_cn.
 - `audio_format`: Audio format. Default: wav.
+- `punc.server_ip`: IP of the punctuation prediction service. Default: None.
+- `punc.server_port`: Port of the punctuation prediction service. Default: None.
 Output:
@@ -276,18 +286,16 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 - Python API
 ```python
-from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
+from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
-import json
-asrclient_executor = ASRClientExecutor()
+asrclient_executor = ASROnlineClientExecutor()
 res = asrclient_executor(
     input="./zh.wav",
     server_ip="127.0.0.1",
     port=8090,
     sample_rate=16000,
     lang="zh_cn",
-    audio_format="wav",
-    protocol="websocket")
+    audio_format="wav")
 print(res)
 ```
@@ -354,5 +362,4 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 [2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
 [2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
 [2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
-[2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
 ```

@@ -146,6 +146,6 @@ tar -xvf sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz
 source path.sh
 # If you have already processed the data and have the manifest file, you can skip the following 2 steps
-CUDA_VISIBLE_DEVICES= ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_2 conf/ecapa_tdnn.yaml
+CUDA_VISIBLE_DEVICES= bash ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_2/model/ conf/ecapa_tdnn.yaml
 ```
 The performance of the released models is shown in [this](./RESULTS.md)

@@ -33,10 +33,26 @@ dir=$1
 exp_dir=$2
 conf_path=$3
+# get the gpu nums for training
+ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
+echo "using $ngpu gpus..."
+# setting training device
+device="cpu"
+if ${use_gpu}; then
+    device="gpu"
+fi
+if [ $ngpu -le 0 ]; then
+    echo "no gpu, training in cpu mode"
+    device='cpu'
+    use_gpu=false
+fi
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
     # test the model and compute the eer metrics
     python3 ${BIN_DIR}/test.py \
         --data-dir ${dir} \
         --load-checkpoint ${exp_dir} \
-        --config ${conf_path}
+        --config ${conf_path} \
+        --device ${device}
 fi
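The device selection added to the script can be restated as a small Python sketch, which makes the edge case explicit: an empty `CUDA_VISIBLE_DEVICES` (as in the README command `CUDA_VISIBLE_DEVICES= bash ./local/test.sh ...`) counts as zero GPUs and forces CPU mode. The function names here are hypothetical and exist only for illustration; the script itself does this in shell with `awk`.

```python
def count_visible_gpus(cuda_visible_devices: str) -> int:
    """Count comma-separated device entries; an empty value means zero.

    Mirrors `awk -F "," '{print NF}'`, which prints 0 for an empty line.
    Note that Python's "".split(",") returns [""], so the empty case
    needs an explicit check.
    """
    value = cuda_visible_devices.strip()
    if not value:
        return 0
    return len(value.split(","))


def select_device(cuda_visible_devices: str, use_gpu: bool = True) -> str:
    """Return "gpu" only when GPU use is requested and a device is visible."""
    ngpu = count_visible_gpus(cuda_visible_devices)
    if use_gpu and ngpu > 0:
        return "gpu"
    return "cpu"


print(select_device(""))     # empty value, as in the README command
print(select_device("0,1"))  # two visible devices
```

The result is what the script passes to `test.py` through the new `--device` flag.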

@@ -35,7 +35,7 @@ from paddlespeech.server.utils.util import wav2base64
 __all__ = [
     'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor',
-    'CLSClientExecutor'
+    'ASROnlineClientExecutor', 'CLSClientExecutor'
 ]
@@ -370,6 +370,8 @@ class ASRClientExecutor(BaseExecutor):
             str: The ASR results
         """
         # we use the asr server to recognize the audio text content
+        # and paddlespeech_client asr only support http protocol
+        protocol = "http"
         if protocol.lower() == "http":
             from paddlespeech.server.utils.audio_handler import ASRHttpHandler
             logger.info("asr http client start")
@@ -377,18 +379,6 @@ class ASRClientExecutor(BaseExecutor):
             res = handler.run(input, audio_format, sample_rate, lang)
             res = res['result']['transcription']
             logger.info("asr http client finished")
-        elif protocol.lower() == "websocket":
-            logger.info("asr websocket client start")
-            handler = ASRWsAudioHandler(
-                server_ip,
-                port,
-                punc_server_ip=punc_server_ip,
-                punc_server_port=punc_server_port)
-            loop = asyncio.get_event_loop()
-            res = loop.run_until_complete(handler.run(input))
-            res = res['result']
-            logger.info("asr websocket client finished")
         else:
             logger.error(f"Sorry, we have not support protocol: {protocol},"
                          "please use http or websocket protocol")
@@ -397,6 +387,77 @@ class ASRClientExecutor(BaseExecutor):
         return res
+@cli_client_register(
+    name='paddlespeech_client.asr_online',
+    description='visit asr online service')
+class ASROnlineClientExecutor(BaseExecutor):
+    def __init__(self):
+        super(ASROnlineClientExecutor, self).__init__()
+        self.parser = argparse.ArgumentParser(
+            prog='paddlespeech_client.asr_online', add_help=True)
+        self.parser.add_argument(
+            '--server_ip', type=str, default='127.0.0.1', help='server ip')
+        self.parser.add_argument(
+            '--port', type=int, default=8091, help='server port')
+        self.parser.add_argument(
+            '--input',
+            type=str,
+            default=None,
+            help='Audio file to be recognized',
+            required=True)
+        self.parser.add_argument(
+            '--sample_rate', type=int, default=16000, help='audio sample rate')
+        self.parser.add_argument(
+            '--lang', type=str, default="zh_cn", help='language')
+        self.parser.add_argument(
+            '--audio_format', type=str, default="wav", help='audio format')
+    def execute(self, argv: List[str]) -> bool:
+        args = self.parser.parse_args(argv)
+        input_ = args.input
+        server_ip = args.server_ip
+        port = args.port
+        sample_rate = args.sample_rate
+        lang = args.lang
+        audio_format = args.audio_format
+        try:
+            time_start = time.time()
+            res = self(
+                input=input_,
+                server_ip=server_ip,
+                port=port,
+                sample_rate=sample_rate,
+                lang=lang,
+                audio_format=audio_format)
+            time_end = time.time()
+            logger.info(res)
+            logger.info("Response time %f s." % (time_end - time_start))
+            return True
+        except Exception as e:
+            logger.error("Failed to speech recognition.")
+            logger.error(e)
+            return False
+    @stats_wrapper
+    def __call__(self,
+                 input: str,
+                 server_ip: str="127.0.0.1",
+                 port: int=8091,
+                 sample_rate: int=16000,
+                 lang: str="zh_cn",
+                 audio_format: str="wav"):
+        """
+        Python API to call an executor.
+        """
+        logger.info("asr websocket client start")
+        handler = ASRWsAudioHandler(server_ip, port)
+        loop = asyncio.get_event_loop()
+        res = loop.run_until_complete(handler.run(input))
+        logger.info("asr websocket client finished")
+        return res['result']
 @cli_client_register(
     name='paddlespeech_client.cls', description='visit cls service')
 class CLSClientExecutor(BaseExecutor):
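The shape of the new executor (parse flags, then drive an async handler to completion from synchronous code) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PaddleSpeech implementation: `StubWsHandler` is a hypothetical stand-in for `ASRWsAudioHandler` that opens no connection, `build_parser` and `recognize` are illustrative names, and `asyncio.run` is used in place of the diff's `get_event_loop()` / `run_until_complete` pair.

```python
import argparse
import asyncio
from typing import List


class StubWsHandler:
    """Hypothetical stand-in for ASRWsAudioHandler; it opens no connection."""

    def __init__(self, server_ip: str, port: int):
        self.server_ip = server_ip
        self.port = port

    async def run(self, wav_path: str) -> dict:
        # A real handler would stream audio chunks over a websocket here.
        await asyncio.sleep(0)
        return {"result": f"{wav_path} -> {self.server_ip}:{self.port}"}


def build_parser() -> argparse.ArgumentParser:
    # A subset of the flags in the diff, with the same names and defaults.
    parser = argparse.ArgumentParser(prog="asr_online_sketch")
    parser.add_argument("--server_ip", type=str, default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8091)
    parser.add_argument("--input", type=str, required=True)
    return parser


def recognize(argv: List[str]) -> str:
    """Parse CLI-style arguments, run the coroutine, and unwrap 'result'."""
    args = build_parser().parse_args(argv)
    handler = StubWsHandler(args.server_ip, args.port)
    # asyncio.run creates a fresh event loop for this call and closes it after.
    res = asyncio.run(handler.run(args.input))
    return res["result"]


print(recognize(["--input", "./zh.wav", "--port", "8090"]))
```

Note that the parser's default port is 8091 while the README examples connect to 8090, so the port is passed explicitly here, as it is in the README command.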
