@ -0,0 +1,246 @@
([简体中文](./README_cn.md)|English)

# Speech Server

## Introduction

This demo shows how to start the speech service and how to access it. Each step takes a single command with `paddlespeech_server` and `paddlespeech_client`, or a few lines of Python.

## Usage

### 1. Installation

See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

It is recommended to use **paddlepaddle 2.2.1** or above.

You can install paddlespeech with either the medium or the hard method described there.

### 2. Prepare the Config File

The configuration file can be found in `conf/application.yaml`.

In it, `engine_list` lists the speech engines to include in the service being started. Each entry has the format `<speech task>_<engine type>`.

At present, the speech tasks integrated by the service are asr (speech recognition), tts (text to speech) and cls (audio classification).

Currently, two engine types are supported: python and inference (Paddle Inference).
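
As a rough illustration of the `<speech task>_<engine type>` format, the sketch below splits each entry at its first underscore. `parse_engine` is a hypothetical helper, not part of PaddleSpeech, and it assumes task names themselves contain no underscore.

```python
from typing import List, Tuple


def parse_engine(entry: str) -> Tuple[str, str]:
    """Split '<speech task>_<engine type>' at the first underscore.

    Hypothetical helper for illustration only; assumes the task name
    contains no underscore.
    """
    task, sep, engine_type = entry.partition("_")
    if not sep or not task or not engine_type:
        raise ValueError(
            f"expected '<speech task>_<engine type>', got {entry!r}")
    return task, engine_type


# Example engine_list using the tasks and engine types named above.
engine_list: List[str] = ["asr_python", "tts_inference", "cls_python"]
for entry in engine_list:
    task, engine_type = parse_engine(entry)
    print(f"{entry}: task={task}, engine type={engine_type}")
```

An entry without an underscore, such as `asr`, raises `ValueError` in this sketch, so a malformed list fails early.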

The input of the ASR client demo should be a WAV file (`.wav`), and its sample rate must match the sample rate of the model.
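
One way to check a file before sending it is to read the sample rate from the WAV header with Python's standard `wave` module. This is only a sketch: `wav_sample_rate` and `check_sample_rate` are hypothetical helpers, and the 16000 Hz default is an assumption taken from the client's `sample_rate` default, not from any particular model.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path: str, expected_rate: int = 16000) -> None:
    """Raise ValueError if the file's sample rate differs from expected_rate.

    expected_rate=16000 is an assumed default; use your model's actual rate.
    """
    actual_rate = wav_sample_rate(path)
    if actual_rate != expected_rate:
        raise ValueError(
            f"{path} is sampled at {actual_rate} Hz, expected {expected_rate} Hz")
```

For example, `check_sample_rate("./zh.wav")` returns silently when the file is sampled at 16 kHz and raises otherwise.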

Sample files for this ASR client demo can be downloaded with:

```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```

### 3. Server Usage

- Command Line (Recommended)

```bash
# start the service
paddlespeech_server start --config_file ./conf/application.yaml
```

Usage:

```bash
paddlespeech_server start --help
```

Arguments:

- `config_file`: YAML file of the app. Default: ./conf/application.yaml
- `log_file`: Log file. Default: ./log/paddlespeech.log

Output:

```bash
[2022-02-23 11:17:32] [INFO] [server.py:64] Started server process [6384]
INFO: Waiting for application startup.
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```

- Python API

```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor

server_executor = ServerExecutor()
server_executor(
    config_file="./conf/application.yaml",
    log_file="./log/paddlespeech.log")
```

Output:

```bash
INFO: Started server process [529]
[2022-02-23 14:57:56] [INFO] [server.py:64] Started server process [529]
INFO: Waiting for application startup.
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```

### 4. ASR Client Usage

**Note:** The response time will be slightly longer when using the client for the first time.

- Command Line (Recommended)

```bash
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```

Usage:

```bash
paddlespeech_client asr --help
```

Arguments:

- `server_ip`: Server IP. Default: 127.0.0.1
- `port`: Server port. Default: 8090
- `input` (required): Audio file to be recognized.
- `sample_rate`: Audio sampling rate. Default: 16000
- `lang`: Language. Default: "zh_cn"
- `audio_format`: Audio format. Default: "wav"

Output:

```bash
[2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
[2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
```

- Python API

```python
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
import json

asrclient_executor = ASRClientExecutor()
res = asrclient_executor(
    input="./zh.wav",
    server_ip="127.0.0.1",
    port=8090,
    sample_rate=16000,
    lang="zh_cn",
    audio_format="wav")
print(res.json())
```

Output:

```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
```
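
To use the recognized text rather than print the whole response, the dictionary can be unpacked as in the sketch below. `get_transcription` is a hypothetical helper. It assumes the response always has the shape of the sample output above (`success`, `message`, `result.transcription`), which this README does not guarantee for error cases.

```python
def get_transcription(response: dict) -> str:
    """Return the transcription from an ASR response dict.

    Assumes the response layout shown in the sample output above.
    """
    if not response.get("success"):
        raise RuntimeError(f"ASR request failed: {response.get('message')}")
    return response["result"]["transcription"]


# Sample response copied from the output above.
sample_response = {
    'success': True,
    'code': 200,
    'message': {'description': 'success'},
    'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'},
}
print(get_transcription(sample_response))
```

With the Python API example above, the same helper could be called as `get_transcription(res.json())`.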

### 5. TTS Client Usage

**Note:** The response time will be slightly longer when using the client for the first time.

- Command Line (Recommended)

```bash
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```

Usage:

```bash
paddlespeech_client tts --help
```

Arguments:

- `server_ip`: Server IP. Default: 127.0.0.1
- `port`: Server port. Default: 8090
- `input` (required): Input text to synthesize.
- `spk_id`: Speaker ID for multi-speaker text to speech. Default: 0
- `speed`: Audio speed. The value should be between 0 and 3. Default: 1.0
- `volume`: Audio volume. The value should be between 0 and 3. Default: 1.0
- `sample_rate`: Sampling rate, one of [0, 8000, 16000]. Default: 0, which uses the model's own sampling rate.
- `output`: Output wave file path. Default: None, which means the audio is not saved locally.
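
The ranges listed above can be checked on the client side before a request is sent. The sketch below is a hypothetical validator, not part of `paddlespeech_client`. It assumes that "between 0 and 3" excludes 0 and includes 3, which the README does not state.

```python
ALLOWED_SAMPLE_RATES = (0, 8000, 16000)  # 0 means "use the model's own rate"


def validate_tts_args(speed: float = 1.0,
                      volume: float = 1.0,
                      sample_rate: int = 0) -> None:
    """Raise ValueError if an argument falls outside the documented range.

    Assumption: the valid interval for speed and volume is (0, 3].
    """
    if not 0 < speed <= 3:
        raise ValueError(f"speed must be in (0, 3], got {speed}")
    if not 0 < volume <= 3:
        raise ValueError(f"volume must be in (0, 3], got {volume}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(
            f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}, got {sample_rate}")


# The documented defaults pass without raising.
validate_tts_args(speed=1.0, volume=1.0, sample_rate=0)
```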

Output:

```bash
[2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
[2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
[2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
[2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.
```

- Python API

```python
from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
import json

ttsclient_executor = TTSClientExecutor()
res = ttsclient_executor(
    input="您好,欢迎使用百度飞桨语音合成服务。",
    server_ip="127.0.0.1",
    port=8090,
    spk_id=0,
    speed=1.0,
    volume=1.0,
    sample_rate=0,
    output="./output.wav")

response_dict = res.json()
print(response_dict["message"])
print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
print("Audio duration: %f s." % (response_dict['result']['duration']))
```

Output:

```bash
{'description': 'success.'}
Save synthesized audio successfully on ./output.wav.
Audio duration: 3.612500 s.
```

### 6. CLS Client Usage

**Note:** The response time will be slightly longer when using the client for the first time.

- Command Line (Recommended)

```bash
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```

Usage:

```bash
paddlespeech_client cls --help
```

Arguments:

- `server_ip`: Server IP. Default: 127.0.0.1
- `port`: Server port. Default: 8090
- `input` (required): Audio file to be classified.
- `topk`: Number of top-scoring classes to return in the classification result.

Output:

```bash
[2022-03-09 20:44:39,974] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
[2022-03-09 20:44:39,975] [ INFO] - Response time 0.104360 s.
```

- Python API

```python
from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
import json

clsclient_executor = CLSClientExecutor()
res = clsclient_executor(
    input="./zh.wav",
    server_ip="127.0.0.1",
    port=8090,
    topk=1)
print(res.json())
```

Output:

```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
```
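
When `topk` is larger than 1, the `results` list holds several candidates. The sketch below picks the one with the highest probability. `best_class` is a hypothetical helper that assumes the response layout shown in the sample output above.

```python
from typing import Tuple


def best_class(response: dict) -> Tuple[str, float]:
    """Return (class_name, prob) of the highest-probability result.

    Assumes the response layout shown in the sample output above.
    """
    results = response["result"]["results"]
    top = max(results, key=lambda item: item["prob"])
    return top["class_name"], top["prob"]


# Sample response copied from the output above.
sample_response = {
    'success': True,
    'code': 200,
    'message': {'description': 'success'},
    'result': {
        'topk': 1,
        'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}],
    },
}
name, prob = best_class(sample_response)
print(f"{name}: {prob:.3f}")
```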

## Models supported by the service

### ASR model

Get all models supported by the ASR service via `paddlespeech_server stats --task asr`, where static models can be used for Paddle Inference.

### TTS model

Get all models supported by the TTS service via `paddlespeech_server stats --task tts`, where static models can be used for Paddle Inference.

### CLS model

Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for Paddle Inference.
@ -0,0 +1,47 @@
# This is the parameter configuration file for PaddleSpeech Serving.

#################################################################################
#                             SERVER SETTING                                    #
#################################################################################
host: 0.0.0.0
port: 8090

# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online', 'tts_online']
# protocol = ['websocket', 'http'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']


#################################################################################
#                                ENGINE CONFIG                                  #
#################################################################################

################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
    model_type: 'deepspeech2online_aishell'
    am_model:  # the pdmodel file of am static model [optional]
    am_params:  # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path:
    decode_method:
    force_yes: True

    am_predictor_conf:
        device:  # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False  # True -> print glog
        summary: True  # False -> do not show predictor config

    chunk_buffer_conf:
        frame_duration_ms: 80
        shift_ms: 40
        sample_rate: 16000
        sample_width: 2
        window_n: 7  # frame
        shift_n: 4  # frame
        window_ms: 20  # ms
        shift_ms: 10  # ms; repeats the shift_ms key above, and PyYAML keeps this later value
@ -0,0 +1,45 @@
# This is the parameter configuration file for PaddleSpeech Serving.

#################################################################################
#                             SERVER SETTING                                    #
#################################################################################
host: 0.0.0.0
port: 8090

# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online', 'tts_online']
# protocol = ['websocket', 'http'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']


#################################################################################
#                                ENGINE CONFIG                                  #
#################################################################################

################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
    model_type: 'conformer_online_multicn'
    am_model:  # the pdmodel file of am static model [optional]
    am_params:  # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path:
    decode_method:
    force_yes: True
    device: 'cpu'  # cpu or gpu:id
    am_predictor_conf:
        device:  # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False  # True -> print glog
        summary: True  # False -> do not show predictor config

    chunk_buffer_conf:
        window_n: 7  # frame
        shift_n: 4  # frame
        window_ms: 25  # ms
        shift_ms: 10  # ms
        sample_rate: 16000
        sample_width: 2
@ -0,0 +1,2 @@
# start the streaming asr service
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
@ -0,0 +1,5 @@
# download the test wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav

# read the wav and pass it to the service
python3 websocket_client.py --wavfile ./zh.wav
@ -0,0 +1,62 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import argparse
import asyncio
import codecs
import os

from paddlespeech.cli.log import logger
from paddlespeech.server.utils.audio_handler import ASRAudioHandler


def main(args):
    logger.info("asr websocket client start")
    handler = ASRAudioHandler("127.0.0.1", 8090)
    loop = asyncio.get_event_loop()

    # process a single audio file
    if args.wavfile and os.path.exists(args.wavfile):
        logger.info(f"start to process the wavfile: {args.wavfile}")
        result = loop.run_until_complete(handler.run(args.wavfile))
        result = result["asr_results"]
        logger.info(f"asr websocket client finished : {result}")

    # process a batch of audio files listed in wav.scp
    if args.wavscp and os.path.exists(args.wavscp):
        logger.info(f"start to process the wavscp: {args.wavscp}")
        with codecs.open(args.wavscp, 'r', encoding='utf-8') as f,\
             codecs.open("result.txt", 'w', encoding='utf-8') as w:
            for line in f:
                # each line holds "<utterance name> <wav path>"
                utt_name, utt_path = line.strip().split()
                result = loop.run_until_complete(handler.run(utt_path))
                result = result["asr_results"]
                w.write(f"{utt_name} {result}\n")


if __name__ == "__main__":
    logger.info("Start to do streaming asr client")
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--wavfile",
        action="store",
        help="wav file path ",
        default="./16_audio.wav")
    parser.add_argument(
        "--wavscp", type=str, default=None, help="The batch audios dict text")
    args = parser.parse_args()

    main(args)
@ -1,13 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@ -1,161 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
record wave from the mic
"""
import asyncio
import json
import logging
import threading
import wave
from signal import SIGINT
from signal import SIGTERM

import pyaudio
import websockets


class ASRAudioHandler(threading.Thread):
    def __init__(self, url="127.0.0.1", port=8091):
        threading.Thread.__init__(self)
        self.url = url
        self.port = port
        self.url = "ws://" + self.url + ":" + str(self.port) + "/ws/asr"
        self.fileName = "./output.wav"
        self.chunk = 5120
        self.format = pyaudio.paInt16
        self.channels = 1
        self.rate = 16000
        self._running = True
        self._frames = []
        self.data_backup = []

    def startrecord(self):
        """
        start a new thread to record wave
        """
        threading._start_new_thread(self.recording, ())

    def recording(self):
        """
        recording wave
        """
        self._running = True
        self._frames = []
        p = pyaudio.PyAudio()
        stream = p.open(
            format=self.format,
            channels=self.channels,
            rate=self.rate,
            input=True,
            frames_per_buffer=self.chunk)
        while (self._running):
            data = stream.read(self.chunk)
            self._frames.append(data)
            self.data_backup.append(data)

        stream.stop_stream()
        stream.close()
        p.terminate()

    def save(self):
        """
        save wave data
        """
        p = pyaudio.PyAudio()
        wf = wave.open(self.fileName, 'wb')
        wf.setnchannels(self.channels)
        wf.setsampwidth(p.get_sample_size(self.format))
        wf.setframerate(self.rate)
        wf.writeframes(b''.join(self.data_backup))
        wf.close()
        p.terminate()

    def stoprecord(self):
        """
        stop recording
        """
        self._running = False

    async def run(self):
        aa = input("Start recording? (y/n)")
        if aa.strip() == "y":
            self.startrecord()
            logging.info("*" * 10 + "Recording started, please speak")

            async with websockets.connect(self.url) as ws:
                # send the start command
                audio_info = json.dumps(
                    {
                        "name": "test.wav",
                        "signal": "start",
                        "nbest": 5
                    },
                    sort_keys=True,
                    indent=4,
                    separators=(',', ': '))
                await ws.send(audio_info)
                msg = await ws.recv()
                logging.info("receive msg={}".format(msg))

                # send bytes data
                logging.info(
                    "Press Ctrl + C to stop recording. Press Enter to continue.")
                try:
                    while True:
                        while len(self._frames) > 0:
                            await ws.send(self._frames.pop(0))
                            msg = await ws.recv()
                            logging.info("receive msg={}".format(msg))
                except asyncio.CancelledError:
                    # quit
                    # send finished
                    audio_info = json.dumps(
                        {
                            "name": "test.wav",
                            "signal": "end",
                            "nbest": 5
                        },
                        sort_keys=True,
                        indent=4,
                        separators=(',', ': '))
                    await ws.send(audio_info)
                    msg = await ws.recv()
                    logging.info("receive msg={}".format(msg))

                    self.stoprecord()
                    logging.info("*" * 10 + "Recording finished")
                    self.save()
        elif aa.strip() == "n":
            exit()
        else:
            print("Invalid input!")
            exit()


if __name__ == "__main__":

    logging.basicConfig(level=logging.INFO)
    logging.info("asr websocket client start")

    handler = ASRAudioHandler("127.0.0.1", 8091)
    loop = asyncio.get_event_loop()
    main_task = asyncio.ensure_future(handler.run())
    for signal in [SIGINT, SIGTERM]:
        loop.add_signal_handler(signal, main_task.cancel)
    try:
        loop.run_until_complete(main_task)
    finally:
        loop.close()

    logging.info("asr websocket client finished")
@ -1,139 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import argparse
import asyncio
import codecs
import json
import logging
import os

import numpy as np
import soundfile
import websockets


class ASRAudioHandler:
    def __init__(self, url="127.0.0.1", port=8090):
        self.url = url
        self.port = port
        self.url = "ws://" + self.url + ":" + str(self.port) + "/ws/asr"

    def read_wave(self, wavfile_path: str):
        samples, sample_rate = soundfile.read(wavfile_path, dtype='int16')
        x_len = len(samples)

        # 85 * 16 = 1360 samples, i.e. 85 ms of audio at sample_rate = 16 kHz
        chunk_size = 85 * 16
        if x_len % chunk_size != 0:
            padding_len_x = chunk_size - x_len % chunk_size
        else:
            padding_len_x = 0

        # zero-pad the audio so that it splits into whole chunks
        padding = np.zeros((padding_len_x), dtype=samples.dtype)
        padded_x = np.concatenate([samples, padding], axis=0)

        assert (x_len + padding_len_x) % chunk_size == 0
        num_chunk = (x_len + padding_len_x) / chunk_size
        num_chunk = int(num_chunk)
        for i in range(0, num_chunk):
            start = i * chunk_size
            end = start + chunk_size
            x_chunk = padded_x[start:end]
            yield x_chunk

    async def run(self, wavfile_path: str):
        logging.info("send a message to the server")
        # self.read_wave()
        # send websocket handshake protocol
        async with websockets.connect(self.url) as ws:
            # server has already received handshake protocol
            # client start to send the command
            audio_info = json.dumps(
                {
                    "name": "test.wav",
                    "signal": "start",
                    "nbest": 5
                },
                sort_keys=True,
                indent=4,
                separators=(',', ': '))
            await ws.send(audio_info)
            msg = await ws.recv()
            logging.info("receive msg={}".format(msg))

            # send chunk audio data to engine
            for chunk_data in self.read_wave(wavfile_path):
                await ws.send(chunk_data.tobytes())
                msg = await ws.recv()
                msg = json.loads(msg)
                logging.info("receive msg={}".format(msg))

            # finished
            audio_info = json.dumps(
                {
                    "name": "test.wav",
                    "signal": "end",
                    "nbest": 5
                },
                sort_keys=True,
                indent=4,
                separators=(',', ': '))
            await ws.send(audio_info)
            msg = await ws.recv()

            # parse the JSON reply into a dict
            msg = json.loads(msg)
            logging.info("final receive msg={}".format(msg))
            result = msg
            return result


def main(args):
    logging.basicConfig(level=logging.INFO)
    logging.info("asr websocket client start")
    handler = ASRAudioHandler("127.0.0.1", 8090)
    loop = asyncio.get_event_loop()

    # process a single audio file
    if args.wavfile and os.path.exists(args.wavfile):
        logging.info(f"start to process the wavfile: {args.wavfile}")
        result = loop.run_until_complete(handler.run(args.wavfile))
        result = result["asr_results"]
        logging.info(f"asr websocket client finished : {result}")

    # process a batch of audio files listed in wav.scp
    if args.wavscp and os.path.exists(args.wavscp):
        logging.info(f"start to process the wavscp: {args.wavscp}")
        with codecs.open(args.wavscp, 'r', encoding='utf-8') as f,\
             codecs.open("result.txt", 'w', encoding='utf-8') as w:
            for line in f:
                utt_name, utt_path = line.strip().split()
                result = loop.run_until_complete(handler.run(utt_path))
                result = result["asr_results"]
                w.write(f"{utt_name} {result}\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--wavfile",
        action="store",
        help="wav file path ",
        default="./16_audio.wav")
    parser.add_argument(
        "--wavscp", type=str, default=None, help="The batch audios dict text")
    args = parser.parse_args()

    main(args)