Merge pull request #1771 from lym0302/add_streaming_cli

[server] add streaming tts demos
pull/1776/head
TianYuan 2 years ago committed by GitHub
commit f256bb9c0e

@ -0,0 +1,163 @@
([简体中文](./README_cn.md)|English)
# Streaming Speech Synthesis Service
## Introduction
This demo shows how to start a streaming speech synthesis service and access it, either with a single command using `paddlespeech_server` and `paddlespeech_client` or with a few lines of Python code.
## Usage
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose either the **medium** or **hard** way to install paddlespeech.
### 2. Prepare config File
The configuration file can be found in `conf/tts_online_application.yaml`.
Among them, `protocol` indicates the network protocol used by the streaming TTS service; currently both http and websocket are supported.
`engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to `tts`.
Currently, the engine type supports two forms: **online** and **online-onnx**. `online` indicates an engine that uses Python dynamic-graph inference; `online-onnx` indicates an engine that uses onnxruntime for inference, which is faster.
The streaming TTS acoustic model (AM) supports **fastspeech2** and **fastspeech2_cnndecoder**; the vocoder (Voc) supports **hifigan** and **mb_melgan**.
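If you want to confirm which protocol and engine a given config will start before launching the server, you can read the two fields directly. This is a minimal sketch, not part of the demo; it only assumes the config file above and PyYAML:
```python
# Sketch: inspect the two key fields of the streaming TTS config.
import yaml

with open("./conf/tts_online_application.yaml") as f:
    conf = yaml.safe_load(f)

print(conf["protocol"])     # 'http' or 'websocket'
print(conf["engine_list"])  # e.g. ['tts_online-onnx']
```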
### 3. Server Usage
- Command Line (Recommended)
```bash
# start the service
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
Usage:
```bash
paddlespeech_server start --help
```
Arguments:
- `config_file`: yaml file of the app. Default: ./conf/tts_online_application.yaml
- `log_file`: log file. Default: ./log/paddlespeech.log
Output:
```bash
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
[2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
INFO: Started server process [14638]
[2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
INFO: Waiting for application startup.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
[2022-04-24 21:00:17,151] [ INFO] - **********************************************************************
INFO: Started server process [320]
[2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320]
INFO: Waiting for application startup.
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
### 4. Streaming TTS client Usage
- Command Line (Recommended)
```bash
# Access http streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# Access websocket streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
Usage:
```bash
paddlespeech_client tts_online --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8092
- `protocol`: Service protocol, choices: [http, websocket], default: http.
- `input`: (required) Input text to generate.
- `spk_id`: Speaker id for multi-speaker text to speech. Default: 0
- `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
- `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same sampling rate as the model.
- `output`: Output wave file path. Default: None, which means the audio is not saved locally.
- `play`: Whether to play the audio while it is being synthesized. Default: False (do not play). **Playing audio requires the pyaudio library**.
Output:
```bash
[2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
[2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s
[2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459
[2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="http",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
Output:
```bash
[2022-04-24 21:11:13,798] [ INFO] - tts http client start
[2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
[2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```
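For reference, the HTTP streaming endpoint can also be called without the client executor. The sketch below mirrors what `TTSHttpHandler` (added in this PR in `paddlespeech/server/utils/audio_handler.py`) does internally: it posts the request fields to `/paddlespeech/streaming/tts` and base64-decodes each streamed chunk. Treat it as an illustration of the wire format rather than a separately supported API.
```python
# Sketch: call the HTTP streaming endpoint directly (based on TTSHttpHandler).
# The server answers with a stream of base64-encoded 16-bit mono PCM chunks.
import base64
import json

import requests

params = {
    "text": "您好,欢迎使用百度飞桨语音合成服务。",
    "spk_id": 0,
    "speed": 1.0,
    "volume": 1.0,
    "sample_rate": 0,   # 0 keeps the model sample rate (24 kHz here)
    "save_path": None,
}
url = "http://127.0.0.1:8092/paddlespeech/streaming/tts"

audio = b""
response = requests.post(url, json.dumps(params), stream=True)
for chunk in response.iter_content(chunk_size=1024):
    audio += base64.b64decode(chunk)  # raw PCM bytes, ready to play or save
```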

@ -0,0 +1,162 @@
([简体中文](./README_cn.md)|English)
# 流式语音合成服务
## 介绍
这个demo是一个启动流式语音合成服务和访问该服务的实现。 它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
## 使用方法
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
推荐使用 **paddlepaddle 2.2.1** 或以上版本。
你可以从 medium 和 hard 两种方式中选择一种方式安装 PaddleSpeech。
### 2. 准备配置文件
配置文件可参见 `conf/tts_online_application.yaml`
其中,`protocol`表示该流式TTS服务使用的网络协议,目前支持 http 和 websocket 两种。
其中,`engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
该demo主要介绍流式语音合成服务,因此语音任务应设置为 tts。
目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用onnxruntime进行推理的引擎。其中 online-onnx 的推理速度更快。
流式TTS的AM 模型支持:fastspeech2 以及 fastspeech2_cnndecoder;Voc 模型支持:hifigan、mb_melgan
### 3. 服务端使用方法
- 命令行 (推荐使用)
```bash
# 启动服务
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
使用方法:
```bash
paddlespeech_server start --help
```
参数:
- `config_file`: 服务的配置文件,默认: ./conf/tts_online_application.yaml
- `log_file`: log 文件. 默认:./log/paddlespeech.log
输出:
```bash
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
[2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
INFO: Started server process [14638]
[2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
INFO: Waiting for application startup.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
输出:
```bash
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
[2022-04-24 21:00:17,151] [ INFO] - **********************************************************************
INFO: Started server process [320]
[2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320]
INFO: Waiting for application startup.
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
### 4. 流式TTS 客户端使用方法
- 命令行 (推荐使用)
```bash
# 访问 http 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# 访问 websocket 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
使用帮助:
```bash
paddlespeech_client tts_online --help
```
参数:
- `server_ip`: 服务端 ip 地址,默认: 127.0.0.1。
- `port`: 服务端口,默认: 8092。
- `protocol`: 服务协议,可选 [http, websocket], 默认: http。
- `input`: (必须输入): 待合成的文本。
- `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
- `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值: 1.0
- `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认值: 0,表示与模型采样率相同。
- `output`: 输出音频的路径, 默认值: None,表示不保存音频到本地。
- `play`: 是否播放音频,边合成边播放, 默认值: False,表示不播放。**播放音频需要依赖 pyaudio 库**。
输出:
```bash
[2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
[2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s
[2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459
[2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="http",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
输出:
```bash
[2022-04-24 21:11:13,798] [ INFO] - tts http client start
[2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
[2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```

@ -0,0 +1,88 @@
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 127.0.0.1
port: 8092
# The task format in the engine_list is: <speech task>_<engine type>
# engine_list choices = ['tts_online', 'tts_online-onnx']
# protocol = ['websocket', 'http'] (only one can be selected).
protocol: 'http'
engine_list: ['tts_online-onnx']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online #######################
tts_online:
# am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc']
am: 'fastspeech2_csmsc'
am_config:
am_ckpt:
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
# voc (vocoder) choices=['mb_melgan_csmsc', 'hifigan_csmsc']
voc: 'mb_melgan_csmsc'
voc_config:
voc_ckpt:
voc_stat:
# others
lang: 'zh'
device: 'cpu' # set 'gpu:id' or 'cpu'
am_block: 42
am_pad: 12
voc_block: 14
voc_pad: 14
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online-onnx #######################
tts_online-onnx:
# am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx']
am: 'fastspeech2_cnndecoder_csmsc_onnx'
# am_ckpt is a list, if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model];
# if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model];
am_ckpt: # list
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
am_sample_rate: 24000
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# others
lang: 'zh'
am_block: 42
am_pad: 12
voc_block: 14
voc_pad: 14
voc_upsample: 300
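The `am_block`/`am_pad` and `voc_block`/`voc_pad` values control how many frames each streaming chunk carries and how much extra context is padded around it (the engine passes them to `get_chunks`, see the `tts_engine.py` diff below). The sketch below is only an illustration of such a block/pad split, using a hypothetical helper name, not the actual `get_chunks` implementation:
```python
# Illustrative block/pad chunking (hypothetical helper, not paddlespeech code):
# split a sequence of `length` frames into blocks of `block` frames, each
# extended by up to `pad` frames of left/right context for the model to use.
def split_block_pad(length: int, block: int, pad: int):
    chunks = []
    start = 0
    while start < length:
        end = min(start + block, length)
        chunks.append((max(0, start - pad), min(length, end + pad)))
        start = end
    return chunks

# e.g. with the am settings above (block=42, pad=12) and 100 frames:
print(split_block_pad(100, 42, 12))  # [(0, 54), (30, 96), (72, 100)]
```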

@ -0,0 +1,3 @@
#!/bin/bash
# start server
paddlespeech_server start --config_file ./conf/tts_online_application.yaml

@ -0,0 +1,7 @@
#!/bin/bash
# http client test
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# websocket client test
#paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav

@ -48,3 +48,16 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```
## Online TTS Server
### Launch online tts server
```
paddlespeech_server start --config_file conf/tts_online_application.yaml
```
### Access online tts server
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```

@ -49,3 +49,17 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input zh.wav
```
## 流式TTS
### 启动流式语音合成服务
```
paddlespeech_server start --config_file conf/tts_online_application.yaml
```
### 访问流式语音合成服务
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```

@ -35,8 +35,8 @@ from paddlespeech.server.utils.audio_process import wav2pcm
from paddlespeech.server.utils.util import wav2base64
__all__ = [
- 'TTSClientExecutor', 'ASRClientExecutor', 'ASROnlineClientExecutor',
- 'CLSClientExecutor'
+ 'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor',
+ 'ASROnlineClientExecutor', 'CLSClientExecutor'
]
@ -161,6 +161,116 @@ class TTSClientExecutor(BaseExecutor):
return res
@cli_client_register(
name='paddlespeech_client.tts_online',
description='visit tts online service')
class TTSOnlineClientExecutor(BaseExecutor):
def __init__(self):
super(TTSOnlineClientExecutor, self).__init__()
self.parser = argparse.ArgumentParser(
prog='paddlespeech_client.tts_online', add_help=True)
self.parser.add_argument(
'--server_ip', type=str, default='127.0.0.1', help='server ip')
self.parser.add_argument(
'--port', type=int, default=8092, help='server port')
self.parser.add_argument(
'--protocol',
type=str,
default="http",
choices=["http", "websocket"],
help='server protocol')
self.parser.add_argument(
'--input',
type=str,
default=None,
help='Text to be synthesized.',
required=True)
self.parser.add_argument(
'--spk_id', type=int, default=0, help='Speaker id')
self.parser.add_argument(
'--speed',
type=float,
default=1.0,
help='Audio speed, the value should be set between 0 and 3')
self.parser.add_argument(
'--volume',
type=float,
default=1.0,
help='Audio volume, the value should be set between 0 and 3')
self.parser.add_argument(
'--sample_rate',
type=int,
default=0,
choices=[0, 8000, 16000],
help='Sampling rate, the default is the same as the model')
self.parser.add_argument(
'--output', type=str, default=None, help='Synthesized audio file')
self.parser.add_argument(
"--play", type=bool, help="whether to play audio", default=False)
def execute(self, argv: List[str]) -> bool:
args = self.parser.parse_args(argv)
input_ = args.input
server_ip = args.server_ip
port = args.port
protocol = args.protocol
spk_id = args.spk_id
speed = args.speed
volume = args.volume
sample_rate = args.sample_rate
output = args.output
play = args.play
try:
res = self(
input=input_,
server_ip=server_ip,
port=port,
protocol=protocol,
spk_id=spk_id,
speed=speed,
volume=volume,
sample_rate=sample_rate,
output=output,
play=play)
return True
except Exception as e:
logger.error("Failed to synthesized audio.")
return False
@stats_wrapper
def __call__(self,
input: str,
server_ip: str="127.0.0.1",
port: int=8092,
protocol: str="http",
spk_id: int=0,
speed: float=1.0,
volume: float=1.0,
sample_rate: int=0,
output: str=None,
play: bool=False):
"""
Python API to call an executor.
"""
if protocol == "http":
logger.info("tts http client start")
from paddlespeech.server.utils.audio_handler import TTSHttpHandler
handler = TTSHttpHandler(server_ip, port, play)
handler.run(input, spk_id, speed, volume, sample_rate, output)
elif protocol == "websocket":
from paddlespeech.server.utils.audio_handler import TTSWsHandler
logger.info("tts websocket client start")
handler = TTSWsHandler(server_ip, port, play)
loop = asyncio.get_event_loop()
loop.run_until_complete(handler.run(input, output))
else:
logger.error("Please set correct protocol, http or websocket")
@cli_client_register(
name='paddlespeech_client.asr', description='visit asr service')
class ASRClientExecutor(BaseExecutor):

@ -10,7 +10,7 @@ port: 8092
# task choices = ['tts_online', 'tts_online-onnx']
# protocol = ['websocket', 'http'] (only one can be selected).
protocol: 'http'
- engine_list: ['tts_online']
+ engine_list: ['tts_online-onnx']
#################################################################################
@ -67,16 +67,16 @@ tts_online-onnx:
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx, hifigan_csmsc_onnx']
- voc: 'mb_melgan_csmsc_onnx'
+ voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# others
lang: 'zh'

@ -202,7 +202,6 @@ class TTSServerExecutor(TTSExecutor):
""" """
Init model and other resources from a specific path. Init model and other resources from a specific path.
""" """
#import pdb;pdb.set_trace()
if hasattr(self, 'am_inference') and hasattr(self, 'voc_inference'): if hasattr(self, 'am_inference') and hasattr(self, 'voc_inference'):
logger.info('Models had been initialized.') logger.info('Models had been initialized.')
return return
@ -391,8 +390,7 @@ class TTSServerExecutor(TTSExecutor):
# fastspeech2_cnndecoder_csmsc
elif am == "fastspeech2_cnndecoder_csmsc":
# am
- orig_hs, h_masks = self.am_inference.encoder_infer(
- part_phone_ids)
+ orig_hs = self.am_inference.encoder_infer(part_phone_ids)
# streaming voc chunk info
mel_len = orig_hs.shape[1]
@ -404,7 +402,7 @@ class TTSServerExecutor(TTSExecutor):
hss = get_chunks(orig_hs, self.am_block, self.am_pad, "am")
am_chunk_num = len(hss)
for i, hs in enumerate(hss):
- before_outs, _ = self.am_inference.decoder(hs)
+ before_outs = self.am_inference.decoder(hs)
after_outs = before_outs + self.am_inference.postnet(
before_outs.transpose((0, 2, 1))).transpose((0, 2, 1))
normalized_mel = after_outs[0]

@ -1,4 +1,4 @@
- # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+ # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -12,75 +12,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import base64
import json
import os
import time
import requests
from paddlespeech.server.utils.audio_process import pcm2wav
def save_audio(buffer, audio_path) -> bool:
if args.save_path.endswith("pcm"):
with open(args.save_path, "wb") as f:
f.write(buffer)
elif args.save_path.endswith("wav"):
with open("./tmp.pcm", "wb") as f:
f.write(buffer)
pcm2wav("./tmp.pcm", audio_path, channels=1, bits=16, sample_rate=24000)
os.system("rm ./tmp.pcm")
else:
print("Only supports saved audio format is pcm or wav")
return False
return True
def test(args):
params = {
"text": args.text,
"spk_id": args.spk_id,
"speed": args.speed,
"volume": args.volume,
"sample_rate": args.sample_rate,
"save_path": ''
}
buffer = b''
flag = 1
url = "http://" + str(args.server) + ":" + str(
args.port) + "/paddlespeech/streaming/tts"
st = time.time()
html = requests.post(url, json.dumps(params), stream=True)
for chunk in html.iter_content(chunk_size=1024):
chunk = base64.b64decode(chunk) # bytes
if flag:
first_response = time.time() - st
print(f"首包响应:{first_response} s")
flag = 0
buffer += chunk
final_response = time.time() - st
duration = len(buffer) / 2.0 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
if args.save_path is not None:
if save_audio(buffer, args.save_path):
print("音频保存至:", args.save_path)
from paddlespeech.server.utils.audio_handler import TTSHttpHandler
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
- '--text',
+ "--text",
type=str,
- default="您好,欢迎使用语音合成服务。",
- help='A sentence to be synthesized')
+ help="A sentence to be synthesized",
+ default="您好,欢迎使用语音合成服务。")
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
parser.add_argument(
@ -89,12 +33,15 @@ if __name__ == "__main__":
'--sample_rate',
type=int,
default=0,
choices=[0, 8000, 16000],
help='Sampling rate, the default is the same as the model')
parser.add_argument(
- "--server", type=str, help="server ip", default="127.0.0.1")
+ "--output", type=str, help="save audio path", default=None)
parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument(
- "--save_path", type=str, help="save audio path", default=None)
+ "--play", type=bool, help="whether to play audio", default=False)
args = parser.parse_args()
test(args)
print("tts http client start")
handler = TTSHttpHandler(args.server, args.port, args.play)
handler.run(args.text, args.spk_id, args.speed, args.volume,
args.sample_rate, args.output)

@ -1,112 +0,0 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import base64
import json
import threading
import time
import pyaudio
import requests
mutex = threading.Lock()
buffer = b''
p = pyaudio.PyAudio()
stream = p.open(
format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
max_fail = 50
def play_audio():
global stream
global buffer
global max_fail
while True:
if not buffer:
max_fail -= 1
time.sleep(0.05)
if max_fail < 0:
break
mutex.acquire()
stream.write(buffer)
buffer = b''
mutex.release()
def test(args):
global mutex
global buffer
params = {
"text": args.text,
"spk_id": args.spk_id,
"speed": args.speed,
"volume": args.volume,
"sample_rate": args.sample_rate,
"save_path": ''
}
all_bytes = 0.0
t = threading.Thread(target=play_audio)
flag = 1
url = "http://" + str(args.server) + ":" + str(
args.port) + "/paddlespeech/streaming/tts"
st = time.time()
html = requests.post(url, json.dumps(params), stream=True)
for chunk in html.iter_content(chunk_size=1024):
mutex.acquire()
chunk = base64.b64decode(chunk) # bytes
buffer += chunk
mutex.release()
if flag:
first_response = time.time() - st
print(f"首包响应:{first_response} s")
flag = 0
t.start()
all_bytes += len(chunk)
final_response = time.time() - st
duration = all_bytes / 2 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
t.join()
stream.stop_stream()
stream.close()
p.terminate()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--text',
type=str,
default="您好,欢迎使用语音合成服务。",
help='A sentence to be synthesized')
parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
parser.add_argument(
'--volume', type=float, default=1.0, help='Audio volume')
parser.add_argument(
'--sample_rate',
type=int,
default=0,
help='Sampling rate, the default is the same as the model')
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
args = parser.parse_args()
test(args)

@ -11,92 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import _thread as thread
import argparse
- import base64
+ import asyncio
import json
import ssl
import time
import websocket
flag = 1
st = 0.0
all_bytes = b''
class WsParam(object):
# 初始化
def __init__(self, text, server="127.0.0.1", port=8090):
self.server = server
self.port = port
self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
self.text = text
# 生成url
def create_url(self):
return self.url
def on_message(ws, message):
global flag
global st
global all_bytes
try:
message = json.loads(message)
audio = message["audio"]
audio = base64.b64decode(audio) # bytes
status = message["status"]
all_bytes += audio
if status == 0:
print("create successfully.")
elif status == 1:
if flag:
print(f"首包响应:{time.time() - st} s")
flag = 0
elif status == 2:
final_response = time.time() - st
duration = len(all_bytes) / 2.0 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
with open("./out.pcm", "wb") as f:
f.write(all_bytes)
print("ws is closed")
ws.close()
else:
print("infer error")
except Exception as e:
print("receive msg,but parse exception:", e)
# 收到websocket错误的处理
def on_error(ws, error):
print("### error:", error)
# 收到websocket关闭的处理
def on_close(ws):
print("### closed ###")
# 收到websocket连接建立的处理
def on_open(ws):
def run(*args):
global st
text_base64 = str(
base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
d = {"text": text_base64}
d = json.dumps(d)
print("Start sending text data")
st = time.time()
ws.send(d)
thread.start_new_thread(run, ())
from paddlespeech.server.utils.audio_handler import TTSWsHandler
if __name__ == "__main__":
parser = argparse.ArgumentParser()
@ -108,19 +26,13 @@ if __name__ == "__main__":
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument(
"--output", type=str, help="save audio path", default=None)
parser.add_argument(
"--play", type=bool, help="whether to play audio", default=False)
args = parser.parse_args()
- print("***************************************")
- print("Server ip: ", args.server)
- print("Server port: ", args.port)
- print("Sentence to be synthesized: ", args.text)
+ print("tts websocket client start")
+ handler = TTSWsHandler(args.server, args.port, args.play)
+ loop = asyncio.get_event_loop()
+ loop.run_until_complete(handler.run(args.text, args.output))
print("***************************************")
wsParam = WsParam(text=args.text, server=args.server, port=args.port)
websocket.enableTrace(False)
wsUrl = wsParam.create_url()
ws = websocket.WebSocketApp(
wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
ws.on_open = on_open
ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

@ -1,160 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import _thread as thread
import argparse
import base64
import json
import ssl
import threading
import time
import pyaudio
import websocket
mutex = threading.Lock()
buffer = b''
p = pyaudio.PyAudio()
stream = p.open(
format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
flag = 1
st = 0.0
all_bytes = 0.0
class WsParam(object):
# 初始化
def __init__(self, text, server="127.0.0.1", port=8090):
self.server = server
self.port = port
self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
self.text = text
# 生成url
def create_url(self):
return self.url
def play_audio():
global stream
global buffer
while True:
time.sleep(0.05)
if not buffer: # buffer 为空
break
mutex.acquire()
stream.write(buffer)
buffer = b''
mutex.release()
t = threading.Thread(target=play_audio)
def on_message(ws, message):
global flag
global t
global buffer
global st
global all_bytes
try:
message = json.loads(message)
audio = message["audio"]
audio = base64.b64decode(audio) # bytes
status = message["status"]
all_bytes += len(audio)
if status == 0:
print("create successfully.")
elif status == 1:
mutex.acquire()
buffer += audio
mutex.release()
if flag:
print(f"首包响应:{time.time() - st} s")
flag = 0
print("Start playing audio")
t.start()
elif status == 2:
final_response = time.time() - st
duration = all_bytes / 2 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
print("ws is closed")
ws.close()
else:
print("infer error")
except Exception as e:
print("receive msg,but parse exception:", e)
# 收到websocket错误的处理
def on_error(ws, error):
print("### error:", error)
# 收到websocket关闭的处理
def on_close(ws):
print("### closed ###")
# 收到websocket连接建立的处理
def on_open(ws):
def run(*args):
global st
text_base64 = str(
base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
d = {"text": text_base64}
d = json.dumps(d)
print("Start sending text data")
st = time.time()
ws.send(d)
thread.start_new_thread(run, ())
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--text",
type=str,
help="A sentence to be synthesized",
default="您好,欢迎使用语音合成服务。")
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
args = parser.parse_args()
print("***************************************")
print("Server ip: ", args.server)
print("Server port: ", args.port)
print("Sentence to be synthesized: ", args.text)
print("***************************************")
wsParam = WsParam(text=args.text, server=args.server, port=args.port)
websocket.enableTrace(False)
wsUrl = wsParam.create_url()
ws = websocket.WebSocketApp(
wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
ws.on_open = on_open
ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
t.join()
print("End of playing audio")
stream.stop_stream()
stream.close()
p.terminate()

@ -11,14 +11,19 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import json
import logging
import threading
import time
import numpy as np
import requests
import soundfile
import websockets
from paddlespeech.cli.log import logger
from paddlespeech.server.utils.audio_process import save_audio
class ASRAudioHandler:
@ -117,3 +122,221 @@ class ASRAudioHandler:
logger.info("final receive msg={}".format(msg)) logger.info("final receive msg={}".format(msg))
result = msg result = msg
return result return result
class TTSWsHandler:
def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
"""PaddleSpeech Online TTS Server Client audio handler
Online TTS server uses the websocket protocol
Args:
server (str, optional): the server ip. Defaults to "127.0.0.1".
port (int, optional): the server port. Defaults to 8092.
play (bool, optional): whether to play audio. Defaults to False.
"""
self.server = server
self.port = port
self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
self.play = play
if self.play:
import pyaudio
self.buffer = b''
self.p = pyaudio.PyAudio()
self.stream = self.p.open(
format=self.p.get_format_from_width(2),
channels=1,
rate=24000,
output=True)
self.mutex = threading.Lock()
self.start_play = True
self.t = threading.Thread(target=self.play_audio)
self.max_fail = 50
def play_audio(self):
while True:
if not self.buffer:
self.max_fail -= 1
time.sleep(0.05)
if self.max_fail < 0:
break
self.mutex.acquire()
self.stream.write(self.buffer)
self.buffer = b''
self.mutex.release()
async def run(self, text: str, output: str=None):
"""Send a text to online server
Args:
text (str): sentence to be synthesized
output (str): save audio path
"""
all_bytes = b''
# 1. Send websocket handshake request
async with websockets.connect(self.url) as ws:
# 2. Server has already received the handshake, send the text to the engine
# send text to engine
text_base64 = str(base64.b64encode((text).encode('utf-8')), "UTF8")
d = {"text": text_base64}
d = json.dumps(d)
st = time.time()
await ws.send(d)
logging.info("send a message to the server")
# 3. Process the received response
message = await ws.recv()
logger.info(f"句子:{text}")
logger.info(f"首包响应:{time.time() - st} s")
message = json.loads(message)
status = message["status"]
while (status == 1):
audio = message["audio"]
audio = base64.b64decode(audio) # bytes
all_bytes += audio
if self.play:
self.mutex.acquire()
self.buffer += audio
self.mutex.release()
if self.start_play:
self.t.start()
self.start_play = False
message = await ws.recv()
message = json.loads(message)
status = message["status"]
# 4. Last packet, no audio information
if status == 2:
final_response = time.time() - st
duration = len(all_bytes) / 2.0 / 24000
logger.info(f"尾包响应:{final_response} s")
logger.info(f"音频时长:{duration} s")
logger.info(f"RTF: {final_response / duration}")
if output is not None:
if save_audio(all_bytes, output):
logger.info(f"音频保存至:{output}")
else:
logger.error("save audio error")
else:
logger.error("infer error")
if self.play:
self.t.join()
self.stream.stop_stream()
self.stream.close()
self.p.terminate()
class TTSHttpHandler:
def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
"""PaddleSpeech Online TTS Server Client audio handler
Online TTS server uses the http protocol
Args:
server (str, optional): the server ip. Defaults to "127.0.0.1".
port (int, optional): the server port. Defaults to 8092.
play (bool, optional): whether to play audio. Defaults to False.
"""
self.server = server
self.port = port
self.url = "http://" + str(self.server) + ":" + str(
self.port) + "/paddlespeech/streaming/tts"
self.play = play
if self.play:
import pyaudio
self.buffer = b''
self.p = pyaudio.PyAudio()
self.stream = self.p.open(
format=self.p.get_format_from_width(2),
channels=1,
rate=24000,
output=True)
self.mutex = threading.Lock()
self.start_play = True
self.t = threading.Thread(target=self.play_audio)
self.max_fail = 50
def play_audio(self):
while True:
if not self.buffer:
self.max_fail -= 1
time.sleep(0.05)
if self.max_fail < 0:
break
self.mutex.acquire()
self.stream.write(self.buffer)
self.buffer = b''
self.mutex.release()
def run(self,
text: str,
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output: str=None):
"""Send a text to tts online server
Args:
text (str): sentence to be synthesized.
spk_id (int, optional): speaker id. Defaults to 0.
speed (float, optional): audio speed. Defaults to 1.0.
volume (float, optional): audio volume. Defaults to 1.0.
sample_rate (int, optional): audio sample rate, 0 means the same as model. Defaults to 0.
output (str, optional): save audio path. Defaults to None.
"""
# 1. Create request
params = {
"text": text,
"spk_id": spk_id,
"speed": speed,
"volume": volume,
"sample_rate": sample_rate,
"save_path": output
}
all_bytes = b''
first_flag = 1
# 2. Send request
st = time.time()
html = requests.post(self.url, json.dumps(params), stream=True)
# 3. Process the received response
for chunk in html.iter_content(chunk_size=1024):
audio = base64.b64decode(chunk) # bytes
if first_flag:
first_response = time.time() - st
first_flag = 0
if self.play:
self.mutex.acquire()
self.buffer += audio
self.mutex.release()
if self.start_play:
self.t.start()
self.start_play = False
all_bytes += audio
final_response = time.time() - st
duration = len(all_bytes) / 2.0 / 24000
logger.info(f"句子:{text}")
logger.info(f"首包响应:{first_response} s")
logger.info(f"尾包响应:{final_response} s")
logger.info(f"音频时长:{duration} s")
logger.info(f"RTF: {final_response / duration}")
if output is not None:
if save_audio(all_bytes, output):
logger.info(f"音频保存至:{output}")
else:
logger.error("save audio error")
if self.play:
self.t.join()
self.stream.stop_stream()
self.stream.close()
self.p.terminate()
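For reference, the rewritten `ws_client.py` in this PR drives `TTSWsHandler` from an asyncio event loop; a minimal usage sketch:
```python
# Minimal usage sketch for TTSWsHandler (mirrors ws_client.py in this PR).
import asyncio

from paddlespeech.server.utils.audio_handler import TTSWsHandler

handler = TTSWsHandler(server="127.0.0.1", port=8092, play=False)
loop = asyncio.get_event_loop()
loop.run_until_complete(
    handler.run("您好,欢迎使用百度飞桨语音合成服务。", "./output.wav"))
```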

@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import wave
import numpy as np
@ -140,3 +141,35 @@ def pcm2float(data):
bits = np.iinfo(np.int16).bits
data = data / (2**(bits - 1))
return data
def save_audio(bytes_data, audio_path, sample_rate: int=24000) -> bool:
"""save byte to audio file.
Args:
bytes_data (bytes): audio samples, bytes format
audio_path (str): save audio path
sample_rate (int, optional): audio sample rate. Defaults to 24000.
Returns:
bool: Whether the audio was saved successfully
"""
if audio_path.endswith("pcm"):
with open(audio_path, "wb") as f:
f.write(bytes_data)
elif audio_path.endswith("wav"):
with open("./tmp.pcm", "wb") as f:
f.write(bytes_data)
pcm2wav(
"./tmp.pcm",
audio_path,
channels=1,
bits=16,
sample_rate=sample_rate)
os.system("rm ./tmp.pcm")
else:
print("Only supports saved audio format is pcm or wav")
return False
return True
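A short usage sketch for the `save_audio` helper above (the buffer of silence is just stand-in data for real streamed PCM):
```python
# Usage sketch for save_audio: write 16-bit mono PCM bytes to a wav file.
from paddlespeech.server.utils.audio_process import save_audio

pcm_bytes = b"\x00\x00" * 24000  # one second of silence at 24 kHz (stand-in data)
if save_audio(pcm_bytes, "./silence.wav", sample_rate=24000):
    print("saved ./silence.wav")
```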

@ -67,7 +67,7 @@ tts_online-onnx:
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
voc: 'mb_melgan_csmsc_onnx'
@ -76,7 +76,7 @@ tts_online-onnx:
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# others
lang: 'zh'

@ -28,7 +28,7 @@ StartService(){
ClientTest_http(){
for ((i=1; i<=3;i++))
do
- python http_client.py --save_path ./out_http.wav
+ paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。"
((http_test_times+=1))
done
}
@ -36,7 +36,7 @@ ClientTest_http(){
ClientTest_ws(){
for ((i=1; i<=3;i++))
do
- python ws_client.py
+ paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。" --protocol websocket
((ws_test_times+=1))
done
}
@ -71,6 +71,7 @@ rm -rf $log/server.log.wf
rm -rf $log/server.log
rm -rf $log/test_result.log
config_file=./conf/application.yaml
server_ip=$(cat $config_file | grep "host" | awk -F " " '{print $2}')
port=$(cat $config_file | grep "port" | awk '/port:/ {print $2}')

@ -3,6 +3,8 @@
log_all_dir=./log
cp ./tts_online_application.yaml ./conf/application.yaml -rf
bash test.sh tts_online $log_all_dir/log_tts_online_cpu
python change_yaml.py --change_type engine_type --target_key engine_list --target_value tts_online-onnx python change_yaml.py --change_type engine_type --target_key engine_list --target_value tts_online-onnx

@ -67,7 +67,7 @@ tts_online-onnx:
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
voc: 'mb_melgan_csmsc_onnx'
@ -76,7 +76,7 @@ tts_online-onnx:
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# others
lang: 'zh'
