diff --git a/demos/streaming_tts_server/README.md b/demos/streaming_tts_server/README.md
new file mode 100644
index 00000000..801c4f31
--- /dev/null
+++ b/demos/streaming_tts_server/README.md
@@ -0,0 +1,163 @@
+([简体中文](./README_cn.md)|English)
+
+# Streaming Speech Synthesis Service
+
+## Introduction
+This demo shows how to start a streaming speech synthesis service and how to access it, either with a single command using `paddlespeech_server` and `paddlespeech_client` or with a few lines of Python code.
+
+
+## Usage
+### 1. Installation
+See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+
+It is recommended to use **paddlepaddle 2.2.1** or above.
+You can choose either the medium or the hard way to install paddlespeech.
+
+
+### 2. Prepare the Config File
+The configuration file can be found in `conf/tts_online_application.yaml`.
+Among them, `protocol` indicates the network protocol used by the streaming TTS service; currently, both http and websocket are supported.
+`engine_list` indicates the speech engines that the service to be started will include, in the format `<speech task>_<engine type>`.
+This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to `tts`.
+Currently, two engine types are supported: **online** and **online-onnx**. `online` indicates an engine that uses python for dynamic graph inference; `online-onnx` indicates an engine that uses onnxruntime for inference. The inference speed of online-onnx is faster.
+Supported streaming TTS acoustic models (AM): **fastspeech2** and **fastspeech2_cnndecoder**; supported vocoders (Voc): **hifigan** and **mb_melgan**.
+
+
+### 3. 
Server Usage
+- Command Line (Recommended)
+
+  ```bash
+  # start the service
+  paddlespeech_server start --config_file ./conf/tts_online_application.yaml
+  ```
+
+  Usage:
+
+  ```bash
+  paddlespeech_server start --help
+  ```
+  Arguments:
+  - `config_file`: yaml file of the app, default: ./conf/tts_online_application.yaml
+  - `log_file`: log file. Default: ./log/paddlespeech.log
+
+  Output:
+  ```bash
+  [2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
+  [2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
+  [2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
+  [2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
+  INFO: Started server process [14638]
+  [2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
+  INFO: Waiting for application startup.
+  [2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
+  INFO: Application startup complete.
+  [2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete. 
+ INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
+  [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
+
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
+
+  server_executor = ServerExecutor()
+  server_executor(
+      config_file="./conf/tts_online_application.yaml",
+      log_file="./log/paddlespeech.log")
+  ```
+
+  Output:
+  ```bash
+  [2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
+  [2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
+  [2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
+  [2022-04-24 21:00:17,151] [ INFO] - **********************************************************************
+  INFO: Started server process [320]
+  [2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320]
+  INFO: Waiting for application startup.
+  [2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
+  INFO: Application startup complete.
+  [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
+  INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
+  [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
+
+
+  ```
+
+
+### 4. Streaming TTS Client Usage
+- Command Line (Recommended)
+
+  ```bash
+  # Access the http streaming TTS service
+  paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+
+  # Access the websocket streaming TTS service
+  paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+  ```
+  Usage:
+
+  ```bash
+  paddlespeech_client tts_online --help
+  ```
+
+  Arguments:
+  - `server_ip`: server ip. Default: 127.0.0.1
+  - `port`: server port. 
Default: 8092
+  - `protocol`: Service protocol, choices: [http, websocket], default: http.
+  - `input`: (required): Input text to generate.
+  - `spk_id`: Speaker id for multi-speaker text to speech. Default: 0
+  - `speed`: Audio speed; the value should be set between 0 and 3. Default: 1.0
+  - `volume`: Audio volume; the value should be set between 0 and 3. Default: 1.0
+  - `sample_rate`: Sampling rate, choices: [0, 8000, 16000], where 0 means the same as the model. Default: 0
+  - `output`: Output wave filepath. Default: None, which means the audio is not saved locally.
+  - `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio requires the pyaudio library.**
+
+
+  Output:
+  ```bash
+  [2022-04-24 21:08:18,559] [ INFO] - tts http client start
+  [2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
+  [2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
+  [2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s
+  [2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s
+  [2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459
+  [2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav
+
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
+  import json
+
+  executor = TTSOnlineClientExecutor()
+  executor(
+      input="您好,欢迎使用百度飞桨语音合成服务。",
+      server_ip="127.0.0.1",
+      port=8092,
+      protocol="http",
+      spk_id=0,
+      speed=1.0,
+      volume=1.0,
+      sample_rate=0,
+      output="./output.wav",
+      play=False)
+
+  ```
+
+  Output:
+  ```bash
+  [2022-04-24 21:11:13,798] [ INFO] - tts http client start
+  [2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
+  [2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
+  [2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s
+  [2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
+  [2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
+  [2022-04-24 21:11:16,837] [ INFO] - 
音频保存至:./output.wav
+
+
+  ```
+
+
diff --git a/demos/streaming_tts_server/README_cn.md b/demos/streaming_tts_server/README_cn.md
new file mode 100644
index 00000000..211dc388
--- /dev/null
+++ b/demos/streaming_tts_server/README_cn.md
@@ -0,0 +1,162 @@
+(简体中文|[English](./README.md))
+
+# 流式语音合成服务
+
+## 介绍
+这个 demo 是一个启动流式语音合成服务和访问该服务的实现。它可以通过使用 `paddlespeech_server` 和 `paddlespeech_client` 的单个命令或 python 的几行代码来实现。
+
+
+## 使用方法
+### 1. 安装
+请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md)。
+
+推荐使用 **paddlepaddle 2.2.1** 或以上版本。
+你可以从 medium,hard 两种方式中选择一种方式安装 PaddleSpeech。
+
+
+### 2. 准备配置文件
+配置文件可参见 `conf/tts_online_application.yaml`。
+其中,`protocol` 表示该流式 TTS 服务使用的网络协议,目前支持 http 和 websocket 两种。
+其中,`engine_list` 表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
+该 demo 主要介绍流式语音合成服务,因此语音任务应设置为 `tts`。
+目前引擎类型支持两种形式:**online** 表示使用 python 进行动态图推理的引擎;**online-onnx** 表示使用 onnxruntime 进行推理的引擎。其中,online-onnx 的推理速度更快。
+流式 TTS 的 AM 模型支持:**fastspeech2** 以及 **fastspeech2_cnndecoder**;Voc 模型支持:**hifigan** 以及 **mb_melgan**。
+
+### 3. 服务端使用方法
+- 命令行 (推荐使用)
+
+  ```bash
+  # 启动服务
+  paddlespeech_server start --config_file ./conf/tts_online_application.yaml
+  ```
+
+  使用方法:
+
+  ```bash
+  paddlespeech_server start --help
+  ```
+  参数:
+  - `config_file`: 服务的配置文件,默认: ./conf/tts_online_application.yaml
+  - `log_file`: log 文件。默认:./log/paddlespeech.log
+
+  输出:
+  ```bash
+  [2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
+  [2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
+  [2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
+  [2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
+  INFO: Started server process [14638]
+  [2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
+  INFO: Waiting for application startup.
+  [2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup. 
+ INFO: Application startup complete. + [2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete. + INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + [2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + + ``` + +- Python API + ```python + from paddlespeech.server.bin.paddlespeech_server import ServerExecutor + + server_executor = ServerExecutor() + server_executor( + config_file="./conf/tts_online_application.yaml", + log_file="./log/paddlespeech.log") + ``` + + 输出: + ```bash + [2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s + [2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s + [2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s + [2022-04-24 21:00:17,151] [ INFO] - ********************************************************************** + INFO: Started server process [320] + [2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320] + INFO: Waiting for application startup. + [2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup. + INFO: Application startup complete. + [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete. + INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit) + + + ``` + + +### 4. 
流式TTS 客户端使用方法 +- 命令行 (推荐使用) + + ```bash + # 访问 http 流式TTS服务 + paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav + + # 访问 websocket 流式TTS服务 + paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav + ``` + 使用帮助: + + ```bash + paddlespeech_client tts_online --help + ``` + + 参数: + - `server_ip`: 服务端ip地址,默认: 127.0.0.1。 + - `port`: 服务端口,默认: 8092。 + - `protocol`: 服务协议,可选 [http, websocket], 默认: http。 + - `input`: (必须输入): 待合成的文本。 + - `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。 + - `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0 + - `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0 + - `sample_rate`: 采样率,可选 [0, 8000, 16000],默认值:0,表示与模型采样率相同 + - `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。 + - `play`: 是否播放音频,边合成边播放, 默认值:False,表示不播放。**播放音频需要依赖pyaudio库**。 + + + 输出: + ```bash + [2022-04-24 21:08:18,559] [ INFO] - tts http client start + [2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 + [2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s + [2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s + [2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s + [2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459 + [2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav + + ``` + +- Python API + ```python + from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor + import json + + executor = TTSOnlineClientExecutor() + executor( + input="您好,欢迎使用百度飞桨语音合成服务。", + server_ip="127.0.0.1", + port=8092, + protocol="http", + spk_id=0, + speed=1.0, + volume=1.0, + sample_rate=0, + output="./output.wav", + play=False) + + ``` + + 输出: + ```bash + [2022-04-24 21:11:13,798] [ INFO] - tts http client start + [2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。 + [2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s + [2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s + [2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s + 
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
+  [2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
+
+
+  ```
+
+
diff --git a/demos/streaming_tts_server/conf/tts_online_application.yaml b/demos/streaming_tts_server/conf/tts_online_application.yaml
new file mode 100644
index 00000000..353c3e32
--- /dev/null
+++ b/demos/streaming_tts_server/conf/tts_online_application.yaml
@@ -0,0 +1,88 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 127.0.0.1
+port: 8092
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# engine_list choices = ['tts_online', 'tts_online-onnx']
+# protocol = ['websocket', 'http'] (only one can be selected).
+protocol: 'http'
+engine_list: ['tts_online-onnx']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### TTS #########################################
+################### speech task: tts; engine_type: online #######################
+tts_online:
+    # am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc']
+    am: 'fastspeech2_csmsc'
+    am_config: 
+    am_ckpt: 
+    am_stat: 
+    phones_dict: 
+    tones_dict: 
+    speaker_dict: 
+    spk_id: 0
+
+    # voc (vocoder) choices=['mb_melgan_csmsc', 'hifigan_csmsc']
+    voc: 'mb_melgan_csmsc'
+    voc_config: 
+    voc_ckpt: 
+    voc_stat: 
+
+    # others
+    lang: 'zh'
+    device: 'cpu' # set 'gpu:id' or 'cpu'
+    am_block: 42
+    am_pad: 12
+    voc_block: 14
+    voc_pad: 14
+
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### TTS 
#########################################
+################### speech task: tts; engine_type: online-onnx #######################
+tts_online-onnx:
+    # am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx']
+    am: 'fastspeech2_cnndecoder_csmsc_onnx'
+    # am_ckpt is a list; if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model];
+    # if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model];
+    am_ckpt: # list
+    am_stat: 
+    phones_dict: 
+    tones_dict: 
+    speaker_dict: 
+    spk_id: 0
+    am_sample_rate: 24000
+    am_sess_conf:
+        device: "cpu" # set 'gpu:id' or 'cpu'
+        use_trt: False
+        cpu_threads: 4
+
+    # voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
+    voc: 'hifigan_csmsc_onnx'
+    voc_ckpt: 
+    voc_sample_rate: 24000
+    voc_sess_conf:
+        device: "cpu" # set 'gpu:id' or 'cpu'
+        use_trt: False
+        cpu_threads: 4
+
+    # others
+    lang: 'zh'
+    am_block: 42
+    am_pad: 12
+    voc_block: 14
+    voc_pad: 14
+    voc_upsample: 300
+
diff --git a/demos/streaming_tts_server/start_server.sh b/demos/streaming_tts_server/start_server.sh
new file mode 100644
index 00000000..9c71f2fe
--- /dev/null
+++ b/demos/streaming_tts_server/start_server.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+# start server
+paddlespeech_server start --config_file ./conf/tts_online_application.yaml
\ No newline at end of file
diff --git a/demos/streaming_tts_server/test_client.sh b/demos/streaming_tts_server/test_client.sh
new file mode 100644
index 00000000..333ae00d
--- /dev/null
+++ b/demos/streaming_tts_server/test_client.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+# http client test
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
+
+# websocket client test
+#paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
\ No newline at end of file
diff --git a/paddlespeech/server/README.md 
b/paddlespeech/server/README.md
index 8f140e4e..98ec1e28
--- a/paddlespeech/server/README.md
+++ b/paddlespeech/server/README.md
@@ -48,3 +48,16 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
 ```
 paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
 ```
+
+## Online TTS Server
+
+### Launch online tts server
+```
+paddlespeech_server start --config_file conf/tts_online_application.yaml
+```
+
+### Access online tts server
+
+```
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" --output output.wav
+```
diff --git a/paddlespeech/server/README_cn.md b/paddlespeech/server/README_cn.md
index 91df9817..e799bca8
--- a/paddlespeech/server/README_cn.md
+++ b/paddlespeech/server/README_cn.md
@@ -49,3 +49,17 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
 ```
 paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input zh.wav
 ```
+
+## 流式TTS
+
+### 启动流式语音合成服务
+
+```
+paddlespeech_server start --config_file conf/tts_online_application.yaml
+```
+
+### 访问流式语音合成服务
+
+```
+paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" 
--output output.wav +``` diff --git a/paddlespeech/server/bin/paddlespeech_client.py b/paddlespeech/server/bin/paddlespeech_client.py index d7858be6..f006a089 100644 --- a/paddlespeech/server/bin/paddlespeech_client.py +++ b/paddlespeech/server/bin/paddlespeech_client.py @@ -35,8 +35,8 @@ from paddlespeech.server.utils.audio_process import wav2pcm from paddlespeech.server.utils.util import wav2base64 __all__ = [ - 'TTSClientExecutor', 'ASRClientExecutor', 'ASROnlineClientExecutor', - 'CLSClientExecutor' + 'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor', + 'ASROnlineClientExecutor', 'CLSClientExecutor' ] @@ -161,6 +161,116 @@ class TTSClientExecutor(BaseExecutor): return res +@cli_client_register( + name='paddlespeech_client.tts_online', + description='visit tts online service') +class TTSOnlineClientExecutor(BaseExecutor): + def __init__(self): + super(TTSOnlineClientExecutor, self).__init__() + self.parser = argparse.ArgumentParser( + prog='paddlespeech_client.tts_online', add_help=True) + self.parser.add_argument( + '--server_ip', type=str, default='127.0.0.1', help='server ip') + self.parser.add_argument( + '--port', type=int, default=8092, help='server port') + self.parser.add_argument( + '--protocol', + type=str, + default="http", + choices=["http", "websocket"], + help='server protocol') + self.parser.add_argument( + '--input', + type=str, + default=None, + help='Text to be synthesized.', + required=True) + self.parser.add_argument( + '--spk_id', type=int, default=0, help='Speaker id') + self.parser.add_argument( + '--speed', + type=float, + default=1.0, + help='Audio speed, the value should be set between 0 and 3') + self.parser.add_argument( + '--volume', + type=float, + default=1.0, + help='Audio volume, the value should be set between 0 and 3') + self.parser.add_argument( + '--sample_rate', + type=int, + default=0, + choices=[0, 8000, 16000], + help='Sampling rate, the default is the same as the model') + self.parser.add_argument( + 
'--output', type=str, default=None, help='Synthesized audio file')
+        self.parser.add_argument(
+            "--play", type=bool, help="whether to play audio", default=False)
+
+    def execute(self, argv: List[str]) -> bool:
+        args = self.parser.parse_args(argv)
+        input_ = args.input
+        server_ip = args.server_ip
+        port = args.port
+        protocol = args.protocol
+        spk_id = args.spk_id
+        speed = args.speed
+        volume = args.volume
+        sample_rate = args.sample_rate
+        output = args.output
+        play = args.play
+
+        try:
+            res = self(
+                input=input_,
+                server_ip=server_ip,
+                port=port,
+                protocol=protocol,
+                spk_id=spk_id,
+                speed=speed,
+                volume=volume,
+                sample_rate=sample_rate,
+                output=output,
+                play=play)
+            return True
+        except Exception as e:
+            logger.error("Failed to synthesize audio.")
+            return False
+
+    @stats_wrapper
+    def __call__(self,
+                 input: str,
+                 server_ip: str="127.0.0.1",
+                 port: int=8092,
+                 protocol: str="http",
+                 spk_id: int=0,
+                 speed: float=1.0,
+                 volume: float=1.0,
+                 sample_rate: int=0,
+                 output: str=None,
+                 play: bool=False):
+        """
+        Python API to call an executor. 
+ """ + + if protocol == "http": + logger.info("tts http client start") + from paddlespeech.server.utils.audio_handler import TTSHttpHandler + handler = TTSHttpHandler(server_ip, port, play) + handler.run(input, spk_id, speed, volume, sample_rate, output) + + elif protocol == "websocket": + from paddlespeech.server.utils.audio_handler import TTSWsHandler + logger.info("tts websocket client start") + handler = TTSWsHandler(server_ip, port, play) + loop = asyncio.get_event_loop() + loop.run_until_complete(handler.run(input, output)) + + else: + logger.error("Please set correct protocol, http or websocket") + + @cli_client_register( name='paddlespeech_client.asr', description='visit asr service') class ASRClientExecutor(BaseExecutor): diff --git a/paddlespeech/server/conf/tts_online_application.yaml b/paddlespeech/server/conf/tts_online_application.yaml index 10abf0d4..6214188d 100644 --- a/paddlespeech/server/conf/tts_online_application.yaml +++ b/paddlespeech/server/conf/tts_online_application.yaml @@ -10,7 +10,7 @@ port: 8092 # task choices = ['tts_online', 'tts_online-onnx'] # protocol = ['websocket', 'http'] (only one can be selected). 
protocol: 'http' -engine_list: ['tts_online'] +engine_list: ['tts_online-onnx'] ################################################################################# @@ -67,16 +67,16 @@ tts_online-onnx: am_sess_conf: device: "cpu" # set 'gpu:id' or 'cpu' use_trt: False - cpu_threads: 1 + cpu_threads: 4 # voc (vocoder) choices=['mb_melgan_csmsc_onnx, hifigan_csmsc_onnx'] - voc: 'mb_melgan_csmsc_onnx' + voc: 'hifigan_csmsc_onnx' voc_ckpt: voc_sample_rate: 24000 voc_sess_conf: device: "cpu" # set 'gpu:id' or 'cpu' use_trt: False - cpu_threads: 1 + cpu_threads: 4 # others lang: 'zh' diff --git a/paddlespeech/server/engine/tts/online/python/tts_engine.py b/paddlespeech/server/engine/tts/online/python/tts_engine.py index a050a4d4..1f51586b 100644 --- a/paddlespeech/server/engine/tts/online/python/tts_engine.py +++ b/paddlespeech/server/engine/tts/online/python/tts_engine.py @@ -202,7 +202,6 @@ class TTSServerExecutor(TTSExecutor): """ Init model and other resources from a specific path. """ - #import pdb;pdb.set_trace() if hasattr(self, 'am_inference') and hasattr(self, 'voc_inference'): logger.info('Models had been initialized.') return @@ -391,8 +390,7 @@ class TTSServerExecutor(TTSExecutor): # fastspeech2_cnndecoder_csmsc elif am == "fastspeech2_cnndecoder_csmsc": # am - orig_hs, h_masks = self.am_inference.encoder_infer( - part_phone_ids) + orig_hs = self.am_inference.encoder_infer(part_phone_ids) # streaming voc chunk info mel_len = orig_hs.shape[1] @@ -404,7 +402,7 @@ class TTSServerExecutor(TTSExecutor): hss = get_chunks(orig_hs, self.am_block, self.am_pad, "am") am_chunk_num = len(hss) for i, hs in enumerate(hss): - before_outs, _ = self.am_inference.decoder(hs) + before_outs = self.am_inference.decoder(hs) after_outs = before_outs + self.am_inference.postnet( before_outs.transpose((0, 2, 1))).transpose((0, 2, 1)) normalized_mel = after_outs[0] diff --git a/paddlespeech/server/tests/tts/online/http_client.py b/paddlespeech/server/tests/tts/online/http_client.py index 
cbc1f5c0..756f7b5b 100644 --- a/paddlespeech/server/tests/tts/online/http_client.py +++ b/paddlespeech/server/tests/tts/online/http_client.py @@ -1,4 +1,4 @@ -# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,75 +12,19 @@ # See the License for the specific language governing permissions and # limitations under the License. import argparse -import base64 -import json -import os -import time - -import requests - -from paddlespeech.server.utils.audio_process import pcm2wav - - -def save_audio(buffer, audio_path) -> bool: - if args.save_path.endswith("pcm"): - with open(args.save_path, "wb") as f: - f.write(buffer) - elif args.save_path.endswith("wav"): - with open("./tmp.pcm", "wb") as f: - f.write(buffer) - pcm2wav("./tmp.pcm", audio_path, channels=1, bits=16, sample_rate=24000) - os.system("rm ./tmp.pcm") - else: - print("Only supports saved audio format is pcm or wav") - return False - - return True - - -def test(args): - params = { - "text": args.text, - "spk_id": args.spk_id, - "speed": args.speed, - "volume": args.volume, - "sample_rate": args.sample_rate, - "save_path": '' - } - - buffer = b'' - flag = 1 - url = "http://" + str(args.server) + ":" + str( - args.port) + "/paddlespeech/streaming/tts" - st = time.time() - html = requests.post(url, json.dumps(params), stream=True) - for chunk in html.iter_content(chunk_size=1024): - chunk = base64.b64decode(chunk) # bytes - if flag: - first_response = time.time() - st - print(f"首包响应:{first_response} s") - flag = 0 - buffer += chunk - - final_response = time.time() - st - duration = len(buffer) / 2.0 / 24000 - - print(f"尾包响应:{final_response} s") - print(f"音频时长:{duration} s") - print(f"RTF: {final_response / duration}") - - if args.save_path is not None: - if save_audio(buffer, args.save_path): - 
print("音频保存至:", args.save_path) +from paddlespeech.server.utils.audio_handler import TTSHttpHandler if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument( - '--text', + "--text", type=str, - default="您好,欢迎使用语音合成服务。", - help='A sentence to be synthesized') + help="A sentence to be synthesized", + default="您好,欢迎使用语音合成服务。") + parser.add_argument( + "--server", type=str, help="server ip", default="127.0.0.1") + parser.add_argument("--port", type=int, help="server port", default=8092) parser.add_argument('--spk_id', type=int, default=0, help='Speaker id') parser.add_argument('--speed', type=float, default=1.0, help='Audio speed') parser.add_argument( @@ -89,12 +33,15 @@ if __name__ == "__main__": '--sample_rate', type=int, default=0, + choices=[0, 8000, 16000], help='Sampling rate, the default is the same as the model') parser.add_argument( - "--server", type=str, help="server ip", default="127.0.0.1") - parser.add_argument("--port", type=int, help="server port", default=8092) + "--output", type=str, help="save audio path", default=None) parser.add_argument( - "--save_path", type=str, help="save audio path", default=None) - + "--play", type=bool, help="whether to play audio", default=False) args = parser.parse_args() - test(args) + + print("tts http client start") + handler = TTSHttpHandler(args.server, args.port, args.play) + handler.run(args.text, args.spk_id, args.speed, args.volume, + args.sample_rate, args.output) diff --git a/paddlespeech/server/tests/tts/online/http_client_playaudio.py b/paddlespeech/server/tests/tts/online/http_client_playaudio.py deleted file mode 100644 index 1e7e8064..00000000 --- a/paddlespeech/server/tests/tts/online/http_client_playaudio.py +++ /dev/null @@ -1,112 +0,0 @@ -# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import argparse -import base64 -import json -import threading -import time - -import pyaudio -import requests - -mutex = threading.Lock() -buffer = b'' -p = pyaudio.PyAudio() -stream = p.open( - format=p.get_format_from_width(2), channels=1, rate=24000, output=True) -max_fail = 50 - - -def play_audio(): - global stream - global buffer - global max_fail - while True: - if not buffer: - max_fail -= 1 - time.sleep(0.05) - if max_fail < 0: - break - mutex.acquire() - stream.write(buffer) - buffer = b'' - mutex.release() - - -def test(args): - global mutex - global buffer - params = { - "text": args.text, - "spk_id": args.spk_id, - "speed": args.speed, - "volume": args.volume, - "sample_rate": args.sample_rate, - "save_path": '' - } - - all_bytes = 0.0 - t = threading.Thread(target=play_audio) - flag = 1 - url = "http://" + str(args.server) + ":" + str( - args.port) + "/paddlespeech/streaming/tts" - st = time.time() - html = requests.post(url, json.dumps(params), stream=True) - for chunk in html.iter_content(chunk_size=1024): - mutex.acquire() - chunk = base64.b64decode(chunk) # bytes - buffer += chunk - mutex.release() - if flag: - first_response = time.time() - st - print(f"首包响应:{first_response} s") - flag = 0 - t.start() - all_bytes += len(chunk) - - final_response = time.time() - st - duration = all_bytes / 2 / 24000 - - print(f"尾包响应:{final_response} s") - print(f"音频时长:{duration} s") - print(f"RTF: {final_response / duration}") - - t.join() - stream.stop_stream() - stream.close() - p.terminate() - - -if __name__ == "__main__": - parser = 
argparse.ArgumentParser() - parser.add_argument( - '--text', - type=str, - default="您好,欢迎使用语音合成服务。", - help='A sentence to be synthesized') - parser.add_argument('--spk_id', type=int, default=0, help='Speaker id') - parser.add_argument('--speed', type=float, default=1.0, help='Audio speed') - parser.add_argument( - '--volume', type=float, default=1.0, help='Audio volume') - parser.add_argument( - '--sample_rate', - type=int, - default=0, - help='Sampling rate, the default is the same as the model') - parser.add_argument( - "--server", type=str, help="server ip", default="127.0.0.1") - parser.add_argument("--port", type=int, help="server port", default=8092) - - args = parser.parse_args() - test(args) diff --git a/paddlespeech/server/tests/tts/online/ws_client.py b/paddlespeech/server/tests/tts/online/ws_client.py index eef010cf..821d82a9 100644 --- a/paddlespeech/server/tests/tts/online/ws_client.py +++ b/paddlespeech/server/tests/tts/online/ws_client.py @@ -11,92 +11,10 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
-import _thread as thread import argparse -import base64 -import json -import ssl -import time - -import websocket - -flag = 1 -st = 0.0 -all_bytes = b'' - - -class WsParam(object): - # 初始化 - def __init__(self, text, server="127.0.0.1", port=8090): - self.server = server - self.port = port - self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts" - self.text = text - - # 生成url - def create_url(self): - return self.url - - -def on_message(ws, message): - global flag - global st - global all_bytes - - try: - message = json.loads(message) - audio = message["audio"] - audio = base64.b64decode(audio) # bytes - status = message["status"] - all_bytes += audio - - if status == 0: - print("create successfully.") - elif status == 1: - if flag: - print(f"首包响应:{time.time() - st} s") - flag = 0 - elif status == 2: - final_response = time.time() - st - duration = len(all_bytes) / 2.0 / 24000 - print(f"尾包响应:{final_response} s") - print(f"音频时长:{duration} s") - print(f"RTF: {final_response / duration}") - with open("./out.pcm", "wb") as f: - f.write(all_bytes) - print("ws is closed") - ws.close() - else: - print("infer error") - - except Exception as e: - print("receive msg,but parse exception:", e) - - -# 收到websocket错误的处理 -def on_error(ws, error): - print("### error:", error) - - -# 收到websocket关闭的处理 -def on_close(ws): - print("### closed ###") - - -# 收到websocket连接建立的处理 -def on_open(ws): - def run(*args): - global st - text_base64 = str( - base64.b64encode((wsParam.text).encode('utf-8')), "UTF8") - d = {"text": text_base64} - d = json.dumps(d) - print("Start sending text data") - st = time.time() - ws.send(d) - - thread.start_new_thread(run, ()) +import asyncio +from paddlespeech.server.utils.audio_handler import TTSWsHandler if __name__ == "__main__": parser = argparse.ArgumentParser() @@ -108,19 +26,13 @@ if __name__ == "__main__": parser.add_argument( "--server", type=str, help="server ip", default="127.0.0.1") parser.add_argument("--port", type=int, help="server 
port", default=8092) + parser.add_argument( + "--output", type=str, help="save audio path", default=None) + parser.add_argument( + "--play", type=bool, help="whether to play audio", default=False) args = parser.parse_args() - print("***************************************") - print("Server ip: ", args.server) - print("Server port: ", args.port) - print("Sentence to be synthesized: ", args.text) - print("***************************************") - - wsParam = WsParam(text=args.text, server=args.server, port=args.port) - - websocket.enableTrace(False) - wsUrl = wsParam.create_url() - ws = websocket.WebSocketApp( - wsUrl, on_message=on_message, on_error=on_error, on_close=on_close) - ws.on_open = on_open - ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE}) + print("tts websocket client start") + handler = TTSWsHandler(args.server, args.port, args.play) + loop = asyncio.get_event_loop() + loop.run_until_complete(handler.run(args.text, args.output)) diff --git a/paddlespeech/server/tests/tts/online/ws_client_playaudio.py b/paddlespeech/server/tests/tts/online/ws_client_playaudio.py deleted file mode 100644 index cdeb362d..00000000 --- a/paddlespeech/server/tests/tts/online/ws_client_playaudio.py +++ /dev/null @@ -1,160 +0,0 @@ -# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import _thread as thread -import argparse -import base64 -import json -import ssl -import threading -import time - -import pyaudio -import websocket - -mutex = threading.Lock() -buffer = b'' -p = pyaudio.PyAudio() -stream = p.open( - format=p.get_format_from_width(2), channels=1, rate=24000, output=True) -flag = 1 -st = 0.0 -all_bytes = 0.0 - - -class WsParam(object): - # 初始化 - def __init__(self, text, server="127.0.0.1", port=8090): - self.server = server - self.port = port - self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts" - self.text = text - - # 生成url - def create_url(self): - return self.url - - -def play_audio(): - global stream - global buffer - while True: - time.sleep(0.05) - if not buffer: # buffer 为空 - break - mutex.acquire() - stream.write(buffer) - buffer = b'' - mutex.release() - - -t = threading.Thread(target=play_audio) - - -def on_message(ws, message): - global flag - global t - global buffer - global st - global all_bytes - - try: - message = json.loads(message) - audio = message["audio"] - audio = base64.b64decode(audio) # bytes - status = message["status"] - all_bytes += len(audio) - - if status == 0: - print("create successfully.") - elif status == 1: - mutex.acquire() - buffer += audio - mutex.release() - if flag: - print(f"首包响应:{time.time() - st} s") - flag = 0 - print("Start playing audio") - t.start() - elif status == 2: - final_response = time.time() - st - duration = all_bytes / 2 / 24000 - print(f"尾包响应:{final_response} s") - print(f"音频时长:{duration} s") - print(f"RTF: {final_response / duration}") - print("ws is closed") - ws.close() - else: - print("infer error") - - except Exception as e: - print("receive msg,but parse exception:", e) - - -# 收到websocket错误的处理 -def on_error(ws, error): - print("### error:", error) - - -# 收到websocket关闭的处理 -def on_close(ws): - print("### closed ###") - - -# 收到websocket连接建立的处理 -def on_open(ws): - def run(*args): - global st - text_base64 = str( - 
base64.b64encode((wsParam.text).encode('utf-8')), "UTF8") - d = {"text": text_base64} - d = json.dumps(d) - print("Start sending text data") - st = time.time() - ws.send(d) - - thread.start_new_thread(run, ()) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser() - parser.add_argument( - "--text", - type=str, - help="A sentence to be synthesized", - default="您好,欢迎使用语音合成服务。") - parser.add_argument( - "--server", type=str, help="server ip", default="127.0.0.1") - parser.add_argument("--port", type=int, help="server port", default=8092) - args = parser.parse_args() - - print("***************************************") - print("Server ip: ", args.server) - print("Server port: ", args.port) - print("Sentence to be synthesized: ", args.text) - print("***************************************") - - wsParam = WsParam(text=args.text, server=args.server, port=args.port) - - websocket.enableTrace(False) - wsUrl = wsParam.create_url() - ws = websocket.WebSocketApp( - wsUrl, on_message=on_message, on_error=on_error, on_close=on_close) - ws.on_open = on_open - ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE}) - - t.join() - print("End of playing audio") - stream.stop_stream() - stream.close() - p.terminate() diff --git a/paddlespeech/server/utils/audio_handler.py b/paddlespeech/server/utils/audio_handler.py index dce7d09d..c2863115 100644 --- a/paddlespeech/server/utils/audio_handler.py +++ b/paddlespeech/server/utils/audio_handler.py @@ -11,14 +11,19 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
+import base64
 import json
 import logging
+import threading
+import time
 
 import numpy as np
+import requests
 import soundfile
 import websockets
 
 from paddlespeech.cli.log import logger
+from paddlespeech.server.utils.audio_process import save_audio
 
 
 class ASRAudioHandler:
@@ -117,3 +122,221 @@ class ASRAudioHandler:
         logger.info("final receive msg={}".format(msg))
         result = msg
         return result
+
+
+class TTSWsHandler:
+    def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
+        """PaddleSpeech Online TTS Server Client audio handler
+        The online TTS server uses the websocket protocol
+        Args:
+            server (str, optional): the server ip. Defaults to "127.0.0.1".
+            port (int, optional): the server port. Defaults to 8092.
+            play (bool, optional): whether to play audio. Defaults to False.
+        """
+        self.server = server
+        self.port = port
+        self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
+        self.play = play
+        if self.play:
+            import pyaudio
+            self.buffer = b''
+            self.p = pyaudio.PyAudio()
+            self.stream = self.p.open(
+                format=self.p.get_format_from_width(2),
+                channels=1,
+                rate=24000,
+                output=True)
+            self.mutex = threading.Lock()
+            self.start_play = True
+            self.t = threading.Thread(target=self.play_audio)
+            self.max_fail = 50
+
+    def play_audio(self):
+        while True:
+            if not self.buffer:
+                self.max_fail -= 1
+                time.sleep(0.05)
+                if self.max_fail < 0:
+                    break
+            self.mutex.acquire()
+            self.stream.write(self.buffer)
+            self.buffer = b''
+            self.mutex.release()
+
+    async def run(self, text: str, output: str=None):
+        """Send a text to the online server
+
+        Args:
+            text (str): sentence to be synthesized
+            output (str): save audio path
+        """
+        all_bytes = b''
+
+        # 1. Send the websocket handshake request
+        async with websockets.connect(self.url) as ws:
+            # 2.
+            #    The server has received the handshake; send text to the engine
+            text_base64 = str(base64.b64encode((text).encode('utf-8')), "UTF8")
+            d = {"text": text_base64}
+            d = json.dumps(d)
+            st = time.time()
+            await ws.send(d)
+            logging.info("send a message to the server")
+
+            # 3. Process the received response
+            message = await ws.recv()
+            logger.info(f"sentence: {text}")
+            logger.info(f"first-packet response: {time.time() - st} s")
+            message = json.loads(message)
+            status = message["status"]
+
+            while (status == 1):
+                audio = message["audio"]
+                audio = base64.b64decode(audio)  # bytes
+                all_bytes += audio
+                if self.play:
+                    self.mutex.acquire()
+                    self.buffer += audio
+                    self.mutex.release()
+                    if self.start_play:
+                        self.t.start()
+                        self.start_play = False
+
+                message = await ws.recv()
+                message = json.loads(message)
+                status = message["status"]
+
+            # 4. The last packet carries no audio information
+            if status == 2:
+                final_response = time.time() - st
+                duration = len(all_bytes) / 2.0 / 24000
+                logger.info(f"final-packet response: {final_response} s")
+                logger.info(f"audio duration: {duration} s")
+                logger.info(f"RTF: {final_response / duration}")
+
+                if output is not None:
+                    if save_audio(all_bytes, output):
+                        logger.info(f"audio saved to: {output}")
+                    else:
+                        logger.error("save audio error")
+            else:
+                logger.error("infer error")
+
+        if self.play:
+            self.t.join()
+            self.stream.stop_stream()
+            self.stream.close()
+            self.p.terminate()
+
+
+class TTSHttpHandler:
+    def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
+        """PaddleSpeech Online TTS Server Client audio handler
+        The online TTS server uses the http protocol
+        Args:
+            server (str, optional): the server ip. Defaults to "127.0.0.1".
+            port (int, optional): the server port. Defaults to 8092.
+            play (bool, optional): whether to play audio.
+                Defaults to False.
+        """
+        self.server = server
+        self.port = port
+        self.url = "http://" + str(self.server) + ":" + str(
+            self.port) + "/paddlespeech/streaming/tts"
+        self.play = play
+
+        if self.play:
+            import pyaudio
+            self.buffer = b''
+            self.p = pyaudio.PyAudio()
+            self.stream = self.p.open(
+                format=self.p.get_format_from_width(2),
+                channels=1,
+                rate=24000,
+                output=True)
+            self.mutex = threading.Lock()
+            self.start_play = True
+            self.t = threading.Thread(target=self.play_audio)
+            self.max_fail = 50
+
+    def play_audio(self):
+        while True:
+            if not self.buffer:
+                self.max_fail -= 1
+                time.sleep(0.05)
+                if self.max_fail < 0:
+                    break
+            self.mutex.acquire()
+            self.stream.write(self.buffer)
+            self.buffer = b''
+            self.mutex.release()
+
+    def run(self,
+            text: str,
+            spk_id=0,
+            speed=1.0,
+            volume=1.0,
+            sample_rate=0,
+            output: str=None):
+        """Send a text to the tts online server
+
+        Args:
+            text (str): sentence to be synthesized.
+            spk_id (int, optional): speaker id. Defaults to 0.
+            speed (float, optional): audio speed. Defaults to 1.0.
+            volume (float, optional): audio volume. Defaults to 1.0.
+            sample_rate (int, optional): audio sample rate, 0 means the same as the model. Defaults to 0.
+            output (str, optional): save audio path. Defaults to None.
+        """
+        # 1. Create the request
+        params = {
+            "text": text,
+            "spk_id": spk_id,
+            "speed": speed,
+            "volume": volume,
+            "sample_rate": sample_rate,
+            "save_path": output
+        }
+
+        all_bytes = b''
+        first_flag = 1
+
+        # 2. Send the request
+        st = time.time()
+        html = requests.post(self.url, json.dumps(params), stream=True)
+
+        # 3.
+        #    Process the received response
+        for chunk in html.iter_content(chunk_size=1024):
+            audio = base64.b64decode(chunk)  # bytes
+            if first_flag:
+                first_response = time.time() - st
+                first_flag = 0
+
+            if self.play:
+                self.mutex.acquire()
+                self.buffer += audio
+                self.mutex.release()
+                if self.start_play:
+                    self.t.start()
+                    self.start_play = False
+            all_bytes += audio
+
+        final_response = time.time() - st
+        duration = len(all_bytes) / 2.0 / 24000
+
+        logger.info(f"sentence: {text}")
+        logger.info(f"first-packet response: {first_response} s")
+        logger.info(f"final-packet response: {final_response} s")
+        logger.info(f"audio duration: {duration} s")
+        logger.info(f"RTF: {final_response / duration}")
+
+        if output is not None:
+            if save_audio(all_bytes, output):
+                logger.info(f"audio saved to: {output}")
+            else:
+                logger.error("save audio error")
+
+        if self.play:
+            self.t.join()
+            self.stream.stop_stream()
+            self.stream.close()
+            self.p.terminate()
diff --git a/paddlespeech/server/utils/audio_process.py b/paddlespeech/server/utils/audio_process.py
index e85b9a27..c6dad889 100644
--- a/paddlespeech/server/utils/audio_process.py
+++ b/paddlespeech/server/utils/audio_process.py
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import os
 import wave
 
 import numpy as np
@@ -140,3 +141,35 @@ def pcm2float(data):
     bits = np.iinfo(np.int16).bits
     data = data / (2**(bits - 1))
     return data
+
+
+def save_audio(bytes_data, audio_path, sample_rate: int=24000) -> bool:
+    """save bytes to an audio file.
+
+    Args:
+        bytes_data (bytes): audio samples, bytes format
+        audio_path (str): save audio path
+        sample_rate (int, optional): audio sample rate. Defaults to 24000.
+
+    Returns:
+        bool: Whether the audio was saved successfully
+    """
+
+    if audio_path.endswith("pcm"):
+        with open(audio_path, "wb") as f:
+            f.write(bytes_data)
+    elif audio_path.endswith("wav"):
+        with open("./tmp.pcm", "wb") as f:
+            f.write(bytes_data)
+        pcm2wav(
+            "./tmp.pcm",
+            audio_path,
+            channels=1,
+            bits=16,
+            sample_rate=sample_rate)
+        os.system("rm ./tmp.pcm")
+    else:
+        print("Only pcm and wav audio formats are supported")
+        return False
+
+    return True
diff --git a/tests/unit/server/online/tts/check_server/conf/application.yaml b/tests/unit/server/online/tts/check_server/conf/application.yaml
index 347411b6..26cd325b 100644
--- a/tests/unit/server/online/tts/check_server/conf/application.yaml
+++ b/tests/unit/server/online/tts/check_server/conf/application.yaml
@@ -67,7 +67,7 @@ tts_online-onnx:
     am_sess_conf:
       device: "cpu"  # set 'gpu:id' or 'cpu'
       use_trt: False
-      cpu_threads: 1
+      cpu_threads: 4
 
   # voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
   voc: 'mb_melgan_csmsc_onnx'
@@ -76,7 +76,7 @@ tts_online-onnx:
     voc_sess_conf:
       device: "cpu"  # set 'gpu:id' or 'cpu'
      use_trt: False
-      cpu_threads: 1
+      cpu_threads: 4
 
   # others
   lang: 'zh'
diff --git a/tests/unit/server/online/tts/check_server/test.sh b/tests/unit/server/online/tts/check_server/test.sh
index 54e274f1..766aea85 100644
--- a/tests/unit/server/online/tts/check_server/test.sh
+++ b/tests/unit/server/online/tts/check_server/test.sh
@@ -28,7 +28,7 @@ StartService(){
 ClientTest_http(){
     for ((i=1; i<=3;i++))
     do
-        python http_client.py --save_path ./out_http.wav
+        paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。"
         ((http_test_times+=1))
     done
 }
@@ -36,7 +36,7 @@ ClientTest_http(){
 ClientTest_ws(){
     for ((i=1; i<=3;i++))
     do
-        python ws_client.py
+        paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。" --protocol websocket
         ((ws_test_times+=1))
     done
 }
@@ -71,6 +71,7 @@
 rm -rf $log/server.log.wf
 rm -rf $log/server.log
 rm -rf $log/test_result.log
+
config_file=./conf/application.yaml server_ip=$(cat $config_file | grep "host" | awk -F " " '{print $2}') port=$(cat $config_file | grep "port" | awk '/port:/ {print $2}') diff --git a/tests/unit/server/online/tts/check_server/test_all.sh b/tests/unit/server/online/tts/check_server/test_all.sh index 8e490255..b2ea6b44 100644 --- a/tests/unit/server/online/tts/check_server/test_all.sh +++ b/tests/unit/server/online/tts/check_server/test_all.sh @@ -3,6 +3,8 @@ log_all_dir=./log +cp ./tts_online_application.yaml ./conf/application.yaml -rf + bash test.sh tts_online $log_all_dir/log_tts_online_cpu python change_yaml.py --change_type engine_type --target_key engine_list --target_value tts_online-onnx diff --git a/tests/unit/server/online/tts/check_server/tts_online_application.yaml b/tests/unit/server/online/tts/check_server/tts_online_application.yaml index 347411b6..26cd325b 100644 --- a/tests/unit/server/online/tts/check_server/tts_online_application.yaml +++ b/tests/unit/server/online/tts/check_server/tts_online_application.yaml @@ -67,7 +67,7 @@ tts_online-onnx: am_sess_conf: device: "cpu" # set 'gpu:id' or 'cpu' use_trt: False - cpu_threads: 1 + cpu_threads: 4 # voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx'] voc: 'mb_melgan_csmsc_onnx' @@ -76,7 +76,7 @@ tts_online-onnx: voc_sess_conf: device: "cpu" # set 'gpu:id' or 'cpu' use_trt: False - cpu_threads: 1 + cpu_threads: 4 # others lang: 'zh'
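For reference, the websocket client added in this patch frames its traffic as JSON: the request carries the sentence base64-encoded under `"text"`, each response carries a `status` field (1 for audio chunks, 2 for the final packet) alongside base64-encoded 16-bit mono 24 kHz PCM, and duration for the RTF log is computed as `len(bytes) / 2.0 / 24000`. The sketch below exercises that framing offline; the helper names `build_request`, `decode_frame`, and `pcm_duration` are illustrative only and are not part of the PaddleSpeech API.

```python
import base64
import json


def build_request(text: str) -> str:
    # Mirror the client: UTF-8 encode, base64 encode, wrap in JSON under "text".
    text_base64 = str(base64.b64encode(text.encode("utf-8")), "UTF8")
    return json.dumps({"text": text_base64})


def decode_frame(frame: str):
    # status 1 frames carry a base64 PCM payload; status 2 marks the final,
    # audio-free packet.
    message = json.loads(frame)
    audio = base64.b64decode(message.get("audio", ""))
    return message["status"], audio


def pcm_duration(all_bytes: bytes, sample_rate: int = 24000) -> float:
    # 16-bit (2-byte) mono samples, as assumed by the handlers' duration math.
    return len(all_bytes) / 2.0 / sample_rate


# Fabricated response frame: one second of 16-bit silence at 24 kHz.
fake_frame = json.dumps({
    "status": 1,
    "audio": str(base64.b64encode(b"\x00\x00" * 24000), "UTF8"),
})
status, audio = decode_frame(fake_frame)
print(status, pcm_duration(audio))  # → 1 1.0
```

A real session would send `build_request(...)` over the websocket and loop on `decode_frame` until `status == 2`, which is exactly what `TTSWsHandler.run` does.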