Merge pull request #1771 from lym0302/add_streaming_cli

[server] add streaming tts demos
pull/1776/head
TianYuan 2 years ago committed by GitHub
commit f256bb9c0e

@ -0,0 +1,163 @@
([简体中文](./README_cn.md)|English)
# Streaming Speech Synthesis Service
## Introduction
This demo shows how to start a streaming speech synthesis service and access it, either with a single command using `paddlespeech_server` and `paddlespeech_client` or with a few lines of Python code.
## Usage
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose either the **medium** or **hard** way to install paddlespeech.
### 2. Prepare config File
The configuration file can be found in `conf/tts_online_application.yaml`.
Among them, `protocol` indicates the network protocol used by the streaming TTS service; currently both http and websocket are supported.
`engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to `tts`.
Currently, the engine type supports two forms: **online** and **online-onnx**. `online` indicates an engine that uses Python dynamic-graph inference; `online-onnx` indicates an engine that uses onnxruntime for inference, which is faster.
The streaming TTS acoustic model (AM) supports **fastspeech2** and **fastspeech2_cnndecoder**; the vocoder (Voc) supports **hifigan** and **mb_melgan**.
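If you want to confirm which protocol and engine a given config will start before launching the server, you can read the two fields directly. This is a minimal sketch, not part of the demo; it only assumes the config file above and PyYAML:
```python
# Sketch: inspect the two key fields of the streaming TTS config.
import yaml

with open("./conf/tts_online_application.yaml") as f:
    conf = yaml.safe_load(f)

print(conf["protocol"])     # 'http' or 'websocket'
print(conf["engine_list"])  # e.g. ['tts_online-onnx']
```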
### 3. Server Usage
- Command Line (Recommended)
```bash
# start the service
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
Usage:
```bash
paddlespeech_server start --help
```
Arguments:
- `config_file`: yaml file of the app. Default: ./conf/tts_online_application.yaml
- `log_file`: log file. Default: ./log/paddlespeech.log
Output:
```bash
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
[2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
INFO: Started server process [14638]
[2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
INFO: Waiting for application startup.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
[2022-04-24 21:00:17,151] [ INFO] - **********************************************************************
INFO: Started server process [320]
[2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320]
INFO: Waiting for application startup.
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
### 4. Streaming TTS client Usage
- Command Line (Recommended)
```bash
# Access http streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# Access websocket streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
Usage:
```bash
paddlespeech_client tts_online --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8092
- `protocol`: Service protocol, choices: [http, websocket], default: http.
- `input`: (required) Input text to generate.
- `spk_id`: Speaker id for multi-speaker text to speech. Default: 0
- `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
- `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same sampling rate as the model.
- `output`: Output wave file path. Default: None, which means the audio is not saved locally.
- `play`: Whether to play the audio while it is being synthesized. Default: False (do not play). **Playing audio requires the pyaudio library**.
Output:
```bash
[2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
[2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s
[2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459
[2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="http",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
Output:
```bash
[2022-04-24 21:11:13,798] [ INFO] - tts http client start
[2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
[2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```
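For reference, the HTTP streaming endpoint can also be called without the client executor. The sketch below mirrors what `TTSHttpHandler` (added in this PR in `paddlespeech/server/utils/audio_handler.py`) does internally: it posts the request fields to `/paddlespeech/streaming/tts` and base64-decodes each streamed chunk. Treat it as an illustration of the wire format rather than a separately supported API.
```python
# Sketch: call the HTTP streaming endpoint directly (based on TTSHttpHandler).
# The server answers with a stream of base64-encoded 16-bit mono PCM chunks.
import base64
import json

import requests

params = {
    "text": "您好,欢迎使用百度飞桨语音合成服务。",
    "spk_id": 0,
    "speed": 1.0,
    "volume": 1.0,
    "sample_rate": 0,   # 0 keeps the model sample rate (24 kHz here)
    "save_path": None,
}
url = "http://127.0.0.1:8092/paddlespeech/streaming/tts"

audio = b""
response = requests.post(url, json.dumps(params), stream=True)
for chunk in response.iter_content(chunk_size=1024):
    audio += base64.b64decode(chunk)  # raw PCM bytes, ready to play or save
```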

@ -0,0 +1,162 @@
([简体中文](./README_cn.md)|English)
# 流式语音合成服务
## 介绍
这个demo是一个启动流式语音合成服务和访问该服务的实现。 它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
## 使用方法
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
推荐使用 **paddlepaddle 2.2.1** 或以上版本。
你可以从 medium 和 hard 两种方式中选择一种方式安装 PaddleSpeech。
### 2. 准备配置文件
配置文件可参见 `conf/tts_online_application.yaml`
其中,`protocol`表示该流式TTS服务使用的网络协议,目前支持 http 和 websocket 两种。
其中,`engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
该demo主要介绍流式语音合成服务,因此语音任务应设置为 tts。
目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用onnxruntime进行推理的引擎。其中 online-onnx 的推理速度更快。
流式TTS的AM 模型支持:fastspeech2 以及 fastspeech2_cnndecoder;Voc 模型支持:hifigan、mb_melgan
### 3. 服务端使用方法
- 命令行 (推荐使用)
```bash
# 启动服务
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
使用方法:
```bash
paddlespeech_server start --help
```
参数:
- `config_file`: 服务的配置文件,默认: ./conf/tts_online_application.yaml
- `log_file`: log 文件. 默认:./log/paddlespeech.log
输出:
```bash
[2022-04-24 20:05:27,887] [ INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
[2022-04-24 20:05:28,038] [ INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
[2022-04-24 20:05:28,191] [ INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
[2022-04-24 20:05:28,192] [ INFO] - **********************************************************************
INFO: Started server process [14638]
[2022-04-24 20:05:28] [INFO] [server.py:75] Started server process [14638]
INFO: Waiting for application startup.
[2022-04-24 20:05:28] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 20:05:28] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 20:05:28] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
输出:
```bash
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
[2022-04-24 21:00:17,151] [ INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
[2022-04-24 21:00:17,151] [ INFO] - **********************************************************************
INFO: Started server process [320]
[2022-04-24 21:00:17] [INFO] [server.py:75] Started server process [320]
INFO: Waiting for application startup.
[2022-04-24 21:00:17] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
### 4. 流式TTS 客户端使用方法
- 命令行 (推荐使用)
```bash
# 访问 http 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# 访问 websocket 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
使用帮助:
```bash
paddlespeech_client tts_online --help
```
参数:
- `server_ip`: 服务端 ip 地址,默认: 127.0.0.1。
- `port`: 服务端口,默认: 8092。
- `protocol`: 服务协议,可选 [http, websocket], 默认: http。
- `input`: (必须输入): 待合成的文本。
- `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
- `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值: 1.0
- `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认值: 0,表示与模型采样率相同。
- `output`: 输出音频的路径, 默认值: None,表示不保存音频到本地。
- `play`: 是否播放音频,边合成边播放, 默认值: False,表示不播放。**播放音频需要依赖 pyaudio 库**。
输出:
```bash
[2022-04-24 21:08:18,559] [ INFO] - tts http client start
[2022-04-24 21:08:21,702] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:08:21,703] [ INFO] - 首包响应:0.18863153457641602 s
[2022-04-24 21:08:21,704] [ INFO] - 尾包响应:3.1427218914031982 s
[2022-04-24 21:08:21,704] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:08:21,704] [ INFO] - RTF: 0.8216266382753459
[2022-04-24 21:08:21,739] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="http",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
输出:
```bash
[2022-04-24 21:11:13,798] [ INFO] - tts http client start
[2022-04-24 21:11:16,800] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-24 21:11:16,801] [ INFO] - 首包响应:0.18234872817993164 s
[2022-04-24 21:11:16,801] [ INFO] - 尾包响应:3.0013909339904785 s
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```

@ -0,0 +1,88 @@
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 127.0.0.1
port: 8092
# The task format in the engine_list is: <speech task>_<engine type>
# engine_list choices = ['tts_online', 'tts_online-onnx']
# protocol = ['websocket', 'http'] (only one can be selected).
protocol: 'http'
engine_list: ['tts_online-onnx']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online #######################
tts_online:
# am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc']
am: 'fastspeech2_csmsc'
am_config:
am_ckpt:
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
# voc (vocoder) choices=['mb_melgan_csmsc', 'hifigan_csmsc']
voc: 'mb_melgan_csmsc'
voc_config:
voc_ckpt:
voc_stat:
# others
lang: 'zh'
device: 'cpu' # set 'gpu:id' or 'cpu'
am_block: 42
am_pad: 12
voc_block: 14
voc_pad: 14
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online-onnx #######################
tts_online-onnx:
# am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx']
am: 'fastspeech2_cnndecoder_csmsc_onnx'
# am_ckpt is a list, if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model];
# if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model];
am_ckpt: # list
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
am_sample_rate: 24000
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# others
lang: 'zh'
am_block: 42
am_pad: 12
voc_block: 14
voc_pad: 14
voc_upsample: 300
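The `am_block`/`am_pad` and `voc_block`/`voc_pad` values control how many frames each streaming chunk carries and how much extra context is padded around it (the engine passes them to `get_chunks`, see the `tts_engine.py` diff below). The sketch below is only an illustration of such a block/pad split, using a hypothetical helper name, not the actual `get_chunks` implementation:
```python
# Illustrative block/pad chunking (hypothetical helper, not paddlespeech code):
# split a sequence of `length` frames into blocks of `block` frames, each
# extended by up to `pad` frames of left/right context for the model to use.
def split_block_pad(length: int, block: int, pad: int):
    chunks = []
    start = 0
    while start < length:
        end = min(start + block, length)
        chunks.append((max(0, start - pad), min(length, end + pad)))
        start = end
    return chunks

# e.g. with the am settings above (block=42, pad=12) and 100 frames:
print(split_block_pad(100, 42, 12))  # [(0, 54), (30, 96), (72, 100)]
```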

@ -0,0 +1,3 @@
#!/bin/bash
# start server
paddlespeech_server start --config_file ./conf/tts_online_application.yaml

@ -0,0 +1,7 @@
#!/bin/bash
# http client test
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
# websocket client test
#paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav

@ -48,3 +48,16 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```
## Online TTS Server
### Launch online tts server
```
paddlespeech_server start --config_file conf/tts_online_application.yaml
```
### Access online tts server
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```

@ -49,3 +49,17 @@ paddlespeech_server start --config_file conf/ws_conformer_application.yaml
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input zh.wav
```
## 流式TTS
### 启动流式语音合成服务
```
paddlespeech_server start --config_file conf/tts_online_application.yaml
```
### 访问流式语音合成服务
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```

@ -35,8 +35,8 @@ from paddlespeech.server.utils.audio_process import wav2pcm
from paddlespeech.server.utils.util import wav2base64
__all__ = [
- 'TTSClientExecutor', 'ASRClientExecutor', 'ASROnlineClientExecutor',
- 'CLSClientExecutor'
+ 'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor',
+ 'ASROnlineClientExecutor', 'CLSClientExecutor'
]
@ -161,6 +161,116 @@ class TTSClientExecutor(BaseExecutor):
return res
@cli_client_register(
name='paddlespeech_client.tts_online',
description='visit tts online service')
class TTSOnlineClientExecutor(BaseExecutor):
def __init__(self):
super(TTSOnlineClientExecutor, self).__init__()
self.parser = argparse.ArgumentParser(
prog='paddlespeech_client.tts_online', add_help=True)
self.parser.add_argument(
'--server_ip', type=str, default='127.0.0.1', help='server ip')
self.parser.add_argument(
'--port', type=int, default=8092, help='server port')
self.parser.add_argument(
'--protocol',
type=str,
default="http",
choices=["http", "websocket"],
help='server protocol')
self.parser.add_argument(
'--input',
type=str,
default=None,
help='Text to be synthesized.',
required=True)
self.parser.add_argument(
'--spk_id', type=int, default=0, help='Speaker id')
self.parser.add_argument(
'--speed',
type=float,
default=1.0,
help='Audio speed, the value should be set between 0 and 3')
self.parser.add_argument(
'--volume',
type=float,
default=1.0,
help='Audio volume, the value should be set between 0 and 3')
self.parser.add_argument(
'--sample_rate',
type=int,
default=0,
choices=[0, 8000, 16000],
help='Sampling rate, the default is the same as the model')
self.parser.add_argument(
'--output', type=str, default=None, help='Synthesized audio file')
self.parser.add_argument(
"--play", type=bool, help="whether to play audio", default=False)
def execute(self, argv: List[str]) -> bool:
args = self.parser.parse_args(argv)
input_ = args.input
server_ip = args.server_ip
port = args.port
protocol = args.protocol
spk_id = args.spk_id
speed = args.speed
volume = args.volume
sample_rate = args.sample_rate
output = args.output
play = args.play
try:
res = self(
input=input_,
server_ip=server_ip,
port=port,
protocol=protocol,
spk_id=spk_id,
speed=speed,
volume=volume,
sample_rate=sample_rate,
output=output,
play=play)
return True
except Exception as e:
logger.error("Failed to synthesized audio.")
return False
@stats_wrapper
def __call__(self,
input: str,
server_ip: str="127.0.0.1",
port: int=8092,
protocol: str="http",
spk_id: int=0,
speed: float=1.0,
volume: float=1.0,
sample_rate: int=0,
output: str=None,
play: bool=False):
"""
Python API to call an executor.
"""
if protocol == "http":
logger.info("tts http client start")
from paddlespeech.server.utils.audio_handler import TTSHttpHandler
handler = TTSHttpHandler(server_ip, port, play)
handler.run(input, spk_id, speed, volume, sample_rate, output)
elif protocol == "websocket":
from paddlespeech.server.utils.audio_handler import TTSWsHandler
logger.info("tts websocket client start")
handler = TTSWsHandler(server_ip, port, play)
loop = asyncio.get_event_loop()
loop.run_until_complete(handler.run(input, output))
else:
logger.error("Please set correct protocol, http or websocket")
@cli_client_register(
name='paddlespeech_client.asr', description='visit asr service')
class ASRClientExecutor(BaseExecutor):

@ -10,7 +10,7 @@ port: 8092
# task choices = ['tts_online', 'tts_online-onnx']
# protocol = ['websocket', 'http'] (only one can be selected).
protocol: 'http'
- engine_list: ['tts_online']
+ engine_list: ['tts_online-onnx']
#################################################################################
@ -67,16 +67,16 @@ tts_online-onnx:
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx, hifigan_csmsc_onnx']
- voc: 'mb_melgan_csmsc_onnx'
+ voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# others
lang: 'zh'

@ -202,7 +202,6 @@ class TTSServerExecutor(TTSExecutor):
""" """
Init model and other resources from a specific path. Init model and other resources from a specific path.
""" """
#import pdb;pdb.set_trace()
if hasattr(self, 'am_inference') and hasattr(self, 'voc_inference'): if hasattr(self, 'am_inference') and hasattr(self, 'voc_inference'):
logger.info('Models had been initialized.') logger.info('Models had been initialized.')
return return
@ -391,8 +390,7 @@ class TTSServerExecutor(TTSExecutor):
# fastspeech2_cnndecoder_csmsc
elif am == "fastspeech2_cnndecoder_csmsc":
# am
- orig_hs, h_masks = self.am_inference.encoder_infer(
- part_phone_ids)
+ orig_hs = self.am_inference.encoder_infer(part_phone_ids)
# streaming voc chunk info
mel_len = orig_hs.shape[1]
@ -404,7 +402,7 @@ class TTSServerExecutor(TTSExecutor):
hss = get_chunks(orig_hs, self.am_block, self.am_pad, "am")
am_chunk_num = len(hss)
for i, hs in enumerate(hss):
- before_outs, _ = self.am_inference.decoder(hs)
+ before_outs = self.am_inference.decoder(hs)
after_outs = before_outs + self.am_inference.postnet(
before_outs.transpose((0, 2, 1))).transpose((0, 2, 1))
normalized_mel = after_outs[0]

@ -1,4 +1,4 @@
- # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+ # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -12,75 +12,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import base64
import json
import os
import time
import requests
from paddlespeech.server.utils.audio_process import pcm2wav
def save_audio(buffer, audio_path) -> bool:
if args.save_path.endswith("pcm"):
with open(args.save_path, "wb") as f:
f.write(buffer)
elif args.save_path.endswith("wav"):
with open("./tmp.pcm", "wb") as f:
f.write(buffer)
pcm2wav("./tmp.pcm", audio_path, channels=1, bits=16, sample_rate=24000)
os.system("rm ./tmp.pcm")
else:
print("Only supports saved audio format is pcm or wav")
return False
return True
def test(args):
params = {
"text": args.text,
"spk_id": args.spk_id,
"speed": args.speed,
"volume": args.volume,
"sample_rate": args.sample_rate,
"save_path": ''
}
buffer = b''
flag = 1
url = "http://" + str(args.server) + ":" + str(
args.port) + "/paddlespeech/streaming/tts"
st = time.time()
html = requests.post(url, json.dumps(params), stream=True)
for chunk in html.iter_content(chunk_size=1024):
chunk = base64.b64decode(chunk) # bytes
if flag:
first_response = time.time() - st
print(f"首包响应:{first_response} s")
flag = 0
buffer += chunk
final_response = time.time() - st
duration = len(buffer) / 2.0 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
if args.save_path is not None:
if save_audio(buffer, args.save_path):
print("音频保存至:", args.save_path)
from paddlespeech.server.utils.audio_handler import TTSHttpHandler
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
- '--text',
+ "--text",
type=str,
- default="您好,欢迎使用语音合成服务。",
- help='A sentence to be synthesized')
+ help="A sentence to be synthesized",
+ default="您好,欢迎使用语音合成服务。")
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
parser.add_argument(
@ -89,12 +33,15 @@ if __name__ == "__main__":
'--sample_rate',
type=int,
default=0,
choices=[0, 8000, 16000],
help='Sampling rate, the default is the same as the model')
parser.add_argument(
- "--server", type=str, help="server ip", default="127.0.0.1")
+ "--output", type=str, help="save audio path", default=None)
parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument(
- "--save_path", type=str, help="save audio path", default=None)
+ "--play", type=bool, help="whether to play audio", default=False)
args = parser.parse_args()
test(args)
print("tts http client start")
handler = TTSHttpHandler(args.server, args.port, args.play)
handler.run(args.text, args.spk_id, args.speed, args.volume,
args.sample_rate, args.output)

@ -1,112 +0,0 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import base64
import json
import threading
import time
import pyaudio
import requests
mutex = threading.Lock()
buffer = b''
p = pyaudio.PyAudio()
stream = p.open(
format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
max_fail = 50
def play_audio():
global stream
global buffer
global max_fail
while True:
if not buffer:
max_fail -= 1
time.sleep(0.05)
if max_fail < 0:
break
mutex.acquire()
stream.write(buffer)
buffer = b''
mutex.release()
def test(args):
global mutex
global buffer
params = {
"text": args.text,
"spk_id": args.spk_id,
"speed": args.speed,
"volume": args.volume,
"sample_rate": args.sample_rate,
"save_path": ''
}
all_bytes = 0.0
t = threading.Thread(target=play_audio)
flag = 1
url = "http://" + str(args.server) + ":" + str(
args.port) + "/paddlespeech/streaming/tts"
st = time.time()
html = requests.post(url, json.dumps(params), stream=True)
for chunk in html.iter_content(chunk_size=1024):
mutex.acquire()
chunk = base64.b64decode(chunk) # bytes
buffer += chunk
mutex.release()
if flag:
first_response = time.time() - st
print(f"首包响应:{first_response} s")
flag = 0
t.start()
all_bytes += len(chunk)
final_response = time.time() - st
duration = all_bytes / 2 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
t.join()
stream.stop_stream()
stream.close()
p.terminate()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--text',
type=str,
default="您好,欢迎使用语音合成服务。",
help='A sentence to be synthesized')
parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
parser.add_argument(
'--volume', type=float, default=1.0, help='Audio volume')
parser.add_argument(
'--sample_rate',
type=int,
default=0,
help='Sampling rate, the default is the same as the model')
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
args = parser.parse_args()
test(args)

@ -11,92 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import _thread as thread
import argparse
- import base64
+ import asyncio
import json
import ssl
import time
import websocket
flag = 1
st = 0.0
all_bytes = b''
class WsParam(object):
# 初始化
def __init__(self, text, server="127.0.0.1", port=8090):
self.server = server
self.port = port
self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
self.text = text
# 生成url
def create_url(self):
return self.url
def on_message(ws, message):
global flag
global st
global all_bytes
try:
message = json.loads(message)
audio = message["audio"]
audio = base64.b64decode(audio) # bytes
status = message["status"]
all_bytes += audio
if status == 0:
print("create successfully.")
elif status == 1:
if flag:
print(f"首包响应:{time.time() - st} s")
flag = 0
elif status == 2:
final_response = time.time() - st
duration = len(all_bytes) / 2.0 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
with open("./out.pcm", "wb") as f:
f.write(all_bytes)
print("ws is closed")
ws.close()
else:
print("infer error")
except Exception as e:
print("receive msg,but parse exception:", e)
# 收到websocket错误的处理
def on_error(ws, error):
print("### error:", error)
# 收到websocket关闭的处理
def on_close(ws):
print("### closed ###")
# 收到websocket连接建立的处理
def on_open(ws):
def run(*args):
global st
text_base64 = str(
base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
d = {"text": text_base64}
d = json.dumps(d)
print("Start sending text data")
st = time.time()
ws.send(d)
thread.start_new_thread(run, ())
from paddlespeech.server.utils.audio_handler import TTSWsHandler
if __name__ == "__main__":
parser = argparse.ArgumentParser()
@ -108,19 +26,13 @@ if __name__ == "__main__":
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
parser.add_argument(
"--output", type=str, help="save audio path", default=None)
parser.add_argument(
"--play", type=bool, help="whether to play audio", default=False)
args = parser.parse_args()
- print("***************************************")
- print("Server ip: ", args.server)
- print("Server port: ", args.port)
- print("Sentence to be synthesized: ", args.text)
+ print("tts websocket client start")
+ handler = TTSWsHandler(args.server, args.port, args.play)
+ loop = asyncio.get_event_loop()
+ loop.run_until_complete(handler.run(args.text, args.output))
print("***************************************")
wsParam = WsParam(text=args.text, server=args.server, port=args.port)
websocket.enableTrace(False)
wsUrl = wsParam.create_url()
ws = websocket.WebSocketApp(
wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
ws.on_open = on_open
ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

@ -1,160 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import _thread as thread
import argparse
import base64
import json
import ssl
import threading
import time
import pyaudio
import websocket
mutex = threading.Lock()
buffer = b''
p = pyaudio.PyAudio()
stream = p.open(
format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
flag = 1
st = 0.0
all_bytes = 0.0
class WsParam(object):
# 初始化
def __init__(self, text, server="127.0.0.1", port=8090):
self.server = server
self.port = port
self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
self.text = text
# 生成url
def create_url(self):
return self.url
def play_audio():
global stream
global buffer
while True:
time.sleep(0.05)
if not buffer: # buffer 为空
break
mutex.acquire()
stream.write(buffer)
buffer = b''
mutex.release()
t = threading.Thread(target=play_audio)
def on_message(ws, message):
global flag
global t
global buffer
global st
global all_bytes
try:
message = json.loads(message)
audio = message["audio"]
audio = base64.b64decode(audio) # bytes
status = message["status"]
all_bytes += len(audio)
if status == 0:
print("create successfully.")
elif status == 1:
mutex.acquire()
buffer += audio
mutex.release()
if flag:
print(f"首包响应:{time.time() - st} s")
flag = 0
print("Start playing audio")
t.start()
elif status == 2:
final_response = time.time() - st
duration = all_bytes / 2 / 24000
print(f"尾包响应:{final_response} s")
print(f"音频时长:{duration} s")
print(f"RTF: {final_response / duration}")
print("ws is closed")
ws.close()
else:
print("infer error")
except Exception as e:
print("receive msg,but parse exception:", e)
# 收到websocket错误的处理
def on_error(ws, error):
print("### error:", error)
# 收到websocket关闭的处理
def on_close(ws):
print("### closed ###")
# 收到websocket连接建立的处理
def on_open(ws):
def run(*args):
global st
text_base64 = str(
base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
d = {"text": text_base64}
d = json.dumps(d)
print("Start sending text data")
st = time.time()
ws.send(d)
thread.start_new_thread(run, ())
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--text",
type=str,
help="A sentence to be synthesized",
default="您好,欢迎使用语音合成服务。")
parser.add_argument(
"--server", type=str, help="server ip", default="127.0.0.1")
parser.add_argument("--port", type=int, help="server port", default=8092)
args = parser.parse_args()
print("***************************************")
print("Server ip: ", args.server)
print("Server port: ", args.port)
print("Sentence to be synthesized: ", args.text)
print("***************************************")
wsParam = WsParam(text=args.text, server=args.server, port=args.port)
websocket.enableTrace(False)
wsUrl = wsParam.create_url()
ws = websocket.WebSocketApp(
wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
ws.on_open = on_open
ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
t.join()
print("End of playing audio")
stream.stop_stream()
stream.close()
p.terminate()

@ -11,14 +11,19 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import json
import logging
import threading
import time
import numpy as np
import requests
import soundfile
import websockets
from paddlespeech.cli.log import logger
from paddlespeech.server.utils.audio_process import save_audio
class ASRAudioHandler:
@ -117,3 +122,221 @@ class ASRAudioHandler:
logger.info("final receive msg={}".format(msg)) logger.info("final receive msg={}".format(msg))
result = msg result = msg
return result return result
class TTSWsHandler:
def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
"""PaddleSpeech Online TTS Server Client audio handler
Online TTS server uses the websocket protocol
Args:
server (str, optional): the server ip. Defaults to "127.0.0.1".
port (int, optional): the server port. Defaults to 8092.
play (bool, optional): whether to play audio. Defaults to False.
"""
self.server = server
self.port = port
self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
self.play = play
if self.play:
import pyaudio
self.buffer = b''
self.p = pyaudio.PyAudio()
self.stream = self.p.open(
format=self.p.get_format_from_width(2),
channels=1,
rate=24000,
output=True)
self.mutex = threading.Lock()
self.start_play = True
self.t = threading.Thread(target=self.play_audio)
self.max_fail = 50
def play_audio(self):
while True:
if not self.buffer:
self.max_fail -= 1
time.sleep(0.05)
if self.max_fail < 0:
break
self.mutex.acquire()
self.stream.write(self.buffer)
self.buffer = b''
self.mutex.release()
async def run(self, text: str, output: str=None):
"""Send a text to online server
Args:
text (str): sentence to be synthesized
output (str): save audio path
"""
all_bytes = b''
# 1. Send websocket handshake request
async with websockets.connect(self.url) as ws:
# 2. Server has already received the handshake, send the text to the engine
# send text to engine
text_base64 = str(base64.b64encode((text).encode('utf-8')), "UTF8")
d = {"text": text_base64}
d = json.dumps(d)
st = time.time()
await ws.send(d)
logging.info("send a message to the server")
# 3. Process the received response
message = await ws.recv()
logger.info(f"句子:{text}")
logger.info(f"首包响应:{time.time() - st} s")
message = json.loads(message)
status = message["status"]
while (status == 1):
audio = message["audio"]
audio = base64.b64decode(audio) # bytes
all_bytes += audio
if self.play:
self.mutex.acquire()
self.buffer += audio
self.mutex.release()
if self.start_play:
self.t.start()
self.start_play = False
message = await ws.recv()
message = json.loads(message)
status = message["status"]
# 4. Last packet, no audio information
if status == 2:
final_response = time.time() - st
duration = len(all_bytes) / 2.0 / 24000
logger.info(f"尾包响应:{final_response} s")
logger.info(f"音频时长:{duration} s")
logger.info(f"RTF: {final_response / duration}")
if output is not None:
if save_audio(all_bytes, output):
logger.info(f"音频保存至:{output}")
else:
logger.error("save audio error")
else:
logger.error("infer error")
if self.play:
self.t.join()
self.stream.stop_stream()
self.stream.close()
self.p.terminate()
class TTSHttpHandler:
def __init__(self, server="127.0.0.1", port=8092, play: bool=False):
"""PaddleSpeech Online TTS Server Client audio handler
Online TTS server uses the http protocol
Args:
server (str, optional): the server ip. Defaults to "127.0.0.1".
port (int, optional): the server port. Defaults to 8092.
play (bool, optional): whether to play audio. Defaults to False.
"""
self.server = server
self.port = port
self.url = "http://" + str(self.server) + ":" + str(
self.port) + "/paddlespeech/streaming/tts"
self.play = play
if self.play:
import pyaudio
self.buffer = b''
self.p = pyaudio.PyAudio()
self.stream = self.p.open(
format=self.p.get_format_from_width(2),
channels=1,
rate=24000,
output=True)
self.mutex = threading.Lock()
self.start_play = True
self.t = threading.Thread(target=self.play_audio)
self.max_fail = 50
def play_audio(self):
while True:
if not self.buffer:
self.max_fail -= 1
time.sleep(0.05)
if self.max_fail < 0:
break
self.mutex.acquire()
self.stream.write(self.buffer)
self.buffer = b''
self.mutex.release()
def run(self,
text: str,
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output: str=None):
"""Send a text to tts online server
Args:
text (str): sentence to be synthesized.
spk_id (int, optional): speaker id. Defaults to 0.
speed (float, optional): audio speed. Defaults to 1.0.
volume (float, optional): audio volume. Defaults to 1.0.
sample_rate (int, optional): audio sample rate, 0 means the same as model. Defaults to 0.
output (str, optional): save audio path. Defaults to None.
"""
# 1. Create request
params = {
"text": text,
"spk_id": spk_id,
"speed": speed,
"volume": volume,
"sample_rate": sample_rate,
"save_path": output
}
all_bytes = b''
first_flag = 1
# 2. Send request
st = time.time()
html = requests.post(self.url, json.dumps(params), stream=True)
# 3. Process the received response
for chunk in html.iter_content(chunk_size=1024):
audio = base64.b64decode(chunk) # bytes
if first_flag:
first_response = time.time() - st
first_flag = 0
if self.play:
self.mutex.acquire()
self.buffer += audio
self.mutex.release()
if self.start_play:
self.t.start()
self.start_play = False
all_bytes += audio
final_response = time.time() - st
duration = len(all_bytes) / 2.0 / 24000
logger.info(f"句子:{text}")
logger.info(f"首包响应:{first_response} s")
logger.info(f"尾包响应:{final_response} s")
logger.info(f"音频时长:{duration} s")
logger.info(f"RTF: {final_response / duration}")
if output is not None:
if save_audio(all_bytes, output):
logger.info(f"音频保存至:{output}")
else:
logger.error("save audio error")
if self.play:
self.t.join()
self.stream.stop_stream()
self.stream.close()
self.p.terminate()
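For reference, the rewritten `ws_client.py` in this PR drives `TTSWsHandler` from an asyncio event loop; a minimal usage sketch:
```python
# Minimal usage sketch for TTSWsHandler (mirrors ws_client.py in this PR).
import asyncio

from paddlespeech.server.utils.audio_handler import TTSWsHandler

handler = TTSWsHandler(server="127.0.0.1", port=8092, play=False)
loop = asyncio.get_event_loop()
loop.run_until_complete(
    handler.run("您好,欢迎使用百度飞桨语音合成服务。", "./output.wav"))
```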

@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import wave
import numpy as np
@ -140,3 +141,35 @@ def pcm2float(data):
bits = np.iinfo(np.int16).bits
data = data / (2**(bits - 1))
return data
def save_audio(bytes_data, audio_path, sample_rate: int=24000) -> bool:
"""save byte to audio file.
Args:
bytes_data (bytes): audio samples, bytes format
audio_path (str): save audio path
sample_rate (int, optional): audio sample rate. Defaults to 24000.
Returns:
bool: Whether the audio was saved successfully
"""
if audio_path.endswith("pcm"):
with open(audio_path, "wb") as f:
f.write(bytes_data)
elif audio_path.endswith("wav"):
with open("./tmp.pcm", "wb") as f:
f.write(bytes_data)
pcm2wav(
"./tmp.pcm",
audio_path,
channels=1,
bits=16,
sample_rate=sample_rate)
os.system("rm ./tmp.pcm")
else:
print("Only supports saved audio format is pcm or wav")
return False
return True
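A short usage sketch for the `save_audio` helper above (the buffer of silence is just stand-in data for real streamed PCM):
```python
# Usage sketch for save_audio: write 16-bit mono PCM bytes to a wav file.
from paddlespeech.server.utils.audio_process import save_audio

pcm_bytes = b"\x00\x00" * 24000  # one second of silence at 24 kHz (stand-in data)
if save_audio(pcm_bytes, "./silence.wav", sample_rate=24000):
    print("saved ./silence.wav")
```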

@ -67,7 +67,7 @@ tts_online-onnx:
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
voc: 'mb_melgan_csmsc_onnx'
@ -76,7 +76,7 @@ tts_online-onnx:
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# others
lang: 'zh'

@ -28,7 +28,7 @@ StartService(){
ClientTest_http(){
for ((i=1; i<=3;i++))
do
- python http_client.py --save_path ./out_http.wav
+ paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。"
((http_test_times+=1))
done
}
@ -36,7 +36,7 @@ ClientTest_http(){
ClientTest_ws(){
for ((i=1; i<=3;i++))
do
- python ws_client.py
+ paddlespeech_client tts_online --input "您好,欢迎使用百度飞桨深度学习框架。" --protocol websocket
((ws_test_times+=1))
done
}
@ -71,6 +71,7 @@ rm -rf $log/server.log.wf
rm -rf $log/server.log
rm -rf $log/test_result.log
config_file=./conf/application.yaml
server_ip=$(cat $config_file | grep "host" | awk -F " " '{print $2}')
port=$(cat $config_file | grep "port" | awk '/port:/ {print $2}')

@ -3,6 +3,8 @@
log_all_dir=./log
cp ./tts_online_application.yaml ./conf/application.yaml -rf
bash test.sh tts_online $log_all_dir/log_tts_online_cpu
python change_yaml.py --change_type engine_type --target_key engine_list --target_value tts_online-onnx python change_yaml.py --change_type engine_type --target_key engine_list --target_value tts_online-onnx

@ -67,7 +67,7 @@ tts_online-onnx:
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
voc: 'mb_melgan_csmsc_onnx'
@ -76,7 +76,7 @@ tts_online-onnx:
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
- cpu_threads: 1
+ cpu_threads: 4
# others
lang: 'zh'
