[server] update readme (#1851)

* update readme, test=doc

* update readme, test=doc

* update readme, test=doc
pull/1856/head
liangym 3 years ago committed by GitHub
parent 6f7c3d6c85
commit e87495f045
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -10,7 +10,7 @@ This demo is an implementation of starting the voice service and accessing the s
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above. It is recommended to use **paddlepaddle 2.2.2** or above.
You can install paddlespeech using either the medium or the hard method. You can install paddlespeech using either the medium or the hard method.
### 2. Prepare config File ### 2. Prepare config File
@ -18,6 +18,7 @@ The configuration file can be found in `conf/application.yaml` .
Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`. Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to speech) and cls (audio classification). At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to speech) and cls (audio classification).
Currently the engine type supports two forms: python and inference (Paddle Inference) Currently the engine type supports two forms: python and inference (Paddle Inference)
**Note:** If the service starts normally inside a container but clients cannot reach its IP address, try replacing the `host` address in the configuration file with the local IP address.
The input to the ASR client demo should be a WAV file (`.wav`), and its sample rate must match the model's. The input to the ASR client demo should be a WAV file (`.wav`), and its sample rate must match the model's.
@ -51,8 +52,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete. [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
``` ```
@ -74,8 +75,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete. [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
``` ```

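The ASR client note above requires the WAV sample rate to match the model. A minimal sketch of checking this before sending a file, using only Python's standard `wave` module. The helper names and the 16000 Hz figure are illustrative assumptions, not part of PaddleSpeech:

```python
import io
import wave


def wav_sample_rate(source) -> int:
    """Return the sample rate (Hz) stored in a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(source, model_sample_rate: int) -> bool:
    """True when the WAV sample rate equals the rate the model expects."""
    return wav_sample_rate(source) == model_sample_rate


def make_silent_wav(sample_rate: int, n_frames: int = 160) -> bytes:
    """Build a tiny mono 16-bit silent WAV in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


if __name__ == "__main__":
    # 16000 is an assumed model rate; use the rate of the model you serve.
    audio = io.BytesIO(make_silent_wav(16000))
    print(matches_model_rate(audio, 16000))
```

For a real file, pass its path instead of the in-memory buffer, for example `matches_model_rate("zh.wav", 16000)`.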
@ -1,17 +1,17 @@
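([Simplified Chinese](./README_cn.md)|English) (Simplified Chinese|[English](./README.md))
# Speech Service # Speech Service
## Introduction ## Introduction
This demo is an implementation of starting the speech service and accessing the service. It can be done with a single command using `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code. This demo is an implementation of starting the offline speech service and accessing the service. It can be done with a single command using `paddlespeech_server` and `paddlespeech_client`, or with a few lines of Python code.
## Usage ## Usage
### 1. Installation ### 1. Installation
See the [installation documentation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). See the [installation documentation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above. It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one of the three methods (medium, hard) to install PaddleSpeech. You can choose one of the two methods (medium, hard) to install PaddleSpeech.
### 2. Prepare the Configuration File ### 2. Prepare the Configuration File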
@ -19,9 +19,10 @@
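Here, `engine_list` indicates the speech engines that the service to be started will include, in the format `<speech task>_<engine type>`. Here, `engine_list` indicates the speech engines that the service to be started will include, in the format `<speech task>_<engine type>`.
The speech tasks currently integrated into the service are: asr (speech recognition), tts (speech synthesis), and cls (audio classification). The speech tasks currently integrated into the service are: asr (speech recognition), tts (speech synthesis), and cls (audio classification).
Two engine types are currently supported: python and inference (Paddle Inference). Two engine types are currently supported: python and inference (Paddle Inference).
**Note:** If the service starts normally inside a container but clients cannot reach its IP address, try replacing the `host` address in the configuration file with the local IP address.
The input to this ASR client should be a WAV file (`.wav`), and its sample rate must match the model's sample rate. The input to the ASR client is a WAV file (`.wav`), and its sample rate must match the model's sample rate.
You can download a sample audio file for this ASR client: You can download a sample audio file for this ASR client: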
```bash ```bash
@ -52,8 +53,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete. [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
``` ```
@ -75,8 +76,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup. [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete. INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete. [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit) [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
``` ```

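The note about the `host` address concerns clients that cannot reach the running service. A small sketch of a TCP reachability check that can help confirm this from the client side. It uses only Python's standard `socket` module; the function name is hypothetical, and the address and port are the ones shown in the startup log above:

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace with the `host` and `port` from your own configuration file.
    print(is_reachable("127.0.0.1", 8090))
```

A `False` result from another machine or from outside the container, while the server log shows a successful startup, is consistent with the situation the note describes.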
@ -1,4 +1,4 @@
# This is the parameter configuration file for PaddleSpeech Serving. # This is the parameter configuration file for PaddleSpeech Offline Serving.
################################################################################# #################################################################################
# SERVER SETTING # # SERVER SETTING #
@ -7,8 +7,8 @@ host: 127.0.0.1
port: 8090 port: 8090
# The task format in the engine_list is: <speech task>_<engine type> # The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference'] # task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference', 'cls_python', 'cls_inference']
protocol: 'http'
engine_list: ['asr_python', 'tts_python', 'cls_python'] engine_list: ['asr_python', 'tts_python', 'cls_python']

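The `<speech task>_<engine type>` format and the task choices listed in the configuration comment can be checked before starting the server. A minimal sketch, assuming the choices are exactly those in the comment above; the function names are hypothetical and not part of PaddleSpeech:

```python
# Choices copied from the "task choices" comment in the configuration file.
TASK_CHOICES = [
    "asr_python", "asr_inference",
    "tts_python", "tts_inference",
    "cls_python", "cls_inference",
]


def split_engine(entry: str) -> tuple:
    """Split '<speech task>_<engine type>' at the first underscore."""
    task, separator, engine_type = entry.partition("_")
    if not separator or not task or not engine_type:
        raise ValueError(f"not in <speech task>_<engine type> format: {entry!r}")
    return task, engine_type


def validate_engine_list(engine_list: list) -> list:
    """Reject unknown entries, then return (task, engine type) pairs."""
    unknown = [entry for entry in engine_list if entry not in TASK_CHOICES]
    if unknown:
        raise ValueError(f"unsupported engines: {unknown}")
    return [split_engine(entry) for entry in engine_list]


if __name__ == "__main__":
    print(validate_engine_list(["asr_python", "tts_python", "cls_python"]))
```

Splitting at the first underscore keeps engine types that themselves contain other punctuation intact.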
@ -10,7 +10,7 @@ This demo is an implementation of starting the streaming speech synthesis servic
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above. It is recommended to use **paddlepaddle 2.2.2** or above.
You can install paddlespeech using either the medium or the hard method. You can install paddlespeech using either the medium or the hard method.
@ -29,6 +29,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- When the voc model is mb_melgan, voc_pad=14 makes the streaming synthesized audio identical to the non-streaming synthesized audio; voc_pad can be lowered to 7 without audible artifacts, but below 7 the synthesized audio sounds abnormal. - When the voc model is mb_melgan, voc_pad=14 makes the streaming synthesized audio identical to the non-streaming synthesized audio; voc_pad can be lowered to 7 without audible artifacts, but below 7 the synthesized audio sounds abnormal.
- When the voc model is hifigan, voc_pad=20 makes the streaming synthesized audio identical to the non-streaming synthesized audio; with voc_pad=14 the synthesized audio has no audible artifacts. - When the voc model is hifigan, voc_pad=20 makes the streaming synthesized audio identical to the non-streaming synthesized audio; with voc_pad=14 the synthesized audio has no audible artifacts.
- Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan - Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan
- **Note:** If the service starts normally inside a container but clients cannot reach its IP address, try replacing the `host` address in the configuration file with the local IP address.
### 3. Streaming speech synthesis server and client using http protocol ### 3. Streaming speech synthesis server and client using http protocol
@ -120,6 +122,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model. - `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model.
- `output`: Output wave filepath. Default: None, which means the audio is not saved locally. - `output`: Output wave filepath. Default: None, which means the audio is not saved locally.
- `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**. - `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` currently have no effect in the streaming speech synthesis service.
Output: Output:
```bash ```bash
@ -254,6 +257,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model. - `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model.
- `output`: Output wave filepath. Default: None, which means the audio is not saved locally. - `output`: Output wave filepath. Default: None, which means the audio is not saved locally.
- `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**. - `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` currently have no effect in the streaming speech synthesis service.
Output: Output:

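The `voc_pad` guidance in the configuration notes above can be written down as a small lookup. This is only a sketch: the thresholds are the documented values, while the function name, the labels, and the treatment of values between or above the documented points (assumed to behave like the nearest lower documented point) are assumptions:

```python
# Documented voc_pad values per voc model.
VOC_PAD_GUIDE = {
    "mb_melgan": {"identical": 14, "no_artifacts": 7},
    "hifigan": {"identical": 20, "no_artifacts": 14},
}


def describe_voc_pad(voc: str, voc_pad: int) -> str:
    """Classify a voc_pad setting according to the documented guidance."""
    guide = VOC_PAD_GUIDE[voc]
    if voc_pad >= guide["identical"]:
        # Streaming output matches non-streaming output.
        return "identical"
    if voc_pad >= guide["no_artifacts"]:
        # Not identical, but no audible abnormality.
        return "no_artifacts"
    if voc == "mb_melgan":
        # The notes state that mb_melgan sounds abnormal below 7.
        return "abnormal"
    # The notes say nothing about hifigan below 14.
    return "undocumented"


if __name__ == "__main__":
    print(describe_voc_pad("mb_melgan", 7))
```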
@ -10,25 +10,27 @@
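### 1. Installation ### 1. Installation
See the [installation documentation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). See the [installation documentation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above. It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one of the two methods (medium, hard) to install PaddleSpeech. You can choose one of the two methods (medium, hard) to install PaddleSpeech.
### 2. Prepare the Configuration File ### 2. Prepare the Configuration File
See `conf/tts_online_application.yaml` for the configuration file. See `conf/tts_online_application.yaml` for the configuration file.
- `protocol` indicates the network protocol used by the streaming TTS service; **http and websocket** are currently supported. - `protocol` indicates the network protocol used by the streaming TTS service; **http and websocket** are currently supported.
- `engine_list` indicates the speech engines that the service to be started will include, in the format `<speech task>_<engine type>`. - `engine_list` indicates the speech engines that the service to be started will include, in the format `<speech task>_<engine type>`.
- This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to tts. - This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to tts.
- Two engine types are currently supported: **online** is an engine that uses Python for dynamic-graph inference, and **online-onnx** is an engine that uses onnxruntime for inference. online-onnx is faster. - Two engine types are currently supported: **online** is an engine that uses Python for dynamic-graph inference, and **online-onnx** is an engine that uses onnxruntime for inference. online-onnx is faster.
- The AM models supported by the streaming TTS engine are **fastspeech2 and fastspeech2_cnndecoder**; the supported Voc models are **hifigan and mb_melgan**. - The AM models supported by the streaming TTS engine are **fastspeech2 and fastspeech2_cnndecoder**; the supported Voc models are **hifigan and mb_melgan**.
- In streaming am inference, one chunk of data is inferred at a time to achieve the streaming effect. `am_block` is the number of valid frames in a chunk, and `am_pad` is the number of frames added before and after am_block within a chunk. am_pad exists to eliminate the error introduced by streaming inference, so that streaming does not degrade the quality of the synthesized audio. - In streaming am inference, one chunk of data is inferred at a time to achieve the streaming effect. `am_block` is the number of valid frames in a chunk, and `am_pad` is the number of frames added before and after am_block within a chunk. am_pad exists to eliminate the error introduced by streaming inference, so that streaming does not degrade the quality of the synthesized audio.
- fastspeech2 does not support streaming am inference, so am_pad and am_block have no effect on it. - fastspeech2 does not support streaming am inference, so am_pad and am_block have no effect on it.
- fastspeech2_cnndecoder supports streaming inference; with am_pad=12, the streaming synthesized audio is identical to the non-streaming synthesized audio. - fastspeech2_cnndecoder supports streaming inference; with am_pad=12, the streaming synthesized audio is identical to the non-streaming synthesized audio.
- In streaming voc inference, one chunk of data is inferred at a time to achieve the streaming effect. `voc_block` is the number of valid frames in a chunk, and `voc_pad` is the number of frames added before and after voc_block within a chunk. voc_pad exists to eliminate the error introduced by streaming inference, so that streaming does not degrade the quality of the synthesized audio. - In streaming voc inference, one chunk of data is inferred at a time to achieve the streaming effect. `voc_block` is the number of valid frames in a chunk, and `voc_pad` is the number of frames added before and after voc_block within a chunk. voc_pad exists to eliminate the error introduced by streaming inference, so that streaming does not degrade the quality of the synthesized audio.
- Both hifigan and mb_melgan support streaming voc inference. - Both hifigan and mb_melgan support streaming voc inference.
- When the voc model is mb_melgan, voc_pad=14 makes the streaming synthesized audio identical to the non-streaming synthesized audio; voc_pad can be lowered to 7 without audible artifacts, but below 7 the synthesized audio sounds abnormal. - When the voc model is mb_melgan, voc_pad=14 makes the streaming synthesized audio identical to the non-streaming synthesized audio; voc_pad can be lowered to 7 without audible artifacts, but below 7 the synthesized audio sounds abnormal.
- When the voc model is hifigan, voc_pad=20 makes the streaming synthesized audio identical to the non-streaming synthesized audio; with voc_pad=14 the synthesized audio has no audible artifacts. - When the voc model is hifigan, voc_pad=20 makes the streaming synthesized audio identical to the non-streaming synthesized audio; with voc_pad=14 the synthesized audio has no audible artifacts.
- Inference speed: mb_melgan > hifigan; audio quality: mb_melgan < hifigan - Inference speed: mb_melgan > hifigan; audio quality: mb_melgan < hifigan
- **Note:** If the service starts normally inside a container but clients cannot reach its IP address, try replacing the `host` address in the configuration file with the local IP address.
### 3. Streaming speech synthesis server and client using http protocol ### 3. Streaming speech synthesis server and client using http protocol
#### 3.1 Server Usage #### 3.1 Server Usage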
@ -119,6 +121,7 @@
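- `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model's sampling rate. - `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model's sampling rate.
- `output`: Path of the output audio. Default: None, which means the audio is not saved locally. - `output`: Path of the output audio. Default: None, which means the audio is not saved locally.
- `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**. - `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` currently have no effect in the streaming speech synthesis service.
Output: Output: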
@ -254,6 +257,7 @@
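- `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model's sampling rate. - `sample_rate`: Sampling rate, choices: [0, 8000, 16000]. Default: 0, which means the same as the model's sampling rate.
- `output`: Path of the output audio. Default: None, which means the audio is not saved locally. - `output`: Path of the output audio. Default: None, which means the audio is not saved locally.
- `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**. - `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Playing audio depends on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` currently have no effect in the streaming speech synthesis service.
Output: Output: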

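The block and pad parameters described in the streaming configuration notes above (`am_block`/`am_pad`, `voc_block`/`voc_pad`) can be illustrated with a frame-range calculation. This is a sketch under stated assumptions, not PaddleSpeech's implementation: it assumes each chunk covers `block` valid frames plus up to `pad` context frames on each side, clipped at the sequence boundaries, and the function name is hypothetical:

```python
def chunk_ranges(n_frames: int, block: int, pad: int) -> list:
    """Return (start, end, valid_start, valid_end) for each chunk.

    [valid_start, valid_end) holds the frames whose output is kept;
    [start, end) additionally includes the padding context frames.
    """
    if block <= 0 or pad < 0:
        raise ValueError("block must be positive and pad must be non-negative")
    ranges = []
    for valid_start in range(0, n_frames, block):
        valid_end = min(valid_start + block, n_frames)
        start = max(valid_start - pad, 0)    # clip left context at frame 0
        end = min(valid_end + pad, n_frames)  # clip right context at the end
        ranges.append((start, end, valid_start, valid_end))
    return ranges


if __name__ == "__main__":
    # Small illustrative numbers, not recommended settings.
    for chunk in chunk_ranges(n_frames=10, block=4, pad=2):
        print(chunk)
```

With a larger pad, each chunk sees more of its neighbours' frames, which matches the documented trade-off: more context reduces the mismatch with non-streaming output at the cost of extra computation per chunk.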
@ -10,7 +10,9 @@
paddlespeech_server help paddlespeech_server help
``` ```
### Start the server ### Start the server
First set the service-related configuration parameters, similar to `./conf/application.yaml`. Set `engine_list`, which represents the speech tasks included in the service to be started First set the service-related configuration parameters, similar to `./conf/application.yaml`. Set `engine_list`, which represents the speech tasks included in the service to be started.
**Note:** If the service starts normally inside a container but clients cannot reach its IP address, try replacing the `host` address in the configuration file with the local IP address.
Then start the service: Then start the service:
```bash ```bash
paddlespeech_server start --config_file ./conf/application.yaml paddlespeech_server start --config_file ./conf/application.yaml

@ -11,6 +11,7 @@
``` ```
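### Start the server ### Start the server
First set the service-related configuration file, similar to `./conf/application.yaml`, and set `engine_list`, which indicates the speech tasks included in the service to be started. First set the service-related configuration file, similar to `./conf/application.yaml`, and set `engine_list`, which indicates the speech tasks included in the service to be started.
**Note:** If the service starts normally inside a container but clients cannot reach its IP address, try replacing the `host` address in the configuration file with the local IP address.
Then start the service: Then start the service: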
```bash ```bash
paddlespeech_server start --config_file ./conf/application.yaml paddlespeech_server start --config_file ./conf/application.yaml
