Merge pull request #1890 from yt605155624/docs

[doc]add pptts readme
4 years ago · 9d0ff7a7d1
parent a11dc53c1b 3876290528
commit 9d0ff7a7d1
2 changed files with 150 additions and 0 deletions
--- a/docs/source/tts/PPTTS.md
+++ b/docs/source/tts/PPTTS.md
@ -0,0 +1,74 @@
+([简体中文](./PPTTS_cn.md)|English)
+
+- [1. Introduction](#1)
+- [2. Features](#2)
+- [3. Benchmark](#3)
+- [4. Demo](#4)
+- [5. Tutorials](#5)
+    - [5.1 Training and Inference Optimization](#51)
+    - [5.2 Featured TTS Applications](#52)
+    - [5.3 TTS Server](#53)
+
+<a name="1"></a>
+## 1. Introduction
+
+PP-TTS is a streaming speech synthesis system developed by PaddleSpeech. It builds on PaddleSpeech's implementations of [state-of-the-art algorithms](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md#text-to-speech-models) and uses a faster inference engine to deliver streaming speech synthesis that meets the needs of commercial speech interaction scenarios.
+
+#### PP-TTS
+Pipeline of TTS:
+<center><img src=https://ai-studio-static-online.cdn.bcebos.com/ea69ae1faff84940a59c7079d16b3a8db2741d2c423846f68822f4a7f28726e9 width="600" ></center>
+
+By default, PP-TTS provides a Chinese streaming speech synthesis system based on the FastSpeech2 acoustic model and the HiFiGAN vocoder:
+
+- Text Frontend: A rule-based Chinese text frontend, optimized for Chinese-specific cases such as text normalization, polyphonic characters, and tone sandhi.
+- Acoustic Model: The FastSpeech2 decoder is modified so that it supports streaming synthesis.
+- Vocoder: Streaming synthesis with GAN vocoders is supported.
+- Inference Engine: ONNXRuntime is used to optimize TTS model inference, so that the system achieves a real-time factor (RTF) below 1 even on low-voltage CPUs, which meets the requirement for streaming synthesis.
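To make the RTF < 1 requirement concrete: the real-time factor is the time spent synthesizing divided by the duration of the audio produced, so a value below 1 means audio is generated faster than it plays back. The following is a minimal sketch, not PaddleSpeech code. The function names, the 24 kHz sample rate, the timings, and the chunk size are all illustrative assumptions.

```python
from typing import Iterator, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def iter_chunks(frames: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive chunks of frames so a downstream stage can start early.

    Real streaming systems generally also need some context around chunk
    boundaries; that is deliberately omitted in this sketch.
    """
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


# Hypothetical numbers: 0.5 s of compute for 2 s of 24 kHz audio.
rtf = real_time_factor(0.5, num_samples=48_000, sample_rate=24_000)

# 10 placeholder "mel frames" split into chunks of 4 frames each.
chunks = [list(c) for c in iter_chunks(list(range(10)), chunk_size=4)]
```

With these made-up numbers the RTF is 0.25, and the last chunk is shorter than the others because 10 is not a multiple of 4. In a streaming setup each chunk would be handed to the vocoder as soon as it is available rather than after the whole utterance has been processed.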
+
+<a name="2"></a>
+## 2. Features
+- A leading open-source Chinese TTS system
+- Uses ONNXRuntime to optimize the inference of TTS models
+- The only open-source streaming TTS system
+- Modular: developers can easily swap in different acoustic models and vocoders for different languages, choose among inference engines (Paddle dynamic graph, PaddleInference, ONNXRuntime, etc.), and choose among network services (HTTP, WebSocket)
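The idea of replaceable acoustic models and vocoders can be pictured as a pipeline that depends only on small interfaces. This is a hedged sketch of that design idea, not the PaddleSpeech API: `AcousticModel`, `Vocoder`, the dummy classes, and `synthesize` are hypothetical names introduced for illustration.

```python
from typing import List, Protocol


class AcousticModel(Protocol):
    """Hypothetical interface: phonemes in, mel-like frames out."""
    def __call__(self, phonemes: List[str]) -> List[List[float]]: ...


class Vocoder(Protocol):
    """Hypothetical interface: mel-like frames in, waveform samples out."""
    def __call__(self, mel: List[List[float]]) -> List[float]: ...


class DummyAcousticModel:
    """Stand-in model: emits one 2-dimensional 'frame' per phoneme."""
    def __call__(self, phonemes: List[str]) -> List[List[float]]:
        return [[float(len(p)), 0.0] for p in phonemes]


class DummyVocoder:
    """Stand-in vocoder: emits one 'sample' per frame."""
    def __call__(self, mel: List[List[float]]) -> List[float]:
        return [frame[0] for frame in mel]


def synthesize(phonemes: List[str], acoustic_model: AcousticModel, vocoder: Vocoder) -> List[float]:
    """Run the two stages; either stage can be replaced independently."""
    return vocoder(acoustic_model(phonemes))


wav = synthesize(["n", "i3", "h", "ao3"], DummyAcousticModel(), DummyVocoder())
```

Because `synthesize` only relies on the call signatures, any object with a matching `__call__` can be passed in, which is the property the feature list describes.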
+
+<a name="3"></a>
+## 3. Benchmark
+For a performance comparison of PaddleSpeech TTS models, see [TTS-Benchmark](https://github.com/PaddlePaddle/PaddleSpeech/wiki/TTS-Benchmark).
+
+<a name="4"></a>
+## 4. Demo 
+See: [Streaming TTS Demo Video](https://paddlespeech.readthedocs.io/en/latest/streaming_tts_demo_video.html)
+
+<a name="5"></a>
+## 5. Tutorials
+
+<a name="51"></a>
+### 5.1 Training and Inference Optimization
+
+Default FastSpeech2: [tts3/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run.sh)
+
+Streaming FastSpeech2: [tts3/run_cnndecoder.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run_cnndecoder.sh)
+
+HiFiGAN: [voc5/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/voc5/run.sh)
+
+<a name="52"></a>
+### 5.2 Featured TTS Applications
+text_to_speech - convert text into speech: [text_to_speech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/text_to_speech)
+
+style_fs2 - multi-style control for the FastSpeech2 model: [style_fs2](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/style_fs2)
+
+story_talker - a book reader based on OCR and TTS: [story_talker](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/story_talker)
+
+metaverse - 2D AR with TTS: [metaverse](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/metaverse)
+
+<a name="53"></a>
+### 5.3 TTS Server
+
+Non-streaming TTS Server: [speech_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
+
+Streaming TTS Server: [streaming_tts_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_tts_server)
+
+
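+For more tutorials, covering model design, training, and inference deployment, see the AI Studio tutorial [PP-TTS: Principles and Service Deployment of Streaming Speech Synthesis (PP-TTS：流式语音合成原理及服务部署)](https://aistudio.baidu.com/aistudio/projectdetail/3885352)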
--- a/docs/source/tts/PPTTS_cn.md
+++ b/docs/source/tts/PPTTS_cn.md
@ -0,0 +1,76 @@
+(简体中文|[English](./PPTTS.md))
+
+# PP-TTS
+
+- [1. Introduction](#1)
+- [2. Features](#2)
+- [3. Benchmark](#3)
+- [4. Demo](#4)
+- [5. Tutorials](#5)
+    - [5.1 Model Training and Inference Optimization](#51)
+    - [5.2 Featured TTS Applications](#52)
+    - [5.3 Building a TTS Server](#53)
+
+<a name="1"></a>
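+## 1. Introduction
+
+PP-TTS is a streaming speech synthesis system developed in-house by PaddleSpeech. Building on its implementations of [cutting-edge algorithms](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md#text-to-speech-models), it uses a faster inference engine to deliver streaming speech synthesis that meets the needs of commercial speech interaction scenarios.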
+
+#### PP-TTS
+The basic speech synthesis pipeline is shown in the figure below:
+<center><img src=https://ai-studio-static-online.cdn.bcebos.com/ea69ae1faff84940a59c7079d16b3a8db2741d2c423846f68822f4a7f28726e9 width="600" ></center>
+
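+By default, PP-TTS provides a Chinese streaming speech synthesis system based on the FastSpeech2 acoustic model and the HiFiGAN vocoder:
+
+- Text frontend: a rule-based Chinese text frontend, optimized for Chinese text cases such as text normalization, polyphonic characters, and tone sandhi.
+- Acoustic model: the decoder of the FastSpeech2 model is modified so that it supports streaming synthesis.
+- Vocoder: streaming synthesis with GAN vocoders is supported.
+- Inference engine: the ONNXRuntime inference engine is used to optimize model inference performance, so that the speech synthesis system reaches RTF < 1 even on low-voltage CPUs, meeting the requirement for streaming synthesis.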
+
+<a name="2"></a>
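+## 2. Features
+- A leading open-source Chinese speech synthesis system
+- Uses the ONNXRuntime inference engine to optimize model inference performance
+- The only open-source streaming speech synthesis system
+- Easy to take apart: acoustic models and vocoders for different languages can be swapped easily, as can inference engines (Paddle dynamic graph, PaddleInference, ONNXRuntime, etc.) and network services (HTTP, WebSocket)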
+
+<a name="3"></a>
+## 3. Benchmark
+For a performance comparison of PaddleSpeech TTS models, see [TTS-Benchmark](https://github.com/PaddlePaddle/PaddleSpeech/wiki/TTS-Benchmark).
+
+<a name="4"></a>
+## 4. Demo
+See: [Streaming TTS Demo Video](https://paddlespeech.readthedocs.io/en/latest/streaming_tts_demo_video.html)
+
+<a name="5"></a>
+## 5. Tutorials
+
+<a name="51"></a>
+### 5.1 Model Training and Inference Optimization
+
+Default FastSpeech2: [tts3/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run.sh)
+
+Streaming FastSpeech2: [tts3/run_cnndecoder.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run_cnndecoder.sh)
+
+HiFiGAN: [voc5/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/voc5/run.sh)
+
+<a name="52"></a>
+### 5.2 Featured TTS Applications
+One-click speech synthesis: [text_to_speech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/text_to_speech)
+
+Personalized speech synthesis - personalized speech synthesis based on the FastSpeech2 model: [style_fs2](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/style_fs2)
+
+Talking storybook - a talking storybook based on OCR and speech synthesis: [story_talker](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/story_talker)
+
+Metaverse - 2D augmented reality based on speech synthesis: [metaverse](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/metaverse)
+
+<a name="53"></a>
+### 5.3 Building a TTS Server
+
+One-click setup of a non-streaming TTS server: [speech_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
+
+One-click setup of a streaming TTS server: [streaming_tts_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_tts_server)
+
+
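+For more tutorials, covering model design, model training, and inference deployment, see the AIStudio tutorial: [PP-TTS: Principles and Service Deployment of Streaming Speech Synthesis (PP-TTS：流式语音合成原理及服务部署)](https://aistudio.baidu.com/aistudio/projectdetail/3885352)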