diff --git a/docs/source/tts/PPTTS.md b/docs/source/tts/PPTTS.md
new file mode 100644
index 00000000..c8534cd3
--- /dev/null
+++ b/docs/source/tts/PPTTS.md
@@ -0,0 +1,74 @@
+([简体中文](./PPTTS_cn.md)|English)
+
+- [1. Introduction](#1)
+- [2. Characteristic](#2)
+- [3. Benchmark](#3)
+- [4. Demo](#4)
+- [5. Tutorials](#5)
+    - [5.1 Training and Inference Optimization](#51)
+    - [5.2 Characteristic APPs of TTS](#52)
+    - [5.3 TTS Server](#53)
+
+
+## 1. Introduction
+
+PP-TTS is a streaming speech synthesis system developed by PaddleSpeech. Building on its implementations of [state-of-the-art algorithms](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md#text-to-speech-models), it uses a faster inference engine to deliver streaming speech synthesis that meets the needs of commercial speech interaction scenarios.
+
+#### PP-TTS
+Pipeline of TTS:
+
+
+PP-TTS provides a Chinese streaming speech synthesis system based on FastSpeech2 and HiFiGAN by default:
+
+- Text Frontend: A rule-based Chinese text frontend is used, optimized for Chinese-specific cases such as text normalization, polyphonic characters, and tone sandhi.
+- Acoustic Model: The FastSpeech2 decoder is modified so that it supports streaming synthesis.
+- Vocoder: Streaming synthesis with GAN vocoders is supported.
+- Inference Engine: ONNXRuntime is used to optimize TTS model inference, so that the TTS system can achieve RTF < 1 even on low-voltage CPUs, meeting the requirements of streaming synthesis.
+
+
+## 2. Characteristic
+- A leading open-source Chinese TTS system
+- Uses ONNXRuntime to optimize the inference of TTS models
+- The only open-source streaming TTS system
+- Modular design: Developers can easily swap in different acoustic models and vocoders for different languages, use different inference engines (Paddle dynamic graph, PaddleInference, ONNXRuntime, etc.), and use different network services (HTTP, WebSocket)
+
+
+## 3. Benchmark
+PaddleSpeech TTS models' benchmark: [TTS-Benchmark](https://github.com/PaddlePaddle/PaddleSpeech/wiki/TTS-Benchmark).
+
+
+## 4. Demo
+See: [Streaming TTS Demo Video](https://paddlespeech.readthedocs.io/en/latest/streaming_tts_demo_video.html)
+
+
+## 5. 
Tutorials
+
+
+### 5.1 Training and Inference Optimization
+
+Default FastSpeech2: [tts3/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run.sh)
+
+Streaming FastSpeech2: [tts3/run_cnndecoder.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run_cnndecoder.sh)
+
+HiFiGAN: [voc5/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/voc5/run.sh)
+
+
+### 5.2 Characteristic APPs of TTS
+text_to_speech - convert text into speech: [text_to_speech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/text_to_speech)
+
+style_fs2 - multi-style control for the FastSpeech2 model: [style_fs2](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/style_fs2)
+
+story_talker - book reader based on OCR and TTS: [story_talker](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/story_talker)
+
+metaverse - 2D AR with TTS: [metaverse](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/metaverse)
+
+
+### 5.3 TTS Server
+
+Non-streaming TTS Server: [speech_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
+
+Streaming TTS Server: [streaming_tts_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_tts_server)
+
+
+For more tutorials, please see: [PP-TTS:流式语音合成原理及服务部署
+](https://aistudio.baidu.com/aistudio/projectdetail/3885352)
diff --git a/docs/source/tts/PPTTS_cn.md b/docs/source/tts/PPTTS_cn.md
new file mode 100644
index 00000000..2b650d62
--- /dev/null
+++ b/docs/source/tts/PPTTS_cn.md
@@ -0,0 +1,76 @@
+(简体中文|[English](./PPTTS.md))
+
+# PP-TTS
+
+- [1. 简介](#1)
+- [2. 特性](#2)
+- [3. Benchmark](#3)
+- [4. 效果展示](#4)
+- [5. 使用教程](#5)
+    - [5.1 模型训练与推理优化](#51)
+    - [5.2 语音合成特色应用](#52)
+    - [5.3 语音合成服务搭建](#53)
+
+
+## 1. 
简介
+
+PP-TTS 是 PaddleSpeech 自研的流式语音合成系统。在实现[前沿算法](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md#text-to-speech-models)的基础上,使用了更快的推理引擎,实现了流式语音合成技术,使其满足商业语音交互场景的需求。
+
+#### PP-TTS
+语音合成基本流程如下图所示:
+
+
+PP-TTS 默认提供基于 FastSpeech2 声学模型和 HiFiGAN 声码器的中文流式语音合成系统:
+
+- 文本前端:采用基于规则的中文文本前端系统,对文本正则、多音字、变调等中文文本场景进行了优化。
+- 声学模型:对 FastSpeech2 模型的 Decoder 进行改进,使其可以流式合成
+- 声码器:支持对 GAN Vocoder 的流式合成
+- 推理引擎:使用 ONNXRuntime 推理引擎优化模型推理性能,使得语音合成系统在低压 CPU 上也能达到 RTF<1,满足流式合成的要求
+
+
+## 2. 特性
+- 开源领先的中文语音合成系统
+- 使用 ONNXRuntime 推理引擎优化模型推理性能
+- 唯一开源的流式语音合成系统
+- 易拆卸性:可以很方便地更换不同语种上的不同声学模型和声码器、使用不同的推理引擎(Paddle 动态图、PaddleInference 和 ONNXRuntime 等)、使用不同的网络服务(HTTP、Websocket)
+
+
+## 3. Benchmark
+PaddleSpeech TTS 模型之间的性能对比,请查看 [TTS-Benchmark](https://github.com/PaddlePaddle/PaddleSpeech/wiki/TTS-Benchmark)。
+
+
+## 4. 效果展示
+请参考:[Streaming TTS Demo Video](https://paddlespeech.readthedocs.io/en/latest/streaming_tts_demo_video.html)
+
+
+## 5. 使用教程
+
+
+### 5.1 模型训练与推理优化
+
+Default FastSpeech2:[tts3/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run.sh)
+
+流式 FastSpeech2:[tts3/run_cnndecoder.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/tts3/run_cnndecoder.sh)
+
+HiFiGAN:[voc5/run.sh](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/csmsc/voc5/run.sh)
+
+
+### 5.2 语音合成特色应用
+一键式实现语音合成:[text_to_speech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/text_to_speech)
+
+个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成:[style_fs2](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/style_fs2)
+
+会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书:[story_talker](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/story_talker)
+
+元宇宙 - 基于语音合成的 2D 增强现实:[metaverse](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/metaverse)
+
+
+### 5.3 语音合成服务搭建
+
+一键式搭建非流式语音合成服务:[speech_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
+
+一键式搭建流式语音合成服务:[streaming_tts_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_tts_server)
+
+
+更多教程,包括模型设计、模型训练、推理部署等,请参考 AIStudio 教程:[PP-TTS:流式语音合成原理及服务部署
+](https://aistudio.baidu.com/aistudio/projectdetail/3885352)