(Simplified Chinese|English)

PPTTS

1. Introduction

PP-TTS is a streaming speech synthesis system developed by PaddleSpeech. It builds on implementations of state-of-the-art (SOTA) algorithms and uses a faster inference engine to deliver streaming speech synthesis that meets the needs of commercial speech interaction scenarios.

PP-TTS

Pipeline of TTS

PP-TTS provides a Chinese streaming speech synthesis system based on FastSpeech2 and HiFiGAN by default:

  • Text Frontend: A rule-based Chinese text frontend handles text normalization, polyphone disambiguation, and tone sandhi.
  • Acoustic Model: The FastSpeech2 decoder is modified so that it can synthesize in a streaming fashion.
  • Vocoder: Streaming synthesis with GAN vocoders is supported.
  • Inference Engine: ONNXRuntime is used to optimize TTS model inference, so that the system achieves a real-time factor (RTF) below 1 even on low-voltage hardware, which meets the requirement for streaming synthesis.
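
The real-time factor is the time spent synthesizing divided by the duration of the audio produced, so a value below 1 means audio is generated faster than it plays back. The following is a minimal sketch of that calculation and of splitting acoustic-model output into chunks; it is not PaddleSpeech code. The names `real_time_factor` and `iter_chunks` are hypothetical, and the chunking ignores the context padding that a real streaming decoder or vocoder would likely need at chunk boundaries.

```python
from typing import Iterator, List


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def iter_chunks(frames: List[List[float]],
                chunk_size: int) -> Iterator[List[List[float]]]:
    """Yield consecutive blocks of mel frames.

    A streaming vocoder can start on the first block instead of waiting
    for the whole utterance (hypothetical helper, no overlap or padding).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


# Example: 0.5 s of compute for 1 s of 24 kHz audio gives RTF 0.5.
example_rtf = real_time_factor(0.5, 24000, 24000)

# Example: 10 dummy frames split into blocks of 4 frames.
example_chunk_sizes = [len(c) for c in iter_chunks([[0.0]] * 10, 4)]
```

With smaller chunks the first audio is available sooner, at the cost of more per-chunk overhead; the right chunk size depends on the model and hardware.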

2. Features

  • A leading open-source Chinese TTS system
  • Uses ONNXRuntime to optimize TTS model inference
  • The only open-source streaming TTS system
  • Modular design: developers can easily swap in acoustic models and vocoders for different languages, use different inference engines (Paddle dynamic graph, PaddleInference, ONNXRuntime, etc.), and use different network services (HTTP, WebSocket)
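
To illustrate what replaceable components can look like in practice, here is a minimal sketch of a pipeline whose frontend, acoustic model, and vocoder are plain callables. It is an illustration only, not the PaddleSpeech API: `TTSPipeline` and the `toy_*` and `silent_vocoder` functions are hypothetical names, and the toy components return dummy numbers rather than real phonemes, mel spectrograms, or audio.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures: text -> phone ids -> mel frames -> samples.
Frontend = Callable[[str], List[int]]
AcousticModel = Callable[[List[int]], List[List[float]]]
Vocoder = Callable[[List[List[float]]], List[float]]


@dataclass
class TTSPipeline:
    """Chains three interchangeable stages (illustrative, not PaddleSpeech)."""
    frontend: Frontend
    acoustic_model: AcousticModel
    vocoder: Vocoder

    def synthesize(self, text: str) -> List[float]:
        phone_ids = self.frontend(text)
        mel = self.acoustic_model(phone_ids)
        return self.vocoder(mel)


def toy_frontend(text: str) -> List[int]:
    # One dummy "phone id" per character.
    return [ord(ch) % 100 for ch in text]


def toy_acoustic_model(phone_ids: List[int]) -> List[List[float]]:
    # Two one-dimensional dummy frames per phone.
    return [[float(pid)] for pid in phone_ids for _ in range(2)]


def toy_vocoder(mel: List[List[float]]) -> List[float]:
    # Three dummy samples per frame.
    return [frame[0] / 100.0 for frame in mel for _ in range(3)]


def silent_vocoder(mel: List[List[float]]) -> List[float]:
    # Drop-in replacement with the same signature: three zeros per frame.
    return [0.0 for _ in mel for _ in range(3)]


pipeline = TTSPipeline(toy_frontend, toy_acoustic_model, toy_vocoder)
swapped = TTSPipeline(toy_frontend, toy_acoustic_model, silent_vocoder)
```

Because each stage only has to honor its input and output types, replacing the vocoder does not require touching the frontend or the acoustic model.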

3. Benchmark

PaddleSpeech TTS models' benchmark: TTS-Benchmark

4. Demo

See: Streaming TTS Demo Video

5. Tutorials

5.1 Training and Inference Optimization

Default FastSpeech2: tts3/run.sh

Streaming FastSpeech2: tts3/run_cnndecoder.sh

HiFiGAN: voc5/run.sh

5.2 Featured TTS Applications

text_to_speech - convert text into speech: text_to_speech

style_fs2 - multi-style control for the FastSpeech2 model: style_fs2

story_talker - book reader based on OCR and TTS: story_talker

metaverse - 2D AR with TTS: metaverse

5.3 TTS Server

Non-streaming TTS Server: speech_server

Streaming TTS Server: streaming_tts_server
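
On the client side, a streaming server delivers audio in pieces, which then need to be played or stored. The sketch below shows one way to assemble received chunks into a WAV file using only the Python standard library. It rests on assumptions that are not stated in this document: the chunks are raw 16-bit mono PCM at 24 kHz, and `pcm_chunks_to_wav` is a hypothetical helper, not part of PaddleSpeech. The actual wire format and client interface should be taken from the server demos listed above.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes],
                      sample_rate: int = 24000) -> bytes:
    """Concatenate raw PCM chunks into an in-memory WAV file.

    Assumes 16-bit mono PCM; sample_rate is an assumed default.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM (assumption)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Chunks can be written as they arrive from the server.
            wav.writeframes(chunk)
    return buffer.getvalue()


# Example with two dummy chunks of 100 and 50 samples.
wav_bytes = pcm_chunks_to_wav([b"\x00\x00" * 100, b"\x01\x00" * 50])
```

The resulting bytes can be written to disk with `open("output.wav", "wb")` or handed to an audio player.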

For more tutorials, please see: PP-TTS Streaming Speech Synthesis Principles and Service Deployment (PP-TTS流式语音合成原理及服务部署)