# PaddleSpeech

## What is PaddleSpeech?

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for two critical speech tasks: Speech-to-Text (Automatic Speech Recognition, ASR) and Text-to-Speech synthesis (TTS), with modules built on state-of-the-art and influential models.

## What can PaddleSpeech do?
### Speech-to-Text

PaddleSpeech ASR mainly consists of the following components:
- Implementation of models and commonly used neural network layers.
- Dataset abstraction and common data preprocessing pipelines.
- Ready-to-run experiments.

PaddleSpeech ASR provides you with a complete ASR pipeline, including:
- Data Preparation
    - Build vocabulary
    - Compute cepstral mean and variance normalization (CMVN)
    - Feature extraction
        - linear
        - fbank (also supports Kaldi features)
        - MFCC
- Acoustic Models
    - DeepSpeech2 (Streaming and Non-Streaming)
    - Transformer (Streaming and Non-Streaming)
    - Conformer (Streaming and Non-Streaming)
- Decoder
    - CTC greedy search (used in DeepSpeech2, Transformer and Conformer)
    - CTC beam search (used in DeepSpeech2, Transformer and Conformer)
    - attention decoding (used in Transformer and Conformer)
    - attention rescoring (used in Transformer and Conformer)

Speech-to-Text makes it simple to train an ASR model.
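
To illustrate the CMVN step listed in the data preparation stage above, here is a minimal sketch in plain Python. It is not PaddleSpeech's implementation: the function names `compute_cmvn_stats` and `apply_cmvn`, the list-of-frames input layout, and the variance floor are assumptions made for this example. It accumulates per-dimension statistics over every frame of every utterance, then normalizes each frame to zero mean and unit variance.

```python
import math


def compute_cmvn_stats(utterances):
    """Return per-dimension (mean, std) over all frames of all utterances.

    `utterances` is a list of utterances; each utterance is a list of
    frames; each frame is a list of floats (hypothetical layout).
    """
    dim = len(utterances[0][0])
    total = [0.0] * dim
    total_sq = [0.0] * dim
    count = 0
    for frames in utterances:
        for frame in frames:
            for d, value in enumerate(frame):
                total[d] += value
                total_sq[d] += value * value
            count += 1
    mean = [s / count for s in total]
    # Var[x] = E[x^2] - E[x]^2; the floor (an assumption of this sketch)
    # avoids dividing by zero for constant dimensions.
    std = [
        math.sqrt(max(sq / count - m * m, 1e-20))
        for sq, m in zip(total_sq, mean)
    ]
    return mean, std


def apply_cmvn(frames, mean, std):
    """Normalize every frame of one utterance with the given statistics."""
    return [[(v - m) / s for v, m, s in zip(frame, mean, std)] for frame in frames]


# Toy data: one utterance with two 2-dimensional frames.
utterances = [[[1.0, 2.0], [3.0, 6.0]]]
mean, std = compute_cmvn_stats(utterances)
print(apply_cmvn(utterances[0], mean, std))  # [[-1.0, -1.0], [1.0, 1.0]]
```

In practice the statistics would typically be computed once over the training set and reused at inference time, so that training and test features share the same normalization.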
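
The CTC greedy search decoder named in the list above can be sketched in a few lines. This is a generic illustration of the technique, not PaddleSpeech's code: the function name `ctc_greedy_decode`, the per-frame score layout, and the choice of index 0 for the blank token are assumptions. The decoder picks the best-scoring token in each frame, merges consecutive repeats, and then drops blanks.

```python
def ctc_greedy_decode(frame_scores, blank_id=0):
    """Greedy CTC decoding over per-frame token scores.

    `frame_scores` is a list of frames, each a list of scores indexed by
    token id. `blank_id=0` is an assumption of this sketch.
    """
    # Step 1: take the highest-scoring token id in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: collapse consecutive repeats, then remove blanks.
    decoded = []
    previous = None
    for token in best_path:
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


# Toy scores over a 3-token vocabulary (0 = blank).
blank = [0.9, 0.05, 0.05]
tok1 = [0.1, 0.8, 0.1]
tok2 = [0.1, 0.2, 0.7]
frames = [tok1, tok1, blank, tok1, tok2, tok2, blank]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

Note how the blank between the second and third frames of token 1 lets the decoder emit the same token twice; without it the repeats would be merged into one.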
### Text-to-Speech
PaddleSpeech TTS mainly consists of the following components:
- Implementation of models and commonly used neural network layers.
- Dataset abstraction and common data preprocessing pipelines.
- Ready-to-run experiments.

PaddleSpeech TTS provides you with a complete TTS pipeline, including:
- Text FrontEnd
    - Rule-based Chinese frontend
- Acoustic Models
    - FastSpeech2
    - SpeedySpeech
    - TransformerTTS
    - Tacotron2
- Vocoders
    - Multi Band MelGAN
    - Parallel WaveGAN
    - WaveFlow
- Voice Cloning
    - Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis
    - GE2E

Text-to-Speech helps you train TTS models with simple commands.