![License](https://img.shields.io/badge/license-Apache%202-red.svg) ![python version](https://img.shields.io/badge/python-3.7+-orange.svg) ![support os](https://img.shields.io/badge/os-linux-yellow.svg) ![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)

**PaddleSpeech** is an open-source toolkit built on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.

##### Speech Recognition
| Input Audio | Recognition Result |
|---|---|
| | I knocked at the door on the ancient side of the building. |
| | 我认为跑步最重要的就是给我带来了身体健康。 (I think the most important thing about running is that it has brought me good health.) |
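
##### Speech Synthesis

| Input Text | Synthetic Audio |
|---|---|
| Life was like a box of chocolates, you never know what you're gonna get. | |
| 早上好,今天是2020/10/29,最低温度是-3°C。 (Good morning, today is 2020/10/29, and the lowest temperature is -3°C.) | |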
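The second synthesis example contains a date (`2020/10/29`) and a negative temperature (`-3°C`), which a text frontend's text normalization (tn) step has to spell out as words before g2p can turn them into phonemes. Below is a minimal regex-based sketch of that idea, not the tn implementation linked in the model list. The pattern names, the digit-by-digit year reading and the `摄氏度` reading are assumptions chosen for illustration.

```python
import re

DIGITS = "零一二三四五六七八九"

# Hypothetical patterns for this sketch: a YYYY/M/D date and an integer Celsius temperature.
DATE_PATTERN = re.compile(r"(\d{4})/(\d{1,2})/(\d{1,2})")
TEMP_PATTERN = re.compile(r"(-?)(\d{1,2})°C")


def read_digits(s: str) -> str:
    """Read each digit on its own, as is usual for years: '2020' -> '二零二零'."""
    return "".join(DIGITS[int(ch)] for ch in s)


def read_number(n: int) -> str:
    """Cardinal reading for 0 <= n < 100, enough for months, days and everyday temperatures."""
    if not 0 <= n < 100:
        raise ValueError(f"read_number only handles 0..99, got {n}")
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    # 10-19 are read as 十, 十一, ...; 20+ as 二十, 二十一, ...
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def _expand_date(m: re.Match) -> str:
    year, month, day = m.groups()
    return f"{read_digits(year)}年{read_number(int(month))}月{read_number(int(day))}日"


def _expand_temperature(m: re.Match) -> str:
    sign, value = m.groups()
    prefix = "零下" if sign else ""  # a leading minus sign is read as 零下 ("below zero")
    return f"{prefix}{read_number(int(value))}摄氏度"


def normalize(text: str) -> str:
    """Expand dates first, then temperatures, so no raw digits reach g2p."""
    text = DATE_PATTERN.sub(_expand_date, text)
    return TEMP_PATTERN.sub(_expand_temperature, text)
```

With these assumptions, `normalize("早上好,今天是2020/10/29,最低温度是-3°C。")` returns `"早上好,今天是二零二零年十月二十九日,最低温度是零下三摄氏度。"`. A real frontend covers many more patterns (ranges, phone numbers, decimals, units), which is why rule tables rather than a handful of regexes are the usual design.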

| Speech-to-Text Module Type | Dataset | Model Type | Link |
|---|---|---|---|
| Speech Recognition | Aishell | DeepSpeech2 RNN + Conv-based Models | deepspeech2-aishell |
| | | Transformer-based Attention Models | u2.transformer.conformer-aishell |
| | Librispeech | Transformer-based Attention Models | deepspeech2-librispeech / transformer.conformer.u2-librispeech / transformer.conformer.u2-kaldi-librispeech |
| Alignment | THCHS30 | MFA | mfa-thchs30 |
| Language Model | | N-gram Language Model | kenlm |
| | TIMIT | Unified Streaming & Non-streaming Two-pass | u2-timit |
| Speech Translation (English to Chinese) | TED En-Zh | Transformer + ASR MTL | transformer-ted |
| | | FAT + Transformer + ASR MTL | fat-st-ted |
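DeepSpeech2 is trained with a CTC loss, so its per-frame output distributions must be collapsed into a transcript. The simplest decoder, greedy (best-path) CTC decoding, is sketched below in plain Python: take the most likely token in each frame, merge consecutive repeats, then drop the blank token. The vocabulary, the blank index of 0 and the list-of-probabilities layout are assumptions for illustration, not PaddleSpeech's API. Practical decoders usually use beam search instead, optionally scored with an n-gram language model such as the kenlm entry above.

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: int = 0,
) -> str:
    """Best-path CTC decoding over per-frame token probabilities (assumed layout: frames x vocab)."""
    # 1. Pick the most likely token index in every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of the same index: [a, a, blank, a] -> [a, blank, a].
    collapsed = [token for token, _ in groupby(best)]
    # 3. Remove blanks; a blank between two equal tokens keeps them distinct.
    return "".join(vocab[t] for t in collapsed if t != blank)


vocab = ["<blank>", "a", "b"]
probs = [
    [0.1, 0.8, 0.1],   # a
    [0.1, 0.8, 0.1],   # a (repeat, merged)
    [0.9, 0.05, 0.05], # blank
    [0.2, 0.7, 0.1],   # a (new token after the blank)
    [0.1, 0.1, 0.8],   # b
    [0.1, 0.2, 0.7],   # b (repeat, merged)
    [0.6, 0.2, 0.2],   # blank
]
print(ctc_greedy_decode(probs, vocab))  # aab
```

The blank step is what lets CTC emit doubled letters: without the blank in the third frame, the two runs of `a` would merge into a single `a`.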

| Text-to-Speech Module Type | Model Type | Dataset | Link |
|---|---|---|---|
| Text Frontend | | | tn / g2p |
| Acoustic Model | Tacotron2 | LJSpeech | tacotron2-ljspeech |
| | Transformer TTS | LJSpeech | transformer-ljspeech |
| | SpeedySpeech | CSMSC | speedyspeech-csmsc |
| | FastSpeech2 | AISHELL-3 / VCTK / LJSpeech / CSMSC | fastspeech2-aishell3 / fastspeech2-vctk / fastspeech2-ljspeech / fastspeech2-csmsc |
| Vocoder | WaveFlow | LJSpeech | waveflow-ljspeech |
| | Parallel WaveGAN | LJSpeech / VCTK / CSMSC | PWGAN-ljspeech / PWGAN-vctk / PWGAN-csmsc |
| | Multi Band MelGAN | CSMSC | Multi Band MelGAN-csmsc |
| Voice Cloning | GE2E | Librispeech, etc. | ge2e |
| | GE2E + Tacotron2 | AISHELL-3 | ge2e-tacotron2-aishell3 |
| | GE2E + FastSpeech2 | AISHELL-3 | ge2e-fastspeech2-aishell3 |
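The voice cloning rows pair a GE2E speaker encoder with an acoustic model. GE2E (generalized end-to-end) training compares every utterance embedding with every speaker centroid through a scaled cosine similarity, `w * cos(e, c) + b`, where the centroid of the utterance's own speaker is computed without that utterance. The plain-Python sketch below builds that similarity matrix only; it is an illustration, not PaddleSpeech's implementation. It assumes embeddings arrive as nested lists, every speaker has at least two utterances, no embedding is all zeros, and `w`, `b` (learnable during training) are fixed illustrative values.

```python
import math
from typing import List

Vector = List[float]


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ge2e_similarity(
    embeddings: List[List[Vector]], w: float = 10.0, b: float = -5.0
) -> List[List[List[float]]]:
    """embeddings[j][i] is utterance i of speaker j.

    Returns sim[j][i][k] = w * cos(e_ji, c_k) + b, where c_j (the utterance's
    own speaker) excludes e_ji so the utterance cannot trivially match itself.
    """
    if any(len(spk) < 2 for spk in embeddings):
        raise ValueError("each speaker needs at least two utterances")
    centroids = [_mean(spk) for spk in embeddings]
    sim = []
    for j, spk in enumerate(embeddings):
        rows = []
        for i, e in enumerate(spk):
            row = []
            for k, c in enumerate(centroids):
                if k == j:
                    # Leave-one-out centroid for the true speaker.
                    c = _mean(spk[:i] + spk[i + 1:])
                row.append(w * _cosine(e, c) + b)
            rows.append(row)
        sim.append(rows)
    return sim
```

Training then pushes each row toward a high score for the true speaker and low scores for the others (for example with a softmax over the row); at cloning time only the encoder's embedding is passed to the acoustic model.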

| Task | Dataset | Model Type | Link |
|---|---|---|---|
| Audio Classification | ESC-50 | PANN | pann-esc50 |
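The audio classification row maps a clip to one of ESC-50's 50 environmental sound classes. Turning a classifier's raw scores (logits) into a ranked prediction is typically a softmax followed by a top-k selection. The sketch below shows only that post-processing step; the function name, labels and logits are illustrative and not part of PaddleSpeech's API.

```python
import math
from typing import List, Sequence, Tuple


def top_k(logits: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Softmax over logits, then return the k most probable (label, probability) pairs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(labels[i], probs[i]) for i in order]


for label, p in top_k([2.0, 0.0, 1.0], ["dog", "rain", "siren"], k=2):
    print(f"{label}: {p:.3f}")
```

Subtracting the maximum logit leaves the softmax unchanged but keeps `math.exp` from overflowing on large scores.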