parent e395462419
commit 3f9e30c9b3
[Two binary images added: 4.9 KiB and 108 KiB]
@@ -0,0 +1,6 @@
myst-parser
recommonmark>=0.5.0
sphinx
sphinx-autobuild
sphinx-markdown-tables
sphinx_rtd_theme
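These are the Sphinx dependencies for building the documentation. As a sketch of the typical workflow with these packages (the `docs/requirements.txt` path and the `docs/source` layout are assumptions, not confirmed by this commit):

```bash
# Install the documentation toolchain (requirements file path is an assumption)
pip install -r docs/requirements.txt

# One-off HTML build with Sphinx (source/output directories are assumptions)
sphinx-build -b html docs/source docs/build/html

# Or rebuild and serve automatically on every change while editing
sphinx-autobuild docs/source docs/build/html
```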
@@ -1,28 +0,0 @@
# Released Models

## Acoustic Models Released in paddle 2.X

Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech
:-------------:| :------------:| :-----: | -----: | :----------------- |:--------- | :---------- | :---------
[Ds2 Online Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s0/aishell.s0.ds_online.5rnn.debug.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.0824 |-| 151 h
[Ds2 Offline Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s0/aishell.s0.ds2.offline.cer6p65.release.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.065 |-| 151 h
[Conformer Online Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz) | Aishell Dataset | Char-based | 283 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention + CTC | 0.0594 |-| 151 h
[Conformer Offline Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz) | Aishell Dataset | Char-based | 284 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention | 0.0547 |-| 151 h
[Conformer Librispeech Model](https://deepspeech.bj.bcebos.com/release2.1/librispeech/s1/conformer.release.tar.gz) | Librispeech Dataset | Word-based | 287 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention |-| 0.0325 | 960 h
[Transformer Librispeech Model](https://deepspeech.bj.bcebos.com/release2.1/librispeech/s1/transformer.release.tar.gz) | Librispeech Dataset | Word-based | 195 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention |-| 0.0544 | 960 h

## Acoustic Models Transformed from paddle 1.8

Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech
:-------------:| :------------:| :-----: | -----: | :----------------- | :---------- | :---------- | :---------
[Ds2 Offline Aishell model](https://deepspeech.bj.bcebos.com/mandarin_models/aishell_model_v1.8_to_v2.x.tar.gz)|Aishell Dataset| Char-based| 234 MB| 2 Conv + 3 bidirectional GRU layers| 0.0804 |-| 151 h|
[Ds2 Offline Librispeech model](https://deepspeech.bj.bcebos.com/eng_models/librispeech_v1.8_to_v2.x.tar.gz)|Librispeech Dataset| Word-based| 307 MB| 2 Conv + 3 bidirectional shared-weight RNN layers |-| 0.0685| 960 h|
[Ds2 Offline Baidu en8k model](https://deepspeech.bj.bcebos.com/eng_models/baidu_en8k_v1.8_to_v2.x.tar.gz)|Baidu Internal English Dataset| Word-based| 273 MB| 2 Conv + 3 bidirectional GRU layers |-| 0.0541 | 8628 h|

## Released Language Models

Language Model | Training Data | Token-based | Size | Descriptions
:-------------:| :------------:| :-----: | -----: | :-----------------
[English LM](https://deepspeech.bj.bcebos.com/en_lm/common_crawl_00.prune01111.trie.klm) | [CommonCrawl(en.00)](http://web-language-models.s3-website-us-east-1.amazonaws.com/ngrams/en/deduped/en.00.deduped.xz) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; <br/> About 1.85 billion n-grams; <br/> 'trie' binary with '-a 22 -q 8 -b 8'
[Mandarin LM Small](https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm) | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; <br/> About 0.13 billion n-grams; <br/> 'probing' binary with default settings
[Mandarin LM Large](https://deepspeech.bj.bcebos.com/zh_lm/zhidao_giga.klm) | Baidu Internal Corpus | Char-based | 70.4 GB | No pruning; <br/> About 3.7 billion n-grams; <br/> 'probing' binary with default settings
@@ -0,0 +1,33 @@
# PaddleSpeech

## What is PaddleSpeech?

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for two critical speech tasks: Speech-To-Text (Automatic Speech Recognition, ASR) and Text-To-Speech Synthesis (TTS), with modules involving state-of-the-art and influential models.

## What can PaddleSpeech do?

### Speech-To-Text

(An introduction to ASR in PaddleSpeech is still needed here!)

### Text-To-Speech

PaddleSpeech TTS mainly consists of the components below:
- Implementation of models and commonly used neural network layers.
- Dataset abstraction and common data preprocessing pipelines.
- Ready-to-run experiments.

PaddleSpeech TTS provides you with a complete TTS pipeline, including:
- Text Frontend
  - Rule-based Chinese frontend.
- Acoustic Models
  - FastSpeech2
  - SpeedySpeech
  - TransformerTTS
  - Tacotron2
- Vocoders
  - Multi Band MelGAN
  - Parallel WaveGAN
  - WaveFlow
- Voice Cloning
  - Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
  - GE2E

PaddleSpeech TTS helps you train TTS models with simple commands, as sketched below.
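A hedged sketch of what those commands look like: the examples tree referenced throughout this commit organizes each experiment as a recipe directory. The `run.sh` entry point and its stages are assumptions here, not confirmed by this page.

```bash
# Enter a TTS recipe (this directory appears in the released-models tables
# below; its contents, including run.sh, are an assumption here)
cd examples/csmsc/tts3

# Run the recipe end to end: data preprocessing, training and synthesis
bash run.sh
```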
@@ -0,0 +1,55 @@
# Released Models

## Speech-To-Text Models

### Acoustic Models Released in paddle 2.X

Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech
:-------------:| :------------:| :-----: | -----: | :----------------- |:--------- | :---------- | :---------
[Ds2 Online Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s0/aishell.s0.ds_online.5rnn.debug.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.0824 |-| 151 h
[Ds2 Offline Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s0/aishell.s0.ds2.offline.cer6p65.release.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.065 |-| 151 h
[Conformer Online Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz) | Aishell Dataset | Char-based | 283 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention + CTC | 0.0594 |-| 151 h
[Conformer Offline Aishell Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz) | Aishell Dataset | Char-based | 284 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention | 0.0547 |-| 151 h
[Conformer Librispeech Model](https://deepspeech.bj.bcebos.com/release2.1/librispeech/s1/conformer.release.tar.gz) | Librispeech Dataset | Word-based | 287 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention |-| 0.0325 | 960 h
[Transformer Librispeech Model](https://deepspeech.bj.bcebos.com/release2.1/librispeech/s1/transformer.release.tar.gz) | Librispeech Dataset | Word-based | 195 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention |-| 0.0544 | 960 h
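Each acoustic model above is packaged as a tarball of model files. A minimal sketch of fetching and unpacking one (the URL is taken from the table; the archive's internal layout is not specified in this document and is an assumption):

```bash
# Download a released acoustic model (Conformer Offline Aishell, from the table)
wget https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz

# Unpack it; a checkpoint-plus-config layout is assumed here
mkdir -p conformer_offline_aishell
tar xzf aishell.release.tar.gz -C conformer_offline_aishell
```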
### Acoustic Models Transformed from paddle 1.8

Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech
:-------------:| :------------:| :-----: | -----: | :----------------- | :---------- | :---------- | :---------
[Ds2 Offline Aishell model](https://deepspeech.bj.bcebos.com/mandarin_models/aishell_model_v1.8_to_v2.x.tar.gz)|Aishell Dataset| Char-based| 234 MB| 2 Conv + 3 bidirectional GRU layers| 0.0804 |-| 151 h|
[Ds2 Offline Librispeech model](https://deepspeech.bj.bcebos.com/eng_models/librispeech_v1.8_to_v2.x.tar.gz)|Librispeech Dataset| Word-based| 307 MB| 2 Conv + 3 bidirectional shared-weight RNN layers |-| 0.0685| 960 h|
[Ds2 Offline Baidu en8k model](https://deepspeech.bj.bcebos.com/eng_models/baidu_en8k_v1.8_to_v2.x.tar.gz)|Baidu Internal English Dataset| Word-based| 273 MB| 2 Conv + 3 bidirectional GRU layers |-| 0.0541 | 8628 h|
### Released Language Models

Language Model | Training Data | Token-based | Size | Descriptions
:-------------:| :------------:| :-----: | -----: | :-----------------
[English LM](https://deepspeech.bj.bcebos.com/en_lm/common_crawl_00.prune01111.trie.klm) | [CommonCrawl(en.00)](http://web-language-models.s3-website-us-east-1.amazonaws.com/ngrams/en/deduped/en.00.deduped.xz) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; <br/> About 1.85 billion n-grams; <br/> 'trie' binary with '-a 22 -q 8 -b 8'
[Mandarin LM Small](https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm) | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; <br/> About 0.13 billion n-grams; <br/> 'probing' binary with default settings
[Mandarin LM Large](https://deepspeech.bj.bcebos.com/zh_lm/zhidao_giga.klm) | Baidu Internal Corpus | Char-based | 70.4 GB | No pruning; <br/> About 3.7 billion n-grams; <br/> 'probing' binary with default settings
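The pruning and binary-format notes in the Descriptions column read like KenLM options, so here is a hedged illustration of how an LM of the English model's shape could be built (the toolchain is not named on this page, and the corpus preprocessing is not documented; `corpus.txt` is a placeholder):

```bash
# Train a 5-gram LM, dropping rare n-grams above order 1
# ("Pruned with 0 1 1 1 1" in the table above)
lmplz -o 5 --prune 0 1 1 1 1 --text corpus.txt --arpa lm.arpa

# Compact it into a 'trie' binary using the quantization flags from the table
build_binary -a 22 -q 8 -b 8 trie lm.arpa lm.klm
```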
## Text-To-Speech Models

### Acoustic Models

Model Type | Dataset| Example Link | Pretrained Models
:-------------:| :------------:| :-----: | :-----
Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)
TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_0.4.zip)
SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_ckpt_0.5.zip)
FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)
FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_vctk_ckpt_0.5.zip)
### Vocoders

Model Type | Dataset| Example Link | Pretrained Models
:-------------:| :------------:| :-----: | :-----
WaveFlow| LJSpeech |[waveflow-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc0)|[waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_ljspeech_ckpt_0.3.zip)
Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip)
Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_ljspeech_ckpt_0.5.zip)
Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_vctk_ckpt_0.5.zip)
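A complete synthesis setup pairs an acoustic model with a vocoder trained on the same dataset. As a hedged sketch, fetching a matching CSMSC pair from the two tables above (the archive contents are an assumption):

```bash
# FastSpeech2 acoustic model and Parallel WaveGAN vocoder, both trained
# on CSMSC (URLs taken from the tables above)
wget https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip

# Unpack; each zip is expected to contain a checkpoint plus its config,
# though the exact layout is an assumption here
unzip fastspeech2_nosil_baker_ckpt_0.4.zip -d fastspeech2_csmsc
unzip pwg_baker_ckpt_0.4.zip -d pwg_csmsc
```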
### Voice Cloning

Model Type | Dataset| Example Link | Pretrained Models
:-------------:| :------------:| :-----: | :-----
GE2E| AISHELL-3, etc. |[ge2e](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/other/ge2e)|[ge2e_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/ge2e_ckpt_0.3.zip)
GE2E + Tacotron2| AISHELL-3 |[ge2e-tacotron2-aishell3](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/aishell3/vc0)|[tacotron2_aishell3_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_aishell3_ckpt_0.3.zip)
@@ -0,0 +1,7 @@
Audio Samples (PaddleSpeech TTS vs ESPnet TTS)
==============================================

This is an audio demo page contrasting PaddleSpeech TTS and ESPnet TTS. We use their respective modules (Text Frontend, Acoustic Model and Vocoder) here.
We use ESPnet's released models here.

FastSpeech2 + Parallel WaveGAN on CSMSC
@@ -0,0 +1,9 @@
# GAN Vocoders

This is a brief introduction to GAN vocoders, focusing on the losses used by the different vocoders.

Model | Generator Loss | Discriminator Loss
:-------------:| :------------:| :-----
Parallel WaveGAN| adversarial loss <br> feature matching loss | Multi-Scale Discriminator |
MelGAN | adversarial loss <br> multi-resolution STFT loss | adversarial loss |
Multi-Band MelGAN | adversarial loss <br> full-band multi-resolution STFT loss <br> sub-band multi-resolution STFT loss | Multi-Scale Discriminator |
HiFi-GAN | adversarial loss <br> feature matching loss <br> mel-spectrogram loss | Multi-Scale Discriminator <br> Multi-Period Discriminator |
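Several of the generator losses above share the multi-resolution STFT term, so it is worth writing out. A sketch of the standard formulation (following the Parallel WaveGAN paper; the notation here is ours): for a reference waveform $x$ and a generated waveform $\hat{x}$, each single-resolution STFT loss combines a spectral convergence term and a log-magnitude term,

$$ L_{sc}(x, \hat{x}) = \frac{\big\lVert\, |\mathrm{STFT}(x)| - |\mathrm{STFT}(\hat{x})| \,\big\rVert_F}{\big\lVert\, |\mathrm{STFT}(x)| \,\big\rVert_F}, \qquad L_{mag}(x, \hat{x}) = \frac{1}{N} \big\lVert \log|\mathrm{STFT}(x)| - \log|\mathrm{STFT}(\hat{x})| \big\rVert_1 $$

and the multi-resolution loss averages these over $M$ STFT configurations with different FFT sizes, hop lengths and window lengths:

$$ L_{mr\text{-}stft}(x, \hat{x}) = \frac{1}{M} \sum_{m=1}^{M} \Big( L_{sc}^{(m)}(x, \hat{x}) + L_{mag}^{(m)}(x, \hat{x}) \Big) $$

The full-band and sub-band variants in Multi-Band MelGAN apply this same loss to the full waveform and to each sub-band signal, respectively.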
@@ -1,45 +0,0 @@
.. parakeet documentation master file, created by
   sphinx-quickstart on Fri Sep 10 14:22:24 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Parakeet
====================================

``parakeet`` is a deep learning based text-to-speech toolkit built upon the ``paddlepaddle`` framework. It aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It includes many influential TTS models proposed by `Baidu Research <http://research.baidu.com>`_ and other research groups.

``parakeet`` mainly consists of the components below.

#. Implementation of models and commonly used neural network layers.
#. Dataset abstraction and common data preprocessing pipelines.
#. Ready-to-run experiments.

.. toctree::
   :maxdepth: 1
   :caption: Introduction

   introduction

.. toctree::
   :maxdepth: 1
   :caption: Getting started

   install
   basic_usage
   advanced_usage
   cn_text_frontend
   released_models

.. toctree::
   :maxdepth: 1
   :caption: Demos

   demo

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
@@ -1,47 +0,0 @@
# Installation

## Install PaddlePaddle

Parakeet requires PaddlePaddle as its backend. Note that PaddlePaddle 2.1.2 or newer is required.

Since paddlepaddle has multiple packages depending on the device (CPU or GPU) and the dependency libraries, it is recommended to install the proper paddlepaddle package for your device and dependency library versions via `pip`.

Installing paddlepaddle with conda or building it from source is also supported. Please refer to [PaddlePaddle installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) for more details.

Example instructions to install paddlepaddle via pip are listed below.

### PaddlePaddle with GPU
```bash
# PaddlePaddle for CUDA 10.1
python -m pip install paddlepaddle-gpu==2.1.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
# PaddlePaddle for CUDA 10.2
python -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
# PaddlePaddle for CUDA 11.0
python -m pip install paddlepaddle-gpu==2.1.2.post110 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
# PaddlePaddle for CUDA 11.2
python -m pip install paddlepaddle-gpu==2.1.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
### PaddlePaddle with CPU
```bash
python -m pip install paddlepaddle==2.1.2 -i https://mirror.baidu.com/pypi/simple
```
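After installing either package, it is worth verifying the setup. A small hedged check using paddle's built-in self-test (`paddle.utils.run_check()` is the standard verification call in paddle 2.x):

```bash
# Verify the installation; run_check() reports whether PaddlePaddle is
# installed correctly and whether the GPU, if present, is usable
python -c "import paddle; paddle.utils.run_check()"
```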
## Install libsndfile

Experiments in parakeet often involve audio and spectrum processing, thus `librosa` and `soundfile` are required. `soundfile` requires an extra C library, `libsndfile`, which is not always handled by pip.

For Windows and Mac users, `libsndfile` is installed along with `soundfile` via pip, but Linux users need to install `libsndfile` via their system package manager. Example commands for popular distributions are listed below.
```bash
# ubuntu, debian
sudo apt-get install libsndfile1
# centos, fedora
sudo yum install libsndfile
# openSUSE
sudo zypper in libsndfile
```
For any problems with the installation of soundfile, please refer to [SoundFile](https://pypi.org/project/SoundFile/).

## Install Parakeet

There are two ways to install parakeet, depending on how you intend to use it.

1. If you want to run the experiments provided by parakeet or add new models and experiments, it is recommended to clone the project from GitHub (Parakeet) and install it in editable mode.
```bash
git clone https://github.com/PaddlePaddle/Parakeet
cd Parakeet
pip install -e .
```
@ -1,27 +0,0 @@
|
|||||||
# Parakeet - PAddle PARAllel text-to-speech toolKIT
|
|
||||||
|
|
||||||
## What is Parakeet?
|
|
||||||
Parakeet is a deep learning based text-to-speech toolkit built upon paddlepaddle framework. It aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It includes many influential TTS models proposed by Baidu Research and other research groups.
|
|
||||||
|
|
||||||
## What can Parakeet do?
|
|
||||||
Parakeet mainly consists of components below:
|
|
||||||
- Implementation of models and commonly used neural network layers.
|
|
||||||
- Dataset abstraction and common data preprocessing pipelines.
|
|
||||||
- Ready-to-run experiments.
|
|
||||||
|
|
||||||
Parakeet provides you with a complete TTS pipeline, including:
|
|
||||||
- Text FrontEnd
|
|
||||||
- Rule based Chinese frontend.
|
|
||||||
- Acoustic Models
|
|
||||||
- FastSpeech2
|
|
||||||
- SpeedySpeech
|
|
||||||
- TransformerTTS
|
|
||||||
- Tacotron2
|
|
||||||
- Vocoders
|
|
||||||
- Parallel WaveGAN
|
|
||||||
- WaveFlow
|
|
||||||
- Voice Cloning
|
|
||||||
- Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
|
|
||||||
- GE2E
|
|
||||||
|
|
||||||
Parakeet helps you to train TTS models with simple commands.
|
|
@@ -1,5 +1,5 @@
# Chinese Rule Based Text Frontend
TTS system mainly includes three modules: `text frontend`, `Acoustic model` and `Vocoder`. We provide a complete Chinese text frontend module in Parakeet, see example in `Parakeet/examples/text_frontend/`.
A TTS system mainly includes three modules: `Text Frontend`, `Acoustic Model` and `Vocoder`. We provide a complete Chinese text frontend module in PaddleSpeech TTS; see the example in [examples/other/text_frontend/](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/other/text_frontend).
A text frontend module mainly includes:
- Text Segmentation