Released Models
Speech-To-Text Models
Acoustic Model Released in paddle 2.X
Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech |
---|---|---|---|---|---|---|---|
Ds2 Online Aishell Model | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 unidirectional (forward-only) LSTM layers | 0.0824 | - | 151 h |
Ds2 Offline Aishell Model | Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers | 0.065 | - | 151 h |
Conformer Online Aishell Model | Aishell Dataset | Char-based | 283 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention + CTC | 0.0594 | - | 151 h |
Conformer Offline Aishell Model | Aishell Dataset | Char-based | 284 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention | 0.0547 | - | 151 h |
Conformer Librispeech Model | Librispeech Dataset | Word-based | 287 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention | - | 0.0325 | 960 h |
Transformer Librispeech Model | Librispeech Dataset | Word-based | 195 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: Attention | - | 0.0544 | 960 h |
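
The CER and WER columns above are error rates expressed as fractions (for example, 0.0547 is 5.47%). As a rough illustration of how such numbers are usually obtained, the sketch below computes the edit distance between a reference transcript and a hypothesis and divides by the reference length. This is a minimal sketch, not the project's evaluation script: the function names `edit_distance`, `error_rate`, `wer`, and `cer` are hypothetical, and it assumes whitespace tokenization for words and ignores spaces when counting characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate, assuming whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate, assuming spaces are not scored."""
    strip = lambda s: list(s.replace(" ", ""))
    return error_rate(strip(reference), strip(hypothesis))


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("今天天气很好", "今天天汽很好"))
```

Under this definition the error rate can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as a rate rather than an accuracy.
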
Acoustic Model Transformed from paddle 1.8
Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech |
---|---|---|---|---|---|---|---|
Ds2 Offline Aishell model | Aishell Dataset | Char-based | 234 MB | 2 Conv + 3 bidirectional GRU layers | 0.0804 | - | 151 h |
Ds2 Offline Librispeech model | Librispeech Dataset | Word-based | 307 MB | 2 Conv + 3 bidirectional weight-sharing RNN layers | - | 0.0685 | 960 h |
Ds2 Offline Baidu en8k model | Baidu Internal English Dataset | Word-based | 273 MB | 2 Conv + 3 bidirectional GRU layers | - | 0.0541 | 8628 h |
Language Model Released
Language Model | Training Data | Token-based | Size | Descriptions |
---|---|---|---|---|
English LM | CommonCrawl(en.00) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; About 1.85 billion n-grams; 'trie' binary with '-a 22 -q 8 -b 8' |
Mandarin LM Small | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; About 0.13 billion n-grams; 'probing' binary with default settings |
Mandarin LM Large | Baidu Internal Corpus | Char-based | 70.4 GB | No Pruning; About 3.7 billion n-grams; 'probing' binary with default settings |
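
The "Pruned with 0 1 2 4 4" entries above read like per-order count thresholds, in the style of the `--prune` option of KenLM's `lmplz`, where an n-gram of order n is dropped if its count is less than or equal to the n-th threshold and 0 means no pruning for that order. That interpretation is an assumption, since the table does not name the tool. The toy sketch below illustrates the idea only; `count_ngrams` and `prune` are hypothetical helpers, and a real toolkit also applies smoothing and backoff, which are omitted here.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def prune(counts, thresholds):
    """Keep only n-grams whose count is strictly above the order's threshold.

    Assumption: thresholds[n - 1] applies to order n, and the last value
    applies to any higher orders.
    """
    pruned = {}
    for n, table in counts.items():
        threshold = thresholds[min(n, len(thresholds)) - 1]
        pruned[n] = {gram: c for gram, c in table.items() if c > threshold}
    return pruned


if __name__ == "__main__":
    tokens = "a b a b a c".split()
    counts = count_ngrams(tokens, max_order=2)
    # Threshold 0 keeps every unigram; threshold 1 drops singleton bigrams.
    kept = prune(counts, thresholds=[0, 1])
    print(sorted(kept[1]))
    print(sorted(kept[2]))
```

Read this way, "0 1 1 1 1" keeps all unigrams and drops singleton n-grams of every higher order, while "0 1 2 4 4" prunes higher orders more aggressively, which is consistent with the much smaller size of the pruned Mandarin model.
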
Text-To-Speech Models
Acoustic Models
Model Type | Dataset | Example Link | Pretrained Models |
---|---|---|---|
Tacotron2 | LJSpeech | tacotron2-ljspeech | tacotron2_ljspeech_ckpt_0.3.zip |
TransformerTTS | LJSpeech | transformer-ljspeech | transformer_tts_ljspeech_ckpt_0.4.zip |
SpeedySpeech | CSMSC | speedyspeech-csmsc | speedyspeech_nosil_baker_ckpt_0.5.zip |
FastSpeech2 | CSMSC | fastspeech2-csmsc | fastspeech2_nosil_baker_ckpt_0.4.zip |
FastSpeech2 | AISHELL-3 | fastspeech2-aishell3 | fastspeech2_nosil_aishell3_ckpt_0.4.zip |
FastSpeech2 | LJSpeech | fastspeech2-ljspeech | fastspeech2_nosil_ljspeech_ckpt_0.5.zip |
FastSpeech2 | VCTK | fastspeech2-vctk | fastspeech2_nosil_vctk_ckpt_0.5.zip |
Vocoders
Model Type | Dataset | Example Link | Pretrained Models |
---|---|---|---|
WaveFlow | LJSpeech | waveflow-ljspeech | waveflow_ljspeech_ckpt_0.3.zip |
Parallel WaveGAN | CSMSC | PWGAN-csmsc | pwg_baker_ckpt_0.4.zip |
Parallel WaveGAN | LJSpeech | PWGAN-ljspeech | pwg_ljspeech_ckpt_0.5.zip |
Parallel WaveGAN | VCTK | PWGAN-vctk | pwg_vctk_ckpt_0.5.zip |
Voice Cloning
Model Type | Dataset | Example Link | Pretrained Models |
---|---|---|---|
GE2E | AISHELL-3, etc. | ge2e | ge2e_ckpt_0.3.zip |
GE2E + Tacotron2 | AISHELL-3 | ge2e-tacotron2-aishell3 | tacotron2_aishell3_ckpt_0.3.zip |