Released Models
Speech-to-Text Models
Acoustic Model Released in paddle 2.X
Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech | example link |
---|---|---|---|---|---|---|---|---|
Ds2 Online Aishell ASR0 Model | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.080 | - | 151 h | Ds2 Online Aishell S0 Example |
Ds2 Offline Aishell ASR0 Model | Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers | 0.064 | - | 151 h | Ds2 Offline Aishell S0 Example |
Conformer Online Aishell ASR1 Model | Aishell Dataset | Char-based | 283 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0594 | - | 151 h | Conformer Online Aishell S1 Example |
Conformer Offline Aishell ASR1 Model | Aishell Dataset | Char-based | 284 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0547 | - | 151 h | Conformer Offline Aishell S1 Example |
Conformer Librispeech ASR1 Model | Librispeech Dataset | subword-based | 287 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | - | 0.0325 | 960 h | Conformer Librispeech S1 example |
Transformer Librispeech ASR1 Model | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring | - | 0.0410 | 960 h | Transformer Librispeech S1 example |
Transformer Librispeech ASR2 Model | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Joint CTC w/ LM | - | 0.024 | 960 h | Transformer Librispeech S2 example |
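
The CER and WER columns above report character and word error rates; the values appear to be fractions rather than percentages (for example, 0.0594 would be 5.94%). As a minimal sketch of how such a score is commonly computed, the following divides the Levenshtein edit distance between reference and hypothesis by the reference length. The function names are illustrative and not part of PaddleSpeech, and the text normalization used by the linked examples is not stated here, so treat the whitespace handling as an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    # Assumption: words are separated by whitespace.
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    # Assumption: spaces are ignored when comparing characters.
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))
```

For instance, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Corpus-level scores are usually obtained by summing edit distances and reference lengths over all utterances before dividing, not by averaging per-utterance rates.
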
Acoustic Model Transformed from paddle 1.8
Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech |
---|---|---|---|---|---|---|---|
Ds2 Offline Aishell model | Aishell Dataset | Char-based | 234 MB | 2 Conv + 3 bidirectional GRU layers | 0.0804 | - | 151 h |
Ds2 Offline Librispeech model | Librispeech Dataset | Word-based | 307 MB | 2 Conv + 3 bidirectional sharing weight RNN layers | - | 0.0685 | 960 h |
Ds2 Offline Baidu en8k model | Baidu Internal English Dataset | Word-based | 273 MB | 2 Conv + 3 bidirectional GRU layers | - | 0.0541 | 8628 h |
Language Model Released
Language Model | Training Data | Token-based | Size | Descriptions |
---|---|---|---|---|
English LM | CommonCrawl(en.00) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; About 1.85 billion n-grams; 'trie' binary with '-a 22 -q 8 -b 8' |
Mandarin LM Small | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; About 0.13 billion n-grams; 'probing' binary with default settings |
Mandarin LM Large | Baidu Internal Corpus | Char-based | 70.4 GB | No Pruning; About 3.7 billion n-grams; 'probing' binary with default settings |
Text-to-Speech Models
Acoustic Models
Model Type | Dataset | Example Link | Pretrained Models | Static Models | Size(static) |
---|---|---|---|---|---|
Tacotron2 | LJSpeech | tacotron2-ljspeech | tacotron2_ljspeech_ckpt_0.3.zip | ||
TransformerTTS | LJSpeech | transformer-ljspeech | transformer_tts_ljspeech_ckpt_0.4.zip | ||
SpeedySpeech | CSMSC | speedyspeech-csmsc | speedyspeech_nosil_baker_ckpt_0.5.zip | speedyspeech_nosil_baker_static_0.5.zip | 12MB |
FastSpeech2 | CSMSC | fastspeech2-csmsc | fastspeech2_nosil_baker_ckpt_0.4.zip | fastspeech2_nosil_baker_static_0.4.zip | 157MB |
FastSpeech2-Conformer | CSMSC | fastspeech2-csmsc | fastspeech2_conformer_baker_ckpt_0.5.zip | ||
FastSpeech2 | AISHELL-3 | fastspeech2-aishell3 | fastspeech2_nosil_aishell3_ckpt_0.4.zip | ||
FastSpeech2 | LJSpeech | fastspeech2-ljspeech | fastspeech2_nosil_ljspeech_ckpt_0.5.zip | ||
FastSpeech2 | VCTK | fastspeech2-vctk | fastspeech2_nosil_vctk_ckpt_0.5.zip | ||
Vocoders
Model Type | Dataset | Example Link | Pretrained Models | Static Models | Size(static) |
---|---|---|---|---|---|
WaveFlow | LJSpeech | waveflow-ljspeech | waveflow_ljspeech_ckpt_0.3.zip | ||
Parallel WaveGAN | CSMSC | PWGAN-csmsc | pwg_baker_ckpt_0.4.zip | pwg_baker_static_0.4.zip | 5.1MB |
Parallel WaveGAN | LJSpeech | PWGAN-ljspeech | pwg_ljspeech_ckpt_0.5.zip | ||
Parallel WaveGAN | AISHELL-3 | PWGAN-aishell3 | pwg_aishell3_ckpt_0.5.zip | ||
Parallel WaveGAN | VCTK | PWGAN-vctk | pwg_vctk_ckpt_0.5.zip | ||
Multi Band MelGAN | CSMSC | MB MelGAN-csmsc | mb_melgan_baker_ckpt_0.5.zip, mb_melgan_baker_finetune_ckpt_0.5.zip | mb_melgan_baker_static_0.5.zip | 8.2MB |
Voice Cloning
Model Type | Dataset | Example Link | Pretrained Models |
---|---|---|---|
GE2E | AISHELL-3, etc. | ge2e | ge2e_ckpt_0.3.zip |
GE2E + Tacotron2 | AISHELL-3 | ge2e-tacotron2-aishell3 | tacotron2_aishell3_ckpt_0.3.zip |
GE2E + FastSpeech2 | AISHELL-3 | ge2e-fastspeech2-aishell3 | fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip |
Audio Classification Models
Model Type | Dataset | Example Link | Pretrained Models |
---|---|---|---|
PANN | Audioset | audioset_tagging_cnn | panns_cnn6.pdparams, panns_cnn10.pdparams, panns_cnn14.pdparams |
PANN | ESC-50 | pann-esc50 | panns_cnn6.tar.gz, panns_cnn10.tar.gz, panns_cnn14.tar.gz |
Speech Translation Models
Model Type | Dataset | Example Link | Pretrained Models | Model Size |
---|---|---|---|---|
FAT-ST | TED En-Zh | FAT + Transformer + ASR MTL | fat_st_ted-en-zh.tar.gz | 50.26M |