Released Models

Speech-to-Text Models

Speech Recognition Model

| Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech | Example Link |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Ds2 Online Aishell ASR0 Model | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.080 | - | 151 h | Ds2 Online Aishell ASR0 |
| Ds2 Offline Aishell ASR0 Model | Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers | 0.064 | - | 151 h | Ds2 Offline Aishell ASR0 |
| Conformer Online Aishell ASR1 Model | Aishell Dataset | Char-based | 283 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | 0.0594 | - | 151 h | Conformer Online Aishell ASR1 |
| Conformer Offline Aishell ASR1 Model | Aishell Dataset | Char-based | 284 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | 0.0547 | - | 151 h | Conformer Offline Aishell ASR1 |
| Transformer Aishell ASR1 Model | Aishell Dataset | Char-based | 128 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: Attention rescoring | 0.0523 | - | 151 h | Transformer Aishell ASR1 |
| Conformer Librispeech ASR1 Model | Librispeech Dataset | subword-based | 191 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | - | 0.0337 | 960 h | Conformer Librispeech ASR1 |
| Transformer Librispeech ASR1 Model | Librispeech Dataset | subword-based | 131 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: Attention rescoring | - | 0.0381 | 960 h | Transformer Librispeech ASR1 |
| Transformer Librispeech ASR2 Model | Librispeech Dataset | subword-based | 131 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: JoinCTC w/ LM | - | 0.0240 | 960 h | Transformer Librispeech ASR2 |
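
The CER and WER columns report character and word error rates. This document does not say how they were computed. As a rough illustration only, the sketch below assumes the common definition: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. The function names `edit_distance`, `error_rate`, `cer`, and `wer` are hypothetical and are not part of PaddleSpeech.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate; spaces are ignored (an assumption)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(ref.split(), hyp.split())
```

Under this definition, a CER of 0.064 would correspond to roughly 6.4 character errors per 100 reference characters.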

Language Model based on NGram

| Language Model | Training Data | Token-based | Size | Descriptions |
| :--- | :--- | :--- | :--- | :--- |
| English LM | CommonCrawl(en.00) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; <br> About 1.85 billion n-grams; <br> 'trie' binary with '-a 22 -q 8 -b 8' |
| Mandarin LM Small | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; <br> About 0.13 billion n-grams; <br> 'probing' binary with default settings |
| Mandarin LM Large | Baidu Internal Corpus | Char-based | 70.4 GB | No Pruning; <br> About 3.7 billion n-grams; <br> 'probing' binary with default settings |

Speech Translation Models

| Model | Training Data | Token-based | Size | Descriptions | BLEU | Example Link |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Transformer FAT-ST MTL En-Zh | Ted-En-Zh | Spm | | Encoder: Transformer, Decoder: Transformer, <br> Decoding method: Attention | 20.80 | Transformer Ted-En-Zh ST1 |

Text-to-Speech Models

Acoustic Models

| Model Type | Dataset | Example Link | Pretrained Models | Static Models | Size (static) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Tacotron2 | LJSpeech | tacotron2-vctk | tacotron2_ljspeech_ckpt_0.3.zip | | |
| TransformerTTS | LJSpeech | transformer-ljspeech | transformer_tts_ljspeech_ckpt_0.4.zip | | |
| SpeedySpeech | CSMSC | speedyspeech-csmsc | speedyspeech_nosil_baker_ckpt_0.5.zip | speedyspeech_nosil_baker_static_0.5.zip | 12 MB |
| FastSpeech2 | CSMSC | fastspeech2-csmsc | fastspeech2_nosil_baker_ckpt_0.4.zip | fastspeech2_nosil_baker_static_0.4.zip | 157 MB |
| FastSpeech2-Conformer | CSMSC | fastspeech2-csmsc | fastspeech2_conformer_baker_ckpt_0.5.zip | | |
| FastSpeech2 | AISHELL-3 | fastspeech2-aishell3 | fastspeech2_nosil_aishell3_ckpt_0.4.zip | | |
| FastSpeech2 | LJSpeech | fastspeech2-ljspeech | fastspeech2_nosil_ljspeech_ckpt_0.5.zip | | |
| FastSpeech2 | VCTK | fastspeech2-csmsc | fastspeech2_nosil_vctk_ckpt_0.5.zip | | |

Vocoders

| Model Type | Dataset | Example Link | Pretrained Models | Static Models | Size (static) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| WaveFlow | LJSpeech | waveflow-ljspeech | waveflow_ljspeech_ckpt_0.3.zip | | |
| Parallel WaveGAN | CSMSC | PWGAN-csmsc | pwg_baker_ckpt_0.4.zip | pwg_baker_static_0.4.zip | 5.1 MB |
| Parallel WaveGAN | LJSpeech | PWGAN-ljspeech | pwg_ljspeech_ckpt_0.5.zip | | |
| Parallel WaveGAN | AISHELL-3 | PWGAN-aishell3 | pwg_aishell3_ckpt_0.5.zip | | |
| Parallel WaveGAN | VCTK | PWGAN-vctk | pwg_vctk_ckpt_0.5.zip | | |
| Multi Band MelGAN | CSMSC | MB MelGAN-csmsc | mb_melgan_baker_ckpt_0.5.zip <br> mb_melgan_baker_finetune_ckpt_0.5.zip | mb_melgan_baker_static_0.5.zip | 8.2 MB |
| HiFiGAN | CSMSC | HiFiGAN-csmsc | hifigan_csmsc_ckpt_0.1.1.zip | hifigan_csmsc_static_0.1.1.zip | 50 MB |

Voice Cloning

| Model Type | Dataset | Example Link | Pretrained Models |
| :--- | :--- | :--- | :--- |
| GE2E | AISHELL-3, etc. | ge2e | ge2e_ckpt_0.3.zip |
| GE2E + Tacotron2 | AISHELL-3 | ge2e-tactron2-aishell3 | tacotron2_aishell3_ckpt_0.3.zip |
| GE2E + FastSpeech2 | AISHELL-3 | ge2e-fastspeech2-aishell3 | fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip |

Audio Classification Models

| Model Type | Dataset | Example Link | Pretrained Models |
| :--- | :--- | :--- | :--- |
| PANN | Audioset | audioset_tagging_cnn | panns_cnn6.pdparams, panns_cnn10.pdparams, panns_cnn14.pdparams |
| PANN | ESC-50 | pann-esc50 | panns_cnn6.tar.gz, panns_cnn10, panns_cnn14.tar.gz |

Speech Recognition Models from Paddle 1.8

| Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Ds2 Offline Aishell model | Aishell Dataset | Char-based | 234 MB | 2 Conv + 3 bidirectional GRU layers | 0.0804 | - | 151 h |
| Ds2 Offline Librispeech model | Librispeech Dataset | Word-based | 307 MB | 2 Conv + 3 bidirectional sharing weight RNN layers | - | 0.0685 | 960 h |
| Ds2 Offline Baidu en8k model | Baidu Internal English Dataset | Word-based | 273 MB | 2 Conv + 3 bidirectional GRU layers | - | 0.0541 | 8628 h |