diff --git a/README.md b/README.md
index 79b86e9ff..e1f57fcaf 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
 
 ### Recent Update
+- 👑 2022.10.11: Add [Wav2vec2ASR](./examples/librispeech/asr3), wav2vec2.0 fine-tuning for ASR on LibriSpeech.
 - 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and ERNIE-SAT in [PaddleSpeech Web Demo](./demos/speech_web).
 - ⚡ 2022.09.09: Add AISHELL-3 Voice Cloning [example](./examples/aishell3/vc2) with ECAPA-TDNN speaker encoder.
 - ⚡ 2022.08.25: Release TTS [finetune](./examples/other/tts_finetune/tts3) example.
diff --git a/README_cn.md b/README_cn.md
index 3d60882b2..1e932201f 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -179,6 +179,7 @@
 
 ### 近期更新
+- 👑 2022.10.11: 新增 [Wav2vec2ASR](./examples/librispeech/asr3), 在 LibriSpeech 上针对 ASR 任务对 wav2vec2.0 进行 fine-tuning。
 - 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 ERNIE-SAT 到 [PaddleSpeech 网页应用](./demos/speech_web)。
 - ⚡ 2022.09.09: 新增基于 ECAPA-TDNN 声纹模型的 AISHELL-3 Voice Cloning [示例](./examples/aishell3/vc2)。
 - ⚡ 2022.08.25: 发布 TTS [finetune](./examples/other/tts_finetune/tts3) 示例。
diff --git a/docs/source/reference.md b/docs/source/reference.md
index 0d36d96f7..9a47a2302 100644
--- a/docs/source/reference.md
+++ b/docs/source/reference.md
@@ -28,6 +28,8 @@ We borrowed a lot of code from these repos to build `model` and `engine`, thanks
 * [speechbrain](https://github.com/speechbrain/speechbrain/blob/develop/LICENSE)
 - Apache-2.0 License
 - ECAPA-TDNN SV model
+- ASR with CTC and pre-trained wav2vec2 models.
+
 
 * [chainer](https://github.com/chainer/chainer/blob/master/LICENSE)
 - MIT License
@@ -43,3 +45,7 @@ We borrowed a lot of code from these repos to build `model` and `engine`, thanks
 
 * [g2pW](https://github.com/GitYCC/g2pW/blob/master/LICENCE)
 - Apache-2.0 license
+
+* [transformers](https://github.com/huggingface/transformers)
+- Apache-2.0 License
+- Wav2vec2.0
diff --git a/docs/source/released_model.md b/docs/source/released_model.md
index a2456f1fe..4e76da033 100644
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@@ -18,6 +18,12 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER |
 [Transformer Librispeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_transformer_librispeech_ckpt_0.1.1.model.tar.gz) | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring |-| 0.0381 | 960 h | [Transformer Librispeech ASR1](../../examples/librispeech/asr1) | python |
 [Transformer Librispeech ASR2 Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/asr2_transformer_librispeech_ckpt_0.1.1.model.tar.gz) | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: JoinCTC w/ LM |-| 0.0240 | 960 h | [Transformer Librispeech ASR2](../../examples/librispeech/asr2) | python |
 
+### Self-Supervised Pre-trained Model
+Model | Pre-Train Method | Pre-Train Data | Finetune Data | Size | Descriptions | CER | WER | Example Link |
+:-------------: | :------------: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
+[Wav2vec2-large-960h-lv60-self Model](https://paddlespeech.bj.bcebos.com/wav2vec/wav2vec2-large-960h-lv60-self.pdparams) | wav2vec2 | Librispeech and LV-60k Dataset (53,000 h) | - | 1.18 GB | Pre-trained Wav2vec2.0 Model | - | - | - |
+[Wav2vec2ASR-large-960h-librispeech Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr3/wav2vec2ASR-large-960h-librispeech_ckpt_1.3.0.model.tar.gz) | wav2vec2 | Librispeech and LV-60k Dataset (53,000 h) | Librispeech (960 h) | 1.18 GB | Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | - | 0.0189 | [Wav2vec2ASR Librispeech ASR3](../../examples/librispeech/asr3) |
+
 ### Language Model based on NGram
 Language Model | Training Data | Token-based | Size | Descriptions
 :------------:| :------------:|:------------: | :------------: | :------------:
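
For convenience, below is a minimal, illustrative sketch (not part of the patch) of fetching and unpacking the released Wav2vec2ASR checkpoint listed in the `docs/source/released_model.md` table above. The download URL comes from that table; the local directory layout and the use of Python's standard `urllib`/`tarfile` modules are assumptions for illustration, not PaddleSpeech tooling.

```python
# Illustrative sketch: download the Wav2vec2ASR checkpoint tarball referenced in
# released_model.md and unpack it locally. Paths below are arbitrary assumptions.
import tarfile
import urllib.request
from pathlib import Path

CKPT_URL = (
    "https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr3/"
    "wav2vec2ASR-large-960h-librispeech_ckpt_1.3.0.model.tar.gz"
)

dest_dir = Path("exp/wav2vec2ASR")  # assumed working directory
dest_dir.mkdir(parents=True, exist_ok=True)
tar_path = dest_dir / CKPT_URL.rsplit("/", 1)[-1]

# Download the ~1.18 GB archive unless it is already present.
if not tar_path.exists():
    urllib.request.urlretrieve(CKPT_URL, tar_path)

# Extract the archive and list whatever files it ships with.
with tarfile.open(tar_path, "r:gz") as tar:
    tar.extractall(dest_dir)
    print("\n".join(member.name for member in tar.getmembers()))
```

The single-file `wav2vec2-large-960h-lv60-self.pdparams` pre-trained weights from the same table can be fetched the same way with `urlretrieve`, minus the extraction step.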