add whisper doc. (#4115)

pull/4102/merge
zxcd 3 days ago committed by GitHub
parent 538f260061
commit 1e3e186c18

@@ -178,6 +178,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like natural language processing (NLP) and computer vision (CV).
 ### Recent Update
+- 🎉 2025.09.01: Add [Whisper large v3 and turbo models](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/whisper).
 - 🤗 2025.08.11: Add [code-switch online model and server demo](./examples/tal_cs/asr1/).
 - 👑 2023.05.31: Add [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), WavLM fine-tuning for ASR on LibriSpeech.
 - 🎉 2023.05.18: Add [Squeezeformer](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1), Squeezeformer training for ASR on Aishell.

@@ -183,6 +183,7 @@
 - 🧩 Cascaded models application: as an extension of the traditional speech tasks, we combine them with tasks such as natural language processing and computer vision to build industrial-grade applications closer to real-world needs.
 ### Recent Update
+- 🎉 2025.09.01: Add [Whisper large v3 and turbo models](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/whisper).
 - 🤗 2025.08.11: Add [streaming code-switch (Chinese/English) tal_cs recognition model](./examples/tal_cs/asr1/).
 - 👑 2023.05.31: Add [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), WavLM fine-tuning for English ASR on the LibriSpeech dataset.
 - 🎉 2023.05.18: Add [Squeezeformer](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1), Squeezeformer training on the Aishell dataset.

@@ -39,10 +39,10 @@ Whisper model trained by OpenAI whisper https://github.com/openai/whisper
 ```
 Arguments:
 - `input` (required): Audio file to recognize.
-- `model`: Model type of the asr task. Default: `whisper-large`.
+- `model`: Model type of the asr task. Default: `whisper`.
 - `task`: Output type. Default: `transcribe`.
 - `lang`: Model language. Default: ``. Use `en` to choose the English-only model; currently the [medium,base,small,tiny] sizes support English-only models.
-- `size`: Model size for decoding. Default: `large`. Supported sizes: [large,medium,base,small,tiny].
+- `size`: Model size for decoding. Default: `turbo`. Supported sizes: [turbo,large,medium,base,small,tiny].
 - `language`: Set the decode language. Default: `None`. Forcibly sets the recognized language, which by default is determined by the model itself.
 - `sample_rate`: Sample rate of the model. Default: `16000`. Other sample rates are not supported yet.
 - `config`: Config of the asr task. The pretrained model is used when it is `None`. Default: `None`.
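The `size`/`lang` constraints above (default `turbo`, English-only checkpoints only for the four smaller sizes) can be sketched as a small validation helper. This is a hypothetical illustration of the documented rules; `resolve_model_tag` is not part of the PaddleSpeech API:

```python
# Hypothetical helper mirroring the documented `size`/`lang` rules;
# not part of PaddleSpeech itself.
SIZES = ["turbo", "large", "medium", "base", "small", "tiny"]
EN_ONLY_SIZES = {"medium", "base", "small", "tiny"}

def resolve_model_tag(size="turbo", lang=""):
    """Return a tag like 'whisper-base-en' under the documented constraints."""
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    if lang == "en" and size not in EN_ONLY_SIZES:
        raise ValueError("English-only checkpoints exist only for medium/base/small/tiny")
    suffix = "-en" if lang == "en" else ""
    return f"whisper-{size}{suffix}"
```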
@@ -74,6 +74,7 @@ Whisper model trained by OpenAI whisper https://github.com/openai/whisper
 feature = whisper_executor(
     model='whisper',
     task='translate',
+    size='large',  # for the translation task, the large or medium model is recommended
     sample_rate=16000,
     config=None,  # Set `config` and `ckpt_path` to `None` to use the pretrained model.
     ckpt_path=None,
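The `config`/`ckpt_path` comment above describes a fallback: when both are `None`, the bundled pretrained model is used. A minimal sketch of that logic, under the assumption (hypothetical, for illustration) that a custom model must supply both a config and a checkpoint:

```python
# Hypothetical sketch of the documented fallback: with `config` and
# `ckpt_path` both None, pretrained defaults are used. File names are placeholders.
def select_model_files(config=None, ckpt_path=None):
    pretrained = ("pretrained.yaml", "pretrained.pdparams")  # placeholder names
    if config is None and ckpt_path is None:
        return pretrained  # use the bundled pretrained model
    if config is None or ckpt_path is None:
        # assumption: a custom model needs both a config and a checkpoint
        raise ValueError("set both `config` and `ckpt_path`, or neither")
    return (config, ckpt_path)
```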

@@ -39,10 +39,10 @@ Whisper model trained by OpenAI Whisper https://github.com/openai/whisper
 ```
 Arguments:
 - `input` (required): Audio file to recognize.
-- `model`: Model for the ASR task. Default: `whisper-large`.
+- `model`: Model for the ASR task. Default: `whisper`.
 - `task`: Output type. Default: `transcribe`.
 - `lang`: Model language. Default: ``. Use `en` to select the English-only model; currently the [medium,base,small,tiny] sizes offer English-only models.
-- `size`: Model size. Default: `large`. Currently supported: [large,medium,base,small,tiny].
+- `size`: Model size. Default: `turbo`. Currently supported: [turbo,large,medium,base,small,tiny].
 - `language`: Set the decoding language. Default: `None`. Forcibly sets the recognized language; by default the model decides on its own.
 - `sample_rate`: Audio sample rate. Default: `16000`. Whisper does not support other sample rates yet.
 - `config`: Config file for the ASR task. If not set, the default configuration of the pretrained model is used. Default: `None`.
@@ -74,6 +74,7 @@ Whisper model trained by OpenAI Whisper https://github.com/openai/whisper
 feature = whisper_executor(
     model='whisper',
     task='translate',
+    size='large',  # for the translation task, the large or medium model is recommended
     sample_rate=16000,
     config=None,  # Set `config` and `ckpt_path` to `None` to use the pretrained model.
     ckpt_path=None,
