add whisper doc. (#4115)

pull/4102/merge
zxcd 3 days ago committed by GitHub
parent 538f260061
commit 1e3e186c18

@@ -178,6 +178,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like natural language processing (NLP) and computer vision (CV).
 ### Recent Update
+- 🎉 2025.09.01: Add [Whisper large v3 and turbo models](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/whisper).
 - 🤗 2025.08.11: Add [code-switch online model and server demo](./examples/tal_cs/asr1/).
 - 👑 2023.05.31: Add [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), WavLM fine-tuning for ASR on LibriSpeech.
 - 🎉 2023.05.18: Add [Squeezeformer](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1), Squeezeformer training for ASR on Aishell.

@@ -183,6 +183,7 @@
 - 🧩 Cascaded models application: as an extension of the traditional speech tasks, we combine them with tasks such as natural language processing and computer vision to build industrial-grade applications closer to real-world needs.
 ### Recent Update
+- 🎉 2025.09.01: Add [Whisper large v3 and turbo models](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/whisper).
 - 🤗 2025.08.11: Add [streaming code-switch (Chinese/English) tal_cs recognition model](./examples/tal_cs/asr1/).
 - 👑 2023.05.31: Add [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), WavLM fine-tuning for English ASR on the LibriSpeech dataset.
 - 🎉 2023.05.18: Add [Squeezeformer](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1), Squeezeformer training on the Aishell dataset.

@@ -39,10 +39,10 @@ Whisper model trained by OpenAI whisper https://github.com/openai/whisper
 ```
 Arguments:
 - `input` (required): Audio file to recognize.
-- `model`: Model type of the asr task. Default: `whisper-large`.
+- `model`: Model type of the asr task. Default: `whisper`.
 - `task`: Output type. Default: `transcribe`.
 - `lang`: Model language. Default: ``. Use `en` to choose the English-only model; currently the [medium,base,small,tiny] sizes support English-only models.
-- `size`: Model size for decoding. Default: `large`. Supported sizes: [large,medium,base,small,tiny].
+- `size`: Model size for decoding. Default: `turbo`. Supported sizes: [turbo,large,medium,base,small,tiny].
 - `language`: Set the decode language. Default: `None`. Forcibly sets the recognized language, which by default is determined by the model itself.
 - `sample_rate`: Sample rate of the model. Default: `16000`. Other sample rates are not supported yet.
 - `config`: Config of the asr task. The pretrained model is used when it is `None`. Default: `None`.
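The `size`/`lang` constraints above (default `turbo`, English-only checkpoints only for the four smaller sizes) can be sketched as a small validation helper. This is a hypothetical illustration of the documented rules; `resolve_model_tag` is not part of the PaddleSpeech API:

```python
# Hypothetical helper mirroring the documented `size`/`lang` rules;
# not part of PaddleSpeech itself.
SIZES = ["turbo", "large", "medium", "base", "small", "tiny"]
EN_ONLY_SIZES = {"medium", "base", "small", "tiny"}

def resolve_model_tag(size="turbo", lang=""):
    """Return a tag like 'whisper-base-en' under the documented constraints."""
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    if lang == "en" and size not in EN_ONLY_SIZES:
        raise ValueError("English-only checkpoints exist only for medium/base/small/tiny")
    suffix = "-en" if lang == "en" else ""
    return f"whisper-{size}{suffix}"
```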
@@ -74,6 +74,7 @@ Whisper model trained by OpenAI whisper https://github.com/openai/whisper
 feature = whisper_executor(
     model='whisper',
     task='translate',
+    size='large',  # for the translation task, the large or medium model is recommended
     sample_rate=16000,
     config=None,  # Set `config` and `ckpt_path` to `None` to use the pretrained model.
     ckpt_path=None,
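The `config`/`ckpt_path` comment above describes a fallback: when both are `None`, the bundled pretrained model is used. A minimal sketch of that logic, under the assumption (hypothetical, for illustration) that a custom model must supply both a config and a checkpoint:

```python
# Hypothetical sketch of the documented fallback: with `config` and
# `ckpt_path` both None, pretrained defaults are used. File names are placeholders.
def select_model_files(config=None, ckpt_path=None):
    pretrained = ("pretrained.yaml", "pretrained.pdparams")  # placeholder names
    if config is None and ckpt_path is None:
        return pretrained  # use the bundled pretrained model
    if config is None or ckpt_path is None:
        # assumption: a custom model needs both a config and a checkpoint
        raise ValueError("set both `config` and `ckpt_path`, or neither")
    return (config, ckpt_path)
```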

@@ -39,10 +39,10 @@ Whisper model trained by OpenAI Whisper https://github.com/openai/whisper
 ```
 Arguments:
 - `input` (required): Audio file to recognize.
-- `model`: Model for the ASR task. Default: `whisper-large`.
+- `model`: Model for the ASR task. Default: `whisper`.
 - `task`: Output type. Default: `transcribe`.
 - `lang`: Model language. Default: ``. Use `en` to select the English-only model; currently the [medium,base,small,tiny] sizes offer English-only models.
-- `size`: Model size. Default: `large`. Currently supported: [large,medium,base,small,tiny].
+- `size`: Model size. Default: `turbo`. Currently supported: [turbo,large,medium,base,small,tiny].
 - `language`: Set the decoding language. Default: `None`. Forcibly sets the recognized language; by default the model decides on its own.
 - `sample_rate`: Audio sample rate. Default: `16000`. Whisper does not support other sample rates yet.
 - `config`: Config file for the ASR task. If not set, the default configuration of the pretrained model is used. Default: `None`.
@@ -74,6 +74,7 @@ Whisper model trained by OpenAI Whisper https://github.com/openai/whisper
 feature = whisper_executor(
     model='whisper',
     task='translate',
+    size='large',  # for the translation task, the large or medium model is recommended
     sample_rate=16000,
     config=None,  # Set `config` and `ckpt_path` to `None` to use the pretrained model.
     ckpt_path=None,
