add asr code-switch cli and demo.

pull/2816/head
zxcd 3 years ago
parent e793d267d9
commit f8d3a915aa

@ -157,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update
- 🔥 2022.01.10: Add [code-switch asr CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
- 👑 2022.01.06: Add [code-switch asr tal_cs recipe](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
- 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
- 🎉 2022.11.30: Add [TTS Android Demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
- 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and [official website

@ -164,6 +164,8 @@
### 近期更新
- 🔥 2022.01.10: 新增 [中英混合 ASR CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
- 👑 2022.01.06: 新增 [ASR中英混合 tal_cs 训练推理流程](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
- 🎉 2022.12.02: 新增 [端到端韵律预测全流程](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (包含在声学模型中使用韵律标签)。
- 🎉 2022.11.30: 新增 [TTS Android 部署示例](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid)。
- 🤗 2022.11.28: PP-TTS and PP-ASR 示例可在 [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) 和[飞桨官网](https://www.paddlepaddle.org.cn/models)体验!

@ -17,7 +17,7 @@ The input of this demo should be a WAV file(`.wav`), and the sample rate must be
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
### 3. Usage
@ -27,6 +27,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav -v
# English
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Code-Switch
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
# Chinese ASR + Punctuation Restoration
paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
@ -40,6 +42,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `input`(required): Audio file to recognize.
- `model`: Model type of asr task. Default: `conformer_wenetspeech`.
- `lang`: Model language. Default: `zh`.
- `codeswitch`: Code Swith Model. Default: `False`
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `config`: Config of asr task. Use pretrained model when it is None. Default: `None`.
- `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
@ -83,14 +86,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:
| Model | Language | Sample Rate
| :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16k
| conformer_online_multicn | zh | 16k
| conformer_aishell | zh | 16k
| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k
| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k
| Model | Code Switch | Language | Sample Rate
| :--- | :---: | :---: | :---: |
| conformer_wenetspeech | False | zh | 16k
| conformer_online_multicn | False | zh | 16k
| conformer_aishell | False | zh | 16k
| conformer_online_aishell | False | zh | 16k
| transformer_librispeech | False | en | 16k
| deepspeech2online_wenetspeech | False | zh | 16k
| deepspeech2offline_aishell | False | zh| 16k
| deepspeech2online_aishell | False | zh | 16k
| deepspeech2offline_librispeech | False | en | 16k
| conformer_talcs | True | zh_en | 16k

@ -1,4 +1,5 @@
(简体中文|[English](./README.md))
(简体中文|[English](./README.md))
# 语音识别
## 介绍
@ -16,7 +17,7 @@
可以下载此 demo 的示例音频:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
### 3. 使用方法
- 命令行 (推荐使用)
@ -25,6 +26,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav -v
# 英文
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
#中英混合
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
# 中文 + 标点恢复
paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
@ -38,6 +41,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `input`(必须输入):用于识别的音频文件。
- `model`ASR 任务的模型,默认值:`conformer_wenetspeech`。
- `lang`:模型语言,默认值:`zh`。
- `codeswitch`: 是否使用语言转换,默认值:`False`。
- `sample_rate`:音频采样率,默认值:`16000`。
- `config`ASR 任务的参数文件,若不设置则使用预训练模型中的默认配置,默认值:`None`。
- `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。
@ -80,14 +84,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4.预训练模型
以下是 PaddleSpeech 提供的可以被命令行和 python API 使用的预训练模型列表:
| 模型 | 语言 | 采样率
| :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16k
| conformer_online_multicn | zh | 16k
| conformer_aishell | zh | 16k
| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k
| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k
| 模型 | 语言转换 | 语言 | 采样率
| :--- | :---: | :---: | :---: |
| conformer_wenetspeech | False | zh | 16k
| conformer_online_multicn | False | zh | 16k
| conformer_aishell | False | zh | 16k
| conformer_online_aishell | False | zh | 16k
| transformer_librispeech | False | en | 16k
| deepspeech2online_wenetspeech | False | zh | 16k
| deepspeech2offline_aishell | False | zh| 16k
| deepspeech2online_aishell | False | zh | 16k
| deepspeech2offline_librispeech | False | en | 16k
| conformer_talcs | True | zh_en | 16k

@ -1,7 +1,9 @@
#!/bin/bash
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
# asr
paddlespeech asr --input ./zh.wav
@ -18,6 +20,11 @@ paddlespeech asr --help
# english asr
paddlespeech asr --lang en --model transformer_librispeech --input ./en.wav
# code-switch asr
paddlespeech asr --lang zh_en --codeswitch True --model conformer_talcs --input ./ch_zh_mix.wav
# model stats
paddlespeech stats --task asr

@ -25,6 +25,9 @@ import librosa
import numpy as np
import paddle
import soundfile
from paddlespeech.audio.transform.transformation import Transformation
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.utility import UpdateConfig
from yacs.config import CfgNode
from ...utils.env import MODEL_HOME
@ -34,9 +37,6 @@ from ..log import logger
from ..utils import CLI_TIMER
from ..utils import stats_wrapper
from ..utils import timer_register
from paddlespeech.audio.transform.transformation import Transformation
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.utility import UpdateConfig
__all__ = ['ASRExecutor']
@ -62,8 +62,13 @@ class ASRExecutor(BaseExecutor):
'--lang',
type=str,
default='zh',
help='Choose model language. zh or en, zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k]'
help='Choose model language. [zh, en, zh_en], zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k], zh_en:[conformer_talcs-zh_en-16k-codeswitch]'
)
self.parser.add_argument(
'--codeswitch',
type=bool,
default=False,
help='Choose whether use code-switch. True or False.')
self.parser.add_argument(
"--sample_rate",
type=int,
@ -127,6 +132,7 @@ class ASRExecutor(BaseExecutor):
def _init_from_path(self,
model_type: str='wenetspeech',
lang: str='zh',
codeswitch: bool=False,
sample_rate: int=16000,
cfg_path: Optional[os.PathLike]=None,
decode_method: str='attention_rescoring',
@ -144,6 +150,9 @@ class ASRExecutor(BaseExecutor):
if cfg_path is None or ckpt_path is None:
sample_rate_str = '16k' if sample_rate == 16000 else '8k'
if lang == "zh_en" and codeswitch is True:
tag = model_type + '-' + lang + '-' + sample_rate_str + '-' + 'codeswitch'
else:
tag = model_type + '-' + lang + '-' + sample_rate_str
self.task_resource.set_task_model(tag, version=None)
self.res_path = self.task_resource.res_dir
@ -423,6 +432,7 @@ class ASRExecutor(BaseExecutor):
model = parser_args.model
lang = parser_args.lang
codeswitch = parser_args.codeswitch
sample_rate = parser_args.sample_rate
config = parser_args.config
ckpt_path = parser_args.ckpt_path
@ -444,6 +454,7 @@ class ASRExecutor(BaseExecutor):
audio_file=input_,
model=model,
lang=lang,
codeswitch=codeswitch,
sample_rate=sample_rate,
config=config,
ckpt_path=ckpt_path,
@ -472,6 +483,7 @@ class ASRExecutor(BaseExecutor):
audio_file: os.PathLike,
model: str='conformer_u2pp_online_wenetspeech',
lang: str='zh',
codeswitch: bool=False,
sample_rate: int=16000,
config: os.PathLike=None,
ckpt_path: os.PathLike=None,
@ -485,8 +497,8 @@ class ASRExecutor(BaseExecutor):
"""
audio_file = os.path.abspath(audio_file)
paddle.set_device(device)
self._init_from_path(model, lang, sample_rate, config, decode_method,
num_decoding_left_chunks, ckpt_path)
self._init_from_path(model, lang, codeswitch, sample_rate, config,
decode_method, num_decoding_left_chunks, ckpt_path)
if not self._check(audio_file, sample_rate, force_yes):
sys.exit(-1)
if rtf:

@ -14,6 +14,7 @@
import argparse
from typing import List
import numpy
from prettytable import PrettyTable
from ..resource import CommonTaskResource
@ -78,7 +79,7 @@ class VersionCommand:
model_name_format = {
'asr': 'Model-Language-Sample Rate',
'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
'cls': 'Model-Sample Rate',
'st': 'Model-Source language-Target language',
'text': 'Model-Task-Language',
@ -111,7 +112,15 @@ class StatsCommand:
fields = model_name_format[self.task].split("-")
table = PrettyTable(fields)
for key in pretrained_models:
table.add_row(key.split("-"))
line = key.split("-")
if self.task == "asr" and len(line) < len(fields):
for i in range(len(line), len(fields)):
line.append("-")
tmp = numpy.array(line)
idx = [0, 5, 3, 4, 1, 2]
line = tmp[idx]
table.add_row(line)
print(table)
def execute(self, argv: List[str]) -> bool:

@ -322,6 +322,18 @@ asr_dynamic_pretrained_models = {
'099a601759d467cd0a8523ff939819c5'
},
},
"conformer_talcs-zh_en-16k-codeswitch": {
'1.4': {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/tal_cs/asr1/asr1_conformer_talcs_ckpt_1.4.0.model.tar.gz',
'md5':
'01962c5d0a70878fe41cacd4f61e14d1',
'cfg_path':
'model.yaml',
'ckpt_path':
'exp/conformer/checkpoints/avg_10'
},
},
}
asr_static_pretrained_models = {

Loading…
Cancel
Save