diff --git a/README.md b/README.md
index dbdf6a4f8..2fb773634 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 🔥 2023.01.10: Add [code-switch ASR CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
+- 👑 2023.01.06: Add [code-switch ASR tal_cs recipe](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
 - 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
 - 🎉 2022.11.30: Add [TTS Android Demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
 - 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and [official website](https://www.paddlepaddle.org.cn/models)!
diff --git a/README_cn.md b/README_cn.md
index 5cc156c9f..53f6a66e4 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -164,6 +164,8 @@

 ### 近期更新
+- 🔥 2023.01.10: 新增 [中英混合 ASR CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition)。
+- 👑 2023.01.06: 新增 [中英混合 ASR tal_cs 训练推理流程](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/)。
 - 🎉 2022.12.02: 新增 [端到端韵律预测全流程](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (包含在声学模型中使用韵律标签)。
 - 🎉 2022.11.30: 新增 [TTS Android 部署示例](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid)。
 - 🤗 2022.11.28: PP-TTS and PP-ASR 示例可在 [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) 和[飞桨官网](https://www.paddlepaddle.org.cn/models)体验!
diff --git a/demos/speech_recognition/README.md b/demos/speech_recognition/README.md
index c815a88af..ee2acd6fd 100644
--- a/demos/speech_recognition/README.md
+++ b/demos/speech_recognition/README.md
@@ -17,7 +17,7 @@ The input of this demo should be a WAV file(`.wav`), and the sample rate must be

 Here are sample files for this demo that can be downloaded:
 ```bash
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
 ```

 ### 3. Usage
@@ -27,6 +27,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   paddlespeech asr --input ./zh.wav -v
   # English
   paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
+  # Code-Switch
+  paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
   # Chinese ASR + Punctuation Restoration
   paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
   ```
@@ -40,6 +42,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   - `input`(required): Audio file to recognize.
   - `model`: Model type of asr task. Default: `conformer_wenetspeech`.
   - `lang`: Model language. Default: `zh`.
+  - `codeswitch`: Whether to use a code-switch model. Default: `False`.
   - `sample_rate`: Sample rate of the model. Default: `16000`.
   - `config`: Config of asr task. Use pretrained model when it is None. Default: `None`.
  - `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
@@ -83,14 +86,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee

 Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:

-| Model | Language | Sample Rate
-| :--- | :---: | :---: |
-| conformer_wenetspeech | zh | 16k
-| conformer_online_multicn | zh | 16k
-| conformer_aishell | zh | 16k
-| conformer_online_aishell | zh | 16k
-| transformer_librispeech | en | 16k
-| deepspeech2online_wenetspeech | zh | 16k
-| deepspeech2offline_aishell| zh| 16k
-| deepspeech2online_aishell | zh | 16k
-| deepspeech2offline_librispeech | en | 16k
+| Model | Code Switch | Language | Sample Rate
+| :--- | :---: | :---: | :---: |
+| conformer_wenetspeech | False | zh | 16k
+| conformer_online_multicn | False | zh | 16k
+| conformer_aishell | False | zh | 16k
+| conformer_online_aishell | False | zh | 16k
+| transformer_librispeech | False | en | 16k
+| deepspeech2online_wenetspeech | False | zh | 16k
+| deepspeech2offline_aishell | False | zh | 16k
+| deepspeech2online_aishell | False | zh | 16k
+| deepspeech2offline_librispeech | False | en | 16k
+| conformer_talcs | True | zh_en | 16k
diff --git a/demos/speech_recognition/README_cn.md b/demos/speech_recognition/README_cn.md
index 13aa9f277..62dce3bc9 100644
--- a/demos/speech_recognition/README_cn.md
+++ b/demos/speech_recognition/README_cn.md
@@ -16,7 +16,7 @@
 可以下载此 demo 的示例音频:
 ```bash
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
 ```

 ### 3. 使用方法
 - 命令行 (推荐使用)
@@ -25,6 +25,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   paddlespeech asr --input ./zh.wav -v
   # 英文
   paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
+  # 中英混合
+  paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
   # 中文 + 标点恢复
   paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
   ```
@@ -38,6 +40,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
  - `input`(必须输入):用于识别的音频文件。
  - `model`:ASR 任务的模型,默认值:`conformer_wenetspeech`。
  - `lang`:模型语言,默认值:`zh`。
+  - `codeswitch`:是否使用中英混合(code-switch)模型,默认值:`False`。
  - `sample_rate`:音频采样率,默认值:`16000`。
  - `config`:ASR 任务的参数文件,若不设置则使用预训练模型中的默认配置,默认值:`None`。
  - `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。
@@ -80,14 +83,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 ### 4.预训练模型
 以下是 PaddleSpeech 提供的可以被命令行和 python API 使用的预训练模型列表:

-| 模型 | 语言 | 采样率
-| :--- | :---: | :---: |
-| conformer_wenetspeech | zh | 16k
-| conformer_online_multicn | zh | 16k
-| conformer_aishell | zh | 16k
-| conformer_online_aishell | zh | 16k
-| transformer_librispeech | en | 16k
-| deepspeech2online_wenetspeech | zh | 16k
-| deepspeech2offline_aishell| zh| 16k
-| deepspeech2online_aishell | zh | 16k
-| deepspeech2offline_librispeech | en | 16k
+| 模型 | 语言转换 | 语言 | 采样率
+| :--- | :---: | :---: | :---: |
+| conformer_wenetspeech | False | zh | 16k
+| conformer_online_multicn | False | zh | 16k
+| conformer_aishell | False | zh | 16k
+| conformer_online_aishell | False | zh | 16k
+| transformer_librispeech | False | en | 16k
+| deepspeech2online_wenetspeech | False | zh | 16k
+| deepspeech2offline_aishell | False | zh | 16k
+| deepspeech2online_aishell | False | zh | 16k
+| deepspeech2offline_librispeech | False | en | 16k
+| conformer_talcs | True | zh_en | 16k
diff --git a/demos/speech_recognition/run.sh b/demos/speech_recognition/run.sh
index e48ff3e96..5eacce9d9 100755
--- a/demos/speech_recognition/run.sh
+++ b/demos/speech_recognition/run.sh
@@ -1,7 +1,8 @@
 #!/bin/bash

 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav

 # asr
 paddlespeech asr --input ./zh.wav
@@ -18,6 +19,11 @@ paddlespeech asr --help

 # english asr
 paddlespeech asr --lang en --model transformer_librispeech --input ./en.wav
+
+# code-switch asr
+paddlespeech asr --lang zh_en --codeswitch True --model conformer_talcs --input ./ch_zh_mix.wav
+
+

 # model stats
 paddlespeech stats --task asr
diff --git a/paddlespeech/cli/asr/infer.py b/paddlespeech/cli/asr/infer.py
index 004143361..3f8e5f65e 100644
--- a/paddlespeech/cli/asr/infer.py
+++ b/paddlespeech/cli/asr/infer.py
@@ -25,6 +25,9 @@ import librosa
 import numpy as np
 import paddle
 import soundfile
+from paddlespeech.audio.transform.transformation import Transformation
+from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
+from paddlespeech.s2t.utils.utility import UpdateConfig
 from yacs.config import CfgNode

 from ...utils.env import MODEL_HOME
@@ -34,9 +37,6 @@ from ..log import logger
 from ..utils import CLI_TIMER
 from ..utils import stats_wrapper
 from ..utils import timer_register
-from paddlespeech.audio.transform.transformation import Transformation
-from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
-from paddlespeech.s2t.utils.utility import UpdateConfig

 __all__ = ['ASRExecutor']

@@ -62,8 +62,13 @@ class ASRExecutor(BaseExecutor):
             '--lang',
             type=str,
             default='zh',
-            help='Choose model language. zh or en, zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k]'
+            help='Choose model language. [zh, en, zh_en], zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k], zh_en:[conformer_talcs-zh_en-16k-codeswitch]'
         )
+        self.parser.add_argument(
+            '--codeswitch',
+            type=bool,
+            default=False,
+            help='Choose whether to use the code-switch model. True or False.')
         self.parser.add_argument(
             "--sample_rate",
             type=int,
@@ -127,6 +132,7 @@ class ASRExecutor(BaseExecutor):
     def _init_from_path(self,
                         model_type: str='wenetspeech',
                         lang: str='zh',
+                        codeswitch: bool=False,
                         sample_rate: int=16000,
                         cfg_path: Optional[os.PathLike]=None,
                         decode_method: str='attention_rescoring',
@@ -144,7 +150,10 @@ class ASRExecutor(BaseExecutor):

         if cfg_path is None or ckpt_path is None:
             sample_rate_str = '16k' if sample_rate == 16000 else '8k'
-            tag = model_type + '-' + lang + '-' + sample_rate_str
+            if lang == "zh_en" and codeswitch is True:
+                tag = model_type + '-' + lang + '-' + sample_rate_str + '-' + 'codeswitch'
+            else:
+                tag = model_type + '-' + lang + '-' + sample_rate_str
             self.task_resource.set_task_model(tag, version=None)
             self.res_path = self.task_resource.res_dir
@@ -423,6 +432,7 @@ class ASRExecutor(BaseExecutor):

         model = parser_args.model
         lang = parser_args.lang
+        codeswitch = parser_args.codeswitch
         sample_rate = parser_args.sample_rate
         config = parser_args.config
         ckpt_path = parser_args.ckpt_path
@@ -444,6 +454,7 @@ class ASRExecutor(BaseExecutor):
                     audio_file=input_,
                     model=model,
                     lang=lang,
+                    codeswitch=codeswitch,
                     sample_rate=sample_rate,
                     config=config,
                     ckpt_path=ckpt_path,
@@ -472,6 +483,7 @@ class ASRExecutor(BaseExecutor):
             audio_file: os.PathLike,
             model: str='conformer_u2pp_online_wenetspeech',
             lang: str='zh',
+            codeswitch: bool=False,
             sample_rate: int=16000,
             config: os.PathLike=None,
             ckpt_path: os.PathLike=None,
@@ -485,8 +497,8 @@ class ASRExecutor(BaseExecutor):
         """
         audio_file = os.path.abspath(audio_file)
         paddle.set_device(device)
-        self._init_from_path(model, lang, sample_rate, config, decode_method,
-                             num_decoding_left_chunks, ckpt_path)
+        self._init_from_path(model, lang, codeswitch, sample_rate, config,
+                             decode_method, num_decoding_left_chunks, ckpt_path)
         if not self._check(audio_file, sample_rate, force_yes):
             sys.exit(-1)
         if rtf:
diff --git a/paddlespeech/cli/base_commands.py b/paddlespeech/cli/base_commands.py
index 767d0df78..efcd671d0 100644
--- a/paddlespeech/cli/base_commands.py
+++ b/paddlespeech/cli/base_commands.py
@@ -14,6 +14,7 @@
 import argparse
 from typing import List

+import numpy
 from prettytable import PrettyTable

 from ..resource import CommonTaskResource
@@ -78,7 +79,7 @@ class VersionCommand:


 model_name_format = {
-    'asr': 'Model-Language-Sample Rate',
+    'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
     'cls': 'Model-Sample Rate',
     'st': 'Model-Source language-Target language',
     'text': 'Model-Task-Language',
@@ -111,7 +112,15 @@ class StatsCommand:
         fields = model_name_format[self.task].split("-")
         table = PrettyTable(fields)
         for key in pretrained_models:
-            table.add_row(key.split("-"))
+            line = key.split("-")
+            if self.task == "asr" and len(line) < len(fields):
+                for i in range(len(line), len(fields)):
+                    line.append("-")
+                tmp = numpy.array(line)
+                idx = [0, 5, 3, 4, 1, 2]
+                line = tmp[idx]
+            table.add_row(line)
+
         print(table)

     def execute(self, argv: List[str]) -> bool:
diff --git a/paddlespeech/resource/pretrained_models.py b/paddlespeech/resource/pretrained_models.py
index 3c5aa1f90..5da97692f 100644
--- a/paddlespeech/resource/pretrained_models.py
+++ b/paddlespeech/resource/pretrained_models.py
@@ -322,6 +322,18 @@ asr_dynamic_pretrained_models = {
             '099a601759d467cd0a8523ff939819c5'
         },
     },
+    "conformer_talcs-zh_en-16k-codeswitch": {
+        '1.4': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/tal_cs/asr1/asr1_conformer_talcs_ckpt_1.4.0.model.tar.gz',
+            'md5':
+            '01962c5d0a70878fe41cacd4f61e14d1',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'exp/conformer/checkpoints/avg_10'
+        },
+    },
 }

 asr_static_pretrained_models = {
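
For reference, the new code-switch model can also be invoked from Python. The following is a minimal sketch based on the updated `ASRExecutor.__call__` signature in this patch; it assumes `paddlespeech` is installed and that `ch_zh_mix.wav` has been downloaded with the `wget` command added to `run.sh` above.

```python
# Minimal sketch: Mandarin-English code-switch recognition via the Python API.
# Only audio_file is required; the local file path here is an assumption.
from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()

# lang='zh_en' together with codeswitch=True makes _init_from_path build the
# 'conformer_talcs-zh_en-16k-codeswitch' tag registered in pretrained_models.py,
# so the pretrained checkpoint is downloaded automatically on first use.
result = asr(
    audio_file='./ch_zh_mix.wav',
    model='conformer_talcs',
    lang='zh_en',
    codeswitch=True,
    sample_rate=16000)
print(result)
```

This mirrors the CLI call `paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav` documented in the demo READMEs above.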