diff --git a/README.md b/README.md
index dbdf6a4f..2fb77363 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 🔥 2023.01.10: Add [code-switch ASR CLI and demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
+- 👑 2023.01.06: Add [code-switch ASR tal_cs recipe](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
 - 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
 - 🎉 2022.11.30: Add [TTS Android Demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
 - 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and [official website
diff --git a/README_cn.md b/README_cn.md
index 5cc156c9..53f6a66e 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -164,6 +164,8 @@


 ### 近期更新
+- 🔥 2023.01.10: 新增 [中英混合 ASR CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
+- 👑 2023.01.06: 新增 [ASR中英混合 tal_cs 训练推理流程](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
 - 🎉 2022.12.02: 新增 [端到端韵律预测全流程](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (包含在声学模型中使用韵律标签)。
 - 🎉 2022.11.30: 新增 [TTS Android 部署示例](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid)。
 - 🤗 2022.11.28: PP-TTS and PP-ASR 示例可在 [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) 和[飞桨官网](https://www.paddlepaddle.org.cn/models)体验! 
diff --git a/demos/speech_recognition/README.md b/demos/speech_recognition/README.md
index c815a88a..ee2acd6f 100644
--- a/demos/speech_recognition/README.md
+++ b/demos/speech_recognition/README.md
@@ -17,7 +17,7 @@ The input of this demo should be a WAV file(`.wav`), and the sample rate must be

 Here are sample files for this demo that can be downloaded:
 ```bash
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
 ```

 ### 3. Usage
@@ -27,6 +27,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   paddlespeech asr --input ./zh.wav -v
   # English
   paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
+  # Code-Switch
+  paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
   # Chinese ASR + Punctuation Restoration
   paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
   ```
@@ -40,6 +42,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   - `input`(required): Audio file to recognize.
   - `model`: Model type of asr task. Default: `conformer_wenetspeech`.
   - `lang`: Model language. Default: `zh`.
+  - `codeswitch`: Whether to use the code-switch model. Default: `False`.
   - `sample_rate`: Sample rate of the model. Default: `16000`.
   - `config`: Config of asr task. Use pretrained model when it is None. Default: `None`.
   - `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
@@ -83,14 +86,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee

 Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:

-| Model | Language | Sample Rate
-| :--- | :---: | :---: |
-| conformer_wenetspeech | zh | 16k
-| conformer_online_multicn | zh | 16k
-| conformer_aishell | zh | 16k
-| conformer_online_aishell | zh | 16k
-| transformer_librispeech | en | 16k
-| deepspeech2online_wenetspeech | zh | 16k
-| deepspeech2offline_aishell| zh| 16k
-| deepspeech2online_aishell | zh | 16k
-| deepspeech2offline_librispeech | en | 16k
+| Model | Code Switch | Language | Sample Rate
+| :--- | :---: | :---: | :---: |
+| conformer_wenetspeech | False | zh | 16k
+| conformer_online_multicn | False | zh | 16k
+| conformer_aishell | False | zh | 16k
+| conformer_online_aishell | False | zh | 16k
+| transformer_librispeech | False | en | 16k
+| deepspeech2online_wenetspeech | False | zh | 16k
+| deepspeech2offline_aishell | False | zh | 16k
+| deepspeech2online_aishell | False | zh | 16k
+| deepspeech2offline_librispeech | False | en | 16k
+| conformer_talcs | True | zh_en | 16k
diff --git a/demos/speech_recognition/README_cn.md b/demos/speech_recognition/README_cn.md
index 13aa9f27..62dce3bc 100644
--- a/demos/speech_recognition/README_cn.md
+++ b/demos/speech_recognition/README_cn.md
@@ -1,4 +1,5 @@
 (简体中文|[English](./README.md))
+ (简体中文|[English](./README.md))
 # 语音识别
 ## 介绍

@@ -16,7 +17,7 @@

 可以下载此 demo 的示例音频:
 ```bash
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
 ```
 ### 3. 使用方法
 - 命令行 (推荐使用)
@@ -25,6 +26,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   paddlespeech asr --input ./zh.wav -v
   # 英文
   paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
+  # 中英混合
+  paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
   # 中文 + 标点恢复
   paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
   ```
@@ -38,6 +41,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
   - `input`(必须输入):用于识别的音频文件。
   - `model`:ASR 任务的模型,默认值:`conformer_wenetspeech`。
   - `lang`:模型语言,默认值:`zh`。
+  - `codeswitch`:是否使用中英混合模型,默认值:`False`。
   - `sample_rate`:音频采样率,默认值:`16000`。
   - `config`:ASR 任务的参数文件,若不设置则使用预训练模型中的默认配置,默认值:`None`。
   - `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。
@@ -80,14 +84,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 ### 4.预训练模型
 以下是 PaddleSpeech 提供的可以被命令行和 python API 使用的预训练模型列表:

-| 模型 | 语言 | 采样率
-| :--- | :---: | :---: |
-| conformer_wenetspeech | zh | 16k
-| conformer_online_multicn | zh | 16k
-| conformer_aishell | zh | 16k
-| conformer_online_aishell | zh | 16k
-| transformer_librispeech | en | 16k
-| deepspeech2online_wenetspeech | zh | 16k
-| deepspeech2offline_aishell| zh| 16k
-| deepspeech2online_aishell | zh | 16k
-| deepspeech2offline_librispeech | en | 16k
+| 模型 | 中英混合 | 语言 | 采样率
+| :--- | :---: | :---: | :---: |
+| conformer_wenetspeech | False | zh | 16k
+| conformer_online_multicn | False | zh | 16k
+| conformer_aishell | False | zh | 16k
+| conformer_online_aishell | False | zh | 16k
+| transformer_librispeech | False | en | 16k
+| deepspeech2online_wenetspeech | False | zh | 16k
+| deepspeech2offline_aishell | False | zh | 16k
+| deepspeech2online_aishell | False | zh | 16k
+| deepspeech2offline_librispeech | False | en | 16k
+| conformer_talcs | True | zh_en | 16k
diff --git a/demos/speech_recognition/run.sh b/demos/speech_recognition/run.sh
index e48ff3e9..8ba6e4c3 100755
--- a/demos/speech_recognition/run.sh
+++ b/demos/speech_recognition/run.sh
@@ -2,6 +2,7 @@

 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav

 # asr
 paddlespeech asr --input ./zh.wav
@@ -18,6 +19,11 @@ paddlespeech asr --help

 # english asr
 paddlespeech asr --lang en --model transformer_librispeech --input ./en.wav
+
+# code-switch asr
+paddlespeech asr --lang zh_en --codeswitch True --model conformer_talcs --input ./ch_zh_mix.wav
+
+
 # model stats
 paddlespeech stats --task asr

diff --git a/paddlespeech/cli/asr/infer.py b/paddlespeech/cli/asr/infer.py
index 00414336..7a7aef8b 100644
--- a/paddlespeech/cli/asr/infer.py
+++ b/paddlespeech/cli/asr/infer.py
@@ -25,6 +25,9 @@ import librosa
 import numpy as np
 import paddle
 import soundfile
+from paddlespeech.audio.transform.transformation import Transformation
+from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
+from paddlespeech.s2t.utils.utility import UpdateConfig
 from yacs.config import CfgNode

 from ...utils.env import MODEL_HOME
@@ -34,9 +37,6 @@ from ..log import logger
 from ..utils import CLI_TIMER
 from ..utils import stats_wrapper
 from ..utils import timer_register
-from paddlespeech.audio.transform.transformation import Transformation
-from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
-from paddlespeech.s2t.utils.utility import UpdateConfig

 __all__ = ['ASRExecutor']

@@ -62,8 +62,13 @@ class ASRExecutor(BaseExecutor):
             '--lang',
             type=str,
             default='zh',
-            help='Choose model language. zh or en, zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k]'
+            help='Choose model language. [zh, en, zh_en], zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k], zh_en:[conformer_talcs-codeswitch_zh_en-16k]'
         )
+        self.parser.add_argument(
+            '--codeswitch',
+            type=bool,
+            default=False,
+            help='Whether to use the code-switch model. True or False.')
         self.parser.add_argument(
             "--sample_rate",
             type=int,
@@ -127,6 +132,7 @@ class ASRExecutor(BaseExecutor):
     def _init_from_path(self,
                         model_type: str='wenetspeech',
                         lang: str='zh',
+                        codeswitch: bool=False,
                         sample_rate: int=16000,
                         cfg_path: Optional[os.PathLike]=None,
                         decode_method: str='attention_rescoring',
@@ -144,7 +150,12 @@ class ASRExecutor(BaseExecutor):

         if cfg_path is None or ckpt_path is None:
             sample_rate_str = '16k' if sample_rate == 16000 else '8k'
-            tag = model_type + '-' + lang + '-' + sample_rate_str
+            if lang == "zh_en" and codeswitch is True:
+                tag = model_type + '-' + 'codeswitch_' + lang + '-' + sample_rate_str
+            elif lang == "zh_en" or codeswitch is True:
+                raise Exception("codeswitch must be True if and only if lang is zh_en")
+            else:
+                tag = model_type + '-' + lang + '-' + sample_rate_str
             self.task_resource.set_task_model(tag, version=None)
             self.res_path = self.task_resource.res_dir

@@ -423,6 +434,7 @@ class ASRExecutor(BaseExecutor):

         model = parser_args.model
         lang = parser_args.lang
+        codeswitch = parser_args.codeswitch
         sample_rate = parser_args.sample_rate
         config = parser_args.config
         ckpt_path = parser_args.ckpt_path
@@ -444,6 +456,7 @@ class ASRExecutor(BaseExecutor):
                     audio_file=input_,
                     model=model,
                     lang=lang,
+                    codeswitch=codeswitch,
                     sample_rate=sample_rate,
                     config=config,
                     ckpt_path=ckpt_path,
@@ -472,6 +485,7 @@ class ASRExecutor(BaseExecutor):
                  audio_file: os.PathLike,
                  model: str='conformer_u2pp_online_wenetspeech',
                  lang: str='zh',
+                 codeswitch: bool=False,
                  sample_rate: int=16000,
                  config: os.PathLike=None,
                  ckpt_path: os.PathLike=None,
@@ -485,8 +499,8 @@ class ASRExecutor(BaseExecutor):
         """
         audio_file = os.path.abspath(audio_file)
         paddle.set_device(device)
-        self._init_from_path(model, lang, sample_rate, config, decode_method,
-                             num_decoding_left_chunks, ckpt_path)
+        self._init_from_path(model, lang, codeswitch, sample_rate, config,
+                             decode_method, num_decoding_left_chunks, ckpt_path)
         if not self._check(audio_file, sample_rate, force_yes):
             sys.exit(-1)
         if rtf:
diff --git a/paddlespeech/cli/base_commands.py b/paddlespeech/cli/base_commands.py
index 767d0df7..dfeb5cae 100644
--- a/paddlespeech/cli/base_commands.py
+++ b/paddlespeech/cli/base_commands.py
@@ -14,6 +14,7 @@
 import argparse
 from typing import List

+import numpy
 from prettytable import PrettyTable

 from ..resource import CommonTaskResource
@@ -78,7 +79,7 @@ class VersionCommand:


 model_name_format = {
-    'asr': 'Model-Language-Sample Rate',
+    'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
     'cls': 'Model-Sample Rate',
     'st': 'Model-Source language-Target language',
     'text': 'Model-Task-Language',
@@ -111,7 +112,21 @@ class StatsCommand:
         fields = model_name_format[self.task].split("-")
         table = PrettyTable(fields)
         for key in pretrained_models:
-            table.add_row(key.split("-"))
+            line = key.split("-")
+            if self.task == "asr" and len(line) < len(fields):
+                for i in range(len(line), len(fields)):
+                    line.append("-")
+                if "codeswitch" in key:
+                    line[3], line[1] = line[1].split("_")[0], line[1].split(
+                        "_")[1:]
+                elif "multilingual" in key:
+                    line[4], line[1] = line[1].split("_")[0], line[1].split(
+                        "_")[1:]
+                tmp = numpy.array(line)
+                idx = [0, 5, 3, 4, 1, 2]
+                line = tmp[idx]
+            table.add_row(line)
+
         print(table)

     def execute(self, argv: List[str]) -> bool:
diff --git a/paddlespeech/resource/pretrained_models.py b/paddlespeech/resource/pretrained_models.py
index 3c5aa1f9..ff0b30f6 100644
--- a/paddlespeech/resource/pretrained_models.py
+++ b/paddlespeech/resource/pretrained_models.py
@@ -30,6 +30,7 @@ __all__ = [
 ]

 # The tags for pretrained_models should be "{model_name}[_{dataset}][-{lang}][-...]".
+# Code-switch and multilingual models use tags of the form "{model_name}[_{dataset}]-[codeswitch/multilingual][_{lang}][-...]".
 # e.g. "conformer_wenetspeech-zh-16k" and "panns_cnn6-32k".
 # Command line and python api use "{model_name}[_{dataset}]" as --model, usage:
 # "paddlespeech asr --model conformer_wenetspeech --lang zh --sr 16000 --input ./input.wav"
@@ -322,6 +323,18 @@ asr_dynamic_pretrained_models = {
             '099a601759d467cd0a8523ff939819c5'
         },
     },
+    "conformer_talcs-codeswitch_zh_en-16k": {
+        '1.4': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/tal_cs/asr1/asr1_conformer_talcs_ckpt_1.4.0.model.tar.gz',
+            'md5':
+            '01962c5d0a70878fe41cacd4f61e14d1',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'exp/conformer/checkpoints/avg_10'
+        },
+    },
 }

 asr_static_pretrained_models = {
diff --git a/paddlespeech/server/bin/paddlespeech_server.py b/paddlespeech/server/bin/paddlespeech_server.py
index 1b1792bd..299a8c3d 100644
--- a/paddlespeech/server/bin/paddlespeech_server.py
+++ b/paddlespeech/server/bin/paddlespeech_server.py
@@ -16,14 +16,9 @@ import sys
 import warnings
 from typing import List

+import numpy
 import uvicorn
 from fastapi import FastAPI
-from prettytable import PrettyTable
-from starlette.middleware.cors import CORSMiddleware
-
-from ..executor import BaseExecutor
-from ..util import cli_server_register
-from ..util import stats_wrapper
 from paddlespeech.cli.log import logger
 from paddlespeech.resource import CommonTaskResource
 from paddlespeech.server.engine.engine_pool import init_engine_pool
@@ -31,6 +26,12 @@ from paddlespeech.server.engine.engine_warmup import warm_up
 from paddlespeech.server.restful.api import setup_router as setup_http_router
 from paddlespeech.server.utils.config import get_config
 from paddlespeech.server.ws.api import setup_router as setup_ws_router
+from prettytable import PrettyTable
+from starlette.middleware.cors import CORSMiddleware
+
+from ..executor import BaseExecutor
+from ..util import cli_server_register
+from ..util import stats_wrapper

 warnings.filterwarnings("ignore")
 __all__ = ['ServerExecutor', 'ServerStatsExecutor']
@@ -134,7 +135,7 @@ class ServerStatsExecutor():
             required=True)
         self.task_choices = ['asr', 'tts', 'cls', 'text', 'vector']
         self.model_name_format = {
-            'asr': 'Model-Language-Sample Rate',
+            'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
             'tts': 'Model-Language',
             'cls': 'Model-Sample Rate',
             'text': 'Model-Task-Language',
@@ -145,7 +146,20 @@ class ServerStatsExecutor():
         fields = self.model_name_format[self.task].split("-")
         table = PrettyTable(fields)
         for key in pretrained_models:
-            table.add_row(key.split("-"))
+            line = key.split("-")
+            if self.task == "asr" and len(line) < len(fields):
+                for i in range(len(line), len(fields)):
+                    line.append("-")
+                if "codeswitch" in key:
+                    line[3], line[1] = line[1].split("_")[0], line[1].split(
+                        "_")[1:]
+                elif "multilingual" in key:
+                    line[4], line[1] = line[1].split("_")[0], line[1].split(
+                        "_")[1:]
+                tmp = numpy.array(line)
+                idx = [0, 5, 3, 4, 1, 2]
+                line = tmp[idx]
+            table.add_row(line)
         print(table)

     def execute(self, argv: List[str]) -> bool:
diff --git a/tests/unit/cli/test_cli.sh b/tests/unit/cli/test_cli.sh
index 3a58626d..5d3b76f6 100755
--- a/tests/unit/cli/test_cli.sh
+++ b/tests/unit/cli/test_cli.sh
@@ -14,7 +14,7 @@ paddlespeech ssl --task asr --lang en --input ./en.wav
 paddlespeech ssl --task vector --lang en --input ./en.wav

 # Speech_recognition
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
 paddlespeech asr --input ./zh.wav
 paddlespeech asr --model conformer_aishell --input ./zh.wav
 paddlespeech asr --model conformer_online_aishell --input ./zh.wav
@@ -26,6 +26,7 @@ paddlespeech asr --model deepspeech2offline_aishell --input ./zh.wav
 paddlespeech asr --model deepspeech2online_wenetspeech --input ./zh.wav
 paddlespeech asr --model deepspeech2online_aishell --input ./zh.wav
 paddlespeech asr --model deepspeech2offline_librispeech --lang en --input ./en.wav
+paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav

 # Support editing num_decoding_left_chunks
 paddlespeech asr --model conformer_online_wenetspeech --num_decoding_left_chunks 3 --input ./zh.wav
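The new `--codeswitch` option in `paddlespeech/cli/asr/infer.py` is documented as taking `True` or `False`. With `type=bool`, however, argparse passes the raw string to `bool()`, and every non-empty string, including `"False"`, is truthy. As a result, `--codeswitch False` would still request the code-switch model and, with the default `--lang zh`, reach the error branch of `_init_from_path`. The sketch below is not PaddleSpeech code. `str2bool` and `resolve_asr_tag` are hypothetical helpers: the first parses the flag explicitly, and the second reproduces the tag rules from the `_init_from_path` hunk, raising `ValueError` instead of a bare `Exception`.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: parse 'True'/'False'-style strings for argparse."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def resolve_asr_tag(model_type: str, lang: str, codeswitch: bool,
                    sample_rate: int = 16000) -> str:
    """Hypothetical helper mirroring the tag rules added to _init_from_path."""
    sample_rate_str = "16k" if sample_rate == 16000 else "8k"
    if lang == "zh_en" and codeswitch:
        # Code-switch tags carry a "codeswitch_" prefix on the language field.
        return f"{model_type}-codeswitch_{lang}-{sample_rate_str}"
    if lang == "zh_en" or codeswitch:
        # zh_en and codeswitch must be enabled together.
        raise ValueError("codeswitch must be True if and only if lang is 'zh_en'")
    return f"{model_type}-{lang}-{sample_rate_str}"


# The pitfall: type=bool turns the string "False" into True.
print(bool("False"))  # True

parser = argparse.ArgumentParser()
parser.add_argument("--lang", type=str, default="zh")
parser.add_argument("--codeswitch", type=str2bool, default=False)

args = parser.parse_args(["--lang", "zh_en", "--codeswitch", "True"])
print(resolve_asr_tag("conformer_talcs", args.lang, args.codeswitch))
# conformer_talcs-codeswitch_zh_en-16k

args = parser.parse_args(["--codeswitch", "False"])
print(resolve_asr_tag("conformer_wenetspeech", args.lang, args.codeswitch))
# conformer_wenetspeech-zh-16k
```

Using a converter such as `str2bool` preserves the `--codeswitch True` syntax shown in the README, `run.sh`, and `test_cli.sh`. `action="store_true"` would be the more idiomatic argparse choice, but it would change those documented command lines.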
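The new `asr` stats layout, `Model-Size-Code Switch-Multilingual-Language-Sample Rate`, is filled by padding the split tag with `"-"` and then reordering it through `numpy.array`. For `conformer_talcs-codeswitch_zh_en-16k`, one element of that list becomes the sub-list `['zh', 'en']`. Recent NumPy releases may refuse to build an array from such a ragged list. The NumPy-free sketch below implements the same column mapping under some stated assumptions. `asr_tag_to_row` is a hypothetical name. It keeps the language as the single string `zh_en` rather than a list. It also assumes three-part ASR tags, as in this patch, and pads any other layout to six cells.

```python
from typing import List

# Column order of the asr stats table introduced in this patch.
FIELDS = ["Model", "Size", "Code Switch", "Multilingual", "Language", "Sample Rate"]


def asr_tag_to_row(tag: str) -> List[str]:
    """Hypothetical helper: map a pretrained-model tag onto the six columns."""
    parts = tag.split("-")
    if len(parts) != 3:
        # Unexpected layout: pad (or truncate) so the row still has six cells.
        return (parts + ["-"] * len(FIELDS))[:len(FIELDS)]

    model, middle, sample_rate = parts
    code_switch = multilingual = "-"
    lang = middle
    # "codeswitch_zh_en" -> ("codeswitch", "_", "zh_en"); "zh" -> ("zh", "", "")
    prefix, _, rest = middle.partition("_")
    if prefix == "codeswitch":
        code_switch, lang = prefix, rest
    elif prefix == "multilingual":
        multilingual, lang = prefix, rest
    # Size is not encoded in these tags, so it is left as "-".
    return [model, "-", code_switch, multilingual, lang, sample_rate]


for tag in ["conformer_wenetspeech-zh-16k", "conformer_talcs-codeswitch_zh_en-16k"]:
    print(dict(zip(FIELDS, asr_tag_to_row(tag))))
```

Each returned list always has exactly `len(FIELDS)` plain-string entries. It can therefore be passed directly to `PrettyTable.add_row` without any NumPy reordering.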