[ASR] add asr code-switch cli and demo, test='asr' (#2816)

* add asr code-switch cli and demo.

* fix some model naming problems.
zxcd 2 years ago committed by GitHub
parent 2c4c141de5
commit 88fe26f17c

@ -157,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update
- 🔥 2023.01.10: Add [code-switch ASR CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
- 👑 2023.01.06: Add [code-switch ASR tal_cs recipe](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
- 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
- 🎉 2022.11.30: Add [TTS Android Demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
- 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and the [official website](https://www.paddlepaddle.org.cn/models)!

@ -164,6 +164,8 @@
### Recent Update
- 🔥 2023.01.10: Add [code-switch ASR CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
- 👑 2023.01.06: Add [code-switch ASR tal_cs training and inference recipe](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
- 🎉 2022.12.02: Add [end-to-end prosody prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in the acoustic model).
- 🎉 2022.11.30: Add [TTS Android deployment demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
- 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and the [official PaddlePaddle website](https://www.paddlepaddle.org.cn/models)!

@ -17,7 +17,7 @@ The input of this demo should be a WAV file(`.wav`), and the sample rate must be
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
### 3. Usage
@ -27,6 +27,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav -v
# English
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Code-Switch
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
# Chinese ASR + Punctuation Restoration
paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
@ -40,6 +42,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `input`(required): Audio file to recognize.
- `model`: Model type of asr task. Default: `conformer_wenetspeech`.
- `lang`: Model language. Default: `zh`.
- `codeswitch`: Whether to use a code-switch model. Default: `False`.
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `config`: Config of asr task. Use pretrained model when it is None. Default: `None`.
- `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
@ -83,14 +86,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:
| Model | Code Switch | Language | Sample Rate
| :--- | :---: | :---: | :---: |
| conformer_wenetspeech | False | zh | 16k
| conformer_online_multicn | False | zh | 16k
| conformer_aishell | False | zh | 16k
| conformer_online_aishell | False | zh | 16k
| transformer_librispeech | False | en | 16k
| deepspeech2online_wenetspeech | False | zh | 16k
| deepspeech2offline_aishell | False | zh | 16k
| deepspeech2online_aishell | False | zh | 16k
| deepspeech2offline_librispeech | False | en | 16k
| conformer_talcs | True | zh_en | 16k

@ -1,4 +1,5 @@
(Simplified Chinese | [English](./README.md))
# Speech Recognition
## Introduction
@ -16,7 +17,7 @@
Sample audio files for this demo can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
### 3. Usage
- Command line (recommended)
@ -25,6 +26,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav -v
# English
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Code-Switch
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
# Chinese ASR + Punctuation Restoration
paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
@ -38,6 +41,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `input` (required): Audio file to recognize.
- `model`: Model type of the asr task. Default: `conformer_wenetspeech`.
- `lang`: Model language. Default: `zh`.
- `codeswitch`: Whether to use a code-switch model. Default: `False`.
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `config`: Config of the asr task. Use the pretrained model when it is `None`. Default: `None`.
- `ckpt_path`: Model checkpoint. Use the pretrained model when it is `None`. Default: `None`.
@ -80,14 +84,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4. Pretrained Models
Here is a list of pretrained models released by PaddleSpeech that can be used by the command line and python API:
| Model | Code Switch | Language | Sample Rate
| :--- | :---: | :---: | :---: |
| conformer_wenetspeech | False | zh | 16k
| conformer_online_multicn | False | zh | 16k
| conformer_aishell | False | zh | 16k
| conformer_online_aishell | False | zh | 16k
| transformer_librispeech | False | en | 16k
| deepspeech2online_wenetspeech | False | zh | 16k
| deepspeech2offline_aishell | False | zh | 16k
| deepspeech2online_aishell | False | zh | 16k
| deepspeech2offline_librispeech | False | en | 16k
| conformer_talcs | True | zh_en | 16k

@ -2,6 +2,7 @@
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
# asr
paddlespeech asr --input ./zh.wav
@ -18,6 +19,11 @@ paddlespeech asr --help
# english asr
paddlespeech asr --lang en --model transformer_librispeech --input ./en.wav
# code-switch asr
paddlespeech asr --lang zh_en --codeswitch True --model conformer_talcs --input ./ch_zh_mix.wav
# model stats
paddlespeech stats --task asr

@ -25,6 +25,9 @@ import librosa
import numpy as np
import paddle
import soundfile
from paddlespeech.audio.transform.transformation import Transformation
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.utility import UpdateConfig
from yacs.config import CfgNode
from ...utils.env import MODEL_HOME
@ -34,9 +37,6 @@ from ..log import logger
from ..utils import CLI_TIMER
from ..utils import stats_wrapper
from ..utils import timer_register
__all__ = ['ASRExecutor']
@ -62,8 +62,13 @@ class ASRExecutor(BaseExecutor):
'--lang',
type=str,
default='zh',
help='Choose model language. [zh, en, zh_en], zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k], zh_en:[conformer_talcs-codeswitch_zh_en-16k]'
)
self.parser.add_argument(
'--codeswitch',
type=bool,
default=False,
help='Choose whether use code-switch. True or False.')
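One caveat worth noting: `argparse` with `type=bool` converts any non-empty string to `True`, so `--codeswitch False` would still parse as `True`. A common workaround is a small string-to-bool converter; the `str2bool` helper below is an illustrative sketch, not part of this codebase:

```python
import argparse

def str2bool(value: str) -> bool:
    # argparse passes the raw command-line string; map common spellings
    # explicitly, because bool('False') is True (any non-empty string is truthy).
    if isinstance(value, bool):
        return value
    if value.lower() in ('true', 'yes', '1'):
        return True
    if value.lower() in ('false', 'no', '0'):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
parser.add_argument('--codeswitch', type=str2bool, default=False)
```

With this converter, `--codeswitch False` parses to the Python value `False` as users would expect.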
self.parser.add_argument(
"--sample_rate",
type=int,
@ -127,6 +132,7 @@ class ASRExecutor(BaseExecutor):
def _init_from_path(self,
model_type: str='wenetspeech',
lang: str='zh',
codeswitch: bool=False,
sample_rate: int=16000,
cfg_path: Optional[os.PathLike]=None,
decode_method: str='attention_rescoring',
@ -144,7 +150,12 @@ class ASRExecutor(BaseExecutor):
if cfg_path is None or ckpt_path is None:
sample_rate_str = '16k' if sample_rate == 16000 else '8k'
if lang == "zh_en" and codeswitch is True:
tag = model_type + '-' + 'codeswitch_' + lang + '-' + sample_rate_str
elif lang == "zh_en" or codeswitch is True:
raise Exception("codeswitch is true only in zh_en model")
else:
tag = model_type + '-' + lang + '-' + sample_rate_str
self.task_resource.set_task_model(tag, version=None)
self.res_path = self.task_resource.res_dir
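The tag-selection branch above can be condensed into a standalone sketch (assuming the same `'16k'`/`'8k'` convention shown in the diff; `build_tag` is a hypothetical helper, not part of the codebase):

```python
def build_tag(model_type: str, lang: str, codeswitch: bool, sample_rate: int) -> str:
    """Build a pretrained-model resource tag, e.g. 'conformer_talcs-codeswitch_zh_en-16k'."""
    sample_rate_str = '16k' if sample_rate == 16000 else '8k'
    if lang == 'zh_en' and codeswitch:
        # Code-switch models carry a 'codeswitch_' prefix before the language.
        return f"{model_type}-codeswitch_{lang}-{sample_rate_str}"
    if lang == 'zh_en' or codeswitch:
        # codeswitch=True is only valid together with lang='zh_en'.
        raise ValueError("codeswitch is supported only for the zh_en model")
    return f"{model_type}-{lang}-{sample_rate_str}"
```

This mirrors the control flow of `_init_from_path`: either both `lang='zh_en'` and `codeswitch=True` are set, or neither is.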
@ -423,6 +434,7 @@ class ASRExecutor(BaseExecutor):
model = parser_args.model
lang = parser_args.lang
codeswitch = parser_args.codeswitch
sample_rate = parser_args.sample_rate
config = parser_args.config
ckpt_path = parser_args.ckpt_path
@ -444,6 +456,7 @@ class ASRExecutor(BaseExecutor):
audio_file=input_,
model=model,
lang=lang,
codeswitch=codeswitch,
sample_rate=sample_rate,
config=config,
ckpt_path=ckpt_path,
@ -472,6 +485,7 @@ class ASRExecutor(BaseExecutor):
audio_file: os.PathLike,
model: str='conformer_u2pp_online_wenetspeech',
lang: str='zh',
codeswitch: bool=False,
sample_rate: int=16000,
config: os.PathLike=None,
ckpt_path: os.PathLike=None,
@ -485,8 +499,8 @@ class ASRExecutor(BaseExecutor):
"""
audio_file = os.path.abspath(audio_file)
paddle.set_device(device)
self._init_from_path(model, lang, sample_rate, config, decode_method,
num_decoding_left_chunks, ckpt_path)
self._init_from_path(model, lang, codeswitch, sample_rate, config,
decode_method, num_decoding_left_chunks, ckpt_path)
if not self._check(audio_file, sample_rate, force_yes):
sys.exit(-1)
if rtf:

@ -14,6 +14,7 @@
import argparse
from typing import List
import numpy
from prettytable import PrettyTable
from ..resource import CommonTaskResource
@ -78,7 +79,7 @@ class VersionCommand:
model_name_format = {
'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
'cls': 'Model-Sample Rate',
'st': 'Model-Source language-Target language',
'text': 'Model-Task-Language',
@ -111,7 +112,21 @@ class StatsCommand:
fields = model_name_format[self.task].split("-")
table = PrettyTable(fields)
for key in pretrained_models:
line = key.split("-")
if self.task == "asr" and len(line) < len(fields):
for i in range(len(line), len(fields)):
line.append("-")
if "codeswitch" in key:
line[3], line[1] = line[1].split("_")[0], line[1].split(
"_")[1:]
elif "multilingual" in key:
line[4], line[1] = line[1].split("_")[0], line[1].split(
"_")[1:]
tmp = numpy.array(line)
idx = [0, 5, 3, 4, 1, 2]
line = tmp[idx]
table.add_row(line)
print(table)
def execute(self, argv: List[str]) -> bool:

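The column-reshuffling logic in `StatsCommand` above can be sketched as a pure function (a simplified illustration that keeps the language as a single `zh_en` string; `format_asr_row` is a hypothetical helper, not the shipped implementation):

```python
def format_asr_row(tag: str) -> list:
    """Map a model tag to the columns
    Model-Size-Code Switch-Multilingual-Language-Sample Rate."""
    model, lang, sample_rate = tag.split("-")
    size = codeswitch = multilingual = "-"
    if lang.startswith("codeswitch_"):
        # e.g. 'codeswitch_zh_en' -> code-switch flag plus language 'zh_en'
        codeswitch, lang = "codeswitch", lang[len("codeswitch_"):]
    elif lang.startswith("multilingual_"):
        multilingual, lang = "multilingual", lang[len("multilingual_"):]
    return [model, size, codeswitch, multilingual, lang, sample_rate]
```

Tags without a `codeswitch_`/`multilingual_` prefix simply get `-` placeholders in the new columns, matching the padded rows in the diff.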
@ -30,6 +30,7 @@ __all__ = [
]
# The tags for pretrained_models should be "{model_name}[_{dataset}][-{lang}][-...]".
# Add code-switch and multilingual tag, "{model_name}[_{dataset}]-[codeswitch/multilingual][_{lang}][-...]".
# e.g. "conformer_wenetspeech-zh-16k" and "panns_cnn6-32k".
# Command line and python api use "{model_name}[_{dataset}]" as --model, usage:
# "paddlespeech asr --model conformer_wenetspeech --lang zh --sr 16000 --input ./input.wav"
@ -322,6 +323,18 @@ asr_dynamic_pretrained_models = {
'099a601759d467cd0a8523ff939819c5'
},
},
"conformer_talcs-codeswitch_zh_en-16k": {
'1.4': {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/tal_cs/asr1/asr1_conformer_talcs_ckpt_1.4.0.model.tar.gz',
'md5':
'01962c5d0a70878fe41cacd4f61e14d1',
'cfg_path':
'model.yaml',
'ckpt_path':
'exp/conformer/checkpoints/avg_10'
},
},
}
asr_static_pretrained_models = {

@ -16,14 +16,9 @@ import sys
import warnings
from typing import List
import numpy
import uvicorn
from fastapi import FastAPI
from paddlespeech.cli.log import logger
from paddlespeech.resource import CommonTaskResource
from paddlespeech.server.engine.engine_pool import init_engine_pool
@ -31,6 +26,12 @@ from paddlespeech.server.engine.engine_warmup import warm_up
from paddlespeech.server.restful.api import setup_router as setup_http_router
from paddlespeech.server.utils.config import get_config
from paddlespeech.server.ws.api import setup_router as setup_ws_router
from prettytable import PrettyTable
from starlette.middleware.cors import CORSMiddleware
from ..executor import BaseExecutor
from ..util import cli_server_register
from ..util import stats_wrapper
warnings.filterwarnings("ignore")
__all__ = ['ServerExecutor', 'ServerStatsExecutor']
@ -134,7 +135,7 @@ class ServerStatsExecutor():
required=True)
self.task_choices = ['asr', 'tts', 'cls', 'text', 'vector']
self.model_name_format = {
'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
'tts': 'Model-Language',
'cls': 'Model-Sample Rate',
'text': 'Model-Task-Language',
@ -145,7 +146,20 @@ class ServerStatsExecutor():
fields = self.model_name_format[self.task].split("-")
table = PrettyTable(fields)
for key in pretrained_models:
line = key.split("-")
if self.task == "asr" and len(line) < len(fields):
for i in range(len(line), len(fields)):
line.append("-")
if "codeswitch" in key:
line[3], line[1] = line[1].split("_")[0], line[1].split(
"_")[1:]
elif "multilingual" in key:
line[4], line[1] = line[1].split("_")[0], line[1].split(
"_")[1:]
tmp = numpy.array(line)
idx = [0, 5, 3, 4, 1, 2]
line = tmp[idx]
table.add_row(line)
print(table)
def execute(self, argv: List[str]) -> bool:

@ -14,7 +14,7 @@ paddlespeech ssl --task asr --lang en --input ./en.wav
paddlespeech ssl --task vector --lang en --input ./en.wav
# Speech_recognition
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
paddlespeech asr --input ./zh.wav
paddlespeech asr --model conformer_aishell --input ./zh.wav
paddlespeech asr --model conformer_online_aishell --input ./zh.wav
@ -26,6 +26,7 @@ paddlespeech asr --model deepspeech2offline_aishell --input ./zh.wav
paddlespeech asr --model deepspeech2online_wenetspeech --input ./zh.wav
paddlespeech asr --model deepspeech2online_aishell --input ./zh.wav
paddlespeech asr --model deepspeech2offline_librispeech --lang en --input ./en.wav
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav
# Support editing num_decoding_left_chunks
paddlespeech asr --model conformer_online_wenetspeech --num_decoding_left_chunks 3 --input ./zh.wav
