Merge branch 'develop' into develop

pull/1515/head
commit 6bf0d3bf57 by TianYuan, committed by GitHub

.gitignore (vendored)

@@ -2,6 +2,7 @@
*.pyc
.vscode
*log
*.wav
*.pdmodel
*.pdiparams*
*.zip
@@ -30,5 +31,8 @@ tools/OpenBLAS/
tools/Miniconda3-latest-Linux-x86_64.sh
tools/activate_python.sh
tools/miniconda.sh
tools/CRF++-0.58/
speechx/fc_patch/
*output/

@@ -196,16 +196,18 @@ Developers can have a try of our models with [PaddleSpeech Command Line](./paddl
```shell
paddlespeech cls --input input.wav
```
**Automatic Speech Recognition**
```shell
paddlespeech asr --lang zh --input input_16k.wav
```
**Speech Translation** (English to Chinese)
(not supported on Mac and Windows yet)
```shell
paddlespeech st --input input_16k.wav
```
**Text-to-Speech**
```shell
paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav
@@ -218,7 +220,16 @@ paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --ou
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```
**Batch Process**
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
```
**Shell Pipeline**
- ASR + Punctuation Restoration
```
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
```
For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
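For developers embedding PaddleSpeech instead of shelling out, the same tasks are exposed through executor classes in `paddlespeech.cli` (a rough Python sketch; the keyword names below mirror the CLI flags and may differ slightly between versions):

```python
# Rough Python-API equivalents of the one-liners above (assumed keyword
# names that mirror the CLI flags; check your installed version's signatures).
from paddlespeech.cli import ASRExecutor, TextExecutor, TTSExecutor

asr = ASRExecutor()
transcript = asr(audio_file="input_16k.wav", lang="zh")   # speech -> text

punc = TextExecutor()
print(punc(text=transcript, task="punc"))                 # restore punctuation

tts = TTSExecutor()
tts(text="你好,欢迎使用飞桨深度学习框架!", output="output.wav")  # text -> speech
```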

@@ -216,6 +216,17 @@ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```
**批处理**
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
```
**Shell管道**
ASR + Punc:
```
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
```
更多命令行命令请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
> Note: 如果需要训练或者微调,请查看[语音识别](./docs/source/asr/quick_start.md) [语音合成](./docs/source/tts/quick_start.md)。
@@ -558,6 +569,7 @@ year={2021}
- 非常感谢 [kslz](https://github.com/kslz) 补充中文文档。
- 非常感谢 [awmmmm](https://github.com/awmmmm) 提供 fastspeech2 aishell3 conformer 预训练模型。
- 非常感谢 [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) 基于 PaddleSpeech 的 TTS 模型搭建带 GUI 操作界面的配音工具。
此外,PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。

@@ -27,6 +27,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav
# English
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
# Chinese ASR + Punctuation Restoration
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
```
(It doesn't matter if the package `paddlespeech-ctcdecoders` is not found; it is optional.)

@@ -25,6 +25,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav
# 英文
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
# 中文 + 标点恢复
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
```
(如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error,没有关系,这个包是非必须的。)

@@ -1,4 +1,10 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
# asr
paddlespeech asr --input ./zh.wav
# asr + punc
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc

@@ -17,11 +17,14 @@ The input of this demo should be a text of the specific language that can be pas
### 3. Usage
- Command Line (Recommended)
- Chinese
The default acoustic model is `Fastspeech2`, and the default vocoder is `Parallel WaveGAN`.
```bash
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
```
- Batch Process
```bash
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
```
- Chinese, use `SpeedySpeech` as the acoustic model
```bash
paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"

@@ -24,6 +24,10 @@
```bash
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
```
- 批处理
```bash
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
```
- 中文,使用 `SpeedySpeech` 作为声学模型
```bash
paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"

@@ -1,3 +1,7 @@
#!/bin/bash
# single process
paddlespeech tts --input 今天的天气不错啊
# Batch process
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts

@@ -0,0 +1,369 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a1e738e0",
"metadata": {},
"source": [
"## 获取测试的 logit 数据"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "29d3368b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"hlens.npy\n",
"logits.npy\n",
"ys_lens.npy\n",
"ys_pad.npy\n"
]
}
],
"source": [
"!mkdir -p ./test_data\n",
"!test -f ./test_data/ctc_loss_compare_data.tgz || wget -P ./test_data https://paddlespeech.bj.bcebos.com/datasets/unit_test/asr/ctc_loss_compare_data.tgz\n",
"!tar xzvf test_data/ctc_loss_compare_data.tgz -C ./test_data\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "240caf1d",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import time\n",
"\n",
"data_dir=\"./test_data\"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "91bad949",
"metadata": {},
"outputs": [],
"source": [
"logits_np = np.load(os.path.join(data_dir, \"logits.npy\"))\n",
"ys_pad_np = np.load(os.path.join(data_dir, \"ys_pad.npy\"))\n",
"hlens_np = np.load(os.path.join(data_dir, \"hlens.npy\"))\n",
"ys_lens_np = np.load(os.path.join(data_dir, \"ys_lens.npy\"))"
]
},
{
"cell_type": "markdown",
"id": "4cef2f15",
"metadata": {},
"source": [
"## 使用 torch 的 ctc loss"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "90612004",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.10.1+cu102'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import torch\n",
"torch.__version__"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "00799f97",
"metadata": {},
"outputs": [],
"source": [
"def torch_ctc_loss(use_cpu):\n",
" if use_cpu:\n",
" device = torch.device(\"cpu\")\n",
" else:\n",
" device = torch.device(\"cuda\")\n",
"\n",
" reduction_type = \"sum\" \n",
"\n",
" ctc_loss = torch.nn.CTCLoss(reduction=reduction_type)\n",
"\n",
" ys_hat = torch.tensor(logits_np, device = device)\n",
" ys_pad = torch.tensor(ys_pad_np, device = device)\n",
" hlens = torch.tensor(hlens_np, device = device)\n",
" ys_lens = torch.tensor(ys_lens_np, device = device)\n",
"\n",
" ys_hat = ys_hat.transpose(0, 1)\n",
" \n",
" # 开始计算时间\n",
" start_time = time.time()\n",
" ys_hat = ys_hat.log_softmax(2)\n",
" loss = ctc_loss(ys_hat, ys_pad, hlens, ys_lens)\n",
" end_time = time.time()\n",
" \n",
" loss = loss / ys_hat.size(1)\n",
" return end_time - start_time, loss.item()"
]
},
{
"cell_type": "markdown",
"id": "ba47b5a4",
"metadata": {},
"source": [
"## 使用 paddle 的 ctc loss"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6882a06e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'2.2.2'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import paddle\n",
"paddle.__version__"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3cfa3b7c",
"metadata": {},
"outputs": [],
"source": [
"def paddle_ctc_loss(use_cpu): \n",
" import paddle.nn as pn\n",
" if use_cpu:\n",
" device = \"cpu\"\n",
" else:\n",
" device = \"gpu\"\n",
"\n",
" paddle.set_device(device)\n",
"\n",
" logits = paddle.to_tensor(logits_np)\n",
" ys_pad = paddle.to_tensor(ys_pad_np,dtype='int32')\n",
" hlens = paddle.to_tensor(hlens_np, dtype='int64')\n",
" ys_lens = paddle.to_tensor(ys_lens_np, dtype='int64')\n",
"\n",
" logits = logits.transpose([1,0,2])\n",
"\n",
" ctc_loss = pn.CTCLoss(reduction='sum')\n",
" # 开始计算时间\n",
" start_time = time.time()\n",
" pn_loss = ctc_loss(logits, ys_pad, hlens, ys_lens)\n",
" end_time = time.time()\n",
" \n",
" pn_loss = pn_loss / logits.shape[1]\n",
" return end_time - start_time, pn_loss.item()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "40413ef9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU, iteration 10\n",
"torch_ctc_loss 159.17137145996094\n",
"paddle_ctc_loss 159.16574096679688\n",
"paddle average time 1.718252992630005\n",
"torch average time 0.17536230087280275\n",
"paddle time / torch time (cpu) 9.798303193320452\n",
"\n",
"GPU, iteration 10\n",
"torch_ctc_loss 159.172119140625\n",
"paddle_ctc_loss 159.17205810546875\n",
"paddle average time 0.018606925010681154\n",
"torch average time 0.0026710033416748047\n",
"paddle time / torch time (gpu) 6.966267963938231\n"
]
}
],
"source": [
"# 使用 CPU\n",
"\n",
"iteration = 10\n",
"use_cpu = True\n",
"torch_total_time = 0\n",
"paddle_total_time = 0\n",
"for _ in range(iteration):\n",
" cost_time, torch_loss = torch_ctc_loss(use_cpu)\n",
" torch_total_time += cost_time\n",
"for _ in range(iteration):\n",
" cost_time, paddle_loss = paddle_ctc_loss(use_cpu)\n",
" paddle_total_time += cost_time\n",
"print (\"CPU, iteration\", iteration)\n",
"print (\"torch_ctc_loss\", torch_loss)\n",
"print (\"paddle_ctc_loss\", paddle_loss)\n",
"print (\"paddle average time\", paddle_total_time / iteration)\n",
"print (\"torch average time\", torch_total_time / iteration)\n",
"print (\"paddle time / torch time (cpu)\" , paddle_total_time/ torch_total_time)\n",
"\n",
"print (\"\")\n",
"\n",
"# 使用 GPU\n",
"\n",
"use_cpu = False\n",
"torch_total_time = 0\n",
"paddle_total_time = 0\n",
"for _ in range(iteration):\n",
" cost_time, torch_loss = torch_ctc_loss(use_cpu)\n",
" torch_total_time += cost_time\n",
"for _ in range(iteration):\n",
" cost_time, paddle_loss = paddle_ctc_loss(use_cpu)\n",
" paddle_total_time += cost_time\n",
"print (\"GPU, iteration\", iteration)\n",
"print (\"torch_ctc_loss\", torch_loss)\n",
"print (\"paddle_ctc_loss\", paddle_loss)\n",
"print (\"paddle average time\", paddle_total_time / iteration)\n",
"print (\"torch average time\", torch_total_time / iteration)\n",
"print (\"paddle time / torch time (gpu)\" , paddle_total_time/ torch_total_time)"
]
},
{
"cell_type": "markdown",
"id": "7cdf8697",
"metadata": {},
"source": [
"## 其他: 使用 PaddleSpeech 中的 ctcloss 查一下loss值"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "73fad81d",
"metadata": {},
"outputs": [],
"source": [
"logits_np = np.load(os.path.join(data_dir, \"logits.npy\"))\n",
"ys_pad_np = np.load(os.path.join(data_dir, \"ys_pad.npy\"))\n",
"hlens_np = np.load(os.path.join(data_dir, \"hlens.npy\"))\n",
"ys_lens_np = np.load(os.path.join(data_dir, \"ys_lens.npy\"))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2b41e45d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-02-25 11:34:34.143 | INFO | paddlespeech.s2t.modules.loss:__init__:41 - CTCLoss Loss reduction: sum, div-bs: True\n",
"2022-02-25 11:34:34.143 | INFO | paddlespeech.s2t.modules.loss:__init__:42 - CTCLoss Grad Norm Type: instance\n",
"2022-02-25 11:34:34.144 | INFO | paddlespeech.s2t.modules.loss:__init__:73 - CTCLoss() kwargs:{'norm_by_times': True}, not support: {'norm_by_batchsize': False, 'norm_by_total_logits_len': False}\n",
"loss 159.17205810546875\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:253: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int32, the right dtype will convert to paddle.float32\n",
" format(lhs_dtype, rhs_dtype, lhs_dtype))\n"
]
}
],
"source": [
"use_cpu = False\n",
"\n",
"from paddlespeech.s2t.modules.loss import CTCLoss\n",
"\n",
"if use_cpu:\n",
" device = \"cpu\"\n",
"else:\n",
" device = \"gpu\"\n",
"\n",
"paddle.set_device(device)\n",
"\n",
"blank_id=0\n",
"reduction_type='sum'\n",
"batch_average= True\n",
"grad_norm_type='instance'\n",
"\n",
"criterion = CTCLoss(\n",
" blank=blank_id,\n",
" reduction=reduction_type,\n",
" batch_average=batch_average,\n",
" grad_norm_type=grad_norm_type)\n",
"\n",
"logits = paddle.to_tensor(logits_np)\n",
"ys_pad = paddle.to_tensor(ys_pad_np,dtype='int32')\n",
"hlens = paddle.to_tensor(hlens_np, dtype='int64')\n",
"ys_lens = paddle.to_tensor(ys_lens_np, dtype='int64')\n",
"\n",
"pn_ctc_loss = criterion(logits, ys_pad, hlens, ys_lens)\n",
"print(\"loss\", pn_ctc_loss.item())\n",
" "
]
},
{
"cell_type": "markdown",
"id": "de525d38",
"metadata": {},
"source": [
"## 结论\n",
"在 CPU 环境下: torch 的 CTC loss 的计算速度是 paddle 的 9.8 倍 \n",
"在 GPU 环境下: torch 的 CTC loss 的计算速度是 paddle 的 6.87 倍\n",
"\n",
"## 其他结论\n",
"torch 的 ctc loss 在 CPU 和 GPU 下 都没有完全对齐。其中CPU的前向对齐精度大约为 1e-2。 GPU 的前向对齐精度大约为 1e-4 。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
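To reproduce the comparison without downloading the test arrays, the same measurement can be run on synthetic data (a minimal sketch; the shapes are made up, so only the relative timings are meaningful):

```python
import time

import numpy as np
import paddle
import torch

# synthetic batch: 16 utterances, 100 frames, 50-class vocab, label length 20
B, T, V, L = 16, 100, 50, 20
logits_np = np.random.randn(B, T, V).astype("float32")
labels_np = np.random.randint(1, V, size=(B, L))        # 0 is the blank id
hlens_np = np.full((B, ), T, dtype="int64")
ys_lens_np = np.full((B, ), L, dtype="int64")

# torch expects (T, B, V) log-probabilities
ys_hat = torch.tensor(logits_np).transpose(0, 1).log_softmax(2)
t0 = time.time()
torch_loss = torch.nn.CTCLoss(reduction="sum")(
    ys_hat, torch.tensor(labels_np), torch.tensor(hlens_np),
    torch.tensor(ys_lens_np))
print("torch", time.time() - t0, (torch_loss / B).item())

# paddle expects (T, B, V) raw logits; normalization is handled internally
logits = paddle.to_tensor(logits_np).transpose([1, 0, 2])
t0 = time.time()
paddle_loss = paddle.nn.CTCLoss(reduction="sum")(
    logits, paddle.to_tensor(labels_np, dtype="int32"),
    paddle.to_tensor(hlens_np), paddle.to_tensor(ys_lens_np))
print("paddle", time.time() - t0, (paddle_loss / B).item())
```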

@@ -225,7 +225,9 @@ optional arguments:
9. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
Pretrained FastSpeech2 models with no silence at the edges of audios:
- [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
- [fastspeech2_conformer_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_aishell3_ckpt_0.2.0.zip) (Thanks to [@awmmmm](https://github.com/awmmmm) for the contribution)
FastSpeech2 checkpoint contains files listed below.

@@ -0,0 +1,110 @@
###########################################################
#                FEATURE EXTRACTION SETTING               #
###########################################################
fs: 24000          # sr
n_fft: 2048        # FFT size (samples).
n_shift: 300       # Hop size (samples). 12.5ms
win_length: 1200   # Window length (samples). 50ms
                   # If set to null, it will be the same as fft_size.
window: "hann"     # Window function.

# Only used for feats_type != raw
fmin: 80           # Minimum frequency of Mel basis.
fmax: 7600         # Maximum frequency of Mel basis.
n_mels: 80         # The number of mel basis.

# Only used for the model using pitch features (e.g. FastSpeech2)
f0min: 80          # Minimum f0 for pitch extraction.
f0max: 400         # Maximum f0 for pitch extraction.

###########################################################
#                       DATA SETTING                      #
###########################################################
batch_size: 32
num_workers: 4

###########################################################
#                       MODEL SETTING                     #
###########################################################
model:
    adim: 384         # attention dimension
    aheads: 2         # number of attention heads
    elayers: 4        # number of encoder layers
    eunits: 1536      # number of encoder ff units
    dlayers: 4        # number of decoder layers
    dunits: 1536      # number of decoder ff units
    positionwise_layer_type: conv1d   # type of position-wise layer
    positionwise_conv_kernel_size: 3  # kernel size of position-wise conv layer
    duration_predictor_layers: 2      # number of layers of duration predictor
    duration_predictor_chans: 256     # number of channels of duration predictor
    duration_predictor_kernel_size: 3 # filter size of duration predictor
    postnet_layers: 5                 # number of layers of postnet
    postnet_filts: 5                  # filter size of conv layers in postnet
    postnet_chans: 256                # number of channels of conv layers in postnet
    encoder_normalize_before: True    # whether to perform layer normalization before the input
    decoder_normalize_before: True    # whether to perform layer normalization before the input
    reduction_factor: 1               # reduction factor
    encoder_type: conformer           # encoder type
    decoder_type: conformer           # decoder type
    conformer_pos_enc_layer_type: rel_pos        # conformer positional encoding type
    conformer_self_attn_layer_type: rel_selfattn # conformer self-attention type
    conformer_activation_type: swish  # conformer activation type
    use_macaron_style_in_conformer: true  # whether to use macaron style in conformer
    use_cnn_in_conformer: true        # whether to use CNN in conformer
    conformer_enc_kernel_size: 7      # kernel size in CNN module of conformer-based encoder
    conformer_dec_kernel_size: 31     # kernel size in CNN module of conformer-based decoder
    init_type: xavier_uniform         # initialization type
    transformer_enc_dropout_rate: 0.2            # dropout rate for transformer encoder layer
    transformer_enc_positional_dropout_rate: 0.2 # dropout rate for transformer encoder positional encoding
    transformer_enc_attn_dropout_rate: 0.2       # dropout rate for transformer encoder attention layer
    transformer_dec_dropout_rate: 0.2            # dropout rate for transformer decoder layer
    transformer_dec_positional_dropout_rate: 0.2 # dropout rate for transformer decoder positional encoding
    transformer_dec_attn_dropout_rate: 0.2       # dropout rate for transformer decoder attention layer
    pitch_predictor_layers: 5         # number of conv layers in pitch predictor
    pitch_predictor_chans: 256        # number of channels of conv layers in pitch predictor
    pitch_predictor_kernel_size: 5    # kernel size of conv layers in pitch predictor
    pitch_predictor_dropout: 0.5      # dropout rate in pitch predictor
    pitch_embed_kernel_size: 1        # kernel size of conv embedding layer for pitch
    pitch_embed_dropout: 0.0          # dropout rate after conv embedding layer for pitch
    stop_gradient_from_pitch_predictor: true   # whether to stop the gradient from pitch predictor to encoder
    energy_predictor_layers: 2        # number of conv layers in energy predictor
    energy_predictor_chans: 256       # number of channels of conv layers in energy predictor
    energy_predictor_kernel_size: 3   # kernel size of conv layers in energy predictor
    energy_predictor_dropout: 0.5     # dropout rate in energy predictor
    energy_embed_kernel_size: 1       # kernel size of conv embedding layer for energy
    energy_embed_dropout: 0.0         # dropout rate after conv embedding layer for energy
    stop_gradient_from_energy_predictor: false # whether to stop the gradient from energy predictor to encoder
    spk_embed_dim: 256                # speaker embedding dimension
    spk_embed_integration_type: concat # speaker embedding integration type

###########################################################
#                     UPDATER SETTING                     #
###########################################################
updater:
    use_masking: True # whether to apply masking for padded part in loss calculation

###########################################################
#                    OPTIMIZER SETTING                    #
###########################################################
optimizer:
    optim: adam           # optimizer type
    learning_rate: 0.001  # learning rate

###########################################################
#                     TRAINING SETTING                    #
###########################################################
max_epoch: 1000
num_snapshots: 5

###########################################################
#                      OTHER SETTING                      #
###########################################################
seed: 10086
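Since this is plain YAML, it can be inspected or tweaked programmatically before training (a generic sketch; the filename is a placeholder, and PaddleSpeech itself typically reads such files through a yacs `CfgNode`):

```python
import yaml

# placeholder path; point this at wherever the conformer config lives
with open("conf/conformer.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["fs"], cfg["n_mels"])           # 24000 80
print(cfg["model"]["encoder_type"])       # conformer
print(cfg["optimizer"]["learning_rate"])  # 0.001
```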

@@ -10,7 +10,7 @@ Run the command below to get the results of the test.
```bash
./run.sh
```
The `avg WER` of g2p is: 0.026014352515701198
```text
,--------------------------------------------------------------------.
| | # Snt # Wrd | Corr Sub Del Ins Err S.Err |

@@ -20,5 +20,6 @@ from .cls import CLSExecutor
from .st import STExecutor
from .text import TextExecutor
from .tts import TTSExecutor
from .stats import StatsExecutor
_locale._getdefaultlocale = (lambda *args: ['en_US', 'utf8'])

@@ -0,0 +1,14 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .infer import StatsExecutor

@@ -0,0 +1,193 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
from typing import List

from prettytable import PrettyTable

from ..log import logger
from ..utils import cli_register
from ..utils import stats_wrapper

__all__ = ['StatsExecutor']

model_name_format = {
    'asr': 'Model-Language-Sample Rate',
    'cls': 'Model-Sample Rate',
    'st': 'Model-Source language-Target language',
    'text': 'Model-Task-Language',
    'tts': 'Model-Language'
}
@cli_register(
    name='paddlespeech.stats',
    description='Get speech tasks support models list.')
class StatsExecutor():
    def __init__(self):
        super(StatsExecutor, self).__init__()

        self.parser = argparse.ArgumentParser(
            prog='paddlespeech.stats', add_help=True)
        self.parser.add_argument(
            '--task',
            type=str,
            default='asr',
            choices=['asr', 'cls', 'st', 'text', 'tts'],
            help='Choose speech task.',
            required=True)
        self.task_choices = ['asr', 'cls', 'st', 'text', 'tts']

    def show_support_models(self, pretrained_models: dict):
        fields = model_name_format[self.task].split("-")
        table = PrettyTable(fields)
        for key in pretrained_models:
            table.add_row(key.split("-"))
        print(table)

    def execute(self, argv: List[str]) -> bool:
        """
        Command line entry.
        """
        parser_args = self.parser.parse_args(argv)
        self.task = parser_args.task
        if self.task not in self.task_choices:
            logger.error(
                "Please input correct speech task, choices = ['asr', 'cls', 'st', 'text', 'tts']"
            )
            return False

        elif self.task == 'asr':
            try:
                from ..asr.infer import pretrained_models
                logger.info(
                    "Here is the list of ASR pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
                return True
            except BaseException:
                logger.error("Failed to get the list of ASR pretrained models.")
                return False

        elif self.task == 'cls':
            try:
                from ..cls.infer import pretrained_models
                logger.info(
                    "Here is the list of CLS pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
                return True
            except BaseException:
                logger.error("Failed to get the list of CLS pretrained models.")
                return False

        elif self.task == 'st':
            try:
                from ..st.infer import pretrained_models
                logger.info(
                    "Here is the list of ST pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
                return True
            except BaseException:
                logger.error("Failed to get the list of ST pretrained models.")
                return False

        elif self.task == 'text':
            try:
                from ..text.infer import pretrained_models
                logger.info(
                    "Here is the list of TEXT pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
                return True
            except BaseException:
                logger.error(
                    "Failed to get the list of TEXT pretrained models.")
                return False

        elif self.task == 'tts':
            try:
                from ..tts.infer import pretrained_models
                logger.info(
                    "Here is the list of TTS pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
                return True
            except BaseException:
                logger.error("Failed to get the list of TTS pretrained models.")
                return False

    @stats_wrapper
    def __call__(
            self,
            task: str=None, ):
        """
        Python API to call an executor.
        """
        self.task = task
        if self.task not in self.task_choices:
            print(
                "Please input correct speech task, choices = ['asr', 'cls', 'st', 'text', 'tts']"
            )

        elif self.task == 'asr':
            try:
                from ..asr.infer import pretrained_models
                print(
                    "Here is the list of ASR pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
            except BaseException:
                print("Failed to get the list of ASR pretrained models.")

        elif self.task == 'cls':
            try:
                from ..cls.infer import pretrained_models
                print(
                    "Here is the list of CLS pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
            except BaseException:
                print("Failed to get the list of CLS pretrained models.")

        elif self.task == 'st':
            try:
                from ..st.infer import pretrained_models
                print(
                    "Here is the list of ST pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
            except BaseException:
                print("Failed to get the list of ST pretrained models.")

        elif self.task == 'text':
            try:
                from ..text.infer import pretrained_models
                print(
                    "Here is the list of TEXT pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
            except BaseException:
                print("Failed to get the list of TEXT pretrained models.")

        elif self.task == 'tts':
            try:
                from ..tts.infer import pretrained_models
                print(
                    "Here is the list of TTS pretrained models released by PaddleSpeech that can be used by command line and python API"
                )
                self.show_support_models(pretrained_models)
            except BaseException:
                print("Failed to get the list of TTS pretrained models.")

@@ -51,7 +51,7 @@ def _batch_shuffle(indices, batch_size, epoch, clipped=False):
    """
    rng = np.random.RandomState(epoch)
    shift_len = rng.randint(0, batch_size - 1)
    batch_indices = list(zip(* [iter(indices[shift_len:])] * batch_size))
    rng.shuffle(batch_indices)
    batch_indices = [item for batch in batch_indices for item in batch]
    assert clipped is False

@@ -33,8 +33,6 @@ from paddlespeech.s2t.modules.decoder import TransformerDecoder
from paddlespeech.s2t.modules.encoder import ConformerEncoder
from paddlespeech.s2t.modules.encoder import TransformerEncoder
from paddlespeech.s2t.modules.loss import LabelSmoothingLoss
from paddlespeech.s2t.modules.mask import mask_finished_preds
from paddlespeech.s2t.modules.mask import mask_finished_scores
from paddlespeech.s2t.modules.mask import subsequent_mask
from paddlespeech.s2t.utils import checkpoint
from paddlespeech.s2t.utils import layer_tools
@@ -291,7 +289,7 @@ class U2STBaseModel(nn.Layer):
        device = speech.place
        # Let's assume B = batch_size and N = beam_size
        # 1. Encoder and init hypothesis
        encoder_out, encoder_mask = self._forward_encoder(
            speech, speech_lengths, decoding_chunk_size,
            num_decoding_left_chunks,

@@ -14,3 +14,4 @@
from .paddlespeech_client import ASRClientExecutor
from .paddlespeech_client import TTSClientExecutor
from .paddlespeech_server import ServerExecutor
from .paddlespeech_server import ServerStatsExecutor

@@ -16,15 +16,17 @@ from typing import List

import uvicorn
from fastapi import FastAPI
from prettytable import PrettyTable
from ..executor import BaseExecutor
from ..util import cli_server_register
from ..util import stats_wrapper
from paddlespeech.cli.log import logger
from paddlespeech.server.engine.engine_pool import init_engine_pool
from paddlespeech.server.restful.api import setup_router
from paddlespeech.server.utils.config import get_config

__all__ = ['ServerExecutor', 'ServerStatsExecutor']

app = FastAPI(
    title="PaddleSpeech Serving API", description="Api", version="0.0.1")
@@ -86,3 +88,139 @@ class ServerExecutor(BaseExecutor):
        config = get_config(config_file)
        if self.init(config):
            uvicorn.run(app, host=config.host, port=config.port, debug=True)
@cli_server_register(
    name='paddlespeech_server.stats',
    description='Get the models supported by each speech task in the service.')
class ServerStatsExecutor():
    def __init__(self):
        super(ServerStatsExecutor, self).__init__()

        self.parser = argparse.ArgumentParser(
            prog='paddlespeech_server.stats', add_help=True)
        self.parser.add_argument(
            '--task',
            type=str,
            default=None,
            choices=['asr', 'tts'],
            help='Choose speech task.',
            required=True)
        self.task_choices = ['asr', 'tts']
        self.model_name_format = {
            'asr': 'Model-Language-Sample Rate',
            'tts': 'Model-Language'
        }

    def show_support_models(self, pretrained_models: dict):
        fields = self.model_name_format[self.task].split("-")
        table = PrettyTable(fields)
        for key in pretrained_models:
            table.add_row(key.split("-"))
        print(table)

    def execute(self, argv: List[str]) -> bool:
        """
        Command line entry.
        """
        parser_args = self.parser.parse_args(argv)
        self.task = parser_args.task
        if self.task not in self.task_choices:
            logger.error(
                "Please input correct speech task, choices = ['asr', 'tts']")
            return False

        elif self.task == 'asr':
            try:
                from paddlespeech.cli.asr.infer import pretrained_models
                logger.info(
                    "Here is the table of ASR pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)

                # show ASR static pretrained model
                from paddlespeech.server.engine.asr.paddleinference.asr_engine import pretrained_models
                logger.info(
                    "Here is the table of ASR static pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)

                return True
            except BaseException:
                logger.error(
                    "Failed to get the table of ASR pretrained models supported in the service."
                )
                return False

        elif self.task == 'tts':
            try:
                from paddlespeech.cli.tts.infer import pretrained_models
                logger.info(
                    "Here is the table of TTS pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)

                # show TTS static pretrained model
                from paddlespeech.server.engine.tts.paddleinference.tts_engine import pretrained_models
                logger.info(
                    "Here is the table of TTS static pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)

                return True
            except BaseException:
                logger.error(
                    "Failed to get the table of TTS pretrained models supported in the service."
                )
                return False

    @stats_wrapper
    def __call__(
            self,
            task: str=None, ):
        """
        Python API to call an executor.
        """
        self.task = task
        if self.task not in self.task_choices:
            print("Please input correct speech task, choices = ['asr', 'tts']")

        elif self.task == 'asr':
            try:
                from paddlespeech.cli.asr.infer import pretrained_models
                print(
                    "Here is the table of ASR pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)

                # show ASR static pretrained model
                from paddlespeech.server.engine.asr.paddleinference.asr_engine import pretrained_models
                print(
                    "Here is the table of ASR static pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)
            except BaseException:
                print(
                    "Failed to get the table of ASR pretrained models supported in the service."
                )

        elif self.task == 'tts':
            try:
                from paddlespeech.cli.tts.infer import pretrained_models
                print(
                    "Here is the table of TTS pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)

                # show TTS static pretrained model
                from paddlespeech.server.engine.tts.paddleinference.tts_engine import pretrained_models
                print(
                    "Here is the table of TTS static pretrained models supported in the service."
                )
                self.show_support_models(pretrained_models)
            except BaseException:
                print(
                    "Failed to get the table of TTS pretrained models supported in the service."
                )
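Analogously to the client-side stats command, the new executor is exported from `paddlespeech.server.bin` (see the `__init__.py` change above), so a quick check from Python looks roughly like this; the shell form is presumably `paddlespeech_server stats --task asr`, following the `cli_server_register` name:

```python
from paddlespeech.server.bin import ServerStatsExecutor

server_stats = ServerStatsExecutor()
server_stats(task='asr')  # prints both the dynamic and static model tables
```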

@@ -63,7 +63,7 @@ class ToneSandhi():
            '扫把', '惦记'
        }
        self.must_not_neural_tone_words = {
            "男子", "女子", "分子", "原子", "量子", "莲子", "石子", "瓜子", "电子", "人人", "虎虎"
        }
        self.punc = ":,;。?!“”‘’':,;.?!"
@@ -77,7 +77,9 @@ class ToneSandhi():
        # reduplication words for n. and v. e.g. 奶奶, 试试, 旺旺
        for j, item in enumerate(word):
            if j - 1 >= 0 and item == word[j - 1] and pos[0] in {
                    "n", "v", "a"
            } and word not in self.must_not_neural_tone_words:
                finals[j] = finals[j][:-1] + "5"
        ge_idx = word.find("个")
        if len(word) >= 1 and word[-1] in "吧呢哈啊呐噻嘛吖嗨呐哦哒额滴哩哟喽啰耶喔诶":
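The intent of the new guard, distilled into a standalone snippet (a simplified sketch, not the class method itself):

```python
# Reduplicated n./v./a. words usually take the neutral tone (tone 5) on the
# second syllable; the new set lists words that must keep their full tone.
must_not_neural_tone_words = {"人人", "虎虎"}

def neutralize_reduplication(word, finals, pos):
    for j, item in enumerate(word):
        if (j - 1 >= 0 and item == word[j - 1] and pos[0] in {"n", "v", "a"}
                and word not in must_not_neural_tone_words):
            finals[j] = finals[j][:-1] + "5"
    return finals

print(neutralize_reduplication("奶奶", ["nai3", "nai3"], "n"))  # ['nai3', 'nai5']
print(neutralize_reduplication("人人", ["ren2", "ren2"], "n"))  # unchanged now
```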

@@ -20,7 +20,10 @@ import numpy as np
import paddle
from g2pM import G2pM
from pypinyin import lazy_pinyin
from pypinyin import load_phrases_dict
from pypinyin import load_single_dict
from pypinyin import Style
from pypinyin_dict.phrase_pinyin_data import large_pinyin

from paddlespeech.t2s.frontend.generate_lexicon import generate_lexicon
from paddlespeech.t2s.frontend.tone_sandhi import ToneSandhi
@@ -41,6 +44,8 @@ class Frontend():
            self.g2pM_model = G2pM()
            self.pinyin2phone = generate_lexicon(
                with_tone=True, with_erhua=False)
        else:
            self.__init__pypinyin()
        self.must_erhua = {"小院儿", "胡同儿", "范儿", "老汉儿", "撒欢儿", "寻老礼儿", "妥妥儿"}
        self.not_erhua = {
            "虐儿", "为儿", "护儿", "瞒儿", "救儿", "替儿", "有儿", "一儿", "我儿", "俺儿", "妻儿",
@@ -62,6 +67,23 @@ class Frontend():
        for tone, id in tone_id:
            self.vocab_tones[tone] = int(id)

    def __init__pypinyin(self):
        large_pinyin.load()

        load_phrases_dict({u'开户行': [[u'ka1i'], [u'hu4'], [u'hang2']]})
        load_phrases_dict({u'发卡行': [[u'fa4'], [u'ka3'], [u'hang2']]})
        load_phrases_dict({u'放款行': [[u'fa4ng'], [u'kua3n'], [u'hang2']]})
        load_phrases_dict({u'茧行': [[u'jia3n'], [u'hang2']]})
        load_phrases_dict({u'行号': [[u'hang2'], [u'ha4o']]})
        load_phrases_dict({u'各地': [[u'ge4'], [u'di4']]})
        load_phrases_dict({u'借还款': [[u'jie4'], [u'hua2n'], [u'kua3n']]})
        load_phrases_dict({u'时间为': [[u'shi2'], [u'jia1n'], [u'we2i']]})
        load_phrases_dict({u'为准': [[u'we2i'], [u'zhu3n']]})
        load_phrases_dict({u'色差': [[u'se4'], [u'cha1']]})

        # 调整"地"字的拼音顺序
        load_single_dict({ord(u'地'): u'de,di4'})
    def _get_initials_finals(self, word: str) -> List[List[str]]:
        initials = []
        finals = []
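These overrides can be sanity-checked outside the frontend with nothing but the public pypinyin APIs imported above (a standalone sketch; the exact rendering of the custom tone strings depends on the pypinyin version):

```python
from pypinyin import lazy_pinyin, load_phrases_dict, Style
from pypinyin_dict.phrase_pinyin_data import large_pinyin

# same setup __init__pypinyin performs
large_pinyin.load()
load_phrases_dict({u'开户行': [[u'ka1i'], [u'hu4'], [u'hang2']]})

# 行 should now come out as hang2 in this phrase instead of xing2
print(lazy_pinyin('开户行', style=Style.TONE3, neutral_tone_with_five=True))
```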

@@ -63,7 +63,10 @@ def replace_time(match) -> str:
    result = f"{num2str(hour)}点"
    if minute.lstrip('0'):
        if int(minute) == 30:
            result += f"半"
        else:
            result += f"{_time_num2str(minute)}分"
    if second and second.lstrip('0'):
        result += f"{_time_num2str(second)}秒"
@@ -71,7 +74,10 @@ def replace_time(match) -> str:
        result += "至"
        result += f"{num2str(hour_2)}点"
        if minute_2.lstrip('0'):
            if int(minute_2) == 30:
                result += f"半"
            else:
                result += f"{_time_num2str(minute_2)}分"
        if second_2 and second_2.lstrip('0'):
            result += f"{_time_num2str(second_2)}秒"

@@ -28,7 +28,7 @@ UNITS = OrderedDict({
    8: '亿',
})
COM_QUANTIFIERS = '(所|朵|匹|张|座|回|场|尾|条|个|首|阙|阵|网|炮|顶|丘|棵|只|支|袭|辆|挑|担|颗|壳|窠|曲|墙|群|腔|砣|座|客|贯|扎|捆|刀|令|打|手|罗|坡|山|岭|江|溪|钟|队|单|双|对|出|口|头|脚|板|跳|枝|件|贴|针|线|管|名|位|身|堂|课|本|页|家|户|层|丝|毫|厘|分|钱|两|斤|担|铢|石|钧|锱|忽|(千|毫|微)克|毫|厘|(公)分|分|寸|尺|丈|里|寻|常|铺|程|(千|分|厘|毫|微)米|米|撮|勺|合|升|斗|石|盘|碗|碟|叠|桶|笼|盆|盒|杯|钟|斛|锅|簋|篮|盘|桶|罐|瓶|壶|卮|盏|箩|箱|煲|啖|袋|钵|年|月|日|季|刻|时|周|天|秒|分|小时|旬|纪|岁|世|更|夜|春|夏|秋|冬|代|伏|辈|丸|泡|粒|颗|幢|堆|条|根|支|道|面|片|张|颗|块|元|(亿|千万|百万|万|千|百)|(亿|千万|百万|万|千|百|美|)元|(亿|千万|百万|万|千|百|)块|角|毛|分)'

# 分数表达式
RE_FRAC = re.compile(r'(-?)(\d+)/(\d+)')
@@ -110,7 +110,7 @@ def replace_default_num(match):
# 纯小数
RE_DECIMAL_NUM = re.compile(r'(-?)((\d+)(\.\d+))' r'|(\.(\d+))')
# 正整数 + 量词
RE_POSITIVE_QUANTIFIERS = re.compile(r"(\d+)([多余几\+])?" + COM_QUANTIFIERS)
RE_NUMBER = re.compile(r'(-?)((\d+)(\.\d+)?)' r'|(\.(\d+))')
@@ -123,6 +123,8 @@ def replace_positive_quantifier(match) -> str:
    """
    number = match.group(1)
    match_2 = match.group(2)
    if match_2 == "+":
        match_2 = "多"
    match_2: str = match_2 if match_2 else ""
    quantifiers: str = match.group(3)
    number: str = num2str(number)
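A quick way to see what the widened pattern now matches (a trimmed sketch; the real `COM_QUANTIFIERS` alternation is far longer):

```python
import re

# trimmed stand-in for the module's COM_QUANTIFIERS
COM_QUANTIFIERS = '(个|所|小时)'
RE_POSITIVE_QUANTIFIERS = re.compile(r"(\d+)([多余几\+])?" + COM_QUANTIFIERS)

for text in ["3个", "3+个", "24小时"]:
    m = RE_POSITIVE_QUANTIFIERS.match(text)
    # group(2) is the optional marker; the new branch maps "+" to 多
    print(text, "->", m.groups() if m else None)
```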
@@ -151,6 +153,7 @@ def replace_number(match) -> str:
# 范围表达式
# match.group(1) and match.group(8) are copy from RE_NUMBER
RE_RANGE = re.compile(
    r'((-?)((\d+)(\.\d+)?)|(\.(\d+)))[-~]((-?)((\d+)(\.\d+)?)|(\.(\d+)))')

@@ -63,11 +63,19 @@ class TextNormalizer():
        # Only for pure Chinese here
        if lang == "zh":
            text = text.replace(" ", "")
            # 过滤掉特殊字符
            text = re.sub(r'[《》【】<=>{}()#&@“”^_|…\\]', '', text)
        text = self.SENTENCE_SPLITOR.sub(r'\1\n', text)
        text = text.strip()
        sentences = [sentence.strip() for sentence in re.split(r'\n+', text)]
        return sentences
    def _post_replace(self, sentence: str) -> str:
        sentence = sentence.replace('/', '每')
        sentence = sentence.replace('~', '至')
        return sentence
    def normalize_sentence(self, sentence: str) -> str:
        # basic character conversions
        sentence = tranditional_to_simplified(sentence)
@@ -97,6 +105,7 @@ class TextNormalizer():
            sentence)
        sentence = RE_DEFAULT_NUM.sub(replace_default_num, sentence)
        sentence = RE_NUMBER.sub(replace_number, sentence)
        sentence = self._post_replace(sentence)
        return sentence
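The two new cleanup steps are easy to exercise standalone (a minimal sketch mirroring the added lines; the 每/至 replacements follow the `_post_replace` reconstruction above):

```python
import re

def strip_special(text: str) -> str:
    # same character class the commit filters out before sentence splitting
    return re.sub(r'[《》【】<=>{}()#&@“”^_|…\\]', '', text)

def post_replace(sentence: str) -> str:
    # mirrors _post_replace: read "/" as 每 and "~" as 至
    return sentence.replace('/', '每').replace('~', '至')

print(strip_special('你好《世界》(测试)'))  # -> 你好世界测试
print(post_replace('3~5次/天'))            # -> 3至5次每天
```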

@@ -36,4 +36,4 @@ def repeat(N, fn):
    Returns:
        MultiSequential: Repeated model instance.
    """
    return MultiSequential(* [fn(n) for n in range(N)])

@@ -48,6 +48,7 @@ base = [
    "paddlespeech_feat",
    "praatio==5.0.0",
    "pypinyin",
    "pypinyin-dict",
    "python-dateutil",
    "pyworld",
    "resampy==0.2.2",
@@ -62,6 +63,7 @@ base = [
    "visualdl",
    "webrtcvad",
    "yacs~=0.1.8",
    "prettytable",
]

server = [

@@ -54,4 +54,4 @@ batch_size:16|30
fp_items:fp32
iteration:50
--profiler-options:"batch_range=[10,35];state=GPU;tracer_option=Default;profile_path=model.profile"
flags:null

@@ -54,4 +54,4 @@ batch_size:6|16
fp_items:fp32
iteration:50
--profiler_options:"batch_range=[10,35];state=GPU;tracer_option=Default;profile_path=model.profile"
flags:null

@@ -26,15 +26,19 @@ if [ ${MODE} = "benchmark_train" ];then
    curPath=$(readlink -f "$(dirname "$0")")
    echo "curPath:"${curPath}
    cd ${curPath}/../..
    apt-get install libsndfile1
    pip install pytest-runner kaldiio setuptools_scm -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
    cd -
    if [ ${model_name} == "conformer" ]; then
        # set the URL for aishell_tiny dataset
        URL=${conformer_data_URL:-"None"}
        echo "URL:"${URL}
        if [ ${URL} == 'None' ];then
            echo "please contact author to get the URL.\n"
            exit
        else
            wget -P ${curPath}/../../dataset/aishell/ ${URL}
        fi
        sed -i "s#^URL_ROOT_TAG#URL_ROOT = '${URL}'#g" ${curPath}/conformer/scripts/aishell_tiny.py
        cp ${curPath}/conformer/scripts/aishell_tiny.py ${curPath}/../../dataset/aishell/
@@ -42,6 +46,7 @@ if [ ${MODE} = "benchmark_train" ];then
        source path.sh
        # download audio data
        sed -i "s#aishell.py#aishell_tiny.py#g" ./local/data.sh
        sed -i "s#python3#python#g" ./local/data.sh
        bash ./local/data.sh || exit -1
        if [ $? -ne 0 ]; then
            exit 1
@@ -56,7 +61,6 @@ if [ ${MODE} = "benchmark_train" ];then
        sed -i "s#conf/#test_tipc/conformer/benchmark_train/conf/#g" ${curPath}/conformer/benchmark_train/conf/conformer.yaml
        sed -i "s#data/#test_tipc/conformer/benchmark_train/data/#g" ${curPath}/conformer/benchmark_train/conf/tuning/decode.yaml
        sed -i "s#data/#test_tipc/conformer/benchmark_train/data/#g" ${curPath}/conformer/benchmark_train/conf/preprocess.yaml
    fi
    if [ ${model_name} == "pwgan" ]; then
@@ -73,4 +77,4 @@ if [ ${MODE} = "benchmark_train" ];then
        python ../paddlespeech/t2s/exps/gan_vocoder/normalize.py --metadata=dump/test/raw/metadata.jsonl --dumpdir=dump/test/norm --stats=dump/train/feats_stats.npy
    fi
fi

@@ -11,11 +11,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pickle
import unittest

import numpy as np
import paddle
from paddle import inference

from paddlespeech.s2t.models.ds2_online import DeepSpeech2InferModelOnline
from paddlespeech.s2t.models.ds2_online import DeepSpeech2ModelOnline
@@ -182,5 +186,77 @@ class TestDeepSpeech2ModelOnline(unittest.TestCase):
            paddle.allclose(final_state_c_box, final_state_c_box_chk), True)
class TestDeepSpeech2StaticModelOnline(unittest.TestCase):
    def setUp(self):
        export_prefix = "exp/deepspeech2_online/checkpoints/test_export"
        if not os.path.exists(os.path.dirname(export_prefix)):
            os.makedirs(os.path.dirname(export_prefix), mode=0o755)
        infer_model = DeepSpeech2InferModelOnline(
            feat_size=161,
            dict_size=4233,
            num_conv_layers=2,
            num_rnn_layers=5,
            rnn_size=1024,
            num_fc_layers=0,
            fc_layers_size_list=[-1],
            use_gru=False)
        static_model = infer_model.export()
        paddle.jit.save(static_model, export_prefix)

        with open("test_data/static_ds2online_inputs.pickle", "rb") as f:
            self.data_dict = pickle.load(f)

        self.setup_model(export_prefix)

    def setup_model(self, export_prefix):
        deepspeech_config = inference.Config(export_prefix + ".pdmodel",
                                             export_prefix + ".pdiparams")
        if ('CUDA_VISIBLE_DEVICES' in os.environ.keys() and
                os.environ['CUDA_VISIBLE_DEVICES'].strip() != ''):
            deepspeech_config.enable_use_gpu(100, 0)
            deepspeech_config.enable_memory_optim()
        deepspeech_predictor = inference.create_predictor(deepspeech_config)
        self.predictor = deepspeech_predictor

    def test_unit(self):
        input_names = self.predictor.get_input_names()
        audio_handle = self.predictor.get_input_handle(input_names[0])
        audio_len_handle = self.predictor.get_input_handle(input_names[1])
        h_box_handle = self.predictor.get_input_handle(input_names[2])
        c_box_handle = self.predictor.get_input_handle(input_names[3])

        x_chunk = self.data_dict["audio_chunk"]
        x_chunk_lens = self.data_dict["audio_chunk_lens"]
        chunk_state_h_box = self.data_dict["chunk_state_h_box"]
        chunk_state_c_box = self.data_dict["chunk_state_c_bos"]

        audio_handle.reshape(x_chunk.shape)
        audio_handle.copy_from_cpu(x_chunk)

        audio_len_handle.reshape(x_chunk_lens.shape)
        audio_len_handle.copy_from_cpu(x_chunk_lens)

        h_box_handle.reshape(chunk_state_h_box.shape)
        h_box_handle.copy_from_cpu(chunk_state_h_box)

        c_box_handle.reshape(chunk_state_c_box.shape)
        c_box_handle.copy_from_cpu(chunk_state_c_box)

        output_names = self.predictor.get_output_names()
        output_handle = self.predictor.get_output_handle(output_names[0])
        output_lens_handle = self.predictor.get_output_handle(output_names[1])
        output_state_h_handle = self.predictor.get_output_handle(
            output_names[2])
        output_state_c_handle = self.predictor.get_output_handle(
            output_names[3])
        self.predictor.run()

        output_chunk_probs = output_handle.copy_to_cpu()
        output_chunk_lens = output_lens_handle.copy_to_cpu()
        chunk_state_h_box = output_state_h_handle.copy_to_cpu()
        chunk_state_c_box = output_state_c_handle.copy_to_cpu()
        return True
if __name__ == '__main__':
    unittest.main()

@@ -0,0 +1,3 @@
mkdir -p ./test_data
wget -P ./test_data https://paddlespeech.bj.bcebos.com/datasets/unit_test/asr/static_ds2online_inputs.pickle
python deepspeech2_online_model_test.py