Merge branch 'PaddlePaddle:develop' into develop

commit c737dab833 (pull/2493/head)
liangym, committed by GitHub

@@ -19,8 +19,6 @@
   <div align="center">
   <h4>
     <a href="#quick-start"> Quick Start </a>
-    | <a href="#quick-start-server"> Quick Start Server </a>
-    | <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a>
     | <a href="#documents"> Documents </a>
     | <a href="#model-list"> Models List </a>
     | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
@@ -159,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and ERNIE-SAT in [PaddleSpeech Web Demo](./demos/speech_web).
+- ⚡ 2022.09.09: Add AISHELL-3 Voice Cloning [example](./examples/aishell3/vc2) with ECAPA-TDNN speaker encoder.
 - ⚡ 2022.08.25: Release TTS [finetune](./examples/other/tts_finetune/tts3) example.
 - 🔥 2022.08.22: Add ERNIE-SAT models: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat).
 - 🔥 2022.08.15: Add [g2pW](https://github.com/GitYCC/g2pW) into TTS Chinese Text Frontend.
@@ -705,7 +705,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
   <tbody>
     <tr>
       <td>Speaker Verification</td>
-      <td>VoxCeleb12</td>
+      <td>VoxCeleb1/2</td>
       <td>ECAPA-TDNN</td>
       <td>
       <a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
@@ -714,6 +714,31 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
   </tbody>
 </table>

+<a name="SpeakerDiarization"></a>
+
+**Speaker Diarization**
+
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th> Task </th>
+      <th> Dataset </th>
+      <th> Model Type </th>
+      <th> Example </th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Speaker Diarization</td>
+      <td>AMI</td>
+      <td>ECAPA-TDNN + AHC / SC</td>
+      <td>
+      <a href = "./examples/ami/sd0">ecapa-tdnn-ami</a>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
 <a name="PunctuationRestoration"></a>

 **Punctuation Restoration**
@@ -767,6 +792,7 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
 - [Text-to-Speech](#TextToSpeech)
 - [Audio Classification](#AudioClassification)
 - [Speaker Verification](#SpeakerVerification)
+- [Speaker Diarization](#SpeakerDiarization)
 - [Punctuation Restoration](#PunctuationRestoration)
 - [Community](#Community)
 - [Welcome to contribute](#contribution)

@@ -19,10 +19,8 @@
 </p>
 <div align="center">
   <h4>
     <a href="#安装"> 安装 </a>
     | <a href="#快速开始"> 快速开始 </a>
-    | <a href="#快速使用服务"> 快速使用服务 </a>
-    | <a href="#快速使用流式服务"> 快速使用流式服务 </a>
     | <a href="#教程文档"> 教程文档 </a>
     | <a href="#模型列表"> 模型列表 </a>
     | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio 课程 </a>
@@ -181,6 +179,8 @@
 </div>

 ### 近期更新
+- 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 ERNIE-SAT 到 [PaddleSpeech 网页应用](./demos/speech_web)。
+- ⚡ 2022.09.09: 新增基于 ECAPA-TDNN 声纹模型的 AISHELL-3 Voice Cloning [示例](./examples/aishell3/vc2)。
 - ⚡ 2022.08.25: 发布 TTS [finetune](./examples/other/tts_finetune/tts3) 示例。
 - 🔥 2022.08.22: 新增 ERNIE-SAT 模型: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat)。
 - 🔥 2022.08.15: 将 [g2pW](https://github.com/GitYCC/g2pW) 引入 TTS 中文文本前端。
@@ -717,8 +717,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </thead>
 <tbody>
   <tr>
-     <td>Speaker Verification</td>
-     <td>VoxCeleb12</td>
+     <td>声纹识别</td>
+     <td>VoxCeleb1/2</td>
      <td>ECAPA-TDNN</td>
      <td>
      <a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
@@ -727,6 +727,31 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </tbody>
 </table>

+<a name="说话人日志模型"></a>
+
+**说话人日志**
+
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th> 任务 </th>
+      <th> 数据集 </th>
+      <th> 模型类型 </th>
+      <th> 脚本 </th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>说话人日志</td>
+      <td>AMI</td>
+      <td>ECAPA-TDNN + AHC / SC</td>
+      <td>
+      <a href = "./examples/ami/sd0">ecapa-tdnn-ami</a>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
 <a name="标点恢复模型"></a>

 **标点恢复**
@@ -786,6 +811,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 - [语音合成](#语音合成模型)
 - [声音分类](#声音分类模型)
 - [声纹识别](#声纹识别模型)
+- [说话人日志](#说话人日志模型)
 - [标点恢复](#标点恢复模型)
 - [技术交流群](#技术交流群)
 - [欢迎贡献](#欢迎贡献)

@@ -21,7 +21,7 @@ engine_list: ['asr_online']
 ################################### ASR #########################################
 ################### speech task: asr; engine_type: online #######################
 asr_online:
-    model_type: 'conformer_online_wenetspeech'
+    model_type: 'conformer_u2pp_online_wenetspeech'
     am_model: # the pdmodel file of am static model [optional]
     am_params: # the pdiparams file of am static model [optional]
     lang: 'zh'

@@ -9,6 +9,7 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER |
 [Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.1.model.tar.gz) | Aishell Dataset | Char-based | 491 MB | 2 Conv + 5 LSTM layers | 0.0666 |-| 151 h | [Ds2 Online Aishell ASR0](../../examples/aishell/asr0) | onnx/inference/python |
 [Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_offline_aishell_ckpt_1.0.1.model.tar.gz) | Aishell Dataset | Char-based | 1.4 GB | 2 Conv + 5 bidirectional LSTM layers | 0.0554 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0) | inference/python |
 [Conformer Online Wenetspeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz) | WenetSpeech Dataset | Char-based | 457 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.11 (test\_net) 0.1879 (test\_meeting) |-| 10000 h |- | python |
+[Conformer U2PP Online Wenetspeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.1.model.tar.gz) | WenetSpeech Dataset | Char-based | 476 MB | Encoder:Conformer, Decoder:BiTransformer, Decoding method: Attention rescoring | 0.047198 (aishell test\_-1) 0.059212 (aishell test\_16) |-| 10000 h |- | python |
 [Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0544 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1) | python |
 [Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_1.0.1.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0460 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1) | python |
 [Transformer Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz) | Aishell Dataset | Char-based | 128 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0523 || 151 h | [Transformer Aishell ASR1](../../examples/aishell/asr1) | python |

@@ -26,6 +26,10 @@ if [ ${seed} != 0 ]; then
     export FLAGS_cudnn_deterministic=True
 fi
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \
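
The exported flag above can also be set from Python before Paddle initializes; a minimal sketch of the same workaround (not part of this commit):

```python
import os

# Same effect as `export FLAGS_allocator_strategy=naive_best_fit` in the
# scripts above: with the default auto_growth allocator, training could
# hang instead of raising OOM when GPU memory is exhausted. The flag must
# be set before paddle is first imported/initialized.
os.environ['FLAGS_allocator_strategy'] = 'naive_best_fit'

import paddle  # noqa: E402

print(paddle.device.get_device())
```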

@@ -35,6 +35,10 @@ echo ${ips_config}
 mkdir -p exp
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params:
+  pretrained_token: ernie-3.0-base-zh
+  punc_path: data/iwslt2012_zh/punc_vocab
+  seq_len: 100
+
+###########################################################
+#                      MODEL SETTING                      #
+###########################################################
+model_type: ErnieLinear
+model:
+  pretrained_token: ernie-3.0-base-zh
+  num_classes: 4
+
+###########################################################
+#                    OPTIMIZER SETTING                    #
+###########################################################
+optimizer_params:
+  weight_decay: 1.0e-6  # weight decay coefficient
+
+scheduler_params:
+  learning_rate: 1.0e-5  # learning rate
+  gamma: 0.9999  # scheduler gamma; must be in (0.0, 1.0), and closer to 1.0 is better
+
+###########################################################
+#                    TRAINING SETTING                     #
+###########################################################
+max_epoch: 20
+
+###########################################################
+#                      OTHER SETTING                      #
+###########################################################
+num_snapshots: 10  # max number of snapshots to keep while training
+seed: 42  # random seed for paddle, random, and np.random
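
For context, these configs are consumed by loading the YAML into a CfgNode and instantiating ErnieLinear from the `model` section, mirroring the `_init_from_path_new` method added to TextExecutor later in this commit. A minimal sketch, where the local config path is hypothetical:

```python
import yaml
from yacs.config import CfgNode

from paddlespeech.text.models.ernie_linear import ErnieLinear

# 'conf/ernie_linear_p3_wudao.yaml' is a hypothetical local copy of one
# of the configs above.
with open('conf/ernie_linear_p3_wudao.yaml') as f:
    config = CfgNode(yaml.safe_load(f))

# The `model` section carries pretrained_token and num_classes.
model = ErnieLinear(**config["model"])
model.eval()
```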

@@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params:
+  pretrained_token: ernie-3.0-medium-zh
+  punc_path: data/iwslt2012_zh/punc_vocab
+  seq_len: 100
+
+###########################################################
+#                      MODEL SETTING                      #
+###########################################################
+model_type: ErnieLinear
+model:
+  pretrained_token: ernie-3.0-medium-zh
+  num_classes: 4
+
+###########################################################
+#                    OPTIMIZER SETTING                    #
+###########################################################
+optimizer_params:
+  weight_decay: 1.0e-6  # weight decay coefficient
+
+scheduler_params:
+  learning_rate: 1.0e-5  # learning rate
+  gamma: 0.9999  # scheduler gamma; must be in (0.0, 1.0), and closer to 1.0 is better
+
+###########################################################
+#                    TRAINING SETTING                     #
+###########################################################
+max_epoch: 20
+
+###########################################################
+#                      OTHER SETTING                      #
+###########################################################
+num_snapshots: 10  # max number of snapshots to keep while training
+seed: 42  # random seed for paddle, random, and np.random

@@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params:
+  pretrained_token: ernie-3.0-mini-zh
+  punc_path: data/iwslt2012_zh/punc_vocab
+  seq_len: 100
+
+###########################################################
+#                      MODEL SETTING                      #
+###########################################################
+model_type: ErnieLinear
+model:
+  pretrained_token: ernie-3.0-mini-zh
+  num_classes: 4
+
+###########################################################
+#                    OPTIMIZER SETTING                    #
+###########################################################
+optimizer_params:
+  weight_decay: 1.0e-6  # weight decay coefficient
+
+scheduler_params:
+  learning_rate: 1.0e-5  # learning rate
+  gamma: 0.9999  # scheduler gamma; must be in (0.0, 1.0), and closer to 1.0 is better
+
+###########################################################
+#                    TRAINING SETTING                     #
+###########################################################
+max_epoch: 20
+
+###########################################################
+#                      OTHER SETTING                      #
+###########################################################
+num_snapshots: 10  # max number of snapshots to keep while training
+seed: 42  # random seed for paddle, random, and np.random

@@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params:
+  pretrained_token: ernie-3.0-nano-zh
+  punc_path: data/iwslt2012_zh/punc_vocab
+  seq_len: 100
+
+###########################################################
+#                      MODEL SETTING                      #
+###########################################################
+model_type: ErnieLinear
+model:
+  pretrained_token: ernie-3.0-nano-zh
+  num_classes: 4
+
+###########################################################
+#                    OPTIMIZER SETTING                    #
+###########################################################
+optimizer_params:
+  weight_decay: 1.0e-6  # weight decay coefficient
+
+scheduler_params:
+  learning_rate: 1.0e-5  # learning rate
+  gamma: 0.9999  # scheduler gamma; must be in (0.0, 1.0), and closer to 1.0 is better
+
+###########################################################
+#                    TRAINING SETTING                     #
+###########################################################
+max_epoch: 20
+
+###########################################################
+#                      OTHER SETTING                      #
+###########################################################
+num_snapshots: 10  # max number of snapshots to keep while training
+seed: 42  # random seed for paddle, random, and np.random

@@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params:
+  pretrained_token: ernie-tiny
+  punc_path: data/iwslt2012_zh/punc_vocab
+  seq_len: 100
+
+###########################################################
+#                      MODEL SETTING                      #
+###########################################################
+model_type: ErnieLinear
+model:
+  pretrained_token: ernie-tiny
+  num_classes: 4
+
+###########################################################
+#                    OPTIMIZER SETTING                    #
+###########################################################
+optimizer_params:
+  weight_decay: 1.0e-6  # weight decay coefficient
+
+scheduler_params:
+  learning_rate: 1.0e-5  # learning rate
+  gamma: 0.9999  # scheduler gamma; must be in (0.0, 1.0), and closer to 1.0 is better
+
+###########################################################
+#                    TRAINING SETTING                     #
+###########################################################
+max_epoch: 20
+
+###########################################################
+#                      OTHER SETTING                      #
+###########################################################
+num_snapshots: 10  # max number of snapshots to keep while training
+seed: 42  # random seed for paddle, random, and np.random

@@ -26,6 +26,10 @@ if [ ${seed} != 0 ]; then
     export FLAGS_cudnn_deterministic=True
 fi
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -29,6 +29,10 @@ fi
 # export FLAGS_cudnn_exhaustive_search=true
 # export FLAGS_conv_workspace_size_limit=4000
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -26,6 +26,10 @@ if [ ${seed} != 0 ]; then
     export FLAGS_cudnn_deterministic=True
 fi
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -19,6 +19,10 @@ if [ ${seed} != 0 ]; then
     export FLAGS_cudnn_deterministic=True
 fi
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -32,6 +32,10 @@ fi
 mkdir -p exp
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -34,6 +34,10 @@ fi
 mkdir -p exp
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -35,6 +35,10 @@ echo ${ips_config}
 mkdir -p exp
+
+# The default memory allocator strategy may cause GPU training to hang,
+# with no OOM error raised when memory is exhausted.
+export FLAGS_allocator_strategy=naive_best_fit

 if [ ${ngpu} == 0 ]; then
     python3 -u ${BIN_DIR}/train.py \
     --ngpu ${ngpu} \

@@ -42,3 +42,7 @@
   ```bash
   paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
   ```
+- Faster Punctuation Restoration
+  ```bash
+  paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 --model ernie_linear_p3_wudao_fast
+  ```

@@ -43,3 +43,7 @@
   ```bash
   paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
   ```
+- 快速标点恢复
+  ```bash
+  paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 --model ernie_linear_p3_wudao_fast
+  ```

@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import argparse
+import io
 import os
 import sys
 import time
@@ -51,7 +52,7 @@ class ASRExecutor(BaseExecutor):
         self.parser.add_argument(
             '--model',
             type=str,
-            default='conformer_wenetspeech',
+            default='conformer_u2pp_wenetspeech',
             choices=[
                 tag[:tag.index('-')]
                 for tag in self.task_resource.pretrained_models.keys()
@@ -229,6 +230,8 @@ class ASRExecutor(BaseExecutor):
         audio_file = input
         if isinstance(audio_file, (str, os.PathLike)):
             logger.debug("Preprocess audio_file:" + audio_file)
+        elif isinstance(audio_file, io.BytesIO):
+            audio_file.seek(0)

         # Get the object for feature extraction
         if "deepspeech2" in model_type or "conformer" in model_type or "transformer" in model_type:
@@ -352,6 +355,8 @@ class ASRExecutor(BaseExecutor):
             if not os.path.isfile(audio_file):
                 logger.error("Please input the right audio file path")
                 return False
+        elif isinstance(audio_file, io.BytesIO):
+            audio_file.seek(0)

         logger.debug("checking the audio file format......")
         try:
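
With the two io.BytesIO branches above, the executor accepts an in-memory stream as well as a file path, rewinding the buffer before reading. A minimal usage sketch, assuming a local zh.wav such as the one downloaded in the quick-start commands near the end of this commit:

```python
import io

from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()

# Pass a BytesIO buffer instead of a path; the executor now calls
# seek(0) on such inputs before preprocessing and format checking.
with open('zh.wav', 'rb') as f:
    buf = io.BytesIO(f.read())

print(asr(audio_file=buf, lang='zh', sample_rate=16000))
```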
@@ -465,7 +470,7 @@ class ASRExecutor(BaseExecutor):
     @stats_wrapper
     def __call__(self,
                  audio_file: os.PathLike,
-                 model: str='conformer_wenetspeech',
+                 model: str='conformer_u2pp_wenetspeech',
                  lang: str='zh',
                  sample_rate: int=16000,
                  config: os.PathLike=None,
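
A minimal sketch of the Python API after this change; passing the model name explicitly is equivalent to the new default:

```python
from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()

# conformer_u2pp_wenetspeech is now the default model, so the `model`
# argument below is redundant and only makes the switch visible.
result = asr(
    audio_file='zh.wav',
    model='conformer_u2pp_wenetspeech',
    lang='zh',
    sample_rate=16000)
print(result)
```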

@@ -20,10 +20,13 @@ from typing import Optional
 from typing import Union

 import paddle
+import yaml
+from yacs.config import CfgNode

 from ..executor import BaseExecutor
 from ..log import logger
 from ..utils import stats_wrapper
+from paddlespeech.text.models.ernie_linear import ErnieLinear

 __all__ = ['TextExecutor']
@@ -139,6 +142,66 @@ class TextExecutor(BaseExecutor):
             self.model.eval()

+    # init new models
+    def _init_from_path_new(self,
+                            task: str='punc',
+                            model_type: str='ernie_linear_p7_wudao',
+                            lang: str='zh',
+                            cfg_path: Optional[os.PathLike]=None,
+                            ckpt_path: Optional[os.PathLike]=None,
+                            vocab_file: Optional[os.PathLike]=None):
+        if hasattr(self, 'model'):
+            logger.debug('Model had been initialized.')
+            return
+
+        self.task = task
+        if cfg_path is None or ckpt_path is None or vocab_file is None:
+            tag = '-'.join([model_type, task, lang])
+            self.task_resource.set_task_model(tag, version=None)
+            self.cfg_path = os.path.join(
+                self.task_resource.res_dir,
+                self.task_resource.res_dict['cfg_path'])
+            self.ckpt_path = os.path.join(
+                self.task_resource.res_dir,
+                self.task_resource.res_dict['ckpt_path'])
+            self.vocab_file = os.path.join(
+                self.task_resource.res_dir,
+                self.task_resource.res_dict['vocab_file'])
+        else:
+            self.cfg_path = os.path.abspath(cfg_path)
+            self.ckpt_path = os.path.abspath(ckpt_path)
+            self.vocab_file = os.path.abspath(vocab_file)
+
+        model_name = model_type[:model_type.rindex('_')]
+        if self.task == 'punc':
+            # punc list
+            self._punc_list = []
+            with open(self.vocab_file, 'r') as f:
+                for line in f:
+                    self._punc_list.append(line.strip())
+
+            # model
+            with open(self.cfg_path) as f:
+                config = CfgNode(yaml.safe_load(f))
+            self.model = ErnieLinear(**config["model"])
+            _, tokenizer_class = self.task_resource.get_model_class(model_name)
+            state_dict = paddle.load(self.ckpt_path)
+            self.model.set_state_dict(state_dict["main_params"])
+            self.model.eval()
+
+            # tokenizer: fast version uses ernie-3.0-mini-zh; slow version uses ernie-1.0
+            if 'fast' not in model_type:
+                self.tokenizer = tokenizer_class.from_pretrained('ernie-1.0')
+            else:
+                self.tokenizer = tokenizer_class.from_pretrained(
+                    'ernie-3.0-mini-zh')
+        else:
+            raise NotImplementedError
+
     def _clean_text(self, text):
         text = text.lower()
         text = re.sub('[^A-Za-z0-9\u4e00-\u9fa5]', '', text)
@@ -179,7 +242,7 @@ class TextExecutor(BaseExecutor):
         else:
             raise NotImplementedError

-    def postprocess(self) -> Union[str, os.PathLike]:
+    def postprocess(self, isNewTrainer: bool=False) -> Union[str, os.PathLike]:
         """
             Output postprocess and return human-readable results such as texts and audio files.
         """
@@ -192,13 +255,13 @@ class TextExecutor(BaseExecutor):
                     input_ids[1:seq_len - 1])
                 labels = preds[1:seq_len - 1].tolist()
                 assert len(tokens) == len(labels)
+            if isNewTrainer:
+                self._punc_list = [0] + self._punc_list
             text = ''
             for t, l in zip(tokens, labels):
                 text += t
                 if l != 0:  # Non punc.
                     text += self._punc_list[l]
             return text
         else:
             raise NotImplementedError
@@ -255,10 +318,20 @@ class TextExecutor(BaseExecutor):
         """
             Python API to call an executor.
         """
-        paddle.set_device(device)
-        self._init_from_path(task, model, lang, config, ckpt_path, punc_vocab)
-        self.preprocess(text)
-        self.infer()
-        res = self.postprocess()  # Retrieve result of text task.
+        # Old version models go through the original path
+        if model in ['ernie_linear_p7_wudao', 'ernie_linear_p3_wudao']:
+            paddle.set_device(device)
+            self._init_from_path(task, model, lang, config, ckpt_path,
+                                 punc_vocab)
+            self.preprocess(text)
+            self.infer()
+            res = self.postprocess()  # Retrieve result of text task.
+        # New models go through the new inference path
+        else:
+            paddle.set_device(device)
+            self._init_from_path_new(task, model, lang, config, ckpt_path,
+                                     punc_vocab)
+            self.preprocess(text)
+            self.infer()
+            res = self.postprocess(isNewTrainer=True)

         return res
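
Any model name outside the two legacy ones now routes through `_init_from_path_new` and `postprocess(isNewTrainer=True)`. A minimal sketch of exercising the new path through the Python API, assuming the executor's standard keyword arguments:

```python
from paddlespeech.cli.text.infer import TextExecutor

text_executor = TextExecutor()

# 'ernie_linear_p3_wudao_fast' is not in the legacy list above, so this
# call takes the new branch.
result = text_executor(
    text='今天的天气真不错啊你下午有空吗我想约你一起去吃饭',
    task='punc',
    model='ernie_linear_p3_wudao_fast',
    lang='zh',
    device='cpu')
print(result)
```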

@@ -25,6 +25,8 @@ model_alias = {
     "deepspeech2online": ["paddlespeech.s2t.models.ds2:DeepSpeech2Model"],
     "conformer": ["paddlespeech.s2t.models.u2:U2Model"],
     "conformer_online": ["paddlespeech.s2t.models.u2:U2Model"],
+    "conformer_u2pp": ["paddlespeech.s2t.models.u2:U2Model"],
+    "conformer_u2pp_online": ["paddlespeech.s2t.models.u2:U2Model"],
     "transformer": ["paddlespeech.s2t.models.u2:U2Model"],
     "wenetspeech": ["paddlespeech.s2t.models.u2:U2Model"],
@@ -51,6 +53,10 @@ model_alias = {
         "paddlespeech.text.models:ErnieLinear",
         "paddlenlp.transformers:ErnieTokenizer"
     ],
+    "ernie_linear_p3_wudao": [
+        "paddlespeech.text.models:ErnieLinear",
+        "paddlenlp.transformers:ErnieTokenizer"
+    ],

     # ---------------------------------
     # -------------- TTS --------------

@@ -68,6 +68,46 @@ asr_dynamic_pretrained_models = {
             '',
         },
     },
+    "conformer_u2pp_wenetspeech-zh-16k": {
+        '1.1': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.1.model.tar.gz',
+            'md5':
+            'eae678c04ed3b3f89672052fdc0c5e10',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10',
+            'model':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'params':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'lm_url':
+            '',
+            'lm_md5':
+            '',
+        },
+    },
+    "conformer_u2pp_online_wenetspeech-zh-16k": {
+        '1.1': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.2.model.tar.gz',
+            'md5':
+            '925d047e9188dea7f421a718230c9ae3',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10',
+            'model':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'params':
+            'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams',
+            'lm_url':
+            '',
+            'lm_md5':
+            '',
+        },
+    },
     "conformer_online_multicn-zh-16k": {
         '1.0': {
             'url':
@@ -529,7 +569,7 @@ text_dynamic_pretrained_models = {
             'ckpt/model_state.pdparams',
             'vocab_file':
             'punc_vocab.txt',
-        },
+        }
     },
     "ernie_linear_p3_wudao-punc-zh": {
         '1.0': {
@@ -543,8 +583,22 @@ text_dynamic_pretrained_models = {
             'ckpt/model_state.pdparams',
             'vocab_file':
             'punc_vocab.txt',
-        },
+        }
     },
+    "ernie_linear_p3_wudao_fast-punc-zh": {
+        '1.0': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_wudao_fast-punc-zh.tar.gz',
+            'md5':
+            'c93f9594119541a5dbd763381a751d08',
+            'cfg_path':
+            'ckpt/model_config.json',
+            'ckpt_path':
+            'ckpt/model_state.pdparams',
+            'vocab_file':
+            'punc_vocab.txt',
+        }
+    }
 }

 # ---------------------------------
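
Each registry entry pairs a download url with an md5 digest. A small, generic sketch (not part of this commit) for verifying a downloaded archive against the registered digest:

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file and return its hex md5, for comparison against the
    'md5' field registered for each pretrained model above."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Expected per the registry above:
# md5sum('asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.1.model.tar.gz')
# == 'eae678c04ed3b3f89672052fdc0c5e10'
```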

@@ -40,7 +40,6 @@ class U2Infer():
         self.preprocess_conf = config.preprocess_config
         self.preprocess_args = {"train": False}
         self.preprocessing = Transformation(self.preprocess_conf)
-        self.reverse_weight = getattr(config.model_conf, 'reverse_weight', 0.0)
         self.text_feature = TextFeaturizer(
             unit_type=config.unit_type,
             vocab=config.vocab_filepath,
@@ -89,8 +88,7 @@ class U2Infer():
                 ctc_weight=decode_config.ctc_weight,
                 decoding_chunk_size=decode_config.decoding_chunk_size,
                 num_decoding_left_chunks=decode_config.num_decoding_left_chunks,
-                simulate_streaming=decode_config.simulate_streaming,
-                reverse_weight=self.reverse_weight)
+                simulate_streaming=decode_config.simulate_streaming)
             rsl = result_transcripts[0][0]
             utt = Path(self.audio_file).name
             logger.info(f"hyp: {utt} {result_transcripts[0][0]}")

@@ -316,7 +316,6 @@ class U2Tester(U2Trainer):
             vocab=self.config.vocab_filepath,
             spm_model_prefix=self.config.spm_model_prefix)
         self.vocab_list = self.text_feature.vocab_list
-        self.reverse_weight = getattr(config.model_conf, 'reverse_weight', 0.0)

     def id2token(self, texts, texts_len, text_feature):
         """ ord() id to chr() chr """
@@ -351,8 +350,7 @@ class U2Tester(U2Trainer):
             ctc_weight=decode_config.ctc_weight,
             decoding_chunk_size=decode_config.decoding_chunk_size,
             num_decoding_left_chunks=decode_config.num_decoding_left_chunks,
-            simulate_streaming=decode_config.simulate_streaming,
-            reverse_weight=self.reverse_weight)
+            simulate_streaming=decode_config.simulate_streaming)
         decode_time = time.time() - start_time

         for utt, target, result, rec_tids in zip(

@@ -507,16 +507,14 @@ class U2BaseModel(ASRInterface, nn.Layer):
                                     num_decoding_left_chunks, simulate_streaming)
         return hyps[0][0]

-    def attention_rescoring(
-            self,
-            speech: paddle.Tensor,
-            speech_lengths: paddle.Tensor,
-            beam_size: int,
-            decoding_chunk_size: int=-1,
-            num_decoding_left_chunks: int=-1,
-            ctc_weight: float=0.0,
-            simulate_streaming: bool=False,
-            reverse_weight: float=0.0, ) -> List[int]:
+    def attention_rescoring(self,
+                            speech: paddle.Tensor,
+                            speech_lengths: paddle.Tensor,
+                            beam_size: int,
+                            decoding_chunk_size: int=-1,
+                            num_decoding_left_chunks: int=-1,
+                            ctc_weight: float=0.0,
+                            simulate_streaming: bool=False) -> List[int]:
         """ Apply attention rescoring decoding, CTC prefix beam search
             is applied first to get nbest, then we rescore the nbest on
             attention decoder with corresponding encoder out
@@ -536,7 +534,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
         """
         assert speech.shape[0] == speech_lengths.shape[0]
         assert decoding_chunk_size != 0
-        if reverse_weight > 0.0:
+        if self.reverse_weight > 0.0:
             # decoder should be a bitransformer decoder if reverse_weight > 0.0
             assert hasattr(self.decoder, 'right_decoder')
         device = speech.place
@@ -574,7 +572,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
                                   self.eos)
         decoder_out, r_decoder_out, _ = self.decoder(
             encoder_out, encoder_mask, hyps_pad, hyps_lens, r_hyps_pad,
-            reverse_weight)  # (beam_size, max_hyps_len, vocab_size)
+            self.reverse_weight)  # (beam_size, max_hyps_len, vocab_size)
         # ctc score in ln domain
         decoder_out = paddle.nn.functional.log_softmax(decoder_out, axis=-1)
         decoder_out = decoder_out.numpy()
@@ -594,12 +592,13 @@ class U2BaseModel(ASRInterface, nn.Layer):
                 score += decoder_out[i][j][w]
             # last decoder output token is `eos`, for last decoder input token.
             score += decoder_out[i][len(hyp[0])][self.eos]
-            if reverse_weight > 0:
+            if self.reverse_weight > 0:
                 r_score = 0.0
                 for j, w in enumerate(hyp[0]):
                     r_score += r_decoder_out[i][len(hyp[0]) - j - 1][w]
                 r_score += r_decoder_out[i][len(hyp[0])][self.eos]
-                score = score * (1 - reverse_weight) + r_score * reverse_weight
+                score = score * (1 - self.reverse_weight
+                                 ) + r_score * self.reverse_weight
             # add ctc score (which in ln domain)
             score += hyp[1] * ctc_weight
             if score > best_score:
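
The loop above combines three log-domain scores per hypothesis. A toy sketch of the arithmetic with made-up numbers, using the model-level reverse_weight this commit switches to:

```python
# Made-up log-domain scores for a single hypothesis (illustrative only).
forward_score = -3.2   # left-to-right decoder score, summed over tokens + eos
reverse_score = -3.5   # right-to-left decoder score (used when reverse_weight > 0)
ctc_score = -4.1       # CTC prefix beam search score for the same hypothesis

reverse_weight = 0.3   # now read from the model config, not the decode() call
ctc_weight = 0.5

score = forward_score * (1 - reverse_weight) + reverse_score * reverse_weight
score += ctc_score * ctc_weight
print(score)  # the hypothesis with the highest combined score is kept
```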
@@ -748,8 +747,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
                ctc_weight: float=0.0,
                decoding_chunk_size: int=-1,
                num_decoding_left_chunks: int=-1,
-               simulate_streaming: bool=False,
-               reverse_weight: float=0.0):
+               simulate_streaming: bool=False):
         """u2 decoding.

         Args:
@@ -821,8 +819,7 @@ class U2BaseModel(ASRInterface, nn.Layer):
                 decoding_chunk_size=decoding_chunk_size,
                 num_decoding_left_chunks=num_decoding_left_chunks,
                 ctc_weight=ctc_weight,
-                simulate_streaming=simulate_streaming,
-                reverse_weight=reverse_weight)
+                simulate_streaming=simulate_streaming)
             hyps = [hyp]
         else:
             raise ValueError(f"Not support decoding method: {decoding_method}")

@@ -30,7 +30,7 @@ asr_online:
     decode_method:
     num_decoding_left_chunks: -1
     force_yes: True
-    device: # cpu or gpu:id
+    device: cpu  # cpu or gpu:id
     continuous_decoding: True  # enable continue decoding when endpoint detected

 am_predictor_conf:

@@ -22,6 +22,7 @@ from numpy import float32
 from yacs.config import CfgNode

 from paddlespeech.audio.transform.transformation import Transformation
+from paddlespeech.audio.utils.tensor_utils import st_reverse_pad_list
 from paddlespeech.cli.asr.infer import ASRExecutor
 from paddlespeech.cli.log import logger
 from paddlespeech.resource import CommonTaskResource
@@ -602,24 +603,31 @@ class PaddleASRConnectionHanddler:
         hyps_pad = pad_sequence(
             hyp_list, batch_first=True, padding_value=self.model.ignore_id)
+        ori_hyps_pad = hyps_pad
         hyps_lens = paddle.to_tensor(
             [len(hyp[0]) for hyp in hyps], place=self.device,
             dtype=paddle.long)  # (beam_size,)
         hyps_pad, _ = add_sos_eos(hyps_pad, self.model.sos, self.model.eos,
                                   self.model.ignore_id)
         hyps_lens = hyps_lens + 1  # Add <sos> at beginning

         encoder_out = self.encoder_out.repeat(beam_size, 1, 1)
         encoder_mask = paddle.ones(
             (beam_size, 1, encoder_out.shape[1]), dtype=paddle.bool)
-        decoder_out, _, _ = self.model.decoder(
-            encoder_out, encoder_mask, hyps_pad,
-            hyps_lens)  # (beam_size, max_hyps_len, vocab_size)
+        r_hyps_pad = st_reverse_pad_list(ori_hyps_pad, hyps_lens - 1,
+                                         self.model.sos, self.model.eos)
+        decoder_out, r_decoder_out, _ = self.model.decoder(
+            encoder_out, encoder_mask, hyps_pad, hyps_lens, r_hyps_pad,
+            self.model.reverse_weight)  # (beam_size, max_hyps_len, vocab_size)

         # ctc score in ln domain
         decoder_out = paddle.nn.functional.log_softmax(decoder_out, axis=-1)
         decoder_out = decoder_out.numpy()
+        # r_decoder_out will be 0.0, if reverse_weight is 0.0 or decoder is a
+        # conventional transformer decoder.
+        r_decoder_out = paddle.nn.functional.log_softmax(r_decoder_out, axis=-1)
+        r_decoder_out = r_decoder_out.numpy()

         # Only use decoder score for rescoring
         best_score = -float('inf')
         best_index = 0
@@ -631,6 +639,13 @@ class PaddleASRConnectionHanddler:
             # last decoder output token is `eos`, for last decoder input token.
             score += decoder_out[i][len(hyp[0])][self.model.eos]
+            if self.model.reverse_weight > 0:
+                r_score = 0.0
+                for j, w in enumerate(hyp[0]):
+                    r_score += r_decoder_out[i][len(hyp[0]) - j - 1][w]
+                r_score += r_decoder_out[i][len(hyp[0])][self.model.eos]
+                score = score * (1 - self.model.reverse_weight
+                                 ) + r_score * self.model.reverse_weight

             # add ctc score (which in ln domain)
             score += hyp[1] * self.ctc_decode_config.ctc_weight
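
st_reverse_pad_list prepares the right-to-left decoder input by reversing each hypothesis within its true length, leaving padding in place. A toy call sketch mirroring the arguments used above; token, sos, and eos ids are hypothetical:

```python
import paddle

from paddlespeech.audio.utils.tensor_utils import st_reverse_pad_list

# Two padded hypotheses with true lengths 4 and 2 (made-up token ids).
hyps_pad = paddle.to_tensor([[1, 2, 3, 4], [5, 6, 0, 0]])
hyps_lens = paddle.to_tensor([4, 2])

# Positional arguments mirror the call above: (hyps, lens, sos, eos).
r_hyps_pad = st_reverse_pad_list(hyps_pad, hyps_lens, 10, 11)
```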

@@ -107,11 +107,14 @@ class PaddleTextConnectionHandler:
                 assert len(tokens) == len(labels)
                 text = ''
+                is_fast_model = 'fast' in self.text_engine.config.model_type
                 for t, l in zip(tokens, labels):
                     text += t
                     if l != 0:  # Non punc.
-                        text += self._punc_list[l]
+                        if is_fast_model:
+                            text += self._punc_list[l - 1]
+                        else:
+                            text += self._punc_list[l]
                 return text
             else:
                 raise NotImplementedError
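
The fast models' punc_vocab carries no leading "no punctuation" entry, so a label l indexes _punc_list[l - 1], while the legacy vocab reserves index 0 for it. A toy illustration with hypothetical punctuation lists:

```python
# Hypothetical vocabularies, for illustration only.
punc_list_fast = ['，', '。', '？']        # fast models: punctuation entries only
punc_list_legacy = ['', '，', '。', '？']  # legacy models: index 0 means "no punc"

label = 2  # a non-zero label emitted by the classifier
assert punc_list_fast[label - 1] == punc_list_legacy[label] == '。'
```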
@@ -160,14 +163,23 @@ class TextEngine(BaseEngine):
             return False

         self.executor = TextServerExecutor()
-        self.executor._init_from_path(
-            task=config.task,
-            model_type=config.model_type,
-            lang=config.lang,
-            cfg_path=config.cfg_path,
-            ckpt_path=config.ckpt_path,
-            vocab_file=config.vocab_file)
+        if 'fast' in config.model_type:
+            self.executor._init_from_path_new(
+                task=config.task,
+                model_type=config.model_type,
+                lang=config.lang,
+                cfg_path=config.cfg_path,
+                ckpt_path=config.ckpt_path,
+                vocab_file=config.vocab_file)
+        else:
+            self.executor._init_from_path(
+                task=config.task,
+                model_type=config.model_type,
+                lang=config.lang,
+                cfg_path=config.cfg_path,
+                ckpt_path=config.ckpt_path,
+                vocab_file=config.vocab_file)
+        logger.info("Using model: %s." % (config.model_type))

         logger.info("Initialize Text server engine successfully on device: %s."
                     % (self.device))
         return True

@@ -7,7 +7,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespe
 paddlespeech cls --input ./cat.wav --topk 10

 # Punctuation_restoration
-paddlespeech text --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
+paddlespeech text --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 --model ernie_linear_p3_wudao_fast

 # Speech_recognition
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
