**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
- 🤗 2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
### 🔥 Hot Activities
<!---
2021.12.14: We would like to offer online courses to introduce the basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 2021.12.21~12.24: 4-day live courses giving an in-depth interpretation of PaddleSpeech!
  **Course videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Community
- Scan the QR code below with your WeChat to join the official technical exchange group and get the bonus (more than 20 GB of learning materials, such as papers, code, and videos) and the live links of the lessons. We look forward to your participation.
### Recent Updates
<!---
2021.12.14: We would like to offer online courses to introduce the basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
In some cases, we need to recognize specific rare words with high accuracy, e.g., address recognition in navigation apps. Customized ASR can solve these issues.
This demo is customized for expense accounts, which need to recognize rare addresses.
The scripts are in https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx/examples/custom_asr
The configuration file can be found in `conf/tts_online_application.yaml`.
- In streaming voc inference, one chunk of data is inferred at a time to achieve a streaming effect, where `voc_block` indicates the number of valid frames in the chunk and `voc_pad` indicates the number of frames added before and after `voc_block` in the chunk. The padding exists to eliminate errors caused by streaming inference and to avoid its influence on the quality of the synthesized audio (see the sketch after these notes).
- Both hifigan and mb_melgan support streaming voc inference.
- When the voc model is mb_melgan, voc_pad=14 makes the streaming synthetic audio consistent with the non-streaming synthetic audio; voc_pad can be reduced to a minimum of 7, at which the synthetic audio still has no audible artifacts. If voc_pad is less than 7, the synthetic audio sounds abnormal.
- When the voc model is hifigan, voc_pad=19 makes the streaming synthetic audio consistent with the non-streaming synthetic audio; with voc_pad=14, the synthetic audio has no audible artifacts.
- **Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.
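To make the `voc_block`/`voc_pad` scheme concrete, here is a minimal sketch of the chunking described above. It is an illustration, not the PaddleSpeech implementation; `stream_chunks` and all shapes are hypothetical:

```python
# Sketch of voc_block / voc_pad chunking: each chunk carries voc_pad extra
# frames on each side, and only the voc_block valid frames are kept after
# inference. Not the PaddleSpeech implementation.
import numpy as np

def stream_chunks(mel, voc_block=36, voc_pad=14):
    n = len(mel)
    for begin in range(0, n, voc_block):
        end = min(begin + voc_block, n)
        lo, hi = max(begin - voc_pad, 0), min(end + voc_pad, n)
        # yield (padded chunk, start/end of the valid region inside it)
        yield mel[lo:hi], begin - lo, begin - lo + (end - begin)

mel = np.random.randn(100, 80)  # stand-in for 100 frames of 80-dim mel
for chunk, s, e in stream_chunks(mel):
    # in a real pipeline: wav_chunk = voc(chunk)[s * upsample : e * upsample]
    print(chunk.shape, s, e)
```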
```yaml
# voc_pad and voc_block: streaming inference parameters of the voc model
# when the voc model is mb_melgan_csmsc, voc_pad=14 makes streaming synthetic audio the same as non-streaming; the minimum pad is 7, with which streaming audio still sounds normal
# when the voc model is hifigan_csmsc, voc_pad=19 makes streaming synthetic audio the same as non-streaming; with voc_pad=14, streaming audio sounds normal
voc_block: 36
voc_pad: 14
```
```yaml
tts_online-onnx:
    am_pad: 12
    # voc_pad and voc_block: streaming inference parameters of the voc model
    # when the voc model is mb_melgan_csmsc_onnx, voc_pad=14 makes streaming synthetic audio the same as non-streaming; the minimum pad is 7, with which streaming audio still sounds normal
    # when the voc model is hifigan_csmsc_onnx, voc_pad=19 makes streaming synthetic audio the same as non-streaming; with voc_pad=14, streaming audio sounds normal
    voc_block: 36
    voc_pad: 14
    # voc_upsample should be the same as n_shift in the voc config.
```
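As a small illustration, these values can be read back from the config file named above (assuming PyYAML is installed; the `tts_online-onnx` key is taken from the excerpt above):

```python
# Sketch: load the streaming TTS server config and print the vocoder
# chunking parameters. Adjust the key if your engine section differs.
import yaml

with open("conf/tts_online_application.yaml") as f:
    config = yaml.safe_load(f)

engine = config["tts_online-onnx"]
print(engine["voc_block"], engine["voc_pad"])
```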
Following this tutorial, you can customize your dataset for the audio classification task by using `paddlespeech`.
The base class for classification datasets is `paddlespeech.audio.dataset.AudioClassificationDataset`. To customize your dataset, you should write a dataset class derived from `AudioClassificationDataset`.
Assuming you have some wave files stored in your own directory, you should prepare a meta file with the filepath and label information, one `filepath label` pair per line. For example, suppose its absolute path is `/PATH/TO/META_FILE.txt`:
```
/PATH/TO/WAVE_FILE/1.wav label1
/PATH/TO/WAVE_FILE/2.wav label2
```
Here is an example to build your custom dataset in `custom_dataset.py`:
```python
from paddlespeech.audio.datasets.dataset import AudioClassificationDataset

class CustomDataset(AudioClassificationDataset):
    meta_file = '/PATH/TO/META_FILE.txt'

    def __init__(self, **kwargs):
        # Parse `meta_file`: one "filepath label" pair per line,
        # mapping each textual label to an integer index.
        pairs = [line.strip().split() for line in open(self.meta_file)]
        label_set = sorted({label for _, label in pairs})
        files = [filepath for filepath, _ in pairs]
        labels = [label_set.index(label) for _, label in pairs]
        super().__init__(files=files, labels=labels, **kwargs)
```
The parsing above is a minimal sketch; adapt it to your meta file format and check the arguments that `AudioClassificationDataset` expects in your installed version.
Then you can build dataset and data loader from `CustomDataset`:
```python
import paddle
from paddlespeech.audio.features import LogMelSpectrogram

train_ds = CustomDataset()
# batch size is illustrative; choose one that fits your hardware
train_loader = paddle.io.DataLoader(train_ds, batch_size=64, shuffle=True)
```
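A hedged follow-up sketch of batch feature extraction with the imported `LogMelSpectrogram`; the sample rate and mel settings below are assumptions to match to your own audio:

```python
# Sketch: compute log-mel features for one batch. sr=16000 and n_mels=64
# are assumed values, not requirements of the API.
feature_extractor = LogMelSpectrogram(sr=16000, n_mels=64)
for waveforms, labels in train_loader:
    feats = feature_extractor(waveforms)  # (batch, n_mels, num_frames)
    break
```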
| Easy | (1) Use command-line functions of PaddleSpeech. <br> (2) Experience PaddleSpeech on AI Studio. | Linux, Mac (M1 chip not supported), Windows (for more information about installation, see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195)) |
| Medium | Support major functions, such as using the ready-made examples and using PaddleSpeech to train your own model. | Linux |
| Hard | Support the full function of PaddleSpeech, including using the joint CTC decoder with Kaldi, training an n-gram language model, Montreal-Forced-Aligner, and so on. And you are more likely to become a developer! | Ubuntu |
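For the Easy route, a minimal sketch of the command-line functions driven from Python follows; the wav path is a placeholder, and the pretrained models are downloaded on first run:

```python
# Sketch: PaddleSpeech CLI executors for ASR and TTS.
from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.tts.infer import TTSExecutor

asr = ASRExecutor()
print(asr(audio_file="input_16k.wav"))  # placeholder path to a 16 kHz wav

tts = TTSExecutor()
tts(text="你好，欢迎使用 PaddleSpeech。", output="output.wav")
```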
Ernie Linear | IWSLT2012_zh | [iwslt2012_punc0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/iwslt2012/punc0) | [ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip)
## Speech Recognition Model from Paddle 1.8
| Acoustic Model |Training Data| Token-based | Size | Descriptions | CER | WER | Hours of speech |
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download the results from [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which uses MFA1.x for now) in our repo.
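If useful, here is a small sketch of fetching and unpacking the alignment archive from Python; the URL comes from this document, and the local filenames are illustrative:

```python
# Sketch: download and unpack the MFA alignment results.
import tarfile
import urllib.request

URL = ("https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/"
       "with_tone/aishell3_alignment_tone.tar.gz")
archive, _ = urllib.request.urlretrieve(URL, "aishell3_alignment_tone.tar.gz")
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(".")
```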
```
--voc_stat VOC_STAT   mean and standard deviation used to normalize
                      spectrogram when training voc.
...
--output_dir OUTPUT_DIR
                      output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}
2. `--am_config`, `--am_ckpt`, `--am_stat`, `--phones_dict`, and `--speaker_dict` are arguments for the acoustic model, which correspond to the 5 files in the fastspeech2 pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}
4. `--voc_config`, `--voc_ckpt`, and `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here.
You can download the results from [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which uses MFA1.x for now) in our repo.
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download the results from [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which uses MFA1.x for now) in our repo.
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems.
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download the results from [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which uses MFA1.x for now) in our repo.
```
Train a ParallelWaveGAN model.

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       config file to overwrite default config.
```
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems.
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download the results from [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which uses MFA1.x for now) in our repo.
## Dataset
### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here.
```
--voc_stat VOC_STAT   mean and standard deviation used to normalize
                      spectrogram when training voc.
...
--output_dir OUTPUT_DIR
                      output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}
2. `--am_config`, `--am_ckpt`, `--am_stat`, and `--phones_dict` are arguments for the acoustic model, which correspond to the 4 files in the Tacotron2 pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}
4. `--voc_config`, `--voc_ckpt`, and `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.
## Dataset
### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for SpeedySpeech.
```
--voc_stat VOC_STAT   mean and standard deviation used to normalize
                      spectrogram when training voc.
...
--output_dir OUTPUT_DIR
                      output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}
2. `--am_config`, `--am_ckpt`, `--am_stat`, `--phones_dict`, and `--tones_dict` are arguments for the acoustic model, which correspond to the 5 files in the speedyspeech pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}
4. `--voc_config`, `--voc_ckpt`, and `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.
## Dataset
### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2.
```
--voc_stat VOC_STAT   mean and standard deviation used to normalize
                      spectrogram when training voc.
...
--text TEXT           text to synthesize, a 'utt_id sentence' pair per line.
--output_dir OUTPUT_DIR
                      output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}
2. `--am_config`, `--am_ckpt`, `--am_stat`, and `--phones_dict` are arguments for the acoustic model, which correspond to the 4 files in the fastspeech2 pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}
4. `--voc_config`, `--voc_ckpt`, and `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.