**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
@ -170,23 +166,12 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🤗 2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
### 🔥 Hot Activities
<!---
2021.12.14: We would like to offer online courses to introduce the basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 2021.12.21~12.24
  4-day live course: an in-depth interpretation of PaddleSpeech!
  **Course videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Community
- Scan the QR code below with your WeChat to join the official technical exchange group and get the bonus (more than 20 GB of learning materials, such as papers, code and videos) and the live link of the lessons. We look forward to your participation.
- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
In some cases, we need to recognize specific rare words with high accuracy, e.g., address recognition in navigation apps. Customized ASR can solve those issues.
This demo is customized for expense accounts, which need to recognize rare addresses.
The scripts are in https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx/examples/custom_asr
@ -27,7 +27,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- In streaming voc inference, one chunk of data is inferred at a time to achieve a streaming effect, where `voc_block` indicates the number of valid frames in the chunk and `voc_pad` indicates the number of frames padded before and after `voc_block` in a chunk. `voc_pad` exists to eliminate errors caused by streaming inference and to avoid its influence on the quality of the synthesized audio (see the sketch after this list).
- Both hifigan and mb_melgan support streaming voc inference.
- When the voc model is mb_melgan, setting voc_pad=14 makes the streaming synthetic audio consistent with the non-streaming synthetic audio; the minimum voc_pad can be set to 7, and the synthetic audio still sounds normal. If voc_pad is less than 7, the synthetic audio sounds abnormal.
- When the voc model is hifigan, setting voc_pad=19 makes the streaming synthetic audio consistent with the non-streaming synthetic audio; with voc_pad=14, the synthetic audio still sounds normal.
- **Note:** If the service starts normally in the container but the client cannot reach it at the access IP, try replacing the `host` address in the configuration file with the local IP address.
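The following is a minimal sketch of this chunk-and-pad scheme, assuming a generic `vocoder` callable that maps mel frames to a waveform; the helper name `streaming_voc_infer`, the default values, and the bookkeeping details are illustrative, not the actual PaddleSpeech server code:

```python
import numpy as np

def streaming_voc_infer(vocoder, mel, voc_block=36, voc_pad=14, upsample=300):
    """mel: (n_frames, n_mels) array; returns the concatenated waveform."""
    n_frames = mel.shape[0]
    wav_chunks = []
    for start in range(0, n_frames, voc_block):
        end = min(start + voc_block, n_frames)
        # Extend the chunk by voc_pad frames on both sides (clipped at the
        # edges) so the model sees enough context to avoid boundary artifacts.
        pad_start = max(start - voc_pad, 0)
        pad_end = min(end + voc_pad, n_frames)
        wav = vocoder(mel[pad_start:pad_end])   # waveform for the padded chunk
        # Drop the samples generated from the padding frames; each frame maps
        # to `upsample` samples (upsample corresponds to n_shift of the voc config).
        left = (start - pad_start) * upsample
        right = (pad_end - end) * upsample
        wav_chunks.append(wav[left: len(wav) - right if right else None])
    return np.concatenate(wav_chunks)
```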
# voc_pad and voc_block of the voc model for streaming voc inference,
# when voc model is mb_melgan_csmsc, voc_pad is set to 14, streaming synthetic audio is the same as non-streaming synthetic audio; the minimum value of pad can be set to 7, streaming synthetic audio sounds normal
# when voc model is hifigan_csmsc, voc_pad is set to 19, streaming synthetic audio is the same as non-streaming synthetic audio; voc_pad is set to 14, streaming synthetic audio sounds normal
voc_block: 36
voc_pad: 14
@ -95,7 +95,7 @@ tts_online-onnx:
am_pad: 12
# voc_pad and voc_block of the voc model for streaming voc inference,
# when voc model is mb_melgan_csmsc_onnx, voc_pad is set to 14, streaming synthetic audio is the same as non-streaming synthetic audio; the minimum value of pad can be set to 7, streaming synthetic audio sounds normal
# when voc model is hifigan_csmsc_onnx, voc_pad is set to 19, streaming synthetic audio is the same as non-streaming synthetic audio; voc_pad is set to 14, streaming synthetic audio sounds normal
voc_block: 36
voc_pad: 14
# voc_upsample should be the same as n_shift in the voc config.
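As a rough worked example of how these values interact (a sketch assuming the mb_melgan settings above and a `voc_upsample` equal to an n_shift of 300 samples at 24 kHz, which may not match your config):

```python
voc_block, voc_pad, voc_upsample = 36, 14, 300  # voc_upsample assumed equal to n_shift

frames_per_forward = voc_block + 2 * voc_pad    # 64 mel frames fed to the vocoder per interior chunk
samples_per_chunk = voc_block * voc_upsample    # 10800 audio samples actually emitted per chunk

print(frames_per_forward, samples_per_chunk)    # 64 10800 (~0.45 s of audio at 24 kHz)
```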
Following this tutorial, you can customize your dataset for the audio classification task using `paddlespeech`.
A base class of classification datasets is `paddlespeech.audio.dataset.AudioClassificationDataset`. To customize your dataset, you should write a dataset class derived from `AudioClassificationDataset`.
Assuming you have some wave files stored in your own directory, you should prepare a meta file with the information of file paths and labels. For example, its absolute path is `/PATH/TO/META_FILE.txt`:
```
@ -14,7 +14,7 @@ Assuming you have some wave files that stored in your own directory. You should
Here is an example of building your custom dataset in `custom_dataset.py`:
```python
from paddlespeech.audio.datasets.dataset import AudioClassificationDataset

class CustomDataset(AudioClassificationDataset):
    meta_file = '/PATH/TO/META_FILE.txt'
@ -48,7 +48,7 @@ class CustomDataset(AudioClassificationDataset):
Then you can build a dataset and a data loader from `CustomDataset`:
```python
import paddle
from paddlespeech.audio.features import LogMelSpectrogram
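
# Illustrative continuation (assumed names and hyperparameters, not the exact
# tutorial code): instantiate the dataset, wrap it in a DataLoader and set up
# a log-mel feature extractor.
sr = 16000                          # assumed sample rate of your wave files
train_ds = CustomDataset()          # constructor arguments depend on your class above
# If your clips differ in length, pad/crop them in CustomDataset or pass a collate_fn.
train_loader = paddle.io.DataLoader(train_ds, batch_size=16, shuffle=True)

# Placeholder feature settings; tune them for your data.
feature_extractor = LogMelSpectrogram(sr=sr, n_fft=1024, hop_length=320, n_mels=64)
for waveforms, labels in train_loader:
    feats = feature_extractor(waveforms)   # (batch, n_mels, n_frames) log-mel features
    # ... feed `feats` and `labels` to your classifier here
    break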
| Easy | (1) Use command-line functions of PaddleSpeech. <br> (2) Experience PaddleSpeech on AI Studio. | Linux, Mac (M1 chip not supported), Windows (for more information about installation, see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195)) |
| Medium | Support major functions, such as using the ready-made examples and using PaddleSpeech to train your model. | Linux |
| Hard | Support full functions of PaddleSpeech, such as using the joint CTC decoder with kaldi, training an n-gram language model, using Montreal-Forced-Aligner, and so on. Recommended if you want to become a developer! | Ubuntu |
Ernie Linear | IWSLT2012_zh |[iwslt2012_punc0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/iwslt2012/punc0)|[ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip)
## Speech Recognition Models from Paddle 1.8
| Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech |
@ -6,15 +6,8 @@ AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpu
We use AISHELL-3 to train a multi-speaker fastspeech2 model here.
## Dataset
### Download and Extract
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) in our repo.
--voc_stat VOC_STAT mean and standard deviation used to normalize
                      spectrogram when training voc.
@ -215,9 +206,9 @@ optional arguments:
output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}.
2. `--am_config`, `--am_ckpt`, `--am_stat`, `--phones_dict` and `--speaker_dict` are arguments for the acoustic model, which correspond to the 5 files in the fastspeech2 pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}.
4. `--voc_config`, `--voc_ckpt`, `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.
@ -6,15 +6,8 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171
## Dataset
### Download and Extract
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here.
You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) in our repo.
@ -6,15 +6,8 @@ This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2
## Dataset
### Download and Extract
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) in our repo.
@ -4,15 +4,8 @@ This example contains code used to train a [parallel wavegan](http://arxiv.org/a
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems.
## Dataset
### Download and Extract
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) in our repo.
@ -75,7 +68,7 @@ Train a ParallelWaveGAN model.
optional arguments:
-h, --help show this help message and exit
--config CONFIG config file to overwrite default config.
@ -4,15 +4,7 @@ This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems.
## Dataset
### Download and Extract
Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) in our repo.
@ -67,15 +59,13 @@ Here's the complete help message.
@ -3,7 +3,7 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171
## Dataset
### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here.
--voc_stat VOC_STAT mean and standard deviation used to normalize
                      spectrogram when training voc.
@ -198,9 +196,9 @@ optional arguments:
output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}.
2. `--am_config`, `--am_ckpt`, `--am_stat` and `--phones_dict` are arguments for the acoustic model, which correspond to the 4 files in the Tacotron2 pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}.
4. `--voc_config`, `--voc_ckpt`, `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.
@ -3,7 +3,7 @@ This example contains code used to train a [SpeedySpeech](http://arxiv.org/abs/2
## Dataset
### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for SpeedySpeech.
--voc_stat VOC_STAT mean and standard deviation used to normalize
                      spectrogram when training voc.
@ -204,9 +202,9 @@ optional arguments:
output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}.
2. `--am_config`, `--am_ckpt`, `--am_stat`, `--phones_dict` and `--tones_dict` are arguments for the acoustic model, which correspond to the 5 files in the speedyspeech pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}.
4. `--voc_config`, `--voc_ckpt`, `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.
@ -4,7 +4,7 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2
## Dataset
### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2.
--voc_stat VOC_STAT mean and standard deviation used to normalize
                      spectrogram when training voc.
@ -204,11 +202,12 @@ optional arguments:
--text TEXT text to synthesize, a 'utt_id sentence' pair per line.
--output_dir OUTPUT_DIR
output dir.
```
1. `--am` is the acoustic model type with the format {model_name}_{dataset}.
2. `--am_config`, `--am_ckpt`, `--am_stat` and `--phones_dict` are arguments for the acoustic model, which correspond to the 4 files in the fastspeech2 pretrained model.
3. `--voc` is the vocoder type with the format {model_name}_{dataset}.
4. `--voc_config`, `--voc_ckpt`, `--voc_stat` are arguments for the vocoder, which correspond to the 3 files in the parallel wavegan pretrained model.
5. `--lang` is the model language, which can be `zh` or `en`.
6. `--test_metadata` should be the metadata file in the normalized subfolder of `test` in the `dump` folder.
7. `--text` is the text file, which contains sentences to synthesize.