Merge branch 'develop' into u2pp_export

3 years ago · bdf876ea7b
parent afda7ed7d1 a657cc3e1b
commit bdf876ea7b
130 changed files with 4391 additions and 1546 deletions
--- a/README.md
+++ b/README.md
@@ -19,8 +19,6 @@
 <div align="center">  
 <h4>
    <a href="#quick-start"> Quick Start </a>
-  | <a href="#quick-start-server"> Quick Start Server </a>
-  | <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a>
  | <a href="#documents"> Documents </a>
  | <a href="#model-list"> Models List </a>
  | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
@@ -159,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
  - 🧩  *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and ERNIE-SAT in [PaddleSpeech Web Demo](./demos/speech_web).
+- ⚡ 2022.09.09: Add AISHELL-3 Voice Cloning [example](./examples/aishell3/vc2) with ECAPA-TDNN speaker encoder.
 - ⚡ 2022.08.25: Release TTS [finetune](./examples/other/tts_finetune/tts3) example.
 - 🔥 2022.08.22: Add ERNIE-SAT models: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat).
 - 🔥 2022.08.15: Add [g2pW](https://github.com/GitYCC/g2pW) into TTS Chinese Text Frontend.
@@ -705,7 +705,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
  <tbody>
  <tr>
      <td>Speaker Verification</td>
-      <td>VoxCeleb12</td>
+      <td>VoxCeleb1/2</td>
      <td>ECAPA-TDNN</td>
      <td>
      <a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
@@ -714,6 +714,31 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
  </tbody>
 </table>

+<a name="SpeakerDiarization"></a>
+
+**Speaker Diarization**
+
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th> Task </th>
+      <th> Dataset </th>
+      <th> Model Type </th>
+      <th> Example </th>
+    </tr>
+  </thead>
+  <tbody>
+  <tr>
+      <td>Speaker Diarization</td>
+      <td>AMI</td>
+      <td>ECAPA-TDNN + AHC / SC</td>
+      <td>
+      <a href = "./examples/ami/sd0">ecapa-tdnn-ami</a>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
 <a name="PunctuationRestoration"></a>

 **Punctuation Restoration**
@@ -767,6 +792,7 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
  - [Text-to-Speech](#TextToSpeech)
  - [Audio Classification](#AudioClassification)
  - [Speaker Verification](#SpeakerVerification)
+  - [Speaker Diarization](#SpeakerDiarization)
  - [Punctuation Restoration](#PunctuationRestoration)
 - [Community](#Community)
 - [Welcome to contribute](#contribution)
--- a/README_cn.md
+++ b/README_cn.md
@@ -19,10 +19,8 @@
 </p>
 <div align="center">  
 <h4>
-  <a href="#安装"> 安装 </a>
+    <a href="#安装"> 安装 </a>
  | <a href="#快速开始"> 快速开始 </a>
-  | <a href="#快速使用服务"> 快速使用服务 </a>
-  | <a href="#快速使用流式服务"> 快速使用流式服务 </a>
  | <a href="#教程文档"> 教程文档 </a>
  | <a href="#模型列表"> 模型列表 </a>
  | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio 课程 </a>
@@ -181,6 +179,8 @@
 </div>

 ### 近期更新
+- 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 ERNIE-SAT 到 [PaddleSpeech 网页应用](./demos/speech_web)。
+- ⚡ 2022.09.09: 新增基于 ECAPA-TDNN 声纹模型的 AISHELL-3 Voice Cloning [示例](./examples/aishell3/vc2)。
 - ⚡ 2022.08.25: 发布 TTS [finetune](./examples/other/tts_finetune/tts3) 示例。
 - 🔥 2022.08.22: 新增 ERNIE-SAT 模型: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat)。
 - 🔥 2022.08.15: 将 [g2pW](https://github.com/GitYCC/g2pW) 引入 TTS 中文文本前端。
@@ -717,8 +717,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  </thead>
  <tbody>
  <tr>
-      <td>Speaker Verification</td>
-      <td>VoxCeleb12</td>
+      <td>声纹识别</td>
+      <td>VoxCeleb1/2</td>
      <td>ECAPA-TDNN</td>
      <td>
      <a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
@@ -727,6 +727,31 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  </tbody>
 </table>

+<a name="说话人日志模型"></a>
+
+**说话人日志**
+
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th> 任务 </th>
+      <th> 数据集 </th>
+      <th> 模型类型 </th>
+      <th> 脚本 </th>
+    </tr>
+  </thead>
+  <tbody>
+  <tr>
+      <td>说话人日志</td>
+      <td>AMI</td>
+      <td>ECAPA-TDNN + AHC / SC</td>
+      <td>
+      <a href = "./examples/ami/sd0">ecapa-tdnn-ami</a>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
 <a name="标点恢复模型"></a>

 **标点恢复**
@@ -786,6 +811,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  - [语音合成](#语音合成模型)
  - [声音分类](#声音分类模型)
  - [声纹识别](#声纹识别模型)
+  - [说话人日志](#说话人日志模型)
  - [标点恢复](#标点恢复模型)
 - [技术交流群](#技术交流群)
 - [欢迎贡献](#欢迎贡献)
--- a/demos/speech_web/.gitignore
+++ b/demos/speech_web/.gitignore
@@ -13,4 +13,7 @@
 *.pdmodel
 */source/*
 */PaddleSpeech/*
+*/tmp*/*
+*/duration.txt
+*/oov_info.txt

--- a/demos/speech_web/README.md
+++ b/demos/speech_web/README.md
@@ -1,55 +1,82 @@
 # Paddle Speech Demo

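-PaddleSpeechDemo is a demo project built around PaddleSpeech's voice interaction features. It is meant to help you get started with PaddleSpeech and build your own applications on top of it.
+## Introduction
+Paddle Speech Demo is a demo project built around PaddleSpeech's voice interaction features. It is meant to help you get started with PaddleSpeech and build your own applications on top of it.

-The voice interaction part uses PaddleSpeech, the dialogue and information extraction parts use PaddleNLP, and the web frontend is built with Vue3
+The voice interaction part uses PaddleSpeech, the dialogue and information extraction parts use PaddleNLP, and the web frontend is built with Vue3.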

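 Main features:

+Features provided by `main.py`:
 + Voice chat: combines PaddleSpeech speech recognition and speech synthesis; the dialogue part is based on the PaddleNLP chit-chat feature
 + Speaker verification: demonstrates the PaddleSpeech speaker verification feature
 + Speech recognition: supports three modes: [real-time recognition], [end-to-end recognition], and [audio file recognition]
 + Speech synthesis: supports two modes: [streaming synthesis] and [end-to-end synthesis]
 + Voice commands: uses PaddleSpeech speech recognition and PaddleNLP information extraction to automate travel expense reimbursement

+Features provided by `vc.py`:
++ One-shot synthesis: a synthesis scheme based on the GE2E and ECAPA-TDNN models that imitates the timbre of the input audio
+  + For the GE2E voice cloning scheme, see [【FastSpeech2 + AISHELL-3 Voice Cloning】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc1)
+  + For the ECAPA-TDNN voice cloning scheme, see [【FastSpeech2 + AISHELL-3 Voice Cloning (ECAPA-TDNN)】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc2)
+
++ Small-data finetuning: a finetuning scheme for small datasets. A built-in example finetunes on 12 utterances of the Databaker (标贝) Chinese female voice. You can also use the one-click reset to record your own voice; recording in a quiet environment gives better results. To finetune on your own dataset, see [【Finetune your own AM based on FastSpeech2 with AISHELL-3】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/tts_finetune/tts3).
+
++ ERNIE-SAT: a visual demo of ERNIE-SAT, a large cross-modal speech-language model. It supports personalized synthesis, cross-lingual speech synthesis (for Chinese audio, enter English text to synthesize), and speech editing (changing words in the middle of the audio's transcript). For more implementation details, see:
+  + [【ERNIE-SAT with AISHELL-3 dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/ernie_sat)
+  + [【ERNIE-SAT with AISHELL3 and VCTK datasets】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat)
+  + [【ERNIE-SAT with VCTK dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/ernie_sat)
+
 Demo screenshot:

- ![Demo](docs/效果展示.png)
+ ![Demo](https://user-images.githubusercontent.com/30135920/192155349-9ef93d20-730b-413d-8d50-412fedf11d4b.png)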

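-## Installation

-### Backend environment setup

-```
-# Install the dependencies
-cd speech_server
-pip install -r requirements.txt
+## Basic environment setup

-# Download the IE model. It is finetuned for place names and gives better results; without it another version is used, which performs worse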
-cd source
-mkdir model
-cd model
-wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
+### Backend environment setup
+```bash
+# PaddleSpeech must be installed first
+cd speech_server
+pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
+cd ../
 ```

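 ### Frontend environment setup
-
 The frontend depends on `node.js`, which must be installed beforehand; make sure `npm` is available. The demo was tested with `npm` `8.3.1`. We recommend the stable `node.js` release from the [official site](https://nodejs.org/en/).

-```
+If dependencies cannot be downloaded because of network problems, see the FAQ entry on configuring a mirror registry for npm / yarn.
+
+```bash
 # Enter the frontend directory
 cd web_client
-
 # Install `yarn` (skip if it is already installed)
 npm install -g yarn
-
 # Install the frontend dependencies with yarn
 yarn install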
+cd ../
 ```

+
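 ## Starting the services
+[Note] `main.py` and `vc.py` cannot currently run at the same time; start only one of the two backend services.
+
+### Starting the `main.py` backend service
+
+#### Download the models
+
+Only the model for voice commands has to be downloaded manually; the other models are downloaded automatically.

-### Start the backend service
+```bash
+cd speech_server
+mkdir -p source/model
+cd source/model
+# Download the IE model
+wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
+cd ../../../
+
+```
+#### Start the backend service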

 ```
 cd speech_server
@@ -57,14 +84,116 @@ cd speech_server
 python main.py --port 8010
 ```

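-### Start the frontend service
+
+### Starting the `vc.py` backend service
+
+Follow the steps below to set up the environment this service needs.
+
+Try the few-shot synthesis backend online on AI Studio: [【PaddleSpeech进阶】PaddleSpeech小样本合成方案体验](https://aistudio.baidu.com/aistudio/projectdetail/4573549?sUid=2470186&shared=1&ts=1664174385948)
+
+#### Download the models and audio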
+
+```bash
+cd speech_server
+
+# Skip if the directory already exists
+mkdir -p source/model
+cd source
+# Download and unzip the wav files (includes the VC test audio)
+wget https://paddlespeech.bj.bcebos.com/demos/speech_web/wav_vc.zip
+unzip wav_vc.zip
+
+cd model
+# Download the GE2E models
+wget https://bj.bcebos.com/paddlespeech/Parakeet/released_models/ge2e/ge2e_ckpt_0.3.zip
+unzip ge2e_ckpt_0.3.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip
+unzip pwg_aishell3_ckpt_0.5.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
+unzip fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
+
+# Download the ECAPA-TDNN models
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
+unzip fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
+
+# Download the ERNIE-SAT models
+# aishell3 ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_ckpt_1.2.0.zip
+unzip erniesat_aishell3_ckpt_1.2.0.zip
+
+# vctk ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_vctk_ckpt_1.2.0.zip
+unzip erniesat_vctk_ckpt_1.2.0.zip
+
+# aishell3_vctk ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_vctk_ckpt_1.2.0.zip
+unzip erniesat_aishell3_vctk_ckpt_1.2.0.zip
+
+# Download the finetune models
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip
+unzip fastspeech2_aishell3_ckpt_1.1.0.zip
+
+# Download the vocoders
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
+unzip hifigan_aishell3_ckpt_0.2.0.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip
+unzip hifigan_vctk_ckpt_0.2.0.zip
+
+cd ../../../
+```
+
+#### ERNIE-SAT environment setup
+
+The ERNIE-SAT demo depends on the environment of [examples/aishell3_vctk/ernie_sat](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat). Follow the `README.md` under `examples/aishell3_vctk/ernie_sat` and make sure the example code in its `run.sh` runs correctly.
+
+Once `examples/aishell3_vctk/ernie_sat` runs, return to this directory and set up the environment:
+```bash
+cd speech_server
+ln -snf ../../../examples/aishell3_vctk/ernie_sat/download .
+ln -snf ../../../examples/aishell3_vctk/ernie_sat/tools .
+cd ../
+```
+
+#### Finetune environment setup
+
+Finetuning requires unzipping `aishell3_model.zip` in `tools/aligner`, because the finetune process uses the `tools/aligner/aishell3_model/meta.yaml` file.
+
+```bash
+cd speech_server/tools/aligner
+unzip aishell3_model.zip
+cd -
+```
+
+#### Start the backend service
+
+```
+cd speech_server
+# The default port is 8010
+python vc.py --port 8010
+```
+
+### Start the frontend service

 ```
 cd web_client
 yarn dev --port 8011
 ```

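-With the default configuration, the frontend points at a backend on localhost, so the backend server and the browser that opens the page must be on the same machine. If they are not, see the FAQ entry below on changing the backend address when it is deployed on another machine or port.
+With the default configuration, the frontend points at a backend on `localhost`, so the backend server and the browser that opens the page must be on the same machine. If they are not, see the FAQ entry below on changing the backend address when it is deployed on another machine or port.
+
+#### Notes on the frontend
+
+To make later maintenance easier, no pre-built HTML bundle is provided; the frontend is shipped as a Vue3 project. Starting it with `yarn dev --port 8011` runs a frontend development server, which also makes debugging easier.
+
+For example, if the frontend service is started on the local machine (with `yarn dev --port 8011`), the page is available in the browser at `http://localhost:8011`.
+
+If the frontend service is started on another server (for example `*.*.*.*`) with `yarn dev --port 8011`, the page is available in the browser at `http://*.*.*.*:8011`.
+
+How do the frontend and the backend relate? They are independent: as long as the frontend can reach the backend API through the proxy, everything works. You can deploy the backend on machine A and the frontend on machine B. `./web_client/vite.config.js` maps `/api` to `http://localhost:8010`; you can point it at whatever backend address you need.
+
+When the page is accessed through an IP address such as `*.*.*.*`, browser security restrictions block recording, and the browser's security policy has to be reconfigured. See the FAQ entry on being unable to record when the frontend is accessed by IP address.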
+
+
 ## FAQ 

 #### Q: How do I install node.js?
@@ -75,7 +204,7 @@ A： node.js的安装可以参考[【菜鸟教程】](https://www.runoob.com/nod

 A: The backend address is configured in two separate files.

-Edit the first file, `PaddleSpeechWebClient/vite.config.js`
+Edit the first file, `./web_client/vite.config.js`

 ```
 server: {
@@ -90,7 +219,7 @@ server: {
  }
 ```

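-Edit the second file, `PaddleSpeechWebClient/src/api/API.js` (proxying WebSocket connections did not work, so the address has to be changed in this file)
+Edit the second file, `./web_client/src/api/API.js` (proxying WebSocket connections did not work, so the address has to be changed in this file)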

 ```
 // websocket (change these to the address of the backend)
@@ -99,12 +228,24 @@ ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream',  // Stream ASR 接
 TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS endpoint
 ```

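-#### Q: The backend is accessed by IP address and the frontend cannot record
+#### Q: The frontend is accessed by IP address and recording does not work

 A: This is caused by the browser's security policy. Change the browser configuration and restart the browser. For how to change the setting, see [使用js-audio-recorder报浏览器不支持getUserMedia](https://blog.csdn.net/YRY_LIKE_YOU/article/details/113745273)

 Chrome setting: chrome://flags/#unsafely-treat-insecure-origin-as-secure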

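+#### Q: How do I configure the Taobao mirror registry for npm / yarn?
+
+A: Configure the Taobao mirror registry. For details, see [【yarn npm 设置淘宝镜像】](https://www.jianshu.com/p/f6f43e8f9d6b)
+
+```bash
+# Set the Taobao mirror registry for npm
+npm config set registry https://registry.npmmirror.com
+
+# Set the Taobao mirror registry for yarn
+yarn config set registry https://registry.npmmirror.com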
+```
+
 ## References

 Reference for recording audio with Vue: https://blog.csdn.net/qq_41619796/article/details/107865602#t1
--- a/demos/speech_web/docs/效果展示.png
+++ b/demos/speech_web/docs/效果展示.png
--- a/demos/speech_web/speech_server/conf/tts3_finetune.yaml
+++ b/demos/speech_web/speech_server/conf/tts3_finetune.yaml
@@ -3,10 +3,10 @@
 ###########################################################
 # Set to -1 to indicate that the parameter is the same as the pretrained model configuration

-batch_size: -1
+batch_size: 10
 learning_rate: 0.0001     # learning rate
 num_snapshots: -1

 # frozen_layers should be a list
 # if you don't need to freeze, set frozen_layers to []
-frozen_layers: ["encoder", "duration_predictor"]
+frozen_layers: ["encoder"]
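
Review note: the comment in this config says that `-1` means "same as the pretrained model configuration". The sketch below shows one way such a sentinel could be resolved, assuming the finetune script merges the two configs key by key. `resolve_finetune_config` and the pretrained values are made up for illustration; they are not taken from `local/finetune.py`.

```python
from typing import Any, Dict

INHERIT = -1  # sentinel used in tts3_finetune.yaml


def resolve_finetune_config(pretrained: Dict[str, Any],
                            finetune: Dict[str, Any]) -> Dict[str, Any]:
    """Apply finetune overrides on top of the pretrained config.

    Hypothetical helper: an integer value of -1 keeps the pretrained value.
    """
    resolved = dict(pretrained)
    for key, value in finetune.items():
        is_sentinel = (isinstance(value, int) and
                       not isinstance(value, bool) and value == INHERIT)
        if is_sentinel:
            continue  # inherit the pretrained setting
        resolved[key] = value
    return resolved


# Pretrained values below are invented for the example.
pretrained_cfg = {"batch_size": 64, "learning_rate": 0.001, "num_snapshots": 5}
# Finetune values mirror the new side of this diff.
finetune_cfg = {
    "batch_size": 10,
    "learning_rate": 0.0001,
    "num_snapshots": -1,
    "frozen_layers": ["encoder"],
}
resolved_cfg = resolve_finetune_config(pretrained_cfg, finetune_cfg)
print(resolved_cfg)
```

Under this reading, changing `batch_size` from `-1` to `10` stops inheriting the pretrained batch size, which matters when finetuning on only a handful of utterances.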
--- a/demos/speech_web/speech_server/main.py
+++ b/demos/speech_web/speech_server/main.py
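@@ -1,8 +1,3 @@
-# todo:
-# 1. Start the service
-# 2. Receive recorded audio and return the recognition result
-# 3. Receive the ASR result and return the NLP dialogue result
-# 4. Receive the NLP dialogue result and return the TTS audio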
 import argparse
 import base64
 import datetime
@@ -32,6 +27,7 @@ from starlette.requests import Request
 from starlette.responses import FileResponse
 from starlette.websockets import WebSocketState as WebSocketState

+from paddlespeech.cli.tts.infer import TTSExecutor
 from paddlespeech.server.engine.asr.online.python.asr_engine import PaddleASRConnectionHanddler
 from paddlespeech.server.utils.audio_process import float2pcm

@@ -55,7 +51,7 @@ asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
 asr_init_path = "source/demo/demo.wav"
 db_path = "source/db/vpr.sqlite"
 ie_model_path = "source/model"
-
+tts_model = TTSExecutor()
 # Path configuration
 UPLOAD_PATH = "source/vpr"
 WAV_PATH = "source/wav"
@@ -72,6 +68,14 @@ manager = ConnectionManager()
 aumanager = AudioMannger(chatbot)
 aumanager.init()
 vpr = VPR(db_path, dim=192, top_k=5)
+# Warm-up call so that the models are downloaded at startup
+tts_model(
+    text="今天天气准不错",
+    output="test.wav",
+    am='fastspeech2_mix',
+    spk_id=174,
+    voc='hifigan_csmsc',
+    lang='mix', )


 # Service configuration
@@ -331,6 +335,7 @@ async def ieOffline(nlp_base: NlpBase):
 #####################################################################


+# End-to-end synthesis
 @app.post("/tts/offline")
 async def text2speechOffline(tts_base: TtsBase):
    text = tts_base.text
@@ -340,8 +345,14 @@ async def text2speechOffline(tts_base: TtsBase):
        now_name = "tts_" + datetime.datetime.strftime(
            datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
        out_file_path = os.path.join(WAV_PATH, now_name)
-        # Save to a file, then convert it to base64 for transfer
-        chatbot.text2speech(text, outpath=out_file_path)
+        # Use the mixed Chinese-English CLI
+        tts_model(
+            text=text,
+            output=out_file_path,
+            am='fastspeech2_mix',
+            spk_id=174,
+            voc='hifigan_csmsc',
+            lang='mix')
        with open(out_file_path, "rb") as f:
            data_bin = f.read()
        base_str = base64.b64encode(data_bin)
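
Review note: the handler above writes the synthesized wav to disk, reads it back, and base64-encodes the bytes. The sketch below illustrates that round trip with the standard library only, including how a client could decode the payload. The helper names and the silent test wav are hypothetical, and the exact JSON shape of the `/tts/offline` response is not shown in this diff, so none is assumed.

```python
import base64
import os
import tempfile
import wave


def wav_file_to_base64(path: str) -> str:
    """Read a wav file and return its bytes as an ASCII base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def base64_to_wav_file(payload: str, path: str) -> None:
    """Decode a base64 string and write the bytes back to a wav file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(payload))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "tts_demo.wav")
    dst = os.path.join(tmp, "received.wav")

    # Stand-in for the synthesized audio: 0.1 s of 16 kHz mono silence.
    with wave.open(src, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 1600)

    payload = wav_file_to_base64(src)   # server side
    base64_to_wav_file(payload, dst)    # client side

    with wave.open(dst, "rb") as r:
        n_frames = r.getnframes()
        rate = r.getframerate()
```

Note that `base64.b64encode` returns `bytes`; decoding to `str` as done here is one common way to make the value JSON-serializable.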
--- a/demos/speech_web/speech_server/requirements.txt
+++ b/demos/speech_web/speech_server/requirements.txt
@@ -1,13 +1,8 @@
 aiofiles
 faiss-cpu
-fastapi
-librosa
-numpy
-paddlenlp
-paddlepaddle
-paddlespeech
+praatio==5.0.0
 pydantic
-python-multipartscikit_learn
-SoundFile
+python-multipart
+scikit_learn
 starlette
 uvicorn
--- a/demos/speech_web/speech_server/src/ernie_sat.py
+++ b/demos/speech_web/speech_server/src/ernie_sat.py
@@ -0,0 +1,198 @@
+import os
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class SAT:
+    def __init__(self):
+        # pretrain model path
+        self.zh_pretrain_model_path = os.path.realpath(
+            "source/model/erniesat_aishell3_ckpt_1.2.0")
+        self.en_pretrain_model_path = os.path.realpath(
+            "source/model/erniesat_vctk_ckpt_1.2.0")
+        self.cross_pretrain_model_path = os.path.realpath(
+            "source/model/erniesat_aishell3_vctk_ckpt_1.2.0")
+
+        self.zh_voc_model_path = os.path.realpath(
+            "source/model/hifigan_aishell3_ckpt_0.2.0")
+        self.eb_voc_model_path = os.path.realpath(
+            "source/model/hifigan_vctk_ckpt_0.2.0")
+        self.cross_voc_model_path = os.path.realpath(
+            "source/model/hifigan_aishell3_ckpt_0.2.0")
+
+        self.BIN_DIR = os.path.join(MAIN_ROOT,
+                                    "paddlespeech/t2s/exps/ernie_sat")
+
+    def zh_synthesize_edit(self,
+                           old_str: str,
+                           new_str: str,
+                           input_name: os.PathLike,
+                           output_name: os.PathLike,
+                           task_name: str="synthesize",
+                           erniesat_ckpt_name: str="snapshot_iter_289500.pdz"):
+
+        if task_name not in ['synthesize', 'edit']:
+            print("task name only in ['edit', 'synthesize']")
+            return None
+
+        # Inference file configuration
+        config_path = os.path.join(self.zh_pretrain_model_path, "default.yaml")
+        phones_dict = os.path.join(self.zh_pretrain_model_path,
+                                   "phone_id_map.txt")
+        erniesat_ckpt = os.path.join(self.zh_pretrain_model_path,
+                                     erniesat_ckpt_name)
+        erniesat_stat = os.path.join(self.zh_pretrain_model_path,
+                                     "speech_stats.npy")
+
+        voc = "hifigan_aishell3"
+        voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
+        voc_ckpt = os.path.join(self.zh_voc_model_path,
+                                "snapshot_iter_2500000.pdz")
+        voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
+
+        cmd = self.get_cmd(
+            task_name=task_name,
+            input_name=input_name,
+            old_str=old_str,
+            new_str=new_str,
+            config_path=config_path,
+            phones_dict=phones_dict,
+            erniesat_ckpt=erniesat_ckpt,
+            erniesat_stat=erniesat_stat,
+            voc=voc,
+            voc_config=voc_config,
+            voc_ckpt=voc_ckpt,
+            voc_stat=voc_stat,
+            output_name=output_name,
+            source_lang="zh",
+            target_lang="zh")
+
+        return run_cmd(cmd, output_name)
+
+    def crossclone(self,
+                   old_str: str,
+                   new_str: str,
+                   input_name: os.PathLike,
+                   output_name: os.PathLike,
+                   source_lang: str,
+                   target_lang: str,
+                   erniesat_ckpt_name: str="snapshot_iter_489000.pdz"):
+        # Inference file configuration
+        config_path = os.path.join(self.cross_pretrain_model_path,
+                                   "default.yaml")
+        phones_dict = os.path.join(self.cross_pretrain_model_path,
+                                   "phone_id_map.txt")
+        erniesat_ckpt = os.path.join(self.cross_pretrain_model_path,
+                                     erniesat_ckpt_name)
+        erniesat_stat = os.path.join(self.cross_pretrain_model_path,
+                                     "speech_stats.npy")
+
+        voc = "hifigan_aishell3"
+        voc_config = os.path.join(self.cross_voc_model_path, "default.yaml")
+        voc_ckpt = os.path.join(self.cross_voc_model_path,
+                                "snapshot_iter_2500000.pdz")
+        voc_stat = os.path.join(self.cross_voc_model_path, "feats_stats.npy")
+        task_name = "synthesize"
+        cmd = self.get_cmd(
+            task_name=task_name,
+            input_name=input_name,
+            old_str=old_str,
+            new_str=new_str,
+            config_path=config_path,
+            phones_dict=phones_dict,
+            erniesat_ckpt=erniesat_ckpt,
+            erniesat_stat=erniesat_stat,
+            voc=voc,
+            voc_config=voc_config,
+            voc_ckpt=voc_ckpt,
+            voc_stat=voc_stat,
+            output_name=output_name,
+            source_lang=source_lang,
+            target_lang=target_lang)
+
+        return run_cmd(cmd, output_name)
+
+    def en_synthesize_edit(self,
+                           old_str: str,
+                           new_str: str,
+                           input_name: os.PathLike,
+                           output_name: os.PathLike,
+                           task_name: str="synthesize",
+                           erniesat_ckpt_name: str="snapshot_iter_199500.pdz"):
+
+        # Inference file configuration
+        config_path = os.path.join(self.en_pretrain_model_path, "default.yaml")
+        phones_dict = os.path.join(self.en_pretrain_model_path,
+                                   "phone_id_map.txt")
+        erniesat_ckpt = os.path.join(self.en_pretrain_model_path,
+                                     erniesat_ckpt_name)
+        erniesat_stat = os.path.join(self.en_pretrain_model_path,
+                                     "speech_stats.npy")
+
+        voc = "hifigan_aishell3"
+        voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
+        voc_ckpt = os.path.join(self.zh_voc_model_path,
+                                "snapshot_iter_2500000.pdz")
+        voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
+
+        cmd = self.get_cmd(
+            task_name=task_name,
+            input_name=input_name,
+            old_str=old_str,
+            new_str=new_str,
+            config_path=config_path,
+            phones_dict=phones_dict,
+            erniesat_ckpt=erniesat_ckpt,
+            erniesat_stat=erniesat_stat,
+            voc=voc,
+            voc_config=voc_config,
+            voc_ckpt=voc_ckpt,
+            voc_stat=voc_stat,
+            output_name=output_name,
+            source_lang="en",
+            target_lang="en")
+
+        return run_cmd(cmd, output_name)
+
+    def get_cmd(self,
+                task_name: str,
+                input_name: str,
+                old_str: str,
+                new_str: str,
+                config_path: str,
+                phones_dict: str,
+                erniesat_ckpt: str,
+                erniesat_stat: str,
+                voc: str,
+                voc_config: str,
+                voc_ckpt: str,
+                voc_stat: str,
+                output_name: str,
+                source_lang: str,
+                target_lang: str):
+        ngpu = get_ngpu()
+        cmd = f"""
+            FLAGS_allocator_strategy=naive_best_fit \
+            FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+            python3 {self.BIN_DIR}/synthesize_e2e.py \
+                --task_name={task_name} \
+                --wav_path={input_name} \
+                --old_str='{old_str}' \
+                --new_str='{new_str}' \
+                --source_lang={source_lang} \
+                --target_lang={target_lang} \
+                --erniesat_config={config_path} \
+                --phones_dict={phones_dict} \
+                --erniesat_ckpt={erniesat_ckpt} \
+                --erniesat_stat={erniesat_stat} \
+                --voc={voc} \
+                --voc_config={voc_config} \
+                --voc_ckpt={voc_ckpt} \
+                --voc_stat={voc_stat} \
+                --output_name={output_name} \
+                --ngpu={ngpu}
+        """
+
+        return cmd
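
Review note: `get_cmd` interpolates `old_str` and `new_str` into single-quoted shell arguments, so text containing a `'` would end the quoting early. This assumes `run_cmd`, which is not shown in this diff, hands the string to a shell. A minimal sketch of quoting each argument with `shlex.quote`; `build_synthesize_cmd`, the sample sentences, and the wav path are hypothetical, while the flag names are the ones used above.

```python
import shlex


def build_synthesize_cmd(script: str, old_str: str, new_str: str,
                         wav_path: str, output_name: str) -> str:
    """Build a shell command string with every argument safely quoted."""
    args = [
        "python3",
        script,
        f"--old_str={old_str}",
        f"--new_str={new_str}",
        f"--wav_path={wav_path}",
        f"--output_name={output_name}",
    ]
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_synthesize_cmd(
    "synthesize_e2e.py",
    old_str="This is the original sentence",
    new_str="It's a test; don't panic",
    wav_path="source/wav/example.wav",
    output_name="out.wav")

# Splitting the command the way a POSIX shell would recovers the exact text.
parsed = shlex.split(cmd)
print(cmd)
```

An alternative with the same effect is to pass the argument list straight to `subprocess.run` without a shell, if the environment variables set in the original command are supplied through `env=` instead.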
--- a/demos/speech_web/speech_server/src/finetune.py
+++ b/demos/speech_web/speech_server/src/finetune.py
@@ -0,0 +1,127 @@
+import os
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+def find_max_ckpt(model_path):
+    max_ckpt = 0
+    for filename in os.listdir(model_path):
+        if filename.endswith('.pdz'):
+            # names look like "snapshot_iter_<N>.pdz"; take the trailing number
+            it = filename[:-4].rsplit("_", 1)[-1]
+            if it.isdigit() and int(it) > max_ckpt:
+                max_ckpt = int(it)
+    return max_ckpt
+
+
+class FineTune:
+    def __init__(self):
+        self.now_file_path = os.path.dirname(__file__)
+        self.PYTHONPATH = os.path.join(MAIN_ROOT,
+                                       "examples/other/tts_finetune/tts3")
+        self.BIN_DIR = os.path.join(MAIN_ROOT,
+                                    "paddlespeech/t2s/exps/fastspeech2")
+        self.pretrained_model_dir = os.path.realpath(
+            "source/model/fastspeech2_aishell3_ckpt_1.1.0")
+        self.voc_model_dir = os.path.realpath(
+            "source/model/hifigan_aishell3_ckpt_0.2.0")
+        self.finetune_config = os.path.join("conf/tts3_finetune.yaml")
+
+    def finetune(self, input_dir, exp_dir='temp', epoch=100):
+        """
+        use cmd follow examples/other/tts_finetune/tts3/run.sh
+        """
+        newdir_name = "newdir"
+        new_dir = os.path.join(input_dir, newdir_name)
+        mfa_dir = os.path.join(exp_dir, 'mfa_result')
+        dump_dir = os.path.join(exp_dir, 'dump')
+        output_dir = os.path.join(exp_dir, 'exp')
+        lang = "zh"
+        ngpu = get_ngpu()
+
+        cmd = f"""
+            # check oov
+            python3 {self.PYTHONPATH}/local/check_oov.py \
+                --input_dir={input_dir} \
+                --pretrained_model_dir={self.pretrained_model_dir} \
+                --newdir_name={newdir_name} \
+                --lang={lang}
+            
+            # get mfa result
+            python3 {self.PYTHONPATH}/local/get_mfa_result.py \
+                --input_dir={new_dir} \
+                --mfa_dir={mfa_dir} \
+                --lang={lang}
+            
+            # generate durations.txt
+            python3 {self.PYTHONPATH}/local/generate_duration.py \
+                --mfa_dir={mfa_dir} 
+            
+            # extract feature
+            python3 {self.PYTHONPATH}/local/extract_feature.py \
+                --duration_file="./durations.txt" \
+                --input_dir={new_dir} \
+                --dump_dir={dump_dir} \
+                --pretrained_model_dir={self.pretrained_model_dir}
+            
+            # create finetune env
+            python3 {self.PYTHONPATH}/local/prepare_env.py \
+                --pretrained_model_dir={self.pretrained_model_dir} \
+                --output_dir={output_dir}
+            
+            # finetune
+            python3 {self.PYTHONPATH}/local/finetune.py \
+                --pretrained_model_dir={self.pretrained_model_dir} \
+                --dump_dir={dump_dir} \
+                --output_dir={output_dir} \
+                --ngpu={ngpu} \
+                --epoch={epoch} \
+                --finetune_config={self.finetune_config}
+        """
+
+        print(cmd)
+
+        return run_cmd(cmd, exp_dir)
+
+    def synthesize(self, text, wav_name, out_wav_dir, exp_dir='temp'):
+
+        voc = "hifigan_aishell3"
+        dump_dir = os.path.join(exp_dir, 'dump')
+        output_dir = os.path.join(exp_dir, 'exp')
+        text_path = os.path.join(exp_dir, 'sentences.txt')
+        lang = "zh"
+        ngpu = get_ngpu()
+
+        model_path = f"{output_dir}/checkpoints"
+        ckpt = find_max_ckpt(model_path)
+
+        # Write the sentence file ("<utt_id> <text>") for synthesis
+        with open(text_path, "w", encoding='utf8') as f:
+            f.write(wav_name + " " + text)
+
+        cmd = f"""
+            FLAGS_allocator_strategy=naive_best_fit \
+            FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+            python3 {self.BIN_DIR}/../synthesize_e2e.py \
+                --am=fastspeech2_aishell3 \
+                --am_config={self.pretrained_model_dir}/default.yaml \
+                --am_ckpt={output_dir}/checkpoints/snapshot_iter_{ckpt}.pdz \
+                --am_stat={self.pretrained_model_dir}/speech_stats.npy \
+                --voc={voc} \
+                --voc_config={self.voc_model_dir}/default.yaml \
+                --voc_ckpt={self.voc_model_dir}/snapshot_iter_2500000.pdz \
+                --voc_stat={self.voc_model_dir}/feats_stats.npy \
+                --lang={lang} \
+                --text={text_path} \
+                --output_dir={out_wav_dir} \
+                --phones_dict={dump_dir}/phone_id_map.txt \
+                --speaker_dict={dump_dir}/speaker_id_map.txt \
+                --spk_id=0 \
+                --ngpu={ngpu}
+        """
+
+        out_path = os.path.join(out_wav_dir, f"{wav_name}.wav")
+
+        return run_cmd(cmd, out_path)
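
Review note: `find_max_ckpt` picks the highest iteration among the `.pdz` snapshots in the checkpoint directory. A stricter variant is sketched below under the assumption that snapshots are always named `snapshot_iter_<N>.pdz`; every other file is ignored. The extra file names in the usage part are invented for the example.

```python
import os
import re
import tempfile

# Assumed naming scheme, taken from the checkpoint paths used in this file.
_CKPT_RE = re.compile(r"^snapshot_iter_(\d+)\.pdz$")


def find_max_ckpt(model_path: str) -> int:
    """Return the largest iteration number among snapshot files, or 0."""
    max_iter = 0
    for filename in os.listdir(model_path):
        match = _CKPT_RE.match(filename)
        if match:
            max_iter = max(max_iter, int(match.group(1)))
    return max_iter


with tempfile.TemporaryDirectory() as ckpt_dir:
    for name in ("snapshot_iter_96400.pdz", "snapshot_iter_96500.pdz",
                 "records.jsonl", "best_model.pdz"):
        open(os.path.join(ckpt_dir, name), "w").close()
    latest = find_max_ckpt(ckpt_dir)

print(latest)
```

Returning `0` when nothing matches keeps the original contract, so the caller may still want to check for that case before building the `snapshot_iter_{ckpt}.pdz` path.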
--- a/demos/speech_web/speech_server/src/ge2e_clone.py
+++ b/demos/speech_web/speech_server/src/ge2e_clone.py
@@ -0,0 +1,60 @@
+import os
+import shutil
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class VoiceCloneGE2E():
+    def __init__(self):
+        # Directory that contains the voice cloning script
+        self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
+        # am
+        self.am = "fastspeech2_aishell3"
+        self.am_config = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/default.yaml"
+        self.am_ckpt = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/snapshot_iter_96400.pdz"
+        self.am_stat = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/speech_stats.npy"
+        self.phones_dict = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/phone_id_map.txt"
+        # voc
+        self.voc = "pwgan_aishell3"
+        self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
+        self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
+        self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
+        # ge2e
+        self.ge2e_params_path = "source/model/ge2e_ckpt_0.3/step-3000000.pdparams"
+
+    def vc(self, text, input_wav, out_wav):
+
+        # The input wav must be placed in its own temporary directory
+        _, full_file_name = os.path.split(input_wav)
+        ref_audio_dir = os.path.realpath("tmp_dir/ge2e")
+        if os.path.exists(ref_audio_dir):
+            shutil.rmtree(ref_audio_dir)
+
+        os.makedirs(ref_audio_dir, exist_ok=True)
+        shutil.copy(input_wav, ref_audio_dir)
+
+        output_dir = os.path.dirname(out_wav)
+        ngpu = get_ngpu()
+
+        cmd = f"""
+            python3 {self.BIN_DIR}/voice_cloning.py \
+                    --am={self.am} \
+                    --am_config={self.am_config} \
+                    --am_ckpt={self.am_ckpt} \
+                    --am_stat={self.am_stat} \
+                    --voc={self.voc} \
+                    --voc_config={self.voc_config} \
+                    --voc_ckpt={self.voc_ckpt} \
+                    --voc_stat={self.voc_stat} \
+                    --ge2e_params_path={self.ge2e_params_path} \
+                    --text="{text}" \
+                    --input-dir={ref_audio_dir} \
+                    --output-dir={output_dir} \
+                    --phones-dict={self.phones_dict} \
+                    --ngpu={ngpu}
+        """
+
+        output_name = os.path.join(output_dir, full_file_name)
+        return run_cmd(cmd, output_name=output_name)
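
Review note: `vc()` stages the reference wav in a fixed directory (`tmp_dir/ge2e`) that is deleted and recreated on every call, so two overlapping requests would share it. A sketch of per-call staging with `tempfile.mkdtemp`, assuming the cloning script only needs a directory that contains the single reference file; `stage_reference_wav` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def stage_reference_wav(input_wav: str, base_dir: str) -> str:
    """Copy the reference wav into a fresh, unique directory and return it."""
    os.makedirs(base_dir, exist_ok=True)
    ref_audio_dir = tempfile.mkdtemp(prefix="ref_", dir=base_dir)
    shutil.copy(input_wav, ref_audio_dir)
    return ref_audio_dir


with tempfile.TemporaryDirectory() as base:
    wav = os.path.join(base, "ref.wav")
    with open(wav, "wb") as f:
        f.write(b"RIFF")  # placeholder bytes, not a real wav

    staging_root = os.path.join(base, "tmp_dir")
    dir_a = stage_reference_wav(wav, staging_root)
    dir_b = stage_reference_wav(wav, staging_root)

    distinct = dir_a != dir_b
    files_a = os.listdir(dir_a)
    files_b = os.listdir(dir_b)
```

The caller would remove the directory with `shutil.rmtree` once the command has finished.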
--- a/demos/speech_web/speech_server/src/tdnn_clone.py
+++ b/demos/speech_web/speech_server/src/tdnn_clone.py
@ -0,0 +1,56 @@
+import os
+import shutil
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class VoiceCloneTDNN():
+    def __init__(self):
+        # Point BIN_DIR at the t2s experiment scripts
+        self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
+
+        self.am = "fastspeech2_aishell3"
+        self.am_config = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/default.yaml"
+        self.am_ckpt = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/snapshot_iter_96400.pdz"
+        self.am_stat = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/speech_stats.npy"
+        self.phones_dict = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/phone_id_map.txt"
+        # voc
+        self.voc = "pwgan_aishell3"
+        self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
+        self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
+        self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
+
+    def vc(self, text, input_wav, out_wav):
+        # The input wav must be placed in its own temporary directory
+        _, full_file_name = os.path.split(input_wav)
+        ref_audio_dir = os.path.realpath("tmp_dir/tdnn")
+        if os.path.exists(ref_audio_dir):
+            shutil.rmtree(ref_audio_dir)
+        os.makedirs(ref_audio_dir, exist_ok=True)
+        shutil.copy(input_wav, ref_audio_dir)
+
+        output_dir = os.path.dirname(out_wav)
+        ngpu = get_ngpu()
+
+        cmd = f"""
+            python3 {self.BIN_DIR}/voice_cloning.py \
+                    --am={self.am} \
+                    --am_config={self.am_config} \
+                    --am_ckpt={self.am_ckpt} \
+                    --am_stat={self.am_stat} \
+                    --voc={self.voc} \
+                    --voc_config={self.voc_config} \
+                    --voc_ckpt={self.voc_ckpt} \
+                    --voc_stat={self.voc_stat} \
+                    --text="{text}" \
+                    --input-dir={ref_audio_dir} \
+                    --output-dir={output_dir} \
+                    --phones-dict={self.phones_dict} \
+                    --use_ecapa=True \
+                    --ngpu={ngpu}
+        """
+
+        output_name = os.path.join(output_dir, full_file_name)
+        return run_cmd(cmd, output_name=output_name)
--- a/demos/speech_web/speech_server/src/util.py
+++ b/demos/speech_web/speech_server/src/util.py
@ -1,4 +1,18 @@
+import os
 import random
+import subprocess
+
+import paddle
+
+NOW_FILE_PATH = os.path.dirname(__file__)
+MAIN_ROOT = os.path.realpath(os.path.join(NOW_FILE_PATH, "../../../../"))
+
+
+def get_ngpu():
+    if paddle.device.get_device() == "cpu":
+        return 0
+    else:
+        return 1


 def randName(n=5):
@ -11,3 +25,20 @@ def SuccessRequest(result=None, message="ok"):

 def ErrorRequest(result=None, message="error"):
    return {"code": -1, "result": result, "message": message}
+
+
+def run_cmd(cmd, output_name):
+    p = subprocess.Popen(cmd, shell=True)
+    res = p.wait()
+    print(cmd)
+    print("Return code:", res)
+    if res == 0:
+        # Command succeeded
+        if os.path.exists(output_name):
+            return output_name
+        else:
+            # The synthesized file does not exist
+            return None
+    else:
+        # Command failed
+        return None
--- a/demos/speech_web/speech_server/vc.py
+++ b/demos/speech_web/speech_server/vc.py
@ -0,0 +1,550 @@
+import argparse
+import base64
+import datetime
+import json
+import os
+from typing import List
+
+import aiofiles
+import librosa
+import soundfile as sf
+import uvicorn
+from fastapi import FastAPI
+from fastapi import UploadFile
+from pydantic import BaseModel
+from src.ernie_sat import SAT
+from src.finetune import FineTune
+from src.ge2e_clone import VoiceCloneGE2E
+from src.tdnn_clone import VoiceCloneTDNN
+from src.util import *
+from starlette.responses import FileResponse
+
+from paddlespeech.server.utils.audio_process import float2pcm
+
+# Parse command-line arguments
+parser = argparse.ArgumentParser(prog='PaddleSpeechDemo', add_help=True)
+
+parser.add_argument(
+    "--port",
+    action="store",
+    type=int,
+    help="port of the app",
+    default=8010,
+    required=False)
+
+args = parser.parse_args()
+port = args.port
+
+# Loading these models here interferes with finetune, so finetune is run through a shell command
+vc_model = VoiceCloneGE2E()
+vc_model_tdnn = VoiceCloneTDNN()
+
+sat_model = SAT()
+ft_model = FineTune()
+
+# Config files
+tts_config = "conf/tts_online_application.yaml"
+asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
+asr_init_path = "source/demo/demo.wav"
+db_path = "source/db/vc.sqlite"
+ie_model_path = "source/model"
+
+# Path configuration
+VC_UPLOAD_PATH = "source/wav/vc/upload"
+VC_OUT_PATH = "source/wav/vc/out"
+
+FT_UPLOAD_PATH = "source/wav/finetune/upload"
+FT_OUT_PATH = "source/wav/finetune/out"
+FT_LABEL_PATH = "source/wav/finetune/label.json"
+FT_LABEL_TXT_PATH = "source/wav/finetune/labels.txt"
+FT_DEFAULT_PATH = "source/wav/finetune/default"
+FT_EXP_BASE_PATH = "tmp_dir/finetune"
+
+SAT_UPLOAD_PATH = "source/wav/SAT/upload"
+SAT_OUT_PATH = "source/wav/SAT/out"
+SAT_LABEL_PATH = "source/wav/SAT/label.json"
+
+# Initialize the SAT labels
+if os.path.exists(SAT_LABEL_PATH):
+    with open(SAT_LABEL_PATH, "r", encoding='utf8') as f:
+        sat_label_dic = json.load(f)
+else:
+    sat_label_dic = {}
+
+# Initialize the finetune labels
+if os.path.exists(FT_LABEL_PATH):
+    with open(FT_LABEL_PATH, "r", encoding='utf8') as f:
+        ft_label_dic = json.load(f)
+else:
+    ft_label_dic = {}
+
+# Create the directories
+base_sources = [
+    VC_UPLOAD_PATH,
+    VC_OUT_PATH,
+    FT_UPLOAD_PATH,
+    FT_OUT_PATH,
+    FT_DEFAULT_PATH,
+    SAT_UPLOAD_PATH,
+    SAT_OUT_PATH,
+]
+for path in base_sources:
+    os.makedirs(path, exist_ok=True)
+#####################################################################
+########################### App initialization ######################
+#####################################################################
+app = FastAPI()
+
+######################################################################
+########################### API types ################################
+#####################################################################
+
+
+# Request schemas
+class VcBase(BaseModel):
+    wavName: str
+    wavPath: str
+
+
+class VcBaseText(BaseModel):
+    wavName: str
+    wavPath: str
+    text: str
+    func: str
+
+
+class VcBaseSAT(BaseModel):
+    old_str: str
+    new_str: str
+    language: str
+    function: str
+    wav: str  # base64-encoded
+    filename: str
+
+
+class FTPath(BaseModel):
+    dataPath: str
+
+
+class VcBaseFT(BaseModel):
+    wav: str  # base64-encoded
+    filename: str
+    wav_path: str
+
+
+class VcBaseFTModel(BaseModel):
+    wav_path: str
+
+
+class VcBaseFTSyn(BaseModel):
+    exp_path: str
+    text: str
+
+
+######################################################################
+########################### File listing and saving services ########
+#####################################################################
+
+
+def getVCList(path):
+    VC_FileDict = []
+    # Collect the wav file names under the upload path
+    for root, dirs, files in os.walk(path, topdown=False):
+        for name in files:
+            # print(os.path.join(root, name))
+            VC_FileDict.append({'name': name, 'path': os.path.join(root, name)})
+    VC_FileDict = sorted(VC_FileDict, key=lambda x: x['name'], reverse=True)
+    return VC_FileDict
+
+
+async def saveFiles(files, SavePath):
+    right = 0
+    error = 0
+    error_info = "Failed files: "
+    for file in files:
+        try:
+            if 'blob' in file.filename:
+                out_file_path = os.path.join(
+                    SavePath,
+                    datetime.datetime.strftime(datetime.datetime.now(),
+                                               '%H%M') + randName(3) + ".wav")
+            else:
+                out_file_path = os.path.join(SavePath, file.filename)
+
+            print("Uploaded file name:", out_file_path)
+            async with aiofiles.open(out_file_path, 'wb') as out_file:
+                content = await file.read()  # async read
+                await out_file.write(content)  # async write
+            # Convert the file to a 16 kHz, 16-bit wav file
+            wav, sr = librosa.load(out_file_path, sr=16000)
+            sf.write(out_file_path, data=wav, samplerate=sr)
+            right += 1
+        except Exception as e:
+            error += 1
+            error_info = error_info + file.filename + " " + str(e) + "\n"
+            continue
+    return f"Uploaded: {right}, failed: {error}, reasons: {error_info}"
+
+
+# Audio download
+@app.post("/vc/download")
+async def VcDownload(base: VcBase):
+    if os.path.exists(base.wavPath):
+        return FileResponse(base.wavPath)
+    else:
+        return ErrorRequest(message="Download failed: file does not exist")
+
+
+# Audio download as base64
+@app.post("/vc/download_base64")
+async def VcDownloadBase64(base: VcBase):
+    if os.path.exists(base.wavPath):
+        # Load the file as 16 kHz audio and convert it to 16-bit PCM
+        wav, sr = librosa.load(base.wavPath, sr=16000)
+        wav = float2pcm(wav)  # float32 to int16
+        wav_bytes = wav.tobytes()  # to bytes
+        wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
+        return SuccessRequest(result=wav_base64)
+    else:
+        return ErrorRequest(message="Playback failed: file does not exist")
+
+
+######################################################################
+########################### VC service ##############################
+#####################################################################
+
+
+# Upload files
+@app.post("/vc/upload")
+async def VcUpload(files: List[UploadFile]):
+    # res = saveFiles(files, VC_UPLOAD_PATH)
+    right = 0
+    error = 0
+    error_info = "Failed files: "
+    for file in files:
+        try:
+            if 'blob' in file.filename:
+                out_file_path = os.path.join(
+                    VC_UPLOAD_PATH,
+                    datetime.datetime.strftime(datetime.datetime.now(),
+                                               '%H%M') + randName(3) + ".wav")
+            else:
+                out_file_path = os.path.join(VC_UPLOAD_PATH, file.filename)
+
+            print("Uploaded file name:", out_file_path)
+            async with aiofiles.open(out_file_path, 'wb') as out_file:
+                content = await file.read()  # async read
+                await out_file.write(content)  # async write
+            # Convert the file to a 16 kHz, 16-bit wav file
+            wav, sr = librosa.load(out_file_path, sr=16000)
+            sf.write(out_file_path, data=wav, samplerate=sr)
+            right += 1
+        except Exception as e:
+            error += 1
+            error_info = error_info + file.filename + " " + str(e) + "\n"
+            continue
+    return SuccessRequest(
+        result=f"Uploaded: {right}, failed: {error}, reasons: {error_info}")
+
+
+# Get the file list
+@app.get("/vc/list")
+async def VcList():
+    res = getVCList(VC_UPLOAD_PATH)
+    return SuccessRequest(result=res)
+
+
+# Get an audio file
+@app.post("/vc/file")
+async def VcFileGet(base: VcBase):
+    if os.path.exists(base.wavPath):
+        return FileResponse(base.wavPath)
+    else:
+        return ErrorRequest(result="Failed to get the file")
+
+
+# Delete an audio file
+@app.post("/vc/del")
+async def VcFileDel(base: VcBase):
+    if os.path.exists(base.wavPath):
+        os.remove(base.wavPath)
+        return SuccessRequest(result="Deleted successfully")
+    else:
+        return ErrorRequest(result="Delete failed")
+
+
+# Voice cloning (G2P)
+@app.post("/vc/clone_g2p")
+async def VcCloneG2P(base: VcBaseText):
+    if os.path.exists(base.wavPath):
+        try:
+            if base.func == 'ge2e':
+                wavName = base.wavName
+                wavPath = os.path.join(VC_OUT_PATH, wavName)
+                wavPath = vc_model.vc(
+                    text=base.text, input_wav=base.wavPath, out_wav=wavPath)
+            else:
+                wavName = base.wavName
+                wavPath = os.path.join(VC_OUT_PATH, wavName)
+                wavPath = vc_model_tdnn.vc(
+                    text=base.text, input_wav=base.wavPath, out_wav=wavPath)
+            if wavPath:
+                res = {"wavName": wavName, "wavPath": wavPath}
+                return SuccessRequest(result=res)
+            else:
+                return ErrorRequest(message="Cloning failed: check that the cloning script is valid")
+        except Exception as e:
+            print(e)
+            return ErrorRequest(message="Cloning failed: error during synthesis")
+    else:
+        return ErrorRequest(message="Cloning failed: audio file does not exist")
+
+
+######################################################################
+########################### SAT service #############################
+#####################################################################
+# Voice cloning (SAT)
+@app.post("/vc/clone_sat")
+async def VcCloneSAT(base: VcBaseSAT):
+    # Update sat_label_dic and persist it when the label changes
+    if base.filename not in sat_label_dic or sat_label_dic[
+            base.filename] != base.old_str:
+        sat_label_dic[base.filename] = base.old_str
+        with open(SAT_LABEL_PATH, "w", encoding='utf8') as f:
+            json.dump(sat_label_dic, f, ensure_ascii=False, indent=4)
+
+    input_file_path = base.wav
+
+    # Select the task
+    if base.language == "zh":
+        # Chinese
+        if base.function == "synthesize":
+            output_file_path = os.path.join(SAT_OUT_PATH,
+                                            "sat_syn_zh_" + base.filename)
+            # Chinese voice cloning
+            sat_result = sat_model.zh_synthesize_edit(
+                old_str=base.old_str,
+                new_str=base.new_str,
+                input_name=os.path.realpath(input_file_path),
+                output_name=os.path.realpath(output_file_path),
+                task_name="synthesize")
+        elif base.function == "edit":
+            output_file_path = os.path.join(SAT_OUT_PATH,
+                                            "sat_edit_zh_" + base.filename)
+            # Chinese speech editing
+            sat_result = sat_model.zh_synthesize_edit(
+                old_str=base.old_str,
+                new_str=base.new_str,
+                input_name=os.path.realpath(input_file_path),
+                output_name=os.path.realpath(output_file_path),
+                task_name="edit")
+        elif base.function == "crossclone":
+            output_file_path = os.path.join(SAT_OUT_PATH,
+                                            "sat_cross_zh_" + base.filename)
+            # Cross-lingual cloning, Chinese to English
+            sat_result = sat_model.crossclone(
+                old_str=base.old_str,
+                new_str=base.new_str,
+                input_name=os.path.realpath(input_file_path),
+                output_name=os.path.realpath(output_file_path),
+                source_lang="zh",
+                target_lang="en")
+        else:
+            return ErrorRequest(
+                message="Invalid function; supported: synthesize, edit, crossclone")
+    elif base.language == "en":
+        if base.function == "synthesize":
+            output_file_path = os.path.join(SAT_OUT_PATH,
+                                            "sat_syn_en_" + base.filename)
+            # English voice cloning
+            sat_result = sat_model.en_synthesize_edit(
+                old_str=base.old_str,
+                new_str=base.new_str,
+                input_name=os.path.realpath(input_file_path),
+                output_name=os.path.realpath(output_file_path),
+                task_name="synthesize")
+        elif base.function == "edit":
+            output_file_path = os.path.join(SAT_OUT_PATH,
+                                            "sat_edit_en_" + base.filename)
+            # English speech editing
+            sat_result = sat_model.en_synthesize_edit(
+                old_str=base.old_str,
+                new_str=base.new_str,
+                input_name=os.path.realpath(input_file_path),
+                output_name=os.path.realpath(output_file_path),
+                task_name="edit")
+        elif base.function == "crossclone":
+            output_file_path = os.path.join(SAT_OUT_PATH,
+                                            "sat_cross_en_" + base.filename)
+            # Cross-lingual cloning, English to Chinese
+            sat_result = sat_model.crossclone(
+                old_str=base.old_str,
+                new_str=base.new_str,
+                input_name=os.path.realpath(input_file_path),
+                output_name=os.path.realpath(output_file_path),
+                source_lang="en",
+                target_lang="zh")
+        else:
+            return ErrorRequest(
+                message="Invalid function; supported: synthesize, edit, crossclone")
+    else:
+        return ErrorRequest(message="Invalid language; only Chinese (zh) and English (en) are supported")
+
+    if sat_result:
+        return SuccessRequest(result=sat_result, message="SAT synthesis succeeded")
+    else:
+        return ErrorRequest(message="SAT synthesis failed; check the server logs for details")
+
+
+# SAT file list
+@app.get("/sat/list")
+async def SatList():
+    res = []
+    filelist = getVCList(SAT_UPLOAD_PATH)
+    for fileitem in filelist:
+        if fileitem['name'] in sat_label_dic:
+            fileitem['label'] = sat_label_dic[fileitem['name']]
+        else:
+            fileitem['label'] = ""
+        res.append(fileitem)
+    return SuccessRequest(result=res)
+
+
+# Upload SAT audio
+# Upload files
+@app.post("/sat/upload")
+async def SATUpload(files: List[UploadFile]):
+    right = 0
+    error = 0
+    error_info = "Failed files: "
+    for file in files:
+        try:
+            if 'blob' in file.filename:
+                out_file_path = os.path.join(
+                    SAT_UPLOAD_PATH,
+                    datetime.datetime.strftime(datetime.datetime.now(),
+                                               '%H%M') + randName(3) + ".wav")
+            else:
+                out_file_path = os.path.join(SAT_UPLOAD_PATH, file.filename)
+
+            print("Uploaded file name:", out_file_path)
+            async with aiofiles.open(out_file_path, 'wb') as out_file:
+                content = await file.read()  # async read
+                await out_file.write(content)  # async write
+            # Convert the file to a 16 kHz, 16-bit wav file
+            wav, sr = librosa.load(out_file_path, sr=16000)
+            sf.write(out_file_path, data=wav, samplerate=sr)
+            right += 1
+        except Exception as e:
+            error += 1
+            error_info = error_info + file.filename + " " + str(e) + "\n"
+            continue
+    return SuccessRequest(
+        result=f"Uploaded: {right}, failed: {error}, reasons: {error_info}")
+
+
+######################################################################
+########################### FineTune service ########################
+#####################################################################
+
+
+# finetune file list
+@app.post("/finetune/list")
+async def FineTuneList(Path: FTPath):
+    dataPath = Path.dataPath
+    if dataPath == "default":
+        # Default path
+        FT_PATH = FT_DEFAULT_PATH
+    else:
+        FT_PATH = dataPath
+
+    res = []
+    filelist = getVCList(FT_PATH)
+    for name, value in ft_label_dic.items():
+        wav_path = os.path.join(FT_PATH, name)
+        if not os.path.exists(wav_path):
+            wav_path = ""
+        d = {'text': value['text'], 'name': name, 'path': wav_path}
+        res.append(d)
+    return SuccessRequest(result=res)
+
+
+# One-click reset: get a new upload directory
+@app.get('/finetune/newdir')
+async def FTGetNewDir():
+    new_path = os.path.join(FT_UPLOAD_PATH, randName(3))
+    if not os.path.exists(new_path):
+        os.makedirs(new_path, exist_ok=True)
+    # Copy labels.txt into the new directory
+    cmd = f"cp {FT_LABEL_TXT_PATH} {new_path}"
+    os.system(cmd)
+    return SuccessRequest(result=new_path)
+
+
+# finetune file upload
+@app.post("/finetune/upload")
+async def FTUpload(base: VcBaseFT):
+    try:
+        # Create the directory if it does not exist
+        if not os.path.exists(base.wav_path):
+            os.makedirs(base.wav_path)
+        # Save the audio file
+        out_file_path = os.path.join(base.wav_path, base.filename)
+        wav_b = base64.b64decode(base.wav)
+        async with aiofiles.open(out_file_path, 'wb') as out_file:
+            await out_file.write(wav_b)  # async write
+
+        return SuccessRequest(result="Upload succeeded")
+    except Exception as e:
+        return ErrorRequest(result="Upload failed")
+
+
+# finetune training
+@app.post("/finetune/clone_finetune")
+async def FTModel(base: VcBaseFTModel):
+    # First check that wav_path is valid
+    if base.wav_path == 'default':
+        data_path = FT_DEFAULT_PATH
+    else:
+        data_path = base.wav_path
+    if not os.path.exists(data_path):
+        return ErrorRequest(message="Data directory does not exist")
+
+    data_base = data_path.split(os.sep)[-1]
+    exp_dir = os.path.join(FT_EXP_BASE_PATH, data_base)
+    try:
+        exp_dir = ft_model.finetune(
+            input_dir=os.path.realpath(data_path),
+            exp_dir=os.path.realpath(exp_dir))
+        if exp_dir:
+            return SuccessRequest(result=exp_dir)
+        else:
+            return ErrorRequest(message="Finetune failed")
+    except Exception as e:
+        print(e)
+        return ErrorRequest(message="Finetune failed")
+
+
+# finetune synthesis
+@app.post("/finetune/clone_finetune_syn")
+async def FTSyn(base: VcBaseFTSyn):
+    try:
+        if not os.path.exists(base.exp_path):
+            return ErrorRequest(result="Model path does not exist")
+        wav_name = randName(5)
+        wav_path = ft_model.synthesize(
+            text=base.text,
+            wav_name=wav_name,
+            out_wav_dir=os.path.realpath(FT_OUT_PATH),
+            exp_dir=os.path.realpath(base.exp_path))
+        if wav_path:
+            res = {"wavName": wav_name + ".wav", "wavPath": wav_path}
+            return SuccessRequest(result=res)
+        else:
+            return ErrorRequest(message="Audio synthesis failed")
+    except Exception as e:
+        return ErrorRequest(message="Audio synthesis failed")
+
+
+if __name__ == '__main__':
+    uvicorn.run(app=app, host='0.0.0.0', port=port)
--- a/demos/speech_web/web_client/package.json
+++ b/demos/speech_web/web_client/package.json
@ -8,6 +8,7 @@
    "preview": "vite preview"
  },
  "dependencies": {
+    "@element-plus/icons-vue": "^2.0.9",
    "ant-design-vue": "^2.2.8",
    "axios": "^0.26.1",
    "element-plus": "^2.1.9",
@ -18,6 +19,7 @@
  },
  "devDependencies": {
    "@vitejs/plugin-vue": "^2.3.0",
-    "vite": "^2.9.0"
+    "vite": "^2.9.13",
+    "@vue/compiler-sfc": "^3.1.0"
  }
 }
--- a/demos/speech_web/web_client/src/api/API.js
+++ b/demos/speech_web/web_client/src/api/API.js
@ -19,6 +19,26 @@ export const apiURL =   {
    CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket endpoint
    ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream',  // Streaming ASR endpoint
    TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Streaming TTS endpoint
+
+    // voice clone
+    // Voice Clone
+    VC_List: '/api/vc/list',
+    SAT_List: '/api/sat/list',
+    FineTune_List: '/api/finetune/list',
+
+    VC_Upload: '/api/vc/upload',
+    SAT_Upload: '/api/sat/upload',
+    FineTune_Upload: '/api/finetune/upload',
+    FineTune_NewDir: '/api/finetune/newdir',
+
+    VC_Download: '/api/vc/download',
+    VC_Download_Base64: '/api/vc/download_base64',
+    VC_Del: '/api/vc/del',
+    
+    VC_CloneG2p: '/api/vc/clone_g2p',
+    VC_CloneSAT: '/api/vc/clone_sat',
+    VC_CloneFineTune: '/api/finetune/clone_finetune',
+    VC_CloneFineTuneSyn: '/api/finetune/clone_finetune_syn',
 }


--- a/demos/speech_web/web_client/src/api/ApiVC.js
+++ b/demos/speech_web/web_client/src/api/ApiVC.js
@ -0,0 +1,88 @@
+import axios from 'axios'
+import {apiURL} from "./API.js"
+
+// Upload audio (VC)
+export async function vcUpload(params){
+    const result = await axios.post(apiURL.VC_Upload, params);
+    return result
+}
+
+// Upload audio (SAT)
+export async function satUpload(params){
+    const result = await axios.post(apiURL.SAT_Upload, params);
+    return result
+}
+
+// Upload audio (finetune)
+export async function fineTuneUpload(params){
+    const result = await axios.post(apiURL.FineTune_Upload, params);
+    return result
+}
+
+// Delete audio
+export async function vcDel(params){
+    const result = await axios.post(apiURL.VC_Del, params);
+    return result
+}
+
+// Get the audio list (VC)
+export async function vcList(){
+    const result = await axios.get(apiURL.VC_List);
+    return result
+}
+// Get the audio list (SAT)
+export async function satList(){
+    const result = await axios.get(apiURL.SAT_List);
+    return result
+}
+
+// Get the audio list (finetune)
+export async function fineTuneList(params){
+    const result = await axios.post(apiURL.FineTune_List, params);
+    return result
+}
+
+// finetune one-click reset: get a new directory
+export async function fineTuneNewDir(){
+    const result = await axios.get(apiURL.FineTune_NewDir);
+    return result
+}
+
+// Get audio data
+export async function vcDownload(params){
+    const result = await axios.post(apiURL.VC_Download, params);
+    return result
+}
+
+// Get audio data as Base64
+export async function vcDownloadBase64(params){
+    const result = await axios.post(apiURL.VC_Download_Base64, params);
+    return result
+}
+
+
+// Clone synthesis (G2P)
+export async function vcCloneG2P(params){
+    const result = await axios.post(apiURL.VC_CloneG2p, params);
+    return result
+}
+
+// Clone synthesis (SAT)
+export async function vcCloneSAT(params){
+    const result = await axios.post(apiURL.VC_CloneSAT, params);
+    return result
+}
+
+// Clone synthesis: finetune training
+export async function vcCloneFineTune(params){
+    const result = await axios.post(apiURL.VC_CloneFineTune, params);
+    return result
+}
+
+// Clone synthesis: finetune synthesis
+export async function vcCloneFineTuneSyn(params){
+    const result = await axios.post(apiURL.VC_CloneFineTuneSyn, params);
+    return result
+}
+
+
--- a/demos/speech_web/web_client/src/components/Content/Header/Header.vue
+++ b/demos/speech_web/web_client/src/components/Content/Header/Header.vue
@ -4,7 +4,7 @@
        飞桨-PaddleSpeech
      </div>
      <div className="speech_header_describe">
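-        PaddleSpeech is an open-source speech model library built on PaddlePaddle for developing a wide range of key speech and audio tasks. Stars are welcome.
+        PaddleSpeech is an open-source speech model library built on PaddlePaddle for developing a wide range of key speech and audio tasks. It supports speech recognition, speech synthesis, speaker verification, audio classification, keyword spotting, speech translation, and other speech tasks, and won the NAACL2022 Best Demo Award. If you like this demo, please star it on GitHub.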
      </div>
      <div className="speech_header_link_box">
        <a href="https://github.com/PaddlePaddle/PaddleSpeech" className="speech_header_link"  target='_blank' rel='noreferrer' key={index}>
--- a/demos/speech_web/web_client/src/components/Content/Header/style.less
+++ b/demos/speech_web/web_client/src/components/Content/Header/style.less
@ -43,6 +43,7 @@
        margin-bottom: 40px;
        display: flex;
        align-items: center;
+        margin-top: 40px;
    };
    .speech_header_link {
        display: block;
--- a/demos/speech_web/web_client/src/components/Experience.vue
+++ b/demos/speech_web/web_client/src/components/Experience.vue
@ -6,6 +6,10 @@ import TTST from './SubMenu/TTS/TTST.vue'
 import VPRT from './SubMenu/VPR/VPRT.vue'
 import IET from './SubMenu/IE/IET.vue'

+import VoiceCloneT from './SubMenu/VoiceClone/VoiceClone.vue'
+import ENIRE_SATT from './SubMenu/ENIRE_SAT/ENIRE_SAT.vue'
+import FineTuneT from './SubMenu/FineTune/FineTune.vue'
+
 </script>

 <template>
@ -37,6 +41,15 @@ import IET from './SubMenu/IE/IET.vue'
            <el-tab-pane label="Voice Commands" key="5">
            <IET></IET>
            </el-tab-pane>
+            <el-tab-pane label="One-Sentence Synthesis" key="6">
+            <VoiceCloneT></VoiceCloneT>
+            </el-tab-pane>
+            <el-tab-pane label="Small-Data Finetune" key="7">
+            <FineTuneT></FineTuneT>
+            </el-tab-pane>
+            <el-tab-pane label="ERNIE-SAT" key="8">
+            <ENIRE_SATT></ENIRE_SATT>
+            </el-tab-pane>
          </el-tabs>
        </div>
      </div>
--- a/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue
@ -58,9 +58,6 @@ export default {
    mounted () {
        this.wsUrl = apiURL.ASR_SOCKET_RECORD
        this.ws = new WebSocket(this.wsUrl)
-        if(this.ws.readyState === this.ws.CONNECTING){
-            this.$message.success("Real-time recognition WebSocket connected")
-        }
        var _that = this
        this.ws.addEventListener('message', function (event) {
                var temp = JSON.parse(event.data);
@ -78,7 +75,7 @@ export default {
            // Check the websocket state
            // debugger
            if(this.ws.readyState != this.ws.OPEN){
-                this.$message.error("WebSocket connection failed; check that the URL is correct")
+                this.$message.error("WebSocket connection failed; check that the WebSocket backend service is running")
                return
            }

--- a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/Chat.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ChatBot/Chat.vue
@ -1,298 +0,0 @@
-<template>
-  <div class="chatbox">
-      <h3>Voice Chat</h3>
-      <div class="home" style="margin:1vw;">
-      <el-button :type="recoType" @click="startRecorder()"  style="margin:1vw;">{{ recoText }}</el-button>
-      <!-- <el-button :type="playType" @click="playRecorder()" style="margin:1vw;"> {{ playText }}</el-button> -->
-      <el-button :type="envType" @click="envRecorder()" style="margin:1vw;"> {{ envText }}</el-button>
-      <!-- <el-button :type="envType" @click="getTts(ttsd)" style="margin:1vw;"> TTS </el-button> -->
-      <el-button type="warning" @click="clearChat()" style="margin:1vw;"> Clear Chat</el-button>
-
-      </div>
-
-      <div v-for="Result in allResultList">
-      <h3>{{Result}}</h3>
-      </div>
-  </div>
-  
-</template>
- 
-<script>
-
-import Recorder from 'js-audio-recorder'
-
-
-const recorder = new Recorder({
-  sampleBits: 16,                 // Sample bit depth: 8 or 16, default 16
-  sampleRate: 16000,              // Sample rate: 11025, 16000, 22050, 24000, 44100, or 48000; defaults to the browser's value (48000 in my Chrome)
-  numChannels: 1,                 // Channels: 1 or 2, default 1
-  compiling: true
-})
-
-  export default {
-    name: 'home',
-    data () {
-      return {
-        recoType: "primary",
-        recoText: "Start Recording",
-        playType: "success",
-        playText: "Play Recording",
-        envType: "success",
-        envText: "Sample Environment",
-
-        asrResultList: [],
-        nlpResultList: [],
-        ttsResultList: [],
-        allResultList: [],
-        webSocketRes: "websocket",
-        drawRecordId: null,
-
-        onReco: false,
-        onPlay: false,
-        onRecoPause: false,
-        ws: '',
-
-        ttsd: "你的名字叫什么,你的名字叫什么,你的名字叫什么你的名字叫什么",
-        audioCtx: '',
-        source: '',
-
-        typedArray: '',
-        ttsResult: '',
-       
-      }
-    },
-    mounted () {
-        // Audio player
-        var AudioContext = window.AudioContext || window.webkitAudioContext;
-        this.audioCtx = new AudioContext({
-            latencyHint: 'interactive',
-            sampleRate: 24000,
-          });
-        // Define the playback-end handler
-        recorder.onplayend = () => {
-        this.onPlay = false
-        this.playText = "Play Recording"
-        this.playType = "success"
-        this.$nextTick(()=>{})
-      }
-      // Initialize the websocket
-      this.ws = new WebSocket("ws://localhost:8010/ws/asr/offlineStream");
-
-      // Define the message handler
-      var _that = this
-      this.ws.addEventListener('message', function (event) {
-          _that.allResultList.push("asr:" + event.data)
-          _that.$nextTick(()=>{})
-          _that.getNlp(event.data)
-      })
-    },
-
-    methods: {
-      // Clear the chat history
-      clearChat(){
-        this.allResultList = []
-      },
-      // Start recording
-      startRecorder () {
-        if(!this.onReco){
-          this.resumeRecordOnline()
-          recorder.start().then(() => {
-            setInterval(() => {
-              // Keep recording
-              let newData = recorder.getNextData();
-              if (!newData.length) {
-                return;
-              }
-              // Upload to the streaming endpoint
-              this.uploadChunk(newData)
-            }, 500)
-        }, (error) => {
-          console.log("Recording error");
-        })
-        this.onReco = true
-        this.recoType = "danger"
-        this.recoText = "Stop Recording"
-        this.$nextTick(()=>{
-          })
-        } else {
-          // Stop recording
-          recorder.stop()
-          this.onReco = false
-          this.recoType = "primary"
-          this.recoText = "Start Recording"
-          this.$nextTick(()=>{})
-          recorder.clear()
-          // Export the audio as WAV, then upload it to the server
-          // const wavs = recorder.getWAVBlob()
-          // this.uploadFile(wavs, "/api/asr/offline")
-          // console.log(wavs)
-          // Send a stop command to the server and clear the cached data
-          this.stopRecordOnline()
-        }
-      },
-
-      // Start recording
-      envRecorder () {
-        if(!this.onReco){
-          recorder.start().then(() => {
-        }, (error) => {
-          console.log("录音出错");
-        })
-        this.onReco = true
-        this.envType = "danger"
-        this.envText = "结束采样"
-        this.$nextTick(()=>{
-          })
-        } else {
-          // Stop recording
-          recorder.stop()
-          this.onReco = false
-          this.envType = "success"
-          this.envText = "环境采样"
-          this.$nextTick(()=>{})
-          const wavs = recorder.getWAVBlob()
-          this.uploadFile(wavs, "/api/asr/collectEnv")
-        }
-      },
-
-
-      // Play back the recording
-      playRecorder () {
-        if(!this.onPlay){
-          // Play the audio
-          recorder.play()
-          this.onPlay = true
-          this.playText = "结束播放"
-          this.playType = "warning"
-          this.$nextTick(()=>{})
-        
-        } else {
-          recorder.stopPlay()
-          this.onPlay = false
-          this.playText = "播放录音"
-          this.playType = "success"
-          this.$nextTick(()=>{})
-        }
-      },
-
-      // Upload the recording file
-      async uploadFile(file, post_url){
-        const formData = new FormData()
-        formData.append('files', file)
-        const result = await this.$http.post(post_url, formData);
-        if (result.data.code === 0) {
-              this.asrResultList.push(result.data.result)
-              // this.$message.success(result.data.message);
-          } else {
-              this.$message.error(result.data.message);
-          }
-      },
-      // Upload audio chunks
-      async uploadChunk(chunkDatas) {
-        chunkDatas.forEach((chunkData) => {
-                this.ws.send(chunkData)
-              })
-
-      },
-
-      // Stop recording and output PCM
-      async stopRecordOnline(){
-        const result = await this.$http.get("/api/asr/stopRecord");
-        if (result.data.code === 0) {
-            console.log("Online 录音停止成功")
-          } else {
-            // console.log("chunk 发送失败")
-          }
-      },
-      // Resume recording; audio sent in the interim is discarded
-      async resumeRecordOnline(){
-        const result = await this.$http.get("/api/asr/resumeRecord");
-        if (result.data.code === 0) {
-            console.log("chunk 发送成功")
-          } else {
-            // console.log("chunk 发送失败")
-          }
-      },
-
-      // Request the NLP chat result
-      async getNlp(asrText){
-        
-        // Pause recording
-        this.onRecoPause = true
-        recorder.pause()
-        this.stopRecordOnline()
-        console.log('录音暂停')
-
-        const result = await this.$http.post("/api/nlp/chat", { chat: asrText});
-        if (result.data.code === 0) {
-              this.allResultList.push("nlp:" + result.data.result)
-              this.getTts(result.data.result)
-              // this.$message.success(result.data.message);
-          } else {
-              this.$message.error(result.data.message);
-          }
-        // console.log("录音恢复")
-      },
-
-    base64ToUint8Array(base64String) {
-      const padding = '='.repeat((4 - base64String.length % 4) % 4);
-       const base64 = (base64String + padding)
-                    .replace(/-/g, '+')
-                    .replace(/_/g, '/');
-
-       const rawData = window.atob(base64);
-       const outputArray = new Uint8Array(rawData.length);
-
-       for (let i = 0; i < rawData.length; ++i) {
-            outputArray[i] = rawData.charCodeAt(i);
-       }
-       return outputArray;
-      },
-
-      // Synthesize TTS audio
-      async getTts(nlpText){
-        // base64
-        this.ttsResult = await this.$http.post("/api/tts/offline", { text : nlpText});
-        this.typedArray = this.base64ToUint8Array(this.ttsResult.data.result)
-        // console.log("chat", this.typedArray.buffer)
-        this.playAudioData( this.typedArray.buffer )
-
-      },
-
-      // play
-      playAudioData( wav_buffer ) {
-        this.audioCtx.decodeAudioData(wav_buffer, buffer => {
-            this.source = this.audioCtx.createBufferSource();
-            this.source.onended = () => {
-              // If recording was paused
-              if(this.onRecoPause){
-                console.log("恢复录音")
-                this.onRecoPause = false
-                // Resume client-side recording
-                recorder.resume()
-                // Resume server-side recording
-                this.resumeRecordOnline()
-              }
-              
-            }
-            this.source.buffer = buffer;
-            this.source.connect(this.audioCtx.destination);
-            this.source.start();
-        }, function(e) {
-            Recorder.throwError(e);
-        });
-    }
-    },
- 
-  }
-</script>
- 
-<style lang='less' scoped>
- .chatbox {
-  border: 4px solid #F00;
-  // position: fixed;
-  width: 100%;
-  height: 20%;
-  overflow: auto;
- }
-</style>
--- a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue
@ -91,6 +91,10 @@ export default {
    methods: {
        // Start recording
        startRecorder(){
+          if(this.ws.readyState !== this.ws.OPEN){
+            this.$message.error("websocket 链接失败，请检查 Websocket 后端服务是否正确开启")
+            return
+          }
          this.allResultList = []
          if(!this.onReco){
            this.asrResult = this.speakingText
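
Review note: the guard added in this hunk refuses to start recording unless the socket is open. A minimal sketch of the same check as a standalone helper follows. The name `canSend` and the stand-in objects are hypothetical and not part of the commit; the numeric `readyState` values are the standard WebSocket constants.

```javascript
// Hypothetical helper (not part of the commit): mirrors the readyState guard.
// Standard WebSocket readyState values: CONNECTING=0, OPEN=1, CLOSING=2, CLOSED=3.
const WS_OPEN = 1;

function canSend(ws) {
  // Treat a missing socket (for example an empty-string placeholder) as not ready.
  return Boolean(ws) && ws.readyState === WS_OPEN;
}

// Usage with stand-in objects; a real WebSocket instance exposes the same field.
console.log(canSend({ readyState: 1 })); // true
console.log(canSend({ readyState: 3 })); // false
console.log(canSend(''));                // false
```

Comparing against a constant rather than `ws.OPEN` keeps the helper usable before a socket object exists, which is an assumption about how the component initializes `ws`.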
--- a/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
@ -0,0 +1,487 @@
+<template>
+    <div class="sat">
+      <el-row :gutter="20">
+            <el-col :span="12"><div class="grid-content ep-bg-purple" />
+                <el-row :gutter="60" class="btn_row_wav" justify="center">
+                    <el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary">录制音频</el-button>
+                    <el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger">停止录音</el-button>
+                    <el-button class="ml-3" v-else @click="uploadRecord()" type="success">上传录音</el-button>
+                    <a>&#12288;</a>
+                    <el-upload
+                        :multiple="false"
+                        :accept="'.wav'"
+                        :auto-upload="false"
+                        :on-change="handleChange"
+                        :show-file-list="false"
+                    >
+                        <el-button class="ml-3" type="success">上传音频文件</el-button>
+                    </el-upload>
+                </el-row>
+                <div class="recording_table">
+                <el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
+                    <!-- <el-table-column prop="wavId" label="序号" width="60"/> -->
+                    <el-table-column prop="wavName" label="文件名" width="150"/>
+                    <el-table-column label="文本">
+                      <template #default="scope">
+                            <el-input 
+                              v-model="scope.row.label"
+                              :autosize="{ minRows: 8, maxRows: 13 }" 
+                              placeholder="Please input"
+                              />
+                            
+                        </template>
+                    </el-table-column>
+                    <el-table-column label="操作" width="80">
+                        <template #default="scope">
+                            <div class="flex justify-space-between mb-4 flex-wrap gap-4">
+                                <a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
+                                <a>&#12288;</a>
+                                <a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
+                            </div>
+                        </template>
+                    </el-table-column>
+                    <el-table-column fixed="right" label="选择" width="70">
+                        <template #default="scope">
+                            <el-switch v-model="scope.row.status"  @click="choseWav(scope.row.wavId)"/>
+                        </template>
+                    </el-table-column>
+                </el-table>
+                </div>
+
+            </el-col>
+            <el-col :span="8"><div class="grid-content ep-bg-purple" />
+                <el-space direction="vertical">
+                    <el-card class="box-card" style="width: 250px; height:310px">
+                        <template #header>
+                            <div class="card-header">
+                            <span>功能选择</span>
+                            </div>
+                        </template>  
+                        <el-radio-group v-model="funcMode">
+                          <el-radio label="1" size="middle" border style="margin-bottom: 10px">个性化语音合成</el-radio>
+                            <el-input
+                              v-if="funcMode === '1'"
+                              v-model="ttsText"
+                              :autosize="{ minRows: 2, maxRows: 2 }"
+                              type="textarea"
+                              placeholder="Please input"
+                              style="margin-bottom: 10px"
+                              />
+                          <el-radio label="2" size="middle" border style="margin-bottom: 10px">跨语言语音合成</el-radio>
+                            <el-input
+                              v-if="funcMode === '2'"
+                              v-model="ttsText"
+                              :autosize="{ minRows: 2, maxRows: 2 }"
+                              type="textarea"
+                              placeholder="Please input"
+                              style="margin-bottom: 10px"
+                              />
+                          <el-radio label="3" size="middle" border style="margin-bottom: 10px">语音编辑</el-radio>
+                            <el-input
+                                v-if="funcMode === '3'"
+                                v-model="ttsText"
+                                :autosize="{ minRows: 2, maxRows: 2 }"
+                                type="textarea"
+                                placeholder="Please input"
+                                style="margin-bottom: 10px"
+                                />
+                        </el-radio-group>
+                    </el-card>                    
+                </el-space>
+            </el-col>
+            <el-col :span="4"><div class="grid-content ep-bg-purple" />
+                <div class="play_board">
+                    <el-space direction="vertical">
+                        <el-row :gutter="20">
+                            <el-button size="large" v-if="onSyn === 0" type="primary" @click="SatSyn()">开始合成</el-button>
+                            <el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
+                        </el-row>
+                        <el-row :gutter="20">
+                            <el-button v-if='this.cloneWav' type="success" @click="PlaySyn()">播放</el-button>
+                            <el-button v-else disabled type="primary" @click="PlaySyn()">播放</el-button>
+                            <el-button v-if='this.cloneWav' type="primary" @click="downLoadCloneWav()">下载</el-button>
+                            <el-button v-else disabled type="primary" @click="downLoadCloneWav()">下载</el-button>
+                        </el-row>
+                    </el-space>
+                </div>
+            </el-col>
+        </el-row>
+</div>
+</template>
+
+<script>
+import { vcCloneSAT, vcDownload, vcDownloadBase64, satUpload, satList, vcDel } from '../../../api/ApiVC'
+import Recorder from 'js-audio-recorder'
+
+let audioCtx = new AudioContext({
+  latencyHint: 'interactive',
+  sampleRate: 24000,
+});
+
+// Initialize the recorder
+const recorder = new Recorder({
+  sampleBits: 16,                 // Sample bit depth: 8 or 16; default 16
+  sampleRate: 16000,              // Sample rate: 11025, 16000, 22050, 24000, 44100, or 48000; defaults to the browser's rate (48000 in my Chrome)
+  numChannels: 1,                 // Channels: 1 or 2; default 1
+  compiling: true
+})
+
+export default {
+name:"",
+data(){
+    return {
+        uploadStatus : 0,
+        recognitionStatus : 0,
+        asrResult : "",
+        indicator : "",
+        
+        filename: "",
+        upfile: "",
+        mode: 1,
+        language: 1,
+        wav_input: "卡尔普陪外孙玩滑梯",
+        new_input: "卡尔普陪外孙打滑梯",
+        received_file:"",
+
+        // Divider
+        onEnrollRec: 0,
+        onSyn:0,
+        vcDatas: [],
+        funcMode: '1',
+        selected_Id: -1,
+        ttsText: '',
+        cloneWav: '',
+        wav:''
+    }
+},
+
+mounted () {
+        this.GetList()
+    },
+
+methods:{
+    // Fetch the file list
+    async GetList(){
+            this.vcDatas =[]
+            const result = await satList();
+            console.log("List: ", result);
+            for(let i=0; i < result.data.result.length; i++){
+                this.vcDatas.push({
+                    wavName: result.data.result[i]['name'],
+                    wavId: i,
+                    wavPath: result.data.result[i]['path'],
+                    status: false,
+                    label: result.data.result[i]['label']
+                })
+            }
+            console.log("vcDatas: ", this.vcDatas);
+            this.$nextTick(()=>{})
+    },
+
+    // Handle file selection changes
+    async handleChange(file, fileList){
+      for(let i=0; i<fileList.length; i++){
+        await this.uploadFile(fileList[i])
+      }
+      this.GetList()
+    },
+
+    async uploadFile(file){
+      let formData = new FormData();
+      formData.append('files', file.raw);
+      const result = await satUpload(formData);
+      if (result.data.code === 0) {
+          this.$message.success("音频上传成功")
+          
+      } else {
+          this.$message.error("音频上传失败")
+      }
+    },
+
+    // Start recording
+    startRecorderEnroll(){
+            this.onEnrollRec = 1
+            recorder.clear()
+            recorder.start()
+        },
+    
+    // Stop recording
+    stopRecorderEnroll(){
+        this.onEnrollRec = 2
+        recorder.stop()
+        this.wav = recorder.getWAVBlob()
+    },
+
+    // Upload the recording
+    async uploadRecord(){
+            this.onEnrollRec = 0
+            // Nothing was recorded, so there is nothing to upload
+            if(this.wav === ""){
+                this.$message.error("未检测到录音，录音失败，请重新录制")
+                return
+            }
+            let formData = new FormData();
+            formData.append('files', this.wav);
+            const result = await satUpload(formData);
+            console.log(result)
+            // Report success only when the server accepted the upload
+            if (result.data.code === 0) {
+                this.$message.success("录音上传成功")
+            } else {
+                this.$message.error("音频上传失败")
+            }
+            // Refresh the file list
+            this.GetList()
+        },
+
+    // Delete the audio file
+    async delWav(wavId){
+            console.log('wavId', wavId)
+            // Delete the file
+            const result = await vcDel(
+                {
+                  wavName: this.vcDatas[wavId]['wavName'],
+                  wavPath: this.vcDatas[wavId]['wavPath']
+                }
+            );
+            if(!result.data.code){
+                this.$message.success("删除成功")
+            } else {
+                this.$message.error(result.data.msg)
+            }
+            await this.GetList()
+            this.selected_Id = -1  // clear the stale selection; this component defines no reset()
+        },
+    
+    // Play a table row
+    async PlayTable(wavId){
+        this.Play(this.vcDatas[wavId])
+    },
+
+    // Play audio
+    async Play(wavBase){
+        // Fetch the audio data
+        const result = await vcDownloadBase64(wavBase);
+        // console.log('play result', result)
+        if (result.data.code === 0) {
+            // Decode base64 into binary data
+            let typedArray = this.base64ToUint8Array(result.data.result)
+            // Add the WAV file header
+            let view = new DataView(typedArray.buffer);
+            view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
+            // Play the audio
+            this.playAudioData(view.buffer);
+        };
+        },
+    // choose wav
+    choseWav(wavId){
+            this.cloneWav = ''
+            this.nowFile = this.vcDatas[wavId].wavName
+            this.nowIndex = wavId
+            // set status to true for wavId only, false for every other row
+            for(let i=0; i<this.vcDatas.length; i++){
+                if(i==wavId){
+                    this.vcDatas[wavId].status = true
+                    this.selected_Id = wavId
+                    this.ttsText = this.vcDatas[wavId]['label']
+                } else {
+                    this.vcDatas[i].status = false
+                }
+            }
+            this.$nextTick(()=>{})
+        },
+
+    // Play audio
+    playAudioData(wav_buffer){
+        audioCtx.decodeAudioData(wav_buffer, buffer => {
+            let source = audioCtx.createBufferSource();
+            source.buffer = buffer
+            source.connect(audioCtx.destination);
+            source.start();
+        }, function (e) {
+        });
+    },
+
+
+    base64ToUint8Array(base64String){
+       const padding = '='.repeat((4 - base64String.length % 4) % 4);
+        const base64 = (base64String + padding)
+            .replace(/-/g, '+')
+            .replace(/_/g, '/');
+    
+        const rawData = window.atob(base64);
+        const outputArray = new Uint8Array(rawData.length);
+    
+        for (let i = 0; i < rawData.length; ++i) {
+            outputArray[i] = rawData.charCodeAt(i);
+        }
+        return outputArray; 
+    },
+
+    // Check whether the string contains Chinese characters
+    hasChinese(str) {
+      return /[\u4E00-\u9FA5]+/g.test(str)
+    },
+
+    // SAT synthesis
+    async SatSyn(){
+      // Check the selected id
+      if(this.selected_Id < 0){
+        return this.$message.error("请先选择音频文件！")
+      }
+
+      // Check the transcript of the selected audio
+      if(!this.vcDatas[this.selected_Id]['label']){
+        return this.$message.error("音频对应文本不可以为空！")
+      }
+
+      // Check the text to synthesize
+      if(!this.ttsText){
+        return this.$message.error("合成文本不可以为空！")
+      }
+
+      // Synthesis in progress
+      this.onSyn = 1
+      // Reset the cloned wav
+      this.cloneWav = ""
+  
+      const old_str = this.vcDatas[this.selected_Id]['label']
+      const new_str = this.ttsText
+      let language = ""
+      // Contains Chinese characters
+      if(this.hasChinese(old_str)){
+        language = "zh"
+      } else{
+        language = "en"
+      }
+      // Select the function
+      let func = ""
+      if(this.funcMode === '1') {
+        func = "synthesize"
+      } else if(this.funcMode === '2'){
+        func = "crossclone"
+      } else {
+        func = "edit"
+      }
+      
+      let wav_path = this.vcDatas[this.selected_Id]['wavPath']
+      let filename = this.vcDatas[this.selected_Id]['wavName']
+
+      const data = {
+        old_str: old_str,
+        new_str: new_str,
+        language: language,
+        function: func,
+        wav: wav_path,
+        filename: filename
+
+      }
+
+      console.log("sat data: ", data)
+      
+      // SAT API call
+      const result = await vcCloneSAT(data)
+      // Synthesis finished
+      this.onSyn = 0
+      console.log(result);
+      // debugger
+      if (result.data.code === 0) {
+
+        this.$message.success(result.data.message)
+        // Store the path of the synthesized wav
+        this.cloneWav = result.data.result
+        console.log("cloneWave", this.cloneWav);
+
+      } else {
+        this.$message.error(result.data.message)
+      };
+    },
+    // Play the synthesized audio
+    // Play audio
+    async PlaySyn(){
+        // Fetch the audio data
+        const data = {
+          wavName: "sat_"+this.filename,
+          wavPath: this.cloneWav
+        }
+        const result = await vcDownloadBase64(data);
+        // console.log('play result', result)
+        if (result.data.code === 0) {
+            // Decode base64 into binary data
+            let typedArray = this.base64ToUint8Array(result.data.result)
+            // Add the WAV file header
+            let view = new DataView(typedArray.buffer);
+            view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
+            // Play the audio
+            this.playAudioData(view.buffer);
+        };
+        },
+
+
+    // Download the synthesized file
+    async downLoadCloneWav(){
+    if(this.cloneWav  === ""){
+        this.$message.error("音频合成完毕后再下载！")
+    } else {
+        // const result = await vcDownload(this.cloneWav);
+        // Fetch the audio data
+        const data = {
+          wavName: "sat_"+this.filename,
+          wavPath: this.cloneWav
+        }
+        const result = await vcDownloadBase64(data);
+        let view;
+        // console.log('play result', result)
+        if (result.data.code !== 0) {
+            // Without audio data there is nothing to download
+            return
+        }
+        // Decode base64 into binary data
+        let typedArray = this.base64ToUint8Array(result.data.result)
+        // Add the WAV file header
+        view = new DataView(typedArray.buffer);
+        view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
+        console.log(view.buffer)
+        // debugger
+        const blob = new Blob([view.buffer], { type: 'audio/wav' });
+        const fileName = new Date().getTime() + '.wav';
+        const down = document.createElement('a');
+        down.download = fileName;
+        down.style.display = 'none'; // hidden; there is no need to show it
+        down.href = URL.createObjectURL(blob);
+        document.body.appendChild(down);
+        down.click();
+        URL.revokeObjectURL(down.href); // Release the object URL
+        document.body.removeChild(down); // Remove the link once the download is triggered
+      }
+    },
+
+}
+}   
+
+</script>
+
+<style lang="less" scoped>
+// @import "./style.less";
+.sat {
+    width: 1200px;
+    height: 410px;
+    background: #FFFFFF;
+    padding: 5px 80px 56px 80px;
+    box-sizing: border-box;
+}
+
+.el-row {
+  margin-bottom: 20px;
+}
+.grid-content {
+  border-radius: 4px;
+  min-height: 36px;
+}
+.play_board{
+    height: 100%;
+    display: flex;
+    align-items: center;
+}
+
+</style>
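
Review note: `Play`, `PlaySyn`, and `downLoadCloneWav` in this file all pass the server response through `base64ToUint8Array`. The sketch below restates that helper outside the component so it can be tried in isolation. It assumes a runtime with a global `atob` (browsers, or Node.js 16 and later), and the sample inputs are made up for illustration rather than taken from the server.

```javascript
// Standalone restatement of the component's base64ToUint8Array helper.
// Assumes a global atob (browsers, Node.js 16+); the inputs below are illustrative.
function base64ToUint8Array(base64String) {
  // Restore the '=' padding that URL-safe encoders usually strip.
  const padding = '='.repeat((4 - (base64String.length % 4)) % 4);
  // Map the URL-safe alphabet ('-', '_') back to the standard one ('+', '/').
  const base64 = (base64String + padding).replace(/-/g, '+').replace(/_/g, '/');
  const rawData = atob(base64);
  const outputArray = new Uint8Array(rawData.length);
  for (let i = 0; i < rawData.length; ++i) {
    outputArray[i] = rawData.charCodeAt(i);
  }
  return outputArray;
}

console.log(Array.from(base64ToUint8Array('SGVsbG8'))); // [72, 101, 108, 108, 111]
console.log(Array.from(base64ToUint8Array('-_8')));     // [251, 255]
```

The same function appears three times in this diff (the removed chat component, this file, and `FineTune.vue`), so moving it into a shared module might be worth considering.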
--- a/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
@ -0,0 +1,427 @@
+<template>
+    <div class="finetune">
+      <el-row :gutter="20"> 
+        <el-col :span="12"><div class="grid-content ep-bg-purple" />
+          <el-row :gutter="60" class="btn_row_wav" justify="center">
+              <el-button class="ml-3" @click="clearAll()" type="primary">一键重置</el-button>
+              <el-button class="ml-3" @click="resetDefault()" type="primary">默认示例</el-button>
+              <el-button v-if='onFinetune === 0' class="ml-3" @click="fineTuneModel()" type="primary">一键微调</el-button>
+              <el-button v-else-if='onFinetune === 1' class="ml-3" @click="fineTuneModel()" type="danger">微调中</el-button>
+              <el-button v-else-if='onFinetune === 2' class="ml-3" @click="resetFinetuneBtn()" type="success">微调成功</el-button>
+              <el-button v-else class="ml-3" @click="resetFinetuneBtn()" type="success">微调失败</el-button>
+              <!-- <el-button class="ml-3" @click="chooseHistory()" type="warning">历史数据选择</el-button> -->
+        </el-row>
+
+        <div class="recording_table">
+            <el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
+                <el-table-column prop="wavId" label="序号" width="60"/>
+                <el-table-column prop="text" label="文本" />
+                <el-table-column label="音频" width="80">
+                    <template #default="scope">
+                        <a v-if="scope.row.wavPath != ''">{{ scope.row.wavName }}</a>
+                        <a v-else>
+                            
+                            <el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary" circle>
+                                <el-icon><Microphone /></el-icon>
+                            </el-button>
+                            <el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger" circle>
+                                <el-icon><Microphone /></el-icon>
+                            </el-button>
+                            <el-button class="ml-3" v-else @click="uploadRecord(scope.row.wavId)" type="success" circle>
+                                <el-icon><Upload /></el-icon>
+                            </el-button>
+                        </a>
+                    </template>
+                </el-table-column>
+                <el-table-column label="操作" width="80" fixed="right">
+                    <template #default="scope">
+                        <div class="flex justify-space-between mb-4 flex-wrap gap-4">
+                            <a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
+                            <a>&#12288;</a>
+                            <a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
+                        </div>
+                    </template>
+                </el-table-column>
+            </el-table>
+        </div>
+
+            </el-col>
+            <el-col :span="8"><div class="grid-content ep-bg-purple" />
+                <el-space direction="vertical">
+                    <el-card class="box-card" style="width: 250px; height:310px">
+                        <template #header>
+                            
+                            <div class="card-header">
+                                <span>试验路径</span>
+                                <el-input
+                                    v-model="expPath"
+                                    :autosize="{ minRows: 2, maxRows: 3 }"
+                                    type="textarea"
+                                    placeholder="一键微调自动生成，可使用历史试验路径"
+                                    />
+                            </div>
+                        </template>
+                        <span>请输入中文文本</span>
+                        <el-input
+                            v-model="ttsText"
+                            :autosize="{ minRows: 5, maxRows: 6 }"
+                            type="textarea"
+                            placeholder="请输入待合成文本"
+                            />
+                    </el-card>                    
+                </el-space>
+            </el-col>
+            <el-col :span="4"><div class="grid-content ep-bg-purple" />
+                <div class="play_board">
+                    <el-space direction="vertical">
+                        <el-row :gutter="20">
+                            <el-button size="large" v-if="onSyn === 0" type="primary" @click="fineTuneSyn()">开始合成</el-button>
+                            <el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
+                        </el-row>
+
+                        <el-row :gutter="20">
+                            <el-button v-if='this.cloneWav' type="success" @click="PlaySyn()">播放</el-button>
+                            <el-button v-else disabled type="primary" @click="PlaySyn()">播放</el-button>
+                            <el-button v-if='this.cloneWav' type="primary" @click="downLoadCloneWav()">下载</el-button>
+                            <el-button v-else disabled type="primary" @click="downLoadCloneWav()">下载</el-button>
+                        </el-row>
+                    </el-space>
+                </div>
+            </el-col>
+        </el-row>
+    </div>
+    </template>
+    
+    <script>
+    import Recorder from 'js-audio-recorder'
+    import { vcDownload, vcDownloadBase64, vcCloneFineTune, vcCloneFineTuneSyn, fineTuneList, vcDel, fineTuneUpload, fineTuneNewDir } from '../../../api/ApiVC';
+    
+    // Initialize the recorder
+    const recorder = new Recorder({
+      sampleBits: 16,                 // Sample bit depth: 8 or 16; default 16
+      sampleRate: 16000,              // Sample rate: 11025, 16000, 22050, 24000, 44100, or 48000; defaults to the browser's rate (48000 in my Chrome)
+      numChannels: 1,                 // Channels: 1 or 2; default 1
+      compiling: true
+    })
+    
+    // Initialize the audio player
+    const audioCtx = new AudioContext({
+        latencyHint: 'interactive',
+        sampleRate: 16000,
+    });
+
+    function blobToDataURL(blob, callback) {
+        let a = new FileReader();
+        a.onload = function (e) { callback(e.target.result); }
+        a.readAsDataURL(blob);
+    }
+
+    
+    export default {
+        data(){
+            return {
+              vcDatas:[],
+              defaultDataPath: 'default',
+              nowDataPath: '',
+              expPath: '',
+              wav: '',
+              wav_base64: '',
+              ttsText: '欢迎使用飞桨语音套件',
+              cloneWav: '',
+              
+              onEnrollRec: 0,  // recording state
+              onFinetune: 0,  // fine-tuning state
+              onSyn: 0, // synthesis state
+            }
+        },
+        mounted () {
+            this.nowDataPath = this.defaultDataPath
+            this.GetList()
+            
+        },
+        methods: {
+            // Reset the fine-tune button
+            resetFinetuneBtn(){
+                this.onFinetune = 0
+            },
+        
+        // One-click reset
+        async clearAll(){
+            this.vcDatas = []
+            const result = await fineTuneNewDir()
+            console.log("clearALL: ", result.data.result);
+            this.nowDataPath = result.data.result
+            this.expPath = ''
+            this.onFinetune = 0
+            await this.GetList()
+        },
+        // Show the default example
+        async resetDefault(){
+            this.nowDataPath = this.defaultDataPath
+            await this.GetList()
+            this.expPath = ''
+        },
+
+        // Start recording
+        startRecorderEnroll(){
+            this.onEnrollRec = 1
+            recorder.clear()
+            recorder.start()
+        },
+        // Stop recording
+        stopRecorderEnroll(){
+            this.onEnrollRec = 2
+            recorder.stop()
+            this.wav = recorder.getWAVBlob()
+        },
+
+        // Upload the recording
+        async uploadRecord(wavId){
+            this.onEnrollRec = 0
+            // Nothing was recorded, so there is nothing to upload
+            if(this.wav === ""){
+                this.$message.error("未检测到录音，录音失败，请重新录制")
+                return
+            }
+
+            // Read the recording as a data URL: "data:<mime>;base64,<payload>"
+            const fileRes = await this.readFile(this.wav);
+            const fileString = fileRes.result;
+
+            // Strip the data-URL prefix so that only the base64 payload is uploaded
+            const audioBase64type = (fileString.match(/data:[^;]*;base64,/))?.[0] ?? '';
+            const uploadBase64 = fileString.slice(audioBase64type.length);
+
+            // Specify the target file path when uploading
+            const data = {
+                'wav': uploadBase64,
+                'filename': this.vcDatas[wavId]['wavName'],
+                'wav_path': this.nowDataPath
+            }
+
+            const result = await fineTuneUpload(data);
+            console.log(result)
+
+            // Refresh the table so the new recording shows up
+            this.GetList()
+            this.$message.success("录音上传成功")
+        },
+        // Read a File or Blob
+        readFile(file) {
+            return new Promise((resolve, reject) => {
+                const fileReader = new FileReader();
+                fileReader.onload = function () {
+                    resolve(fileReader);
+                };
+                fileReader.onerror = function (err) {
+                    reject(err);
+                };
+                fileReader.readAsDataURL(file);
+                });
+            },
+
+            // Fetch the file list
+          async GetList(){
+            this.vcDatas = []
+            const result = await fineTuneList({
+              dataPath: this.nowDataPath
+            });
+            console.log(result, result.data.result);
+            for(let i=0; i<result.data.result.length; i++){
+                this.vcDatas.push({
+                  wavId: i,
+                  text: result.data.result[i]['text'],
+                  wavName: result.data.result[i]['name'],
+                  wavPath: result.data.result[i]['path'],
+                })
+            }
+            this.$nextTick(()=>{})
+          },
+                  // Play decoded audio through the AudioContext
+    playAudioData( wav_buffer ) {
+        audioCtx.decodeAudioData(wav_buffer, buffer => {
+            var source = audioCtx.createBufferSource();
+            source.buffer = buffer;
+            source.connect(audioCtx.destination);
+            source.start();
+        }, function(e) {
+            Recorder.throwError(e);
+            })
+    },
+        // Decode base64 (URL-safe alphabet accepted) into bytes
+        base64ToUint8Array(base64String) {
+        const padding = '='.repeat((4 - base64String.length % 4) % 4);
+        const base64 = (base64String + padding)
+                        .replace(/-/g, '+')
+                        .replace(/_/g, '/');
+
+        const rawData = window.atob(base64);
+        const outputArray = new Uint8Array(rawData.length);
+
+        for (let i = 0; i < rawData.length; ++i) {
+                outputArray[i] = rawData.charCodeAt(i);
+        }
+        return outputArray;
+    },
+            // Play a recording from the table
+        async PlayTable(wavId){
+            this.Play(this.vcDatas[wavId])
+        },
+        // Play the synthesized audio
+        async PlaySyn(){
+           
+            if(this.cloneWav  === ""){
+                this.$message.error("请合成音频后再播放！！")
+                return
+            } else {
+                this.Play(this.cloneWav)
+            }
+        },
+        // Play audio
+        async Play(wavBase){
+                // Fetch the audio data
+                const result = await vcDownloadBase64(wavBase);
+                // console.log('play result', result)
+                if (result.data.code === 0) {
+                    // Convert base64 to binary
+                    let typedArray = this.base64ToUint8Array(result.data.result)
+                    // Add the WAV file header
+                    let view = new DataView(typedArray.buffer);
+                    view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
+                    // Play the audio
+                    this.playAudioData(view.buffer);
+                } else {
+                    this.$message.error("获取音频文件失败")
+                }
+        },
+                // Download the synthesized file
+        async downLoadCloneWav(){
+            if(this.cloneWav  === ""){
+                this.$message.error("音频合成完毕后再下载！")
+            } else {
+                // const result = await vcDownload(this.cloneWav);
+                // Fetch the audio data
+                const result = await vcDownloadBase64(this.cloneWav);
+                let view;
+                // console.log('play result', result)
+                if (result.data.code === 0) {
+                    // Convert base64 to binary
+                    let typedArray = this.base64ToUint8Array(result.data.result)
+                    // Add the WAV file header
+                    view = new DataView(typedArray.buffer);
+                    view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
+                    // Play the audio
+                    // this.playAudioData(view.buffer);
+                } else {
+                    return this.$message.error("获取音频文件失败")
+                }
+                const blob = new Blob([view.buffer], { type: 'audio/wav' });
+                const fileName = new Date().getTime() + '.wav';
+                const down = document.createElement('a');
+                down.download = fileName;
+                down.style.display = 'none'; // hidden; the link does not need to be shown
+                down.href = URL.createObjectURL(blob);
+                document.body.appendChild(down);
+                down.click();
+                URL.revokeObjectURL(down.href); // release the object URL
+                document.body.removeChild(down); // remove the link once the download has been triggered
+            }
+        },
+        // Delete an audio file
+        async delWav(wavId){
+            if(this.nowDataPath === this.defaultDataPath){
+                this.$message.error("默认音频不允许删除，可以一键重置，重新录音")
+                return 
+            }
+
+            console.log('wavId', wavId)
+            // Delete the file
+            const result = await vcDel(
+                {
+                    wavName: this.vcDatas[wavId]['wavName'],
+                    wavPath: this.vcDatas[wavId]['wavPath']
+                }
+            );
+            if(!result.data.code){
+                this.$message.success("删除成功")
+                this.GetList()
+            } else {
+                this.$message.error("文件删除失败")
+            }
+        }, 
+        // Fine-tune the model
+        async fineTuneModel(){
+            // First check that every entry has a recording
+            for(let i=0; i < this.vcDatas.length; i++){
+                if(this.vcDatas[i]['wavPath'] === ''){
+                    return this.$message.error("还有录音未完成，请先完成录音！")
+                }
+            }
+            this.onFinetune = 1
+            const result = await vcCloneFineTune(
+                {
+                    wav_path: this.nowDataPath,
+                }
+            );
+            if(!result.data.code){
+                this.onFinetune = 2
+                this.expPath = result.data.result
+                console.log("this.expPath: ", this.expPath)
+                this.$message.success("小数据微调成功")
+            } else {
+                this.onFinetune = 3
+                this.$message.error(result.data.msg)
+            }
+        },
+        // Synthesize audio
+        async fineTuneSyn(){
+            if(!this.expPath){
+                return this.$message.error("请先微调生成模型后再生成！")
+            }
+            // Synthesize
+            this.onSyn = 1
+            const result = await vcCloneFineTuneSyn(
+                {
+                    exp_path: this.expPath,
+                    text: this.ttsText
+                }
+            );
+            this.onSyn = 0
+            if(!result.data.code){
+                this.cloneWav = result.data.result
+                console.log("clone wav: ", this.cloneWav)
+                this.$message.success("音色克隆成功")
+            } else {
+                this.$message.error(result.data.msg)
+            }
+            this.$nextTick(()=>{})
+        }
+},
+};
+</script>
+    
+<style lang="less" scoped>
+// @import "./style.less";
+.finetune {
+  width: 1200px;
+  height: 410px;
+  background: #FFFFFF;
+  padding: 5px 80px 56px 80px;
+  box-sizing: border-box;
+}
+.el-row {
+  margin-bottom: 20px;
+}
+.grid-content {
+  border-radius: 4px;
+  min-height: 36px;
+}
+.play_board{
+    height: 100%;
+    display: flex;
+    align-items: center;
+}
+</style>
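The `uploadRecord` method in the component above reads the recording as a data URL and strips the `data:...;base64,` prefix before uploading. A minimal sketch of that step as a standalone helper follows. `splitDataUrl` is a hypothetical name that does not appear in this commit, and unlike the original regex this one is anchored to the start of the string.

```javascript
// Hypothetical helper (not part of this commit): split a data URL into its
// MIME type and base64 payload. Input without a data-URL prefix is returned
// unchanged as the payload, matching the fallback behavior in uploadRecord.
function splitDataUrl(dataUrl) {
  const match = dataUrl.match(/^data:([^;,]*);base64,/);
  if (!match) {
    return { mimeType: '', base64: dataUrl };
  }
  return {
    mimeType: match[1],
    base64: dataUrl.slice(match[0].length),
  };
}

const parts = splitDataUrl('data:audio/wav;base64,UklGRg==');
console.log(parts.mimeType); // audio/wav
console.log(parts.base64);   // UklGRg==
```

Returning both parts from a single match avoids running the same regex twice, as the original does for `audioBase64type` and `isBase64`.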
--- a/demos/speech_web/web_client/src/components/SubMenu/IE/IE.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/IE/IE.vue
@ -1,125 +0,0 @@
-<template>
-    <div class="iebox">
-        <h1>信息抽取体验</h1>
-        <el-button :type="recoType" @click="startRecorder()"  style="margin:1vw;">{{ recoText }}</el-button>
-        <h3>识别结果: {{ asrResultOffline }}</h3>
-        <h4>时间：{{ time }}</h4>
-        <h4>出发地：{{ outset }}</h4>
-        <h4>目的地：{{ destination }}</h4>
-        <h4>费用：{{ amount }}</h4>
-
-    </div>
-</template>
-
-<script>
-import Recorder from 'js-audio-recorder'
-
-const recorder = new Recorder({
-  sampleBits: 16,                 // 采样位数，支持 8 或 16，默认是16
-  sampleRate: 16000,              // 采样率，支持 11025、16000、22050、24000、44100、48000，根据浏览器默认值，我的chrome是48000
-  numChannels: 1,                 // 声道，支持 1 或 2， 默认是1
-  compiling: true
-})
-
-    export default {
-        name: "IE",
-        data(){
-            return {
-                streamAsrResult: '',
-                recoType: "primary",
-                recoText: "开始录音",
-                playType: "success",
-                asrResultOffline: '',
-                onReco: false,
-                ws:'',
-
-                time: '',
-                outset: '',
-                destination: '',
-                amount: ''
-
-            }
-        },
-        methods: {
-            startRecorder () {
-                if(!this.onReco){
-                    recorder.clear()
-                    recorder.start().then(() => {
-                    }, (error) => {
-                    console.log("录音出错");
-                })
-                this.onReco = true
-                this.recoType = "danger"
-                this.recoText = "结束录音"
-                
-                this.time = ''
-                this.outset=''
-                this.destination = ''
-                this.amount = ''
-
-                this.$nextTick(()=>{
-                })
-                } else {
-                // 结束录音
-                    recorder.stop()
-                    this.onReco = false
-                    this.recoType = "primary"
-                    this.recoText = "开始录音"
-                    this.$nextTick(()=>{})
-                    // 音频导出成wav,然后上传到服务器
-                    const wavs = recorder.getWAVBlob()
-                    this.uploadFile(wavs, "/api/asr/offline")
-                }
-            },
-            async uploadFile(file, post_url){
-                const formData = new FormData()
-                formData.append('files', file)
-                const result = await this.$http.post(post_url, formData);
-                if (result.data.code === 0) {
-                    this.asrResultOffline = result.data.result
-                    this.$nextTick(()=>{})
-                    this.$message.success(result.data.message);
-                    this.informationExtract()
-                } else {
-                    this.$message.error(result.data.message);
-                }
-            },
-            async informationExtract(){
-                const postdata = {
-                    chat: this.asrResultOffline
-                }
-                const result = await this.$http.post('/api/nlp/ie', postdata)
-                console.log("ie", result)
-
-                                if(result.data.result[0]['时间']){
-                    this.time = result.data.result[0]['时间'][0]['text']
-                }
-                
-                if(result.data.result[0]['出发地']){
-                    this.outset = result.data.result[0]['出发地'][0]['text']
-                }
-
-                if(result.data.result[0]['目的地']){
-                    this.destination = result.data.result[0]['目的地'][0]['text']
-                }
-
-                if(result.data.result[0]['费用']){
-                    this.amount = result.data.result[0]['费用'][0]['text']
-                }
-            }
-
-        },
-
-        
-    }
-</script>
-
-<style lang="less" scoped>
- .iebox {
-  border: 4px solid #F00;
-  top:80%;
-  width: 100%;
-  height: 20%;
-  overflow: auto;
- }
-</style>
--- a/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue
@ -228,6 +228,10 @@ export default {
        },
        // 基于WS的流式合成
        async getTtsChunkWavWS(){
+            if(this.ws.readyState !== this.ws.OPEN){
+                this.$message.error("websocket 链接失败，请检查 Websocket 后端服务是否正确开启")
+                return
+            }
            // 初始化 chunks
            chunks = []
            chunk_index = 0
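The guard added in this hunk compares `this.ws.readyState` with `this.ws.OPEN`. If `this.ws` were ever a non-WebSocket value such as an empty string (an assumption; its initial value is not shown in this hunk), both sides would be `undefined` and the check would pass silently. A minimal sketch of a stricter check, using the hypothetical helper name `isSocketReady`:

```javascript
// WebSocket.OPEN is defined as 1 by the WebSocket API; it is hard-coded here
// so the helper also runs where no global WebSocket exists.
const WS_OPEN = 1;

// Hypothetical helper (not part of this commit): true only for an object
// whose readyState equals OPEN. Empty strings, null, and sockets that are
// connecting, closing, or closed all yield false.
function isSocketReady(ws) {
  return Boolean(ws) && ws.readyState === WS_OPEN;
}

console.log(isSocketReady({ readyState: 1 })); // true
console.log(isSocketReady({ readyState: 3 })); // false
console.log(isSocketReady(''));                // false
```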
--- a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPR.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/VPR/VPR.vue
@ -1,178 +0,0 @@
-<template>
-<div class="vprbox">
-        <div>
-      <h1>声纹识别展示</h1>
-    <el-input
-      v-model="spk_id"
-      class="w-50 m-2"
-      size="large"
-      placeholder="spk_id"
-    />
-    <el-button :type="recoType" @click="startRecorder()"  style="margin:1vw;">{{ recoText }}</el-button>
-    <el-button type="primary" @click="Enroll(spk_id)"  style="margin:1vw;"> 注册 </el-button>
-    <el-button type="primary" @click="Recog()"  style="margin:1vw;"> 识别 </el-button>
-    </div>
-    <div>
-        <h2>声纹得分结果</h2>
-        <el-table :data="score_result" style="width: 40%">
-            <el-table-column prop="spkId" label="spk_id" />
-            <el-table-column prop="score" label="score" />
-        </el-table>
-    </div>
-    <div>
-        <h2>声纹数据列表</h2>
-        <el-table :data="vpr_datas" style="width: 40%">
-            <el-table-column prop="spkId" label="spk_id" />
-            <el-table-column label="wav">
-                <template #default="scope2">
-                    <audio :src="'/VPR/vpr/data/?vprId='+scope2.row.vprId" controls>
-                    
-                    </audio>
-                </template>
-            </el-table-column>
-            <el-table-column fixed="right" label="Operations">
-                <template #default="scope">
-                    <el-button @click="Del(scope.row.spkId)" type="text" size="small">Delete</el-button>
-                </template>
-            </el-table-column>
-        </el-table>
-
-    </div>
-    
-</div>
-
-</template>
-
-<script>
-import Recorder from 'js-audio-recorder'
-const recorder = new Recorder({
-  sampleBits: 16,                 // 采样位数，支持 8 或 16，默认是16
-  sampleRate: 16000,              // 采样率，支持 11025、16000、22050、24000、44100、48000，根据浏览器默认值，我的chrome是48000
-  numChannels: 1,                 // 声道，支持 1 或 2， 默认是1
-  compiling: true
-})
-
-
-    export default {
-        name: "VPR",
-        data () {
-            return {
-                url_enroll: '/VPR/vpr/enroll', //注册
-                url_recog: '/VPR/vpr/recog',  //识别
-                url_del: '/VPR/vpr/del',    // 删除
-                url_list: '/VPR/vpr/list',   // 获取列表
-                url_data: '/VPR/vpr/data',   // 获取音频
-
-                spk_id: 'sss',
-                onRecord: false,
-                recoType: "primary",
-                recoText: "开始录音",
-                wav: '',
-
-                score_result: [],
-                vpr_datas: []
-            }
-        },
-        mounted () {
-            this.GetList()
-        },
-        methods: {
-            startRecorder () {
-                this.score_result = []
-                if(!this.onReco){
-                        recorder.start().then(() => {
-                    }, (error) => {
-                    console.log("录音出错");
-                })
-                this.onReco = true
-                this.recoType = "danger"
-                this.recoText = "结束录音"
-                this.$nextTick(()=>{
-                })
-                } else {
-                // 结束录音
-                    recorder.stop()
-                    this.onReco = false
-                    this.recoType = "primary"
-                    this.recoText = "开始录音"
-                    this.$nextTick(()=>{})
-                    // 音频导出成wav,然后上传到服务器
-                    this.wav = recorder.getWAVBlob()
-                }
-            },
-            async Enroll(spk_id){
-                if(this.wav === ''){
-                    this.$message.error("请先完成录音");
-                    return
-                }
-                let formData = new FormData()
-                formData.append('spk_id', this.spk_id)
-                formData.append('audio', this.wav)
-
-                console.log("formData", formData)
-                console.log("spk_id", this.spk_id)
-                const result = await this.$http.post(this.url_enroll, formData);
-                if(result.data.status){
-                    this.$message.success("声纹注册成功")
-                } else {
-                    this.$message.error(result.data.msg)
-                }
-                console.log(result)
-                this.GetList()
-            },
-            async Recog(){
-                this.score_result = []
-                if(this.wav === ''){
-                    this.$message.error("请先完成录音");
-                    return
-                }
-                let formData = new FormData()
-                formData.append('audio', this.wav)
-                const result = await this.$http.post(this.url_recog, formData);
-                console.log(result)
-                result.data.forEach(dat => {
-                    this.score_result.push({
-                        spkId: dat[0],
-                        score: dat[1][1]
-                    })
-                });
-            },
-            async Del(spkId){
-                console.log('spkId', spkId)
-                // 删除用户
-                const result = await this.$http.post(this.url_del, {spk_id: spkId});
-                if(result.data.status){
-                    this.$message.success("删除成功")
-                } else {
-                    this.$message.error(result.data.msg)
-                }
-                this.GetList()
-            },
-            async GetList(){
-                this.vpr_datas =[]
-                const result = await this.$http.get(this.url_list);
-                console.log("list", result)
-                for(let i=0; i<result.data[0].length; i++){
-                    this.vpr_datas.push({
-                        spkId: result.data[0][i],
-                        vprId: result.data[1][i]
-                    })
-                }
-                this.$nextTick(()=>{})
-            },
-            GetData(){},
-        },
-
-    }
-</script>
-
-<style lang='less' scoped>
-.vprbox {
-  border: 4px solid #F00;
-//   position: fixed;
-  top:60%;
-  width: 100%;
-  height: 20%;
-  overflow: auto;
- }
-</style>
--- a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue
@ -214,14 +214,17 @@ export default {
                let formData = new FormData()
                formData.append('spk_id', this.enrollSpkId)
                formData.append('audio', this.wav)
-
+                
                const result = await vprEnroll(formData)
+                if (!result){
+                    this.$message.error("请检查后端服务是否正确开启")
+                    return 
+                }
                if(result.data.status){
                    this.$message.success("声纹注册成功")
                } else {
                    this.$message.error(result.data.msg)
                }
-                // console.log(result)
                this.GetList()
                this.wav = ''
                this.randomSpkId()
--- a/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
@ -0,0 +1,380 @@
+<template>
+    <div class="voiceclone">
+        <el-row :gutter="20">
+            <el-col :span="12"><div class="grid-content ep-bg-purple" />
+                <el-row :gutter="60" class="btn_row_wav" justify="center">
+                    <el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary">录制音频</el-button>
+                    <el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger">停止录音</el-button>
+                    <el-button class="ml-3" v-else @click="uploadRecord()" type="success">上传录音</el-button>
+                    <a>&#12288;</a>
+                    <el-upload
+                        :multiple="false"
+                        :accept="'.wav'"
+                        :auto-upload="false"
+                        :on-change="handleChange"
+                        :show-file-list="false"
+                    >
+                        <el-button class="ml-3" type="success">上传音频文件</el-button>
+                    </el-upload>
+                </el-row>
+                <div class="recording_table">
+                <el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
+                    <el-table-column prop="wavId" label="序号" width="60"/>
+                    <el-table-column prop="wavName" label="文件名" />
+                    <el-table-column label="操作" width="80">
+                        <template #default="scope">
+                            <div class="flex justify-space-between mb-4 flex-wrap gap-4">
+                                <a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
+                                <a>&#12288;</a>
+                                <a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
+                            </div>
+                        </template>
+                    </el-table-column>
+                    <el-table-column fixed="right" label="选择" width="70">
+                        <template #default="scope">
+                            <el-switch v-model="scope.row.status"  @click="choseWav(scope.row.wavId)"/>
+                        </template>
+                    </el-table-column>
+                </el-table>
+                </div>
+
+            </el-col>
+            <el-col :span="8"><div class="grid-content ep-bg-purple" />
+                <el-space direction="vertical">
+                    <el-card class="box-card" style="width: 250px; height:310px">
+                        <template #header>
+                            <div class="card-header">
+                            <span>请输入中文文本</span>
+                            </div>
+                        </template>
+                        <div class="mb-2 flex items-center text-sm">
+                            <el-radio-group v-model="func_radio" class="ml-4">
+                            <el-radio label="1" size="large">GE2E</el-radio>
+                            <el-radio label="2" size="large">ECAPA-TDNN</el-radio>
+                            </el-radio-group>
+                        </div>
+                        <el-input
+                            v-model="ttsText"
+                            :autosize="{ minRows: 8, maxRows: 13 }"
+                            type="textarea"
+                            placeholder="Please input"
+                            />
+                    </el-card>                    
+                </el-space>
+            </el-col>
+            <el-col :span="4"><div class="grid-content ep-bg-purple" />
+                <div class="play_board">
+                    <el-space direction="vertical">
+                        <el-row :gutter="20">
+                            <el-button size="large" v-if="g2pOnSys === 0" type="primary" @click="g2pClone()">开始合成</el-button>
+                            <el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
+                        </el-row>
+
+                        <el-row :gutter="20">
+                            <el-button v-if='this.cloneWav' type="success" @click="PlaySyn()">播放</el-button>
+                            <el-button v-else disabled type="primary" @click="PlaySyn()">播放</el-button>
+                            <el-button v-if='this.cloneWav' type="primary" @click="downLoadCloneWav()">下载</el-button>
+                            <el-button v-else disabled type="primary" @click="downLoadCloneWav()">下载</el-button>
+                        </el-row>
+                    </el-space>
+                </div>
+            </el-col>
+        </el-row>
+    </div>
+</template>
+
+<script>
+
+import Recorder from 'js-audio-recorder'
+import { vcCloneG2P, vcCloneSAT, vcDel, vcUpload, vcList, vcDownload, vcDownloadBase64 } from '../../../api/ApiVC';
+
+// Initialize the recorder
+const recorder = new Recorder({
+  sampleBits: 16,                 // bit depth: 8 or 16, default 16
+  sampleRate: 16000,              // sample rate: 11025, 16000, 22050, 24000, 44100, or 48000; defaults to the browser's value (48000 in the author's Chrome)
+  numChannels: 1,                 // channels: 1 or 2, default 1
+  compiling: true
+})
+
+// Initialize the audio player
+const audioCtx = new AudioContext({
+    latencyHint: 'interactive',
+    sampleRate: 16000,
+});
+
+export default {
+    data(){
+         return {
+            onEnrollRec: 0,     // enrollment recording state
+            wav: '',            // recording result
+            vcDatas: [],       // recorded audio files
+            nowFile: "",        // currently selected audio file
+            ttsText: "欢迎使用飞桨语音套件",
+            nowIndex: -1,
+            cloneWav: "",
+            g2pOnSys: 0,
+            func_radio: '1',
+         }
+    },
+    mounted () {
+        this.GetList()
+    },
+    methods:{
+        // Reset
+        reset(){
+            this.onEnrollRec = 0
+            this.wav = ''
+            this.vcDatas = []
+            this.nowFile = ""
+            this.ttsText = "欢迎使用飞桨语音套件"
+            this.nowIndex = -1
+        },
+        // Start recording
+        startRecorderEnroll(){
+            this.onEnrollRec = 1
+            recorder.clear()
+            recorder.start()
+        },
+        // Stop recording
+        stopRecorderEnroll(){
+            this.onEnrollRec = 2
+            recorder.stop()
+            this.wav = recorder.getWAVBlob()
+        },
+        // Select a wav
+        choseWav(wavId){
+            this.cloneWav = ''
+            this.nowFile = this.vcDatas[wavId].wavName
+            this.nowIndex = wavId
+            // set status to true only for wavId; false for all other rows
+            for(let i=0; i<this.vcDatas.length; i++){
+                if(i==wavId){
+                    this.vcDatas[wavId].status = true
+                } else {
+                    this.vcDatas[i].status = false
+                }
+            }
+            this.$nextTick(()=>{})
+        },
+        // Upload the recording
+        async uploadRecord(){
+            this.onEnrollRec = 0
+            if(this.wav === ""){
+                this.$message.error("未检测到录音，录音失败，请重新录制")
+                return
+            } else {
+                if(this.wav === ''){
+                    this.$message.error("请先完成录音");
+                    this.onEnrollRec = 0
+                    return
+                } else {
+                    let formData = new FormData();
+                    formData.append('files', this.wav);
+                    const result = await vcUpload(formData);
+                    console.log(result)
+                    this.GetList() 
+                }
+                this.$message.success("录音上传成功")
+            }
+        }, 
+        // Handle changes to the upload list
+        async handleChange(file, fileList){
+            for(let i=0; i<fileList.length; i++){
+                this.uploadFile(fileList[i])
+            } 
+        },
+
+        // Upload an audio file
+        async uploadFile(file){
+            let formData = new FormData();
+            formData.append('files', file.raw);
+            const result = await vcUpload(formData);
+            if (result.data.code === 0) {
+                this.$message.success("音频上传成功")
+                this.GetList()
+            } else {
+                this.$message.error("音频上传失败")
+            }
+        },
+        // Fetch the file list
+        async GetList(){
+            this.vcDatas =[]
+            const result = await vcList();
+            for(let i=0; i<result.data.result.length; i++){
+                this.vcDatas.push({
+                    wavName: result.data.result[i]['name'],
+                    wavId: i,
+                    wavPath: result.data.result[i]['path'],
+                    status: false
+                })
+            }
+            this.$nextTick(()=>{})
+        },
+        // Delete an audio file
+        async delWav(wavId){
+            console.log('wavId', wavId)
+            // Delete the file
+            const result = await vcDel(
+                {
+                    wavName: this.vcDatas[wavId]['wavName'],
+                    wavPath: this.vcDatas[wavId]['wavPath']
+                }
+            );
+            if(!result.data.code){
+                this.$message.success("删除成功")
+            } else {
+                this.$message.error(result.data.msg)
+            }
+            this.GetList()
+            this.reset()
+        },
+        // Download the synthesized file
+        async downLoadCloneWav(){
+            if(this.cloneWav  === ""){
+                this.$message.error("音频合成完毕后再下载！")
+            } else {
+                // const result = await vcDownload(this.cloneWav);
+                // Fetch the audio data
+                const result = await vcDownloadBase64(this.cloneWav);
+                let view;
+                // console.log('play result', result)
+                if (result.data.code === 0) {
+                    // Convert base64 to binary
+                    let typedArray = this.base64ToUint8Array(result.data.result)
+                    // Add the WAV file header
+                    view = new DataView(typedArray.buffer);
+                    view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
+                    // Play the audio
+                    // this.playAudioData(view.buffer);
+                } else {
+                    return this.$message.error("获取音频文件失败")
+                }
+                const blob = new Blob([view.buffer], { type: 'audio/wav' });
+                const fileName = new Date().getTime() + '.wav';
+                const down = document.createElement('a');
+                down.download = fileName;
+                down.style.display = 'none'; // hidden; the link does not need to be shown
+                down.href = URL.createObjectURL(blob);
+                document.body.appendChild(down);
+                down.click();
+                URL.revokeObjectURL(down.href); // release the object URL
+                document.body.removeChild(down); // remove the link once the download has been triggered
+            }
+        },
+        // g2p voice clone
+        async g2pClone(){
+            if(this.nowIndex === -1){
+                return this.$message.error("请先录音并上传，选择音频后再点击合成")
+            } else if (this.ttsText === ""){
+                return this.$message.error("合成文本不可以为空")
+            } else if (this.nowIndex >= this.vcDatas.length){
+                return this.$message.error("当前序号不可以超过音频个数")
+            }
+            this.cloneWav = ""
+            let func = ''
+            if(this.func_radio === '1'){
+                func = 'ge2e'
+            } else {
+                func = 'ecapa_tdnn'
+            }
+            console.log('func', func)
+
+            // Synthesize
+            this.g2pOnSys = 1
+            const result = await vcCloneG2P(
+                {
+                    wavName: this.vcDatas[this.nowIndex]['wavName'],
+                    wavPath: this.vcDatas[this.nowIndex]['wavPath'],
+                    text: this.ttsText,
+                    func: func
+                }
+            );
+            this.g2pOnSys = 0
+            if(result.data.code == 0){
+                this.cloneWav = result.data.result
+                console.log("clone wav: ", this.cloneWav)
+                this.$message.success("音频合成成功")
+            } else {
+                this.$message.error("音频合成失败，请检查后台错误后重试！")
+            }
+        },
+        // Play a recording from the table
+        async PlayTable(wavId){
+            this.Play(this.vcDatas[wavId])
+        },
+        // Play the synthesized audio
+        async PlaySyn(){
+            if(this.cloneWav  === ""){
+                this.$message.error("请合成音频后再播放！！")
+                return
+            } else {
+                this.Play(this.cloneWav)
+            }
+        },
+        // Play audio
+        async Play(wavBase){
+                // Fetch the audio data
+                const result = await vcDownloadBase64(wavBase);
+                // console.log('play result', result)
+                if (result.data.code === 0) {
+                    // Convert base64 to binary
+                    let typedArray = this.base64ToUint8Array(result.data.result)
+                    // Add the WAV file header
+                    let view = new DataView(typedArray.buffer);
+                    view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
+                    // Play the audio
+                    this.playAudioData(view.buffer);
+                }
+        },
+        // Decode base64 (URL-safe alphabet accepted) into bytes
+        base64ToUint8Array(base64String) {
+            const padding = '='.repeat((4 - base64String.length % 4) % 4);
+            const base64 = (base64String + padding)
+                            .replace(/-/g, '+')
+                            .replace(/_/g, '/');
+
+            const rawData = window.atob(base64);
+            const outputArray = new Uint8Array(rawData.length);
+
+            for (let i = 0; i < rawData.length; ++i) {
+                    outputArray[i] = rawData.charCodeAt(i);
+            }
+            return outputArray;
+        }, 
+        // Play audio data
+        playAudioData( wav_buffer ) {
+            audioCtx.decodeAudioData(wav_buffer, buffer => {
+                var source = audioCtx.createBufferSource();
+                source.buffer = buffer;
+                source.connect(audioCtx.destination);
+                source.start();
+            }, function(e) {
+                Recorder.throwError(e);
+            })
+        },
+    },
+}
+</script>
+
+<style lang="less" scoped>
+// @import "./style.less";
+.voiceclone {
+    width: 1200px;
+    height: 410px;
+    background: #FFFFFF;
+    padding: 5px 80px 56px 80px;
+    box-sizing: border-box;
+}
+.el-row {
+  margin-bottom: 20px;
+}
+.grid-content {
+  border-radius: 4px;
+  min-height: 36px;
+}
+.play_board{
+    height: 100%;
+    display: flex;
+    align-items: center;
+}
+</style>
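The `base64ToUint8Array` helper in the component above pads its input, maps the URL-safe alphabet back to the standard one, and copies the decoded characters into a byte array. A minimal standalone sketch of that step follows. It assumes a runtime with a global `atob` (browsers, recent Node.js) and falls back to Node's `Buffer` otherwise; the name `decodeBase64ToBytes` is hypothetical and not part of the demo.

```javascript
// Hypothetical standalone helper; not part of the PaddleSpeech web demo.
function decodeBase64ToBytes(base64String) {
    // Restore the padding that URL-safe encoders often strip.
    const padding = '='.repeat((4 - (base64String.length % 4)) % 4);
    // Map the URL-safe alphabet ('-', '_') back to the standard one ('+', '/').
    const base64 = (base64String + padding).replace(/-/g, '+').replace(/_/g, '/');

    // The decoded value is a "binary string": one character per byte.
    const rawData = typeof atob === 'function'
        ? atob(base64)
        : Buffer.from(base64, 'base64').toString('binary');

    const bytes = new Uint8Array(rawData.length);
    for (let i = 0; i < rawData.length; ++i) {
        bytes[i] = rawData.charCodeAt(i);
    }
    return bytes;
}

// "UklGRg" is the unpadded base64 encoding of the ASCII bytes "RIFF".
const riffBytes = decodeBase64ToBytes('UklGRg');
console.log(Array.from(riffBytes)); // [ 82, 73, 70, 70 ]
```

The padding step matters because `atob` may reject input whose length is not a multiple of four, which is common for payloads that were encoded with padding stripped.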
--- a/demos/speech_web/web_client/src/main.js
+++ b/demos/speech_web/web_client/src/main.js
@ -1,5 +1,6 @@
 import { createApp } from 'vue'
 import ElementPlus from 'element-plus'
+import * as ElementPlusIconsVue from '@element-plus/icons-vue'
 import 'element-plus/dist/index.css'
 import Antd from 'ant-design-vue';
 import 'ant-design-vue/dist/antd.css';
@ -9,5 +10,8 @@ import axios from 'axios'
 const app = createApp(App)
 app.config.globalProperties.$http = axios

+for (const [key, component] of Object.entries(ElementPlusIconsVue)) {
+    app.component(key, component)
+}
 app.use(ElementPlus).use(Antd)
 app.mount('#app')
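The loop added to `main.js` walks every named export of `@element-plus/icons-vue` and registers it as a global component under its export name. The sketch below shows the same pattern with stand-ins so that it runs without Vue installed: `fakeIcons` and `createFakeApp` are hypothetical placeholders for the icon module and for Vue's `createApp`.

```javascript
// Hypothetical stand-in for the icon module: export name -> component object.
const fakeIcons = {
    Edit: { name: 'Edit' },
    Delete: { name: 'Delete' },
};

// Hypothetical stand-in for a Vue app: it only records registered components.
function createFakeApp() {
    const registry = new Map();
    return {
        component(name, definition) {
            registry.set(name, definition);
            return this; // allow chained calls
        },
        registry,
    };
}

const demoApp = createFakeApp();
// Same shape as the loop in main.js: one registration per named export.
for (const [key, component] of Object.entries(fakeIcons)) {
    demoApp.component(key, component);
}

console.log([...demoApp.registry.keys()]); // [ 'Edit', 'Delete' ]
```

Registering everything up front is convenient for a demo, at the likely cost of bundling icons that are never used; importing individual icons where they are needed is the usual alternative.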
--- a/demos/speech_web/web_client/yarn.lock
+++ b/demos/speech_web/web_client/yarn.lock
@ -44,6 +44,11 @@
  resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-1.1.4.tgz"
  integrity sha512-Iz/nHqdp1sFPmdzRwHkEQQA3lKvoObk8azgABZ81QUOpW9s/lUyQVUSh0tNtEPZXQlKwlSh7SPgoVxzrE0uuVQ==

+"@element-plus/icons-vue@^2.0.9":
+  version "2.0.9"
+  resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-2.0.9.tgz#b7777c57534522e387303d194451d50ff549d49a"
+  integrity sha512-okdrwiVeKBmW41Hkl0eMrXDjzJwhQMuKiBOu17rOszqM+LS/yBYpNQNV5Jvoh06Wc+89fMmb/uhzf8NZuDuUaQ==
+
 "@floating-ui/core@^0.6.1":
  version "0.6.1"
  resolved "https://registry.npmmirror.com/@floating-ui/core/-/core-0.6.1.tgz"
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@ -20,6 +20,7 @@ onnxruntime==1.10.0
 opencc
 paddlenlp
 paddlepaddle>=2.2.2
+paddlespeech_ctcdecoders
 paddlespeech_feat
 pandas
 pathos == 0.2.8
@ -27,8 +28,8 @@ pattern_singleton
 Pillow>=9.0.0
 praatio==5.0.0
 prettytable
-pypinyin<=0.44.0
 pypinyin-dict
+pypinyin<=0.44.0
 python-dateutil
 pyworld==0.2.12
 recommonmark>=0.5.0
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.deploy.predict module
-=================================================
-
-.. automodule:: paddlespeech.cls.exps.panns.deploy.predict
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
@ -12,4 +12,3 @@ Submodules
 .. toctree::
   :maxdepth: 4

-   paddlespeech.cls.exps.panns.deploy.predict
--- a/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.export\_model module
-================================================
-
-.. automodule:: paddlespeech.cls.exps.panns.export_model
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.predict module
-==========================================
-
-.. automodule:: paddlespeech.cls.exps.panns.predict
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.cls.exps.panns.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.rst
@ -20,6 +20,3 @@ Submodules
 .. toctree::
   :maxdepth: 4

-   paddlespeech.cls.exps.panns.export_model
-   paddlespeech.cls.exps.panns.predict
-   paddlespeech.cls.exps.panns.train
--- a/docs/source/api/paddlespeech.cls.exps.panns.train.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.train.rst
@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.train module
-========================================
-
-.. automodule:: paddlespeech.cls.exps.panns.train
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
+++ b/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
@ -1,7 +0,0 @@
-paddlespeech.kws.exps.mdtc.plot\_det\_curve module
-==================================================
-
-.. automodule:: paddlespeech.kws.exps.mdtc.plot_det_curve
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.rst
+++ b/docs/source/api/paddlespeech.kws.exps.mdtc.rst
@ -14,6 +14,5 @@ Submodules

   paddlespeech.kws.exps.mdtc.collate
   paddlespeech.kws.exps.mdtc.compute_det
-   paddlespeech.kws.exps.mdtc.plot_det_curve
   paddlespeech.kws.exps.mdtc.score
   paddlespeech.kws.exps.mdtc.train
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
@ -13,5 +13,4 @@ Submodules
   :maxdepth: 4

   paddlespeech.s2t.decoders.ctcdecoder.decoders_deprecated
-   paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
   paddlespeech.s2t.decoders.ctcdecoder.swig_wrapper
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.ctcdecoder.scorer\_deprecated module
-==============================================================
-
-.. automodule:: paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.recog\_bin module
-===========================================
-
-.. automodule:: paddlespeech.s2t.decoders.recog_bin
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.decoders.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.rst
@ -23,5 +23,4 @@ Submodules
   :maxdepth: 4

   paddlespeech.s2t.decoders.recog
-   paddlespeech.s2t.decoders.recog_bin
   paddlespeech.s2t.decoders.utils
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.scorers.ngram module
-==============================================
-
-.. automodule:: paddlespeech.s2t.decoders.scorers.ngram
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
@ -15,5 +15,4 @@ Submodules
   paddlespeech.s2t.decoders.scorers.ctc
   paddlespeech.s2t.decoders.scorers.ctc_prefix_score
   paddlespeech.s2t.decoders.scorers.length_bonus
-   paddlespeech.s2t.decoders.scorers.ngram
   paddlespeech.s2t.decoders.scorers.scorer_interface
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.client module
-==========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.record module
-==========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
@ -12,8 +12,5 @@ Submodules
 .. toctree::
   :maxdepth: 4

-   paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
-   paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
   paddlespeech.s2t.exps.deepspeech2.bin.deploy.runtime
-   paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
   paddlespeech.s2t.exps.deepspeech2.bin.deploy.server
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.send module
-========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.u2.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2.rst
@ -21,4 +21,3 @@ Submodules
   :maxdepth: 4

   paddlespeech.s2t.exps.u2.model
-   paddlespeech.s2t.exps.u2.trainer
--- a/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.u2.trainer module
-=======================================
-
-.. automodule:: paddlespeech.s2t.exps.u2.trainer
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.u2\_kaldi.bin.recog module
-================================================
-
-.. automodule:: paddlespeech.s2t.exps.u2_kaldi.bin.recog
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
@ -12,6 +12,5 @@ Submodules
 .. toctree::
   :maxdepth: 4

-   paddlespeech.s2t.exps.u2_kaldi.bin.recog
   paddlespeech.s2t.exps.u2_kaldi.bin.test
   paddlespeech.s2t.exps.u2_kaldi.bin.train
--- a/docs/source/api/paddlespeech.s2t.training.extensions.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.rst
@ -15,5 +15,3 @@ Submodules
   paddlespeech.s2t.training.extensions.evaluator
   paddlespeech.s2t.training.extensions.extension
   paddlespeech.s2t.training.extensions.plot
-   paddlespeech.s2t.training.extensions.snapshot
-   paddlespeech.s2t.training.extensions.visualizer
--- a/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.training.extensions.snapshot module
-====================================================
-
-.. automodule:: paddlespeech.s2t.training.extensions.snapshot
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.training.extensions.visualizer module
-======================================================
-
-.. automodule:: paddlespeech.s2t.training.extensions.visualizer
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.training.updaters.rst
+++ b/docs/source/api/paddlespeech.s2t.training.updaters.rst
@ -13,5 +13,4 @@ Submodules
   :maxdepth: 4

   paddlespeech.s2t.training.updaters.standard_updater
-   paddlespeech.s2t.training.updaters.trainer
   paddlespeech.s2t.training.updaters.updater
--- a/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
+++ b/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.training.updaters.trainer module
-=================================================
-
-.. automodule:: paddlespeech.s2t.training.updaters.trainer
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.add\_deltas module
-=============================================
-
-.. automodule:: paddlespeech.s2t.transform.add_deltas
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.channel\_selector module
-===================================================
-
-.. automodule:: paddlespeech.s2t.transform.channel_selector
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.cmvn module
-======================================
-
-.. automodule:: paddlespeech.s2t.transform.cmvn
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.functional.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.functional.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.functional module
-============================================
-
-.. automodule:: paddlespeech.s2t.transform.functional
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.perturb.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.perturb.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.perturb module
-=========================================
-
-.. automodule:: paddlespeech.s2t.transform.perturb
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.rst
@ -1,24 +0,0 @@
-paddlespeech.s2t.transform package
-==================================
-
-.. automodule:: paddlespeech.s2t.transform
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
-Submodules
----------
-
-.. toctree::
-   :maxdepth: 4
-
-   paddlespeech.s2t.transform.add_deltas
-   paddlespeech.s2t.transform.channel_selector
-   paddlespeech.s2t.transform.cmvn
-   paddlespeech.s2t.transform.functional
-   paddlespeech.s2t.transform.perturb
-   paddlespeech.s2t.transform.spec_augment
-   paddlespeech.s2t.transform.spectrogram
-   paddlespeech.s2t.transform.transform_interface
-   paddlespeech.s2t.transform.transformation
-   paddlespeech.s2t.transform.wpe
--- a/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.spec\_augment module
-===============================================
-
-.. automodule:: paddlespeech.s2t.transform.spec_augment
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.spectrogram module
-=============================================
-
-.. automodule:: paddlespeech.s2t.transform.spectrogram
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.transform\_interface module
-======================================================
-
-.. automodule:: paddlespeech.s2t.transform.transform_interface
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.transformation.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.transformation.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.transformation module
-================================================
-
-.. automodule:: paddlespeech.s2t.transform.transformation
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.wpe.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.wpe.rst
@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.wpe module
-=====================================
-
-.. automodule:: paddlespeech.s2t.transform.wpe
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
+++ b/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
@ -1,7 +0,0 @@
-paddlespeech.server.engine.acs.python.acs\_engine module
-========================================================
-
-.. automodule:: paddlespeech.server.engine.acs.python.acs_engine
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.server.engine.acs.python.rst
+++ b/docs/source/api/paddlespeech.server.engine.acs.python.rst
@ -12,4 +12,3 @@ Submodules
 .. toctree::
   :maxdepth: 4

-   paddlespeech.server.engine.acs.python.acs_engine
--- a/docs/source/api/paddlespeech.server.utils.log.rst
+++ b/docs/source/api/paddlespeech.server.utils.log.rst
@ -1,7 +0,0 @@
-paddlespeech.server.utils.log module
-====================================
-
-.. automodule:: paddlespeech.server.utils.log
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.exps.rst
+++ b/docs/source/api/paddlespeech.t2s.exps.rst
@ -30,10 +30,10 @@ Submodules

   paddlespeech.t2s.exps.inference
   paddlespeech.t2s.exps.inference_streaming
+   paddlespeech.t2s.models.vits.monotonic_align
   paddlespeech.t2s.exps.ort_predict
   paddlespeech.t2s.exps.ort_predict_e2e
   paddlespeech.t2s.exps.ort_predict_streaming
-   paddlespeech.t2s.exps.stream_play_tts
   paddlespeech.t2s.exps.syn_utils
   paddlespeech.t2s.exps.synthesize
   paddlespeech.t2s.exps.synthesize_e2e
--- a/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
+++ b/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
@ -1,7 +0,0 @@
-paddlespeech.t2s.exps.stream\_play\_tts module
-==============================================
-
-.. automodule:: paddlespeech.t2s.exps.stream_play_tts
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
+++ b/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
@ -1,7 +0,0 @@
-paddlespeech.t2s.models.ernie\_sat.mlm module
-=============================================
-
-.. automodule:: paddlespeech.t2s.models.ernie_sat.mlm
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
@ -1,7 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align.core module
-=========================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.core
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
@ -1,16 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align package
-=====================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
-Submodules
----------
-
-.. toctree::
-   :maxdepth: 4
-
-   paddlespeech.t2s.models.vits.monotonic_align.core
-   paddlespeech.t2s.models.vits.monotonic_align.setup
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
@ -1,7 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align.setup module
-==========================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.setup
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.vits.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.rst
@ -12,7 +12,6 @@ Subpackages
 .. toctree::
   :maxdepth: 4

-   paddlespeech.t2s.models.vits.monotonic_align
   paddlespeech.t2s.models.vits.wavenet

 Submodules
--- a/docs/source/tts/demo.rst
+++ b/docs/source/tts/demo.rst
--- a/docs/source/tts/demo_2.rst
+++ b/docs/source/tts/demo_2.rst
@ -19,7 +19,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>早上好，今天是2020/10/29，最低温度是-3°C。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/001.wav"
                        type="audio/wav">
@ -27,7 +27,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav"
                        type="audio/wav">
@ -38,7 +38,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>你好，我的编号是37249，很高兴为您服务。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/002.wav"
                        type="audio/wav">
@ -46,7 +46,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/002.wav"
                        type="audio/wav">
@ -57,7 +57,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我们公司有37249个人。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/003.wav"
                        type="audio/wav">
@ -65,7 +65,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/003.wav"
                        type="audio/wav">
@ -76,7 +76,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我出生于2005年10月8日。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/004.wav"
                        type="audio/wav">
@ -84,7 +84,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/004.wav"
                        type="audio/wav">
@ -95,7 +95,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我们习惯在12:30吃中午饭。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/005.wav"
                        type="audio/wav">
@ -103,7 +103,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/005.wav"
                        type="audio/wav">
@ -114,7 +114,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>只要有超过3/4的人投票同意，你就会成为我们的新班长。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/006.wav"
                        type="audio/wav">
@ -122,7 +122,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/006.wav"
                        type="audio/wav">
@ -133,7 +133,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我要买一只价值999.9元的手表。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/007.wav"
                        type="audio/wav">
@ -141,7 +141,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/007.wav"
                        type="audio/wav">
@ -152,7 +152,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我的手机号是18544139121，欢迎来电。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/008.wav"
                        type="audio/wav">
@ -160,7 +160,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/008.wav"
                        type="audio/wav">
@ -171,7 +171,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>明天有62%的概率降雨。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/009.wav"
                        type="audio/wav">
@ -179,7 +179,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/009.wav"
                        type="audio/wav">
@ -190,7 +190,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>手表厂有五种好产品。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/010.wav"
                        type="audio/wav">
@ -198,7 +198,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/010.wav"
                        type="audio/wav">
@ -209,7 +209,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>跑马场有五百匹很勇敢的千里马。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/011.wav"
                        type="audio/wav">
@ -217,7 +217,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/011.wav"
                        type="audio/wav">
@ -228,7 +228,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>有一天，我看到了一栋楼，我顿感不妙，因为我看不清里面有没有人。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/012.wav"
                        type="audio/wav">
@ -236,7 +236,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/012.wav"
                        type="audio/wav">
@ -247,7 +247,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>史小姐拿着小雨伞去找她的老保姆了。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/013.wav"
                        type="audio/wav">
@ -255,7 +255,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/013.wav"
                        type="audio/wav">
@ -266,7 +266,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>不要相信这个老奶奶说的话，她一点儿也不好。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/014.wav"
                        type="audio/wav">
@ -274,7 +274,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/014.wav"
                        type="audio/wav">
--- a/examples/aishell/asr0/local/train.sh
+++ b/examples/aishell/asr0/local/train.sh
@ -26,6 +26,10 @@ if [ ${seed} != 0 ]; then
    export FLAGS_cudnn_deterministic=True
 fi

+# the default memory allocator strategy may cause GPU training to hang,
+# because no OOM error is raised when memory is exhausted
+export FLAGS_allocator_strategy=naive_best_fit
+
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \
--- a/examples/aishell/asr1/local/train.sh
+++ b/examples/aishell/asr1/local/train.sh
@ -35,6 +35,10 @@ echo ${ips_config}

 mkdir -p exp

+# the default memory allocator strategy may cause GPU training to hang,
+# because no OOM error is raised when memory is exhausted
+export FLAGS_allocator_strategy=naive_best_fit
+
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \
--- a/examples/aishell3/ernie_sat/README.md
+++ b/examples/aishell3/ernie_sat/README.md
@ -1,4 +1,4 @@
-# ERNIE-SAT with VCTK dataset
+# ERNIE-SAT with AISHELL-3 dataset
 ERNIE-SAT is a speech-text joint pretraining framework that achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks. It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning.

 ## Model Framework
--- a/examples/aishell3/vc0/local/synthesize.sh
+++ b/examples/aishell3/vc0/local/synthesize.sh
@ -4,8 +4,6 @@ config_path=$1
 train_output_path=$2
 ckpt_name=$3

-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize.py \
    --am=tacotron2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc0/local/voice_cloning.sh
+++ b/examples/aishell3/vc0/local/voice_cloning.sh
@ -6,8 +6,6 @@ ckpt_name=$3
 ge2e_params_path=$4
 ref_audio_dir=$5

-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../voice_cloning.py \
    --am=tacotron2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc1/local/synthesize.sh
+++ b/examples/aishell3/vc1/local/synthesize.sh
@ -4,8 +4,6 @@ config_path=$1
 train_output_path=$2
 ckpt_name=$3

-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc1/local/voice_cloning.sh
+++ b/examples/aishell3/vc1/local/voice_cloning.sh
@ -6,8 +6,6 @@ ckpt_name=$3
 ge2e_params_path=$4
 ref_audio_dir=$5

-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../voice_cloning.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc2/local/synthesize.sh
+++ b/examples/aishell3/vc2/local/synthesize.sh
@ -4,8 +4,6 @@ config_path=$1
 train_output_path=$2
 ckpt_name=$3

-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc2/local/voice_cloning.sh
+++ b/examples/aishell3/vc2/local/voice_cloning.sh
@ -5,8 +5,6 @@ train_output_path=$2
 ckpt_name=$3
 ref_audio_dir=$4

-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../voice_cloning.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
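
The hunks above use two different ways of passing Paddle flags: the `train.sh` scripts `export` `FLAGS_allocator_strategy`, while the lines removed from the `vc0`/`vc1`/`vc2` scripts set it as a prefix on a single `python3` command. The following is a minimal sketch of how the two forms differ in scope. It only inspects the environment seen by a child shell; the helper `show_flag` is hypothetical and nothing here calls Paddle itself.

```shell
#!/usr/bin/env bash
# Sketch: scope of an exported flag versus a per-command prefix.
# show_flag is a hypothetical helper, not part of the repository.

show_flag() {
    # Start a child shell and print the value it inherits, or "unset".
    bash -c 'echo "${FLAGS_allocator_strategy:-unset}"'
}

# Start from a clean state so the result does not depend on the caller.
unset FLAGS_allocator_strategy

# Per-command prefix: the variable exists only for this one child process.
prefixed=$(FLAGS_allocator_strategy=naive_best_fit \
    bash -c 'echo "${FLAGS_allocator_strategy:-unset}"')
after_prefix=$(show_flag)

# export: every child process started later in the script inherits it.
export FLAGS_allocator_strategy=naive_best_fit
exported=$(show_flag)

echo "prefixed=${prefixed} after_prefix=${after_prefix} exported=${exported}"
```

With `export`, both the CPU and the GPU branch of the `if [ ${ngpu} == 0 ]` block inherit the flag without repeating it on each command line.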
--- a/examples/aishell3_vctk/ernie_sat/README.md
+++ b/examples/aishell3_vctk/ernie_sat/README.md
@ -1,4 +1,4 @@
-# ERNIE-SAT with VCTK dataset
+# ERNIE-SAT with AISHELL-3 and VCTK datasets
 ERNIE-SAT is a speech-text joint pretraining framework that achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks. It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning.

 ## Model Framework
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-base.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-base.yaml
@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params: 
+    pretrained_token: ernie-3.0-base-zh
+    punc_path: data/iwslt2012_zh/punc_vocab
+    seq_len: 100
+
+
+###########################################################
+#                       MODEL SETTING                     #
+###########################################################
+model_type: ErnieLinear
+model:
+    pretrained_token: ernie-3.0-base-zh
+    num_classes: 4
+
+###########################################################
+#                     OPTIMIZER SETTING                   #
+###########################################################
+optimizer_params:
+    weight_decay: 1.0e-6               # weight decay coefficient.
+
+scheduler_params:
+    learning_rate: 1.0e-5               # learning rate.
+    gamma: 0.9999                          # scheduler gamma must be in (0.0, 1.0); values closer to 1.0 are better.
+
+###########################################################
+#                     TRAINING SETTING                    #
+###########################################################
+max_epoch: 20
+num_snapshots: 5
+
+###########################################################
+#                     OTHER SETTING                       #
+###########################################################
+num_snapshots: 10                 # max number of snapshots to keep while training
+seed: 42                          # random seed for paddle, random, and np.random
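
To get a feel for what `learning_rate: 1.0e-5` and `gamma: 0.9999` imply, here is a rough sketch that assumes the scheduler is a plain exponential decay, i.e. `lr_n = learning_rate * gamma^n` after `n` scheduler steps. That formula, and whether a step means a batch or an epoch, are assumptions; the config itself does not say. The function name `decayed_lr` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch under an assumed exponential-decay schedule: lr_n = lr_0 * gamma^n.
# The two values are copied from scheduler_params in the config above.
learning_rate=1.0e-5
gamma=0.9999

decayed_lr() {
    # $1 = number of scheduler steps taken so far
    awk -v lr="$learning_rate" -v g="$gamma" -v n="$1" \
        'BEGIN { printf "%.3e\n", lr * g^n }'
}

# Print the learning rate at a few points in training.
for steps in 0 1000 10000; do
    echo "steps=${steps} lr=$(decayed_lr "${steps}")"
done
```

Under this assumption the rate falls to about 37% of its initial value after 10000 steps, since 0.9999^10000 is close to 1/e. A smaller gamma would shrink the rate much faster, which is one reading of the comment that values closer to 1.0 are preferred.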
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-medium.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-medium.yaml
@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params: 
+    pretrained_token: ernie-3.0-medium-zh
+    punc_path: data/iwslt2012_zh/punc_vocab
+    seq_len: 100
+
+
+###########################################################
+#                       MODEL SETTING                     #
+###########################################################
+model_type: ErnieLinear
+model:
+    pretrained_token: ernie-3.0-medium-zh
+    num_classes: 4
+
+###########################################################
+#                     OPTIMIZER SETTING                   #
+###########################################################
+optimizer_params:
+    weight_decay: 1.0e-6               # weight decay coefficient.
+
+scheduler_params:
+    learning_rate: 1.0e-5               # learning rate.
+    gamma: 0.9999                          # scheduler gamma must be in (0.0, 1.0); values closer to 1.0 are better.
+
+###########################################################
+#                     TRAINING SETTING                    #
+###########################################################
+max_epoch: 20
+num_snapshots: 5
+
+###########################################################
+#                     OTHER SETTING                       #
+###########################################################
+num_snapshots: 10                 # max number of snapshots to keep while training
+seed: 42                          # random seed for paddle, random, and np.random
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-mini.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-mini.yaml
@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params: 
+    pretrained_token: ernie-3.0-mini-zh
+    punc_path: data/iwslt2012_zh/punc_vocab
+    seq_len: 100
+
+
+###########################################################
+#                       MODEL SETTING                     #
+###########################################################
+model_type: ErnieLinear
+model:
+    pretrained_token: ernie-3.0-mini-zh
+    num_classes: 4
+
+###########################################################
+#                     OPTIMIZER SETTING                   #
+###########################################################
+optimizer_params:
+    weight_decay: 1.0e-6               # weight decay coefficient.
+
+scheduler_params:
+    learning_rate: 1.0e-5               # learning rate.
+    gamma: 0.9999                          # scheduler gamma must be in (0.0, 1.0); values closer to 1.0 are better.
+
+###########################################################
+#                     TRAINING SETTING                    #
+###########################################################
+max_epoch: 20
+num_snapshots: 5
+
+###########################################################
+#                     OTHER SETTING                       #
+###########################################################
+num_snapshots: 10                 # max number of snapshots to keep while training
+seed: 42                          # random seed for paddle, random, and np.random
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-nano-zh.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-nano-zh.yaml
@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params: 
+    pretrained_token: ernie-3.0-nano-zh
+    punc_path: data/iwslt2012_zh/punc_vocab
+    seq_len: 100
+
+
+###########################################################
+#                       MODEL SETTING                     #
+###########################################################
+model_type: ErnieLinear
+model:
+    pretrained_token: ernie-3.0-nano-zh
+    num_classes: 4
+
+###########################################################
+#                     OPTIMIZER SETTING                   #
+###########################################################
+optimizer_params:
+    weight_decay: 1.0e-6               # weight decay coefficient.
+
+scheduler_params:
+    learning_rate: 1.0e-5               # learning rate.
+    gamma: 0.9999                          # scheduler gamma must be in (0.0, 1.0); values closer to 1.0 are better.
+
+###########################################################
+#                     TRAINING SETTING                    #
+###########################################################
+max_epoch: 20
+num_snapshots: 5
+
+###########################################################
+#                     OTHER SETTING                       #
+###########################################################
+num_snapshots: 10                 # max number of snapshots to keep while training
+seed: 42                          # random seed for paddle, random, and np.random
--- a/examples/iwslt2012/punc0/conf/ernie-tiny.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-tiny.yaml
@ -0,0 +1,44 @@
+###########################################################
+#                       DATA SETTING                      #
+###########################################################
+dataset_type: Ernie
+train_path: data/iwslt2012_zh/train.txt
+dev_path: data/iwslt2012_zh/dev.txt
+test_path: data/iwslt2012_zh/test.txt
+batch_size: 64
+num_workers: 2
+data_params: 
+    pretrained_token: ernie-tiny
+    punc_path: data/iwslt2012_zh/punc_vocab
+    seq_len: 100
+
+
+###########################################################
+#                       MODEL SETTING                     #
+###########################################################
+model_type: ErnieLinear
+model:
+    pretrained_token: ernie-tiny
+    num_classes: 4
+
+###########################################################
+#                     OPTIMIZER SETTING                   #
+###########################################################
+optimizer_params:
+    weight_decay: 1.0e-6               # weight decay coefficient.
+
+scheduler_params:
+    learning_rate: 1.0e-5               # learning rate.
+    gamma: 0.9999                          # scheduler gamma must be in (0.0, 1.0); values closer to 1.0 are better.
+
+###########################################################
+#                     TRAINING SETTING                    #
+###########################################################
+max_epoch: 20
+num_snapshots: 5
+
+###########################################################
+#                     OTHER SETTING                       #
+###########################################################
+num_snapshots: 10                 # max number of snapshots to keep while training
+seed: 42                          # random seed for paddle, random, and np.random
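
Each of the five new configs defines `num_snapshots` twice: `5` under TRAINING SETTING and `10` under OTHER SETTING. Many YAML loaders accept this silently and keep only one of the values, so the duplicate is easy to miss. Below is a small sketch of a check for repeated top-level keys. The function name `find_duplicate_keys` is hypothetical, it only looks at unindented `key:` lines, and it is not a full YAML parser.

```shell
#!/usr/bin/env bash
# Sketch: report top-level YAML keys that occur more than once.
# Only unindented "key:" lines are inspected; nested keys are ignored.

find_duplicate_keys() {
    # Reads YAML on stdin and prints each duplicated top-level key once.
    awk '/^[A-Za-z_][A-Za-z0-9_]*:/ {
        key = $0
        sub(/:.*/, "", key)              # keep only the key name
        if (seen[key]++ == 1) print key  # report on the second occurrence
    }'
}

# Excerpt with the same shape as the TRAINING and OTHER sections above.
sample='max_epoch: 20
num_snapshots: 5
num_snapshots: 10
seed: 42'

printf '%s\n' "$sample" | find_duplicate_keys
```

To check a real file, pipe it in the same way, for example `find_duplicate_keys < conf/ernie-tiny.yaml`.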
--- a/examples/librispeech/asr0/local/train.sh
+++ b/examples/librispeech/asr0/local/train.sh
@ -26,6 +26,10 @@ if [ ${seed} != 0 ]; then
    export FLAGS_cudnn_deterministic=True
 fi

+# the default memory allocator strategy may cause GPU training to hang,
+# because no OOM error is raised when memory is exhausted
+export FLAGS_allocator_strategy=naive_best_fit
+
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \
--- a/examples/librispeech/asr1/local/train.sh
+++ b/examples/librispeech/asr1/local/train.sh
@ -29,6 +29,10 @@ fi
 # export FLAGS_cudnn_exhaustive_search=true
 # export FLAGS_conv_workspace_size_limit=4000

+# the default memory allocator strategy may cause GPU training to hang,
+# because no OOM error is raised when memory is exhausted
+export FLAGS_allocator_strategy=naive_best_fit
+
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \