Merge branch 'develop' into u2pp_export

3 years ago · bdf876ea7b
parent afda7ed7d1 a657cc3e1b
commit bdf876ea7b
130 changed files with 4391 additions and 1546 deletions
--- a/README.md
+++ b/README.md
@ -19,8 +19,6 @@
 <div align="center">  
 <h4>
    <a href="#quick-start"> Quick Start </a>
  | <a href="#quick-start-server"> Quick Start Server </a>
  | <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a>
  | <a href="#documents"> Documents </a>
  | <a href="#model-list"> Models List </a>
  | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
@ -159,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
  - 🧩  *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
 ### Recent Update
 - 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and ERNIE-SAT in [PaddleSpeech Web Demo](./demos/speech_web).
 - ⚡ 2022.09.09: Add AISHELL-3 Voice Cloning [example](./examples/aishell3/vc2) with ECAPA-TDNN speaker encoder.
 - ⚡ 2022.08.25: Release TTS [finetune](./examples/other/tts_finetune/tts3) example.
 - 🔥 2022.08.22: Add ERNIE-SAT models: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat), [ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat), [ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat).
 - 🔥 2022.08.15: Add [g2pW](https://github.com/GitYCC/g2pW) into TTS Chinese Text Frontend.
@ -705,7 +705,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
  <tbody>
  <tr>
      <td>Speaker Verification</td>
-      <td>VoxCeleb12</td>
+      <td>VoxCeleb1/2</td>
      <td>ECAPA-TDNN</td>
      <td>
      <a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
@ -714,6 +714,31 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
  </tbody>
 </table>
 <a name="SpeakerDiarization"></a>
 **Speaker Diarization**
 <table style="width:100%">
  <thead>
    <tr>
      <th> Task </th>
      <th> Dataset </th>
      <th> Model Type </th>
      <th> Example </th>
    </tr>
  </thead>
  <tbody>
  <tr>
      <td>Speaker Diarization</td>
     <td>AMI</td>
      <td>ECAPA-TDNN + AHC / SC</td>
      <td>
      <a href = "./examples/ami/sd0">ecapa-tdnn-ami</a>
      </td>
    </tr>
  </tbody>
 </table>
 <a name="PunctuationRestoration"></a>
 **Punctuation Restoration**
@ -767,6 +792,7 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
  - [Text-to-Speech](#TextToSpeech)
  - [Audio Classification](#AudioClassification)
  - [Speaker Verification](#SpeakerVerification)
  - [Speaker Diarization](#SpeakerDiarization)
  - [Punctuation Restoration](#PunctuationRestoration)
 - [Community](#Community)
 - [Welcome to contribute](#contribution)
--- a/README_cn.md
+++ b/README_cn.md
@ -19,10 +19,8 @@
 </p>
 <div align="center">  
 <h4>
-  <a href="#安装"> 安装 </a>
+    <a href="#安装"> 安装 </a>
  | <a href="#快速开始"> 快速开始 </a>
  | <a href="#快速使用服务"> 快速使用服务 </a>
  | <a href="#快速使用流式服务"> 快速使用流式服务 </a>
  | <a href="#教程文档"> 教程文档 </a>
  | <a href="#模型列表"> 模型列表 </a>
  | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio 课程 </a>
@ -181,6 +179,8 @@
 </div>
 ### 近期更新
 - 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 ERNIE-SAT 到 [PaddleSpeech 网页应用](./demos/speech_web)。
 - ⚡ 2022.09.09: 新增基于 ECAPA-TDNN 声纹模型的 AISHELL-3 Voice Cloning [示例](./examples/aishell3/vc2)。
 - ⚡ 2022.08.25: 发布 TTS [finetune](./examples/other/tts_finetune/tts3) 示例。
 - 🔥 2022.08.22: 新增 ERNIE-SAT 模型: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat)。
 - 🔥 2022.08.15: 将 [g2pW](https://github.com/GitYCC/g2pW) 引入 TTS 中文文本前端。
@ -717,8 +717,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  </thead>
  <tbody>
  <tr>
-      <td>Speaker Verification</td>
+      <td>声纹识别</td>
-      <td>VoxCeleb12</td>
+      <td>VoxCeleb1/2</td>
      <td>ECAPA-TDNN</td>
      <td>
      <a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
@ -727,6 +727,31 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  </tbody>
 </table>
 <a name="说话人日志模型"></a>
 **说话人日志**
 <table style="width:100%">
  <thead>
    <tr>
      <th> 任务 </th>
      <th> 数据集 </th>
      <th> 模型类型 </th>
      <th> 脚本 </th>
    </tr>
  </thead>
  <tbody>
  <tr>
      <td>说话人日志</td>
      <td>AMI</td>
      <td>ECAPA-TDNN + AHC / SC</td>
      <td>
      <a href = "./examples/ami/sd0">ecapa-tdnn-ami</a>
      </td>
    </tr>
  </tbody>
 </table>
 <a name="标点恢复模型"></a>
 **标点恢复**
@ -786,6 +811,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
  - [语音合成](#语音合成模型)
  - [声音分类](#声音分类模型)
  - [声纹识别](#声纹识别模型)
  - [说话人日志](#说话人日志模型)
  - [标点恢复](#标点恢复模型)
 - [技术交流群](#技术交流群)
 - [欢迎贡献](#欢迎贡献)
--- a/demos/speech_web/.gitignore
+++ b/demos/speech_web/.gitignore
@ -13,4 +13,7 @@
 *.pdmodel
 */source/*
 */PaddleSpeech/*
 */tmp*/*
 */duration.txt
 */oov_info.txt
--- a/demos/speech_web/README.md
+++ b/demos/speech_web/README.md
@ -1,55 +1,82 @@
 # Paddle Speech Demo
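-PaddleSpeechDemo is a demo project built around the voice-interaction features of PaddleSpeech. It is meant to help you get started with PaddleSpeech and build your own applications with it.
+## Introduction
 Paddle Speech Demo is a demo project built around the voice-interaction features of PaddleSpeech. It is meant to help you get started with PaddleSpeech and build your own applications with it.
-The voice-interaction part uses PaddleSpeech, the dialogue and information-extraction parts use PaddleNLP, and the web front end is built with Vue3
+The voice-interaction part uses PaddleSpeech, the dialogue and information-extraction parts use PaddleNLP, and the web front end is built with Vue3.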
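 Main features:
 `main.py` provides:
 + Voice chat: PaddleSpeech speech recognition plus speech synthesis; the dialogue part uses the chit-chat feature of PaddleNLP
 + Speaker verification: a demonstration of PaddleSpeech speaker verification
 + Speech recognition: three modes are supported: [real-time recognition], [end-to-end recognition], and [audio file recognition]
 + Speech synthesis: two modes are supported: [streaming synthesis] and [end-to-end synthesis]
 + Voice commands: automated travel-expense reimbursement, built on PaddleSpeech speech recognition and PaddleNLP information extraction
 `vc.py` provides:
 + One-shot synthesis: a one-utterance synthesis scheme based on the GE2E and ECAPA-TDNN models that imitates the timbre of the input audio
  + For the GE2E voice cloning scheme, see [【FastSpeech2 + AISHELL-3 Voice Cloning】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc1)
  + For the ECAPA-TDNN voice cloning scheme, see [【FastSpeech2 + AISHELL-3 Voice Cloning (ECAPA-TDNN)】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc2)
 + Small-data finetuning: a finetuning scheme for small datasets, with a built-in example finetuned on 12 utterances of the Biaobei (标贝) Chinese female voice. You can also use one-click reset and record your own voice; recording in a quiet environment gives better results. To finetune on your own dataset, try [【Finetune your own AM based on FastSpeech2 with AISHELL-3】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/tts_finetune/tts3).
 + ERNIE-SAT: a visual demo of ERNIE-SAT, a cross-modal speech-language large model. It supports personalized synthesis, cross-lingual speech synthesis (when the audio is Chinese, enter English text to synthesize), and speech editing (changing words in the middle of the audio transcript). For more implementation details of ERNIE-SAT, see:
  + [【ERNIE-SAT with AISHELL-3 dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/ernie_sat)
  + [【ERNIE-SAT with AISHELL3 and VCTK datasets】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat)
  + [【ERNIE-SAT with VCTK dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/ernie_sat)
 Screenshot: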
- ![效果](docs/效果展示.png)
+ ![效果](https://user-images.githubusercontent.com/30135920/192155349-9ef93d20-730b-413d-8d50-412fedf11d4b.png)
 ## Installation
 ### Backend environment setup
-```
+## Basic environment setup
 # Install the environment
 cd speech_server
 pip install -r requirements.txt
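-# Download the IE model, which is finetuned for place names and performs better; if it is not downloaded, another version is used and results are worse
+### Backend environment setup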
-cd source
+```bash 
-mkdir model
+# PaddleSpeech must be installed first
-cd model
+cd speech_server
-wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
+pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
 cd ../
 ```
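 ### Frontend environment setup
 The front end depends on `node.js`, which must be installed beforehand so that `npm` is available. The tested `npm` version is `8.3.1`; we recommend the stable `node.js` release from the [official site](https://nodejs.org/en/).
-```
+If dependencies cannot be downloaded because of network problems, see the FAQ entry on slow `npm / yarn` downloads.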
 ```bash
 # Enter the frontend directory
 cd web_client
 # Install `yarn`; skip if it is already installed
 npm install -g yarn
 # Install the frontend dependencies with yarn
 yarn install
 cd ../
 ```
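 ## Starting the services
 [Note] Only one backend service can run at a time: choose either `main.py` or `vc.py`.
 ### Starting the `main.py` backend service
 #### Download the required models
 Only the model needed for voice commands has to be downloaded manually; the other models are downloaded automatically.
-### Start the backend service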
+```bash
 cd speech_server
 mkdir -p source/model
 cd source/model
 # Download the IE model
 wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
 cd ../../../
 ```
 #### Start the backend service
 ```
 cd speech_server
@ -57,14 +84,116 @@ cd speech_server
 python main.py --port 8010
 ```
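-### Start the frontend service
+
 ### Starting the `vc.py` backend service
 Follow the steps below to set up the environment the project needs.
 Try the few-shot synthesis backend online on AI Studio: [【PaddleSpeech进阶】PaddleSpeech小样本合成方案体验](https://aistudio.baidu.com/aistudio/projectdetail/4573549?sUid=2470186&shared=1&ts=1664174385948)
 #### Download the required models and audio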
 ```bash
 cd speech_server
 # Skip if the directory already exists
 mkdir -p source/model
 cd source
 # Download and unzip the wav files (including the VC test audio)
 wget https://paddlespeech.bj.bcebos.com/demos/speech_web/wav_vc.zip
 unzip wav_vc.zip
 cd model
 # Download the GE2E models
 wget https://bj.bcebos.com/paddlespeech/Parakeet/released_models/ge2e/ge2e_ckpt_0.3.zip
 unzip ge2e_ckpt_0.3.zip
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip
 unzip pwg_aishell3_ckpt_0.5.zip
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
 unzip fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
 # Download the ECAPA-TDNN model
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
 unzip fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
 # Download the ERNIE-SAT models
 # aishell3 ERNIE-SAT
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_ckpt_1.2.0.zip
 unzip erniesat_aishell3_ckpt_1.2.0.zip
 # vctk ERNIE-SAT
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_vctk_ckpt_1.2.0.zip
 unzip erniesat_vctk_ckpt_1.2.0.zip
 # aishell3_vctk ERNIE-SAT
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_vctk_ckpt_1.2.0.zip
 unzip erniesat_aishell3_vctk_ckpt_1.2.0.zip
 # Download the finetune model
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip
 unzip fastspeech2_aishell3_ckpt_1.1.0.zip
 # Download the vocoders
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
 unzip hifigan_aishell3_ckpt_0.2.0.zip
 wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip
 unzip hifigan_vctk_ckpt_0.2.0.zip
 cd ../../../
 ```
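Before starting `vc.py`, it can help to confirm that every archive above was unpacked. The following is a minimal sketch, not part of the demo: the helper name `missing_models` is hypothetical, and it assumes each zip extracts into a directory named after the archive.

```python
import os

# Directory names derived from the archive names in the download commands.
# Assumption: each zip unpacks into a directory with the same base name.
EXPECTED_MODEL_DIRS = [
    "ge2e_ckpt_0.3",
    "pwg_aishell3_ckpt_0.5",
    "fastspeech2_nosil_aishell3_vc1_ckpt_0.5",
    "fastspeech2_aishell3_ckpt_vc2_1.2.0",
    "erniesat_aishell3_ckpt_1.2.0",
    "erniesat_vctk_ckpt_1.2.0",
    "erniesat_aishell3_vctk_ckpt_1.2.0",
    "fastspeech2_aishell3_ckpt_1.1.0",
    "hifigan_aishell3_ckpt_0.2.0",
    "hifigan_vctk_ckpt_0.2.0",
]


def missing_models(model_root):
    """Return the expected model directories that do not exist under model_root."""
    return [
        name for name in EXPECTED_MODEL_DIRS
        if not os.path.isdir(os.path.join(model_root, name))
    ]


if __name__ == "__main__":
    # Run from the speech_server directory.
    for name in missing_models("source/model"):
        print(f"missing: {name}")
```

An empty result only shows that the directories exist; it does not verify the checkpoint files inside them.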
 #### ERNIE-SAT environment setup
 The ERNIE-SAT demo depends on the environment of [examples/aishell3_vctk/ernie_sat](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat). Follow the `README.md` under `examples/aishell3_vctk/ernie_sat` and make sure the example code in its `run.sh` runs correctly.
 Once `examples/aishell3_vctk/ernie_sat` runs successfully, return to the current directory and set up the environment:
 ```bash
 cd speech_server
 ln -snf ../../../examples/aishell3_vctk/ernie_sat/download .
 ln -snf ../../../examples/aishell3_vctk/ernie_sat/tools .
 cd ../
 ```
 #### finetune environment setup
 `finetune` requires unzipping `aishell3_model.zip` in `tools/aligner`; the finetune process uses the `tools/aligner/aishell3_model/meta.yaml` file.
 ```bash
 cd speech_server/tools/aligner
 unzip aishell3_model.zip
 cd -
 ```
 #### Start the backend service
 ```
 cd speech_server
 # Port 8010 by default
 python vc.py --port 8010
 ```
 ### Starting the frontend service
 ```
 cd web_client
 yarn dev --port 8011
 ```
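-With the default configuration, the backend address configured in the front end is localhost, so make sure the backend server and the browser that opens the page run on the same machine. If they run on different machines, see the FAQ below: [How to change the configuration when the backend is deployed on another machine or port]
+With the default configuration, the backend address configured in the front end is `localhost`, so make sure the backend server and the browser that opens the page run on the same machine. If they run on different machines, see the FAQ below: [How to change the configuration when the backend is deployed on another machine or port]
 #### Notes on the front end
 To make later maintenance easier, no prebuilt HTML bundle is provided; the front end is a Vue3 project. Starting it with `yarn dev --port 8011` runs a frontend dev server, which makes debugging easier.
 For example, if the frontend service is started on the local machine (by running `yarn dev --port 8011`), the page is available in a browser at `http://localhost:8011`.
 If the frontend service is started on another server (for example `*.*.*.*`) by running `yarn dev --port 8011`, the page is available in a browser at `http://*.*.*.*:8011`.
 How do the front end and the back end relate? They are independent: as long as the front end can reach the backend API through the proxy, everything works. You can deploy the backend on machine A and the front end on machine B. In `./web_client/vite.config.js`, `/api` is mapped to `http://localhost:8010`; you can point it at whatever backend address you need.
 When the page is accessed through an IP address such as `*.*.*.*`, the browser's security restrictions block audio recording, and the browser's security policy has to be reconfigured. See the FAQ entry below: [Recording fails when the front end is accessed via an IP address]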
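The `/api` mapping mentioned in the notes above can be pictured as a simple prefix rule. The sketch below is illustrative only: `resolve_backend_url` is a hypothetical helper, the IP address is made up, and whether the `/api` prefix is stripped before forwarding depends on the actual `vite.config.js`, so that choice is exposed as a parameter.

```python
def resolve_backend_url(path, target="http://localhost:8010",
                        prefix="/api", strip_prefix=True):
    """Map a frontend request path to the backend URL it would be proxied to.

    Returns None when the path does not fall under the proxied prefix.
    """
    if path != prefix and not path.startswith(prefix + "/"):
        return None
    # Assumption: the proxy may or may not strip the prefix.
    rest = path[len(prefix):] if strip_prefix else path
    return target.rstrip("/") + rest


# Default: backend on the same machine as the dev server.
print(resolve_backend_url("/api/tts/offline"))
# Backend on another machine (hypothetical address).
print(resolve_backend_url("/api/tts/offline", target="http://10.0.0.5:8010"))
# Paths outside the prefix are served by the frontend dev server itself.
print(resolve_backend_url("/index.html"))
```

Only the `target` value changes when the backend moves to another machine; the frontend code keeps requesting paths under `/api`.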
 ## FAQ 
 #### Q: How do I install node.js?
@ -75,7 +204,7 @@ A： node.js的安装可以参考[【菜鸟教程】](https://www.runoob.com/nod
 A: The backend address is configured in two separate files.
-Edit the first file, `PaddleSpeechWebClient/vite.config.js`
+Edit the first file, `./web_client/vite.config.js`
 ```
 server: {
@ -90,7 +219,7 @@ server: {
  }
 ```
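-Edit the second file, `PaddleSpeechWebClient/src/api/API.js` (configuring a proxy for WebSocket did not work, so the address has to be changed in this file)
+Edit the second file, `./web_client/src/api/API.js` (configuring a proxy for WebSocket did not work, so the address has to be changed in this file)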
 ```
 // websocket (change this to the address of the backend)
@ -99,12 +228,24 @@ ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream',  // Stream ASR 接
 TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS endpoint
 ```
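Because the WebSocket addresses are hard-coded rather than proxied, both entries must change together when the backend moves. A small sketch of how the two URLs follow from one host and port (the function name `websocket_endpoints` and the `secure` switch are assumptions for illustration; the paths are the ones shown in `API.js`):

```python
def websocket_endpoints(host="localhost", port=8010, secure=False):
    """Build the two streaming endpoint URLs for a given backend address."""
    scheme = "wss" if secure else "ws"
    base = f"{scheme}://{host}:{port}"
    return {
        "ASR_SOCKET_RECORD": f"{base}/ws/asr/onlineStream",  # streaming ASR
        "TTS_SOCKET_RECORD": f"{base}/ws/tts/online",  # streaming TTS
    }


for name, url in websocket_endpoints().items():
    print(f"{name}: '{url}',")
```

With the defaults this reproduces the two values shown in the file, which makes it easy to check an edited `API.js` against the intended backend address.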
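-#### Q: The backend is accessed via an IP address and the front end cannot record
+#### Q: Recording fails when the front end is accessed via an IP address
 A: This is caused by the browser's security policy; reconfigure the browser and restart it. For the browser settings, see [使用js-audio-recorder报浏览器不支持getUserMedia](https://blog.csdn.net/YRY_LIKE_YOU/article/details/113745273)
 Chrome settings URL: chrome://flags/#unsafely-treat-insecure-origin-as-secure
 #### Q: How do I configure the Taobao mirror for npm / yarn?
 A: Configure the Taobao mirror as shown below; for details, see [【yarn npm 设置淘宝镜像】](https://www.jianshu.com/p/f6f43e8f9d6b)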
 ```bash
 # Configure the Taobao mirror for npm
 npm config set registry https://registry.npmmirror.com
 # Configure the Taobao mirror for yarn
 yarn config set registry http://registry.npm.taobao.org/
 ```
 ## References
 Reference for implementing audio recording in Vue: https://blog.csdn.net/qq_41619796/article/details/107865602#t1
--- a/demos/speech_web/docs/效果展示.png
+++ b/demos/speech_web/docs/效果展示.png
--- a/demos/speech_web/speech_server/conf/tts3_finetune.yaml
+++ b/demos/speech_web/speech_server/conf/tts3_finetune.yaml
@ -3,10 +3,10 @@
 ###########################################################
 # Set to -1 to indicate that the parameter is the same as the pretrained model configuration
-batch_size: -1
+batch_size: 10
 learning_rate: 0.0001     # learning rate
 num_snapshots: -1
 # frozen_layers should be a list
 # if you don't need to freeze, set frozen_layers to []
-frozen_layers: ["encoder", "duration_predictor"]
+frozen_layers: ["encoder"]
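The config comment says `-1` means "same as the pretrained model configuration". How the finetune script applies that rule is not shown in this diff; the sketch below is one plausible reading, with a hypothetical helper `resolve_finetune_config` and made-up pretrained values.

```python
def resolve_finetune_config(finetune_cfg, pretrained_cfg):
    """Return the effective config, treating -1 as 'inherit from pretrained'."""
    resolved = dict(finetune_cfg)
    for key, value in finetune_cfg.items():
        # bool is excluded because True/False are ints in Python.
        is_sentinel = (isinstance(value, int)
                       and not isinstance(value, bool) and value == -1)
        if is_sentinel and key in pretrained_cfg:
            resolved[key] = pretrained_cfg[key]
    return resolved


# Finetune values mirror the YAML after this change.
finetune_cfg = {
    "batch_size": 10,
    "learning_rate": 0.0001,
    "num_snapshots": -1,
    "frozen_layers": ["encoder"],
}
# Hypothetical pretrained values, for illustration only.
pretrained_cfg = {"batch_size": 64, "num_snapshots": 5}

effective = resolve_finetune_config(finetune_cfg, pretrained_cfg)
print(effective["batch_size"], effective["num_snapshots"])
```

Under this reading, changing `batch_size` from `-1` to `10` stops inheriting the pretrained batch size, which suits a demo that finetunes on only a dozen utterances.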
--- a/demos/speech_web/speech_server/main.py
+++ b/demos/speech_web/speech_server/main.py
@ -1,8 +1,3 @@
 # todo:
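 # 1. Start the service
 # 2. Receive recorded audio and return the recognition result
 # 3. Receive the ASR result and return the NLP dialogue result
 # 4. Receive the NLP dialogue result and return TTS audio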
 import argparse
 import base64
 import datetime
@ -32,6 +27,7 @@ from starlette.requests import Request
 from starlette.responses import FileResponse
 from starlette.websockets import WebSocketState as WebSocketState
 from paddlespeech.cli.tts.infer import TTSExecutor
 from paddlespeech.server.engine.asr.online.python.asr_engine import PaddleASRConnectionHanddler
 from paddlespeech.server.utils.audio_process import float2pcm
@ -55,7 +51,7 @@ asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
 asr_init_path = "source/demo/demo.wav"
 db_path = "source/db/vpr.sqlite"
 ie_model_path = "source/model"
-
+tts_model = TTSExecutor()
 # Path configuration
 UPLOAD_PATH = "source/vpr"
 WAV_PATH = "source/wav"
@ -72,6 +68,14 @@ manager = ConnectionManager()
 aumanager = AudioMannger(chatbot)
 aumanager.init()
 vpr = VPR(db_path, dim=192, top_k=5)
 # Initial call so that the models are downloaded at startup
 tts_model(
    text="今天天气准不错",
    output="test.wav",
    am='fastspeech2_mix',
    spk_id=174,
    voc='hifigan_csmsc',
    lang='mix', )
 # Service configuration
@ -331,6 +335,7 @@ async def ieOffline(nlp_base: NlpBase):
 #####################################################################
 # End-to-end synthesis
@app.post("/tts/offline")
 async def text2speechOffline(tts_base: TtsBase):
    text = tts_base.text
@ -340,8 +345,14 @@ async def text2speechOffline(tts_base: TtsBase):
        now_name = "tts_" + datetime.datetime.strftime(
            datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
        out_file_path = os.path.join(WAV_PATH, now_name)
-        # Save to a file, then convert to base64 for transmission
+        # Use the mixed Chinese-English CLI
-        chatbot.text2speech(text, outpath=out_file_path)
+        tts_model(
            text=text,
            output=out_file_path,
            am='fastspeech2_mix',
            spk_id=174,
            voc='hifigan_csmsc',
            lang='mix')
        with open(out_file_path, "rb") as f:
            data_bin = f.read()
        base_str = base64.b64encode(data_bin)
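The handler above reads the synthesized wav back from disk and base64-encodes it for transport. A minimal round-trip sketch follows, assuming the client receives the encoded string and decodes it back to bytes; the helper names and the placeholder payload are hypothetical, and the actual response shape is not shown in this hunk.

```python
import base64


def encode_audio(data_bin: bytes) -> str:
    """Encode raw audio bytes as an ASCII base64 string for a JSON response."""
    return base64.b64encode(data_bin).decode("ascii")


def decode_audio(base_str: str) -> bytes:
    """Decode the base64 string back into the original audio bytes."""
    return base64.b64decode(base_str)


# Placeholder bytes standing in for the contents of a wav file.
payload = b"RIFF\x00\x00\x00\x00WAVEfmt "
encoded = encode_audio(payload)
assert decode_audio(encoded) == payload
print(encoded)
```

Base64 grows the payload by roughly one third, which is acceptable for short utterances but worth keeping in mind for long audio.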
--- a/demos/speech_web/speech_server/requirements.txt
+++ b/demos/speech_web/speech_server/requirements.txt
@ -1,13 +1,8 @@
 aiofiles
 faiss-cpu
-fastapi
+praatio==5.0.0
 librosa
 numpy
 paddlenlp
 paddlepaddle
 paddlespeech
 pydantic
-python-multipartscikit_learn
+python-multipart
-SoundFile
+scikit_learn
 starlette
 uvicorn
--- a/demos/speech_web/speech_server/src/ernie_sat.py
+++ b/demos/speech_web/speech_server/src/ernie_sat.py
@ -0,0 +1,198 @@
 import os
 from .util import get_ngpu
 from .util import MAIN_ROOT
 from .util import run_cmd
 class SAT:
    def __init__(self):
        # pretrain model path
        self.zh_pretrain_model_path = os.path.realpath(
            "source/model/erniesat_aishell3_ckpt_1.2.0")
        self.en_pretrain_model_path = os.path.realpath(
            "source/model/erniesat_vctk_ckpt_1.2.0")
        self.cross_pretrain_model_path = os.path.realpath(
            "source/model/erniesat_aishell3_vctk_ckpt_1.2.0")
        self.zh_voc_model_path = os.path.realpath(
            "source/model/hifigan_aishell3_ckpt_0.2.0")
        self.eb_voc_model_path = os.path.realpath(
            "source/model/hifigan_vctk_ckpt_0.2.0")
        self.cross_voc_model_path = os.path.realpath(
            "source/model/hifigan_aishell3_ckpt_0.2.0")
        self.BIN_DIR = os.path.join(MAIN_ROOT,
                                    "paddlespeech/t2s/exps/ernie_sat")
    def zh_synthesize_edit(self,
                           old_str: str,
                           new_str: str,
                           input_name: os.PathLike,
                           output_name: os.PathLike,
                           task_name: str="synthesize",
                           erniesat_ckpt_name: str="snapshot_iter_289500.pdz"):
        if task_name not in ['synthesize', 'edit']:
            print("task name only in ['edit', 'synthesize']")
            return None
         # Inference file configuration
        config_path = os.path.join(self.zh_pretrain_model_path, "default.yaml")
        phones_dict = os.path.join(self.zh_pretrain_model_path,
                                   "phone_id_map.txt")
        erniesat_ckpt = os.path.join(self.zh_pretrain_model_path,
                                     erniesat_ckpt_name)
        erniesat_stat = os.path.join(self.zh_pretrain_model_path,
                                     "speech_stats.npy")
        voc = "hifigan_aishell3"
        voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
        voc_ckpt = os.path.join(self.zh_voc_model_path,
                                "snapshot_iter_2500000.pdz")
        voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
        cmd = self.get_cmd(
            task_name=task_name,
            input_name=input_name,
            old_str=old_str,
            new_str=new_str,
            config_path=config_path,
            phones_dict=phones_dict,
            erniesat_ckpt=erniesat_ckpt,
            erniesat_stat=erniesat_stat,
            voc=voc,
            voc_config=voc_config,
            voc_ckpt=voc_ckpt,
            voc_stat=voc_stat,
            output_name=output_name,
            source_lang="zh",
            target_lang="zh")
        return run_cmd(cmd, output_name)
    def crossclone(self,
                   old_str: str,
                   new_str: str,
                   input_name: os.PathLike,
                   output_name: os.PathLike,
                   source_lang: str,
                   target_lang: str,
                   erniesat_ckpt_name: str="snapshot_iter_489000.pdz"):
         # Inference file configuration
        config_path = os.path.join(self.cross_pretrain_model_path,
                                   "default.yaml")
        phones_dict = os.path.join(self.cross_pretrain_model_path,
                                   "phone_id_map.txt")
        erniesat_ckpt = os.path.join(self.cross_pretrain_model_path,
                                     erniesat_ckpt_name)
        erniesat_stat = os.path.join(self.cross_pretrain_model_path,
                                     "speech_stats.npy")
        voc = "hifigan_aishell3"
        voc_config = os.path.join(self.cross_voc_model_path, "default.yaml")
        voc_ckpt = os.path.join(self.cross_voc_model_path,
                                "snapshot_iter_2500000.pdz")
        voc_stat = os.path.join(self.cross_voc_model_path, "feats_stats.npy")
        task_name = "synthesize"
        cmd = self.get_cmd(
            task_name=task_name,
            input_name=input_name,
            old_str=old_str,
            new_str=new_str,
            config_path=config_path,
            phones_dict=phones_dict,
            erniesat_ckpt=erniesat_ckpt,
            erniesat_stat=erniesat_stat,
            voc=voc,
            voc_config=voc_config,
            voc_ckpt=voc_ckpt,
            voc_stat=voc_stat,
            output_name=output_name,
            source_lang=source_lang,
            target_lang=target_lang)
        return run_cmd(cmd, output_name)
    def en_synthesize_edit(self,
                           old_str: str,
                           new_str: str,
                           input_name: os.PathLike,
                           output_name: os.PathLike,
                           task_name: str="synthesize",
                           erniesat_ckpt_name: str="snapshot_iter_199500.pdz"):
         # Inference file configuration
        config_path = os.path.join(self.en_pretrain_model_path, "default.yaml")
        phones_dict = os.path.join(self.en_pretrain_model_path,
                                   "phone_id_map.txt")
        erniesat_ckpt = os.path.join(self.en_pretrain_model_path,
                                     erniesat_ckpt_name)
        erniesat_stat = os.path.join(self.en_pretrain_model_path,
                                     "speech_stats.npy")
        voc = "hifigan_aishell3"
        voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
        voc_ckpt = os.path.join(self.zh_voc_model_path,
                                "snapshot_iter_2500000.pdz")
        voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
        cmd = self.get_cmd(
            task_name=task_name,
            input_name=input_name,
            old_str=old_str,
            new_str=new_str,
            config_path=config_path,
            phones_dict=phones_dict,
            erniesat_ckpt=erniesat_ckpt,
            erniesat_stat=erniesat_stat,
            voc=voc,
            voc_config=voc_config,
            voc_ckpt=voc_ckpt,
            voc_stat=voc_stat,
            output_name=output_name,
            source_lang="en",
            target_lang="en")
        return run_cmd(cmd, output_name)
    def get_cmd(self,
                task_name: str,
                input_name: str,
                old_str: str,
                new_str: str,
                config_path: str,
                phones_dict: str,
                erniesat_ckpt: str,
                erniesat_stat: str,
                voc: str,
                voc_config: str,
                voc_ckpt: str,
                voc_stat: str,
                output_name: str,
                source_lang: str,
                target_lang: str):
        ngpu = get_ngpu()
        cmd = f"""
            FLAGS_allocator_strategy=naive_best_fit \
            FLAGS_fraction_of_gpu_memory_to_use=0.01 \
            python3 {self.BIN_DIR}/synthesize_e2e.py \
                --task_name={task_name} \
                --wav_path={input_name} \
                --old_str='{old_str}' \
                --new_str='{new_str}' \
                --source_lang={source_lang} \
                --target_lang={target_lang} \
                --erniesat_config={config_path} \
                --phones_dict={phones_dict} \
                --erniesat_ckpt={erniesat_ckpt} \
                --erniesat_stat={erniesat_stat} \
                --voc={voc} \
                --voc_config={voc_config} \
                --voc_ckpt={voc_ckpt} \
                --voc_stat={voc_stat} \
                --output_name={output_name} \
                --ngpu={ngpu}
        """
        return cmd
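`get_cmd` wraps `old_str` and `new_str` in single quotes inside the command string. If the command is executed through a shell, text that itself contains a single quote (for example "don't") would break that quoting. One way to guard against this is `shlex.quote` from the standard library; the helper `quote_args` below is a hypothetical sketch, not part of the commit.

```python
import shlex


def quote_args(**kwargs):
    """Render keyword arguments as shell-safe `--key=value` flags."""
    return " ".join(
        f"--{key}={shlex.quote(str(value))}" for key, value in kwargs.items())


flags = quote_args(old_str="I don't know", new_str="I do know")
print(flags)

# Splitting with POSIX shell rules recovers the original text unchanged.
assert shlex.split(flags) == ["--old_str=I don't know", "--new_str=I do know"]
```

The same treatment would apply to file paths such as `input_name` and `output_name` if they can contain spaces.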
--- a/demos/speech_web/speech_server/src/finetune.py
+++ b/demos/speech_web/speech_server/src/finetune.py
@ -0,0 +1,127 @@
 import os
 from .util import get_ngpu
 from .util import MAIN_ROOT
 from .util import run_cmd
 def find_max_ckpt(model_path):
    max_ckpt = 0
    for filename in os.listdir(model_path):
        if filename.endswith('.pdz'):
            files = filename[:-4]
            a1, a2, it = files.split("_")
            if int(it) > max_ckpt:
                max_ckpt = int(it)
    return max_ckpt
 class FineTune:
    def __init__(self):
        self.now_file_path = os.path.dirname(__file__)
        self.PYTHONPATH = os.path.join(MAIN_ROOT,
                                       "examples/other/tts_finetune/tts3")
        self.BIN_DIR = os.path.join(MAIN_ROOT,
                                    "paddlespeech/t2s/exps/fastspeech2")
        self.pretrained_model_dir = os.path.realpath(
            "source/model/fastspeech2_aishell3_ckpt_1.1.0")
        self.voc_model_dir = os.path.realpath(
            "source/model/hifigan_aishell3_ckpt_0.2.0")
        self.finetune_config = os.path.join("conf/tts3_finetune.yaml")
    def finetune(self, input_dir, exp_dir='temp', epoch=100):
        """
         Run the same commands as examples/other/tts_finetune/tts3/run.sh
        """
        newdir_name = "newdir"
        new_dir = os.path.join(input_dir, newdir_name)
        mfa_dir = os.path.join(exp_dir, 'mfa_result')
        dump_dir = os.path.join(exp_dir, 'dump')
        output_dir = os.path.join(exp_dir, 'exp')
        lang = "zh"
        ngpu = get_ngpu()
        cmd = f"""
            # check oov
            python3 {self.PYTHONPATH}/local/check_oov.py \
                --input_dir={input_dir} \
                --pretrained_model_dir={self.pretrained_model_dir} \
                --newdir_name={newdir_name} \
                --lang={lang}
            # get mfa result
            python3 {self.PYTHONPATH}/local/get_mfa_result.py \
                --input_dir={new_dir} \
                --mfa_dir={mfa_dir} \
                --lang={lang}
            # generate durations.txt
            python3 {self.PYTHONPATH}/local/generate_duration.py \
                --mfa_dir={mfa_dir} 
            # extract feature
            python3 {self.PYTHONPATH}/local/extract_feature.py \
                --duration_file="./durations.txt" \
                --input_dir={new_dir} \
                --dump_dir={dump_dir} \
                --pretrained_model_dir={self.pretrained_model_dir}
            # create finetune env
            python3 {self.PYTHONPATH}/local/prepare_env.py \
                --pretrained_model_dir={self.pretrained_model_dir} \
                --output_dir={output_dir}
            # finetune
            python3 {self.PYTHONPATH}/local/finetune.py \
                --pretrained_model_dir={self.pretrained_model_dir} \
                --dump_dir={dump_dir} \
                --output_dir={output_dir} \
                --ngpu={ngpu} \
                 --epoch={epoch} \
                --finetune_config={self.finetune_config}
        """
        print(cmd)
        return run_cmd(cmd, exp_dir)
    def synthesize(self, text, wav_name, out_wav_dir, exp_dir='temp'):
        voc = "hifigan_aishell3"
        dump_dir = os.path.join(exp_dir, 'dump')
        output_dir = os.path.join(exp_dir, 'exp')
        text_path = os.path.join(exp_dir, 'sentences.txt')
        lang = "zh"
        ngpu = get_ngpu()
        model_path = f"{output_dir}/checkpoints"
        ckpt = find_max_ckpt(model_path)
         # Write the sentence file used for synthesis
        with open(text_path, "w", encoding='utf8') as f:
            f.write(wav_name + " " + text)
        cmd = f"""
            FLAGS_allocator_strategy=naive_best_fit \
            FLAGS_fraction_of_gpu_memory_to_use=0.01 \
            python3 {self.BIN_DIR}/../synthesize_e2e.py \
                --am=fastspeech2_aishell3 \
                --am_config={self.pretrained_model_dir}/default.yaml \
                --am_ckpt={output_dir}/checkpoints/snapshot_iter_{ckpt}.pdz \
                --am_stat={self.pretrained_model_dir}/speech_stats.npy \
                --voc={voc} \
                --voc_config={self.voc_model_dir}/default.yaml \
                --voc_ckpt={self.voc_model_dir}/snapshot_iter_2500000.pdz \
                --voc_stat={self.voc_model_dir}/feats_stats.npy \
                --lang={lang} \
                --text={text_path} \
                --output_dir={out_wav_dir} \
                --phones_dict={dump_dir}/phone_id_map.txt \
                --speaker_dict={dump_dir}/speaker_id_map.txt \
                --spk_id=0 \
                --ngpu={ngpu}
        """
        out_path = os.path.join(out_wav_dir, f"{wav_name}.wav")
        return run_cmd(cmd, out_path)
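`find_max_ckpt` splits each `.pdz` filename on `_` and expects exactly three parts, so an unrelated file such as `other_model.pdz` would raise a `ValueError`, and an empty directory yields `0`. A slightly more defensive variant is sketched below; `find_latest_ckpt` is a hypothetical name, and it assumes checkpoints follow the `snapshot_iter_<N>.pdz` pattern used elsewhere in this diff.

```python
import os
import re

# Assumption: checkpoints are named snapshot_iter_<N>.pdz.
_CKPT_RE = re.compile(r"^snapshot_iter_(\d+)\.pdz$")


def find_latest_ckpt(model_path):
    """Return the highest iteration among snapshot_iter_<N>.pdz files, or None."""
    iters = [
        int(match.group(1))
        for match in map(_CKPT_RE.match, os.listdir(model_path))
        if match
    ]
    return max(iters, default=None)
```

Returning `None` instead of `0` lets the caller report "no checkpoint found" rather than building a path to a `snapshot_iter_0.pdz` file that does not exist.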
--- a/demos/speech_web/speech_server/src/ge2e_clone.py
+++ b/demos/speech_web/speech_server/src/ge2e_clone.py
@ -0,0 +1,60 @@
 import os
 import shutil
 from .util import get_ngpu
 from .util import MAIN_ROOT
 from .util import run_cmd
 class VoiceCloneGE2E():
    def __init__(self):
         # Point BIN_DIR at the experiment scripts
        self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
        # am
        self.am = "fastspeech2_aishell3"
        self.am_config = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/default.yaml"
        self.am_ckpt = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/snapshot_iter_96400.pdz"
        self.am_stat = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/speech_stats.npy"
        self.phones_dict = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/phone_id_map.txt"
        # voc
        self.voc = "pwgan_aishell3"
        self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
        self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
        self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
        # ge2e
        self.ge2e_params_path = "source/model/ge2e_ckpt_0.3/step-3000000.pdparams"
    def vc(self, text, input_wav, out_wav):
         # The input wav must be placed in its own temporary directory
        _, full_file_name = os.path.split(input_wav)
        ref_audio_dir = os.path.realpath("tmp_dir/ge2e")
        if os.path.exists(ref_audio_dir):
            shutil.rmtree(ref_audio_dir)
        os.makedirs(ref_audio_dir, exist_ok=True)
        shutil.copy(input_wav, ref_audio_dir)
        output_dir = os.path.dirname(out_wav)
        ngpu = get_ngpu()
        cmd = f"""
            python3 {self.BIN_DIR}/voice_cloning.py \
                    --am={self.am} \
                    --am_config={self.am_config} \
                    --am_ckpt={self.am_ckpt} \
                    --am_stat={self.am_stat} \
                    --voc={self.voc} \
                    --voc_config={self.voc_config} \
                    --voc_ckpt={self.voc_ckpt} \
                    --voc_stat={self.voc_stat} \
                    --ge2e_params_path={self.ge2e_params_path} \
                    --text="{text}" \
                    --input-dir={ref_audio_dir} \
                    --output-dir={output_dir} \
                    --phones-dict={self.phones_dict} \
                    --ngpu={ngpu}
        """
        output_name = os.path.join(output_dir, full_file_name)
        return run_cmd(cmd, output_name=output_name)
--- a/demos/speech_web/speech_server/src/tdnn_clone.py
+++ b/demos/speech_web/speech_server/src/tdnn_clone.py
@ -0,0 +1,56 @@
 import os
 import shutil
 from .util import get_ngpu
 from .util import MAIN_ROOT
 from .util import run_cmd
 class VoiceCloneTDNN():
    def __init__(self):
         # Point BIN_DIR at the experiment scripts
        self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
        self.am = "fastspeech2_aishell3"
        self.am_config = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/default.yaml"
        self.am_ckpt = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/snapshot_iter_96400.pdz"
        self.am_stat = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/speech_stats.npy"
        self.phones_dict = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/phone_id_map.txt"
        # voc
        self.voc = "pwgan_aishell3"
        self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
        self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
        self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
    def vc(self, text, input_wav, out_wav):
        # The input wav must be copied into its own temporary folder
        _, full_file_name = os.path.split(input_wav)
        ref_audio_dir = os.path.realpath("tmp_dir/tdnn")
        if os.path.exists(ref_audio_dir):
            shutil.rmtree(ref_audio_dir)
        os.makedirs(ref_audio_dir, exist_ok=True)
        shutil.copy(input_wav, ref_audio_dir)
        output_dir = os.path.dirname(out_wav)
        ngpu = get_ngpu()
        cmd = f"""
            python3 {self.BIN_DIR}/voice_cloning.py \
                    --am={self.am} \
                    --am_config={self.am_config} \
                    --am_ckpt={self.am_ckpt} \
                    --am_stat={self.am_stat} \
                    --voc={self.voc} \
                    --voc_config={self.voc_config} \
                    --voc_ckpt={self.voc_ckpt} \
                    --voc_stat={self.voc_stat} \
                    --text="{text}" \
                    --input-dir={ref_audio_dir} \
                    --output-dir={output_dir} \
                    --phones-dict={self.phones_dict} \
                    --use_ecapa=True \
                    --ngpu={ngpu}
        """
        output_name = os.path.join(output_dir, full_file_name)
        return run_cmd(cmd, output_name=output_name)
--- a/demos/speech_web/speech_server/src/util.py
+++ b/demos/speech_web/speech_server/src/util.py
@ -1,4 +1,18 @@
 import os
 import random
 import subprocess
 import paddle
 NOW_FILE_PATH = os.path.dirname(__file__)
 MAIN_ROOT = os.path.realpath(os.path.join(NOW_FILE_PATH, "../../../../"))
 def get_ngpu():
    if paddle.device.get_device() == "cpu":
        return 0
    else:
        return 1
 def randName(n=5):
@ -11,3 +25,20 @@ def SuccessRequest(result=None, message="ok"):
 def ErrorRequest(result=None, message="error"):
    return {"code": -1, "result": result, "message": message}
 def run_cmd(cmd, output_name):
    p = subprocess.Popen(cmd, shell=True)
    res = p.wait()
    print(cmd)
    print("运行结果：", res)
    if res == 0:
        # The command succeeded
        if os.path.exists(output_name):
            return output_name
        else:
            # The synthesized file does not exist
            return None
    else:
        # The command failed
        return None
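Review note: `run_cmd` passes a single string to the shell, and the callers interpolate the request text into it (`--text="{text}"`), so quotes or shell metacharacters in the text can break or alter the command. A minimal sketch of an alternative, assuming the callers were changed to build an argument list; `run_cmd_args` is a hypothetical name and is not part of this commit:

```python
import os
import subprocess
import sys
import tempfile


def run_cmd_args(args, output_name):
    """Run a command given as a list of arguments.

    Returns output_name if the command exits with 0 and the file exists,
    otherwise None (the same contract as run_cmd above).
    """
    # A list without shell=True is passed to the program as-is,
    # so user-supplied text is never parsed by a shell.
    completed = subprocess.run(args, check=False)
    if completed.returncode != 0:
        return None
    return output_name if os.path.exists(output_name) else None


if __name__ == "__main__":
    # Illustration only: a child Python process writes the text to a file.
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.txt")
        text = 'hello "world"; $(echo injected)'
        script = "import sys; open(sys.argv[1], 'w', encoding='utf8').write(sys.argv[2])"
        print(run_cmd_args([sys.executable, "-c", script, out, text], out) == out)
```

With this shape, a caller such as `VoiceCloneTDNN.vc` would pass something like `["python3", script_path, "--text", text, ...]` instead of an f-string; the exact flag layout is an assumption here.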
--- a/demos/speech_web/speech_server/vc.py
+++ b/demos/speech_web/speech_server/vc.py
@ -0,0 +1,550 @@
 import argparse
 import base64
 import datetime
 import json
 import os
 from typing import List
 import aiofiles
 import librosa
 import soundfile as sf
 import uvicorn
 from fastapi import FastAPI
 from fastapi import UploadFile
 from pydantic import BaseModel
 from src.ernie_sat import SAT
 from src.finetune import FineTune
 from src.ge2e_clone import VoiceCloneGE2E
 from src.tdnn_clone import VoiceCloneTDNN
 from src.util import *
 from starlette.responses import FileResponse
 from paddlespeech.server.utils.audio_process import float2pcm
 # Parse command-line arguments
 parser = argparse.ArgumentParser(prog='PaddleSpeechDemo', add_help=True)
 parser.add_argument(
    "--port",
    action="store",
    type=int,
    help="port of the app",
    default=8010,
    required=False)
 args = parser.parse_args()
 port = args.port
 # Loading these models here affects finetune, so finetune is run through a command line
 vc_model = VoiceCloneGE2E()
 vc_model_tdnn = VoiceCloneTDNN()
 sat_model = SAT()
 ft_model = FineTune()
 # Config files
 tts_config = "conf/tts_online_application.yaml"
 asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
 asr_init_path = "source/demo/demo.wav"
 db_path = "source/db/vc.sqlite"
 ie_model_path = "source/model"
 # Path configuration
 VC_UPLOAD_PATH = "source/wav/vc/upload"
 VC_OUT_PATH = "source/wav/vc/out"
 FT_UPLOAD_PATH = "source/wav/finetune/upload"
 FT_OUT_PATH = "source/wav/finetune/out"
 FT_LABEL_PATH = "source/wav/finetune/label.json"
 FT_LABEL_TXT_PATH = "source/wav/finetune/labels.txt"
 FT_DEFAULT_PATH = "source/wav/finetune/default"
 FT_EXP_BASE_PATH = "tmp_dir/finetune"
 SAT_UPLOAD_PATH = "source/wav/SAT/upload"
 SAT_OUT_PATH = "source/wav/SAT/out"
 SAT_LABEL_PATH = "source/wav/SAT/label.json"
 # Initialize the SAT labels
 if os.path.exists(SAT_LABEL_PATH):
    with open(SAT_LABEL_PATH, "r", encoding='utf8') as f:
        sat_label_dic = json.load(f)
 else:
    sat_label_dic = {}
 # Initialize the finetune labels
 if os.path.exists(FT_LABEL_PATH):
    with open(FT_LABEL_PATH, "r", encoding='utf8') as f:
        ft_label_dic = json.load(f)
 else:
    ft_label_dic = {}
 # Create the directories
 base_sources = [
    VC_UPLOAD_PATH,
    VC_OUT_PATH,
    FT_UPLOAD_PATH,
    FT_OUT_PATH,
    FT_DEFAULT_PATH,
    SAT_UPLOAD_PATH,
    SAT_OUT_PATH,
 ]
 for path in base_sources:
    os.makedirs(path, exist_ok=True)
 #####################################################################
 ########################### APP initialization ###############################
 #####################################################################
 app = FastAPI()
 ######################################################################
 ########################### Interface types #################################
 #####################################################################
 # Request schemas
 class VcBase(BaseModel):
    wavName: str
    wavPath: str
 class VcBaseText(BaseModel):
    wavName: str
    wavPath: str
    text: str
    func: str
 class VcBaseSAT(BaseModel):
    old_str: str
    new_str: str
    language: str
    function: str
    wav: str  # base64-encoded
    filename: str
 class FTPath(BaseModel):
    dataPath: str
 class VcBaseFT(BaseModel):
    wav: str  # base64-encoded
    filename: str
    wav_path: str
 class VcBaseFTModel(BaseModel):
    wav_path: str
 class VcBaseFTSyn(BaseModel):
    exp_path: str
    text: str
 ######################################################################
 ########################### File listing and saving services #################################
 #####################################################################
 def getVCList(path):
    VC_FileDict = []
    # Collect the wav file names under the given path
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            # print(os.path.join(root, name))
            VC_FileDict.append({'name': name, 'path': os.path.join(root, name)})
    VC_FileDict = sorted(VC_FileDict, key=lambda x: x['name'], reverse=True)
    return VC_FileDict
 async def saveFiles(files, SavePath):
    right = 0
    error = 0
    error_info = "错误文件："
    for file in files:
        try:
            if 'blob' in file.filename:
                out_file_path = os.path.join(
                    SavePath,
                    datetime.datetime.strftime(datetime.datetime.now(),
                                               '%H%M') + randName(3) + ".wav")
            else:
                out_file_path = os.path.join(SavePath, file.filename)
            print("上传文件名:", out_file_path)
            async with aiofiles.open(out_file_path, 'wb') as out_file:
                content = await file.read()  # async read
                await out_file.write(content)  # async write
            # Convert the file to a 16 kHz, 16-bit wav file
            wav, sr = librosa.load(out_file_path, sr=16000)
            sf.write(out_file_path, data=wav, samplerate=sr)
            right += 1
        except Exception as e:
            error += 1
            error_info = error_info + file.filename + " " + str(e) + "\n"
            continue
    return f"上传成功：{right}, 上传失败：{error}, 失败原因： {error_info}"
 # Download audio
@app.post("/vc/download")
 async def VcDownload(base: VcBase):
    if os.path.exists(base.wavPath):
        return FileResponse(base.wavPath)
    else:
        return ErrorRequest(message="下载请求失败，文件不存在")
 # Download audio as base64
@app.post("/vc/download_base64")
 async def VcDownloadBase64(base: VcBase):
    if os.path.exists(base.wavPath):
        # Resample to 16 kHz and convert to 16-bit PCM
        wav, sr = librosa.load(base.wavPath, sr=16000)
        wav = float2pcm(wav)  # float32 to int16
        wav_bytes = wav.tobytes()  # to bytes
        wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
        return SuccessRequest(result=wav_base64)
    else:
        return ErrorRequest(message="播放请求失败，文件不存在")
 ######################################################################
 ########################### VC service #################################
 #####################################################################
 # Upload files
@app.post("/vc/upload")
 async def VcUpload(files: List[UploadFile]):
    # res = saveFiles(files, VC_UPLOAD_PATH)
    right = 0
    error = 0
    error_info = "错误文件："
    for file in files:
        try:
            if 'blob' in file.filename:
                out_file_path = os.path.join(
                    VC_UPLOAD_PATH,
                    datetime.datetime.strftime(datetime.datetime.now(),
                                               '%H%M') + randName(3) + ".wav")
            else:
                out_file_path = os.path.join(VC_UPLOAD_PATH, file.filename)
            print("上传文件名:", out_file_path)
            async with aiofiles.open(out_file_path, 'wb') as out_file:
                content = await file.read()  # async read
                await out_file.write(content)  # async write
            # Convert the file to a 16 kHz, 16-bit wav file
            wav, sr = librosa.load(out_file_path, sr=16000)
            sf.write(out_file_path, data=wav, samplerate=sr)
            right += 1
        except Exception as e:
            error += 1
            error_info = error_info + file.filename + " " + str(e) + "\n"
            continue
    return SuccessRequest(
        result=f"上传成功：{right}, 上传失败：{error}, 失败原因： {error_info}")
 # Get the file list
@app.get("/vc/list")
 async def VcList():
    res = getVCList(VC_UPLOAD_PATH)
    return SuccessRequest(result=res)
 # Get an audio file
@app.post("/vc/file")
 async def VcFileGet(base: VcBase):
    if os.path.exists(base.wavPath):
        return FileResponse(base.wavPath)
    else:
        return ErrorRequest(message="获取文件失败")
 # Delete an audio file
@app.post("/vc/del")
 async def VcFileDel(base: VcBase):
    if os.path.exists(base.wavPath):
        os.remove(base.wavPath)
        return SuccessRequest(result="删除成功")
    else:
        return ErrorRequest(message="删除失败")
 # Voice cloning (G2P)
@app.post("/vc/clone_g2p")
 async def VcCloneG2P(base: VcBaseText):
    if os.path.exists(base.wavPath):
        try:
            if base.func == 'ge2e':
                wavName = base.wavName
                wavPath = os.path.join(VC_OUT_PATH, wavName)
                wavPath = vc_model.vc(
                    text=base.text, input_wav=base.wavPath, out_wav=wavPath)
            else:
                wavName = base.wavName
                wavPath = os.path.join(VC_OUT_PATH, wavName)
                wavPath = vc_model_tdnn.vc(
                    text=base.text, input_wav=base.wavPath, out_wav=wavPath)
            if wavPath:
                res = {"wavName": wavName, "wavPath": wavPath}
                return SuccessRequest(result=res)
            else:
                return ErrorRequest(message="克隆失败，检查克隆脚本是否有效")
        except Exception as e:
            print(e)
            return ErrorRequest(message="克隆失败，合成过程报错")
    else:
        return ErrorRequest(message="克隆失败，音频不存在")
 ######################################################################
 ########################### SAT service #################################
 #####################################################################
 # Voice cloning (SAT)
@app.post("/vc/clone_sat")
 async def VcCloneSAT(base: VcBaseSAT):
    # Update sat_label_dic when the label is new or has changed
    if base.filename not in sat_label_dic or sat_label_dic[
            base.filename] != base.old_str:
        sat_label_dic[base.filename] = base.old_str
        with open(SAT_LABEL_PATH, "w", encoding='utf8') as f:
            json.dump(sat_label_dic, f, ensure_ascii=False, indent=4)
    input_file_path = base.wav
    # Select the task
    if base.language == "zh":
        # Chinese
        if base.function == "synthesize":
            output_file_path = os.path.join(SAT_OUT_PATH,
                                            "sat_syn_zh_" + base.filename)
            # Chinese voice cloning
            sat_result = sat_model.zh_synthesize_edit(
                old_str=base.old_str,
                new_str=base.new_str,
                input_name=os.path.realpath(input_file_path),
                output_name=os.path.realpath(output_file_path),
                task_name="synthesize")
        elif base.function == "edit":
            output_file_path = os.path.join(SAT_OUT_PATH,
                                            "sat_edit_zh_" + base.filename)
            # Chinese speech editing
            sat_result = sat_model.zh_synthesize_edit(
                old_str=base.old_str,
                new_str=base.new_str,
                input_name=os.path.realpath(input_file_path),
                output_name=os.path.realpath(output_file_path),
                task_name="edit")
        elif base.function == "crossclone":
            output_file_path = os.path.join(SAT_OUT_PATH,
                                            "sat_cross_zh_" + base.filename)
            # Cross-lingual cloning, Chinese to English
            sat_result = sat_model.crossclone(
                old_str=base.old_str,
                new_str=base.new_str,
                input_name=os.path.realpath(input_file_path),
                output_name=os.path.realpath(output_file_path),
                source_lang="zh",
                target_lang="en")
        else:
            return ErrorRequest(
                message="请检查功能选项是否正确，仅支持:synthesize, edit, crossclone")
    elif base.language == "en":
        if base.function == "synthesize":
            output_file_path = os.path.join(SAT_OUT_PATH,
                                            "sat_syn_en_" + base.filename)
            # English voice cloning
            sat_result = sat_model.en_synthesize_edit(
                old_str=base.old_str,
                new_str=base.new_str,
                input_name=os.path.realpath(input_file_path),
                output_name=os.path.realpath(output_file_path),
                task_name="synthesize")
        elif base.function == "edit":
            output_file_path = os.path.join(SAT_OUT_PATH,
                                            "sat_edit_en_" + base.filename)
            # English speech editing
            sat_result = sat_model.en_synthesize_edit(
                old_str=base.old_str,
                new_str=base.new_str,
                input_name=os.path.realpath(input_file_path),
                output_name=os.path.realpath(output_file_path),
                task_name="edit")
        elif base.function == "crossclone":
            output_file_path = os.path.join(SAT_OUT_PATH,
                                            "sat_cross_en_" + base.filename)
            # Cross-lingual cloning, English to Chinese
            sat_result = sat_model.crossclone(
                old_str=base.old_str,
                new_str=base.new_str,
                input_name=os.path.realpath(input_file_path),
                output_name=os.path.realpath(output_file_path),
                source_lang="en",
                target_lang="zh")
        else:
            return ErrorRequest(
                message="请检查功能选项是否正确，仅支持:synthesize, edit, crossclone")
    else:
        return ErrorRequest(message="请检查功能选项是否正确，仅支持中文和英文")
    if sat_result:
        return SuccessRequest(result=sat_result, message="SAT合成成功")
    else:
        return ErrorRequest(message="SAT 合成失败，请从后台检查错误信息！")
 # SAT file list
@app.get("/sat/list")
 async def SatList():
    res = []
    filelist = getVCList(SAT_UPLOAD_PATH)
    for fileitem in filelist:
        if fileitem['name'] in sat_label_dic:
            fileitem['label'] = sat_label_dic[fileitem['name']]
        else:
            fileitem['label'] = ""
        res.append(fileitem)
    return SuccessRequest(result=res)
 # Upload SAT audio files
@app.post("/sat/upload")
 async def SATUpload(files: List[UploadFile]):
    right = 0
    error = 0
    error_info = "错误文件："
    for file in files:
        try:
            if 'blob' in file.filename:
                out_file_path = os.path.join(
                    SAT_UPLOAD_PATH,
                    datetime.datetime.strftime(datetime.datetime.now(),
                                               '%H%M') + randName(3) + ".wav")
            else:
                out_file_path = os.path.join(SAT_UPLOAD_PATH, file.filename)
            print("上传文件名:", out_file_path)
            async with aiofiles.open(out_file_path, 'wb') as out_file:
                content = await file.read()  # async read
                await out_file.write(content)  # async write
            # Convert the file to a 16 kHz, 16-bit wav file
            wav, sr = librosa.load(out_file_path, sr=16000)
            sf.write(out_file_path, data=wav, samplerate=sr)
            right += 1
        except Exception as e:
            error += 1
            error_info = error_info + file.filename + " " + str(e) + "\n"
            continue
    return SuccessRequest(
        result=f"上传成功：{right}, 上传失败：{error}, 失败原因： {error_info}")
 ######################################################################
 ########################### FineTune service #################################
 #####################################################################
 # finetune file list
@app.post("/finetune/list")
 async def FineTuneList(Path: FTPath):
    dataPath = Path.dataPath
    if dataPath == "default":
        # Default path
        FT_PATH = FT_DEFAULT_PATH
    else:
        FT_PATH = dataPath
    res = []
    filelist = getVCList(FT_PATH)
    for name, value in ft_label_dic.items():
        wav_path = os.path.join(FT_PATH, name)
        if not os.path.exists(wav_path):
            wav_path = ""
        d = {'text': value['text'], 'name': name, 'path': wav_path}
        res.append(d)
    return SuccessRequest(result=res)
 # One-click reset: get a new directory path
@app.get('/finetune/newdir')
 async def FTGetNewDir():
    new_path = os.path.join(FT_UPLOAD_PATH, randName(3))
    if not os.path.exists(new_path):
        os.makedirs(new_path, exist_ok=True)
    # Copy labels.txt into the new directory
    cmd = f"cp {FT_LABEL_TXT_PATH} {new_path}"
    os.system(cmd)
    return SuccessRequest(result=new_path)
 # finetune: upload a file
@app.post("/finetune/upload")
 async def FTUpload(base: VcBaseFT):
    try:
        # Create the folder if it does not exist
        if not os.path.exists(base.wav_path):
            os.makedirs(base.wav_path)
        # Save the audio file
        out_file_path = os.path.join(base.wav_path, base.filename)
        wav_b = base64.b64decode(base.wav)
        async with aiofiles.open(out_file_path, 'wb') as out_file:
            await out_file.write(wav_b)  # async write
        return SuccessRequest(result="上传成功")
    except Exception as e:
        return ErrorRequest(message="上传失败")
 # finetune: run fine-tuning
@app.post("/finetune/clone_finetune")
 async def FTModel(base: VcBaseFTModel):
    # First check that wav_path is valid
    if base.wav_path == 'default':
        data_path = FT_DEFAULT_PATH
    else:
        data_path = base.wav_path
    if not os.path.exists(data_path):
        return ErrorRequest(message="数据文件夹不存在")
    data_base = data_path.split(os.sep)[-1]
    exp_dir = os.path.join(FT_EXP_BASE_PATH, data_base)
    try:
        exp_dir = ft_model.finetune(
            input_dir=os.path.realpath(data_path),
            exp_dir=os.path.realpath(exp_dir))
        if exp_dir:
            return SuccessRequest(result=exp_dir)
        else:
            return ErrorRequest(message="微调失败")
    except Exception as e:
        print(e)
        return ErrorRequest(message="微调失败")
 # finetune: synthesize
@app.post("/finetune/clone_finetune_syn")
 async def FTSyn(base: VcBaseFTSyn):
    try:
        if not os.path.exists(base.exp_path):
            return ErrorRequest(message="模型路径不存在")
        wav_name = randName(5)
        wav_path = ft_model.synthesize(
            text=base.text,
            wav_name=wav_name,
            out_wav_dir=os.path.realpath(FT_OUT_PATH),
            exp_dir=os.path.realpath(base.exp_path))
        if wav_path:
            res = {"wavName": wav_name + ".wav", "wavPath": wav_path}
            return SuccessRequest(result=res)
        else:
            return ErrorRequest(message="音频合成失败")
    except Exception as e:
        return ErrorRequest(message="音频合成失败")
 if __name__ == '__main__':
    uvicorn.run(app=app, host='0.0.0.0', port=port)
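Review note: the upload handlers in `vc.py` join the client-supplied `file.filename` directly onto the upload directory, so a name containing path separators could end up outside that directory. A minimal sketch of one way to guard against this, assuming only the base name should be kept; `safe_upload_path` is a hypothetical helper and is not part of this commit:

```python
import datetime
import os
import random
import string


def safe_upload_path(save_dir, filename):
    """Build a save path that always stays directly inside save_dir."""
    # Treat backslashes as separators too, then keep only the last component.
    name = os.path.basename(filename.replace("\\", "/"))
    # Browser recordings arrive as "blob"; as in the handlers above,
    # they (and empty or dot names) get a generated file name instead.
    if not name or name in (".", "..") or "blob" in name:
        stamp = datetime.datetime.now().strftime("%H%M")
        suffix = "".join(random.choices(string.ascii_lowercase, k=3))
        name = stamp + suffix + ".wav"
    return os.path.join(save_dir, name)
```

The generated name here uses `random.choices` as a stand-in for `randName(3)` from `src/util.py`, whose body is not shown in this diff.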
--- a/demos/speech_web/web_client/package.json
+++ b/demos/speech_web/web_client/package.json
@ -8,6 +8,7 @@
    "preview": "vite preview"
  },
  "dependencies": {
    "@element-plus/icons-vue": "^2.0.9",
    "ant-design-vue": "^2.2.8",
    "axios": "^0.26.1",
    "element-plus": "^2.1.9",
@ -18,6 +19,7 @@
  },
  "devDependencies": {
    "@vitejs/plugin-vue": "^2.3.0",
-    "vite": "^2.9.0"
+    "vite": "^2.9.13",
    "@vue/compiler-sfc": "^3.1.0"
  }
 }
--- a/demos/speech_web/web_client/src/api/API.js
+++ b/demos/speech_web/web_client/src/api/API.js
@ -19,6 +19,26 @@ export const apiURL =   {
    CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot WebSocket endpoint
    ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream',  // Streaming ASR endpoint
    TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Streaming TTS endpoint
    // Voice Clone
    VC_List: '/api/vc/list',
    SAT_List: '/api/sat/list',
    FineTune_List: '/api/finetune/list',
    VC_Upload: '/api/vc/upload',
    SAT_Upload: '/api/sat/upload',
    FineTune_Upload: '/api/finetune/upload',
    FineTune_NewDir: '/api/finetune/newdir',
    VC_Download: '/api/vc/download',
    VC_Download_Base64: '/api/vc/download_base64',
    VC_Del: '/api/vc/del',
    VC_CloneG2p: '/api/vc/clone_g2p',
    VC_CloneSAT: '/api/vc/clone_sat',
    VC_CloneFineTune: '/api/finetune/clone_finetune',
    VC_CloneFineTuneSyn: '/api/finetune/clone_finetune_syn',
 }
--- a/demos/speech_web/web_client/src/api/ApiVC.js
+++ b/demos/speech_web/web_client/src/api/ApiVC.js
@ -0,0 +1,88 @@
 import axios from 'axios'
 import {apiURL} from "./API.js"
 // Upload audio (VC)
 export async function vcUpload(params){
    const result = await axios.post(apiURL.VC_Upload, params);
    return result
 }
 // Upload audio (SAT)
 export async function satUpload(params){
    const result = await axios.post(apiURL.SAT_Upload, params);
    return result
 }
 // Upload audio (finetune)
 export async function fineTuneUpload(params){
    const result = await axios.post(apiURL.FineTune_Upload, params);
    return result
 }
 // Delete audio
 export async function vcDel(params){
    const result = await axios.post(apiURL.VC_Del, params);
    return result
 }
 // Get the audio list (VC)
 export async function vcList(){
    const result = await axios.get(apiURL.VC_List);
    return result
 }
 // Get the audio list (SAT)
 export async function satList(){
    const result = await axios.get(apiURL.SAT_List);
    return result
 }
 // Get the audio list (finetune)
 export async function fineTuneList(params){
    const result = await axios.post(apiURL.FineTune_List, params);
    return result
 }
 // finetune one-click reset: get a new folder
 export async function fineTuneNewDir(){
    const result = await axios.get(apiURL.FineTune_NewDir);
    return result
 }
 // Get audio data
 export async function vcDownload(params){
    const result = await axios.post(apiURL.VC_Download, params);
    return result
 }
 // Get audio data as Base64
 export async function vcDownloadBase64(params){
    const result = await axios.post(apiURL.VC_Download_Base64, params);
    return result
 }
 // Clone and synthesize (G2P)
 export async function vcCloneG2P(params){
    const result = await axios.post(apiURL.VC_CloneG2p, params);
    return result
 }
 // Clone and synthesize (SAT)
 export async function vcCloneSAT(params){
    const result = await axios.post(apiURL.VC_CloneSAT, params);
    return result
 }
 // Clone and synthesize: finetune, fine-tuning step
 export async function vcCloneFineTune(params){
    const result = await axios.post(apiURL.VC_CloneFineTune, params);
    return result
 }
 // Clone and synthesize: finetune, synthesis step
 export async function vcCloneFineTuneSyn(params){
    const result = await axios.post(apiURL.VC_CloneFineTuneSyn, params);
    return result
 }
--- a/demos/speech_web/web_client/src/components/Content/Header/Header.vue
+++ b/demos/speech_web/web_client/src/components/Content/Header/Header.vue
@ -4,7 +4,7 @@
        飞桨-PaddleSpeech
      </div>
      <div className="speech_header_describe">
-        PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库，用于语音和音频中的各种关键任务的开发，欢迎大家Star收藏鼓励
+        PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库，用于语音和音频中的各种关键任务的开发。支持语音识别，语音合成，声纹识别，声音分类，语音唤醒，语音翻译等多种语音任务，荣获 NAACL2022 Best Demo Award 。如果你喜欢这个示例，欢迎在 github 中 star 收藏鼓励。
      </div>
      <div className="speech_header_link_box">
        <a href="https://github.com/PaddlePaddle/PaddleSpeech" className="speech_header_link"  target='_blank' rel='noreferrer' key={index}>
--- a/demos/speech_web/web_client/src/components/Content/Header/style.less
+++ b/demos/speech_web/web_client/src/components/Content/Header/style.less
@ -43,6 +43,7 @@
        margin-bottom: 40px;
        display: flex;
        align-items: center;
        margin-top: 40px;
    };
    .speech_header_link {
        display: block;
--- a/demos/speech_web/web_client/src/components/Experience.vue
+++ b/demos/speech_web/web_client/src/components/Experience.vue
@ -6,6 +6,10 @@ import TTST from './SubMenu/TTS/TTST.vue'
 import VPRT from './SubMenu/VPR/VPRT.vue'
 import IET from './SubMenu/IE/IET.vue'
 import VoiceCloneT from './SubMenu/VoiceClone/VoiceClone.vue'
 import ENIRE_SATT from './SubMenu/ENIRE_SAT/ENIRE_SAT.vue'
 import FineTuneT from './SubMenu/FineTune/FineTune.vue'
 </script>
 <template>
@ -37,6 +41,15 @@ import IET from './SubMenu/IE/IET.vue'
            <el-tab-pane label="语音指令" key="5">
            <IET></IET>
            </el-tab-pane>
            <el-tab-pane label="一句话合成" key="6">
            <VoiceCloneT></VoiceCloneT>
            </el-tab-pane>
            <el-tab-pane label="小数据微调" key="7">
            <FineTuneT></FineTuneT>
            </el-tab-pane>
            <el-tab-pane label="ERNIE-SAT" key="8">
            <ENIRE_SATT></ENIRE_SATT>
            </el-tab-pane>
          </el-tabs>
        </div>
      </div>
--- a/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue
@ -58,9 +58,6 @@ export default {
    mounted () {
        this.wsUrl = apiURL.ASR_SOCKET_RECORD
        this.ws = new WebSocket(this.wsUrl)
        if(this.ws.readyState === this.ws.CONNECTING){
            this.$message.success("实时识别 Websocket 连接成功")
        }
        var _that = this
        this.ws.addEventListener('message', function (event) {
                var temp = JSON.parse(event.data);
@ -78,7 +75,7 @@ export default {
            // Check the WebSocket state
            // debugger
            if(this.ws.readyState != this.ws.OPEN){
-                this.$message.error("websocket 链接失败，请检查链接地址是否正确")
+                this.$message.error("websocket 链接失败，请检查 Websocket 后端服务是否正确开启")
                return
            }
--- a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/Chat.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ChatBot/Chat.vue
@ -1,298 +0,0 @@
 <template>
  <div class="chatbox">
      <h3>语音聊天</h3>
      <div class="home" style="margin:1vw;">
      <el-button :type="recoType" @click="startRecorder()"  style="margin:1vw;">{{ recoText }}</el-button>
      <!-- <el-button :type="playType" @click="playRecorder()" style="margin:1vw;"> {{ playText }}</el-button> -->
      <el-button :type="envType" @click="envRecorder()" style="margin:1vw;"> {{ envText }}</el-button>
      <!-- <el-button :type="envType" @click="getTts(ttsd)" style="margin:1vw;"> TTS </el-button> -->
      <el-button type="warning" @click="clearChat()" style="margin:1vw;"> 清空聊天</el-button>
      </div>
      <div v-for="Result in allResultList">
      <h3>{{Result}}</h3>
      </div>
  </div>
 </template>
 <script>
 import Recorder from 'js-audio-recorder'
 const recorder = new Recorder({
  sampleBits: 16,                 // Sample bit depth: 8 or 16, default 16
  sampleRate: 16000,              // Sample rate: 11025, 16000, 22050, 24000, 44100 or 48000; defaults to the browser's rate (48000 in my Chrome)
  numChannels: 1,                 // Channels: 1 or 2, default 1
  compiling: true
 })
  export default {
    name: 'home',
    data () {
      return {
        recoType: "primary",
        recoText: "开始录音",
        playType: "success",
        playText: "播放录音",
        envType: "success",
        envText: "环境采样",
        asrResultList: [],
        nlpResultList: [],
        ttsResultList: [],
        allResultList: [],
        webSocketRes: "websocket",
        drawRecordId: null,
        onReco: false,
        onPlay: false,
        onRecoPause: false,
        ws: '',
        ttsd: "你的名字叫什么,你的名字叫什么,你的名字叫什么你的名字叫什么",
        audioCtx: '',
        source: '',
        typedArray: '',
        ttsResult: '',
      }
    },
    mounted () {
        // Audio player
        var AudioContext = window.AudioContext || window.webkitAudioContext;
        this.audioCtx = new AudioContext({
            latencyHint: 'interactive',
            sampleRate: 24000,
          });
        // Define the playback-end handler
        recorder.onplayend = () => {
        this.onPlay = false
        this.playText = "播放录音"
        this.playType = "success"
        this.$nextTick(()=>{})
      }
      // Initialize the WebSocket
      this.ws = new WebSocket("ws://localhost:8010/ws/asr/offlineStream");
      // Define the message handler
      var _that = this
      this.ws.addEventListener('message', function (event) {
          _that.allResultList.push("asr:" + event.data)
          _that.$nextTick(()=>{})
          _that.getNlp(event.data)
      })
    },
    methods: {
      // Clear the chat history
      clearChat(){
        this.allResultList = []
      },
      // Start or stop recording
      startRecorder () {
        if(!this.onReco){
          this.resumeRecordOnline()
          recorder.start().then(() => {
            setInterval(() => {
              // Keep recording
              let newData = recorder.getNextData();
              if (!newData.length) {
                return;
              }
              // Upload the chunk to the streaming endpoint
              this.uploadChunk(newData)
            }, 500)
        }, (error) => {
          console.log("录音出错");
        })
        this.onReco = true
        this.recoType = "danger"
        this.recoText = "结束录音"
        this.$nextTick(()=>{
          })
        } else {
          // Stop recording
          recorder.stop()
          this.onReco = false
          this.recoType = "primary"
          this.recoText = "开始录音"
          this.$nextTick(()=>{})
          recorder.clear()
          // Export the audio as wav, then upload it to the server
          // const wavs = recorder.getWAVBlob()
          // this.uploadFile(wavs, "/api/asr/offline")
          // console.log(wavs)
          // Send a stop command to the server to clear its cached data
          this.stopRecordOnline()
        }
      },
      // Start or stop environment sampling
      envRecorder () {
        if(!this.onReco){
          recorder.start().then(() => {
        }, (error) => {
          console.log("录音出错");
        })
        this.onReco = true
        this.envType = "danger"
        this.envText = "结束采样"
        this.$nextTick(()=>{
          })
        } else {
          // Stop recording
          recorder.stop()
          this.onReco = false
          this.envType = "success"
          this.envText = "环境采样"
          this.$nextTick(()=>{})
          const wavs = recorder.getWAVBlob()
          this.uploadFile(wavs, "/api/asr/collectEnv")
        }
      },
      // Play back the recording
      playRecorder () {
        if(!this.onPlay){
          // Play the audio
          recorder.play()
          this.onPlay = true
          this.playText = "结束播放"
          this.playType = "warning"
          this.$nextTick(()=>{})
        } else {
          recorder.stopPlay()
          this.onPlay = false
          this.playText = "播放录音"
          this.playType = "success"
          this.$nextTick(()=>{})
        }
      },
      // Upload the recorded file
      async uploadFile(file, post_url){
        const formData = new FormData()
        formData.append('files', file)
        const result = await this.$http.post(post_url, formData);
        if (result.data.code === 0) {
              this.asrResultList.push(result.data.result)
              // this.$message.success(result.data.message);
          } else {
              this.$message.error(result.data.message);
          }
      },
      // Send each audio chunk over the websocket
      async uploadChunk(chunkDatas) {
        chunkDatas.forEach((chunkData) => {
          this.ws.send(chunkData)
        })
      },
      // Ask the server to stop recording and output the audio as PCM
      async stopRecordOnline(){
        const result = await this.$http.get("/api/asr/stopRecord");
        if (result.data.code === 0) {
          console.log("Online recording stopped")
        } else {
          console.warn("Failed to stop online recording")
        }
      },
      // Resume recording; audio sent while paused is discarded
      async resumeRecordOnline(){
        const result = await this.$http.get("/api/asr/resumeRecord");
        if (result.data.code === 0) {
          console.log("Online recording resumed")
        } else {
          console.warn("Failed to resume online recording")
        }
      },
      // 请求 NLP 对话结果
      async getNlp(asrText){
        // 录音暂停
        this.onRecoPause = true
        recorder.pause()
        this.stopRecordOnline()
        console.log('录音暂停')
        const result = await this.$http.post("/api/nlp/chat", { chat: asrText});
        if (result.data.code === 0) {
              this.allResultList.push("nlp:" + result.data.result)
              this.getTts(result.data.result)
              // this.$message.success(result.data.message);
          } else {
              this.$message.error(result.data.message);
          }
        // console.log("录音恢复")
      },
    base64ToUint8Array(base64String) {
      const padding = '='.repeat((4 - base64String.length % 4) % 4);
       const base64 = (base64String + padding)
                    .replace(/-/g, '+')
                    .replace(/_/g, '/');
       const rawData = window.atob(base64);
       const outputArray = new Uint8Array(rawData.length);
       for (let i = 0; i < rawData.length; ++i) {
            outputArray[i] = rawData.charCodeAt(i);
       }
       return outputArray;
      },
      // 合成TTS音频
      async getTts(nlpText){
        // base64
        this.ttsResult = await this.$http.post("/api/tts/offline", { text : nlpText});
        this.typedArray = this.base64ToUint8Array(this.ttsResult.data.result)
        // console.log("chat", this.typedArray.buffer)
        this.playAudioData( this.typedArray.buffer )
      },
      // play
      playAudioData( wav_buffer ) {
        this.audioCtx.decodeAudioData(wav_buffer, buffer => {
            this.source = this.audioCtx.createBufferSource();
            this.source.onended = () => {
              // 如果被暂停
              if(this.onRecoPause){
                console.log("恢复录音")
                this.onRecoPause = false
                // 客户端录音恢复
                recorder.resume()
                // 服务器录音恢复
                this.resumeRecordOnline()
              }
            }
            this.source.buffer = buffer;
            this.source.connect(this.audioCtx.destination);
            this.source.start();
        }, function(e) {
            Recorder.throwError(e);
        });
    }
    },
  }
 </script>
 <style lang='less' scoped>
 .chatbox {
  border: 4px solid #F00;
  // position: fixed;
  width: 100%;
  height: 20%;
  overflow: auto;
 }
 </style>
--- a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue
@ -91,6 +91,10 @@ export default {
    methods: {
        // 开始录音
        startRecorder(){
          if(this.ws.readyState != this.ws.OPEN){
                this.$message.error("websocket 链接失败，请检查 Websocket 后端服务是否正确开启")
                return
            }
          this.allResultList = []
          if(!this.onReco){
            this.asrResult = this.speakingText
--- a/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
@ -0,0 +1,487 @@
 <template>
    <div class="sat">
      <el-row :gutter="20">
            <el-col :span="12"><div class="grid-content ep-bg-purple" />
                <el-row :gutter="60" class="btn_row_wav" justify="center">
                    <el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary">录制音频</el-button>
                    <el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger">停止录音</el-button>
                    <el-button class="ml-3" v-else @click="uploadRecord()" type="success">上传录音</el-button>
                    <a>&#12288;</a>
                    <el-upload
                        :multiple="false"
                        :accept="'.wav'"
                        :auto-upload="false"
                        :on-change="handleChange"
                        :show-file-list="false"
                    >
                        <el-button class="ml-3" type="success">上传音频文件</el-button>
                    </el-upload>
                </el-row>
                <div class="recording_table">
                <el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
                    <!-- <el-table-column prop="wavId" label="序号" width="60"/> -->
                    <el-table-column prop="wavName" label="文件名" width="150"/>
                    <el-table-column label="文本">
                      <template #default="scope">
                            <el-input 
                              v-model="scope.row.label"
                              :autosize="{ minRows: 8, maxRows: 13 }" 
                              placeholder="Please input"
                              />
                        </template>
                    </el-table-column>
                    <el-table-column label="操作" width="80">
                        <template #default="scope">
                            <div class="flex justify-space-between mb-4 flex-wrap gap-4">
                                <a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
                                <a>&#12288;</a>
                                <a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
                            </div>
                        </template>
                    </el-table-column>
                    <el-table-column fixed="right" label="选择" width="70">
                        <template #default="scope">
                            <el-switch v-model="scope.row.status"  @click="choseWav(scope.row.wavId)"/>
                        </template>
                    </el-table-column>
                </el-table>
                </div>
            </el-col>
            <el-col :span="8"><div class="grid-content ep-bg-purple" />
                <el-space direction="vertical">
                    <el-card class="box-card" style="width: 250px; height:310px">
                        <template #header>
                            <div class="card-header">
                            <span>功能选择</span>
                            </div>
                        </template>  
                        <el-radio-group v-model="funcMode">
                          <el-radio label="1" size="middle" border style="margin-bottom: 10px">个性化语音合成</el-radio>
                            <el-input
                              v-if="funcMode === '1'"
                              v-model="ttsText"
                              :autosize="{ minRows: 2, maxRows: 2 }"
                              type="textarea"
                              placeholder="Please input"
                              style="margin-bottom: 10px"
                              />
                          <el-radio label="2" size="middle" border style="margin-bottom: 10px">跨语言语音合成</el-radio>
                            <el-input
                              v-if="funcMode === '2'"
                              v-model="ttsText"
                              :autosize="{ minRows: 2, maxRows: 2 }"
                              type="textarea"
                              placeholder="Please input"
                              style="margin-bottom: 10px"
                              />
                          <el-radio label="3" size="middle" border style="margin-bottom: 10px">语音编辑</el-radio>
                            <el-input
                                v-if="funcMode === '3'"
                                v-model="ttsText"
                                :autosize="{ minRows: 2, maxRows: 2 }"
                                type="textarea"
                                placeholder="Please input"
                                style="margin-bottom: 10px"
                                />
                        </el-radio-group>
                    </el-card>                    
                </el-space>
            </el-col>
            <el-col :span="4"><div class="grid-content ep-bg-purple" />
                <div class="play_board">
                    <el-space direction="vertical">
                        <el-row :gutter="20">
                            <el-button size="large" v-if="onSyn === 0" type="primary" @click="SatSyn()">开始合成</el-button>
                            <el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
                        </el-row>
                        <el-row :gutter="20">
                            <el-button v-if='cloneWav' type="success" @click="PlaySyn()">播放</el-button>
                            <el-button v-else disabled type="primary" @click="PlaySyn()">播放</el-button>
                            <el-button v-if='cloneWav' type="primary" @click="downLoadCloneWav()">下载</el-button>
                            <el-button v-else disabled type="primary" @click="downLoadCloneWav()">下载</el-button>
                        </el-row>
                    </el-space>
                </div>
            </el-col>
        </el-row>
 </div>
 </template>
 <script>
 import { vcCloneSAT, vcDownload, vcDownloadBase64, satUpload, satList, vcDel } from '../../../api/ApiVC'
 import Recorder from 'js-audio-recorder'
 let audioCtx = new AudioContext({
 latencyHint: 'interactive',
 sampleRate: 24000,
 });
 // Initialize the recorder
 const recorder = new Recorder({
  sampleBits: 16,                 // sample bit depth: 8 or 16, default 16
  sampleRate: 16000,              // sample rate: 11025, 16000, 22050, 24000, 44100 or 48000; defaults to the browser's rate (48000 in the author's Chrome)
  numChannels: 1,                 // channels: 1 or 2, default 1
  compiling: true
 })
 export default {
 name:"",
 data(){
    return {
        uploadStatus : 0,
        recognitionStatus : 0,
        asrResult : "",
        indicator : "",
        filename: "",
        upfile: "",
        mode: 1,
        language: 1,
        wav_input: "卡尔普陪外孙玩滑梯",
        new_input: "卡尔普陪外孙打滑梯",
        received_file:"",
        // 分割线
        onEnrollRec: 0,
        onSyn:0,
        vcDatas: [],
        funcMode: '1',
        selected_Id: -1,
        ttsText: '',
        cloneWav: '',
        wav:''
    }
 },
 mounted () {
        this.GetList()
    },
 methods:{
    // 获取文件列表
    async GetList(){
            this.vcDatas =[]
            const result = await satList();
            console.log("List: ", result);
            for(let i=0; i < result.data.result.length; i++){
                this.vcDatas.push({
                    wavName: result.data.result[i]['name'],
                    wavId: i,
                    wavPath: result.data.result[i]['path'],
                    status: false,
                    label: result.data.result[i]['label']
                })
            }
            console.log("vcDatas: ", this.vcDatas);
            this.$nextTick(()=>{})
    },
    // Upload the selected files, then refresh the list once every upload has finished
    async handleChange(file, fileList){
      for(let i=0; i<fileList.length; i++){
        await this.uploadFile(fileList[i])
      }
      await this.GetList()
    },
    async uploadFile(file){
      let formData = new FormData();
      formData.append('files', file.raw);
      const result = await satUpload(formData);
      if (result.data.code === 0) {
          this.$message.success("音频上传成功")
      } else {
          this.$message.error("音频上传失败")
      }
    },
    // 开始录音
    startRecorderEnroll(){
            this.onEnrollRec = 1
            recorder.clear()
            recorder.start()
        },
    // 结束录音
    stopRecorderEnroll(){
        this.onEnrollRec = 2
        recorder.stop()
        this.wav = recorder.getWAVBlob()
    },
    // Upload the recording
    async uploadRecord(){
            this.onEnrollRec = 0
            if(this.wav === ""){
                this.$message.error("未检测到录音，录音失败，请重新录制")
                return
            }
            let formData = new FormData();
            formData.append('files', this.wav);
            const result = await satUpload(formData);
            console.log(result)
            // Report success only when the server accepted the upload
            if (result.data.code === 0) {
                this.$message.success("录音上传成功")
            } else {
                this.$message.error("音频上传失败")
            }
            this.GetList()
        }, 
    // 删除音频文件
    async delWav(wavId){
            console.log('wavId', wavId)
            // 删除文件
            const result = await vcDel(
                {
                  wavName: this.vcDatas[wavId]['wavName'],
                  wavPath: this.vcDatas[wavId]['wavPath']
                }
            );
            if(!result.data.code){
                this.$message.success("删除成功")
            } else {
                this.$message.error(result.data.msg)
            }
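            await this.GetList()
            // No reset() method exists on this component; clear the selection and synthesis result directly
            this.selected_Id = -1
            this.cloneWav = ''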
        },
    // 播放表格
    async PlayTable(wavId){
        this.Play(this.vcDatas[wavId])
    },
    // 播放音频
    async Play(wavBase){
        // 获取音频数据
        const result = await vcDownloadBase64(wavBase);
        // console.log('play result', result)
        if (result.data.code === 0) {
            // base转换二进制数
            let typedArray = this.base64ToUint8Array(result.data.result)
            // 添加wav文件头
            let view = new DataView(typedArray.buffer);
            view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
            // 播放音频
            this.playAudioData(view.buffer);
        };
        },
    // chose wav
    choseWav(wavId){
            this.cloneWav = ''
            this.nowFile = this.vcDatas[wavId].wavName
            this.nowIndex = wavId
            // only wavId is true else false
            for(let i=0; i<this.vcDatas.length; i++){
                if(i==wavId){
                    this.vcDatas[wavId].status = true
                    this.selected_Id = wavId
                    this.ttsText = this.vcDatas[wavId]['label']
                } else {
                    this.vcDatas[i].status = false
                }
            }
            this.$nextTick(()=>{})
        },
    // 播放音频
    playAudioData(wav_buffer){
        audioCtx.decodeAudioData(wav_buffer, buffer => {
            let source = audioCtx.createBufferSource();
            source.buffer = buffer
            source.connect(audioCtx.destination);
            source.start();
        }, function (e) {
            console.error("Failed to decode audio data:", e)
        });
    },
    base64ToUint8Array(base64String){
       const padding = '='.repeat((4 - base64String.length % 4) % 4);
        const base64 = (base64String + padding)
            .replace(/-/g, '+')
            .replace(/_/g, '/');
        const rawData = window.atob(base64);
        const outputArray = new Uint8Array(rawData.length);
        for (let i = 0; i < rawData.length; ++i) {
            outputArray[i] = rawData.charCodeAt(i);
        }
        return outputArray; 
    },
    // Check whether the string contains Chinese characters
    hasChinese(str) {
      return /[\u4E00-\u9FA5]/.test(str)
    },
    // SAT合成
    async SatSyn(){
      // 检查 select id
      if(this.selected_Id < 0){
        return this.$message.error("请先选择音频文件！")
      }
      // 检查音频对应的文本
      if(!this.vcDatas[this.selected_Id]['label']){
        return this.$message.error("音频对应文本不可以为空！")
      }
      // 检查待合成文本
      if(!this.ttsText){
        return this.$message.error("合成文本不可以为空！")
      }
      // 合成中
      this.onSyn = 1
      // 重置 clone wav
      this.cloneWav = ""
      const old_str = this.vcDatas[this.selected_Id]['label']
      const new_str = this.ttsText
      let language = ""
      // 包含中文
      if(this.hasChinese(old_str)){
        language = "zh"
      } else{
        language = "en"
      }
      // 功能选择
      let func = ""
      if(this.funcMode === '1') {
        func = "synthesize"
      } else if(this.funcMode === '2'){
        func = "crossclone"
      } else {
        func = "edit"
      }
      let wav_path = this.vcDatas[this.selected_Id]['wavPath']
      let filename = this.vcDatas[this.selected_Id]['wavName']
      const data = {
        old_str: old_str,
        new_str: new_str,
        language: language,
        function: func,
        wav: wav_path,
        filename: filename
      }
      console.log("sat data: ", data)
      // Call the SAT endpoint; clear the "synthesizing" state even if the request throws
      let result
      try {
        result = await vcCloneSAT(data)
      } finally {
        this.onSyn = 0
      }
      console.log(result);
      if (result.data.code === 0) {
        this.$message.success(result.data.message)
        // Path of the synthesized audio
        this.cloneWav = result.data.result
        console.log("cloneWav", this.cloneWav);
      } else {
        this.$message.error(result.data.message)
      }
    },
    // Play the synthesized audio
    async PlaySyn(){
        // 获取音频数据
        const data = {
          wavName: "sat_"+this.filename,
          wavPath: this.cloneWav
        }
        const result = await vcDownloadBase64(data);
        // console.log('play result', result)
        if (result.data.code === 0) {
            // base转换二进制数
            let typedArray = this.base64ToUint8Array(result.data.result)
            // 添加wav文件头
            let view = new DataView(typedArray.buffer);
            view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
            // 播放音频
            this.playAudioData(view.buffer);
        };
        },
    // 下载合成文件
    async downLoadCloneWav(){
    if(this.cloneWav  === ""){
        this.$message.error("音频合成完毕后再下载！")
    } else {
        // const result = await vcDownload(this.cloneWav);
        // 获取音频数据
        const data = {
          wavName: "sat_"+this.filename,
          wavPath: this.cloneWav
        }
        const result = await vcDownloadBase64(data);
        // Bail out on failure; otherwise `view` would be undefined below
        if (result.data.code !== 0) {
            this.$message.error("获取音频文件失败")
            return
        }
        // Decode the base64 payload into bytes
        let typedArray = this.base64ToUint8Array(result.data.result)
        // Prepend a WAV header
        let view = new DataView(typedArray.buffer);
        view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
        const blob = new Blob([view.buffer], { type: 'audio/wav' });
        const fileName = new Date().getTime() + '.wav';
        const down = document.createElement('a');
        down.download = fileName;
        down.style.display = 'none';//隐藏,没必要展示出来
        down.href = URL.createObjectURL(blob);
        document.body.appendChild(down);
        down.click();
        URL.revokeObjectURL(down.href); // 释放URL 对象
        document.body.removeChild(down);//下载完成移除
      }
    },
 }
 }   
 </script>
 <style lang="less" scoped>
 // @import "./style.less";
 .sat {
    width: 1200px;
    height: 410px;
    background: #FFFFFF;
    padding: 5px 80px 56px 80px;
    box-sizing: border-box;
 }
 .el-row {
  margin-bottom: 20px;
 }
 .grid-content {
  border-radius: 4px;
  min-height: 36px;
 }
 .play_board{
    height: 100%;
    display: flex;
    align-items: center;
 }
 </style>
--- a/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
@ -0,0 +1,427 @@
 <template>
    <div class="finetune">
      <el-row :gutter="20"> 
        <el-col :span="12"><div class="grid-content ep-bg-purple" />
          <el-row :gutter="60" class="btn_row_wav" justify="center">
              <el-button class="ml-3" @click="clearAll()" type="primary">一键重置</el-button>
              <el-button class="ml-3" @click="resetDefault()" type="primary">默认示例</el-button>
              <el-button v-if='onFinetune === 0' class="ml-3" @click="fineTuneModel()" type="primary">一键微调</el-button>
              <el-button v-else-if='onFinetune === 1' class="ml-3" @click="fineTuneModel()" type="danger">微调中</el-button>
              <el-button v-else-if='onFinetune === 2' class="ml-3" @click="resetFinetuneBtn()" type="success">微调成功</el-button>
              <el-button v-else class="ml-3" @click="resetFinetuneBtn()" type="danger">微调失败</el-button>
              <!-- <el-button class="ml-3" @click="chooseHistory()" type="warning">历史数据选择</el-button> -->
        </el-row>
        <div class="recording_table">
            <el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
                <el-table-column prop="wavId" label="序号" width="60"/>
                <el-table-column prop="text" label="文本" />
                <el-table-column label="音频" width="80">
                    <template #default="scope">
                        <a v-if="scope.row.wavPath != ''">{{ scope.row.wavName }}</a>
                        <a v-else>
                            <el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary" circle>
                                <el-icon><Microphone /></el-icon>
                            </el-button>
                            <el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger" circle>
                                <el-icon><Microphone /></el-icon>
                            </el-button>
                            <el-button class="ml-3" v-else @click="uploadRecord(scope.row.wavId)" type="success" circle>
                                <el-icon><Upload /></el-icon>
                            </el-button>
                        </a>
                    </template>
                </el-table-column>
                <el-table-column label="操作" width="80" fixed="right">
                    <template #default="scope">
                        <div class="flex justify-space-between mb-4 flex-wrap gap-4">
                            <a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
                            <a>&#12288;</a>
                            <a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
                        </div>
                    </template>
                </el-table-column>
            </el-table>
        </div>
            </el-col>
            <el-col :span="8"><div class="grid-content ep-bg-purple" />
                <el-space direction="vertical">
                    <el-card class="box-card" style="width: 250px; height:310px">
                        <template #header>
                            <div class="card-header">
                                <span>试验路径</span>
                                <el-input
                                    v-model="expPath"
                                    :autosize="{ minRows: 2, maxRows: 3 }"
                                    type="textarea"
                                    placeholder="一键微调自动生成，可使用历史试验路径"
                                    />
                            </div>
                        </template>
                        <span>请输入中文文本</span>
                        <el-input
                            v-model="ttsText"
                            :autosize="{ minRows: 5, maxRows: 6 }"
                            type="textarea"
                            placeholder="请输入待合成文本"
                            />
                    </el-card>                    
                </el-space>
            </el-col>
            <el-col :span="4"><div class="grid-content ep-bg-purple" />
                <div class="play_board">
                    <el-space direction="vertical">
                        <el-row :gutter="20">
                            <el-button size="large" v-if="onSyn === 0" type="primary" @click="fineTuneSyn()">开始合成</el-button>
                            <el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
                        </el-row>
                        <el-row :gutter="20">
                            <el-button v-if='cloneWav' type="success" @click="PlaySyn()">播放</el-button>
                            <el-button v-else disabled type="primary" @click="PlaySyn()">播放</el-button>
                            <el-button v-if='cloneWav' type="primary" @click="downLoadCloneWav()">下载</el-button>
                            <el-button v-else disabled type="primary" @click="downLoadCloneWav()">下载</el-button>
                        </el-row>
                    </el-space>
                </div>
            </el-col>
        </el-row>
    </div>
    </template>
    <script>
    import Recorder from 'js-audio-recorder'
    import { vcDownload, vcDownloadBase64, vcCloneFineTune, vcCloneFineTuneSyn, fineTuneList, vcDel, fineTuneUpload, fineTuneNewDir } from '../../../api/ApiVC';
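    // Initialize the recorder
    const recorder = new Recorder({
      sampleBits: 16,                 // sample bit depth: 8 or 16, default 16
      sampleRate: 16000,              // sample rate: 11025, 16000, 22050, 24000, 44100 or 48000; defaults to the browser's rate (48000 in the author's Chrome)
      numChannels: 1,                 // channels: 1 or 2, default 1
      compiling: true
    })
    // Initialize the audio player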
    const audioCtx = new AudioContext({
        latencyHint: 'interactive',
        sampleRate: 16000,
    });
    function blobToDataURL(blob, callback) {
        let a = new FileReader();
        a.onload = function (e) { callback(e.target.result); }
        a.readAsDataURL(blob);
    }
    export default {
        data(){
            return {
              vcDatas:[],
              defaultDataPath: 'default',
              nowDataPath: '',
              expPath: '',
              wav: '',
              wav_base64: '',
              ttsText: '欢迎使用飞桨语音套件',
              cloneWav: '',
              onEnrollRec: 0,  // 录音状态
              onFinetune: 0,  // 微调状态
              onSyn: 0, // 合成状态
            }
        },
        mounted () {
            this.nowDataPath = this.defaultDataPath
            this.GetList()
        },
        methods: {
            // 重置 btn 
            resetFinetuneBtn(){
                this.onFinetune = 0
            },
        // 一键重置
        async clearAll(){
            this.vcDatas = []
            const result = await fineTuneNewDir()
            console.log("clearALL: ", result.data.result);
            this.nowDataPath = result.data.result
            this.expPath = ''
            this.onFinetune = 0
            await this.GetList()
        },
        // 显示默认
        async resetDefault(){
            this.nowDataPath = this.defaultDataPath
            await this.GetList()
            this.expPath = ''
        },
        // 开始录音
        startRecorderEnroll(){
            this.onEnrollRec = 1
            recorder.clear()
            recorder.start()
        },
        // 结束录音
        stopRecorderEnroll(){
            this.onEnrollRec = 2
            recorder.stop()
            this.wav = recorder.getWAVBlob()
        },
        // Upload the recording for the given table row
        async uploadRecord(wavId){
            this.onEnrollRec = 0
            if(this.wav === ""){
                this.$message.error("未检测到录音，录音失败，请重新录制")
                return
            }
            // Read the Blob as a data URL and strip the "data:...;base64," prefix
            const fileRes = await this.readFile(this.wav);
            const fileString = fileRes.result;
            const audioBase64type = (fileString.match(/data:[^;]*;base64,/))?.[0] ?? '';
            const uploadBase64 = fileString.slice(audioBase64type.length);
            // Upload into the current data directory
            const data = {
                'wav': uploadBase64,
                'filename': this.vcDatas[wavId]['wavName'],
                'wav_path': this.nowDataPath
            }
            const result = await fineTuneUpload(data);
            console.log(result)
            this.GetList() 
            this.$message.success("录音上传成功")
        }, 
        // 读取文件和Blob
        readFile(file) {
            return new Promise((resolve, reject) => {
                const fileReader = new FileReader();
                fileReader.onload = function () {
                    resolve(fileReader);
                };
                fileReader.onerror = function (err) {
                    reject(err);
                };
                fileReader.readAsDataURL(file);
                });
            },
            // 获取文件列表
          async GetList(){
            this.vcDatas = []
            const result = await fineTuneList({
              dataPath: this.nowDataPath
            });
            console.log(result, result.data.result);
            for(let i=0; i<result.data.result.length; i++){
                this.vcDatas.push({
                  wavId: i,
                  text: result.data.result[i]['text'],
                  wavName: result.data.result[i]['name'],
                  wavPath: result.data.result[i]['path'],
                })
            }
            this.$nextTick(()=>{})
          },
                  // 播放音频
    playAudioData( wav_buffer ) {
        audioCtx.decodeAudioData(wav_buffer, buffer => {
            var source = audioCtx.createBufferSource();
            source.buffer = buffer;
            source.connect(audioCtx.destination);
            source.start();
        }, function(e) {
            Recorder.throwError(e);
            })
    },
        // base64解码
        base64ToUint8Array(base64String) {
        const padding = '='.repeat((4 - base64String.length % 4) % 4);
        const base64 = (base64String + padding)
                        .replace(/-/g, '+')
                        .replace(/_/g, '/');
        const rawData = window.atob(base64);
        const outputArray = new Uint8Array(rawData.length);
        for (let i = 0; i < rawData.length; ++i) {
                outputArray[i] = rawData.charCodeAt(i);
        }
        return outputArray;
    },
            // 播放表格
        async PlayTable(wavId){
            this.Play(this.vcDatas[wavId])
        },
        // 播放合成后的音频
        async PlaySyn(){
            if(this.cloneWav  === ""){
                this.$message.error("请合成音频后再播放！！")
                return
            } else {
                this.Play(this.cloneWav)
            }
        },
        // 播放音频
        async Play(wavBase){
                // 获取音频数据
                const result = await vcDownloadBase64(wavBase);
                // console.log('play result', result)
                if (result.data.code === 0) {
                    // base转换二进制数
                    let typedArray = this.base64ToUint8Array(result.data.result)
                    // 添加wav文件头
                    let view = new DataView(typedArray.buffer);
                    view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
                    // 播放音频
                    this.playAudioData(view.buffer);
                } else {
                    this.$message.error("获取音频文件失败")
                }
        },
        // Download the synthesized file
        async downLoadCloneWav(){
            if(this.cloneWav  === ""){
                this.$message.error("音频合成完毕后再下载！")
            } else {
                // const result = await vcDownload(this.cloneWav);
                // Fetch the audio data (base64-encoded)
                const result = await vcDownloadBase64(this.cloneWav);
                // console.log('play result', result)
                if (result.data.code !== 0) {
                    // Bail out early: without audio data there is no buffer to download
                    this.$message.error("获取音频文件失败")
                    return
                }
                // Decode base64 into raw bytes
                let typedArray = this.base64ToUint8Array(result.data.result)
                // Prepend the WAV header
                let view = new DataView(typedArray.buffer);
                view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
                console.log(view.buffer)
                const blob = new Blob([view.buffer], { type: 'audio/wav' });
                const fileName = new Date().getTime() + '.wav';
                const down = document.createElement('a');
                down.download = fileName;
                down.style.display = 'none'; // Hidden; the link never needs to be shown
                down.href = URL.createObjectURL(blob);
                document.body.appendChild(down);
                down.click();
                URL.revokeObjectURL(down.href); // Release the object URL
                document.body.removeChild(down); // Remove the link once the download has started
            }
        },
        // Delete an audio file
        async delWav(wavId){
            if(this.nowDataPath === this.defaultDataPath){
                this.$message.error("默认音频不允许删除，可以一键重置，重新录音")
                return 
            }
            console.log('wavId', wavId)
            // Delete the file
            const result = await vcDel(
                {
                    wavName: this.vcDatas[wavId]['wavName'],
                    wavPath: this.vcDatas[wavId]['wavPath']
                }
            );
            if(!result.data.code){
                this.$message.success("删除成功")
                this.GetList()
            } else {
                this.$message.error("文件删除失败")
            }
        }, 
        // Fine-tune the model
        async fineTuneModel(){
            // First check that every entry has a recording
            for(let i=0; i < this.vcDatas.length; i++){
                if(this.vcDatas[i]['wavPath'] === ''){
                    return this.$message.error("还有录音未完成，请先完成录音！")
                }
            }
            this.onFinetune = 1
            const result = await vcCloneFineTune(
                {
                    wav_path: this.nowDataPath,
                }
            );
            if(!result.data.code){
                this.onFinetune = 2
                this.expPath = result.data.result
                console.log("this.expPath: ", this.expPath)
                this.$message.success("小数据微调成功")
            } else {
                this.onFinetune = 3
                this.$message.error(result.data.msg)
            }
        },
        // Synthesize audio
        async fineTuneSyn(){
            if(!this.expPath){
                return this.$message.error("请先微调生成模型后再生成！")
            }
            // Synthesize
            this.onSyn = 1
            const result = await vcCloneFineTuneSyn(
                {
                    exp_path: this.expPath,
                    text: this.ttsText
                }
            );
            this.onSyn = 0
            if(!result.data.code){
                this.cloneWav = result.data.result
                console.log("clone wav: ", this.cloneWav)
                this.$message.success("音色克隆成功")
            } else {
                this.$message.error(result.data.msg)
            }
            this.$nextTick(()=>{})
        }
 },
 };
 </script>
 <style lang="less" scoped>
 // @import "./style.less";
 .finetune {
  width: 1200px;
  height: 410px;
  background: #FFFFFF;
  padding: 5px 80px 56px 80px;
  box-sizing: border-box;
 }
 .el-row {
  margin-bottom: 20px;
 }
 .grid-content {
  border-radius: 4px;
  min-height: 36px;
 }
 .play_board{
    height: 100%;
    display: flex;
    align-items: center;
 }
 </style>
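
The `base64ToUint8Array` helper in the component above can be tried outside Vue with a standalone sketch like the one below. It assumes a global `atob` (browsers, and Node.js 16 or later) in place of `window.atob`; the logic is otherwise the same: restore the `=` padding, map the URL-safe `-` and `_` characters back to `+` and `/`, then copy each decoded character code into a byte array.

```javascript
// Standalone copy of the component's helper, for illustration only.
// Assumption: a global atob() is available (browser or Node.js >= 16).
function base64ToUint8Array(base64String) {
    // Pad to a multiple of 4 characters, as atob() expects
    const padding = '='.repeat((4 - base64String.length % 4) % 4);
    // Map the URL-safe alphabet back to the standard one
    const base64 = (base64String + padding)
        .replace(/-/g, '+')
        .replace(/_/g, '/');
    const rawData = atob(base64);
    const outputArray = new Uint8Array(rawData.length);
    for (let i = 0; i < rawData.length; ++i) {
        outputArray[i] = rawData.charCodeAt(i);
    }
    return outputArray;
}

// Usage: standard and unpadded URL-safe input both decode
const standardBytes = base64ToUint8Array('AQID');   // bytes 1, 2, 3
const urlSafeBytes = base64ToUint8Array('-_8');     // bytes 251, 255
```

Accepting both alphabets keeps the client tolerant of whichever encoding the backend happens to emit.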
--- a/demos/speech_web/web_client/src/components/SubMenu/IE/IE.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/IE/IE.vue
@ -1,125 +0,0 @@
 <template>
    <div class="iebox">
        <h1>信息抽取体验</h1>
        <el-button :type="recoType" @click="startRecorder()"  style="margin:1vw;">{{ recoText }}</el-button>
        <h3>识别结果: {{ asrResultOffline }}</h3>
        <h4>时间：{{ time }}</h4>
        <h4>出发地：{{ outset }}</h4>
        <h4>目的地：{{ destination }}</h4>
        <h4>费用：{{ amount }}</h4>
    </div>
 </template>
 <script>
 import Recorder from 'js-audio-recorder'
 const recorder = new Recorder({
  sampleBits: 16,                 // Sample bit depth: 8 or 16 (default 16)
  sampleRate: 16000,              // Sample rate: 11025, 16000, 22050, 24000, 44100 or 48000; defaults to the browser's rate (48000 in the author's Chrome)
  numChannels: 1,                 // Number of channels: 1 or 2 (default 1)
  compiling: true
 })
    export default {
        name: "IE",
        data(){
            return {
                streamAsrResult: '',
                recoType: "primary",
                recoText: "开始录音",
                playType: "success",
                asrResultOffline: '',
                onReco: false,
                ws:'',
                time: '',
                outset: '',
                destination: '',
                amount: ''
            }
        },
        methods: {
            startRecorder () {
                if(!this.onReco){
                    recorder.clear()
                    recorder.start().then(() => {
                    }, (error) => {
                        console.log("录音出错");
                    })
                    this.onReco = true
                    this.recoType = "danger"
                    this.recoText = "结束录音"
                    this.time = ''
                    this.outset = ''
                    this.destination = ''
                    this.amount = ''
                    this.$nextTick(()=>{
                    })
                } else {
                    // Stop recording
                    recorder.stop()
                    this.onReco = false
                    this.recoType = "primary"
                    this.recoText = "开始录音"
                    this.$nextTick(()=>{})
                    // Export the audio as WAV, then upload it to the server
                    const wavs = recorder.getWAVBlob()
                    this.uploadFile(wavs, "/api/asr/offline")
                }
            },
            async uploadFile(file, post_url){
                const formData = new FormData()
                formData.append('files', file)
                const result = await this.$http.post(post_url, formData);
                if (result.data.code === 0) {
                    this.asrResultOffline = result.data.result
                    this.$nextTick(()=>{})
                    this.$message.success(result.data.message);
                    this.informationExtract()
                } else {
                    this.$message.error(result.data.message);
                }
            },
            async informationExtract(){
                const postdata = {
                    chat: this.asrResultOffline
                }
                const result = await this.$http.post('/api/nlp/ie', postdata)
                console.log("ie", result)
                if(result.data.result[0]['时间']){
                    this.time = result.data.result[0]['时间'][0]['text']
                }
                if(result.data.result[0]['出发地']){
                    this.outset = result.data.result[0]['出发地'][0]['text']
                }
                if(result.data.result[0]['目的地']){
                    this.destination = result.data.result[0]['目的地'][0]['text']
                }
                if(result.data.result[0]['费用']){
                    this.amount = result.data.result[0]['费用'][0]['text']
                }
            }
        },
    }
 </script>
 <style lang="less" scoped>
 .iebox {
  border: 4px solid #F00;
  top:80%;
  width: 100%;
  height: 20%;
  overflow: auto;
 }
 </style>
--- a/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue
@ -228,6 +228,10 @@ export default {
        },
        // WebSocket-based streaming synthesis
        async getTtsChunkWavWS(){
            if(this.ws.readyState !== this.ws.OPEN){
                this.$message.error("websocket 链接失败，请检查 Websocket 后端服务是否正确开启")
                return
            }
            // Initialize chunks
            chunks = []
            chunk_index = 0
--- a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPR.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/VPR/VPR.vue
@ -1,178 +0,0 @@
 <template>
 <div class="vprbox">
        <div>
      <h1>声纹识别展示</h1>
    <el-input
      v-model="spk_id"
      class="w-50 m-2"
      size="large"
      placeholder="spk_id"
    />
    <el-button :type="recoType" @click="startRecorder()"  style="margin:1vw;">{{ recoText }}</el-button>
    <el-button type="primary" @click="Enroll(spk_id)"  style="margin:1vw;"> 注册 </el-button>
    <el-button type="primary" @click="Recog()"  style="margin:1vw;"> 识别 </el-button>
    </div>
    <div>
        <h2>声纹得分结果</h2>
        <el-table :data="score_result" style="width: 40%">
            <el-table-column prop="spkId" label="spk_id" />
            <el-table-column prop="score" label="score" />
        </el-table>
    </div>
    <div>
        <h2>声纹数据列表</h2>
        <el-table :data="vpr_datas" style="width: 40%">
            <el-table-column prop="spkId" label="spk_id" />
            <el-table-column label="wav">
                <template #default="scope2">
                    <audio :src="'/VPR/vpr/data/?vprId='+scope2.row.vprId" controls>
                    </audio>
                </template>
            </el-table-column>
            <el-table-column fixed="right" label="Operations">
                <template #default="scope">
                    <el-button @click="Del(scope.row.spkId)" type="text" size="small">Delete</el-button>
                </template>
            </el-table-column>
        </el-table>
    </div>
 </div>
 </template>
 <script>
 import Recorder from 'js-audio-recorder'
 const recorder = new Recorder({
  sampleBits: 16,                 // Sample bit depth: 8 or 16 (default 16)
  sampleRate: 16000,              // Sample rate: 11025, 16000, 22050, 24000, 44100 or 48000; defaults to the browser's rate (48000 in the author's Chrome)
  numChannels: 1,                 // Number of channels: 1 or 2 (default 1)
  compiling: true
 })
    export default {
        name: "VPR",
        data () {
            return {
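                url_enroll: '/VPR/vpr/enroll', // Enroll a speaker
                url_recog: '/VPR/vpr/recog',  // Recognize a speaker
                url_del: '/VPR/vpr/del',    // Delete a speaker
                url_list: '/VPR/vpr/list',   // Fetch the speaker list
                url_data: '/VPR/vpr/data',   // Fetch audio
                spk_id: 'sss',
                onReco: false,               // Recording flag; name matches its use in startRecorder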
                recoType: "primary",
                recoText: "开始录音",
                wav: '',
                score_result: [],
                vpr_datas: []
            }
        },
        mounted () {
            this.GetList()
        },
        methods: {
            startRecorder () {
                this.score_result = []
                if(!this.onReco){
                    recorder.start().then(() => {
                    }, (error) => {
                        console.log("录音出错");
                    })
                    this.onReco = true
                    this.recoType = "danger"
                    this.recoText = "结束录音"
                    this.$nextTick(()=>{
                    })
                } else {
                    // Stop recording
                    recorder.stop()
                    this.onReco = false
                    this.recoType = "primary"
                    this.recoText = "开始录音"
                    this.$nextTick(()=>{})
                    // Export the audio as WAV; Enroll and Recog upload it to the server
                    this.wav = recorder.getWAVBlob()
                }
            },
            async Enroll(spk_id){
                if(this.wav === ''){
                    this.$message.error("请先完成录音");
                    return
                }
                let formData = new FormData()
                formData.append('spk_id', this.spk_id)
                formData.append('audio', this.wav)
                console.log("formData", formData)
                console.log("spk_id", this.spk_id)
                const result = await this.$http.post(this.url_enroll, formData);
                if(result.data.status){
                    this.$message.success("声纹注册成功")
                } else {
                    this.$message.error(result.data.msg)
                }
                console.log(result)
                this.GetList()
            },
            async Recog(){
                this.score_result = []
                if(this.wav === ''){
                    this.$message.error("请先完成录音");
                    return
                }
                let formData = new FormData()
                formData.append('audio', this.wav)
                const result = await this.$http.post(this.url_recog, formData);
                console.log(result)
                result.data.forEach(dat => {
                    this.score_result.push({
                        spkId: dat[0],
                        score: dat[1][1]
                    })
                });
            },
            async Del(spkId){
                console.log('spkId', spkId)
                // Delete the speaker
                const result = await this.$http.post(this.url_del, {spk_id: spkId});
                if(result.data.status){
                    this.$message.success("删除成功")
                } else {
                    this.$message.error(result.data.msg)
                }
                this.GetList()
            },
            async GetList(){
                this.vpr_datas =[]
                const result = await this.$http.get(this.url_list);
                console.log("list", result)
                for(let i=0; i<result.data[0].length; i++){
                    this.vpr_datas.push({
                        spkId: result.data[0][i],
                        vprId: result.data[1][i]
                    })
                }
                this.$nextTick(()=>{})
            },
            GetData(){},
        },
    }
 </script>
 <style lang='less' scoped>
 .vprbox {
  border: 4px solid #F00;
 //   position: fixed;
  top:60%;
  width: 100%;
  height: 20%;
  overflow: auto;
 }
 </style>
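
`GetList` in the component above reads `result.data` as two parallel arrays (speaker IDs first, record IDs second) and pairs them by index. A minimal sketch of that pairing as a pure function follows; the name `zipSpeakerList` is hypothetical, and the response shape is inferred from the loop in `GetList`, not from backend documentation.

```javascript
// Hypothetical helper: turns [[spkId...], [vprId...]] into table rows.
// Assumption: both arrays have the same length, as GetList presumes.
function zipSpeakerList(data) {
    const [spkIds, vprIds] = data;
    return spkIds.map((spkId, i) => ({ spkId, vprId: vprIds[i] }));
}

// Usage with made-up sample values
const rows = zipSpeakerList([['alice', 'bob'], [1, 2]]);
```

Building the whole array first and assigning it once avoids the repeated `push` calls on reactive state.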
--- a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue
@ -216,12 +216,15 @@ export default {
                formData.append('audio', this.wav)
                const result = await vprEnroll(formData)
                if (!result){
                    this.$message.error("请检查后端服务是否正确开启")
                    return 
                }
                if(result.data.status){
                    this.$message.success("声纹注册成功")
                } else {
                    this.$message.error(result.data.msg)
                }
                // console.log(result)
                this.GetList()
                this.wav = ''
                this.randomSpkId()
--- a/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
@ -0,0 +1,380 @@
 <template>
    <div class="voiceclone">
        <el-row :gutter="20">
            <el-col :span="12"><div class="grid-content ep-bg-purple" />
                <el-row :gutter="60" class="btn_row_wav" justify="center">
                    <el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary">录制音频</el-button>
                    <el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger">停止录音</el-button>
                    <el-button class="ml-3" v-else @click="uploadRecord()" type="success">上传录音</el-button>
                    <a>&#12288;</a>
                    <el-upload
                        :multiple="false"
                        :accept="'.wav'"
                        :auto-upload="false"
                        :on-change="handleChange"
                        :show-file-list="false"
                    >
                        <el-button class="ml-3" type="success">上传音频文件</el-button>
                    </el-upload>
                </el-row>
                <div class="recording_table">
                <el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
                    <el-table-column prop="wavId" label="序号" width="60"/>
                    <el-table-column prop="wavName" label="文件名" />
                    <el-table-column label="操作" width="80">
                        <template #default="scope">
                            <div class="flex justify-space-between mb-4 flex-wrap gap-4">
                                <a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
                                <a>&#12288;</a>
                                <a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
                            </div>
                        </template>
                    </el-table-column>
                    <el-table-column fixed="right" label="选择" width="70">
                        <template #default="scope">
                            <el-switch v-model="scope.row.status"  @click="choseWav(scope.row.wavId)"/>
                        </template>
                    </el-table-column>
                </el-table>
                </div>
            </el-col>
            <el-col :span="8"><div class="grid-content ep-bg-purple" />
                <el-space direction="vertical">
                    <el-card class="box-card" style="width: 250px; height:310px">
                        <template #header>
                            <div class="card-header">
                            <span>请输入中文文本</span>
                            </div>
                        </template>
                        <div class="mb-2 flex items-center text-sm">
                            <el-radio-group v-model="func_radio" class="ml-4">
                            <el-radio label="1" size="large">GE2E</el-radio>
                            <el-radio label="2" size="large">ECAPA-TDNN</el-radio>
                            </el-radio-group>
                        </div>
                        <el-input
                            v-model="ttsText"
                            :autosize="{ minRows: 8, maxRows: 13 }"
                            type="textarea"
                            placeholder="Please input"
                            />
                    </el-card>                    
                </el-space>
            </el-col>
            <el-col :span="4"><div class="grid-content ep-bg-purple" />
                <div class="play_board">
                    <el-space direction="vertical">
                        <el-row :gutter="20">
                            <el-button size="large" v-if="g2pOnSys === 0" type="primary" @click="g2pClone()">开始合成</el-button>
                            <el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
                        </el-row>
                        <el-row :gutter="20">
                            <el-button v-if='this.cloneWav' type="success" @click="PlaySyn()">播放</el-button>
                            <el-button v-else disabled type="primary" @click="PlaySyn()">播放</el-button>
                            <el-button v-if='this.cloneWav' type="primary" @click="downLoadCloneWav()">下载</el-button>
                            <el-button v-else disabled type="primary" @click="downLoadCloneWav()">下载</el-button>
                        </el-row>
                    </el-space>
                </div>
            </el-col>
        </el-row>
    </div>
 </template>
 <script>
 import Recorder from 'js-audio-recorder'
 import { vcCloneG2P, vcCloneSAT, vcDel, vcUpload, vcList, vcDownload, vcDownloadBase64 } from '../../../api/ApiVC';
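 // Initialize the recorder
 const recorder = new Recorder({
  sampleBits: 16,                 // Sample bit depth: 8 or 16 (default 16)
  sampleRate: 16000,              // Sample rate: 11025, 16000, 22050, 24000, 44100 or 48000; defaults to the browser's rate (48000 in the author's Chrome)
  numChannels: 1,                 // Number of channels: 1 or 2 (default 1)
  compiling: true
 })
 // Initialize the player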
 const audioCtx = new AudioContext({
    latencyHint: 'interactive',
    sampleRate: 16000,
 });
 export default {
    data(){
         return {
            onEnrollRec: 0,     // Enrollment recording state
            wav: '',            // Recording result
            vcDatas: [],       // Recordings made so far
            nowFile: "",        // Currently selected recording
            ttsText: "欢迎使用飞桨语音套件",
            nowIndex: -1,
            cloneWav: "",
            g2pOnSys: 0,
            func_radio: '1',
         }
    },
    mounted () {
        this.GetList()
    },
    methods:{
        // Reset
        reset(){
            this.onEnrollRec = 0
            this.wav = ''
            this.vcDatas = []
            this.nowFile = ""
            this.ttsText = "欢迎使用飞桨语音套件"
            this.nowIndex = -1
        },
        // Start recording
        startRecorderEnroll(){
            this.onEnrollRec = 1
            recorder.clear()
            recorder.start()
        },
        // Stop recording
        stopRecorderEnroll(){
            this.onEnrollRec = 2
            recorder.stop()
            this.wav = recorder.getWAVBlob()
        },
        // Select a recording
        choseWav(wavId){
            this.cloneWav = ''
            this.nowFile = this.vcDatas[wavId].wavName
            this.nowIndex = wavId
            // Switch on only the selected row; switch off all others
            for(let i=0; i<this.vcDatas.length; i++){
                if(i==wavId){
                    this.vcDatas[wavId].status = true
                } else {
                    this.vcDatas[i].status = false
                }
            }
            this.$nextTick(()=>{})
        },
        // Upload the recording
        async uploadRecord(){
            this.onEnrollRec = 0
            if(this.wav === ""){
                this.$message.error("未检测到录音，录音失败，请重新录制")
                return
            }
            let formData = new FormData();
            formData.append('files', this.wav);
            const result = await vcUpload(formData);
            console.log(result)
            // Report success only when the server confirms it, as uploadFile does
            if (result.data.code === 0) {
                this.$message.success("录音上传成功")
                this.GetList()
            } else {
                this.$message.error("音频上传失败")
            }
        },
        // Upload list changed
        async handleChange(file, fileList){
            for(let i=0; i<fileList.length; i++){
                this.uploadFile(fileList[i])
            } 
        },
        // Upload an audio file
        async uploadFile(file){
            let formData = new FormData();
            formData.append('files', file.raw);
            const result = await vcUpload(formData);
            if (result.data.code === 0) {
                this.$message.success("音频上传成功")
                this.GetList()
            } else {
                this.$message.error("音频上传失败")
            }
        },
        // Fetch the file list
        async GetList(){
            this.vcDatas =[]
            const result = await vcList();
            for(let i=0; i<result.data.result.length; i++){
                this.vcDatas.push({
                    wavName: result.data.result[i]['name'],
                    wavId: i,
                    wavPath: result.data.result[i]['path'],
                    status: false
                })
            }
            this.$nextTick(()=>{})
        },
        // Delete an audio file
        async delWav(wavId){
            console.log('wavId', wavId)
            // Delete the file
            const result = await vcDel(
                {
                    wavName: this.vcDatas[wavId]['wavName'],
                    wavPath: this.vcDatas[wavId]['wavPath']
                }
            );
            if(!result.data.code){
                this.$message.success("删除成功")
            } else {
                this.$message.error(result.data.msg)
            }
            this.GetList()
            this.reset()
        },
        // Download the synthesized file
        async downLoadCloneWav(){
            if(this.cloneWav  === ""){
                this.$message.error("音频合成完毕后再下载！")
            } else {
                // const result = await vcDownload(this.cloneWav);
                // Fetch the audio data (base64-encoded)
                const result = await vcDownloadBase64(this.cloneWav);
                // console.log('play result', result)
                if (result.data.code !== 0) {
                    // Bail out early: without audio data there is no buffer to download
                    this.$message.error("获取音频文件失败")
                    return
                }
                // Decode base64 into raw bytes
                let typedArray = this.base64ToUint8Array(result.data.result)
                // Prepend the WAV header
                let view = new DataView(typedArray.buffer);
                view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
                console.log(view.buffer)
                const blob = new Blob([view.buffer], { type: 'audio/wav' });
                const fileName = new Date().getTime() + '.wav';
                const down = document.createElement('a');
                down.download = fileName;
                down.style.display = 'none'; // Hidden; the link never needs to be shown
                down.href = URL.createObjectURL(blob);
                document.body.appendChild(down);
                down.click();
                URL.revokeObjectURL(down.href); // Release the object URL
                document.body.removeChild(down); // Remove the link once the download has started
            }
        },
        // g2p voice clone
        async g2pClone(){
            if(this.nowIndex === -1){
                return this.$message.error("请先录音并上传，选择音频后再点击合成")
            } else if (this.ttsText === ""){
                return this.$message.error("合成文本不可以为空")
            } else if (this.nowIndex >= this.vcDatas.length){
                return this.$message.error("当前序号不可以超过音频个数")
            }
            this.cloneWav = ""
            let func = ''
            if(this.func_radio === '1'){
                func = 'ge2e'
            } else {
                func = 'ecapa_tdnn'
            }
            console.log('func', func)
            // Synthesize
            this.g2pOnSys = 1
            const result = await vcCloneG2P(
                {
                    wavName: this.vcDatas[this.nowIndex]['wavName'],
                    wavPath: this.vcDatas[this.nowIndex]['wavPath'],
                    text: this.ttsText,
                    func: func
                }
            );
            this.g2pOnSys = 0
            if(result.data.code == 0){
                this.cloneWav = result.data.result
                console.log("clone wav: ", this.cloneWav)
                this.$message.success("音频合成成功")
            } else {
                this.$message.error("音频合成失败，请检查后台错误后重试！")
            }
        },
        // Play a recording from the table
        async PlayTable(wavId){
            this.Play(this.vcDatas[wavId])
        },
        // Play the synthesized audio
        async PlaySyn(){
            if(this.cloneWav  === ""){
                this.$message.error("请合成音频后再播放！！")
                return
            } else {
                this.Play(this.cloneWav)
            }
        },
        // Play audio
        async Play(wavBase){
            // Fetch the audio data (base64-encoded)
            const result = await vcDownloadBase64(wavBase);
            // console.log('play result', result)
            if (result.data.code === 0) {
                // Decode base64 into raw bytes
                let typedArray = this.base64ToUint8Array(result.data.result)
                // Prepend the WAV header
                let view = new DataView(typedArray.buffer);
                view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
                // Play the audio
                this.playAudioData(view.buffer);
            }
        },
        // Decode a base64 string (standard or URL-safe alphabet) into a Uint8Array
        base64ToUint8Array(base64String) {
            const padding = '='.repeat((4 - base64String.length % 4) % 4);
            const base64 = (base64String + padding)
                            .replace(/-/g, '+')
                            .replace(/_/g, '/');
            const rawData = window.atob(base64);
            const outputArray = new Uint8Array(rawData.length);
            for (let i = 0; i < rawData.length; ++i) {
                    outputArray[i] = rawData.charCodeAt(i);
            }
            return outputArray;
        }, 
        // Decode a WAV buffer and play it through the shared AudioContext
        playAudioData( wav_buffer ) {
            audioCtx.decodeAudioData(wav_buffer, buffer => {
                var source = audioCtx.createBufferSource();
                source.buffer = buffer;
                source.connect(audioCtx.destination);
                source.start();
            }, function(e) {
                Recorder.throwError(e);
            })
        },
    },
 }
 </script>
 <style lang="less" scoped>
 // @import "./style.less";
 .voiceclone {
    width: 1200px;
    height: 410px;
    background: #FFFFFF;
    padding: 5px 80px 56px 80px;
    box-sizing: border-box;
 }
 .el-row {
  margin-bottom: 20px;
 }
 .grid-content {
  border-radius: 4px;
  min-height: 36px;
 }
 .play_board{
    height: 100%;
    display: flex;
    align-items: center;
 }
 </style>
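
The `Play` and `downLoadCloneWav` methods above pass the decoded bytes through `Recorder.encodeWAV(view, 16000, 16000, 1, 16, true)` before playback or download, which suggests the backend returns headerless 16-bit mono PCM at 16 kHz. As a rough illustration of what "adding the WAV header" involves, here is a minimal sketch that writes the standard 44-byte PCM header by hand. `pcmToWav` is a hypothetical name and is not part of js-audio-recorder; the library's own implementation may differ.

```javascript
// Hypothetical helper: wraps raw little-endian PCM bytes in a 44-byte WAV header.
// Assumption: pcmBytes is a Uint8Array of headerless PCM samples.
function pcmToWav(pcmBytes, sampleRate = 16000, numChannels = 1, bitsPerSample = 16) {
    const blockAlign = numChannels * bitsPerSample / 8;   // bytes per sample frame
    const byteRate = sampleRate * blockAlign;             // bytes per second
    const buffer = new ArrayBuffer(44 + pcmBytes.length);
    const view = new DataView(buffer);
    const writeAscii = (offset, text) => {
        for (let i = 0; i < text.length; i++) {
            view.setUint8(offset + i, text.charCodeAt(i));
        }
    };
    writeAscii(0, 'RIFF');
    view.setUint32(4, 36 + pcmBytes.length, true);  // file size minus the first 8 bytes
    writeAscii(8, 'WAVE');
    writeAscii(12, 'fmt ');
    view.setUint32(16, 16, true);                   // size of the fmt chunk for PCM
    view.setUint16(20, 1, true);                    // audio format 1 = uncompressed PCM
    view.setUint16(22, numChannels, true);
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, byteRate, true);
    view.setUint16(32, blockAlign, true);
    view.setUint16(34, bitsPerSample, true);
    writeAscii(36, 'data');
    view.setUint32(40, pcmBytes.length, true);      // size of the sample data
    new Uint8Array(buffer, 44).set(pcmBytes);       // copy samples after the header
    return buffer;
}

// Usage: two 16-bit samples (4 bytes) become a 48-byte WAV buffer
const wavBuffer = pcmToWav(new Uint8Array([1, 2, 3, 4]));
```

The returned `ArrayBuffer` could be handed to `decodeAudioData` or wrapped in a `Blob` with type `audio/wav`, mirroring how the components use `view.buffer`.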
--- a/demos/speech_web/web_client/src/main.js
+++ b/demos/speech_web/web_client/src/main.js
@ -1,5 +1,6 @@
 import { createApp } from 'vue'
 import ElementPlus from 'element-plus'
 import * as ElementPlusIconsVue from '@element-plus/icons-vue'
 import 'element-plus/dist/index.css'
 import Antd from 'ant-design-vue';
 import 'ant-design-vue/dist/antd.css';
@ -9,5 +10,8 @@ import axios from 'axios'
 const app = createApp(App)
 app.config.globalProperties.$http = axios
 for (const [key, component] of Object.entries(ElementPlusIconsVue)) {
    app.component(key, component)
 }
 app.use(ElementPlus).use(Antd)
 app.mount('#app')
--- a/demos/speech_web/web_client/yarn.lock
+++ b/demos/speech_web/web_client/yarn.lock
@ -44,6 +44,11 @@
  resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-1.1.4.tgz"
  integrity sha512-Iz/nHqdp1sFPmdzRwHkEQQA3lKvoObk8azgABZ81QUOpW9s/lUyQVUSh0tNtEPZXQlKwlSh7SPgoVxzrE0uuVQ==
 "@element-plus/icons-vue@^2.0.9":
  version "2.0.9"
  resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-2.0.9.tgz#b7777c57534522e387303d194451d50ff549d49a"
  integrity sha512-okdrwiVeKBmW41Hkl0eMrXDjzJwhQMuKiBOu17rOszqM+LS/yBYpNQNV5Jvoh06Wc+89fMmb/uhzf8NZuDuUaQ==
 "@floating-ui/core@^0.6.1":
  version "0.6.1"
  resolved "https://registry.npmmirror.com/@floating-ui/core/-/core-0.6.1.tgz"
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@ -20,6 +20,7 @@ onnxruntime==1.10.0
 opencc
 paddlenlp
 paddlepaddle>=2.2.2
 paddlespeech_ctcdecoders
 paddlespeech_feat
 pandas
 pathos == 0.2.8
@ -27,8 +28,8 @@ pattern_singleton
 Pillow>=9.0.0
 praatio==5.0.0
 prettytable
 pypinyin<=0.44.0
 pypinyin-dict
 pypinyin<=0.44.0
 python-dateutil
 pyworld==0.2.12
 recommonmark>=0.5.0
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
@ -1,7 +0,0 @@
 paddlespeech.cls.exps.panns.deploy.predict module
 =================================================
 .. automodule:: paddlespeech.cls.exps.panns.deploy.predict
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
@ -12,4 +12,3 @@ Submodules
 .. toctree::
   :maxdepth: 4
   paddlespeech.cls.exps.panns.deploy.predict
--- a/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
@ -1,7 +0,0 @@
 paddlespeech.cls.exps.panns.export\_model module
 ================================================
 .. automodule:: paddlespeech.cls.exps.panns.export_model
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
@ -1,7 +0,0 @@
 paddlespeech.cls.exps.panns.predict module
 ==========================================
 .. automodule:: paddlespeech.cls.exps.panns.predict
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.cls.exps.panns.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.rst
@ -20,6 +20,3 @@ Submodules
 .. toctree::
   :maxdepth: 4
   paddlespeech.cls.exps.panns.export_model
   paddlespeech.cls.exps.panns.predict
   paddlespeech.cls.exps.panns.train
--- a/docs/source/api/paddlespeech.cls.exps.panns.train.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.train.rst
@ -1,7 +0,0 @@
 paddlespeech.cls.exps.panns.train module
 ========================================
 .. automodule:: paddlespeech.cls.exps.panns.train
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
+++ b/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
@ -1,7 +0,0 @@
 paddlespeech.kws.exps.mdtc.plot\_det\_curve module
 ==================================================
 .. automodule:: paddlespeech.kws.exps.mdtc.plot_det_curve
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.rst
+++ b/docs/source/api/paddlespeech.kws.exps.mdtc.rst
@ -14,6 +14,5 @@ Submodules
   paddlespeech.kws.exps.mdtc.collate
   paddlespeech.kws.exps.mdtc.compute_det
   paddlespeech.kws.exps.mdtc.plot_det_curve
   paddlespeech.kws.exps.mdtc.score
   paddlespeech.kws.exps.mdtc.train
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
@ -13,5 +13,4 @@ Submodules
   :maxdepth: 4
   paddlespeech.s2t.decoders.ctcdecoder.decoders_deprecated
   paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
   paddlespeech.s2t.decoders.ctcdecoder.swig_wrapper
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.decoders.ctcdecoder.scorer\_deprecated module
 ==============================================================
 .. automodule:: paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.decoders.recog\_bin module
 ===========================================
 .. automodule:: paddlespeech.s2t.decoders.recog_bin
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.decoders.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.rst
@ -23,5 +23,4 @@ Submodules
   :maxdepth: 4
   paddlespeech.s2t.decoders.recog
   paddlespeech.s2t.decoders.recog_bin
   paddlespeech.s2t.decoders.utils
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.decoders.scorers.ngram module
 ==============================================
 .. automodule:: paddlespeech.s2t.decoders.scorers.ngram
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
@ -15,5 +15,4 @@ Submodules
   paddlespeech.s2t.decoders.scorers.ctc
   paddlespeech.s2t.decoders.scorers.ctc_prefix_score
   paddlespeech.s2t.decoders.scorers.length_bonus
   paddlespeech.s2t.decoders.scorers.ngram
   paddlespeech.s2t.decoders.scorers.scorer_interface
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.exps.deepspeech2.bin.deploy.client module
 ==========================================================
 .. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.exps.deepspeech2.bin.deploy.record module
 ==========================================================
 .. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
@ -12,8 +12,5 @@ Submodules
 .. toctree::
   :maxdepth: 4
   paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
   paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
   paddlespeech.s2t.exps.deepspeech2.bin.deploy.runtime
   paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
   paddlespeech.s2t.exps.deepspeech2.bin.deploy.server
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.exps.deepspeech2.bin.deploy.send module
 ========================================================
 .. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.u2.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2.rst
@ -21,4 +21,3 @@ Submodules
   :maxdepth: 4
   paddlespeech.s2t.exps.u2.model
   paddlespeech.s2t.exps.u2.trainer
--- a/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.exps.u2.trainer module
 =======================================
 .. automodule:: paddlespeech.s2t.exps.u2.trainer
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.exps.u2\_kaldi.bin.recog module
 ================================================
 .. automodule:: paddlespeech.s2t.exps.u2_kaldi.bin.recog
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
@ -12,6 +12,5 @@ Submodules
 .. toctree::
   :maxdepth: 4
   paddlespeech.s2t.exps.u2_kaldi.bin.recog
   paddlespeech.s2t.exps.u2_kaldi.bin.test
   paddlespeech.s2t.exps.u2_kaldi.bin.train
--- a/docs/source/api/paddlespeech.s2t.training.extensions.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.rst
@ -15,5 +15,3 @@ Submodules
   paddlespeech.s2t.training.extensions.evaluator
   paddlespeech.s2t.training.extensions.extension
   paddlespeech.s2t.training.extensions.plot
   paddlespeech.s2t.training.extensions.snapshot
   paddlespeech.s2t.training.extensions.visualizer
--- a/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.training.extensions.snapshot module
 ====================================================
 .. automodule:: paddlespeech.s2t.training.extensions.snapshot
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.training.extensions.visualizer module
 ======================================================
 .. automodule:: paddlespeech.s2t.training.extensions.visualizer
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.training.updaters.rst
+++ b/docs/source/api/paddlespeech.s2t.training.updaters.rst
@ -13,5 +13,4 @@ Submodules
   :maxdepth: 4
   paddlespeech.s2t.training.updaters.standard_updater
   paddlespeech.s2t.training.updaters.trainer
   paddlespeech.s2t.training.updaters.updater
--- a/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
+++ b/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.training.updaters.trainer module
 =================================================
 .. automodule:: paddlespeech.s2t.training.updaters.trainer
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.add\_deltas module
 =============================================
 .. automodule:: paddlespeech.s2t.transform.add_deltas
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.channel\_selector module
 ===================================================
 .. automodule:: paddlespeech.s2t.transform.channel_selector
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.cmvn module
 ======================================
 .. automodule:: paddlespeech.s2t.transform.cmvn
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.functional.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.functional.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.functional module
 ============================================
 .. automodule:: paddlespeech.s2t.transform.functional
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.perturb.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.perturb.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.perturb module
 =========================================
 .. automodule:: paddlespeech.s2t.transform.perturb
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.rst
@ -1,24 +0,0 @@
 paddlespeech.s2t.transform package
 ==================================
 .. automodule:: paddlespeech.s2t.transform
   :members:
   :undoc-members:
   :show-inheritance:
 Submodules
 ----------
 .. toctree::
   :maxdepth: 4
   paddlespeech.s2t.transform.add_deltas
   paddlespeech.s2t.transform.channel_selector
   paddlespeech.s2t.transform.cmvn
   paddlespeech.s2t.transform.functional
   paddlespeech.s2t.transform.perturb
   paddlespeech.s2t.transform.spec_augment
   paddlespeech.s2t.transform.spectrogram
   paddlespeech.s2t.transform.transform_interface
   paddlespeech.s2t.transform.transformation
   paddlespeech.s2t.transform.wpe
--- a/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.spec\_augment module
 ===============================================
 .. automodule:: paddlespeech.s2t.transform.spec_augment
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.spectrogram module
 =============================================
 .. automodule:: paddlespeech.s2t.transform.spectrogram
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.transform\_interface module
 ======================================================
 .. automodule:: paddlespeech.s2t.transform.transform_interface
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.transformation.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.transformation.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.transformation module
 ================================================
 .. automodule:: paddlespeech.s2t.transform.transformation
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.s2t.transform.wpe.rst
+++ b/docs/source/api/paddlespeech.s2t.transform.wpe.rst
@ -1,7 +0,0 @@
 paddlespeech.s2t.transform.wpe module
 =====================================
 .. automodule:: paddlespeech.s2t.transform.wpe
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
+++ b/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
@ -1,7 +0,0 @@
 paddlespeech.server.engine.acs.python.acs\_engine module
 ========================================================
 .. automodule:: paddlespeech.server.engine.acs.python.acs_engine
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.server.engine.acs.python.rst
+++ b/docs/source/api/paddlespeech.server.engine.acs.python.rst
@ -12,4 +12,3 @@ Submodules
 .. toctree::
   :maxdepth: 4
   paddlespeech.server.engine.acs.python.acs_engine
--- a/docs/source/api/paddlespeech.server.utils.log.rst
+++ b/docs/source/api/paddlespeech.server.utils.log.rst
@ -1,7 +0,0 @@
 paddlespeech.server.utils.log module
 ====================================
 .. automodule:: paddlespeech.server.utils.log
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.exps.rst
+++ b/docs/source/api/paddlespeech.t2s.exps.rst
@ -30,10 +30,10 @@ Submodules
   paddlespeech.t2s.exps.inference
   paddlespeech.t2s.exps.inference_streaming
   paddlespeech.t2s.models.vits.monotonic_align
   paddlespeech.t2s.exps.ort_predict
   paddlespeech.t2s.exps.ort_predict_e2e
   paddlespeech.t2s.exps.ort_predict_streaming
   paddlespeech.t2s.exps.stream_play_tts
   paddlespeech.t2s.exps.syn_utils
   paddlespeech.t2s.exps.synthesize
   paddlespeech.t2s.exps.synthesize_e2e
--- a/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
+++ b/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
@ -1,7 +0,0 @@
 paddlespeech.t2s.exps.stream\_play\_tts module
 ==============================================
 .. automodule:: paddlespeech.t2s.exps.stream_play_tts
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
+++ b/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
@ -1,7 +0,0 @@
 paddlespeech.t2s.models.ernie\_sat.mlm module
 =============================================
 .. automodule:: paddlespeech.t2s.models.ernie_sat.mlm
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
@ -1,7 +0,0 @@
 paddlespeech.t2s.models.vits.monotonic\_align.core module
 =========================================================
 .. automodule:: paddlespeech.t2s.models.vits.monotonic_align.core
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
@ -1,16 +0,0 @@
 paddlespeech.t2s.models.vits.monotonic\_align package
 =====================================================
 .. automodule:: paddlespeech.t2s.models.vits.monotonic_align
   :members:
   :undoc-members:
   :show-inheritance:
 Submodules
 ----------
 .. toctree::
   :maxdepth: 4
   paddlespeech.t2s.models.vits.monotonic_align.core
   paddlespeech.t2s.models.vits.monotonic_align.setup
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
@ -1,7 +0,0 @@
 paddlespeech.t2s.models.vits.monotonic\_align.setup module
 ==========================================================
 .. automodule:: paddlespeech.t2s.models.vits.monotonic_align.setup
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/api/paddlespeech.t2s.models.vits.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.rst
@ -12,7 +12,6 @@ Subpackages
 .. toctree::
   :maxdepth: 4
   paddlespeech.t2s.models.vits.monotonic_align
   paddlespeech.t2s.models.vits.wavenet
 Submodules
--- a/docs/source/tts/demo.rst
+++ b/docs/source/tts/demo.rst
--- a/docs/source/tts/demo_2.rst
+++ b/docs/source/tts/demo_2.rst
@ -19,7 +19,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>早上好，今天是2020/10/29，最低温度是-3°C。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/001.wav"
                        type="audio/wav">
@ -27,7 +27,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav"
                        type="audio/wav">
@ -38,7 +38,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>你好，我的编号是37249，很高兴为您服务。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/002.wav"
                        type="audio/wav">
@ -46,7 +46,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/002.wav"
                        type="audio/wav">
@ -57,7 +57,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我们公司有37249个人。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/003.wav"
                        type="audio/wav">
@ -65,7 +65,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/003.wav"
                        type="audio/wav">
@ -76,7 +76,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我出生于2005年10月8日。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/004.wav"
                        type="audio/wav">
@ -84,7 +84,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/004.wav"
                        type="audio/wav">
@ -95,7 +95,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我们习惯在12:30吃中午饭。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/005.wav"
                        type="audio/wav">
@ -103,7 +103,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/005.wav"
                        type="audio/wav">
@ -114,7 +114,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>只要有超过3/4的人投票同意，你就会成为我们的新班长。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/006.wav"
                        type="audio/wav">
@ -122,7 +122,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/006.wav"
                        type="audio/wav">
@ -133,7 +133,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我要买一只价值999.9元的手表。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/007.wav"
                        type="audio/wav">
@ -141,7 +141,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/007.wav"
                        type="audio/wav">
@ -152,7 +152,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>我的手机号是18544139121，欢迎来电。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/008.wav"
                        type="audio/wav">
@ -160,7 +160,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/008.wav"
                        type="audio/wav">
@ -171,7 +171,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>明天有62%的概率降雨。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/009.wav"
                        type="audio/wav">
@ -179,7 +179,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/009.wav"
                        type="audio/wav">
@ -190,7 +190,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>手表厂有五种好产品。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/010.wav"
                        type="audio/wav">
@ -198,7 +198,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/010.wav"
                        type="audio/wav">
@ -209,7 +209,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>跑马场有五百匹很勇敢的千里马。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/011.wav"
                        type="audio/wav">
@ -217,7 +217,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/011.wav"
                        type="audio/wav">
@ -228,7 +228,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>有一天，我看到了一栋楼，我顿感不妙，因为我看不清里面有没有人。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/012.wav"
                        type="audio/wav">
@ -236,7 +236,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/012.wav"
                        type="audio/wav">
@ -247,7 +247,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>史小姐拿着小雨伞去找她的老保姆了。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/013.wav"
                        type="audio/wav">
@ -255,7 +255,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/013.wav"
                        type="audio/wav">
@ -266,7 +266,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
        <tr>
            <td>不要相信这个老奶奶说的话，她一点儿也不好。</td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/espent/014.wav"
                        type="audio/wav">
@ -274,7 +274,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
                </audio>
            </td>
            <td>
-                <audio controls="controls">
+                <audio controls="controls" style="width: 220px;">
                    <source
                        src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/014.wav"
                        type="audio/wav">
--- a/examples/aishell/asr0/local/train.sh
+++ b/examples/aishell/asr0/local/train.sh
@ -26,6 +26,10 @@ if [ ${seed} != 0 ]; then
    export FLAGS_cudnn_deterministic=True
 fi
 # the default memory allocator strategy may cause GPU training to hang,
 # because no OOM error is raised when memory is exhausted
 export FLAGS_allocator_strategy=naive_best_fit
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \
--- a/examples/aishell/asr1/local/train.sh
+++ b/examples/aishell/asr1/local/train.sh
@ -35,6 +35,10 @@ echo ${ips_config}
 mkdir -p exp
 # the default memory allocator strategy may cause GPU training to hang,
 # because no OOM error is raised when memory is exhausted
 export FLAGS_allocator_strategy=naive_best_fit
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \
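
Reviewer note: both `train.sh` scripts now export `FLAGS_allocator_strategy=naive_best_fit` unconditionally. As a minimal sketch (not part of this commit), the same default could be applied from a Python wrapper while still letting a caller override it. The helper name `training_env` is hypothetical, and `auto_growth` is used only as an example of an alternative value.

```python
import os


def training_env(base=None, strategy="naive_best_fit"):
    """Return a copy of the environment with the allocator flag defaulted.

    Hypothetical helper: `base` defaults to the current process environment.
    A value already present for FLAGS_allocator_strategy is left unchanged.
    """
    env = dict(os.environ if base is None else base)
    # setdefault keeps a caller-provided strategy and only fills in a missing one
    env.setdefault("FLAGS_allocator_strategy", strategy)
    return env


if __name__ == "__main__":
    env = training_env({})
    print(env["FLAGS_allocator_strategy"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the training script.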
--- a/examples/aishell3/ernie_sat/README.md
+++ b/examples/aishell3/ernie_sat/README.md
@ -1,4 +1,4 @@
-# ERNIE-SAT with VCTK dataset
+# ERNIE-SAT with AISHELL-3 dataset
 ERNIE-SAT is a speech-text joint pretraining framework that achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks. It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning.
 ## Model Framework
--- a/examples/aishell3/vc0/local/synthesize.sh
+++ b/examples/aishell3/vc0/local/synthesize.sh
@ -4,8 +4,6 @@ config_path=$1
 train_output_path=$2
 ckpt_name=$3
 FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize.py \
    --am=tacotron2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc0/local/voice_cloning.sh
+++ b/examples/aishell3/vc0/local/voice_cloning.sh
@ -6,8 +6,6 @@ ckpt_name=$3
 ge2e_params_path=$4
 ref_audio_dir=$5
 FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../voice_cloning.py \
    --am=tacotron2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc1/local/synthesize.sh
+++ b/examples/aishell3/vc1/local/synthesize.sh
@ -4,8 +4,6 @@ config_path=$1
 train_output_path=$2
 ckpt_name=$3
 FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc1/local/voice_cloning.sh
+++ b/examples/aishell3/vc1/local/voice_cloning.sh
@ -6,8 +6,6 @@ ckpt_name=$3
 ge2e_params_path=$4
 ref_audio_dir=$5
 FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../voice_cloning.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc2/local/synthesize.sh
+++ b/examples/aishell3/vc2/local/synthesize.sh
@ -4,8 +4,6 @@ config_path=$1
 train_output_path=$2
 ckpt_name=$3
 FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3/vc2/local/voice_cloning.sh
+++ b/examples/aishell3/vc2/local/voice_cloning.sh
@ -5,8 +5,6 @@ train_output_path=$2
 ckpt_name=$3
 ref_audio_dir=$4
 FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../voice_cloning.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
--- a/examples/aishell3_vctk/ernie_sat/README.md
+++ b/examples/aishell3_vctk/ernie_sat/README.md
@ -1,4 +1,4 @@
-# ERNIE-SAT with VCTK dataset
+# ERNIE-SAT with AISHELL-3 and VCTK datasets
 ERNIE-SAT is a speech-text joint pretraining framework that achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks. It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning.
 ## Model Framework
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-base.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-base.yaml
@ -0,0 +1,44 @@
 ###########################################################
 #                       DATA SETTING                      #
 ###########################################################
 dataset_type: Ernie
 train_path: data/iwslt2012_zh/train.txt
 dev_path: data/iwslt2012_zh/dev.txt
 test_path: data/iwslt2012_zh/test.txt
 batch_size: 64
 num_workers: 2
 data_params: 
    pretrained_token: ernie-3.0-base-zh
    punc_path: data/iwslt2012_zh/punc_vocab
    seq_len: 100
 ###########################################################
 #                       MODEL SETTING                     #
 ###########################################################
 model_type: ErnieLinear
 model:
    pretrained_token: ernie-3.0-base-zh
    num_classes: 4
 ###########################################################
 #                     OPTIMIZER SETTING                   #
 ###########################################################
 optimizer_params:
    weight_decay: 1.0e-6               # weight decay coefficient.
 scheduler_params:
    learning_rate: 1.0e-5               # learning rate.
    gamma: 0.9999                          # scheduler gamma must be between 0.0 and 1.0; values closer to 1.0 are better.
 ###########################################################
 #                     TRAINING SETTING                    #
 ###########################################################
 max_epoch: 20
 num_snapshots: 5
 ###########################################################
 #                     OTHER SETTING                       #
 ###########################################################
 num_snapshots: 10                 # max number of snapshots to keep while training
 seed: 42                          # random seed for paddle, random, and np.random
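The `scheduler_params` in this config pair a starting `learning_rate` of 1.0e-5 with a `gamma` of 0.9999. The sketch below shows what those two numbers imply, assuming the scheduler is a plain exponential decay that multiplies the learning rate by `gamma` once per scheduler step. The diff does not say which scheduler class is used or how often it steps, so both are assumptions, and the helper names are hypothetical.

```python
import math


def decayed_lr(base_lr: float, gamma: float, steps: int) -> float:
    """Learning rate after `steps` decay steps, assuming lr_n = base_lr * gamma ** n."""
    return base_lr * gamma ** steps


def steps_to_fraction(gamma: float, fraction: float) -> int:
    """Smallest whole number of steps after which the lr is at most `fraction` of its start."""
    return math.ceil(math.log(fraction) / math.log(gamma))


# Values taken from scheduler_params in the config above.
base_lr = 1.0e-5
gamma = 0.9999

for steps in (0, 1_000, 10_000):
    print(steps, decayed_lr(base_lr, gamma, steps))

# Number of steps needed for the learning rate to halve under this assumption.
print(steps_to_fraction(gamma, 0.5))
```

Under this assumption, a gamma this close to 1.0 changes the learning rate very slowly, so the effective schedule depends heavily on whether a step means one batch or one epoch.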
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-medium.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-medium.yaml
@ -0,0 +1,44 @@
 ###########################################################
 #                       DATA SETTING                      #
 ###########################################################
 dataset_type: Ernie
 train_path: data/iwslt2012_zh/train.txt
 dev_path: data/iwslt2012_zh/dev.txt
 test_path: data/iwslt2012_zh/test.txt
 batch_size: 64
 num_workers: 2
 data_params: 
    pretrained_token: ernie-3.0-medium-zh
    punc_path: data/iwslt2012_zh/punc_vocab
    seq_len: 100
 ###########################################################
 #                       MODEL SETTING                     #
 ###########################################################
 model_type: ErnieLinear
 model:
    pretrained_token: ernie-3.0-medium-zh
    num_classes: 4
 ###########################################################
 #                     OPTIMIZER SETTING                   #
 ###########################################################
 optimizer_params:
    weight_decay: 1.0e-6               # weight decay coefficient.
 scheduler_params:
    learning_rate: 1.0e-5               # learning rate.
    gamma: 0.9999                          # scheduler gamma must be between 0.0 and 1.0; values closer to 1.0 are better.
 ###########################################################
 #                     TRAINING SETTING                    #
 ###########################################################
 max_epoch: 20
 num_snapshots: 5
 ###########################################################
 #                     OTHER SETTING                       #
 ###########################################################
 num_snapshots: 10                 # max number of snapshots to keep while training
 seed: 42                          # random seed for paddle, random, and np.random
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-mini.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-mini.yaml
@ -0,0 +1,44 @@
 ###########################################################
 #                       DATA SETTING                      #
 ###########################################################
 dataset_type: Ernie
 train_path: data/iwslt2012_zh/train.txt
 dev_path: data/iwslt2012_zh/dev.txt
 test_path: data/iwslt2012_zh/test.txt
 batch_size: 64
 num_workers: 2
 data_params: 
    pretrained_token: ernie-3.0-mini-zh
    punc_path: data/iwslt2012_zh/punc_vocab
    seq_len: 100
 ###########################################################
 #                       MODEL SETTING                     #
 ###########################################################
 model_type: ErnieLinear
 model:
    pretrained_token: ernie-3.0-mini-zh
    num_classes: 4
 ###########################################################
 #                     OPTIMIZER SETTING                   #
 ###########################################################
 optimizer_params:
    weight_decay: 1.0e-6               # weight decay coefficient.
 scheduler_params:
    learning_rate: 1.0e-5               # learning rate.
    gamma: 0.9999                          # scheduler gamma must be between 0.0 and 1.0; values closer to 1.0 are better.
 ###########################################################
 #                     TRAINING SETTING                    #
 ###########################################################
 max_epoch: 20
 num_snapshots: 5
 ###########################################################
 #                     OTHER SETTING                       #
 ###########################################################
 num_snapshots: 10                 # max number of snapshots to keep while training
 seed: 42                          # random seed for paddle, random, and np.random
--- a/examples/iwslt2012/punc0/conf/ernie-3.0-nano-zh.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-3.0-nano-zh.yaml
@ -0,0 +1,44 @@
 ###########################################################
 #                       DATA SETTING                      #
 ###########################################################
 dataset_type: Ernie
 train_path: data/iwslt2012_zh/train.txt
 dev_path: data/iwslt2012_zh/dev.txt
 test_path: data/iwslt2012_zh/test.txt
 batch_size: 64
 num_workers: 2
 data_params: 
    pretrained_token: ernie-3.0-nano-zh
    punc_path: data/iwslt2012_zh/punc_vocab
    seq_len: 100
 ###########################################################
 #                       MODEL SETTING                     #
 ###########################################################
 model_type: ErnieLinear
 model:
    pretrained_token: ernie-3.0-nano-zh
    num_classes: 4
 ###########################################################
 #                     OPTIMIZER SETTING                   #
 ###########################################################
 optimizer_params:
    weight_decay: 1.0e-6               # weight decay coefficient.
 scheduler_params:
    learning_rate: 1.0e-5               # learning rate.
    gamma: 0.9999                          # scheduler gamma must be between 0.0 and 1.0; values closer to 1.0 are better.
 ###########################################################
 #                     TRAINING SETTING                    #
 ###########################################################
 max_epoch: 20
 num_snapshots: 5
 ###########################################################
 #                     OTHER SETTING                       #
 ###########################################################
 num_snapshots: 10                 # max number of snapshots to keep while training
 seed: 42                          # random seed for paddle, random, and np.random
--- a/examples/iwslt2012/punc0/conf/ernie-tiny.yaml
+++ b/examples/iwslt2012/punc0/conf/ernie-tiny.yaml
@ -0,0 +1,44 @@
 ###########################################################
 #                       DATA SETTING                      #
 ###########################################################
 dataset_type: Ernie
 train_path: data/iwslt2012_zh/train.txt
 dev_path: data/iwslt2012_zh/dev.txt
 test_path: data/iwslt2012_zh/test.txt
 batch_size: 64
 num_workers: 2
 data_params: 
    pretrained_token: ernie-tiny
    punc_path: data/iwslt2012_zh/punc_vocab
    seq_len: 100
 ###########################################################
 #                       MODEL SETTING                     #
 ###########################################################
 model_type: ErnieLinear
 model:
    pretrained_token: ernie-tiny
    num_classes: 4
 ###########################################################
 #                     OPTIMIZER SETTING                   #
 ###########################################################
 optimizer_params:
    weight_decay: 1.0e-6               # weight decay coefficient.
 scheduler_params:
    learning_rate: 1.0e-5               # learning rate.
    gamma: 0.9999                          # scheduler gamma must be between 0.0 and 1.0; values closer to 1.0 are better.
 ###########################################################
 #                     TRAINING SETTING                    #
 ###########################################################
 max_epoch: 20
 num_snapshots: 5
 ###########################################################
 #                     OTHER SETTING                       #
 ###########################################################
 num_snapshots: 10                 # max number of snapshots to keep while training
 seed: 42                          # random seed for paddle, random, and np.random
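Each of the ernie configs above sets `num_snapshots` twice at the top level: 5 under TRAINING SETTING and 10 under OTHER SETTING. Which value takes effect depends on the YAML loader, and the diff does not say which one was intended. The sketch below is a minimal standard-library check for repeated top-level keys. It is a rough line scan, not a YAML parser, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches an unindented "key:" at the start of a line; comments and nested keys are skipped.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:")


def duplicate_top_level_keys(yaml_text: str) -> list:
    """Return the sorted top-level keys that appear more than once in `yaml_text`."""
    counts = Counter()
    for line in yaml_text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Excerpt mirroring the TRAINING and OTHER sections of the configs above.
sample = """\
max_epoch: 20
num_snapshots: 5
# OTHER SETTING
num_snapshots: 10
seed: 42
"""

print(duplicate_top_level_keys(sample))
```

Running a check like this over `conf/*.yaml` before training would flag the repeated key so that one of the two values can be removed deliberately.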
--- a/examples/librispeech/asr0/local/train.sh
+++ b/examples/librispeech/asr0/local/train.sh
@ -26,6 +26,10 @@ if [ ${seed} != 0 ]; then
    export FLAGS_cudnn_deterministic=True
 fi
 # default memory allocator strategy may cause gpu training to hang
 # because no OOM is raised when memory is exhausted
 export FLAGS_allocator_strategy=naive_best_fit
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \
--- a/examples/librispeech/asr1/local/train.sh
+++ b/examples/librispeech/asr1/local/train.sh
@ -29,6 +29,10 @@ fi
 # export FLAGS_cudnn_exhaustive_search=true
 # export FLAGS_conv_workspace_size_limit=4000
 # default memory allocator strategy may cause gpu training to hang
 # because no OOM is raised when memory is exhausted
 export FLAGS_allocator_strategy=naive_best_fit
 if [ ${ngpu} == 0 ]; then
 python3 -u ${BIN_DIR}/train.py \
 --ngpu ${ngpu} \