diff --git a/README.md b/README.md
index 59c61f776..72db64b7d 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,6 @@
Quick Start
- | Quick Start Server
- | Quick Start Streaming Server
| Documents
| Models List
| AIStudio Courses
@@ -159,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
### Recent Update
+- 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and ERNIE-SAT to the [PaddleSpeech Web Demo](./demos/speech_web).
+- ⚡ 2022.09.09: Add AISHELL-3 Voice Cloning [example](./examples/aishell3/vc2) with ECAPA-TDNN speaker encoder.
- ⚡ 2022.08.25: Release TTS [finetune](./examples/other/tts_finetune/tts3) example.
- 🔥 2022.08.22: Add ERNIE-SAT models: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat).
- 🔥 2022.08.15: Add [g2pW](https://github.com/GitYCC/g2pW) into TTS Chinese Text Frontend.
@@ -705,7 +705,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Speaker Verification
- VoxCeleb12
+ VoxCeleb1/2
ECAPA-TDNN
ecapa-tdnn-voxceleb12
@@ -714,6 +714,31 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
+
+
+**Speaker Diarization**
+
+
+
+
+ Task
+ Dataset
+ Model Type
+ Example
+
+
+
+
+ Speaker Diarization
+ AMI
+ ECAPA-TDNN + AHC / SC
+
+ ecapa-tdnn-ami
+
+
+
+
+
**Punctuation Restoration**
@@ -767,6 +792,7 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
- [Text-to-Speech](#TextToSpeech)
- [Audio Classification](#AudioClassification)
- [Speaker Verification](#SpeakerVerification)
+ - [Speaker Diarization](#SpeakerDiarization)
- [Punctuation Restoration](#PunctuationRestoration)
- [Community](#Community)
- [Welcome to contribute](#contribution)
diff --git a/README_cn.md b/README_cn.md
index 070a656a2..725f7eda1 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -19,10 +19,8 @@
### 近期更新
+- 🔥 2022.09.26: 在 [PaddleSpeech 网页应用](./demos/speech_web) 中新增 Voice Cloning、TTS finetune 和 ERNIE-SAT。
+- ⚡ 2022.09.09: 新增基于 ECAPA-TDNN 声纹模型的 AISHELL-3 Voice Cloning [示例](./examples/aishell3/vc2)。
- ⚡ 2022.08.25: 发布 TTS [finetune](./examples/other/tts_finetune/tts3) 示例。
- 🔥 2022.08.22: 新增 ERNIE-SAT 模型: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat)。
- 🔥 2022.08.15: 将 [g2pW](https://github.com/GitYCC/g2pW) 引入 TTS 中文文本前端。
@@ -717,8 +717,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- Speaker Verification
- VoxCeleb12
+ 声纹识别
+ VoxCeleb1/2
ECAPA-TDNN
ecapa-tdnn-voxceleb12
@@ -727,6 +727,31 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
+
+
+**说话人日志**
+
+
+
+
+ 任务
+ 数据集
+ 模型类型
+ 脚本
+
+
+
+
+ 说话人日志
+ AMI
+ ECAPA-TDNN + AHC / SC
+
+ ecapa-tdnn-ami
+
+
+
+
+
**标点恢复**
@@ -786,6 +811,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- [语音合成](#语音合成模型)
- [声音分类](#声音分类模型)
- [声纹识别](#声纹识别模型)
+ - [说话人日志](#说话人日志模型)
- [标点恢复](#标点恢复模型)
- [技术交流群](#技术交流群)
- [欢迎贡献](#欢迎贡献)
diff --git a/demos/speech_web/.gitignore b/demos/speech_web/.gitignore
index 54418e605..1e961a385 100644
--- a/demos/speech_web/.gitignore
+++ b/demos/speech_web/.gitignore
@@ -13,4 +13,7 @@
*.pdmodel
*/source/*
*/PaddleSpeech/*
+*/tmp*/*
+*/duration.txt
+*/oov_info.txt
diff --git a/demos/speech_web/README.md b/demos/speech_web/README.md
index 3b2da6e9a..89d22382a 100644
--- a/demos/speech_web/README.md
+++ b/demos/speech_web/README.md
@@ -1,55 +1,82 @@
# Paddle Speech Demo
-PaddleSpeechDemo 是一个以 PaddleSpeech 的语音交互功能为主体开发的 Demo 展示项目,用于帮助大家更好的上手 PaddleSpeech 以及使用 PaddleSpeech 构建自己的应用。
+## 简介
+Paddle Speech Demo 是一个以 PaddleSpeech 的语音交互功能为主体开发的 Demo 展示项目,用于帮助大家更好地上手 PaddleSpeech,以及使用 PaddleSpeech 构建自己的应用。
-智能语音交互部分使用 PaddleSpeech,对话以及信息抽取部分使用 PaddleNLP,网页前端展示部分基于 Vue3 进行开发
+智能语音交互部分使用 PaddleSpeech,对话以及信息抽取部分使用 PaddleNLP,网页前端展示部分基于 Vue3 进行开发。
主要功能:
+`main.py` 中包含的功能:
+ 语音聊天:PaddleSpeech 的语音识别能力+语音合成能力,对话部分基于 PaddleNLP 的闲聊功能
+ 声纹识别:PaddleSpeech 的声纹识别功能展示
+ 语音识别:支持【实时语音识别】,【端到端识别】,【音频文件识别】三种模式
+ 语音合成:支持【流式合成】与【端到端合成】两种方式
+ 语音指令:基于 PaddleSpeech 的语音识别能力与 PaddleNLP 的信息抽取,实现交通费的智能报销
+`vc.py` 中包含的功能:
++ 一句话合成:基于 GE2E 和 ECAPA-TDNN 模型的一句话合成方案,可以模仿输入音频的音色进行语音合成
+ + GE2E 音色克隆方案可以参考: [【FastSpeech2 + AISHELL-3 Voice Cloning】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc1)
+ + ECAPA-TDNN 音色克隆方案可以参考: [【FastSpeech2 + AISHELL-3 Voice Cloning (ECAPA-TDNN)】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc2)
+
++ 小数据微调:基于小数据集的微调方案,内置用 12 句标贝中文女声语料微调的示例;你也可以通过一键重置录制自己的声音(建议在安静环境下录制,效果会更好)。你可以在 [【Finetune your own AM based on FastSpeech2 with AISHELL-3】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/tts_finetune/tts3) 中尝试使用自己的数据集进行微调。
+
++ ERNIE-SAT:语言-语音跨模态大模型 ERNIE-SAT 可视化展示示例,支持个性化合成、跨语言语音合成(例如音频为中文时输入英文文本进行合成)、语音编辑(修改音频对应文本中的部分内容)等功能。ERNIE-SAT 的更多实现细节,可以参考:
+ + [【ERNIE-SAT with AISHELL-3 dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/ernie_sat)
+ + [【ERNIE-SAT with AISHELL-3 and VCTK datasets】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat)
+ + [【ERNIE-SAT with VCTK dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/ernie_sat)
+
运行效果:
- 
+ 
-## 安装
-### 后端环境安装
-```
-# 安装环境
-cd speech_server
-pip install -r requirements.txt
+## 基础环境安装
-# 下载 ie 模型,针对地点进行微调,效果更好,不下载的话会使用其它版本,效果没有这个好
-cd source
-mkdir model
-cd model
-wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
+### 后端环境安装
+```bash
+# 需要先安装 PaddleSpeech
+cd speech_server
+pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
+cd ../
```
### 前端环境安装
-
前端依赖 `node.js` ,需要提前安装,确保 `npm` 可用,`npm` 测试版本 `8.3.1`,建议下载[官网](https://nodejs.org/en/)稳定版的 `node.js`
-```
+如果因为网络问题无法下载依赖库,可以参考下方 FAQ 部分:【npm / yarn 配置淘宝镜像源】
+
+```bash
# 进入前端目录
cd web_client
-
# 安装 `yarn`,已经安装可跳过
npm install -g yarn
-
# 使用yarn安装前端依赖
yarn install
+cd ../
```
+
## 启动服务
+【注意】目前 `main.py` 和 `vc.py` 两个后端服务只能选择其一开启。
+
+### 启动 `main.py` 后端服务
+
+#### 下载相关模型
+
+只需手动下载语音指令所需模型即可,其他模型会自动下载。
-### 开启后端服务
+```bash
+cd speech_server
+mkdir -p source/model
+cd source/model
+# 下载IE模型
+wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
+cd ../../../
+
+```
+#### 启动后端服务
```
cd speech_server
@@ -57,14 +84,116 @@ cd speech_server
python main.py --port 8010
```
-### 开启前端服务
+
+### 启动 `vc.py` 后端服务
+
+参照下面的步骤自行配置项目所需环境。
+
+Aistudio 在线体验小样本合成后端功能:[【PaddleSpeech进阶】PaddleSpeech小样本合成方案体验](https://aistudio.baidu.com/aistudio/projectdetail/4573549?sUid=2470186&shared=1&ts=1664174385948)
+
+#### 下载相关模型和音频
+
+```bash
+cd speech_server
+
+# 已创建则跳过
+mkdir -p source/model
+cd source
+# 下载 & 解压 wav (包含VC测试音频)
+wget https://paddlespeech.bj.bcebos.com/demos/speech_web/wav_vc.zip
+unzip wav_vc.zip
+
+cd model
+# 下载 GE2E 相关模型
+wget https://bj.bcebos.com/paddlespeech/Parakeet/released_models/ge2e/ge2e_ckpt_0.3.zip
+unzip ge2e_ckpt_0.3.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip
+unzip pwg_aishell3_ckpt_0.5.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
+unzip fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
+
+# 下载 ECAPA-TDNN 相关模型
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
+unzip fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
+
+# 下载 ERNIE-SAT 相关模型
+# aishell3 ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_ckpt_1.2.0.zip
+unzip erniesat_aishell3_ckpt_1.2.0.zip
+
+# vctk ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_vctk_ckpt_1.2.0.zip
+unzip erniesat_vctk_ckpt_1.2.0.zip
+
+# aishell3_vctk ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_vctk_ckpt_1.2.0.zip
+unzip erniesat_aishell3_vctk_ckpt_1.2.0.zip
+
+# 下载 finetune 相关模型
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip
+unzip fastspeech2_aishell3_ckpt_1.1.0.zip
+
+# 下载声码器
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
+unzip hifigan_aishell3_ckpt_0.2.0.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip
+unzip hifigan_vctk_ckpt_0.2.0.zip
+
+cd ../../../
+```
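模型较多,下载解压完成后可以用类似下面的假设性脚本快速自查目录是否齐全(目录名取自上面的下载命令,假设在 `demos/speech_web` 目录下运行,仅作示意):

```python
# 假设性示例:检查 vc.py 所需的模型目录是否已解压到位
import os

expected = [
    "ge2e_ckpt_0.3", "pwg_aishell3_ckpt_0.5",
    "fastspeech2_nosil_aishell3_vc1_ckpt_0.5",
    "fastspeech2_aishell3_ckpt_vc2_1.2.0",
    "erniesat_aishell3_ckpt_1.2.0", "erniesat_vctk_ckpt_1.2.0",
    "erniesat_aishell3_vctk_ckpt_1.2.0",
    "fastspeech2_aishell3_ckpt_1.1.0",
    "hifigan_aishell3_ckpt_0.2.0", "hifigan_vctk_ckpt_0.2.0",
]
for name in expected:
    path = os.path.join("speech_server/source/model", name)
    print(("OK   " if os.path.isdir(path) else "缺失 ") + path)
```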
+
+#### ERNIE-SAT 环境配置
+
+ERNIE-SAT 体验依赖 [examples/aishell3_vctk/ernie_sat](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat) 的环境。参考 `examples/aishell3_vctk/ernie_sat` 下的 `README.md`,确保其中 `run.sh` 的相关示例代码可以正常运行。
+
+运行好 `examples/aishell3_vctk/ernie_sat` 后,回到当前目录,创建软链接:
+```bash
+cd speech_server
+ln -snf ../../../examples/aishell3_vctk/ernie_sat/download .
+ln -snf ../../../examples/aishell3_vctk/ernie_sat/tools .
+cd ../
+```
+
+#### finetune 环境配置
+
+finetune 前需要解压 `tools/aligner` 中的 `aishell3_model.zip`,微调过程会用到其中的 `tools/aligner/aishell3_model/meta.yaml` 文件。
+
+```bash
+cd speech_server/tools/aligner
+unzip aishell3_model.zip
+cd -
+```
+
+#### 启动后端服务
+
+```bash
+cd speech_server
+# 默认8010端口
+python vc.py --port 8010
+```
+
+### 启动前端服务
```
cd web_client
yarn dev --port 8011
```
-默认配置下,前端中配置的后台地址信息是 localhost,确保后端服务器和打开页面的游览器在同一台机器上,不在一台机器的配置方式见下方的 FAQ:【后端如果部署在其它机器或者别的端口如何修改】
+默认配置下,前端配置的后台地址是 `localhost`,需确保后端服务和打开页面的浏览器在同一台机器上;不在同一台机器时的配置方式见下方 FAQ:【后端如果部署在其它机器或者别的端口如何修改】。
+
+#### 关于前端的一些说明
+
+为了方便后期维护,这里没有提供打包好的 HTML 文件,而是直接提供 Vue3 项目,通过 `yarn dev --port 8011` 的方式启动测试,方便大家 debug,相当于启动了一个前端服务器。
+
+比如在本机启动这个前端服务(运行 `yarn dev --port 8011`)后,就可以在浏览器中通过 `http://localhost:8011` 访问前端页面。
+
+如果在其它服务器上(例如 `*.*.*.*`)启动这个前端服务(运行 `yarn dev --port 8011`),就可以在浏览器中通过 `http://*.*.*.*:8011` 访问前端页面。
+
+那前端跟后端是什么关系呢?两者是相互独立的,只要前端能够通过代理访问到后端的接口就没有问题。你可以在 A 机器上部署后端服务,在 B 机器上部署前端服务。我们在 `./web_client/vite.config.js` 中将 `/api` 映射到 `http://localhost:8010`,你可以把它配置成任意你想要访问的后端地址。
+
+当以 `*.*.*.*` 这类 IP 地址的形式访问前端页面时,由于浏览器的安全限制,会禁止录音,需要重新配置浏览器的安全策略,可以参考下面的 FAQ 部分:【前端以 IP 地址的形式访问,无法录音】
+
+
## FAQ
#### Q: 如何安装node.js
@@ -75,7 +204,7 @@ A: node.js的安装可以参考[【菜鸟教程】](https://www.runoob.com/nod
A:后端的配置地址分散在两个文件中
-修改第一个文件 `PaddleSpeechWebClient/vite.config.js`
+修改第一个文件 `./web_client/vite.config.js`
```
server: {
@@ -90,7 +219,7 @@ server: {
}
```
-修改第二个文件 `PaddleSpeechWebClient/src/api/API.js`( Websocket 代理配置失败,所以需要在这个文件中修改)
+修改第二个文件 `./web_client/src/api/API.js`(Websocket 无法通过代理配置,所以需要在这个文件中直接修改)
```
// websocket (这里改成后端所在的接口)
@@ -99,12 +228,24 @@ ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // Stream ASR 接
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS 接口
```
-#### Q:后端以IP地址的形式,前端无法录音
+#### Q:前端以 IP 地址的形式访问,无法录音
A:这里主要是浏览器安全策略的限制,需要配置浏览器后重启。浏览器修改配置可参考[使用js-audio-recorder报浏览器不支持getUserMedia](https://blog.csdn.net/YRY_LIKE_YOU/article/details/113745273)
Chrome 设置地址:chrome://flags/#unsafely-treat-insecure-origin-as-secure
+#### Q: npm / yarn 配置淘宝镜像源
+
+A: 配置淘宝镜像源,详细可以参考 [【yarn npm 设置淘宝镜像】](https://www.jianshu.com/p/f6f43e8f9d6b)
+
+```bash
+# npm 配置淘宝镜像源
+npm config set registry https://registry.npmmirror.com
+
+# yarn 配置淘宝镜像源
+yarn config set registry https://registry.npmmirror.com
+```
+
## 参考资料
vue实现录音参考资料:https://blog.csdn.net/qq_41619796/article/details/107865602#t1
diff --git a/demos/speech_web/docs/效果展示.png b/demos/speech_web/docs/效果展示.png
deleted file mode 100644
index 5f7997c17..000000000
Binary files a/demos/speech_web/docs/效果展示.png and /dev/null differ
diff --git a/examples/other/tts_finetune/tts3/finetune.yaml b/demos/speech_web/speech_server/conf/tts3_finetune.yaml
similarity index 86%
rename from examples/other/tts_finetune/tts3/finetune.yaml
rename to demos/speech_web/speech_server/conf/tts3_finetune.yaml
index 374a69f3d..4f708bd71 100644
--- a/examples/other/tts_finetune/tts3/finetune.yaml
+++ b/demos/speech_web/speech_server/conf/tts3_finetune.yaml
@@ -3,10 +3,10 @@
###########################################################
# Set to -1 to indicate that the parameter is the same as the pretrained model configuration
-batch_size: -1
+batch_size: 10
learning_rate: 0.0001 # learning rate
num_snapshots: -1
# frozen_layers should be a list
# if you don't need to freeze, set frozen_layers to []
-frozen_layers: ["encoder", "duration_predictor"]
+frozen_layers: ["encoder"]
diff --git a/demos/speech_web/speech_server/main.py b/demos/speech_web/speech_server/main.py
index d4750d598..03e7e5996 100644
--- a/demos/speech_web/speech_server/main.py
+++ b/demos/speech_web/speech_server/main.py
@@ -1,8 +1,3 @@
-# todo:
-# 1. 开启服务
-# 2. 接收录音音频,返回识别结果
-# 3. 接收ASR识别结果,返回NLP对话结果
-# 4. 接收NLP对话结果,返回TTS音频
import argparse
import base64
import datetime
@@ -32,6 +27,7 @@ from starlette.requests import Request
from starlette.responses import FileResponse
from starlette.websockets import WebSocketState as WebSocketState
+from paddlespeech.cli.tts.infer import TTSExecutor
from paddlespeech.server.engine.asr.online.python.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.utils.audio_process import float2pcm
@@ -55,7 +51,7 @@ asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
asr_init_path = "source/demo/demo.wav"
db_path = "source/db/vpr.sqlite"
ie_model_path = "source/model"
-
+tts_model = TTSExecutor()
# 路径配置
UPLOAD_PATH = "source/vpr"
WAV_PATH = "source/wav"
@@ -72,6 +68,14 @@ manager = ConnectionManager()
aumanager = AudioMannger(chatbot)
aumanager.init()
vpr = VPR(db_path, dim=192, top_k=5)
+# 初始化 TTS 模型,首次运行会自动下载模型文件
+tts_model(
+ text="今天天气准不错",
+ output="test.wav",
+ am='fastspeech2_mix',
+ spk_id=174,
+ voc='hifigan_csmsc',
+ lang='mix', )
# 服务配置
@@ -331,6 +335,7 @@ async def ieOffline(nlp_base: NlpBase):
#####################################################################
+# 端到端合成
@app.post("/tts/offline")
async def text2speechOffline(tts_base: TtsBase):
text = tts_base.text
@@ -340,8 +345,14 @@ async def text2speechOffline(tts_base: TtsBase):
now_name = "tts_" + datetime.datetime.strftime(
datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
out_file_path = os.path.join(WAV_PATH, now_name)
- # 保存为文件,再转成base64传输
- chatbot.text2speech(text, outpath=out_file_path)
+ # 使用中英混合 CLI 模型合成,保存为文件后再转成 base64 传输
+ tts_model(
+ text=text,
+ output=out_file_path,
+ am='fastspeech2_mix',
+ spk_id=174,
+ voc='hifigan_csmsc',
+ lang='mix')
with open(out_file_path, "rb") as f:
data_bin = f.read()
base_str = base64.b64encode(data_bin)
diff --git a/demos/speech_web/speech_server/requirements.txt b/demos/speech_web/speech_server/requirements.txt
index 607f0d4d0..cdc654656 100644
--- a/demos/speech_web/speech_server/requirements.txt
+++ b/demos/speech_web/speech_server/requirements.txt
@@ -1,13 +1,8 @@
aiofiles
faiss-cpu
-fastapi
-librosa
-numpy
-paddlenlp
-paddlepaddle
-paddlespeech
+praatio==5.0.0
pydantic
-python-multipartscikit_learn
-SoundFile
+python-multipart
+scikit_learn
starlette
uvicorn
diff --git a/demos/speech_web/speech_server/src/ernie_sat.py b/demos/speech_web/speech_server/src/ernie_sat.py
new file mode 100644
index 000000000..02e1ed9d9
--- /dev/null
+++ b/demos/speech_web/speech_server/src/ernie_sat.py
@@ -0,0 +1,198 @@
+import os
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class SAT:
+ def __init__(self):
+ # pretrain model path
+ self.zh_pretrain_model_path = os.path.realpath(
+ "source/model/erniesat_aishell3_ckpt_1.2.0")
+ self.en_pretrain_model_path = os.path.realpath(
+ "source/model/erniesat_vctk_ckpt_1.2.0")
+ self.cross_pretrain_model_path = os.path.realpath(
+ "source/model/erniesat_aishell3_vctk_ckpt_1.2.0")
+
+ self.zh_voc_model_path = os.path.realpath(
+ "source/model/hifigan_aishell3_ckpt_0.2.0")
+ self.en_voc_model_path = os.path.realpath(
+ "source/model/hifigan_vctk_ckpt_0.2.0")
+ self.cross_voc_model_path = os.path.realpath(
+ "source/model/hifigan_aishell3_ckpt_0.2.0")
+
+ self.BIN_DIR = os.path.join(MAIN_ROOT,
+ "paddlespeech/t2s/exps/ernie_sat")
+
+ def zh_synthesize_edit(self,
+ old_str: str,
+ new_str: str,
+ input_name: os.PathLike,
+ output_name: os.PathLike,
+ task_name: str="synthesize",
+ erniesat_ckpt_name: str="snapshot_iter_289500.pdz"):
+
+ if task_name not in ['synthesize', 'edit']:
+ print("task name only in ['edit', 'synthesize']")
+ return None
+
+ # 推理文件配置
+ config_path = os.path.join(self.zh_pretrain_model_path, "default.yaml")
+ phones_dict = os.path.join(self.zh_pretrain_model_path,
+ "phone_id_map.txt")
+ erniesat_ckpt = os.path.join(self.zh_pretrain_model_path,
+ erniesat_ckpt_name)
+ erniesat_stat = os.path.join(self.zh_pretrain_model_path,
+ "speech_stats.npy")
+
+ voc = "hifigan_aishell3"
+ voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
+ voc_ckpt = os.path.join(self.zh_voc_model_path,
+ "snapshot_iter_2500000.pdz")
+ voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
+
+ cmd = self.get_cmd(
+ task_name=task_name,
+ input_name=input_name,
+ old_str=old_str,
+ new_str=new_str,
+ config_path=config_path,
+ phones_dict=phones_dict,
+ erniesat_ckpt=erniesat_ckpt,
+ erniesat_stat=erniesat_stat,
+ voc=voc,
+ voc_config=voc_config,
+ voc_ckpt=voc_ckpt,
+ voc_stat=voc_stat,
+ output_name=output_name,
+ source_lang="zh",
+ target_lang="zh")
+
+ return run_cmd(cmd, output_name)
+
+ def crossclone(self,
+ old_str: str,
+ new_str: str,
+ input_name: os.PathLike,
+ output_name: os.PathLike,
+ source_lang: str,
+ target_lang: str,
+ erniesat_ckpt_name: str="snapshot_iter_489000.pdz"):
+ # 推理文件配置
+ config_path = os.path.join(self.cross_pretrain_model_path,
+ "default.yaml")
+ phones_dict = os.path.join(self.cross_pretrain_model_path,
+ "phone_id_map.txt")
+ erniesat_ckpt = os.path.join(self.cross_pretrain_model_path,
+ erniesat_ckpt_name)
+ erniesat_stat = os.path.join(self.cross_pretrain_model_path,
+ "speech_stats.npy")
+
+ voc = "hifigan_aishell3"
+ voc_config = os.path.join(self.cross_voc_model_path, "default.yaml")
+ voc_ckpt = os.path.join(self.cross_voc_model_path,
+ "snapshot_iter_2500000.pdz")
+ voc_stat = os.path.join(self.cross_voc_model_path, "feats_stats.npy")
+ task_name = "synthesize"
+ cmd = self.get_cmd(
+ task_name=task_name,
+ input_name=input_name,
+ old_str=old_str,
+ new_str=new_str,
+ config_path=config_path,
+ phones_dict=phones_dict,
+ erniesat_ckpt=erniesat_ckpt,
+ erniesat_stat=erniesat_stat,
+ voc=voc,
+ voc_config=voc_config,
+ voc_ckpt=voc_ckpt,
+ voc_stat=voc_stat,
+ output_name=output_name,
+ source_lang=source_lang,
+ target_lang=target_lang)
+
+ return run_cmd(cmd, output_name)
+
+ def en_synthesize_edit(self,
+ old_str: str,
+ new_str: str,
+ input_name: os.PathLike,
+ output_name: os.PathLike,
+ task_name: str="synthesize",
+ erniesat_ckpt_name: str="snapshot_iter_199500.pdz"):
+
+ # 推理文件配置
+ config_path = os.path.join(self.en_pretrain_model_path, "default.yaml")
+ phones_dict = os.path.join(self.en_pretrain_model_path,
+ "phone_id_map.txt")
+ erniesat_ckpt = os.path.join(self.en_pretrain_model_path,
+ erniesat_ckpt_name)
+ erniesat_stat = os.path.join(self.en_pretrain_model_path,
+ "speech_stats.npy")
+
+ voc = "hifigan_aishell3"
+ voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
+ voc_ckpt = os.path.join(self.zh_voc_model_path,
+ "snapshot_iter_2500000.pdz")
+ voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
+
+ cmd = self.get_cmd(
+ task_name=task_name,
+ input_name=input_name,
+ old_str=old_str,
+ new_str=new_str,
+ config_path=config_path,
+ phones_dict=phones_dict,
+ erniesat_ckpt=erniesat_ckpt,
+ erniesat_stat=erniesat_stat,
+ voc=voc,
+ voc_config=voc_config,
+ voc_ckpt=voc_ckpt,
+ voc_stat=voc_stat,
+ output_name=output_name,
+ source_lang="en",
+ target_lang="en")
+
+ return run_cmd(cmd, output_name)
+
+ def get_cmd(self,
+ task_name: str,
+ input_name: str,
+ old_str: str,
+ new_str: str,
+ config_path: str,
+ phones_dict: str,
+ erniesat_ckpt: str,
+ erniesat_stat: str,
+ voc: str,
+ voc_config: str,
+ voc_ckpt: str,
+ voc_stat: str,
+ output_name: str,
+ source_lang: str,
+ target_lang: str):
+ ngpu = get_ngpu()
+ cmd = f"""
+ FLAGS_allocator_strategy=naive_best_fit \
+ FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+ python3 {self.BIN_DIR}/synthesize_e2e.py \
+ --task_name={task_name} \
+ --wav_path={input_name} \
+ --old_str='{old_str}' \
+ --new_str='{new_str}' \
+ --source_lang={source_lang} \
+ --target_lang={target_lang} \
+ --erniesat_config={config_path} \
+ --phones_dict={phones_dict} \
+ --erniesat_ckpt={erniesat_ckpt} \
+ --erniesat_stat={erniesat_stat} \
+ --voc={voc} \
+ --voc_config={voc_config} \
+ --voc_ckpt={voc_ckpt} \
+ --voc_stat={voc_stat} \
+ --output_name={output_name} \
+ --ngpu={ngpu}
+ """
+
+ return cmd
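下面给出一个假设性的调用示意,说明 `vc.py` 如何驱动上述 `SAT` 封装(音频文件名均为占位符,并假设 `source/model/` 下的相关模型已按前述 README 解压就绪),仅供阅读参考:

```python
# 假设性示例:演示 SAT 封装的语音编辑与跨语言合成,文件名均为占位符
from src.ernie_sat import SAT

sat = SAT()

# 中文语音编辑:把音频对应文本中的部分内容替换后重新合成
out = sat.zh_synthesize_edit(
    old_str="今天天气很好",
    new_str="今天心情很好",
    input_name="source/wav/SAT/upload/demo.wav",
    output_name="source/wav/SAT/out/edited.wav",
    task_name="edit")

# 跨语言合成:中文参考音频 + 英文目标文本
out = sat.crossclone(
    old_str="今天天气很好",
    new_str="It is a nice day today.",
    input_name="source/wav/SAT/upload/demo.wav",
    output_name="source/wav/SAT/out/cross.wav",
    source_lang="zh",
    target_lang="en")
print(out)  # 成功时返回输出音频路径,命令失败时返回 None
```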
diff --git a/demos/speech_web/speech_server/src/finetune.py b/demos/speech_web/speech_server/src/finetune.py
new file mode 100644
index 000000000..6ca99251b
--- /dev/null
+++ b/demos/speech_web/speech_server/src/finetune.py
@@ -0,0 +1,127 @@
+import os
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+def find_max_ckpt(model_path):
+ max_ckpt = 0
+ for filename in os.listdir(model_path):
+ if filename.endswith('.pdz'):
+ files = filename[:-4]
+ *_, it = files.split("_") # 文件名形如 snapshot_iter_100.pdz
+ if int(it) > max_ckpt:
+ max_ckpt = int(it)
+ return max_ckpt
+
+
+class FineTune:
+ def __init__(self):
+ self.now_file_path = os.path.dirname(__file__)
+ self.PYTHONPATH = os.path.join(MAIN_ROOT,
+ "examples/other/tts_finetune/tts3")
+ self.BIN_DIR = os.path.join(MAIN_ROOT,
+ "paddlespeech/t2s/exps/fastspeech2")
+ self.pretrained_model_dir = os.path.realpath(
+ "source/model/fastspeech2_aishell3_ckpt_1.1.0")
+ self.voc_model_dir = os.path.realpath(
+ "source/model/hifigan_aishell3_ckpt_0.2.0")
+ self.finetune_config = "conf/tts3_finetune.yaml"
+
+ def finetune(self, input_dir, exp_dir='temp', epoch=100):
+ """
+ use cmd follow examples/other/tts_finetune/tts3/run.sh
+ """
+ newdir_name = "newdir"
+ new_dir = os.path.join(input_dir, newdir_name)
+ mfa_dir = os.path.join(exp_dir, 'mfa_result')
+ dump_dir = os.path.join(exp_dir, 'dump')
+ output_dir = os.path.join(exp_dir, 'exp')
+ lang = "zh"
+ ngpu = get_ngpu()
+
+ cmd = f"""
+ # check oov
+ python3 {self.PYTHONPATH}/local/check_oov.py \
+ --input_dir={input_dir} \
+ --pretrained_model_dir={self.pretrained_model_dir} \
+ --newdir_name={newdir_name} \
+ --lang={lang}
+
+ # get mfa result
+ python3 {self.PYTHONPATH}/local/get_mfa_result.py \
+ --input_dir={new_dir} \
+ --mfa_dir={mfa_dir} \
+ --lang={lang}
+
+ # generate durations.txt
+ python3 {self.PYTHONPATH}/local/generate_duration.py \
+ --mfa_dir={mfa_dir}
+
+ # extract feature
+ python3 {self.PYTHONPATH}/local/extract_feature.py \
+ --duration_file="./durations.txt" \
+ --input_dir={new_dir} \
+ --dump_dir={dump_dir} \
+ --pretrained_model_dir={self.pretrained_model_dir}
+
+ # create finetune env
+ python3 {self.PYTHONPATH}/local/prepare_env.py \
+ --pretrained_model_dir={self.pretrained_model_dir} \
+ --output_dir={output_dir}
+
+ # finetune
+ python3 {self.PYTHONPATH}/local/finetune.py \
+ --pretrained_model_dir={self.pretrained_model_dir} \
+ --dump_dir={dump_dir} \
+ --output_dir={output_dir} \
+ --ngpu={ngpu} \
+ --epoch={epoch} \
+ --finetune_config={self.finetune_config}
+ """
+
+ print(cmd)
+
+ return run_cmd(cmd, exp_dir)
+
+ def synthesize(self, text, wav_name, out_wav_dir, exp_dir='temp'):
+
+ voc = "hifigan_aishell3"
+ dump_dir = os.path.join(exp_dir, 'dump')
+ output_dir = os.path.join(exp_dir, 'exp')
+ text_path = os.path.join(exp_dir, 'sentences.txt')
+ lang = "zh"
+ ngpu = get_ngpu()
+
+ model_path = f"{output_dir}/checkpoints"
+ ckpt = find_max_ckpt(model_path)
+
+ # 生成对应的语句
+ with open(text_path, "w", encoding='utf8') as f:
+ f.write(wav_name + " " + text)
+
+ cmd = f"""
+ FLAGS_allocator_strategy=naive_best_fit \
+ FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+ python3 {self.BIN_DIR}/../synthesize_e2e.py \
+ --am=fastspeech2_aishell3 \
+ --am_config={self.pretrained_model_dir}/default.yaml \
+ --am_ckpt={output_dir}/checkpoints/snapshot_iter_{ckpt}.pdz \
+ --am_stat={self.pretrained_model_dir}/speech_stats.npy \
+ --voc={voc} \
+ --voc_config={self.voc_model_dir}/default.yaml \
+ --voc_ckpt={self.voc_model_dir}/snapshot_iter_2500000.pdz \
+ --voc_stat={self.voc_model_dir}/feats_stats.npy \
+ --lang={lang} \
+ --text={text_path} \
+ --output_dir={out_wav_dir} \
+ --phones_dict={dump_dir}/phone_id_map.txt \
+ --speaker_dict={dump_dir}/speaker_id_map.txt \
+ --spk_id=0 \
+ --ngpu={ngpu}
+ """
+
+ out_path = os.path.join(out_wav_dir, f"{wav_name}.wav")
+
+ return run_cmd(cmd, out_path)
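下面是 `FineTune` 封装的一个假设性使用示意(数据目录与文本为占位示例,假设目录中已包含 wav 音频和 labels.txt 标注,且相关预训练模型已解压到 `source/model/`):

```python
# 假设性示例:先微调,再用微调后的模型合成
from src.finetune import FineTune

ft = FineTune()
# finetune 内部依次执行 check_oov / mfa 对齐 / 特征提取 / 微调等步骤
exp_dir = ft.finetune(
    input_dir="source/wav/finetune/default",
    exp_dir="tmp_dir/finetune/default")
if exp_dir:
    wav = ft.synthesize(
        text="欢迎使用飞桨语音合成",
        wav_name="demo_ft",
        out_wav_dir="source/wav/finetune/out",
        exp_dir=exp_dir)
    print(wav)  # 成功时为 source/wav/finetune/out/demo_ft.wav,失败时为 None
```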
diff --git a/demos/speech_web/speech_server/src/ge2e_clone.py b/demos/speech_web/speech_server/src/ge2e_clone.py
new file mode 100644
index 000000000..83c2b3f35
--- /dev/null
+++ b/demos/speech_web/speech_server/src/ge2e_clone.py
@@ -0,0 +1,60 @@
+import os
+import shutil
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class VoiceCloneGE2E():
+ def __init__(self):
+ # Path 到指定路径上
+ self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
+ # am
+ self.am = "fastspeech2_aishell3"
+ self.am_config = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/default.yaml"
+ self.am_ckpt = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/snapshot_iter_96400.pdz"
+ self.am_stat = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/speech_stats.npy"
+ self.phones_dict = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/phone_id_map.txt"
+ # voc
+ self.voc = "pwgan_aishell3"
+ self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
+ self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
+ self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
+ # ge2e
+ self.ge2e_params_path = "source/model/ge2e_ckpt_0.3/step-3000000.pdparams"
+
+ def vc(self, text, input_wav, out_wav):
+
+ # input wav 需要形成临时单独文件夹
+ _, full_file_name = os.path.split(input_wav)
+ ref_audio_dir = os.path.realpath("tmp_dir/ge2e")
+ if os.path.exists(ref_audio_dir):
+ shutil.rmtree(ref_audio_dir)
+
+ os.makedirs(ref_audio_dir, exist_ok=True)
+ shutil.copy(input_wav, ref_audio_dir)
+
+ output_dir = os.path.dirname(out_wav)
+ ngpu = get_ngpu()
+
+ cmd = f"""
+ python3 {self.BIN_DIR}/voice_cloning.py \
+ --am={self.am} \
+ --am_config={self.am_config} \
+ --am_ckpt={self.am_ckpt} \
+ --am_stat={self.am_stat} \
+ --voc={self.voc} \
+ --voc_config={self.voc_config} \
+ --voc_ckpt={self.voc_ckpt} \
+ --voc_stat={self.voc_stat} \
+ --ge2e_params_path={self.ge2e_params_path} \
+ --text="{text}" \
+ --input-dir={ref_audio_dir} \
+ --output-dir={output_dir} \
+ --phones-dict={self.phones_dict} \
+ --ngpu={ngpu}
+ """
+
+ output_name = os.path.join(output_dir, full_file_name)
+ return run_cmd(cmd, output_name=output_name)
diff --git a/demos/speech_web/speech_server/src/tdnn_clone.py b/demos/speech_web/speech_server/src/tdnn_clone.py
new file mode 100644
index 000000000..53c5a3816
--- /dev/null
+++ b/demos/speech_web/speech_server/src/tdnn_clone.py
@@ -0,0 +1,56 @@
+import os
+import shutil
+
+from .util import get_ngpu
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class VoiceCloneTDNN():
+ def __init__(self):
+ # Path 到指定路径上
+ self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
+
+ self.am = "fastspeech2_aishell3"
+ self.am_config = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/default.yaml"
+ self.am_ckpt = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/snapshot_iter_96400.pdz"
+ self.am_stat = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/speech_stats.npy"
+ self.phones_dict = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/phone_id_map.txt"
+ # voc
+ self.voc = "pwgan_aishell3"
+ self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
+ self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
+ self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
+
+ def vc(self, text, input_wav, out_wav):
+ # input wav 需要形成临时单独文件夹
+ _, full_file_name = os.path.split(input_wav)
+ ref_audio_dir = os.path.realpath("tmp_dir/tdnn")
+ if os.path.exists(ref_audio_dir):
+ shutil.rmtree(ref_audio_dir)
+ os.makedirs(ref_audio_dir, exist_ok=True)
+ shutil.copy(input_wav, ref_audio_dir)
+
+ output_dir = os.path.dirname(out_wav)
+ ngpu = get_ngpu()
+
+ cmd = f"""
+ python3 {self.BIN_DIR}/voice_cloning.py \
+ --am={self.am} \
+ --am_config={self.am_config} \
+ --am_ckpt={self.am_ckpt} \
+ --am_stat={self.am_stat} \
+ --voc={self.voc} \
+ --voc_config={self.voc_config} \
+ --voc_ckpt={self.voc_ckpt} \
+ --voc_stat={self.voc_stat} \
+ --text="{text}" \
+ --input-dir={ref_audio_dir} \
+ --output-dir={output_dir} \
+ --phones-dict={self.phones_dict} \
+ --use_ecapa=True \
+ --ngpu={ngpu}
+ """
+
+ output_name = os.path.join(output_dir, full_file_name)
+ return run_cmd(cmd, output_name=output_name)
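`VoiceCloneGE2E` 与 `VoiceCloneTDNN` 的接口一致,下面是一个假设性的对比调用示意(参考音频路径为占位符;注意合成结果沿用参考音频的文件名,写入 `out_wav` 所在目录):

```python
# 假设性示例:分别用 GE2E 和 ECAPA-TDNN 两种声纹方案做一句话克隆
from src.ge2e_clone import VoiceCloneGE2E
from src.tdnn_clone import VoiceCloneTDNN

text = "欢迎使用语音克隆功能"
ref_wav = "source/wav/vc/upload/ref.wav"  # 占位的参考音频

for tag, cloner in (("ge2e", VoiceCloneGE2E()), ("ecapa-tdnn", VoiceCloneTDNN())):
    # out_wav 的文件名只用于确定输出目录,实际输出文件沿用 ref.wav 的名字
    out = cloner.vc(text=text, input_wav=ref_wav,
                    out_wav=f"source/wav/vc/out/{tag}/out.wav")
    print(tag, "->", out)  # 成功时返回输出路径,失败时返回 None
```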
diff --git a/demos/speech_web/speech_server/src/util.py b/demos/speech_web/speech_server/src/util.py
index 4a566b6ee..0188f0280 100644
--- a/demos/speech_web/speech_server/src/util.py
+++ b/demos/speech_web/speech_server/src/util.py
@@ -1,4 +1,18 @@
+import os
import random
+import subprocess
+
+import paddle
+
+NOW_FILE_PATH = os.path.dirname(__file__)
+MAIN_ROOT = os.path.realpath(os.path.join(NOW_FILE_PATH, "../../../../"))
+
+
+def get_ngpu():
+ if paddle.device.get_device() == "cpu":
+ return 0
+ else:
+ return 1
def randName(n=5):
@@ -11,3 +25,20 @@ def SuccessRequest(result=None, message="ok"):
def ErrorRequest(result=None, message="error"):
return {"code": -1, "result": result, "message": message}
+
+
+def run_cmd(cmd, output_name):
+ p = subprocess.Popen(cmd, shell=True)
+ res = p.wait()
+ print(cmd)
+ print("运行结果:", res)
+ if res == 0:
+ # 运行成功
+ if os.path.exists(output_name):
+ return output_name
+ else:
+ # 合成的文件不存在
+ return None
+ else:
+ # 运行失败
+ return None
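新增的 `run_cmd` 约定:命令退出码为 0 且 `output_name` 对应文件存在时返回该路径,否则返回 `None`。下面是一个假设性的最小示意(用 shell 占位命令代替真实的合成脚本):

```python
# 假设性示例:get_ngpu / run_cmd 的组合用法
from src.util import get_ngpu, run_cmd

out_wav = "tmp_dir/demo/out.wav"
# 占位命令:真实场景中这里是一条完整的合成命令行
cmd = f"mkdir -p tmp_dir/demo && touch {out_wav}"

print("ngpu:", get_ngpu())               # CPU 下为 0,GPU 下为 1
print("result:", run_cmd(cmd, out_wav))  # 成功时为 out_wav 路径,失败时为 None
```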
diff --git a/demos/speech_web/speech_server/vc.py b/demos/speech_web/speech_server/vc.py
new file mode 100644
index 000000000..d035c02a4
--- /dev/null
+++ b/demos/speech_web/speech_server/vc.py
@@ -0,0 +1,550 @@
+import argparse
+import base64
+import datetime
+import json
+import os
+import shutil
+from typing import List
+
+import aiofiles
+import librosa
+import soundfile as sf
+import uvicorn
+from fastapi import FastAPI
+from fastapi import UploadFile
+from pydantic import BaseModel
+from src.ernie_sat import SAT
+from src.finetune import FineTune
+from src.ge2e_clone import VoiceCloneGE2E
+from src.tdnn_clone import VoiceCloneTDNN
+from src.util import ErrorRequest
+from src.util import randName
+from src.util import SuccessRequest
+from starlette.responses import FileResponse
+
+from paddlespeech.server.utils.audio_process import float2pcm
+
+# 解析配置
+parser = argparse.ArgumentParser(prog='PaddleSpeechDemo', add_help=True)
+
+parser.add_argument(
+ "--port",
+ action="store",
+ type=int,
+ help="port of the app",
+ default=8010,
+ required=False)
+
+args = parser.parse_args()
+port = args.port
+
+# 在服务进程中常驻加载这些模型会影响 finetune,因此 finetune 通过命令行子进程方式执行
+vc_model = VoiceCloneGE2E()
+vc_model_tdnn = VoiceCloneTDNN()
+
+sat_model = SAT()
+ft_model = FineTune()
+
+# 配置文件
+tts_config = "conf/tts_online_application.yaml"
+asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
+asr_init_path = "source/demo/demo.wav"
+db_path = "source/db/vc.sqlite"
+ie_model_path = "source/model"
+
+# 路径配置
+VC_UPLOAD_PATH = "source/wav/vc/upload"
+VC_OUT_PATH = "source/wav/vc/out"
+
+FT_UPLOAD_PATH = "source/wav/finetune/upload"
+FT_OUT_PATH = "source/wav/finetune/out"
+FT_LABEL_PATH = "source/wav/finetune/label.json"
+FT_LABEL_TXT_PATH = "source/wav/finetune/labels.txt"
+FT_DEFAULT_PATH = "source/wav/finetune/default"
+FT_EXP_BASE_PATH = "tmp_dir/finetune"
+
+SAT_UPLOAD_PATH = "source/wav/SAT/upload"
+SAT_OUT_PATH = "source/wav/SAT/out"
+SAT_LABEL_PATH = "source/wav/SAT/label.json"
+
+# SAT 标注结果初始化
+if os.path.exists(SAT_LABEL_PATH):
+ with open(SAT_LABEL_PATH, "r", encoding='utf8') as f:
+ sat_label_dic = json.load(f)
+else:
+ sat_label_dic = {}
+
+# ft 标注结果初始化
+if os.path.exists(FT_LABEL_PATH):
+ with open(FT_LABEL_PATH, "r", encoding='utf8') as f:
+ ft_label_dic = json.load(f)
+else:
+ ft_label_dic = {}
+
+# 新建文件夹
+base_sources = [
+ VC_UPLOAD_PATH,
+ VC_OUT_PATH,
+ FT_UPLOAD_PATH,
+ FT_OUT_PATH,
+ FT_DEFAULT_PATH,
+ SAT_UPLOAD_PATH,
+ SAT_OUT_PATH,
+]
+for path in base_sources:
+ os.makedirs(path, exist_ok=True)
+#####################################################################
+########################### APP初始化 ###############################
+#####################################################################
+app = FastAPI()
+
+######################################################################
+########################### 接口类型 #################################
+#####################################################################
+
+
+# 接口结构
+class VcBase(BaseModel):
+ wavName: str
+ wavPath: str
+
+
+class VcBaseText(BaseModel):
+ wavName: str
+ wavPath: str
+ text: str
+ func: str
+
+
+class VcBaseSAT(BaseModel):
+ old_str: str
+ new_str: str
+ language: str
+ function: str
+ wav: str # 音频文件路径
+ filename: str
+
+
+class FTPath(BaseModel):
+ dataPath: str
+
+
+class VcBaseFT(BaseModel):
+ wav: str # base64编码
+ filename: str
+ wav_path: str
+
+
+class VcBaseFTModel(BaseModel):
+ wav_path: str
+
+
+class VcBaseFTSyn(BaseModel):
+ exp_path: str
+ text: str
+
+
+######################################################################
+########################### 文件列表查询与保存服务 #################################
+#####################################################################
+
+
+def getVCList(path):
+ VC_FileDict = []
+ # 查询upload路径下的wav文件名
+ for root, dirs, files in os.walk(path, topdown=False):
+ for name in files:
+ # print(os.path.join(root, name))
+ VC_FileDict.append({'name': name, 'path': os.path.join(root, name)})
+ VC_FileDict = sorted(VC_FileDict, key=lambda x: x['name'], reverse=True)
+ return VC_FileDict
+
+
+async def saveFiles(files, SavePath):
+ right = 0
+ error = 0
+ error_info = "错误文件:"
+ for file in files:
+ try:
+ if 'blob' in file.filename:
+ out_file_path = os.path.join(
+ SavePath,
+ datetime.datetime.strftime(datetime.datetime.now(),
+ '%H%M') + randName(3) + ".wav")
+ else:
+ out_file_path = os.path.join(SavePath, file.filename)
+
+ print("上传文件名:", out_file_path)
+ async with aiofiles.open(out_file_path, 'wb') as out_file:
+ content = await file.read() # async read
+ await out_file.write(content) # async write
+ # 将文件转成 16k 采样率、16bit 的 wav 文件
+ wav, sr = librosa.load(out_file_path, sr=16000)
+ sf.write(out_file_path, data=wav, samplerate=sr)
+ right += 1
+ except Exception as e:
+ error += 1
+ error_info = error_info + file.filename + " " + str(e) + "\n"
+ continue
+ return f"上传成功:{right}, 上传失败:{error}, 失败原因: {error_info}"
+
+
+# 音频下载
+@app.post("/vc/download")
+async def VcDownload(base: VcBase):
+ if os.path.exists(base.wavPath):
+ return FileResponse(base.wavPath)
+ else:
+ return ErrorRequest(message="下载请求失败,文件不存在")
+
+
+# 音频下载base64
+@app.post("/vc/download_base64")
+async def VcDownloadBase64(base: VcBase):
+ if os.path.exists(base.wavPath):
+ # 将文件转成16k, 16bit类型的wav文件
+ wav, sr = librosa.load(base.wavPath, sr=16000)
+ wav = float2pcm(wav) # float32 to int16
+ wav_bytes = wav.tobytes() # to bytes
+ wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
+ return SuccessRequest(result=wav_base64)
+ else:
+ return ErrorRequest(message="播放请求失败,文件不存在")
+
+
+######################################################################
+########################### VC 服务 #################################
+#####################################################################
+
+
+# 上传文件
+@app.post("/vc/upload")
+async def VcUpload(files: List[UploadFile]):
+ # 复用 saveFiles 的通用保存逻辑(含重采样为 16k wav),避免重复实现
+ res = await saveFiles(files, VC_UPLOAD_PATH)
+ return SuccessRequest(result=res)
+
+
+# 获取文件列表
+@app.get("/vc/list")
+async def VcList():
+ res = getVCList(VC_UPLOAD_PATH)
+ return SuccessRequest(result=res)
+
+
+# 获取音频文件
+@app.post("/vc/file")
+async def VcFileGet(base: VcBase):
+ if os.path.exists(base.wavPath):
+ return FileResponse(base.wavPath)
+ else:
+ return ErrorRequest(result="获取文件失败")
+
+
+# 删除音频文件
+@app.post("/vc/del")
+async def VcFileDel(base: VcBase):
+ if os.path.exists(base.wavPath):
+ os.remove(base.wavPath)
+ return SuccessRequest(result="删除成功")
+ else:
+ return ErrorRequest(result="删除失败")
+
+
+# 声音克隆G2P
+@app.post("/vc/clone_g2p")
+async def VcCloneG2P(base: VcBaseText):
+ if os.path.exists(base.wavPath):
+ try:
+ if base.func == 'ge2e':
+ wavName = base.wavName
+ wavPath = os.path.join(VC_OUT_PATH, wavName)
+ wavPath = vc_model.vc(
+ text=base.text, input_wav=base.wavPath, out_wav=wavPath)
+ else:
+ wavName = base.wavName
+ wavPath = os.path.join(VC_OUT_PATH, wavName)
+ wavPath = vc_model_tdnn.vc(
+ text=base.text, input_wav=base.wavPath, out_wav=wavPath)
+ if wavPath:
+ res = {"wavName": wavName, "wavPath": wavPath}
+ return SuccessRequest(result=res)
+ else:
+ return ErrorRequest(message="克隆失败,检查克隆脚本是否有效")
+ except Exception as e:
+ print(e)
+ return ErrorRequest(message="克隆失败,合成过程报错")
+ else:
+ return ErrorRequest(message="克隆失败,音频不存在")
+
+
+######################################################################
+########################### SAT 服务 #################################
+#####################################################################
+# 声音克隆SAT
+@app.post("/vc/clone_sat")
+async def VcCloneSAT(base: VcBaseSAT):
+ # 重新整理 sat_label_dict
+ if base.filename not in sat_label_dic or sat_label_dic[
+ base.filename] != base.old_str:
+ sat_label_dic[base.filename] = base.old_str
+ with open(SAT_LABEL_PATH, "w", encoding='utf8') as f:
+ json.dump(sat_label_dic, f, ensure_ascii=False, indent=4)
+
+ input_file_path = base.wav
+
+ # 选择任务
+ if base.language == "zh":
+ # 中文
+ if base.function == "synthesize":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_syn_zh_" + base.filename)
+ # 中文克隆
+ sat_result = sat_model.zh_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="synthesize")
+ elif base.function == "edit":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_edit_zh_" + base.filename)
+ # 中文语音编辑
+ sat_result = sat_model.zh_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="edit")
+ elif base.function == "crossclone":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_cross_zh_" + base.filename)
+ # 中文跨语言
+ sat_result = sat_model.crossclone(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ source_lang="zh",
+ target_lang="en")
+ else:
+ return ErrorRequest(
+ message="请检查功能选项是否正确,仅支持:synthesize, edit, crossclone")
+ elif base.language == "en":
+ if base.function == "synthesize":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_syn_zh_" + base.filename)
+ # 英文语音克隆
+ sat_result = sat_model.en_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="synthesize")
+ elif base.function == "edit":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_edit_zh_" + base.filename)
+ # 英文语音编辑
+ sat_result = sat_model.en_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="edit")
+ elif base.function == "crossclone":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_cross_zh_" + base.filename)
+ # 英文跨语言
+ sat_result = sat_model.crossclone(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ source_lang="en",
+ target_lang="zh")
+ else:
+ return ErrorRequest(
+ message="请检查功能选项是否正确,仅支持:synthesize, edit, crossclone")
+ else:
+ return ErrorRequest(message="请检查功能选项是否正确,仅支持中文和英文")
+
+ if sat_result:
+ return SuccessRequest(result=sat_result, message="SAT合成成功")
+ else:
+ return ErrorRequest(message="SAT 合成失败,请从后台检查错误信息!")
+
+
+# SAT 文件列表
+@app.get("/sat/list")
+async def SatList():
+ res = []
+ filelist = getVCList(SAT_UPLOAD_PATH)
+ for fileitem in filelist:
+ if fileitem['name'] in sat_label_dic:
+ fileitem['label'] = sat_label_dic[fileitem['name']]
+ else:
+ fileitem['label'] = ""
+ res.append(fileitem)
+ return SuccessRequest(result=res)
+
+
+# 上传 SAT 音频文件
+@app.post("/sat/upload")
+async def SATUpload(files: List[UploadFile]):
+ # 复用 saveFiles 的通用保存逻辑(含重采样为 16k wav),避免重复实现
+ res = await saveFiles(files, SAT_UPLOAD_PATH)
+ return SuccessRequest(result=res)
+
+
+######################################################################
+########################### FinueTune 服务 #################################
+#####################################################################
+
+
+# finetune 文件列表
+@app.post("/finetune/list")
+async def FineTuneList(Path: FTPath):
+ dataPath = Path.dataPath
+ if dataPath == "default":
+ # 默认路径
+ FT_PATH = FT_DEFAULT_PATH
+ else:
+ FT_PATH = dataPath
+
+ res = []
+ filelist = getVCList(FT_PATH)
+ for name, value in ft_label_dic.items():
+ wav_path = os.path.join(FT_PATH, name)
+ if not os.path.exists(wav_path):
+ wav_path = ""
+ d = {'text': value['text'], 'name': name, 'path': wav_path}
+ res.append(d)
+ return SuccessRequest(result=res)
+
+
+# 一键重置,获取新的文件地址
+@app.get('/finetune/newdir')
+async def FTGetNewDir():
+ new_path = os.path.join(FT_UPLOAD_PATH, randName(3))
+ if not os.path.exists(new_path):
+ os.makedirs(new_path, exist_ok=True)
+ # 把 labels.txt 复制进去
+ shutil.copy(FT_LABEL_TXT_PATH, new_path)
+ return SuccessRequest(result=new_path)
+
+
+# finetune 上传文件
+@app.post("/finetune/upload")
+async def FTUpload(base: VcBaseFT):
+ try:
+ # 文件夹是否存在
+ if not os.path.exists(base.wav_path):
+ os.makedirs(base.wav_path)
+ # 保存音频文件
+ out_file_path = os.path.join(base.wav_path, base.filename)
+ wav_b = base64.b64decode(base.wav)
+ async with aiofiles.open(out_file_path, 'wb') as out_file:
+ await out_file.write(wav_b) # async write
+
+ return SuccessRequest(result="上传成功")
+ except Exception as e:
+ return ErrorRequest(result="上传失败")
+
+
+# finetune 微调
+@app.post("/finetune/clone_finetune")
+async def FTModel(base: VcBaseFTModel):
+ # 先检查 wav_path 是否有效
+ if base.wav_path == 'default':
+ data_path = FT_DEFAULT_PATH
+ else:
+ data_path = base.wav_path
+ if not os.path.exists(data_path):
+ return ErrorRequest(message="数据文件夹不存在")
+
+ data_base = data_path.split(os.sep)[-1]
+ exp_dir = os.path.join(FT_EXP_BASE_PATH, data_base)
+ try:
+ exp_dir = ft_model.finetune(
+ input_dir=os.path.realpath(data_path),
+ exp_dir=os.path.realpath(exp_dir))
+ if exp_dir:
+ return SuccessRequest(result=exp_dir)
+ else:
+ return ErrorRequest(message="微调失败")
+ except Exception as e:
+ print(e)
+ return ErrorRequest(message="微调失败")
+
+
+# finetune 合成
+@app.post("/finetune/clone_finetune_syn")
+async def FTSyn(base: VcBaseFTSyn):
+ try:
+ if not os.path.exists(base.exp_path):
+ return ErrorRequest(result="模型路径不存在")
+ wav_name = randName(5)
+ wav_path = ft_model.synthesize(
+ text=base.text,
+ wav_name=wav_name,
+ out_wav_dir=os.path.realpath(FT_OUT_PATH),
+ exp_dir=os.path.realpath(base.exp_path))
+ if wav_path:
+ res = {"wavName": wav_name + ".wav", "wavPath": wav_path}
+ return SuccessRequest(result=res)
+ else:
+ return ErrorRequest(message="音频合成失败")
+ except Exception as e:
+ return ErrorRequest(message="音频合成失败")
+
+
+if __name__ == '__main__':
+ uvicorn.run(app=app, host='0.0.0.0', port=port)
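`vc.py` 的接口可以直接用 HTTP 客户端验证。下面是一个假设性的冒烟测试示意(假设服务已在本机 8010 端口启动,`ref.wav` 为占位的本地音频文件):

```python
# 假设性示例:用 requests 依次调用上传、列表、克隆接口
import requests

BASE = "http://localhost:8010"

# 1. 上传一段参考音频(字段名 files 对应 FastAPI 的 List[UploadFile])
with open("ref.wav", "rb") as f:
    r = requests.post(f"{BASE}/vc/upload",
                      files=[("files", ("ref.wav", f, "audio/wav"))])
print(r.json())  # {"code": 0, "result": "上传成功:1, ...", "message": "ok"}

# 2. 查询已上传的音频列表
uploads = requests.get(f"{BASE}/vc/list").json()["result"]

# 3. 一句话克隆:func 为 "ge2e" 时走 GE2E,否则走 ECAPA-TDNN 分支
payload = {"wavName": "clone.wav", "wavPath": uploads[0]["path"],
           "text": "欢迎使用飞桨语音合成", "func": "ge2e"}
r = requests.post(f"{BASE}/vc/clone_g2p", json=payload)
print(r.json())  # 成功时 result 中包含 wavName / wavPath
```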
diff --git a/demos/speech_web/web_client/package.json b/demos/speech_web/web_client/package.json
index 7f28d4c97..d8c213e4a 100644
--- a/demos/speech_web/web_client/package.json
+++ b/demos/speech_web/web_client/package.json
@@ -8,6 +8,7 @@
"preview": "vite preview"
},
"dependencies": {
+ "@element-plus/icons-vue": "^2.0.9",
"ant-design-vue": "^2.2.8",
"axios": "^0.26.1",
"element-plus": "^2.1.9",
@@ -18,6 +19,7 @@
},
"devDependencies": {
"@vitejs/plugin-vue": "^2.3.0",
- "vite": "^2.9.0"
+ "vite": "^2.9.13",
+ "@vue/compiler-sfc": "^3.1.0"
}
}
diff --git a/demos/speech_web/web_client/src/api/API.js b/demos/speech_web/web_client/src/api/API.js
index 0feaa63f1..5adca3622 100644
--- a/demos/speech_web/web_client/src/api/API.js
+++ b/demos/speech_web/web_client/src/api/API.js
@@ -19,6 +19,26 @@ export const apiURL = {
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket 接口
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // Stream ASR 接口
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS 接口
+
+ // Voice Clone
+ VC_List: '/api/vc/list',
+ SAT_List: '/api/sat/list',
+ FineTune_List: '/api/finetune/list',
+
+ VC_Upload: '/api/vc/upload',
+ SAT_Upload: '/api/sat/upload',
+ FineTune_Upload: '/api/finetune/upload',
+ FineTune_NewDir: '/api/finetune/newdir',
+
+ VC_Download: '/api/vc/download',
+ VC_Download_Base64: '/api/vc/download_base64',
+ VC_Del: '/api/vc/del',
+
+ VC_CloneG2p: '/api/vc/clone_g2p',
+ VC_CloneSAT: '/api/vc/clone_sat',
+ VC_CloneFineTune: '/api/finetune/clone_finetune',
+ VC_CloneFineTuneSyn: '/api/finetune/clone_finetune_syn',
}
diff --git a/demos/speech_web/web_client/src/api/ApiVC.js b/demos/speech_web/web_client/src/api/ApiVC.js
new file mode 100644
index 000000000..0dc0f6834
--- /dev/null
+++ b/demos/speech_web/web_client/src/api/ApiVC.js
@@ -0,0 +1,88 @@
+import axios from 'axios'
+import {apiURL} from "./API.js"
+
+// 上传音频-vc
+export async function vcUpload(params){
+ const result = await axios.post(apiURL.VC_Upload, params);
+ return result
+}
+
+// 上传音频-sat
+export async function satUpload(params){
+ const result = await axios.post(apiURL.SAT_Upload, params);
+ return result
+}
+
+// 上传音频-finetune
+export async function fineTuneUpload(params){
+ const result = await axios.post(apiURL.FineTune_Upload, params);
+ return result
+}
+
+// 删除音频
+export async function vcDel(params){
+ const result = await axios.post(apiURL.VC_Del, params);
+ return result
+}
+
+// 获取音频列表vc
+export async function vcList(){
+ const result = await axios.get(apiURL.VC_List);
+ return result
+}
+// 获取音频列表Sat
+export async function satList(){
+ const result = await axios.get(apiURL.SAT_List);
+ return result
+}
+
+// 获取音频列表fineTune
+export async function fineTuneList(params){
+ const result = await axios.post(apiURL.FineTune_List, params);
+ return result
+}
+
+// fineTune 一键重置 获取新的文件夹
+export async function fineTuneNewDir(){
+ const result = await axios.get(apiURL.FineTune_NewDir);
+ return result
+}
+
+// 获取音频数据
+export async function vcDownload(params){
+ const result = await axios.post(apiURL.VC_Download, params);
+ return result
+}
+
+// 获取音频数据Base64
+export async function vcDownloadBase64(params){
+ const result = await axios.post(apiURL.VC_Download_Base64, params);
+ return result
+}
+
+
+// 克隆合成G2P
+export async function vcCloneG2P(params){
+ const result = await axios.post(apiURL.VC_CloneG2p, params);
+ return result
+}
+
+// 克隆合成SAT
+export async function vcCloneSAT(params){
+ const result = await axios.post(apiURL.VC_CloneSAT, params);
+ return result
+}
+
+// 克隆合成 - finetune 微调
+export async function vcCloneFineTune(params){
+ const result = await axios.post(apiURL.VC_CloneFineTune, params);
+ return result
+}
+
+// 克隆合成 - finetune 合成
+export async function vcCloneFineTuneSyn(params){
+ const result = await axios.post(apiURL.VC_CloneFineTuneSyn, params);
+ return result
+}
+
+
diff --git a/demos/speech_web/web_client/src/components/Content/Header/Header.vue b/demos/speech_web/web_client/src/components/Content/Header/Header.vue
index 8135a2bff..c20f3366e 100644
--- a/demos/speech_web/web_client/src/components/Content/Header/Header.vue
+++ b/demos/speech_web/web_client/src/components/Content/Header/Header.vue
@@ -4,7 +4,7 @@
飞桨-PaddleSpeech
- PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,欢迎大家Star收藏鼓励
+ PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中各种关键任务的开发,支持语音识别、语音合成、声纹识别、声音分类、语音唤醒、语音翻译等多种语音任务,荣获 NAACL2022 Best Demo Award。如果你喜欢这个示例,欢迎在 GitHub 上 star 收藏鼓励。
diff --git a/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue b/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue
index 761a5c11f..5494bb8f8 100644
--- a/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ASR/RealTime/RealTime.vue
@@ -58,9 +58,6 @@ export default {
mounted () {
this.wsUrl = apiURL.ASR_SOCKET_RECORD
this.ws = new WebSocket(this.wsUrl)
- if(this.ws.readyState === this.ws.CONNECTING){
- this.$message.success("实时识别 Websocket 连接成功")
- }
var _that = this
this.ws.addEventListener('message', function (event) {
var temp = JSON.parse(event.data);
@@ -78,7 +75,7 @@ export default {
// 检查 websocket 状态
// debugger
if(this.ws.readyState != this.ws.OPEN){
- this.$message.error("websocket 链接失败,请检查链接地址是否正确")
+ this.$message.error("websocket 链接失败,请检查 Websocket 后端服务是否正确开启")
return
}
diff --git a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/Chat.vue b/demos/speech_web/web_client/src/components/SubMenu/ChatBot/Chat.vue
deleted file mode 100644
index 9d356fc80..000000000
--- a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/Chat.vue
+++ /dev/null
@@ -1,298 +0,0 @@
-
-
-
语音聊天
-
- {{ recoText }}
-
- {{ envText }}
-
- 清空聊天
-
-
-
-
-
{{Result}}
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue b/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue
index c37c083ff..6db847706 100644
--- a/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/ChatBot/ChatT.vue
@@ -91,6 +91,10 @@ export default {
methods: {
// 开始录音
startRecorder(){
+ if(this.ws.readyState != this.ws.OPEN){
+ this.$message.error("websocket 链接失败,请检查 Websocket 后端服务是否正确开启")
+ return
+ }
this.allResultList = []
if(!this.onReco){
this.asrResult = this.speakingText
diff --git a/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue b/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
new file mode 100644
index 000000000..4a0aa2c63
--- /dev/null
+++ b/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
@@ -0,0 +1,487 @@
+
+
+
+
+
+ 录制音频
+ 停止录音
+ 上传录音
+
+
+ 上传音频文件
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 个性化语音合成
+
+ 跨语言语音合成
+
+ 语音编辑
+
+
+
+
+
+
+
+
+
+ 开始合成
+ 合成中
+
+
+ 播放
+ 播放
+ 下载
+ 下载
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue b/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
new file mode 100644
index 000000000..abf203ae8
--- /dev/null
+++ b/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
@@ -0,0 +1,427 @@
+
+
+
+
+
+ 一键重置
+ 默认示例
+ 一键微调
+ 微调中
+ 微调成功
+ 微调失败
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 请输入中文文本
+
+
+
+
+
+
+
+
+ 开始合成
+ 合成中
+
+
+
+ 播放
+ 播放
+ 下载
+ 下载
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/components/SubMenu/IE/IE.vue b/demos/speech_web/web_client/src/components/SubMenu/IE/IE.vue
deleted file mode 100644
index c7dd04e9d..000000000
--- a/demos/speech_web/web_client/src/components/SubMenu/IE/IE.vue
+++ /dev/null
@@ -1,125 +0,0 @@
-
-
-
信息抽取体验
- {{ recoText }}
- 识别结果: {{ asrResultOffline }}
- 时间:{{ time }}
- 出发地:{{ outset }}
- 目的地:{{ destination }}
- 费用:{{ amount }}
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue b/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue
index 353221f7b..ef5591783 100644
--- a/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/TTS/TTST.vue
@@ -228,6 +228,10 @@ export default {
},
// 基于WS的流式合成
async getTtsChunkWavWS(){
+ if(this.ws.readyState != this.ws.OPEN){
+ this.$message.error("websocket 链接失败,请检查 Websocket 后端服务是否正确开启")
+ return
+ }
// 初始化 chunks
chunks = []
chunk_index = 0
diff --git a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPR.vue b/demos/speech_web/web_client/src/components/SubMenu/VPR/VPR.vue
deleted file mode 100644
index 1fe71e4d8..000000000
--- a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPR.vue
+++ /dev/null
@@ -1,178 +0,0 @@
-
-
-
-
声纹识别展示
-
- {{ recoText }}
- 注册
- 识别
-
-
-
声纹得分结果
-
-
-
-
-
-
-
声纹数据列表
-
-
-
-
-
-
-
-
-
-
-
- Delete
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue b/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue
index e398da00c..47eb41df5 100644
--- a/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue
+++ b/demos/speech_web/web_client/src/components/SubMenu/VPR/VPRT.vue
@@ -214,14 +214,17 @@ export default {
let formData = new FormData()
formData.append('spk_id', this.enrollSpkId)
formData.append('audio', this.wav)
-
+
const result = await vprEnroll(formData)
+ if (!result){
+ this.$message.error("请检查后端服务是否正确开启")
+ return
+ }
if(result.data.status){
this.$message.success("声纹注册成功")
} else {
this.$message.error(result.data.msg)
}
- // console.log(result)
this.GetList()
this.wav = ''
this.randomSpkId()
diff --git a/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue b/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
new file mode 100644
index 000000000..afa572417
--- /dev/null
+++ b/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
@@ -0,0 +1,380 @@
+
+
+
+
+
+ 录制音频
+ 停止录音
+ 上传录音
+
+
+ 上传音频文件
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ GE2E
+ ECAPA-TDNN
+
+
+
+
+
+
+
+
+
+
+ 开始合成
+ 合成中
+
+
+
+ 播放
+ 播放
+ 下载
+ 下载
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/main.js b/demos/speech_web/web_client/src/main.js
index 3fbf87c85..544f5b30c 100644
--- a/demos/speech_web/web_client/src/main.js
+++ b/demos/speech_web/web_client/src/main.js
@@ -1,5 +1,6 @@
import { createApp } from 'vue'
import ElementPlus from 'element-plus'
+import * as ElementPlusIconsVue from '@element-plus/icons-vue'
import 'element-plus/dist/index.css'
import Antd from 'ant-design-vue';
import 'ant-design-vue/dist/antd.css';
@@ -9,5 +10,8 @@ import axios from 'axios'
const app = createApp(App)
app.config.globalProperties.$http = axios
+for (const [key, component] of Object.entries(ElementPlusIconsVue)) {
+ app.component(key, component)
+ }
app.use(ElementPlus).use(Antd)
app.mount('#app')
diff --git a/demos/speech_web/web_client/yarn.lock b/demos/speech_web/web_client/yarn.lock
index 6777cf4ce..7f07daa06 100644
--- a/demos/speech_web/web_client/yarn.lock
+++ b/demos/speech_web/web_client/yarn.lock
@@ -44,6 +44,11 @@
resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-1.1.4.tgz"
integrity sha512-Iz/nHqdp1sFPmdzRwHkEQQA3lKvoObk8azgABZ81QUOpW9s/lUyQVUSh0tNtEPZXQlKwlSh7SPgoVxzrE0uuVQ==
+"@element-plus/icons-vue@^2.0.9":
+ version "2.0.9"
+ resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-2.0.9.tgz#b7777c57534522e387303d194451d50ff549d49a"
+ integrity sha512-okdrwiVeKBmW41Hkl0eMrXDjzJwhQMuKiBOu17rOszqM+LS/yBYpNQNV5Jvoh06Wc+89fMmb/uhzf8NZuDuUaQ==
+
"@floating-ui/core@^0.6.1":
version "0.6.1"
resolved "https://registry.npmmirror.com/@floating-ui/core/-/core-0.6.1.tgz"
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 3fb82367f..fd7a481ba 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -20,6 +20,7 @@ onnxruntime==1.10.0
opencc
paddlenlp
paddlepaddle>=2.2.2
+paddlespeech_ctcdecoders
paddlespeech_feat
pandas
pathos == 0.2.8
@@ -27,8 +28,8 @@ pattern_singleton
Pillow>=9.0.0
praatio==5.0.0
prettytable
-pypinyin<=0.44.0
pypinyin-dict
+pypinyin<=0.44.0
python-dateutil
pyworld==0.2.12
recommonmark>=0.5.0
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst b/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
deleted file mode 100644
index d4f92a2ea..000000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.deploy.predict module
-=================================================
-
-.. automodule:: paddlespeech.cls.exps.panns.deploy.predict
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst b/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
index 4415c9330..369862ccf 100644
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
@@ -12,4 +12,3 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.cls.exps.panns.deploy.predict
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst b/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
deleted file mode 100644
index 6c39c2bc8..000000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.export\_model module
-================================================
-
-.. automodule:: paddlespeech.cls.exps.panns.export_model
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.predict.rst b/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
deleted file mode 100644
index 88cd40338..000000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.predict module
-==========================================
-
-.. automodule:: paddlespeech.cls.exps.panns.predict
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.rst b/docs/source/api/paddlespeech.cls.exps.panns.rst
index 6147b245e..72f30ba61 100644
--- a/docs/source/api/paddlespeech.cls.exps.panns.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.rst
@@ -20,6 +20,3 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.cls.exps.panns.export_model
- paddlespeech.cls.exps.panns.predict
- paddlespeech.cls.exps.panns.train
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.train.rst b/docs/source/api/paddlespeech.cls.exps.panns.train.rst
deleted file mode 100644
index a89b7eecc..000000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.train.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.train module
-========================================
-
-.. automodule:: paddlespeech.cls.exps.panns.train
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst b/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
deleted file mode 100644
index 46a149b0b..000000000
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.kws.exps.mdtc.plot\_det\_curve module
-==================================================
-
-.. automodule:: paddlespeech.kws.exps.mdtc.plot_det_curve
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.kws.exps.mdtc.rst b/docs/source/api/paddlespeech.kws.exps.mdtc.rst
index f6cad64e3..33d4a55cd 100644
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.rst
+++ b/docs/source/api/paddlespeech.kws.exps.mdtc.rst
@@ -14,6 +14,5 @@ Submodules
paddlespeech.kws.exps.mdtc.collate
paddlespeech.kws.exps.mdtc.compute_det
- paddlespeech.kws.exps.mdtc.plot_det_curve
paddlespeech.kws.exps.mdtc.score
paddlespeech.kws.exps.mdtc.train
diff --git a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
index 8093619b1..dfcd274ca 100644
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
@@ -13,5 +13,4 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.decoders.ctcdecoder.decoders_deprecated
- paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
paddlespeech.s2t.decoders.ctcdecoder.swig_wrapper
diff --git a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
deleted file mode 100644
index 1079d6721..000000000
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.ctcdecoder.scorer\_deprecated module
-==============================================================
-
-.. automodule:: paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst b/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
deleted file mode 100644
index 4952e2e6a..000000000
--- a/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.recog\_bin module
-===========================================
-
-.. automodule:: paddlespeech.s2t.decoders.recog_bin
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.decoders.rst b/docs/source/api/paddlespeech.s2t.decoders.rst
index e4eabedfd..53e0d9c49 100644
--- a/docs/source/api/paddlespeech.s2t.decoders.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.rst
@@ -23,5 +23,4 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.decoders.recog
- paddlespeech.s2t.decoders.recog_bin
paddlespeech.s2t.decoders.utils
diff --git a/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst b/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
deleted file mode 100644
index f38a61099..000000000
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.scorers.ngram module
-==============================================
-
-.. automodule:: paddlespeech.s2t.decoders.scorers.ngram
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.decoders.scorers.rst b/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
index 83808c49b..ca834f6b5 100644
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
@@ -15,5 +15,4 @@ Submodules
paddlespeech.s2t.decoders.scorers.ctc
paddlespeech.s2t.decoders.scorers.ctc_prefix_score
paddlespeech.s2t.decoders.scorers.length_bonus
- paddlespeech.s2t.decoders.scorers.ngram
paddlespeech.s2t.decoders.scorers.scorer_interface
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
deleted file mode 100644
index a73a56853..000000000
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.client module
-==========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
deleted file mode 100644
index bc1078485..000000000
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.record module
-==========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
index d1f966fc1..28de0f7fb 100644
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
@@ -12,8 +12,5 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
- paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
paddlespeech.s2t.exps.deepspeech2.bin.deploy.runtime
- paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
paddlespeech.s2t.exps.deepspeech2.bin.deploy.server
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
deleted file mode 100644
index ba1ae0a62..000000000
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.send module
-========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2.rst b/docs/source/api/paddlespeech.s2t.exps.u2.rst
index e0ebb7fc9..bf5656701 100644
--- a/docs/source/api/paddlespeech.s2t.exps.u2.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2.rst
@@ -21,4 +21,3 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.exps.u2.model
- paddlespeech.s2t.exps.u2.trainer
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst b/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
deleted file mode 100644
index 0cd28945a..000000000
--- a/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.u2.trainer module
-=======================================
-
-.. automodule:: paddlespeech.s2t.exps.u2.trainer
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
deleted file mode 100644
index bc749c8f8..000000000
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.u2\_kaldi.bin.recog module
-================================================
-
-.. automodule:: paddlespeech.s2t.exps.u2_kaldi.bin.recog
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
index ff1a6efee..087b87677 100644
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
@@ -12,6 +12,5 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.s2t.exps.u2_kaldi.bin.recog
paddlespeech.s2t.exps.u2_kaldi.bin.test
paddlespeech.s2t.exps.u2_kaldi.bin.train
diff --git a/docs/source/api/paddlespeech.s2t.training.extensions.rst b/docs/source/api/paddlespeech.s2t.training.extensions.rst
index f31b8427e..13530a8d2 100644
--- a/docs/source/api/paddlespeech.s2t.training.extensions.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.rst
@@ -15,5 +15,3 @@ Submodules
paddlespeech.s2t.training.extensions.evaluator
paddlespeech.s2t.training.extensions.extension
paddlespeech.s2t.training.extensions.plot
- paddlespeech.s2t.training.extensions.snapshot
- paddlespeech.s2t.training.extensions.visualizer
diff --git a/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst b/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
deleted file mode 100644
index e0ca21a73..000000000
--- a/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.training.extensions.snapshot module
-====================================================
-
-.. automodule:: paddlespeech.s2t.training.extensions.snapshot
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst b/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
deleted file mode 100644
index 22ae11f11..000000000
--- a/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.training.extensions.visualizer module
-======================================================
-
-.. automodule:: paddlespeech.s2t.training.extensions.visualizer
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.training.updaters.rst b/docs/source/api/paddlespeech.s2t.training.updaters.rst
index a06170168..b38704a0d 100644
--- a/docs/source/api/paddlespeech.s2t.training.updaters.rst
+++ b/docs/source/api/paddlespeech.s2t.training.updaters.rst
@@ -13,5 +13,4 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.training.updaters.standard_updater
- paddlespeech.s2t.training.updaters.trainer
paddlespeech.s2t.training.updaters.updater
diff --git a/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst b/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
deleted file mode 100644
index 6981a8f05..000000000
--- a/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.training.updaters.trainer module
-=================================================
-
-.. automodule:: paddlespeech.s2t.training.updaters.trainer
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst b/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
deleted file mode 100644
index 5007fd9d8..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.add\_deltas module
-=============================================
-
-.. automodule:: paddlespeech.s2t.transform.add_deltas
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst b/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
deleted file mode 100644
index e08dd253e..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.channel\_selector module
-===================================================
-
-.. automodule:: paddlespeech.s2t.transform.channel_selector
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.cmvn.rst b/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
deleted file mode 100644
index 8348e3d4b..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.cmvn module
-======================================
-
-.. automodule:: paddlespeech.s2t.transform.cmvn
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.functional.rst b/docs/source/api/paddlespeech.s2t.transform.functional.rst
deleted file mode 100644
index eb2b54a67..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.functional.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.functional module
-============================================
-
-.. automodule:: paddlespeech.s2t.transform.functional
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.perturb.rst b/docs/source/api/paddlespeech.s2t.transform.perturb.rst
deleted file mode 100644
index 0be28ab7e..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.perturb.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.perturb module
-=========================================
-
-.. automodule:: paddlespeech.s2t.transform.perturb
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.rst b/docs/source/api/paddlespeech.s2t.transform.rst
deleted file mode 100644
index 5016ff4f1..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.rst
+++ /dev/null
@@ -1,24 +0,0 @@
-paddlespeech.s2t.transform package
-==================================
-
-.. automodule:: paddlespeech.s2t.transform
- :members:
- :undoc-members:
- :show-inheritance:
-
-Submodules
-----------
-
-.. toctree::
- :maxdepth: 4
-
- paddlespeech.s2t.transform.add_deltas
- paddlespeech.s2t.transform.channel_selector
- paddlespeech.s2t.transform.cmvn
- paddlespeech.s2t.transform.functional
- paddlespeech.s2t.transform.perturb
- paddlespeech.s2t.transform.spec_augment
- paddlespeech.s2t.transform.spectrogram
- paddlespeech.s2t.transform.transform_interface
- paddlespeech.s2t.transform.transformation
- paddlespeech.s2t.transform.wpe
diff --git a/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst b/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
deleted file mode 100644
index 00fd3ea12..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.spec\_augment module
-===============================================
-
-.. automodule:: paddlespeech.s2t.transform.spec_augment
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst b/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
deleted file mode 100644
index 33c499a7a..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.spectrogram module
-=============================================
-
-.. automodule:: paddlespeech.s2t.transform.spectrogram
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst b/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
deleted file mode 100644
index 009b06589..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.transform\_interface module
-======================================================
-
-.. automodule:: paddlespeech.s2t.transform.transform_interface
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.transformation.rst b/docs/source/api/paddlespeech.s2t.transform.transformation.rst
deleted file mode 100644
index a03e731a5..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.transformation.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.transformation module
-================================================
-
-.. automodule:: paddlespeech.s2t.transform.transformation
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.wpe.rst b/docs/source/api/paddlespeech.s2t.transform.wpe.rst
deleted file mode 100644
index c4831f7f9..000000000
--- a/docs/source/api/paddlespeech.s2t.transform.wpe.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.wpe module
-=====================================
-
-.. automodule:: paddlespeech.s2t.transform.wpe
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst b/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
deleted file mode 100644
index 9b61633e0..000000000
--- a/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.server.engine.acs.python.acs\_engine module
-========================================================
-
-.. automodule:: paddlespeech.server.engine.acs.python.acs_engine
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.server.engine.acs.python.rst b/docs/source/api/paddlespeech.server.engine.acs.python.rst
index 3c06ba080..7e5582bd0 100644
--- a/docs/source/api/paddlespeech.server.engine.acs.python.rst
+++ b/docs/source/api/paddlespeech.server.engine.acs.python.rst
@@ -12,4 +12,3 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.server.engine.acs.python.acs_engine
diff --git a/docs/source/api/paddlespeech.server.utils.log.rst b/docs/source/api/paddlespeech.server.utils.log.rst
deleted file mode 100644
index 453b4a61f..000000000
--- a/docs/source/api/paddlespeech.server.utils.log.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.server.utils.log module
-====================================
-
-.. automodule:: paddlespeech.server.utils.log
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.exps.rst b/docs/source/api/paddlespeech.t2s.exps.rst
index bee18a972..643f97b4c 100644
--- a/docs/source/api/paddlespeech.t2s.exps.rst
+++ b/docs/source/api/paddlespeech.t2s.exps.rst
@@ -30,10 +30,10 @@ Submodules
paddlespeech.t2s.exps.inference
paddlespeech.t2s.exps.inference_streaming
paddlespeech.t2s.exps.ort_predict
paddlespeech.t2s.exps.ort_predict_e2e
paddlespeech.t2s.exps.ort_predict_streaming
- paddlespeech.t2s.exps.stream_play_tts
paddlespeech.t2s.exps.syn_utils
paddlespeech.t2s.exps.synthesize
paddlespeech.t2s.exps.synthesize_e2e
diff --git a/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst b/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
deleted file mode 100644
index cb22dde0c..000000000
--- a/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.exps.stream\_play\_tts module
-==============================================
-
-.. automodule:: paddlespeech.t2s.exps.stream_play_tts
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst b/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
deleted file mode 100644
index f0e8fd11a..000000000
--- a/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.models.ernie\_sat.mlm module
-=============================================
-
-.. automodule:: paddlespeech.t2s.models.ernie_sat.mlm
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
deleted file mode 100644
index 7aaba7952..000000000
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align.core module
-=========================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.core
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
deleted file mode 100644
index 25c819a7e..000000000
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align package
-=====================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align
- :members:
- :undoc-members:
- :show-inheritance:
-
-Submodules
-----------
-
-.. toctree::
- :maxdepth: 4
-
- paddlespeech.t2s.models.vits.monotonic_align.core
- paddlespeech.t2s.models.vits.monotonic_align.setup
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
deleted file mode 100644
index a93c3b8bf..000000000
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align.setup module
-==========================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.setup
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.rst b/docs/source/api/paddlespeech.t2s.models.vits.rst
index 3146094b0..205496f0f 100644
--- a/docs/source/api/paddlespeech.t2s.models.vits.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.rst
@@ -12,7 +12,6 @@ Subpackages
.. toctree::
:maxdepth: 4
- paddlespeech.t2s.models.vits.monotonic_align
paddlespeech.t2s.models.vits.wavenet
Submodules
diff --git a/docs/source/tts/demo.rst b/docs/source/tts/demo.rst
index ca2fd98e4..1ae687f85 100644
--- a/docs/source/tts/demo.rst
+++ b/docs/source/tts/demo.rst
@@ [audio-sample hunks elided] @@
[Extraction stripped the raw-HTML audio players from docs/source/tts/demo.rst, so every hunk in this file reduced to an empty "-"/"+" pair: only the audio source lines changed, the English and Chinese sample sentences shown in the hunk context are unchanged. The affected tables, in order, are: "Audio samples generated from ground-truth spectrograms with a vocoder", "Audio samples generated by a TTS system" (including a "FastSpeech2-Conformer + ParallelWaveGAN" row), the Multi-Speaker TTS demos, the FastSpeech2 duration-control samples (the normal audios are in the second column of the previous table), and the "FastSpeech2 + ParallelWaveGAN" Chinese samples.]
@@ -1735,4 +1735,142 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
\ No newline at end of file
+
+Finetune FastSpeech2 for CSMSC
+--------------------------------------
+
+Finetuning demos of `tts_finetune/tts3 `_ on the CSMSC dataset.
+
+When finetuning on CSMSC, we found ``Freeze encoder`` > ``Non Frozen`` > ``Freeze encoder && duration_predictor`` in terms of audio quality.
+
+.. raw:: html
+
+   [table elided: the raw-HTML audio players were stripped in extraction; each cell of the original table embeds one synthesized sample]
+
+   CSMSC reference audio (fastspeech2_csmsc + hifigan_aishell3 in CLI): 欢迎使用飞桨语音套件。
+
+   Columns: train_num=10, bs=10, epoch=100, lr=1e-4 | train_num=18, bs=18, epoch=100, lr=1e-4 | train_num=97, bs=64, epoch=100, lr=1e-4 | train_num=196, bs=64, epoch=100, lr=1e-4
+   Rows (Frozen Method): Non Frozen | Freeze encoder | Freeze encoder && duration_predictor
+
+
diff --git a/docs/source/tts/demo_2.rst b/docs/source/tts/demo_2.rst
index 2f0ca7cdb..06d0d0399 100644
--- a/docs/source/tts/demo_2.rst
+++ b/docs/source/tts/demo_2.rst
@@ [audio-sample hunks elided] @@
[Extraction stripped the raw-HTML audio players from docs/source/tts/demo_2.rst as well; every hunk reduced to an empty "-"/"+" pair. The table "FastSpeech2 + Parallel WaveGAN in CSMSC" pairs text-normalization inputs (dates, times, phone numbers, prices, fractions, percentages, e.g. 早上好,今天是2020/10/29,最低温度是-3°C。) with their synthesized audio; only the audio source lines changed.]