diff --git a/demos/speech_web/.gitignore b/demos/speech_web/.gitignore
index 54418e60..1e961a38 100644
--- a/demos/speech_web/.gitignore
+++ b/demos/speech_web/.gitignore
@@ -13,4 +13,7 @@
*.pdmodel
*/source/*
*/PaddleSpeech/*
+*/tmp*/*
+*/durations.txt
+*/oov_info.txt
diff --git a/demos/speech_web/README.md b/demos/speech_web/README.md
index 3b2da6e9..e8c59ea8 100644
--- a/demos/speech_web/README.md
+++ b/demos/speech_web/README.md
@@ -1,55 +1,79 @@
# Paddle Speech Demo
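-PaddleSpeechDemo is a demo project built around the voice-interaction features of PaddleSpeech. It is meant to help you get started with PaddleSpeech and build your own applications with it.
+## Introduction
+Paddle Speech Demo is a demo project built around the voice-interaction features of PaddleSpeech. It is meant to help you get started with PaddleSpeech and build your own applications with it.
-Voice interaction is handled by PaddleSpeech, dialogue and information extraction by PaddleNLP, and the web front end is developed with Vue3
+Voice interaction is handled by PaddleSpeech, dialogue and information extraction by PaddleNLP, and the web front end is developed with Vue3.
Main features:
+Features in `main.py`: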
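+ Voice chat: PaddleSpeech speech recognition plus speech synthesis, with the dialogue part based on PaddleNLP's chit-chat feature
+ Speaker recognition: a showcase of PaddleSpeech's voiceprint recognition
+ Speech recognition: three modes are supported: real-time streaming recognition, end-to-end recognition, and audio file recognition
+ Speech synthesis: two modes are supported: streaming synthesis and end-to-end synthesis
+ Voice commands: combines PaddleSpeech speech recognition with PaddleNLP information extraction to automate travel-expense reimbursement
+Features in `vc.py`:
++ One-sentence voice cloning: synthesis based on the GE2E and ECAPA-TDNN models that imitates the timbre of a single input recording
+ + For the GE2E voice cloning recipe, see: [【FastSpeech2 + AISHELL-3 Voice Cloning】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc1)
+ + For the ECAPA-TDNN voice cloning recipe, see: [【FastSpeech2 + AISHELL-3 Voice Cloning (ECAPA-TDNN)】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc2)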
+
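++ Few-shot fine-tuning: fine-tuning on a small dataset, with a built-in example that fine-tunes on 12 utterances of the Biaobei (CSMSC) Chinese female voice. You can also reset with one click and record your own voice; recording in a quiet environment gives better results. To fine-tune on your own dataset, see [【Finetune your own AM based on FastSpeech2 with AISHELL-3】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/tts_finetune/tts3).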
+
++ ERNIE-SAT: a visual demo of ERNIE-SAT, a large cross-modal language-speech model. It supports personalized synthesis, cross-lingual speech synthesis (for a Chinese recording, enter English text to synthesize), and speech editing (modifying words in the middle of the recording's transcript). For more implementation details on ERNIE-SAT, see:
+ + [【ERNIE-SAT with AISHELL-3 dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/ernie_sat)
+ + [【ERNIE-SAT with AISHELL3 and VCTK datasets】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat)
+ + [【ERNIE-SAT with VCTK dataset】](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/ernie_sat)
+
Screenshot:
- ![Demo](docs/效果展示.png)
+ ![Demo](https://user-images.githubusercontent.com/30135920/191188766-12e7ca15-f7b4-45f8-9da5-0c0b0bbe5fcb.png)
-## Installation
-### Backend environment setup
-```
-# Install the environment
-cd speech_server
-pip install -r requirements.txt
+## 基础环境安装
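-# Download the IE model. It is fine-tuned on place names and works better; if it is not downloaded, another version is used, which does not perform as well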
-cd source
-mkdir model
-cd model
-wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
+### Backend environment setup
+```bash
+cd speech_server
+pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
+cd ../
```
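### Frontend environment setup
-
The front end depends on `node.js`, which must be installed beforehand; make sure `npm` is available (tested with `npm` `8.3.1`). We recommend downloading the stable release of `node.js` from the [official website](https://nodejs.org/en/).
-```
+```bash
# Enter the front-end directory
cd web_client
-
# Install `yarn` (skip if it is already installed)
npm install -g yarn
-
# Install the front-end dependencies with yarn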
yarn install
+cd ../
```
+
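## Start the services
+**Note:** currently only one backend service can run at a time: start either `main.py` or `vc.py`.
+
+### Start the `main.py` backend service
+
+#### Download the models
+
+Only the model required for voice commands has to be downloaded manually; the other models are downloaded automatically.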
-### Start the backend service
+```bash
+cd speech_server
+mkdir -p source/model
+cd source/model
+# Download the IE model
+wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
+cd ../../
+
+```
+#### Start the backend service
```
cd speech_server
@@ -57,7 +81,91 @@ cd speech_server
python main.py --port 8010
```
-### Start the frontend service
+
+### Start the `vc.py` backend service
+
+#### Download the models and audio
+
+```bash
+cd speech_server
+
+# Skip if it already exists
+mkdir -p source/model
+cd source
+# Download and unzip the wav files (includes the VC test audio)
+wget https://paddlespeech.bj.bcebos.com/demos/speech_web/wav_vc.zip
+unzip wav_vc.zip
+
+cd model
+# Download the GE2E models
+wget https://bj.bcebos.com/paddlespeech/Parakeet/released_models/ge2e/ge2e_ckpt_0.3.zip
+unzip ge2e_ckpt_0.3.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip
+unzip pwg_aishell3_ckpt_0.5.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
+unzip fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
+
+# Download the ECAPA-TDNN models
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
+unzip fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
+
+# Download the ERNIE-SAT models
+# aishell3 ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_ckpt_1.2.0.zip
+unzip erniesat_aishell3_ckpt_1.2.0.zip
+
+# vctk ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_vctk_ckpt_1.2.0.zip
+unzip erniesat_vctk_ckpt_1.2.0.zip
+
+# aishell3_vctk ERNIE-SAT
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_vctk_ckpt_1.2.0.zip
+unzip erniesat_aishell3_vctk_ckpt_1.2.0.zip
+
+# Download the fine-tuning model
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip
+unzip fastspeech2_aishell3_ckpt_1.1.0.zip
+
+# Download the vocoders
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
+unzip hifigan_aishell3_ckpt_0.2.0.zip
+wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip
+unzip hifigan_vctk_ckpt_0.2.0.zip
+
+cd ../../../
+```
+
+#### ERNIE-SAT environment setup
+
+The ERNIE-SAT demo depends on the environment of [examples/aishell3_vctk/ernie_sat](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat). Follow the `README.md` under `examples/aishell3_vctk/ernie_sat` and make sure the example code in its `run.sh` works.
+
+Once `examples/aishell3_vctk/ernie_sat` runs successfully, return to this directory and set up the environment by linking its `download` and `tools` directories:
+```bash
+cd speech_server
+ln -snf ../../../examples/aishell3_vctk/ernie_sat/download .
+ln -snf ../../../examples/aishell3_vctk/ernie_sat/tools .
+cd ../
+```
+
+#### Fine-tuning environment setup
+
+Fine-tuning requires unzipping `aishell3_model.zip` in `tools/aligner`, because the fine-tuning process uses the `tools/aligner/aishell3_model/meta.yaml` file.
+
+```bash
+cd speech_server/tools/aligner
+unzip aishell3_model.zip
+cd -
+```
+
+#### Start the backend service
+
+```bash
+cd speech_server
+# Defaults to port 8010
+python vc.py --port 8010
+```
+
+### Start the frontend service
```
cd web_client
@@ -65,6 +173,9 @@ yarn dev --port 8011
```
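By default, the front end is configured to reach the backend at localhost, so make sure the backend server and the browser that opens the page run on the same machine. If they are on different machines, see the FAQ below: "How do I change the backend address if it is deployed on another machine or port?"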
+
+
+
## FAQ
#### Q: How do I install node.js?
@@ -75,7 +186,7 @@ A: node.js的安装可以参考[【菜鸟教程】](https://www.runoob.com/nod
A: The backend address is configured in two separate files.
-Modify the first file, `PaddleSpeechWebClient/vite.config.js`
+Modify the first file, `./web_client/vite.config.js`
```
server: {
@@ -90,7 +201,7 @@ server: {
}
```
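-Modify the second file, `PaddleSpeechWebClient/src/api/API.js` (the WebSocket proxy could not be configured, so the address has to be changed in this file)
+Modify the second file, `./web_client/src/api/API.js` (the WebSocket proxy could not be configured, so the address has to be changed in this file)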
```
// websocket (change this to the backend's address)
diff --git a/demos/speech_web/docs/效果展示.png b/demos/speech_web/docs/效果展示.png
deleted file mode 100644
index 5f7997c1..00000000
Binary files a/demos/speech_web/docs/效果展示.png and /dev/null differ
diff --git a/examples/other/tts_finetune/tts3/finetune.yaml b/demos/speech_web/speech_server/conf/tts3_finetune.yaml
similarity index 86%
rename from examples/other/tts_finetune/tts3/finetune.yaml
rename to demos/speech_web/speech_server/conf/tts3_finetune.yaml
index 374a69f3..4f708bd7 100644
--- a/examples/other/tts_finetune/tts3/finetune.yaml
+++ b/demos/speech_web/speech_server/conf/tts3_finetune.yaml
@@ -3,10 +3,10 @@
###########################################################
# Set to -1 to indicate that the parameter is the same as the pretrained model configuration
-batch_size: -1
+batch_size: 10
learning_rate: 0.0001 # learning rate
num_snapshots: -1
# frozen_layers should be a list
# if you don't need to freeze, set frozen_layers to []
-frozen_layers: ["encoder", "duration_predictor"]
+frozen_layers: ["encoder"]
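In `tts3_finetune.yaml`, `-1` is a sentinel meaning "use the same value as the pretrained model configuration". The sketch below illustrates that documented rule on plain dicts. It assumes both YAML files have already been loaded, and the helper `resolve_finetune_config` is hypothetical, not the code that PaddleSpeech's fine-tuning script actually runs:

```python
from typing import Any, Dict


def resolve_finetune_config(finetune: Dict[str, Any],
                            pretrained: Dict[str, Any]) -> Dict[str, Any]:
    """Return the fine-tune values, falling back to pretrained values where -1 is set."""
    resolved = {}
    for key, value in finetune.items():
        # -1 is the sentinel for "same as the pretrained model configuration"
        if value == -1 and key in pretrained:
            resolved[key] = pretrained[key]
        else:
            resolved[key] = value
    return resolved


# Usage with values mirroring tts3_finetune.yaml (pretrained values are made up)
finetune_cfg = {"batch_size": 10, "learning_rate": 0.0001,
                "num_snapshots": -1, "frozen_layers": ["encoder"]}
pretrained_cfg = {"batch_size": 64, "num_snapshots": 5}
resolved = resolve_finetune_config(finetune_cfg, pretrained_cfg)
```

With these inputs, `batch_size` keeps its explicit value `10`, while `num_snapshots` falls back to the pretrained value.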
diff --git a/demos/speech_web/speech_server/main.py b/demos/speech_web/speech_server/main.py
index d4750d59..03e7e599 100644
--- a/demos/speech_web/speech_server/main.py
+++ b/demos/speech_web/speech_server/main.py
@@ -1,8 +1,3 @@
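-# todo:
-# 1. Start the service
-# 2. Receive recorded audio and return the recognition result
-# 3. Receive the ASR result and return the NLP dialogue response
-# 4. Receive the NLP dialogue response and return TTS audio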
import argparse
import base64
import datetime
@@ -32,6 +27,7 @@ from starlette.requests import Request
from starlette.responses import FileResponse
from starlette.websockets import WebSocketState as WebSocketState
+from paddlespeech.cli.tts.infer import TTSExecutor
from paddlespeech.server.engine.asr.online.python.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.utils.audio_process import float2pcm
@@ -55,7 +51,7 @@ asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
asr_init_path = "source/demo/demo.wav"
db_path = "source/db/vpr.sqlite"
ie_model_path = "source/model"
-
+tts_model = TTSExecutor()
# Path configuration
UPLOAD_PATH = "source/vpr"
WAV_PATH = "source/wav"
@@ -72,6 +68,14 @@ manager = ConnectionManager()
aumanager = AudioMannger(chatbot)
aumanager.init()
vpr = VPR(db_path, dim=192, top_k=5)
+# Warm-up call at startup; it also triggers the automatic model download
+tts_model(
+    text="今天天气真不错",
+    output="test.wav",
+    am='fastspeech2_mix',
+    spk_id=174,
+    voc='hifigan_csmsc',
+    lang='mix', )
# Service configuration
@@ -331,6 +335,7 @@ async def ieOffline(nlp_base: NlpBase):
#####################################################################
+# End-to-end (non-streaming) synthesis
@app.post("/tts/offline")
async def text2speechOffline(tts_base: TtsBase):
text = tts_base.text
@@ -340,8 +345,14 @@ async def text2speechOffline(tts_base: TtsBase):
now_name = "tts_" + datetime.datetime.strftime(
datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
out_file_path = os.path.join(WAV_PATH, now_name)
-    # Save to a file first, then base64-encode it for transfer
- chatbot.text2speech(text, outpath=out_file_path)
+    # Use the mixed Chinese-English TTS CLI executor
+ tts_model(
+ text=text,
+ output=out_file_path,
+ am='fastspeech2_mix',
+ spk_id=174,
+ voc='hifigan_csmsc',
+ lang='mix')
with open(out_file_path, "rb") as f:
data_bin = f.read()
base_str = base64.b64encode(data_bin)
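`text2speechOffline` reads the WAV file written by `TTSExecutor` and base64-encodes its bytes before sending them back. On the client side, the reverse step needs only the standard library. This sketch assumes the base64 string has already been extracted from the JSON response (the response layout is not shown in this diff); `decode_wav_base64` and `make_silent_wav` are hypothetical helpers, and the 24 kHz fixture is arbitrary:

```python
import base64
import io
import wave


def decode_wav_base64(b64_data):
    """Decode a base64-encoded WAV file and return (sample_rate, n_frames, raw_bytes)."""
    raw = base64.b64decode(b64_data)
    with wave.open(io.BytesIO(raw), "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnframes(), raw


def make_silent_wav(sample_rate=24000, seconds=0.1):
    """Build a short 16-bit mono WAV in memory (a stand-in for the server's file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buf.getvalue()


# Simulate the server side: base64-encode the file bytes, as main.py does
encoded = base64.b64encode(make_silent_wav())
sample_rate, n_frames, raw = decode_wav_base64(encoded)
```

Because the payload is a complete WAV file (header included), the decoded bytes can be written straight to disk or handed to any WAV reader.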
diff --git a/demos/speech_web/speech_server/requirements.txt b/demos/speech_web/speech_server/requirements.txt
index 607f0d4d..cdc65465 100644
--- a/demos/speech_web/speech_server/requirements.txt
+++ b/demos/speech_web/speech_server/requirements.txt
@@ -1,13 +1,8 @@
aiofiles
faiss-cpu
-fastapi
-librosa
-numpy
-paddlenlp
-paddlepaddle
-paddlespeech
+praatio==5.0.0
pydantic
-python-multipartscikit_learn
-SoundFile
+python-multipart
+scikit_learn
starlette
uvicorn
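The trimmed `requirements.txt` no longer lists `fastapi`, `librosa`, `SoundFile` or `paddlespeech`, although `vc.py` imports all of them; presumably they are expected to come from the PaddleSpeech installation. A quick environment check, sketched with `importlib.util.find_spec`; the helper name `missing_modules` and the `REQUIRED` list are assumptions for illustration:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by vc.py (the SoundFile package is imported as "soundfile")
REQUIRED = ["fastapi", "librosa", "soundfile", "uvicorn", "aiofiles",
            "paddlespeech"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", missing or "none")
```

`find_spec` only locates modules without importing them, so the check stays fast even for heavy packages such as `paddlespeech`.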
diff --git a/demos/speech_web/speech_server/src/ernie_sat.py b/demos/speech_web/speech_server/src/ernie_sat.py
new file mode 100644
index 00000000..b74dd8e3
--- /dev/null
+++ b/demos/speech_web/speech_server/src/ernie_sat.py
@@ -0,0 +1,195 @@
+import os
+
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class SAT:
+ def __init__(self):
+ # pretrain model path
+ self.zh_pretrain_model_path = os.path.realpath(
+ "source/model/erniesat_aishell3_ckpt_1.2.0")
+ self.en_pretrain_model_path = os.path.realpath(
+ "source/model/erniesat_vctk_ckpt_1.2.0")
+ self.cross_pretrain_model_path = os.path.realpath(
+ "source/model/erniesat_aishell3_vctk_ckpt_1.2.0")
+
+ self.zh_voc_model_path = os.path.realpath(
+ "source/model/hifigan_aishell3_ckpt_0.2.0")
+ self.eb_voc_model_path = os.path.realpath(
+ "source/model/hifigan_vctk_ckpt_0.2.0")
+ self.cross_voc_model_path = os.path.realpath(
+ "source/model/hifigan_aishell3_ckpt_0.2.0")
+
+ self.BIN_DIR = os.path.join(MAIN_ROOT,
+ "paddlespeech/t2s/exps/ernie_sat")
+
+ def zh_synthesize_edit(self,
+ old_str: str,
+ new_str: str,
+ input_name: os.PathLike,
+ output_name: os.PathLike,
+ task_name: str="synthesize",
+ erniesat_ckpt_name: str="snapshot_iter_289500.pdz"):
+
+ if task_name not in ['synthesize', 'edit']:
+ print("task name only in ['edit', 'synthesize']")
+ return None
+
+        # Inference file configuration
+ config_path = os.path.join(self.zh_pretrain_model_path, "default.yaml")
+ phones_dict = os.path.join(self.zh_pretrain_model_path,
+ "phone_id_map.txt")
+ erniesat_ckpt = os.path.join(self.zh_pretrain_model_path,
+ erniesat_ckpt_name)
+ erniesat_stat = os.path.join(self.zh_pretrain_model_path,
+ "speech_stats.npy")
+
+ voc = "hifigan_aishell3"
+ voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
+ voc_ckpt = os.path.join(self.zh_voc_model_path,
+ "snapshot_iter_2500000.pdz")
+ voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
+
+ cmd = self.get_cmd(
+ task_name=task_name,
+ input_name=input_name,
+ old_str=old_str,
+ new_str=new_str,
+ config_path=config_path,
+ phones_dict=phones_dict,
+ erniesat_ckpt=erniesat_ckpt,
+ erniesat_stat=erniesat_stat,
+ voc=voc,
+ voc_config=voc_config,
+ voc_ckpt=voc_ckpt,
+ voc_stat=voc_stat,
+ output_name=output_name,
+ source_lang="zh",
+ target_lang="zh")
+
+ return run_cmd(cmd, output_name)
+
+ def crossclone(self,
+ old_str: str,
+ new_str: str,
+ input_name: os.PathLike,
+ output_name: os.PathLike,
+ source_lang: str,
+ target_lang: str,
+ erniesat_ckpt_name: str="snapshot_iter_489000.pdz"):
+        # Inference file configuration
+ config_path = os.path.join(self.cross_pretrain_model_path,
+ "default.yaml")
+ phones_dict = os.path.join(self.cross_pretrain_model_path,
+ "phone_id_map.txt")
+ erniesat_ckpt = os.path.join(self.cross_pretrain_model_path,
+ erniesat_ckpt_name)
+ erniesat_stat = os.path.join(self.cross_pretrain_model_path,
+ "speech_stats.npy")
+
+ voc = "hifigan_aishell3"
+ voc_config = os.path.join(self.cross_voc_model_path, "default.yaml")
+ voc_ckpt = os.path.join(self.cross_voc_model_path,
+ "snapshot_iter_2500000.pdz")
+ voc_stat = os.path.join(self.cross_voc_model_path, "feats_stats.npy")
+ task_name = "synthesize"
+ cmd = self.get_cmd(
+ task_name=task_name,
+ input_name=input_name,
+ old_str=old_str,
+ new_str=new_str,
+ config_path=config_path,
+ phones_dict=phones_dict,
+ erniesat_ckpt=erniesat_ckpt,
+ erniesat_stat=erniesat_stat,
+ voc=voc,
+ voc_config=voc_config,
+ voc_ckpt=voc_ckpt,
+ voc_stat=voc_stat,
+ output_name=output_name,
+ source_lang=source_lang,
+ target_lang=target_lang)
+
+ return run_cmd(cmd, output_name)
+
+    def en_synthesize_edit(self,
+                           old_str: str,
+                           new_str: str,
+                           input_name: os.PathLike,
+                           output_name: os.PathLike,
+                           task_name: str="synthesize",
+                           erniesat_ckpt_name: str="snapshot_iter_199500.pdz"):
+
+        if task_name not in ['synthesize', 'edit']:
+            print("task name only in ['edit', 'synthesize']")
+            return None
+
+        # Inference file configuration
+        config_path = os.path.join(self.en_pretrain_model_path, "default.yaml")
+        phones_dict = os.path.join(self.en_pretrain_model_path,
+                                   "phone_id_map.txt")
+        erniesat_ckpt = os.path.join(self.en_pretrain_model_path,
+                                     erniesat_ckpt_name)
+        erniesat_stat = os.path.join(self.en_pretrain_model_path,
+                                     "speech_stats.npy")
+
+        # English (VCTK) model: use the VCTK HiFiGAN vocoder downloaded in the README
+        voc = "hifigan_vctk"
+        voc_config = os.path.join(self.eb_voc_model_path, "default.yaml")
+        voc_ckpt = os.path.join(self.eb_voc_model_path,
+                                "snapshot_iter_2500000.pdz")
+        voc_stat = os.path.join(self.eb_voc_model_path, "feats_stats.npy")
+
+ cmd = self.get_cmd(
+ task_name=task_name,
+ input_name=input_name,
+ old_str=old_str,
+ new_str=new_str,
+ config_path=config_path,
+ phones_dict=phones_dict,
+ erniesat_ckpt=erniesat_ckpt,
+ erniesat_stat=erniesat_stat,
+ voc=voc,
+ voc_config=voc_config,
+ voc_ckpt=voc_ckpt,
+ voc_stat=voc_stat,
+ output_name=output_name,
+ source_lang="en",
+ target_lang="en")
+
+ return run_cmd(cmd, output_name)
+
+ def get_cmd(self,
+ task_name: str,
+ input_name: str,
+ old_str: str,
+ new_str: str,
+ config_path: str,
+ phones_dict: str,
+ erniesat_ckpt: str,
+ erniesat_stat: str,
+ voc: str,
+ voc_config: str,
+ voc_ckpt: str,
+ voc_stat: str,
+ output_name: str,
+ source_lang: str,
+ target_lang: str):
+ cmd = f"""
+ FLAGS_allocator_strategy=naive_best_fit \
+ FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+ python3 {self.BIN_DIR}/synthesize_e2e.py \
+ --task_name={task_name} \
+ --wav_path={input_name} \
+ --old_str='{old_str}' \
+ --new_str='{new_str}' \
+ --source_lang={source_lang} \
+ --target_lang={target_lang} \
+ --erniesat_config={config_path} \
+ --phones_dict={phones_dict} \
+ --erniesat_ckpt={erniesat_ckpt} \
+ --erniesat_stat={erniesat_stat} \
+ --voc={voc} \
+ --voc_config={voc_config} \
+ --voc_ckpt={voc_ckpt} \
+ --voc_stat={voc_stat} \
+ --output_name={output_name}
+ """
+
+ return cmd
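`get_cmd` interpolates `old_str` and `new_str` into a shell string wrapped in single quotes, so a transcript that itself contains an apostrophe (common in English, e.g. "it's") would break the command. One way around this, sketched under the assumption that the flags stay exactly as in `get_cmd`, is to build an argument list and pass the `FLAGS_*` settings through the environment. `build_sat_argv` and `sat_env` are hypothetical helpers, `sys.executable` replaces the hard-coded `python3`, and `shlex.join` (Python 3.8+) is used only to print a copy-pasteable command:

```python
import os
import shlex
import sys
from typing import Dict, List


def build_sat_argv(bin_dir: str, **flags: str) -> List[str]:
    """Build an argv list for synthesize_e2e.py; each value stays a single argument."""
    argv = [sys.executable, os.path.join(bin_dir, "synthesize_e2e.py")]
    for name, value in flags.items():
        argv.append(f"--{name}={value}")
    return argv


def sat_env() -> Dict[str, str]:
    """Copy the current environment and add the Paddle allocator flags used by get_cmd."""
    env = dict(os.environ)
    env["FLAGS_allocator_strategy"] = "naive_best_fit"
    env["FLAGS_fraction_of_gpu_memory_to_use"] = "0.01"
    return env


argv = build_sat_argv(
    "paddlespeech/t2s/exps/ernie_sat",
    task_name="edit",
    old_str="it's a test",
    new_str="it's a new test")
# shlex.join quotes every argument, so the apostrophes survive a shell round trip
printable = shlex.join(argv)
# To run it without a shell: subprocess.run(argv, env=sat_env(), check=True)
```

Passing a list to `subprocess.run` bypasses the shell entirely, so no quoting is needed at all; the printable form is only for logs.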
diff --git a/demos/speech_web/speech_server/src/finetune.py b/demos/speech_web/speech_server/src/finetune.py
new file mode 100644
index 00000000..d7a440f9
--- /dev/null
+++ b/demos/speech_web/speech_server/src/finetune.py
@@ -0,0 +1,125 @@
+import os
+
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
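+def find_max_ckpt(model_path):
+    """Return the largest N among the snapshot_iter_N.pdz files in model_path."""
+    max_ckpt = 0
+    for filename in os.listdir(model_path):
+        if filename.endswith('.pdz'):
+            # e.g. "snapshot_iter_96400.pdz" -> "96400"
+            it = filename[:-4].rsplit("_", 1)[-1]
+            if it.isdigit() and int(it) > max_ckpt:
+                max_ckpt = int(it)
+    return max_ckpt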
+
+
+class FineTune:
+ def __init__(self):
+ self.now_file_path = os.path.dirname(__file__)
+ self.PYTHONPATH = os.path.join(MAIN_ROOT,
+ "examples/other/tts_finetune/tts3")
+ self.BIN_DIR = os.path.join(MAIN_ROOT,
+ "paddlespeech/t2s/exps/fastspeech2")
+ self.pretrained_model_dir = os.path.realpath(
+ "source/model/fastspeech2_aishell3_ckpt_1.1.0")
+ self.voc_model_dir = os.path.realpath(
+ "source/model/hifigan_aishell3_ckpt_0.2.0")
+ self.finetune_config = os.path.join("conf/tts3_finetune.yaml")
+
+ def finetune(self, input_dir, exp_dir='temp', epoch=100):
+ """
+        Run the steps of examples/other/tts_finetune/tts3/run.sh as shell commands.
+ """
+ newdir_name = "newdir"
+ new_dir = os.path.join(input_dir, newdir_name)
+ mfa_dir = os.path.join(exp_dir, 'mfa_result')
+ dump_dir = os.path.join(exp_dir, 'dump')
+ output_dir = os.path.join(exp_dir, 'exp')
+ lang = "zh"
+ ngpu = 1
+
+ cmd = f"""
+ # check oov
+ python3 {self.PYTHONPATH}/local/check_oov.py \
+ --input_dir={input_dir} \
+ --pretrained_model_dir={self.pretrained_model_dir} \
+ --newdir_name={newdir_name} \
+ --lang={lang}
+
+ # get mfa result
+ python3 {self.PYTHONPATH}/local/get_mfa_result.py \
+ --input_dir={new_dir} \
+ --mfa_dir={mfa_dir} \
+ --lang={lang}
+
+ # generate durations.txt
+ python3 {self.PYTHONPATH}/local/generate_duration.py \
+ --mfa_dir={mfa_dir}
+
+ # extract feature
+ python3 {self.PYTHONPATH}/local/extract_feature.py \
+ --duration_file="./durations.txt" \
+ --input_dir={new_dir} \
+ --dump_dir={dump_dir} \
+ --pretrained_model_dir={self.pretrained_model_dir}
+
+ # create finetune env
+ python3 {self.PYTHONPATH}/local/prepare_env.py \
+ --pretrained_model_dir={self.pretrained_model_dir} \
+ --output_dir={output_dir}
+
+ # finetune
+ python3 {self.PYTHONPATH}/local/finetune.py \
+ --pretrained_model_dir={self.pretrained_model_dir} \
+ --dump_dir={dump_dir} \
+ --output_dir={output_dir} \
+ --ngpu={ngpu} \
+            --epoch={epoch} \
+ --finetune_config={self.finetune_config}
+ """
+
+ print(cmd)
+
+ return run_cmd(cmd, exp_dir)
+
+ def synthesize(self, text, wav_name, out_wav_dir, exp_dir='temp'):
+
+ voc = "hifigan_aishell3"
+ dump_dir = os.path.join(exp_dir, 'dump')
+ output_dir = os.path.join(exp_dir, 'exp')
+ text_path = os.path.join(exp_dir, 'sentences.txt')
+ lang = "zh"
+ ngpu = 1
+
+ model_path = f"{output_dir}/checkpoints"
+ ckpt = find_max_ckpt(model_path)
+
+        # Write the sentence to synthesize in "<utt_id> <text>" format
+ with open(text_path, "w", encoding='utf8') as f:
+ f.write(wav_name + " " + text)
+
+ cmd = f"""
+ FLAGS_allocator_strategy=naive_best_fit \
+ FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+ python3 {self.BIN_DIR}/../synthesize_e2e.py \
+ --am=fastspeech2_aishell3 \
+ --am_config={self.pretrained_model_dir}/default.yaml \
+ --am_ckpt={output_dir}/checkpoints/snapshot_iter_{ckpt}.pdz \
+ --am_stat={self.pretrained_model_dir}/speech_stats.npy \
+ --voc={voc} \
+ --voc_config={self.voc_model_dir}/default.yaml \
+ --voc_ckpt={self.voc_model_dir}/snapshot_iter_2500000.pdz \
+ --voc_stat={self.voc_model_dir}/feats_stats.npy \
+ --lang={lang} \
+ --text={text_path} \
+ --output_dir={out_wav_dir} \
+ --phones_dict={dump_dir}/phone_id_map.txt \
+ --speaker_dict={dump_dir}/speaker_id_map.txt \
+ --spk_id=0
+ """
+
+ out_path = os.path.join(out_wav_dir, f"{wav_name}.wav")
+
+ return run_cmd(cmd, out_path)
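`find_max_ckpt` returns `0` when the checkpoint directory holds no snapshot, so `synthesize` would then point `--am_ckpt` at a non-existent `snapshot_iter_0.pdz` and fail only inside the subprocess. A stricter variant, sketched here under the hypothetical name `find_latest_snapshot`, matches the `snapshot_iter_<N>.pdz` pattern explicitly, ignores other files, and returns `None` when nothing matches so the caller can fail early:

```python
import os
import re
from typing import Optional

# Matches e.g. "snapshot_iter_96400.pdz" and captures the iteration number
SNAPSHOT_RE = re.compile(r"^snapshot_iter_(\d+)\.pdz$")


def find_latest_snapshot(model_path: str) -> Optional[int]:
    """Return the highest N among snapshot_iter_N.pdz files, or None if there are none."""
    iterations = []
    for filename in os.listdir(model_path):
        match = SNAPSHOT_RE.match(filename)
        if match:
            iterations.append(int(match.group(1)))
    return max(iterations) if iterations else None
```

A caller could then check `if ckpt is None:` before building the synthesis command.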
diff --git a/demos/speech_web/speech_server/src/ge2e_clone.py b/demos/speech_web/speech_server/src/ge2e_clone.py
new file mode 100644
index 00000000..d90013b9
--- /dev/null
+++ b/demos/speech_web/speech_server/src/ge2e_clone.py
@@ -0,0 +1,57 @@
+import os
+import shutil
+
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class VoiceCloneGE2E():
+ def __init__(self):
+        # Root directory of the t2s experiment scripts
+ self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
+ # am
+ self.am = "fastspeech2_aishell3"
+ self.am_config = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/default.yaml"
+ self.am_ckpt = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/snapshot_iter_96400.pdz"
+ self.am_stat = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/speech_stats.npy"
+ self.phones_dict = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/phone_id_map.txt"
+ # voc
+ self.voc = "pwgan_aishell3"
+ self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
+ self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
+ self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
+ # ge2e
+ self.ge2e_params_path = "source/model/ge2e_ckpt_0.3/step-3000000.pdparams"
+
+ def vc(self, text, input_wav, out_wav):
+
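+        # The input wav must be copied alone into its own temporary directory
+        _, full_file_name = os.path.split(input_wav)
+        ref_audio_dir = os.path.realpath("tmp_dir/ge2e")
+        # Clear leftovers from earlier requests, then always recreate the directory;
+        # otherwise shutil.copy would write a file named like ref_audio_dir
+        if os.path.exists(ref_audio_dir):
+            shutil.rmtree(ref_audio_dir)
+        os.makedirs(ref_audio_dir, exist_ok=True)
+        shutil.copy(input_wav, ref_audio_dir)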
+
+ output_dir = os.path.dirname(out_wav)
+
+ cmd = f"""
+ python3 {self.BIN_DIR}/voice_cloning.py \
+ --am={self.am} \
+ --am_config={self.am_config} \
+ --am_ckpt={self.am_ckpt} \
+ --am_stat={self.am_stat} \
+ --voc={self.voc} \
+ --voc_config={self.voc_config} \
+ --voc_ckpt={self.voc_ckpt} \
+ --voc_stat={self.voc_stat} \
+ --ge2e_params_path={self.ge2e_params_path} \
+ --text="{text}" \
+ --input-dir={ref_audio_dir} \
+ --output-dir={output_dir} \
+ --phones-dict={self.phones_dict}
+ """
+
+ output_name = os.path.join(output_dir, full_file_name)
+ return run_cmd(cmd, output_name=output_name)
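The reference wav has to end up alone in a fresh directory. Note that deleting the directory with `shutil.rmtree` without recreating it makes the following `shutil.copy` write the wav to a *file* at the directory's path, which breaks every later request. A sketch of the reset-then-copy step as a standalone helper (the name `stage_reference_wav` is hypothetical) that always recreates the directory:

```python
import os
import shutil


def stage_reference_wav(input_wav: str, ref_audio_dir: str) -> str:
    """Copy input_wav into a freshly emptied ref_audio_dir and return the copy's path."""
    # Remove files left over from a previous request, then always recreate the directory
    if os.path.exists(ref_audio_dir):
        shutil.rmtree(ref_audio_dir)
    os.makedirs(ref_audio_dir)
    # ref_audio_dir now exists as a directory, so copy() places the file inside it
    return shutil.copy(input_wav, ref_audio_dir)
```

Calling it twice in a row leaves only the second wav in the directory, which is the state the cloning script expects.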
diff --git a/demos/speech_web/speech_server/src/tdnn_clone.py b/demos/speech_web/speech_server/src/tdnn_clone.py
new file mode 100644
index 00000000..c24b9b07
--- /dev/null
+++ b/demos/speech_web/speech_server/src/tdnn_clone.py
@@ -0,0 +1,54 @@
+import os
+import shutil
+
+from .util import MAIN_ROOT
+from .util import run_cmd
+
+
+class VoiceCloneTDNN():
+ def __init__(self):
+        # Root directory of the t2s experiment scripts
+ self.BIN_DIR = os.path.join(MAIN_ROOT, "paddlespeech/t2s/exps")
+
+ self.am = "fastspeech2_aishell3"
+ self.am_config = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/default.yaml"
+ self.am_ckpt = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/snapshot_iter_96400.pdz"
+ self.am_stat = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/speech_stats.npy"
+ self.phones_dict = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/phone_id_map.txt"
+ # voc
+ self.voc = "pwgan_aishell3"
+ self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
+ self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
+ self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
+
+ def vc(self, text, input_wav, out_wav):
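+        # The input wav must be copied alone into its own temporary directory
+        _, full_file_name = os.path.split(input_wav)
+        ref_audio_dir = os.path.realpath("tmp_dir/tdnn")
+        # Clear leftovers from earlier requests, then always recreate the directory;
+        # otherwise shutil.copy would write a file named like ref_audio_dir
+        if os.path.exists(ref_audio_dir):
+            shutil.rmtree(ref_audio_dir)
+        os.makedirs(ref_audio_dir, exist_ok=True)
+        shutil.copy(input_wav, ref_audio_dir)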
+
+ output_dir = os.path.dirname(out_wav)
+
+ cmd = f"""
+ python3 {self.BIN_DIR}/voice_cloning.py \
+ --am={self.am} \
+ --am_config={self.am_config} \
+ --am_ckpt={self.am_ckpt} \
+ --am_stat={self.am_stat} \
+ --voc={self.voc} \
+ --voc_config={self.voc_config} \
+ --voc_ckpt={self.voc_ckpt} \
+ --voc_stat={self.voc_stat} \
+ --text="{text}" \
+ --input-dir={ref_audio_dir} \
+ --output-dir={output_dir} \
+ --phones-dict={self.phones_dict} \
+ --use_ecapa=True
+ """
+
+ output_name = os.path.join(output_dir, full_file_name)
+ return run_cmd(cmd, output_name=output_name)
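`VoiceCloneGE2E` and `VoiceCloneTDNN` invoke the same `voice_cloning.py` script and differ only in their model paths and in how the speaker encoder is selected (`--ge2e_params_path` versus `--use_ecapa=True`). A sketch of how both could share one argument builder; `build_clone_argv` is hypothetical and the values in the usage lines are placeholders, while the flag names are copied from the two classes above:

```python
from typing import Dict, List, Optional


def build_clone_argv(script: str, common: Dict[str, str], text: str,
                     input_dir: str, output_dir: str,
                     ge2e_params_path: Optional[str] = None) -> List[str]:
    """Build voice_cloning.py arguments: GE2E if a params path is given, else ECAPA-TDNN."""
    argv = ["python3", script]
    # Shared acoustic-model / vocoder flags such as am, am_config, voc, phones-dict
    argv += [f"--{key}={value}" for key, value in common.items()]
    argv += [f"--text={text}", f"--input-dir={input_dir}",
             f"--output-dir={output_dir}"]
    if ge2e_params_path is not None:
        argv.append(f"--ge2e_params_path={ge2e_params_path}")
    else:
        argv.append("--use_ecapa=True")
    return argv


# Placeholder values for illustration only
common = {"am": "fastspeech2_aishell3", "voc": "pwgan_aishell3"}
ge2e_argv = build_clone_argv("voice_cloning.py", common, "你好", "in", "out",
                             ge2e_params_path="ge2e.pdparams")
tdnn_argv = build_clone_argv("voice_cloning.py", common, "你好", "in", "out")
```

Passed to `subprocess.run` as a list, `--text` may contain spaces or quotes without any shell escaping.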
diff --git a/demos/speech_web/speech_server/src/util.py b/demos/speech_web/speech_server/src/util.py
index 4a566b6e..a69e6c42 100644
--- a/demos/speech_web/speech_server/src/util.py
+++ b/demos/speech_web/speech_server/src/util.py
@@ -1,4 +1,9 @@
+import os
import random
+import subprocess
+
+NOW_FILE_PATH = os.path.dirname(__file__)
+MAIN_ROOT = os.path.realpath(os.path.join(NOW_FILE_PATH, "../../../../"))
def randName(n=5):
@@ -11,3 +16,20 @@ def SuccessRequest(result=None, message="ok"):
def ErrorRequest(result=None, message="error"):
return {"code": -1, "result": result, "message": message}
+
+
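+def run_cmd(cmd, output_name):
+    """Run cmd through the shell; return output_name if it succeeded and created the file, else None."""
+    print(cmd)
+    p = subprocess.Popen(cmd, shell=True)
+    res = p.wait()
+    print("Return code:", res)
+    if res == 0:
+        # The command succeeded
+        if os.path.exists(output_name):
+            return output_name
+        else:
+            # The expected output file was not created
+            return None
+    else:
+        # The command failed
+        return None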
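`run_cmd` treats a call as successful only if the shell exits with status 0 *and* the expected output file exists. The same contract can be expressed with `subprocess.run`. The minimal sketch below (hypothetical name `run_and_check`) keeps `shell=True` because the callers pass multi-line shell strings with environment-variable prefixes:

```python
import os
import subprocess
from typing import Optional


def run_and_check(cmd: str, output_name: str) -> Optional[str]:
    """Run cmd through the shell; return output_name only if it exited 0 and the file exists."""
    print(cmd)
    completed = subprocess.run(cmd, shell=True)
    print("Return code:", completed.returncode)
    if completed.returncode == 0 and os.path.exists(output_name):
        return output_name
    # Either the command failed or it did not produce the expected file
    return None
```

Checking the output file as well as the exit status catches scripts that exit cleanly but write nothing, which would otherwise surface later as a missing-file error in the API handler.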
diff --git a/demos/speech_web/speech_server/vc.py b/demos/speech_web/speech_server/vc.py
new file mode 100644
index 00000000..99e56b40
--- /dev/null
+++ b/demos/speech_web/speech_server/vc.py
@@ -0,0 +1,547 @@
+import argparse
+import base64
+import datetime
+import json
+import os
+from typing import List
+
+import aiofiles
+import librosa
+import soundfile as sf
+import uvicorn
+from fastapi import FastAPI
+from fastapi import UploadFile
+from pydantic import BaseModel
+from src.ernie_sat import SAT
+from src.finetune import FineTune
+from src.ge2e_clone import VoiceCloneGE2E
+from src.tdnn_clone import VoiceCloneTDNN
+from src.util import *
+from starlette.responses import FileResponse
+
+from paddlespeech.server.utils.audio_process import float2pcm
+
+# Parse command-line arguments
+parser = argparse.ArgumentParser(prog='PaddleSpeechDemo', add_help=True)
+
+parser.add_argument(
+ "--port",
+ action="store",
+ type=int,
+ help="port of the app",
+ default=8010,
+ required=False)
+
+args = parser.parse_args()
+port = args.port
+
+# Loading models here would interfere with fine-tuning, so fine-tuning runs as a shell command
+vc_model = VoiceCloneGE2E()
+vc_model_tdnn = VoiceCloneTDNN()
+
+sat_model = SAT()
+ft_model = FineTune()
+
+# Config files
+tts_config = "conf/tts_online_application.yaml"
+asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
+asr_init_path = "source/demo/demo.wav"
+db_path = "source/db/vc.sqlite"
+ie_model_path = "source/model"
+
+# Paths
+VC_UPLOAD_PATH = "source/wav/vc/upload"
+VC_OUT_PATH = "source/wav/vc/out"
+
+FT_UPLOAD_PATH = "source/wav/finetune/upload"
+FT_OUT_PATH = "source/wav/finetune/out"
+FT_LABEL_PATH = "source/wav/finetune/label.json"
+FT_LABEL_TXT_PATH = "source/wav/finetune/labels.txt"
+FT_DEFAULT_PATH = "source/wav/finetune/default"
+FT_EXP_BASE_PATH = "tmp_dir/finetune"
+
+SAT_UPLOAD_PATH = "source/wav/SAT/upload"
+SAT_OUT_PATH = "source/wav/SAT/out"
+SAT_LABEL_PATH = "source/wav/SAT/label.json"
+
+# Load saved SAT transcript labels
+if os.path.exists(SAT_LABEL_PATH):
+    with open(SAT_LABEL_PATH, "r", encoding='utf8') as f:
+        sat_label_dic = json.load(f)
+else:
+    sat_label_dic = {}
+
+# Load saved fine-tuning labels
+if os.path.exists(FT_LABEL_PATH):
+    with open(FT_LABEL_PATH, "r", encoding='utf8') as f:
+        ft_label_dic = json.load(f)
+else:
+    ft_label_dic = {}
+
+# Create the working directories
+base_sources = [
+ VC_UPLOAD_PATH,
+ VC_OUT_PATH,
+ FT_UPLOAD_PATH,
+ FT_OUT_PATH,
+ FT_DEFAULT_PATH,
+ SAT_UPLOAD_PATH,
+ SAT_OUT_PATH,
+]
+for path in base_sources:
+ os.makedirs(path, exist_ok=True)
+#####################################################################
+########################### App initialization ###############################
+#####################################################################
+app = FastAPI()
+
+######################################################################
+########################### Request schemas #################################
+#####################################################################
+
+
+# Request body models
+class VcBase(BaseModel):
+ wavName: str
+ wavPath: str
+
+
+class VcBaseText(BaseModel):
+ wavName: str
+ wavPath: str
+ text: str
+ func: str
+
+
+class VcBaseSAT(BaseModel):
+ old_str: str
+ new_str: str
+ language: str
+ function: str
+    wav: str  # path of the input wav (VcCloneSAT uses it as a file path)
+ filename: str
+
+
+class FTPath(BaseModel):
+ dataPath: str
+
+
+class VcBaseFT(BaseModel):
+    wav: str  # base64-encoded audio
+ filename: str
+ wav_path: str
+
+
+class VcBaseFTModel(BaseModel):
+ wav_path: str
+
+
+class VcBaseFTSyn(BaseModel):
+ exp_path: str
+ text: str
+
+
+######################################################################
+########################### File listing and saving #################################
+#####################################################################
+
+
+def getVCList(path):
+    VC_FileDict = []
+    # Collect the names of the wav files under the upload path
+ for root, dirs, files in os.walk(path, topdown=False):
+ for name in files:
+ # print(os.path.join(root, name))
+ VC_FileDict.append({'name': name, 'path': os.path.join(root, name)})
+ VC_FileDict = sorted(VC_FileDict, key=lambda x: x['name'], reverse=True)
+ return VC_FileDict
+
+
+async def saveFiles(files, SavePath):
+    right = 0
+    error = 0
+    error_info = "Failed files: "
+    for file in files:
+        try:
+            if 'blob' in file.filename:
+                out_file_path = os.path.join(
+                    SavePath,
+                    datetime.datetime.strftime(datetime.datetime.now(),
+                                               '%H%M') + randName(3) + ".wav")
+            else:
+                out_file_path = os.path.join(SavePath, file.filename)
+
+            print("Uploaded file:", out_file_path)
+            async with aiofiles.open(out_file_path, 'wb') as out_file:
+                content = await file.read()  # async read
+                await out_file.write(content)  # async write
+            # Re-encode the upload as a 16 kHz, 16-bit wav file
+            wav, sr = librosa.load(out_file_path, sr=16000)
+            sf.write(out_file_path, data=wav, samplerate=sr)
+            right += 1
+        except Exception as e:
+            error += 1
+            error_info = error_info + file.filename + " " + str(e) + "\n"
+            continue
+    return f"Uploaded: {right}, failed: {error}, reasons: {error_info}"
+
+
+# Download audio
+@app.post("/vc/download")
+async def VcDownload(base: VcBase):
+    if os.path.exists(base.wavPath):
+        return FileResponse(base.wavPath)
+    else:
+        return ErrorRequest(message="Download failed: the file does not exist")
+
+
+# Download audio as base64
+@app.post("/vc/download_base64")
+async def VcDownloadBase64(base: VcBase):
+    if os.path.exists(base.wavPath):
+        # Resample to 16 kHz and convert to raw 16-bit PCM (no wav header)
+        wav, sr = librosa.load(base.wavPath, sr=16000)
+        wav = float2pcm(wav)  # float32 to int16
+        wav_bytes = wav.tobytes()  # to bytes
+        wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
+        return SuccessRequest(result=wav_base64)
+    else:
+        return ErrorRequest(message="Playback failed: the file does not exist")
+
+
+######################################################################
+########################### VC service #################################
+#####################################################################
+
+
+# Upload files
+@app.post("/vc/upload")
+async def VcUpload(files: List[UploadFile]):
+    # Reuse saveFiles: save each upload and re-encode it as a 16 kHz, 16-bit wav
+    res = await saveFiles(files, VC_UPLOAD_PATH)
+    return SuccessRequest(result=res)
+
+
+# 获取文件列表
+@app.get("/vc/list")
+async def VcList():
+ res = getVCList(VC_UPLOAD_PATH)
+ return SuccessRequest(result=res)
+
+
+# 获取音频文件
+@app.post("/vc/file")
+async def VcFileGet(base: VcBase):
+ if os.path.exists(base.wavPath):
+ return FileResponse(base.wavPath)
+ else:
+ return ErrorRequest(result="获取文件失败")
+
+
+# 删除音频文件
+@app.post("/vc/del")
+async def VcFileDel(base: VcBase):
+ if os.path.exists(base.wavPath):
+ os.remove(base.wavPath)
+ return SuccessRequest(result="删除成功")
+ else:
+ return ErrorRequest(message="删除失败")
+
+
+# Voice cloning with GE2E or ECAPA-TDNN
+@app.post("/vc/clone_g2p")
+async def VcCloneG2P(base: VcBaseText):
+ if os.path.exists(base.wavPath):
+ try:
+ if base.func == 'ge2e':
+ wavName = base.wavName
+ wavPath = os.path.join(VC_OUT_PATH, wavName)
+ vc_model.vc(
+ text=base.text, input_wav=base.wavPath, out_wav=wavPath)
+ else:
+ wavName = base.wavName
+ wavPath = os.path.join(VC_OUT_PATH, wavName)
+ vc_model_tdnn.vc(
+ text=base.text, input_wav=base.wavPath, out_wav=wavPath)
+ res = {"wavName": wavName, "wavPath": wavPath}
+ return SuccessRequest(result=res)
+ except Exception as e:
+ print(e)
+ return ErrorRequest(message="克隆失败,合成过程报错")
+ else:
+ return ErrorRequest(message="克隆失败,音频不存在")
+
+
+######################################################################
+########################### SAT service #################################
+#####################################################################
+# Voice cloning with ERNIE-SAT
+@app.post("/vc/clone_sat")
+async def VcCloneSAT(base: VcBaseSAT):
+ # Refresh sat_label_dic and save it when the transcript is new or has changed
+ if base.filename not in sat_label_dic or sat_label_dic[
+ base.filename] != base.old_str:
+ sat_label_dic[base.filename] = base.old_str
+ with open(SAT_LABEL_PATH, "w", encoding='utf8') as f:
+ json.dump(sat_label_dic, f, ensure_ascii=False, indent=4)
+
+ input_file_path = base.wav
+
+ # Dispatch on language and task
+ if base.language == "zh":
+ # Chinese
+ if base.function == "synthesize":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_syn_zh_" + base.filename)
+ # Chinese personalized synthesis
+ sat_result = sat_model.zh_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="synthesize")
+ elif base.function == "edit":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_edit_zh_" + base.filename)
+ # Chinese speech editing
+ sat_result = sat_model.zh_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="edit")
+ elif base.function == "crossclone":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_cross_zh_" + base.filename)
+ # Cross-lingual synthesis: Chinese audio, English text
+ sat_result = sat_model.crossclone(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ source_lang="zh",
+ target_lang="en")
+ else:
+ return ErrorRequest(
+ message="请检查功能选项是否正确,仅支持:synthesize, edit, crossclone")
+ elif base.language == "en":
+ if base.function == "synthesize":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_syn_en_" + base.filename)
+ # English personalized synthesis
+ sat_result = sat_model.en_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="synthesize")
+ elif base.function == "edit":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_edit_en_" + base.filename)
+ # English speech editing
+ sat_result = sat_model.en_synthesize_edit(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ task_name="edit")
+ elif base.function == "crossclone":
+ output_file_path = os.path.join(SAT_OUT_PATH,
+ "sat_cross_en_" + base.filename)
+ # Cross-lingual synthesis: English audio, Chinese text
+ sat_result = sat_model.crossclone(
+ old_str=base.old_str,
+ new_str=base.new_str,
+ input_name=os.path.realpath(input_file_path),
+ output_name=os.path.realpath(output_file_path),
+ source_lang="en",
+ target_lang="zh")
+ else:
+ return ErrorRequest(
+ message="请检查功能选项是否正确,仅支持:synthesize, edit, crossclone")
+ else:
+ return ErrorRequest(message="请检查语言选项是否正确,仅支持中文和英文")
+
+ if sat_result:
+ return SuccessRequest(result=sat_result, message="SAT合成成功")
+ else:
+ return ErrorRequest(message="SAT 合成失败,请从后台检查错误信息!")
+
+
+# List SAT audio files with their transcripts
+@app.get("/sat/list")
+async def SatList():
+ res = []
+ filelist = getVCList(SAT_UPLOAD_PATH)
+ for fileitem in filelist:
+ if fileitem['name'] in sat_label_dic:
+ fileitem['label'] = sat_label_dic[fileitem['name']]
+ else:
+ fileitem['label'] = ""
+ res.append(fileitem)
+ return SuccessRequest(result=res)
+
+
+# Upload SAT audio files
+@app.post("/sat/upload")
+async def SATUpload(files: List[UploadFile]):
+ right = 0
+ error = 0
+ error_info = "错误文件:"
+ for file in files:
+ try:
+ if 'blob' in file.filename:
+ out_file_path = os.path.join(
+ SAT_UPLOAD_PATH,
+ datetime.datetime.strftime(datetime.datetime.now(),
+ '%H%M') + randName(3) + ".wav")
+ else:
+ out_file_path = os.path.join(SAT_UPLOAD_PATH, file.filename)
+
+ print("上传文件名:", out_file_path)
+ async with aiofiles.open(out_file_path, 'wb') as out_file:
+ content = await file.read() # async read
+ await out_file.write(content) # async write
+ # Re-encode the saved file as a 16 kHz, 16-bit WAV
+ wav, sr = librosa.load(out_file_path, sr=16000)
+ sf.write(out_file_path, data=wav, samplerate=sr)
+ right += 1
+ except Exception as e:
+ error += 1
+ error_info = error_info + file.filename + " " + str(e) + "\n"
+ continue
+ return SuccessRequest(
+ result=f"上传成功:{right}, 上传失败:{error}, 失败原因: {error_info}")
+
+
+######################################################################
+########################### FineTune service #################################
+#####################################################################
+
+
+# List fine-tuning files
+@app.post("/finetune/list")
+async def FineTuneList(Path: FTPath):
+ dataPath = Path.dataPath
+ if dataPath == "default":
+ # Use the built-in default dataset
+ FT_PATH = FT_DEFAULT_PATH
+ else:
+ FT_PATH = dataPath
+
+ res = []
+ filelist = getVCList(FT_PATH)
+ for name, value in ft_label_dic.items():
+ wav_path = os.path.join(FT_PATH, name)
+ if not os.path.exists(wav_path):
+ wav_path = ""
+ d = {'text': value['text'], 'name': name, 'path': wav_path}
+ res.append(d)
+ return SuccessRequest(result=res)
+
+
+# One-click reset: create a fresh data directory
+@app.get('/finetune/newdir')
+async def FTGetNewDir():
+ new_path = os.path.join(FT_UPLOAD_PATH, randName(3))
+ if not os.path.exists(new_path):
+ os.makedirs(new_path, exist_ok=True)
+ # Copy labels.txt into the new directory
+ cmd = f"cp {FT_LABEL_TXT_PATH} {new_path}"
+ os.system(cmd)
+ return SuccessRequest(result=new_path)
+
+
+# Upload one fine-tuning recording
+@app.post("/finetune/upload")
+async def FTUpload(base: VcBaseFT):
+ try:
+ # Create the target directory if it does not exist
+ if not os.path.exists(base.wav_path):
+ os.makedirs(base.wav_path)
+ # Decode the base64 payload and save it as a file
+ out_file_path = os.path.join(base.wav_path, base.filename)
+ wav_b = base64.b64decode(base.wav)
+ async with aiofiles.open(out_file_path, 'wb') as out_file:
+ await out_file.write(wav_b) # async write
+
+ return SuccessRequest(result="上传成功")
+ except Exception as e:
+ return ErrorRequest(message="上传失败")
+
+
+# Run fine-tuning
+@app.post("/finetune/clone_finetune")
+async def FTModel(base: VcBaseFTModel):
+ # Resolve and validate the data directory
+ if base.wav_path == 'default':
+ data_path = FT_DEFAULT_PATH
+ else:
+ data_path = base.wav_path
+ if not os.path.exists(data_path):
+ return ErrorRequest(message="数据文件夹不存在")
+
+ data_base = os.path.basename(os.path.normpath(data_path))
+ exp_dir = os.path.join(FT_EXP_BASE_PATH, data_base)
+ try:
+ exp_dir = ft_model.finetune(
+ input_dir=os.path.realpath(data_path),
+ exp_dir=os.path.realpath(exp_dir))
+ if exp_dir:
+ return SuccessRequest(result=exp_dir)
+ else:
+ return ErrorRequest(message="微调失败")
+ except Exception as e:
+ print(e)
+ return ErrorRequest(message="微调失败")
+
+
+# Synthesize with the fine-tuned model
+@app.post("/finetune/clone_finetune_syn")
+async def FTSyn(base: VcBaseFTSyn):
+ try:
+ if not os.path.exists(base.exp_path):
+ return ErrorRequest(message="模型路径不存在")
+ wav_name = randName(5)
+ wav_path = ft_model.synthesize(
+ text=base.text,
+ wav_name=wav_name,
+ out_wav_dir=os.path.realpath(FT_OUT_PATH),
+ exp_dir=os.path.realpath(base.exp_path))
+ if wav_path:
+ res = {"wavName": wav_name + ".wav", "wavPath": wav_path}
+ return SuccessRequest(result=res)
+ else:
+ return ErrorRequest(message="音频合成失败")
+ except Exception as e:
+ return ErrorRequest(message="音频合成失败")
+
+
+if __name__ == '__main__':
+ uvicorn.run(app=app, host='0.0.0.0', port=port)
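The six branches of `VcCloneSAT` differ only in the model method, a few keyword arguments, and the output filename prefix. A minimal sketch of a table-driven alternative follows. `SAT_TASKS` and `resolve_sat_task` are hypothetical names, not part of this PR, and the `_en_` prefixes for English requests are an assumption chosen so that English outputs cannot overwrite Chinese outputs for the same input file. The method names and keyword arguments mirror the calls made in the handler.

```python
import os

# Hypothetical table: (language, function) -> (sat_model method name, extra kwargs, output prefix).
SAT_TASKS = {
    ("zh", "synthesize"): ("zh_synthesize_edit", {"task_name": "synthesize"}, "sat_syn_zh_"),
    ("zh", "edit"): ("zh_synthesize_edit", {"task_name": "edit"}, "sat_edit_zh_"),
    ("zh", "crossclone"): ("crossclone", {"source_lang": "zh", "target_lang": "en"}, "sat_cross_zh_"),
    ("en", "synthesize"): ("en_synthesize_edit", {"task_name": "synthesize"}, "sat_syn_en_"),
    ("en", "edit"): ("en_synthesize_edit", {"task_name": "edit"}, "sat_edit_en_"),
    ("en", "crossclone"): ("crossclone", {"source_lang": "en", "target_lang": "zh"}, "sat_cross_en_"),
}


def resolve_sat_task(language, function, filename, out_dir):
    """Return (method_name, kwargs, output_path); raise ValueError if unsupported."""
    try:
        method, extra, prefix = SAT_TASKS[(language, function)]
    except KeyError:
        raise ValueError(f"unsupported language/function: {language}/{function}") from None
    # Copy the kwargs so callers cannot mutate the shared table.
    return method, dict(extra), os.path.join(out_dir, prefix + filename)


# Hypothetical use inside VcCloneSAT:
#   method, kwargs, out_path = resolve_sat_task(
#       base.language, base.function, base.filename, SAT_OUT_PATH)
#   sat_result = getattr(sat_model, method)(
#       old_str=base.old_str, new_str=base.new_str,
#       input_name=os.path.realpath(base.wav),
#       output_name=os.path.realpath(out_path), **kwargs)
```

Keeping the mapping in one table makes an unsupported combination a single `ValueError` path instead of two separate `else` branches.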
diff --git a/demos/speech_web/web_client/package.json b/demos/speech_web/web_client/package.json
index 7f28d4c9..d8c213e4 100644
--- a/demos/speech_web/web_client/package.json
+++ b/demos/speech_web/web_client/package.json
@@ -8,6 +8,7 @@
"preview": "vite preview"
},
"dependencies": {
+ "@element-plus/icons-vue": "^2.0.9",
"ant-design-vue": "^2.2.8",
"axios": "^0.26.1",
"element-plus": "^2.1.9",
@@ -18,6 +19,7 @@
},
"devDependencies": {
"@vitejs/plugin-vue": "^2.3.0",
- "vite": "^2.9.0"
+ "vite": "^2.9.13",
+ "@vue/compiler-sfc": "^3.1.0"
}
}
diff --git a/demos/speech_web/web_client/src/api/API.js b/demos/speech_web/web_client/src/api/API.js
index 0feaa63f..5adca362 100644
--- a/demos/speech_web/web_client/src/api/API.js
+++ b/demos/speech_web/web_client/src/api/API.js
@@ -19,6 +19,26 @@ export const apiURL = {
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket 接口
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // Stream ASR 接口
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS 接口
+
+ // Voice Clone
+ VC_List: '/api/vc/list',
+ SAT_List: '/api/sat/list',
+ FineTune_List: '/api/finetune/list',
+
+ VC_Upload: '/api/vc/upload',
+ SAT_Upload: '/api/sat/upload',
+ FineTune_Upload: '/api/finetune/upload',
+ FineTune_NewDir: '/api/finetune/newdir',
+
+ VC_Download: '/api/vc/download',
+ VC_Download_Base64: '/api/vc/download_base64',
+ VC_Del: '/api/vc/del',
+
+ VC_CloneG2p: '/api/vc/clone_g2p',
+ VC_CloneSAT: '/api/vc/clone_sat',
+ VC_CloneFineTune: '/api/finetune/clone_finetune',
+ VC_CloneFineTuneSyn: '/api/finetune/clone_finetune_syn',
}
diff --git a/demos/speech_web/web_client/src/api/ApiVC.js b/demos/speech_web/web_client/src/api/ApiVC.js
new file mode 100644
index 00000000..0dc0f683
--- /dev/null
+++ b/demos/speech_web/web_client/src/api/ApiVC.js
@@ -0,0 +1,88 @@
+import axios from 'axios'
+import {apiURL} from "./API.js"
+
+// Upload audio (voice clone)
+export async function vcUpload(params){
+ const result = await axios.post(apiURL.VC_Upload, params);
+ return result
+}
+
+// Upload audio (ERNIE-SAT)
+export async function satUpload(params){
+ const result = await axios.post(apiURL.SAT_Upload, params);
+ return result
+}
+
+// Upload audio (fine-tuning)
+export async function fineTuneUpload(params){
+ const result = await axios.post(apiURL.FineTune_Upload, params);
+ return result
+}
+
+// Delete audio
+export async function vcDel(params){
+ const result = await axios.post(apiURL.VC_Del, params);
+ return result
+}
+
+// List audio files (voice clone)
+export async function vcList(){
+ const result = await axios.get(apiURL.VC_List);
+ return result
+}
+// List audio files (ERNIE-SAT)
+export async function satList(){
+ const result = await axios.get(apiURL.SAT_List);
+ return result
+}
+
+// List audio files (fine-tuning)
+export async function fineTuneList(params){
+ const result = await axios.post(apiURL.FineTune_List, params);
+ return result
+}
+
+// Fine-tuning one-click reset: get a new data directory
+export async function fineTuneNewDir(){
+ const result = await axios.get(apiURL.FineTune_NewDir);
+ return result
+}
+
+// Download audio data
+export async function vcDownload(params){
+ const result = await axios.post(apiURL.VC_Download, params);
+ return result
+}
+
+// Download audio data as base64
+export async function vcDownloadBase64(params){
+ const result = await axios.post(apiURL.VC_Download_Base64, params);
+ return result
+}
+
+
+// Voice clone synthesis (GE2E / ECAPA-TDNN)
+export async function vcCloneG2P(params){
+ const result = await axios.post(apiURL.VC_CloneG2p, params);
+ return result
+}
+
+// Voice clone synthesis (ERNIE-SAT)
+export async function vcCloneSAT(params){
+ const result = await axios.post(apiURL.VC_CloneSAT, params);
+ return result
+}
+
+// Voice clone: run fine-tuning
+export async function vcCloneFineTune(params){
+ const result = await axios.post(apiURL.VC_CloneFineTune, params);
+ return result
+}
+
+// Voice clone: synthesize with the fine-tuned model
+export async function vcCloneFineTuneSyn(params){
+ const result = await axios.post(apiURL.VC_CloneFineTuneSyn, params);
+ return result
+}
+
+
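`vcDownloadBase64` returns the payload built by `VcDownloadBase64` on the server: audio resampled to 16 kHz, converted to 16-bit integers, and base64-encoded as raw bytes with no WAV header. A client that wants a playable file has to add the header itself. The Python sketch below shows one way to do that with the standard `wave` module. `float_to_pcm16_base64` and `pcm16_base64_to_wav` are hypothetical helpers; the first only approximates the server side, because the exact scaling used by `float2pcm` is not shown here. Mono, little-endian samples are assumed (`librosa.load` downmixes to mono by default).

```python
import array
import base64
import io
import sys
import wave


def float_to_pcm16_base64(samples):
    """Approximate the server side: clip floats to [-1, 1], scale to int16, base64-encode."""
    ints = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        ints.byteswap()  # WAV PCM data is little-endian
    return base64.b64encode(ints.tobytes()).decode("utf8")


def pcm16_base64_to_wav(b64_pcm, sample_rate=16000, channels=1):
    """Wrap base64-encoded raw 16-bit PCM (no header) in a WAV container."""
    pcm = base64.b64decode(b64_pcm)
    if len(pcm) % (2 * channels):
        raise ValueError("payload is not a whole number of 16-bit frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

The sample rate must match what the server used when resampling (16 kHz in `VcDownloadBase64`); a mismatched rate plays back at the wrong speed and pitch.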
diff --git a/demos/speech_web/web_client/src/components/Content/Header/Header.vue b/demos/speech_web/web_client/src/components/Content/Header/Header.vue
index 8135a2bf..c20f3366 100644
--- a/demos/speech_web/web_client/src/components/Content/Header/Header.vue
+++ b/demos/speech_web/web_client/src/components/Content/Header/Header.vue
@@ -4,7 +4,7 @@
飞桨-PaddleSpeech
- PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,欢迎大家Star收藏鼓励
+ PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发。支持语音识别,语音合成,声纹识别,声音分类,语音唤醒,语音翻译等多种语音任务,荣获 NAACL2022 Best Demo Award 。如果你喜欢这个示例,欢迎在 github 中 star 收藏鼓励。
diff --git a/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue b/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
new file mode 100644
index 00000000..e1a4f234
--- /dev/null
+++ b/demos/speech_web/web_client/src/components/SubMenu/ENIRE_SAT/ENIRE_SAT.vue
@@ -0,0 +1,487 @@
+
+
+
+
+
+ 录制音频
+ 停止录音
+ 上传录音
+
+
+ 上传音频文件
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 个性化语音合成
+
+ 跨语言语音合成
+
+ 语音编辑
+
+
+
+
+
+
+
+
+
+ 开始合成
+ 合成中
+
+
+ 播放
+ 播放
+ 下载
+ 下载
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue b/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
new file mode 100644
index 00000000..895dd586
--- /dev/null
+++ b/demos/speech_web/web_client/src/components/SubMenu/FineTune/FineTune.vue
@@ -0,0 +1,427 @@
+
+
+
+
+
+ 一键重置
+ 默认示例
+ 一键微调
+ 微调中
+ 微调成功
+ 微调失败
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 请输入中文文本
+
+
+
+
+
+
+
+
+ 开始合成
+ 合成中
+
+
+
+ 播放
+ 播放
+ 下载
+ 下载
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue b/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
new file mode 100644
index 00000000..1e380d28
--- /dev/null
+++ b/demos/speech_web/web_client/src/components/SubMenu/VoiceClone/VoiceClone.vue
@@ -0,0 +1,379 @@
+
+
+
+
+
+ 录制音频
+ 停止录音
+ 上传录音
+
+
+ 上传音频文件
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ GE2E
+ ECAPA-TDNN
+
+
+
+
+
+
+
+
+
+
+ 开始合成
+ 合成中
+
+
+
+ 播放
+ 播放
+ 下载
+ 下载
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/demos/speech_web/web_client/src/main.js b/demos/speech_web/web_client/src/main.js
index 3fbf87c8..544f5b30 100644
--- a/demos/speech_web/web_client/src/main.js
+++ b/demos/speech_web/web_client/src/main.js
@@ -1,5 +1,6 @@
import { createApp } from 'vue'
import ElementPlus from 'element-plus'
+import * as ElementPlusIconsVue from '@element-plus/icons-vue'
import 'element-plus/dist/index.css'
import Antd from 'ant-design-vue';
import 'ant-design-vue/dist/antd.css';
@@ -9,5 +10,8 @@ import axios from 'axios'
const app = createApp(App)
app.config.globalProperties.$http = axios
+for (const [key, component] of Object.entries(ElementPlusIconsVue)) {
+ app.component(key, component)
+ }
app.use(ElementPlus).use(Antd)
app.mount('#app')
diff --git a/demos/speech_web/web_client/yarn.lock b/demos/speech_web/web_client/yarn.lock
index 6777cf4c..7f07daa0 100644
--- a/demos/speech_web/web_client/yarn.lock
+++ b/demos/speech_web/web_client/yarn.lock
@@ -44,6 +44,11 @@
resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-1.1.4.tgz"
integrity sha512-Iz/nHqdp1sFPmdzRwHkEQQA3lKvoObk8azgABZ81QUOpW9s/lUyQVUSh0tNtEPZXQlKwlSh7SPgoVxzrE0uuVQ==
+"@element-plus/icons-vue@^2.0.9":
+ version "2.0.9"
+ resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-2.0.9.tgz#b7777c57534522e387303d194451d50ff549d49a"
+ integrity sha512-okdrwiVeKBmW41Hkl0eMrXDjzJwhQMuKiBOu17rOszqM+LS/yBYpNQNV5Jvoh06Wc+89fMmb/uhzf8NZuDuUaQ==
+
"@floating-ui/core@^0.6.1":
version "0.6.1"
resolved "https://registry.npmmirror.com/@floating-ui/core/-/core-0.6.1.tgz"
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 3fb82367..fd7a481b 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -20,6 +20,7 @@ onnxruntime==1.10.0
opencc
paddlenlp
paddlepaddle>=2.2.2
+paddlespeech_ctcdecoders
paddlespeech_feat
pandas
pathos == 0.2.8
@@ -27,8 +28,8 @@ pattern_singleton
Pillow>=9.0.0
praatio==5.0.0
prettytable
-pypinyin<=0.44.0
pypinyin-dict
+pypinyin<=0.44.0
python-dateutil
pyworld==0.2.12
recommonmark>=0.5.0
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst b/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
deleted file mode 100644
index d4f92a2e..00000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.predict.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.deploy.predict module
-=================================================
-
-.. automodule:: paddlespeech.cls.exps.panns.deploy.predict
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst b/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
index 4415c933..369862cc 100644
--- a/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.deploy.rst
@@ -12,4 +12,3 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.cls.exps.panns.deploy.predict
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst b/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
deleted file mode 100644
index 6c39c2bc..00000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.export_model.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.export\_model module
-================================================
-
-.. automodule:: paddlespeech.cls.exps.panns.export_model
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.predict.rst b/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
deleted file mode 100644
index 88cd4033..00000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.predict.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.predict module
-==========================================
-
-.. automodule:: paddlespeech.cls.exps.panns.predict
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.rst b/docs/source/api/paddlespeech.cls.exps.panns.rst
index 6147b245..72f30ba6 100644
--- a/docs/source/api/paddlespeech.cls.exps.panns.rst
+++ b/docs/source/api/paddlespeech.cls.exps.panns.rst
@@ -20,6 +20,3 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.cls.exps.panns.export_model
- paddlespeech.cls.exps.panns.predict
- paddlespeech.cls.exps.panns.train
diff --git a/docs/source/api/paddlespeech.cls.exps.panns.train.rst b/docs/source/api/paddlespeech.cls.exps.panns.train.rst
deleted file mode 100644
index a89b7eec..00000000
--- a/docs/source/api/paddlespeech.cls.exps.panns.train.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.cls.exps.panns.train module
-========================================
-
-.. automodule:: paddlespeech.cls.exps.panns.train
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst b/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
deleted file mode 100644
index 46a149b0..00000000
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.kws.exps.mdtc.plot\_det\_curve module
-==================================================
-
-.. automodule:: paddlespeech.kws.exps.mdtc.plot_det_curve
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.kws.exps.mdtc.rst b/docs/source/api/paddlespeech.kws.exps.mdtc.rst
index f6cad64e..33d4a55c 100644
--- a/docs/source/api/paddlespeech.kws.exps.mdtc.rst
+++ b/docs/source/api/paddlespeech.kws.exps.mdtc.rst
@@ -14,6 +14,5 @@ Submodules
paddlespeech.kws.exps.mdtc.collate
paddlespeech.kws.exps.mdtc.compute_det
- paddlespeech.kws.exps.mdtc.plot_det_curve
paddlespeech.kws.exps.mdtc.score
paddlespeech.kws.exps.mdtc.train
diff --git a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
index 8093619b..dfcd274c 100644
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.rst
@@ -13,5 +13,4 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.decoders.ctcdecoder.decoders_deprecated
- paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
paddlespeech.s2t.decoders.ctcdecoder.swig_wrapper
diff --git a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst b/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
deleted file mode 100644
index 1079d672..00000000
--- a/docs/source/api/paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.ctcdecoder.scorer\_deprecated module
-==============================================================
-
-.. automodule:: paddlespeech.s2t.decoders.ctcdecoder.scorer_deprecated
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst b/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
deleted file mode 100644
index 4952e2e6..00000000
--- a/docs/source/api/paddlespeech.s2t.decoders.recog_bin.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.recog\_bin module
-===========================================
-
-.. automodule:: paddlespeech.s2t.decoders.recog_bin
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.decoders.rst b/docs/source/api/paddlespeech.s2t.decoders.rst
index e4eabedf..53e0d9c4 100644
--- a/docs/source/api/paddlespeech.s2t.decoders.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.rst
@@ -23,5 +23,4 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.decoders.recog
- paddlespeech.s2t.decoders.recog_bin
paddlespeech.s2t.decoders.utils
diff --git a/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst b/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
deleted file mode 100644
index f38a6109..00000000
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.ngram.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.decoders.scorers.ngram module
-==============================================
-
-.. automodule:: paddlespeech.s2t.decoders.scorers.ngram
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.decoders.scorers.rst b/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
index 83808c49..ca834f6b 100644
--- a/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
+++ b/docs/source/api/paddlespeech.s2t.decoders.scorers.rst
@@ -15,5 +15,4 @@ Submodules
paddlespeech.s2t.decoders.scorers.ctc
paddlespeech.s2t.decoders.scorers.ctc_prefix_score
paddlespeech.s2t.decoders.scorers.length_bonus
- paddlespeech.s2t.decoders.scorers.ngram
paddlespeech.s2t.decoders.scorers.scorer_interface
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
deleted file mode 100644
index a73a5685..00000000
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.client.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.client module
-==========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
deleted file mode 100644
index bc107848..00000000
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.record.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.record module
-==========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
index d1f966fc..28de0f7f 100644
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.rst
@@ -12,8 +12,5 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.s2t.exps.deepspeech2.bin.deploy.client
- paddlespeech.s2t.exps.deepspeech2.bin.deploy.record
paddlespeech.s2t.exps.deepspeech2.bin.deploy.runtime
- paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
paddlespeech.s2t.exps.deepspeech2.bin.deploy.server
diff --git a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst b/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
deleted file mode 100644
index ba1ae0a6..00000000
--- a/docs/source/api/paddlespeech.s2t.exps.deepspeech2.bin.deploy.send.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.deepspeech2.bin.deploy.send module
-========================================================
-
-.. automodule:: paddlespeech.s2t.exps.deepspeech2.bin.deploy.send
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2.rst b/docs/source/api/paddlespeech.s2t.exps.u2.rst
index e0ebb7fc..bf565670 100644
--- a/docs/source/api/paddlespeech.s2t.exps.u2.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2.rst
@@ -21,4 +21,3 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.exps.u2.model
- paddlespeech.s2t.exps.u2.trainer
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst b/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
deleted file mode 100644
index 0cd28945..00000000
--- a/docs/source/api/paddlespeech.s2t.exps.u2.trainer.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.u2.trainer module
-=======================================
-
-.. automodule:: paddlespeech.s2t.exps.u2.trainer
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
deleted file mode 100644
index bc749c8f..00000000
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.recog.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.exps.u2\_kaldi.bin.recog module
-================================================
-
-.. automodule:: paddlespeech.s2t.exps.u2_kaldi.bin.recog
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
index ff1a6efe..087b8767 100644
--- a/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
+++ b/docs/source/api/paddlespeech.s2t.exps.u2_kaldi.bin.rst
@@ -12,6 +12,5 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.s2t.exps.u2_kaldi.bin.recog
paddlespeech.s2t.exps.u2_kaldi.bin.test
paddlespeech.s2t.exps.u2_kaldi.bin.train
diff --git a/docs/source/api/paddlespeech.s2t.training.extensions.rst b/docs/source/api/paddlespeech.s2t.training.extensions.rst
index f31b8427..13530a8d 100644
--- a/docs/source/api/paddlespeech.s2t.training.extensions.rst
+++ b/docs/source/api/paddlespeech.s2t.training.extensions.rst
@@ -15,5 +15,3 @@ Submodules
paddlespeech.s2t.training.extensions.evaluator
paddlespeech.s2t.training.extensions.extension
paddlespeech.s2t.training.extensions.plot
- paddlespeech.s2t.training.extensions.snapshot
- paddlespeech.s2t.training.extensions.visualizer
diff --git a/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst b/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
deleted file mode 100644
index e0ca21a7..00000000
--- a/docs/source/api/paddlespeech.s2t.training.extensions.snapshot.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.training.extensions.snapshot module
-====================================================
-
-.. automodule:: paddlespeech.s2t.training.extensions.snapshot
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst b/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
deleted file mode 100644
index 22ae11f1..00000000
--- a/docs/source/api/paddlespeech.s2t.training.extensions.visualizer.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.training.extensions.visualizer module
-======================================================
-
-.. automodule:: paddlespeech.s2t.training.extensions.visualizer
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.training.updaters.rst b/docs/source/api/paddlespeech.s2t.training.updaters.rst
index a0617016..b38704a0 100644
--- a/docs/source/api/paddlespeech.s2t.training.updaters.rst
+++ b/docs/source/api/paddlespeech.s2t.training.updaters.rst
@@ -13,5 +13,4 @@ Submodules
:maxdepth: 4
paddlespeech.s2t.training.updaters.standard_updater
- paddlespeech.s2t.training.updaters.trainer
paddlespeech.s2t.training.updaters.updater
diff --git a/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst b/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
deleted file mode 100644
index 6981a8f0..00000000
--- a/docs/source/api/paddlespeech.s2t.training.updaters.trainer.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.training.updaters.trainer module
-=================================================
-
-.. automodule:: paddlespeech.s2t.training.updaters.trainer
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst b/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
deleted file mode 100644
index 5007fd9d..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.add_deltas.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.add\_deltas module
-=============================================
-
-.. automodule:: paddlespeech.s2t.transform.add_deltas
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst b/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
deleted file mode 100644
index e08dd253..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.channel_selector.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.channel\_selector module
-===================================================
-
-.. automodule:: paddlespeech.s2t.transform.channel_selector
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.cmvn.rst b/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
deleted file mode 100644
index 8348e3d4..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.cmvn.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.cmvn module
-======================================
-
-.. automodule:: paddlespeech.s2t.transform.cmvn
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.functional.rst b/docs/source/api/paddlespeech.s2t.transform.functional.rst
deleted file mode 100644
index eb2b54a6..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.functional.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.functional module
-============================================
-
-.. automodule:: paddlespeech.s2t.transform.functional
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.perturb.rst b/docs/source/api/paddlespeech.s2t.transform.perturb.rst
deleted file mode 100644
index 0be28ab7..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.perturb.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.perturb module
-=========================================
-
-.. automodule:: paddlespeech.s2t.transform.perturb
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.rst b/docs/source/api/paddlespeech.s2t.transform.rst
deleted file mode 100644
index 5016ff4f..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.rst
+++ /dev/null
@@ -1,24 +0,0 @@
-paddlespeech.s2t.transform package
-==================================
-
-.. automodule:: paddlespeech.s2t.transform
- :members:
- :undoc-members:
- :show-inheritance:
-
-Submodules
-----------
-
-.. toctree::
- :maxdepth: 4
-
- paddlespeech.s2t.transform.add_deltas
- paddlespeech.s2t.transform.channel_selector
- paddlespeech.s2t.transform.cmvn
- paddlespeech.s2t.transform.functional
- paddlespeech.s2t.transform.perturb
- paddlespeech.s2t.transform.spec_augment
- paddlespeech.s2t.transform.spectrogram
- paddlespeech.s2t.transform.transform_interface
- paddlespeech.s2t.transform.transformation
- paddlespeech.s2t.transform.wpe
diff --git a/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst b/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
deleted file mode 100644
index 00fd3ea1..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.spec_augment.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.spec\_augment module
-===============================================
-
-.. automodule:: paddlespeech.s2t.transform.spec_augment
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst b/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
deleted file mode 100644
index 33c499a7..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.spectrogram.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.spectrogram module
-=============================================
-
-.. automodule:: paddlespeech.s2t.transform.spectrogram
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst b/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
deleted file mode 100644
index 009b0658..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.transform_interface.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.transform\_interface module
-======================================================
-
-.. automodule:: paddlespeech.s2t.transform.transform_interface
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.transformation.rst b/docs/source/api/paddlespeech.s2t.transform.transformation.rst
deleted file mode 100644
index a03e731a..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.transformation.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.transformation module
-================================================
-
-.. automodule:: paddlespeech.s2t.transform.transformation
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.s2t.transform.wpe.rst b/docs/source/api/paddlespeech.s2t.transform.wpe.rst
deleted file mode 100644
index c4831f7f..00000000
--- a/docs/source/api/paddlespeech.s2t.transform.wpe.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.s2t.transform.wpe module
-=====================================
-
-.. automodule:: paddlespeech.s2t.transform.wpe
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst b/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
deleted file mode 100644
index 9b61633e..00000000
--- a/docs/source/api/paddlespeech.server.engine.acs.python.acs_engine.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.server.engine.acs.python.acs\_engine module
-========================================================
-
-.. automodule:: paddlespeech.server.engine.acs.python.acs_engine
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.server.engine.acs.python.rst b/docs/source/api/paddlespeech.server.engine.acs.python.rst
index 3c06ba08..7e5582bd 100644
--- a/docs/source/api/paddlespeech.server.engine.acs.python.rst
+++ b/docs/source/api/paddlespeech.server.engine.acs.python.rst
@@ -12,4 +12,3 @@ Submodules
.. toctree::
:maxdepth: 4
- paddlespeech.server.engine.acs.python.acs_engine
diff --git a/docs/source/api/paddlespeech.server.utils.log.rst b/docs/source/api/paddlespeech.server.utils.log.rst
deleted file mode 100644
index 453b4a61..00000000
--- a/docs/source/api/paddlespeech.server.utils.log.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.server.utils.log module
-====================================
-
-.. automodule:: paddlespeech.server.utils.log
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.exps.rst b/docs/source/api/paddlespeech.t2s.exps.rst
index bee18a97..643f97b4 100644
--- a/docs/source/api/paddlespeech.t2s.exps.rst
+++ b/docs/source/api/paddlespeech.t2s.exps.rst
@@ -30,10 +30,10 @@ Submodules
paddlespeech.t2s.exps.inference
paddlespeech.t2s.exps.inference_streaming
+ paddlespeech.t2s.models.vits.monotonic_align
paddlespeech.t2s.exps.ort_predict
paddlespeech.t2s.exps.ort_predict_e2e
paddlespeech.t2s.exps.ort_predict_streaming
- paddlespeech.t2s.exps.stream_play_tts
paddlespeech.t2s.exps.syn_utils
paddlespeech.t2s.exps.synthesize
paddlespeech.t2s.exps.synthesize_e2e
diff --git a/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst b/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
deleted file mode 100644
index cb22dde0..00000000
--- a/docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.exps.stream\_play\_tts module
-==============================================
-
-.. automodule:: paddlespeech.t2s.exps.stream_play_tts
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst b/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
deleted file mode 100644
index f0e8fd11..00000000
--- a/docs/source/api/paddlespeech.t2s.models.ernie_sat.mlm.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.models.ernie\_sat.mlm module
-=============================================
-
-.. automodule:: paddlespeech.t2s.models.ernie_sat.mlm
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
deleted file mode 100644
index 7aaba795..00000000
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align.core module
-=========================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.core
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
deleted file mode 100644
index 25c819a7..00000000
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align package
-=====================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align
- :members:
- :undoc-members:
- :show-inheritance:
-
-Submodules
-----------
-
-.. toctree::
- :maxdepth: 4
-
- paddlespeech.t2s.models.vits.monotonic_align.core
- paddlespeech.t2s.models.vits.monotonic_align.setup
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst b/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
deleted file mode 100644
index a93c3b8b..00000000
--- a/docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-paddlespeech.t2s.models.vits.monotonic\_align.setup module
-==========================================================
-
-.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.setup
- :members:
- :undoc-members:
- :show-inheritance:
diff --git a/docs/source/api/paddlespeech.t2s.models.vits.rst b/docs/source/api/paddlespeech.t2s.models.vits.rst
index 3146094b..205496f0 100644
--- a/docs/source/api/paddlespeech.t2s.models.vits.rst
+++ b/docs/source/api/paddlespeech.t2s.models.vits.rst
@@ -12,7 +12,6 @@ Subpackages
.. toctree::
:maxdepth: 4
- paddlespeech.t2s.models.vits.monotonic_align
paddlespeech.t2s.models.vits.wavenet
Submodules
diff --git a/docs/source/tts/demo.rst b/docs/source/tts/demo.rst
index ca2fd98e..1ae687f8 100644
--- a/docs/source/tts/demo.rst
+++ b/docs/source/tts/demo.rst
@@ -42,7 +42,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition
-
+
@@ -50,7 +50,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -61,7 +61,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
in being comparatively modern.
-
+
@@ -70,7 +70,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -81,7 +81,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
For although the Chinese took impressions from wood blocks engraved in relief for centuries before the woodcutters of the Netherlands, by a similar process
-
+
@@ -89,7 +89,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -100,7 +100,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
produced the block books, which were the immediate predecessors of the true printed book
-
+
@@ -108,7 +108,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -119,7 +119,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
the invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing.
-
+
@@ -127,7 +127,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -153,7 +153,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
昨日,这名“伤者”与医生全部被警方依法刑事拘留
-
+
@@ -161,7 +161,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -172,7 +172,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
钱伟长想到上海来办学校是经过深思熟虑的。
-
+
@@ -180,7 +180,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -191,7 +191,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
她见我一进门就骂,吃饭时也骂,骂得我抬不起头。
-
+
@@ -199,7 +199,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -210,7 +210,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
李述德在离开之前,只说了一句“柱驼杀父亲了”
-
+
@@ -218,7 +218,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -230,7 +230,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
这种车票和保险单捆绑出售属于重复性购买。
-
+
@@ -238,7 +238,7 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
-
+
@@ -271,7 +271,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
Life was like a box of chocolates, you never know what you're gonna get.
-
+
@@ -279,7 +279,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -290,7 +290,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
With great power there must come great responsibility.
-
+
@@ -298,7 +298,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -309,7 +309,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
To be or not to be, that’s a question.
-
+
@@ -318,7 +318,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -330,7 +330,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
A man can be destroyed but not defeated.
-
+
@@ -339,7 +339,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -350,7 +350,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
Do not, for one repulse, give up the purpose that you resolved to effort.
-
+
@@ -359,7 +359,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -370,7 +370,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
Death is just a part of life, something we're all destined to do.
-
+
@@ -379,7 +379,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -390,7 +390,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
I think it's hard winning a war with words.
-
+
@@ -399,7 +399,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -410,7 +410,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
Don’t argue with the people of strong determination, because they may change the fact!
-
+
@@ -419,7 +419,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -430,7 +430,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
Love you three thousand times.
-
+
@@ -439,7 +439,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -465,7 +465,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
凯莫瑞安联合体的经济崩溃,迫在眉睫。
-
+
@@ -473,7 +473,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -484,7 +484,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
对于所有想要离开那片废土,去寻找更美好生活的人来说。
-
+
@@ -492,7 +492,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -503,7 +503,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
克哈,是你们所有人安全的港湾。
-
+
@@ -511,7 +511,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -523,7 +523,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
为了保护尤摩扬人民不受异虫的残害,我所做的,比他们自己的领导委员会都多。
-
+
@@ -531,7 +531,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -542,7 +542,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
无论他们如何诽谤我,我将继续为所有泰伦人的最大利益,而努力奋斗。
-
+
@@ -550,7 +550,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -561,7 +561,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
身为你们的元首,我带领泰伦人实现了人类统治领地和经济的扩张。
-
+
@@ -569,7 +569,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -580,7 +580,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
我们将继续成长,用行动回击那些只会说风凉话,不愿意和我们相向而行的害群之马。
-
+
@@ -588,7 +588,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -599,7 +599,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
帝国武装力量,无数的优秀儿女,正时刻守卫着我们的家园大门,但是他们孤木难支。
-
+
@@ -607,7 +607,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -618,7 +618,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
凡是今天应征入伍者,所获的所有刑罚罪责,减半。
-
+
@@ -626,7 +626,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -641,11 +641,11 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
- FastSpeech2-Conformer + ParallelWaveGAN
+ FastSpeech2-Conformer + ParallelWaveGAN
-
+
@@ -655,7 +655,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -665,7 +665,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -676,7 +676,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -686,7 +686,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -696,7 +696,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -706,7 +706,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -716,7 +716,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -726,7 +726,7 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
-
+
@@ -756,7 +756,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -764,7 +764,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -774,7 +774,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -782,7 +782,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -792,7 +792,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -800,7 +800,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -810,7 +810,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -818,7 +818,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -828,7 +828,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -836,7 +836,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -846,7 +846,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -854,7 +854,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -864,7 +864,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -872,7 +872,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -882,7 +882,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -890,7 +890,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -900,7 +900,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -908,7 +908,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -918,7 +918,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -926,7 +926,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -936,7 +936,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -944,7 +944,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -954,7 +954,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -962,7 +962,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -972,7 +972,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -980,7 +980,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -990,7 +990,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -998,7 +998,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1008,7 +1008,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1016,7 +1016,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1026,7 +1026,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1034,7 +1034,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1044,7 +1044,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1052,7 +1052,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1062,7 +1062,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1070,7 +1070,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1080,7 +1080,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1088,7 +1088,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1098,7 +1098,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1106,7 +1106,7 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
-
+
@@ -1142,7 +1142,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1150,7 +1150,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1158,7 +1158,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1168,7 +1168,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1176,7 +1176,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1184,7 +1184,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1194,7 +1194,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1202,7 +1202,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1210,7 +1210,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1220,7 +1220,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1228,7 +1228,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1236,7 +1236,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1246,7 +1246,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1254,7 +1254,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1262,7 +1262,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1272,7 +1272,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1280,7 +1280,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1288,7 +1288,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1298,7 +1298,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1306,7 +1306,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1314,7 +1314,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1324,7 +1324,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1332,7 +1332,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1340,7 +1340,7 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
-
+
@@ -1374,7 +1374,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1382,7 +1382,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1392,7 +1392,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1400,7 +1400,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1410,7 +1410,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1418,7 +1418,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1428,7 +1428,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1436,7 +1436,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1446,7 +1446,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1454,7 +1454,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1464,7 +1464,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1472,7 +1472,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1482,7 +1482,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1490,7 +1490,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1500,7 +1500,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1508,7 +1508,7 @@ The nomal audios are in the second column of the previous table.
-
+
@@ -1542,7 +1542,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
他只是一个纸老虎。
-
+
@@ -1550,7 +1550,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1561,7 +1561,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
手表厂有五种好产品。
-
+
@@ -1569,7 +1569,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1580,7 +1580,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
老板的轿车需要保养。
-
+
@@ -1588,7 +1588,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1599,7 +1599,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
我们所有人都好喜欢你呀。
-
+
@@ -1607,7 +1607,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1618,7 +1618,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
岂有此理。
-
+
@@ -1626,7 +1626,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1637,7 +1637,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
虎骨酒多少钱一瓶。
-
+
@@ -1645,7 +1645,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1656,7 +1656,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
这件事情需要冷处理。
-
+
@@ -1664,7 +1664,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1675,7 +1675,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
这个老奶奶是个大喇叭。
-
+
@@ -1683,7 +1683,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1694,7 +1694,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
我喜欢说相声。
-
+
@@ -1702,7 +1702,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1713,7 +1713,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
有一天,我路过了一栋楼。
-
+
@@ -1721,7 +1721,7 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
+
@@ -1735,4 +1735,142 @@ We use ``FastSpeech2`` + ``ParallelWaveGAN`` here.
-
\ No newline at end of file
+
+Finetune FastSpeech2 for CSMSC
+--------------------------------------
+
+Finetuning demos of `tts_finetune/tts3 `_ on the CSMSC dataset.
+
+When finetuning on CSMSC, audio quality in our tests ranked ``Freeze encoder`` > ``Non Frozen`` > ``Freeze encoder && duration_predictor``.
+
+.. raw:: html
+
+
+ CSMSC reference audio (fastspeech2_csmsc + hifigan_aishell3 in CLI): 欢迎使用飞桨语音套件。
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Frozen Method
+ train_num=10, bs=10, epoch=100, lr=1e-4
+ train_num=18, bs=18, epoch=100, lr=1e-4
+ train_num=97, bs=64, epoch=100, lr=1e-4
+ train_num=196, bs=64, epoch=100, lr=1e-4
+
+
+ Non Frozen
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+ Freeze encoder
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+ Freeze encoder && duration_predictor
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+ Your browser does not support the audio element.
+
+
+
+
+
+
+
diff --git a/docs/source/tts/demo_2.rst b/docs/source/tts/demo_2.rst
index 2f0ca7cd..06d0d039 100644
--- a/docs/source/tts/demo_2.rst
+++ b/docs/source/tts/demo_2.rst
@@ -19,7 +19,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
早上好,今天是2020/10/29,最低温度是-3°C。
-
+
@@ -27,7 +27,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -38,7 +38,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
你好,我的编号是37249,很高兴为您服务。
-
+
@@ -46,7 +46,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -57,7 +57,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
我们公司有37249个人。
-
+
@@ -65,7 +65,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -76,7 +76,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
我出生于2005年10月8日。
-
+
@@ -84,7 +84,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -95,7 +95,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
我们习惯在12:30吃中午饭。
-
+
@@ -103,7 +103,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -114,7 +114,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
只要有超过3/4的人投票同意,你就会成为我们的新班长。
-
+
@@ -122,7 +122,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -133,7 +133,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
我要买一只价值999.9元的手表。
-
+
@@ -141,7 +141,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -152,7 +152,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
我的手机号是18544139121,欢迎来电。
-
+
@@ -160,7 +160,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
-
+
@@ -171,7 +171,7 @@ FastSpeech2 + Parallel WaveGAN in CSMSC
明天有62%的概率降雨。
-
+