use cmd for voiceclone, finetune and ernie-sat

pull/2412/head
iftaken 2 years ago
parent 5e714ecb4a
commit a488ec8342

@ -13,4 +13,7 @@
*.pdmodel
*/source/*
*/PaddleSpeech/*
*/tmp*/*
*/duration.txt
*/oov_info.txt

@ -6,12 +6,23 @@ PaddleSpeechDemo 是一个以 PaddleSpeech 的语音交互功能为主体开发
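Main features:
`main.py` provides:
+ Voice chat: PaddleSpeech speech recognition plus speech synthesis; the dialogue part is based on the chit-chat capability of PaddleNLP
+ Speaker verification: a showcase of PaddleSpeech voiceprint recognition
+ Speech recognition: three modes are supported: [real-time recognition], [end-to-end recognition] and [audio file recognition]
+ Speech synthesis: two modes are supported: [streaming synthesis] and [end-to-end synthesis]
+ Voice commands: smart reimbursement of travel expenses, built on PaddleSpeech speech recognition and PaddleNLP information extraction
`vc.py` provides:
+ One-shot synthesis: a one-sentence voice cloning scheme based on the GE2E and ECAPA-TDNN models, which synthesizes speech in the timbre of the input audio
+ For the GE2E voice cloning scheme, see [FastSpeech2 + AISHELL-3 Voice Cloning](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc1)
+ For the ECAPA-TDNN voice cloning scheme, see [FastSpeech2 + AISHELL-3 Voice Cloning (ECAPA-TDNN)](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc2)
+ Small-data finetuning: a finetuning scheme based on a small dataset, with a built-in example that finetunes on 12 sentences of the Baker (标贝) Chinese female voice. You can also record your own voice via the one-click reset; recording in a quiet environment gives better results
+ ERNIE-SAT: a visual demo of the cross-modal language-speech large model ERNIE-SAT, supporting personalized synthesis, cross-lingual speech synthesis (if the audio is Chinese, enter English text to synthesize) and speech editing (changing words in the middle of the audio's transcript)
Result:
![Result](docs/效果展示.png)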
@ -20,18 +31,130 @@ PaddleSpeechDemo 是一个以 PaddleSpeech 的语音交互功能为主体开发
### Backend environment installation
If a model has already been downloaded, there is no need to download it again; create a symbolic link to it under `source/model` instead.
```
# Install dependencies
cd speech_server
pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
mkdir source
cd source
# Download the wav files
wget https://paddlespeech.bj.bcebos.com/demos/speech_web/wav_vc.zip
unzip wav_vc.zip
# Download the models
mkdir model
cd model
# Download the IE model
wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
# If you do not need the features of vc.py, you can skip the models below
# Download the GE2E models
wget https://bj.bcebos.com/paddlespeech/Parakeet/released_models/ge2e/ge2e_ckpt_0.3.zip
unzip ge2e_ckpt_0.3.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip
unzip pwg_aishell3_ckpt_0.5.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
unzip fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
# Download the TDNN models
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
unzip fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
# Download the SAT models
# aishell3
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_ckpt_1.2.0.zip
unzip erniesat_aishell3_ckpt_1.2.0.zip
# vctk
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_vctk_ckpt_1.2.0.zip
unzip erniesat_vctk_ckpt_1.2.0.zip
# aishell3_vctk
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_vctk_ckpt_1.2.0.zip
unzip erniesat_aishell3_vctk_ckpt_1.2.0.zip
# Download the finetune models
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip
unzip fastspeech2_aishell3_ckpt_1.1.0.zip
# Download the vocoders
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
unzip hifigan_aishell3_ckpt_0.2.0.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip
unzip hifigan_vctk_ckpt_0.2.0.zip
```
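A quick way to confirm that the downloads landed where the server code looks for them is a small check script. This is only a sketch: it assumes each archive unpacks into a folder of the same name under `source/model` (the paths used by the new `src/*.py` files suggest this), and `missing_models` is a hypothetical helper, not part of the demo.

```python
import os

# Directory names assumed to match the archive names downloaded above.
EXPECTED_MODEL_DIRS = [
    "ge2e_ckpt_0.3",
    "pwg_aishell3_ckpt_0.5",
    "fastspeech2_nosil_aishell3_vc1_ckpt_0.5",
    "fastspeech2_aishell3_ckpt_vc2_1.2.0",
    "erniesat_aishell3_ckpt_1.2.0",
    "erniesat_vctk_ckpt_1.2.0",
    "erniesat_aishell3_vctk_ckpt_1.2.0",
    "fastspeech2_aishell3_ckpt_1.1.0",
    "hifigan_aishell3_ckpt_0.2.0",
    "hifigan_vctk_ckpt_0.2.0",
]


def missing_models(model_root="source/model", expected=EXPECTED_MODEL_DIRS):
    """Return the expected model directories that are absent under model_root."""
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(model_root, name))
    ]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("All expected model directories are present.")
```

Run it from `speech_server`, so that the relative `source/model` path resolves the same way it does for `vc.py`.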
### Configuring the environment for `vc.py`
If you do not need the vc features, you can skip the steps below.
#### ERNIE-SAT environment
The ERNIE-SAT demo depends on the environments of the three ERNIE-SAT `examples` in PaddleSpeech. First make sure the test scripts run under the corresponding paths (mainly `tools`, `download` and `source`). Some of these directories can be shared: create symbolic links to them in the corresponding environment.
Under `PaddleSpeech/demos/speech_web/speech_server`, create `tools` and `download`, following the `README.md` in `examples/aishell3/ernie_sat`. If you have already downloaded them, symbolic links are enough.
Prepare the `tools` folder:
```shell
mkdir -p tools/aligner
cd tools
# download MFA
wget https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/releases/download/v1.0.1/montreal-forced-aligner_linux.tar.gz
# extract MFA
tar xvf montreal-forced-aligner_linux.tar.gz
# fix .so of MFA
cd montreal-forced-aligner/lib
ln -snf libpython3.6m.so.1.0 libpython3.6m.so
cd -
# download align models and dicts
cd aligner
wget https://paddlespeech.bj.bcebos.com/MFA/ernie_sat/aishell3_model.zip
wget https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/simple.lexicon
wget https://paddlespeech.bj.bcebos.com/MFA/ernie_sat/vctk_model.zip
wget https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/cmudict-0.7b
cd ../../
```
Prepare the `download` folder:
```bash
mkdir download
cd download
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_baker_ckpt_0.5.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip
unzip fastspeech2_conformer_baker_ckpt_0.5.zip
unzip fastspeech2_nosil_ljspeech_ckpt_0.5.zip
cd ../
```
1. Chinese SAT: see `examples/aishell3/ernie_sat`, set up the environment as its `README.md` requires, and make sure the sample code in `run.sh` runs correctly from that path
2. English SAT: see `examples/vctk/ernie_sat`, set up the environment as its `README.md` requires, and make sure the sample code in `run.sh` runs correctly from that path
3. Chinese-English SAT: see `examples/aishell3_vctk/ernie_sat`, set up the environment as its `README.md` requires, and make sure the sample code in `run.sh` runs correctly from that path
#### finetune environment
For the `finetune` environment, see `examples/other/tts_finetune/tts3`, set up the environment as its `README.md` requires, and make sure the sample code in `run.sh` runs correctly from that path.
`finetune` needs `aishell3_model.zip` unzipped in `tools/aligner`, because finetuning uses the file `tools/aligner/aishell3_model/meta.yaml`:
```bash
cd tools/aligner
unzip aishell3_model.zip
cd ../..
```
### Frontend environment installation
The frontend depends on `node.js`, which must be installed beforehand; make sure `npm` is available (tested with `npm` `8.3.1`). Downloading the stable version of `node.js` from the [official site](https://nodejs.org/en/) is recommended.
@ -51,12 +174,26 @@ yarn install
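### Starting the backend service
#### `main.py`
The [voice chat], [speaker verification], [speech recognition], [speech synthesis] and [voice commands] features can be tried directly with the commands below:
```
cd speech_server
# Default port: 8010
python main.py --port 8010
```
#### `vc.py`
[One-shot synthesis], [small-data finetuning] and [ERNIE-SAT] all depend on MFA. The project's `tools` folder uses the Linux build of MFA v1; make sure MFA works in the current environment before trying these features.
```
cd speech_server
# Default port: 8010
python vc.py --port 8010
```
> On other systems, you can install MFA v2 with conda to try these features; for installation see [Montreal Forced Aligner](https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html). Using MFA v2 requires configuring the environment yourself and modifying the code that calls MFA, because MFA v1 and MFA v2 are used differently.
### Starting the frontend service
```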

Binary file not shown (image; before: 84 KiB, after: 106 KiB).

@ -0,0 +1,12 @@
###########################################################
# PARAS SETTING #
###########################################################
# Set to -1 to indicate that the parameter is the same as the pretrained model configuration
batch_size: 10
learning_rate: 0.0001 # learning rate
num_snapshots: -1
# frozen_layers should be a list
# if you don't need to freeze, set frozen_layers to []
frozen_layers: ["encoder"]
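The comment about `-1` can be read as a simple fallback rule. The sketch below illustrates that rule on plain dictionaries; `resolve_finetune_config` and the pretrained values are hypothetical and are not taken from `finetune.py`, whose actual handling is not shown in this diff.

```python
def resolve_finetune_config(finetune_cfg, pretrained_cfg):
    """Return a copy of finetune_cfg where every -1 is replaced by the
    value of the same key in pretrained_cfg (if that key exists)."""
    resolved = dict(finetune_cfg)
    for key, value in finetune_cfg.items():
        if value == -1 and key in pretrained_cfg:
            resolved[key] = pretrained_cfg[key]
    return resolved


# Values from the YAML above; the pretrained values are made up for the example.
finetune_cfg = {
    "batch_size": 10,
    "learning_rate": 0.0001,
    "num_snapshots": -1,
    "frozen_layers": ["encoder"],
}
pretrained_cfg = {"batch_size": 64, "num_snapshots": 5}

print(resolve_finetune_config(finetune_cfg, pretrained_cfg))
```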

@ -1,8 +1,3 @@
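# todo:
# 1. Start the service
# 2. Receive recorded audio and return the recognition result
# 3. Receive the ASR result and return the NLP dialogue result
# 4. Receive the NLP dialogue result and return the TTS audio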
import argparse
import base64
import datetime
@ -32,6 +27,7 @@ from starlette.requests import Request
from starlette.responses import FileResponse
from starlette.websockets import WebSocketState as WebSocketState
from paddlespeech.cli.tts.infer import TTSExecutor
from paddlespeech.server.engine.asr.online.python.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.utils.audio_process import float2pcm
@ -55,7 +51,7 @@ asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
asr_init_path = "source/demo/demo.wav"
db_path = "source/db/vpr.sqlite"
ie_model_path = "source/model"
tts_model = TTSExecutor()
# Path configuration
UPLOAD_PATH = "source/vpr"
WAV_PATH = "source/wav"
@ -72,6 +68,14 @@ manager = ConnectionManager()
aumanager = AudioMannger(chatbot)
aumanager.init()
vpr = VPR(db_path, dim=192, top_k=5)
# Initialization: run one synthesis so that the models are downloaded
tts_model(
    text="今天天气准不错",
    output="test.wav",
    am='fastspeech2_mix',
    spk_id=174,
    voc='hifigan_csmsc',
    lang='mix', )
# Service configuration
@ -331,6 +335,7 @@ async def ieOffline(nlp_base: NlpBase):
#####################################################################
# End-to-end synthesis
@app.post("/tts/offline")
async def text2speechOffline(tts_base: TtsBase):
    text = tts_base.text
@ -341,7 +346,15 @@ async def text2speechOffline(tts_base: TtsBase):
        datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
    out_file_path = os.path.join(WAV_PATH, now_name)
    # Save to a file, then transfer it as base64
    # chatbot.text2speech(text, outpath=out_file_path)
    # Use the mixed Chinese-English CLI model
    tts_model(
        text=text,
        output=out_file_path,
        am='fastspeech2_mix',
        spk_id=174,
        voc='hifigan_csmsc',
        lang='mix')
    with open(out_file_path, "rb") as f:
        data_bin = f.read()
    base_str = base64.b64encode(data_bin)

@ -1,13 +1,8 @@
aiofiles
faiss-cpu
praatio==5.0.0
pydantic
python-multipart
scikit_learn
starlette
uvicorn

@ -0,0 +1,182 @@
import os
from .util import run_cmd
class SAT:
def __init__(self):
# pretrain model path
self.zh_pretrain_model_path = os.path.realpath(
"source/model/erniesat_aishell3_ckpt_1.2.0")
self.en_pretrain_model_path = os.path.realpath(
"source/model/erniesat_vctk_ckpt_1.2.0")
self.cross_pretrain_model_path = os.path.realpath(
"source/model/erniesat_aishell3_vctk_ckpt_1.2.0")
self.zh_voc_model_path = os.path.realpath(
"source/model/hifigan_aishell3_ckpt_0.2.0")
self.en_voc_model_path = os.path.realpath(
"source/model/hifigan_vctk_ckpt_0.2.0")
self.cross_voc_model_path = os.path.realpath(
"source/model/hifigan_aishell3_ckpt_0.2.0")
self.now_file_path = os.path.dirname(__file__)
self.BIN_DIR = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../paddlespeech/t2s/exps/ernie_sat"))
def zh_synthesize_edit(self,
old_str: str,
new_str: str,
input_name: os.PathLike,
output_name: os.PathLike,
task_name: str="synthesize",
erniesat_ckpt_name: str="snapshot_iter_289500.pdz"):
if task_name not in ['synthesize', 'edit']:
print("task name only in ['edit', 'synthesize']")
return None
# PYTHONPATH at run time
PYTHONPATH = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../examples/aishell3/ernie_sat"))
# Inference file configuration
config_path = os.path.join(self.zh_pretrain_model_path, "default.yaml")
phones_dict = os.path.join(self.zh_pretrain_model_path,
"phone_id_map.txt")
erniesat_ckpt = os.path.join(self.zh_pretrain_model_path,
erniesat_ckpt_name)
erniesat_stat = os.path.join(self.zh_pretrain_model_path,
"speech_stats.npy")
voc = "hifigan_aishell3"
voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
voc_ckpt = os.path.join(self.zh_voc_model_path,
"snapshot_iter_2500000.pdz")
voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
cmd = self.get_cmd(
task_name,
input_name,
old_str,
new_str,
config_path,
phones_dict,
erniesat_ckpt,
erniesat_stat,
voc,
voc_config,
voc_ckpt,
voc_stat,
output_name,
source_lang="zh",
target_lang="zh")
return run_cmd(cmd, output_name)
def crossclone(self,
old_str: str,
new_str: str,
input_name: os.PathLike,
output_name: os.PathLike,
source_lang: str,
target_lang: str,
erniesat_ckpt_name: str="snapshot_iter_489000.pdz"):
PYTHONPATH = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../examples/aishell3_vctk/ernie_sat"))
# Inference file configuration
config_path = os.path.join(self.cross_pretrain_model_path,
"default.yaml")
phones_dict = os.path.join(self.cross_pretrain_model_path,
"phone_id_map.txt")
erniesat_ckpt = os.path.join(self.cross_pretrain_model_path,
erniesat_ckpt_name)
erniesat_stat = os.path.join(self.cross_pretrain_model_path,
"speech_stats.npy")
voc = "hifigan_aishell3"
voc_config = os.path.join(self.cross_voc_model_path, "default.yaml")
voc_ckpt = os.path.join(self.cross_voc_model_path,
"snapshot_iter_2500000.pdz")
voc_stat = os.path.join(self.cross_voc_model_path, "feats_stats.npy")
task_name = "synthesize"
cmd = self.get_cmd(task_name, input_name, old_str, new_str, config_path,
phones_dict, erniesat_ckpt, erniesat_stat, voc,
voc_config, voc_ckpt, voc_stat, output_name,
source_lang, target_lang)
return run_cmd(cmd, output_name)
def en_synthesize_edit(self,
old_str: str,
new_str: str,
input_name: os.PathLike,
output_name: os.PathLike,
task_name: str="synthesize",
erniesat_ckpt_name: str="snapshot_iter_199500.pdz"):
PYTHONPATH = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../examples/vctk/ernie_sat"))
# Inference file configuration
config_path = os.path.join(self.en_pretrain_model_path, "default.yaml")
phones_dict = os.path.join(self.en_pretrain_model_path,
"phone_id_map.txt")
erniesat_ckpt = os.path.join(self.en_pretrain_model_path,
erniesat_ckpt_name)
erniesat_stat = os.path.join(self.en_pretrain_model_path,
"speech_stats.npy")
voc = "hifigan_aishell3"
voc_config = os.path.join(self.zh_voc_model_path, "default.yaml")
voc_ckpt = os.path.join(self.zh_voc_model_path,
"snapshot_iter_2500000.pdz")
voc_stat = os.path.join(self.zh_voc_model_path, "feats_stats.npy")
cmd = self.get_cmd(
task_name,
input_name,
old_str,
new_str,
config_path,
phones_dict,
erniesat_ckpt,
erniesat_stat,
voc,
voc_config,
voc_ckpt,
voc_stat,
output_name,
source_lang="en",
target_lang="en")
return run_cmd(cmd, output_name)
def get_cmd(self, task_name, input_name, old_str, new_str, config_path,
phones_dict, erniesat_ckpt, erniesat_stat, voc, voc_config,
voc_ckpt, voc_stat, output_name, source_lang, target_lang):
cmd = f"""
FLAGS_allocator_strategy=naive_best_fit \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python3 {self.BIN_DIR}/synthesize_e2e.py \
--task_name={task_name} \
--wav_path={input_name} \
--old_str='{old_str}' \
--new_str='{new_str}' \
--source_lang={source_lang} \
--target_lang={target_lang} \
--erniesat_config={config_path} \
--phones_dict={phones_dict} \
--erniesat_ckpt={erniesat_ckpt} \
--erniesat_stat={erniesat_stat} \
--voc={voc} \
--voc_config={voc_config} \
--voc_ckpt={voc_ckpt} \
--voc_stat={voc_stat} \
--output_name={output_name}
"""
return cmd
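Review note: `get_cmd` wraps `old_str` and `new_str` in single quotes inside a shell string, so a text containing an apostrophe (for example `it's`) would break the command. One possible remedy, sketched below, is to quote every value with the standard-library `shlex.quote`. The helper name `build_shell_command` and its signature are hypothetical; only the option names come from the code above.

```python
import shlex


def build_shell_command(script, options, env=None):
    """Build one shell command string with every value safely quoted.

    script  -- path of the Python script to run
    options -- mapping of option name to value, emitted as --name=value
    env     -- optional mapping of environment variables set for this command
    """
    parts = [f"{key}={shlex.quote(str(value))}" for key, value in (env or {}).items()]
    parts += ["python3", shlex.quote(script)]
    parts += [
        f"--{name}={shlex.quote(str(value))}" for name, value in options.items()
    ]
    return " ".join(parts)


# Example with an apostrophe in the text, which plain '...' quoting cannot handle.
example_cmd = build_shell_command(
    "synthesize_e2e.py",
    {"task_name": "edit", "old_str": "it's fine", "new_str": "it's great"},
    env={"FLAGS_allocator_strategy": "naive_best_fit"},
)
print(example_cmd)
```

Because the result is still a single string, it can be passed to the existing `run_cmd` unchanged.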

@ -0,0 +1,94 @@
import os
from .util import run_cmd
def find_max_ckpt(model_path):
    max_ckpt = 0
    for filename in os.listdir(model_path):
        if filename.endswith('.pdz'):
            files = filename[:-4]
            a1, a2, it = files.split("_")
            if int(it) > max_ckpt:
                max_ckpt = int(it)
    return max_ckpt
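Review note: `find_max_ckpt` assumes every `.pdz` file is named `snapshot_iter_<N>.pdz`; any other `.pdz` name makes the three-way unpacking raise `ValueError`, and an empty directory yields `0`, which later turns into the path `snapshot_iter_0.pdz`. A more defensive variant could look like the sketch below. `find_latest_ckpt` is a hypothetical name, and returning `None` when nothing matches is a design choice of this sketch, so a caller would need to handle that case.

```python
import os
import re

# Matches names such as snapshot_iter_96400.pdz and captures the iteration.
CKPT_PATTERN = re.compile(r"^snapshot_iter_(\d+)\.pdz$")


def find_latest_ckpt(model_path):
    """Return the largest iteration among snapshot_iter_<N>.pdz files,
    or None if the directory contains no such file."""
    iterations = []
    for filename in os.listdir(model_path):
        match = CKPT_PATTERN.match(filename)
        if match:
            iterations.append(int(match.group(1)))
    return max(iterations) if iterations else None
```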
class FineTune:
def __init__(self):
self.now_file_path = os.path.dirname(__file__)
self.PYTHONPATH = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../examples/other/tts_finetune/tts3"))
self.BIN_DIR = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../paddlespeech/t2s/exps/fastspeech2"))
self.pretrained_model_dir = os.path.realpath(
"source/model/fastspeech2_aishell3_ckpt_1.1.0")
self.voc_model_dir = os.path.realpath(
"source/model/hifigan_aishell3_ckpt_0.2.0")
self.finetune_config = os.path.join("conf/tts3_finetune.yaml")
def finetune(self, input_dir, exp_dir='temp', epoch=100):
mfa_dir = os.path.join(exp_dir, 'mfa_result')
dump_dir = os.path.join(exp_dir, 'dump')
output_dir = os.path.join(exp_dir, 'exp')
lang = "zh"
ngpu = 1
cmd = f"""
python3 {self.PYTHONPATH}/finetune.py \
--input_dir={input_dir} \
--pretrained_model_dir={self.pretrained_model_dir} \
--mfa_dir={mfa_dir} \
--dump_dir={dump_dir} \
--output_dir={output_dir} \
--lang={lang} \
--ngpu={ngpu} \
--epoch={epoch} \
--finetune_config={self.finetune_config}
"""
print(cmd)
return run_cmd(cmd, exp_dir)
def synthesize(self, text, wav_name, out_wav_dir, exp_dir='temp'):
voc = "hifigan_aishell3"
dump_dir = os.path.join(exp_dir, 'dump')
output_dir = os.path.join(exp_dir, 'exp')
text_path = os.path.join(exp_dir, 'sentences.txt')
lang = "zh"
ngpu = 1
model_path = f"{output_dir}/checkpoints"
ckpt = find_max_ckpt(model_path)
# Write the sentence to synthesize
with open(text_path, "w", encoding='utf8') as f:
f.write(wav_name + " " + text)
cmd = f"""
FLAGS_allocator_strategy=naive_best_fit \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python3 {self.BIN_DIR}/../synthesize_e2e.py \
--am=fastspeech2_aishell3 \
--am_config={self.pretrained_model_dir}/default.yaml \
--am_ckpt={output_dir}/checkpoints/snapshot_iter_{ckpt}.pdz \
--am_stat={self.pretrained_model_dir}/speech_stats.npy \
--voc={voc} \
--voc_config={self.voc_model_dir}/default.yaml \
--voc_ckpt={self.voc_model_dir}/snapshot_iter_2500000.pdz \
--voc_stat={self.voc_model_dir}/feats_stats.npy \
--lang={lang} \
--text={text_path} \
--output_dir={out_wav_dir} \
--phones_dict={dump_dir}/phone_id_map.txt \
--speaker_dict={dump_dir}/speaker_id_map.txt \
--spk_id=0
"""
out_path = os.path.join(out_wav_dir, f"{wav_name}.wav")
return run_cmd(cmd, out_path)

@ -0,0 +1,59 @@
import os
import shutil
from .util import run_cmd
class VoiceCloneGE2E():
def __init__(self):
# Resolve paths relative to this file
self.now_file_path = os.path.dirname(__file__)
self.BIN_DIR = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../paddlespeech/t2s/exps"))
# am
self.am = "fastspeech2_aishell3"
self.am_config = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/default.yaml"
self.am_ckpt = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/snapshot_iter_96400.pdz"
self.am_stat = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/speech_stats.npy"
self.phones_dict = "source/model/fastspeech2_nosil_aishell3_vc1_ckpt_0.5/phone_id_map.txt"
# voc
self.voc = "pwgan_aishell3"
self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
# ge2e
self.ge2e_params_path = "source/model/ge2e_ckpt_0.3/step-3000000.pdparams"
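    def vc(self, text, input_wav, out_wav):
        # The input wav must be placed in its own temporary directory
        _, full_file_name = os.path.split(input_wav)
        ref_audio_dir = os.path.realpath("tmp_dir/ge2e")
        # Remove any stale directory, then always recreate it, so that
        # shutil.copy below copies into an existing, empty directory
        if os.path.exists(ref_audio_dir):
            shutil.rmtree(ref_audio_dir)
        os.makedirs(ref_audio_dir, exist_ok=True)
        shutil.copy(input_wav, ref_audio_dir)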
output_dir = os.path.dirname(out_wav)
cmd = f"""
python3 {self.BIN_DIR}/voice_cloning.py \
--am={self.am} \
--am_config={self.am_config} \
--am_ckpt={self.am_ckpt} \
--am_stat={self.am_stat} \
--voc={self.voc} \
--voc_config={self.voc_config} \
--voc_ckpt={self.voc_ckpt} \
--voc_stat={self.voc_stat} \
--ge2e_params_path={self.ge2e_params_path} \
--text="{text}" \
--input-dir={ref_audio_dir} \
--output-dir={output_dir} \
--phones-dict={self.phones_dict}
"""
output_name = os.path.join(output_dir, full_file_name)
return run_cmd(cmd, output_name=output_name)

@ -0,0 +1,56 @@
import os
import shutil
from .util import run_cmd
class VoiceCloneTDNN():
def __init__(self):
# Resolve paths relative to this file
self.now_file_path = os.path.dirname(__file__)
self.BIN_DIR = os.path.realpath(
os.path.join(self.now_file_path,
"../../../../paddlespeech/t2s/exps"))
self.am = "fastspeech2_aishell3"
self.am_config = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/default.yaml"
self.am_ckpt = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/snapshot_iter_96400.pdz"
self.am_stat = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/speech_stats.npy"
self.phones_dict = "source/model/fastspeech2_aishell3_ckpt_vc2_1.2.0/phone_id_map.txt"
# voc
self.voc = "pwgan_aishell3"
self.voc_config = "source/model/pwg_aishell3_ckpt_0.5/default.yaml"
self.voc_ckpt = "source/model/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
self.voc_stat = "source/model/pwg_aishell3_ckpt_0.5/feats_stats.npy"
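    def vc(self, text, input_wav, out_wav):
        # The input wav must be placed in its own temporary directory
        _, full_file_name = os.path.split(input_wav)
        ref_audio_dir = os.path.realpath("tmp_dir/tdnn")
        # Remove any stale directory, then always recreate it, so that
        # shutil.copy below copies into an existing, empty directory
        if os.path.exists(ref_audio_dir):
            shutil.rmtree(ref_audio_dir)
        os.makedirs(ref_audio_dir, exist_ok=True)
        shutil.copy(input_wav, ref_audio_dir)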
output_dir = os.path.dirname(out_wav)
cmd = f"""
python3 {self.BIN_DIR}/voice_cloning.py \
--am={self.am} \
--am_config={self.am_config} \
--am_ckpt={self.am_ckpt} \
--am_stat={self.am_stat} \
--voc={self.voc} \
--voc_config={self.voc_config} \
--voc_ckpt={self.voc_ckpt} \
--voc_stat={self.voc_stat} \
--text="{text}" \
--input-dir={ref_audio_dir} \
--output-dir={output_dir} \
--phones-dict={self.phones_dict} \
--use_ecapa=True
"""
output_name = os.path.join(output_dir, full_file_name)
return run_cmd(cmd, output_name=output_name)

@ -1,4 +1,6 @@
import os
import random
import subprocess
def randName(n=5):
@ -11,3 +13,20 @@ def SuccessRequest(result=None, message="ok"):
def ErrorRequest(result=None, message="error"):
    return {"code": -1, "result": result, "message": message}
def run_cmd(cmd, output_name):
    p = subprocess.Popen(cmd, shell=True)
    res = p.wait()
    print(cmd)
    print("运行结果:", res)
    if res == 0:
        # The command succeeded
        if os.path.exists(output_name):
            return output_name
        else:
            # The synthesized file does not exist
            return None
    else:
        # The command failed
        return None
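Review note: `run_cmd` passes a formatted string to a shell, so user-supplied text ends up in a shell command line. An alternative, sketched below under the assumption that callers can supply an argument list instead of a string, is to skip the shell entirely with `subprocess.run`. The name `run_cmd_list` and the `env` and `timeout` parameters are hypothetical additions; the return contract (output path on success, `None` otherwise) mirrors the original.

```python
import os
import subprocess
import sys


def run_cmd_list(args, output_name, env=None, timeout=None):
    """Run args (a list of strings) without a shell.

    Returns output_name if the command exits with code 0 and the output
    file exists afterwards; returns None in every other case.
    """
    # Extra variables (e.g. FLAGS_allocator_strategy) are layered on top
    # of the current environment instead of being prefixed in a shell string.
    full_env = dict(os.environ, **(env or {}))
    try:
        completed = subprocess.run(args, env=full_env, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print("command could not be completed:", exc)
        return None
    if completed.returncode != 0:
        print("command exited with code", completed.returncode)
        return None
    return output_name if os.path.exists(output_name) else None
```

A call would then look like `run_cmd_list([sys.executable, script, f"--text={text}"], out_path)`, where `script`, `text` and `out_path` stand for the caller's own values; no quoting of `text` is needed because no shell parses it.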

@ -0,0 +1,547 @@
import argparse
import base64
import datetime
import json
import os
from typing import List
import aiofiles
import librosa
import soundfile as sf
import uvicorn
from fastapi import FastAPI
from fastapi import UploadFile
from pydantic import BaseModel
from src.ernie_sat import SAT
from src.finetune import FineTune
from src.ge2e_clone import VoiceCloneGE2E
from src.tdnn_clone import VoiceCloneTDNN
from src.util import *
from starlette.responses import FileResponse
from paddlespeech.server.utils.audio_process import float2pcm
# Parse the command-line arguments
parser = argparse.ArgumentParser(prog='PaddleSpeechDemo', add_help=True)
parser.add_argument(
"--port",
action="store",
type=int,
help="port of the app",
default=8010,
required=False)
args = parser.parse_args()
port = args.port
# Initializing the models here affects finetune, so finetune is run through cmd
vc_model = VoiceCloneGE2E()
vc_model_tdnn = VoiceCloneTDNN()
sat_model = SAT()
ft_model = FineTune()
# Configuration files
tts_config = "conf/tts_online_application.yaml"
asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
asr_init_path = "source/demo/demo.wav"
db_path = "source/db/vc.sqlite"
ie_model_path = "source/model"
# Path configuration
VC_UPLOAD_PATH = "source/wav/vc/upload"
VC_OUT_PATH = "source/wav/vc/out"
FT_UPLOAD_PATH = "source/wav/finetune/upload"
FT_OUT_PATH = "source/wav/finetune/out"
FT_LABEL_PATH = "source/wav/finetune/label.json"
FT_LABEL_TXT_PATH = "source/wav/finetune/labels.txt"
FT_DEFAULT_PATH = "source/wav/finetune/default"
FT_EXP_BASE_PATH = "tmp_dir/finetune"
SAT_UPLOAD_PATH = "source/wav/SAT/upload"
SAT_OUT_PATH = "source/wav/SAT/out"
SAT_LABEL_PATH = "source/wav/SAT/label.json"
# Initialize the SAT labels
if os.path.exists(SAT_LABEL_PATH):
    with open(SAT_LABEL_PATH, "r", encoding='utf8') as f:
        sat_label_dic = json.load(f)
else:
    sat_label_dic = {}
# Initialize the finetune labels
if os.path.exists(FT_LABEL_PATH):
    with open(FT_LABEL_PATH, "r", encoding='utf8') as f:
        ft_label_dic = json.load(f)
else:
    ft_label_dic = {}
# Create the directories
base_sources = [
    VC_UPLOAD_PATH,
    VC_OUT_PATH,
    FT_UPLOAD_PATH,
    FT_OUT_PATH,
    FT_DEFAULT_PATH,
    SAT_UPLOAD_PATH,
    SAT_OUT_PATH,
]
for path in base_sources:
    os.makedirs(path, exist_ok=True)
#####################################################################
########################### APP initialization ######################
#####################################################################
app = FastAPI()
######################################################################
########################### Interface types ##########################
#####################################################################
# Request schemas
class VcBase(BaseModel):
    wavName: str
    wavPath: str
class VcBaseText(BaseModel):
    wavName: str
    wavPath: str
    text: str
    func: str
class VcBaseSAT(BaseModel):
    old_str: str
    new_str: str
    language: str
    function: str
    wav: str  # base64-encoded
    filename: str
class FTPath(BaseModel):
    dataPath: str
class VcBaseFT(BaseModel):
    wav: str  # base64-encoded
    filename: str
    wav_path: str
class VcBaseFTModel(BaseModel):
    wav_path: str
class VcBaseFTSyn(BaseModel):
    exp_path: str
    text: str
######################################################################
########################### File listing and saving services #########
#####################################################################
def getVCList(path):
    VC_FileDict = []
    # Collect the wav file names under the upload path
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            # print(os.path.join(root, name))
            VC_FileDict.append({'name': name, 'path': os.path.join(root, name)})
    VC_FileDict = sorted(VC_FileDict, key=lambda x: x['name'], reverse=True)
    return VC_FileDict
async def saveFiles(files, SavePath):
    right = 0
    error = 0
    error_info = "错误文件:"
    for file in files:
        try:
            if 'blob' in file.filename:
                out_file_path = os.path.join(
                    SavePath,
                    datetime.datetime.strftime(datetime.datetime.now(),
                                               '%H%M') + randName(3) + ".wav")
            else:
                out_file_path = os.path.join(SavePath, file.filename)
            print("上传文件名:", out_file_path)
            async with aiofiles.open(out_file_path, 'wb') as out_file:
                content = await file.read()  # async read
                await out_file.write(content)  # async write
            # Convert the file to a 16 kHz, 16-bit wav file
            wav, sr = librosa.load(out_file_path, sr=16000)
            sf.write(out_file_path, data=wav, samplerate=sr)
            right += 1
        except Exception as e:
            error += 1
            error_info = error_info + file.filename + " " + str(e) + "\n"
            continue
    return f"上传成功:{right}, 上传失败:{error}, 失败原因: {error_info}"
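Review note: `saveFiles` joins the save directory with `file.filename` exactly as sent by the client, so a name such as `../../x.wav` could resolve outside the upload directory, assuming nothing upstream sanitizes it. A minimal guard, sketched here with a hypothetical helper `safe_upload_path`, keeps only the final path component:

```python
import os


def safe_upload_path(save_dir, client_filename):
    """Join save_dir with only the last component of client_filename.

    Raises ValueError when no usable file name remains.
    """
    # Treat Windows separators like POSIX ones so basename strips both.
    name = os.path.basename(client_filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"unusable file name: {client_filename!r}")
    return os.path.join(save_dir, name)
```

Using it in place of `os.path.join(SavePath, file.filename)` would leave ordinary names such as `a.wav` unchanged.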
# Audio download
@app.post("/vc/download")
async def VcDownload(base: VcBase):
    if os.path.exists(base.wavPath):
        return FileResponse(base.wavPath)
    else:
        return ErrorRequest(message="下载请求失败,文件不存在")
# Audio download as base64
@app.post("/vc/download_base64")
async def VcDownloadBase64(base: VcBase):
    if os.path.exists(base.wavPath):
        # Load the file as 16 kHz audio and convert it to 16-bit PCM
        wav, sr = librosa.load(base.wavPath, sr=16000)
        wav = float2pcm(wav)  # float32 to int16
        wav_bytes = wav.tobytes()  # to bytes
        wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
        return SuccessRequest(result=wav_base64)
    else:
        return ErrorRequest(message="播放请求失败,文件不存在")
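Review note: `/vc/download_base64` returns raw 16-bit PCM samples without a WAV header, so a client has to know the sample rate (16000 Hz in the code above) to play them. The sketch below shows how a Python client might wrap such a payload into a WAV container using only the standard library. `pcm16_base64_to_wav_bytes` is a hypothetical helper, and it assumes mono, little-endian samples, which is what `tobytes()` yields on common little-endian platforms.

```python
import base64
import io
import wave


def pcm16_base64_to_wav_bytes(pcm_base64, sample_rate=16000, channels=1):
    """Decode base64 raw 16-bit PCM and return the bytes of a WAV file."""
    pcm_bytes = base64.b64decode(pcm_base64)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```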
######################################################################
########################### VC service ###############################
#####################################################################
# Upload files
@app.post("/vc/upload")
async def VcUpload(files: List[UploadFile]):
    # res = saveFiles(files, VC_UPLOAD_PATH)
    right = 0
    error = 0
    error_info = "错误文件:"
    for file in files:
        try:
            if 'blob' in file.filename:
                out_file_path = os.path.join(
                    VC_UPLOAD_PATH,
                    datetime.datetime.strftime(datetime.datetime.now(),
                                               '%H%M') + randName(3) + ".wav")
            else:
                out_file_path = os.path.join(VC_UPLOAD_PATH, file.filename)
            print("上传文件名:", out_file_path)
            async with aiofiles.open(out_file_path, 'wb') as out_file:
                content = await file.read()  # async read
                await out_file.write(content)  # async write
            # Convert the file to a 16 kHz, 16-bit wav file
            wav, sr = librosa.load(out_file_path, sr=16000)
            sf.write(out_file_path, data=wav, samplerate=sr)
            right += 1
        except Exception as e:
            error += 1
            error_info = error_info + file.filename + " " + str(e) + "\n"
            continue
    return SuccessRequest(
        result=f"上传成功:{right}, 上传失败:{error}, 失败原因: {error_info}")
# Get the file list
@app.get("/vc/list")
async def VcList():
    res = getVCList(VC_UPLOAD_PATH)
    return SuccessRequest(result=res)
# Get an audio file
@app.post("/vc/file")
async def VcFileGet(base: VcBase):
    if os.path.exists(base.wavPath):
        return FileResponse(base.wavPath)
    else:
        return ErrorRequest(result="获取文件失败")
# Delete an audio file
@app.post("/vc/del")
async def VcFileDel(base: VcBase):
    if os.path.exists(base.wavPath):
        os.remove(base.wavPath)
        return SuccessRequest(result="删除成功")
    else:
        return ErrorRequest(result="删除失败")
# Voice cloning (G2P)
@app.post("/vc/clone_g2p")
async def VcCloneG2P(base: VcBaseText):
    if os.path.exists(base.wavPath):
        try:
            if base.func == 'ge2e':
                wavName = base.wavName
                wavPath = os.path.join(VC_OUT_PATH, wavName)
                vc_model.vc(
                    text=base.text, input_wav=base.wavPath, out_wav=wavPath)
            else:
                wavName = base.wavName
                wavPath = os.path.join(VC_OUT_PATH, wavName)
                vc_model_tdnn.vc(
                    text=base.text, input_wav=base.wavPath, out_wav=wavPath)
            res = {"wavName": wavName, "wavPath": wavPath}
            return SuccessRequest(result=res)
        except Exception as e:
            print(e)
            return ErrorRequest(message="克隆失败,合成过程报错")
    else:
        return ErrorRequest(message="克隆失败,音频不存在")
######################################################################
########################### SAT 服务 #################################
#####################################################################
# 声音克隆SAT
@app.post("/vc/clone_sat")
async def VcCloneSAT(base: VcBaseSAT):
# 重新整理 sat_label_dict
if base.filename not in sat_label_dic or sat_label_dic[
base.filename] != base.old_str:
sat_label_dic[base.filename] = base.old_str
with open(SAT_LABEL_PATH, "w", encoding='utf8') as f:
json.dump(sat_label_dic, f, ensure_ascii=False, indent=4)
input_file_path = base.wav
# 选择任务
if base.language == "zh":
# 中文
if base.function == "synthesize":
output_file_path = os.path.join(SAT_OUT_PATH,
"sat_syn_zh_" + base.filename)
# 中文克隆
sat_result = sat_model.zh_synthesize_edit(
old_str=base.old_str,
new_str=base.new_str,
input_name=os.path.realpath(input_file_path),
output_name=os.path.realpath(output_file_path),
task_name="synthesize")
elif base.function == "edit":
output_file_path = os.path.join(SAT_OUT_PATH,
"sat_edit_zh_" + base.filename)
# 中文语音编辑
sat_result = sat_model.zh_synthesize_edit(
old_str=base.old_str,
new_str=base.new_str,
input_name=os.path.realpath(input_file_path),
output_name=os.path.realpath(output_file_path),
task_name="edit")
elif base.function == "crossclone":
output_file_path = os.path.join(SAT_OUT_PATH,
"sat_cross_zh_" + base.filename)
# 中文跨语言
sat_result = sat_model.crossclone(
old_str=base.old_str,
new_str=base.new_str,
input_name=os.path.realpath(input_file_path),
output_name=os.path.realpath(output_file_path),
source_lang="zh",
target_lang="en")
else:
return ErrorRequest(
message="请检查功能选项是否正确,仅支持:synthesize, edit, crossclone")
elif base.language == "en":
if base.function == "synthesize":
output_file_path = os.path.join(SAT_OUT_PATH,
"sat_syn_zh_" + base.filename)
# 英文语音克隆
sat_result = sat_model.en_synthesize_edit(
old_str=base.old_str,
new_str=base.new_str,
input_name=os.path.realpath(input_file_path),
output_name=os.path.realpath(output_file_path),
task_name="synthesize")
elif base.function == "edit":
output_file_path = os.path.join(SAT_OUT_PATH,
"sat_edit_zh_" + base.filename)
# 英文语音编辑
sat_result = sat_model.en_synthesize_edit(
old_str=base.old_str,
new_str=base.new_str,
input_name=os.path.realpath(input_file_path),
output_name=os.path.realpath(output_file_path),
task_name="edit")
elif base.function == "crossclone":
output_file_path = os.path.join(SAT_OUT_PATH,
"sat_cross_zh_" + base.filename)
# 英文跨语言
sat_result = sat_model.crossclone(
old_str=base.old_str,
new_str=base.new_str,
input_name=os.path.realpath(input_file_path),
output_name=os.path.realpath(output_file_path),
source_lang="en",
target_lang="zh")
else:
return ErrorRequest(
message="请检查功能选项是否正确,仅支持:synthesize, edit, crossclone")
else:
return ErrorRequest(message="请检查功能选项是否正确,仅支持中文和英文")
if sat_result:
return SuccessRequest(result=sat_result, message="SAT合成成功")
else:
return ErrorRequest(message="SAT 合成失败,请从后台检查错误信息!")
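The nested `language` / `function` branches above all make the same kind of call with different arguments. A minimal sketch of collapsing them into a lookup table follows. The names `SAT_DISPATCH` and `resolve_sat_call` are hypothetical and not part of this commit, and the `_en_` prefixes for English outputs are a choice made for this sketch.

```python
import os

# Hypothetical lookup table: (language, function) -> (model method name,
# output file prefix, extra keyword arguments). The "_en_" prefixes are an
# assumption of this sketch so that English outputs get their own file names.
SAT_DISPATCH = {
    ("zh", "synthesize"): ("zh_synthesize_edit", "sat_syn_zh_", {"task_name": "synthesize"}),
    ("zh", "edit"): ("zh_synthesize_edit", "sat_edit_zh_", {"task_name": "edit"}),
    ("zh", "crossclone"): ("crossclone", "sat_cross_zh_", {"source_lang": "zh", "target_lang": "en"}),
    ("en", "synthesize"): ("en_synthesize_edit", "sat_syn_en_", {"task_name": "synthesize"}),
    ("en", "edit"): ("en_synthesize_edit", "sat_edit_en_", {"task_name": "edit"}),
    ("en", "crossclone"): ("crossclone", "sat_cross_en_", {"source_lang": "en", "target_lang": "zh"}),
}


def resolve_sat_call(language, function, filename, out_dir):
    """Return (method_name, output_path, extra_kwargs), or None if unsupported."""
    entry = SAT_DISPATCH.get((language, function))
    if entry is None:
        return None
    method_name, prefix, extra = entry
    # Copy the kwargs so callers cannot mutate the shared table.
    return method_name, os.path.join(out_dir, prefix + filename), dict(extra)
```

With such a table the handler could look up the entry once, return the error response when it is `None`, and otherwise call `getattr(sat_model, method_name)(old_str=..., new_str=..., input_name=..., output_name=..., **extra)`.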
# List uploaded SAT audio files
@app.get("/sat/list")
async def SatList():
res = []
filelist = getVCList(SAT_UPLOAD_PATH)
for fileitem in filelist:
if fileitem['name'] in sat_label_dic:
fileitem['label'] = sat_label_dic[fileitem['name']]
else:
fileitem['label'] = ""
res.append(fileitem)
return SuccessRequest(result=res)
# Upload SAT audio files
@app.post("/sat/upload")
async def SATUpload(files: List[UploadFile]):
right = 0
error = 0
error_info = "错误文件:"
for file in files:
try:
if 'blob' in file.filename:
out_file_path = os.path.join(
SAT_UPLOAD_PATH,
datetime.datetime.strftime(datetime.datetime.now(),
'%H%M') + randName(3) + ".wav")
else:
out_file_path = os.path.join(SAT_UPLOAD_PATH, file.filename)
print("上传文件名:", out_file_path)
async with aiofiles.open(out_file_path, 'wb') as out_file:
content = await file.read() # async read
await out_file.write(content) # async write
# Resample the saved file to a 16 kHz wav (librosa.load with sr=16000) and overwrite it
wav, sr = librosa.load(out_file_path, sr=16000)
sf.write(out_file_path, data=wav, samplerate=sr)
right += 1
except Exception as e:
error += 1
error_info = error_info + file.filename + " " + str(e) + "\n"
continue
return SuccessRequest(
result=f"上传成功:{right}, 上传失败:{error}, 失败原因: {error_info}")
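`SATUpload` joins the client-supplied `file.filename` directly onto `SAT_UPLOAD_PATH`, so a name containing path separators could end up outside the upload folder. A minimal sketch of reducing the name to its final component first is shown below. The helper `safe_upload_path` is hypothetical and not part of this commit.

```python
import os


def safe_upload_path(upload_dir, filename):
    """Join upload_dir with only the final path component of filename."""
    # Treat Windows separators like POSIX ones, then drop any directory part.
    name = os.path.basename(filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"unusable upload filename: {filename!r}")
    return os.path.join(upload_dir, name)
```

In the handler this would replace `os.path.join(SAT_UPLOAD_PATH, file.filename)`. A rejected name would be counted as a failed file by the existing `except` branch.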
######################################################################
########################### FineTune service #########################
######################################################################
# List finetune sentences and their recordings
@app.post("/finetune/list")
async def FineTuneList(Path: FTPath):
dataPath = Path.dataPath
if dataPath == "default":
# Default data path
FT_PATH = FT_DEFAULT_PATH
else:
FT_PATH = dataPath
res = []
filelist = getVCList(FT_PATH)
for name, value in ft_label_dic.items():
wav_path = os.path.join(FT_PATH, name)
if not os.path.exists(wav_path):
wav_path = ""
d = {'text': value['text'], 'name': name, 'path': wav_path}
res.append(d)
return SuccessRequest(result=res)
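`FineTuneList` reports an empty `path` for every labelled sentence that has no wav file yet. The same check can be factored into a small helper that the server could also run before starting a finetune job. This is a sketch only; `missing_recordings` is a hypothetical name.

```python
import os


def missing_recordings(label_names, data_dir):
    """Return the labelled file names that have no file in data_dir, in order."""
    return [
        name for name in label_names
        if not os.path.exists(os.path.join(data_dir, name))
    ]
```

Called as `missing_recordings(ft_label_dic.keys(), data_path)`, a non-empty result would mean some sentences still need to be recorded.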
# One-click reset: create a new upload directory and return its path
@app.get('/finetune/newdir')
async def FTGetNewDir():
new_path = os.path.join(FT_UPLOAD_PATH, randName(3))
if not os.path.exists(new_path):
os.makedirs(new_path, exist_ok=True)
# Copy labels.txt into the new directory
cmd = f"cp {FT_LABEL_TXT_PATH} {new_path}"
os.system(cmd)
return SuccessRequest(result=new_path)
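`FTGetNewDir` copies the label file by building a `cp` command string and running it through `os.system`, which depends on a POSIX shell and breaks on paths containing spaces. A minimal sketch of the same step with the standard library follows. The helper name `copy_label_file` is an assumption of this sketch.

```python
import os
import shutil


def copy_label_file(label_txt_path, new_dir):
    """Copy the label file into new_dir and return the destination path."""
    os.makedirs(new_dir, exist_ok=True)
    # shutil.copy receives the paths as arguments, so spaces or shell
    # metacharacters in either path are never interpreted by a shell.
    return shutil.copy(label_txt_path, new_dir)
```

In the handler this would be called as `copy_label_file(FT_LABEL_TXT_PATH, new_path)`.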
# Upload one finetune recording
@app.post("/finetune/upload")
async def FTUpload(base: VcBaseFT):
try:
# Create the target folder if it does not exist
if not os.path.exists(base.wav_path):
os.makedirs(base.wav_path)
# Save the decoded audio file
out_file_path = os.path.join(base.wav_path, base.filename)
wav_b = base64.b64decode(base.wav)
async with aiofiles.open(out_file_path, 'wb') as out_file:
await out_file.write(wav_b) # async write
return SuccessRequest(result="上传成功")
except Exception as e:
print(e)
return ErrorRequest(message="上传失败")
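`FTUpload` passes the `wav` field straight to `base64.b64decode`, which by default silently skips characters outside the base64 alphabet. A minimal sketch of a stricter decoder is shown below. It also accepts a `data:...;base64,` header, which the frontend normally strips before sending. `decode_wav_payload` is a hypothetical helper, not part of this commit.

```python
import base64
import binascii


def decode_wav_payload(wav_b64):
    """Decode a base64 wav payload, tolerating a leading data-URL header."""
    if wav_b64.startswith("data:") and "base64," in wav_b64:
        # Keep only the part after the "data:<mime>;base64," header.
        wav_b64 = wav_b64.split("base64,", 1)[1]
    try:
        # validate=True rejects input containing non-base64 characters.
        return base64.b64decode(wav_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("wav field is not valid base64") from exc
```

The handler could call this in place of `base64.b64decode(base.wav)` and let the existing `except` branch report the failure.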
# Run finetuning on the uploaded recordings
@app.post("/finetune/clone_finetune")
async def FTModel(base: VcBaseFTModel):
# Resolve wav_path and check that the folder exists
if base.wav_path == 'default':
data_path = FT_DEFAULT_PATH
else:
data_path = base.wav_path
if not os.path.exists(data_path):
return ErrorRequest(message="数据文件夹不存在")
data_base = data_path.split(os.sep)[-1]
exp_dir = os.path.join(FT_EXP_BASE_PATH, data_base)
try:
exp_dir = ft_model.finetune(
input_dir=os.path.realpath(data_path),
exp_dir=os.path.realpath(exp_dir))
if exp_dir:
return SuccessRequest(result=exp_dir)
else:
return ErrorRequest(message="微调失败")
except Exception as e:
print(e)
return ErrorRequest(message="微调失败")
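`FTModel` derives the experiment folder name with `data_path.split(os.sep)[-1]`, which yields an empty string when the path ends with a separator, so `exp_dir` would then be the experiment base folder itself. A minimal sketch that normalises the path first follows. `experiment_dir` is a hypothetical helper name.

```python
import os


def experiment_dir(exp_base, data_path):
    """Build the experiment directory from the last component of data_path."""
    # normpath removes trailing separators, so basename is never empty
    # for a non-root path.
    data_base = os.path.basename(os.path.normpath(data_path))
    return os.path.join(exp_base, data_base)
```

In the handler this would be called as `experiment_dir(FT_EXP_BASE_PATH, data_path)`.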
# Synthesize speech with a finetuned model
@app.post("/finetune/clone_finetune_syn")
async def FTSyn(base: VcBaseFTSyn):
try:
if not os.path.exists(base.exp_path):
return ErrorRequest(message="模型路径不存在")
wav_name = randName(5)
wav_path = ft_model.synthesize(
text=base.text,
wav_name=wav_name,
out_wav_dir=os.path.realpath(FT_OUT_PATH),
exp_dir=os.path.realpath(base.exp_path))
if wav_path:
res = {"wavName": wav_name + ".wav", "wavPath": wav_path}
return SuccessRequest(result=res)
else:
return ErrorRequest(message="音频合成失败")
except Exception as e:
print(e)
return ErrorRequest(message="音频合成失败")
if __name__ == '__main__':
uvicorn.run(app=app, host='0.0.0.0', port=port)

@ -8,6 +8,7 @@
"preview": "vite preview" "preview": "vite preview"
}, },
"dependencies": { "dependencies": {
"@element-plus/icons-vue": "^2.0.9",
"ant-design-vue": "^2.2.8", "ant-design-vue": "^2.2.8",
"axios": "^0.26.1", "axios": "^0.26.1",
"element-plus": "^2.1.9", "element-plus": "^2.1.9",
@ -18,6 +19,7 @@
}, },
"devDependencies": { "devDependencies": {
"@vitejs/plugin-vue": "^2.3.0", "@vitejs/plugin-vue": "^2.3.0",
"vite": "^2.9.0" "vite": "^2.9.13",
"@vue/compiler-sfc": "^3.1.0"
} }
} }

@ -19,6 +19,26 @@ export const apiURL = {
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket 接口 CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket 接口
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // Stream ASR 接口 ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // Stream ASR 接口
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS 接口 TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS 接口
// Voice Clone / ERNIE-SAT / FineTune
VC_List: '/api/vc/list',
SAT_List: '/api/sat/list',
FineTune_List: '/api/finetune/list',
VC_Upload: '/api/vc/upload',
SAT_Upload: '/api/sat/upload',
FineTune_Upload: '/api/finetune/upload',
FineTune_NewDir: '/api/finetune/newdir',
VC_Download: '/api/vc/download',
VC_Download_Base64: '/api/vc/download_base64',
VC_Del: '/api/vc/del',
VC_CloneG2p: '/api/vc/clone_g2p',
VC_CloneSAT: '/api/vc/clone_sat',
VC_CloneFineTune: '/api/finetune/clone_finetune',
VC_CloneFineTuneSyn: '/api/finetune/clone_finetune_syn',
} }

@ -0,0 +1,88 @@
import axios from 'axios'
import {apiURL} from "./API.js"
// 上传音频-vc
export async function vcUpload(params){
const result = await axios.post(apiURL.VC_Upload, params);
return result
}
// 上传音频-sat
export async function satUpload(params){
const result = await axios.post(apiURL.SAT_Upload, params);
return result
}
// 上传音频-finetune
export async function fineTuneUpload(params){
const result = await axios.post(apiURL.FineTune_Upload, params);
return result
}
// 删除音频
export async function vcDel(params){
const result = await axios.post(apiURL.VC_Del, params);
return result
}
// 获取音频列表vc
export async function vcList(){
const result = await axios.get(apiURL.VC_List);
return result
}
// 获取音频列表Sat
export async function satList(){
const result = await axios.get(apiURL.SAT_List);
return result
}
// 获取音频列表fineTune
export async function fineTuneList(params){
const result = await axios.post(apiURL.FineTune_List, params);
return result
}
// fineTune 一键重置 获取新的文件夹
export async function fineTuneNewDir(){
const result = await axios.get(apiURL.FineTune_NewDir);
return result
}
// 获取音频数据
export async function vcDownload(params){
const result = await axios.post(apiURL.VC_Download, params);
return result
}
// 获取音频数据Base64
export async function vcDownloadBase64(params){
const result = await axios.post(apiURL.VC_Download_Base64, params);
return result
}
// 克隆合成G2P
export async function vcCloneG2P(params){
const result = await axios.post(apiURL.VC_CloneG2p, params);
return result
}
// 克隆合成SAT
export async function vcCloneSAT(params){
const result = await axios.post(apiURL.VC_CloneSAT, params);
return result
}
// 克隆合成 - finetune 微调
export async function vcCloneFineTune(params){
const result = await axios.post(apiURL.VC_CloneFineTune, params);
return result
}
// 克隆合成 - finetune 合成
export async function vcCloneFineTuneSyn(params){
const result = await axios.post(apiURL.VC_CloneFineTuneSyn, params);
return result
}

@ -4,7 +4,7 @@
飞桨-PaddleSpeech 飞桨-PaddleSpeech
</div> </div>
<div className="speech_header_describe"> <div className="speech_header_describe">
PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库用于语音和音频中的各种关键任务的开发欢迎大家Star收藏鼓励 PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库用于语音和音频中的各种关键任务的开发支持语音识别语音合成声纹识别声音分类语音唤醒语音翻译等多种语音任务荣获 NAACL2022 Best Demo Award 如果你喜欢这个示例欢迎在 github star 收藏鼓励
</div> </div>
<div className="speech_header_link_box"> <div className="speech_header_link_box">
<a href="https://github.com/PaddlePaddle/PaddleSpeech" className="speech_header_link" target='_blank' rel='noreferrer' key={index}> <a href="https://github.com/PaddlePaddle/PaddleSpeech" className="speech_header_link" target='_blank' rel='noreferrer' key={index}>

@ -43,6 +43,7 @@
margin-bottom: 40px; margin-bottom: 40px;
display: flex; display: flex;
align-items: center; align-items: center;
margin-top: 40px;
}; };
.speech_header_link { .speech_header_link {
display: block; display: block;

@ -6,6 +6,10 @@ import TTST from './SubMenu/TTS/TTST.vue'
import VPRT from './SubMenu/VPR/VPRT.vue' import VPRT from './SubMenu/VPR/VPRT.vue'
import IET from './SubMenu/IE/IET.vue' import IET from './SubMenu/IE/IET.vue'
import VoiceCloneT from './SubMenu/VoiceClone/VoiceClone.vue'
import ENIRE_SATT from './SubMenu/ENIRE_SAT/ENIRE_SAT.vue'
import FineTuneT from './SubMenu/FineTune/FineTune.vue'
</script> </script>
<template> <template>
@ -37,6 +41,15 @@ import IET from './SubMenu/IE/IET.vue'
<el-tab-pane label="语音指令" key="5"> <el-tab-pane label="语音指令" key="5">
<IET></IET> <IET></IET>
</el-tab-pane> </el-tab-pane>
<el-tab-pane label="一句话合成" key="6">
<VoiceCloneT></VoiceCloneT>
</el-tab-pane>
<el-tab-pane label="小数据微调" key="7">
<FineTuneT></FineTuneT>
</el-tab-pane>
<el-tab-pane label="ERNIE SAT" key="8">
<ENIRE_SATT></ENIRE_SATT>
</el-tab-pane>
</el-tabs> </el-tabs>
</div> </div>
</div> </div>

@ -0,0 +1,487 @@
<template>
<div class="sat">
<el-row :gutter="20">
<el-col :span="12"><div class="grid-content ep-bg-purple" />
<el-row :gutter="60" class="btn_row_wav" justify="center">
<el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary">开始录音</el-button>
<el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger">停止录音</el-button>
<el-button class="ml-3" v-else @click="uploadRecord()" type="success">上传录音</el-button>
<a>&#12288</a>
<el-upload
:multiple="false"
:accept="'.wav'"
:auto-upload="false"
:on-change="handleChange"
:show-file-list="false"
>
<el-button class="ml-3" type="success">上传音频文件</el-button>
</el-upload>
</el-row>
<div class="recording_table">
<el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
<!-- <el-table-column prop="wavId" label="序号" width="60"/> -->
<el-table-column prop="wavName" label="文件名" width="150"/>
<el-table-column label="文本">
<template #default="scope">
<el-input
v-model="scope.row.label"
:autosize="{ minRows: 8, maxRows: 13 }"
placeholder="Please input"
/>
</template>
</el-table-column>
<el-table-column label="操作" width="80">
<template #default="scope">
<div class="flex justify-space-between mb-4 flex-wrap gap-4">
<a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
<a>&#12288</a>
<a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
</div>
</template>
</el-table-column>
<el-table-column fixed="right" label="选择" width="70">
<template #default="scope">
<el-switch v-model="scope.row.status" @click="choseWav(scope.row.wavId)"/>
</template>
</el-table-column>
</el-table>
</div>
</el-col>
<el-col :span="8"><div class="grid-content ep-bg-purple" />
<el-space direction="vertical">
<el-card class="box-card" style="width: 250px; height:310px">
<template #header>
<div class="card-header">
<span>功能选择</span>
</div>
</template>
<el-radio-group v-model="funcMode">
<el-radio label="1" size="middle" border style="margin-bottom: 10px">个性化语音合成</el-radio>
<el-input
v-if="funcMode === '1'"
v-model="ttsText"
:autosize="{ minRows: 2, maxRows: 2 }"
type="textarea"
placeholder="Please input"
style="margin-bottom: 10px"
/>
<el-radio label="2" size="middle" border style="margin-bottom: 10px">跨语言语音合成</el-radio>
<el-input
v-if="funcMode === '2'"
v-model="ttsText"
:autosize="{ minRows: 2, maxRows: 2 }"
type="textarea"
placeholder="Please input"
style="margin-bottom: 10px"
/>
<el-radio label="3" size="middle" border style="margin-bottom: 10px">语音编辑</el-radio>
<el-input
v-if="funcMode === '3'"
v-model="ttsText"
:autosize="{ minRows: 2, maxRows: 2 }"
type="textarea"
placeholder="Please input"
style="margin-bottom: 10px"
/>
</el-radio-group>
</el-card>
</el-space>
</el-col>
<el-col :span="4"><div class="grid-content ep-bg-purple" />
<div class="play_board">
<el-space direction="vertical">
<el-row :gutter="20">
<el-button size="large" v-if="onSyn === 0" type="primary" @click="SatSyn()">合成</el-button>
<el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
</el-row>
<el-row :gutter="20">
<el-button :disabled="!cloneWav" type="success" @click="PlaySyn()">播放</el-button>
<el-button :disabled="!cloneWav" type="primary" @click="downLoadCloneWav()">下载</el-button>
</el-row>
</el-space>
</div>
</el-col>
</el-row>
</div>
</template>
<script>
import { vcCloneSAT, vcDownload, vcDownloadBase64, satUpload, satList, vcDel } from '../../../api/ApiVC'
import Recorder from 'js-audio-recorder'
let audioCtx = new AudioContext({
latencyHint: 'interactive',
sampleRate: 24000,
});
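// Recorder used for in-browser audio capture
const recorder = new Recorder({
sampleBits: 16, // sample bit depth: 8 or 16 (default 16)
sampleRate: 16000, // sample rate: 11025, 16000, 22050, 24000, 44100 or 48000 (Chrome defaults to 48000)
numChannels: 1, // channels: 1 or 2 (default 1)
compiling: true
})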
export default {
name:"",
data(){
return {
uploadStatus : 0,
recognitionStatus : 0,
asrResult : "",
indicator : "",
filename: "",
upfile: "",
mode: 1,
language: 1,
wav_input: "卡尔普陪外孙玩滑梯",
new_input: "卡尔普陪外孙打滑梯",
received_file:"",
// recording and synthesis state
onEnrollRec: 0,
onSyn:0,
vcDatas: [],
funcMode: '1',
selected_Id: -1,
ttsText: '',
cloneWav: '',
wav:''
}
},
mounted () {
this.GetList()
},
methods:{
// Fetch the list of uploaded audio files and their transcripts
async GetList(){
this.vcDatas =[]
const result = await satList();
console.log("List: ", result);
for(let i=0; i < result.data.result.length; i++){
this.vcDatas.push({
wavName: result.data.result[i]['name'],
wavId: i,
wavPath: result.data.result[i]['path'],
status: false,
label: result.data.result[i]['label']
})
}
console.log("vcDatas: ", this.vcDatas);
this.$nextTick(()=>{})
},
// Upload the file picked in the el-upload control, then refresh the list
async handleChange(file, fileList){
await this.uploadFile(file)
this.GetList()
},
async uploadFile(file){
let formData = new FormData();
formData.append('files', file.raw);
const result = await satUpload(formData);
if (result.data.code === 0) {
this.$message.success("音频上传成功")
} else {
this.$message.error("音频上传失败")
}
},
// Start recording
startRecorderEnroll(){
this.onEnrollRec = 1
recorder.clear()
recorder.start()
},
// Stop recording and keep the result as a WAV blob
stopRecorderEnroll(){
this.onEnrollRec = 2
recorder.stop()
this.wav = recorder.getWAVBlob()
},
// Upload the recording
async uploadRecord(){
this.onEnrollRec = 0
if(this.wav === ''){
this.$message.error("未检测到录音,录音失败,请重新录制")
return
}
let formData = new FormData();
formData.append('files', this.wav);
const result = await satUpload(formData);
console.log(result)
this.GetList()
this.$message.success("录音上传成功")
},
// Delete an uploaded audio file
async delWav(wavId){
console.log('wavId', wavId)
const result = await vcDel(
{
wavName: this.vcDatas[wavId]['wavName'],
wavPath: this.vcDatas[wavId]['wavPath']
}
);
if(!result.data.code){
this.$message.success("删除成功")
} else {
this.$message.error(result.data.message)
}
// Row indices change after a delete, so clear the current selection
this.selected_Id = -1
this.cloneWav = ''
this.GetList()
},
// Play the audio of a table row
async PlayTable(wavId){
this.Play(this.vcDatas[wavId])
},
// Fetch an audio file from the server and play it
async Play(wavBase){
const result = await vcDownloadBase64(wavBase);
if (result.data.code === 0) {
// Decode the base64 payload into bytes
let typedArray = this.base64ToUint8Array(result.data.result)
// Wrap the decoded samples in a WAV header (16 kHz, mono, 16-bit)
let view = new DataView(typedArray.buffer);
view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
// Play the result
this.playAudioData(view.buffer);
}
},
// chose wav
choseWav(wavId){
this.cloneWav = ''
this.nowFile = this.vcDatas[wavId].wavName
this.nowIndex = wavId
// only wavId is true else false
for(let i=0; i<this.vcDatas.length; i++){
if(i==wavId){
this.vcDatas[wavId].status = true
this.selected_Id = wavId
this.ttsText = this.vcDatas[wavId]['label']
} else {
this.vcDatas[i].status = false
}
}
this.$nextTick(()=>{})
},
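// Decode a wav ArrayBuffer and play it through the shared AudioContext
playAudioData(wav_buffer){
audioCtx.decodeAudioData(wav_buffer, buffer => {
let source = audioCtx.createBufferSource();
source.buffer = buffer
source.connect(audioCtx.destination);
source.start();
}, function (e) {
console.error("decodeAudioData failed:", e)
});
},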
base64ToUint8Array(base64String){
const padding = '='.repeat((4 - base64String.length % 4) % 4);
const base64 = (base64String + padding)
.replace(/-/g, '+')
.replace(/_/g, '/');
const rawData = window.atob(base64);
const outputArray = new Uint8Array(rawData.length);
for (let i = 0; i < rawData.length; ++i) {
outputArray[i] = rawData.charCodeAt(i);
}
return outputArray;
},
// Return true if the string contains at least one CJK ideograph (U+4E00..U+9FA5)
hasChinese(str) {
return /[\u4E00-\u9FA5]+/g.test(str)
},
// Run ERNIE-SAT synthesis with the selected audio and mode
async SatSyn(){
// A reference audio must be selected
if(this.selected_Id < 0){
return this.$message.error("请先选择音频文件!")
}
// The reference audio needs a transcript
if(!this.vcDatas[this.selected_Id]['label']){
return this.$message.error("音频对应文本不可以为空!")
}
// The target text must not be empty
if(!this.ttsText){
return this.$message.error("合成文本不可以为空!")
}
// Mark synthesis as running
this.onSyn = 1
// Clear the previous result
this.cloneWav = ""
const old_str = this.vcDatas[this.selected_Id]['label']
const new_str = this.ttsText
let language = ""
// Infer the language from the reference transcript
if(this.hasChinese(old_str)){
language = "zh"
} else{
language = "en"
}
// Map the selected mode to the backend function name
let func = ""
if(this.funcMode === '1') {
func = "synthesize"
} else if(this.funcMode === '2'){
func = "crossclone"
} else {
func = "edit"
}
let wav_path = this.vcDatas[this.selected_Id]['wavPath']
let filename = this.vcDatas[this.selected_Id]['wavName']
const data = {
old_str: old_str,
new_str: new_str,
language: language,
function: func,
wav: wav_path,
filename: filename
}
console.log("sat data: ", data)
// Call the SAT synthesis endpoint
const result = await vcCloneSAT(data)
// Synthesis finished, re-enable the button
this.onSyn = 0
console.log(result);
if (result.data.code === 0) {
this.$message.success(result.data.message)
// Keep the path of the synthesized wav returned by the server
this.cloneWav = result.data.result
console.log("cloneWave", this.cloneWav);
} else {
this.$message.error(result.data.message)
};
},
// Play the synthesized audio
async PlaySyn(){
this.Play({
wavName: "sat_"+this.filename,
wavPath: this.cloneWav
})
},
// Download the synthesized audio as a wav file
async downLoadCloneWav(){
if(this.cloneWav === ""){
this.$message.error("音频合成完毕后再下载!")
return
}
const data = {
wavName: "sat_"+this.filename,
wavPath: this.cloneWav
}
const result = await vcDownloadBase64(data);
if (result.data.code !== 0) {
// Without audio data there is nothing to build a file from
this.$message.error("获取音频文件失败")
return
}
// Decode the base64 payload and wrap it in a WAV header (16 kHz, mono, 16-bit)
const typedArray = this.base64ToUint8Array(result.data.result)
const view = Recorder.encodeWAV(new DataView(typedArray.buffer), 16000, 16000, 1, 16, true);
const blob = new Blob([view.buffer], { type: 'audio/wav' });
const fileName = new Date().getTime() + '.wav';
// Trigger the download through a temporary hidden link
const down = document.createElement('a');
down.download = fileName;
down.style.display = 'none';
down.href = URL.createObjectURL(blob);
document.body.appendChild(down);
down.click();
URL.revokeObjectURL(down.href); // release the object URL
document.body.removeChild(down); // remove the temporary link
},
}
}
</script>
<style lang="less" scoped>
// @import "./style.less";
.sat {
width: 1200px;
height: 410px;
background: #FFFFFF;
padding: 5px 80px 56px 80px;
box-sizing: border-box;
}
.el-row {
margin-bottom: 20px;
}
.grid-content {
border-radius: 4px;
min-height: 36px;
}
.play_board{
height: 100%;
display: flex;
align-items: center;
}
</style>
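The component chooses the `language` field by testing whether the reference transcript contains a Chinese character (`hasChinese`), and everything else, including empty text, is sent as `"en"`. A server-side check could mirror that rule to validate the field. This is a sketch under that assumption, and `detect_sat_language` is a hypothetical name.

```python
import re

# Same range the component tests for: CJK ideographs U+4E00..U+9FA5.
_CJK = re.compile(r"[\u4e00-\u9fa5]")


def detect_sat_language(reference_text):
    """Return "zh" if the text contains a CJK ideograph, otherwise "en"."""
    return "zh" if _CJK.search(reference_text) else "en"
```

Because any text without CJK ideographs maps to `"en"`, transcripts in other languages would also be routed to the English branch.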

@ -0,0 +1,427 @@
<template>
<div class="finetune">
<el-row :gutter="20">
<el-col :span="12"><div class="grid-content ep-bg-purple" />
<el-row :gutter="60" class="btn_row_wav" justify="center">
<el-button class="ml-3" @click="clearAll()" type="primary">一键重置</el-button>
<el-button class="ml-3" @click="resetDefault()" type="primary">默认示例</el-button>
<el-button v-if='onFinetune === 0' class="ml-3" @click="fineTuneModel()" type="primary">一键微调</el-button>
<el-button v-else-if='onFinetune === 1' class="ml-3" @click="fineTuneModel()" type="danger">微调中</el-button>
<el-button v-else-if='onFinetune === 2' class="ml-3" @click="resetFinetuneBtn()" type="success">微调成功</el-button>
<el-button v-else class="ml-3" @click="resetFinetuneBtn()" type="danger">微调失败</el-button>
<!-- <el-button class="ml-3" @click="chooseHistory()" type="warning">历史数据选择</el-button> -->
</el-row>
<div class="recording_table">
<el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
<el-table-column prop="wavId" label="序号" width="60"/>
<el-table-column prop="text" label="文本" />
<el-table-column label="音频" width="80">
<template #default="scope">
<a v-if="scope.row.wavPath != ''">{{ scope.row.wavName }}</a>
<a v-else>
<el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary" circle>
<el-icon><Microphone /></el-icon>
</el-button>
<el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger" circle>
<el-icon><Microphone /></el-icon>
</el-button>
<el-button class="ml-3" v-else @click="uploadRecord(scope.row.wavId)" type="success" circle>
<el-icon><Upload /></el-icon>
</el-button>
</a>
</template>
</el-table-column>
<el-table-column label="操作" width="80" fixed="right">
<template #default="scope">
<div class="flex justify-space-between mb-4 flex-wrap gap-4">
<a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
<a>&#12288</a>
<a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
</div>
</template>
</el-table-column>
</el-table>
</div>
</el-col>
<el-col :span="8"><div class="grid-content ep-bg-purple" />
<el-space direction="vertical">
<el-card class="box-card" style="width: 250px; height:310px">
<template #header>
<div class="card-header">
<span>试验路径</span>
<el-input
v-model="expPath"
:autosize="{ minRows: 2, maxRows: 3 }"
type="textarea"
placeholder="一键微调自动生成,可使用历史试验路径"
/>
</div>
</template>
<span>请输入中文文本</span>
<el-input
v-model="ttsText"
:autosize="{ minRows: 5, maxRows: 6 }"
type="textarea"
placeholder="请输入待合成文本"
/>
</el-card>
</el-space>
</el-col>
<el-col :span="4"><div class="grid-content ep-bg-purple" />
<div class="play_board">
<el-space direction="vertical">
<el-row :gutter="20">
<el-button size="large" v-if="onSyn === 0" type="primary" @click="fineTuneSyn()">合成</el-button>
<el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
</el-row>
<el-row :gutter="20">
<el-button :disabled="!cloneWav" type="primary" @click="PlaySyn()">播放</el-button>
<el-button :disabled="!cloneWav" type="primary" @click="downLoadCloneWav()">下载</el-button>
</el-row>
</el-space>
</div>
</el-col>
</el-row>
</div>
</template>
<script>
import Recorder from 'js-audio-recorder'
import { vcDownload, vcDownloadBase64, vcCloneFineTune, vcCloneFineTuneSyn, fineTuneList, vcDel, fineTuneUpload, fineTuneNewDir } from '../../../api/ApiVC';
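// Recorder used for in-browser audio capture
const recorder = new Recorder({
sampleBits: 16, // sample bit depth: 8 or 16 (default 16)
sampleRate: 16000, // sample rate: 11025, 16000, 22050, 24000, 44100 or 48000 (Chrome defaults to 48000)
numChannels: 1, // channels: 1 or 2 (default 1)
compiling: true
})
// Audio context used for playback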
const audioCtx = new AudioContext({
latencyHint: 'interactive',
sampleRate: 16000,
});
function blobToDataURL(blob, callback) {
let a = new FileReader();
a.onload = function (e) { callback(e.target.result); }
a.readAsDataURL(blob);
}
export default {
data(){
return {
vcDatas:[],
defaultDataPath: 'default',
nowDataPath: '',
expPath: '',
wav: '',
wav_base64: '',
ttsText: '',
cloneWav: '',
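onEnrollRec: 0, // recording state: 0 idle, 1 recording, 2 recorded and ready to upload
onFinetune: 0, // finetune state: 0 idle, 1 running, 2 succeeded, 3 failed
onSyn: 0, // synthesis state: 0 idle, 1 running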
}
},
mounted () {
this.nowDataPath = this.defaultDataPath
this.GetList()
},
methods: {
// Reset the finetune button to its idle state
resetFinetuneBtn(){
this.onFinetune = 0
},
// One-click reset: request a fresh data directory and reload the list
async clearAll(){
this.vcDatas = []
const result = await fineTuneNewDir()
console.log("clearALL: ", result.data.result);
this.nowDataPath = result.data.result
this.expPath = ''
this.onFinetune = 0
await this.GetList()
},
// Switch back to the bundled default example
async resetDefault(){
this.nowDataPath = this.defaultDataPath
await this.GetList()
this.expPath = ''
},
// Start recording
startRecorderEnroll(){
this.onEnrollRec = 1
recorder.clear()
recorder.start()
},
// Stop recording and keep the result as a WAV blob
stopRecorderEnroll(){
this.onEnrollRec = 2
recorder.stop()
this.wav = recorder.getWAVBlob()
},
// Upload the recording for the given table row
async uploadRecord(wavId){
this.onEnrollRec = 0
if(this.wav === ''){
this.$message.error("未检测到录音,录音失败,请重新录制")
return
}
// Read the blob as a data URL and strip the "data:...;base64," header
const fileRes = await this.readFile(this.wav);
const fileString = fileRes.result;
const audioBase64type = (fileString.match(/data:[^;]*;base64,/))?.[0] ?? '';
const uploadBase64 = fileString.slice(audioBase64type.length);
const data = {
'wav': uploadBase64,
'filename': this.vcDatas[wavId]['wavName'],
'wav_path': this.nowDataPath
}
const result = await fineTuneUpload(data);
console.log(result)
this.GetList()
this.$message.success("录音上传成功")
},
// Read a Blob as a data URL via FileReader
readFile(file) {
return new Promise((resolve, reject) => {
const fileReader = new FileReader();
fileReader.onload = function () {
resolve(fileReader);
};
fileReader.onerror = function (err) {
reject(err);
};
fileReader.readAsDataURL(file);
});
},
// Fetch the sentence list and recording status for the current data path
async GetList(){
this.vcDatas = []
const result = await fineTuneList({
dataPath: this.nowDataPath
});
console.log(result, result.data.result);
for(let i=0; i<result.data.result.length; i++){
this.vcDatas.push({
wavId: i,
text: result.data.result[i]['text'],
wavName: result.data.result[i]['name'],
wavPath: result.data.result[i]['path'],
})
}
this.$nextTick(()=>{})
},
// Decode a wav ArrayBuffer and play it
playAudioData( wav_buffer ) {
audioCtx.decodeAudioData(wav_buffer, buffer => {
var source = audioCtx.createBufferSource();
source.buffer = buffer;
source.connect(audioCtx.destination);
source.start();
}, function(e) {
Recorder.throwError(e);
})
},
// Convert a base64 (or base64url) string to a Uint8Array
base64ToUint8Array(base64String) {
const padding = '='.repeat((4 - base64String.length % 4) % 4);
const base64 = (base64String + padding)
.replace(/-/g, '+')
.replace(/_/g, '/');
const rawData = window.atob(base64);
const outputArray = new Uint8Array(rawData.length);
for (let i = 0; i < rawData.length; ++i) {
outputArray[i] = rawData.charCodeAt(i);
}
return outputArray;
},
// Play the recording of a table row
async PlayTable(wavId){
this.Play(this.vcDatas[wavId])
},
// Play the synthesized audio
async PlaySyn(){
if(this.cloneWav === ""){
this.$message.error("请合成音频后再播放!!")
return
} else {
this.Play(this.cloneWav)
}
},
// Fetch an audio file from the server and play it
async Play(wavBase){
const result = await vcDownloadBase64(wavBase);
if (result.data.code === 0) {
// Decode the base64 payload into bytes
let typedArray = this.base64ToUint8Array(result.data.result)
// Wrap the decoded samples in a WAV header (16 kHz, mono, 16-bit)
let view = new DataView(typedArray.buffer);
view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
// Play the result
this.playAudioData(view.buffer);
} else {
this.$message.error("获取音频文件失败")
}
},
// Download the synthesized audio as a wav file
async downLoadCloneWav(){
if(this.cloneWav === ""){
this.$message.error("音频合成完毕后再下载!")
return
}
const result = await vcDownloadBase64(this.cloneWav);
if (result.data.code !== 0) {
// Without audio data there is nothing to build a file from
this.$message.error("获取音频文件失败")
return
}
// Decode the base64 payload and wrap it in a WAV header (16 kHz, mono, 16-bit)
const typedArray = this.base64ToUint8Array(result.data.result)
const view = Recorder.encodeWAV(new DataView(typedArray.buffer), 16000, 16000, 1, 16, true);
const blob = new Blob([view.buffer], { type: 'audio/wav' });
const fileName = new Date().getTime() + '.wav';
// Trigger the download through a temporary hidden link
const down = document.createElement('a');
down.download = fileName;
down.style.display = 'none';
down.href = URL.createObjectURL(blob);
document.body.appendChild(down);
down.click();
URL.revokeObjectURL(down.href); // release the object URL
document.body.removeChild(down); // remove the temporary link
},
// Delete a recording (the bundled default recordings cannot be deleted)
async delWav(wavId){
if(this.nowDataPath === this.defaultDataPath){
this.$message.error("默认音频不允许删除,可以一键重置,重新录音")
return
}
console.log('wavId', wavId)
//
const result = await vcDel(
{
wavName: this.vcDatas[wavId]['wavName'],
wavPath: this.vcDatas[wavId]['wavPath']
}
);
if(!result.data.code){
this.$message.success("删除成功")
this.GetList()
} else {
this.$message.error("文件删除失败")
}
},
// Finetune the model on the recordings in the current data path
async fineTuneModel(){
// Every sentence needs a recording before finetuning can start
for(let i=0; i < this.vcDatas.length; i++){
if(this.vcDatas[i]['wavPath'] === ''){
return this.$message.error("还有录音未完成,请先完成录音!")
}
}
this.onFinetune = 1
const result = await vcCloneFineTune(
{
wav_path: this.nowDataPath,
}
);
if(!result.data.code){
this.onFinetune = 2
this.expPath = result.data.result
console.log("this.expPath: ", this.expPath)
this.$message.success("小数据微调成功")
} else {
this.onFinetune = 3
this.$message.error(result.data.message)
}
},
// Synthesize speech with the finetuned model
async fineTuneSyn(){
if(!this.expPath){
return this.$message.error("请先微调生成模型后再生成!")
}
// Mark synthesis as running
this.onSyn = 1
const result = await vcCloneFineTuneSyn(
{
exp_path: this.expPath,
text: this.ttsText
}
);
this.onSyn = 0
if(!result.data.code){
this.cloneWav = result.data.result
console.log("clone wav: ", this.cloneWav)
this.$message.success("音色克隆成功")
} else {
this.$message.error(result.data.message)
}
this.$nextTick(()=>{})
}
},
};
</script>
<style lang="less" scoped>
// @import "./style.less";
.finetune {
width: 1200px;
height: 410px;
background: #FFFFFF;
padding: 5px 80px 56px 80px;
box-sizing: border-box;
}
.el-row {
margin-bottom: 20px;
}
.grid-content {
border-radius: 4px;
min-height: 36px;
}
.play_board{
height: 100%;
display: flex;
align-items: center;
}
</style>

@ -0,0 +1,379 @@
<template>
<div class="voiceclone">
<el-row :gutter="20">
<el-col :span="12"><div class="grid-content ep-bg-purple" />
<el-row :gutter="60" class="btn_row_wav" justify="center">
<el-button class="ml-3" v-if="onEnrollRec === 0" @click="startRecorderEnroll()" type="primary">开始录音</el-button>
<el-button class="ml-3" v-else-if="onEnrollRec === 1" @click="stopRecorderEnroll()" type="danger">停止录音</el-button>
<el-button class="ml-3" v-else @click="uploadRecord()" type="success">上传录音</el-button>
<a>&#12288</a>
<el-upload
:multiple="false"
:accept="'.wav'"
:auto-upload="false"
:on-change="handleChange"
:show-file-list="false"
>
<el-button class="ml-3" type="success">上传音频文件</el-button>
</el-upload>
</el-row>
<div class="recording_table">
<el-table :data="vcDatas" border class="recording_table_box" scrollbar-always-on max-height="250px">
<el-table-column prop="wavId" label="序号" width="60"/>
<el-table-column prop="wavName" label="文件名" />
<el-table-column label="操作" width="80">
<template #default="scope">
<div class="flex justify-space-between mb-4 flex-wrap gap-4">
<a @click="PlayTable(scope.row.wavId)"><el-icon><VideoPlay /></el-icon></a>
<a>&#12288</a>
<a @click="delWav(scope.row.wavId)"><el-icon><DeleteFilled /></el-icon></a>
</div>
</template>
</el-table-column>
<el-table-column fixed="right" label="选择" width="70">
<template #default="scope">
<el-switch v-model="scope.row.status" @click="choseWav(scope.row.wavId)"/>
</template>
</el-table-column>
</el-table>
</div>
</el-col>
<el-col :span="8"><div class="grid-content ep-bg-purple" />
<el-space direction="vertical">
<el-card class="box-card" style="width: 250px; height:310px">
<template #header>
<div class="card-header">
<span>请输入中文文本</span>
</div>
</template>
<div class="mb-2 flex items-center text-sm">
<el-radio-group v-model="func_radio" class="ml-4">
<el-radio label="1" size="large">GE2E</el-radio>
<el-radio label="2" size="large">ECAPA-TDNN</el-radio>
</el-radio-group>
</div>
<el-input
v-model="ttsText"
:autosize="{ minRows: 8, maxRows: 13 }"
type="textarea"
placeholder="Please input"
/>
</el-card>
</el-space>
</el-col>
<el-col :span="4"><div class="grid-content ep-bg-purple" />
<div class="play_board">
<el-space direction="vertical">
<el-row :gutter="20">
<el-button size="large" v-if="g2pOnSys === 0" type="primary" @click="g2pClone()">合成</el-button>
<el-button size="large" v-else :loading-icon="Eleme" type="danger">合成中</el-button>
</el-row>
<el-row :gutter="20">
<el-button v-if='this.cloneWav' type="primary" @click="PlaySyn()"></el-button>
<el-button v-else disabled type="primary" @click="PlaySyn()"></el-button>
<el-button v-if='this.cloneWav' type="primary" @click="downLoadCloneWav()"></el-button>
<el-button v-else disabled type="primary" @click="downLoadCloneWav()"></el-button>
</el-row>
</el-space>
</div>
</el-col>
</el-row>
</div>
</template>
<script>
import Recorder from 'js-audio-recorder'
import { vcCloneG2P, vcCloneSAT, vcDel, vcUpload, vcList, vcDownload, vcDownloadBase64 } from '../../../api/ApiVC';
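// Recorder configuration
const recorder = new Recorder({
  sampleBits: 16, // sample bit depth: 8 or 16, default 16
  sampleRate: 16000, // sample rate: 11025, 16000, 22050, 24000, 44100 or 48000; defaults to the browser's rate (48000 in Chrome)
  numChannels: 1, // channels: 1 or 2, default 1
  compiling: true // convert while recording
})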
// Audio context used to play back decoded audio
const audioCtx = new AudioContext({
latencyHint: 'interactive',
sampleRate: 16000,
});
export default {
data(){
return {
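onEnrollRec: 0, // recording state: 0 = idle, 1 = recording, 2 = recorded
wav: '', // most recent recording as a WAV Blob
vcDatas: [], // uploaded audio files shown in the table
nowFile: "", // name of the currently selected file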
ttsText: "欢迎使用飞桨语音套件",
nowIndex: -1,
cloneWav: "",
g2pOnSys: 0,
func_radio: '1',
}
},
mounted () {
this.GetList()
},
methods:{
// Reset the component state
reset(){
this.onEnrollRec = 0
this.wav = ''
this.vcDatas = []
this.nowFile = ""
this.ttsText = "欢迎使用飞桨语音套件"
this.nowIndex = -1
},
// Start recording
startRecorderEnroll(){
this.onEnrollRec = 1
recorder.clear()
recorder.start()
},
// Stop recording and keep the result as a WAV Blob
stopRecorderEnroll(){
this.onEnrollRec = 2
recorder.stop()
this.wav = recorder.getWAVBlob()
},
// Select one audio file; every other row is deselected
choseWav(wavId){
  this.cloneWav = ''
  this.nowFile = this.vcDatas[wavId].wavName
  this.nowIndex = wavId
  // keep the switch on only for the selected row
  for(let i=0; i<this.vcDatas.length; i++){
    this.vcDatas[i].status = (i === wavId)
  }
},
// Upload the recorded audio to the server
async uploadRecord(){
  this.onEnrollRec = 0
  if(this.wav === ""){
    this.$message.error("No recording detected; please record again")
    return
  }
  let formData = new FormData();
  formData.append('files', this.wav);
  const result = await vcUpload(formData);
  // report success only when the server confirms it, as uploadFile does
  if (result.data.code === 0) {
    this.$message.success("Recording uploaded")
    this.GetList()
  } else {
    this.$message.error("Failed to upload the recording")
  }
},
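// Upload the file that was just picked in the file dialog
async handleChange(file){
  // the second argument (fileList) also holds files picked earlier,
  // so upload only the new file to avoid re-uploading the old ones
  await this.uploadFile(file)
},
// Upload a single audio file to the server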
async uploadFile(file){
let formData = new FormData();
formData.append('files', file.raw);
const result = await vcUpload(formData);
if (result.data.code === 0) {
this.$message.success("音频上传成功")
this.GetList()
} else {
this.$message.error("音频上传失败")
}
},
// Refresh the table from the server's list of uploaded audio
async GetList(){
  this.vcDatas =[]
  const result = await vcList();
  for(let i=0; i<result.data.result.length; i++){
    this.vcDatas.push({
      wavName: result.data.result[i]['name'],
      wavId: i,
      wavPath: result.data.result[i]['path'],
      status: false
    })
  }
},
// Delete an audio file on the server, then refresh the list
async delWav(wavId){
  const result = await vcDel(
    {
      wavName: this.vcDatas[wavId]['wavName'],
      wavPath: this.vcDatas[wavId]['wavPath']
    }
  );
  if(!result.data.code){
    this.$message.success("Deleted")
  } else {
    this.$message.error(result.data.msg)
  }
  // clear the selection first, then wait for the refreshed list
  this.reset()
  await this.GetList()
},
// Download the synthesized audio as a WAV file
async downLoadCloneWav(){
  if(this.cloneWav === ""){
    this.$message.error("Synthesize audio before downloading")
    return
  }
  // fetch the synthesized audio as base64
  const result = await vcDownloadBase64(this.cloneWav);
  if (result.data.code !== 0) {
    // without this guard `view` stays undefined and `view.buffer` throws
    this.$message.error("Failed to download the audio")
    return
  }
  // decode base64 into bytes
  const typedArray = this.base64ToUint8Array(result.data.result)
  // wrap the samples in a WAV container
  let view = new DataView(typedArray.buffer);
  view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
  const blob = new Blob([view.buffer], { type: 'audio/wav' });
  const fileName = new Date().getTime() + '.wav';
  const down = document.createElement('a');
  down.download = fileName;
  down.style.display = 'none'; // keep the temporary link hidden
  down.href = URL.createObjectURL(blob);
  document.body.appendChild(down);
  down.click();
  URL.revokeObjectURL(down.href); // release the object URL
  document.body.removeChild(down); // remove the temporary link
},
// Voice cloning with the selected speaker encoder
async g2pClone(){
  if(this.nowIndex === -1){
    return this.$message.error("Record or upload audio and select a file before synthesizing")
  } else if (this.ttsText === ""){
    return this.$message.error("The text to synthesize must not be empty")
  } else if (this.nowIndex >= this.vcDatas.length){
    return this.$message.error("The selected index exceeds the number of audio files")
  }
  // radio value '1' selects GE2E, anything else selects ECAPA-TDNN
  const func = this.func_radio === '1' ? 'ge2e' : 'ecapa_tdnn'
  console.log('func', func)
  // show the "synthesizing" state while the request is running
  this.g2pOnSys = 1
  let result
  try {
    result = await vcCloneG2P(
      {
        wavName: this.vcDatas[this.nowIndex]['wavName'],
        wavPath: this.vcDatas[this.nowIndex]['wavPath'],
        text: this.ttsText,
        func: func
      }
    );
  } finally {
    // restore the button even if the request throws
    this.g2pOnSys = 0
  }
  if(!result.data.code){
    this.cloneWav = result.data.result
    console.log("clone wav: ", this.cloneWav)
    this.$message.success("Voice cloning succeeded")
  } else {
    this.$message.error(result.data.msg)
  }
},
// Play an audio file from the table
async PlayTable(wavId){
  this.Play(this.vcDatas[wavId])
},
// Play the synthesized audio
async PlaySyn(){
  if(this.cloneWav === ""){
    this.$message.error("Synthesize audio before playing")
    return
  } else {
    this.Play(this.cloneWav)
  }
},
// Fetch an audio file as base64 and play it
async Play(wavBase){
  const result = await vcDownloadBase64(wavBase);
  if (result.data.code === 0) {
    // decode base64 into bytes
    let typedArray = this.base64ToUint8Array(result.data.result)
    // wrap the samples in a WAV container
    let view = new DataView(typedArray.buffer);
    view = Recorder.encodeWAV(view, 16000, 16000, 1, 16, true);
    // play the audio
    this.playAudioData(view.buffer);
  }
},
// Decode a base64 (or URL-safe base64) string into a Uint8Array
base64ToUint8Array(base64String) {
const padding = '='.repeat((4 - base64String.length % 4) % 4);
const base64 = (base64String + padding)
.replace(/-/g, '+')
.replace(/_/g, '/');
const rawData = window.atob(base64);
const outputArray = new Uint8Array(rawData.length);
for (let i = 0; i < rawData.length; ++i) {
outputArray[i] = rawData.charCodeAt(i);
}
return outputArray;
},
// Decode WAV data and play it through the shared AudioContext
playAudioData( wav_buffer ) {
  audioCtx.decodeAudioData(wav_buffer, buffer => {
    const source = audioCtx.createBufferSource();
    source.buffer = buffer;
    source.connect(audioCtx.destination);
    source.start();
  }, function(e) {
    Recorder.throwError(e);
  })
},
},
}
</script>
<style lang="less" scoped>
// @import "./style.less";
.voiceclone {
width: 1200px;
height: 410px;
background: #FFFFFF;
padding: 5px 80px 56px 80px;
box-sizing: border-box;
}
.el-row {
margin-bottom: 20px;
}
.grid-content {
border-radius: 4px;
min-height: 36px;
}
.play_board{
height: 100%;
display: flex;
align-items: center;
}
</style>
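Review note: `Play` and `downLoadCloneWav` both decode a base64 payload into bytes and then wrap those bytes in a WAV container via `Recorder.encodeWAV`. The sketch below shows that flow outside the component, so it can be checked in isolation. It assumes the server returns raw 16-bit mono PCM at 16 kHz, which the diff suggests but does not state. `pcmToWav` is a hypothetical stand-in that writes a plain 44-byte RIFF header by hand; it is not the `js-audio-recorder` implementation.

```javascript
// Decode a (possibly URL-safe, unpadded) base64 string into bytes.
// Same logic as the component's base64ToUint8Array method.
function base64ToUint8Array(base64String) {
  const padding = '='.repeat((4 - (base64String.length % 4)) % 4);
  const base64 = (base64String + padding).replace(/-/g, '+').replace(/_/g, '/');
  const rawData = atob(base64);
  const out = new Uint8Array(rawData.length);
  for (let i = 0; i < rawData.length; ++i) {
    out[i] = rawData.charCodeAt(i);
  }
  return out;
}

// Hypothetical helper: prepend a 44-byte RIFF/WAVE header to raw PCM bytes.
// Defaults mirror the values the component passes to Recorder.encodeWAV.
function pcmToWav(pcmBytes, sampleRate = 16000, numChannels = 1, bitsPerSample = 16) {
  const blockAlign = (numChannels * bitsPerSample) / 8;
  const buffer = new ArrayBuffer(44 + pcmBytes.length);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcmBytes.length, true); // remaining file size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                  // size of the fmt chunk
  view.setUint16(20, 1, true);                   // format tag 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  view.setUint32(40, pcmBytes.length, true);     // size of the sample data
  new Uint8Array(buffer, 44).set(pcmBytes);
  return buffer;
}
```

In the browser, the returned `ArrayBuffer` could be handed to `AudioContext.decodeAudioData` for playback or wrapped in a `Blob` of type `audio/wav` for download, which is what the component does with the output of `Recorder.encodeWAV`.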

@ -1,5 +1,6 @@
import { createApp } from 'vue'
import ElementPlus from 'element-plus'
import * as ElementPlusIconsVue from '@element-plus/icons-vue'
import 'element-plus/dist/index.css'
import Antd from 'ant-design-vue';
import 'ant-design-vue/dist/antd.css';
@ -9,5 +10,8 @@ import axios from 'axios'
const app = createApp(App)
app.config.globalProperties.$http = axios
for (const [key, component] of Object.entries(ElementPlusIconsVue)) {
app.component(key, component)
}
app.use(ElementPlus).use(Antd)
app.mount('#app')
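Review note: the loop added in this hunk registers every exported icon under its export name, which is why tags such as `<VideoPlay />` and `<DeleteFilled />` resolve in the voice-clone template without a local import. The sketch below reproduces only the registration pattern. `createFakeApp` and `fakeIcons` are hypothetical stand-ins for Vue's `createApp()` result and for the `@element-plus/icons-vue` namespace import, so the snippet runs without either package.

```javascript
// Hypothetical stand-in for the object returned by Vue's createApp();
// it only records what app.component(name, definition) was called with.
function createFakeApp() {
  const registry = new Map();
  return {
    registry,
    component(name, definition) {
      registry.set(name, definition);
      return this; // allow chaining, like the real app instance
    },
  };
}

// Hypothetical stand-in for `import * as ElementPlusIconsVue from '@element-plus/icons-vue'`.
const fakeIcons = {
  VideoPlay: { name: 'VideoPlay' },
  DeleteFilled: { name: 'DeleteFilled' },
};

// Same loop shape as in the hunk: one global registration per exported icon.
const app = createFakeApp();
for (const [key, component] of Object.entries(fakeIcons)) {
  app.component(key, component);
}
```

Registering the whole namespace keeps templates short at the cost of bundling every icon; importing only the icons a component uses is the usual alternative when bundle size matters.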

@ -44,6 +44,11 @@
resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-1.1.4.tgz"
integrity sha512-Iz/nHqdp1sFPmdzRwHkEQQA3lKvoObk8azgABZ81QUOpW9s/lUyQVUSh0tNtEPZXQlKwlSh7SPgoVxzrE0uuVQ==
"@element-plus/icons-vue@^2.0.9":
version "2.0.9"
resolved "https://registry.npmmirror.com/@element-plus/icons-vue/-/icons-vue-2.0.9.tgz#b7777c57534522e387303d194451d50ff549d49a"
integrity sha512-okdrwiVeKBmW41Hkl0eMrXDjzJwhQMuKiBOu17rOszqM+LS/yBYpNQNV5Jvoh06Wc+89fMmb/uhzf8NZuDuUaQ==
"@floating-ui/core@^0.6.1":
version "0.6.1"
resolved "https://registry.npmmirror.com/@floating-ui/core/-/core-0.6.1.tgz"
