From 7883aa6cbdfe656edb2683aa36fd1c47fd66cab1 Mon Sep 17 00:00:00 2001
From: Echo-Nie <157974576+Echo-Nie@users.noreply.github.com>
Date: Mon, 21 Apr 2025 17:21:12 +0800
Subject: [PATCH] update the stages of run.sh and synthesize_e2e.sh to be
 clearer (#4057)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* run.sh: add a --stage parameter for synthesize and synthesize_e2e to control vocoder model selection; README.md: document the stage parameter and clarify the vocoder selection logic

* add comments for the stage parameter in run.sh

* change HiFiGAN to MultiBand MelGAN

* move the cmsc files back to their original location (No.15 stays unchanged); only No.6 is modified here

* update the stages of run.sh and synthesize_e2e.sh to be clearer

* fix the md
---
 examples/aishell3/ernie_sat/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/aishell3/ernie_sat/README.md b/examples/aishell3/ernie_sat/README.md
index e26808e95..10956f9e9 100644
--- a/examples/aishell3/ernie_sat/README.md
+++ b/examples/aishell3/ernie_sat/README.md
@@ -13,7 +13,7 @@ In ERNIE-SAT, we propose two innovations:
 ## Dataset
 ### Download and Extract
 Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
- 
+
 ### Get MFA Result and Extract
 We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2. You can download [aishell3_alignment_tone.tar.gz](https://paddlespeech.cdn.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz) here, or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which uses MFA1.x for now) in our repo.
@@ -138,7 +138,7 @@ You can check the text of downloaded wavs in `source/README.md`.
 ```bash
 ./run.sh --stage 3 --stop-stage 3 --gpus 0
 ```
-`stage 3` of `run.sh` calls `local/synthesize_e2e.sh`, `stage 0` of it is **Speech Synthesis** and `stage 1` of it is **Speech Editing**.
+`stage 3` of `run.sh` calls `local/synthesize_e2e.sh`. By default, `synthesize_e2e.sh` performs both the **Speech Synthesis** and **Speech Editing** tasks: it converts input text into speech for synthesis, and modifies existing speech according to new text content for editing. You can modify `--wav_path`, `--old_str`, and `--new_str` yourself: `--old_str` should be the text corresponding to the audio at `--wav_path`, `--new_str` should be designed according to `--task_name`, and both `--source_lang` and `--target_lang` should be `zh` for a model trained on the AISHELL3 dataset.
 ## Pretrained Model
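
As context for the paragraph this patch adds, here is a minimal sketch of what a Speech Editing run could look like. The flag names (`--wav_path`, `--old_str`, `--new_str`, `--task_name`, `--source_lang`, `--target_lang`) come from the patched README text; the wav path, the strings, and the `--task_name` value are hypothetical placeholders, not values taken from this patch.

```bash
# Run stage 3 of run.sh, which calls local/synthesize_e2e.sh
# (this command is taken verbatim from the README).
./run.sh --stage 3 --stop-stage 3 --gpus 0

# Inside local/synthesize_e2e.sh, an editing task would be driven by
# arguments along these lines (all values below are illustrative only):
#   --task_name edit \
#   --wav_path source/demo.wav \
#   --old_str "text actually spoken in source/demo.wav" \
#   --new_str "edited text designed for the edit task" \
#   --source_lang zh \
#   --target_lang zh
```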