From 7883aa6cbdfe656edb2683aa36fd1c47fd66cab1 Mon Sep 17 00:00:00 2001
From: Echo-Nie <157974576+Echo-Nie@users.noreply.github.com>
Date: Mon, 21 Apr 2025 17:21:12 +0800
Subject: [PATCH] update the stages of run.sh and synthesize_e2e.sh to be
 clearer (#4057)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* run.sh: add a --stage parameter for synthesize and synthesize_e2e to control vocoder model selection; README.md: document the stage parameter and clarify the vocoder selection logic

* add comments for the stage parameter in run.sh

* change HiFiGAN to MultiBand MelGAN

* move the cmsc files back to their original location (No.15 stays unchanged); only No.6 is modified here

* update the stages of run.sh and synthesize_e2e.sh to be clearer

* fix the md
---
 examples/aishell3/ernie_sat/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/aishell3/ernie_sat/README.md b/examples/aishell3/ernie_sat/README.md
index e26808e95..10956f9e9 100644
--- a/examples/aishell3/ernie_sat/README.md
+++ b/examples/aishell3/ernie_sat/README.md
@@ -13,7 +13,7 @@ In ERNIE-SAT, we propose two innovations:
 ## Dataset
 ### Download and Extract
 Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
- 
+
 ### Get MFA Result and Extract
 We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2. You can download [aishell3_alignment_tone.tar.gz](https://paddlespeech.cdn.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz) here, or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which uses MFA1.x for now) in our repo.
@@ -138,7 +138,7 @@ You can check the text of downloaded wavs in `source/README.md`.
 ```bash
 ./run.sh --stage 3 --stop-stage 3 --gpus 0
 ```
-`stage 3` of `run.sh` calls `local/synthesize_e2e.sh`, `stage 0` of it is **Speech Synthesis** and `stage 1` of it is **Speech Editing**.
+`stage 3` of `run.sh` calls `local/synthesize_e2e.sh`. By default, `synthesize_e2e.sh` performs both the **Speech Synthesis** and **Speech Editing** tasks: it converts input text into speech for synthesis, and modifies existing speech according to new text content for editing. You can modify `--wav_path`, `--old_str`, and `--new_str` yourself: `--old_str` should be the text corresponding to the audio at `--wav_path`, `--new_str` should be designed according to `--task_name`, and both `--source_lang` and `--target_lang` should be `zh` for a model trained on the AISHELL3 dataset.
 ## Pretrained Model
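
As context for the paragraph this patch adds, here is a minimal sketch of what a Speech Editing run could look like. The flag names (`--wav_path`, `--old_str`, `--new_str`, `--task_name`, `--source_lang`, `--target_lang`) come from the patched README text; the wav path, the strings, and the `--task_name` value are hypothetical placeholders, not values taken from this patch.

```bash
# Run stage 3 of run.sh, which calls local/synthesize_e2e.sh
# (this command is taken verbatim from the README).
./run.sh --stage 3 --stop-stage 3 --gpus 0

# Inside local/synthesize_e2e.sh, an editing task would be driven by
# arguments along these lines (all values below are illustrative only):
#   --task_name edit \
#   --wav_path source/demo.wav \
#   --old_str "text actually spoken in source/demo.wav" \
#   --new_str "edited text designed for the edit task" \
#   --source_lang zh \
#   --target_lang zh
```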