From 99f392d5c476b152f7b08775711f0042bafa43c7 Mon Sep 17 00:00:00 2001
From: nyx-c-language
Date: Sat, 12 Apr 2025 23:36:37 +0800
Subject: [PATCH] update the stage of run.sh and synthesize_e2e.sh, to be clear

---
 examples/aishell3/ernie_sat/README.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/examples/aishell3/ernie_sat/README.md b/examples/aishell3/ernie_sat/README.md
index e26808e95..aee732cf8 100644
--- a/examples/aishell3/ernie_sat/README.md
+++ b/examples/aishell3/ernie_sat/README.md
@@ -13,7 +13,7 @@ In ERNIE-SAT, we propose two innovations:
 ## Dataset
 ### Download and Extract
 Download AISHELL-3 from it's [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
- 
+
 ### Get MFA Result and Extract
 We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2. You can download from here [aishell3_alignment_tone.tar.gz](https://paddlespeech.cdn.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your MFA model reference to [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) of our repo.
@@ -138,7 +138,13 @@ You can check the text of downloaded wavs in `source/README.md`.
 ```bash
 ./run.sh --stage 3 --stop-stage 3 --gpus 0
 ```
-`stage 3` of `run.sh` calls `local/synthesize_e2e.sh`, `stage 0` of it is **Speech Synthesis** and `stage 1` of it is **Speech Editing**.
+`run.sh`'s `stage 3` invokes `synthesize_e2e.sh` and uses the `--stage` parameter to select between tasks. By default, `synthesize_e2e.sh` executes `stage 0`, which performs speech synthesis. To switch to speech editing, use `--stage 1`.
+
+To perform speech editing, modify the command to:
+
+```bash
+./run.sh --stage 3 --stop-stage 3 --gpus 0 --stage 1
+```
 You can modify `--wav_path`、`--old_str` and `--new_str` yourself, `--old_str` should be the text corresponding to the audio of `--wav_path`, `--new_str` should be designed according to `--task_name`, both `--source_lang` and `--target_lang` should be `zh` for model trained with AISHELL3 dataset.

 ## Pretrained Model
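
For context on the change above: stage gating in scripts like `run.sh` and `synthesize_e2e.sh` typically follows the pattern sketched below. This is illustrative only — the function name, flag parsing, and echo messages are assumptions, not the actual contents of `synthesize_e2e.sh`:

```shell
#!/usr/bin/env bash
# Minimal sketch of --stage / --stop-stage gating (illustrative only;
# not the real synthesize_e2e.sh, which parses options via utils scripts).
run_stages() {
  local stage=0 stop_stage=100   # default: stage 0, speech synthesis
  while [ $# -gt 0 ]; do
    case "$1" in
      --stage) stage="$2"; shift 2 ;;
      --stop-stage) stop_stage="$2"; shift 2 ;;
      *) shift ;;                # ignore other flags in this sketch
    esac
  done
  # Each stage runs only if it falls inside [stage, stop_stage].
  if [ "${stage}" -le 0 ] && [ "${stop_stage}" -ge 0 ]; then
    echo "stage 0: speech synthesis"
  fi
  if [ "${stage}" -le 1 ] && [ "${stop_stage}" -ge 1 ]; then
    echo "stage 1: speech editing"
  fi
}

run_stages --stage 1   # skips stage 0, runs only the speech-editing branch
```

This is why passing `--stage 1` selects speech editing: the synthesis branch is skipped once the starting stage is past 0.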