update the vocoder of synthesize and synthesize_e2e to 4 stages, and update the README to be consistent with the script

pull/4008/head
nyx-c-language 6 months ago
parent b2bae2f40d
commit d8f1d036b2

@@ -101,7 +101,7 @@ pwg_baker_ckpt_0.4
 ```bash
 CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh --stage 0 ${conf_path} ${train_output_path} ${ckpt_name}
 ```
-`--stage` controls the vocoder model during synthesis, which can be `0` or `1`, use `pwgan` or `hifigan` model as vocoder.
+`--stage` controls the vocoder model during synthesis, which can be `0` or `1` or `2` or `3`, use `pwgan` or `multi band melgan` or `style melgan` or `hifigan` model as vocoder.
 ```text
 usage: synthesize.py [-h]
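The updated `--stage` values for `./local/synthesize.sh` pick the vocoder by number. A minimal sketch of that lookup as a bash `case` statement, based only on the README line in this hunk; the function name `synth_vocoder_for_stage` is hypothetical, and the real `local/synthesize.sh` (not shown in this diff) may implement the choice differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps the --stage values documented for
# ./local/synthesize.sh to the vocoder named in the README text.
# It mirrors the README wording only, not the script's internals.
synth_vocoder_for_stage() {
    case "$1" in
        0) echo "pwgan" ;;
        1) echo "multi band melgan" ;;
        2) echo "style melgan" ;;
        3) echo "hifigan" ;;
        *) echo "unsupported stage: $1" >&2; return 1 ;;
    esac
}

# Print the vocoder for every documented stage.
for s in 0 1 2 3; do
    echo "stage ${s}: $(synth_vocoder_for_stage "${s}")"
done
```

Any value outside `0` to `3` returns a non-zero status here, which is one way a wrapper could reject an undocumented stage early.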
@@ -148,9 +148,12 @@ optional arguments:
 output dir.
 ```
 `./local/synthesize_e2e.sh` calls `${BIN_DIR}/../synthesize_e2e.py`, which can synthesize waveform from text file.
 ```bash
-CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name}
+CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh --stage 0 ${conf_path} ${train_output_path} ${ckpt_name}
 ```
+`--stage` controls the vocoder model during synthesis, which can be `0` or `1` or `3` or `4`, use `pwgan` or `multi band melgan` or `hifigan` or `wavernn` model as vocoder.
 ```text
 usage: synthesize_e2e.py [-h]
 [--am {speedyspeech_csmsc,speedyspeech_aishell3,fastspeech2_csmsc,fastspeech2_ljspeech,fastspeech2_aishell3,fastspeech2_vctk,tacotron2_csmsc,tacotron2_ljspeech}]
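For `./local/synthesize_e2e.sh` the documented values are `0`, `1`, `3` and `4`; `2` is not listed, so `pwgan`, `multi band melgan` and `hifigan` keep the same numbers as in `synthesize.sh`. A hedged sketch that validates a stage against that list and prints, without running, the README command for it; `e2e_vocoder_for_stage` is a hypothetical helper, and the `${gpus}`-style placeholders are deliberately left unexpanded.

```bash
#!/usr/bin/env bash
# Hypothetical lookup for the --stage values the README documents for
# ./local/synthesize_e2e.sh. Stage 2 is absent from that list, so it fails here.
e2e_vocoder_for_stage() {
    case "$1" in
        0) echo "pwgan" ;;
        1) echo "multi band melgan" ;;
        3) echo "hifigan" ;;
        4) echo "wavernn" ;;
        *) return 1 ;;
    esac
}

# Dry run: print the README command for a chosen stage instead of executing it.
# The ${gpus}, ${conf_path}, ... placeholders are escaped so they stay literal.
stage=4
if voc=$(e2e_vocoder_for_stage "${stage}"); then
    echo "CUDA_VISIBLE_DEVICES=\${gpus} ./local/synthesize_e2e.sh --stage ${stage} \${conf_path} \${train_output_path} \${ckpt_name}  # vocoder: ${voc}"
else
    echo "stage ${stage} is not documented for synthesize_e2e.sh" >&2
fi
```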

@@ -27,12 +27,18 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi
 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-# synthesize, vocoder is pwgan by default stage 0, stage 1 will use hifigan as vocoder
+# synthesize, vocoder is pwgan by default stage 0
+# stage 1 will use multi band melgan as vocoder
+# stage 2 will use style melgan as vocoder
+# stage 3 will use hifigan as vocoder
 CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh --stage 0 ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-# synthesize_e2e, vocoder is pwgan by default stage 0, stage 1 will use hifigan as vocoder
+# synthesize_e2e, vocoder is pwgan by default stage 0
+# stage 1 will use multi band melgan as vocoder
+# stage 3 will use hifigan as vocoder
+# stage 4 will use wavernn as vocoder
 CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh --stage 0 ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
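In `run.sh`, a block numbered N runs only when `stage <= N <= stop_stage`, so blocks 2 and 3 can be run alone or as part of a wider range. The word "stage" is used twice here: `${stage}` selects `run.sh` blocks, while the `--stage 0` passed to the local scripts is the vocoder choice described in the comments. A small sketch that reproduces only the gating, with `echo` standing in for the real commands; `run_blocks` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of the gating idiom in run.sh: block N runs only when
# stage <= N <= stop_stage. The echo lines stand in for the real
# ./local/synthesize.sh and ./local/synthesize_e2e.sh calls.
run_blocks() {
    local stage=$1 stop_stage=$2
    if [ "${stage}" -le 2 ] && [ "${stop_stage}" -ge 2 ]; then
        echo "block 2: ./local/synthesize.sh --stage 0"
    fi
    if [ "${stage}" -le 3 ] && [ "${stop_stage}" -ge 3 ]; then
        echo "block 3: ./local/synthesize_e2e.sh --stage 0"
    fi
}

run_blocks 2 2    # only block 2
run_blocks 0 100  # both blocks
run_blocks 3 3    # only block 3
```

To try another vocoder, the `--stage 0` argument inside block 2 or block 3 would be changed to one of the values listed in that block's comments; the `${stage}`/`${stop_stage}` range is unaffected.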
