This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2006.04558) model.

## Dataset
### Download and Extract
Download all datasets and extract them to `./data`:
- The CSMSC dataset is in the directory `./data/BZNSYP`
- The LJSpeech dataset is in the directory `./data/LJSpeech-1.1`
- The aishell3 dataset is in the directory `./data/data_aishell3`
- The VCTK dataset is in the directory `./data/VCTK-Corpus-0.92`

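Before preprocessing, the layout above can be sanity-checked with a short shell loop. This is an illustrative check, not part of the original recipe:

```shell
# Verify that the expected dataset directories exist under ./data.
for d in BZNSYP LJSpeech-1.1 data_aishell3 VCTK-Corpus-0.92; do
  if [ -d "./data/$d" ]; then
    echo "found:   ./data/$d"
  else
    echo "missing: ./data/$d"
  fi
done
```

Any `missing:` line means the corresponding archive has not been extracted to the expected location yet.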
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2 training.

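If the alignments were obtained as archives, they can be unpacked into `./data/mfa` along these lines. The archive filenames here are assumptions; use the names of the files you actually downloaded:

```shell
# Unpack MFA alignment archives (filenames assumed) into ./data/mfa.
mkdir -p ./data/mfa
for f in baker_alignment_tone.tar.gz ljspeech_alignment.tar.gz \
         aishell3_alignment_tone.tar.gz vctk_alignment.tar.gz; do
  if [ -f "$f" ]; then
    tar -xzf "$f" -C ./data/mfa
  fi
done
```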
## Get Started
Assume the paths to the datasets are:
- `./data/BZNSYP`
- `./data/LJSpeech-1.1`
- `./data/data_aishell3`
- `./data/VCTK-Corpus-0.92`

Assume the paths to the MFA results of the datasets are:
- `./data/mfa/baker_alignment_tone`
- `./data/mfa/ljspeech_alignment`
- `./data/mfa/aishell3_alignment_tone`
- `./data/mfa/vctk_alignment`

Run the command below to
1. **source path**.

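The **source path** step loads the environment variables that the later commands rely on, notably `${BIN_DIR}`. A stand-in sketch of that step, assuming the recipe's `path.sh` exports `MAIN_ROOT` and `BIN_DIR` (the value written below is an assumption for illustration; use the example's own `path.sh`):

```shell
# Stand-in path.sh: exports the variables the later commands expect.
# The BIN_DIR value is assumed, not taken from the real recipe.
cat > path.sh <<'EOF'
export MAIN_ROOT=${PWD}
export BIN_DIR=${MAIN_ROOT}/paddlespeech/t2s/exps/fastspeech2
EOF
. ./path.sh
echo "BIN_DIR=${BIN_DIR}"
```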
```bash
python3 ${BIN_DIR}/../synthesize_e2e.py \
--am_config=fastspeech2_mix_ckpt_1.2.0/default.yaml \
--am_ckpt=fastspeech2_mix_ckpt_1.2.0/snapshot_iter_99200.pdz \
--am_stat=fastspeech2_mix_ckpt_1.2.0/speech_stats.npy \
--phones_dict=fastspeech2_mix_ckpt_1.2.0/phone_id_map.txt \
--speaker_dict=fastspeech2_mix_ckpt_1.2.0/speaker_id_map.txt \
--spk_id=174 \
--voc=pwgan_aishell3 \
--voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
--voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
--lang=mix \
--text=${BIN_DIR}/../sentences_mix.txt \
--output_dir=exp/default/test_e2e \
--inference_dir=exp/default/inference
```
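`--spk_id=174` selects one of the speakers listed in the checkpoint's `speaker_id_map.txt`. A minimal sketch of that lookup, assuming a whitespace-separated `name id` line format (check the file shipped with your checkpoint):

```python
# Parse a speaker_id_map.txt-style mapping (format assumed: "name id" per line)
# and resolve a --spk_id value back to a speaker name.
def load_speaker_map(lines):
    """Return a dict mapping integer speaker id -> speaker name."""
    id2spk = {}
    for line in lines:
        name, sid = line.split()
        id2spk[int(sid)] = name
    return id2spk

# Made-up sample entries, for illustration only.
sample = ["SSB0005 0", "SSB0009 1", "p225 174"]
print(load_speaker_map(sample)[174])  # prints: p225
```

Swapping `--spk_id` thus changes which voice from the multi-speaker training set is used for synthesis.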