# Mixed Chinese and English TTS with CSMSC, LJSpeech-1.1, AISHELL-3 and VCTK datasets
This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2006.04558) model with [CSMSC](https://www.data-baker.com/open_source.html), [LJSpeech-1.1](https://keithito.com/LJ-Speech-Dataset/), [AISHELL3](http://www.aishelltech.com/aishell_3) and [VCTK](https://datashare.ed.ac.uk/handle/10283/3443) datasets.
## Dataset
### Download and Extract
Download all datasets and extract them to `~/datasets` (a minimal extraction sketch follows the list):
- The CSMSC dataset is in the directory `~/datasets/BZNSYP`
- The LJSpeech dataset is in the directory `~/datasets/LJSpeech-1.1`
- The AISHELL-3 dataset is in the directory `~/datasets/data_aishell3`
- The VCTK dataset is in the directory `~/datasets/VCTK-Corpus-0.92`
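As a rough sketch, the layout above can be prepared like this. The archive file names are assumptions based on how each corpus is usually distributed; substitute the files you actually downloaded (e.g. the Baker corpus may ship as a `.rar` and need `unrar x`):
```bash
# Create the common dataset root.
mkdir -p ~/datasets

# Unpack each corpus so the directory names match the paths above.
tar -xjf LJSpeech-1.1.tar.bz2 -C ~/datasets      # -> ~/datasets/LJSpeech-1.1
unzip VCTK-Corpus-0.92.zip -d ~/datasets/VCTK-Corpus-0.92

# If the AISHELL-3 archive has no top-level directory, extract into one.
mkdir -p ~/datasets/data_aishell3
tar -xzf data_aishell3.tgz -C ~/datasets/data_aishell3
```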
### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for FastSpeech2 training.
You can download the precomputed alignments from here: [baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz), [ljspeech_alignment.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz), [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz) and [vctk_alignment.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/VCTK-Corpus-0.92/vctk_alignment.tar.gz). Or you can train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (which currently uses MFA 1.x) in our repo.
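A minimal sketch of fetching and unpacking the alignments into the `./mfa_results` directory used by the commands below (this assumes each archive unpacks into a directory matching its base name; verify after extraction):
```bash
mkdir -p mfa_results && cd mfa_results
# Download and unpack each precomputed alignment archive.
for f in \
  https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz \
  https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz \
  https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz \
  https://paddlespeech.bj.bcebos.com/MFA/VCTK-Corpus-0.92/vctk_alignment.tar.gz; do
  wget "$f" && tar -xzf "$(basename "$f")"
done
cd ..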
## Get Started
Assume the paths to the datasets are:
- `~/datasets/BZNSYP`
- `~/datasets/LJSpeech-1.1`
- `~/datasets/data_aishell3`
- `~/datasets/VCTK-Corpus-0.92`
Assume the paths to the MFA results of the datasets are:
- `./mfa_results/baker_alignment_tone`
- `./mfa_results/ljspeech_alignment`
- `./mfa_results/aishell3_alignment_tone`
- `./mfa_results/vctk_alignment`
Run the command below to
1. **source path**.
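In the standard PaddleSpeech example layout these steps are driven by `run.sh`; the sketch below assumes the usual `--stage`/`--stop-stage` convention, so check the script in this example's directory for the exact stage numbering:
```bash
# Run all stages (path sourcing, preprocessing, training, synthesis).
./run.sh

# Or run a single stage; stage numbers here are assumptions -- see run.sh.
./run.sh --stage 0 --stop-stage 0
```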
You can use the following scripts to synthesize speech for the sentences in `${BIN_DIR}/../sentences_mix.txt` using the pretrained FastSpeech2 and Parallel WaveGAN models.
`174` is the speaker ID for baker and `175` is the speaker ID for ljspeech. For other speakers' IDs, please see `speaker_id_map.txt`.
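A sketch of the usual invocation, assuming this example follows the common PaddleSpeech layout where `local/synthesize_e2e.sh` wraps end-to-end synthesis; the config path, output path, and checkpoint name below are placeholders, so take the real values from `run.sh`:
```bash
# Placeholders -- substitute the values used in this example's run.sh.
gpus=0
conf_path=conf/default.yaml
train_output_path=exp/default
ckpt_name=snapshot_iter_99200.pdz   # hypothetical checkpoint name

# Synthesize the sentences in ${BIN_DIR}/../sentences_mix.txt end to end.
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name}
```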