# TTS Datasets
<!--
see https://openslr.org/
-->
## Mandarin
- [CSMSC](https://www.data-baker.com/open_source.html): Chinese Standard Mandarin Speech Corpus
    - Duration/h: 12
    - Number of Sentences: 10,000
    - Size: 2.14GB
    - Speaker: 1 female, aged 20-30
    - Sample Rate: 48 kHz, 16-bit
    - Mean Words per Clip: 16
- [AISHELL-3](http://www.aishelltech.com/aishell_3)
    - Duration/h: 85
    - Number of Sentences: 88,035
    - Size: 17.75GB
    - Speaker: 218
    - Sample Rate: 44.1 kHz, 16-bit
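
The duration and sentence counts listed above give a rough average clip length for each corpus. The sketch below is only illustrative: `mean_clip_seconds` and `MANDARIN_STATS` are hypothetical names, the hour figures are the rounded values from the list, and it assumes one sentence per clip.

```python
def mean_clip_seconds(duration_hours: float, num_sentences: int) -> float:
    """Average clip length in seconds, given total hours and clip count."""
    if num_sentences <= 0:
        raise ValueError("num_sentences must be positive")
    return duration_hours * 3600 / num_sentences


# (duration in hours, number of sentences), copied from the list above.
# Assumption: each sentence corresponds to exactly one audio clip.
MANDARIN_STATS = {
    "CSMSC": (12, 10_000),
    "AISHELL-3": (85, 88_035),
}

for name, (hours, sentences) in MANDARIN_STATS.items():
    print(f"{name}: about {mean_clip_seconds(hours, sentences):.2f} s per clip")
```

With the listed figures this comes to roughly 4.3 s per clip for CSMSC and 3.5 s for AISHELL-3, which can help when choosing batch sizes or maximum sequence lengths.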

## English
- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
    - Duration/h: 24
    - Number of Sentences: 13,100
    - Size: 2.56GB
    - Speaker: 1, aged 20-30
    - Sample Rate: 22050 Hz, 16-bit
    - Mean Words per Clip: 17.23
- [VCTK](https://datashare.ed.ac.uk/handle/10283/3443)
    - Number of Sentences: 44,583
    - Size: 10.94GB
    - Speaker: 110
    - Sample Rate: 48 kHz, 16-bit
    - Mean Words per Clip: 17.23
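
After downloading one of the Mandarin or English corpora above, it can be worth confirming that a clip matches the listed sample rate and bit depth before training. The following is a minimal sketch using only Python's standard `wave` module. `AudioFormat`, `LISTED_FORMATS`, `read_wav_format`, and `matches_listed_format` are hypothetical names introduced here, and the sketch assumes the clip is an uncompressed PCM WAV file; releases distributed in other containers (FLAC, for example) would need a different reader.

```python
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    sample_rate_hz: int
    bit_depth: int


# Values copied from the dataset lists above.
LISTED_FORMATS = {
    "CSMSC": AudioFormat(48_000, 16),
    "AISHELL-3": AudioFormat(44_100, 16),
    "LJSpeech": AudioFormat(22_050, 16),
    "VCTK": AudioFormat(48_000, 16),
}


def read_wav_format(path: str) -> AudioFormat:
    """Read the sample rate and bit depth from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() returns bytes per sample, so multiply by 8 for bits.
        return AudioFormat(wav.getframerate(), wav.getsampwidth() * 8)


def matches_listed_format(path: str, dataset: str) -> bool:
    """Return True if the clip's format equals the one listed for the dataset."""
    return read_wav_format(path) == LISTED_FORMATS[dataset]
```

A call such as `matches_listed_format("some_clip.wav", "LJSpeech")` (the path is a placeholder) returns `False` when a clip has been resampled or re-encoded, which is a common cause of mismatched features.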

## Japanese
<!--
see https://sites.google.com/site/shinnosuketakamichi/publication/corpus
-->

- [tri-jek](https://sites.google.com/site/shinnosuketakamichi/research-topics/tri-jek_corpus): Japanese-English-Korean tri-lingual corpus
- [JSSS-misc](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss-misc_corpus): misc tasks of JSSS corpus
- [JTubeSpeech](https://github.com/sarulab-speech/jtubespeech): Corpus of Japanese speech collected from YouTube
- [J-MAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-mac_corpus): Japanese multi-speaker audiobook corpus
- [J-KAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-kac_corpus): Japanese Kamishibai and audiobook corpus
- [JMD](https://sites.google.com/site/shinnosuketakamichi/research-topics/jmd_corpus): Japanese multi-dialect corpus
- [JSSS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus): Japanese multi-style (summarization and simplification) corpus
- [RWCP-SSD-Onomatopoeia](https://www.ksuke.net/dataset/rwcp-ssd-onomatopoeia): onomatopoeic word dataset for environmental sounds 
- [Life-m](https://sites.google.com/site/shinnosuketakamichi/research-topics/life-m_corpus): landmark image-themed music corpus
- [PJS](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus): Phoneme-balanced Japanese singing voice corpus
- [JVS-MuSiC](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_music): Japanese multi-speaker singing-voice corpus
- [JVS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus): Japanese multi-speaker voice corpus
- [JSUT-book](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-book): audiobook corpus by a single Japanese speaker
- [JSUT-vi](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-vi): vocal imitation corpus by a single Japanese speaker
- [JSUT-song](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song): singing voice corpus by a single Japanese singer
- [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): a large-scale corpus of reading-style Japanese speech by a single speaker

## Emotions
### English
- [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D)
- [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://kunzhou9646.github.io/controllable-evc/)
    - paper: [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://arxiv.org/abs/2010.14794)
### Mandarin
- [EMOVIE Dataset](https://viem-ccy.github.io/EMOVIE/dataset_release)
    - paper: [EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model](https://arxiv.org/abs/2106.09317)
- MASC
    - paper: [MASC: A Speech Corpus in Mandarin for Emotion Analysis and Affective Speaker Recognition](https://ieeexplore.ieee.org/document/4013501)
### English and Mandarin
- [Emotional Voice Conversion: Theory, Databases and ESD](https://github.com/HLTSingapore/Emotional-Speech-Data)    
    - paper: [Emotional Voice Conversion: Theory, Databases and ESD](https://arxiv.org/abs/2105.14762) 

## Music
- [GiantMIDI-Piano](https://github.com/bytedance/GiantMIDI-Piano)
- [MAESTRO Dataset](https://magenta.tensorflow.org/datasets/maestro)
    - [TensorFlow code](https://www.tensorflow.org/tutorials/audio/music_generation)
- [Opencpop](https://wenet.org.cn/opencpop/)