diff --git a/docs/source/tts/tts_datasets.md b/docs/source/tts/tts_datasets.md
new file mode 100644
index 000000000..a79981dfe
--- /dev/null
+++ b/docs/source/tts/tts_datasets.md
@@ -0,0 +1,75 @@
+# TTS Datasets
+
+## Mandarin
+- [CSMSC](https://www.data-baker.com/open_source.html): Chinese Standard Mandarin Speech Corpus
+  - Duration/h: 12
+  - Number of Sentences: 10,000
+  - Size: 2.14 GB
+  - Speaker: 1 female, ages 20~30
+  - Sample Rate: 48 kHz, 16-bit
+  - Mean Words per Clip: 16
+- [AISHELL-3](http://www.aishelltech.com/aishell_3)
+  - Duration/h: 85
+  - Number of Sentences: 88,035
+  - Size: 17.75 GB
+  - Speakers: 218
+  - Sample Rate: 44.1 kHz, 16-bit
+
+## English
+- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
+  - Duration/h: 24
+  - Number of Sentences: 13,100
+  - Size: 2.56 GB
+  - Speaker: 1, age 20~30
+  - Sample Rate: 22.05 kHz, 16-bit
+  - Mean Words per Clip: 17.23
+- [VCTK](https://datashare.ed.ac.uk/handle/10283/3443)
+  - Number of Sentences: 44,583
+  - Size: 10.94 GB
+  - Speakers: 110
+  - Sample Rate: 48 kHz, 16-bit
+  - Mean Words per Clip: 17.23
+
+## Japanese
+
+- [tri-jek](https://sites.google.com/site/shinnosuketakamichi/research-topics/tri-jek_corpus): Japanese-English-Korean tri-lingual corpus
+- [JSSS-misc](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss-misc_corpus): misc tasks of the JSSS corpus
+- [JTubeSpeech](https://github.com/sarulab-speech/jtubespeech): corpus of Japanese speech collected from YouTube
+- [J-MAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-mac_corpus): Japanese multi-speaker audiobook corpus
+- [J-KAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-kac_corpus): Japanese Kamishibai and audiobook corpus
+- [JMD](https://sites.google.com/site/shinnosuketakamichi/research-topics/jmd_corpus): Japanese multi-dialect corpus
+- [JSSS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus): Japanese multi-style (summarization and simplification) corpus
+- [RWCP-SSD-Onomatopoeia](https://www.ksuke.net/dataset/rwcp-ssd-onomatopoeia): onomatopoeic word dataset for environmental sounds
+- [Life-m](https://sites.google.com/site/shinnosuketakamichi/research-topics/life-m_corpus): landmark image-themed music corpus
+- [PJS](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus): phoneme-balanced Japanese singing voice corpus
+- [JVS-MuSiC](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_music): Japanese multi-speaker singing-voice corpus
+- [JVS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus): Japanese multi-speaker voice corpus
+- [JSUT-book](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-book): audiobook corpus by a single Japanese speaker
+- [JSUT-vi](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-vi): vocal imitation corpus by a single Japanese speaker
+- [JSUT-song](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song): singing voice corpus by a single Japanese singer
+- [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): a large-scale corpus of reading-style Japanese speech by a single speaker
+
+## Emotions
+### English
+- [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D)
+- [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://kunzhou9646.github.io/controllable-evc/)
+  - paper: [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://arxiv.org/abs/2010.14794)
+### Mandarin
+- [EMOVIE Dataset](https://viem-ccy.github.io/EMOVIE/dataset_release)
+  - paper: [EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model](https://arxiv.org/abs/2106.09317)
+- MASC
+  - paper: [MASC: A Speech Corpus in Mandarin for Emotion Analysis and Affective Speaker Recognition](https://ieeexplore.ieee.org/document/4013501)
+### English & Mandarin
+- [Emotional Voice Conversion: Theory, Databases and ESD](https://github.com/HLTSingapore/Emotional-Speech-Data)
+  - paper: [Emotional Voice Conversion: Theory, Databases and ESD](https://arxiv.org/abs/2105.14762)
+
+## Music
+- [GiantMIDI-Piano](https://github.com/bytedance/GiantMIDI-Piano)
+- [MAESTRO Dataset](https://magenta.tensorflow.org/datasets/maestro)
+  - [TensorFlow tutorial](https://www.tensorflow.org/tutorials/audio/music_generation)
+- [Opencpop](https://wenet.org.cn/opencpop/)
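
The corpora listed above use different sample rates (48 kHz, 44.1 kHz, 22.05 kHz), so when mixing datasets for training, clips are typically resampled to a single target rate first. A minimal sketch, assuming SciPy is available; the 24 kHz target and the `resample_clip` helper are illustrative choices, not part of any corpus's official tooling:

```python
# Sketch: unify sample rates across corpora (e.g. CSMSC 48 kHz,
# AISHELL-3 44.1 kHz, LJSpeech 22.05 kHz) before joint training.
# The 24 kHz target rate is an illustrative assumption.
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_clip(wav: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Polyphase resampling from orig_sr to target_sr."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(wav, target_sr // g, orig_sr // g)


# Example: one second of 48 kHz audio becomes 24,000 samples at 24 kHz.
t = np.linspace(0, 1, 48000, endpoint=False)
clip = np.sin(2 * np.pi * 440 * t).astype(np.float32)
resampled = resample_clip(clip, 48000, 24000)
print(len(resampled))  # 24000
```

Polyphase filtering is preferred over naive decimation here because it low-pass filters before downsampling, avoiding aliasing in the resampled speech.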