TTS Datasets
Mandarin
- CSMSC: Chinese Standard Mandarin Speech Corpus
  - Duration/h: 12
  - Number of Sentences: 10,000
  - Size: 2.14GB
  - Speaker: 1 female, age 20~30
  - Sample Rate: 48 kHz, 16-bit
  - Mean Words per Clip: 16
- AISHELL-3
  - Duration/h: 85
  - Number of Sentences: 88,035
  - Size: 17.75GB
  - Speaker: 218
  - Sample Rate: 44.1 kHz, 16-bit
English
- LJSpeech
  - Duration/h: 24
  - Number of Sentences: 13,100
  - Size: 2.56GB
  - Speaker: 1, age 20~30
  - Sample Rate: 22050 Hz, 16-bit
  - Mean Words per Clip: 17.23
- VCTK
  - Number of Sentences: 44,583
  - Size: 10.94GB
  - Speaker: 110
  - Sample Rate: 48 kHz, 16-bit
  - Mean Words per Clip: 17.23
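
As a rough sanity check on the Mandarin and English figures above, the listed duration and sentence count give an average clip length. The sketch below assumes one sentence per clip, which the list does not state. The `DATASETS` table and the `mean_clip_seconds` helper are illustrative names, not part of any dataset's tooling. VCTK is left out because no duration is listed for it.

```python
# Figures copied from the list above: (duration in hours, number of sentences).
# VCTK is omitted because no duration is listed for it.
DATASETS = {
    "CSMSC": (12, 10_000),
    "AISHELL-3": (85, 88_035),
    "LJSpeech": (24, 13_100),
}


def mean_clip_seconds(hours, sentences):
    """Average clip length in seconds, assuming one sentence per clip."""
    return hours * 3600 / sentences


for name, (hours, sentences) in DATASETS.items():
    print(f"{name}: {mean_clip_seconds(hours, sentences):.2f} s per clip")
```

Because the listed durations are rounded to whole hours, the results are only approximate.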
Japanese
- tri-jek: Japanese-English-Korean tri-lingual corpus
- JSSS-misc: misc tasks of JSSS corpus
- JTubeSpeech: Corpus of Japanese speech collected from YouTube
- J-MAC: Japanese multi-speaker audiobook corpus
- J-KAC: Japanese Kamishibai and audiobook corpus
- JMD: Japanese multi-dialect corpus
- JSSS: Japanese multi-style (summarization and simplification) corpus
- RWCP-SSD-Onomatopoeia: onomatopoeic word dataset for environmental sounds
- Life-m: landmark image-themed music corpus
- PJS: Phoneme-balanced Japanese singing voice corpus
- JVS-MuSiC: Japanese multi-speaker singing-voice corpus
- JVS: Japanese multi-speaker voice corpus
- JSUT-book: audiobook corpus by a single Japanese speaker
- JSUT-vi: vocal imitation corpus by a single Japanese speaker
- JSUT-song: singing voice corpus by a single Japanese singer
- JSUT: a large-scale corpus of reading-style Japanese speech by a single speaker
Emotions
English
- CREMA-D
- ESD: Emotional Speech Dataset, introduced in "Seen and Unseen Emotional Style Transfer for Voice Conversion with a New Emotional Speech Dataset"
Mandarin
- EMOVIE Dataset
- MASC