TTS Datasets

Mandarin

  • CSMSC: Chinese Standard Mandarin Speech Corpus
    • Duration/h: 12
    • Number of Sentences: 10,000
    • Size: 2.14GB
    • Speaker: 1 female, aged 20~30
    • Sample Rate: 48 kHz, 16-bit
    • Mean Words per Clip: 16
  • AISHELL-3
    • Duration/h: 85
    • Number of Sentences: 88,035
    • Size: 17.75GB
    • Speaker: 218
    • Sample Rate: 44.1 kHz, 16-bit
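
Each entry lists a sample rate and bit depth, and it can be useful to confirm that a downloaded clip matches the listed values before preprocessing. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only. The helper names `wav_format` and `matches_listing` are hypothetical and not part of PaddleSpeech.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, bit_depth, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        # getsampwidth() returns bytes per sample, so multiply by 8 for bits.
        return wf.getframerate(), wf.getsampwidth() * 8, wf.getnchannels()


def matches_listing(path, expected_rate_hz, expected_bits=16):
    """Check a clip against the sample rate and bit depth listed for its dataset."""
    rate, bits, _ = wav_format(path)
    return rate == expected_rate_hz and bits == expected_bits
```

For example, `matches_listing("clip.wav", 44100)` would check a clip against the values listed for AISHELL-3, where `clip.wav` is a placeholder path.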

English

  • LJSpeech
    • Duration/h: 24
    • Number of Sentences: 13,100
    • Size: 2.56GB
    • Speaker: 1, aged 20~30
    • Sample Rate: 22.05 kHz, 16-bit
    • Mean Words per Clip: 17.23
  • VCTK
    • Number of Sentences: 44,583
    • Size: 10.94GB
    • Speaker: 110
    • Sample Rate: 48 kHz, 16-bit
    • Mean Words per Clip: 17.23
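
For the datasets that list both a total duration and a sentence count (CSMSC, AISHELL-3, and LJSpeech), the two figures give a rough average clip length. The sketch below does only that arithmetic. The numbers are copied from the lists above, the durations are rounded to whole hours, and the results are therefore approximate. The name `mean_clip_seconds` is hypothetical.

```python
# (duration in hours, number of sentences), copied from the lists above.
STATS = {
    "CSMSC": (12, 10_000),
    "AISHELL-3": (85, 88_035),
    "LJSpeech": (24, 13_100),
}


def mean_clip_seconds(hours, sentences):
    """Average clip length in seconds, assuming one clip per sentence."""
    return hours * 3600 / sentences


for name, (hours, sentences) in STATS.items():
    print(f"{name}: {mean_clip_seconds(hours, sentences):.2f} s per clip")
# CSMSC: 4.32 s per clip
# AISHELL-3: 3.48 s per clip
# LJSpeech: 6.60 s per clip
```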

Japanese

  • tri-jek: Japanese-English-Korean tri-lingual corpus
  • JSSS-misc: misc tasks of JSSS corpus
  • JTubeSpeech: Corpus of Japanese speech collected from YouTube
  • J-MAC: Japanese multi-speaker audiobook corpus
  • J-KAC: Japanese Kamishibai and audiobook corpus
  • JMD: Japanese multi-dialect corpus
  • JSSS: Japanese multi-style (summarization and simplification) corpus
  • RWCP-SSD-Onomatopoeia: onomatopoeic word dataset for environmental sounds
  • Life-m: landmark image-themed music corpus
  • PJS: Phoneme-balanced Japanese singing voice corpus
  • JVS-MuSiC: Japanese multi-speaker singing-voice corpus
  • JVS: Japanese multi-speaker voice corpus
  • JSUT-book: audiobook corpus by a single Japanese speaker
  • JSUT-vi: vocal imitation corpus by a single Japanese speaker
  • JSUT-song: singing voice corpus by a single Japanese singer
  • JSUT: a large-scale corpus of reading-style Japanese speech by a single speaker

Emotions

English

Mandarin

English & Mandarin

Music