TAL_CSASR

This dataset contains audio recorded in TAL English classes, with mixed Chinese and English speech. Each audio clip has only one speaker, and the dataset has more than 100 speakers (file size: 63.36G). The data contains samples of both intra-sentence and inter-sentence mixing. The ratio of Chinese characters to English words in the data is 13:1.

  • Total data: 587 hours (train_set: 555.9 hours, dev_set: 8 hours, test_set: 23.6 hours)
  • Sample rate: 16000 Hz
  • Sample bit depth: 16 bits
  • Recording device: microphone
  • Speaker number: 200+
  • Recording time: 2019
  • Data format: audio: .wav; text: .txt
  • Audio duration: 1-60s
  • Data type: audio of English teachers teaching
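
The audio properties listed above (16000 Hz sample rate, 16-bit samples, 1-60s duration) can be checked on a downloaded file before training. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` and the constants are hypothetical and are not part of PaddleSpeech or the dataset tooling.

```python
import wave

# Expected properties, taken from the list above.
EXPECTED_RATE = 16000        # Hz
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16 bits
MIN_SECONDS, MAX_SECONDS = 1.0, 60.0


def check_wav(path):
    """Return a list of problems found in the WAV file at `path`.

    An empty list means the file matches the listed sample rate,
    bit depth, and duration range. (Hypothetical helper.)
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        frames = wf.getnframes()

    duration = frames / rate if rate else 0.0

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width is {width * 8} bits, expected 16 bits")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration is {duration:.2f}s, expected 1-60s")
    return problems
```

Calling `check_wav("some_clip.wav")` on a file path of your choice returns an empty list when the clip matches, or one message per mismatched property otherwise. The sketch assumes uncompressed PCM `.wav` files, which is what the `wave` module reads.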