zxcd e793d267d9
[ASR] add code-switch asr tal_cs recipe (#2796)
2 years ago

TAL_CSASR

This dataset consists of audio recorded in TAL English classes and contains mixed Chinese-English (code-switched) speech. Each audio file contains a single speaker, and the dataset includes more than 100 speakers. The files total 63.36 GB. The data contains both intra-sentence and inter-sentence code-switching. The ratio of Chinese characters to English words in the data is 13:1.

  • Total data: 587 h (train_set: 555.9 h, dev_set: 8 h, test_set: 23.6 h)
  • Sample rate: 16000 Hz (16 kHz)
  • Bit depth: 16 bit
  • Recording device: microphone
  • Speaker number: 200+
  • Recording time: 2019
  • Data format: audio: .wav; text: .txt
  • Audio duration: 1-60 s
  • Data type: recordings of English teachers teaching classes
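After downloading, it can be useful to confirm that individual files match the format listed above. The following is a minimal sketch using only Python's standard `wave` module; `check_wav` is a hypothetical helper, not part of PaddleSpeech. It checks the stated sample rate, bit depth, and 1 to 60 s duration range. It deliberately does not check the channel count, because the listing above does not state one.

```python
import wave
from typing import List

# Expected values taken from the dataset description above.
EXPECTED_RATE = 16000      # Hz
EXPECTED_SAMPWIDTH = 2     # bytes per sample, i.e. 16 bit
MIN_SEC, MAX_SEC = 1.0, 60.0


def check_wav(path: str) -> List[str]:
    """Return a list of mismatches against the documented format (empty list = OK)."""
    problems = []
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        duration = wf.getnframes() / rate  # seconds
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != EXPECTED_SAMPWIDTH:
        problems.append(f"sample width {8 * width} bit, expected 16 bit")
    if not MIN_SEC <= duration <= MAX_SEC:
        problems.append(f"duration {duration:.2f} s outside {MIN_SEC:g}-{MAX_SEC:g} s")
    return problems
```

Returning a list of messages rather than raising on the first mismatch makes it easy to scan a whole directory and report every nonconforming file at once.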