# [TAL_CSASR](https://ai.100tal.com/dataset/)

This dataset contains audio recorded in TAL English classes, with speech that mixes Chinese and English. Each audio clip has a single speaker, and the dataset covers more than 100 speakers. The download is 63.36 GB. The data includes both intra-sentence and inter-sentence language mixing. The ratio of Chinese characters to English words in the data is 13:1.
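As a rough way to see how a character-to-word ratio like the one above could be measured on a transcript, here is a minimal sketch. It is not the dataset authors' method. It assumes that a "Chinese character" is any code point in the basic CJK Unified Ideographs block and that an "English word" is a run of ASCII letters (with optional internal apostrophes). The function name `count_zh_en` is hypothetical.

```python
import re

# Assumption: "Chinese character" = one code point in U+4E00..U+9FFF.
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Assumption: "English word" = run of ASCII letters, allowing internal apostrophes.
_EN_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_zh_en(text):
    """Return (number of Chinese characters, number of English words) in text."""
    zh_chars = len(_CJK_CHAR.findall(text))
    en_words = len(_EN_WORD.findall(text))
    return zh_chars, en_words


if __name__ == "__main__":
    zh, en = count_zh_en("我们今天学习 apple 这个单词")
    print(f"Chinese characters: {zh}, English words: {en}")
```

Summing both counts over all transcript lines and dividing the totals would give a corpus-level ratio under these assumptions.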

- Total data: 587 h (train_set: 555.9 h, dev_set: 8 h, test_set: 23.6 h)
- Sample rate: 16000 Hz
- Bit depth: 16-bit
- Recording device: microphone
- Speaker number: 200+
- Recording time: 2019
- Data format: audio: .wav; text: .txt
- Audio duration: 1-60s
- Data type: recordings of English teachers teaching
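
To sanity-check a downloaded file against the specification listed above (16000 sample rate, 16-bit samples, 1-60 s duration), something like the following sketch may help. It uses only Python's standard `wave` module and assumes the `.wav` files are uncompressed PCM. The function name `check_wav` and the constant names are hypothetical and not part of any official tooling for this dataset.

```python
import wave

# Values taken from the list above.
EXPECTED_RATE = 16000        # samples per second
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit
MIN_SEC, MAX_SEC = 1.0, 60.0


def check_wav(path):
    """Return a list of problems found; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        frames = wf.getnframes()

    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate}, expected {EXPECTED_RATE}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width is {width} bytes, expected {EXPECTED_SAMPLE_WIDTH}")

    # Duration in seconds is the frame count divided by the sample rate.
    duration = frames / rate if rate else 0.0
    if not MIN_SEC <= duration <= MAX_SEC:
        problems.append(f"duration is {duration:.2f} s, expected {MIN_SEC}-{MAX_SEC} s")
    return problems
```

Calling `check_wav("some_clip.wav")` on each file (the path here is a placeholder) and printing any non-empty result gives a quick integrity report after download.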