# [Aidatatang_200zh](http://www.openslr.org/62/)
Aidatatang_200zh is a free Chinese Mandarin speech corpus provided by Beijing DataTang Technology Co., Ltd under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.

The contents of the corpus and their descriptions are as follows:
* The corpus contains 200 hours of acoustic data, most of it recorded on mobile devices.
* 600 speakers from different accent areas in China took part in the recording.
* The transcription accuracy of each sentence is above 98%.
* Recordings were made in a quiet indoor environment.
* The database is divided into training, validation, and test sets in a ratio of 7:1:2.
* Detailed information such as speech data coding and speaker information is preserved in the metadata file.
* Segmented transcripts are also provided.

The corpus aims to support researchers in speech recognition, machine translation, voiceprint recognition, and other speech-related fields. It is therefore free for academic use.
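
To give a rough sense of the 7:1:2 split described above, the sketch below divides a total into proportional parts. The helper name `split_by_ratio` is hypothetical, and applying the ratio to hours of audio (rather than to speakers or utterances) is an assumption; the corpus description does not say which unit the ratio refers to.

```python
def split_by_ratio(total, ratio=(7, 1, 2)):
    """Divide `total` into parts proportional to `ratio`."""
    denom = sum(ratio)
    return [total * part / denom for part in ratio]


# 200 hours is the corpus size stated in the description.
# Treating the ratio as a split over hours is an assumption.
train_h, valid_h, test_h = split_by_ratio(200)
print(f"train: {train_h} h, validation: {valid_h} h, test: {test_h} h")
```

The same helper can be applied to the 600 speakers if the split turns out to be defined per speaker; the metadata file shipped with the corpus is the place to confirm this.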