# [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/)

VoxCeleb is an audio-visual dataset consisting of short clips of human speech extracted from interview videos uploaded to YouTube.

VoxCeleb contains speech from speakers spanning a wide range of ethnicities, accents, professions and ages.

All speaking face-tracks are captured "in the wild", with background chatter, laughter, overlapping speech, pose variation and different lighting conditions.

VoxCeleb consists of both audio and video. Each segment is at least 3 seconds long.
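
As a rough illustration of the 3-second minimum, the sketch below checks a clip's duration with Python's standard `wave` module. It assumes the clip is an uncompressed PCM WAV file; audio in other formats would need converting first. The helpers `clip_duration_seconds` and `meets_minimum_length` are hypothetical names for illustration, not part of any official VoxCeleb tooling.

```python
import wave


def clip_duration_seconds(path_or_file) -> float:
    """Return the duration of a PCM WAV clip in seconds.

    Assumption: the input is a WAV file path or a binary file-like object;
    compressed formats are not handled by the `wave` module.
    """
    with wave.open(path_or_file, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def meets_minimum_length(path_or_file, min_seconds: float = 3.0) -> bool:
    """Check that a clip is at least `min_seconds` long (3 s per the dataset description)."""
    return clip_duration_seconds(path_or_file) >= min_seconds
```

For example, `meets_minimum_length("clip.wav")` (with `clip.wav` standing in for any local WAV file) returns `False` for a clip shorter than 3 seconds. Reading only the header via `getnframes()` keeps the check cheap, since no audio samples are decoded.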

The dataset consists of two versions, VoxCeleb1 and VoxCeleb2. Each version has its own train/test split. For each version, we provide YouTube URLs, face detections and tracks, audio files, cropped face videos and speaker metadata. There is no overlap between the two versions.

For more details, see http://www.robots.ox.ac.uk/~vgg/data/voxceleb/