[vector] add AMI data preparation scripts

pull/1335/head
qingen 3 years ago
parent 9d32f62f48
commit 1899200cae

@@ -1,13 +1,3 @@
 # Speaker Diarization on AMI corpus
-## About the AMI corpus:
-"The AMI Meeting Corpus consists of 100 hours of meeting recordings. The recordings use a range of signals synchronized to a common timeline. These include close-talking and far-field microphones, individual and room-view video cameras, and output from a slide projector and an electronic whiteboard. During the meetings, the participants also have unsynchronized pens available to them that record what is written. The meetings were recorded in English using three different rooms with different acoustic properties, and include mostly non-native speakers." See [ami overview](http://groups.inf.ed.ac.uk/ami/corpus/overview.shtml) for more details.
-## About the example
-The script performs diarization using x-vectors(TDNN,ECAPA-TDNN) on the AMI mix-headset data. We demonstrate the use of different clustering methods: AHC, spectral.
-## How to Run
-Use the following command to run diarization on AMI corpus.
-`bash ./run.sh`
-## Results (DER) coming soon! :)
+* sd0 - speaker diarization by AHC,SC base on x-vectors

@@ -0,0 +1,13 @@
+# Speaker Diarization on AMI corpus
+## About the AMI corpus:
+"The AMI Meeting Corpus consists of 100 hours of meeting recordings. The recordings use a range of signals synchronized to a common timeline. These include close-talking and far-field microphones, individual and room-view video cameras, and output from a slide projector and an electronic whiteboard. During the meetings, the participants also have unsynchronized pens available to them that record what is written. The meetings were recorded in English using three different rooms with different acoustic properties, and include mostly non-native speakers." See [ami overview](http://groups.inf.ed.ac.uk/ami/corpus/overview.shtml) for more details.
+## About the example
+The script performs diarization using x-vectors(TDNN,ECAPA-TDNN) on the AMI mix-headset data. We demonstrate the use of different clustering methods: AHC, spectral.
+## How to Run
+Use the following command to run diarization on AMI corpus.
+`bash ./run.sh`
+## Results (DER) coming soon! :)

@@ -2,12 +2,13 @@
 stage=1
-data_folder=/home/data/ami/amicorpus #e.g., /path/to/amicorpus/
-manual_annot_folder=/home/data/ami/ami_public_manual_1.6.2 #e.g., /path/to/ami_public_manual_1.6.2/
-save_folder=results
-ref_rttm_dir=results/ref_rttms
-meta_data_dir=results/metadata
+TARGET_DIR=${MAIN_ROOT}/dataset/ami
+data_folder=${TARGET_DIR}/amicorpus #e.g., /path/to/amicorpus/
+manual_annot_folder=${TARGET_DIR}/ami_public_manual_1.6.2 #e.g., /path/to/ami_public_manual_1.6.2/
+save_folder=${MAIN_ROOT}/dataset/ami/results
+ref_rttm_dir=${save_folder}/ref_rttms
+meta_data_dir=${save_folder}/metadata
 set=L
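A minimal sketch of how the path variables introduced in this hunk resolve. `MAIN_ROOT` is not defined in the lines shown; the sketch assumes the recipe environment provides it and otherwise falls back to a hypothetical placeholder, `/tmp/main_root`.

```shell
#!/usr/bin/env bash
# Sketch: resolve the path variables added to run.sh in this commit.
# Assumption: MAIN_ROOT normally comes from the recipe environment;
# /tmp/main_root is a hypothetical placeholder used only for illustration.
set -euo pipefail

MAIN_ROOT=${MAIN_ROOT:-/tmp/main_root}

TARGET_DIR=${MAIN_ROOT}/dataset/ami
data_folder=${TARGET_DIR}/amicorpus
manual_annot_folder=${TARGET_DIR}/ami_public_manual_1.6.2
save_folder=${MAIN_ROOT}/dataset/ami/results
ref_rttm_dir=${save_folder}/ref_rttms
meta_data_dir=${save_folder}/metadata

# Print each variable name with its resolved value (bash indirect expansion).
for var in TARGET_DIR data_folder manual_annot_folder save_folder ref_rttm_dir meta_data_dir; do
    printf '%-20s %s\n' "$var" "${!var}"
done
```

Note that `save_folder` is built from `${MAIN_ROOT}/dataset/ami` rather than from `${TARGET_DIR}`, so the results directory stays under the dataset directory only while `TARGET_DIR` keeps the value set here.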
@@ -23,8 +24,9 @@ if [ ${stage} -le 0 ]; then
 # so you need to use the chooser to indicate which ones you wish to download
 echo "Please follow https://groups.inf.ed.ac.uk/ami/download/ to download the data."
 echo "Annotations: AMI manual annotations v1.6.2 "
-echo "Signals: Scenario Meetings/Non Scenario Meetings, some sessions recommended but not all"
-echo "media streams: Headset mix, recommended first"
+echo "Signals: "
+echo "1) Select one or more AMI meetings: the IDs please follow ./ami_split.py"
+echo "2) Select media streams: Just select Headset mix"
 exit 0;
 fi
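Stage 0 only prints download instructions and exits, and with the default `stage=1` the test `[ ${stage} -le 0 ]` is false, so that block is skipped and later stages assume the data is already in place. A hypothetical pre-flight check could catch a missing download early. `check_ami_dirs` is an illustrative name, not part of the recipe; this is a sketch rather than the recipe's own behavior.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not in the original run.sh): report whether
# the manually downloaded AMI audio and annotations are where run.sh expects.
set -euo pipefail

# check_ami_dirs DATA_FOLDER MANUAL_ANNOT_FOLDER
# Returns 0 if both directories exist; otherwise prints each missing one
# to stderr and returns 1.
check_ami_dirs() {
    local data_folder=$1 manual_annot_folder=$2 missing=0
    for dir in "$data_folder" "$manual_annot_folder"; do
        if [[ ! -d "$dir" ]]; then
            echo "Missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage sketch against a throwaway layout mirroring the expected folder names.
tmp_root=$(mktemp -d)
mkdir -p "$tmp_root/amicorpus" "$tmp_root/ami_public_manual_1.6.2"
check_ami_dirs "$tmp_root/amicorpus" "$tmp_root/ami_public_manual_1.6.2" && echo "AMI data found"
```

In a real run the two arguments would be `$data_folder` and `$manual_annot_folder` from the configuration hunk above.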