diff --git a/examples/ami/README.md b/examples/ami/README.md
index a038eaeb..adc9dc4b 100644
--- a/examples/ami/README.md
+++ b/examples/ami/README.md
@@ -1,3 +1,3 @@
 # Speaker Diarization on AMI corpus
 
-* sd0 - speaker diarization by AHC,SC base on x-vectors
+* sd0 - speaker diarization by AHC,SC base on embeddings
diff --git a/examples/ami/sd0/README.md b/examples/ami/sd0/README.md
index ffe95741..e9ecc285 100644
--- a/examples/ami/sd0/README.md
+++ b/examples/ami/sd0/README.md
@@ -7,7 +7,23 @@
 The script performs diarization using x-vectors(TDNN,ECAPA-TDNN) on the AMI mix-headset data. We demonstrate the use of different clustering methods: AHC, spectral.
 
 ## How to Run
+### prepare annotations and audios
+Download AMI corpus, You need around 10GB of free space to get whole data
+The signals are too large to package in this way, so you need to use the chooser to indicate which ones you wish to download
+
+```bash
+## download annotations
+wget http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_public_manual_1.6.2.zip && unzip ami_public_manual_1.6.2.zip
+```
+
+then please follow https://groups.inf.ed.ac.uk/ami/download/ to download the Signals:
+1) Select one or more AMI meetings: the IDs please follow ./ami_split.py
+2) Select media streams: Just select Headset mix
+
+### start running
 Use the following command to run diarization on AMI corpus.
-`bash ./run.sh`
+```bash
+./run.sh --data_folder ./amicorpus --manual_annot_folder ./ami_public_manual_1.6.2
+```
 
 ## Results (DER) coming soon! :)
diff --git a/examples/ami/sd0/run.sh b/examples/ami/sd0/run.sh
index 9035f595..1fcec269 100644
--- a/examples/ami/sd0/run.sh
+++ b/examples/ami/sd0/run.sh
@@ -17,18 +17,6 @@ device=gpu
 
 . ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
 
-if [ $stage -le 0 ]; then
-    # Prepare data
-    # Download AMI corpus, You need around 10GB of free space to get whole data
-    # The signals are too large to package in this way,
-    # so you need to use the chooser to indicate which ones you wish to download
-    echo "Please follow https://groups.inf.ed.ac.uk/ami/download/ to download the data."
-    echo "Annotations: AMI manual annotations v1.6.2 "
-    echo "Signals: "
-    echo "1) Select one or more AMI meetings: the IDs please follow ./ami_split.py"
-    echo "2) Select media streams: Just select Headset mix"
-fi
-
 if [ $stage -le 1 ]; then
     # Download the pretrained model
     wget https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz
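
Review note: with stage 0 removed, `run.sh` no longer prints the download instructions, so a run started before the manual download may only fail at a later stage. A small pre-flight check could surface that earlier. The sketch below is an assumption and not part of this patch: `check_ami_inputs` is a hypothetical helper name, and the two folder paths are simply the ones shown in the README command above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the recipe in this patch).
# Takes the signals folder and the annotations folder as arguments and
# returns 0 only when both directories exist.
check_ami_inputs() {
    local data_folder="$1"
    local annot_folder="$2"
    local status=0

    if [ ! -d "${data_folder}" ]; then
        echo "Missing AMI signals folder: ${data_folder}" >&2
        status=1
    fi
    if [ ! -d "${annot_folder}" ]; then
        echo "Missing AMI annotations folder: ${annot_folder}" >&2
        status=1
    fi
    return "${status}"
}
```

One possible use is to gate the run on it, for example `check_ami_inputs ./amicorpus ./ami_public_manual_1.6.2 && ./run.sh --data_folder ./amicorpus --manual_annot_folder ./ami_public_manual_1.6.2`. It only checks that the directories exist; it does not verify that the expected meetings or the Headset mix streams were actually downloaded.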