
(Simplified Chinese | English)

Speaker Verification

Introduction

Speaker verification, as used here, refers to the problem of extracting a speaker embedding from an audio recording.

This demo extracts a speaker embedding from a given audio file. It can be run with a single command or with a few lines of Python using PaddleSpeech.

Usage

1. Installation

See installation.

You can choose one of the easy, medium, or hard methods to install paddlespeech.

2. Prepare Input File

The input of this demo should be a WAV file (.wav), and its sample rate must match the sample rate of the model.

Here is a sample file for this demo that can be downloaded:
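Since the sample rate has to match the model, it can help to check a file before running the demo. The following is a minimal sketch using only the Python standard library; the helper names `wav_sample_rate` and `matches_model_rate` are hypothetical and not part of PaddleSpeech, and the 16000 Hz default is taken from the `sample_rate` argument documented in this README.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, model_rate=16000):
    """Return True when the file's sample rate equals the rate the model expects."""
    return wav_sample_rate(path) == model_rate
```

For example, `matches_model_rate("85236145389.wav")` returns `True` only if the file header reports 16000 Hz. The standard `wave` module reads uncompressed PCM WAV files; other encodings would need a different reader.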

wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav

3. Usage

Command Line (Recommended)

paddlespeech vector --task spk --input 85236145389.wav

echo -e "demo1 85236145389.wav" > vec.job
paddlespeech vector --task spk --input vec.job

echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk

paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"

echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
paddlespeech vector --task score --input vec.job
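The job files in the commands above appear to hold one entry per line: an utterance ID followed by one audio path for the spk task, or two audio paths for the score task. Assuming that format (it is inferred from the `echo` commands, not from a specification), a job file can also be generated from Python. The helper names `format_job_lines` and `write_job_file` are hypothetical.

```python
def format_job_lines(entries):
    """Build job-file lines of the form '<utt_id> <path>' or '<utt_id> <path> <path>'.

    entries: iterable of (utt_id, paths), where paths holds one audio path
    (spk task) or two audio paths (score task).
    """
    lines = []
    for utt_id, paths in entries:
        if len(paths) not in (1, 2):
            raise ValueError(f"{utt_id}: expected 1 or 2 audio paths, got {len(paths)}")
        lines.append(" ".join([utt_id, *paths]))
    return lines


def write_job_file(job_path, entries):
    """Write the formatted entries to job_path, one entry per line."""
    with open(job_path, "w", encoding="utf-8") as job_file:
        job_file.write("\n".join(format_job_lines(entries)) + "\n")
```

Calling `write_job_file("vec.job", [("demo1", ["85236145389.wav"])])` produces a file with the same content as the first `echo` command, which can then be passed to `--input`.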

Usage:

paddlespeech vector --help

Arguments:

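input (required): Audio file to process; as the commands above show, a job file or (for the score task) a pair of audio files is also accepted.
task (required): Specify the vector task. Default: spk.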
model: Model type of vector task. Default: ecapatdnn_voxceleb12.
sample_rate: Sample rate of the model. Default: 16000.
config: Config of vector task. Use pretrained model when it is None. Default: None.
ckpt_path: Model checkpoint. Use pretrained model when it is None. Default: None.
device: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.

Output:

  demo [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
  1.756596     5.167894    10.80636     -3.8226728   -5.6141334
  2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
  -9.723131     0.6619743   -6.976803    10.213478     7.494748
  2.9105635    3.8949256    3.7999806    7.1061673   16.905321
  -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
  11.232214     7.1274667   -4.2828417    2.452362    -5.130748
  -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
  0.7618269    1.1253023   -2.083836     4.725744    -8.782597
  -3.539873     3.814236     5.1420674    2.162061     4.096431
  -6.4162116   12.747448     1.9429878  -15.152943     6.417416
  16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
  11.567354     3.69788     11.258265     7.442363     9.183411
  4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
  7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
  -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
  0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
  -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
  -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
  -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
  3.272176     2.8382776    5.134597    -9.190781    -0.5657382
  -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
  -0.31784213   9.493548     2.1144536    4.358092   -12.089823
  8.451689    -7.925461     4.6242585    4.4289427   18.692003
  -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
  -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
  16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
  -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
  0.66607     15.443222     4.740594    -3.4725387   11.592567
  -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
  -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
  -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
  -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
  1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
  7.3629923    0.4657332    3.132599    12.438889    -1.8337058
  4.532936     2.7264361   10.145339    -6.521951     2.897153
  -3.3925855    5.079156     7.759716     4.677565     5.8457737
  2.402413     7.7071047    3.9711342   -6.390043     6.1268735
  -3.7760346  -11.118123  ]

Python API

import paddle
from paddlespeech.cli import VectorExecutor

vector_executor = VectorExecutor()
audio_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./85236145389.wav',
    device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))

test_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./123456789.wav',
    device=paddle.get_device())
print('Test embedding Result: \n{}'.format(test_emb))
score = vector_executor.get_embeddings_score(audio_emb, test_emb)
print(f"Embeddings Score: {score}")

Output:

# Vector Result:
 Audio embedding Result:
  [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
  1.756596     5.167894    10.80636     -3.8226728   -5.6141334
  2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
  -9.723131     0.6619743   -6.976803    10.213478     7.494748
  2.9105635    3.8949256    3.7999806    7.1061673   16.905321
  -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
  11.232214     7.1274667   -4.2828417    2.452362    -5.130748
  -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
  0.7618269    1.1253023   -2.083836     4.725744    -8.782597
  -3.539873     3.814236     5.1420674    2.162061     4.096431
  -6.4162116   12.747448     1.9429878  -15.152943     6.417416
  16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
  11.567354     3.69788     11.258265     7.442363     9.183411
  4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
  7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
  -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
  0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
  -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
  -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
  -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
  3.272176     2.8382776    5.134597    -9.190781    -0.5657382
  -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
  -0.31784213   9.493548     2.1144536    4.358092   -12.089823
  8.451689    -7.925461     4.6242585    4.4289427   18.692003
  -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
  -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
  16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
  -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
  0.66607     15.443222     4.740594    -3.4725387   11.592567
  -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
  -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
  -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
  -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
  1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
  7.3629923    0.4657332    3.132599    12.438889    -1.8337058
  4.532936     2.7264361   10.145339    -6.521951     2.897153
  -3.3925855    5.079156     7.759716     4.677565     5.8457737
  2.402413     7.7071047    3.9711342   -6.390043     6.1268735
  -3.7760346  -11.118123  ]
  # get the test embedding
  Test embedding Result:
  [ -1.902964     2.0690894   -8.034194     3.5472693    0.18089125
    6.9085927    1.4097427   -1.9487704  -10.021278    -0.20755845
    -8.04332      4.344489     2.3200977  -14.306299     5.184692
  -11.55602     -3.8497238    0.6444722    1.2833948    2.6766639
    0.5878921    0.7946299    1.7207596    2.5791872   14.998469
    -1.3385371   15.031221    -0.8006958    1.99287     -9.52007
    2.435466     4.003221    -4.33817     -4.898601    -5.304714
  -18.033886    10.790787   -12.784645    -5.641755     2.9761686
  -10.566622     1.4839455    6.152458    -5.7195854    2.8603241
    6.112133     8.489869     5.5958056    1.2836679   -1.2293907
    0.89927405   7.0288725   -2.854029    -0.9782962    5.8255906
    14.905906    -5.025907     0.7866458   -4.2444224  -16.354029
    10.521315     0.9604709   -3.3257897    7.144871   -13.592733
    -8.568869    -1.7953678    0.26313916  10.916714    -6.9374123
    1.857403    -6.2746415    2.8154466   -7.2338667   -2.293357
    -0.05452765   5.4287076    5.0849075   -6.690375    -1.6183422
    3.654291     0.94352573  -9.200294    -5.4749465   -3.5235846
    1.3420814    4.240421    -2.772944    -2.8451524   16.311104
    4.2969875   -1.762936   -12.5758915    8.595198    -0.8835239
    -1.5708797    1.568961     1.1413603    3.5032008   -0.45251232
    -6.786333    16.89443      5.3366146   -8.789056     0.6355629
    3.2579517   -3.328322     7.5969577    0.66025066  -6.550468
    -9.148656     2.020372    -0.4615173    1.1965656   -3.8764873
    11.6562195   -6.0750933   12.182899     3.2218833    0.81969476
    5.570001    -3.8459578   -7.205299     7.9262037   -7.6611166
    -5.249467    -2.2671914    7.2658715  -13.298164     4.821147
    -2.7263982   11.691089    -3.8918593   -2.838112    -1.0336838
    -3.8034165    2.8536487   -5.60398     -1.1972581    1.3455094
    -3.4903061    2.2408795    5.5010734   -3.970756    11.99696
    -7.8858757    0.43160373  -5.5059714    4.3426995   16.322706
    11.635366     0.72157705  -9.245714    -3.91465     -4.449838
    -1.5716927    7.713747    -2.2430465   -6.198303   -13.481864
    2.8156567   -5.7812386    5.1456156    2.7289324  -14.505571
    13.270688     3.448231    -7.0659585    4.5886116   -4.466099
    -0.296428   -11.463529    -2.6076477   14.110243    -6.9725137
    -1.9962958    2.7119343   19.391657     0.01961198  14.607133
    -1.6695905   -4.391516     1.3131028   -6.670972    -5.888604
    12.0612335    5.9285784    3.3715196    1.492534    10.723728
    -0.95514804 -12.085431  ]
  # get the score between enroll and test
  Embeddings Score: 0.4292638301849365
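This README does not state which metric the score uses. A common choice for comparing speaker embeddings is cosine similarity, so the sketch below shows that computation in plain Python as an illustration only; it is an assumption, not a description of how `get_embeddings_score` is implemented. The function name `cosine_similarity` is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    The result lies in [-1, 1]; values closer to 1 mean the vectors
    point in more similar directions.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Under that assumption, a higher score would indicate that the two recordings are more likely to come from the same speaker. Any decision threshold would have to be chosen for the application; none is given here.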

4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from the command line and the Python API:

Model	Sample Rate
ecapatdnn_voxceleb12	16k
