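Speaker Verification

Introduction

Speaker verification (voiceprint recognition) is a technique that uses a computer program to automatically extract speaker features.

This demo extracts speaker features from a given audio file. It can be run with a single PaddleSpeech command or with a few lines of Python code.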

Usage

1. Installation

See the installation documentation.

You can choose one of three installation methods: easy, medium, or hard.

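2. Prepare the Input

The input to the speaker CLI demo should be a WAV file (.wav), and its sample rate must match the sample rate of the model.

You can download the sample audio for this demo:

# The audio contains the digit string 85236145389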
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
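
Because the sample rate has to match the model, it can be worth checking a WAV file before running the demo. The following is a minimal sketch that uses only the Python standard library; the helper names (`wav_sample_rate`, `matches_model_rate`, `write_silence`) are hypothetical and are not part of PaddleSpeech, and the 16000 Hz target is the default of the sample_rate parameter listed in this document. The sketch writes two short silent files so that it runs without the sample audio.

```python
import os
import tempfile
import wave

# Assumption: 16000 Hz, the default sample_rate of this demo.
EXPECTED_SAMPLE_RATE = 16000


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in the WAV header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, expected=EXPECTED_SAMPLE_RATE):
    """Return True if the file's sample rate equals the expected rate."""
    return wav_sample_rate(path) == expected


def write_silence(path, sample_rate, seconds=0.1):
    """Write a short mono 16-bit silent WAV (for illustration only)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(sample_rate * seconds))


with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good_16k.wav")
    bad = os.path.join(tmp_dir, "bad_8k.wav")
    write_silence(good, 16000)
    write_silence(bad, 8000)
    print(matches_model_rate(good))  # file written at 16 kHz
    print(matches_model_rate(bad))   # file written at 8 kHz
```

To check the downloaded sample instead, call `matches_model_rate("85236145389.wav")` in the directory that contains the file.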

3. Usage

Command Line (Recommended)

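# Extract the speaker embedding of a single audio file
paddlespeech vector --task spk --input 85236145389.wav

# Extract embeddings for the entries listed in a job file
echo -e "demo1 85236145389.wav" > vec.job
paddlespeech vector --task spk --input vec.job

# Extract embeddings for entries piped in through standard input
echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk

# Score the similarity between two audio files
paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"

# Score the audio pairs listed in a job file
echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
paddlespeech vector --task score --input vec.job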
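
The job files in the commands above appear to hold one entry per line: an ID followed by one WAV path for the spk task, or an ID followed by two WAV paths for the score task. That layout is inferred from the echo commands and is not a documented specification. Under that assumption, a job file for many entries could be generated with a small script like the sketch below; the helper names are hypothetical and are not part of PaddleSpeech.

```python
import os
import tempfile


def spk_job_lines(utterances):
    """Build spk job lines from (utt_id, wav_path) pairs."""
    return [f"{utt_id} {wav_path}" for utt_id, wav_path in utterances]


def score_job_lines(trials):
    """Build score job lines from (utt_id, enroll_wav, test_wav) triples."""
    return [f"{utt_id} {enroll} {test}" for utt_id, enroll, test in trials]


def write_job(path, lines):
    """Write one job entry per line."""
    with open(path, "w", encoding="utf-8") as job_file:
        job_file.write("\n".join(lines) + "\n")


# Same pairs as the demo4/demo5 example above.
trials = [
    ("demo4", "85236145389.wav", "85236145389.wav"),
    ("demo5", "85236145389.wav", "123456789.wav"),
]
# Written to a temporary directory here; use "vec.job" in practice.
job_path = os.path.join(tempfile.mkdtemp(), "vec.job")
write_job(job_path, score_job_lines(trials))
with open(job_path, encoding="utf-8") as job_file:
    print(job_file.read(), end="")
```

The resulting file can then be passed to the CLI with `--input`, as in the commands above.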

Usage:

paddlespeech vector --help

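Arguments:

input (required): the audio file used for recognition.
task (required): the specific task for vector to perform; default: spk.
model: the model for the speaker task; default: ecapatdnn_voxceleb12.
sample_rate: the audio sample rate; default: 16000.
config: the configuration file for the speaker task; if not set, the default configuration of the pretrained model is used; default: None.
ckpt_path: the model checkpoint file; if not set, the pretrained model is downloaded and used; default: None.
device: the device on which to run inference; default: the default PaddlePaddle device on the current system.

Output: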

demo  [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
  1.756596     5.167894    10.80636     -3.8226728   -5.6141334
  2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
  -9.723131     0.6619743   -6.976803    10.213478     7.494748
  2.9105635    3.8949256    3.7999806    7.1061673   16.905321
  -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
  11.232214     7.1274667   -4.2828417    2.452362    -5.130748
  -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
  0.7618269    1.1253023   -2.083836     4.725744    -8.782597
  -3.539873     3.814236     5.1420674    2.162061     4.096431
  -6.4162116   12.747448     1.9429878  -15.152943     6.417416
  16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
  11.567354     3.69788     11.258265     7.442363     9.183411
  4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
  7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
  -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
  0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
  -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
  -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
  -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
  3.272176     2.8382776    5.134597    -9.190781    -0.5657382
  -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
  -0.31784213   9.493548     2.1144536    4.358092   -12.089823
  8.451689    -7.925461     4.6242585    4.4289427   18.692003
  -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
  -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
  16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
  -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
  0.66607     15.443222     4.740594    -3.4725387   11.592567
  -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
  -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
  -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
  -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
  1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
  7.3629923    0.4657332    3.132599    12.438889    -1.8337058
  4.532936     2.7264361   10.145339    -6.521951     2.897153
  -3.3925855    5.079156     7.759716     4.677565     5.8457737
  2.402413     7.7071047    3.9711342   -6.390043     6.1268735
  -3.7760346  -11.118123  ]

Python API

import paddle
from paddlespeech.cli import VectorExecutor

vector_executor = VectorExecutor()
audio_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./85236145389.wav',
    device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))

test_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./123456789.wav',
    device=paddle.get_device())
print('Test embedding Result: \n{}'.format(test_emb))

# score range [0, 1]
score = vector_executor.get_embeddings_score(audio_emb, test_emb)
print(f"Eembeddings Score: {score}")

Output:

# Vector Result:
 Audio embedding Result:
  [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
  1.756596     5.167894    10.80636     -3.8226728   -5.6141334
  2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
  -9.723131     0.6619743   -6.976803    10.213478     7.494748
  2.9105635    3.8949256    3.7999806    7.1061673   16.905321
  -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
  11.232214     7.1274667   -4.2828417    2.452362    -5.130748
  -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
  0.7618269    1.1253023   -2.083836     4.725744    -8.782597
  -3.539873     3.814236     5.1420674    2.162061     4.096431
  -6.4162116   12.747448     1.9429878  -15.152943     6.417416
  16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
  11.567354     3.69788     11.258265     7.442363     9.183411
  4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
  7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
  -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
  0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
  -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
  -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
  -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
  3.272176     2.8382776    5.134597    -9.190781    -0.5657382
  -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
  -0.31784213   9.493548     2.1144536    4.358092   -12.089823
  8.451689    -7.925461     4.6242585    4.4289427   18.692003
  -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
  -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
  16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
  -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
  0.66607     15.443222     4.740594    -3.4725387   11.592567
  -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
  -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
  -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
  -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
  1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
  7.3629923    0.4657332    3.132599    12.438889    -1.8337058
  4.532936     2.7264361   10.145339    -6.521951     2.897153
  -3.3925855    5.079156     7.759716     4.677565     5.8457737
  2.402413     7.7071047    3.9711342   -6.390043     6.1268735
  -3.7760346  -11.118123  ]
  # get the test embedding
  Test embedding Result:
  [ -1.902964     2.0690894   -8.034194     3.5472693    0.18089125
    6.9085927    1.4097427   -1.9487704  -10.021278    -0.20755845
    -8.04332      4.344489     2.3200977  -14.306299     5.184692
  -11.55602     -3.8497238    0.6444722    1.2833948    2.6766639
    0.5878921    0.7946299    1.7207596    2.5791872   14.998469
    -1.3385371   15.031221    -0.8006958    1.99287     -9.52007
    2.435466     4.003221    -4.33817     -4.898601    -5.304714
  -18.033886    10.790787   -12.784645    -5.641755     2.9761686
  -10.566622     1.4839455    6.152458    -5.7195854    2.8603241
    6.112133     8.489869     5.5958056    1.2836679   -1.2293907
    0.89927405   7.0288725   -2.854029    -0.9782962    5.8255906
    14.905906    -5.025907     0.7866458   -4.2444224  -16.354029
    10.521315     0.9604709   -3.3257897    7.144871   -13.592733
    -8.568869    -1.7953678    0.26313916  10.916714    -6.9374123
    1.857403    -6.2746415    2.8154466   -7.2338667   -2.293357
    -0.05452765   5.4287076    5.0849075   -6.690375    -1.6183422
    3.654291     0.94352573  -9.200294    -5.4749465   -3.5235846
    1.3420814    4.240421    -2.772944    -2.8451524   16.311104
    4.2969875   -1.762936   -12.5758915    8.595198    -0.8835239
    -1.5708797    1.568961     1.1413603    3.5032008   -0.45251232
    -6.786333    16.89443      5.3366146   -8.789056     0.6355629
    3.2579517   -3.328322     7.5969577    0.66025066  -6.550468
    -9.148656     2.020372    -0.4615173    1.1965656   -3.8764873
    11.6562195   -6.0750933   12.182899     3.2218833    0.81969476
    5.570001    -3.8459578   -7.205299     7.9262037   -7.6611166
    -5.249467    -2.2671914    7.2658715  -13.298164     4.821147
    -2.7263982   11.691089    -3.8918593   -2.838112    -1.0336838
    -3.8034165    2.8536487   -5.60398     -1.1972581    1.3455094
    -3.4903061    2.2408795    5.5010734   -3.970756    11.99696
    -7.8858757    0.43160373  -5.5059714    4.3426995   16.322706
    11.635366     0.72157705  -9.245714    -3.91465     -4.449838
    -1.5716927    7.713747    -2.2430465   -6.198303   -13.481864
    2.8156567   -5.7812386    5.1456156    2.7289324  -14.505571
    13.270688     3.448231    -7.0659585    4.5886116   -4.466099
    -0.296428   -11.463529    -2.6076477   14.110243    -6.9725137
    -1.9962958    2.7119343   19.391657     0.01961198  14.607133
    -1.6695905   -4.391516     1.3131028   -6.670972    -5.888604
    12.0612335    5.9285784    3.3715196    1.492534    10.723728
    -0.95514804 -12.085431  ]
  # get the score between enroll and test
  Eembeddings Score: 0.4292638301849365
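
To build intuition for what a similarity score between two embeddings measures, the sketch below computes cosine similarity, a common choice for comparing speaker embeddings. This document does not say how get_embeddings_score is computed, so treating it as cosine similarity is an assumption, and the function below is a hypothetical stand-in rather than the library's implementation. Plain cosine similarity ranges over [-1, 1]; how that relates to the [0, 1] range noted in the code comment above is not covered here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors for illustration, not real speaker embeddings.
enroll = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]
orthogonal = [3.0, 0.0, -1.0]
print(cosine_similarity(enroll, same_direction))
print(cosine_similarity(enroll, orthogonal))
```

A higher value means the two vectors point in more similar directions, which is the usual reading of a higher score between an enrollment embedding and a test embedding.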

4. Pretrained Models

The following pretrained models provided by PaddleSpeech can be used from both the command line and the Python API:

Model	Sample Rate
ecapatdnn_voxceleb12	16k
