8.3 KiB

Raw Blame History

Speech Verification)

Introduction

Speaker Verification, refers to the problem of getting a speaker embedding from an audio.

This demo is an implementation to extract speaker embedding from a specific audio file. It can be done by a single command or a few lines in python using PaddleSpeech.

Usage

1. Installation

see installation.

You can choose one way from easy, meduim and hard to install paddlespeech.

2. Prepare Input File

The input of this demo should be a WAV file(.wav), and the sample rate must be the same as the model.

Here are sample files for this demo that can be downloaded:

wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav

3. Usage

Command Line(Recommended)
```
paddlespeech vector --task spk --input 85236145389.wav

echo -e "demo1 85236145389.wav" > vec.job
paddlespeech vector --task spk --input vec.job

echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
```
Usage:
```
paddlespeech vector --help
```
Arguments:
- input(required): Audio file to recognize.
- model: Model type of vector task. Default: ecapatdnn_voxceleb12.
- sample_rate: Sample rate of the model. Default: 16000.
- config: Config of vector task. Use pretrained model when it is None. Default: None.
- ckpt_path: Model checkpoint. Use pretrained model when it is None. Default: None.
- device: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
Output:

  demo  {'dim': 192, 'embedding': array([ -5.749211  ,   9.505463  ,  -8.200284  ,  -5.2075014 ,
         5.3940268 ,  -3.04878   ,   1.611095  ,  10.127234  ,
       -10.534177  , -15.821609  ,   1.2032688 ,  -0.35080156,
         1.2629458 , -12.643498  ,  -2.5758228 , -11.343508  ,
         2.3385992 ,  -8.719341  ,  14.213509  ,  15.404744  ,
        -0.39327756,   6.338786  ,   2.688887  ,   8.7104025 ,
        17.469526  ,  -8.77959   ,   7.0576906 ,   4.648855  ,
        -1.3089896 , -23.294737  ,   8.013747  ,  13.891729  ,
        -9.926753  ,   5.655307  ,  -5.9422326 , -22.842539  ,
         0.6293588 , -18.46266   , -10.811862  ,   9.8192625 ,
         3.0070958 ,   3.8072643 ,  -2.3861165 ,   3.0821571 ,
       -14.739942  ,   1.7594414 ,  -0.6485091 ,   4.485623  ,
         2.0207152 ,   7.264915  ,  -6.40137   ,  23.63524   ,
         2.9711294 , -22.708025  ,   9.93719   ,  20.354511  ,
       -10.324688  ,  -0.700492  ,  -8.783211  ,  -5.27593   ,
        15.999649  ,   3.3004563 ,  12.747926  ,  15.429879  ,
         4.7849145 ,   5.6699696 ,  -2.3826702 ,  10.605882  ,
         3.9112158 ,   3.1500628 ,  15.859915  ,  -2.1832209 ,
       -23.908653  ,  -6.4799504 ,  -4.5365124 ,  -9.224193  ,
        14.568347  , -10.568833  ,   4.982321  ,  -4.342062  ,
         0.0914714 ,  12.645902  ,  -5.74285   ,  -3.2141201 ,
        -2.7173362 ,  -6.680575  ,   0.4757669 ,  -5.035051  ,
        -6.7964664 ,  16.865469  , -11.54324   ,   7.681869  ,
         0.44475392,   9.708182  ,  -8.932846  ,   0.4123232 ,
        -4.361452  ,   1.3948607 ,   9.511665  ,   0.11667654,
         2.9079323 ,   6.049952  ,   9.275183  , -18.078873  ,
         6.2983274 ,  -0.7500531 ,  -2.725033  ,  -7.6027865 ,
         3.3404543 ,   2.990815  ,   4.010979  ,  11.000591  ,
        -2.8873312 ,   7.1352735 , -16.79663   ,  18.495346  ,
       -14.293832  ,   7.89578   ,   2.2714825 ,  22.976387  ,
        -4.875734  ,  -3.0836344 ,  -2.9999814 ,  13.751918  ,
         6.448228  , -11.924197  ,   2.171869  ,   2.0423572 ,
        -6.173772  ,  10.778437  ,  25.77281   ,  -4.9495463 ,
        14.57806   ,   0.3044315 ,   2.6132357 ,  -7.591999  ,
        -2.076944  ,   9.025118  ,   1.7834753 ,  -3.1799617 ,
        -4.9401326 ,  23.465864  ,   5.1685796 ,  -9.018578  ,
         9.037825  ,  -4.4150195 ,   6.859591  , -12.274467  ,
        -0.88911164,   5.186309  ,  -3.9988663 , -13.638606  ,
        -9.925445  ,  -0.06329413,  -3.6709652 , -12.397416  ,
       -12.719869  ,  -1.395601  ,   2.1150916 ,   5.7381287 ,
        -4.4691963 ,  -3.82819   ,  -0.84233856,  -1.1604277 ,
       -13.490127  ,   8.731719  , -20.778936  , -11.495662  ,
         5.8033476 ,  -4.752041  ,  10.833007  ,  -6.717991  ,
         4.504732  ,  13.4244375 ,   1.1306485 ,   7.3435574 ,
         1.400918  ,  14.704036  ,  -9.501399  ,   7.2315617 ,
        -6.417456  ,   1.3333273 ,  11.872697  ,  -0.30664724,
         8.8845    ,   6.5569253 ,   4.7948146 ,   0.03662816,
        -8.704245  ,   6.224871  ,  -3.2701402 , -11.508579  ],
      dtype=float32)}

Python API

import paddle
from paddlespeech.cli import VectorExecutor

vector_executor = VectorExecutor()
audio_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None, 
    ckpt_path=None,
    audio_file='./85236145389.wav',
    force_yes=False,
    device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))

Output:

# Vector Result:
 {'dim': 192, 'embedding': array([ -5.749211  ,   9.505463  ,  -8.200284  ,  -5.2075014 ,
       5.3940268 ,  -3.04878   ,   1.611095  ,  10.127234  ,
     -10.534177  , -15.821609  ,   1.2032688 ,  -0.35080156,
       1.2629458 , -12.643498  ,  -2.5758228 , -11.343508  ,
       2.3385992 ,  -8.719341  ,  14.213509  ,  15.404744  ,
      -0.39327756,   6.338786  ,   2.688887  ,   8.7104025 ,
      17.469526  ,  -8.77959   ,   7.0576906 ,   4.648855  ,
      -1.3089896 , -23.294737  ,   8.013747  ,  13.891729  ,
      -9.926753  ,   5.655307  ,  -5.9422326 , -22.842539  ,
       0.6293588 , -18.46266   , -10.811862  ,   9.8192625 ,
       3.0070958 ,   3.8072643 ,  -2.3861165 ,   3.0821571 ,
     -14.739942  ,   1.7594414 ,  -0.6485091 ,   4.485623  ,
       2.0207152 ,   7.264915  ,  -6.40137   ,  23.63524   ,
       2.9711294 , -22.708025  ,   9.93719   ,  20.354511  ,
     -10.324688  ,  -0.700492  ,  -8.783211  ,  -5.27593   ,
      15.999649  ,   3.3004563 ,  12.747926  ,  15.429879  ,
       4.7849145 ,   5.6699696 ,  -2.3826702 ,  10.605882  ,
       3.9112158 ,   3.1500628 ,  15.859915  ,  -2.1832209 ,
     -23.908653  ,  -6.4799504 ,  -4.5365124 ,  -9.224193  ,
      14.568347  , -10.568833  ,   4.982321  ,  -4.342062  ,
       0.0914714 ,  12.645902  ,  -5.74285   ,  -3.2141201 ,
      -2.7173362 ,  -6.680575  ,   0.4757669 ,  -5.035051  ,
      -6.7964664 ,  16.865469  , -11.54324   ,   7.681869  ,
       0.44475392,   9.708182  ,  -8.932846  ,   0.4123232 ,
      -4.361452  ,   1.3948607 ,   9.511665  ,   0.11667654,
       2.9079323 ,   6.049952  ,   9.275183  , -18.078873  ,
       6.2983274 ,  -0.7500531 ,  -2.725033  ,  -7.6027865 ,
       3.3404543 ,   2.990815  ,   4.010979  ,  11.000591  ,
      -2.8873312 ,   7.1352735 , -16.79663   ,  18.495346  ,
     -14.293832  ,   7.89578   ,   2.2714825 ,  22.976387  ,
      -4.875734  ,  -3.0836344 ,  -2.9999814 ,  13.751918  ,
       6.448228  , -11.924197  ,   2.171869  ,   2.0423572 ,
      -6.173772  ,  10.778437  ,  25.77281   ,  -4.9495463 ,
      14.57806   ,   0.3044315 ,   2.6132357 ,  -7.591999  ,
      -2.076944  ,   9.025118  ,   1.7834753 ,  -3.1799617 ,
      -4.9401326 ,  23.465864  ,   5.1685796 ,  -9.018578  ,
       9.037825  ,  -4.4150195 ,   6.859591  , -12.274467  ,
      -0.88911164,   5.186309  ,  -3.9988663 , -13.638606  ,
      -9.925445  ,  -0.06329413,  -3.6709652 , -12.397416  ,
     -12.719869  ,  -1.395601  ,   2.1150916 ,   5.7381287 ,
      -4.4691963 ,  -3.82819   ,  -0.84233856,  -1.1604277 ,
     -13.490127  ,   8.731719  , -20.778936  , -11.495662  ,
       5.8033476 ,  -4.752041  ,  10.833007  ,  -6.717991  ,
       4.504732  ,  13.4244375 ,   1.1306485 ,   7.3435574 ,
       1.400918  ,  14.704036  ,  -9.501399  ,   7.2315617 ,
      -6.417456  ,   1.3333273 ,  11.872697  ,  -0.30664724,
       8.8845    ,   6.5569253 ,   4.7948146 ,   0.03662816,
      -8.704245  ,   6.224871  ,  -3.2701402 , -11.508579  ],
    dtype=float32)}

4.Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:

Model	Sample Rate
ecapatdnn_voxceleb12	16k

8.3 KiB Raw Blame History