(简体中文|English)

Speaker Verification

Introduction

Speaker verification, as used in this demo, refers to the problem of extracting a speaker embedding from an audio recording.

This demo extracts a speaker embedding from a given audio file. It can be run with a single command or with a few lines of Python using PaddleSpeech.

Usage

1. Installation

See installation.

You can choose one of the easy, medium, or hard methods to install paddlespeech.

2. Prepare Input File

The input of this demo should be a WAV file (.wav), and its sample rate must match the sample rate of the model.

Here is a sample file for this demo that can be downloaded:

wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
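
Because the sample rate has to match the model, it can help to check a file before running the demo. The following is a minimal sketch that uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files. The helper names `wav_sample_rate` and `check_sample_rate` are hypothetical and not part of PaddleSpeech, and the 16000 Hz default is an assumption based on the 16k model used in this demo.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path, expected_rate=16000):
    """Return the file's sample rate, or raise ValueError on a mismatch.

    expected_rate=16000 is an assumption based on the 16k model in this demo.
    """
    actual_rate = wav_sample_rate(path)
    if actual_rate != expected_rate:
        raise ValueError(
            f"{path}: sample rate is {actual_rate} Hz, "
            f"expected {expected_rate} Hz")
    return actual_rate
```

Calling `check_sample_rate("85236145389.wav")` returns the rate when it matches and raises otherwise, in which case the file would need to be resampled before use.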

3. Usage

  • Command Line (Recommended)

    paddlespeech vector --task spk --input 85236145389.wav
    
    echo -e "demo1 85236145389.wav" > vec.job
    paddlespeech vector --task spk --input vec.job
    
    echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
    

    Usage:

    paddlespeech vector --help
    

    Arguments:

    • input (required): Audio file, or a job file listing audio files, to extract embeddings from.
    • model: Model type of the vector task. Default: ecapatdnn_voxceleb12.
    • sample_rate: Sample rate of the model. Default: 16000.
    • config: Config of the vector task. The pretrained model is used when it is None. Default: None.
    • ckpt_path: Model checkpoint. The pretrained model is used when it is None. Default: None.
    • device: Device on which to run model inference. Default: the default device of paddlepaddle in the current environment.

    Output:

  demo [ -5.749211     9.505463    -8.200284    -5.2075014    5.3940268
  -3.04878      1.611095    10.127234   -10.534177   -15.821609
   1.2032688   -0.35080156   1.2629458  -12.643498    -2.5758228
  -11.343508     2.3385992   -8.719341    14.213509    15.404744
    -0.39327756   6.338786     2.688887     8.7104025   17.469526
    -8.77959      7.0576906    4.648855    -1.3089896  -23.294737
    8.013747    13.891729    -9.926753     5.655307    -5.9422326
  -22.842539     0.6293588  -18.46266    -10.811862     9.8192625
    3.0070958    3.8072643   -2.3861165    3.0821571  -14.739942
    1.7594414   -0.6485091    4.485623     2.0207152    7.264915
    -6.40137     23.63524      2.9711294  -22.708025     9.93719
    20.354511   -10.324688    -0.700492    -8.783211    -5.27593
    15.999649     3.3004563   12.747926    15.429879     4.7849145
    5.6699696   -2.3826702   10.605882     3.9112158    3.1500628
    15.859915    -2.1832209  -23.908653    -6.4799504   -4.5365124
    -9.224193    14.568347   -10.568833     4.982321    -4.342062
    0.0914714   12.645902    -5.74285     -3.2141201   -2.7173362
    -6.680575     0.4757669   -5.035051    -6.7964664   16.865469
  -11.54324      7.681869     0.44475392   9.708182    -8.932846
    0.4123232   -4.361452     1.3948607    9.511665     0.11667654
    2.9079323    6.049952     9.275183   -18.078873     6.2983274
    -0.7500531   -2.725033    -7.6027865    3.3404543    2.990815
    4.010979    11.000591    -2.8873312    7.1352735  -16.79663
    18.495346   -14.293832     7.89578      2.2714825   22.976387
    -4.875734    -3.0836344   -2.9999814   13.751918     6.448228
  -11.924197     2.171869     2.0423572   -6.173772    10.778437
    25.77281     -4.9495463   14.57806      0.3044315    2.6132357
    -7.591999    -2.076944     9.025118     1.7834753   -3.1799617
    -4.9401326   23.465864     5.1685796   -9.018578     9.037825
    -4.4150195    6.859591   -12.274467    -0.88911164   5.186309
    -3.9988663  -13.638606    -9.925445    -0.06329413  -3.6709652
  -12.397416   -12.719869    -1.395601     2.1150916    5.7381287
    -4.4691963   -3.82819     -0.84233856  -1.1604277  -13.490127
    8.731719   -20.778936   -11.495662     5.8033476   -4.752041
    10.833007    -6.717991     4.504732    13.4244375    1.1306485
    7.3435574    1.400918    14.704036    -9.501399     7.2315617
    -6.417456     1.3333273   11.872697    -0.30664724   8.8845
    6.5569253    4.7948146    0.03662816  -8.704245     6.224871
    -3.2701402  -11.508579  ]
  • Python API

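    import paddle
    from paddlespeech.cli import VectorExecutor
    
    # Create the executor that wraps the speaker embedding model.
    vector_executor = VectorExecutor()
    # Extract the speaker embedding from a single WAV file.
    audio_emb = vector_executor(
        model='ecapatdnn_voxceleb12',
        sample_rate=16000,
        config=None,  # None: use the pretrained model's config
        ckpt_path=None,  # None: use the pretrained model's checkpoint
        audio_file='./85236145389.wav',
        force_yes=False,
        device=paddle.get_device())
    print('Audio embedding Result: \n{}'.format(audio_emb))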
    

    Output:

    # Audio embedding Result:
    [ -5.749211     9.505463    -8.200284    -5.2075014    5.3940268
    -3.04878      1.611095    10.127234   -10.534177   -15.821609
     1.2032688   -0.35080156   1.2629458  -12.643498    -2.5758228
    -11.343508     2.3385992   -8.719341    14.213509    15.404744
      -0.39327756   6.338786     2.688887     8.7104025   17.469526
      -8.77959      7.0576906    4.648855    -1.3089896  -23.294737
      8.013747    13.891729    -9.926753     5.655307    -5.9422326
    -22.842539     0.6293588  -18.46266    -10.811862     9.8192625
      3.0070958    3.8072643   -2.3861165    3.0821571  -14.739942
      1.7594414   -0.6485091    4.485623     2.0207152    7.264915
      -6.40137     23.63524      2.9711294  -22.708025     9.93719
      20.354511   -10.324688    -0.700492    -8.783211    -5.27593
      15.999649     3.3004563   12.747926    15.429879     4.7849145
      5.6699696   -2.3826702   10.605882     3.9112158    3.1500628
      15.859915    -2.1832209  -23.908653    -6.4799504   -4.5365124
      -9.224193    14.568347   -10.568833     4.982321    -4.342062
      0.0914714   12.645902    -5.74285     -3.2141201   -2.7173362
      -6.680575     0.4757669   -5.035051    -6.7964664   16.865469
    -11.54324      7.681869     0.44475392   9.708182    -8.932846
      0.4123232   -4.361452     1.3948607    9.511665     0.11667654
      2.9079323    6.049952     9.275183   -18.078873     6.2983274
      -0.7500531   -2.725033    -7.6027865    3.3404543    2.990815
      4.010979    11.000591    -2.8873312    7.1352735  -16.79663
      18.495346   -14.293832     7.89578      2.2714825   22.976387
      -4.875734    -3.0836344   -2.9999814   13.751918     6.448228
    -11.924197     2.171869     2.0423572   -6.173772    10.778437
      25.77281     -4.9495463   14.57806      0.3044315    2.6132357
      -7.591999    -2.076944     9.025118     1.7834753   -3.1799617
      -4.9401326   23.465864     5.1685796   -9.018578     9.037825
      -4.4150195    6.859591   -12.274467    -0.88911164   5.186309
      -3.9988663  -13.638606    -9.925445    -0.06329413  -3.6709652
    -12.397416   -12.719869    -1.395601     2.1150916    5.7381287
      -4.4691963   -3.82819     -0.84233856  -1.1604277  -13.490127
      8.731719   -20.778936   -11.495662     5.8033476   -4.752041
      10.833007    -6.717991     4.504732    13.4244375    1.1306485
      7.3435574    1.400918    14.704036    -9.501399     7.2315617
      -6.417456     1.3333273   11.872697    -0.30664724   8.8845
      6.5569253    4.7948146    0.03662816  -8.704245     6.224871
      -3.2701402  -11.508579  ]
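
The job-file form of the command-line usage above takes one line per audio file, pairing an utterance ID with a WAV path (for example `demo1 85236145389.wav`). Below is a minimal sketch for generating such a file from a list of paths. The helper `write_job_file` is hypothetical, and deriving the utterance ID from the file name is an illustrative assumption. Paths containing spaces are not handled.

```python
from pathlib import Path


def write_job_file(wav_paths, job_path="vec.job"):
    """Write one '<utt_id> <wav_path>' line per audio file and return the lines.

    The utterance ID is the file name without its extension; this naming
    scheme is an assumption made for illustration.
    """
    lines = [f"{Path(wav).stem} {wav}" for wav in wav_paths]
    Path(job_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

After `write_job_file(["85236145389.wav"])`, the resulting `vec.job` can be passed to `paddlespeech vector --task spk --input vec.job` as shown in the command-line examples.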
    

4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from the command line and the Python API:

| Model | Sample Rate |
| :--- | :--- |
| ecapatdnn_voxceleb12 | 16k |
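
Speaker embeddings such as the one this demo produces are commonly compared with cosine similarity to judge whether two recordings come from the same speaker. This demo only extracts embeddings, so the following is a sketch of that follow-up step and not something the demo itself does. The function names and the `threshold=0.5` value are hypothetical placeholders; a usable threshold would have to be tuned on labeled data.

```python
import math


def cosine_similarity(emb_a, emb_b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(float(a) ** 2 for a in emb_a))
    norm_b = math.sqrt(sum(float(b) ** 2 for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Return True when the cosine score reaches the threshold.

    threshold=0.5 is a placeholder assumption, not a tuned value.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```

The functions accept any sequence of numbers, so the array returned by the Python API example can be passed in directly alongside an embedding extracted from a second recording.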