commit
caee809513
@ -0,0 +1,178 @@
([简体中文](./README_cn.md)|English)

# Speaker Verification

## Introduction

Speaker verification, as used in this demo, refers to extracting a speaker embedding from an audio file. Embeddings from different recordings can then be compared to decide whether they come from the same speaker.

This demo extracts a speaker embedding from a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

You can choose one of the easy, medium, and hard ways to install paddlespeech.

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`), and its sample rate must match the sample rate the model expects.

A sample file for this demo can be downloaded with:
```bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```
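
The sample-rate requirement can be checked before running the demo by reading the rate from the WAV header. The sketch below uses Python's standard `wave` module, which only reads uncompressed PCM WAV files. The helper names `wav_sample_rate` and `matches_model_rate` are illustrative and not part of PaddleSpeech, and the 16000 Hz default is taken from the `sample_rate` default documented for this demo.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, expected_rate=16000):
    """Check whether the file's sample rate equals the rate the model expects.

    expected_rate defaults to 16000, the documented default of this demo;
    pass a different value if you use a model with another sample rate.
    """
    return wav_sample_rate(path) == expected_rate
```

For example, `matches_model_rate("85236145389.wav")` returns `True` only if the downloaded file is sampled at 16 kHz; if it returns `False`, the audio would need to be resampled before it is passed to the demo.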

### 3. Usage
- Command Line (Recommended)
```bash
# Single audio file
paddlespeech vector --task spk --input 85236145389.wav

# Job file: one "<utterance-id> <wav-path>" pair per line
echo -e "demo1 85236145389.wav" > vec.job
paddlespeech vector --task spk --input vec.job

# Read "<utterance-id> <wav-path>" lines from standard input
echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
```
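
The job-file examples above suggest that each line holds an utterance ID followed by the path to a WAV file, separated by whitespace. Assuming that format, a small Python helper can build such a file for several recordings. `format_job_lines` and `write_job_file` are hypothetical helpers for illustration, not a PaddleSpeech API.

```python
def format_job_lines(entries):
    """Build job-file text from (utterance_id, wav_path) pairs.

    Assumed format (inferred from the examples above): one
    "<utterance_id> <wav_path>" pair per line.
    """
    lines = []
    for utt_id, wav_path in entries:
        if not utt_id or not wav_path:
            raise ValueError("utterance ID and path must be non-empty")
        # Whitespace separates the two fields in this assumed format,
        # so it cannot appear inside either field.
        if any(ch.isspace() for ch in utt_id + wav_path):
            raise ValueError(f"whitespace not allowed: {utt_id!r} {wav_path!r}")
        lines.append(f"{utt_id} {wav_path}")
    return "".join(line + "\n" for line in lines)


def write_job_file(entries, job_path):
    """Write the job-file text for the given entries to job_path."""
    with open(job_path, "w", encoding="utf-8") as job_file:
        job_file.write(format_job_lines(entries))
```

Calling `write_job_file([("demo1", "85236145389.wav")], "vec.job")` would produce the same file as the `echo` command above, which can then be passed with `--input vec.job`.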

Usage:
```bash
paddlespeech vector --help
```
Arguments:
- `input` (required): Audio file to extract the speaker embedding from.
- `model`: Model type of the vector task. Default: `ecapatdnn_voxceleb12`.
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `config`: Config of the vector task. The pretrained model's config is used when it is None. Default: `None`.
- `ckpt_path`: Model checkpoint. The pretrained model is used when it is None. Default: `None`.
- `device`: Device on which to run model inference. Default: the default device of PaddlePaddle in the current environment.

Output:

```bash
demo {'dim': 192, 'embedding': array([ -5.749211 , 9.505463 , -8.200284 , -5.2075014 ,
    5.3940268 , -3.04878 , 1.611095 , 10.127234 ,
    -10.534177 , -15.821609 , 1.2032688 , -0.35080156,
    1.2629458 , -12.643498 , -2.5758228 , -11.343508 ,
    2.3385992 , -8.719341 , 14.213509 , 15.404744 ,
    -0.39327756, 6.338786 , 2.688887 , 8.7104025 ,
    17.469526 , -8.77959 , 7.0576906 , 4.648855 ,
    -1.3089896 , -23.294737 , 8.013747 , 13.891729 ,
    -9.926753 , 5.655307 , -5.9422326 , -22.842539 ,
    0.6293588 , -18.46266 , -10.811862 , 9.8192625 ,
    3.0070958 , 3.8072643 , -2.3861165 , 3.0821571 ,
    -14.739942 , 1.7594414 , -0.6485091 , 4.485623 ,
    2.0207152 , 7.264915 , -6.40137 , 23.63524 ,
    2.9711294 , -22.708025 , 9.93719 , 20.354511 ,
    -10.324688 , -0.700492 , -8.783211 , -5.27593 ,
    15.999649 , 3.3004563 , 12.747926 , 15.429879 ,
    4.7849145 , 5.6699696 , -2.3826702 , 10.605882 ,
    3.9112158 , 3.1500628 , 15.859915 , -2.1832209 ,
    -23.908653 , -6.4799504 , -4.5365124 , -9.224193 ,
    14.568347 , -10.568833 , 4.982321 , -4.342062 ,
    0.0914714 , 12.645902 , -5.74285 , -3.2141201 ,
    -2.7173362 , -6.680575 , 0.4757669 , -5.035051 ,
    -6.7964664 , 16.865469 , -11.54324 , 7.681869 ,
    0.44475392, 9.708182 , -8.932846 , 0.4123232 ,
    -4.361452 , 1.3948607 , 9.511665 , 0.11667654,
    2.9079323 , 6.049952 , 9.275183 , -18.078873 ,
    6.2983274 , -0.7500531 , -2.725033 , -7.6027865 ,
    3.3404543 , 2.990815 , 4.010979 , 11.000591 ,
    -2.8873312 , 7.1352735 , -16.79663 , 18.495346 ,
    -14.293832 , 7.89578 , 2.2714825 , 22.976387 ,
    -4.875734 , -3.0836344 , -2.9999814 , 13.751918 ,
    6.448228 , -11.924197 , 2.171869 , 2.0423572 ,
    -6.173772 , 10.778437 , 25.77281 , -4.9495463 ,
    14.57806 , 0.3044315 , 2.6132357 , -7.591999 ,
    -2.076944 , 9.025118 , 1.7834753 , -3.1799617 ,
    -4.9401326 , 23.465864 , 5.1685796 , -9.018578 ,
    9.037825 , -4.4150195 , 6.859591 , -12.274467 ,
    -0.88911164, 5.186309 , -3.9988663 , -13.638606 ,
    -9.925445 , -0.06329413, -3.6709652 , -12.397416 ,
    -12.719869 , -1.395601 , 2.1150916 , 5.7381287 ,
    -4.4691963 , -3.82819 , -0.84233856, -1.1604277 ,
    -13.490127 , 8.731719 , -20.778936 , -11.495662 ,
    5.8033476 , -4.752041 , 10.833007 , -6.717991 ,
    4.504732 , 13.4244375 , 1.1306485 , 7.3435574 ,
    1.400918 , 14.704036 , -9.501399 , 7.2315617 ,
    -6.417456 , 1.3333273 , 11.872697 , -0.30664724,
    8.8845 , 6.5569253 , 4.7948146 , 0.03662816,
    -8.704245 , 6.224871 , -3.2701402 , -11.508579 ],
    dtype=float32)}
```

- Python API
```python
import paddle
from paddlespeech.cli import VectorExecutor

vector_executor = VectorExecutor()
# Extract the speaker embedding of a single audio file.
audio_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # None: use the pretrained model's config
    ckpt_path=None,  # None: use the pretrained checkpoint
    audio_file='./85236145389.wav',
    force_yes=False,
    device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))
```

Output:
```bash
# Audio embedding Result:
{'dim': 192, 'embedding': array([ -5.749211 , 9.505463 , -8.200284 , -5.2075014 ,
    5.3940268 , -3.04878 , 1.611095 , 10.127234 ,
    -10.534177 , -15.821609 , 1.2032688 , -0.35080156,
    1.2629458 , -12.643498 , -2.5758228 , -11.343508 ,
    2.3385992 , -8.719341 , 14.213509 , 15.404744 ,
    -0.39327756, 6.338786 , 2.688887 , 8.7104025 ,
    17.469526 , -8.77959 , 7.0576906 , 4.648855 ,
    -1.3089896 , -23.294737 , 8.013747 , 13.891729 ,
    -9.926753 , 5.655307 , -5.9422326 , -22.842539 ,
    0.6293588 , -18.46266 , -10.811862 , 9.8192625 ,
    3.0070958 , 3.8072643 , -2.3861165 , 3.0821571 ,
    -14.739942 , 1.7594414 , -0.6485091 , 4.485623 ,
    2.0207152 , 7.264915 , -6.40137 , 23.63524 ,
    2.9711294 , -22.708025 , 9.93719 , 20.354511 ,
    -10.324688 , -0.700492 , -8.783211 , -5.27593 ,
    15.999649 , 3.3004563 , 12.747926 , 15.429879 ,
    4.7849145 , 5.6699696 , -2.3826702 , 10.605882 ,
    3.9112158 , 3.1500628 , 15.859915 , -2.1832209 ,
    -23.908653 , -6.4799504 , -4.5365124 , -9.224193 ,
    14.568347 , -10.568833 , 4.982321 , -4.342062 ,
    0.0914714 , 12.645902 , -5.74285 , -3.2141201 ,
    -2.7173362 , -6.680575 , 0.4757669 , -5.035051 ,
    -6.7964664 , 16.865469 , -11.54324 , 7.681869 ,
    0.44475392, 9.708182 , -8.932846 , 0.4123232 ,
    -4.361452 , 1.3948607 , 9.511665 , 0.11667654,
    2.9079323 , 6.049952 , 9.275183 , -18.078873 ,
    6.2983274 , -0.7500531 , -2.725033 , -7.6027865 ,
    3.3404543 , 2.990815 , 4.010979 , 11.000591 ,
    -2.8873312 , 7.1352735 , -16.79663 , 18.495346 ,
    -14.293832 , 7.89578 , 2.2714825 , 22.976387 ,
    -4.875734 , -3.0836344 , -2.9999814 , 13.751918 ,
    6.448228 , -11.924197 , 2.171869 , 2.0423572 ,
    -6.173772 , 10.778437 , 25.77281 , -4.9495463 ,
    14.57806 , 0.3044315 , 2.6132357 , -7.591999 ,
    -2.076944 , 9.025118 , 1.7834753 , -3.1799617 ,
    -4.9401326 , 23.465864 , 5.1685796 , -9.018578 ,
    9.037825 , -4.4150195 , 6.859591 , -12.274467 ,
    -0.88911164, 5.186309 , -3.9988663 , -13.638606 ,
    -9.925445 , -0.06329413, -3.6709652 , -12.397416 ,
    -12.719869 , -1.395601 , 2.1150916 , 5.7381287 ,
    -4.4691963 , -3.82819 , -0.84233856, -1.1604277 ,
    -13.490127 , 8.731719 , -20.778936 , -11.495662 ,
    5.8033476 , -4.752041 , 10.833007 , -6.717991 ,
    4.504732 , 13.4244375 , 1.1306485 , 7.3435574 ,
    1.400918 , 14.704036 , -9.501399 , 7.2315617 ,
    -6.417456 , 1.3333273 , 11.872697 , -0.30664724,
    8.8845 , 6.5569253 , 4.7948146 , 0.03662816,
    -8.704245 , 6.224871 , -3.2701402 , -11.508579 ],
    dtype=float32)}
```
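
An embedding on its own does not verify anything; verification comes from comparing two embeddings. A common choice is cosine similarity, which is also the scoring named in the VoxCeleb results included in this commit. The following is a minimal sketch in plain Python. `cosine_similarity` and `same_speaker` are illustrative helpers rather than PaddleSpeech functions, and the `0.5` threshold is a placeholder assumption that would need tuning on held-out data.

```python
import math


def cosine_similarity(emb_a, emb_b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(float(a) * float(b) for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(float(a) ** 2 for a in emb_a))
    norm_b = math.sqrt(sum(float(b) ** 2 for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the pair as the same speaker if the score reaches the threshold.

    The default threshold is a hypothetical placeholder, not a tuned value.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```

Assuming the result has the dictionary form shown above, two recordings could be scored with `cosine_similarity(result_a['embedding'], result_b['embedding'])`, where `result_a` and `result_b` are the outputs for the two files.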

### 4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from the command line and the Python API:

| Model | Sample Rate |
| :--- | :---: |
| ecapatdnn_voxceleb12 | 16k |
@ -0,0 +1,6 @@
#!/bin/bash

wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav

# vector: extract the speaker embedding
paddlespeech vector --task spk --input ./85236145389.wav
@ -0,0 +1,7 @@
# VoxCeleb

## ECAPA-TDNN

| Model | Number of Params | Release | Config | dim | Test set | Cosine | Cosine + S-Norm |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ECAPA-TDNN | 85M | 0.1.1 | conf/ecapa_tdnn.yaml | 192 | test | 1.15 | 1.06 |
@ -0,0 +1,13 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.