Speech Verification
Introduction
Speaker verification refers to the problem of extracting a speaker embedding from audio. Two embeddings can then be compared to judge whether they come from the same speaker.
This demo extracts a speaker embedding from a given audio file. It can be run with a single command or a few lines of Python using PaddleSpeech.
Usage
1. Installation
See installation.
You can install PaddleSpeech in one of three ways: easy, medium, or hard.
2. Prepare Input File
The input of this CLI demo should be a WAV file (`.wav`) whose sample rate matches that of the model.
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/123456789.wav
```
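The input requirement above (a `.wav` file whose sample rate matches the model) can be checked before calling PaddleSpeech. The sketch below is a hypothetical helper, not part of PaddleSpeech: it uses Python's standard `wave` module, which only reads uncompressed PCM WAV files, and its `MODEL_SAMPLE_RATES` lookup holds just the one model listed in the pretrained-model table of this README.

```python
import wave

# Hypothetical lookup built from the pretrained-model table in this README;
# only the model listed there is included.
MODEL_SAMPLE_RATES = {"ecapatdnn_voxceleb12": 16000}


def check_wav_input(path: str, model: str = "ecapatdnn_voxceleb12") -> int:
    """Return the sample rate of a .wav file, or raise ValueError if it does
    not match the sample rate the given model expects."""
    if not path.lower().endswith(".wav"):
        raise ValueError(f"{path}: expected a .wav file")
    if model not in MODEL_SAMPLE_RATES:
        raise ValueError(f"unknown model: {model}")
    expected = MODEL_SAMPLE_RATES[model]
    # wave.open raises wave.Error for files that are not PCM WAV.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate != expected:
        raise ValueError(f"{path}: {rate} Hz, but {model} expects {expected} Hz")
    return rate
```

Calling `check_wav_input("85236145389.wav")` on a downloaded sample either returns its sample rate or raises `ValueError` with the mismatch; resampling, if needed, is left to other tools.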
3. Usage
- Command Line (Recommended)
```bash
paddlespeech vector --task spk --input 85236145389.wav

echo -e "demo1 85236145389.wav" > vec.job
paddlespeech vector --task spk --input vec.job

echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk

paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"

echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
paddlespeech vector --task score --input vec.job
```
Usage:
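```bash
paddlespeech vector --help
```
Arguments:
- `input` (required): Input audio file. As in the examples above, it can also be a job file with one entry per line, or, for `--task score`, a quoted pair of audio files.
- `task`: Task of the `vector` command: `spk` extracts a speaker embedding, `score` scores a pair of audio files. Default: `spk`.
- `model`: Model type of the vector task. Default: `ecapatdnn_voxceleb12`.
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `config`: Config of the vector task. If `None`, the pretrained model is used. Default: `None`.
- `ckpt_path`: Model checkpoint. If `None`, the pretrained model is used. Default: `None`.
- `device`: Device on which to run model inference. Default: the default device of PaddlePaddle in the current environment.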
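The job files in the examples above are plain text with whitespace-separated fields: `<id> <wav>` per line for `--task spk`, and `<id> <enroll_wav> <test_wav>` per line for `--task score`. The hypothetical helper below writes such a file from Python. It assumes, from the `echo` examples rather than from PaddleSpeech documentation, that fields cannot contain whitespace and that one file holds rows of a single kind.

```python
from pathlib import Path
from typing import Sequence


def write_job_file(path: str, rows: Sequence[Sequence[str]]) -> str:
    """Write a job file for `paddlespeech vector` and return its text.

    Each row is (utt_id, wav) for `--task spk`, or
    (utt_id, enroll_wav, test_wav) for `--task score`.
    """
    if not rows:
        raise ValueError("no rows to write")
    widths = {len(row) for row in rows}
    # All rows must be the same kind: 2 fields (spk) or 3 fields (score).
    if len(widths) != 1 or not widths <= {2, 3}:
        raise ValueError(f"rows must all have 2 or all have 3 fields, got {sorted(widths)}")
    lines = []
    for row in rows:
        # Fields are whitespace-separated, so a field containing spaces would be split.
        if any(not field or any(ch.isspace() for ch in field) for field in row):
            raise ValueError(f"fields must be non-empty and contain no whitespace: {row!r}")
        lines.append(" ".join(row))
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text
```

For example, `write_job_file("vec.job", [("demo4", "85236145389.wav", "85236145389.wav"), ("demo5", "85236145389.wav", "123456789.wav")])` writes the same two entries as the `demo4`/`demo5` example, after which `paddlespeech vector --task score --input vec.job` can be run.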
Output:
```
demo [ -1.3251206 7.8606825 -4.620626 0.3000721 2.2648535 -1.1931441 3.0647137 7.673595 -6.0044727 -12.02426 -1.9496069 3.1269536 1.618838 -7.6383104 -1.2299773 -12.338331 2.1373026 -5.3957124 9.717328 5.6752305 3.7805123 3.0597172 3.429692 8.97601 13.174125 -0.53132284 8.9424715 4.46511 -4.4262476 -9.726503 8.399328 7.2239175 -7.435854 2.9441683 -4.3430395 -13.886965 -1.6346735 -10.9027405 -5.311245 3.8007221 3.8976038 -2.1230774 -2.3521194 4.151031 -7.4048667 0.13911647 2.4626107 4.9664545 0.9897574 5.4839754 -3.3574002 10.1340065 -0.6120171 -10.403095 4.6007543 16.00935 -7.7836914 -4.1945305 -6.9368606 1.1789556 11.490801 4.2380238 9.550931 8.375046 7.5089145 -0.65707296 -0.30051577 2.8406055 3.0828028 0.730817 6.148354 0.13766119 -13.424735 -7.7461405 -2.3227983 -8.305252 2.9879124 -10.995229 0.15211068 -2.3820348 -1.7984174 8.495629 -5.8522367 -3.755498 0.6989711 -5.2702994 -2.6188622 -1.8828466 -4.64665 14.078544 -0.5495333 10.579158 -3.2160501 9.349004 -4.381078 -11.675817 -2.8630207 4.5721755 2.246612 -4.574342 1.8610188 2.3767874 5.6257877 -9.784078 0.64967257 -1.4579505 0.4263264 -4.9211264 -2.454784 3.4869802 -0.42654222 8.341269 1.356552 7.0966883 -13.102829 8.016734 -7.1159344 1.8699781 0.208721 14.699384 -1.025278 -2.6107233 -2.5082312 8.427193 6.9138527 -6.2912464 0.6157366 2.489688 -3.4668267 9.921763 11.200815 -0.1966403 7.4916005 -0.62312716 -0.25848144 -9.947997 -0.9611041 1.1649219 -2.1907122 -1.5028487 -0.51926106 15.165954 2.4649463 -0.9980445 7.4416637 -2.0768049 3.5896823 -7.3055434 -7.5620847 4.323335 0.0804418 -6.56401 -2.3148053 -1.7642345 -2.4708817 -7.675618 -9.548878 -1.0177554 0.16986446 2.5877135 -1.8752296 -0.36614323 -6.0493784 -2.3965611 -5.9453387 0.9424033 -13.155974 -7.457801 0.14658108 -3.742797 5.8414927 -1.2872906 5.5694313 12.57059 1.0939219 2.2142086 1.9181576 6.9914207 -5.888139 3.1409824 -2.003628 2.4434285 9.973139 5.03668 2.0051203 2.8615603 5.860224 2.9176188 -1.6311141 2.0292206 -4.070415 -6.831437 ]
```
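When the CLI is driven from a script, the printed embedding can be turned back into numbers. The sketch below assumes each result is printed as `<id> [ v1 v2 ... ]`, as in the output above, possibly wrapped over several lines; `parse_embeddings` is a hypothetical helper, and the exact output format may differ between PaddleSpeech versions.

```python
import re
from typing import Dict, List

# Matches "<id> [ <numbers> ]"; the bracketed part may span several lines.
_EMBEDDING_RE = re.compile(r"(\S+)\s*\[([^\]]*)\]")


def parse_embeddings(text: str) -> Dict[str, List[float]]:
    """Map each utterance id in the CLI output to its embedding as floats."""
    result: Dict[str, List[float]] = {}
    for match in _EMBEDDING_RE.finditer(text):
        utt_id, body = match.group(1), match.group(2)
        try:
            values = [float(tok) for tok in body.split()]
        except ValueError:
            # Bracketed text that is not all numbers (e.g. a log prefix) is skipped.
            continue
        if values:
            result[utt_id] = values
    return result
```

Passing it the `stdout` captured with `subprocess.run(..., capture_output=True, text=True)` yields a dictionary keyed by the IDs from the job file (or `demo` for a single file).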
- Python API
```python
import paddle
from paddlespeech.cli.vector import VectorExecutor

vector_executor = VectorExecutor()
audio_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./85236145389.wav',
    device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))

test_emb = vector_executor(
    model='ecapatdnn_voxceleb12',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./123456789.wav',
    device=paddle.get_device())
print('Test embedding Result: \n{}'.format(test_emb))

# score range [0, 1]
score = vector_executor.get_embeddings_score(audio_emb, test_emb)
print(f"Embeddings Score: {score}")
```
Output:
```
# Vector Result:
Audio embedding Result:
[ -1.3251206 7.8606825 -4.620626 0.3000721 2.2648535 -1.1931441 3.0647137 7.673595 -6.0044727 -12.02426 -1.9496069 3.1269536 1.618838 -7.6383104 -1.2299773 -12.338331 2.1373026 -5.3957124 9.717328 5.6752305 3.7805123 3.0597172 3.429692 8.97601 13.174125 -0.53132284 8.9424715 4.46511 -4.4262476 -9.726503 8.399328 7.2239175 -7.435854 2.9441683 -4.3430395 -13.886965 -1.6346735 -10.9027405 -5.311245 3.8007221 3.8976038 -2.1230774 -2.3521194 4.151031 -7.4048667 0.13911647 2.4626107 4.9664545 0.9897574 5.4839754 -3.3574002 10.1340065 -0.6120171 -10.403095 4.6007543 16.00935 -7.7836914 -4.1945305 -6.9368606 1.1789556 11.490801 4.2380238 9.550931 8.375046 7.5089145 -0.65707296 -0.30051577 2.8406055 3.0828028 0.730817 6.148354 0.13766119 -13.424735 -7.7461405 -2.3227983 -8.305252 2.9879124 -10.995229 0.15211068 -2.3820348 -1.7984174 8.495629 -5.8522367 -3.755498 0.6989711 -5.2702994 -2.6188622 -1.8828466 -4.64665 14.078544 -0.5495333 10.579158 -3.2160501 9.349004 -4.381078 -11.675817 -2.8630207 4.5721755 2.246612 -4.574342 1.8610188 2.3767874 5.6257877 -9.784078 0.64967257 -1.4579505 0.4263264 -4.9211264 -2.454784 3.4869802 -0.42654222 8.341269 1.356552 7.0966883 -13.102829 8.016734 -7.1159344 1.8699781 0.208721 14.699384 -1.025278 -2.6107233 -2.5082312 8.427193 6.9138527 -6.2912464 0.6157366 2.489688 -3.4668267 9.921763 11.200815 -0.1966403 7.4916005 -0.62312716 -0.25848144 -9.947997 -0.9611041 1.1649219 -2.1907122 -1.5028487 -0.51926106 15.165954 2.4649463 -0.9980445 7.4416637 -2.0768049 3.5896823 -7.3055434 -7.5620847 4.323335 0.0804418 -6.56401 -2.3148053 -1.7642345 -2.4708817 -7.675618 -9.548878 -1.0177554 0.16986446 2.5877135 -1.8752296 -0.36614323 -6.0493784 -2.3965611 -5.9453387 0.9424033 -13.155974 -7.457801 0.14658108 -3.742797 5.8414927 -1.2872906 5.5694313 12.57059 1.0939219 2.2142086 1.9181576 6.9914207 -5.888139 3.1409824 -2.003628 2.4434285 9.973139 5.03668 2.0051203 2.8615603 5.860224 2.9176188 -1.6311141 2.0292206 -4.070415 -6.831437 ]
# get the test embedding
Test embedding Result:
[ 2.5247195 5.119042 -4.335273 4.4583654 5.047907 3.5059214 1.6159848 0.49364898 -11.6899185 -3.1014526 -5.6589785 -0.42684984 2.674276 -11.937654 6.2248464 -10.776924 -5.694543 1.112041 1.5709964 1.0961034 1.3976512 2.324352 1.339981 5.279319 13.734659 -2.5753925 13.651442 -2.2357535 5.1575427 -3.251567 1.4023279 6.1191974 -6.0845175 -1.3646189 -2.6789894 -15.220778 9.779349 -9.411551 -6.388947 6.8313975 -9.245996 0.31196198 2.5509644 -4.413065 6.1649427 6.793837 2.6328635 8.620976 3.4832475 0.52491665 2.9115407 5.8392377 0.6702376 -3.2726715 2.6694255 16.91701 -5.5811176 0.23362345 -4.5573606 -11.801059 14.728292 -0.5198082 -3.999922 7.0927105 -7.0459595 -5.4389 -0.46420583 -5.1085467 10.376568 -8.889225 -0.37705845 -1.659806 2.6731026 -7.1909504 1.4608804 -2.163136 -0.17949677 4.0241547 0.11319201 0.601279 2.039692 3.1910992 -11.649526 -8.121584 -4.8707457 0.3851982 1.4231744 -2.3321972 0.99332285 14.121717 5.899413 0.7384519 -17.760096 10.555021 4.1366534 -0.3391071 -0.20792882 3.208204 0.8847948 -8.721497 -6.432868 13.006379 4.8956 -9.155822 -1.9441519 5.7815638 -2.066733 10.425042 -0.8802383 -2.4314315 -9.869258 0.35095334 -5.3549943 2.1076174 -8.290468 8.4433365 -4.689333 9.334139 -2.172678 -3.0250976 8.394216 -3.2110903 -7.93868 2.3960824 -2.3213403 -1.4963245 -3.476059 4.132903 -10.893354 4.362673 -0.45456508 10.258634 -1.1655927 -6.7799754 0.22885278 -4.399287 2.333433 -4.84745 -4.2752337 -1.3577863 -1.0685898 9.505196 7.3062205 0.08708266 12.927811 -9.57974 1.3936648 -1.9444873 5.776769 15.251903 10.6118355 -1.4903594 -9.535318 -3.6553776 -1.6699586 -0.5933151 7.600357 -4.8815503 -8.698617 -15.855757 0.25632986 -7.2235737 0.9506656 0.7128582 -9.051738 8.74869 -1.6426028 -6.5762258 2.506905 -6.7431564 5.129912 -12.189555 -3.6435068 12.068113 -6.0059533 -2.3535995 2.9014351 22.3082 -1.5563312 13.193291 2.7583609 -7.468798 1.3407065 -4.599617 -6.2345777 10.7689295 7.137627 5.099476 0.3473359 9.647881 -2.0484571 -5.8549366 ]
# get the score between enroll and test
Embeddings Score: 0.45332613587379456
```
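`get_embeddings_score` compares the enroll and test embeddings. Assuming it computes cosine similarity (an assumption about PaddleSpeech's implementation, not something stated above), the score can be reproduced in plain Python as below. Cosine similarity is bounded by [-1, 1], with values near 1 meaning similar embeddings. `cosine_score`, `same_speaker`, and the `0.5` threshold are hypothetical and not part of PaddleSpeech.

```python
import math
from typing import Sequence


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings, in [-1, 1]."""
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    if norm == 0.0:
        raise ValueError("cannot score an all-zero embedding")
    return dot / norm


def same_speaker(enroll: Sequence[float], test: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the test utterance if its score reaches the threshold.

    The 0.5 default is a placeholder, not a PaddleSpeech value.
    """
    return cosine_score(enroll, test) >= threshold
```

The threshold is what turns a score into a verification decision; it is normally tuned on labelled same-speaker and different-speaker pairs for the target data rather than fixed in advance.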
4. Pretrained Models
Here is a list of pretrained models released by PaddleSpeech that can be used with the command line and the Python API:
| Model | Sample Rate |
|---|---|
| ecapatdnn_voxceleb12 | 16 kHz |