From ad2caf2ccba6e18fee2a360d6449b3816ed8e71a Mon Sep 17 00:00:00 2001 From: xiongxinlei Date: Fri, 25 Mar 2022 18:31:48 +0800 Subject: [PATCH] add speaker verification demo and doc, test=doc --- README.md | 29 +++++ README_cn.md | 29 +++++ demos/speaker_verification/README.md | 158 ++++++++++++++++++++++++ demos/speaker_verification/README_cn.md | 156 +++++++++++++++++++++++ demos/speaker_verification/run.sh | 6 + docs/source/released_model.md | 8 +- examples/voxceleb/sv0/RESULT.md | 8 ++ paddlespeech/cli/README.md | 6 + paddlespeech/cli/README_cn.md | 6 + 9 files changed, 405 insertions(+), 1 deletion(-) create mode 100644 demos/speaker_verification/README.md create mode 100644 demos/speaker_verification/README_cn.md create mode 100644 demos/speaker_verification/run.sh create mode 100644 examples/voxceleb/sv0/RESULT.md diff --git a/README.md b/README.md index ceef15af..cb2b1227 100644 --- a/README.md +++ b/README.md @@ -203,6 +203,11 @@ Developers can have a try of our models with [PaddleSpeech Command Line](./paddl paddlespeech cls --input input.wav ``` +**Speaker Verification** +``` +paddlespeech vector --task spk --input input_16k.wav +``` + **Automatic Speech Recognition** ```shell paddlespeech asr --lang zh --input input_16k.wav @@ -458,6 +463,29 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r +**Speaker Verification** + + + + + + + + + + + + + + + + + + +
| Task | Dataset | Model Type | Link |
+| :--- | :--- | :--- | :--- |
+| Speaker Verification | VoxCeleb12 | ECAPA-TDNN | ecapa-tdnn-voxceleb12 |
+ **Punctuation Restoration** @@ -499,6 +527,7 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht - [Chinese Rule Based Text Frontend](./docs/source/tts/zh_text_frontend.md) - [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html) - [Audio Classification](./demos/audio_tagging/README.md) + - [Speaker Verification](./demos/speaker_verification/README.md) - [Speech Translation](./demos/speech_translation/README.md) - [Released Models](./docs/source/released_model.md) - [Community](#Community) diff --git a/README_cn.md b/README_cn.md index 8ea91e98..4d88ab8b 100644 --- a/README_cn.md +++ b/README_cn.md @@ -202,6 +202,10 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme ```shell paddlespeech cls --input input.wav ``` +**声纹识别** +```shell +paddlespeech vector --task spk --input input_16k.wav +``` **语音识别** ```shell paddlespeech asr --lang zh --input input_16k.wav @@ -453,6 +457,30 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
+ +**声纹识别** + + + + + + + + + + + + + + + + + + +
| Task | Dataset | Model Type | Link |
+| :--- | :--- | :--- | :--- |
+| Speaker Verification | VoxCeleb12 | ECAPA-TDNN | ecapa-tdnn-voxceleb12 |
+ **标点恢复** @@ -499,6 +527,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 - [中文文本前端](./docs/source/tts/zh_text_frontend.md) - [测试语音样本](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html) - [声音分类](./demos/audio_tagging/README_cn.md) + - [声纹识别](./demos/speaker_verification/README_cn.md) - [语音翻译](./demos/speech_translation/README_cn.md) - [模型列表](#模型列表) - [语音识别](#语音识别模型) diff --git a/demos/speaker_verification/README.md b/demos/speaker_verification/README.md new file mode 100644 index 00000000..b1dfbc7c --- /dev/null +++ b/demos/speaker_verification/README.md @@ -0,0 +1,158 @@ +([简体中文](./README_cn.md)|English) +# Speaker Verification + +## Introduction + +Speaker verification, as used in this demo, refers to the problem of extracting a speaker embedding from an audio file. + +This demo extracts a speaker embedding from a given audio file. It can be done with a single command or a few lines of Python using `PaddleSpeech`. + +## Usage +### 1. Installation +See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). + +You can choose one of the easy, medium and hard ways to install paddlespeech. + +### 2. Prepare Input File +The input of this demo should be a WAV file (`.wav`), and its sample rate must match the sample rate of the model. + +A sample file for this demo can be downloaded: +```bash +wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav +``` + +### 3. Usage +- Command Line (Recommended) + ```bash + paddlespeech vector --task spk --input 85236145389.wav + + echo -e "demo1 85236145389.wav" > vec.job + paddlespeech vector --task spk --input vec.job + + echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk + ``` + + Usage: + ```bash + paddlespeech vector --help + ``` + Arguments: + - `input` (required): Audio file to extract the speaker embedding from. + - `model`: Model type of vector task. Default: `ecapatdnn_voxceleb12`. + - `sample_rate`: Sample rate of the model. Default: `16000`. 
+ - `config`: Config of vector task. Use pretrained model when it is None. Default: `None`. + - `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`. + - `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment. + + Output: + +```bash + demo [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268 + -3.04878 1.611095 10.127234 -10.534177 -15.821609 + 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228 + -11.343508 2.3385992 -8.719341 14.213509 15.404744 + -0.39327756 6.338786 2.688887 8.7104025 17.469526 + -8.77959 7.0576906 4.648855 -1.3089896 -23.294737 + 8.013747 13.891729 -9.926753 5.655307 -5.9422326 + -22.842539 0.6293588 -18.46266 -10.811862 9.8192625 + 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942 + 1.7594414 -0.6485091 4.485623 2.0207152 7.264915 + -6.40137 23.63524 2.9711294 -22.708025 9.93719 + 20.354511 -10.324688 -0.700492 -8.783211 -5.27593 + 15.999649 3.3004563 12.747926 15.429879 4.7849145 + 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628 + 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124 + -9.224193 14.568347 -10.568833 4.982321 -4.342062 + 0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362 + -6.680575 0.4757669 -5.035051 -6.7964664 16.865469 + -11.54324 7.681869 0.44475392 9.708182 -8.932846 + 0.4123232 -4.361452 1.3948607 9.511665 0.11667654 + 2.9079323 6.049952 9.275183 -18.078873 6.2983274 + -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815 + 4.010979 11.000591 -2.8873312 7.1352735 -16.79663 + 18.495346 -14.293832 7.89578 2.2714825 22.976387 + -4.875734 -3.0836344 -2.9999814 13.751918 6.448228 + -11.924197 2.171869 2.0423572 -6.173772 10.778437 + 25.77281 -4.9495463 14.57806 0.3044315 2.6132357 + -7.591999 -2.076944 9.025118 1.7834753 -3.1799617 + -4.9401326 23.465864 5.1685796 -9.018578 9.037825 + -4.4150195 6.859591 -12.274467 -0.88911164 5.186309 + -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652 + -12.397416 -12.719869 -1.395601 
2.1150916 5.7381287 + -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127 + 8.731719 -20.778936 -11.495662 5.8033476 -4.752041 + 10.833007 -6.717991 4.504732 13.4244375 1.1306485 + 7.3435574 1.400918 14.704036 -9.501399 7.2315617 + -6.417456 1.3333273 11.872697 -0.30664724 8.8845 + 6.5569253 4.7948146 0.03662816 -8.704245 6.224871 + -3.2701402 -11.508579 ] + ``` + +- Python API + ```python + import paddle + from paddlespeech.cli import VectorExecutor + + vector_executor = VectorExecutor() + audio_emb = vector_executor( + model='ecapatdnn_voxceleb12', + sample_rate=16000, + config=None, + ckpt_path=None, + audio_file='./85236145389.wav', + force_yes=False, + device=paddle.get_device()) + print('Audio embedding Result: \n{}'.format(audio_emb)) + ``` + + Output: + ```bash + # Vector Result: + [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268 + -3.04878 1.611095 10.127234 -10.534177 -15.821609 + 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228 + -11.343508 2.3385992 -8.719341 14.213509 15.404744 + -0.39327756 6.338786 2.688887 8.7104025 17.469526 + -8.77959 7.0576906 4.648855 -1.3089896 -23.294737 + 8.013747 13.891729 -9.926753 5.655307 -5.9422326 + -22.842539 0.6293588 -18.46266 -10.811862 9.8192625 + 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942 + 1.7594414 -0.6485091 4.485623 2.0207152 7.264915 + -6.40137 23.63524 2.9711294 -22.708025 9.93719 + 20.354511 -10.324688 -0.700492 -8.783211 -5.27593 + 15.999649 3.3004563 12.747926 15.429879 4.7849145 + 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628 + 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124 + -9.224193 14.568347 -10.568833 4.982321 -4.342062 + 0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362 + -6.680575 0.4757669 -5.035051 -6.7964664 16.865469 + -11.54324 7.681869 0.44475392 9.708182 -8.932846 + 0.4123232 -4.361452 1.3948607 9.511665 0.11667654 + 2.9079323 6.049952 9.275183 -18.078873 6.2983274 + -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815 + 4.010979 11.000591 -2.8873312 7.1352735 
-16.79663 + 18.495346 -14.293832 7.89578 2.2714825 22.976387 + -4.875734 -3.0836344 -2.9999814 13.751918 6.448228 + -11.924197 2.171869 2.0423572 -6.173772 10.778437 + 25.77281 -4.9495463 14.57806 0.3044315 2.6132357 + -7.591999 -2.076944 9.025118 1.7834753 -3.1799617 + -4.9401326 23.465864 5.1685796 -9.018578 9.037825 + -4.4150195 6.859591 -12.274467 -0.88911164 5.186309 + -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652 + -12.397416 -12.719869 -1.395601 2.1150916 5.7381287 + -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127 + 8.731719 -20.778936 -11.495662 5.8033476 -4.752041 + 10.833007 -6.717991 4.504732 13.4244375 1.1306485 + 7.3435574 1.400918 14.704036 -9.501399 7.2315617 + -6.417456 1.3333273 11.872697 -0.30664724 8.8845 + 6.5569253 4.7948146 0.03662816 -8.704245 6.224871 + -3.2701402 -11.508579 ] + ``` + +### 4.Pretrained Models + +Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API: + +| Model | Sample Rate +| :--- | :---: | +| ecapatdnn_voxceleb12 | 16k diff --git a/demos/speaker_verification/README_cn.md b/demos/speaker_verification/README_cn.md new file mode 100644 index 00000000..dd7a39fb --- /dev/null +++ b/demos/speaker_verification/README_cn.md @@ -0,0 +1,156 @@ +(简体中文|[English](./README.md)) + +# 声纹识别 +## 介绍 +声纹识别是一项用计算机程序自动提取说话人特征的技术。 + +这个 demo 是一个从给定音频文件提取说话人特征,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。 + +## 使用方法 +### 1. 安装 +请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。 + +你可以从 easy,medium,hard 三中方式中选择一种方式安装。 + +### 2. 准备输入 +这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。 + +可以下载此 demo 的示例音频: +```bash +# 该音频的内容是数字串 85236145389 +wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav +``` +### 3. 
使用方法 +- 命令行 (推荐使用) + ```bash + paddlespeech vector --task spk --input 85236145389.wav + + echo -e "demo1 85236145389.wav" > vec.job + paddlespeech vector --task spk --input vec.job + + echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk + ``` + + 使用方法: + ```bash + paddlespeech vector --help + ``` + 参数: + - `input`(必须输入):用于识别的音频文件。 + - `model`:声纹任务的模型,默认值:`ecapatdnn_voxceleb12`。 + - `sample_rate`:音频采样率,默认值:`16000`。 + - `config`:声纹任务的参数文件,若不设置则使用预训练模型中的默认配置,默认值:`None`。 + - `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。 + - `device`:执行预测的设备,默认值:当前系统下 paddlepaddle 的默认 device。 + + 输出: + ```bash + demo [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268 + -3.04878 1.611095 10.127234 -10.534177 -15.821609 + 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228 + -11.343508 2.3385992 -8.719341 14.213509 15.404744 + -0.39327756 6.338786 2.688887 8.7104025 17.469526 + -8.77959 7.0576906 4.648855 -1.3089896 -23.294737 + 8.013747 13.891729 -9.926753 5.655307 -5.9422326 + -22.842539 0.6293588 -18.46266 -10.811862 9.8192625 + 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942 + 1.7594414 -0.6485091 4.485623 2.0207152 7.264915 + -6.40137 23.63524 2.9711294 -22.708025 9.93719 + 20.354511 -10.324688 -0.700492 -8.783211 -5.27593 + 15.999649 3.3004563 12.747926 15.429879 4.7849145 + 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628 + 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124 + -9.224193 14.568347 -10.568833 4.982321 -4.342062 + 0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362 + -6.680575 0.4757669 -5.035051 -6.7964664 16.865469 + -11.54324 7.681869 0.44475392 9.708182 -8.932846 + 0.4123232 -4.361452 1.3948607 9.511665 0.11667654 + 2.9079323 6.049952 9.275183 -18.078873 6.2983274 + -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815 + 4.010979 11.000591 -2.8873312 7.1352735 -16.79663 + 18.495346 -14.293832 7.89578 2.2714825 22.976387 + -4.875734 -3.0836344 -2.9999814 13.751918 6.448228 + -11.924197 2.171869 2.0423572 -6.173772 10.778437 
+ 25.77281 -4.9495463 14.57806 0.3044315 2.6132357 + -7.591999 -2.076944 9.025118 1.7834753 -3.1799617 + -4.9401326 23.465864 5.1685796 -9.018578 9.037825 + -4.4150195 6.859591 -12.274467 -0.88911164 5.186309 + -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652 + -12.397416 -12.719869 -1.395601 2.1150916 5.7381287 + -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127 + 8.731719 -20.778936 -11.495662 5.8033476 -4.752041 + 10.833007 -6.717991 4.504732 13.4244375 1.1306485 + 7.3435574 1.400918 14.704036 -9.501399 7.2315617 + -6.417456 1.3333273 11.872697 -0.30664724 8.8845 + 6.5569253 4.7948146 0.03662816 -8.704245 6.224871 + -3.2701402 -11.508579 ] + ``` + +- Python API + ```python + import paddle + from paddlespeech.cli import VectorExecutor + + vector_executor = VectorExecutor() + audio_emb = vector_executor( + model='ecapatdnn_voxceleb12', + sample_rate=16000, + config=None, # Set `config` and `ckpt_path` to None to use pretrained model. + ckpt_path=None, + audio_file='./85236145389.wav', + force_yes=False, + device=paddle.get_device()) + print('Audio embedding Result: \n{}'.format(audio_emb)) + ``` + + 输出: + ```bash + # Vector Result: + [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268 + -3.04878 1.611095 10.127234 -10.534177 -15.821609 + 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228 + -11.343508 2.3385992 -8.719341 14.213509 15.404744 + -0.39327756 6.338786 2.688887 8.7104025 17.469526 + -8.77959 7.0576906 4.648855 -1.3089896 -23.294737 + 8.013747 13.891729 -9.926753 5.655307 -5.9422326 + -22.842539 0.6293588 -18.46266 -10.811862 9.8192625 + 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942 + 1.7594414 -0.6485091 4.485623 2.0207152 7.264915 + -6.40137 23.63524 2.9711294 -22.708025 9.93719 + 20.354511 -10.324688 -0.700492 -8.783211 -5.27593 + 15.999649 3.3004563 12.747926 15.429879 4.7849145 + 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628 + 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124 + -9.224193 14.568347 -10.568833 4.982321 -4.342062 + 
0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362 + -6.680575 0.4757669 -5.035051 -6.7964664 16.865469 + -11.54324 7.681869 0.44475392 9.708182 -8.932846 + 0.4123232 -4.361452 1.3948607 9.511665 0.11667654 + 2.9079323 6.049952 9.275183 -18.078873 6.2983274 + -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815 + 4.010979 11.000591 -2.8873312 7.1352735 -16.79663 + 18.495346 -14.293832 7.89578 2.2714825 22.976387 + -4.875734 -3.0836344 -2.9999814 13.751918 6.448228 + -11.924197 2.171869 2.0423572 -6.173772 10.778437 + 25.77281 -4.9495463 14.57806 0.3044315 2.6132357 + -7.591999 -2.076944 9.025118 1.7834753 -3.1799617 + -4.9401326 23.465864 5.1685796 -9.018578 9.037825 + -4.4150195 6.859591 -12.274467 -0.88911164 5.186309 + -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652 + -12.397416 -12.719869 -1.395601 2.1150916 5.7381287 + -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127 + 8.731719 -20.778936 -11.495662 5.8033476 -4.752041 + 10.833007 -6.717991 4.504732 13.4244375 1.1306485 + 7.3435574 1.400918 14.704036 -9.501399 7.2315617 + -6.417456 1.3333273 11.872697 -0.30664724 8.8845 + 6.5569253 4.7948146 0.03662816 -8.704245 6.224871 + -3.2701402 -11.508579 ] + ``` + +### 4.预训练模型 +以下是 PaddleSpeech 提供的可以被命令行和 python API 使用的预训练模型列表: + +| 模型 | 采样率 +| :--- | :---: | +| ecapatdnn_voxceleb12 | 16k + diff --git a/demos/speaker_verification/run.sh b/demos/speaker_verification/run.sh new file mode 100644 index 00000000..856886d3 --- /dev/null +++ b/demos/speaker_verification/run.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav + +# asr +paddlespeech vector --task spk --input ./85236145389.wav \ No newline at end of file diff --git a/docs/source/released_model.md b/docs/source/released_model.md index c5c65c82..354ecf30 100644 --- a/docs/source/released_model.md +++ b/docs/source/released_model.md @@ -75,10 +75,16 @@ Model Type | Dataset| Example Link | Pretrained Models | Static Models PANN | Audioset| 
[audioset_tagging_cnn](https://github.com/qiuqiangkong/audioset_tagging_cnn) | [panns_cnn6.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn6.pdparams), [panns_cnn10.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn10.pdparams), [panns_cnn14.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn14.pdparams) | [panns_cnn6_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn6_static.tar.gz)(18M), [panns_cnn10_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn10_static.tar.gz)(19M), [panns_cnn14_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn14_static.tar.gz)(289M) PANN | ESC-50 |[pann-esc50](../../examples/esc50/cls0)|[esc50_cnn6.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn6.tar.gz), [esc50_cnn10.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn10.tar.gz), [esc50_cnn14.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn14.tar.gz) +## Speaker Verification Models + +Model Type | Dataset| Example Link | Pretrained Models | Static Models +:-------------:| :------------:| :-----: | :-----: | :-----: +ECAPA-TDNN | VoxCeleb| [voxceleb_ecapatdnn](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0) | [ecapatdnn.tar.gz](https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz) | - + ## Punctuation Restoration Models Model Type | Dataset| Example Link | Pretrained Models :-------------:| :------------:| :-----: | :-----: -Ernie Linear | IWLST2012_zh |[iwslt2012_punc0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/iwslt2012/punc0)|[ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip) +Ernie Linear | IWLST2012_zh 
|[iwslt2012_punc0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/iwslt2012/punc0)|[ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip) ## Speech Recognition Model from paddle 1.8 diff --git a/examples/voxceleb/sv0/RESULT.md b/examples/voxceleb/sv0/RESULT.md new file mode 100644 index 00000000..3df2af51 --- /dev/null +++ b/examples/voxceleb/sv0/RESULT.md @@ -0,0 +1,8 @@ +# VoxCeleb + +## ECAPA-TDNN + +| Model | Number of Params | Release | Config | Test set | Cosine | Cosine + S-Norm | +| --- | --- | --- | --- | --- | --- | --- | +| ECAPA-TDNN | 85MM | 0.1.1 | conf/model.yaml | test | 1.15 | 1.06 | + diff --git a/paddlespeech/cli/README.md b/paddlespeech/cli/README.md index 5ac7a3bc..19c82204 100644 --- a/paddlespeech/cli/README.md +++ b/paddlespeech/cli/README.md @@ -13,6 +13,12 @@ paddlespeech cls --input input.wav ``` + ## Speaker Verification + + ```bash + paddlespeech vector --task spk --input input_16k.wav + ``` + ## Automatic Speech Recognition ``` paddlespeech asr --lang zh --input input_16k.wav diff --git a/paddlespeech/cli/README_cn.md b/paddlespeech/cli/README_cn.md index 75ab9e41..4b15d6c7 100644 --- a/paddlespeech/cli/README_cn.md +++ b/paddlespeech/cli/README_cn.md @@ -12,6 +12,12 @@ ## 声音分类 ```bash paddlespeech cls --input input.wav + ``` + + ## 声纹识别 + + ```bash + paddlespeech vector --task spk --input input_16k.wav ``` ## 语音识别
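
Reviewer note: the demo added by this patch stops at printing a single embedding, while RESULT.md reports scores under a `Cosine` column. A minimal sketch of how two such embeddings might be compared, assuming plain cosine scoring; the helper names `cosine_similarity` and `same_speaker` and the `0.5` threshold are illustrative assumptions, not part of PaddleSpeech or of this patch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float],
                 b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the pair when the cosine score reaches the threshold.

    The default threshold is a hypothetical placeholder; a usable value
    would have to be tuned on held-out verification trials.
    """
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy vectors stand in for two embeddings returned by the demo.
    enrolled = [1.0, 2.0, 3.0]
    candidate = [2.0, 4.0, 6.0]
    print(same_speaker(enrolled, candidate))
```

In practice the two inputs would be embeddings obtained from two separate runs of the demo on different audio files, under the assumption that each result can be iterated as a flat sequence of floats.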