# Silero VAD - pre-trained enterprise-grade Voice Activity Detector This directory provides VAD models on CPU/GPU. ![](https://user-images.githubusercontent.com/36505480/198026365-8da383e0-5398-4a12-b7f8-22c2c0059512.png) ## VAD Interface For vad interface please see [](../../engine/vad/interface/). ### Create Handdle ```c++ PPSHandle_t PPSVadCreateInstance(const char* conf_path); ``` ### Destroy Handdle ```c++ int PPSVadDestroyInstance(PPSHandle_t instance); ``` ### Reset Vad State ```c++ int PPSVadReset(PPSHandle_t instance); ``` Reset Vad state before processing next `wav`. ### Get Chunk Size ```c++ int PPSVadChunkSizeSamples(PPSHandle_t instance); ``` This API will return chunk size in `sample` unit. When do forward, we need feed `chunk size` samples, except last chunk. ### Vad Forward ```c++ PPSVadState_t PPSVadFeedForward(PPSHandle_t instance, float* chunk, int num_element); ``` Vad has below states: ```c++ typedef enum { PPS_VAD_ILLEGAL = 0, // error PPS_VAD_SIL, // silence PPS_VAD_START, // start speech PPS_VAD_SPEECH, // in speech PPS_VAD_END, // end speech PPS_VAD_NUMSTATES, // number of states } PPSVadState_t; ``` If `PPSVadFeedForward` occur an error will return `PPS_VAD_ILLEGAL` state. ## Linux ### Build Runtime ```bash # cd /path/to/paddlespeech/runtime cmake -B build -DBUILD_SHARED_LIBS=OFF -DWITH_ASR=OFF -DWITH_CLS=OFF -DWITH_VAD=ON cmake --build build ``` Since VAD using FastDeploy runtime, if you have another FastDeploy Library, you can using this command to build: ```bash # cd /path/to/paddlespeech/runtime cmake -B build -DBUILD_SHARED_LIBS=OFF -DWITH_ASR=OFF -DWITH_CLS=OFF -DWITH_VAD=ON -DFASTDEPLOY_INSTALL_DIR=/workspace//paddle/FastDeploy/build/Linux/x86_64/install cmake --build build ``` `DFASTDEPLOY_INSTALL_DIR` is the directory of FastDeploy Library. ### Run Demo After building success, we can do this to run demo under this example dir: ```bash bash run.sh ``` The output like these: ```bash /workspace//PaddleSpeech/runtime/engine/vad/nnet/vad.cc(88)::SetConfig sr=16 threshold=0.5 beam=0.15 frame_ms=32 min_silence_duration_ms=200 speech_pad_left_ms=0 speech_pad_right_ms=0[INFO] fastdeploy/runtime/runtime.cc(293)::CreateOrtBackend Runtime initialized with Backend::ORT in Device::CPU./workspace//PaddleSpeech/runtime/engine/vad/nnet/vad.cc(137)::Initialize init done.[SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [STA] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [END] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [STA] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SIL] [SIL] [SIL] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [END] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [STA] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [END] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [STA] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SIL] [SIL] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SPE] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [END] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] [SIL] RTF=0.00774591 speak start: 0.32 s, end: 2.464 s | speak start: 3.296 s, end: 4.64 s | speak start: 5.408 s, end: 7.872 s | speak start: 8.192 s, end: 10.72 s vad_nnet_main done! sr = 16000 frame_ms = 32 threshold = 0.5 beam = 0.15 min_silence_duration_ms = 200 speech_pad_left_ms = 0 speech_pad_right_ms = 0 model_path = ./data/silero_vad/silero_vad.onnx param_path = (default)num_cpu_thread = 1(default)/workspace//PaddleSpeech/runtime/engine/vad/nnet/vad.cc(88)::SetConfig sr=16 threshold=0.5 beam=0.15 frame_ms=32 min_silence_duration_ms=200 speech_pad_left_ms=0 speech_pad_right_ms=0[INFO] fastdeploy/runtime/runtime.cc(293)::CreateOrtBackend Runtime initialized with Backend::ORT in Device::CPU./workspace//PaddleSpeech/runtime/engine/vad/nnet/vad.cc(137)::Initialize init done. 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 RTF=0.00778218 vad_interface_main done! ``` ## Android When to using on Android, please setup your `NDK` enverment before, then do as below: ```bash # cd /path/to/paddlespeech/runtime bash build_android.sh ``` ## Result | Arch | RTF | Runtime Size | |--|--|--| | x86_64 | 0.00778218 | | | arm64-v8a | 0.00744745 | ~10.532MB | ## Machine Information #### x86_64 The environment as below: ```text Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 80 On-line CPU(s) list: 0-79 Thread(s) per core: 2 Core(s) per socket: 20 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz Stepping: 7 CPU MHz: 2599.998 BogoMIPS: 5199.99 Hypervisor vendor: KVM Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 1024K L3 cache: 33792K NUMA node0 CPU(s): 0-39 NUMA node1 CPU(s): 40-79 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat umip pku ospke avx512_vnni spec_ctrl arch_capabilities ``` #### arm64-v8a ```text Processor : AArch64 Processor rev 14 (aarch64) processor : 0 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x805 CPU revision : 14 processor : 1 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x805 CPU revision : 14 processor : 2 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x805 CPU revision : 14 processor : 3 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x805 CPU revision : 14 processor : 4 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x804 CPU revision : 14 processor : 5 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x804 CPU revision : 14 processor : 6 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x804 CPU revision : 14 processor : 7 BogoMIPS : 38.40 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop CPU implementer : 0x51 CPU architecture: 8 CPU variant : 0xd CPU part : 0x804 CPU revision : 14 Hardware : Qualcomm Technologies, Inc SM8150 ``` ## Download Pre-trained ONNX Model For developers' testing, model exported by VAD are provided below. Developers can download them directly. | 模型 | 大小 | 备注 | | :----------------------------------------------------------- | :---- | :----------------------------------------------------------- | | [silero-vad](https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz) | 1.8MB | This model file is sourced from [snakers4/silero-vad](https://github.com/snakers4/silero-vad),MIT License | ## FastDeploy Runtime For FastDeploy software and hardware requements, and pre-released library please to see [FastDeploy](https://github.com/PaddlePaddle/FastDeploy): - 1. [FastDeploy Environment Requirements](https://github.com/PaddlePaddle/FastDeploy/docs/en/build_and_install/download_prebuilt_libraries.md). - 2. [FastDeploy Precompiled Library](https://github.com/PaddlePaddle/FastDeploy/docs/en/build_and_install/download_prebuilt_libraries.md). ## Reference * https://github.com/snakers4/silero-vad * https://github.com/PaddlePaddle/FastDeploy/blob/develop/examples/audio/silero-vad/README.md