This VAD library processes audio in real time using a [Gaussian Mixture Model](http://en.wikipedia.org/wiki/Mixture_model#Gaussian_mixture_model) (GMM), which helps identify the presence of human speech in an audio sample that contains a mixture of speech and noise. The VAD works offline, and all processing is done on the device. The library is based on [WebRTC VAD](https://chromium.googlesource.com/external/webrtc/+/branch-heads/43/webrtc/common_audio/vad/) from Google, which is reportedly one of the best available: it's fast, modern and free. This algorithm has found wide adoption and has recently become one of the gold standards for delay-sensitive scenarios like web-based interaction. If you are looking for higher accuracy and faster processing time, I recommend using Deep Neural Networks (DNN). For reference, see the following paper with a [DNN vs GMM](https://www.microsoft.com/en-us/research/uploads/prod/2018/02/KoPhiliposeTashevZarar_ICASSP_2018.pdf) comparison.

## Parameters

The VAD library accepts only a 16-bit mono PCM audio stream and supports the following sample rates, frame sizes and classifiers.

| Valid Sample Rate | Valid Frame Size |
|:------------------|:-----------------|
| 8000Hz            | 80, 160, 240     |
| 16000Hz           | 160, 320, 480    |
| 32000Hz           | 320, 640, 960    |
| 48000Hz           | 480, 960, 1440   |

| Valid Classifiers |
|:------------------|
| NORMAL            |
| LOW_BITRATE       |
| AGGRESSIVE        |
| VERY_AGGRESSIVE   |
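
The valid combinations above can be checked before any audio reaches the detector. The following is a minimal sketch in Python, not part of this library's API: the names `VALID_FRAME_SIZES`, `frame_duration_ms` and `split_into_frames` are hypothetical, and it assumes frame sizes are counted in samples (which makes each valid frame 10, 20 or 30 ms long).

```python
# Valid sample rate -> frame sizes, copied from the table above.
# Assumption: frame sizes are expressed in samples, not bytes.
VALID_FRAME_SIZES = {
    8000: (80, 160, 240),
    16000: (160, 320, 480),
    32000: (320, 640, 960),
    48000: (480, 960, 1440),
}

BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def frame_duration_ms(sample_rate: int, frame_size: int) -> int:
    """Return the frame duration in ms, or raise if the pair is not in the table."""
    if frame_size not in VALID_FRAME_SIZES.get(sample_rate, ()):
        raise ValueError(
            f"unsupported combination: {sample_rate} Hz with frame size {frame_size}"
        )
    return frame_size * 1000 // sample_rate


def split_into_frames(pcm: bytes, sample_rate: int, frame_size: int) -> list:
    """Split raw 16-bit mono PCM into whole frames, dropping any trailing partial frame."""
    frame_duration_ms(sample_rate, frame_size)  # validates the combination
    frame_bytes = frame_size * BYTES_PER_SAMPLE
    usable = len(pcm) - len(pcm) % frame_bytes
    return [pcm[i:i + frame_bytes] for i in range(0, usable, frame_bytes)]
```

For example, `split_into_frames(pcm, 16000, 320)` would yield 640-byte chunks of 20 ms each, which could then be passed to the detector one frame at a time.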