This VAD library processes audio in real time using a
[Gaussian Mixture Model](http://en.wikipedia.org/wiki/Mixture_model#Gaussian_mixture_model) (GMM),
which helps identify the presence of human speech in an audio sample that contains a mixture of speech
and noise. VAD works offline, and all processing is done on the device.

The library is based on
[WebRTC VAD](https://chromium.googlesource.com/external/webrtc/+/branch-heads/43/webrtc/common_audio/vad/)
from Google, which is reportedly one of the best available: it's fast, modern and free.
This algorithm has found wide adoption and has recently become one of
the gold standards for delay-sensitive scenarios like web-based interaction.

If you are looking for higher accuracy and faster processing time, I recommend using Deep Neural
Networks (DNN). For reference, see the following paper with a
[DNN vs GMM](https://www.microsoft.com/en-us/research/uploads/prod/2018/02/KoPhiliposeTashevZarar_ICASSP_2018.pdf)
comparison.

<p align="center">
<img src="https://raw.githubusercontent.com/gkonovalov/android-vad/master/demo.gif" alt="drawing" height="400"/>
</p>

## Parameters
The VAD library accepts only a 16-bit mono PCM audio stream and works with the following sample rates, frame sizes and classifiers.

<table>
<tr>
<td>

| Valid Sample Rate | Valid Frame Size |
|:------------------|:-----------------|
| 8000Hz            | 80, 160, 240     |
| 16000Hz           | 160, 320, 480    |
| 32000Hz           | 320, 640, 960    |
| 48000Hz           | 480, 960, 1440   |
</td>
<td>

| Valid Classifiers |
|:------------------|
| NORMAL            |
| LOW_BITRATE       |
| AGGRESSIVE        |
| VERY_AGGRESSIVE   |
</td>
</tr>
</table>

**Silence duration (ms)** - used by the Continuous Speech detector.
Its value defines how long the detector must keep returning negative results
before the audio is recognized as silence.

**Voice duration (ms)** - used by the Continuous Speech detector.
Its value defines how long the detector must keep returning positive results
before the audio is recognized as speech.

Recommended parameters:
* Sample Rate - **16 kHz**
* Frame Size - **160**
* Mode - **VERY_AGGRESSIVE**
* Silence Duration - **500 ms**
* Voice Duration - **500 ms**

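Assuming the frame sizes in the table above are counted in samples, they work out to 10, 20 or 30 ms of audio at each sample rate. Here is a minimal sketch of that arithmetic. `FrameGeometry` and its methods are hypothetical helpers for illustration and are not part of the library's API.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the library: checks sample rate / frame size pairs. */
final class FrameGeometry {
    private FrameGeometry() {}

    // Sample rates listed in the table of valid parameters.
    private static final int[] VALID_SAMPLE_RATES_HZ = {8000, 16000, 32000, 48000};
    // Frame durations implied by the table (assumes frame sizes are in samples).
    private static final int[] VALID_DURATIONS_MS = {10, 20, 30};

    /** Returns true if the pair appears in the table of valid sample rates and frame sizes. */
    static boolean isValid(int sampleRateHz, int frameSizeSamples) {
        if (Arrays.stream(VALID_SAMPLE_RATES_HZ).noneMatch(r -> r == sampleRateHz)) {
            return false;
        }
        for (int ms : VALID_DURATIONS_MS) {
            if (sampleRateHz / 1000 * ms == frameSizeSamples) {
                return true;
            }
        }
        return false;
    }

    /** Duration of one frame in milliseconds. */
    static int frameDurationMs(int sampleRateHz, int frameSizeSamples) {
        return frameSizeSamples * 1000 / sampleRateHz;
    }

    /** Size in bytes of one 16-bit mono PCM frame (2 bytes per sample). */
    static int frameSizeBytes(int frameSizeSamples) {
        return frameSizeSamples * 2;
    }

    public static void main(String[] args) {
        System.out.println(isValid(16000, 160));
        System.out.println(frameDurationMs(16000, 160) + " ms");
        System.out.println(frameSizeBytes(160) + " bytes");
    }
}
```

For the recommended configuration, `FrameGeometry.frameDurationMs(16000, 160)` gives 10 ms, so each frame carries 320 bytes of PCM data.
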

## Usage
VAD supports two ways of detecting speech:
1. The Continuous Speech listener is designed to detect long utterances
without returning false positive results when the user pauses between
sentences.
```java
Vad vad = new Vad(VadConfig.newBuilder()
        .setSampleRate(VadConfig.SampleRate.SAMPLE_RATE_16K)
        .setFrameSize(VadConfig.FrameSize.FRAME_SIZE_160)
        .setMode(VadConfig.Mode.VERY_AGGRESSIVE)
        .setSilenceDurationMillis(500)
        .setVoiceDurationMillis(500)
        .build());

vad.start();

// audioFrame is a short[] holding one frame of 16-bit mono PCM samples.
vad.addContinuousSpeechListener(audioFrame, new VadListener() {
    @Override
    public void onSpeechDetected() {
        // speech detected!
    }

    @Override
    public void onNoiseDetected() {
        // noise detected!
    }
});

vad.stop();
```
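One way to picture how the Silence and Voice duration parameters could turn per-frame results into stable speech and noise events is a simple debounce. The sketch below illustrates that general idea under stated assumptions and is not the library's actual implementation. `SpeechDebouncer` is a hypothetical class, and resetting a run on the first opposite frame is a simplification chosen here.

```java
/** Hypothetical sketch, not the library's implementation: debounces per-frame VAD results. */
final class SpeechDebouncer {
    private final int frameDurationMs;
    private final int voiceDurationMs;
    private final int silenceDurationMs;

    private int speechRunMs;   // length of the current run of speech frames
    private int noiseRunMs;    // length of the current run of noise frames
    private boolean speaking;  // last confirmed state, starts as noise

    SpeechDebouncer(int frameDurationMs, int voiceDurationMs, int silenceDurationMs) {
        this.frameDurationMs = frameDurationMs;
        this.voiceDurationMs = voiceDurationMs;
        this.silenceDurationMs = silenceDurationMs;
    }

    /** Feeds one per-frame result and returns the confirmed state (true = speech). */
    boolean update(boolean frameIsSpeech) {
        if (frameIsSpeech) {
            speechRunMs += frameDurationMs;
            noiseRunMs = 0; // assumption: any speech frame restarts the silence timer
            if (speechRunMs >= voiceDurationMs) {
                speaking = true;
            }
        } else {
            noiseRunMs += frameDurationMs;
            speechRunMs = 0; // assumption: any noise frame restarts the voice timer
            if (noiseRunMs >= silenceDurationMs) {
                speaking = false;
            }
        }
        return speaking;
    }

    public static void main(String[] args) {
        // 10 ms frames with the recommended 500 ms voice and silence durations.
        SpeechDebouncer debouncer = new SpeechDebouncer(10, 500, 500);
        boolean state = false;
        for (int i = 0; i < 50; i++) {
            state = debouncer.update(true);
        }
        System.out.println("speech confirmed after 50 speech frames: " + state);
    }
}
```

With 10 ms frames and a 500 ms voice duration, this sketch confirms speech only after 50 speech frames in a row, so a short pause between sentences does not flip the state back to noise.
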

2. The Speech detector is designed to detect speech or noise in small audio
frames and returns a result for every frame. This method will not work for
long utterances.
```java
Vad vad = new Vad(VadConfig.newBuilder()
        .setSampleRate(VadConfig.SampleRate.SAMPLE_RATE_16K)
        .setFrameSize(VadConfig.FrameSize.FRAME_SIZE_160)
        .setMode(VadConfig.Mode.VERY_AGGRESSIVE)
        .build());

vad.start();

// audioFrame is a short[] holding one frame of 16-bit mono PCM samples.
boolean isSpeech = vad.isSpeech(audioFrame);

vad.stop();
```
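Both detectors take one frame at a time as a `short[]`, while recorders and files often deliver raw bytes. Below is a minimal sketch of splitting a byte buffer into frames, assuming little-endian 16-bit mono PCM. `PcmFramer` is a hypothetical helper and not part of the library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the library: turns raw PCM bytes into fixed-size frames. */
final class PcmFramer {
    private PcmFramer() {}

    /**
     * Splits 16-bit mono PCM bytes into frames of frameSize samples each.
     * Assumes little-endian byte order; a trailing partial frame is dropped.
     */
    static List<short[]> toFrames(byte[] pcm, int frameSize) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        List<short[]> frames = new ArrayList<>();
        // Each sample takes 2 bytes, so a full frame needs frameSize * 2 bytes.
        while (buffer.remaining() >= frameSize * 2) {
            short[] frame = new short[frameSize];
            for (int i = 0; i < frameSize; i++) {
                frame[i] = buffer.getShort();
            }
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] oneSecondOfSilence = new byte[16000 * 2]; // 16 kHz, 16-bit mono
        List<short[]> frames = toFrames(oneSecondOfSilence, 160);
        System.out.println(frames.size() + " frames");
    }
}
```

Each returned frame could then be passed to `vad.isSpeech(...)`. With the configuration above, `frameSize` would be 160.
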
## Requirements
Android VAD supports Android 4.1 (Jelly Bean) and later.

## Development

To open the project in Android Studio:

1. Go to the *File* menu or the *Welcome Screen*.
2. Click *Open...*
3. Navigate to VAD's root directory.
4. Select `settings.gradle`.

## Download
[![](https://jitpack.io/v/gkonovalov/android-vad.svg)](https://jitpack.io/#gkonovalov/android-vad)

Gradle is the only supported build configuration, so add the dependency to your project's `build.gradle` file:
1. Add the JitPack repository to your root `build.gradle`, at the end of `repositories`:
```groovy
allprojects {
    repositories {
        maven { url 'https://jitpack.io' }
    }
}
```

2. Add the dependency:
```groovy
dependencies {
    implementation 'com.github.gkonovalov:android-vad:1.0.1'
}
```

You can also download the precompiled AAR library and APK files from GitHub's [releases page](https://github.com/gkonovalov/android-vad/releases).

------------
Georgiy Konovalov 2021 (c) [MIT License](https://opensource.org/licenses/MIT)