# Deep Speech 2 on PaddlePaddle

## Installation

Please replace `$PADDLE_INSTALL_DIR` with your own paddle installation directory.

```
sh setup.sh
export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/lib:$LD_LIBRARY_PATH
```

On some machines, `libsndfile1` must also be installed. Details to be added.

## Usage

### Preparing Data

```
cd datasets
sh run_all.sh
cd ..
```

`sh run_all.sh` prepares all ASR datasets (currently only LibriSpeech is available). After it finishes, several manifest files in JSON format are available.

A manifest file summarizes a speech dataset: each line holds, in JSON format, the metadata (audio file path, transcript text, audio duration) of one audio file in the dataset. The manifest serves as the interface that tells the system where to find the speech samples and what they contain.
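
As a rough illustration of the one-JSON-object-per-line layout described above, the sketch below parses a small in-memory manifest. The key names `audio_filepath`, `text` and `duration`, the sample paths, and the helper `parse_manifest` are assumptions made for this example; check a generated manifest for the keys actually used.

```python
import json

# Hypothetical manifest contents: one JSON object per line.
# The key names below are assumptions, not confirmed by this README.
SAMPLE_MANIFEST = """\
{"audio_filepath": "/data/a.flac", "text": "hello world", "duration": 2.5}
{"audio_filepath": "/data/b.flac", "text": "good morning", "duration": 4.0}
"""


def parse_manifest(lines):
    """Parse manifest lines into a list of dicts, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if line:
            entries.append(json.loads(line))
    return entries


entries = parse_manifest(SAMPLE_MANIFEST.splitlines())
total_duration = sum(entry["duration"] for entry in entries)
print(len(entries), "utterances,", total_duration, "seconds of audio")
```

To read a real manifest, the same helper could be fed an open file object instead of the sample string.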


More help for arguments:

```
python datasets/librispeech/librispeech.py --help
```

### Preparing for Training

```
python compute_mean_std.py
```

`python compute_mean_std.py` computes the mean and standard deviation of the audio features and saves them to a file, `./mean_std.npz` by default. This file is used in both training and inference. The default audio feature is the power spectrum; MFCC features are also supported. To train and infer with MFCC features, regenerate this file with

```
python compute_mean_std.py --specgram_type mfcc
```

and set `specgram_type` to `mfcc` in every later step, including training and inference.
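
The statistics are typically used to normalize each feature dimension before it reaches the model. The sketch below shows that idea in dependency-free Python on made-up numbers. It is not the code in `compute_mean_std.py`: the function names, the use of the population standard deviation, and the `eps` guard are all assumptions for illustration.

```python
import math


def compute_mean_std(frames):
    """Per-dimension mean and population std over a list of feature frames."""
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return mean, std


def normalize(frame, mean, std, eps=1e-20):
    """Shift and scale one frame; eps guards against division by zero."""
    return [(x - m) / (s + eps) for x, m, s in zip(frame, mean, std)]


# Toy "features": 4 frames with 2 dimensions each (made-up numbers).
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 30.0], [7.0, 30.0]]
mean, std = compute_mean_std(frames)
normalized = [normalize(f, mean, std) for f in frames]
print(mean, std)
```

Because the same statistics must be applied at training and inference time, a mismatch in feature type (power spectrum versus MFCC) between the saved file and a later step would feed wrongly scaled inputs to the model.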

More help for arguments:

```
python compute_mean_std.py --help
```

### Training

For GPU Training:

```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py
```

For CPU Training:

```
python train.py --use_gpu False
```

More help for arguments:

```
python train.py --help
```

### Preparing language model

The remaining steps (inference, parameter tuning and evaluation) require a language model during decoding.
A compressed language model is provided and can be obtained by running

```
cd ./lm
sh run.sh
cd ..
```

### Inference

For GPU inference:

```
CUDA_VISIBLE_DEVICES=0 python infer.py
```

For CPU inference:

```
python infer.py --use_gpu=False
```

More help for arguments:

```
python infer.py --help
```

### Evaluating

```
CUDA_VISIBLE_DEVICES=0 python evaluate.py
```

More help for arguments:

```
python evaluate.py --help
```

### Parameter tuning

Usually, the parameters $\alpha$ and $\beta$ for the CTC [prefix beam search](https://arxiv.org/abs/1408.2873) decoder need to be tuned after retraining the acoustic model.
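
One common way to tune two decoder weights is an exhaustive grid search that decodes a held-out set for each $(\alpha, \beta)$ pair and keeps the pair with the lowest error. The sketch below shows that generic technique only; this README does not describe how `tune.py` searches, and `grid_search`, `toy_error` and the grid values are hypothetical stand-ins.

```python
import itertools


def grid_search(error_fn, alphas, betas):
    """Return (best_alpha, best_beta, best_error) over the Cartesian grid."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        error = error_fn(alpha, beta)
        if best is None or error < best[2]:
            best = (alpha, beta, error)
    return best


def toy_error(alpha, beta):
    """Stand-in for decoding a dev set and measuring the error rate.

    Made-up bowl-shaped function with its minimum at alpha=0.5, beta=2.0.
    """
    return (alpha - 0.5) ** 2 + (beta - 2.0) ** 2


alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
betas = [0.0, 1.0, 2.0, 3.0]
best_alpha, best_beta, best_error = grid_search(toy_error, alphas, betas)
print(best_alpha, best_beta, best_error)
```

In practice `toy_error` would be replaced by a function that runs the decoder on development data with the given weights, which makes each grid point expensive, so a coarse grid is a sensible starting point.
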

For GPU tuning:

```
CUDA_VISIBLE_DEVICES=0 python tune.py
```

For CPU tuning:

```
python tune.py --use_gpu=False
```

More help for arguments:

```
python tune.py --help
```

Then set the decoder parameters to the tuned values before running inference or evaluation.