PaddleSpeech/README.md

# Deep Speech 2 on PaddlePaddle

## Installation

Please replace `$PADDLE_INSTALL_DIR` with your own paddle installation directory.

```
pip install -r requirements.txt
export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/lib:$LD_LIBRARY_PATH
```

For some machines, we also need to install libsndfile1. Details to be added.

## Usage

### Preparing Data

```
cd data
python librispeech.py
cat manifest.libri.train-* > manifest.libri.train-all
cd ..
```

After running librispeech.py, we have several "manifest" json files named with a prefix `manifest.libri.`. A manifest file summarizes a speech data set, with each line containing the meta data (i.e. audio filepath, transcription text, audio duration) of each audio file within the data set, in json format.

By `cat manifest.libri.train-* > manifest.libri.train-all`, we simply merge the three seperate sample sets of LibriSpeech (train-clean-100, train-clean-360, train-other-500) into one training set. This is a simple way for merging different data sets.

More help for arguments:

```
python librispeech.py --help
```

### Traininig

For GPU Training:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --trainer_count 4 --train_manifest_path ./data/manifest.libri.train-all
```

For CPU Training:

```
python train.py --trainer_count 8 --use_gpu False -- train_manifest_path ./data/manifest.libri.train-all
```

More help for arguments:

```
python train.py --help
```

### Inferencing

```
python infer.py
```

More help for arguments:

```
python infer.py --help
```
Add librispeech dataset, audio data provider and simplfied DeepSpeech2 model configuration. Bug exists when run training. 7 years ago			`# Deep Speech 2 on PaddlePaddle`

Update DS2 README.md and fix bug in librispeech.py 7 years ago			`## Installation`
1. Fix incorrect decoder result printing. 2. Fix incorrect batch-norm usage in RNN. 3. Fix overlapping train/dev/test manfests. 4. Update README.md and requirements.txt. 5. Expose more arguments to users in argparser. 6. Update all other details. 7 years ago
Update DS2 README.md and fix bug in librispeech.py 7 years ago			Please replace `$PADDLE_INSTALL_DIR` with your own paddle installation directory.
1. Fix incorrect decoder result printing. 2. Fix incorrect batch-norm usage in RNN. 3. Fix overlapping train/dev/test manfests. 4. Update README.md and requirements.txt. 5. Expose more arguments to users in argparser. 6. Update all other details. 7 years ago
			```
			`pip install -r requirements.txt`
			`export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/lib:$LD_LIBRARY_PATH`
			```

			`For some machines, we also need to install libsndfile1. Details to be added.`

Update DS2 README.md and fix bug in librispeech.py 7 years ago			`## Usage`

			`### Preparing Data`
1. Fix incorrect decoder result printing. 2. Fix incorrect batch-norm usage in RNN. 3. Fix overlapping train/dev/test manfests. 4. Update README.md and requirements.txt. 5. Expose more arguments to users in argparser. 6. Update all other details. 7 years ago
Add librispeech dataset, audio data provider and simplfied DeepSpeech2 model configuration. Bug exists when run training. 7 years ago			```
Refactor decoder interfaces and add ./data directory. 7 years ago			`cd data`
Add librispeech dataset, audio data provider and simplfied DeepSpeech2 model configuration. Bug exists when run training. 7 years ago			`python librispeech.py`
Refine librispeech.py for DeepSpeech2. Summary: 1. Add manifest line check. 2. Avoid re-unpacking if unpacked data already exists. 3. Add full_download (download all 7 sub-datasets of LibriSpeech). 7 years ago			`cat manifest.libri.train-* > manifest.libri.train-all`
Refactor decoder interfaces and add ./data directory. 7 years ago			`cd ..`
Add librispeech dataset, audio data provider and simplfied DeepSpeech2 model configuration. Bug exists when run training. 7 years ago			```
Add infererence and add SortaGrad for only first pass. 7 years ago
Remove manifest's line number check from librispeech.py and update README.md. 7 years ago			After running librispeech.py, we have several "manifest" json files named with a prefix `manifest.libri.`. A manifest file summarizes a speech data set, with each line containing the meta data (i.e. audio filepath, transcription text, audio duration) of each audio file within the data set, in json format.

			By `cat manifest.libri.train-* > manifest.libri.train-all`, we simply merge the three seperate sample sets of LibriSpeech (train-clean-100, train-clean-360, train-other-500) into one training set. This is a simple way for merging different data sets.

1. Fix incorrect decoder result printing. 2. Fix incorrect batch-norm usage in RNN. 3. Fix overlapping train/dev/test manfests. 4. Update README.md and requirements.txt. 5. Expose more arguments to users in argparser. 6. Update all other details. 7 years ago			`More help for arguments:`

			```
			`python librispeech.py --help`
			```

			`### Traininig`

			`For GPU Training:`

			```
Refine librispeech.py for DeepSpeech2. Summary: 1. Add manifest line check. 2. Avoid re-unpacking if unpacked data already exists. 3. Add full_download (download all 7 sub-datasets of LibriSpeech). 7 years ago			`CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --trainer_count 4 --train_manifest_path ./data/manifest.libri.train-all`
1. Fix incorrect decoder result printing. 2. Fix incorrect batch-norm usage in RNN. 3. Fix overlapping train/dev/test manfests. 4. Update README.md and requirements.txt. 5. Expose more arguments to users in argparser. 6. Update all other details. 7 years ago			```

			`For CPU Training:`

			```
Refine librispeech.py for DeepSpeech2. Summary: 1. Add manifest line check. 2. Avoid re-unpacking if unpacked data already exists. 3. Add full_download (download all 7 sub-datasets of LibriSpeech). 7 years ago			`python train.py --trainer_count 8 --use_gpu False -- train_manifest_path ./data/manifest.libri.train-all`
1. Fix incorrect decoder result printing. 2. Fix incorrect batch-norm usage in RNN. 3. Fix overlapping train/dev/test manfests. 4. Update README.md and requirements.txt. 5. Expose more arguments to users in argparser. 6. Update all other details. 7 years ago			```

			`More help for arguments:`

			```
			`python train.py --help`
			```

			`### Inferencing`

			```
			`python infer.py`
			```

			`More help for arguments:`

			```
			`python infer.py --help`
			```