Re-design README.md doc structure and add table of contents.

pull/2/head
Xinghai Sun 7 years ago
parent d776ce9bd7
commit 861b946d7a

@ -1,18 +1,39 @@
# DeepSpeech2 on PaddlePaddle
*DeepSpeech2 on PaddlePaddle* is an open-source implementation of an end-to-end Automatic Speech Recognition (ASR) engine, based on [Baidu's Deep Speech 2 paper](http://proceedings.mlr.press/v48/amodei16.pdf) and built on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform. Our vision is to empower both industrial applications and academic research on speech-to-text through an easy-to-use, efficient and scalable integrated implementation, including training and inference modules, distributed [PaddleCloud](https://github.com/PaddlePaddle/cloud) training, and demo deployment. In addition, several pre-trained models for both English and Mandarin speech are released.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Getting Started](#getting-started)
- [Data Preparation](#data-preparation)
- [Training a Model](#training-a-model)
- [Inference and Evaluation](#inference-and-evaluation)
- [Distributed Cloud Training](#distributed-cloud-training)
- [Hyper-parameters Tuning](#hyper-parameters-tuning)
- [Trying Live Demo with Your Own Voice](#trying-live-demo-with-your-own-voice)
- [Experiments and Benchmarks](#experiments-and-benchmarks)
- [Questions and Help](#questions-and-help)
## Prerequisites
- Python 2.7 only
- The latest version of PaddlePaddle (please refer to the [Installation Guide](https://github.com/PaddlePaddle/Paddle#installation))
## Installation
Please install the [prerequisites](#prerequisites) above before proceeding.
```
git clone https://github.com/PaddlePaddle/models.git
cd models/deep_speech_2
sh setup.sh
```
## Getting Started
TODO
## Data Preparation
```
cd datasets
@ -31,7 +52,7 @@ More help for arguments:
python datasets/librispeech/librispeech.py --help
```
### Preparing for Training
```
python tools/compute_mean_std.py
@ -51,7 +72,7 @@ More help for arguments:
python tools/compute_mean_std.py --help
```
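Judging by its name, `tools/compute_mean_std.py` computes statistics that are later used to normalize input features. The sketch below is a rough illustration only. It computes a per-dimension mean and standard deviation over a set of feature frames and uses them to standardize a frame. The function names and the list-of-frames input layout are hypothetical and are not the script's actual interface. It uses only the standard library and runs on both Python 2.7 and 3.

```python
import math


def compute_mean_std(utterances):
    """Per-dimension mean and standard deviation over all feature frames.

    Hypothetical layout: `utterances` is a list of utterances, each a list
    of frames, each frame a list of `dim` floats (e.g. spectrogram bins).
    """
    count = 0
    sums, sq_sums = None, None
    for utterance in utterances:
        for frame in utterance:
            if sums is None:
                sums = [0.0] * len(frame)
                sq_sums = [0.0] * len(frame)
            for i, value in enumerate(frame):
                sums[i] += value
                sq_sums[i] += value * value
            count += 1
    if count == 0:
        raise ValueError("no feature frames to average")
    mean = [s / count for s in sums]
    # Var[x] = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
    std = [math.sqrt(max(sq / count - m * m, 0.0))
           for sq, m in zip(sq_sums, mean)]
    return mean, std


def normalize(frame, mean, std, eps=1e-8):
    """Standardize one frame as (x - mean) / std; eps avoids division by zero."""
    return [(x - m) / (s + eps) for x, m, s in zip(frame, mean, std)]
```

The one-pass sum-of-squares formula keeps the sketch short. For very large or badly scaled datasets, Welford's online algorithm is numerically safer.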
## Training a Model
For GPU training:
@ -71,7 +92,7 @@ More help for arguments:
```
python train.py --help
```
## Inference and Evaluation
The following steps (inference, hyper-parameter tuning and evaluation) require a language model for decoding.
A compressed language model is provided and can be accessed by
@ -82,7 +103,7 @@ sh run.sh
```
cd ..
```
### Inference
For GPU inference:
@ -102,7 +123,6 @@ More help for arguments:
```
python infer.py --help
```
### Evaluating
```
CUDA_VISIBLE_DEVICES=0 python evaluate.py
@ -114,7 +134,7 @@ More help for arguments:
python evaluate.py --help
```
## Hyper-parameters Tuning
The parameters $\alpha$ and $\beta$ of the CTC [prefix beam search](https://arxiv.org/abs/1408.2873) decoder usually need to be re-tuned after the acoustic model is retrained.
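The Deep Speech 2 paper describes a beam search that looks for the transcription $y$ maximizing a combined score. Assuming this decoder follows that formulation, $\alpha$ and $\beta$ enter as:

$$
Q(y) = \log p_{\mathrm{ctc}}(y \mid x) + \alpha \log p_{\mathrm{lm}}(y) + \beta \, \mathrm{wc}(y)
$$

Here $x$ is the input audio, $p_{\mathrm{lm}}$ is the language model, and $\mathrm{wc}(y)$ is the number of words in $y$. In that formulation, $\alpha$ sets the weight of the language model relative to the CTC network, and $\beta$ rewards longer transcriptions. The right balance depends on the scale of the acoustic model's scores, which is why both weights are typically revisited after retraining.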
@ -138,7 +158,12 @@ python tune.py --help
Then update the decoder parameters with the tuning results before running inference or evaluation.
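The tuning step can be pictured as a search over $(\alpha, \beta)$ for the pair with the lowest error on a held-out set. The sketch below is an illustration under simplifying assumptions and does not describe how `tune.py` is implemented. It makes three assumptions:

- The decoder scores a hypothesis $y$ as $\log p_{ctc}(y|x) + \alpha \log p_{lm}(y) + \beta \cdot \mathrm{wc}(y)$, the form given in the Deep Speech 2 paper.
- Word error rate (WER) is the metric.
- Fixed, precomputed n-best lists are rescored instead of rerunning the beam search for every pair.

The dict keys `text`, `log_p_ctc` and `log_p_lm` are hypothetical.

```python
import itertools


def score(log_p_ctc, log_p_lm, word_count, alpha, beta):
    # Assumed combined score: log p_ctc + alpha * log p_lm + beta * word count.
    return log_p_ctc + alpha * log_p_lm + beta * word_count


def word_errors(ref, hyp):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev[len(h)]


def corpus_wer(nbest_lists, references, alpha, beta):
    """Pick the best-scoring hypothesis per utterance and return the corpus WER."""
    errors, total = 0, 0
    for nbest, ref in zip(nbest_lists, references):
        best = max(nbest, key=lambda c: score(
            c["log_p_ctc"], c["log_p_lm"], len(c["text"].split()), alpha, beta))
        errors += word_errors(ref, best["text"])
        total += len(ref.split())
    if total == 0:
        raise ValueError("references contain no words")
    return float(errors) / total


def grid_search(nbest_lists, references, alphas, betas):
    """Return (wer, alpha, beta) for the lowest-WER pair; ties keep the first found."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        wer = corpus_wer(nbest_lists, references, alpha, beta)
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best
```

A real prefix beam search applies $\alpha$ and $\beta$ during the search itself, so pruning can change which hypotheses survive. Rescoring a fixed n-best list only approximates that behaviour, but it makes each grid point cheap to evaluate.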
## Distributed Cloud Training
If you wish to train DeepSpeech2 on PaddleCloud, please refer to
[Train DeepSpeech2 on PaddleCloud](https://github.com/PaddlePaddle/models/tree/develop/deep_speech_2/cloud).
## Trying Live Demo with Your Own Voice
A real-time ASR demo is provided so that users can try out the ASR model with their own voice. Please complete the following installation on the machine that will run the demo client (it is not needed on the machine running the demo server).
@ -163,8 +188,6 @@ On the client console, press and hold the "white-space" key on the keyboard to s
It is possible to run the server and the client on two separate machines. Typically, `demo_client.py` is started on a machine with microphone hardware, while `demo_server.py` is started on a remote server with powerful GPUs. Please first make sure that the two machines have network access to each other. Then, in both `demo_server.py` and `demo_client.py`, use `--host_ip` and `--host_port` to specify the server machine's actual IP address (instead of the default `localhost`) and TCP port.
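One quick way to confirm network access from the client machine is to try opening a TCP connection to the server's address and port. The helper below is a hypothetical sketch and is not part of the repository. It uses only the standard `socket` module and works on Python 2.7 and 3.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened in time."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        # Covers connection refused, unreachable host and timeouts.
        return False
    conn.close()
    return True
```

For example, run `port_is_reachable(SERVER_IP, SERVER_PORT)` on the client machine, substituting the values you plan to pass as `--host_ip` and `--host_port` (both names are placeholders). It should return `True` once `demo_server.py` is listening. A `False` result usually points to a firewall, a wrong address, or a server that has not started yet.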
## Experiments and Benchmarks
## Questions and Help
