Merge pull request #519 from PaddlePaddle/py3

upgrade to python3
Hui Zhang 5 years ago committed by GitHub
commit b882ba5000

@@ -1,7 +1,7 @@
 language: cpp
 cache: ccache
 sudo: required
-dist: trusty
+dist: xenial
 services:
   - docker
 os:
@@ -13,11 +13,12 @@ addons:
   apt:
     packages:
       - git
-      - python
-      - python-pip
-      - python2.7-dev
+      - python3-pip
+      - python3-dev
 before_install:
+  - python3 --version
+  - python3 -m pip --version
   - sudo pip install -U virtualenv pre-commit pip
   - docker pull paddlepaddle/paddle:latest

@@ -17,7 +17,7 @@ unittest(){
     fi
     find . -name 'tests' -type d -print0 | \
         xargs -0 -I{} -n1 bash -c \
-        'python -m unittest discover -v -s {}'
+        'python3 -m unittest discover -v -s {}'
     cd - > /dev/null
 }
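For reference, the shell helper above discovers and runs every `tests/` directory it finds; the same discovery can be driven from Python directly. A minimal sketch, assuming a `tests` directory exists next to the script (illustrative only, not project code):

```python
import unittest

# Mirror `python3 -m unittest discover -v -s tests` from Python code.
loader = unittest.TestLoader()
suite = loader.discover(start_dir="tests")
unittest.TextTestRunner(verbosity=2).run(suite)
```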

@@ -6,12 +6,12 @@
 ## Table of Contents
 - [Installation](#installation)
+- [Running in Docker Container](#running-in-docker-container)
 - [Getting Started](#getting-started)
 - [Data Preparation](#data-preparation)
 - [Training a Model](#training-a-model)
 - [Data Augmentation Pipeline](#data-augmentation-pipeline)
 - [Inference and Evaluation](#inference-and-evaluation)
-- [Running in Docker Container](#running-in-docker-container)
 - [Hyper-parameters Tuning](#hyper-parameters-tuning)
 - [Training for Mandarin Language](#training-for-mandarin-language)
 - [Trying Live Demo with Your Own Voice](#trying-live-demo-with-your-own-voice)
@@ -26,20 +26,20 @@
 To avoid the trouble of environment setup, [running in Docker container](#running-in-docker-container) is highly recommended. Otherwise follow the guidelines below to install the dependencies manually.
 ### Prerequisites
-- Python 2.7 only supported
+- Python >= 3.6
 - PaddlePaddle 1.8.0 or later (please refer to the [Installation Guide](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/index_en.html))
 ### Setup
 - Make sure these libraries or tools are installed: `pkg-config`, `flac`, `ogg`, `vorbis`, `boost` and `swig`, e.g. installing them via `apt-get`:
 ```bash
-sudo apt-get install -y pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev swig python-dev
+sudo apt-get install -y pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev swig python3-dev
 ```
 or, installing them via `yum`:
 ```bash
-sudo yum install pkgconfig libogg-devel libvorbis-devel boost-devel python-devel
+sudo yum install pkgconfig libogg-devel libvorbis-devel boost-devel python3-devel
 wget https://ftp.osuosl.org/pub/xiph/releases/flac/flac-1.3.1.tar.xz
 xz -d flac-1.3.1.tar.xz
 tar -xvf flac-1.3.1.tar
@@ -57,6 +57,39 @@ cd DeepSpeech
 sh setup.sh
 ```
+### Running in Docker Container
+Docker is an open source tool to build, ship, and run distributed applications in an isolated environment. A Docker image for this project has been provided on [hub.docker.com](https://hub.docker.com) with all the dependencies installed, including the pre-built PaddlePaddle, CTC decoders, and other necessary Python and third-party packages. This Docker image requires NVIDIA GPU support, so please make sure a GPU is available and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) has been installed.
+Take the following steps to launch the Docker image:
+- Download the Docker image
+```bash
+nvidia-docker pull hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu
+```
+- Clone this repository
+```
+git clone https://github.com/PaddlePaddle/DeepSpeech.git
+```
+- Run the Docker image
+```bash
+sudo nvidia-docker run -it -v $(pwd)/DeepSpeech:/DeepSpeech hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu /bin/bash
+```
+Now go back and start from the [Getting Started](#getting-started) section; you can execute training, inference and hyper-parameters tuning similarly in the Docker container.
+- Install PaddlePaddle
+For example, for CUDA 10.1 and cuDNN 7.5:
+```bash
+python3 -m pip install paddlepaddle-gpu==1.8.0.post107
+```
 ## Getting Started
 Several shell scripts provided in `./examples` will help you quickly try out most major modules, including data preparation, model training, case inference and model evaluation, with a few public datasets (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](http://www.openslr.org/33)). Reading these examples will also help you understand how to make it work with your own data.
@@ -132,7 +165,7 @@ For how to generate such manifest files, please refer to `data/librispeech/libri
 To perform z-score normalization (zero-mean, unit stddev) upon audio features, we have to estimate in advance the mean and standard deviation of the features, with some training samples:
 ```bash
-python tools/compute_mean_std.py \
+python3 tools/compute_mean_std.py \
 --num_samples 2000 \
 --specgram_type linear \
 --manifest_path data/librispeech/manifest.train \
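For intuition about what this `compute_mean_std.py` step produces: z-score normalization is simply `(x - mean) / std` per feature dimension, with the statistics estimated from a sample of training utterances. A minimal NumPy sketch of the idea (dummy shapes and data, not the project's actual implementation):

```python
import numpy as np

# One (num_frames, num_bins) spectrogram per sampled utterance; dummy data here.
features = [np.random.rand(100, 161) for _ in range(2000)]

stacked = np.concatenate(features, axis=0)
mean = stacked.mean(axis=0)   # per-bin mean
std = stacked.std(axis=0)     # per-bin standard deviation

normalized = (features[0] - mean) / (std + 1e-20)  # zero mean, unit stddev
```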
@@ -147,7 +180,7 @@ It will compute the mean and standard deviatio of power spectrum feature with 20
 A vocabulary of possible characters is required to convert the transcription into a list of token indices for training, and in decoding, to convert from a list of indices back to text again. Such a character-based vocabulary can be built with `tools/build_vocab.py`.
 ```bash
-python tools/build_vocab.py \
+python3 tools/build_vocab.py \
 --count_threshold 0 \
 --vocab_path data/librispeech/eng_vocab.txt \
 --manifest_paths data/librispeech/manifest.train
@@ -160,9 +193,9 @@ It will write a vocabuary file `data/librispeeech/eng_vocab.txt` with all transc
 For more help on arguments:
 ```bash
-python data/librispeech/librispeech.py --help
-python tools/compute_mean_std.py --help
-python tools/build_vocab.py --help
+python3 data/librispeech/librispeech.py --help
+python3 tools/compute_mean_std.py --help
+python3 tools/build_vocab.py --help
 ```
 ## Training a model
@@ -172,26 +205,26 @@ python tools/build_vocab.py --help
 - Start training from scratch with 8 GPUs:
 ```
-CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train.py
 ```
 - Start training from scratch with CPUs:
 ```
-python train.py --use_gpu False
+python3 train.py --use_gpu False
 ```
 - Resume training from a checkpoint:
 ```
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python train.py \
+python3 train.py \
 --init_from_pretrained_model CHECKPOINT_PATH_TO_RESUME_FROM
 ```
 For more help on arguments:
 ```bash
-python train.py --help
+python3 train.py --help
 ```
 or refer to `example/librispeech/run_train.sh`.
@@ -273,13 +306,13 @@ An inference module called `infer.py` is provided to infer, decode and visualize
 - Inference with GPU:
 ```bash
-CUDA_VISIBLE_DEVICES=0 python infer.py
+CUDA_VISIBLE_DEVICES=0 python3 infer.py
 ```
 - Inference with CPUs:
 ```bash
-python infer.py --use_gpu False
+python3 infer.py --use_gpu False
 ```
 We provide two types of CTC decoders: *CTC greedy decoder* and *CTC beam search decoder*. The *CTC greedy decoder* is an implementation of the simple best-path decoding algorithm, selecting at each timestep the most likely token, thus being greedy and locally optimal. The [*CTC beam search decoder*](https://arxiv.org/abs/1408.2873), by contrast, utilizes a heuristic breadth-first graph search to reach near-global optimality; it also requires a pre-trained KenLM language model for better scoring and ranking. The decoder type can be set with the argument `--decoding_method`.
@@ -287,7 +320,7 @@ We provide two types of CTC decoders: *CTC greedy decoder* and *CTC beam search
 For more help on arguments:
 ```
-python infer.py --help
+python3 infer.py --help
 ```
 or refer to `example/librispeech/run_infer.sh`.
@@ -298,13 +331,13 @@ To evaluate a model's performance quantitatively, please run:
 - Evaluation with GPUs:
 ```bash
-CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python test.py
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 test.py
 ```
 - Evaluation with CPUs:
 ```bash
-python test.py --use_gpu False
+python3 test.py --use_gpu False
 ```
 The error rate (default: word error rate; can be set with `--error_rate_type`) will be printed.
@@ -312,7 +345,7 @@ The error rate (default: word error rate; can be set with `--error_rate_type`) w
 For more help on arguments:
 ```bash
-python test.py --help
+python3 test.py --help
 ```
 or refer to `example/librispeech/run_test.sh`.
@@ -326,7 +359,7 @@ The hyper-parameters $\alpha$ (language model weight) and $\beta$ (word insertio
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python tools/tune.py \
+python3 tools/tune.py \
 --alpha_from 1.0 \
 --alpha_to 3.2 \
 --num_alphas 45 \
@@ -338,7 +371,7 @@ The hyper-parameters $\alpha$ (language model weight) and $\beta$ (word insertio
 - Tuning with CPU:
 ```bash
-python tools/tune.py --use_gpu False
+python3 tools/tune.py --use_gpu False
 ```
 The grid search will print the WER (word error rate) or CER (character error rate) at each point in the hyper-parameter space, and can optionally draw the error surface. A proper hyper-parameter range should include the global minimum of the error surface for WER/CER, as illustrated in the following figure.
@@ -352,36 +385,10 @@ Usually, as the figure shows, the variation of language model weight ($\alpha$)
 After tuning, you can reset $\alpha$ and $\beta$ in the inference and evaluation modules to see if they really help improve the ASR performance. For more help:
 ```bash
-python tune.py --help
+python3 tune.py --help
 ```
 or refer to `example/librispeech/run_tune.sh`.
-## Running in Docker Container
-Docker is an open source tool to build, ship, and run distributed applications in an isolated environment. A Docker image for this project has been provided in [hub.docker.com](https://hub.docker.com) with all the dependencies installed, including the pre-built PaddlePaddle, CTC decoders, and other necessary Python and third-party packages. This Docker image requires the support of NVIDIA GPU, so please make sure its availiability and the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) has been installed.
-Take several steps to launch the Docker image:
-- Download the Docker image
-```bash
-nvidia-docker pull hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu
-```
-- Clone this repository
-```
-git clone https://github.com/PaddlePaddle/DeepSpeech.git
-```
-- Run the Docker image
-```bash
-sudo nvidia-docker run -it -v $(pwd)/DeepSpeech:/DeepSpeech hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu /bin/bash
-```
-Now go back and start from the [Getting Started](#getting-started) section, you can execute training, inference and hyper-parameters tuning similarly in the Docker container.
 ## Training for Mandarin Language
 The key steps of training for the Mandarin language are the same as those for English, and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, testing and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from the English LM, the Mandarin LM is character-based; please run ```tools/tune.py``` to find an optimal setting.
@@ -394,7 +401,7 @@ To start the demo's server, please run this in one console:
 ```bash
 CUDA_VISIBLE_DEVICES=0 \
-python deploy/demo_server.py \
+python3 deploy/demo_server.py \
 --host_ip localhost \
 --host_port 8086
 ```
@@ -413,7 +420,7 @@ Then to start the client, please run this in another console:
 ```bash
 CUDA_VISIBLE_DEVICES=0 \
-python -u deploy/demo_client.py \
+python3 -u deploy/demo_client.py \
 --host_ip 'localhost' \
 --host_port 8086
 ```
@@ -427,8 +434,8 @@ Please also refer to `examples/deploy_demo/run_english_demo_server.sh`, which wi
 For more help on arguments:
 ```bash
-python deploy/demo_server.py --help
-python deploy/demo_client.py --help
+python3 deploy/demo_server.py --help
+python3 deploy/demo_client.py --help
 ```
 ## Released Models

@@ -7,12 +7,12 @@
 ## Table of Contents
 - [Installation](#安装)
+- [Running in Docker Container](#在Docker容器上运行)
 - [Getting Started](#开始)
 - [Data Preparation](#数据准备)
 - [Training a Model](#训练模型)
 - [Data Augmentation Pipeline](#数据增强流水线)
 - [Inference and Evaluation](#推断和评价)
-- [Running in Docker Container](#在Docker容器上运行)
 - [Hyper-parameters Tuning](#超参数调整)
 - [Training for Mandarin](#训练汉语语言)
 - [Trying Live Demo with Your Own Voice](#用自己的声音尝试现场演示)
@@ -24,20 +24,20 @@
 To avoid environment configuration problems, running in a [Docker container](#在Docker容器上运行) is strongly recommended; otherwise, please follow the guidelines below to install the dependencies.
 ### Prerequisites
-- Only Python 2.7 is supported
+- Python >= 3.6
 - PaddlePaddle 1.8.0 or later (please refer to the [Installation Guide](https://www.paddlepaddle.org.cn/install/quick))
 ### Installation
 - Please make sure the following libraries or tools are installed: `pkg-config`, `flac`, `ogg`, `vorbis`, `boost` and `swig`; for example, they can be installed via `apt-get`:
 ```bash
-sudo apt-get install -y pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev swig python-dev
+sudo apt-get install -y pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev swig python3-dev
 ```
 or via `yum`:
 ```bash
-sudo yum install pkgconfig libogg-devel libvorbis-devel boost-devel python-devel
+sudo yum install pkgconfig libogg-devel libvorbis-devel boost-devel python3-devel
 wget https://ftp.osuosl.org/pub/xiph/releases/flac/flac-1.3.1.tar.xz
 xz -d flac-1.3.1.tar.xz
 tar -xvf flac-1.3.1.tar
@@ -55,6 +55,39 @@ cd DeepSpeech
 sh setup.sh
 ```
+### Running in Docker Container
+Docker is an open source tool for building, shipping, and running distributed applications in an isolated environment. A Docker image for this project is available on [hub.docker.com](https://hub.docker.com) with all dependencies installed, including the pre-built PaddlePaddle, CTC decoders, and other required Python and third-party libraries. The image requires NVIDIA GPU support, so please make sure a GPU is available and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is installed.
+Take the following steps to launch the Docker image:
+- Download the Docker image
+```bash
+nvidia-docker pull hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu
+```
+- Clone this repository with git
+```
+git clone https://github.com/PaddlePaddle/DeepSpeech.git
+```
+- Run the Docker image
+```bash
+sudo nvidia-docker run -it -v $(pwd)/DeepSpeech:/DeepSpeech hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu /bin/bash
+```
+Now go back and start from the [Getting Started](#开始) section; you can run model training, inference and hyper-parameter tuning the same way inside the Docker container.
+- Install PaddlePaddle
+For example, for CUDA 10.1 and cuDNN 7.5:
+```bash
+python3 -m pip install paddlepaddle-gpu==1.8.0.post107
+```
 ## Getting Started
 The shell scripts in `./examples` will help you quickly try the toolkit on public datasets (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](http://www.openslr.org/33)), covering data preparation, model training, case inference and model evaluation. Reading these examples will also help you understand how to train a model with your own data.
@@ -130,7 +163,7 @@ sh setup.sh
 To perform z-score normalization (zero mean, unit stddev) on audio features, we must first estimate the mean and standard deviation of the features from training samples:
 ```bash
-python tools/compute_mean_std.py \
+python3 tools/compute_mean_std.py \
 --num_samples 2000 \
 --specgram_type linear \
 --manifest_path data/librispeech/manifest.train \
@@ -144,7 +177,7 @@ python tools/compute_mean_std.py \
 We need a vocabulary containing the set of characters that may appear, in order to convert characters into indices during training and to convert indices back into text during decoding. The `tools/build_vocab.py` script generates such a character-based vocabulary.
 ```bash
-python tools/build_vocab.py \
+python3 tools/build_vocab.py \
 --count_threshold 0 \
 --vocab_path data/librispeech/eng_vocab.txt \
 --manifest_paths data/librispeech/manifest.train
@@ -157,9 +190,9 @@ python tools/build_vocab.py \
 For more help on arguments:
 ```bash
-python data/librispeech/librispeech.py --help
-python tools/compute_mean_std.py --help
-python tools/build_vocab.py --help
+python3 data/librispeech/librispeech.py --help
+python3 tools/compute_mean_std.py --help
+python3 tools/build_vocab.py --help
 ```
 ## Training a Model
@@ -169,27 +202,27 @@ python tools/build_vocab.py --help
 - Start training with 8 GPUs:
 ```
-CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train.py
 ```
 - Start training with CPUs:
 ```
-python train.py --use_gpu False
+python3 train.py --use_gpu False
 ```
 - Resume training from a checkpoint:
 ```
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python train.py \
+python3 train.py \
 --init_from_pretrained_model CHECKPOINT_PATH_TO_RESUME_FROM
 ```
 For more help on arguments:
 ```bash
-python train.py --help
+python3 train.py --help
 ```
 or refer to `example/librispeech/run_train.sh`.
@@ -272,13 +305,13 @@ bash download_lm_ch.sh
 - Inference with GPU:
 ```bash
-CUDA_VISIBLE_DEVICES=0 python infer.py
+CUDA_VISIBLE_DEVICES=0 python3 infer.py
 ```
 - Inference with CPUs:
 ```bash
-python infer.py --use_gpu False
+python3 infer.py --use_gpu False
 ```
 We provide two types of CTC decoders: the *CTC greedy decoder* and the *CTC beam search decoder*. The CTC greedy decoder is an implementation of the simple best-path decoding algorithm, choosing the most likely character at each time step, and is therefore greedy and locally optimal. The [*CTC beam search decoder*](https://arxiv.org/abs/1408.2873) additionally uses a heuristic breadth-first graph search to approach the global optimum; it also requires a pre-trained KenLM language model for better scoring and ranking. The decoder type can be set with the `--decoding_method` argument.
@@ -286,7 +319,7 @@ bash download_lm_ch.sh
 For more help on arguments:
 ```
-python infer.py --help
+python3 infer.py --help
 ```
 or refer to `example/librispeech/run_infer.sh`.
@@ -297,13 +330,13 @@ python infer.py --help
 - Evaluation with GPUs:
 ```bash
-CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python test.py
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 test.py
 ```
 - Evaluation with CPUs:
 ```bash
-python test.py --use_gpu False
+python3 test.py --use_gpu False
 ```
 The error rate (word error rate by default; can be set with `--error_rate_type`) will be printed.
@@ -311,7 +344,7 @@ python infer.py --help
 For more help on arguments:
 ```bash
-python test.py --help
+python3 test.py --help
 ```
 or refer to `example/librispeech/run_test.sh`.
@@ -325,7 +358,7 @@ python test.py --help
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python tools/tune.py \
+python3 tools/tune.py \
 --alpha_from 1.0 \
 --alpha_to 3.2 \
 --num_alphas 45 \
@@ -337,7 +370,7 @@ python test.py --help
 - Tuning with CPUs:
 ```bash
-python tools/tune.py --use_gpu False
+python3 tools/tune.py --use_gpu False
 ```
 The grid search will print the WER (word error rate) or CER (character error rate) at each point in the hyper-parameter space, and can optionally draw the error surface. A proper hyper-parameter range should include the global minimum of the WER/CER error surface, as shown in the figure below.
@@ -351,37 +384,10 @@ python test.py --help
 After tuning, you can reset $\alpha$ and $\beta$ in the inference and evaluation modules to check whether they really help improve ASR performance. For more help:
 ```bash
-python tune.py --help
+python3 tune.py --help
 ```
 or refer to `example/librispeech/run_tune.sh`.
-## Running in Docker Container
-Docker is an open source tool for building, shipping, and running distributed applications in an isolated environment. A Docker image for this project is available on [hub.docker.com](https://hub.docker.com) with all dependencies installed, including the pre-built PaddlePaddle, CTC decoders, and other required Python and third-party libraries. The image requires NVIDIA GPU support, so please make sure a GPU is available and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is installed.
-Take the following steps to launch the Docker image:
-- Download the Docker image
-```bash
-nvidia-docker pull hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu
-```
-- Clone this repository with git
-```
-git clone https://github.com/PaddlePaddle/DeepSpeech.git
-```
-- Run the Docker image
-```bash
-sudo nvidia-docker run -it -v $(pwd)/DeepSpeech:/DeepSpeech hub.baidubce.com/paddlepaddle/deep_speech_fluid:latest-gpu /bin/bash
-```
-Now go back and start from the [Getting Started](#开始) section; you can run model training, inference and hyper-parameter tuning the same way inside the Docker container.
 ## Training for Mandarin
 The key steps for Mandarin training are the same as for English, and we provide a Mandarin training example with Aishell in ```examples/aishell```. As described above, run ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` for data preparation, training, testing and inference respectively. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Note that, unlike the English LM, the Mandarin LM is character-based; please run ```tools/tune.py``` to find an optimal setting.
@@ -394,7 +400,7 @@ sudo nvidia-docker run -it -v $(pwd)/DeepSpeech:/DeepSpeech hub.baidubce.com/pad
 ```bash
 CUDA_VISIBLE_DEVICES=0 \
-python deploy/demo_server.py \
+python3 deploy/demo_server.py \
 --host_ip localhost \
 --host_port 8086
 ```
@@ -413,7 +419,7 @@ pip install keyboard
 ```bash
 CUDA_VISIBLE_DEVICES=0 \
-python -u deploy/demo_client.py \
+python3 -u deploy/demo_client.py \
 --host_ip 'localhost' \
 --host_port 8086
 ```
@@ -427,8 +433,8 @@ python -u deploy/demo_client.py \
 For more help on arguments:
 ```bash
-python deploy/demo_server.py --help
-python deploy/demo_client.py --help
+python3 deploy/demo_server.py --help
+python3 deploy/demo_client.py --help
 ```
 ## Released Models

@@ -5,9 +5,6 @@ Manifest file is a json-format file with each line containing the
 meta data (i.e. audio filepath, transcript and audio duration)
 of each audio file in the data set.
 """
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import os
 import codecs
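The removed `__future__` imports existed only to give Python 2 the Python 3 semantics; on Python 3 they are no-ops, which is why this PR deletes them wholesale in the hunks below. A standalone illustration of the behaviors they used to enable:

```python
# All of these are the defaults on Python 3:
print("print is a function")  # print_function
print(7 / 2)                  # division: true division, prints 3.5
print(7 // 2)                 # floor division is now explicit, prints 3

# absolute_import: a bare import always resolves to the top-level package,
# never to a same-named sibling module inside the current package.
import os
print(os.name)
```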

@@ -5,9 +5,6 @@ Manifest file is a json-format file with each line containing the
 meta data (i.e. audio filepath, transcript and audio duration)
 of each audio file in the data set.
 """
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import distutils.util
 import os

@@ -5,9 +5,6 @@ Manifest file is a json-format file with each line containing the
 meta data (i.e. audio filepath, transcript and audio duration)
 of each audio file in the data set.
 """
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import distutils.util
 import os

@@ -5,9 +5,6 @@ Manifest file is a json-format file with each line containing the
 meta data (i.e. audio filepath, transcript and audio duration)
 of each audio file in the data set.
 """
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import os
 import codecs

@@ -1,7 +1,4 @@
 """Contains the audio segment class."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import numpy as np
 import io
@@ -62,11 +59,11 @@ class AudioSegment(object):
     """Create audio segment from audio file.
     :param filepath: Filepath or file object to audio file.
-    :type filepath: basestring|file
+    :type filepath: str|file
     :return: Audio segment instance.
     :rtype: AudioSegment
     """
-    if isinstance(file, basestring) and re.findall(r".seqbin_\d+$", file):
+    if isinstance(file, str) and re.findall(r".seqbin_\d+$", file):
         return cls.from_sequence_file(file)
     else:
         samples, sample_rate = soundfile.read(file, dtype='float32')
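Python 3 drops `basestring` (and the separate `unicode` type): all text is `str`, so type checks like the one above collapse to `isinstance(..., str)`. A standalone sketch of the filepath-or-file-object pattern (hypothetical helper, not project code):

```python
def read_source(source):
    """Accept either a filepath (str) or an already-open binary file object."""
    if isinstance(source, str):      # Python 3: str covers all text
        with open(source, "rb") as f:
            return f.read()
    return source.read()             # otherwise assume a file-like object
```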
@@ -78,7 +75,7 @@ class AudioSegment(object):
     the entire file into the memory which can be incredibly wasteful.
     :param file: Input audio filepath or file object.
-    :type file: basestring|file
+    :type file: str|file
     :param start: Start time in seconds. If start is negative, it wraps
                   around from the end. If not provided, this function
                   reads from the very beginning.
@@ -97,7 +94,7 @@ class AudioSegment(object):
         sample_rate = sndfile.samplerate
         duration = float(len(sndfile)) / sample_rate
         start = 0. if start is None else start
-        end = 0. if end is None else end
+        end = duration if end is None else end
         if start < 0.0:
             start += duration
         if end < 0.0:
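Note that the second hunk above is a behavior fix rather than a Python 3 change: with the old default of `0.`, omitting `end` yielded an empty slice, while the new default reads to the end of the file. A self-contained sketch of the two defaults (illustrative numbers only):

```python
def resolve_slice(start, end, duration):
    start = 0. if start is None else start
    end = duration if end is None else end  # the old code defaulted to 0. here
    if start < 0.0:
        start += duration
    if end < 0.0:
        end += duration
    return start, end

print(resolve_slice(None, None, 10.0))  # (0.0, 10.0): the whole file
# With the old default, end resolved to 0.0 and the slice was empty.
```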
@@ -143,7 +140,7 @@ class AudioSegment(object):
                        sequence file (starting from 1).
     :param filepath: Filepath of sequence file.
-    :type filepath: basestring
+    :type filepath: str
     :return: Audio segment instance.
     :rtype: AudioSegment
     """
@@ -236,7 +233,7 @@ class AudioSegment(object):
     :param filepath: WAV filepath or file object to save the
                      audio segment.
-    :type filepath: basestring|file
+    :type filepath: str|file
     :param dtype: Subtype for audio file. Options: 'int16', 'int32',
                   'float32', 'float64'. Default is 'float32'.
     :type dtype: str

@@ -1,7 +1,4 @@
 """Contains the data augmentation pipeline."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import json
 import random

@@ -1,7 +1,4 @@
 """Contains the abstract base class for augmentation models."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from abc import ABCMeta, abstractmethod

@@ -1,7 +1,4 @@
 """Contains the impulse response augmentation model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.augmentor.base import AugmentorBase
 from data_utils.utility import read_manifest
@@ -14,7 +11,7 @@ class ImpulseResponseAugmentor(AugmentorBase):
     :param rng: Random generator object.
     :type rng: random.Random
     :param impulse_manifest_path: Manifest path for impulse audio data.
-    :type impulse_manifest_path: basestring
+    :type impulse_manifest_path: str
     """
     def __init__(self, rng, impulse_manifest_path):

@@ -1,7 +1,4 @@
 """Contains the noise perturb augmentation model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.augmentor.base import AugmentorBase
 from data_utils.utility import read_manifest
@@ -18,7 +15,7 @@ class NoisePerturbAugmentor(AugmentorBase):
     :param max_snr_dB: Maximal signal noise ratio, in decibels.
     :type max_snr_dB: float
     :param noise_manifest_path: Manifest path for noise audio data.
-    :type noise_manifest_path: basestring
+    :type noise_manifest_path: str
     """
     def __init__(self, rng, min_snr_dB, max_snr_dB, noise_manifest_path):

@@ -1,7 +1,4 @@
 """Contain the online bayesian normalization augmentation model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.augmentor.base import AugmentorBase

@@ -1,7 +1,4 @@
 """Contain the resample augmentation model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.augmentor.base import AugmentorBase

@@ -1,7 +1,4 @@
 """Contains the volume perturb augmentation model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.augmentor.base import AugmentorBase

@@ -1,7 +1,4 @@
 """Contain the speech perturbation augmentation model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.augmentor.base import AugmentorBase

@@ -1,7 +1,4 @@
 """Contains the volume perturb augmentation model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.augmentor.base import AugmentorBase

@@ -1,9 +1,6 @@
 """Contains data generator for orgnaizing various audio data preprocessing
 pipeline and offering data reader interface of PaddlePaddle requirements.
 """
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import random
 import tarfile
@@ -25,9 +22,9 @@ class DataGenerator(object):
     :param vocab_filepath: Vocabulary filepath for indexing tokenized
                            transcripts.
-    :type vocab_filepath: basestring
+    :type vocab_filepath: str
     :param mean_std_filepath: File containing the pre-computed mean and stddev.
-    :type mean_std_filepath: None|basestring
+    :type mean_std_filepath: None|str
     :param augmentation_config: Augmentation configuration in json string.
                                 Details see AugmentationPipeline.__doc__.
     :type augmentation_config: str
@@ -104,14 +101,14 @@ class DataGenerator(object):
     """Load, augment, featurize and normalize for speech data.
     :param audio_file: Filepath or file object of audio file.
-    :type audio_file: basestring | file
+    :type audio_file: str | file
     :param transcript: Transcription text.
-    :type transcript: basestring
+    :type transcript: str
     :return: Tuple of audio feature tensor and data of transcription part,
              where transcription part could be token ids or text.
     :rtype: tuple of (2darray, list)
     """
-    if isinstance(audio_file, basestring) and audio_file.startswith('tar:'):
+    if isinstance(audio_file, str) and audio_file.startswith('tar:'):
         speech_segment = SpeechSegment.from_file(
             self._subfile_from_tar(audio_file), transcript)
     else:
@@ -137,7 +134,7 @@ class DataGenerator(object):
                      same shape, or a user-defined shape.
     :param manifest_path: Filepath of manifest for audio files.
-    :type manifest_path: basestring
+    :type manifest_path: str
     :param batch_size: Number of instances in a batch.
     :type batch_size: int
     :param padding_to: If set -1, the maximun shape in the batch
@@ -361,7 +358,7 @@ class DataGenerator(object):
         """
         manifest.sort(key=lambda x: x["duration"])
         shift_len = self._rng.randint(0, batch_size - 1)
-        batch_manifest = zip(*[iter(manifest[shift_len:])] * batch_size)
+        batch_manifest = list(zip(*[iter(manifest[shift_len:])] * batch_size))
         self._rng.shuffle(batch_manifest)
         batch_manifest = [item for batch in batch_manifest for item in batch]
         if not clipped:
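The `list(...)` wrapper in the last hunk is needed because `zip` returns a lazy iterator on Python 3, and `random.Random.shuffle` requires a sequence supporting `len()` and item assignment. A standalone illustration of the grouping idiom used above (toy data):

```python
import random

manifest = list(range(10))  # stand-in for duration-sorted manifest entries
batch_size = 3

# Group consecutive items into tuples of batch_size: the zip-on-one-iterator idiom.
batches = list(zip(*[iter(manifest)] * batch_size))
print(batches)           # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

random.shuffle(batches)  # a bare zip object would raise TypeError on Python 3
```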

@@ -1,7 +1,4 @@
 """Contains the audio featurizer class."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import numpy as np
 from data_utils.utility import read_manifest

@@ -1,7 +1,4 @@
 """Contains the speech featurizer class."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from data_utils.featurizer.audio_featurizer import AudioFeaturizer
 from data_utils.featurizer.text_featurizer import TextFeaturizer
@@ -18,7 +15,7 @@ class SpeechFeaturizer(object):
     :param vocab_filepath: Filepath to load vocabulary for token indices
                            conversion.
-    :type specgram_type: basestring
+    :type specgram_type: str
     :param specgram_type: Specgram feature type. Options: 'linear', 'mfcc'.
     :type specgram_type: str
     :param stride_ms: Striding size (in milliseconds) for generating frames.

@@ -1,7 +1,4 @@
 """Contains the text featurizer class."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import os
 import codecs
@@ -16,7 +13,7 @@ class TextFeaturizer(object):
     :param vocab_filepath: Filepath to load vocabulary for token indices
                            conversion.
-    :type specgram_type: basestring
+    :type specgram_type: str
     """
     def __init__(self, vocab_filepath):
@@ -28,7 +25,7 @@ class TextFeaturizer(object):
     that the token indexing order follows the given vocabulary file.
     :param text: Text to process.
-    :type text: basestring
+    :type text: str
     :return: List of char-level token indices.
     :rtype: list
     """

@@ -1,7 +1,4 @@
 """Contains feature normalizers."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import numpy as np
 import random
@@ -18,9 +15,9 @@ class FeatureNormalizer(object):
                   should be given for on-the-fly mean and stddev computing.
     :param mean_std_filepath: File containing the pre-computed mean and stddev.
-    :type mean_std_filepath: None|basestring
+    :type mean_std_filepath: None|str
     :param manifest_path: Manifest of instances for computing mean and stddev.
-    :type meanifest_path: None|basestring
+    :type meanifest_path: None|str
     :param featurize_func: Function to extract features. It should be callable
                            with ``featurize_func(audio_segment)``.
     :type featurize_func: None|callable
@@ -63,7 +60,7 @@ class FeatureNormalizer(object):
     """Write the mean and stddev to the file.
     :param filepath: File to write mean and stddev.
-    :type filepath: basestring
+    :type filepath: str
     """
     np.savez(filepath, mean=self._mean, std=self._std)

@@ -1,7 +1,4 @@
 """Contains the speech segment class."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import numpy as np
 from data_utils.audio import AudioSegment
@@ -16,7 +13,7 @@ class SpeechSegment(AudioSegment):
     :param sample_rate: Audio sample rate.
     :type sample_rate: int
     :param transcript: Transcript text for the speech.
-    :type transript: basestring
+    :type transript: str
     :raises TypeError: If the sample data type is not float or int.
     """
@@ -42,9 +39,9 @@ class SpeechSegment(AudioSegment):
     """Create speech segment from audio file and corresponding transcript.
     :param filepath: Filepath or file object to audio file.
-    :type filepath: basestring|file
+    :type filepath: str|file
     :param transcript: Transcript text for the speech.
-    :type transript: basestring
+    :type transript: str
     :return: Speech segment instance.
     :rtype: SpeechSegment
     """
@@ -59,7 +56,7 @@ class SpeechSegment(AudioSegment):
     :param bytes: Byte string containing audio samples.
     :type bytes: str
     :param transcript: Transcript text for the speech.
-    :type transript: basestring
+    :type transript: str
     :return: Speech segment instance.
     :rtype: Speech Segment
     """
@@ -100,7 +97,7 @@ class SpeechSegment(AudioSegment):
     the entire file into the memory which can be incredibly wasteful.
     :param filepath: Filepath or file object to audio file.
-    :type filepath: basestring|file
+    :type filepath: str|file
     :param start: Start time in seconds. If start is negative, it wraps
                   around from the end. If not provided, this function
                   reads from the very beginning.
@@ -111,7 +108,7 @@ class SpeechSegment(AudioSegment):
     :type end: float
     :param transcript: Transcript text for the speech. if not provided,
                        the defaults is an empty string.
-    :type transript: basestring
+    :type transript: str
     :return: SpeechSegment instance of the specified slice of the input
              speech file.
     :rtype: SpeechSegment
@@ -139,6 +136,6 @@ class SpeechSegment(AudioSegment):
     """Return the transcript text.
     :return: Transcript text for the speech.
-    :rtype: basestring
+    :rtype: str
     """
     return self._transcript

@@ -1,14 +1,10 @@
 """Contains data helper functions."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import json
 import codecs
 import os
 import tarfile
 import time
-from Queue import Queue
 from threading import Thread
 from multiprocessing import Process, Manager, Value
 from paddle.dataset.common import md5file
@@ -21,7 +17,7 @@ def read_manifest(manifest_path, max_duration=float('inf'), min_duration=0.0):
                        filtered out.
     :param manifest_path: Manifest file to load and parse.
-    :type manifest_path: basestring
+    :type manifest_path: str
     :param max_duration: Maximal duration in seconds for instance filter.
     :type max_duration: float
     :param min_duration: Minimal duration in seconds for instance filter.

@@ -1,7 +1,4 @@
 """Contains various CTC decoders."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from itertools import groupby
 import numpy as np
@@ -123,7 +120,7 @@ def ctc_beam_search_decoder(probs_seq,
         prob_idx = prob_idx[0:cutoff_len]
     for l in prefix_set_prev:
-        if not prefix_set_next.has_key(l):
+        if l not in prefix_set_next:
             probs_b_cur[l], probs_nb_cur[l] = 0.0, 0.0
         # extend prefix by travering prob_idx
@@ -137,7 +134,7 @@ def ctc_beam_search_decoder(probs_seq,
             last_char = l[-1]
             new_char = vocabulary[c]
             l_plus = l + new_char
-            if not prefix_set_next.has_key(l_plus):
+            if l_plus not in prefix_set_next:
                 probs_b_cur[l_plus], probs_nb_cur[l_plus] = 0.0, 0.0
             if new_char == last_char:
@@ -164,7 +161,7 @@ def ctc_beam_search_decoder(probs_seq,
     ## store top beam_size prefixes
     prefix_set_prev = sorted(
-        prefix_set_next.iteritems(), key=lambda asd: asd[1], reverse=True)
+        prefix_set_next.items(), key=lambda asd: asd[1], reverse=True)
     if beam_size < len(prefix_set_prev):
         prefix_set_prev = prefix_set_prev[:beam_size]
     prefix_set_prev = dict(prefix_set_prev)
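These three hunks are the standard dictionary migrations: `d.has_key(k)` becomes `k in d`, and `d.iteritems()` becomes `d.items()`, whose view `sorted()` accepts directly. A small standalone example of the pattern (hypothetical prefix scores):

```python
scores = {"a": 0.5, "ab": 0.3, "abc": 0.9}

if "ab" in scores:            # replaces scores.has_key("ab")
    scores["ab"] += 0.1

# replaces scores.iteritems(); keep only the top two prefixes
top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(dict(top))              # {'abc': 0.9, 'a': 0.5}
```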

@@ -1,7 +1,4 @@
 """External Scorer for Beam Search Decoder."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import os
 import kenlm
@@ -20,7 +17,7 @@ class Scorer(object):
                  count when beta = 0.
     :type beta: float
     :model_path: Path to load language model.
-    :type model_path: basestring
+    :type model_path: str
     """
     def __init__(self, alpha, beta, model_path):
@@ -53,7 +50,7 @@ class Scorer(object):
     and return the final one.
     :param sentence: The input sentence for evalutation
-    :type sentence: basestring
+    :type sentence: str
     :param log: Whether return the score in log representation.
     :type log: bool
     :return: Evaluation score, in the decimal or log.

@@ -1,7 +1,4 @@
 """Set up paths for DS2"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import os.path
 import sys

@@ -1,7 +1,4 @@
 """Script to build and install decoder package."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 from setuptools import setup, Extension, distutils
 import glob

@@ -21,4 +21,4 @@ if [ ! -d ThreadPool ]; then
 fi
 echo "Install decoders ..."
-python setup.py install --num_processes 4
+python3 setup.py install --num_processes 4

@@ -1,7 +1,4 @@
 """Wrapper for various CTC decoders in SWIG."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import swig_decoders
@@ -16,7 +13,7 @@ class Scorer(swig_decoders.Scorer):
                  count when beta = 0.
     :type beta: float
     :model_path: Path to load language model.
-    :type model_path: basestring
+    :type model_path: str
     """
     def __init__(self, alpha, beta, model_path, vocabulary):
@@ -33,7 +30,7 @@ def ctc_greedy_decoder(probs_seq, vocabulary):
     :param vocabulary: Vocabulary list.
     :type vocabulary: list
     :return: Decoding result string.
-    :rtype: basestring
+    :rtype: str
     """
     result = swig_decoders.ctc_greedy_decoder(probs_seq.tolist(), vocabulary)
     return result.decode('utf-8')
@@ -117,8 +114,6 @@ def ctc_beam_search_decoder_batch(probs_split,
     batch_beam_results = swig_decoders.ctc_beam_search_decoder_batch(
         probs_split, vocabulary, beam_size, num_processes, cutoff_prob,
         cutoff_top_n, ext_scoring_func)
-    batch_beam_results = [
-        [(res[0], res[1].decode("utf-8")) for res in beam_results]
-        for beam_results in batch_beam_results
-    ]
+    batch_beam_results = [[(res[0], res[1]) for res in beam_results]
+                          for beam_results in batch_beam_results]
    return batch_beam_results
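The dropped `.decode("utf-8")` in the last hunk reflects the bytes/str split: under Python 2 the SWIG layer handed back byte strings, while the module rebuilt for Python 3 returns `str` directly, so decoding again is unnecessary. The underlying rule, in plain Python 3:

```python
raw = "空间".encode("utf-8")   # bytes, as a Python 2 era C extension returned
text = raw.decode("utf-8")     # the explicit decode that used to be required

assert isinstance(text, str)
assert raw != text             # bytes and str never compare equal on Python 3
```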

@@ -1,7 +1,4 @@
 """Test decoders."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import unittest
 from decoders import decoders_deprecated as decoder

@@ -1,7 +1,4 @@
 """Set up paths for DS2"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import os.path
 import sys

@@ -56,7 +56,7 @@ def callback(in_data, frame_count, time_info, status):
     print('Speech[length=%d] Sent.' % len(sent))
     # Receive data from the server and shut down
     received = sock.recv(1024)
-    print "Recognition Results: {}".format(received)
+    print("Recognition Results: {}".format(received))
     sock.close()
     data_list = []
     enable_trigger_record = True
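This is the classic migration: `print` is a function on Python 3, so the statement form above is a `SyntaxError`. Note also that `sock.recv()` returns `bytes` on Python 3, which the format call renders with a `b'...'` prefix:

```python
received = b"hello"  # what sock.recv(1024) yields on Python 3
print("Recognition Results: {}".format(received))  # Recognition Results: b'hello'
```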

@@ -166,7 +166,7 @@ def start_server():
         place=place,
         share_rnn_weights=args.share_rnn_weights)
-    vocab_list = [chars.encode("utf-8") for chars in data_generator.vocab_list]
+    vocab_list = [chars for chars in data_generator.vocab_list]
     if args.decoding_method == "ctc_beam_search":
         ds2_model.init_ext_scorer(args.alpha, args.beta, args.lang_model_path,
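Encoding the vocabulary to UTF-8 byte strings was needed for the Python 2 decoder; on Python 3 the entries stay `str`, since byte strings would no longer compare equal to decoded text. The pitfall the change avoids, in standalone form:

```python
vocab = ["a", "b", "空"]
encoded = [ch.encode("utf-8") for ch in vocab]  # bytes on Python 3

print("空" in encoded)  # False: str never matches bytes
print("空" in vocab)    # True
```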

@@ -3,7 +3,7 @@
 cd ../.. > /dev/null
 # download data, generate manifests
-PYTHONPATH=.:$PYTHONPATH python data/aishell/aishell.py \
+PYTHONPATH=.:$PYTHONPATH python3 data/aishell/aishell.py \
 --manifest_prefix='data/aishell/manifest' \
 --target_dir='./dataset/aishell'
@@ -14,7 +14,7 @@ fi
 # build vocabulary
-python tools/build_vocab.py \
+python3 tools/build_vocab.py \
 --count_threshold=0 \
 --vocab_path='data/aishell/vocab.txt' \
 --manifest_paths 'data/aishell/manifest.train' 'data/aishell/manifest.dev'
@@ -26,7 +26,7 @@ fi
 # compute mean and stddev for normalizer
-python tools/compute_mean_std.py \
+python3 tools/compute_mean_std.py \
 --manifest_path='data/aishell/manifest.train' \
 --num_samples=2000 \
 --specgram_type='linear' \

@@ -13,7 +13,7 @@ cd - > /dev/null
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python -u infer.py \
+python3 -u infer.py \
 --num_samples=10 \
 --beam_size=300 \
 --num_proc_bsearch=8 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python -u infer.py \
+python3 -u infer.py \
 --num_samples=10 \
 --beam_size=300 \
 --num_proc_bsearch=8 \

@@ -13,7 +13,7 @@ cd - > /dev/null
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u test.py \
+python3 -u test.py \
 --batch_size=128 \
 --beam_size=300 \
 --num_proc_bsearch=8 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u test.py \
+python3 -u test.py \
 --batch_size=128 \
 --beam_size=300 \
 --num_proc_bsearch=8 \

@@ -6,7 +6,7 @@ cd ../.. > /dev/null
 # if you wish to resume from an exists model, uncomment --init_from_pretrained_model
 export FLAGS_sync_nccl_allreduce=0
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u train.py \
+python3 -u train.py \
 --batch_size=64 \
 --num_epoch=50 \
 --num_conv_layers=2 \
--num_conv_layers=2 \ --num_conv_layers=2 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python -u infer.py \
+python3 -u infer.py \
 --num_samples=10 \
 --beam_size=500 \
 --num_proc_bsearch=5 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python -u test.py \
+python3 -u test.py \
 --batch_size=128 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -4,7 +4,7 @@ cd ../.. > /dev/null
 # start demo client
 CUDA_VISIBLE_DEVICES=0 \
-python -u deploy/demo_client.py \
+python3 -u deploy/demo_client.py \
 --host_ip='localhost' \
 --host_port=8086 \

@@ -23,7 +23,7 @@ cd - > /dev/null
 # start demo server
 CUDA_VISIBLE_DEVICES=0 \
-python -u deploy/demo_server.py \
+python3 -u deploy/demo_server.py \
 --host_ip='localhost' \
 --host_port=8086 \
 --num_conv_layers=2 \

@@ -3,7 +3,7 @@
 cd ../.. > /dev/null
 # download data, generate manifests
-PYTHONPATH=.:$PYTHONPATH python data/librispeech/librispeech.py \
+PYTHONPATH=.:$PYTHONPATH python3 data/librispeech/librispeech.py \
 --manifest_prefix='data/librispeech/manifest' \
 --target_dir='./dataset/librispeech' \
 --full_download='True'
@@ -17,7 +17,7 @@ cat data/librispeech/manifest.train-* | shuf > data/librispeech/manifest.train
 # build vocabulary
-python tools/build_vocab.py \
+python3 tools/build_vocab.py \
 --count_threshold=0 \
 --vocab_path='data/librispeech/vocab.txt' \
 --manifest_paths='data/librispeech/manifest.train'
@@ -29,7 +29,7 @@ fi
 # compute mean and stddev for normalizer
-python tools/compute_mean_std.py \
+python3 tools/compute_mean_std.py \
 --manifest_path='data/librispeech/manifest.train' \
 --num_samples=2000 \
 --specgram_type='linear' \

@@ -13,7 +13,7 @@ cd - > /dev/null
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python -u infer.py \
+python3 -u infer.py \
 --num_samples=10 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python -u infer.py \
+python3 -u infer.py \
 --num_samples=10 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -13,7 +13,7 @@ cd - > /dev/null
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u test.py \
+python3 -u test.py \
 --batch_size=128 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u test.py \
+python3 -u test.py \
 --batch_size=128 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -7,7 +7,7 @@ cd ../.. > /dev/null
 export FLAGS_sync_nccl_allreduce=0
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u train.py \
+python3 -u train.py \
 --batch_size=20 \
 --num_epoch=50 \
 --num_conv_layers=2 \

@@ -4,7 +4,7 @@ cd ../.. > /dev/null
 # grid-search for hyper-parameters in language model
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python -u tools/tune.py \
+python3 -u tools/tune.py \
 --num_batches=-1 \
 --batch_size=128 \
 --beam_size=500 \

@@ -8,7 +8,7 @@ if [ ! -e data/tiny ]; then
 fi
 # download data, generate manifests
-PYTHONPATH=.:$PYTHONPATH python data/librispeech/librispeech.py \
+PYTHONPATH=.:$PYTHONPATH python3 data/librispeech/librispeech.py \
 --manifest_prefix='data/tiny/manifest' \
 --target_dir='./dataset/librispeech' \
 --full_download='False'
@@ -21,7 +21,7 @@ fi
 head -n 64 data/tiny/manifest.dev-clean > data/tiny/manifest.tiny
 # build vocabulary
-python tools/build_vocab.py \
+python3 tools/build_vocab.py \
 --count_threshold=0 \
 --vocab_path='data/tiny/vocab.txt' \
 --manifest_paths='data/tiny/manifest.tiny'
@@ -33,7 +33,7 @@ fi
 # compute mean and stddev for normalizer
-python tools/compute_mean_std.py \
+python3 tools/compute_mean_std.py \
 --manifest_path='data/tiny/manifest.tiny' \
 --num_samples=64 \
 --specgram_type='linear' \

@@ -13,7 +13,7 @@ cd - > /dev/null
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python -u infer.py \
+python3 -u infer.py \
 --num_samples=10 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python -u infer.py \
+python3 -u infer.py \
 --num_samples=10 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -13,7 +13,7 @@ cd - > /dev/null
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u test.py \
+python3 -u test.py \
 --batch_size=128 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -22,7 +22,7 @@ cd - > /dev/null
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -u test.py \
+python3 -u test.py \
 --batch_size=128 \
 --beam_size=500 \
 --num_proc_bsearch=8 \

@@ -6,7 +6,7 @@ cd ../.. > /dev/null
 # if you wish to resume from an exists model, uncomment --init_from_pretrained_model
 export FLAGS_sync_nccl_allreduce=0
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python -u train.py \
+python3 -u train.py \
 --batch_size=4 \
 --num_epoch=20 \
 --num_conv_layers=2 \

@@ -4,7 +4,7 @@ cd ../.. > /dev/null
 # grid-search for hyper-parameters in language model
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python -u tools/tune.py \
+python3 -u tools/tune.py \
 --num_batches=-1 \
 --batch_size=128 \
 --beam_size=500 \

@@ -1,12 +1,6 @@
 """Inferer for DeepSpeech2 model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import sys
-reload(sys)
-sys.setdefaultencoding('utf-8')
 import argparse
 import functools
 import paddle.fluid as fluid
@@ -104,7 +98,7 @@ def infer():
         init_from_pretrained_model=args.model_path)
     # decoders only accept string encoded in utf-8
-    vocab_list = [chars.encode("utf-8") for chars in data_generator.vocab_list]
+    vocab_list = [chars for chars in data_generator.vocab_list]
     if args.decoding_method == "ctc_greedy":
         ds2_model.logger.info("start inference ...")

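The deleted `reload(sys); sys.setdefaultencoding('utf-8')` was a common Python 2 workaround for implicit str/unicode coercion; Python 3 removes `setdefaultencoding` and fixes the default codec at UTF-8, so the hack goes away with nothing replacing it. (The retained comment about decoders accepting only UTF-8-encoded strings now describes the old behavior; the decoders receive plain `str`.) A quick confirmation:

```python
import sys

assert sys.getdefaultencoding() == "utf-8"  # fixed in Python 3, not settable
assert "中文".encode() == b"\xe4\xb8\xad\xe6\x96\x87"  # str is Unicode by default
```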
@@ -1,7 +1,4 @@
 """Contains DeepSpeech2 model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import sys
 import os
@@ -10,7 +7,6 @@ import logging
 import gzip
 import copy
 import inspect
-import cPickle as pickle
 import collections
 import multiprocessing
 import numpy as np
@@ -445,7 +441,7 @@ class DeepSpeech2Model(object):
         :param vocab_list: List of tokens in the vocabulary, for decoding.
         :type vocab_list: list
         :return: List of transcription texts.
-        :rtype: List of basestring
+        :rtype: List of str
         """
         results = []
         for i, probs in enumerate(probs_split):
@@ -466,7 +462,7 @@ class DeepSpeech2Model(object):
                          empty, the external scorer will be set to
                          None, and the decoding method will be pure
                          beam search without scorer.
-        :type language_model_path: basestring|None
+        :type language_model_path: str|None
         :param vocab_list: List of tokens in the vocabulary, for decoding.
         :type vocab_list: list
         """
@@ -513,7 +509,7 @@ class DeepSpeech2Model(object):
         :param num_processes: Number of processes (CPU) for decoder.
         :type num_processes: int
         :return: List of transcription texts.
-        :rtype: List of basestring
+        :rtype: List of str
         """
         if self._ext_scorer != None:
             self._ext_scorer.reset_params(beam_alpha, beam_beta)

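Two Python 3 cleanups in the model module: `cPickle` no longer exists (its C accelerator is folded into stdlib `pickle`), and `basestring` is gone, so the docstrings now advertise `str`. A minimal check of both:

```python
import pickle  # replaces: import cPickle as pickle

params = {"alpha": 2.5, "beta": 0.3}   # hypothetical scorer parameters
assert pickle.loads(pickle.dumps(params)) == params

# basestring was removed in Python 3; str covers all text:
assert isinstance("transcript", str)
```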
@@ -1,7 +1,3 @@
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import collections
 import paddle.fluid as fluid
 import numpy as np

@@ -2,7 +2,7 @@
 # install python dependencies
 if [ -f "requirements.txt" ]; then
-    pip install -r requirements.txt
+    pip3 install -r requirements.txt
 fi
 if [ $? != 0 ]; then
     echo "Install python dependencies failed !!!"
@@ -10,7 +10,7 @@ if [ $? != 0 ]; then
 fi
 # install package libsndfile
-python -c "import soundfile"
+python3 -c "import soundfile"
 if [ $? != 0 ]; then
     echo "Install package libsndfile into default system path."
     wget "http://www.mega-nerd.com/libsndfile/files/libsndfile-1.0.28.tar.gz"
@@ -27,7 +27,7 @@ if [ $? != 0 ]; then
 fi
 # install decoders
-python -c "import pkg_resources; pkg_resources.require(\"swig_decoders==1.1\")"
+python3 -c "import pkg_resources; pkg_resources.require(\"swig_decoders==1.1\")"
 if [ $? != 0 ]; then
     cd decoders/swig > /dev/null
     sh setup.sh

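The decoder check above runs `pkg_resources.require` through `python3 -c`; expanded into a standalone sketch (same version pin as the script; the print messages are assumptions, not from the repo):

```python
import pkg_resources

try:
    pkg_resources.require("swig_decoders==1.1")
    print("swig_decoders OK")
except (pkg_resources.DistributionNotFound, pkg_resources.VersionConflict):
    print("swig_decoders missing or wrong version; run decoders/swig/setup.sh")
```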
@@ -1,7 +1,4 @@
 """Evaluation for DeepSpeech2 model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import argparse
 import functools
@@ -99,7 +96,7 @@ def evaluate():
         init_from_pretrained_model=args.model_path)
     # decoders only accept string encoded in utf-8
-    vocab_list = [chars.encode("utf-8") for chars in data_generator.vocab_list]
+    vocab_list = [chars for chars in data_generator.vocab_list]
     if args.decoding_method == "ctc_beam_search":
         ds2_model.init_ext_scorer(args.alpha, args.beta, args.lang_model_path,

@@ -1,7 +1,4 @@
 """Set up paths for DS2"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import os.path
 import sys

@@ -2,9 +2,6 @@
 Each item in vocabulary file is a character.
 """
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import argparse
 import functools

@@ -1,7 +1,4 @@
 """Compute mean and std for feature normalizer, and save to file."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import argparse
 import functools

@@ -1,7 +1,4 @@
 """Beam search parameters tuning for DeepSpeech2 model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import sys
 import os
@@ -107,7 +104,7 @@ def tune():
         share_rnn_weights=args.share_rnn_weights)
     # decoders only accept string encoded in utf-8
-    vocab_list = [chars.encode("utf-8") for chars in data_generator.vocab_list]
+    vocab_list = [chars for chars in data_generator.vocab_list]
     errors_func = char_errors if args.error_rate_type == 'cer' else word_errors
     # create grid for search
     cand_alphas = np.linspace(args.alpha_from, args.alpha_to, args.num_alphas)

@@ -1,7 +1,4 @@
 """Trainer for DeepSpeech2 model."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import argparse
 import functools

@@ -1,10 +1,6 @@
-# -*- coding: utf-8 -*-
 """This module provides functions to calculate error rate in different level.
 e.g. wer for word-level, cer for char-level.
 """
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import numpy as np
@@ -61,9 +57,9 @@ def word_errors(reference, hypothesis, ignore_case=False, delimiter=' '):
     hypothesis sequence in word-level.
     :param reference: The reference sentence.
-    :type reference: basestring
+    :type reference: str
     :param hypothesis: The hypothesis sentence.
-    :type hypothesis: basestring
+    :type hypothesis: str
     :param ignore_case: Whether case-sensitive or not.
     :type ignore_case: bool
     :param delimiter: Delimiter of input sentences.
@@ -75,8 +71,8 @@ def word_errors(reference, hypothesis, ignore_case=False, delimiter=' '):
         reference = reference.lower()
         hypothesis = hypothesis.lower()
-    ref_words = filter(None, reference.split(delimiter))
-    hyp_words = filter(None, hypothesis.split(delimiter))
+    ref_words = list(filter(None, reference.split(delimiter)))
+    hyp_words = list(filter(None, hypothesis.split(delimiter)))
     edit_distance = _levenshtein_distance(ref_words, hyp_words)
     return float(edit_distance), len(ref_words)
@@ -87,9 +83,9 @@ def char_errors(reference, hypothesis, ignore_case=False, remove_space=False):
     hypothesis sequence in char-level.
     :param reference: The reference sentence.
-    :type reference: basestring
+    :type reference: str
     :param hypothesis: The hypothesis sentence.
-    :type hypothesis: basestring
+    :type hypothesis: str
     :param ignore_case: Whether case-sensitive or not.
     :type ignore_case: bool
     :param remove_space: Whether remove internal space characters
@@ -105,8 +101,8 @@ def char_errors(reference, hypothesis, ignore_case=False, remove_space=False):
     if remove_space == True:
         join_char = ''
-    reference = join_char.join(filter(None, reference.split(' ')))
-    hypothesis = join_char.join(filter(None, hypothesis.split(' ')))
+    reference = join_char.join(list(filter(None, reference.split(' '))))
+    hypothesis = join_char.join(list(filter(None, hypothesis.split(' '))))
     edit_distance = _levenshtein_distance(reference, hypothesis)
     return float(edit_distance), len(reference)
@@ -132,9 +128,9 @@ def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
     that empty items will be removed when splitting sentences by delimiter.
     :param reference: The reference sentence.
-    :type reference: basestring
+    :type reference: str
     :param hypothesis: The hypothesis sentence.
-    :type hypothesis: basestring
+    :type hypothesis: str
     :param ignore_case: Whether case-sensitive or not.
     :type ignore_case: bool
     :param delimiter: Delimiter of input sentences.
@@ -175,9 +171,9 @@ def cer(reference, hypothesis, ignore_case=False, remove_space=False):
     characters in a sentence will be replaced by one space character.
     :param reference: The reference sentence.
-    :type reference: basestring
+    :type reference: str
     :param hypothesis: The hypothesis sentence.
-    :type hypothesis: basestring
+    :type hypothesis: str
     :param ignore_case: Whether case-sensitive or not.
     :type ignore_case: bool
     :param remove_space: Whether remove internal space characters

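The `list(filter(...))` wrappers above are load-bearing: Python 3's `filter()` returns a lazy iterator, and `word_errors` goes on to call `len(ref_words)`, which an iterator does not support. A demonstration:

```python
words = filter(None, "the  quick  fox".split(" "))  # Python 3: a filter object
# len(words) would raise TypeError: object of type 'filter' has no len()
ref_words = list(filter(None, "the  quick  fox".split(" ")))
assert ref_words == ["the", "quick", "fox"]  # empty strings filtered out
assert len(ref_words) == 3
```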
@@ -1,8 +1,4 @@
-# -*- coding: utf-8 -*-
 """Test error rate."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import unittest
 from utils import error_rate

@@ -1,7 +1,4 @@
 """Contains common utility functions."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
 import distutils.util
@@ -22,7 +19,7 @@ def print_arguments(args):
     :type args: argparse.Namespace
     """
     print("----------- Configuration Arguments -----------")
-    for arg, value in sorted(vars(args).iteritems()):
+    for arg, value in sorted(vars(args).items()):
         print("%s: %s" % (arg, value))
     print("------------------------------------------------")

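`dict.iteritems()` was removed in Python 3; `items()` returns a view that sorts identically. Behavior check with a stand-in for the parsed arguments:

```python
import argparse

args = argparse.Namespace(batch_size=64, use_gpu=True)  # illustrative fields
for arg, value in sorted(vars(args).items()):  # .iteritems() -> .items()
    print("%s: %s" % (arg, value))
# batch_size: 64
# use_gpu: True
```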