([简体中文](./install_cn.md)|English)
# Installation
There are three ways to install `PaddleSpeech`. In order of difficulty, they are **Easy**, **Medium**, and **Hard**. Choose the one that matches the functions you need.

| Way | Function                                                     | Support|
|:---- |:----------------------------------------------------------- |:----|
| Easy     | (1) Use the command-line functions of PaddleSpeech. <br> (2) Experience PaddleSpeech on AI Studio. | Linux, Mac (M1 chip not supported), Windows (for more information about installation, see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195)) |
| Medium     | Supports the major functions, such as using the `ready-made` examples and using PaddleSpeech to train your model. | Linux, Mac (M1 chip not supported, training models not supported), Windows (training models not supported) |
| Hard     | Supports the full functionality of PaddleSpeech, including joint CTC decoding with Kaldi ([asr2](../../examples/librispeech/asr2)), training n-gram language models, Montreal-Forced-Aligner, and so on. This way also suits developers best. | Ubuntu |

## Prerequisites
- Python >= 3.7
- PaddlePaddle latest version (please refer to the [Installation Guide](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/index_en.html))
- C++ compilation environment
- Tip: On Linux and Mac, run the commands in this document with `bash`, not `sh`.
- Tip: We recommend installing `paddlepaddle` from https://mirror.baidu.com/pypi/simple and `paddlespeech` from https://pypi.tuna.tsinghua.edu.cn/simple.
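
The Python requirement above can be checked before installing anything. The following is a minimal sketch, not part of PaddleSpeech: the helper name `python_version_ok` is hypothetical, and the script assumes the interpreter is available as `python3` on your `PATH`.

```bash
# Hypothetical helper: succeeds when a "major.minor[.patch]" version string is >= 3.7.
python_version_ok() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # minor version, without any patch suffix
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 7 ]; }
}

# Ask the interpreter on PATH for its version, for example "3.8".
if command -v python3 >/dev/null 2>&1; then
    detected=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if python_version_ok "$detected"; then
        echo "Python $detected satisfies >= 3.7"
    else
        echo "Python $detected is too old; PaddleSpeech needs >= 3.7"
    fi
else
    echo "python3 not found on PATH"
fi
```

The comparison is done on the numeric parts rather than on the raw string, so that `3.10` is correctly treated as newer than `3.7`.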

## Easy: Get the Basic Function (Support Linux, Mac, and Windows)
- If you are new to `PaddleSpeech` and want to try it without using your own machine, we recommend [AI Studio](https://aistudio.baidu.com/aistudio/index). There is a step-by-step [tutorial](https://aistudio.baidu.com/aistudio/education/group/info/25130) for `PaddleSpeech`, and you can use the basic functions of `PaddleSpeech` on a free machine.
- If you want to use the command-line functions of `PaddleSpeech`, complete the following steps to install it. For more information about how to use the command line, see the [cli](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/cli).
### Install Conda
Conda is an environment management system. Go to [miniconda](https://docs.conda.io/en/latest/miniconda.html) and select a version (py>=3.7) to download and install conda.
Then install the conda dependencies for `paddlespeech`:

```bash
conda install -y -c conda-forge sox libsndfile bzip2
```
### Install C++ Compilation Environment 
(If you already have a C++ compilation environment, you can skip this step.)
#### Windows
You need to install `Visual Studio` to set up the C++ compilation environment.

https://visualstudio.microsoft.com/visual-cpp-build-tools/

You can also see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195) for more help.

#### Mac
```bash
brew install gcc
```
#### Linux
```bash
#  centos
sudo yum install gcc gcc-c++
```
```bash
# ubuntu
sudo apt install build-essential
```
```bash
# Others
conda install -y -c conda-forge gcc_linux-64=8.4.0 gxx_linux-64=8.4.0
```
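To see whether a C++ compiler is already available on Linux or Mac (and whether this step can be skipped), a quick check like the following may help. It is only a sketch: `find_cxx_compiler` is a hypothetical helper, and the candidate names `g++`, `c++`, and `clang++` are assumptions about how the compiler is exposed on your system.

```bash
# Hypothetical helper: print the first C++ compiler found on PATH,
# or return non-zero if none of the candidates is found.
find_cxx_compiler() {
    for candidate in g++ c++ clang++; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}

if cxx=$(find_cxx_compiler); then
    echo "C++ compiler found: $cxx"
else
    echo "No C++ compiler found; follow the steps for your platform above"
fi
```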
### Install PaddleSpeech 
Some users fail to install `kaldiio` from the default download source. If that happens, install `pytest-runner` first:
```bash
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Then you can use the following commands:
```bash
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
pip install paddlespeech -i https://pypi.tuna.tsinghua.edu.cn/simple
```
You can also specify the version of paddlepaddle or install the develop version. 
```bash
# install version 2.4.1. Note: 2.4.1 is just an example; choose a version that meets the minimum paddlepaddle dependency
pip install paddlepaddle==2.4.1 -i https://mirror.baidu.com/pypi/simple
# install develop version
pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/linux/cpu-mkl/develop.html
```
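After the install commands finish, you may want to confirm that both packages are visible to pip. The sketch below assumes `python3` is the interpreter you installed into. `report_package` is a hypothetical helper, while `paddle.utils.run_check()` is PaddlePaddle's own installation self-check.

```bash
# Hypothetical helper: report whether a pip package is installed
# for the python3 interpreter on PATH.
report_package() {
    if python3 -m pip show "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

report_package paddlepaddle
report_package paddlespeech

# Run PaddlePaddle's built-in self-check only if paddle can be imported.
if python3 -c "import paddle" >/dev/null 2>&1; then
    python3 -c "import paddle; paddle.utils.run_check()"
fi
```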
> If you have problems downloading **nltk_data** while using paddlespeech, it may be due to a poor network connection. We suggest you download the [nltk_data](https://paddlespeech.bj.bcebos.com/Parakeet/tools/nltk_data.tar.gz) that we provide and extract it to your `${HOME}`.

> If you fail to install paddlespeech-ctcdecoders, only deepspeech2 model inference is unavailable. Other models are not affected.
 
## Medium: Get the Major Functions (Support Linux; Mac and Windows Do Not Support Training)
If you want the major functions of `paddlespeech`, complete the following steps:
### Git clone PaddleSpeech
First, `git clone` this repository.
```bash
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
```

### Install Conda
Conda is an environment management system. Go to [miniconda](https://docs.conda.io/en/latest/miniconda.html) and select a version (py>=3.7). On Windows, follow the installation guide step by step. On Linux, you can use the following commands (the installer below is the Linux x86_64 build; on Mac, download the macOS installer from the same page instead):
```bash
# download the miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -P tools/
# install the miniconda
bash tools/Miniconda3-latest-Linux-x86_64.sh -b
# conda init
$HOME/miniconda3/bin/conda init
# start a new shell so that the conda init changes take effect
bash
```
Then you can create a conda virtual environment using the following command:
```bash
conda create -y -p tools/venv python=3.8
```
Activate the conda virtual environment:
```bash
conda activate tools/venv
```
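Before installing further packages, it can help to confirm that `tools/venv` is really the active environment. The sketch below relies on the `CONDA_PREFIX` variable that `conda activate` sets; `is_project_env` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds when the given conda prefix
# points at the tools/venv environment created above.
is_project_env() {
    case "$1" in
        */tools/venv) return 0 ;;
        *) return 1 ;;
    esac
}

if is_project_env "${CONDA_PREFIX:-}"; then
    echo "Active conda environment: $CONDA_PREFIX"
else
    echo "tools/venv is not active; run: conda activate tools/venv"
fi
```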
Install the conda dependencies for `paddlespeech`:
```bash
conda install -y -c conda-forge sox libsndfile swig bzip2
```
### Install C++ Compilation Environment 
(If you already have a C++ compilation environment, you can skip this step.)
Do not forget to install `gcc` and `g++` on your system.
You can use the commands below to install them.

```bash
#  centos
sudo yum install gcc gcc-c++
```
```bash
# ubuntu
sudo apt install build-essential
```
```bash
# Others
conda install -y -c conda-forge gcc_linux-64=8.4.0 gxx_linux-64=8.4.0
```
(Tip: Do not use the last command if you want to install the **Hard** way.)
### Install PaddlePaddle
Choose the `PaddlePaddle` version that matches your system. For example, for CUDA 10.2 and cuDNN 7.6, install paddlepaddle-gpu 2.4.1:
```bash
# Note: 2.4.1 is just an example; choose a version that meets the minimum paddlepaddle dependency
python3 -m pip install paddlepaddle-gpu==2.4.1 -i https://mirror.baidu.com/pypi/simple
```
You can also install the develop version of paddlepaddle. For example, for CUDA 10.2 and cuDNN 7.6, install the develop build of paddlepaddle-gpu:
```bash
python3 -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
```
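If you are unsure which CUDA version is installed, one way to find out is to read it from `nvcc --version`. This sketch assumes the CUDA toolkit's `nvcc` is on your `PATH` and that its output contains a `release X.Y` field; `cuda_release` is a hypothetical helper.

```bash
# Hypothetical helper: extract "major.minor" from `nvcc --version`
# output read on stdin; prints nothing if no release field is present.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit release: $(nvcc --version | cuda_release)"
else
    echo "nvcc not found; the CUDA toolkit may be missing or not on PATH"
fi
```

The reported release can then be compared against the CUDA version named in the install command you pick.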
### Install PaddleSpeech 
Install `paddlespeech` with the following commands. You can then use the `ready-made` examples in `paddlespeech`:
```bash
# Some users fail to install `kaldiio` from the default download source; if that happens, install `pytest-runner` first
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
# Make sure you are in the root directory of PaddleSpeech
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
```

## Hard: Get the Full Function (Support Ubuntu)
### Prerequisites
- Ubuntu >= 16.04.
- Choice 1: work in an `Ubuntu` Docker container.
- Choice 2: work on `Ubuntu` with `root` privilege.

To avoid the trouble of environment setup, running in a Docker container is highly recommended. Otherwise, if you work on `Ubuntu` with `root` privilege, you can still complete the installation.

### Choice 1: Running in Docker Container (Recommended)
Docker is an open-source tool to build, ship, and run distributed applications in an isolated environment. If you do not have a Docker environment, please refer to [Docker](https://www.docker.com/). If you plan to use the GPU version, you also need to install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker).

We provide docker images containing the latest PaddleSpeech code, and all environment and package dependencies are pre-installed. All you have to do is to **pull and run the docker image**. Then you can enjoy PaddleSpeech without any extra steps.

You can find these images and usage guidance on [Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddlespeech), including versions for CPU, GPU, and ROCm environments.

If you have customized requirements for building Docker images automatically, see the GitHub repo [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton).

### Choice 2: Running in Ubuntu with Root Privilege
- Install `build-essential` with apt
```bash
sudo apt install build-essential
```
- Clone this repository
```bash
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
# Enter the PaddleSpeech dir
cd PaddleSpeech
```
### Install the Conda
```bash
# download the miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -P tools/
# install the miniconda
bash tools/Miniconda3-latest-Linux-x86_64.sh -b
# conda init
$HOME/miniconda3/bin/conda init
# start a new bash shell so that the conda environment works
bash
# create a conda virtual environment
conda create -y -p tools/venv python=3.8
# Activate the conda virtual environment:
conda activate tools/venv
# Install the conda packages
conda install -y -c conda-forge sox libsndfile swig bzip2 libflac bc
```
### Install PaddlePaddle
Make sure you have a GPU and that the paddlepaddle version matches your setup. For example, for CUDA 10.2 and cuDNN 7.6, install paddlepaddle-gpu 2.4.1:
```bash
# Note: 2.4.1 is just an example; choose a version that meets the minimum paddlepaddle dependency
python3 -m pip install paddlepaddle-gpu==2.4.1 -i https://mirror.baidu.com/pypi/simple
```
You can also install the develop version of paddlepaddle. For example, for CUDA 10.2 and cuDNN 7.6, install the develop build of paddlepaddle-gpu:
```bash
python3 -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
```
### Install PaddleSpeech in Developing Mode
Some users fail to install `kaldiio` from the default download source. If that happens, install `pytest-runner` first:
```bash
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Then install PaddleSpeech:
```bash
pip install -e .[develop] -i https://pypi.tuna.tsinghua.edu.cn/simple
```
### Install Kaldi (Optional)
```bash
pushd tools
bash extras/install_openblas.sh
bash extras/install_kaldi.sh
popd
```
-												update readme, test=doc_fix (#1156)


											
										
										
											3 years ago
+								([简体中文](./install_cn.md)|English)
-												Support paddle 2.x (#538)

* 2.x model

* model test pass

* fix data

* fix soundfile with flac support

* one thread dataloader test pass

* export feasture size
add trainer and utils
add setup model and dataloader
update travis using Bionic dist

* add venv; test under venv

* fix unittest; train and valid

* add train and config

* add config and train script

* fix ctc cuda memcopy error

* fix imports

* fix train valid log

* fix dataset batch shuffle shift start from 1
fix rank_zero_only decreator error
close tensorboard when train over
add decoding config and code

* test process can run

* test with decoding

* test and infer with decoding

* fix infer

* fix ctc loss
lr schedule
sortagrad
logger

* aishell egs

* refactor train
add aishell egs

* fix dataset batch shuffle and add batch sampler log
print model parameter

* fix model and ctc

* sequence_mask make all inputs zeros, which cause grad be zero, this is a bug of LessThanOp
add grad clip by global norm
add model train test notebook

* ctc loss
remove run prefix
using ord value as text id

* using unk when training
compute_loss need text ids
ord id using in test mode, which compute wer/cer

* fix tester

* add lr_deacy
refactor code

* fix tools

* fix ci
add tune
fix gru model bugs
add dataset and model test

* fix decoding

* refactor repo
fix decoding

* fix musan and rir dataset

* refactor io, loss, conv, rnn, gradclip, model, utils

* fix ci and import

* refactor model
add export jit model

* add deploy bin and test it

* rm uselss egs

* add layer tools

* refactor socket server
new model from pretrain

* remve useless

* fix instability loss and grad nan or inf for librispeech training

* fix sampler

* fix libri train.sh

* fix doc

* add license on cpp

* fix doc

* fix libri script

* fix install

* clip 5 wer 7.39, clip 400 wer 7.54, 1.8 clip 400 baseline 7.49
											
										
										
											3 years ago
+								# Installation
-												update readme, test=doc_fix (#1156)


											
										
										
											3 years ago
+								There are 3 ways to use `PaddleSpeech`. According to the degree of difficulty, the 3 ways can be divided into **Easy**, **Medium**, and **Hard**. You can choose one of the 3 ways to install `PaddleSpeech`.
-												revise the install.md, setup.py and makefile, rm the setup.sh

											
										
										
											3 years ago
-												Update install.md (#1127)

test=doc_fix
											
										
										
											3 years ago
+								| Way | Function                                                     | Support|
-												update readme, test=doc_fix (#1156)


											
										
										
											3 years ago
+								|:---- |:----------------------------------------------------------- |:----|
-												test=doc
											
										
										
											2 years ago
+								| Easy     | (1) Use command-line functions of PaddleSpeech. <br> (2) Experience PaddleSpeech on Ai Studio. | Linux, Mac(not support M1 chip)，Windows ( For more information about installation, see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195)) |
-												test=doc
											
										
										
											2 years ago
+								| Medium     | Support major functions ，such as using the` ready-made `examples and using PaddleSpeech to train your model.                                           | Linux, Mac(not support M1 chip, not support training models)，Windows (not support training models) |
 								| Hard     | Support full function of Paddlespeech, including using join ctc decoder with kaldi([asr2](../../examples/librispeech/asr2 )), training n-gram language model, Montreal-Forced-Aligner, and so on. And you are more able to be a developer! | Ubuntu |
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
 								## Prerequisites
-												Support paddle 2.x (#538)

* 2.x model

* model test pass

* fix data

* fix soundfile with flac support

* one thread dataloader test pass

* export feasture size
add trainer and utils
add setup model and dataloader
update travis using Bionic dist

* add venv; test under venv

* fix unittest; train and valid

* add train and config

* add config and train script

* fix ctc cuda memcopy error

* fix imports

* fix train valid log

* fix dataset batch shuffle shift start from 1
fix rank_zero_only decreator error
close tensorboard when train over
add decoding config and code

* test process can run

* test with decoding

* test and infer with decoding

* fix infer

* fix ctc loss
lr schedule
sortagrad
logger

* aishell egs

* refactor train
add aishell egs

* fix dataset batch shuffle and add batch sampler log
print model parameter

* fix model and ctc

* sequence_mask make all inputs zeros, which cause grad be zero, this is a bug of LessThanOp
add grad clip by global norm
add model train test notebook

* ctc loss
remove run prefix
using ord value as text id

* using unk when training
compute_loss need text ids
ord id using in test mode, which compute wer/cer

* fix tester

* add lr_deacy
refactor code

* fix tools

* fix ci
add tune
fix gru model bugs
add dataset and model test

* fix decoding

* refactor repo
fix decoding

* fix musan and rir dataset

* refactor io, loss, conv, rnn, gradclip, model, utils

* fix ci and import

* refactor model
add export jit model

* add deploy bin and test it

* rm uselss egs

* add layer tools

* refactor socket server
new model from pretrain

* remve useless

* fix instability loss and grad nan or inf for librispeech training

* fix sampler

* fix libri train.sh

* fix doc

* add license on cpp

* fix doc

* fix libri script

* fix install

* clip 5 wer 7.39, clip 400 wer 7.54, 1.8 clip 400 baseline 7.49
											
										
										
											3 years ago
+								- Python >= 3.7
-												fix a markdown typo

Remove the space between `[…]` and `(…)`.
											
										
										
											2 years ago
+								- PaddlePaddle latest version (please refer to the [Installation Guide](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/index_en.html))
-												Update install.md (#1127)

test=doc_fix
											
										
										
											3 years ago
+								- C++ compilation environment
-												Update install.md (#2666)

Hip -> Tip
											
										
										
											2 years ago
+								- Tip: For Linux and Mac, do not use command `sh` instead of command `bash` in installation document.
 								- Tip: We recommand you to install `paddlepaddle` from https://mirror.baidu.com/pypi/simple and install `paddlespeech` from https://pypi.tuna.tsinghua.edu.cn/simple.
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
-												update readme, test=doc_fix (#1156)


											
										
										
											3 years ago
+								## Easy: Get the Basic Function (Support Linux, Mac, and Windows)
-												update readme and install, test=doc_fix (#1236)


											
										
										
											3 years ago
+								- If you are newer to `PaddleSpeech` and want to experience it easily without your machine. We recommend you to use [AI Studio](https://aistudio.baidu.com/aistudio/index) to experience it. There is a step-by-step [tutorial](https://aistudio.baidu.com/aistudio/education/group/info/25130) for `PaddleSpeech`, and you can use the basic function of `PaddleSpeech` with a free machine.
-												update readme, test=doc_fix (#1156)


											
										
										
											3 years ago
+								- If you want to use the command line function of Paddlespeech, you need to complete the following steps to install `PaddleSpeech`. For more information about how to use the command line function, you can see the [cli](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/cli).
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								### Install Conda
-												Update install.md (#1127)

test=doc_fix
											
										
										
											3 years ago
+								Conda is a management system of the environment. You can go to [minicoda](https://docs.conda.io/en/latest/miniconda.html)  (select a version py>=3.7) to download and install the conda.
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								And then Install  conda dependencies for `paddlespeech` :
-												revise the install.md, setup.py and makefile, rm the setup.sh

											
										
										
											3 years ago
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								```bash
-												Update install.md (#1127)

test=doc_fix
											
										
										
											3 years ago
+								conda install -y -c conda-forge sox libsndfile bzip2
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								```
-												Update install.md (#1127)

test=doc_fix
											
										
										
											3 years ago
+								### Install C++ Compilation Environment
 								(If you already have C++ compilation environment, you can miss this step.)
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								#### Windows
-												update readme, test=doc_fix (#1156)


											
										
										
											3 years ago
+								You need to install `Visual Studio` to make the C++ compilation environment.
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
-												install.md,test=doc_fix

											
										
										
											3 years ago
+								https://visualstudio.microsoft.com/visual-cpp-build-tools/
-												[Doc]improve install.md,test=doc_fix (#1200)

* [Doc]improve install.md,test=doc_fix

* test=doc_fix
											
										
										
											3 years ago
+								You can also see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195) for more help.
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								#### Mac
 								```bash
 								brew install gcc
 								```
 								#### Linux
 								```bash
 								#  centos
 								sudo yum install gcc gcc-c++
 								```
 								```bash
 								# ubuntu
 								sudo apt install build-essential
 								```
 								```bash
 								# Others
 								conda install -y -c gcc_linux-64=8.4.0 gxx_linux-64=8.4.0
 								```
 								### Install PaddleSpeech
-												Create install.md
											
										
										
											3 years ago
+								Some users may fail to install `kaldiio` due to the default download source, you can install `pytest-runner` at first；
 								```bash
 								pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
 								```
 								Then you can use the following commands:
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								```bash
-												update install introduction

											
										
										
											2 years ago
+								pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
-												[Install doc]  add the pypi-source (#1172)

* revise the doc,test=doc_fix

* test=doc_fix

* test=doc_fix

* Update install_cn.md

Co-authored-by: TianYuan <white-sky@qq.com>
											
										
										
											3 years ago
+								pip install paddlespeech -i https://pypi.tuna.tsinghua.edu.cn/simple
-												[install.md]Update install.md (#1123)

* Update install.md

* Update install.md

* Update install.md

* Update install.md
											
										
										
											3 years ago
+								```
-												update install introduction

											
										
										
											2 years ago
+								You can also specify the version of paddlepaddle or install the develop version.
 								```bash
-												update paddle from 2.4.0rc to 2.4.1 in readme (#2732)

* update paddle from 2.4.0rc to 2.4.1 in readme
											
										
										
											2 years ago
+								# install 2.4.1 version. Note, 2.4.1 is just an example, please follow the minimum dependency of paddlepaddle for your selection
 								pip install paddlepaddle==2.4.1 -i https://mirror.baidu.com/pypi/simple
-												update install introduction

											
										
										
											2 years ago
+								# install develop version
 								pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/linux/cpu-mkl/develop.html
 								```
-												update readme and install, test=doc_fix (#1236)


											
										
										
											3 years ago
+								> If you encounter problem with downloading **nltk_data** while using paddlespeech, it maybe due to your poor network, we suggest you download the [nltk_data](https://paddlespeech.bj.bcebos.com/Parakeet/tools/nltk_data.tar.gz) provided by us, and extract it to your `${HOME}`.
-												Create install.md
											
										
										
											3 years ago
-												test=doc
											
										
										
											2 years ago
+								> If you fail to install paddlespeech-ctcdecoders, you only can not use deepspeech2 model inference. For other models, it doesn't matter.
-												Create install.md
											
										
										
											3 years ago
-												test=doc
											
										
										
											2 years ago
+								## Medium: Get the Major Functions (Support Linux, mac and windows not support training)
-												reviset the install.md (#1165)


											
										
										
											3 years ago
+								If you want to get the major function of  `paddlespeech`, you need to do following steps:
 								### Git clone PaddleSpeech
 								You need to `git clone` this repository at first.
 								```bash
 								git clone https://github.com/PaddlePaddle/PaddleSpeech.git
 								cd PaddleSpeech
 								```
-												Support paddle 2.x (#538)

* 2.x model

* model test pass

* fix data

* fix soundfile with flac support

* one thread dataloader test pass

* export feasture size
add trainer and utils
add setup model and dataloader
update travis using Bionic dist

* add venv; test under venv

* fix unittest; train and valid

* add train and config

* add config and train script

* fix ctc cuda memcopy error

* fix imports

* fix train valid log

* fix dataset batch shuffle shift start from 1
fix rank_zero_only decreator error
close tensorboard when train over
add decoding config and code

* test process can run

* test with decoding

* test and infer with decoding

* fix infer

* fix ctc loss
lr schedule
sortagrad
logger

* aishell egs

* refactor train
add aishell egs

* fix dataset batch shuffle and add batch sampler log
print model parameter

* fix model and ctc

* sequence_mask make all inputs zeros, which cause grad be zero, this is a bug of LessThanOp
add grad clip by global norm
add model train test notebook

* ctc loss
remove run prefix
using ord value as text id

* using unk when training
compute_loss need text ids
ord id using in test mode, which compute wer/cer

* fix tester

* add lr_deacy
refactor code

* fix tools

* fix ci
add tune
fix gru model bugs
add dataset and model test

* fix decoding

* refactor repo
fix decoding

* fix musan and rir dataset

* refactor io, loss, conv, rnn, gradclip, model, utils

* fix ci and import

* refactor model
add export jit model

* add deploy bin and test it

* rm uselss egs

* add layer tools

* refactor socket server
new model from pretrain

* remve useless

* fix instability loss and grad nan or inf for librispeech training

* fix sampler

* fix libri train.sh

* fix doc

* add license on cpp

* fix doc

* fix libri script

* fix install

* clip 5 wer 7.39, clip 400 wer 7.54, 1.8 clip 400 baseline 7.49
											
										
										
											3 years ago
-												[DOCS] Correct the grammar and spelling mistakes of install.md (#1115)


											
										
										
											3 years ago
+								### Install Conda
-												test=doc
											
										
										
											2 years ago
+								Conda is a management system of the environment. You can go to [minicoda](https://docs.conda.io/en/latest/miniconda.html) to select a version (py>=3.7). For windows, you can follow the installing guide step by step and for linux and mac, you can use the following commands:
-												optimize the setup.py and setup.sh

											
										
										
											3 years ago
+								```bash
-												revise the install.md, setup.py and makefile, rm the setup.sh

											
										
										
											3 years ago
+								# download the miniconda
-												reviset the install.md (#1165)


											
										
										
											3 years ago
+								wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -P tools/
-												revise the install.md, setup.py and makefile, rm the setup.sh

											
										
										
											3 years ago
+								# install the miniconda
-												reviset the install.md (#1165)


											
										
										
											3 years ago
+								bash tools/Miniconda3-latest-Linux-x86_64.sh -b
-												revise the install.md, setup.py and makefile, rm the setup.sh

											
										
										
											3 years ago
+								# conda init
 								$HOME/miniconda3/bin/conda init
 								# activate the conda
 								bash
-												optimize the setup.py and setup.sh

											
										
										
											3 years ago
+								```
-												update readme, test=doc_fix (#1156)


											
										
										
											3 years ago
+								Then you can create a conda virtual environment using the following command:
-												revise the install.md, setup.py and makefile, rm the setup.sh

											
										
										
											3 years ago
+								```bash
-												update to py3.8

											
										
										
											1 year ago
+								conda create -y -p tools/venv python=3.8
-												optimize the setup.py and setup.sh

											
										
										
											3 years ago
+								```
-												revise the install.md, setup.py and makefile, rm the setup.sh

											
										
										
Activate the conda virtual environment:
```bash
conda activate tools/venv
```
Install the conda dependencies for `paddlespeech`:
```bash
conda install -y -c conda-forge sox libsndfile swig bzip2
```
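Before continuing, it can help to confirm that the environment created above is the one currently active. The following is a minimal sketch, assuming that `conda activate` exports the `CONDA_PREFIX` variable (standard conda behaviour). The helper name `is_env_active` is hypothetical and is not part of conda or PaddleSpeech:

```bash
# Hypothetical helper: succeed if the active conda prefix ends with the given path.
is_env_active() {
    case "${CONDA_PREFIX:-}" in
        */"$1") return 0 ;;
        *) return 1 ;;
    esac
}

if is_env_active "tools/venv"; then
    echo "tools/venv is active"
else
    echo "tools/venv is not active; run: conda activate tools/venv"
fi
```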
### Install C++ Compilation Environment
(If you already have a C++ compilation environment, you can skip this step.)
Do not forget to install `gcc` and `g++` (packaged as `gxx` in conda) on your system.
You can use the scripts below to install them.

```bash
# centos
sudo yum install gcc gcc-c++
```
```bash
# ubuntu
sudo apt install build-essential
```
```bash
# Others (-c expects a channel name; conda-forge is assumed here)
conda install -y -c conda-forge gcc_linux-64=8.4.0 gxx_linux-64=8.4.0
```
(Tip: Do not use the last script if you want to install the **Hard** way.)
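
Whichever script you use, you may want to confirm that the compilers are visible on your `PATH` before installing the Python packages. The following is a minimal sketch; the helper name `check_tool` is hypothetical, and it only checks that the commands exist, not that their versions are suitable:

```bash
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the C and C++ compilers; "|| true" keeps the loop going if one is missing.
for tool in gcc g++; do
    check_tool "$tool" || true
done
```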
### Install PaddlePaddle
You can choose the `PaddlePaddle` version based on your system. For example, for CUDA 10.2 and cuDNN 7.6, install paddlepaddle-gpu 2.4.1:
```bash
# Note: 2.4.1 is just an example; pick a version that satisfies the minimum paddlepaddle dependency for your setup
python3 -m pip install paddlepaddle-gpu==2.4.1 -i https://mirror.baidu.com/pypi/simple
```
You can also install the develop version of paddlepaddle. For example, for CUDA 10.2 and cuDNN 7.6, install the develop build of paddlepaddle-gpu:
```bash
python3 -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
```
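
After installing, you may want to check that `paddle` imports correctly and can see your device. The sketch below assumes `python3` is the interpreter used for the installation and relies on `paddle.utils.run_check()`, PaddlePaddle's built-in installation check. The wrapper name `verify_paddle` is hypothetical:

```bash
# Hypothetical helper: run Paddle's built-in installation check if paddle is importable.
verify_paddle() {
    if python3 -c "import paddle" >/dev/null 2>&1; then
        python3 -c "import paddle; print(paddle.__version__); paddle.utils.run_check()"
    else
        echo "paddle is not importable in this environment"
        return 1
    fi
}

verify_paddle || echo "PaddlePaddle check did not pass"
```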
### Install PaddleSpeech
You can install `paddlespeech` with the following commands; then you can use the `ready-made` examples in `paddlespeech`:
```bash
# Some users may fail to install `kaldiio` due to the default download source; if so, install `pytest-runner` first
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
# Make sure you are in the root directory of PaddleSpeech
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
```
## Hard: Get the Full Function (Support Ubuntu)
### Prerequisites
- Ubuntu >= 16.04.
- Choice 1: working in an `Ubuntu` Docker container.
- Choice 2: working on `Ubuntu` with `root` privilege.

To avoid the trouble of environment setup, running in a Docker container is highly recommended. Otherwise, if you work on `Ubuntu` with `root` privilege, you can still complete the installation.

### Choice 1: Running in Docker Container (Recommended)
Docker is an open-source tool to build, ship, and run distributed applications in an isolated environment. If you do not have a Docker environment, please refer to [Docker](https://www.docker.com/). If you plan to use the GPU version, you also need to install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker).

We provide docker images containing the latest PaddleSpeech code, with all environment and package dependencies pre-installed. All you have to do is **pull and run the docker image**. Then you can use PaddleSpeech without any extra steps.

Get these images and guidance on [docker hub](https://hub.docker.com/repository/docker/paddlecloud/paddlespeech), including CPU, GPU, and ROCm environment versions.
If you have custom requirements for building docker images automatically, see the GitHub repo [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton).

### Choice 2: Running in Ubuntu with Root Privilege
- Install `build-essential` by apt
```bash
sudo apt install build-essential
```
- Clone this repository
```bash
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
# Enter the PaddleSpeech dir
cd PaddleSpeech
```
### Install the Conda
```bash
# download the miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -P tools/
# install the miniconda
bash tools/Miniconda3-latest-Linux-x86_64.sh -b
# conda init
$HOME/miniconda3/bin/conda init
# start a new "bash" shell so that the conda initialization takes effect
bash
# create a conda virtual environment
conda create -y -p tools/venv python=3.8
# activate the conda virtual environment
conda activate tools/venv
# install the conda packages
conda install -y -c conda-forge sox libsndfile swig bzip2 libflac bc
```
### Install PaddlePaddle
Make sure you have a GPU and that the paddlepaddle version matches your system. For example, for CUDA 10.2 and cuDNN 7.6, install paddlepaddle-gpu 2.4.1:
```bash
# Note: 2.4.1 is just an example; pick a version that satisfies the minimum paddlepaddle dependency for your setup
python3 -m pip install paddlepaddle-gpu==2.4.1 -i https://mirror.baidu.com/pypi/simple
```
You can also install the develop version of paddlepaddle. For example, for CUDA 10.2 and cuDNN 7.6, install the develop build of paddlepaddle-gpu:
```bash
python3 -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
```
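
The `post102` suffix in the command above appears to encode the CUDA version (10.2 becomes 102). Assuming that convention holds for other CUDA versions, the version specifier could be derived as sketched below. The helper name `cuda_to_post_tag` is hypothetical, and whether a wheel actually exists for a given tag has to be checked on the wheel index page:

```bash
# Hypothetical helper: turn a CUDA version such as "10.2" into a suffix such as "post102".
cuda_to_post_tag() {
    echo "post$(echo "$1" | tr -d '.')"
}

CUDA_VERSION="10.2"   # adjust to your local CUDA version
# Print the package specifier to pass to "python3 -m pip install".
echo "paddlepaddle-gpu==0.0.0.$(cuda_to_post_tag "$CUDA_VERSION")"
```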
### Install PaddleSpeech in Developing Mode
Some users may fail to install `kaldiio` due to the default download source; if so, install `pytest-runner` first:
```bash
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Then install PaddleSpeech:
```bash
pip install -e .[develop] -i https://pypi.tuna.tsinghua.edu.cn/simple
```
### Install the Kaldi (Optional)
```bash
pushd tools
bash extras/install_openblas.sh
bash extras/install_kaldi.sh
popd
```