SHELL:= /bin/bash
PYTHON:= python3.8

CXX ?= g++
CC ?= gcc # used for sph2pipe
# CXX = clang++ # Uncomment these lines...
# CC = clang # ...to build with Clang.

WGET ?= wget --no-check-certificate

.PHONY: all clean

all: apt.done kenlm.done mfa.done sctk.done

virtualenv.done:
	test -d venv || virtualenv -p $(PYTHON) venv
	touch virtualenv.done

clean:
	rm -fr venv
	find -iname "*.pyc" -delete
	rm -rf kenlm

apt.done:
	apt update -y
	apt install -y bc flac jq vim tig tree sox pkg-config libsndfile1 libflac-dev libogg-dev libvorbis-dev libboost-dev swig python3-dev
	echo "check_certificate = off" >> ~/.wgetrc
	touch apt.done

kenlm.done:
	# On Ubuntu 16.04, apt installs boost 1.58.0.
	# It seems that boost (1.54.0) requires a newer compiler version; after switching to g++-5, it compiles normally.
	apt install -y --allow-unauthenticated build-essential cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev
	apt-get install -y gcc-5 g++-5 && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50 && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
	test -d kenlm || $(WGET) -O - https://kheafield.com/code/kenlm.tar.gz | tar xz
	rm -rf kenlm/build && mkdir -p kenlm/build && cd kenlm/build && cmake .. && make -j4 && make install
	cd kenlm && python3 setup.py install
	touch kenlm.done

mfa.done:
	test -d montreal-forced-aligner || $(WGET) https://paddlespeech.bj.bcebos.com/Parakeet/montreal-forced-aligner_linux.tar.gz
	tar xvf montreal-forced-aligner_linux.tar.gz
	touch mfa.done

openblas.done:
	bash extras/install_openblas.sh
	touch openblas.done

kaldi.done: apt.done openblas.done
	bash extras/install_kaldi.sh
	touch kaldi.done

sctk.done:
	./extras/install_sclite.sh
	touch sctk.done

srilm.done:
	./extras/install_liblbfgs.sh
	extras/install_srilm.sh
	touch srilm.done

######################
dev: python conda_packages.done sctk.done

# Use pip for paddle installation even if you have anaconda
ifneq ($(shell test -f ./activate_python.sh && grep 'conda activate' ./activate_python.sh),)
USE_CONDA := 1
else
USE_CONDA :=
endif

python: activate_python.sh

activate_python.sh:
	test -f activate_python.sh || { echo "Error: Run ./setup_python.sh or ./setup_anaconda.sh"; exit 1; }

bc.done: activate_python.sh
	. ./activate_python.sh && { command -v bc || conda install -y bc -c conda-forge; }
	touch bc.done
cmake.done: activate_python.sh
	. ./activate_python.sh && { command -v cmake || conda install -y cmake; }
	touch cmake.done
flac.done: activate_python.sh
	. ./activate_python.sh && { command -v flac || conda install -y libflac -c conda-forge; }
	touch flac.done
ffmpeg.done: activate_python.sh
	. ./activate_python.sh && { command -v ffmpeg || conda install -y ffmpeg -c conda-forge; }
	touch ffmpeg.done
sox.done: activate_python.sh
	. ./activate_python.sh && { command -v sox || conda install -y sox -c conda-forge; }
	touch sox.done
sndfile.done: activate_python.sh
	. ./activate_python.sh && { python3 -c "from ctypes.util import find_library as F; assert F('sndfile') is not None" || conda install -y libsndfile=1.0.28 -c conda-forge; }
	touch sndfile.done
ifneq ($(strip $(USE_CONDA)),)
conda_packages.done: bc.done cmake.done flac.done ffmpeg.done sox.done sndfile.done
else
conda_packages.done:
endif
	touch conda_packages.done