Chinese Text Frontend Example

This is an example of a Chinese text frontend, covering grapheme-to-phoneme conversion (G2P) and text normalization.

G2P

For G2P, we use the phone labels of BZNSYP as the ground truth, and we remove silence tokens from both the labels and the predicted phones.
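Removing silence tokens before scoring keeps pauses from counting as G2P errors. The sketch below shows the idea on a phone list. The token names in `SILENCE_TOKENS` and the sample phones are hypothetical placeholders, not taken from this example's scripts; the actual silence symbols depend on how the BZNSYP labels are parsed.

```python
# Hypothetical silence-token set; the real symbols depend on the label parser.
SILENCE_TOKENS = {"sil", "sp"}


def strip_silence(phones, silence_tokens=SILENCE_TOKENS):
    """Return the phone sequence with every silence token removed."""
    return [p for p in phones if p not in silence_tokens]


# Illustrative phone sequence (made up for this sketch).
label = ["sil", "k", "a2", "sp", "er2", "sil"]
print(strip_silence(label))  # ['k', 'a2', 'er2']
```

The same function would be applied to both the reference labels and the predicted phones so that the two sequences are compared on equal terms.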

Download BZNSYP from its official website and extract it. We assume the dataset is located at ~/datasets/BZNSYP.

We use word error rate (WER) as the evaluation metric.
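WER is the token-level edit distance (substitutions + deletions + insertions) divided by the number of reference tokens; for G2P, each phone plays the role of a "word". The following is a minimal sketch of that standard definition, not the code used by this example's scripts (which may rely on a library or on sclite); the phone strings are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Error rate = edit distance / reference length."""
    return edit_distance(ref, hyp) / len(ref)


ref = "k a2 er2 p u3".split()
hyp = "k a3 er2 p u3".split()  # one wrong tone
print(wer(ref, hyp))  # 0.2
```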

Text Normalization

For text normalization, the test data is data/textnorm_test_cases.txt, where | separates the raw text (raw_data) from the normalized text (normed_data).
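A minimal sketch of reading such test cases, assuming one case per non-empty line and splitting on the first | only (`maxsplit=1`). The function names and the sample line are hypothetical and only illustrate the raw_data|normed_data layout.

```python
def parse_textnorm_line(line):
    """Split one test case into (raw_data, normed_data) at the first '|'."""
    raw, normed = line.rstrip("\r\n").split("|", 1)
    return raw, normed


def load_textnorm_cases(path):
    """Load all non-empty lines of a test file as (raw, normed) pairs."""
    with open(path, encoding="utf-8") as f:
        return [parse_textnorm_line(line) for line in f if line.strip()]


# Illustrative line, not copied from the real test file.
print(parse_textnorm_line("共125人|共一百二十五人"))
# ('共125人', '共一百二十五人')
```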

We use character error rate (CER) as the evaluation metric.
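CER is the same edit-distance ratio computed over characters instead of tokens, which suits Chinese text that has no word delimiters. The sketch below follows the standard definition and is not taken from this example's scripts; the hypothesis string is a made-up normalizer output with two missing characters.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (free when characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edits / number of reference characters."""
    return levenshtein(ref, hyp) / len(ref)


ref = "共一百二十五人"  # normed_data
hyp = "共一二五人"      # hypothetical output missing 百 and 十
print(round(cer(ref, hyp), 4))  # 0.2857
```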

To get more detailed WER information with sclite, first build sclite by running the command below.

./make_sclite.sh

Run the command below to get the test results.

./run.sh

The average WER of G2P is 0.027495061517943988.

The average CER of text normalization is 0.006388318503308237.