PaddleSpeech/third_party/chinese_text_normalization/thrax/src
Hui Zhang 538bf271eb
chinese char/word ngram lm (#613)
3 years ago

Text normalization covering grammars

This repository provides covering grammars for English and Russian text normalization as documented in:

Gorman, K., and Sproat, R. 2016. Minimally supervised number normalization. Transactions of the Association for Computational Linguistics 4: 507-519.

Ng, A. H., Gorman, K., and Sproat, R. 2017. Minimally supervised written-to-spoken text normalization. In ASRU, pages 665-670.

If you use these grammars in a publication, we would appreciate it if you cited these works.

Building

The grammars are written in Thrax and compile into OpenFst FAR (FstARchive) files. To compile them, run make in the src/ directory.
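
The build step above can also be scripted. The following is a minimal sketch, not part of this repository: the helper names build_command and compile_grammars are hypothetical, and it assumes that make and the Thrax/OpenFst tooling used by the Makefile are already installed. Where the compiled FAR files end up depends on the Makefile, so the sketch simply searches the source directory for files ending in .far.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(src_dir, jobs=1):
    """Return the make invocation for src_dir (hypothetical helper).

    -C changes into the directory before building; -jN sets parallel jobs.
    """
    return ["make", "-C", str(src_dir), f"-j{jobs}"]


def compile_grammars(src_dir="src", jobs=1):
    """Run make in src_dir and return any *.far files found beneath it.

    Assumption: the Makefile writes FAR files somewhere under src_dir.
    """
    src = Path(src_dir)
    if not (src / "Makefile").is_file():
        raise FileNotFoundError(f"no Makefile found in {src}")
    if shutil.which("make") is None:
        raise RuntimeError("make is not available on PATH")
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(build_command(src, jobs), check=True)
    return sorted(src.rglob("*.far"))


# Example usage, run from the repository root:
#     for far_path in compile_grammars("src"):
#         print(far_path)
```

Checking for the Makefile first gives a clearer error than a failed make invocation when the script is run from the wrong directory.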

License

See LICENSE.

Mandatory disclaimer

This is not an official Google product.