Text normalization covering grammars
This repository provides covering grammars for English and Russian text normalization as documented in:
Gorman, K., and Sproat, R. 2016. Minimally supervised number normalization. Transactions of the Association for Computational Linguistics 4: 507-519.
Ng, A. H., Gorman, K., and Sproat, R. 2017. Minimally supervised written-to-spoken text normalization. In ASRU, pages 665-670.
If you use these grammars in a publication, we would appreciate it if you cited these works.
Building
The grammars are written in Thrax and compile into OpenFst FAR (FstARchive) files. To compile, simply run make in the src/ directory.
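The build step above can be wrapped in a small shell helper. This is a minimal sketch, not part of the repository: the function name `build_grammars` and its optional repository-root argument are hypothetical, and it assumes that `make` is on your `PATH` and that Thrax and OpenFst are already installed so the Makefile can find them.

```shell
# Hypothetical helper (not part of this repository): run `make` in src/.
# Usage: build_grammars [REPO_ROOT]   (REPO_ROOT defaults to the current directory)
build_grammars() {
  repo_root="${1:-.}"

  # Fail early with a clear message if make is not installed.
  if ! command -v make >/dev/null 2>&1; then
    echo "error: make not found on PATH" >&2
    return 1
  fi

  # The README says to run make inside src/.
  if [ ! -d "$repo_root/src" ]; then
    echo "error: no src/ directory under $repo_root" >&2
    return 1
  fi

  # Use a subshell so the caller's working directory is left unchanged.
  (cd "$repo_root/src" && make)
}
```

From a checkout of the repository, calling `build_grammars` with no arguments is equivalent to `cd src && make`. The compiled FAR files are written wherever the Makefile places them.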
License
See LICENSE.
Mandatory disclaimer
This is not an official Google product.