chinese char/word ngram lm (#613)

* add ngram lm egs
* add zhon repo
* install kenlm, zhon
* format
* add chinese_text_normalization repo
* add ngram lm egs
5 years ago · 538bf271eb
parent 2bdf4c946a
commit 538bf271eb
139 changed files with 25988 additions and 12 deletions
--- a/.gitignore
+++ b/.gitignore
@ -1,6 +1,5 @@
 .DS_Store
 *.pyc
 tools/venv
 .vscode
 *.log
 *.pdmodel
@ -10,3 +9,6 @@ tools/venv
 *.tar.gz
 .ipynb_checkpoints
 *.npz
 tools/venv
 tools/kenlm
--- a/README.md
+++ b/README.md
@ -52,4 +52,4 @@ DeepSpeech is provided under the [Apache-2.0 License](./LICENSE).
 ## Acknowledgement
-We depends on many open source repos. See [References](doc/src/reference.md) for more information.
+We depends on many open source repos. See [References](doc/src/reference.md) for more information.
--- a/README_cn.md
+++ b/README_cn.md
@ -50,4 +50,4 @@ DeepSpeech遵循[Apache-2.0开源协议](./LICENSE)。
 ## 感谢
-开发中参考一些优秀的仓库，详情参见 [References](doc/src/reference.md)。
+开发中参考一些优秀的仓库，详情参见 [References](doc/src/reference.md)。
--- a/examples/ngram_lm/.gitignore
+++ b/examples/ngram_lm/.gitignore
@ -0,0 +1 @@
 exp/
--- a/examples/ngram_lm/data/README.md
+++ b/examples/ngram_lm/data/README.md
@ -0,0 +1,2 @@
 text_correct.txt: https://github.com/shibing624/pycorrector/raw/master/tests/test_file.txt
 custom_confusion.txt: https://github.com/shibing624/pycorrector/raw/master/tests/custom_confusion.txt
--- a/examples/ngram_lm/data/custom_confusion.txt
+++ b/examples/ngram_lm/data/custom_confusion.txt
--- a/examples/ngram_lm/data/text_correct.txt
+++ b/examples/ngram_lm/data/text_correct.txt
@ -0,0 +1,220 @@
 少先队员因该为老人让坐
 祛痘印可以吗？有效果吗？
 不知这款牛奶口感怎样？ 小孩子喝行吗！
 是转基因油?
 我家宝宝13斤用多大码的
 会起坨吗？
 请问给送上楼吗？
 亲是送赁上门吗
 送货时候有外包装没有还是直接发货过来
 会不会有坏的？
 这个米煮粥好还煮饭好吃
 有送的马克杯吗？
 这纸尿裤分男孩女孩使用吗
 买的路由器老是断网，拔了跳过路由器就可以用了
 能泡开不？辣度几
 请问这个米蒸出来是一粒一粒的还是一坨一坨的？
 水和其他商品一样送货上门，还是自提呀？
 快两个月的孩子 要穿什么码的
 买回来会不会过期？
 洗的还干净把吧
 路由器怎么样啊，掉线严重吗？
 你好这米是五斤还是十斤
 收安费不
 给送开果器吗
 这纸好用吗？我看有不少的差评
 自用好用吗
 请问袜子穿久了会往下掉吗？
 每一卷是独立包装的吗？
 这个火龙果口味怎么样？甜不甜？
 买这个送红杯吗？
 一袋子多少斤
 这款拉拉裤有味道吗？超市买的没有味道，不知道这个怎么样
 我想问下拉拉裤上面那个贴的用来干嘛的，怎么用
 这里边有没有枣核
 玫瑰和薰衣草哪个好闻
 这个冰糖质量怎么样，有杂质吗
 倒水的时候漏吗
 请问大家，这个水壶烧出来的水有异味吗？因为给宝宝用所以很在意，谢谢大家
 这米煮出来糯吗？
 这在款子好用吗？有香味吗？
 到底是棉花的材质还是化纤的无纺布啊 求问？
 我用360手机能充电几次
 亲这纸好用吗？值得买吗？
 24瓶？还是12瓶
 是否是真的纸？
 适用机洗吗?
 好吃不好吃啊
 真的好用吗？我也想买 
 你们拿到是什么版本的
 这水和超市一样吗？质量保证吗？
 可以丢进马桶冲吗？
 纸会不会粗？
 这个翠的还不是不催的呀。。没有吃的那种不脆
 这个好用吗
 这纸有香味的吗？
 是最近的生产日期吗
 赠品是什么呀
 这是两瓶还是一瓶的价格？
 请问这是硬壳还是软壳？
 亲，苹果收到后有坏的吗？
 适合两人用吗
 这个直接喝好不好喝   还是要热一下
 纸有木有刺鼻气味？
 酸不酸？？？
 这啤好渴吗?
 跟安慕希哪个比较好喝？
 好用么，主要是带宝宝出去玩的时候用的多？
 刚出生的宝宝用什么码？
 能当洗手液吗？
 是不是很小包的那一种？50块有24包便宜的有点不敢相信
 好用吗，会不会起会不会起坨？
 这个口可以直接放饮水机上用吗？
 这种纸掉粉末吗
 手机好用吗？会卡吗
 开盖里面是拉环的吗？
 这个电池真的需要一直换吗？
 好用吗？是不是正品？
 请问有尿显吗
 容易发烫吗
 苹果有腊吗
 这油有这么好吗？不是过期的吧
 这个夏天用会不会红屁股？透气性好吗
 你好。 我想问下这个是尿不湿吗 ？
 这奶为啥这么便宜？
 你们买的酱油会没有颜色吗，像水一样，看着都没胃口
 这个是机诜，还是手洗
 这个卫生巾带香味吗？
 这种洗发水好用吗
 有餡嗎？好不好吃
 纸质不会好差吗？
 亲们，此米是真空包装吗？
 是软毛的吗？！！
 请问大家德运牌子的好喝还是安佳的？
 这纸好用吗，薄嘛
 这壶保温吗
 这个威露士货到了就是跟图片上的一样吗？只要是图片上显示的都有吗？
 你们买的牛奶是最近日期吗
 这个除菌液，是单独放在滚筒洗衣机除菌液格，还是与洗衣液混合放在洗衣液格？
 请问你们的三只松鼠寄回来的时候是用袋子装着的吗
 1kg是不是两斤？
 洗衣皂怎么样啊，味道重吗，用之后好不好清洗啊。
 我要请问你这个是不是那个拉拉裤吗？这个花纹是不是拉拉裤？
 好多人都说小米运动升级后手环就连不上了，你们有没有这种情况？
 这部手机运行速度快不快？
 新生儿可以用吗 抽一张会带出来很多张吗
 洗后有香味吗
 体验装有多少片
 银装怎么样？会漏尿吗？你们都是多久换一次的？？（我家大概2-3个小时左右，宝宝醒一回换一次）
 声音大吗？好用不？
 抽纸有味吗
 苹果好吃吗？打过蜡吗？是不是坏的很多？
 70g和80g得区别是啥？
 袋装的和瓶装的洗衣液是一样的么？
 噪音很大吗
 烧出来的水会不会很多一块一块的东西
 这个吹风真心好用吗？我今晚下单什么时候到
 请问各位宝妈 这个乳垫的背胶粘吗
 M号的你们给宝宝用到多大啊？几个月？我家宝宝3个月5㎏重，用花王的M号觉得小了。不知道这个怎么样？
 这个喝了能找到女朋友吗
 这袜子耐不耐穿
 请问好用么  是正品么
 怎么储藏 我买了两天在常温阴凉处放着下层有些化了 需要放冰箱冷冻吗
 这批苏打水是否有股消毒水的味道？
 质量怎么样，看到那么多差评，我不敢买了。
 会不会有烂的
 为什么我买的用完之后没香味
 甜吗？？？？
 我看到评论里的差评说大米里有虫，是真的吗？
 要放冰箱冷藏吗
 好不好吃啊
 这油怎么样   炒菜香不香
 这纸擦手时有屑吗？
 是正品的吗？
 好用吗
 这个特浓的苦不苦
 这个好用吗？
 米里真的有虫吗
 是金装的吗？
 双内胆有什么区别，两个一样的吗？
 请问这款水可以降尿酸吗？
 好用吗这个
 购物袋结实吗，能放重东西吗
 你好，请问这款可以剃头发刮光头吗
 这个纸巾质量如何？好用吗？
 好用吗？小孩子喜欢吗？
 亲。煮面时会糊锅不
 包邮吗运费多少
 会一抽就两三张一起抽起来吗？
 一箱几桶油呀
 这个吹风机分冷风和热风吗
 发什么快递呢
 请问一下，有些枸杞说是不要洗，你们的是否建议洗呢？
 请问纸有异味吗？我以前买过一箱就是这个居然有异味。
 这是6个么  怎么觉得有好多
 我买的荣耀10横滑home键进入后台这个操作成功率特别低，你们也是这样吗？
 你们的有塑料味吗，机械的
 小米路由器真心说的有这么差吗
 请问大家这款刮的干净吗？谢谢
 会有塑料味吗
 质量真的很差吗？不敢买
 这纸有气味吗
 我买两箱怎么要运费
 这个标准果好吃吗，酸不酸
 稀吗？是不是有种兑了水的感觉？
 威露士和滴露的消毒液哪个更好用呢？
 曰期是几月份的
 手机容易折弯吗？
 我家宝宝25斤XL会紧吗？
 这款200克一箱的纸张和10卷手提的价格相差那么多 质量一样吗？
 豆浆可以打吗
 电量有百分比吗
 用快递送过来瓶子会不会打破
 是三相电吗，有空调摇控器吧
 拿它送人，有问题吗？？
 安幕希好喝吗？
 这款纸尿裤好用吗？和尤妮佳比较哪个好用些？
 2层厚吗？是不是一到水就烂了
 为什么我宝宝拉粑粑后面总是漏出来我已经贴的很牢了，10斤的宝宝用S号也不小啊你们用了没这种情况吗？
 这个产品好用吗？
 刷毛柔软度咋样，这么便宜，会不会是很小个的
 会不会有过敏的情况呀
 请问是辣条吗
 这种米只能煮粥不能煮饭吗
 可以开袋即食吗？
 这米好吃吗？
 这个充电宝充满电需要多久
 这个奶开了可以保质喝两天吗
 这种薰衣草的洗衣液怎么样
 你们的小米六边框掉漆了吗？？？
 这个是机洗用还是手洗用的啊
 厚度怎么样、起球吗感谢大哥大姐们
 这个好喝还是康师傅红茶好喝
 这种洁面膏会不会过敏，我上次用的火山岩冰感洁面啫喱对那种过敏，但听别人说那种稀的本来就特别容易过敏，不知道这种洁面膏会不会过敏！
 这杯那么多差评，是真的吗，吓得我都不敢买了
 枣是免洗的吗？
 这个尿不湿尿过会起坨吗
 感觉和苏菲比哪个更好用呢？
 煮出来的饭香吗？
 你好！请问这个水壶烧水开了是自动切电吗？
 这个跟 原木纯品 那个啥区别？不是原木纸浆做的？
 能放冰箱吗
 纸有味道吗？
 2016全国高考卷答题模板
 2016全国大考卷答题模板
 2016全国低考卷答题模板
 床前明月光，疑是地上霜
 床前星星光，疑是地上霜
 床前白月光，疑是地上霜
 落霞与孤鹜齐飞，秋水共长天一色
 落霞与孤鹜齐跑，秋水共长天一色
 落霞与孤鹜双飞，秋水共长天一色
 众里寻他千百度，蓦然回首，那人却在，灯火阑珊处
 众里寻她千百度，蓦然回首，那人却在，灯火阑珊处
 众里寻ta千百度，蓦然回首，那人却在，灯火阑珊处
 吸烟的人容*得癌症
 就只听着我*妈所说的话，
 就接受环境污*用化肥和农药，
 是或者接受环境污染用化肥和农药，
 现在的香港比从前的*荣很多。
 现在的香港比*前的饭荣很多。
--- a/examples/ngram_lm/local/build_zh_lm.sh
+++ b/examples/ngram_lm/local/build_zh_lm.sh
@ -0,0 +1,37 @@
 #!/bin/bash
 set -e
 stage=0
 stop_stage=100
 order=5
 mem=80%
 prune=0
 a=22
 q=8
 b=8
 source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
 if [ $# != 3 ]; then
    echo "$0 token_type exp/text exp/text.arpa"
    echo $@
    exit 1
 fi
 # char or word
 type=$1
 text=$2
 arpa=$3
 if [ $stage -le 0 ] && [ $stop_stage -ge 0 ];then
    # text tn & wordseg preprocess
    echo "process text."
    python3 ${MAIN_ROOT}/utils/zh_tn.py ${type} ${text} ${text}.${type}.tn
 fi
 if [ $stage -le 1 ] && [ $stop_stage -ge 1 ];then
    # train ngram lm
    echo "build lm."
    bash ${MAIN_ROOT}/utils/ngram_train.sh --order ${order} --mem ${mem} --prune "${prune}" ${text}.${type}.tn ${arpa}
 fi
--- a/examples/ngram_lm/local/download_lm_zh.sh
+++ b/examples/ngram_lm/local/download_lm_zh.sh
@ -0,0 +1,21 @@
 #! /usr/bin/env bash
 . ${MAIN_ROOT}/utils/utility.sh
 DIR=data/lm
 mkdir -p ${DIR}
 URL='https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm'
 MD5="29e02312deb2e59b3c8686c7966d4fe3"
 TARGET=${DIR}/zh_giga.no_cna_cmn.prune01244.klm
 echo "Download language model ..."
 download $URL $MD5 $TARGET
 if [ $? -ne 0 ]; then
    echo "Fail to download the language model!"
    exit 1
 fi
 exit 0
--- a/examples/ngram_lm/local/kenlm_score_test.py
+++ b/examples/ngram_lm/local/kenlm_score_test.py
@ -0,0 +1,187 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import sys
 import time
 import jieba
 import kenlm
 language_model_path = sys.argv[1]
 assert os.path.exists(language_model_path)
 start = time.time()
 model = kenlm.Model(language_model_path)
 print(f"load kenLM cost: {time.time() - start}s")
 sentence = '盘点不怕被税的海淘网站❗️海淘向来便宜又保真！'
 sentence_char_split = ' '.join(list(sentence))
 sentence_word_split = ' '.join(jieba.lcut(sentence))
 def test_score():
    print('Loaded language model: %s' % language_model_path)
    print(sentence)
    print(model.score(sentence))
    print(list(model.full_scores(sentence)))
    for i, v in enumerate(model.full_scores(sentence)):
        print(i, v)
    print(sentence_char_split)
    print(model.score(sentence_char_split))
    print(list(model.full_scores(sentence_char_split)))
    split_size = 0
    for i, v in enumerate(model.full_scores(sentence_char_split)):
        print(i, v)
        split_size += 1
    assert split_size == len(
        sentence_char_split.split()) + 1, "error split size."
    print(sentence_word_split)
    print(model.score(sentence_word_split))
    print(list(model.full_scores(sentence_word_split)))
    for i, v in enumerate(model.full_scores(sentence_word_split)):
        print(i, v)
 def test_full_scores_chars():
    print('Loaded language model: %s' % language_model_path)
    print(sentence_char_split)
    # Show scores and n-gram matches
    words = ['<s>'] + list(sentence) + ['</s>']
    for i, (prob, length,
            oov) in enumerate(model.full_scores(sentence_char_split)):
        print('{0} {1}: {2}'.format(prob, length, ' '.join(words[i + 2 - length:
                                                                 i + 2])))
        if oov:
            print('\t"{0}" is an OOV'.format(words[i + 1]))
    print("-" * 42)
    # Find out-of-vocabulary words
    oov = []
    for w in words:
        if w not in model:
            print('"{0}" is an OOV'.format(w))
            oov.append(w)
    assert oov == ["❗", "️", "！"], 'error oov'
 def test_full_scores_words():
    print('Loaded language model: %s' % language_model_path)
    print(sentence_word_split)
    # Show scores and n-gram matches
    words = ['<s>'] + sentence_word_split.split() + ['</s>']
    for i, (prob, length,
            oov) in enumerate(model.full_scores(sentence_word_split)):
        print('{0} {1}: {2}'.format(prob, length, ' '.join(words[i + 2 - length:
                                                                 i + 2])))
        if oov:
            print('\t"{0}" is an OOV'.format(words[i + 1]))
    print("-" * 42)
    # Find out-of-vocabulary words
    oov = []
    for w in words:
        if w not in model:
            print('"{0}" is an OOV'.format(w))
            oov.append(w)
    # zh_giga.no_cna_cmn.prune01244.klm is a Chinese character LM
    assert oov == ["盘点", "不怕", "网站", "❗", "️", "海淘", "向来", "便宜", "保真",
                   "！"], 'error oov'
 def test_full_scores_chars_length():
    """test bos eos size"""
    print('Loaded language model: %s' % language_model_path)
    r = list(model.full_scores(sentence_char_split))
    n = list(model.full_scores(sentence_char_split, bos=False, eos=False))
    print(r)
    print(n)
    assert len(r) == len(n) + 1
    # bos=False, eos=False, input len == output len
    print(len(n), len(sentence_char_split.split()))
    assert len(n) == len(sentence_char_split.split())
    k = list(model.full_scores(sentence_char_split, bos=False, eos=True))
    print(k, len(k))
 def test_ppl_sentence():
    """Test sentence-level perplexity (PPL) scores."""
    sentence_char_split1 = ' '.join('先救挨饿的人，然后治疗病人。')
    sentence_char_split2 = ' '.join('先就挨饿的人，然后治疗病人。')
    n = model.perplexity(sentence_char_split1)
    print('1', n)
    n = model.perplexity(sentence_char_split2)
    print(n)
    part_char_split1 = ' '.join('先救挨饿的人')
    part_char_split2 = ' '.join('先就挨饿的人')
    n = model.perplexity(part_char_split1)
    print('2', n)
    n = model.perplexity(part_char_split2)
    print(n)
    part_char_split1 = '先救挨'
    part_char_split2 = '先就挨'
    n1 = model.perplexity(part_char_split1)
    print('3', n1)
    n2 = model.perplexity(part_char_split2)
    print(n2)
    assert n1 == n2
    part_char_split1 = '先 救 挨'
    part_char_split2 = '先 就 挨'
    n1 = model.perplexity(part_char_split1)
    print('4', n1)
    n2 = model.perplexity(part_char_split2)
    print(n2)
    part_char_split1 = '先 救 挨 饿 的 人'
    part_char_split2 = '先 就 挨 饿 的 人'
    n1 = model.perplexity(part_char_split1)
    print('5', n1)
    n2 = model.perplexity(part_char_split2)
    print(n2)
    part_char_split1 = '先 救 挨 饿 的 人 ，'
    part_char_split2 = '先 就 挨 饿 的 人 ，'
    n1 = model.perplexity(part_char_split1)
    print('6', n1)
    n2 = model.perplexity(part_char_split2)
    print(n2)
    part_char_split1 = '先 救 挨 饿 的 人 ， 然 后 治 疗 病 人'
    part_char_split2 = '先 就 挨 饿 的 人 ， 然 后 治 疗 病 人'
    n1 = model.perplexity(part_char_split1)
    print('7', n1)
    n2 = model.perplexity(part_char_split2)
    print(n2)
    part_char_split1 = '先 救 挨 饿 的 人 ， 然 后 治 疗 病 人 。'
    part_char_split2 = '先 就 挨 饿 的 人 ， 然 后 治 疗 病 人 。'
    n1 = model.perplexity(part_char_split1)
    print('8', n1)
    n2 = model.perplexity(part_char_split2)
    print(n2)
 if __name__ == '__main__':
    test_score()
    test_full_scores_chars()
    test_full_scores_words()
    test_full_scores_chars_length()
    test_ppl_sentence()
--- a/examples/ngram_lm/path.sh
+++ b/examples/ngram_lm/path.sh
@ -0,0 +1,10 @@
 export MAIN_ROOT=${PWD}/../../
 export PATH=${MAIN_ROOT}:${MAIN_ROOT}/utils:${PATH}
 export LC_ALL=C
 # Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
 export PYTHONIOENCODING=UTF-8
 export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}
 export LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}
--- a/examples/ngram_lm/requirements.txt
+++ b/examples/ngram_lm/requirements.txt
@ -0,0 +1 @@
 jieba>=0.39
--- a/examples/ngram_lm/run.sh
+++ b/examples/ngram_lm/run.sh
@ -0,0 +1,57 @@
 #!/bin/bash
 set -e
 source path.sh
 stage=0
 stop_stage=100
 source ${MAIN_ROOT}/utils/parse_options.sh || exit -1
 python3 -c 'import kenlm;' || { echo "kenlm package is not installed!"; exit -1; }
 if [ $stage -le 0 ] && [ $stop_stage -ge 0 ];then
    # case 1, test kenlm
    # download language model
    bash local/download_lm_zh.sh
    if [ $? -ne 0 ]; then
       exit 1
    fi
    # test kenlm `score` and `full_scores`
    python local/kenlm_score_test.py data/lm/zh_giga.no_cna_cmn.prune01244.klm
 fi
 mkdir -p exp
 cp data/text_correct.txt exp/text
 if [ $stage -le 1 ] && [ $stop_stage -ge 1 ];then
    # case 2, chinese character ngram lm build
    # output: xxx.arpa xxx.kenlm.bin
    input=exp/text
    token_type=char
    lang=zh
    order=5
    prune="0 1 2 4 4"
    a=22
    q=8
    b=8
    output=${input}_${lang}_${token_type}_o${order}_p${prune// /_}_a${a}_q${q}_b${b}.arpa
    echo "build ${token_type} lm."
    bash local/build_zh_lm.sh --order ${order} --prune "${prune}" --a ${a} --q ${q} --b ${b} ${token_type} ${input} ${output}
 fi
 if [ $stage -le 2 ] && [ $stop_stage -ge 2 ];then
    # case 3, chinese word ngram lm build
    # output: xxx.arpa xxx.kenlm.bin
    input=exp/text
    token_type=word
    lang=zh
    order=3
    prune="0 0 0"
    a=22
    q=8
    b=8
    output=${input}_${lang}_${token_type}_o${order}_p${prune// /_}_a${a}_q${q}_b${b}.arpa
    echo "build ${token_type} lm."
    bash local/build_zh_lm.sh --order ${order} --prune "${prune}" --a ${a} --q ${q} --b ${b} ${token_type} ${input} ${output}
 fi
--- a/setup.sh
+++ b/setup.sh
@ -57,11 +57,11 @@ if [ $? != 0 ]; then
 fi
-# install kaldi-comptiable feature 
+# install third_party
-pushd third_party/python_kaldi_features/
+pushd third_party
-python setup.py install
+bash install.sh
 if [ $? != 0 ]; then
-   error_msg "Please check why kaldi feature install error!"
+   error_msg "Please check why third_party install error!"
   exit -1
 fi
 popd
--- a/third_party/README.md
+++ b/third_party/README.md
@ -1,8 +1,20 @@
 * [python_kaldi_features](https://github.com/ZitengWang/python_kaldi_features)  
 commit: fc1bd6240c2008412ab64dc25045cd872f5e126c  
 ref: https://zhuanlan.zhihu.com/p/55371926  
 licence: MIT
 * [python-pinyin](https://github.com/mozillazg/python-pinyin.git)
-  commit: 55e524aa1b7b8eec3d15c5306043c6cdd5938b03
+commit: 55e524aa1b7b8eec3d15c5306043c6cdd5938b03
-  licence: MIT
+licence: MIT
 * [zhon](https://github.com/tsroten/zhon)
 commit: 09bf543696277f71de502506984661a60d24494c
 licence: MIT
 * [pymmseg-cpp](https://github.com/pluskid/pymmseg-cpp.git)
 commit: b76465045717fbb4f118c4fbdd24ce93bab10a6d
 licence: MIT
 * [chinese_text_normalization](https://github.com/speechio/chinese_text_normalization.git)
 commit: 9e92c7bf2d6b5a7974305406d8e240045beac51c
 licence: MIT
--- a/third_party/chinese_text_normalization/.gitignore
+++ b/third_party/chinese_text_normalization/.gitignore
@ -0,0 +1,2 @@
 *~
 *.far
--- a/third_party/chinese_text_normalization/LICENSE
+++ b/third_party/chinese_text_normalization/LICENSE
@ -0,0 +1,21 @@
 MIT License
 Copyright (c) 2020 SpeechIO
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/third_party/chinese_text_normalization/README.md
+++ b/third_party/chinese_text_normalization/README.md
@ -0,0 +1,112 @@
 # Chinese Text Normalization for Speech Processing
 ## Problem
 Search for "Text Normalization" (TN) on Google and GitHub, and you can hardly find open-source projects that are "ready-to-use" for text normalization tasks. Instead, you find a bunch of NLP toolkits or frameworks that *support* TN functionality.  There is quite some work between "support text normalization" and "do text normalization".
 ## Reason
 * TN is language-dependent, more or less.
    Some TN processing methods are shared across languages, but a good TN module always involves language-specific knowledge and treatment.
 * TN is task-specific.
    Even for the same language, different applications require quite different TN.
 * TN is "dirty".
    Constructing and maintaining a set of TN rewrite rules is painful, whatever toolkits and frameworks you choose.  Subtle and intrinsic complexities hide inside the TN task itself, not in tools or frameworks.
 * A mature TN module is an asset.
    Since constructing and maintaining TN is hard, it is actually an asset for commercial companies, hence it is unlikely that you will find a production-level TN module in the open-source community (correct me if you find any).
 * TN is a less prominent topic in both academia and industry.
 ## Goal
 This project sets up a ready-to-use TN module for **Chinese**. Since my background is **speech processing**, this project should be able to handle most common TN tasks, in **Chinese ASR** text processing pipelines.
 ## Normalizers
 1. supported NSW (Non-Standard-Word) Normalization
    |NSW type|raw|normalized|
    |-|-|-|
    |cardinal|这块黄金重达324.75克|这块黄金重达三百二十四点七五克|
    |date|她出生于86年8月18日，她弟弟出生于1995年3月1日|她出生于八六年八月十八日 她弟弟出生于一九九五年三月一日|
    |digit|电影中梁朝伟扮演的陈永仁的编号27149|电影中梁朝伟扮演的陈永仁的编号二七一四九|
    |fraction|现场有7/12的观众投出了赞成票|现场有十二分之七的观众投出了赞成票|
    |money|随便来几个价格12块5，34.5元，20.1万|随便来几个价格十二块五 三十四点五元 二十点一万|
    |percentage|明天有62％的概率降雨|明天有百分之六十二的概率降雨|
    |telephone|这是固话0421-33441122<br>这是手机+86 18544139121|这是固话零四二一三三四四一一二二<br>这是手机八六一八五四四一三九一二一|
    acknowledgement: the NSW normalization code is based on [Zhiyang Zhou's work here](https://github.com/Joee1995/chn_text_norm.git)
 1. punctuation removal
    For Chinese, it removes the punctuation collected in the [Zhon](https://github.com/tsroten/zhon) project, which contains
    * non-stop puncs
        ```
        '＂＃＄％＆＇（）＊＋，－／：；＜＝＞＠［＼］＾＿｀｛｜｝～｟｠｢｣､、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏'
        ```
    * stop puncs
        ```
        '！？｡。'
        ```
    For English, it removes Python's `string.punctuation`
 1. multilingual English word upper/lower case conversion
    Since ASR/TTS lexicons usually unify English entries to uppercase or lowercase, the TN module should adapt to the lexicon accordingly.
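
The punctuation removal and case conversion steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in `cn_tn.py`: the function name `strip_punctuation_and_case` and its `to_upper` parameter are hypothetical, and the non-stop punctuation string is an abbreviated subset of the Zhon list rather than the full set.

```python
import string

# Stop punctuation as quoted in the list above.
CHINESE_PUNC_STOP = '！？｡。'
# Assumption: abbreviated subset of the non-stop list, for illustration only.
CHINESE_PUNC_NON_STOP_SUBSET = '，、；：（）《》【】“”‘’…—'

# Translation table that deletes Chinese and English punctuation.
_DELETE_PUNC = str.maketrans(
    '', '',
    CHINESE_PUNC_STOP + CHINESE_PUNC_NON_STOP_SUBSET + string.punctuation)


def strip_punctuation_and_case(text, to_upper=True):
    """Remove punctuation, then unify the case of any English words."""
    text = text.translate(_DELETE_PUNC)
    return text.upper() if to_upper else text.lower()


print(strip_punctuation_and_case('你好，world！'))
```

Whether to fold English to uppercase or lowercase depends on the lexicon in use, which is why the sketch exposes it as a parameter.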
 ## Supported text format
 1. plain text, preferably one sentence per line (most common case in ASR processing).
    ```
    今天早饭吃了没
    没吃回家吃去吧
    ...
    ```
    Plain text is the default format.
 2. Kaldi's transcription format
    ```
    KALDI_KEY_UTT001    今天早饭吃了没
    KALDI_KEY_UTT002    没吃回家吃去吧
    ...
    ```
    TN will skip the first-column key and normalize the transcription text that follows.
    Pass the `--has_key` option to switch to Kaldi format.
 _note: All input text should be UTF-8 encoded._
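
The difference between the two input formats can be sketched as follows. The helper `normalize_line` and the stand-in normalizer `drop_question_mark` are hypothetical names used for illustration; the real script selects the Kaldi behaviour with `--has_key`, and its internals may differ.

```python
def normalize_line(line, normalize, has_key=False):
    """Apply `normalize` to a line, leaving a leading Kaldi key untouched."""
    line = line.rstrip('\n')
    if not has_key:
        # Plain text: the whole line is transcription text.
        return normalize(line)
    # Kaldi format: split once on whitespace; the first column is the key.
    parts = line.split(None, 1)
    key = parts[0] if parts else ''
    text = parts[1] if len(parts) > 1 else ''
    return key + '\t' + normalize(text)


def drop_question_mark(text):
    """Stand-in normalizer for demonstration only."""
    return text.replace('？', '')


print(normalize_line('KALDI_KEY_UTT001    今天早饭吃了没？',
                     drop_question_mark, has_key=True))
```

Splitting only once keeps any whitespace inside the transcription intact, so the key column never reaches the normalizer.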
 ## Run examples
 * TN (python)
 Make sure you have **python3**; python2.X won't work correctly.
 Run `sh run.sh` in the `TN` dir, and compare the raw text with the normalized text.
 * ITN (thrax)
 Make sure you have **thrax** installed, and that your PATH can find the thrax binaries.
 Run `sh run.sh` in the `ITN` dir. Check the Makefile for grammar dependencies.
 ## possible future work
 Since TN is a typical "done is better than perfect" module in context of ASR, and the current state is sufficient for my purpose, I probably won't update this repo frequently.
 There are indeed some things that need to be improved:
 * For TN, the NSW normalizers in the TN dir are based on regular expressions. I've found some unintended matches; those patterns need to be refined for more precise TN coverage.
 * For ITN, extend the thrax rewriting grammars to cover more scenarios.
 * Furthermore, commercial systems are starting to introduce RNN-like models into TN, and a mix of rule-based and model-based systems is state-of-the-art.  For further reading, look for Richard Sproat and Kyle Gorman's work at Google.
 END
--- a/third_party/chinese_text_normalization/python/cn_tn.py
+++ b/third_party/chinese_text_normalization/python/cn_tn.py
@ -0,0 +1,794 @@
 #!/usr/bin/env python3
 # coding=utf-8
 # Authors:
 #   2019.5 Zhiyang Zhou (https://github.com/Joee1995/chn_text_norm.git)
 #   2019.9 Jiayu DU
 #
 # requirements:
 #   - python 3.X
 # notes: python 2.X WILL fail or produce misleading results
 import sys, os, argparse, codecs, string, re
 # ================================================================================ #
 #                                    basic constant
 # ================================================================================ #
 CHINESE_DIGIS = u'零一二三四五六七八九'
 BIG_CHINESE_DIGIS_SIMPLIFIED = u'零壹贰叁肆伍陆柒捌玖'
 BIG_CHINESE_DIGIS_TRADITIONAL = u'零壹貳參肆伍陸柒捌玖'
 SMALLER_BIG_CHINESE_UNITS_SIMPLIFIED = u'十百千万'
 SMALLER_BIG_CHINESE_UNITS_TRADITIONAL = u'拾佰仟萬'
 LARGER_CHINESE_NUMERING_UNITS_SIMPLIFIED = u'亿兆京垓秭穰沟涧正载'
 LARGER_CHINESE_NUMERING_UNITS_TRADITIONAL = u'億兆京垓秭穰溝澗正載'
 SMALLER_CHINESE_NUMERING_UNITS_SIMPLIFIED = u'十百千万'
 SMALLER_CHINESE_NUMERING_UNITS_TRADITIONAL = u'拾佰仟萬'
 ZERO_ALT = u'〇'
 ONE_ALT = u'幺'
 TWO_ALTS = [u'两', u'兩']
 POSITIVE = [u'正', u'正']
 NEGATIVE = [u'负', u'負']
 POINT = [u'点', u'點']
 # PLUS = [u'加', u'加']
 # SIL = [u'杠', u'槓']
 # Chinese numbering system types
 NUMBERING_TYPES = ['low', 'mid', 'high']
 CURRENCY_NAMES = '(人民币|美元|日元|英镑|欧元|马克|法郎|加拿大元|澳元|港币|先令|芬兰马克|爱尔兰镑|' \
                 '里拉|荷兰盾|埃斯库多|比塞塔|印尼盾|林吉特|新西兰元|比索|卢布|新加坡元|韩元|泰铢)'
 CURRENCY_UNITS = '((亿|千万|百万|万|千|百)|(亿|千万|百万|万|千|百|)元|(亿|千万|百万|万|千|百|)块|角|毛|分)'
 COM_QUANTIFIERS = '(匹|张|座|回|场|尾|条|个|首|阙|阵|网|炮|顶|丘|棵|只|支|袭|辆|挑|担|颗|壳|窠|曲|墙|群|腔|' \
                  '砣|座|客|贯|扎|捆|刀|令|打|手|罗|坡|山|岭|江|溪|钟|队|单|双|对|出|口|头|脚|板|跳|枝|件|贴|' \
                  '针|线|管|名|位|身|堂|课|本|页|家|户|层|丝|毫|厘|分|钱|两|斤|担|铢|石|钧|锱|忽|(千|毫|微)克|' \
                  '毫|厘|分|寸|尺|丈|里|寻|常|铺|程|(千|分|厘|毫|微)米|撮|勺|合|升|斗|石|盘|碗|碟|叠|桶|笼|盆|' \
                  '盒|杯|钟|斛|锅|簋|篮|盘|桶|罐|瓶|壶|卮|盏|箩|箱|煲|啖|袋|钵|年|月|日|季|刻|时|周|天|秒|分|旬|' \
                  '纪|岁|世|更|夜|春|夏|秋|冬|代|伏|辈|丸|泡|粒|颗|幢|堆|条|根|支|道|面|片|张|颗|块)'
 # punctuation information are based on Zhon project (https://github.com/tsroten/zhon.git)
 CHINESE_PUNC_STOP = '！？｡。'
 CHINESE_PUNC_NON_STOP = '＂＃＄％＆＇（）＊＋，－／：；＜＝＞＠［＼］＾＿｀｛｜｝～｟｠｢｣､、〃《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏'
 CHINESE_PUNC_OTHER = '·〈〉-'
 CHINESE_PUNC_LIST = CHINESE_PUNC_STOP + CHINESE_PUNC_NON_STOP + CHINESE_PUNC_OTHER
 # ================================================================================ #
 #                                    basic class
 # ================================================================================ #
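 class ChineseChar(object):
    """
    Chinese character.
    Each character has a simplified and a traditional form,
    e.g. simplified = '负', traditional = '負'.
    It can be rendered in either form on conversion.
    """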
    def __init__(self, simplified, traditional):
        self.simplified = simplified
        self.traditional = traditional
        #self.__repr__ = self.__str__
    def __str__(self):
        return self.simplified or self.traditional or None
    def __repr__(self):
        return self.__str__()
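 class ChineseNumberUnit(ChineseChar):
    """
    Chinese number/unit (place-value) character.
    In addition to the simplified and traditional forms, each character has an extra capital form,
    e.g. '陆' and '陸'.
    """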
    def __init__(self, power, simplified, traditional, big_s, big_t):
        super(ChineseNumberUnit, self).__init__(simplified, traditional)
        self.power = power
        self.big_s = big_s
        self.big_t = big_t
    def __str__(self):
        return '10^{}'.format(self.power)
    @classmethod
    def create(cls, index, value, numbering_type=NUMBERING_TYPES[1], small_unit=False):
        if small_unit:
            return ChineseNumberUnit(power=index + 1,
                                     simplified=value[0], traditional=value[1], big_s=value[1], big_t=value[1])
        elif numbering_type == NUMBERING_TYPES[0]:
            return ChineseNumberUnit(power=index + 8,
                                     simplified=value[0], traditional=value[1], big_s=value[0], big_t=value[1])
        elif numbering_type == NUMBERING_TYPES[1]:
            return ChineseNumberUnit(power=(index + 2) * 4,
                                     simplified=value[0], traditional=value[1], big_s=value[0], big_t=value[1])
        elif numbering_type == NUMBERING_TYPES[2]:
            return ChineseNumberUnit(power=pow(2, index + 3),
                                     simplified=value[0], traditional=value[1], big_s=value[0], big_t=value[1])
        else:
            raise ValueError(
                'Counting type should be in {0} ({1} provided).'.format(NUMBERING_TYPES, numbering_type))
 class ChineseNumberDigit(ChineseChar):
    """
    Chinese digit character.
    """
    def __init__(self, value, simplified, traditional, big_s, big_t, alt_s=None, alt_t=None):
        super(ChineseNumberDigit, self).__init__(simplified, traditional)
        self.value = value
        self.big_s = big_s
        self.big_t = big_t
        self.alt_s = alt_s
        self.alt_t = alt_t
    def __str__(self):
        return str(self.value)
    @classmethod
    def create(cls, i, v):
        return ChineseNumberDigit(i, v[0], v[1], v[2], v[3])
 class ChineseMath(ChineseChar):
    """
    Chinese math symbol character.
    """
    def __init__(self, simplified, traditional, symbol, expression=None):
        super(ChineseMath, self).__init__(simplified, traditional)
        self.symbol = symbol
        self.expression = expression
        self.big_s = simplified
        self.big_t = traditional
 CC, CNU, CND, CM = ChineseChar, ChineseNumberUnit, ChineseNumberDigit, ChineseMath
 class NumberSystem(object):
    """
    Chinese numbering system.
    """
    pass
 class MathSymbol(object):
    """
    Math symbols used by the Chinese numbering system (simplified/traditional), e.g.
    positive = ['正', '正']
    negative = ['负', '負']
    point = ['点', '點']
    """
    def __init__(self, positive, negative, point):
        self.positive = positive
        self.negative = negative
        self.point = point
    def __iter__(self):
        for v in self.__dict__.values():
            yield v
 # class OtherSymbol(object):
 #     """
 #     Other symbols
 #     """
 #
 #     def __init__(self, sil):
 #         self.sil = sil
 #
 #     def __iter__(self):
 #         for v in self.__dict__.values():
 #             yield v
 # ================================================================================ #
 #                                    basic utils
 # ================================================================================ #
 def create_system(numbering_type=NUMBERING_TYPES[1]):
    """
    Create the number system for the given numbering type (default: mid).
    NUMBERING_TYPES = ['low', 'mid', 'high']: Chinese numbering system types
        low:  '兆' = '亿' * '十' = $10^{9}$,  '京' = '兆' * '十', etc.
        mid:  '兆' = '亿' * '万' = $10^{12}$, '京' = '兆' * '万', etc.
        high: '兆' = '亿' * '亿' = $10^{16}$, '京' = '兆' * '兆', etc.
    Returns the corresponding number system.
    """
    # chinese number units of '亿' and larger
    all_larger_units = zip(
        LARGER_CHINESE_NUMERING_UNITS_SIMPLIFIED, LARGER_CHINESE_NUMERING_UNITS_TRADITIONAL)
    larger_units = [CNU.create(i, v, numbering_type, False)
                    for i, v in enumerate(all_larger_units)]
    # chinese number units of '十, 百, 千, 万'
    all_smaller_units = zip(
        SMALLER_CHINESE_NUMERING_UNITS_SIMPLIFIED, SMALLER_CHINESE_NUMERING_UNITS_TRADITIONAL)
    smaller_units = [CNU.create(i, v, small_unit=True)
                     for i, v in enumerate(all_smaller_units)]
    # digits
    chinese_digis = zip(CHINESE_DIGIS, CHINESE_DIGIS,
                        BIG_CHINESE_DIGIS_SIMPLIFIED, BIG_CHINESE_DIGIS_TRADITIONAL)
    digits = [CND.create(i, v) for i, v in enumerate(chinese_digis)]
    digits[0].alt_s, digits[0].alt_t = ZERO_ALT, ZERO_ALT
    digits[1].alt_s, digits[1].alt_t = ONE_ALT, ONE_ALT
    digits[2].alt_s, digits[2].alt_t = TWO_ALTS[0], TWO_ALTS[1]
    # symbols
    positive_cn = CM(POSITIVE[0], POSITIVE[1], '+', lambda x: x)
    negative_cn = CM(NEGATIVE[0], NEGATIVE[1], '-', lambda x: -x)
    point_cn = CM(POINT[0], POINT[1], '.', lambda x,
                  y: float(str(x) + '.' + str(y)))
    # sil_cn = CM(SIL[0], SIL[1], '-', lambda x, y: float(str(x) + '-' + str(y)))
    system = NumberSystem()
    system.units = smaller_units + larger_units
    system.digits = digits
    system.math = MathSymbol(positive_cn, negative_cn, point_cn)
    # system.symbols = OtherSymbol(sil_cn)
    return system
 def chn2num(chinese_string, numbering_type=NUMBERING_TYPES[1]):
    def get_symbol(char, system):
        for u in system.units:
            if char in [u.traditional, u.simplified, u.big_s, u.big_t]:
                return u
        for d in system.digits:
            if char in [d.traditional, d.simplified, d.big_s, d.big_t, d.alt_s, d.alt_t]:
                return d
        for m in system.math:
            if char in [m.traditional, m.simplified]:
                return m
    def string2symbols(chinese_string, system):
        int_string, dec_string = chinese_string, ''
        for p in [system.math.point.simplified, system.math.point.traditional]:
            if p in chinese_string:
                int_string, dec_string = chinese_string.split(p)
                break
        return [get_symbol(c, system) for c in int_string], \
               [get_symbol(c, system) for c in dec_string]
    def correct_symbols(integer_symbols, system):
        """
        一百八 to 一百八十
        一亿一千三百万 to 一亿 一千万 三百万
        """
        if integer_symbols and isinstance(integer_symbols[0], CNU):
            if integer_symbols[0].power == 1:
                integer_symbols = [system.digits[1]] + integer_symbols
        if len(integer_symbols) > 1:
            if isinstance(integer_symbols[-1], CND) and isinstance(integer_symbols[-2], CNU):
                integer_symbols.append(
                    CNU(integer_symbols[-2].power - 1, None, None, None, None))
        result = []
        unit_count = 0
        for s in integer_symbols:
            if isinstance(s, CND):
                result.append(s)
                unit_count = 0
            elif isinstance(s, CNU):
                current_unit = CNU(s.power, None, None, None, None)
                unit_count += 1
            if unit_count == 1:
                result.append(current_unit)
            elif unit_count > 1:
                for i in range(len(result)):
                    if isinstance(result[-i - 1], CNU) and result[-i - 1].power < current_unit.power:
                        result[-i - 1] = CNU(result[-i - 1].power +
                                             current_unit.power, None, None, None, None)
        return result
    def compute_value(integer_symbols):
        """
        Compute the integer value of a symbol sequence.
        When the current unit is larger than the previous unit, every value
        accumulated so far is multiplied by the current unit.
        e.g. '两千万' = 2000 * 10000, not 2000 + 10000
        """
        value = [0]
        last_power = 0
        for s in integer_symbols:
            if isinstance(s, CND):
                value[-1] = s.value
            elif isinstance(s, CNU):
                value[-1] *= pow(10, s.power)
                if s.power > last_power:
                    value[:-1] = list(map(lambda v: v *
                                                    pow(10, s.power), value[:-1]))
                    last_power = s.power
                value.append(0)
        return sum(value)
    system = create_system(numbering_type)
    int_part, dec_part = string2symbols(chinese_string, system)
    int_part = correct_symbols(int_part, system)
    int_str = str(compute_value(int_part))
    dec_str = ''.join([str(d.value) for d in dec_part])
    if dec_part:
        return '{0}.{1}'.format(int_str, dec_str)
    else:
        return int_str
 def num2chn(number_string, numbering_type=NUMBERING_TYPES[1], big=False,
            traditional=False, alt_zero=False, alt_one=False, alt_two=True,
            use_zeros=True, use_units=True):
    def get_value(value_string, use_zeros=True):
        striped_string = value_string.lstrip('0')
        # record nothing if all zeros
        if not striped_string:
            return []
        # record a single digit
        elif len(striped_string) == 1:
            if use_zeros and len(value_string) != len(striped_string):
                return [system.digits[0], system.digits[int(striped_string)]]
            else:
                return [system.digits[int(striped_string)]]
        # recursively record multiple digits
        else:
            result_unit = next(u for u in reversed(
                system.units) if u.power < len(striped_string))
            result_string = value_string[:-result_unit.power]
            return get_value(result_string) + [result_unit] + get_value(striped_string[-result_unit.power:])
    system = create_system(numbering_type)
    int_dec = number_string.split('.')
    if len(int_dec) == 1:
        int_string = int_dec[0]
        dec_string = ""
    elif len(int_dec) == 2:
        int_string = int_dec[0]
        dec_string = int_dec[1]
    else:
        raise ValueError(
            "invalid input num string with more than one dot: {}".format(number_string))
    if use_units and len(int_string) > 1:
        result_symbols = get_value(int_string)
    else:
        result_symbols = [system.digits[int(c)] for c in int_string]
    dec_symbols = [system.digits[int(c)] for c in dec_string]
    if dec_string:
        result_symbols += [system.math.point] + dec_symbols
    if alt_two:
        liang = CND(2, system.digits[2].alt_s, system.digits[2].alt_t,
                    system.digits[2].big_s, system.digits[2].big_t)
        for i, v in enumerate(result_symbols):
            if isinstance(v, CND) and v.value == 2:
                next_symbol = result_symbols[i +
                                             1] if i < len(result_symbols) - 1 else None
                previous_symbol = result_symbols[i - 1] if i > 0 else None
                if isinstance(next_symbol, CNU) and isinstance(previous_symbol, (CNU, type(None))):
                    if next_symbol.power != 1 and ((previous_symbol is None) or (previous_symbol.power != 1)):
                        result_symbols[i] = liang
    # if big is True, '两' will not be used and `alt_two` has no impact on output
    if big:
        attr_name = 'big_'
        if traditional:
            attr_name += 't'
        else:
            attr_name += 's'
    else:
        if traditional:
            attr_name = 'traditional'
        else:
            attr_name = 'simplified'
    result = ''.join([getattr(s, attr_name) for s in result_symbols])
    # if not use_zeros:
    #     result = result.strip(getattr(system.digits[0], attr_name))
    if alt_zero:
        result = result.replace(
            getattr(system.digits[0], attr_name), system.digits[0].alt_s)
    if alt_one:
        result = result.replace(
            getattr(system.digits[1], attr_name), system.digits[1].alt_s)
    for i, p in enumerate(POINT):
        if result.startswith(p):
            return CHINESE_DIGIS[0] + result
    # ^10, 11, .., 19
    if len(result) >= 2 and result[1] in [SMALLER_CHINESE_NUMERING_UNITS_SIMPLIFIED[0],
                                          SMALLER_CHINESE_NUMERING_UNITS_TRADITIONAL[0]] and \
            result[0] in [CHINESE_DIGIS[1], BIG_CHINESE_DIGIS_SIMPLIFIED[1], BIG_CHINESE_DIGIS_TRADITIONAL[1]]:
        result = result[1:]
    return result
 # ================================================================================ #
 #                          different types of rewriters
 # ================================================================================ #
 class Cardinal:
    """
    CARDINAL class
    """
    def __init__(self, cardinal=None, chntext=None):
        self.cardinal = cardinal
        self.chntext = chntext
    def chntext2cardinal(self):
        return chn2num(self.chntext)
    def cardinal2chntext(self):
        return num2chn(self.cardinal)
 class Digit:
    """
    DIGIT class
    """
    def __init__(self, digit=None, chntext=None):
        self.digit = digit
        self.chntext = chntext
    # def chntext2digit(self):
    #     return chn2num(self.chntext)
    def digit2chntext(self):
        return num2chn(self.digit, alt_two=False, use_units=False)
 class TelePhone:
    """
    TELEPHONE class
    """
    def __init__(self, telephone=None, raw_chntext=None, chntext=None):
        self.telephone = telephone
        self.raw_chntext = raw_chntext
        self.chntext = chntext
    # def chntext2telephone(self):
    #     sil_parts = self.raw_chntext.split('<SIL>')
    #     self.telephone = '-'.join([
    #         str(chn2num(p)) for p in sil_parts
    #     ])
    #     return self.telephone
    def telephone2chntext(self, fixed=False):
        if fixed:
            sil_parts = self.telephone.split('-')
            self.raw_chntext = '<SIL>'.join([
                num2chn(part, alt_two=False, use_units=False) for part in sil_parts
            ])
            self.chntext = self.raw_chntext.replace('<SIL>', '')
        else:
            sp_parts = self.telephone.strip('+').split()
            self.raw_chntext = '<SP>'.join([
                num2chn(part, alt_two=False, use_units=False) for part in sp_parts
            ])
            self.chntext = self.raw_chntext.replace('<SP>', '')
        return self.chntext
 class Fraction:
    """
    FRACTION class
    """
    def __init__(self, fraction=None, chntext=None):
        self.fraction = fraction
        self.chntext = chntext
    def chntext2fraction(self):
        denominator, numerator = self.chntext.split('分之')
        return chn2num(numerator) + '/' + chn2num(denominator)
    def fraction2chntext(self):
        numerator, denominator = self.fraction.split('/')
        return num2chn(denominator) + '分之' + num2chn(numerator)
 class Date:
    """
    DATE class
    """
    def __init__(self, date=None, chntext=None):
        self.date = date
        self.chntext = chntext
    # def chntext2date(self):
    #     chntext = self.chntext
    #     try:
    #         year, other = chntext.strip().split('年', maxsplit=1)
    #         year = Digit(chntext=year).digit2chntext() + '年'
    #     except ValueError:
    #         other = chntext
    #         year = ''
    #     if other:
    #         try:
    #             month, day = other.strip().split('月', maxsplit=1)
    #             month = Cardinal(chntext=month).chntext2cardinal() + '月'
    #         except ValueError:
    #             day = chntext
    #             month = ''
    #         if day:
    #             day = Cardinal(chntext=day[:-1]).chntext2cardinal() + day[-1]
    #     else:
    #         month = ''
    #         day = ''
    #     date = year + month + day
    #     self.date = date
    #     return self.date
    def date2chntext(self):
        date = self.date
        try:
            year, other = date.strip().split('年', 1)
            year = Digit(digit=year).digit2chntext() + '年'
        except ValueError:
            other = date
            year = ''
        if other:
            try:
                month, day = other.strip().split('月', 1)
                month = Cardinal(cardinal=month).cardinal2chntext() + '月'
            except ValueError:
                day = other
                month = ''
            if day:
                day = Cardinal(cardinal=day[:-1]).cardinal2chntext() + day[-1]
        else:
            month = ''
            day = ''
        chntext = year + month + day
        self.chntext = chntext
        return self.chntext
 class Money:
    """
    MONEY class
    """
    def __init__(self, money=None, chntext=None):
        self.money = money
        self.chntext = chntext
    # def chntext2money(self):
    #     return self.money
    def money2chntext(self):
        money = self.money
        pattern = re.compile(r'(\d+(\.\d+)?)')
        matchers = pattern.findall(money)
        if matchers:
            for matcher in matchers:
                money = money.replace(matcher[0], Cardinal(cardinal=matcher[0]).cardinal2chntext())
        self.chntext = money
        return self.chntext
 class Percentage:
    """
    PERCENTAGE class
    """
    def __init__(self, percentage=None, chntext=None):
        self.percentage = percentage
        self.chntext = chntext
    def chntext2percentage(self):
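        # str.strip('百分之') strips a character set from both ends and would also eat a trailing '百'
        return chn2num(self.chntext.strip().replace('百分之', '', 1)) + '%'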
    def percentage2chntext(self):
        return '百分之' + num2chn(self.percentage.strip().strip('%'))
 # ================================================================================ #
 #                            NSW Normalizer
 # ================================================================================ #
 class NSWNormalizer:
    def __init__(self, raw_text):
        self.raw_text = '^' + raw_text + '$'
        self.norm_text = ''
    def _particular(self):
        text = self.norm_text
        pattern = re.compile(r"(([a-zA-Z]+)二([a-zA-Z]+))")
        matchers = pattern.findall(text)
        if matchers:
            # print('particular')
            for matcher in matchers:
                text = text.replace(matcher[0], matcher[1]+'2'+matcher[2], 1)
        self.norm_text = text
        return self.norm_text
    def normalize(self):
        text = self.raw_text
        # normalize dates
        pattern = re.compile(r"\D+((([089]\d|(19|20)\d{2})年)?(\d{1,2}月(\d{1,2}[日号])?)?)")
        matchers = pattern.findall(text)
        if matchers:
            #print('date')
            for matcher in matchers:
                text = text.replace(matcher[0], Date(date=matcher[0]).date2chntext(), 1)
        # normalize money amounts
        pattern = re.compile(r"\D+((\d+(\.\d+)?)[多余几]?" + CURRENCY_UNITS + r"(\d" + CURRENCY_UNITS + r"?)?)")
        matchers = pattern.findall(text)
        if matchers:
            #print('money')
            for matcher in matchers:
                text = text.replace(matcher[0], Money(money=matcher[0]).money2chntext(), 1)
        # normalize fixed-line / mobile phone numbers
        # mobile
        # http://www.jihaoba.com/news/show/13680
        # China Mobile: 139, 138, 137, 136, 135, 134, 159, 158, 157, 150, 151, 152, 188, 187, 182, 183, 184, 178, 198
        # China Unicom: 130, 131, 132, 156, 155, 186, 185, 176
        # China Telecom: 133, 153, 189, 180, 181, 177
        pattern = re.compile(r"\D((\+?86 ?)?1([38]\d|5[0-35-9]|7[678]|9[89])\d{8})\D")
        matchers = pattern.findall(text)
        if matchers:
            #print('telephone')
            for matcher in matchers:
                text = text.replace(matcher[0], TelePhone(telephone=matcher[0]).telephone2chntext(), 1)
        # fixed-line
        pattern = re.compile(r"\D((0(10|2[1-3]|[3-9]\d{2})-?)?[1-9]\d{6,7})\D")
        matchers = pattern.findall(text)
        if matchers:
            # print('fixed telephone')
            for matcher in matchers:
                text = text.replace(matcher[0], TelePhone(telephone=matcher[0]).telephone2chntext(fixed=True), 1)
        # normalize fractions
        pattern = re.compile(r"(\d+/\d+)")
        matchers = pattern.findall(text)
        if matchers:
            #print('fraction')
            for matcher in matchers:
                text = text.replace(matcher, Fraction(fraction=matcher).fraction2chntext(), 1)
        # normalize percentages
        text = text.replace('％', '%')
        pattern = re.compile(r"(\d+(\.\d+)?%)")
        matchers = pattern.findall(text)
        if matchers:
            #print('percentage')
            for matcher in matchers:
                text = text.replace(matcher[0], Percentage(percentage=matcher[0]).percentage2chntext(), 1)
        # normalize plain number + quantifier
        pattern = re.compile(r"(\d+(\.\d+)?)[多余几]?" + COM_QUANTIFIERS)
        matchers = pattern.findall(text)
        if matchers:
            #print('cardinal+quantifier')
            for matcher in matchers:
                text = text.replace(matcher[0], Cardinal(cardinal=matcher[0]).cardinal2chntext(), 1)
        # normalize digit sequences (IDs, serial numbers)
        pattern = re.compile(r"(\d{4,32})")
        matchers = pattern.findall(text)
        if matchers:
            #print('digit')
            for matcher in matchers:
                text = text.replace(matcher, Digit(digit=matcher).digit2chntext(), 1)
        # normalize plain numbers
        pattern = re.compile(r"(\d+(\.\d+)?)")
        matchers = pattern.findall(text)
        if matchers:
            #print('cardinal')
            for matcher in matchers:
                text = text.replace(matcher[0], Cardinal(cardinal=matcher[0]).cardinal2chntext(), 1)
        self.norm_text = text
        self._particular()
        return self.norm_text.lstrip('^').rstrip('$')
 def nsw_test_case(raw_text):
    print('I:' + raw_text)
    print('O:' + NSWNormalizer(raw_text).normalize())
    print('')
 def nsw_test():
    nsw_test_case('固话：0595-23865596或23880880。')
    nsw_test_case('固话：0595-23865596或23880880。')
    nsw_test_case('手机：+86 19859213959或15659451527。')
    nsw_test_case('分数：32477/76391。')
    nsw_test_case('百分数：80.03%。')
    nsw_test_case('编号：31520181154418。')
    nsw_test_case('纯数：2983.07克或12345.60米。')
    nsw_test_case('日期：1999年2月20日或09年3月15号。')
    nsw_test_case('金钱：12块5，34.5元，20.1万')
    nsw_test_case('特殊：O2O或B2C。')
    nsw_test_case('3456万吨')
    nsw_test_case('2938个')
    nsw_test_case('938')
    nsw_test_case('今天吃了115个小笼包231个馒头')
    nsw_test_case('有62％的概率')
 if __name__ == '__main__':
    #nsw_test()
    p = argparse.ArgumentParser()
    p.add_argument('ifile', help='input filename, assume utf-8 encoding')
    p.add_argument('ofile', help='output filename')
    p.add_argument('--to_upper', action='store_true', help='convert to upper case')
    p.add_argument('--to_lower', action='store_true', help='convert to lower case')
    p.add_argument('--has_key', action='store_true', help="input text has Kaldi's key as first field.")
    p.add_argument('--log_interval', type=int, default=100000, help='log interval in number of processed lines')
    args = p.parse_args()
    ifile = codecs.open(args.ifile, 'r', 'utf8')
    ofile = codecs.open(args.ofile, 'w+', 'utf8')
    n = 0
    for l in ifile:
        key = ''
        text = ''
        if args.has_key:
            cols = l.split(maxsplit=1)
            key = cols[0]
            if len(cols) == 2:
                text = cols[1].strip()
            else:
                text = ''
        else:
            text = l.strip()
        # cases
        if args.to_upper and args.to_lower:
            sys.stderr.write('cn_tn.py: to_upper OR to_lower?\n')
            sys.exit(1)
        if args.to_upper:
            text = text.upper()
        if args.to_lower:
            text = text.lower()
        # NSW(Non-Standard-Word) normalization
        text = NSWNormalizer(text).normalize()
        # Punctuations removal
        old_chars = CHINESE_PUNC_LIST + string.punctuation # includes all CN and EN punctuations
        new_chars = ' ' * len(old_chars)
        del_chars = ''
        text = text.translate(str.maketrans(old_chars, new_chars, del_chars))
        #
        if args.has_key:
            ofile.write(key + '\t' + text + '\n')
        else:
            if text.strip() != '': # skip empty line in pure text format(without Kaldi's utt key)
                ofile.write(text + '\n')
        n += 1
        if n % args.log_interval == 0:
            sys.stderr.write("cn_tn.py: {} lines done.\n".format(n))
            sys.stderr.flush()
    sys.stderr.write("cn_tn.py: {} lines done in total.\n".format(n))
    sys.stderr.flush()
    ifile.close()
    ofile.close()
--- a/third_party/chinese_text_normalization/python/example_kaldi.txt
+++ b/third_party/chinese_text_normalization/python/example_kaldi.txt
@ -0,0 +1,7 @@
 UTT000	这块黄金重达324.75克
 UTT001	她出生于86年8月18日，她弟弟出生于1995年3月1日
 UTT002	电影中梁朝伟扮演的陈永仁的编号27149
 UTT003	现场有7/12的观众投出了赞成票
 UTT004	随便来几个价格12块5，34.5元，20.1万
 UTT005	明天有62％的概率降雨
 UTT006	这是固话0421-33441122或这是手机+86 18544139121
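The `--has_key` mode of cn_tn.py treats the first whitespace-separated field of each line as the Kaldi utterance key and normalizes only the rest. A small sketch of that split is below; the helper name `split_kaldi_line` is hypothetical, and the guard for blank lines is an addition that the script itself does not have.

```python
def split_kaldi_line(line):
    """Return (key, text) for a line shaped like '<key><whitespace><text>'."""
    cols = line.split(maxsplit=1)
    if not cols:
        # Blank line: no key and no text (extra guard, not in cn_tn.py).
        return '', ''
    key = cols[0]
    text = cols[1].strip() if len(cols) == 2 else ''
    return key, text


key, text = split_kaldi_line('UTT003\t现场有7/12的观众投出了赞成票\n')
print(key)   # UTT003
print(text)  # 现场有7/12的观众投出了赞成票
```

Because only the first split is taken, spaces inside the transcript (for example in `+86 18544139121`) stay part of the text.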
--- a/third_party/chinese_text_normalization/python/example_plain.txt
+++ b/third_party/chinese_text_normalization/python/example_plain.txt
@ -0,0 +1,7 @@
 这块黄金重达324.75克
 她出生于86年8月18日，她弟弟出生于1995年3月1日
 电影中梁朝伟扮演的陈永仁的编号27149
 现场有7/12的观众投出了赞成票
 随便来几个价格12块5，34.5元，20.1万
 明天有62％的概率降雨
 这是固话0421-33441122或这是手机+86 18544139121
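The sample line with `62％` exercises the percentage step of `NSWNormalizer.normalize`. The sketch below isolates just the matching part, assuming the same regular expression as the script; the names `PERCENT` and `find_percentages` are made up for illustration. It also shows why the script indexes `matcher[0]`: with more than one capture group, `re.findall` returns tuples.

```python
import re

# Same pattern as the percentage step in cn_tn.py.
PERCENT = re.compile(r"(\d+(\.\d+)?%)")


def find_percentages(text):
    """Return every percentage literal found in text."""
    text = text.replace('％', '%')  # fullwidth percent sign to ASCII
    # findall yields (whole_match, decimal_part) tuples; keep the whole match.
    return [groups[0] for groups in PERCENT.findall(text)]


print(find_percentages('明天有62％的概率降雨'))  # ['62%']
print(find_percentages('百分数：80.03%。'))      # ['80.03%']
```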
--- a/third_party/chinese_text_normalization/python/run.sh
+++ b/third_party/chinese_text_normalization/python/run.sh
@ -0,0 +1,8 @@
 # for plain text
 python3 cn_tn.py example_plain.txt output_plain.txt
 diff example_plain.txt output_plain.txt
 # for Kaldi's trans format
 python3 cn_tn.py --has_key example_kaldi.txt output_kaldi.txt
 diff example_kaldi.txt output_kaldi.txt
--- a/third_party/chinese_text_normalization/thrax/INSTALL.txt
+++ b/third_party/chinese_text_normalization/thrax/INSTALL.txt
@ -0,0 +1,24 @@
 0. Place install_thrax.sh into $KALDI_ROOT/tools/extras/
 1. Recompile OpenFst with the "--enable-grm" option, which Thrax requires:
 * cd $KALDI_ROOT/tools
 * make clean
 * edit $KALDI_ROOT/tools/Makefile and append the "--enable-grm" option to OPENFST_CONFIGURE:
 OPENFST_CONFIGURE ?= --enable-static --enable-shared --enable-far --enable-ngram-fsts --enable-lookahead-fsts --with-pic --enable-grm
 * make -j 10
 2. Install Thrax:
 cd $KALDI_ROOT/tools
 sh extras/install_thrax.sh
 3. Add the Thrax binary path to $KALDI_ROOT/tools/env.sh:
 export PATH=/path/to/your/kaldi_root/tools/thrax-1.2.9/src/bin:${PATH}
 Usage:
 Before running anything that depends on Thrax, source the environment file:
 . $KALDI_ROOT/tools/env.sh
 so that the Thrax binaries are found on PATH, as is usual in Kaldi.
 Sample usage:
 sh run_en.sh
 sh run_cn.sh
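Before running run_en.sh or run_cn.sh it can help to confirm that sourcing env.sh really put the Thrax binaries on PATH. The following is a minimal sketch of such a check; the tool names are the ones invoked by the run scripts and the Makefile, while the helper name `missing_tools` is an assumption for illustration and not part of this repository.

```python
import shutil

# Binaries invoked by run_en.sh, run_cn.sh and the Makefile.
THRAX_TOOLS = ('thraxmakedep', 'thraxcompiler', 'thraxrewrite-tester')


def missing_tools(tools=THRAX_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == '__main__':
    missing = missing_tools()
    if missing:
        print('not on PATH: ' + ', '.join(missing))
    else:
        print('all Thrax tools found')
```

Run it from the same shell session in which `. $KALDI_ROOT/tools/env.sh` was executed, since PATH changes do not carry over to other sessions.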
--- a/third_party/chinese_text_normalization/thrax/install_thrax.sh
+++ b/third_party/chinese_text_normalization/thrax/install_thrax.sh
@ -0,0 +1,12 @@
 #!/bin/bash
 ## Place this script under $KALDI_ROOT/tools/extras/; see INSTALL.txt for the installation guide.
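 if [ ! -f thrax-1.2.9.tar.gz ]; then
    wget http://www.openfst.org/twiki/pub/GRM/ThraxDownload/thrax-1.2.9.tar.gz
 fi
 # extract separately so that a previously downloaded tarball is still unpacked
 if [ ! -d thrax-1.2.9 ]; then
    tar -zxf thrax-1.2.9.tar.gz
 fi
 cd thrax-1.2.9 || exit 1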
 OPENFSTPREFIX=`pwd`/../openfst
 LDFLAGS="-L${OPENFSTPREFIX}/lib" CXXFLAGS="-I${OPENFSTPREFIX}/include" ./configure --prefix ${OPENFSTPREFIX}
 make -j 10 && make install
 cd ..
--- a/third_party/chinese_text_normalization/thrax/papers/gorman-sproat-2016.pdf
+++ b/third_party/chinese_text_normalization/thrax/papers/gorman-sproat-2016.pdf
--- a/third_party/chinese_text_normalization/thrax/papers/wu-etal-2016.pdf
+++ b/third_party/chinese_text_normalization/thrax/papers/wu-etal-2016.pdf
--- a/third_party/chinese_text_normalization/thrax/run_cn.sh
+++ b/third_party/chinese_text_normalization/thrax/run_cn.sh
@ -0,0 +1,6 @@
 cd src/cn
 thraxmakedep itn.grm
 make
 #thraxrewrite-tester --far=itn.far --rules=ITN 
 cat ../../testcase_cn.txt | thraxrewrite-tester --far=itn.far --rules=ITN 
 cd -
--- a/third_party/chinese_text_normalization/thrax/run_en.sh
+++ b/third_party/chinese_text_normalization/thrax/run_en.sh
@ -0,0 +1,6 @@
 cd src
 thraxmakedep en/verbalizer/podspeech.grm
 make
 cat ../testcase_en.txt
 cat ../testcase_en.txt | thraxrewrite-tester --far=en/verbalizer/podspeech.far --rules=POD_SPEECH_TN
 cd -
--- a/third_party/chinese_text_normalization/thrax/src/LICENSE
+++ b/third_party/chinese_text_normalization/thrax/src/LICENSE
@ -0,0 +1,202 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/
   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
   1. Definitions.
      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.
      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.
      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.
      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.
      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.
      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.
      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).
      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.
      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."
      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.
   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.
   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.
   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:
      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and
      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and
      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and
      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.
      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.
   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.
   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.
   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.
   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.
   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.
   END OF TERMS AND CONDITIONS
   APPENDIX: How to apply the Apache License to your work.
      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.
   Copyright [yyyy] [name of copyright owner]
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--- a/third_party/chinese_text_normalization/thrax/src/Makefile
+++ b/third_party/chinese_text_normalization/thrax/src/Makefile
@ -0,0 +1,65 @@
 en/verbalizer/podspeech.far: en/verbalizer/podspeech.grm util/util.far util/case.far en/verbalizer/extra_numbers.far en/verbalizer/float.far en/verbalizer/math.far en/verbalizer/miscellaneous.far en/verbalizer/money.far en/verbalizer/numbers.far en/verbalizer/numbers_plus.far en/verbalizer/spelled.far en/verbalizer/spoken_punct.far en/verbalizer/time.far en/verbalizer/urls.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 util/util.far: util/util.grm util/byte.far util/case.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 util/byte.far: util/byte.grm 
 	thraxcompiler --input_grammar=$< --output_far=$@
 util/case.far: util/case.grm util/byte.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/extra_numbers.far: en/verbalizer/extra_numbers.grm util/byte.far en/verbalizer/numbers.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/numbers.far: en/verbalizer/numbers.grm en/verbalizer/number_names.far util/byte.far universal/thousands_punct.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/number_names.far: en/verbalizer/number_names.grm util/arithmetic.far en/verbalizer/g.fst en/verbalizer/cardinals.tsv en/verbalizer/ordinals.tsv
 	thraxcompiler --input_grammar=$< --output_far=$@
 util/arithmetic.far: util/arithmetic.grm util/byte.far util/germanic.tsv
 	thraxcompiler --input_grammar=$< --output_far=$@
 universal/thousands_punct.far: universal/thousands_punct.grm util/byte.far util/util.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/float.far: en/verbalizer/float.grm en/verbalizer/factorization.far en/verbalizer/lexical_map.far en/verbalizer/numbers.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/factorization.far: en/verbalizer/factorization.grm util/byte.far util/util.far en/verbalizer/numbers.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/lexical_map.far: en/verbalizer/lexical_map.grm util/byte.far en/verbalizer/lexical_map.tsv
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/math.far: en/verbalizer/math.grm en/verbalizer/float.far en/verbalizer/lexical_map.far en/verbalizer/numbers.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/miscellaneous.far: en/verbalizer/miscellaneous.grm util/byte.far ru/classifier/cyrillic.far en/verbalizer/extra_numbers.far en/verbalizer/lexical_map.far en/verbalizer/numbers.far en/verbalizer/spelled.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 ru/classifier/cyrillic.far: ru/classifier/cyrillic.grm 
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/spelled.far: en/verbalizer/spelled.grm util/byte.far ru/classifier/cyrillic.far en/verbalizer/lexical_map.far en/verbalizer/numbers.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/money.far: en/verbalizer/money.grm util/byte.far en/verbalizer/lexical_map.far en/verbalizer/numbers.far en/verbalizer/money.tsv
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/numbers_plus.far: en/verbalizer/numbers_plus.grm en/verbalizer/factorization.far en/verbalizer/lexical_map.far en/verbalizer/numbers.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/spoken_punct.far: en/verbalizer/spoken_punct.grm en/verbalizer/lexical_map.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/time.far: en/verbalizer/time.grm util/byte.far en/verbalizer/lexical_map.far en/verbalizer/numbers.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 en/verbalizer/urls.far: en/verbalizer/urls.grm util/byte.far en/verbalizer/lexical_map.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 clean:
 	rm -f util/util.far util/case.far en/verbalizer/extra_numbers.far en/verbalizer/float.far en/verbalizer/math.far en/verbalizer/miscellaneous.far en/verbalizer/money.far en/verbalizer/numbers.far en/verbalizer/numbers_plus.far en/verbalizer/spelled.far en/verbalizer/spoken_punct.far en/verbalizer/time.far en/verbalizer/urls.far util/byte.far en/verbalizer/number_names.far universal/thousands_punct.far util/arithmetic.far en/verbalizer/factorization.far en/verbalizer/lexical_map.far ru/classifier/cyrillic.far
--- a/third_party/chinese_text_normalization/thrax/src/README.md
+++ b/third_party/chinese_text_normalization/thrax/src/README.md
@ -0,0 +1,24 @@
 # Text normalization covering grammars
 This repository provides covering grammars for English and Russian text normalization as
 documented in:
  Gorman, K., and Sproat, R. 2016. Minimally supervised number normalization.
  _Transactions of the Association for Computational Linguistics_ 4: 507-519.
  Ng, A. H., Gorman, K., and Sproat, R. 2017. Minimally supervised
  written-to-spoken text normalization. In _ASRU_, pages 665-670.
 If you use these grammars in a publication, we would appreciate it if you cited these works.
 ## Building
 The grammars are written in [Thrax](http://thrax.opengrm.org) and compile into [OpenFst](http://openfst.org) FAR (FstARchive) files. To compile, run `make` in the `src/` directory.
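
Each rule in the Makefile runs `thraxcompiler --input_grammar=<file>.grm --output_far=<file>.far`. The sketch below is a minimal illustration of compiling a single grammar without `make`. The helper names `thrax_compile_command` and `compile_grammar` are hypothetical and not part of this repository. It assumes `thraxcompiler` is on your `PATH` and that you run it from `src/`, since the grammars import each other by paths such as `util/byte.grm`.

```python
import shutil
import subprocess
from pathlib import PurePosixPath
from typing import List


def thrax_compile_command(grammar: str) -> List[str]:
    """Build the command that the Makefile recipe runs for one grammar.

    Hypothetical helper: mirrors `thraxcompiler --input_grammar=$< --output_far=$@`,
    where the FAR path is the grammar path with a `.far` suffix.
    """
    far = str(PurePosixPath(grammar).with_suffix(".far"))
    return [
        "thraxcompiler",
        f"--input_grammar={grammar}",
        f"--output_far={far}",
    ]


def compile_grammar(grammar: str) -> None:
    """Compile a single grammar; assumes Thrax is installed and on PATH."""
    if shutil.which("thraxcompiler") is None:
        raise RuntimeError("thraxcompiler not found on PATH")
    subprocess.run(thrax_compile_command(grammar), check=True)


if __name__ == "__main__":
    # Only print the command here; call compile_grammar() to actually run it.
    print(" ".join(thrax_compile_command("util/byte.grm")))
```

Unlike `make`, this does not track dependencies, so grammars imported by the target (for example `util/byte.grm`) are not rebuilt automatically. For a full build, `make` remains the simpler choice.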
 ## License
 See `LICENSE`.
 ## Mandatory disclaimer
 This is not an official Google product.
--- a/third_party/chinese_text_normalization/thrax/src/cn/Makefile
+++ b/third_party/chinese_text_normalization/thrax/src/cn/Makefile
@ -0,0 +1,23 @@
 itn.far: itn.grm byte.far number.far hotfix.far percentage.far date.far amount.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 byte.far: byte.grm 
 	thraxcompiler --input_grammar=$< --output_far=$@
 number.far: number.grm byte.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 hotfix.far: hotfix.grm byte.far hotfix.list
 	thraxcompiler --input_grammar=$< --output_far=$@
 percentage.far: percentage.grm byte.far number.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 date.far: date.grm byte.far number.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 amount.far: amount.grm byte.far number.far
 	thraxcompiler --input_grammar=$< --output_far=$@
 clean:
 	rm -f itn.far byte.far number.far hotfix.far percentage.far date.far amount.far
--- a/third_party/chinese_text_normalization/thrax/src/cn/amount.grm
+++ b/third_party/chinese_text_normalization/thrax/src/cn/amount.grm
@ -0,0 +1,24 @@
 import 'byte.grm' as b;
 import 'number.grm' as n;
 unit = (
 	"匹"|"张"|"座"|"回"|"场"|"尾"|"条"|"个"|"首"|"阙"|"阵"|"网"|"炮"|
 	"顶"|"丘"|"棵"|"只"|"支"|"袭"|"辆"|"挑"|"担"|"颗"|"壳"|"窠"|"曲"|
 	"墙"|"群"|"腔"|"砣"|"座"|"客"|"贯"|"扎"|"捆"|"刀"|"令"|"打"|"手"|
 	"罗"|"坡"|"山"|"岭"|"江"|"溪"|"钟"|"队"|"单"|"双"|"对"|"出"|"口"|
 	"头"|"脚"|"板"|"跳"|"枝"|"件"|"贴"|"针"|"线"|"管"|"名"|"位"|"身"|
 	"堂"|"课"|"本"|"页"|"家"|"户"|"层"|"丝"|"毫"|"厘"|"分"|"钱"|"两"|
 	"斤"|"担"|"铢"|"石"|"钧"|"锱"|"忽"|"毫"|"厘"|"分"|"寸"|"尺"|"丈"|
 	"里"|"寻"|"常"|"铺"|"程"|"撮"|"勺"|"合"|"升"|"斗"|"石"|"盘"|"碗"|
 	"碟"|"叠"|"桶"|"笼"|"盆"|"盒"|"杯"|"钟"|"斛"|"锅"|"簋"|"篮"|"盘"|
 	"桶"|"罐"|"瓶"|"壶"|"卮"|"盏"|"箩"|"箱"|"煲"|"啖"|"袋"|"钵"|"年"|
 	"月"|"日"|"季"|"刻"|"时"|"周"|"天"|"秒"|"分"|"旬"|"纪"|"岁"|"世"|
 	"更"|"夜"|"春"|"夏"|"秋"|"冬"|"代"|"伏"|"辈"|"丸"|"泡"|"粒"|"颗"|
 	"幢"|"堆"|"条"|"根"|"支"|"道"|"面"|"片"|"张"|"颗"|"块"|
 	(("千克":"kg")|("毫克":"mg")|("微克":"µg"))|
 	(("千米":"km")|("厘米":"cm")|("毫米":"mm")|("微米":"µm")|("纳米":"nm"))
 );
 amount = n.number unit;
 export AMOUNT = CDRewrite[amount, "", "", b.kBytes*];
--- a/third_party/chinese_text_normalization/thrax/src/cn/byte.grm
+++ b/third_party/chinese_text_normalization/thrax/src/cn/byte.grm
@ -0,0 +1,76 @@
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Copyright 2005-2011 Google, Inc.
 # Author: ttai@google.com (Terry Tai)
 # Standard constants for ASCII (byte) based strings.  This mirrors the
 # functions provided by C/C++'s ctype.h library.
 # Note that [0] is missing.  Matching the string-termination character is kinda weird.
 export kBytes = Optimize[
  "[1]" |   "[2]" |   "[3]" |   "[4]" |   "[5]" |   "[6]" |   "[7]" |   "[8]" |   "[9]" |  "[10]" |
 "[11]" |  "[12]" |  "[13]" |  "[14]" |  "[15]" |  "[16]" |  "[17]" |  "[18]" |  "[19]" |  "[20]" |
 "[21]" |  "[22]" |  "[23]" |  "[24]" |  "[25]" |  "[26]" |  "[27]" |  "[28]" |  "[29]" |  "[30]" |
 "[31]" |  "[32]" |  "[33]" |  "[34]" |  "[35]" |  "[36]" |  "[37]" |  "[38]" |  "[39]" |  "[40]" |
 "[41]" |  "[42]" |  "[43]" |  "[44]" |  "[45]" |  "[46]" |  "[47]" |  "[48]" |  "[49]" |  "[50]" |
 "[51]" |  "[52]" |  "[53]" |  "[54]" |  "[55]" |  "[56]" |  "[57]" |  "[58]" |  "[59]" |  "[60]" |
 "[61]" |  "[62]" |  "[63]" |  "[64]" |  "[65]" |  "[66]" |  "[67]" |  "[68]" |  "[69]" |  "[70]" |
 "[71]" |  "[72]" |  "[73]" |  "[74]" |  "[75]" |  "[76]" |  "[77]" |  "[78]" |  "[79]" |  "[80]" |
 "[81]" |  "[82]" |  "[83]" |  "[84]" |  "[85]" |  "[86]" |  "[87]" |  "[88]" |  "[89]" |  "[90]" |
 "[91]" |  "[92]" |  "[93]" |  "[94]" |  "[95]" |  "[96]" |  "[97]" |  "[98]" |  "[99]" | "[100]" |
 "[101]" | "[102]" | "[103]" | "[104]" | "[105]" | "[106]" | "[107]" | "[108]" | "[109]" | "[110]" |
 "[111]" | "[112]" | "[113]" | "[114]" | "[115]" | "[116]" | "[117]" | "[118]" | "[119]" | "[120]" |
 "[121]" | "[122]" | "[123]" | "[124]" | "[125]" | "[126]" | "[127]" | "[128]" | "[129]" | "[130]" |
 "[131]" | "[132]" | "[133]" | "[134]" | "[135]" | "[136]" | "[137]" | "[138]" | "[139]" | "[140]" |
 "[141]" | "[142]" | "[143]" | "[144]" | "[145]" | "[146]" | "[147]" | "[148]" | "[149]" | "[150]" |
 "[151]" | "[152]" | "[153]" | "[154]" | "[155]" | "[156]" | "[157]" | "[158]" | "[159]" | "[160]" |
 "[161]" | "[162]" | "[163]" | "[164]" | "[165]" | "[166]" | "[167]" | "[168]" | "[169]" | "[170]" |
 "[171]" | "[172]" | "[173]" | "[174]" | "[175]" | "[176]" | "[177]" | "[178]" | "[179]" | "[180]" |
 "[181]" | "[182]" | "[183]" | "[184]" | "[185]" | "[186]" | "[187]" | "[188]" | "[189]" | "[190]" |
 "[191]" | "[192]" | "[193]" | "[194]" | "[195]" | "[196]" | "[197]" | "[198]" | "[199]" | "[200]" |
 "[201]" | "[202]" | "[203]" | "[204]" | "[205]" | "[206]" | "[207]" | "[208]" | "[209]" | "[210]" |
 "[211]" | "[212]" | "[213]" | "[214]" | "[215]" | "[216]" | "[217]" | "[218]" | "[219]" | "[220]" |
 "[221]" | "[222]" | "[223]" | "[224]" | "[225]" | "[226]" | "[227]" | "[228]" | "[229]" | "[230]" |
 "[231]" | "[232]" | "[233]" | "[234]" | "[235]" | "[236]" | "[237]" | "[238]" | "[239]" | "[240]" |
 "[241]" | "[242]" | "[243]" | "[244]" | "[245]" | "[246]" | "[247]" | "[248]" | "[249]" | "[250]" |
 "[251]" | "[252]" | "[253]" | "[254]" | "[255]"
 ];
 export kDigit = Optimize[
    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
 ];
 export kLower = Optimize[
    "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" |
    "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
 ];
 export kUpper = Optimize[
    "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" |
    "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
 ];
 export kAlpha = Optimize[kLower | kUpper];
 export kAlnum = Optimize[kDigit | kAlpha];
 export kSpace = Optimize[
    " " | "\t" | "\n" | "\r"
 ];
 export kNotSpace = Optimize[kBytes - kSpace];
 export kPunct = Optimize[
    "!" | "\"" | "#" | "$" | "%" | "&" | "'" | "(" | ")" | "*" | "+" | "," |
    "-" | "." | "/" | ":" | ";" | "<" | "=" | ">" | "?" | "@" | "\[" | "\\" |
    "\]" | "^" | "_" | "`" | "{" | "|" | "}" | "~"
 ];
 export kGraph = Optimize[kAlnum | kPunct];
--- a/third_party/chinese_text_normalization/thrax/src/cn/date.grm
+++ b/third_party/chinese_text_normalization/thrax/src/cn/date.grm
@ -0,0 +1,10 @@
 import 'byte.grm' as b;
 import 'number.grm' as n;
 date_day = n.number_1_to_99 ("日"|"号");
 date_month_day = n.number_1_to_99 "月" date_day;
 date_year_month_day = ((n.number_0_to_9){2,4} | n.number) "年" date_month_day;
 date = date_year_month_day | date_month_day | date_day;
 export DATE = CDRewrite[date, "", "", b.kBytes*];
--- a/third_party/chinese_text_normalization/thrax/src/cn/hotfix.grm
+++ b/third_party/chinese_text_normalization/thrax/src/cn/hotfix.grm
@ -0,0 +1,5 @@
 import 'byte.grm' as b;
 hotfix = StringFile['hotfix.list'];
 export HOTFIX = CDRewrite[hotfix, "", "", b.kBytes*];
--- a/third_party/chinese_text_normalization/thrax/src/cn/hotfix.list
+++ b/third_party/chinese_text_normalization/thrax/src/cn/hotfix.list
@ -0,0 +1,18 @@
 0头	零头
 10字	十字
 东4环	东4环	-1.0
 东4	东四	-0.5
 4惠	四惠
 3元桥	三元桥
 4平市	四平市
 5台山	五台山
 西2旗	西二旗
 西3旗	西三旗
 4道口	四道口	-1.0
 5道口	五道口	-1.0
 6道口	六道口	-1.0
 6里桥	六里桥
 7里庄	七里庄
 8宝山	八宝山
 9棵松	九棵松
 10里堡	十里堡
--- a/third_party/chinese_text_normalization/thrax/src/cn/itn.grm
+++ b/third_party/chinese_text_normalization/thrax/src/cn/itn.grm
@ -0,0 +1,9 @@
 import 'byte.grm' as b;
 import 'number.grm' as number;
 import 'hotfix.grm' as hotfix;
 import 'percentage.grm' as percentage;
 import 'date.grm' as date;
 import 'amount.grm' as amount; # imported but not yet used in ITN below
 export ITN = Optimize[percentage.PERCENTAGE @ (date.DATE <-1>) @ number.NUMBER @ hotfix.HOTFIX];
--- a/third_party/chinese_text_normalization/thrax/src/cn/number.grm
+++ b/third_party/chinese_text_normalization/thrax/src/cn/number.grm
@ -0,0 +1,61 @@
 import 'byte.grm' as b;
 number_1_to_9 = (
  ("一":"1") | ("幺":"1") |
  ("二":"2") | ("两":"2") |
  ("三":"3") |
  ("四":"4") |
  ("五":"5") |
  ("六":"6") |
  ("七":"7") |
  ("八":"8") |
  ("九":"9") 
 );
 export number_0_to_9 = (("零":"0") | number_1_to_9);
 number_10_to_19 = (
  ("十":"10") |
  ("十一":"11") |
  ("十二":"12") |
  ("十三":"13") |
  ("十四":"14") |
  ("十五":"15") |
  ("十六":"16") |
  ("十七":"17") |
  ("十八":"18") |
  ("十九":"19") 
 );
 number_10s    = (number_1_to_9 ("十":""));
 number_100s   = (number_1_to_9 ("百":""));
 number_1000s  = (number_1_to_9 ("千":""));
 number_10000s = (number_1_to_9 ("万":""));
 number_10_to_99 = (
  ((number_10s number_1_to_9)<-0.3>) | 
  ((number_10s ("":"0"))<-0.2>) | 
  (number_10_to_19 <-0.1>)
 );
 export number_1_to_99 = (number_1_to_9 | number_10_to_99);
 number_100_to_999 = (
  ((number_100s ("零":"0") number_1_to_9)<0.0>)|
  ((number_100s number_10_to_99)<0.0>) |
  ((number_100s number_1_to_9 ("":"0"))<0.0>) |
  ((number_100s ("":"00"))<0.1>)
 );
 number_1000_to_9999 = (
  ((number_1000s number_100_to_999)<0.0>) |
  ((number_1000s ("零":"0") number_10_to_99)<0.0>)|
  ((number_1000s ("零":"00") number_1_to_9)<0.0>)|
  ((number_1000s ("":"000"))<1>) |
  ((number_1000s number_1_to_9 ("":"00"))<0.0>)
 );
 export number = number_1_to_99 | (number_100_to_999 <-1>) | (number_1000_to_9999 <-2>);
 export NUMBER = CDRewrite[number, "", "", b.kBytes*];
--- a/third_party/chinese_text_normalization/thrax/src/cn/percentage.grm
+++ b/third_party/chinese_text_normalization/thrax/src/cn/percentage.grm
@ -0,0 +1,8 @@
 import 'byte.grm' as b;
 import 'number.grm' as n;
 percentage = (
  ("百分之":"") n.number_1_to_99 ("":"%")
 );
 export PERCENTAGE = CDRewrite[percentage, "", "", b.kBytes*];
--- a/third_party/chinese_text_normalization/thrax/src/en/README.md
+++ b/third_party/chinese_text_normalization/thrax/src/en/README.md
@ -0,0 +1,6 @@
 # English covering grammar definitions
 This directory defines an English text normalization covering grammar. The
 primary entry-point is the FST `VERBALIZER`, defined in
 `verbalizer/verbalizer.grm` and compiled in the FST archive
 `verbalizer/verbalizer.far`.
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/Makefile
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/Makefile
@ -0,0 +1,3 @@
 verbalizer.far: verbalizer.grm util/util.far en/verbalizer/extra_numbers.far en/verbalizer/float.far en/verbalizer/math.far en/verbalizer/miscellaneous.far en/verbalizer/money.far en/verbalizer/numbers.far en/verbalizer/numbers_plus.far en/verbalizer/spelled.far en/verbalizer/spoken_punct.far en/verbalizer/time.far en/verbalizer/urls.far
 	thraxcompiler --input_grammar=$< --output_far=$@
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/cardinals.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/cardinals.tsv
@ -0,0 +1,32 @@
 0	zero
 1	one
 2	two
 3	three
 4	four
 5	five
 6	six
 7	seven
 8	eight
 9	nine
 10	ten
 11	eleven
 12	twelve
 13	thirteen
 14	fourteen
 15	fifteen
 16	sixteen
 17	seventeen
 18	eighteen
 19	nineteen
 20	twenty
 30	thirty
 40	forty
 50	fifty
 60	sixty
 70	seventy
 80	eighty
 90	ninety
 100	hundred
 1000	thousand
 1000000	million
 1000000000	billion
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/extra_numbers.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/extra_numbers.grm
@ -0,0 +1,35 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'en/verbalizer/numbers.grm' as n;
 digit = b.kDigit @ n.CARDINAL_NUMBERS | ("0" : "@@OTHER_ZERO_VERBALIZATIONS@@");
 export DIGITS  = digit (n.I[" "] digit)*;
 # Various common factorizations
 two_digits = b.kDigit{2} @ n.CARDINAL_NUMBERS;
 three_digits = b.kDigit{3} @ n.CARDINAL_NUMBERS;
 mixed =
   (digit n.I[" "] two_digits)
 | (two_digits n.I[" "] two_digits)
 | (two_digits n.I[" "] three_digits)
 | (two_digits n.I[" "] two_digits n.I[" "] two_digits)
 ;
 export MIXED_NUMBERS = Optimize[mixed];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/factorization.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/factorization.grm
@ -0,0 +1,40 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'util/util.grm' as u;
 import 'en/verbalizer/numbers.grm' as n;
 func ToNumberName[expr] {
  number_name_seq = n.CARDINAL_NUMBERS (" " n.CARDINAL_NUMBERS)*;
  return Optimize[expr @ number_name_seq];
 }
 d = b.kDigit;
 leading_zero = CDRewrite[n.I[" "], ("[BOS]" | " ") "0", "", b.kBytes*];
 by_ones = d n.I[" "];
 by_twos = (d{2} @ leading_zero) n.I[" "];
 by_threes = (d{3} @ leading_zero) n.I[" "];
 groupings = by_twos* (by_threes | by_twos | by_ones);
 export FRACTIONAL_PART_UNGROUPED =
  Optimize[ToNumberName[by_ones+ @ u.CLEAN_SPACES]]
 ;
 export FRACTIONAL_PART_GROUPED =
  Optimize[ToNumberName[groupings @ u.CLEAN_SPACES]]
 ;
 export FRACTIONAL_PART_UNPARSED = Optimize[ToNumberName[d*]];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/float.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/float.grm
@ -0,0 +1,30 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'en/verbalizer/factorization.grm' as f;
 import 'en/verbalizer/lexical_map.grm' as l;
 import 'en/verbalizer/numbers.grm' as n;
 fractional_part_ungrouped = f.FRACTIONAL_PART_UNGROUPED;
 fractional_part_grouped = f.FRACTIONAL_PART_GROUPED;
 fractional_part_unparsed = f.FRACTIONAL_PART_UNPARSED;
 __fractional_part__ = fractional_part_ungrouped | fractional_part_unparsed;
 __decimal_marker__ = ".";
 export FLOAT = Optimize[
 (n.CARDINAL_NUMBERS
  (__decimal_marker__ : " @@DECIMAL_DOT_EXPRESSION@@ ")
  __fractional_part__) @ l.LEXICAL_MAP]
 ;
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/g.fst
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/g.fst
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/lexical_map.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/lexical_map.grm
@ -0,0 +1,25 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 lexical_map = StringFile['en/verbalizer/lexical_map.tsv'];
 sigma_star = b.kBytes*;
 del_null = CDRewrite["__NULL__" : "", "", "", sigma_star];
 export LEXICAL_MAP = Optimize[
  CDRewrite[lexical_map, "", "", sigma_star] @ del_null]
 ;
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/lexical_map.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/lexical_map.tsv
@ -0,0 +1,74 @@
@@CONNECTOR_RANGE@@	to
@@CONNECTOR_RATIO@@	to
@@CONNECTOR_BY@@	by
@@CONNECTOR_CONSECUTIVE_YEAR@@	to
@@JANUARY@@	january
@@FEBRUARY@@	february
@@MARCH@@	march
@@APRIL@@	april
@@MAY@@	may
@@JUNE@@	june
@@JULY@@	july
@@AUGUST@@	august
@@SEPTEMBER@@	september
@@OCTOBER@@	october
@@NOVEMBER@@	november
@@DECEMBER@@	december
@@MINUS@@	minus
@@DECIMAL_DOT_EXPRESSION@@	point
@@URL_DOT_EXPRESSION@@	dot
@@DECIMAL_EXPONENT@@	to the
@@DECIMAL_EXPONENT@@	to the power of
@@COLON@@	colon
@@SLASH@@	slash
@@SLASH@@	forward slash
@@DASH@@	dash
@@PASSWORD@@	password
@@AT@@	at
@@PORT@@	port
@@QUESTION_MARK@@	question mark
@@HASH@@	hash
@@HASH@@	hash tag
@@FRACTION_OVER@@	over
@@MONEY_AND@@	and
@@AND@@	and
@@PHONE_PLUS@@	plus
@@PHONE_EXTENSION@@	extension
@@TIME_AM@@		a m
@@TIME_PM@@		p m
@@HOUR@@		o'clock
@@MINUTE@@		minute
@@MINUTE@@		minutes
@@TIME_AFTER@@		after
@@TIME_AFTER@@		past
@@TIME_BEFORE@@		to
@@TIME_BEFORE@@		till
@@TIME_QUARTER@@	quarter
@@TIME_HALF@@		half
@@TIME_ZERO@@		oh
@@TIME_THREE_QUARTER@@	three quarters
@@ARITHMETIC_PLUS@@	plus
@@ARITHMETIC_TIMES@@	times
@@ARITHMETIC_TIMES@@	multiplied by
@@ARITHMETIC_MINUS@@	minus
@@ARITHMETIC_DIVISION@@	divided by
@@ARITHMETIC_DIVISION@@	over
@@ARITHMETIC_EQUALS@@	equals
@@PERCENT@@		percent
@@DEGREE@@		degree
@@DEGREE@@		degrees
@@SQUARE_ROOT@@		square root of
@@SQUARE_ROOT@@		the square root of
@@STAR@@		star
@@HYPHEN@@		hyphen
@@AT@@			at
@@PER@@			per
@@PERIOD@@		period
@@PERIOD@@		full stop
@@PERIOD@@		dot
@@EXCLAMATION_MARK@@	exclamation mark
@@EXCLAMATION_MARK@@	exclamation point
@@COMMA@@		comma
@@POSITIVE@@		positive
@@NEGATIVE@@		negative
@@OTHER_ZERO_VERBALIZATIONS@@	oh
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/math.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/math.grm
@ -0,0 +1,34 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'en/verbalizer/float.grm' as f;
 import 'en/verbalizer/lexical_map.grm' as l;
 import 'en/verbalizer/numbers.grm' as n;
 float = f.FLOAT;
 card = n.CARDINAL_NUMBERS;
 number = card | float;
 plus = "+" : " @@ARITHMETIC_PLUS@@ ";
 times = "*" : " @@ARITHMETIC_TIMES@@ ";
 minus = "-" : " @@ARITHMETIC_MINUS@@ ";
 division = "/" : " @@ARITHMETIC_DIVISION@@ ";
 operator = plus | times | minus | division;
 percent = "%" : " @@PERCENT@@";
 export ARITHMETIC =
  Optimize[((number operator number) | (number percent)) @ l.LEXICAL_MAP]
 ;
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/miscellaneous.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/miscellaneous.grm
@ -0,0 +1,78 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'ru/classifier/cyrillic.grm' as c;
 import 'en/verbalizer/extra_numbers.grm' as e;
 import 'en/verbalizer/lexical_map.grm' as l;
 import 'en/verbalizer/numbers.grm' as n;
 import 'en/verbalizer/spelled.grm' as s;
 letter = b.kAlpha | c.kCyrillicAlpha;
 dash   = "-";
 word = letter+;
 possibly_split_word = word (((dash | ".") : " ") word)* n.D["."]?;
 post_word_symbol =
   ("+" : ("@@ARITHMETIC_PLUS@@" | "@@POSITIVE@@")) |
   ("-" : ("@@ARITHMETIC_MINUS@@" | "@@NEGATIVE@@")) |
   ("*" : "@@STAR@@")
 ;
 pre_word_symbol =
   ("@" : "@@AT@@") |
   ("/" : "@@SLASH@@") |
   ("#" : "@@HASH@@")
 ;
 post_word = possibly_split_word n.I[" "] post_word_symbol;
 pre_word = pre_word_symbol n.I[" "] possibly_split_word;
 ## Number/digit sequence combos, maybe with a dash
 spelled_word = word @ s.SPELLED_NO_LETTER;
 word_number =
  (word | spelled_word)
  (n.I[" "] | (dash : " "))
  (e.DIGITS | n.CARDINAL_NUMBERS | e.MIXED_NUMBERS)
 ;
 number_word =
  (e.DIGITS | n.CARDINAL_NUMBERS | e.MIXED_NUMBERS)
  (n.I[" "] | (dash : " "))
  (word | spelled_word)
 ;
 ## Two-digit year.
 # Note that in this case to be fair we really have to allow ordinals too since
 # in some languages that's what you would have.
 two_digit_year = n.D["'"] (b.kDigit{2} @ (n.CARDINAL_NUMBERS | e.DIGITS));
 dot_com = ("." : "@@URL_DOT_EXPRESSION@@") n.I[" "] "com";
 miscellaneous = Optimize[
    possibly_split_word
  | post_word
  | pre_word
  | word_number
  | number_word
  | two_digit_year
  | dot_com
 ];
 export MISCELLANEOUS = Optimize[miscellaneous @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/money.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/money.grm
@ -0,0 +1,44 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'en/verbalizer/lexical_map.grm' as l;
 import 'en/verbalizer/numbers.grm' as n;
 card = n.CARDINAL_NUMBERS;
 __currency__ = StringFile['en/verbalizer/money.tsv'];
 d = b.kDigit;
 D = d - "0";
 cents = ((n.D["0"] | D) d) @ card;
 # Only dollar for the verbalizer tests for English. Will need to add other
 # currencies.
 usd_maj = Project["usd_maj" @ __currency__, 'output'];
 usd_min = Project["usd_min" @ __currency__, 'output'];
 and = " @@MONEY_AND@@ " | " ";
 dollar1 =
  n.D["$"] card n.I[" " usd_maj] n.I[and] n.D["."] cents n.I[" " usd_min]
 ;
 dollar2 = n.D["$"] card n.I[" " usd_maj] n.D["."] n.D["00"];
 dollar3 = n.D["$"] card n.I[" " usd_maj];
 dollar = Optimize[dollar1 | dollar2 | dollar3];
 export MONEY = Optimize[dollar @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/money.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/money.tsv
@ -0,0 +1,4 @@
 usd_maj	dollar
 usd_maj	dollars
 usd_min	cent
 usd_min	cents
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/number_names.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/number_names.grm
@ -0,0 +1,54 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # English minimally supervised number grammar.
 #
 # Supports both cardinals and ordinals without overt marking.
 #
 # The language-specific acceptor G was compiled with digit, teen, and decade
 # preterminals. The lexicon transducer L is unambiguous so no LM is used.
 import 'util/arithmetic.grm' as a;
 # Intersects the universal factorization transducer (F) with the
 # language-specific acceptor (G).
 d = a.DELTA_STAR;
 f = a.IARITHMETIC_RESTRICTED;
 g = LoadFst['en/verbalizer/g.fst'];
 fg = Optimize[d @ Optimize[f @ Optimize[f @ Optimize[f @ g]]]];
 test1 = AssertEqual["230" @ fg, "(+ (* 2 100 *) 30 +)"];
 # Compiles lexicon transducer (L).
 cardinal_name = StringFile['en/verbalizer/cardinals.tsv'];
 cardinal_l = Optimize[(cardinal_name " ")* cardinal_name];
 test2 = AssertEqual["2 100 30" @ cardinal_l, "two hundred thirty"];
 ordinal_name = StringFile['en/verbalizer/ordinals.tsv'];
 # In English, ordinals have the same syntax as cardinals and all but the final
 # element is verbalized using a cardinal number word; e.g., "two hundred
 # thirtieth".
 ordinal_l = Optimize[(cardinal_name " ")* ordinal_name];
 test3 = AssertEqual["2 100 30" @ ordinal_l, "two hundred thirtieth"];
 # Composes L with the leaf transducer (P), then composes that with FG.
 p = a.LEAVES;
 export CARDINAL_NUMBER_NAME = Optimize[fg @ (p @ cardinal_l)];
 test4 = AssertEqual["230" @ CARDINAL_NUMBER_NAME, "two hundred thirty"];
 export ORDINAL_NUMBER_NAME = Optimize[fg @ (p @ ordinal_l)];
 test5 = AssertEqual["230" @ ORDINAL_NUMBER_NAME, "two hundred thirtieth"];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/numbers.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/numbers.grm
@ -0,0 +1,57 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'en/verbalizer/number_names.grm' as n;
 import 'util/byte.grm' as bytelib;
 import 'universal/thousands_punct.grm' as t;
 cardinal = n.CARDINAL_NUMBER_NAME;
 ordinal = n.ORDINAL_NUMBER_NAME;
 # Putting these here since this grammar gets incorporated by all the others.
 func I[expr] {
  return "" : expr;
 }
 func D[expr] {
  return expr : "";
 }
 separators = t.comma_thousands | t.no_delimiter;
 # Language specific endings for ordinals.
 d = bytelib.kDigit;
 endings = "st" | "nd" | "rd" | "th";
 st = (d* "1") - (d* "11");
 nd = (d* "2") - (d* "12");
 rd = (d* "3") - (d* "13");
 th = Optimize[d* - st - nd - rd];
 first = st ("st" : "");
 second = nd ("nd" : "");
 third = rd ("rd" : "");
 other = th ("th" : "");
 marked_ordinal = Optimize[first | second | third | other];
 # The separator is a no-op here but will be needed once we replace
 # the above targets.
 export CARDINAL_NUMBERS = Optimize[separators @ cardinal];
 export ORDINAL_NUMBERS =
  Optimize[(separators endings) @ marked_ordinal @ ordinal]
 ;
 export ORDINAL_NUMBERS_UNMARKED = Optimize[separators @ ordinal];
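Review note: the `st`/`nd`/`rd`/`th` sets above encode the usual English rule that 11, 12, and 13 take "th". A minimal Python sketch of the same rule follows, for illustration only. The function names are assumptions, and unlike `d*` in the grammar the sketch requires at least one digit.

```python
def ordinal_suffix(digits: str) -> str:
    """Return the English ordinal ending for a non-empty ASCII digit string."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected ASCII digits, got {digits!r}")
    # Mirrors st = (d* "1") - (d* "11"), and likewise for nd and rd.
    for last, teen, suffix in (("1", "11", "st"), ("2", "12", "nd"), ("3", "13", "rd")):
        if digits.endswith(last) and not digits.endswith(teen):
            return suffix
    # Everything else falls into the th set.
    return "th"


def strip_ordinal_marker(token: str):
    """Return the bare digits if the token ends in the matching suffix, else None."""
    digits, suffix = token[:-2], token[-2:]
    if digits and digits.isascii() and digits.isdigit():
        if ordinal_suffix(digits) == suffix:
            return digits
    return None


if __name__ == "__main__":
    for token in ("21st", "11st", "112th", "103rd"):
        print(token, strip_ordinal_marker(token))
```

As in `marked_ordinal`, a mismatched ending such as "11st" is rejected rather than repaired.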
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/numbers_plus.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/numbers_plus.grm
@ -0,0 +1,133 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # Grammar for things built mostly on numbers.
 import 'en/verbalizer/factorization.grm' as f;
 import 'en/verbalizer/lexical_map.grm' as l;
 import 'en/verbalizer/numbers.grm' as n;
 num = n.CARDINAL_NUMBERS;
 ord = n.ORDINAL_NUMBERS_UNMARKED;
 digits = f.FRACTIONAL_PART_UNGROUPED;
 # Various symbols.
 plus = "+" : "@@ARITHMETIC_PLUS@@";
 minus = "-" : "@@ARITHMETIC_MINUS@@";
 slash = "/" : "@@SLASH@@";
 dot = "." : "@@URL_DOT_EXPRESSION@@";
 dash = "-" : "@@DASH@@";
 equals = "=" : "@@ARITHMETIC_EQUALS@@";
 degree = "°" : "@@DEGREE@@";
 division = ("/" | "÷") : "@@ARITHMETIC_DIVISION@@";
 times = ("x" | "*") : "@@ARITHMETIC_TIMES@@";
 power = "^" : "@@DECIMAL_EXPONENT@@";
 square_root = "√" : "@@SQUARE_ROOT@@";
 percent = "%" : "@@PERCENT@@";
 # Safe roman numbers.
 # NB: Do not change the formatting here. NO_EDIT must be on the same
 # line as the path.
 rfile = 
  'universal/roman_numerals.tsv' # NO_EDIT
 ;
 roman = StringFile[rfile];
 ## Main categories.
 cat_dot_number =
   num
   n.I[" "] dot n.I[" "] num
   (n.I[" "] dot n.I[" "] num)+
 ;
 cat_slash_number =
   num
   n.I[" "] slash n.I[" "] num
   (n.I[" "] slash n.I[" "] num)*
 ;
 cat_dash_number =
   num
   n.I[" "] dash n.I[" "] num
   (n.I[" "] dash n.I[" "] num)*
 ;
 cat_signed_number = ((plus | minus) n.I[" "])? num;
 cat_degree = cat_signed_number n.I[" "] degree;
 cat_country_code = plus n.I[" "] (num | digits);
 cat_math_operations =
     plus
   | minus
   | division
   | times
   | equals
   | percent
   | power
   | square_root
 ;
 # Roman numbers are often either cardinals or ordinals in various languages.
 cat_roman = roman @ (num | ord);
 # Allow
 #
 # number:number
 # number-number
 #
 # to just be
 #
 # number number.
 cat_number_number =
   num ((":" | "-") : " ") num
 ;
 # Some additional readings for these symbols.
 cat_additional_readings =
  ("/" : "@@PER@@") |
  ("+" : "@@AND@@") |
  ("-" : ("@@HYPHEN@@" | "@@CONNECTOR_TO@@")) |
  ("*" : "@@STAR@@") |
  ("x" : ("x" | "@@CONNECTOR_BY@@")) |
  ("@" : "@@AT@@")
 ;
 numbers_plus = Optimize[
   cat_dot_number
 | cat_slash_number
 | cat_dash_number
 | cat_signed_number
 | cat_degree
 | cat_country_code
 | cat_math_operations
 | cat_roman
 | cat_number_number
 | cat_additional_readings
 ];
 export NUMBERS_PLUS = Optimize[numbers_plus @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/ordinals.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/ordinals.tsv
@ -0,0 +1,32 @@
 0	zeroth
 1	first
 2	second
 3	third
 4	fourth
 5	fifth
 6	sixth
 7	seventh
 8	eighth
 9	ninth
 10	tenth
 11	eleventh
 12	twelfth
 13	thirteenth
 14	fourteenth
 15	fifteenth
 16	sixteenth
 17	seventeenth
 18	eighteenth
 19	nineteenth
 20	twentieth
 30	thirtieth
 40	fortieth
 50	fiftieth
 60	sixtieth
 70	seventieth
 80	eightieth
 90	ninetieth
 100	hundredth
 1000	thousandth
 1000000	millionth
 1000000000	billionth
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/params.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/params.tsv
@ -0,0 +1,7 @@
 float.grm	__fractional_part__ = fractional_part_ungrouped | fractional_part_unparsed;
 telephone.grm	__grouping__ = f.UNGROUPED;
 measure.grm	__measure__ = StringFile['en/verbalizer/measures.tsv'];
 money.grm	__currency__ = StringFile['en/verbalizer/money.tsv'];
 time.grm	__sep__ = ":";
 time.grm	__am__ = "a.m." | "am" | "AM";
 time.grm	__pm__ = "p.m." | "pm" | "PM";
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/podspeech.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/podspeech.grm
@ -0,0 +1,46 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/util.grm' as util;
 import 'util/case.grm' as case;
 import 'en/verbalizer/extra_numbers.grm' as e;
 import 'en/verbalizer/float.grm' as f;
 import 'en/verbalizer/math.grm' as ma;
 import 'en/verbalizer/miscellaneous.grm' as mi;
 import 'en/verbalizer/money.grm' as mo;
 import 'en/verbalizer/numbers.grm' as n;
 import 'en/verbalizer/numbers_plus.grm' as np;
 import 'en/verbalizer/spelled.grm' as s;
 import 'en/verbalizer/spoken_punct.grm' as sp;
 import 'en/verbalizer/time.grm' as t;
 import 'en/verbalizer/urls.grm' as u;
 export POD_SPEECH_TN = Optimize[RmWeight[
 (u.URL 
  | e.MIXED_NUMBERS
  | e.DIGITS
  | f.FLOAT
  | ma.ARITHMETIC
  | mo.MONEY
  | n.CARDINAL_NUMBERS
  | n.ORDINAL_NUMBERS
  | np.NUMBERS_PLUS
  | s.SPELLED
  | sp.SPOKEN_PUNCT
  | t.TIME
  | u.EMAILS) @ util.CLEAN_SPACES @ case.TOUPPER
 ]];
 #export POD_SPEECH_TN = Optimize[RmWeight[(mi.MISCELLANEOUS) @ util.CLEAN_SPACES @ case.TOUPPER]];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/spelled.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/spelled.grm
@ -0,0 +1,77 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # This verbalizer is used whenever there is an LM symbol that consists of
 # letters immediately followed by "{spelled}". This strips the "{spelled}"
 # suffix.
 import 'util/byte.grm' as b;
 import 'ru/classifier/cyrillic.grm' as c;
 import 'en/verbalizer/lexical_map.grm' as l;
 import 'en/verbalizer/numbers.grm' as n;
 digit = b.kDigit @ n.CARDINAL_NUMBERS;
 char_set = (("a" | "A") : "letter-a")
        | (("b" | "B") : "letter-b")
        | (("c" | "C") : "letter-c")
        | (("d" | "D") : "letter-d")
        | (("e" | "E") : "letter-e")
        | (("f" | "F") : "letter-f")
        | (("g" | "G") : "letter-g")
        | (("h" | "H") : "letter-h")
        | (("i" | "I") : "letter-i")
        | (("j" | "J") : "letter-j")
        | (("k" | "K") : "letter-k")
        | (("l" | "L") : "letter-l")
        | (("m" | "M") : "letter-m")
        | (("n" | "N") : "letter-n")
        | (("o" | "O") : "letter-o")
        | (("p" | "P") : "letter-p")
        | (("q" | "Q") : "letter-q")
        | (("r" | "R") : "letter-r")
        | (("s" | "S") : "letter-s")
        | (("t" | "T") : "letter-t")
        | (("u" | "U") : "letter-u")
        | (("v" | "V") : "letter-v")
        | (("w" | "W") : "letter-w")
        | (("x" | "X") : "letter-x")
        | (("y" | "Y") : "letter-y")
        | (("z" | "Z") : "letter-z")
        | (digit)
        | ("&" : "@@AND@@")
        | ("." : "")
        | ("-" : "")
        | ("_" : "")
        | ("/" : "")
        | (n.I["letter-"] c.kCyrillicAlpha)
        ;
 ins_space = "" : " ";
 suffix = "{spelled}" : "";
 spelled = Optimize[char_set (ins_space char_set)* suffix];
 export SPELLED = Optimize[spelled @ l.LEXICAL_MAP];
 sigma_star = b.kBytes*;
 # Gets rid of the letter- prefix since in some cases we don't want it.
 del_letter = CDRewrite[n.D["letter-"], "", "", sigma_star];
 spelled_no_tag = Optimize[char_set (ins_space char_set)*];
 export SPELLED_NO_LETTER = Optimize[spelled_no_tag @ del_letter];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/spoken_punct.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/spoken_punct.grm
@ -0,0 +1,24 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'en/verbalizer/lexical_map.grm' as l;
 punct =
   ("." : "@@PERIOD@@")
 | ("," : "@@COMMA@@")
 | ("!" : "@@EXCLAMATION_MARK@@")
 | ("?" : "@@QUESTION_MARK@@")
 ;
 export SPOKEN_PUNCT = Optimize[punct @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/time.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/time.grm
@ -0,0 +1,108 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'en/verbalizer/lexical_map.grm' as l;
 import 'en/verbalizer/numbers.grm' as n;
 # Only handles 24-hour time with quarter-to, half-past and quarter-past.
 increment_hour =
    ("0" : "1")
  | ("1" : "2")
  | ("2" : "3")
  | ("3" : "4")
  | ("4" : "5")
  | ("5" : "6")
  | ("6" : "7")
  | ("7" : "8")
  | ("8" : "9")
  | ("9" : "10")
  | ("10" : "11")
  | ("11" : "12")
  | ("12" : "1")  # If someone uses 12, we assume 12-hour by default.
  | ("13" : "14")
  | ("14" : "15")
  | ("15" : "16")
  | ("16" : "17")
  | ("17" : "18")
  | ("18" : "19")
  | ("19" : "20")
  | ("20" : "21")
  | ("21" : "22")
  | ("22" : "23")
  | ("23" : "12")
 ;
 hours = Project[increment_hour, 'input'];
 d = b.kDigit;
 D = d - "0";
 minutes09 = "0" D;
 minutes = ("1" | "2" | "3" | "4" | "5") d;
 __sep__ = ":";
 sep_space = __sep__ : " ";
 verbalize_hours = hours @ n.CARDINAL_NUMBERS;
 verbalize_minutes =
   ("00" : "@@HOUR@@")
 | (minutes09 @ (("0" : "@@TIME_ZERO@@") n.I[" "] n.CARDINAL_NUMBERS))
 | (minutes @ n.CARDINAL_NUMBERS)
 ;
 time_basic = Optimize[verbalize_hours sep_space verbalize_minutes];
 # Special cases we handle right now.
 # TODO: Need to allow for cases like
 #
 #   half twelve (in the UK English sense)
 #   half twaalf (in the Dutch sense)
 time_quarter_past =
   n.I["@@TIME_QUARTER@@ @@TIME_AFTER@@ "]
   verbalize_hours
   n.D[__sep__ "15"];
 time_half_past =
   n.I["@@TIME_HALF@@ @@TIME_AFTER@@ "]
   verbalize_hours
   n.D[__sep__ "30"];
 time_quarter_to =
   n.I["@@TIME_QUARTER@@ @@TIME_BEFORE@@ "]
   (increment_hour @ verbalize_hours)
   n.D[__sep__ "45"];
 time_extra = Optimize[
  time_quarter_past | time_half_past | time_quarter_to]
 ;
 # Basic time periods which most languages can be expected to have.
 __am__ = "a.m." | "am" | "AM";
 __pm__ = "p.m." | "pm" | "PM";
 period = (__am__ : "@@TIME_AM@@") | (__pm__ : "@@TIME_PM@@");
 time_variants = time_basic | time_extra;
 time = Optimize[
    (period (" " | n.I[" "]))? time_variants
 |  time_variants ((" " | n.I[" "]) period)?]
 ;
 export TIME = Optimize[time @ l.LEXICAL_MAP];
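Review note: the quarter-past, half-past, and quarter-to readings can be sketched in Python as below, for illustration only. The table copies `increment_hour` exactly, including the 12 to 1 and 23 to 12 entries. `special_reading` is an assumed name, and hours are left as digits here, whereas the grammar goes on to verbalize them through `n.CARDINAL_NUMBERS` and the lexical map.

```python
# Copy of increment_hour: 0->1 ... 11->12, 12->1, 13->14 ... 22->23, 23->12.
INCREMENT_HOUR = {str(hour): str(hour + 1) for hour in range(0, 23)}
INCREMENT_HOUR["12"] = "1"
INCREMENT_HOUR["23"] = "12"


def special_reading(time_str: str):
    """Return the tagged special reading for "H:15", "H:30", or "H:45", else None."""
    hour, sep, minute = time_str.partition(":")
    if sep != ":" or hour not in INCREMENT_HOUR:
        return None
    if minute == "15":
        return f"@@TIME_QUARTER@@ @@TIME_AFTER@@ {hour}"
    if minute == "30":
        return f"@@TIME_HALF@@ @@TIME_AFTER@@ {hour}"
    if minute == "45":
        # "Quarter to" refers to the following hour.
        return f"@@TIME_QUARTER@@ @@TIME_BEFORE@@ {INCREMENT_HOUR[hour]}"
    return None


if __name__ == "__main__":
    for example in ("9:45", "12:45", "7:30", "7:20"):
        print(example, special_reading(example))
```

Other minute values return None in the sketch; in the grammar they are covered by `time_basic`.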
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/urls.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/urls.grm
@ -0,0 +1,68 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Rules for URLs and email addresses.
 import 'util/byte.grm' as bytelib;
 import 'en/verbalizer/lexical_map.grm' as l;
 ins_space = "" : " ";
 dot = "." : "@@URL_DOT_EXPRESSION@@";
 at = "@" : "@@AT@@";
 url_suffix =
  (".com" : dot ins_space "com") |
  (".gov" : dot ins_space "gov") |
  (".edu" : dot ins_space "e d u") |
  (".org" : dot ins_space "org") |
  (".net" : dot ins_space "net")
 ;
 letter_string = (bytelib.kAlnum)* bytelib.kAlnum;
 letter_string_dot =
  ((letter_string ins_space dot ins_space)* letter_string)
 ;
 # Rules for URLs.
 export URL = Optimize[
 ((letter_string_dot) (ins_space)
  (url_suffix)) @ l.LEXICAL_MAP
 ];
 # Rules for email addresses.
 letter_by_letter = ((bytelib.kAlnum ins_space)* bytelib.kAlnum);
 letter_by_letter_dot =
  ((letter_by_letter ins_space dot ins_space)*
  letter_by_letter)
 ;
 export EMAIL1 = Optimize[
 ((letter_by_letter) (ins_space)
  (at) (ins_space)
  (letter_by_letter_dot) (ins_space)
  (url_suffix)) @ l.LEXICAL_MAP
 ];
 export EMAIL2 = Optimize[
 ((letter_by_letter) (ins_space)
  (at) (ins_space)
  (letter_string_dot) (ins_space)
  (url_suffix)) @ l.LEXICAL_MAP
 ];
 export EMAILS = Optimize[
  EMAIL1 | EMAIL2
 ];
--- a/third_party/chinese_text_normalization/thrax/src/en/verbalizer/verbalizer.grm
+++ b/third_party/chinese_text_normalization/thrax/src/en/verbalizer/verbalizer.grm
@ -0,0 +1,42 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/util.grm' as util;
 import 'en/verbalizer/extra_numbers.grm' as e;
 import 'en/verbalizer/float.grm' as f;
 import 'en/verbalizer/math.grm' as ma;
 import 'en/verbalizer/miscellaneous.grm' as mi;
 import 'en/verbalizer/money.grm' as mo;
 import 'en/verbalizer/numbers.grm' as n;
 import 'en/verbalizer/numbers_plus.grm' as np;
 import 'en/verbalizer/spelled.grm' as s;
 import 'en/verbalizer/spoken_punct.grm' as sp;
 import 'en/verbalizer/time.grm' as t;
 import 'en/verbalizer/urls.grm' as u;
 export VERBALIZER = Optimize[RmWeight[
 (  e.MIXED_NUMBERS
  | e.DIGITS
  | f.FLOAT
  | ma.ARITHMETIC
  | mi.MISCELLANEOUS
  | mo.MONEY
  | n.CARDINAL_NUMBERS
  | n.ORDINAL_NUMBERS
  | np.NUMBERS_PLUS
  | s.SPELLED
  | sp.SPOKEN_PUNCT
  | t.TIME
  | u.URL) @ util.CLEAN_SPACES
 ]];
--- a/third_party/chinese_text_normalization/thrax/src/number_data/README.md
+++ b/third_party/chinese_text_normalization/thrax/src/number_data/README.md
@ -0,0 +1,17 @@
 This directory contains data used in:
  Gorman, K., and Sproat, R. 2016. Minimally supervised number normalization.
  Transactions of the Association for Computational Linguistics 4: 507-519.
 * `minimal.txt`: A list of 300 curated numbers used as the "minimal" training
  set.
 * `random-trn.txt`: A list of 9000 randomly-generated numbers used as the
  "medium" training set.
 * `random-tst.txt`: A list of 1000 randomly-generated numbers used as the test
  set.
 Note that `random-trn.txt` and `random-tst.txt` are totally disjoint, but that
 a small number of examples occur both in `minimal.txt` and `random-tst.txt`.
 For information about the sampling procedure used to generate the random data
 sets, see appendix A of the aforementioned paper.
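Review note: the disjointness statement above is easy to check after checkout. The Python sketch below is only an illustration. `load_numbers` and `shared_examples` are assumed helper names, and the files are assumed to hold one number per line.

```python
from typing import Iterable, List, Set


def load_numbers(lines: Iterable[str]) -> Set[str]:
    """Collect the non-empty, whitespace-stripped lines into a set."""
    return {line.strip() for line in lines if line.strip()}


def shared_examples(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Return the numbers present in both inputs, sorted numerically."""
    return sorted(load_numbers(first) & load_numbers(second), key=int)


if __name__ == "__main__":
    # In-memory stand-ins; open file objects can be passed in the same way.
    train = ["0\n", "10\n", "230\n"]
    test = ["230\n", "7\n", "10\n"]
    print(shared_examples(train, test))
```

Passing open handles for `random-trn.txt` and `random-tst.txt` should return an empty list if the two sets are disjoint as described.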
--- a/third_party/chinese_text_normalization/thrax/src/number_data/minimal.txt
+++ b/third_party/chinese_text_normalization/thrax/src/number_data/minimal.txt
@ -0,0 +1,300 @@
 0
 1
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
 100
 101
 102
 103
 104
 105
 106
 107
 108
 109
 110
 111
 112
 113
 114
 115
 116
 117
 118
 119
 120
 121
 122
 123
 124
 125
 126
 127
 128
 129
 130
 131
 132
 133
 134
 135
 136
 137
 138
 139
 140
 141
 142
 143
 144
 145
 146
 147
 148
 149
 150
 151
 152
 153
 154
 155
 156
 157
 158
 159
 160
 161
 162
 163
 164
 165
 166
 167
 168
 169
 170
 171
 172
 173
 174
 175
 176
 177
 178
 179
 180
 181
 182
 183
 184
 185
 186
 187
 188
 189
 190
 191
 192
 193
 194
 195
 196
 197
 198
 199
 200
 201
 202
 203
 204
 205
 206
 207
 208
 209
 210
 211
 212
 220
 221
 230
 300
 400
 500
 600
 700
 800
 900
 1000
 1001
 1002
 1003
 1004
 1005
 1006
 1007
 1008
 1009
 1010
 1011
 1012
 1020
 1021
 1030
 1200
 2000
 2001
 2002
 2003
 2004
 2005
 2006
 2007
 2008
 2009
 2010
 2011
 2012
 2020
 2021
 2030
 2100
 2200
 5001
 10000
 12000
 20000
 21000
 50001
 100000
 120000
 200000
 210000
 500001
 1000000
 1001000
 1200000
 2000000
 2100000
 5000001
 10000000
 10001000
 12000000
 20000000
 50000001
 100000000
 100001000
 120000000
 200000000
 500000001
 1000000000
 1000001000
 1200000000
 2000000000
 5000000001
 10000000000
 10000001000
 12000000000
 20000000000
 50000000001
 100000000000
 100000001000
 120000000000
 200000000000
 500000000001
--- a/third_party/chinese_text_normalization/thrax/src/number_data/random-trn.txt
+++ b/third_party/chinese_text_normalization/thrax/src/number_data/random-trn.txt
--- a/third_party/chinese_text_normalization/thrax/src/number_data/random-tst.txt
+++ b/third_party/chinese_text_normalization/thrax/src/number_data/random-tst.txt
--- a/third_party/chinese_text_normalization/thrax/src/ru/README.md
+++ b/third_party/chinese_text_normalization/thrax/src/ru/README.md
@ -0,0 +1,6 @@
 # Russian covering grammar definitions
 This directory defines a Russian text normalization covering grammar. The
 primary entry-point is the FST `VERBALIZER`, defined in
 `verbalizer/verbalizer.grm` and compiled in the FST archive
 `verbalizer/verbalizer.far`.
--- a/third_party/chinese_text_normalization/thrax/src/ru/classifier/cyrillic.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/classifier/cyrillic.grm
@ -0,0 +1,58 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 export kRussianLowerAlpha = Optimize[
    "а" | "б" | "в" | "г" | "д" | "е" | "ё" | "ж" | "з" | "и" | "й" |
    "к" | "л" | "м" | "н" | "о" | "п" | "р" | "с" | "т" | "у" | "ф" |
    "х" | "ц" | "ч" | "ш" | "щ" | "ъ" | "ы" | "ь" | "э" | "ю" | "я" ];
 export kRussianUpperAlpha = Optimize[
    "А" | "Б" | "В" | "Г" | "Д" | "Е" | "Ё" | "Ж" | "З" | "И" | "Й" |
    "К" | "Л" | "М" | "Н" | "О" | "П" | "Р" | "С" | "Т" | "У" | "Ф" |
    "Х" | "Ц" | "Ч" | "Ш" | "Щ" | "Ъ" | "Ы" | "Ь" | "Э" | "Ю" | "Я" ];
 export kRussianLowerAlphaStressed = Optimize[
    "а́" | "е́" | "ё́" | "и́" | "о́" | "у́" | "ы́" | "э́" | "ю́" | "я́" ];
 export kRussianUpperAlphaStressed = Optimize[
    "А́" | "Е́" | "Ё́" | "И́" | "О́" | "У́" | "Ы́" | "Э́" | "Ю́" | "Я́" ];
 export kRussianRewriteStress = Optimize[
    ("А́" : "А'") | ("Е́" : "Е'") | ("Ё́" : "Ё'") | ("И́" : "И'") |
    ("О́" : "О'") | ("У́" : "У'") | ("Ы́" : "Ы'") | ("Э́" : "Э'") |
    ("Ю́" : "Ю'") | ("Я́" : "Я'") |
    ("а́" : "а'") | ("е́" : "е'") | ("ё́" : "ё'") | ("и́" : "и'") |
    ("о́" : "о'") | ("у́" : "у'") | ("ы́" : "ы'") | ("э́" : "э'") |
    ("ю́" : "ю'") | ("я́" : "я'")
 ];
 export kRussianRemoveStress = Optimize[
    ("А́" : "А") | ("Е́" : "Е") | ("Ё́" : "Ё") | ("И́" : "И") | ("О́" : "О") |
    ("У́" : "У") | ("Ы́" : "Ы") | ("Э́" : "Э") | ("Ю́" : "Ю") | ("Я́" : "Я") |
    ("а́" : "а") | ("е́" : "е") | ("ё́" : "ё") | ("и́" : "и") | ("о́" : "о") |
    ("у́" : "у") | ("ы́" : "ы") | ("э́" : "э") | ("ю́" : "ю") | ("я́" : "я")
 ];
 # Pre-reform characters, just in case.
 export kRussianPreReform = Optimize[
    "ѣ" | "Ѣ"   # http://en.wikipedia.org/wiki/Yat
 ];
 export kCyrillicAlphaStressed = Optimize[
  kRussianLowerAlphaStressed | kRussianUpperAlphaStressed
 ];
 export kCyrillicAlpha = Optimize[
    kRussianLowerAlpha | kRussianUpperAlpha | kRussianPreReform
 ];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/cardinals-lex.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/cardinals-lex.grm
@ -0,0 +1,338 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # AUTOMATICALLY GENERATED: DO NOT EDIT.
 import 'util/byte.grm' as b;
 # Utilities for insertion and deletion.
 func I[expr] {
  return "" : expr;
 }
 func D[expr] {
  return expr : "";
 }
 # Powers of base 10.
 export POWERS =
    "[E15]"
  | "[E14]"
  | "[E13]"
  | "[E12]"
  | "[E11]"
  | "[E10]"
  | "[E9]"
  | "[E8]"
  | "[E7]"
  | "[E6]"
  | "[E5]"
  | "[E4]"
  | "[E3]"
  | "[E2]"
  | "[E1]"
 ;
 export SIGMA = b.kBytes | POWERS;
 export SIGMA_STAR = SIGMA*;
 export SIGMA_PLUS = SIGMA+;
 ################################################################################
 # BEGIN LANGUAGE SPECIFIC DATA
 revaluations =
    ("[E4]" : "[E1]")
  | ("[E5]" : "[E2]")
  | ("[E7]" : "[E1]")
  | ("[E8]" : "[E2]")
 ;
 Ms = "[E3]" | "[E6]" | "[E9]";
 func Zero[expr] {
  return expr : ("");
 }
 space = " ";
 lexset3 = Optimize[
    ("1[E1]+1" : "одиннадцати")
  | ("1[E1]+1" : "одиннадцать")
  | ("1[E1]+1" : "одиннадцатью")
  | ("1[E1]+2" : "двенадцати")
  | ("1[E1]+2" : "двенадцать")
  | ("1[E1]+2" : "двенадцатью")
  | ("1[E1]+3" : "тринадцати")
  | ("1[E1]+3" : "тринадцать")
  | ("1[E1]+3" : "тринадцатью")
  | ("1[E1]+4" : "четырнадцати")
  | ("1[E1]+4" : "четырнадцать")
  | ("1[E1]+4" : "четырнадцатью")
  | ("1[E1]+5" : "пятнадцати")
  | ("1[E1]+5" : "пятнадцать")
  | ("1[E1]+5" : "пятнадцатью")
  | ("1[E1]+6" : "шестнадцати")
  | ("1[E1]+6" : "шестнадцать")
  | ("1[E1]+6" : "шестнадцатью")
  | ("1[E1]+7" : "семнадцати")
  | ("1[E1]+7" : "семнадцать")
  | ("1[E1]+7" : "семнадцатью")
  | ("1[E1]+8" : "восемнадцати")
  | ("1[E1]+8" : "восемнадцать")
  | ("1[E1]+8" : "восемнадцатью")
  | ("1[E1]+9" : "девятнадцати")
  | ("1[E1]+9" : "девятнадцать")
  | ("1[E1]+9" : "девятнадцатью")]
 ;
 lex3 = CDRewrite[lexset3 I[space], "", "", SIGMA_STAR];
 lexset2 = Optimize[
    ("1[E1]" : "десяти")
  | ("1[E1]" : "десять")
  | ("1[E1]" : "десятью")
  | ("1[E2]" : "ста")
  | ("1[E2]" : "сто")
  | ("2[E1]" : "двадцати")
  | ("2[E1]" : "двадцать")
  | ("2[E1]" : "двадцатью")
  | ("2[E2]" : "двести")
  | ("2[E2]" : "двумстам")
  | ("2[E2]" : "двумястами")
  | ("2[E2]" : "двухсот")
  | ("2[E2]" : "двухстах")
  | ("3[E1]" : "тридцати")
  | ("3[E1]" : "тридцать")
  | ("3[E1]" : "тридцатью")
  | ("3[E2]" : "тремстам")
  | ("3[E2]" : "тремястами")
  | ("3[E2]" : "трехсот")
  | ("3[E2]" : "трехстах")
  | ("3[E2]" : "триста")
  | ("4[E1]" : "сорок")
  | ("4[E1]" : "сорока")
  | ("4[E2]" : "четыремстам")
  | ("4[E2]" : "четыреста")
  | ("4[E2]" : "четырехсот")
  | ("4[E2]" : "четырехстах")
  | ("4[E2]" : "четырьмястами")
  | ("5[E1]" : "пятидесяти")
  | ("5[E1]" : "пятьдесят")
  | ("5[E1]" : "пятьюдесятью")
  | ("5[E2]" : "пятисот")
  | ("5[E2]" : "пятистам")
  | ("5[E2]" : "пятистах")
  | ("5[E2]" : "пятьсот")
  | ("5[E2]" : "пятьюстами")
  | ("6[E1]" : "шестидесяти")
  | ("6[E1]" : "шестьдесят")
  | ("6[E1]" : "шестьюдесятью")
  | ("6[E2]" : "шестисот")
  | ("6[E2]" : "шестистам")
  | ("6[E2]" : "шестистах")
  | ("6[E2]" : "шестьсот")
  | ("6[E2]" : "шестьюстами")
  | ("7[E1]" : "семидесяти")
  | ("7[E1]" : "семьдесят")
  | ("7[E1]" : "семьюдесятью")
  | ("7[E2]" : "семисот")
  | ("7[E2]" : "семистам")
  | ("7[E2]" : "семистах")
  | ("7[E2]" : "семьсот")
  | ("7[E2]" : "семьюстами")
  | ("8[E1]" : "восемьдесят")
  | ("8[E1]" : "восьмидесяти")
  | ("8[E1]" : "восьмьюдесятью")
  | ("8[E2]" : "восемьсот")
  | ("8[E2]" : "восемьюстами")
  | ("8[E2]" : "восьмисот")
  | ("8[E2]" : "восьмистам")
  | ("8[E2]" : "восьмистах")
  | ("8[E2]" : "восьмьюстами")
  | ("9[E1]" : "девяноста")
  | ("9[E1]" : "девяносто")
  | ("9[E2]" : "девятисот")
  | ("9[E2]" : "девятистам")
  | ("9[E2]" : "девятистах")
  | ("9[E2]" : "девятьсот")
  | ("9[E2]" : "девятьюстами")]
 ;
 lex2 = CDRewrite[lexset2 I[space], "", "", SIGMA_STAR];
 lexset1 = Optimize[
    ("+" : "")
  | ("1" : "один")
  | ("1" : "одна")
  | ("1" : "одни")
  | ("1" : "одним")
  | ("1" : "одними")
  | ("1" : "одних")
  | ("1" : "одно")
  | ("1" : "одного")
  | ("1" : "одной")
  | ("1" : "одном")
  | ("1" : "одному")
  | ("1" : "одною")
  | ("1" : "одну")
  | ("2" : "два")
  | ("2" : "две")
  | ("2" : "двум")
  | ("2" : "двумя")
  | ("2" : "двух")
  | ("3" : "трем")
  | ("3" : "тремя")
  | ("3" : "трех")
  | ("3" : "три")
  | ("4" : "четыре")
  | ("4" : "четырем")
  | ("4" : "четырех")
  | ("4" : "четырьмя")
  | ("5" : "пяти")
  | ("5" : "пять")
  | ("5" : "пятью")
  | ("6" : "шести")
  | ("6" : "шесть")
  | ("6" : "шестью")
  | ("7" : "семи")
  | ("7" : "семь")
  | ("7" : "семью")
  | ("8" : "восемь")
  | ("8" : "восьми")
  | ("8" : "восьмью")
  | ("9" : "девяти")
  | ("9" : "девять")
  | ("9" : "девятью")
  | ("[E3]" : "тысяч")
  | ("[E3]" : "тысяча")
  | ("[E3]" : "тысячам")
  | ("[E3]" : "тысячами")
  | ("[E3]" : "тысячах")
  | ("[E3]" : "тысяче")
  | ("[E3]" : "тысячей")
  | ("[E3]" : "тысячи")
  | ("[E3]" : "тысячу")
  | ("[E3]" : "тысячью")
  | ("[E6]" : "миллион")
  | ("[E6]" : "миллиона")
  | ("[E6]" : "миллионам")
  | ("[E6]" : "миллионами")
  | ("[E6]" : "миллионах")
  | ("[E6]" : "миллионе")
  | ("[E6]" : "миллионов")
  | ("[E6]" : "миллионом")
  | ("[E6]" : "миллиону")
  | ("[E6]" : "миллионы")
  | ("[E9]" : "миллиард")
  | ("[E9]" : "миллиарда")
  | ("[E9]" : "миллиардам")
  | ("[E9]" : "миллиардами")
  | ("[E9]" : "миллиардах")
  | ("[E9]" : "миллиарде")
  | ("[E9]" : "миллиардов")
  | ("[E9]" : "миллиардом")
  | ("[E9]" : "миллиарду")
  | ("[E9]" : "миллиарды")
  | ("|0|" : "ноле")
  | ("|0|" : "нолем")
  | ("|0|" : "ноль")
  | ("|0|" : "нолю")
  | ("|0|" : "ноля")
  | ("|0|" : "нуле")
  | ("|0|" : "нулем")
  | ("|0|" : "нуль")
  | ("|0|" : "нулю")
  | ("|0|" : "нуля")]
 ;
 lex1 = CDRewrite[lexset1 I[space], "", "", SIGMA_STAR];
 export LEX = Optimize[lex3 @ lex2 @ lex1];
 export INDEPENDENT_EXPONENTS = "[E3]" | "[E6]" | "[E9]";
 # END LANGUAGE SPECIFIC DATA
 ################################################################################
 # Inserts a marker after the Ms.
 export INSERT_BOUNDARY = CDRewrite["" : "%", Ms, "", SIGMA_STAR];
 # Deletes all powers and "+".
 export DELETE_POWERS = CDRewrite[D[POWERS | "+"], "", "", SIGMA_STAR];
 # Deletes leading zeros at the beginning of a number, so that "0003" does not
 # get treated as an ordinary number.
 export DELETE_INITIAL_ZEROS =
  CDRewrite[("0" POWERS "+") : "", "[BOS]", "", SIGMA_STAR]
 ;
 NonMs = Optimize[POWERS - Ms];
 # Deletes (usually) zeros before a non-M. E.g., +0[E1] should be deleted.
 export DELETE_INTERMEDIATE_ZEROS1 =
  CDRewrite[Zero["+0" NonMs], "", "", SIGMA_STAR]
 ;
 # Deletes (usually) zeros before an M, if there is no non-zero element between
 # that and the previous boundary. Thus, if after the result of the rule above we
 # end up with "%+0[E3]", then that gets deleted. Also (really) deletes a final
 # zero.
 export DELETE_INTERMEDIATE_ZEROS2 = Optimize[
   CDRewrite[Zero["%+0" Ms], "", "", SIGMA_STAR]
 @ CDRewrite[D["+0"], "", "[EOS]", SIGMA_STAR]]
 ;
 # Final clean up of stray zeros.
 export DELETE_REMAINING_ZEROS = Optimize[
   CDRewrite[Zero["+0"], "", "", SIGMA_STAR]
 @ CDRewrite[Zero["0"], "", "", SIGMA_STAR]]
 ;
 # Applies the revaluation map. For example in English, changes [E4] to [E1] as a
 # modifier of [E3].
 export REVALUE = CDRewrite[revaluations, "", "", SIGMA_STAR];
 # Deletes the various marks and powers in the input and output.
 export DELETE_MARKS = CDRewrite[D["%" | "+" | POWERS], "", "", SIGMA_STAR];
 export CLEAN_SPACES = Optimize[
   CDRewrite[" "+ : " ", b.kNotSpace, b.kNotSpace, SIGMA_STAR]
 @ CDRewrite[" "* : "", "[BOS]", "", SIGMA_STAR]
 @ CDRewrite[" "* : "", "", "[EOS]", SIGMA_STAR]]
 ;
 d = b.kDigit;
 # Germanic inversion rule.
 germanic =
    (I["1+"] d "[E1]" D["+1"])
  | (I["2+"] d "[E1]" D["+2"])
  | (I["3+"] d "[E1]" D["+3"])
  | (I["4+"] d "[E1]" D["+4"])
  | (I["5+"] d "[E1]" D["+5"])
  | (I["6+"] d "[E1]" D["+6"])
  | (I["7+"] d "[E1]" D["+7"])
  | (I["8+"] d "[E1]" D["+8"])
  | (I["9+"] d "[E1]" D["+9"])
 ;
 germanic_inversion =
  CDRewrite[germanic, "", "", SIGMA_STAR, 'ltr', 'opt']
 ;
 export GERMANIC_INVERSION = SIGMA_STAR;
 export ORDINAL_RESTRICTION = SIGMA_STAR;
 nondigits = b.kBytes - b.kDigit;
 export ORDINAL_SUFFIX = D[nondigits*];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/cardinals.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/cardinals.tsv
@ -0,0 +1,177 @@
 0	ноле
 0	ноль
 0	нолю
 0	ноля
 0	нолём
 0	нуле
 0	нуль
 0	нулю
 0	нуля
 0	нулём
 1	один
 1	одна
 1	одни
 1	одним
 1	одними
 1	одних
 1	одно
 1	одного
 1	одной
 1	одном
 1	одному
 1	одною
 1	раз
 1	одну
 2	два
 2	две
 2	двум
 2	двумя
 2	двух
 3	тремя
 3	три
 3	трём
 3	трёх
 4	четыре
 4	четырьмя
 4	четырём
 4	четырёх
 5	пяти
 5	пять
 5	пятью
 6	шести
 6	шесть
 6	шестью
 7	семи
 7	семь
 7	семью
 8	восемь
 8	восьми
 8	восьмью
 9	девяти
 9	девять
 9	девятью
 10	десяти
 10	десять
 10	десятью
 11	одиннадцати
 11	одиннадцать
 11	одиннадцатью
 12	двенадцати
 12	двенадцать
 12	двенадцатью
 13	тринадцати
 13	тринадцать
 13	тринадцатью
 14	четырнадцати
 14	четырнадцать
 14	четырнадцатью
 15	пятнадцати
 15	пятнадцать
 15	пятнадцатью
 16	шестнадцати
 16	шестнадцать
 16	шестнадцатью
 17	семнадцати
 17	семнадцать
 17	семнадцатью
 18	восемнадцати
 18	восемнадцать
 18	восемнадцатью
 19	девятнадцати
 19	девятнадцать
 19	девятнадцатью
 20	двадцати
 20	двадцать
 20	двадцатью
 30	тридцати
 30	тридцать
 30	тридцатью
 40	сорок
 40	сорока
 50	пятидесяти
 50	пятьдесят
 50	пятьюдесятью
 60	шестидесяти
 60	шестьдесят
 60	шестьюдесятью
 70	семидесяти
 70	семьдесят
 70	семьюдесятью
 80	восемьдесят
 80	восьмидесяти
 80	восьмьюдесятью
 90	девяноста
 90	девяносто
 100	ста
 100	сто
 200	двести
 200	двумстам
 200	двумястами
 200	двухсот
 200	двухстах
 300	тремястами
 300	трехсот
 300	триста
 300	трёмстам
 300	трёхстах
 400	четыреста
 400	четырьмястами
 400	четырёмстам
 400	четырёхсот
 400	четырёхстах
 500	пятисот
 500	пятистам
 500	пятистах
 500	пятьсот
 500	пятьюстами
 600	шестисот
 600	шестистам
 600	шестистах
 600	шестьсот
 600	шестьюстами
 700	семисот
 700	семистам
 700	семистах
 700	семьсот
 700	семьюстами
 800	восемьсот
 800	восемьюстами
 800	восьмисот
 800	восьмистам
 800	восьмистах
 800	восьмьюстами
 900	девятисот
 900	девятистам
 900	девятистах
 900	девятьсот
 900	девятьюстами
 1000	тысяч
 1000	тысяча
 1000	тысячам
 1000	тысячами
 1000	тысячах
 1000	тысяче
 1000	тысячей
 1000	тысячи
 1000	тысячу
 1000	тысячью
 1000000	миллион
 1000000	миллиона
 1000000	миллионам
 1000000	миллионами
 1000000	миллионах
 1000000	миллионе
 1000000	миллионов
 1000000	миллионом
 1000000	миллиону
 1000000	миллионы
 1000000000	миллиард
 1000000000	миллиарда
 1000000000	миллиардам
 1000000000	миллиардами
 1000000000	миллиардах
 1000000000	миллиарде
 1000000000	миллиардов
 1000000000	миллиардом
 1000000000	миллиарду
 1000000000	миллиарды
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/extra_numbers.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/extra_numbers.grm
@ -0,0 +1,35 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'ru/verbalizer/numbers.grm' as n;
 digit = b.kDigit @ n.CARDINAL_NUMBERS | ("0" : "@@OTHER_ZERO_VERBALIZATIONS@@");
 export DIGITS  = digit (n.I[" "] digit)*;
 # Various common factorizations
 two_digits = b.kDigit{2} @ n.CARDINAL_NUMBERS;
 three_digits = b.kDigit{3} @ n.CARDINAL_NUMBERS;
 mixed =
   (digit n.I[" "] two_digits)
 | (two_digits n.I[" "] two_digits)
 | (two_digits n.I[" "] three_digits)
 | (two_digits n.I[" "] two_digits n.I[" "] two_digits)
 ;
 export MIXED_NUMBERS = Optimize[mixed];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/factorization.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/factorization.grm
@ -0,0 +1,40 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'util/util.grm' as u;
 import 'ru/verbalizer/numbers.grm' as n;
 func ToNumberName[expr] {
  number_name_seq = n.CARDINAL_NUMBERS (" " n.CARDINAL_NUMBERS)*;
  return Optimize[expr @ number_name_seq];
 }
 d = b.kDigit;
 leading_zero = CDRewrite[n.I[" "], ("[BOS]" | " ") "0", "", b.kBytes*];
 by_ones = d n.I[" "];
 by_twos = (d{2} @ leading_zero) n.I[" "];
 by_threes = (d{3} @ leading_zero) n.I[" "];
 groupings = by_twos* (by_threes | by_twos | by_ones);
 export FRACTIONAL_PART_UNGROUPED =
  Optimize[ToNumberName[by_ones+ @ u.CLEAN_SPACES]]
 ;
 export FRACTIONAL_PART_GROUPED =
  Optimize[ToNumberName[groupings @ u.CLEAN_SPACES]]
 ;
 export FRACTIONAL_PART_UNPARSED = Optimize[ToNumberName[d*]];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/float.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/float.grm
@ -0,0 +1,30 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'ru/verbalizer/factorization.grm' as f;
 import 'ru/verbalizer/lexical_map.grm' as l;
 import 'ru/verbalizer/numbers.grm' as n;
 fractional_part_ungrouped = f.FRACTIONAL_PART_UNGROUPED;
 fractional_part_grouped = f.FRACTIONAL_PART_GROUPED;
 fractional_part_unparsed = f.FRACTIONAL_PART_UNPARSED;
 __fractional_part__ = fractional_part_unparsed;
 __decimal_marker__ = ",";
 export FLOAT = Optimize[
 (n.CARDINAL_NUMBERS
  (__decimal_marker__ : " @@DECIMAL_DOT_EXPRESSION@@ ")
  __fractional_part__) @ l.LEXICAL_MAP]
 ;
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/g.fst
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/g.fst
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/lexical_map.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/lexical_map.grm
@ -0,0 +1,25 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 lexical_map = StringFile['ru/verbalizer/lexical_map.tsv'];
 sigma_star = b.kBytes*;
 del_null = CDRewrite["__NULL__" : "", "", "", sigma_star];
 export LEXICAL_MAP = Optimize[
  CDRewrite[lexical_map, "", "", sigma_star] @ del_null]
 ;
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/lexical_map.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/lexical_map.tsv
@ -0,0 +1,221 @@
@@CONNECTOR_RANGE@@	до
@@CONNECTOR_RATIO@@	к
@@CONNECTOR_BY@@	на
@@CONNECTOR_CONSECUTIVE_YEAR@@	до
@@JANUARY@@	январь
@@JANUARY@@	январи
@@JANUARY@@	января
@@JANUARY@@	январей
@@JANUARY@@	январю
@@JANUARY@@	январям
@@JANUARY@@	январь
@@JANUARY@@	январи
@@JANUARY@@	январём
@@JANUARY@@	январями
@@JANUARY@@	январе
@@JANUARY@@	январях
@@FEBRUARY@@	февраль
@@FEBRUARY@@	феврали
@@FEBRUARY@@	февраля
@@FEBRUARY@@	февралей
@@FEBRUARY@@	февралю
@@FEBRUARY@@	февралям
@@FEBRUARY@@	февраль
@@FEBRUARY@@	феврали
@@FEBRUARY@@	февралём
@@FEBRUARY@@	февралями
@@FEBRUARY@@	феврале
@@FEBRUARY@@	февралях
@@MARCH@@	март
@@MARCH@@	марты
@@MARCH@@	марта
@@MARCH@@	мартов
@@MARCH@@	марту
@@MARCH@@	мартам
@@MARCH@@	март
@@MARCH@@	марты
@@MARCH@@	мартом
@@MARCH@@	мартами
@@MARCH@@	марте
@@MARCH@@	мартах
@@APRIL@@	апрель
@@APRIL@@	апрели
@@APRIL@@	апреля
@@APRIL@@	апрелей
@@APRIL@@	апрелю
@@APRIL@@	апрелям
@@APRIL@@	апрель
@@APRIL@@	апрели
@@APRIL@@	апрелем
@@APRIL@@	апрелями
@@APRIL@@	апреле
@@APRIL@@	апрелях
@@MAY@@	май
@@MAY@@	маи
@@MAY@@	мая
@@MAY@@	маев
@@MAY@@	маю
@@MAY@@	маям
@@MAY@@	май
@@MAY@@	маи
@@MAY@@	маем
@@MAY@@	маями
@@MAY@@	мае
@@MAY@@	маях
@@JUN@@	июнь
@@JUN@@	июни
@@JUN@@	июня
@@JUN@@	июней
@@JUN@@	июню
@@JUN@@	июням
@@JUN@@	июнь
@@JUN@@	июни
@@JUN@@	июнем
@@JUN@@	июнями
@@JUN@@	июне
@@JUN@@	июнях
@@JUL@@	июль
@@JUL@@	июли
@@JUL@@	июля
@@JUL@@	июлей
@@JUL@@	июлю
@@JUL@@	июлям
@@JUL@@	июль
@@JUL@@	июли
@@JUL@@	июлем
@@JUL@@	июлями
@@JUL@@	июле
@@JUL@@	июлях
@@AUGUST@@	август
@@AUGUST@@	августы
@@AUGUST@@	августа
@@AUGUST@@	августов
@@AUGUST@@	августу
@@AUGUST@@	августам
@@AUGUST@@	август
@@AUGUST@@	августы
@@AUGUST@@	августом
@@AUGUST@@	августами
@@AUGUST@@	августе
@@AUGUST@@	августах
@@SEPTEMBER@@	сентябрь
@@SEPTEMBER@@	сентябри
@@SEPTEMBER@@	сентября
@@SEPTEMBER@@	сентябрей
@@SEPTEMBER@@	сентябрю
@@SEPTEMBER@@	сентябрям
@@SEPTEMBER@@	сентябрь
@@SEPTEMBER@@	сентябри
@@SEPTEMBER@@	сентябрём
@@SEPTEMBER@@	сентябрями
@@SEPTEMBER@@	сентябре
@@SEPTEMBER@@	сентябрях
@@OCTOBER@@	октябрь
@@OCTOBER@@	октябри
@@OCTOBER@@	октября
@@OCTOBER@@	октябрей
@@OCTOBER@@	октябрю
@@OCTOBER@@	октябрям
@@OCTOBER@@	октябрь
@@OCTOBER@@	октябри
@@OCTOBER@@	октябрём
@@OCTOBER@@	октябрями
@@OCTOBER@@	октябре
@@OCTOBER@@	октябрях
@@NOVEMBER@@	ноябрь
@@NOVEMBER@@	ноябри
@@NOVEMBER@@	ноября
@@NOVEMBER@@	ноябрей
@@NOVEMBER@@	ноябрю
@@NOVEMBER@@	ноябрям
@@NOVEMBER@@	ноябрь
@@NOVEMBER@@	ноябри
@@NOVEMBER@@	ноябрём
@@NOVEMBER@@	ноябрями
@@NOVEMBER@@	ноябре
@@NOVEMBER@@	ноябрях
@@DECEMBER@@	декабрь
@@DECEMBER@@	декабри
@@DECEMBER@@	декабря
@@DECEMBER@@	декабрей
@@DECEMBER@@	декабрю
@@DECEMBER@@	декабрям
@@DECEMBER@@	декабрь
@@DECEMBER@@	декабри
@@DECEMBER@@	декабрём
@@DECEMBER@@	декабрями
@@DECEMBER@@	декабре
@@DECEMBER@@	декабрях
@@MINUS@@	минус
@@DECIMAL_DOT_EXPRESSION@@	целая
@@DECIMAL_DOT_EXPRESSION@@	целой
@@DECIMAL_DOT_EXPRESSION@@	целой
@@DECIMAL_DOT_EXPRESSION@@	целую
@@DECIMAL_DOT_EXPRESSION@@	целой
@@DECIMAL_DOT_EXPRESSION@@	целой
@@DECIMAL_DOT_EXPRESSION@@	целым
@@DECIMAL_DOT_EXPRESSION@@	целыми
@@DECIMAL_DOT_EXPRESSION@@	целых
@@DECIMAL_DOT_EXPRESSION@@	целых
@@URL_DOT_EXPRESSION@@	точка
@@PERIOD@@	точка
@@DECIMAL_EXPONENT@@	умножить на десять в степени
@@COLON@@	двоеточие
@@SLASH@@	косая черта
@@PASSWORD@@	пароль
@@AT@@	собака
@@PORT@@	порт
@@QUESTION_MARK@@	вопросительный знак
@@HASH@@	решётка
@@HASH@@	решетка
@@MONEY_AND@@	и
@@AND@@	и
@@PHONE_PLUS@@	плюс
@@ARITHMETIC_PLUS@@	плюс
@@PHONE_EXTENSION@@	добавочный номер
@@TIME_AM@@	утра
@@TIME_PM@@	вечера
@@HOUR@@	час
@@HOUR@@	часа
@@HOUR@@	часам
@@HOUR@@	часами
@@HOUR@@	часах
@@HOUR@@	часе
@@HOUR@@	часов
@@HOUR@@	часом
@@HOUR@@	часу
@@HOUR@@	часы
@@MINUTE@@	минут
@@MINUTE@@	минута
@@MINUTE@@	минутам
@@MINUTE@@	минутами
@@MINUTE@@	минутах
@@MINUTE@@	минуте
@@MINUTE@@	минутой
@@MINUTE@@	минутою
@@MINUTE@@	минуту
@@MINUTE@@	минуты
@@TIME_AFTER@@	__NULL__
@@TIME_BEFORE_PRE@@	без
@@TIME_QUARTER@@	четверть
@@TIME_QUARTER@@	четверти
@@TIME_HALF@@	половина
@@TIME_HALF@@	половины
@@TIME_HALF@@	половину
@@TIME_HALF@@	половин
@@TIME_HALF@@	половине
@@TIME_HALF@@	половинам
@@TIME_HALF@@	половиной
@@TIME_HALF@@	половинами
@@TIME_HALF@@	половинах
@@PERCENT@@	процент
@@PERCENT@@	процента
@@PERCENT@@	процентам
@@PERCENT@@	процентами
@@PERCENT@@	процентах
@@PERCENT@@	проценте
@@PERCENT@@	процентов
@@PERCENT@@	процентом
@@PERCENT@@	проценту
@@PERCENT@@	проценты
@@PERCENT@@	проценты
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/math.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/math.grm
@ -0,0 +1,34 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'ru/verbalizer/float.grm' as f;
 import 'ru/verbalizer/lexical_map.grm' as l;
 import 'ru/verbalizer/numbers.grm' as n;
 float = f.FLOAT;
 card = n.CARDINAL_NUMBERS;
 number = card | float;
 plus = "+" : " @@ARITHMETIC_PLUS@@ ";
 times = "*" : " @@ARITHMETIC_TIMES@@ ";
 minus = "-" : " @@ARITHMETIC_MINUS@@ ";
 division = "/" : " @@ARITHMETIC_DIVISION@@ ";
 operator = plus | times | minus | division;
 percent = "%" : " @@PERCENT@@";
 export ARITHMETIC =
  Optimize[((number operator number) | (number percent)) @ l.LEXICAL_MAP]
 ;
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/miscellaneous.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/miscellaneous.grm
@ -0,0 +1,78 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'ru/classifier/cyrillic.grm' as c;
 import 'ru/verbalizer/extra_numbers.grm' as e;
 import 'ru/verbalizer/lexical_map.grm' as l;
 import 'ru/verbalizer/numbers.grm' as n;
 import 'ru/verbalizer/spelled.grm' as s;
 letter = b.kAlpha | c.kCyrillicAlpha;
 dash   = "-";
 word = letter+;
 possibly_split_word = word (((dash | ".") : " ") word)* n.D["."]?;
 post_word_symbol =
   ("+" : ("@@ARITHMETIC_PLUS@@" | "@@POSITIVE@@")) |
   ("-" : ("@@ARITHMETIC_MINUS@@" | "@@NEGATIVE@@")) |
   ("*" : "@@STAR@@")
 ;
 pre_word_symbol =
   ("@" : "@@AT@@") |
   ("/" : "@@SLASH@@") |
   ("#" : "@@HASH@@")
 ;
 post_word = possibly_split_word n.I[" "] post_word_symbol;
 pre_word = pre_word_symbol n.I[" "] possibly_split_word;
 ## Number/digit sequence combos, maybe with a dash
 spelled_word = word @ s.SPELLED_NO_LETTER;
 word_number =
  (word | spelled_word)
  (n.I[" "] | (dash : " "))
  (e.DIGITS | n.CARDINAL_NUMBERS | e.MIXED_NUMBERS)
 ;
 number_word =
  (e.DIGITS | n.CARDINAL_NUMBERS | e.MIXED_NUMBERS)
  (n.I[" "] | (dash : " "))
  (word | spelled_word)
 ;
 ## Two-digit year.
 # Note that in this case to be fair we really have to allow ordinals too since
 # in some languages that's what you would have.
 two_digit_year = n.D["'"] (b.kDigit{2} @ (n.CARDINAL_NUMBERS | e.DIGITS));
 dot_com = ("." : "@@URL_DOT_EXPRESSION@@") n.I[" "] "com";
 miscellaneous = Optimize[
    possibly_split_word
  | post_word
  | pre_word
  | word_number
  | number_word
  | two_digit_year
  | dot_com
 ];
 export MISCELLANEOUS = Optimize[miscellaneous @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/money.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/money.grm
@ -0,0 +1,44 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'ru/verbalizer/lexical_map.grm' as l;
 import 'ru/verbalizer/numbers.grm' as n;
 card = n.CARDINAL_NUMBERS;
 __currency__ = StringFile['ru/verbalizer/money.tsv'];
 d = b.kDigit;
 D = d - "0";
 cents = ((n.D["0"] | D) d) @ card;
 # Only the dollar is covered, as needed for the verbalizer tests for English.
 # Other currencies will need to be added.
 usd_maj = Project["usd_maj" @ __currency__, 'output'];
 usd_min = Project["usd_min" @ __currency__, 'output'];
 and = " @@MONEY_AND@@ " | " ";
 dollar1 =
  n.D["$"] card n.I[" " usd_maj] n.I[and] n.D["."] cents n.I[" " usd_min]
 ;
 dollar2 = n.D["$"] card n.I[" " usd_maj] n.D["."] n.D["00"];
 dollar3 = n.D["$"] card n.I[" " usd_maj];
 dollar = Optimize[dollar1 | dollar2 | dollar3];
 export MONEY = Optimize[dollar @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/money.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/money.tsv
@ -0,0 +1,24 @@
 usd_maj	доллара
 usd_maj	долларами
 usd_maj	долларам
 usd_maj	долларах
 usd_maj	долларе
 usd_maj	долларов
 usd_maj	долларом
 usd_maj	доллар
 usd_maj	доллар
 usd_maj	доллару
 usd_maj	доллары
 usd_maj	доллары
 usd_min	цент
 usd_min	цент
 usd_min	цента
 usd_min	центам
 usd_min	центами
 usd_min	центах
 usd_min	центе
 usd_min	центов
 usd_min	центом
 usd_min	центу
 usd_min	центы
 usd_min	центы
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/nominatives.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/nominatives.tsv
@ -0,0 +1,166 @@
 нуль
 ноль
 один
 два
 две
 три
 четыре
 пять
 шесть
 семь
 восемь
 девять
 десять
 одиннадцать
 двенадцать
 тринадцать
 четырнадцать
 пятнадцать
 шестнадцать
 семнадцать
 восемнадцать
 девятнадцать
 двадцать
 тридцать
 сорок
 пятьдесят
 шестьдесят
 семьдесят
 восемьдесят
 девяносто
 сто
 двести
 триста
 четыреста
 пятьсот
 шестьсот
 семьсот
 восемьсот
 девятьсот
 тысячи
 тысяч
 тысяча
 миллионов
 миллион
 миллиона
 миллиардов
 миллиард
 миллиарда
 первая
 первого
 первое
 первый
 вторая
 второе
 второй
 третий
 третье
 третья
 четвертая
 четвертое
 четвертой
 пятая
 пятое
 пятой
 шестая
 шестое
 шестой
 седьмая
 седьмое
 седьмой
 восьмая
 восьмое
 восьмой
 девятая
 девятое
 девятой
 десятая
 десятое
 десятой
 одиннадцатая
 одиннадцатое
 одиннадцатой
 двенадцатая
 двенадцатое
 двенадцатой
 тринадцатая
 тринадцатое
 тринадцатой
 четырнадцатая
 четырнадцатое
 четырнадцатой
 пятнадцатая
 пятнадцатое
 пятнадцатой
 шестнадцатая
 шестнадцатое
 шестнадцатой
 семнадцатая
 семнадцатое
 семнадцатой
 восемнадцатая
 восемнадцатое
 восемнадцатой
 девятнадцатая
 девятнадцатое
 девятнадцатой
 двадцатая
 двадцатое
 двадцатой
 тридцатая
 тридцатое
 тридцатой
 сороковая
 сороковое
 сороковой
 пятидесятая
 пятидесятое
 пятидесятой
 шестидесятая
 шестидесятое
 шестидесятой
 семидесятая
 семидесятое
 семидесятой
 восьмидесятая
 восьмидесятое
 восьмидесятой
 девяностая
 девяностое
 девяностой
 сотая
 сотое
 сотой
 двухсотая
 двухсотое
 двухсотой
 трехсотая
 трехсотое
 трехсотой
 четырехсотая
 четырехсотое
 четырехсотой
 пятисотая
 пятисотое
 пятисотой
 шестисотая
 шестисотое
 шестисотой
 семисотая
 семисотое
 семисотой
 восьмисотая
 восьмисотое
 восьмисотой
 девятисотая
 девятисотое
 девятисотой
 тысячная
 тысячное
 тысячной
 миллионная
 миллионное
 миллионной
 миллиардная
 миллиардное
 миллиардной
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/number_names.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/number_names.grm
@ -0,0 +1,48 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Russian minimally supervised number grammar.
 #
 # Supports cardinals and ordinals in all inflected forms.
 #
 # The language-specific acceptor G was compiled with digit, teen, decade,
 # century, and big power-of-ten preterminals. The lexicon transducer is
 # highly ambiguous, but no LM is used.
 import 'util/arithmetic.grm' as a;
 # Intersects the universal factorization transducer (F) with the
 # language-specific acceptor (G).
 d = a.DELTA_STAR;
 f = a.IARITHMETIC_RESTRICTED;
 g = LoadFst['ru/verbalizer/g.fst'];
 fg = Optimize[d @ Optimize[f @ Optimize[f @ Optimize[f @ g]]]];
 test1 = AssertEqual["230" @ fg, "(+ 200 30 +)"];
 # Compiles lexicon transducers (L).
 cardinal_name = StringFile['ru/verbalizer/cardinals.tsv'];
 cardinal_l = Optimize[(cardinal_name " ")* cardinal_name];
 ordinal_name = StringFile['ru/verbalizer/ordinals.tsv'];
 ordinal_l = Optimize[(cardinal_name " ")* ordinal_name];
 # Composes L with the leaf transducer (P), then composes that with FG.
 p = a.LEAVES;
 export CARDINAL_NUMBER_NAME = Optimize[fg @ (p @ cardinal_l)];
 export ORDINAL_NUMBER_NAME = Optimize[fg @ (p @ ordinal_l)];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/numbers.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/numbers.grm
@ -0,0 +1,68 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'ru/verbalizer/number_names.grm' as n;
 import 'universal/thousands_punct.grm' as t;
 import 'util/byte.grm' as b;
 nominatives = StringFile['ru/verbalizer/nominatives.tsv'];
 sigma_star = b.kBytes*;
 nominative_filter =
 CDRewrite[nominatives ("" : "" <-1>), "[BOS]" | " ", " " | "[EOS]", sigma_star]
 ;
 cardinal = n.CARDINAL_NUMBER_NAME;
 ordinal = n.ORDINAL_NUMBER_NAME;
 # Putting these here since this grammar gets incorporated by all the others.
 func I[expr] {
  return "" : expr;
 }
 func D[expr] {
  return expr : "";
 }
 # Dot-separated or undelimited thousands are the Russian default, so set them here.
 separators = t.dot_thousands | t.no_delimiter;
 export CARDINAL_NUMBERS = Optimize[
   separators
 @ cardinal
 ];
 export ORDINAL_NUMBERS_UNMARKED = Optimize[
   separators
 @ ordinal
 ];
 endings = StringFile['ru/verbalizer/ordinal_endings.tsv'];
 not_dash = (b.kBytes - "-")+;
 del_ending = CDRewrite[("-" not_dash) : "", "", "[EOS]", sigma_star];
 # Needs nominative_filter here if we take out Kyle's models.
 export ORDINAL_NUMBERS_MARKED = Optimize[
   Optimize[Optimize[separators @ ordinal] "-" not_dash]
 @ Optimize[sigma_star endings]
 @ del_ending]
 ;
 export ORDINAL_NUMBERS =
  Optimize[ORDINAL_NUMBERS_MARKED | ORDINAL_NUMBERS_UNMARKED]
 ;
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/numbers_plus.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/numbers_plus.grm
@ -0,0 +1,133 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Grammar for things built mostly on numbers.
 import 'ru/verbalizer/factorization.grm' as f;
 import 'ru/verbalizer/lexical_map.grm' as l;
 import 'ru/verbalizer/numbers.grm' as n;
 num = n.CARDINAL_NUMBERS;
 ord = n.ORDINAL_NUMBERS_UNMARKED;
 digits = f.FRACTIONAL_PART_UNGROUPED;
 # Various symbols.
 plus = "+" : "@@ARITHMETIC_PLUS@@";
 minus = "-" : "@@ARITHMETIC_MINUS@@";
 slash = "/" : "@@SLASH@@";
 dot = "." : "@@URL_DOT_EXPRESSION@@";
 dash = "-" : "@@DASH@@";
 equals = "=" : "@@ARITHMETIC_EQUALS@@";
 degree = "°" : "@@DEGREE@@";
 division = ("/" | "÷") : "@@ARITHMETIC_DIVISION@@";
 times = ("x" | "*") : "@@ARITHMETIC_TIMES@@";
 power = "^" : "@@DECIMAL_EXPONENT@@";
 square_root = "√" : "@@SQUARE_ROOT@@";
 percent = "%" : "@@PERCENT@@";
 # Safe roman numbers.
 # NB: Do not change the formatting here. NO_EDIT must be on the same
 # line as the path.
 rfile =
  'universal/roman_numerals.tsv' # NO_EDIT
 ;
 roman = StringFile[rfile];
 ## Main categories.
 cat_dot_number =
   num
   n.I[" "] dot n.I[" "] num
   (n.I[" "] dot n.I[" "] num)+
 ;
 cat_slash_number =
   num
   n.I[" "] slash n.I[" "] num
   (n.I[" "] slash n.I[" "] num)*
 ;
 cat_dash_number =
   num
   n.I[" "] dash n.I[" "] num
   (n.I[" "] dash n.I[" "] num)*
 ;
 cat_signed_number = ((plus | minus) n.I[" "])? num;
 cat_degree = cat_signed_number n.I[" "] degree;
 cat_country_code = plus n.I[" "] (num | digits);
 cat_math_operations =
     plus
   | minus
   | division
   | times
   | equals
   | percent
   | power
   | square_root
 ;
 # Roman numbers are often either cardinals or ordinals in various languages.
 cat_roman = roman @ (num | ord);
 # Allow
 #
 # number:number
 # number-number
 #
 # to just be
 #
 # number number.
 cat_number_number =
   num ((":" | "-") : " ") num
 ;
 # Some additional readings for these symbols.
 cat_additional_readings =
  ("/" : "@@PER@@") |
  ("+" : "@@AND@@") |
  ("-" : ("@@HYPHEN@@" | "@@CONNECTOR_TO@@")) |
  ("*" : "@@STAR@@") |
  ("x" : ("x" | "@@CONNECTOR_BY@@")) |
  ("@" : "@@AT@@")
 ;
 numbers_plus = Optimize[
   cat_dot_number
 | cat_slash_number
 | cat_dash_number
 | cat_signed_number
 | cat_degree
 | cat_country_code
 | cat_math_operations
 | cat_roman
 | cat_number_number
 | cat_additional_readings
 ];
 export NUMBERS_PLUS = Optimize[numbers_plus @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/ordinal_endings.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/ordinal_endings.tsv
@ -0,0 +1,39 @@
 ая-ая
 ого-го
 ьего-го
 ьего-его
 ьей-ей
 ьему-ему
 ьем-ем
 ое-е
 ые-е
 ье-е
 ий-ий
 ьими-ими
 ьим-им
 ьих-их
 ьи-и
 ий-й
 ой-й
 ый-й
 ыми-ми
 ьими-ми
 ому-му
 ьему-му
 ого-ого
 ое-ое
 ой-ой
 ом-ом
 ому-ому
 ую-ую
 ых-х
 ьих-х
 ые-ые
 ый-ый
 ыми-ыми
 ым-ым
 ых-ых
 ую-ю
 ью-ю
 ая-я
 ья-я
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/ordinals-lex.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/ordinals-lex.grm
@ -0,0 +1,804 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # AUTOMATICALLY GENERATED: DO NOT EDIT.
 import 'util/byte.grm' as b;
 # Utilities for insertion and deletion.
 func I[expr] {
  return "" : expr;
 }
 func D[expr] {
  return expr : "";
 }
 # Powers of base 10.
 export POWERS =
    "[E15]"
  | "[E14]"
  | "[E13]"
  | "[E12]"
  | "[E11]"
  | "[E10]"
  | "[E9]"
  | "[E8]"
  | "[E7]"
  | "[E6]"
  | "[E5]"
  | "[E4]"
  | "[E3]"
  | "[E2]"
  | "[E1]"
 ;
 export SIGMA = b.kBytes | POWERS;
 export SIGMA_STAR = SIGMA*;
 export SIGMA_PLUS = SIGMA+;
 ################################################################################
 # BEGIN LANGUAGE SPECIFIC DATA
 revaluations =
    ("[E4]" : "[E1]")
  | ("[E5]" : "[E2]")
  | ("[E7]" : "[E1]")
  | ("[E8]" : "[E2]")
 ;
 Ms = "[E3]" | "[E6]" | "[E9]";
 func Zero[expr] {
  return expr : ("");
 }
 space = " ";
 lexset3 = Optimize[
    ("1[E1]+1" : "одиннадцатая@")
  | ("1[E1]+1" : "одиннадцати")
  | ("1[E1]+1" : "одиннадцатого@")
  | ("1[E1]+1" : "одиннадцатое@")
  | ("1[E1]+1" : "одиннадцатой@")
  | ("1[E1]+1" : "одиннадцатом@")
  | ("1[E1]+1" : "одиннадцатому@")
  | ("1[E1]+1" : "одиннадцатую@")
  | ("1[E1]+1" : "одиннадцатые@")
  | ("1[E1]+1" : "одиннадцатый@")
  | ("1[E1]+1" : "одиннадцатым@")
  | ("1[E1]+1" : "одиннадцатыми@")
  | ("1[E1]+1" : "одиннадцатых@")
  | ("1[E1]+1" : "одиннадцать")
  | ("1[E1]+1" : "одиннадцатью")
  | ("1[E1]+2" : "двенадцатая@")
  | ("1[E1]+2" : "двенадцати")
  | ("1[E1]+2" : "двенадцатого@")
  | ("1[E1]+2" : "двенадцатое@")
  | ("1[E1]+2" : "двенадцатой@")
  | ("1[E1]+2" : "двенадцатом@")
  | ("1[E1]+2" : "двенадцатому@")
  | ("1[E1]+2" : "двенадцатую@")
  | ("1[E1]+2" : "двенадцатые@")
  | ("1[E1]+2" : "двенадцатый@")
  | ("1[E1]+2" : "двенадцатым@")
  | ("1[E1]+2" : "двенадцатыми@")
  | ("1[E1]+2" : "двенадцатых@")
  | ("1[E1]+2" : "двенадцать")
  | ("1[E1]+2" : "двенадцатью")
  | ("1[E1]+3" : "тринадцатая@")
  | ("1[E1]+3" : "тринадцати")
  | ("1[E1]+3" : "тринадцатого@")
  | ("1[E1]+3" : "тринадцатое@")
  | ("1[E1]+3" : "тринадцатой@")
  | ("1[E1]+3" : "тринадцатом@")
  | ("1[E1]+3" : "тринадцатому@")
  | ("1[E1]+3" : "тринадцатую@")
  | ("1[E1]+3" : "тринадцатые@")
  | ("1[E1]+3" : "тринадцатый@")
  | ("1[E1]+3" : "тринадцатым@")
  | ("1[E1]+3" : "тринадцатыми@")
  | ("1[E1]+3" : "тринадцатых@")
  | ("1[E1]+3" : "тринадцать")
  | ("1[E1]+3" : "тринадцатью")
  | ("1[E1]+4" : "четырнадцатая@")
  | ("1[E1]+4" : "четырнадцати")
  | ("1[E1]+4" : "четырнадцатого@")
  | ("1[E1]+4" : "четырнадцатое@")
  | ("1[E1]+4" : "четырнадцатой@")
  | ("1[E1]+4" : "четырнадцатом@")
  | ("1[E1]+4" : "четырнадцатому@")
  | ("1[E1]+4" : "четырнадцатую@")
  | ("1[E1]+4" : "четырнадцатые@")
  | ("1[E1]+4" : "четырнадцатый@")
  | ("1[E1]+4" : "четырнадцатым@")
  | ("1[E1]+4" : "четырнадцатыми@")
  | ("1[E1]+4" : "четырнадцатых@")
  | ("1[E1]+4" : "четырнадцать")
  | ("1[E1]+4" : "четырнадцатью")
  | ("1[E1]+5" : "пятнадцатая@")
  | ("1[E1]+5" : "пятнадцати")
  | ("1[E1]+5" : "пятнадцатого@")
  | ("1[E1]+5" : "пятнадцатое@")
  | ("1[E1]+5" : "пятнадцатой@")
  | ("1[E1]+5" : "пятнадцатом@")
  | ("1[E1]+5" : "пятнадцатому@")
  | ("1[E1]+5" : "пятнадцатую@")
  | ("1[E1]+5" : "пятнадцатые@")
  | ("1[E1]+5" : "пятнадцатый@")
  | ("1[E1]+5" : "пятнадцатым@")
  | ("1[E1]+5" : "пятнадцатыми@")
  | ("1[E1]+5" : "пятнадцатых@")
  | ("1[E1]+5" : "пятнадцать")
  | ("1[E1]+5" : "пятнадцатью")
  | ("1[E1]+6" : "шестнадцатая@")
  | ("1[E1]+6" : "шестнадцати")
  | ("1[E1]+6" : "шестнадцатого@")
  | ("1[E1]+6" : "шестнадцатое@")
  | ("1[E1]+6" : "шестнадцатой@")
  | ("1[E1]+6" : "шестнадцатом@")
  | ("1[E1]+6" : "шестнадцатому@")
  | ("1[E1]+6" : "шестнадцатую@")
  | ("1[E1]+6" : "шестнадцатые@")
  | ("1[E1]+6" : "шестнадцатый@")
  | ("1[E1]+6" : "шестнадцатым@")
  | ("1[E1]+6" : "шестнадцатыми@")
  | ("1[E1]+6" : "шестнадцатых@")
  | ("1[E1]+6" : "шестнадцать")
  | ("1[E1]+6" : "шестнадцатью")
  | ("1[E1]+7" : "семнадцатая@")
  | ("1[E1]+7" : "семнадцати")
  | ("1[E1]+7" : "семнадцатого@")
  | ("1[E1]+7" : "семнадцатое@")
  | ("1[E1]+7" : "семнадцатой@")
  | ("1[E1]+7" : "семнадцатом@")
  | ("1[E1]+7" : "семнадцатому@")
  | ("1[E1]+7" : "семнадцатую@")
  | ("1[E1]+7" : "семнадцатые@")
  | ("1[E1]+7" : "семнадцатый@")
  | ("1[E1]+7" : "семнадцатым@")
  | ("1[E1]+7" : "семнадцатыми@")
  | ("1[E1]+7" : "семнадцатых@")
  | ("1[E1]+7" : "семнадцать")
  | ("1[E1]+7" : "семнадцатью")
  | ("1[E1]+8" : "восемнадцатая@")
  | ("1[E1]+8" : "восемнадцати")
  | ("1[E1]+8" : "восемнадцатого@")
  | ("1[E1]+8" : "восемнадцатое@")
  | ("1[E1]+8" : "восемнадцатой@")
  | ("1[E1]+8" : "восемнадцатом@")
  | ("1[E1]+8" : "восемнадцатому@")
  | ("1[E1]+8" : "восемнадцатую@")
  | ("1[E1]+8" : "восемнадцатые@")
  | ("1[E1]+8" : "восемнадцатый@")
  | ("1[E1]+8" : "восемнадцатым@")
  | ("1[E1]+8" : "восемнадцатыми@")
  | ("1[E1]+8" : "восемнадцатых@")
  | ("1[E1]+8" : "восемнадцать")
  | ("1[E1]+8" : "восемнадцатью")
  | ("1[E1]+9" : "девятнадцатая@")
  | ("1[E1]+9" : "девятнадцати")
  | ("1[E1]+9" : "девятнадцатого@")
  | ("1[E1]+9" : "девятнадцатое@")
  | ("1[E1]+9" : "девятнадцатой@")
  | ("1[E1]+9" : "девятнадцатом@")
  | ("1[E1]+9" : "девятнадцатому@")
  | ("1[E1]+9" : "девятнадцатую@")
  | ("1[E1]+9" : "девятнадцатые@")
  | ("1[E1]+9" : "девятнадцатый@")
  | ("1[E1]+9" : "девятнадцатым@")
  | ("1[E1]+9" : "девятнадцатыми@")
  | ("1[E1]+9" : "девятнадцатых@")
  | ("1[E1]+9" : "девятнадцать")
  | ("1[E1]+9" : "девятнадцатью")]
 ;
 lex3 = CDRewrite[lexset3 I[space], "", "", SIGMA_STAR];
 lexset2 = Optimize[
    ("1[E1]" : "десятая@")
  | ("1[E1]" : "десяти")
  | ("1[E1]" : "десятого@")
  | ("1[E1]" : "десятое@")
  | ("1[E1]" : "десятой@")
  | ("1[E1]" : "десятом@")
  | ("1[E1]" : "десятому@")
  | ("1[E1]" : "десятую@")
  | ("1[E1]" : "десятые@")
  | ("1[E1]" : "десятый@")
  | ("1[E1]" : "десятым@")
  | ("1[E1]" : "десятыми@")
  | ("1[E1]" : "десятых@")
  | ("1[E1]" : "десять")
  | ("1[E1]" : "десятью")
  | ("1[E2]" : "сотая@")
  | ("1[E2]" : "сотого@")
  | ("1[E2]" : "сотое@")
  | ("1[E2]" : "сотой@")
  | ("1[E2]" : "сотом@")
  | ("1[E2]" : "сотому@")
  | ("1[E2]" : "сотую@")
  | ("1[E2]" : "сотые@")
  | ("1[E2]" : "сотый@")
  | ("1[E2]" : "сотым@")
  | ("1[E2]" : "сотыми@")
  | ("1[E2]" : "сотых@")
  | ("1[E2]" : "ста")
  | ("1[E2]" : "сто")
  | ("1[E3]" : "тысячная@")
  | ("1[E3]" : "тысячного@")
  | ("1[E3]" : "тысячное@")
  | ("1[E3]" : "тысячной@")
  | ("1[E3]" : "тысячном@")
  | ("1[E3]" : "тысячному@")
  | ("1[E3]" : "тысячную@")
  | ("1[E3]" : "тысячные@")
  | ("1[E3]" : "тысячный@")
  | ("1[E3]" : "тысячным@")
  | ("1[E3]" : "тысячными@")
  | ("1[E3]" : "тысячных@")
  | ("1[E6]" : "миллионная@")
  | ("1[E6]" : "миллионного@")
  | ("1[E6]" : "миллионное@")
  | ("1[E6]" : "миллионной@")
  | ("1[E6]" : "миллионном@")
  | ("1[E6]" : "миллионному@")
  | ("1[E6]" : "миллионную@")
  | ("1[E6]" : "миллионные@")
  | ("1[E6]" : "миллионный@")
  | ("1[E6]" : "миллионным@")
  | ("1[E6]" : "миллионными@")
  | ("1[E6]" : "миллионных@")
  | ("1[E9]" : "миллиардная@")
  | ("1[E9]" : "миллиардного@")
  | ("1[E9]" : "миллиардное@")
  | ("1[E9]" : "миллиардной@")
  | ("1[E9]" : "миллиардном@")
  | ("1[E9]" : "миллиардному@")
  | ("1[E9]" : "миллиардную@")
  | ("1[E9]" : "миллиардные@")
  | ("1[E9]" : "миллиардный@")
  | ("1[E9]" : "миллиардным@")
  | ("1[E9]" : "миллиардными@")
  | ("1[E9]" : "миллиардных@")
  | ("2[E1]" : "двадцатая@")
  | ("2[E1]" : "двадцати")
  | ("2[E1]" : "двадцатого@")
  | ("2[E1]" : "двадцатое@")
  | ("2[E1]" : "двадцатой@")
  | ("2[E1]" : "двадцатом@")
  | ("2[E1]" : "двадцатому@")
  | ("2[E1]" : "двадцатую@")
  | ("2[E1]" : "двадцатые@")
  | ("2[E1]" : "двадцатый@")
  | ("2[E1]" : "двадцатым@")
  | ("2[E1]" : "двадцатыми@")
  | ("2[E1]" : "двадцатых@")
  | ("2[E1]" : "двадцать")
  | ("2[E1]" : "двадцатью")
  | ("2[E2]" : "двести")
  | ("2[E2]" : "двумстам")
  | ("2[E2]" : "двумястами")
  | ("2[E2]" : "двухсот")
  | ("2[E2]" : "двухсотая@")
  | ("2[E2]" : "двухсотого@")
  | ("2[E2]" : "двухсотое@")
  | ("2[E2]" : "двухсотой@")
  | ("2[E2]" : "двухсотом@")
  | ("2[E2]" : "двухсотому@")
  | ("2[E2]" : "двухсотую@")
  | ("2[E2]" : "двухсотые@")
  | ("2[E2]" : "двухсотый@")
  | ("2[E2]" : "двухсотым@")
  | ("2[E2]" : "двухсотыми@")
  | ("2[E2]" : "двухсотых@")
  | ("2[E2]" : "двухстах")
  | ("3[E1]" : "тридцатая@")
  | ("3[E1]" : "тридцати")
  | ("3[E1]" : "тридцатого@")
  | ("3[E1]" : "тридцатое@")
  | ("3[E1]" : "тридцатой@")
  | ("3[E1]" : "тридцатом@")
  | ("3[E1]" : "тридцатому@")
  | ("3[E1]" : "тридцатую@")
  | ("3[E1]" : "тридцатые@")
  | ("3[E1]" : "тридцатый@")
  | ("3[E1]" : "тридцатым@")
  | ("3[E1]" : "тридцатыми@")
  | ("3[E1]" : "тридцатых@")
  | ("3[E1]" : "тридцать")
  | ("3[E1]" : "тридцатью")
  | ("3[E2]" : "тремстам")
  | ("3[E2]" : "тремястами")
  | ("3[E2]" : "трехсот")
  | ("3[E2]" : "трехсотая@")
  | ("3[E2]" : "трехсотого@")
  | ("3[E2]" : "трехсотое@")
  | ("3[E2]" : "трехсотой@")
  | ("3[E2]" : "трехсотом@")
  | ("3[E2]" : "трехсотому@")
  | ("3[E2]" : "трехсотую@")
  | ("3[E2]" : "трехсотые@")
  | ("3[E2]" : "трехсотый@")
  | ("3[E2]" : "трехсотым@")
  | ("3[E2]" : "трехсотыми@")
  | ("3[E2]" : "трехсотых@")
  | ("3[E2]" : "трехстах")
  | ("3[E2]" : "триста")
  | ("4[E1]" : "сорок")
  | ("4[E1]" : "сорока")
  | ("4[E1]" : "сороковая@")
  | ("4[E1]" : "сорокового@")
  | ("4[E1]" : "сороковое@")
  | ("4[E1]" : "сороковой@")
  | ("4[E1]" : "сороковом@")
  | ("4[E1]" : "сороковому@")
  | ("4[E1]" : "сороковую@")
  | ("4[E1]" : "сороковые@")
  | ("4[E1]" : "сороковым@")
  | ("4[E1]" : "сороковыми@")
  | ("4[E1]" : "сороковых@")
  | ("4[E2]" : "четыремстам")
  | ("4[E2]" : "четыреста")
  | ("4[E2]" : "четырехсот")
  | ("4[E2]" : "четырехсотая@")
  | ("4[E2]" : "четырехсотого@")
  | ("4[E2]" : "четырехсотое@")
  | ("4[E2]" : "четырехсотой@")
  | ("4[E2]" : "четырехсотом@")
  | ("4[E2]" : "четырехсотому@")
  | ("4[E2]" : "четырехсотую@")
  | ("4[E2]" : "четырехсотые@")
  | ("4[E2]" : "четырехсотый@")
  | ("4[E2]" : "четырехсотым@")
  | ("4[E2]" : "четырехсотыми@")
  | ("4[E2]" : "четырехсотых@")
  | ("4[E2]" : "четырехстах")
  | ("4[E2]" : "четырьмястами")
  | ("5[E1]" : "пятидесятая@")
  | ("5[E1]" : "пятидесяти")
  | ("5[E1]" : "пятидесятого@")
  | ("5[E1]" : "пятидесятое@")
  | ("5[E1]" : "пятидесятой@")
  | ("5[E1]" : "пятидесятом@")
  | ("5[E1]" : "пятидесятому@")
  | ("5[E1]" : "пятидесятую@")
  | ("5[E1]" : "пятидесятые@")
  | ("5[E1]" : "пятидесятый@")
  | ("5[E1]" : "пятидесятым@")
  | ("5[E1]" : "пятидесятыми@")
  | ("5[E1]" : "пятидесятых@")
  | ("5[E1]" : "пятьдесят")
  | ("5[E1]" : "пятьюдесятью")
  | ("5[E2]" : "пятисот")
  | ("5[E2]" : "пятисотая@")
  | ("5[E2]" : "пятисотого@")
  | ("5[E2]" : "пятисотое@")
  | ("5[E2]" : "пятисотой@")
  | ("5[E2]" : "пятисотом@")
  | ("5[E2]" : "пятисотому@")
  | ("5[E2]" : "пятисотую@")
  | ("5[E2]" : "пятисотые@")
  | ("5[E2]" : "пятисотый@")
  | ("5[E2]" : "пятисотым@")
  | ("5[E2]" : "пятисотыми@")
  | ("5[E2]" : "пятисотых@")
  | ("5[E2]" : "пятистам")
  | ("5[E2]" : "пятистах")
  | ("5[E2]" : "пятьсот")
  | ("5[E2]" : "пятьюстами")
  | ("6[E1]" : "шестидесятая@")
  | ("6[E1]" : "шестидесяти")
  | ("6[E1]" : "шестидесятого@")
  | ("6[E1]" : "шестидесятое@")
  | ("6[E1]" : "шестидесятой@")
  | ("6[E1]" : "шестидесятом@")
  | ("6[E1]" : "шестидесятому@")
  | ("6[E1]" : "шестидесятую@")
  | ("6[E1]" : "шестидесятые@")
  | ("6[E1]" : "шестидесятый@")
  | ("6[E1]" : "шестидесятым@")
  | ("6[E1]" : "шестидесятыми@")
  | ("6[E1]" : "шестидесятых@")
  | ("6[E1]" : "шестьдесят")
  | ("6[E1]" : "шестьюдесятью")
  | ("6[E2]" : "шестисот")
  | ("6[E2]" : "шестисотая@")
  | ("6[E2]" : "шестисотого@")
  | ("6[E2]" : "шестисотое@")
  | ("6[E2]" : "шестисотой@")
  | ("6[E2]" : "шестисотом@")
  | ("6[E2]" : "шестисотому@")
  | ("6[E2]" : "шестисотую@")
  | ("6[E2]" : "шестисотые@")
  | ("6[E2]" : "шестисотый@")
  | ("6[E2]" : "шестисотым@")
  | ("6[E2]" : "шестисотыми@")
  | ("6[E2]" : "шестисотых@")
  | ("6[E2]" : "шестистам")
  | ("6[E2]" : "шестистах")
  | ("6[E2]" : "шестьсот")
  | ("6[E2]" : "шестьюстами")
  | ("7[E1]" : "семидесятая@")
  | ("7[E1]" : "семидесяти")
  | ("7[E1]" : "семидесятого@")
  | ("7[E1]" : "семидесятое@")
  | ("7[E1]" : "семидесятой@")
  | ("7[E1]" : "семидесятом@")
  | ("7[E1]" : "семидесятому@")
  | ("7[E1]" : "семидесятую@")
  | ("7[E1]" : "семидесятые@")
  | ("7[E1]" : "семидесятый@")
  | ("7[E1]" : "семидесятым@")
  | ("7[E1]" : "семидесятыми@")
  | ("7[E1]" : "семидесятых@")
  | ("7[E1]" : "семьдесят")
  | ("7[E1]" : "семьюдесятью")
  | ("7[E2]" : "семисот")
  | ("7[E2]" : "семисотая@")
  | ("7[E2]" : "семисотого@")
  | ("7[E2]" : "семисотое@")
  | ("7[E2]" : "семисотой@")
  | ("7[E2]" : "семисотом@")
  | ("7[E2]" : "семисотому@")
  | ("7[E2]" : "семисотую@")
  | ("7[E2]" : "семисотые@")
  | ("7[E2]" : "семисотый@")
  | ("7[E2]" : "семисотым@")
  | ("7[E2]" : "семисотыми@")
  | ("7[E2]" : "семисотых@")
  | ("7[E2]" : "семистам")
  | ("7[E2]" : "семистах")
  | ("7[E2]" : "семьсот")
  | ("7[E2]" : "семьюстами")
  | ("8[E1]" : "восемьдесят")
  | ("8[E1]" : "восьмидесятая@")
  | ("8[E1]" : "восьмидесяти")
  | ("8[E1]" : "восьмидесятого@")
  | ("8[E1]" : "восьмидесятое@")
  | ("8[E1]" : "восьмидесятой@")
  | ("8[E1]" : "восьмидесятом@")
  | ("8[E1]" : "восьмидесятому@")
  | ("8[E1]" : "восьмидесятую@")
  | ("8[E1]" : "восьмидесятые@")
  | ("8[E1]" : "восьмидесятый@")
  | ("8[E1]" : "восьмидесятым@")
  | ("8[E1]" : "восьмидесятыми@")
  | ("8[E1]" : "восьмидесятых@")
  | ("8[E1]" : "восьмьюдесятью")
  | ("8[E2]" : "восемьсот")
  | ("8[E2]" : "восемьюстами")
  | ("8[E2]" : "восьмисот")
  | ("8[E2]" : "восьмисотая@")
  | ("8[E2]" : "восьмисотого@")
  | ("8[E2]" : "восьмисотое@")
  | ("8[E2]" : "восьмисотой@")
  | ("8[E2]" : "восьмисотом@")
  | ("8[E2]" : "восьмисотому@")
  | ("8[E2]" : "восьмисотую@")
  | ("8[E2]" : "восьмисотые@")
  | ("8[E2]" : "восьмисотый@")
  | ("8[E2]" : "восьмисотым@")
  | ("8[E2]" : "восьмисотыми@")
  | ("8[E2]" : "восьмисотых@")
  | ("8[E2]" : "восьмистам")
  | ("8[E2]" : "восьмистах")
  | ("8[E2]" : "восьмьюстами")
  | ("9[E1]" : "девяноста")
  | ("9[E1]" : "девяностая@")
  | ("9[E1]" : "девяносто")
  | ("9[E1]" : "девяностого@")
  | ("9[E1]" : "девяностое@")
  | ("9[E1]" : "девяностой@")
  | ("9[E1]" : "девяностом@")
  | ("9[E1]" : "девяностому@")
  | ("9[E1]" : "девяностую@")
  | ("9[E1]" : "девяностые@")
  | ("9[E1]" : "девяностый@")
  | ("9[E1]" : "девяностым@")
  | ("9[E1]" : "девяностыми@")
  | ("9[E1]" : "девяностых@")
  | ("9[E2]" : "девятисот")
  | ("9[E2]" : "девятисотая@")
  | ("9[E2]" : "девятисотого@")
  | ("9[E2]" : "девятисотое@")
  | ("9[E2]" : "девятисотой@")
  | ("9[E2]" : "девятисотом@")
  | ("9[E2]" : "девятисотому@")
  | ("9[E2]" : "девятисотую@")
  | ("9[E2]" : "девятисотые@")
  | ("9[E2]" : "девятисотый@")
  | ("9[E2]" : "девятисотым@")
  | ("9[E2]" : "девятисотыми@")
  | ("9[E2]" : "девятисотых@")
  | ("9[E2]" : "девятистам")
  | ("9[E2]" : "девятистах")
  | ("9[E2]" : "девятьсот")
  | ("9[E2]" : "девятьюстами")]
 ;
 lex2 = CDRewrite[lexset2 I[space], "", "", SIGMA_STAR];
 lexset1 = Optimize[
    ("+" : "")
  | ("1" : "один")
  | ("1" : "одна")
  | ("1" : "одни")
  | ("1" : "одним")
  | ("1" : "одними")
  | ("1" : "одних")
  | ("1" : "одно")
  | ("1" : "одного")
  | ("1" : "одной")
  | ("1" : "одном")
  | ("1" : "одному")
  | ("1" : "одною")
  | ("1" : "одну")
  | ("1" : "первая@")
  | ("1" : "первого@")
  | ("1" : "первое@")
  | ("1" : "первой@")
  | ("1" : "первом@")
  | ("1" : "первому@")
  | ("1" : "первую@")
  | ("1" : "первые@")
  | ("1" : "первый@")
  | ("1" : "первым@")
  | ("1" : "первыми@")
  | ("1" : "первых@")
  | ("2" : "вторая@")
  | ("2" : "второго@")
  | ("2" : "второе@")
  | ("2" : "второй@")
  | ("2" : "втором@")
  | ("2" : "второму@")
  | ("2" : "вторую@")
  | ("2" : "вторые@")
  | ("2" : "вторым@")
  | ("2" : "вторыми@")
  | ("2" : "вторых@")
  | ("2" : "два")
  | ("2" : "две")
  | ("2" : "двум")
  | ("2" : "двумя")
  | ("2" : "двух")
  | ("3" : "трем")
  | ("3" : "тремя")
  | ("3" : "третий@")
  | ("3" : "третье@")
  | ("3" : "третьего@")
  | ("3" : "третьей@")
  | ("3" : "третьем@")
  | ("3" : "третьему@")
  | ("3" : "третьи@")
  | ("3" : "третьим@")
  | ("3" : "третьими@")
  | ("3" : "третьих@")
  | ("3" : "третью@")
  | ("3" : "третья@")
  | ("3" : "трех")
  | ("3" : "три")
  | ("4" : "четвертая@")
  | ("4" : "четвертого@")
  | ("4" : "четвертое@")
  | ("4" : "четвертой@")
  | ("4" : "четвертом@")
  | ("4" : "четвертому@")
  | ("4" : "четвертую@")
  | ("4" : "четвертые@")
  | ("4" : "четвертый@")
  | ("4" : "четвертым@")
  | ("4" : "четвертыми@")
  | ("4" : "четвертых@")
  | ("4" : "четыре")
  | ("4" : "четырем")
  | ("4" : "четырех")
  | ("4" : "четырьмя")
  | ("5" : "пятая@")
  | ("5" : "пяти")
  | ("5" : "пятого@")
  | ("5" : "пятое@")
  | ("5" : "пятой@")
  | ("5" : "пятом@")
  | ("5" : "пятому@")
  | ("5" : "пятую@")
  | ("5" : "пятые@")
  | ("5" : "пятый@")
  | ("5" : "пятым@")
  | ("5" : "пятыми@")
  | ("5" : "пятых@")
  | ("5" : "пять")
  | ("5" : "пятью")
  | ("6" : "шестая@")
  | ("6" : "шести")
  | ("6" : "шестого@")
  | ("6" : "шестое@")
  | ("6" : "шестой@")
  | ("6" : "шестом@")
  | ("6" : "шестому@")
  | ("6" : "шестую@")
  | ("6" : "шестые@")
  | ("6" : "шестым@")
  | ("6" : "шестыми@")
  | ("6" : "шестых@")
  | ("6" : "шесть")
  | ("6" : "шестью")
  | ("7" : "седьмая@")
  | ("7" : "седьмого@")
  | ("7" : "седьмое@")
  | ("7" : "седьмой@")
  | ("7" : "седьмом@")
  | ("7" : "седьмому@")
  | ("7" : "седьмую@")
  | ("7" : "седьмые@")
  | ("7" : "седьмым@")
  | ("7" : "седьмыми@")
  | ("7" : "седьмых@")
  | ("7" : "семи")
  | ("7" : "семь")
  | ("7" : "семью")
  | ("8" : "восемь")
  | ("8" : "восьмая@")
  | ("8" : "восьми")
  | ("8" : "восьмого@")
  | ("8" : "восьмое@")
  | ("8" : "восьмой@")
  | ("8" : "восьмом@")
  | ("8" : "восьмому@")
  | ("8" : "восьмую@")
  | ("8" : "восьмые@")
  | ("8" : "восьмым@")
  | ("8" : "восьмыми@")
  | ("8" : "восьмых@")
  | ("8" : "восьмью")
  | ("9" : "девятая@")
  | ("9" : "девяти")
  | ("9" : "девятого@")
  | ("9" : "девятое@")
  | ("9" : "девятой@")
  | ("9" : "девятом@")
  | ("9" : "девятому@")
  | ("9" : "девятую@")
  | ("9" : "девятые@")
  | ("9" : "девятый@")
  | ("9" : "девятым@")
  | ("9" : "девятыми@")
  | ("9" : "девятых@")
  | ("9" : "девять")
  | ("9" : "девятью")
  | ("[E3]" : "тысяч")
  | ("[E3]" : "тысяча")
  | ("[E3]" : "тысячам")
  | ("[E3]" : "тысячами")
  | ("[E3]" : "тысячах")
  | ("[E3]" : "тысяче")
  | ("[E3]" : "тысячей")
  | ("[E3]" : "тысячи")
  | ("[E3]" : "тысячу")
  | ("[E3]" : "тысячью")
  | ("[E6]" : "миллион")
  | ("[E6]" : "миллиона")
  | ("[E6]" : "миллионам")
  | ("[E6]" : "миллионами")
  | ("[E6]" : "миллионах")
  | ("[E6]" : "миллионе")
  | ("[E6]" : "миллионов")
  | ("[E6]" : "миллионом")
  | ("[E6]" : "миллиону")
  | ("[E6]" : "миллионы")
  | ("[E9]" : "миллиард")
  | ("[E9]" : "миллиарда")
  | ("[E9]" : "миллиардам")
  | ("[E9]" : "миллиардами")
  | ("[E9]" : "миллиардах")
  | ("[E9]" : "миллиарде")
  | ("[E9]" : "миллиардов")
  | ("[E9]" : "миллиардом")
  | ("[E9]" : "миллиарду")
  | ("[E9]" : "миллиарды")
  | ("|0|" : "ноле")
  | ("|0|" : "нолем")
  | ("|0|" : "ноль")
  | ("|0|" : "нолю")
  | ("|0|" : "ноля")
  | ("|0|" : "нуле")
  | ("|0|" : "нулем")
  | ("|0|" : "нуль")
  | ("|0|" : "нулю")
  | ("|0|" : "нуля")]
 ;
 lex1 = CDRewrite[lexset1 I[space], "", "", SIGMA_STAR];
 export LEX = Optimize[lex3 @ lex2 @ lex1];
 export INDEPENDENT_EXPONENTS = "[E3]" | "[E6]" | "[E9]";
 # END LANGUAGE SPECIFIC DATA
 ################################################################################
 # Inserts a marker after the Ms.
 export INSERT_BOUNDARY = CDRewrite["" : "%", Ms, "", SIGMA_STAR];
 # Deletes all powers and "+".
 export DELETE_POWERS = CDRewrite[D[POWERS | "+"], "", "", SIGMA_STAR];
 # Deletes leading zeros at the beginning of a number, so that "0003" does not
 # get treated as an ordinary number.
 export DELETE_INITIAL_ZEROS =
  CDRewrite[("0" POWERS "+") : "", "[BOS]", "", SIGMA_STAR]
 ;
 NonMs = Optimize[POWERS - Ms];
 # Deletes (usually) zeros before a non-M. E.g., +0[E1] should be
 # deleted.
 export DELETE_INTERMEDIATE_ZEROS1 =
  CDRewrite[Zero["+0" NonMs], "", "", SIGMA_STAR]
 ;
 # Deletes (usually) zeros before an M, if there is no non-zero element between
 # that and the previous boundary. Thus, if after the result of the rule above we
 # end up with "%+0[E3]", then that gets deleted. Also (really) deletes a final
 # zero.
 export DELETE_INTERMEDIATE_ZEROS2 = Optimize[
   CDRewrite[Zero["%+0" Ms], "", "", SIGMA_STAR]
 @ CDRewrite[D["+0"], "", "[EOS]", SIGMA_STAR]]
 ;
 # Final clean up of stray zeros.
 export DELETE_REMAINING_ZEROS = Optimize[
   CDRewrite[Zero["+0"], "", "", SIGMA_STAR]
 @ CDRewrite[Zero["0"], "", "", SIGMA_STAR]]
 ;
 # Applies the revaluation map. For example, in English, change [E4] to [E1] as
 # a modifier of [E3].
 export REVALUE = CDRewrite[revaluations, "", "", SIGMA_STAR];
 # Deletes the various marks and powers in the input and output.
 export DELETE_MARKS = CDRewrite[D["%" | "+" | POWERS], "", "", SIGMA_STAR];
 export CLEAN_SPACES = Optimize[
   CDRewrite[" "+ : " ", b.kNotSpace, b.kNotSpace, SIGMA_STAR]
 @ CDRewrite[" "* : "", "[BOS]", "", SIGMA_STAR]
 @ CDRewrite[" "* : "", "", "[EOS]", SIGMA_STAR]]
 ;
 d = b.kDigit;
 # Germanic inversion rule.
 germanic =
    (I["1+"] d "[E1]" D["+1"])
  | (I["2+"] d "[E1]" D["+2"])
  | (I["3+"] d "[E1]" D["+3"])
  | (I["4+"] d "[E1]" D["+4"])
  | (I["5+"] d "[E1]" D["+5"])
  | (I["6+"] d "[E1]" D["+6"])
  | (I["7+"] d "[E1]" D["+7"])
  | (I["8+"] d "[E1]" D["+8"])
  | (I["9+"] d "[E1]" D["+9"])
 ;
 germanic_inversion =
  CDRewrite[germanic, "", "", SIGMA_STAR, 'ltr', 'opt']
 ;
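 # Russian does not use Germanic inversion, so the exported rule is the identity
 # on SIGMA_STAR; germanic_inversion above is defined but not exported.
 export GERMANIC_INVERSION = SIGMA_STAR;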
 export ORDINAL_RESTRICTION = 
  Optimize[((SIGMA - "@")* "@") @ CDRewrite[D["@"], "", "", SIGMA_STAR]]
 ;
 nondigits = b.kBytes - b.kDigit;
 export ORDINAL_SUFFIX = D[nondigits*];
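The ORDINAL_RESTRICTION rule above can be read as a filter over lexicon output: it accepts only strings with a single "@" marker in final position, then deletes that marker. The Python sketch below illustrates that reading. It is not part of the grammar; the function name `ordinal_restriction` is hypothetical, and it treats the input as a plain string rather than a byte-level FST path.

```python
from typing import Optional


def ordinal_restriction(verbalized: str) -> Optional[str]:
    """Mimic ((SIGMA - "@")* "@") @ CDRewrite[D["@"], ...] on one string.

    Returns the string without its final "@" when the "@" marker occurs
    exactly once and at the very end; returns None when the string would be
    rejected by the acceptor (no marker, or a marker anywhere else).
    """
    if verbalized.endswith("@") and "@" not in verbalized[:-1]:
        return verbalized[:-1]
    return None


if __name__ == "__main__":
    # Only the last word carries the ordinal marker: accepted.
    print(ordinal_restriction("двадцать первый@"))
    # No ordinal marker at all: rejected.
    print(ordinal_restriction("двадцать один"))
    # Marker on a non-final word: rejected.
    print(ordinal_restriction("двадцатый@ первый@"))
```

Under this reading, a multi-word number is accepted as an ordinal only when its last word is one of the "@"-tagged lexicon forms.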
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/ordinals.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/ordinals.tsv
@ -0,0 +1,527 @@
 0	нулевая
 0	нулевого
 0	нулевое
 0	нулевой
 0	нулевом
 0	нулевому
 0	нулевую
 0	нулевые
 0	нулевым
 0	нулевым
 0	нулевыми
 0	нулевых
 1	первая
 1	первого
 1	первое
 1	первой
 1	первом
 1	первому
 1	первую
 1	первые
 1	первый
 1	первым
 1	первым
 1	первыми
 1	первых
 2	вторая
 2	второго
 2	второе
 2	второй
 2	втором
 2	второму
 2	вторую
 2	вторые
 2	вторым
 2	вторым
 2	вторыми
 2	вторых
 3	третий
 3	третье
 3	третьего
 3	третьей
 3	третьем
 3	третьему
 3	третьи
 3	третьим
 3	третьим
 3	третьими
 3	третьих
 3	третью
 3	третья
 4	четвертая
 4	четвертого
 4	четвертое
 4	четвертой
 4	четвертом
 4	четвертому
 4	четвертую
 4	четвертые
 4	четвертый
 4	четвертым
 4	четвертым
 4	четвертыми
 4	четвертых
 4	четвёртая
 4	четвёртого
 4	четвёртое
 4	четвёртой
 4	четвёртом
 4	четвёртому
 4	четвёртую
 4	четвёртые
 4	четвёртый
 4	четвёртым
 4	четвёртым
 4	четвёртыми
 4	четвёртых
 5	пятая
 5	пятого
 5	пятое
 5	пятой
 5	пятом
 5	пятому
 5	пятую
 5	пятые
 5	пятый
 5	пятым
 5	пятым
 5	пятыми
 5	пятых
 6	шестая
 6	шестого
 6	шестое
 6	шестой
 6	шестом
 6	шестому
 6	шестую
 6	шестые
 6	шестым
 6	шестым
 6	шестыми
 6	шестых
 7	седьмая
 7	седьмого
 7	седьмое
 7	седьмой
 7	седьмом
 7	седьмому
 7	седьмую
 7	седьмые
 7	седьмым
 7	седьмым
 7	седьмыми
 7	седьмых
 8	восьмая
 8	восьмого
 8	восьмое
 8	восьмой
 8	восьмом
 8	восьмому
 8	восьмую
 8	восьмые
 8	восьмым
 8	восьмым
 8	восьмыми
 8	восьмых
 9	девятая
 9	девятого
 9	девятое
 9	девятой
 9	девятом
 9	девятому
 9	девятую
 9	девятые
 9	девятый
 9	девятым
 9	девятым
 9	девятыми
 9	девятых
 10	десятая
 10	десятого
 10	десятое
 10	десятой
 10	десятом
 10	десятому
 10	десятую
 10	десятые
 10	десятый
 10	десятым
 10	десятым
 10	десятыми
 10	десятых
 11	одиннадцатая
 11	одиннадцатого
 11	одиннадцатое
 11	одиннадцатой
 11	одиннадцатом
 11	одиннадцатому
 11	одиннадцатую
 11	одиннадцатые
 11	одиннадцатый
 11	одиннадцатым
 11	одиннадцатым
 11	одиннадцатыми
 11	одиннадцатых
 12	двенадцатая
 12	двенадцатого
 12	двенадцатое
 12	двенадцатой
 12	двенадцатом
 12	двенадцатому
 12	двенадцатую
 12	двенадцатые
 12	двенадцатый
 12	двенадцатым
 12	двенадцатым
 12	двенадцатыми
 12	двенадцатых
 13	тринадцатая
 13	тринадцатого
 13	тринадцатое
 13	тринадцатой
 13	тринадцатом
 13	тринадцатому
 13	тринадцатую
 13	тринадцатые
 13	тринадцатый
 13	тринадцатым
 13	тринадцатым
 13	тринадцатыми
 13	тринадцатых
 14	четырнадцатая
 14	четырнадцатого
 14	четырнадцатое
 14	четырнадцатой
 14	четырнадцатом
 14	четырнадцатому
 14	четырнадцатую
 14	четырнадцатые
 14	четырнадцатый
 14	четырнадцатым
 14	четырнадцатым
 14	четырнадцатыми
 14	четырнадцатых
 15	пятнадцатая
 15	пятнадцатого
 15	пятнадцатое
 15	пятнадцатой
 15	пятнадцатом
 15	пятнадцатому
 15	пятнадцатую
 15	пятнадцатые
 15	пятнадцатый
 15	пятнадцатым
 15	пятнадцатым
 15	пятнадцатыми
 15	пятнадцатых
 16	шестнадцатая
 16	шестнадцатого
 16	шестнадцатое
 16	шестнадцатой
 16	шестнадцатом
 16	шестнадцатому
 16	шестнадцатую
 16	шестнадцатые
 16	шестнадцатый
 16	шестнадцатым
 16	шестнадцатым
 16	шестнадцатыми
 16	шестнадцатых
 17	семнадцатая
 17	семнадцатого
 17	семнадцатое
 17	семнадцатой
 17	семнадцатом
 17	семнадцатому
 17	семнадцатую
 17	семнадцатые
 17	семнадцатый
 17	семнадцатым
 17	семнадцатым
 17	семнадцатыми
 17	семнадцатых
 18	восемнадцатая
 18	восемнадцатого
 18	восемнадцатое
 18	восемнадцатой
 18	восемнадцатом
 18	восемнадцатому
 18	восемнадцатую
 18	восемнадцатые
 18	восемнадцатый
 18	восемнадцатым
 18	восемнадцатым
 18	восемнадцатыми
 18	восемнадцатых
 19	девятнадцатая
 19	девятнадцатого
 19	девятнадцатое
 19	девятнадцатой
 19	девятнадцатом
 19	девятнадцатому
 19	девятнадцатую
 19	девятнадцатые
 19	девятнадцатый
 19	девятнадцатым
 19	девятнадцатым
 19	девятнадцатыми
 19	девятнадцатых
 20	двадцатая
 20	двадцатого
 20	двадцатое
 20	двадцатой
 20	двадцатом
 20	двадцатому
 20	двадцатую
 20	двадцатые
 20	двадцатый
 20	двадцатым
 20	двадцатым
 20	двадцатыми
 20	двадцатых
 30	тридцатая
 30	тридцатого
 30	тридцатое
 30	тридцатой
 30	тридцатом
 30	тридцатому
 30	тридцатую
 30	тридцатые
 30	тридцатый
 30	тридцатым
 30	тридцатым
 30	тридцатыми
 30	тридцатых
 40	сороковая
 40	сорокового
 40	сороковое
 40	сороковой
 40	сороковом
 40	сороковому
 40	сороковую
 40	сороковые
 40	сороковым
 40	сороковым
 40	сороковыми
 40	сороковых
 50	пятидесятая
 50	пятидесятого
 50	пятидесятое
 50	пятидесятой
 50	пятидесятом
 50	пятидесятому
 50	пятидесятую
 50	пятидесятые
 50	пятидесятый
 50	пятидесятым
 50	пятидесятым
 50	пятидесятыми
 50	пятидесятых
 60	шестидесятая
 60	шестидесятого
 60	шестидесятое
 60	шестидесятой
 60	шестидесятом
 60	шестидесятому
 60	шестидесятую
 60	шестидесятые
 60	шестидесятый
 60	шестидесятым
 60	шестидесятым
 60	шестидесятыми
 60	шестидесятых
 70	семидесятая
 70	семидесятого
 70	семидесятое
 70	семидесятой
 70	семидесятом
 70	семидесятому
 70	семидесятую
 70	семидесятые
 70	семидесятый
 70	семидесятым
 70	семидесятым
 70	семидесятыми
 70	семидесятых
 80	восьмидесятая
 80	восьмидесятого
 80	восьмидесятое
 80	восьмидесятой
 80	восьмидесятом
 80	восьмидесятому
 80	восьмидесятую
 80	восьмидесятые
 80	восьмидесятый
 80	восьмидесятым
 80	восьмидесятым
 80	восьмидесятыми
 80	восьмидесятых
 90	девяностая
 90	девяностого
 90	девяностое
 90	девяностой
 90	девяностом
 90	девяностому
 90	девяностую
 90	девяностые
 90	девяностый
 90	девяностым
 90	девяностым
 90	девяностыми
 90	девяностых
 100	сотая
 100	сотого
 100	сотое
 100	сотой
 100	сотом
 100	сотому
 100	сотую
 100	сотые
 100	сотый
 100	сотым
 100	сотым
 100	сотыми
 100	сотых
 200	двухсотая
 200	двухсотого
 200	двухсотое
 200	двухсотой
 200	двухсотом
 200	двухсотому
 200	двухсотую
 200	двухсотые
 200	двухсотый
 200	двухсотым
 200	двухсотым
 200	двухсотыми
 200	двухсотых
 300	трехсотая
 300	трехсотого
 300	трехсотое
 300	трехсотой
 300	трехсотом
 300	трехсотому
 300	трехсотую
 300	трехсотые
 300	трехсотый
 300	трехсотым
 300	трехсотым
 300	трехсотыми
 300	трехсотых
 400	четырехсотая
 400	четырехсотого
 400	четырехсотое
 400	четырехсотой
 400	четырехсотом
 400	четырехсотому
 400	четырехсотую
 400	четырехсотые
 400	четырехсотый
 400	четырехсотым
 400	четырехсотым
 400	четырехсотыми
 400	четырехсотых
 500	пятисотая
 500	пятисотого
 500	пятисотое
 500	пятисотой
 500	пятисотом
 500	пятисотому
 500	пятисотую
 500	пятисотые
 500	пятисотый
 500	пятисотым
 500	пятисотым
 500	пятисотыми
 500	пятисотых
 600	шестисотая
 600	шестисотого
 600	шестисотое
 600	шестисотой
 600	шестисотом
 600	шестисотому
 600	шестисотую
 600	шестисотые
 600	шестисотый
 600	шестисотым
 600	шестисотым
 600	шестисотыми
 600	шестисотых
 700	семисотая
 700	семисотого
 700	семисотое
 700	семисотой
 700	семисотом
 700	семисотому
 700	семисотую
 700	семисотые
 700	семисотый
 700	семисотым
 700	семисотым
 700	семисотыми
 700	семисотых
 800	восьмисотая
 800	восьмисотого
 800	восьмисотое
 800	восьмисотой
 800	восьмисотом
 800	восьмисотому
 800	восьмисотую
 800	восьмисотые
 800	восьмисотый
 800	восьмисотым
 800	восьмисотым
 800	восьмисотыми
 800	восьмисотых
 900	девятисотая
 900	девятисотого
 900	девятисотое
 900	девятисотой
 900	девятисотом
 900	девятисотому
 900	девятисотую
 900	девятисотые
 900	девятисотый
 900	девятисотым
 900	девятисотым
 900	девятисотыми
 900	девятисотых
 1000	тысячная
 1000	тысячного
 1000	тысячное
 1000	тысячной
 1000	тысячном
 1000	тысячному
 1000	тысячную
 1000	тысячные
 1000	тысячный
 1000	тысячным
 1000	тысячным
 1000	тысячными
 1000	тысячных
 1000000	миллионная
 1000000	миллионного
 1000000	миллионное
 1000000	миллионной
 1000000	миллионном
 1000000	миллионному
 1000000	миллионную
 1000000	миллионные
 1000000	миллионный
 1000000	миллионным
 1000000	миллионным
 1000000	миллионными
 1000000	миллионных
 1000000000	миллиардная
 1000000000	миллиардного
 1000000000	миллиардное
 1000000000	миллиардной
 1000000000	миллиардном
 1000000000	миллиардному
 1000000000	миллиардную
 1000000000	миллиардные
 1000000000	миллиардный
 1000000000	миллиардным
 1000000000	миллиардным
 1000000000	миллиардными
 1000000000	миллиардных
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/spelled.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/spelled.grm
@ -0,0 +1,77 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # This verbalizer is used whenever there is an LM symbol that consists of
 # letters immediately followed by "{spelled}". This strips the "{spelled}"
 # suffix.
 import 'util/byte.grm' as b;
 import 'ru/classifier/cyrillic.grm' as c;
 import 'ru/verbalizer/lexical_map.grm' as l;
 import 'ru/verbalizer/numbers.grm' as n;
 digit = b.kDigit @ n.CARDINAL_NUMBERS;
 char_set = (("a" | "A") : "letter-a")
        | (("b" | "B") : "letter-b")
        | (("c" | "C") : "letter-c")
        | (("d" | "D") : "letter-d")
        | (("e" | "E") : "letter-e")
        | (("f" | "F") : "letter-f")
        | (("g" | "G") : "letter-g")
        | (("h" | "H") : "letter-h")
        | (("i" | "I") : "letter-i")
        | (("j" | "J") : "letter-j")
        | (("k" | "K") : "letter-k")
        | (("l" | "L") : "letter-l")
        | (("m" | "M") : "letter-m")
        | (("n" | "N") : "letter-n")
        | (("o" | "O") : "letter-o")
        | (("p" | "P") : "letter-p")
        | (("q" | "Q") : "letter-q")
        | (("r" | "R") : "letter-r")
        | (("s" | "S") : "letter-s")
        | (("t" | "T") : "letter-t")
        | (("u" | "U") : "letter-u")
        | (("v" | "V") : "letter-v")
        | (("w" | "W") : "letter-w")
        | (("x" | "X") : "letter-x")
        | (("y" | "Y") : "letter-y")
        | (("z" | "Z") : "letter-z")
        | (digit)
        | ("&" : "@@AND@@")
        | ("." : "")
        | ("-" : "")
        | ("_" : "")
        | ("/" : "")
        | (n.I["letter-"] c.kCyrillicAlpha)
        ;
 ins_space = "" : " ";
 suffix = "{spelled}" : "";
 spelled = Optimize[char_set (ins_space char_set)* suffix];
 export SPELLED = Optimize[spelled @ l.LEXICAL_MAP];
 sigma_star = b.kBytes*;
 # Gets rid of the letter- prefix since in some cases we don't want it.
 del_letter = CDRewrite[n.D["letter-"], "", "", sigma_star];
 spelled_no_tag = Optimize[char_set (ins_space char_set)*];
 export SPELLED_NO_LETTER = Optimize[spelled_no_tag @ del_letter];
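As a rough illustration of what `spelled` and `spelled_no_tag` do before composition with the lexical map, the Python sketch below mirrors `char_set` character by character. It is a sketch under stated assumptions, not the grammar itself: digits are passed through unchanged because `n.CARDINAL_NUMBERS` is defined elsewhere, `is_cyrillic` is a hypothetical stand-in for `c.kCyrillicAlpha`, and all function names are invented for this example.

```python
import string
from typing import Optional

SUFFIX = "{spelled}"
DELETED = set(".-_/")  # characters that char_set maps to the empty string


def is_cyrillic(ch: str) -> bool:
    # Assumption: approximates c.kCyrillicAlpha with the basic Russian alphabet.
    low = ch.lower()
    return "\u0430" <= low <= "\u044f" or low == "\u0451"


def map_char(ch: str) -> Optional[str]:
    """Map one input character the way char_set does, or None if unmatched."""
    if ch in string.ascii_letters:
        return "letter-" + ch.lower()
    if ch in string.digits:
        return ch  # placeholder: the grammar verbalizes digits as cardinals
    if ch == "&":
        return "@@AND@@"
    if ch in DELETED:
        return ""
    if is_cyrillic(ch):
        return "letter-" + ch  # the grammar keeps the Cyrillic character as is
    return None


def map_all(body: str) -> Optional[str]:
    """char_set (ins_space char_set)*: one space between every mapped char."""
    if not body:
        return None
    parts = [map_char(ch) for ch in body]
    if any(part is None for part in parts):
        return None
    return " ".join(parts)


def spelled(token: str) -> Optional[str]:
    """Require and strip the "{spelled}" suffix, then map the characters."""
    if not token.endswith(SUFFIX):
        return None
    return map_all(token[: -len(SUFFIX)])


def spelled_no_letter(token: str) -> Optional[str]:
    """No suffix expected; the "letter-" prefix is deleted afterwards."""
    mapped = map_all(token)
    return None if mapped is None else mapped.replace("letter-", "")
```

Because a space is inserted between every pair of characters, a deleted character such as "." still leaves two adjacent spaces in this sketch's output.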
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/spoken_punct.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/spoken_punct.grm
@ -0,0 +1,24 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'ru/verbalizer/lexical_map.grm' as l;
 punct =
   ("." : "@@PERIOD@@")
 | ("," : "@@COMMA@@")
 | ("!" : "@@EXCLAMATION_MARK@@")
 | ("?" : "@@QUESTION_MARK@@")
 ;
 export SPOKEN_PUNCT = Optimize[punct @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/time.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/time.grm
@ -0,0 +1,108 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/byte.grm' as b;
 import 'ru/verbalizer/lexical_map.grm' as l;
 import 'ru/verbalizer/numbers.grm' as n;
 # Only handles 24-hour time with quarter-to, half-past and quarter-past.
 increment_hour =
    ("0" : "1")
  | ("1" : "2")
  | ("2" : "3")
  | ("3" : "4")
  | ("4" : "5")
  | ("5" : "6")
  | ("6" : "7")
  | ("7" : "8")
  | ("8" : "9")
  | ("9" : "10")
  | ("10" : "11")
  | ("11" : "12")
  | ("12" : "1")  # If someone uses 12, we assume 12-hour by default.
  | ("13" : "14")
  | ("14" : "15")
  | ("15" : "16")
  | ("16" : "17")
  | ("17" : "18")
  | ("18" : "19")
  | ("19" : "20")
  | ("20" : "21")
  | ("21" : "22")
  | ("22" : "23")
  | ("23" : "12")  # 23:45 is read as "quarter to 12" (midnight).
 ;
 hours = Project[increment_hour, 'input'];
 d = b.kDigit;
 D = d - "0";
 minutes09 = "0" D;
 minutes = ("1" | "2" | "3" | "4" | "5") d;
 __sep__ = ":";
 sep_space = __sep__ : " ";
 verbalize_hours = hours @ n.CARDINAL_NUMBERS;
 verbalize_minutes =
   ("00" : "@@HOUR@@")
 | (minutes09 @ (("0" : "@@TIME_ZERO@@") n.I[" "] n.CARDINAL_NUMBERS))
 | (minutes @ n.CARDINAL_NUMBERS)
 ;
 time_basic = Optimize[verbalize_hours sep_space verbalize_minutes];
 # Special cases we handle right now.
 # TODO: Need to allow for cases like
 #
 #   half twelve (in the UK English sense)
 #   half twaalf (in the Dutch sense)
 time_quarter_past =
   n.I["@@TIME_QUARTER@@ @@TIME_AFTER@@ "]
   verbalize_hours
   n.D[__sep__ "15"];
 time_half_past =
   n.I["@@TIME_HALF@@ @@TIME_AFTER@@ "]
   verbalize_hours
   n.D[__sep__ "30"];
 time_quarter_to =
   n.I["@@TIME_QUARTER@@ @@TIME_BEFORE@@ "]
   (increment_hour @ verbalize_hours)
   n.D[__sep__ "45"];
 time_extra = Optimize[
  time_quarter_past | time_half_past | time_quarter_to]
 ;
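 # Illustrative checks, not part of the upstream grammar: a minimal sketch of
 # why time_quarter_to composes with increment_hour. A time such as "9:45" is
 # read relative to the following hour, so the hour is first mapped to its
 # successor before being verbalized. The test names are hypothetical, and
 # Optimize is applied here only so the composition operands are arc-sorted.
 example_next_hour = AssertEqual["9" @ Optimize[increment_hour], "10"];
 example_late_evening = AssertEqual["23" @ Optimize[increment_hour], "12"];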
 # Basic time periods which most languages can be expected to have.
 __am__ = "a.m." | "am" | "AM" | "утра";
 __pm__ = "p.m." | "pm" | "PM" | "вечера";
 period = (__am__ : "@@TIME_AM@@") | (__pm__ : "@@TIME_PM@@");
 time_variants = time_basic | time_extra;
 time = Optimize[
    (period (" " | n.I[" "]))? time_variants
 |  time_variants ((" " | n.I[" "]) period)?]
 ;
 export TIME = Optimize[time @ l.LEXICAL_MAP];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/urls.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/urls.grm
@ -0,0 +1,68 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Rules for URLs and email addresses.
 import 'util/byte.grm' as bytelib;
 import 'ru/verbalizer/lexical_map.grm' as l;
 ins_space = "" : " ";
 dot = "." : "@@URL_DOT_EXPRESSION@@";
 at = "@" : "@@AT@@";
 url_suffix =
  (".com" : dot ins_space "com") |
  (".gov" : dot ins_space "gov") |
  (".edu" : dot ins_space "e d u") |
  (".org" : dot ins_space "org") |
  (".net" : dot ins_space "net")
 ;
 letter_string = (bytelib.kAlnum)* bytelib.kAlnum;
 letter_string_dot =
  ((letter_string ins_space dot ins_space)* letter_string)
 ;
 # Rules for URLs.
 export URL = Optimize[
 ((letter_string_dot) (ins_space)
  (url_suffix)) @ l.LEXICAL_MAP
 ];
 # Rules for email addresses.
 letter_by_letter = ((bytelib.kAlnum ins_space)* bytelib.kAlnum);
 letter_by_letter_dot =
  ((letter_by_letter ins_space dot ins_space)*
  letter_by_letter)
 ;
 export EMAIL1 = Optimize[
 ((letter_by_letter) (ins_space)
  (at) (ins_space)
  (letter_by_letter_dot) (ins_space)
  (url_suffix)) @ l.LEXICAL_MAP
 ];
 export EMAIL2 = Optimize[
 ((letter_by_letter) (ins_space)
  (at) (ins_space)
  (letter_string_dot) (ins_space)
  (url_suffix)) @ l.LEXICAL_MAP
 ];
 export EMAILS = Optimize[
  EMAIL1 | EMAIL2
 ];
--- a/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/verbalizer.grm
+++ b/third_party/chinese_text_normalization/thrax/src/ru/verbalizer/verbalizer.grm
@ -0,0 +1,42 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import 'util/util.grm' as util;
 import 'ru/verbalizer/extra_numbers.grm' as e;
 import 'ru/verbalizer/float.grm' as f;
 import 'ru/verbalizer/math.grm' as ma;
 import 'ru/verbalizer/miscellaneous.grm' as mi;
 import 'ru/verbalizer/money.grm' as mo;
 import 'ru/verbalizer/numbers.grm' as n;
 import 'ru/verbalizer/numbers_plus.grm' as np;
 import 'ru/verbalizer/spelled.grm' as s;
 import 'ru/verbalizer/spoken_punct.grm' as sp;
 import 'ru/verbalizer/time.grm' as t;
 import 'ru/verbalizer/urls.grm' as u;
 export VERBALIZER = Optimize[RmWeight[
 (  e.MIXED_NUMBERS
  | e.DIGITS
  | f.FLOAT
  | ma.ARITHMETIC
  | mi.MISCELLANEOUS
  | mo.MONEY
  | n.CARDINAL_NUMBERS
  | n.ORDINAL_NUMBERS
  | np.NUMBERS_PLUS
  | s.SPELLED
  | sp.SPOKEN_PUNCT
  | t.TIME
  | u.URL) @ util.CLEAN_SPACES
 ]];
--- a/third_party/chinese_text_normalization/thrax/src/universal/README.md
+++ b/third_party/chinese_text_normalization/thrax/src/universal/README.md
@ -0,0 +1,3 @@
 # Language-universal grammar definitions
 This directory contains various language-universal grammar definitions.
--- a/third_party/chinese_text_normalization/thrax/src/universal/roman_numerals.tsv
+++ b/third_party/chinese_text_normalization/thrax/src/universal/roman_numerals.tsv
@ -0,0 +1,91 @@
 i	1
 ii	2
 iii	3
 iv	4
 v	5
 vi	6
 vii	7
 viii	8
 ix	9
 x	10
 xi	11
 xii	12
 xiii	13
 xiv	14
 xv	15
 xvi	16
 xvii	17
 xviii	18
 xix	19
 xx	20
 xxi	21
 xxii	22
 xxiii	23
 xxiv	24
 xxv	25
 xxvi	26
 xxvii	27
 xxviii	28
 xxix	29
 xxx	30
 xxxi	31
 xxxii	32
 xxxiii	33
 xxxiv	34
 xxxv	35
 xxxvi	36
 xxxvii	37
 xxxviii	38
 xxxix	39
 xl	40
 xli	41
 xlii	42
 xliii	43
 xliv	44
 xlv	45
 xlvi	46
 xlvii	47
 xlviii	48
 xlix	49
 mcmxciv	1994
 mcmxcv	1995
 mcmxcvi	1996
 mcmxcvii	1997
 mcmxcviii	1998
 mcmxcix	1999
 mm	2000
 mmi	2001
 mmii	2002
 mmiii	2003
 mmiv	2004
 mmv	2005
 mmvi	2006
 mmvii	2007
 mmviii	2008
 mmix	2009
 mmx	2010
 mmxi	2011
 mmxii	2012
 mmxiii	2013
 mmxiv	2014
 mmxv	2015
 mmxvi	2016
 mmxvii	2017
 mmxviii	2018
 mmxix	2019
 mmxx	2020
 mmxxi	2021
 mmxxii	2022
 mmxxiii	2023
 mmxxiv	2024
 mmxxv	2025
 mmxxvi	2026
 mmxxvii	2027
 mmxxviii	2028
 mmxxix	2029
 mmxxx	2030
 mmxxxi	2031
 mmxxxii	2032
 mmxxxiii	2033
 mmxxxiv	2034
 mmxxxv	2035
--- a/third_party/chinese_text_normalization/thrax/src/universal/thousands_punct.grm
+++ b/third_party/chinese_text_normalization/thrax/src/universal/thousands_punct.grm
@ -0,0 +1,126 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Specifies common ways of delimiting thousands in digit strings.
 import 'util/byte.grm' as bytelib;
 import 'util/util.grm' as util;
 killcomma = "," : "";
 dot2comma = "." : ",";
 spaces2comma = " "+ : ",";
 zero = "0";
 # no_delimiter = zero | "[1-9][0-9]*";
 export no_delimiter = zero | (util.d1to9 bytelib.kDigit*);
 # delim_map_dot = ("[0-9]" | ("\." : ","))*;
 delim_map_dot = (bytelib.kDigit | dot2comma)*;
 # delim_map_space = ("[0-9]" | (" +" : ","))*;
 delim_map_space = (bytelib.kDigit | spaces2comma)*;
 ## Western systems group thousands. Korean goes this way too.
 # comma_thousands = zero | ("[1-9][0-9]?[0-9]?" (("," : "") "[0-9][0-9][0-9]")*);
 export comma_thousands = zero | (util.d1to9 bytelib.kDigit{0,2} (killcomma bytelib.kDigit{3})*);
 # ComposeFst: 1st argument cannot match on output labels and 2nd argument
 # cannot match on input labels (sort?).
 export dot_thousands = delim_map_dot @ comma_thousands;
 # ComposeFst: 1st argument cannot match on output labels and 2nd argument
 # cannot match on input labels (sort?).
 export space_thousands = delim_map_space @ comma_thousands;
 ## Chinese prefers grouping by fours (by ten-thousands).
 # chinese_comma =
 #   zero | ("[1-9][0-9]?[0-9]?[0-9]?" (("," : "") "[0-9][0-9][0-9][0-9]")*);
 export chinese_comma = zero | (util.d1to9 (bytelib.kDigit{0,3}) (killcomma bytelib.kDigit{4})*);
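 # Illustrative checks, not part of the upstream grammar: a minimal sketch of
 # the two grouping conventions defined above. It assumes util.d1to9 accepts
 # the digits 1-9 (util/util.grm is not shown here). The test names are
 # hypothetical, and each grammar is wrapped in Optimize only so that the
 # composition operands are arc-sorted.
 example_western = AssertEqual[
     "1,234,567" @ Optimize[comma_thousands], "1234567"];
 example_chinese = AssertEqual[
     "123,4567" @ Optimize[chinese_comma], "1234567"];
 # A two-digit group after the comma is not a valid Western thousands group.
 example_misgrouped = AssertNull["12,34" @ Optimize[comma_thousands]];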
 ## The Indian system is more complex because of the Stravinskian alternation
 ## between lakhs and crores.
 ##
 ## According to Wikipedia:
 ##
 ## Indian English       Value
 ## One                  1
 ## Ten                  10
 ## Hundred              100
 ## Thousand             1,000
 ## Lakh                 1,00,000
 ## Crore                1,00,00,000
 ## Arab                 1,00,00,00,000
 ## Kharab               1,00,00,00,00,000
 # indian_hundreds = "[1-9][0-9]?[0-9]?";
 indian_hundreds = util.d1to9 bytelib.kDigit{0,2};
 ## Up to 99,999.
 # indian_comma_thousands = "[1-9][0-9]?" ("," : "") "[0-9][0-9][0-9]";
 indian_comma_thousands = util.d1to9 bytelib.kDigit? killcomma bytelib.kDigit{3};
 ## Up to 99,99,999.
 # indian_comma_lakhs = "[1-9][0-9]?" ("," : "") "[0-9][0-9]" ("," : "") "[0-9][0-9][0-9]";
 indian_comma_lakhs = util.d1to9 bytelib.kDigit? killcomma bytelib.kDigit{2} killcomma bytelib.kDigit{3};
 ## Up to 999,99,99,999
 indian_comma_crores =
    util.d1to9 bytelib.kDigit? bytelib.kDigit? killcomma
    (bytelib.kDigit{2} killcomma)?
    bytelib.kDigit{2} killcomma
    bytelib.kDigit{3}
 ;
 ## Up to 99,999,99,99,999.
 indian_comma_thousand_crores =
    util.d1to9 bytelib.kDigit? killcomma
    bytelib.kDigit{3} killcomma
    bytelib.kDigit{2} killcomma
    bytelib.kDigit{2} killcomma
    bytelib.kDigit{3}
 ;
 ## Up to 999,99,999,99,99,999.
 indian_comma_lakh_crores =
    util.d1to9 bytelib.kDigit? bytelib.kDigit? killcomma
    bytelib.kDigit{2} killcomma
    bytelib.kDigit{3} killcomma
    bytelib.kDigit{2} killcomma
    bytelib.kDigit{2} killcomma
    bytelib.kDigit{3}
 ;
 export indian_comma =
    zero
  | indian_hundreds
  | indian_comma_thousands
  | indian_comma_lakhs
  | indian_comma_crores
  | indian_comma_thousand_crores
  | indian_comma_lakh_crores
 ;
 # Indian number system with dots.
 export indian_dot_number = delim_map_dot @ indian_comma;
 # Indian number system with spaces.
 export indian_space_number = delim_map_space @ indian_comma;
--- a/third_party/chinese_text_normalization/thrax/src/util/README.md
+++ b/third_party/chinese_text_normalization/thrax/src/util/README.md
@ -0,0 +1,3 @@
 # Utility grammar definitions
 This directory contains various utility grammar definitions.
--- a/third_party/chinese_text_normalization/thrax/src/util/arithmetic.grm
+++ b/third_party/chinese_text_normalization/thrax/src/util/arithmetic.grm
@ -0,0 +1,326 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Basic arithmetic on S-expressions. Exported arithmetic transducers may either:
 #
 # * Support weak vigesimal addition and multiplication...
 #
 #   (+ 20 17 +) -> 37
 #   (+ 20 10 7 +) -> 37
 #   (* 4 20 *) -> 80
 #
 #   ...or not.
 #
 # * Support "Germanic decade flop" addition....
 #
 #   (+ 8 20 +) -> 28
 #   (+ 4 60 +) -> 64
 #
 #   ...or not.
 #
 # * Support multiplication where the left-hand side multiplicand is of a higher
 #   order than the right-hand side multiplicand.
 #
 #   (* 1000 100 *) -> 100000
 #
 #   ...or not.
 #
 # However, modulo these exceptions, arithmetic transducers do not support
 # addition that requires "carrying", or multiplication where the right-hand
 # side multiplicand is not a power of ten. So this is not a *generic*
 # S-expression evaluator.
 #
 # LEAVES is a transducer that accepts symbols in delta but deletes symbols
 # in sigma - delta. So it essentially removes markup.
 #
 # REPEAT_FILTER is an acceptor which blocks derivations of the form
 #
 #   (+ (* 50 1000 *) (* 4 1000 *) ...)   "fifty thousand four thousand..."
 #
 # in languages where that is not licensed.
 import 'util/byte.grm' as b;
 # Deleter FST.
 func D[expr] {
  return expr : "";
 }
 delta = b.kDigit;
 sigma = delta | " " | "(" | ")" | "+" | "*";
 sigmastar = sigma*;
 deltastar = delta*;
 rparen = Optimize["+)" | "*)"];
 space_or_rparen = Optimize[" " | rparen];
 ## Multiplication.
 # Generic multiplication where the RHS is a power of ten.
 del_one = Optimize[delta+ D[" 1"] "0"+];
 test1_1 = AssertEqual["2 10"      @ del_one,      "20"];
 test1_2 = AssertEqual["20 10"     @ del_one,     "200"];
 test1_3 = AssertEqual["2 100"     @ del_one,     "200"];
 test1_4 = AssertEqual["20 100"    @ del_one,    "2000"];
 test1_5 = AssertEqual["200 100"   @ del_one,   "20000"];
 test1_6 = AssertEqual["2 1000"    @ del_one,    "2000"];
 test1_7 = AssertEqual["20 1000"   @ del_one,   "20000"];
 test1_8 = AssertEqual["200 1000"  @ del_one,  "200000"];
 test1_9 = AssertEqual["2000 1000" @ del_one, "2000000"];
 # Generic multiplication where the RHS is a power of ten and the LHS has fewer
 # trailing zeros than the RHS.
 del_one_restricted = Optimize[ # e.g., "2 x 10", "2 x 100", etc.
                               delta      D[" 1"]        "0"+ |
                               # e.g., "20 x 100", etc.
                               delta{1,2} D[" 1"] "0"    "0"+ |
                               # e.g., "200 x 1000", etc.
                               delta{2,3} D[" 1"] "0"{2} "0"+ |
                               delta{3,4} D[" 1"] "0"{3} "0"+ |
                               delta{4,5} D[" 1"] "0"{4} "0"+];
 test2_01 = AssertEqual["2 10"     @ del_one_restricted,               "20"];
 test2_02 = AssertNull["20 10"     @ del_one_restricted];
 test2_03 = AssertEqual["2 100"    @ del_one_restricted,              "200"];
 test2_04 = AssertEqual["20 100"   @ del_one_restricted,             "2000"];
 test2_05 = AssertNull[ "200 100"  @ del_one_restricted];
 test2_06 = AssertEqual["2 1000"   @ del_one_restricted,             "2000"];
 test2_07 = AssertEqual["20 1000"  @ del_one_restricted,            "20000"];
 test2_08 = AssertEqual["200 1000" @ del_one_restricted,           "200000"];
 test2_09 = AssertNull["2000 1000" @ del_one_restricted];
 test2_10 = AssertEqual["1000 10000000" @ del_one_restricted, "10000000000"];
 # Multiplication of vigesimal base for weak vigesimal systems
 vigesimal_times_map = ("1" : "2") | ("2" : "4") | ("3" : "6") | ("4" : "8");
 del_two = Optimize[vigesimal_times_map D[" 2"] "0"+];
 test3_1 = AssertEqual["1 20" @ del_two, "20"];
 test3_2 = AssertEqual["2 20" @ del_two, "40"];
 test3_3 = AssertEqual["3 20" @ del_two, "60"];
 test3_4 = AssertEqual["4 20" @ del_two, "80"];
 # Multiplication of vigesimal base restricted to cases where the LHS is [1-4]
 # and the RHS is "2" followed by one or more zeros (e.g., 20).
 del_two_restricted = Optimize[vigesimal_times_map D[" 2"] "0"+];
 test4_1 = AssertEqual["1 20" @ del_two_restricted, "20"];
 test4_2 = AssertEqual["2 20" @ del_two_restricted, "40"];
 test4_3 = AssertEqual["3 20" @ del_two_restricted, "60"];
 test4_4 = AssertEqual["4 20" @ del_two_restricted, "80"];
 test4_5 = AssertNull["5 20" @ del_two_restricted];
 test4_6 = AssertNull["10 20" @ del_two_restricted];
 products = del_one | del_two;
 products_restricted = del_one_restricted | del_two_restricted;
 multiplication = CDRewrite[D["(* "] products D[" *)"], "", "", sigmastar];
 multiplication_restricted = CDRewrite[D["(* "] products_restricted D[" *)"],
                                      "", "", sigmastar];
 test5_1 = AssertEqual["(* 8 100 *)"    @ multiplication, "800"];
 test5_2 = AssertEqual["(* 1 100 *)"    @ multiplication, "100"];
 test5_3 = AssertEqual["(* 4 20 *)"     @ multiplication, "80"];
 test5_4 = AssertEqual["(* 13 1000 *)"  @ multiplication, "13000"];
 test5_5 = AssertEqual["(* 13000 10 *)" @ multiplication, "130000"];
 test5_6 = AssertEqual["(* 13000 10 *)" @ multiplication_restricted,
                      "(* 13000 10 *)"];  # Can't reduce this.
 ## Addition.
 insum = "+" (sigma - "(")*;
 rcon = insum deltastar;
 # Generic zero deletion up to 12.
 del_zero = Optimize[
   # Handles lone zero inside a plus statement.
   CDRewrite[D[" 0"], rcon, space_or_rparen, sigmastar] @
   # If we need to go any larger, we probably should switch to a PDT.
   CDRewrite[D["0"{12} " "] delta{12}, rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{11} " "] delta{11}, rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{10} " "] delta{10}, rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{9} " "]  delta{9},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{8} " "]  delta{8},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{7} " "]  delta{7},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{6} " "]  delta{6},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{5} " "]  delta{5},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{4} " "]  delta{4},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{3} " "]  delta{3},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0"{2} " "]  delta{2},  rcon, space_or_rparen, sigmastar] @
   CDRewrite[D["0" " "]     delta,     rcon, space_or_rparen, sigmastar]];
 ## Weak vigesimal cases involving scores and teens.
 vigesimal_plus_map = Optimize[("20 1" : "3") delta |
                              ("40 1" : "5") delta |
                              ("60 1" : "7") delta |
                              ("80 1" : "9") delta];
 vigesimal = CDRewrite[vigesimal_plus_map, insum, space_or_rparen, sigmastar];
 ## Germanic decade flop.
 germanic_map = StringFile['util/germanic.tsv'];
 germanic = CDRewrite[germanic_map, insum, space_or_rparen, sigmastar];
 sums = Optimize[germanic @ vigesimal @ del_zero];
 # Deletes the surrounding "(+ +)" around a successful reduction.
 del_plus = CDRewrite[D["(+ "] delta+ D[" +)"], "", "", sigmastar];
 addition = Optimize[sums @ del_plus];
 test6_1 = AssertEqual["(+ 30 2 +)" @ addition, "32"];
 test6_2 = AssertEqual["(+ 300 20 1 +)" @ addition, "321"];
 test6_3 = AssertEqual["(+ 80 17 +)" @ addition, "97"];
 test6_4 = AssertEqual["(+ 4 50 +)" @ addition, "54"];
 test6_5 = AssertEqual["(+ 3000 80 17 +)" @ addition, "3097"];
 test6_6 = AssertEqual["(+ 3000 4 50 +)" @ addition, "3054"];
 test6_7 = AssertEqual["(+ 0 10 +)" @ addition, "10"];
 test6_8 = AssertEqual["(+ 0 20 +)" @ addition, "20"];
 test6_9 = AssertEqual["(+ 200 (+ 0 20 +) +)" @ addition @ addition, "220"];
 ## Export statements.
 export ARITHMETIC = Optimize[multiplication @ addition];
 export ARITHMETIC_RESTRICTED = Optimize[multiplication_restricted @ addition];
 # Lightweight versions that lack the vigesimal /vɪˈdʒɛsɪməl/ or Germanic decade
 # flop, or both.
 export ARITHMETIC_BASIC = Optimize[multiplication @ del_zero @ del_plus];
 export ARITHMETIC_BASIC_RESTRICTED = Optimize[multiplication_restricted @
                                              del_zero @ del_plus];
 export ARITHMETIC_GERMANIC = Optimize[multiplication @ germanic @ del_zero @
                                      del_plus];
 export ARITHMETIC_GERMANIC_RESTRICTED = Optimize[multiplication_restricted @
                                                 germanic @ del_zero @
                                                 del_plus];
 export ARITHMETIC_VIGESIMAL = Optimize[multiplication @ vigesimal @ del_zero @
                                       del_plus];
 export ARITHMETIC_VIGESIMAL_RESTRICTED = Optimize[multiplication_restricted @
                                                  vigesimal @ del_zero @
                                                  del_plus];
 ## LEAVES transducer.
 nonterm = "+" | "*";
 export LEAVES = Optimize[CDRewrite["(" nonterm " " | " " nonterm ")" : "",
                                   "", "", sigmastar]];
 test7 = AssertEqual["(* (+ (* 4 20 *) 10 7 +) 1000 *)" @ LEAVES,
                    "4 20 10 7 1000"];
 ## Optional filter for repeated large powers of ten, to be applied to leaves.
 func Filter[expr, sigstar] {
  return Optimize[sigstar - (sigstar expr sigstar)];
 }
 func FilterMoreThanOne[expr, sigstar] {
  return Filter[expr " " (sigstar " ")? expr, sigstar];
 }
 filter_sigstar = (delta | " ")*;
 export REPEAT_FILTER =
  Optimize[FilterMoreThanOne["1000", filter_sigstar] @
           FilterMoreThanOne["10000", filter_sigstar] @
           FilterMoreThanOne["100000", filter_sigstar] @
           FilterMoreThanOne["1000000", filter_sigstar] @
           FilterMoreThanOne["1000000000", filter_sigstar] @
           FilterMoreThanOne["1000000000000", filter_sigstar]];
 test8_1 = AssertNull["50 1000 4 1000" @ REPEAT_FILTER];
 test8_2 = AssertNull["50 1000000 4 1000000" @ REPEAT_FILTER];
 test8_3 = AssertEqual["50 100 1000" @ REPEAT_FILTER, "50 100 1000"];
 test8_4 = AssertNull["20 1000 1000 20" @ REPEAT_FILTER];
 test8_5 = AssertEqual[
    "70 1000000 400 0 70 0 7 1000 100 0 70" @ REPEAT_FILTER,
    "70 1000000 400 0 70 0 7 1000 100 0 70"];
 test8_6 = AssertNull[
    "70 1000000 400 0 70 1000 0 7 1000 100 0 70" @ REPEAT_FILTER];
 # Filters to force the output of *inverting* the arithmetic as applied to a
 # digit string to be a well-formed sexpr:
 not_space = b.kNotSpace;
 # Things like (+ 1 +)(+ 9 +).
 bad_parens  =
     sigmastar ")" not_space sigmastar
 |   sigmastar not_space "("  sigmastar
 ;
 no_bad_parens = sigmastar - bad_parens;
 # Things like (+ 1 +) or (* 3 *).
 spurious_operators =
    sigmastar "(+ " delta+ " +)" sigmastar
  | sigmastar "(* " delta+ " *)" sigmastar
 ;
 no_spurious_operators = sigmastar - spurious_operators;
 no_strings_of_zeros =
  sigmastar - (sigmastar " " "0"+ " " "0"+ " " sigmastar)
 ;
 no_bad_sequences =
  Optimize[no_bad_parens @ no_strings_of_zeros]
 ;
 export SEXP_FILTER = Optimize[
 (  delta+
  | "(* " no_bad_sequences " *)"
  | "(+ " no_bad_sequences " +)") @ no_spurious_operators]
 ;
 # For convenience, add inverses of the arithmetic rules:
 export IARITHMETIC = Invert[ARITHMETIC];
 export IARITHMETIC_RESTRICTED = Invert[ARITHMETIC_RESTRICTED];
 export IARITHMETIC_BASIC = Invert[ARITHMETIC_BASIC];
 export IARITHMETIC_BASIC_RESTRICTED = Invert[ARITHMETIC_BASIC_RESTRICTED];
 export IARITHMETIC_GERMANIC = Invert[ARITHMETIC_GERMANIC];
 export IARITHMETIC_GERMANIC_RESTRICTED =
  Invert[ARITHMETIC_GERMANIC_RESTRICTED]
 ;
 export IARITHMETIC_VIGESIMAL = Invert[ARITHMETIC_VIGESIMAL];
 export IARITHMETIC_VIGESIMAL_RESTRICTED =
    Invert[ARITHMETIC_VIGESIMAL_RESTRICTED]
 ;
 ## This should be applied on the left-hand side of FG to ensure that only
 ## digit input is permitted.
 export DELTA_STAR = deltastar;
--- a/third_party/chinese_text_normalization/thrax/src/util/byte.grm
+++ b/third_party/chinese_text_normalization/thrax/src/util/byte.grm
@ -0,0 +1,75 @@
 # Copyright 2017 Google Inc.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 # 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 # Standard constants for ASCII (byte) based strings.  This mirrors the
 # functions provided by C/C++'s ctype.h library.
 # Note that [0] is missing; matching the string-termination character is kinda weird.
 export kBytes = Optimize[
  "[1]" |   "[2]" |   "[3]" |   "[4]" |   "[5]" |   "[6]" |   "[7]" |   "[8]" |   "[9]" |  "[10]" |
 "[11]" |  "[12]" |  "[13]" |  "[14]" |  "[15]" |  "[16]" |  "[17]" |  "[18]" |  "[19]" |  "[20]" |
 "[21]" |  "[22]" |  "[23]" |  "[24]" |  "[25]" |  "[26]" |  "[27]" |  "[28]" |  "[29]" |  "[30]" |
 "[31]" |  "[32]" |  "[33]" |  "[34]" |  "[35]" |  "[36]" |  "[37]" |  "[38]" |  "[39]" |  "[40]" |
 "[41]" |  "[42]" |  "[43]" |  "[44]" |  "[45]" |  "[46]" |  "[47]" |  "[48]" |  "[49]" |  "[50]" |
 "[51]" |  "[52]" |  "[53]" |  "[54]" |  "[55]" |  "[56]" |  "[57]" |  "[58]" |  "[59]" |  "[60]" |
 "[61]" |  "[62]" |  "[63]" |  "[64]" |  "[65]" |  "[66]" |  "[67]" |  "[68]" |  "[69]" |  "[70]" |
 "[71]" |  "[72]" |  "[73]" |  "[74]" |  "[75]" |  "[76]" |  "[77]" |  "[78]" |  "[79]" |  "[80]" |
 "[81]" |  "[82]" |  "[83]" |  "[84]" |  "[85]" |  "[86]" |  "[87]" |  "[88]" |  "[89]" |  "[90]" |
 "[91]" |  "[92]" |  "[93]" |  "[94]" |  "[95]" |  "[96]" |  "[97]" |  "[98]" |  "[99]" | "[100]" |
 "[101]" | "[102]" | "[103]" | "[104]" | "[105]" | "[106]" | "[107]" | "[108]" | "[109]" | "[110]" |
 "[111]" | "[112]" | "[113]" | "[114]" | "[115]" | "[116]" | "[117]" | "[118]" | "[119]" | "[120]" |
 "[121]" | "[122]" | "[123]" | "[124]" | "[125]" | "[126]" | "[127]" | "[128]" | "[129]" | "[130]" |
 "[131]" | "[132]" | "[133]" | "[134]" | "[135]" | "[136]" | "[137]" | "[138]" | "[139]" | "[140]" |
 "[141]" | "[142]" | "[143]" | "[144]" | "[145]" | "[146]" | "[147]" | "[148]" | "[149]" | "[150]" |
 "[151]" | "[152]" | "[153]" | "[154]" | "[155]" | "[156]" | "[157]" | "[158]" | "[159]" | "[160]" |
 "[161]" | "[162]" | "[163]" | "[164]" | "[165]" | "[166]" | "[167]" | "[168]" | "[169]" | "[170]" |
 "[171]" | "[172]" | "[173]" | "[174]" | "[175]" | "[176]" | "[177]" | "[178]" | "[179]" | "[180]" |
 "[181]" | "[182]" | "[183]" | "[184]" | "[185]" | "[186]" | "[187]" | "[188]" | "[189]" | "[190]" |
 "[191]" | "[192]" | "[193]" | "[194]" | "[195]" | "[196]" | "[197]" | "[198]" | "[199]" | "[200]" |
 "[201]" | "[202]" | "[203]" | "[204]" | "[205]" | "[206]" | "[207]" | "[208]" | "[209]" | "[210]" |
 "[211]" | "[212]" | "[213]" | "[214]" | "[215]" | "[216]" | "[217]" | "[218]" | "[219]" | "[220]" |
 "[221]" | "[222]" | "[223]" | "[224]" | "[225]" | "[226]" | "[227]" | "[228]" | "[229]" | "[230]" |
 "[231]" | "[232]" | "[233]" | "[234]" | "[235]" | "[236]" | "[237]" | "[238]" | "[239]" | "[240]" |
 "[241]" | "[242]" | "[243]" | "[244]" | "[245]" | "[246]" | "[247]" | "[248]" | "[249]" | "[250]" |
 "[251]" | "[252]" | "[253]" | "[254]" | "[255]"
 ];
 export kDigit = Optimize[
    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
 ];
 export kLower = Optimize[
    "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" |
    "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
 ];
 export kUpper = Optimize[
    "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" |
    "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
 ];
 export kAlpha = Optimize[kLower | kUpper];
 export kAlnum = Optimize[kDigit | kAlpha];
 export kSpace = Optimize[
    " " | "\t" | "\n" | "\r"
 ];
 export kNotSpace = Optimize[kBytes - kSpace];
 export kPunct = Optimize[
    "!" | "\"" | "#" | "$" | "%" | "&" | "'" | "(" | ")" | "*" | "+" | "," |
    "-" | "." | "/" | ":" | ";" | "<" | "=" | ">" | "?" | "@" | "\[" | "\\" |
    "\]" | "^" | "_" | "`" | "{" | "|" | "}" | "~"
 ];
 export kGraph = Optimize[kAlnum | kPunct];
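 # Illustrative checks, not part of the upstream grammar: a minimal sketch of
 # how the classes above behave like the ctype.h predicates they mirror. The
 # test names are hypothetical.
 example_alnum = AssertEqual["a" @ kAlnum, "a"];
 example_space_is_not_graph = AssertNull[" " @ kGraph];
 example_space_is_excluded = AssertNull[" " @ kNotSpace];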