PaddleSpeech/third_party/python-pinyin/pinyin-data/GBK_PUA.txt

# GBK/GB 18030 PUA mapping
# See https://zh.wikipedia.org/wiki/GB_18030#PUA for details
# U+E815: #  Unihan: U+2E81 ⺁
U+E816: zuǒ #  Unihan: U+20087 𠂇
# U+E817: #  Unihan: U+20089 𠂉
U+E818: gǔn #  Unihan: U+200CC 𠃌
# U+E819: #  Unihan: U+2E84 ⺄
U+E81A: zhòu,zhū #  Unihan: U+3473 㑳
U+E81B: zhòu #  Unihan: U+3447 㑇
# U+E81C: #  Unihan: U+2E88 ⺈
# U+E81D: #  Unihan: U+2E8B ⺋
# U+E81E: #  Unihan: U+9FB4 龴
U+E81F: wāi #  Unihan: U+359E 㖞
U+E820: hǎn #  Unihan: U+361A 㘚
U+E821: hǎn #  Unihan: U+360E 㘎
# U+E822: #  Unihan: U+2E8C ⺌
# U+E823: #  Unihan: U+2E97 ⺗
U+E824: zhòu,chǎo #  Unihan: U+396E 㥮
U+E825: zhòu #  Unihan: U+3918 㤘
# U+E826: #  Unihan: U+9FB5 龵
U+E827: gāng #  Unihan: U+39CF 㧏
U+E828: kuǎi #  Unihan: U+39DF 㧟
U+E829: sǒng #  Unihan: U+3A73 㩳
U+E82A: sǒng #  Unihan: U+39D0 㧐
# U+E82B: #  Unihan: U+9FB6 龶
# U+E82C: #  Unihan: U+9FB7 龷
U+E82D: gāng #  Unihan: U+3B4E 㭎
U+E82E: kuài #  Unihan: U+3C6E 㱮
U+E82F: tà #  Unihan: U+3CE0 㳠
# U+E830: #  Unihan: U+2EA7 ⺧
U+E831: pěng #  Unihan: U+215D7 𡗗
# U+E832: #  Unihan: U+9FB8 龸
# U+E833: #  Unihan: U+2EAA ⺪
U+E834: lōu #  Unihan: U+4056 䁖
U+E835: cǎn #  Unihan: U+415F 䅟
# U+E836: #  Unihan: U+2EAE ⺮
U+E837: chōu,chóu #  Unihan: U+4337 䌷
# U+E838: #  Unihan: U+2EB3 ⺳
# U+E839: #  Unihan: U+2EB6 ⺶
# U+E83A: #  Unihan: U+2EB7 ⺷
U+E83B: zāi #  Unihan: U+2298F 𢦏
U+E83C: bà,bēi #  Unihan: U+43B1 䎱
U+E83D: bà #  Unihan: U+43AC 䎬
# U+E83E: #  Unihan: U+2EBB ⺻
U+E83F: zhuān #  Unihan: U+43DD 䏝
U+E840: qióng #  Unihan: U+44D6 䓖
U+E841: kuì,huì #  Unihan: U+4661 䙡
U+E842: kuì #  Unihan: U+464C 䙌
# U+E843: #  Unihan: U+9FB9 龹
U+E844: xīn #  Unihan: U+4723 䜣
U+E845: yàn #  Unihan: U+4729 䜩
U+E846: jìng,qíng #  Unihan: U+477C 䝼
U+E847: qíng #  Unihan: U+478D 䞍
# U+E848: #  Unihan: U+2ECA ⻊
U+E849: shàn #  Unihan: U+4947 䥇
U+E84A: yé #  Unihan: U+497A 䥺
U+E84B: pō #  Unihan: U+497D 䥽
U+E84C: shàn #  Unihan: U+4982 䦂
U+E84D: zhuō #  Unihan: U+4983 䦃
U+E84E: shàn #  Unihan: U+4985 䦅
U+E84F: jué #  Unihan: U+4986 䦆
U+E850: wěn,chuài #  Unihan: U+499F 䦟
U+E851: zhèng #  Unihan: U+499B 䦛
U+E852: chuài #  Unihan: U+49B7 䦷
U+E853: zhèng #  Unihan: U+49B6 䦶
# U+E854: #  Unihan: U+9FBA 龺
U+E855: yíng #  Unihan: U+241FE 𤇾
U+E856: yú #  Unihan: U+4CA3 䲣
U+E857: yìn #  Unihan: U+4C9F 䲟
U+E858: chūn #  Unihan: U+4CA0 䲠
U+E859: qiū #  Unihan: U+4CA1 䲡
U+E85A: yú #  Unihan: U+4C77 䱷
U+E85B: téng #  Unihan: U+4CA2 䲢
U+E85C: shī #  Unihan: U+4D13 䴓
U+E85D: jiāo #  Unihan: U+4D14 䴔
U+E85E: liè #  Unihan: U+4D15 䴕
U+E85F: jīng #  Unihan: U+4D16 䴖
U+E860: jú #  Unihan: U+4D17 䴗
U+E861: tī #  Unihan: U+4D18 䴘
U+E862: pì #  Unihan: U+4D19 䴙
U+E863: yǎn #  Unihan: U+4DAE 䶮
# U+E864: #  Unihan: U+9FBB 龻