PaddleSpeech

History

Hui Zhang 538bf271eb chinese char/word ngram lm (#613 ) * add ngram lm egs * add zhon repo * install kenlm, zhon * format * add chinese_text_normalization repo * add ngram lm egs		3 years ago
..
docs	chinese char/word ngram lm (#613 )	3 years ago
tests	chinese char/word ngram lm (#613 )	3 years ago
zhon	chinese char/word ngram lm (#613 )	3 years ago
.gitignore	chinese char/word ngram lm (#613 )	3 years ago
.travis.yml	chinese char/word ngram lm (#613 )	3 years ago
AUTHORS.rst	chinese char/word ngram lm (#613 )	3 years ago
CHANGES.rst	chinese char/word ngram lm (#613 )	3 years ago
CONTRIBUTING.rst	chinese char/word ngram lm (#613 )	3 years ago
LICENSE.txt	chinese char/word ngram lm (#613 )	3 years ago
MANIFEST.in	chinese char/word ngram lm (#613 )	3 years ago
Makefile	chinese char/word ngram lm (#613 )	3 years ago
README.rst	chinese char/word ngram lm (#613 )	3 years ago
requirements.txt	chinese char/word ngram lm (#613 )	3 years ago
setup.cfg	chinese char/word ngram lm (#613 )	3 years ago
setup.py	chinese char/word ngram lm (#613 )	3 years ago
tox.ini	chinese char/word ngram lm (#613 )	3 years ago

README.rst

====

Zhon

====



.. image:: https://badge.fury.io/py/zhon.png

    :target: http://badge.fury.io/py/zhon



.. image:: https://travis-ci.org/tsroten/zhon.png?branch=develop

        :target: https://travis-ci.org/tsroten/zhon



Zhon is a Python library that provides constants commonly used in Chinese text

processing.



* Documentation: http://zhon.rtfd.org

* GitHub: https://github.com/tsroten/zhon

* Support: https://github.com/tsroten/zhon/issues

* Free software: `MIT license <http://opensource.org/licenses/MIT>`_



About

-----



Zhon's constants can be used in Chinese text processing, for example:



* Find CJK characters in a string:



  .. code:: python



    >>> re.findall('[{}]'.format(zhon.hanzi.characters), 'I broke a plate: 我打破了一个盘子.')

    ['我', '打', '破', '了', '一', '个', '盘', '子']



* Validate Pinyin syllables, words, or sentences:



  .. code:: python



    >>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)

    ['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']



    >>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)

    ['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']



    >>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)

    ['Yuànzi lǐ tíngzhe yí liàng chē.']



Features

--------



+ Includes commonly-used constants:

    - CJK characters and radicals

    - Chinese punctuation marks

    - Chinese sentence regular expression pattern

    - Pinyin vowels, consonants, lowercase, uppercase, and punctuation

    - Pinyin syllable, word, and sentence regular expression patterns

    - Zhuyin characters and marks

    - Zhuyin syllable regular expression pattern

    - CC-CEDICT characters

+ Runs on Python 2.7 and 3



Getting Started

---------------



* `Install Zhon <http://zhon.readthedocs.org/en/latest/#installation>`_

* Read `Zhon's introduction <http://zhon.readthedocs.org/en/latest/#using-zhon>`_

* Learn from the `API documentation <http://zhon.readthedocs.org/en/latest/#zhon-hanzi>`_

* `Contribute <https://github.com/tsroten/zhon/blob/develop/CONTRIBUTING.rst>`_ documentation, code, or feedback