parent
bedbfac5a2
commit
bf676c0a94
@ -0,0 +1,21 @@
sudo: false
language: python
install: pip install tox
script: tox

matrix:
  include:
    - python: 2.7
      env: TOXENV=py27
    - python: 3.4
      env: TOXENV=py34
    - python: 3.5
      env: TOXENV=py35
    - python: 3.6
      env: TOXENV=py36
    - python: 3.6
      env: TOXENV=pep8
    - python: 3.6
      env: TOXENV=docs
    - python: 3.6
      env: TOXENV=packaging
@ -0,0 +1,14 @@
|
||||
=======
|
||||
Credits
|
||||
=======
|
||||
|
||||
Author and Maintainer
|
||||
---------------------
|
||||
|
||||
* Thomas Roten <https://github.com/tsroten>
|
||||
|
||||
Contributors
|
||||
------------
|
||||
|
||||
None yet. Why not be the first?
|
||||
|
@ -0,0 +1,88 @@
|
||||
Changes
|
||||
=======
|
||||
|
||||
v0.1.0 (2013-05-05)
|
||||
-------------------
|
||||
|
||||
* Initial release
|
||||
|
||||
v0.1.1 (2013-05-05)
|
||||
-------------------
|
||||
|
||||
* Adds zhon.cedict package to setup.py
|
||||
|
||||
v0.2.0 (2013-05-07)
|
||||
-------------------
|
||||
|
||||
* Allows for mapping between simplified and traditional.
|
||||
* Adds logging to build_string().
|
||||
* Adds constants for numbered Pinyin and accented Pinyin.
|
||||
|
||||
v0.2.1 (2013-05-07)
|
||||
-------------------
|
||||
|
||||
* Fixes typo in README.rst.
|
||||
|
||||
v1.0.0 (2014-01-25)
|
||||
--------------------
|
||||
|
||||
* Complete rewrite that refactors code, renames constants, and improves Pinyin
|
||||
support.
|
||||
|
||||
v1.1.0 (2014-01-28)
|
||||
--------------------
|
||||
|
||||
* Adds ``zhon.pinyin.punctuation`` constant.
|
||||
* Adds ``zhon.pinyin.accented_syllable``, ``zhon.pinyin.accented_word``, and
|
||||
``zhon.pinyin.accented_sentence`` constants.
|
||||
* Adds ``zhon.pinyin.numbered_syllable``, ``zhon.pinyin.numbered_word``, and
|
||||
``zhon.pinyin.numbered_sentence`` constants.
|
||||
* Fixes some README.rst typos.
|
||||
* Clarifies information regarding Traditional and Simplified character
|
||||
constants in README.rst.
|
||||
* Adds constant short names to README.rst.
|
||||
|
||||
v1.1.1 (2014-01-29)
|
||||
--------------------
|
||||
|
||||
* Adds documentation.
|
||||
* Adds ``zhon.cedict.all`` constant.
|
||||
* Removes duplicate code ranges from ``zhon.hanzi.characters``.
|
||||
* Makes ``zhon.hanzi.non_stops`` a string containing all non-stops instead of
|
||||
a string containing code ranges.
|
||||
* Removes duplicate letters in ``zhon.pinyin.consonants``.
|
||||
* Refactors Pinyin vowels/consonant code.
|
||||
* Removes the Latin alpha from ``zhon.pinyin.vowels``. Fixes #16.
|
||||
* Adds ``cjk_ideographs`` alias for ``zhon.hanzi.characters``.
|
||||
* Fixes various typos.
|
||||
* Removes numbers from Pinyin word constants. Fixes #15.
|
||||
* Adds lowercase and uppercase constants to ``zhon.pinyin``.
|
||||
* Fixes a bug with ``zhon.pinyin.sentence``.
|
||||
* Adds ``sent`` alias for ``zhon.pinyin.sentence``.
|
||||
|
||||
v1.1.2 (2014-01-31)
|
||||
--------------------
|
||||
|
||||
* Fixes bug with ``zhon.cedict.all``.
|
||||
|
||||
v1.1.3 (2014-02-12)
|
||||
--------------------
|
||||
|
||||
* Adds Ideographic number zero to ``zhon.hanzi.characters``. Fixes #17.
|
||||
* Fixes r-suffix bug. Fixes #18.
|
||||
|
||||
v1.1.4 (2015-01-25)
|
||||
--------------------
|
||||
|
||||
* Removes duplicate module declarations in documentation.
|
||||
* Moves tests inside zhon package.
|
||||
* Adds travis config file.
|
||||
* Adds Python 3.4 tests to travis and tox.
|
||||
* Fixes flake8 warnings.
|
||||
* Adds distutils fallback import statement to setup.py.
|
||||
* Adds missing hanzi punctuation. Fixes #19.
|
||||
|
||||
v1.1.5 (2016-05-23)
|
||||
--------------------
|
||||
|
||||
* Adds missing Zhuyin characters. Fixes #23.
|
@ -0,0 +1,107 @@
|
||||
============
|
||||
Contributing
|
||||
============
|
||||
|
||||
Contributions are welcome, and they are greatly appreciated! Every
|
||||
little bit helps, and credit will always be given.
|
||||
|
||||
You can contribute in many ways:
|
||||
|
||||
Types of Contributions
|
||||
----------------------
|
||||
|
||||
Report Bugs
|
||||
~~~~~~~~~~~
|
||||
|
||||
Report bugs at https://github.com/tsroten/zhon/issues.
|
||||
|
||||
If you are reporting a bug, please include:
|
||||
|
||||
* Your operating system name and version.
|
||||
* Any details about your local setup that might be helpful in troubleshooting.
|
||||
* Detailed steps to reproduce the bug.
|
||||
|
||||
Fix Bugs
|
||||
~~~~~~~~
|
||||
|
||||
Look through the GitHub issues for bugs. Anything tagged with "bug"
|
||||
is open to whoever wants to implement it.
|
||||
|
||||
Implement Features
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Look through the GitHub issues for features. Anything tagged with "feature"
|
||||
is open to whoever wants to implement it.
|
||||
|
||||
Write Documentation
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Zhon could always use more documentation, whether as part of the
|
||||
official Zhon docs, in docstrings, or even on the web in blog posts,
|
||||
articles, and such.
|
||||
|
||||
Submit Feedback
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The best way to send feedback is to file an issue at https://github.com/tsroten/zhon/issues.
|
||||
|
||||
If you are proposing a feature:
|
||||
|
||||
* Explain in detail how it would work.
|
||||
* Keep the scope as narrow as possible, to make it easier to implement.
|
||||
* Remember that this is a volunteer-driven project, and that contributions
|
||||
are welcome :)
|
||||
|
||||
Get Started!
|
||||
------------
|
||||
|
||||
Ready to contribute? Here's how to set up `zhon` for local development.
|
||||
|
||||
1. Fork the `zhon` repo on GitHub.
|
||||
2. Clone your fork locally::
|
||||
|
||||
$ git clone git@github.com:your_name_here/zhon.git
|
||||
|
||||
3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::
|
||||
|
||||
$ mkvirtualenv zhon
|
||||
$ cd zhon/
|
||||
$ python setup.py develop
|
||||
|
||||
4. Create a branch for local development::
|
||||
|
||||
$ git checkout -b name-of-your-bugfix-or-feature
|
||||
|
||||
Now you can make your changes locally.
|
||||
|
||||
5. When you're done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox::
|
||||
|
||||
$ flake8 zhon
|
||||
$ python setup.py test
|
||||
$ tox
|
||||
|
||||
To get flake8 and tox, just pip install them into your virtualenv.
|
||||
|
||||
You can ignore the flake8 errors regarding `zhon.cedict` files. Rather than include hundreds of newline characters in each file, we are ignoring those errors.
|
||||
|
||||
6. Commit your changes and push your branch to GitHub::
|
||||
|
||||
$ git add .
|
||||
$ git commit -m "Your detailed description of your changes."
|
||||
$ git push origin name-of-your-bugfix-or-feature
|
||||
|
||||
7. Submit a pull request through the GitHub website.
|
||||
|
||||
Pull Request Guidelines
|
||||
-----------------------
|
||||
|
||||
Before you submit a pull request, check that it meets these guidelines:
|
||||
|
||||
1. The pull request should include tests.
|
||||
2. If the pull request adds functionality, the docs should be updated. Put
|
||||
your new functionality into a function with a docstring, and add the
|
||||
feature to the list in README.rst.
|
||||
3. The pull request should work for Python 2.7, 3.4, 3.5, and 3.6. Check
|
||||
https://travis-ci.org/tsroten/zhon/pull_requests
|
||||
and make sure that the tests pass for all supported Python versions.
|
||||
4. If you want to receive credit, add your name to `AUTHORS.rst`.
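
As a rough illustration of guideline 1, a contributed test could follow the shape below. This is only a sketch: ``STOPS`` and ``StopsTest`` are hypothetical names invented for this example, and a real test would import the constant under test from ``zhon`` rather than define it locally.

```python
import re
import unittest

# Hypothetical constant, defined here only so the sketch is self-contained.
# A real test would import the constant it exercises from the zhon package.
STOPS = '。！？'


class StopsTest(unittest.TestCase):
    """Checks that a character-list constant works inside a regex class."""

    def test_finds_sentence_stops(self):
        # Wrap the listed characters in [] to build a character class.
        found = re.findall('[{}]'.format(STOPS), '你好！你吃了吗？')
        self.assertEqual(found, ['！', '？'])
```

Saved inside the test directory, a file like this would be picked up by ``python -m unittest`` along with the existing tests.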
|
@ -0,0 +1,7 @@
|
||||
Copyright (c) 2013-2014 Thomas Roten
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
@ -0,0 +1,6 @@
|
||||
include *.txt *.rst
|
||||
include Makefile
|
||||
include tox.ini
|
||||
recursive-include docs *
|
||||
recursive-include tests *.py
|
||||
prune docs/_build
|
@ -0,0 +1,42 @@
PROJECT = zhon

.PHONY: docs clean lint test test-all coverage dist release

help:
	@echo "clean - remove all build artifacts"
	@echo "lint - check style with flake8"
	@echo "test - run tests quickly with the current Python"
	@echo "test-all - run tests in all environments"
	@echo "coverage - check code coverage"
	@echo "docs - generate Sphinx HTML documentation"
	@echo "dist - make the source and binary distributions"
	@echo "release - package and upload a release"

clean:
	rm -rf build dist egg *.egg-info htmlcov
	find . -name '*.py[co]' -exec rm -f {} +
	$(MAKE) -C docs clean

lint:
	flake8 $(PROJECT) tests setup.py

test:
	python setup.py test

test-all:
	tox

coverage:
	coverage run --source $(PROJECT) setup.py test
	coverage report --fail-under=100

docs:
	$(MAKE) -C docs clean
	$(MAKE) -C docs html
	open docs/_build/html/index.html

dist: clean
	python setup.py sdist bdist_wheel

release: clean dist
	twine upload -s dist/*
@ -0,0 +1,64 @@
|
||||
====
|
||||
Zhon
|
||||
====
|
||||
|
||||
.. image:: https://badge.fury.io/py/zhon.png
|
||||
:target: http://badge.fury.io/py/zhon
|
||||
|
||||
.. image:: https://travis-ci.org/tsroten/zhon.png?branch=develop
|
||||
:target: https://travis-ci.org/tsroten/zhon
|
||||
|
||||
Zhon is a Python library that provides constants commonly used in Chinese text
|
||||
processing.
|
||||
|
||||
* Documentation: http://zhon.rtfd.org
|
||||
* GitHub: https://github.com/tsroten/zhon
|
||||
* Support: https://github.com/tsroten/zhon/issues
|
||||
* Free software: `MIT license <http://opensource.org/licenses/MIT>`_
|
||||
|
||||
About
|
||||
-----
|
||||
|
||||
Zhon's constants can be used in Chinese text processing, for example:
|
||||
|
||||
* Find CJK characters in a string:
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> import re
>>> import zhon.hanzi
>>> re.findall('[{}]'.format(zhon.hanzi.characters), 'I broke a plate: 我打破了一个盘子.')
|
||||
['我', '打', '破', '了', '一', '个', '盘', '子']
|
||||
|
||||
* Validate Pinyin syllables, words, or sentences:
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> import zhon.pinyin
>>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
|
||||
['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']
|
||||
|
||||
>>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
|
||||
['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']
|
||||
|
||||
>>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
|
||||
['Yuànzi lǐ tíngzhe yí liàng chē.']
|
||||
|
||||
Features
|
||||
--------
|
||||
|
||||
+ Includes commonly-used constants:
|
||||
- CJK characters and radicals
|
||||
- Chinese punctuation marks
|
||||
- Chinese sentence regular expression pattern
|
||||
- Pinyin vowels, consonants, lowercase, uppercase, and punctuation
|
||||
- Pinyin syllable, word, and sentence regular expression patterns
|
||||
- Zhuyin characters and marks
|
||||
- Zhuyin syllable regular expression pattern
|
||||
- CC-CEDICT characters
|
||||
+ Runs on Python 2.7 and 3
|
||||
|
||||
Getting Started
|
||||
---------------
|
||||
|
||||
* `Install Zhon <http://zhon.readthedocs.org/en/latest/#installation>`_
|
||||
* Read `Zhon's introduction <http://zhon.readthedocs.org/en/latest/#using-zhon>`_
|
||||
* Learn from the `API documentation <http://zhon.readthedocs.org/en/latest/#zhon-hanzi>`_
|
||||
* `Contribute <https://github.com/tsroten/zhon/blob/develop/CONTRIBUTING.rst>`_ documentation, code, or feedback
|
@ -0,0 +1,177 @@
|
||||
# Makefile for Sphinx documentation
|
||||
#
|
||||
|
||||
# You can set these variables from the command line.
|
||||
SPHINXOPTS =
|
||||
SPHINXBUILD = sphinx-build
|
||||
PAPER =
|
||||
BUILDDIR = _build
|
||||
|
||||
# User-friendly check for sphinx-build
|
||||
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
|
||||
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
|
||||
endif
|
||||
|
||||
# Internal variables.
|
||||
PAPEROPT_a4 = -D latex_paper_size=a4
|
||||
PAPEROPT_letter = -D latex_paper_size=letter
|
||||
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
|
||||
# the i18n builder cannot share the environment and doctrees with the others
|
||||
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
|
||||
|
||||
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
|
||||
|
||||
help:
|
||||
@echo "Please use \`make <target>' where <target> is one of"
|
||||
@echo " html to make standalone HTML files"
|
||||
@echo " dirhtml to make HTML files named index.html in directories"
|
||||
@echo " singlehtml to make a single large HTML file"
|
||||
@echo " pickle to make pickle files"
|
||||
@echo " json to make JSON files"
|
||||
@echo " htmlhelp to make HTML files and a HTML help project"
|
||||
@echo " qthelp to make HTML files and a qthelp project"
|
||||
@echo " devhelp to make HTML files and a Devhelp project"
|
||||
@echo " epub to make an epub"
|
||||
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
|
||||
@echo " latexpdf to make LaTeX files and run them through pdflatex"
|
||||
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
|
||||
@echo " text to make text files"
|
||||
@echo " man to make manual pages"
|
||||
@echo " texinfo to make Texinfo files"
|
||||
@echo " info to make Texinfo files and run them through makeinfo"
|
||||
@echo " gettext to make PO message catalogs"
|
||||
@echo " changes to make an overview of all changed/added/deprecated items"
|
||||
@echo " xml to make Docutils-native XML files"
|
||||
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
|
||||
@echo " linkcheck to check all external links for integrity"
|
||||
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
|
||||
|
||||
clean:
|
||||
rm -rf $(BUILDDIR)/*
|
||||
|
||||
html:
|
||||
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
|
||||
@echo
|
||||
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
|
||||
|
||||
dirhtml:
|
||||
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
|
||||
@echo
|
||||
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
|
||||
|
||||
singlehtml:
|
||||
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
|
||||
@echo
|
||||
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
|
||||
|
||||
pickle:
|
||||
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
|
||||
@echo
|
||||
@echo "Build finished; now you can process the pickle files."
|
||||
|
||||
json:
|
||||
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
|
||||
@echo
|
||||
@echo "Build finished; now you can process the JSON files."
|
||||
|
||||
htmlhelp:
|
||||
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
|
||||
@echo
|
||||
@echo "Build finished; now you can run HTML Help Workshop with the" \
|
||||
".hhp project file in $(BUILDDIR)/htmlhelp."
|
||||
|
||||
qthelp:
|
||||
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
|
||||
@echo
|
||||
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
|
||||
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
|
||||
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Zhon.qhcp"
|
||||
@echo "To view the help file:"
|
||||
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Zhon.qhc"
|
||||
|
||||
devhelp:
|
||||
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
|
||||
@echo
|
||||
@echo "Build finished."
|
||||
@echo "To view the help file:"
|
||||
@echo "# mkdir -p $$HOME/.local/share/devhelp/Zhon"
|
||||
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Zhon"
|
||||
@echo "# devhelp"
|
||||
|
||||
epub:
|
||||
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
|
||||
@echo
|
||||
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
|
||||
|
||||
latex:
|
||||
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
|
||||
@echo
|
||||
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
|
||||
@echo "Run \`make' in that directory to run these through (pdf)latex" \
|
||||
"(use \`make latexpdf' here to do that automatically)."
|
||||
|
||||
latexpdf:
|
||||
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
|
||||
@echo "Running LaTeX files through pdflatex..."
|
||||
$(MAKE) -C $(BUILDDIR)/latex all-pdf
|
||||
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
|
||||
|
||||
latexpdfja:
|
||||
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
|
||||
@echo "Running LaTeX files through platex and dvipdfmx..."
|
||||
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
|
||||
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
|
||||
|
||||
text:
|
||||
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
|
||||
@echo
|
||||
@echo "Build finished. The text files are in $(BUILDDIR)/text."
|
||||
|
||||
man:
|
||||
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
|
||||
@echo
|
||||
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
|
||||
|
||||
texinfo:
|
||||
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
|
||||
@echo
|
||||
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
|
||||
@echo "Run \`make' in that directory to run these through makeinfo" \
|
||||
"(use \`make info' here to do that automatically)."
|
||||
|
||||
info:
|
||||
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
|
||||
@echo "Running Texinfo files through makeinfo..."
|
||||
$(MAKE) -C $(BUILDDIR)/texinfo info
|
||||
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
|
||||
|
||||
gettext:
|
||||
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
|
||||
@echo
|
||||
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
|
||||
|
||||
changes:
|
||||
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
|
||||
@echo
|
||||
@echo "The overview file is in $(BUILDDIR)/changes."
|
||||
|
||||
linkcheck:
|
||||
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
|
||||
@echo
|
||||
@echo "Link check complete; look for any errors in the above output " \
|
||||
"or in $(BUILDDIR)/linkcheck/output.txt."
|
||||
|
||||
doctest:
|
||||
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
|
||||
@echo "Testing of doctests in the sources finished, look at the " \
|
||||
"results in $(BUILDDIR)/doctest/output.txt."
|
||||
|
||||
xml:
|
||||
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
|
||||
@echo
|
||||
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
|
||||
|
||||
pseudoxml:
|
||||
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
|
||||
@echo
|
||||
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
|
@ -0,0 +1,264 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Zhon documentation build configuration file, created by
|
||||
# sphinx-quickstart on Tue Jan 28 22:18:02 2014.
|
||||
#
|
||||
# This file is execfile()d with the current directory set to its
|
||||
# containing dir.
|
||||
#
|
||||
# Note that not all possible configuration values are present in this
|
||||
# autogenerated file.
|
||||
#
|
||||
# All configuration values have a default; values that are commented out
|
||||
# serve to show the default.
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
#sys.path.insert(0, os.path.abspath('.'))
|
||||
|
||||
# -- General configuration ------------------------------------------------
|
||||
|
||||
# If your documentation needs a minimal Sphinx version, state it here.
|
||||
#needs_sphinx = '1.0'
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be
|
||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||
# ones.
|
||||
extensions = [
|
||||
'sphinx.ext.autodoc',
|
||||
'sphinx.ext.intersphinx',
|
||||
'sphinx.ext.viewcode',
|
||||
]
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
|
||||
# The suffix of source filenames.
|
||||
source_suffix = '.rst'
|
||||
|
||||
# The encoding of source files.
|
||||
#source_encoding = 'utf-8-sig'
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = 'index'
|
||||
|
||||
# General information about the project.
|
||||
project = 'Zhon'
|
||||
copyright = '2016, Thomas Roten'
|
||||
|
||||
# The version info for the project you're documenting, acts as replacement for
|
||||
# |version| and |release|, also used in various other places throughout the
|
||||
# built documents.
|
||||
#
|
||||
# The short X.Y version.
|
||||
version = '1.1'
|
||||
# The full version, including alpha/beta/rc tags.
|
||||
release = '1.1.5'
|
||||
|
||||
# The language for content autogenerated by Sphinx. Refer to documentation
|
||||
# for a list of supported languages.
|
||||
#language = None
|
||||
|
||||
# There are two options for replacing |today|: either, you set today to some
|
||||
# non-false value, then it is used:
|
||||
#today = ''
|
||||
# Else, today_fmt is used as the format for a strftime call.
|
||||
#today_fmt = '%B %d, %Y'
|
||||
|
||||
# List of patterns, relative to source directory, that match files and
|
||||
# directories to ignore when looking for source files.
|
||||
exclude_patterns = ['_build']
|
||||
|
||||
# The reST default role (used for this markup: `text`) to use for all
|
||||
# documents.
|
||||
#default_role = None
|
||||
|
||||
# If true, '()' will be appended to :func: etc. cross-reference text.
|
||||
#add_function_parentheses = True
|
||||
|
||||
# If true, the current module name will be prepended to all description
|
||||
# unit titles (such as .. function::).
|
||||
#add_module_names = True
|
||||
|
||||
# If true, sectionauthor and moduleauthor directives will be shown in the
|
||||
# output. They are ignored by default.
|
||||
#show_authors = False
|
||||
|
||||
# The name of the Pygments (syntax highlighting) style to use.
|
||||
pygments_style = 'sphinx'
|
||||
|
||||
# A list of ignored prefixes for module index sorting.
|
||||
#modindex_common_prefix = []
|
||||
|
||||
# If true, keep warnings as "system message" paragraphs in the built documents.
|
||||
#keep_warnings = False
|
||||
|
||||
|
||||
# -- Options for HTML output ----------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
html_theme = 'default'
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
# documentation.
|
||||
#html_theme_options = {}
|
||||
|
||||
# Add any paths that contain custom themes here, relative to this directory.
|
||||
#html_theme_path = []
|
||||
|
||||
# The name for this set of Sphinx documents. If None, it defaults to
|
||||
# "<project> v<release> documentation".
|
||||
#html_title = None
|
||||
|
||||
# A shorter title for the navigation bar. Default is the same as html_title.
|
||||
#html_short_title = None
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top
|
||||
# of the sidebar.
|
||||
#html_logo = None
|
||||
|
||||
# The name of an image file (within the static path) to use as favicon of the
|
||||
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
|
||||
# pixels large.
|
||||
#html_favicon = None
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
html_static_path = ['_static']
|
||||
|
||||
# Add any extra paths that contain custom files (such as robots.txt or
|
||||
# .htaccess) here, relative to this directory. These files are copied
|
||||
# directly to the root of the documentation.
|
||||
#html_extra_path = []
|
||||
|
||||
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
|
||||
# using the given strftime format.
|
||||
#html_last_updated_fmt = '%b %d, %Y'
|
||||
|
||||
# If true, SmartyPants will be used to convert quotes and dashes to
|
||||
# typographically correct entities.
|
||||
#html_use_smartypants = True
|
||||
|
||||
# Custom sidebar templates, maps document names to template names.
|
||||
#html_sidebars = {}
|
||||
|
||||
# Additional templates that should be rendered to pages, maps page names to
|
||||
# template names.
|
||||
#html_additional_pages = {}
|
||||
|
||||
# If false, no module index is generated.
|
||||
#html_domain_indices = True
|
||||
|
||||
# If false, no index is generated.
|
||||
#html_use_index = True
|
||||
|
||||
# If true, the index is split into individual pages for each letter.
|
||||
#html_split_index = False
|
||||
|
||||
# If true, links to the reST sources are added to the pages.
|
||||
#html_show_sourcelink = True
|
||||
|
||||
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
|
||||
#html_show_sphinx = True
|
||||
|
||||
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
|
||||
#html_show_copyright = True
|
||||
|
||||
# If true, an OpenSearch description file will be output, and all pages will
|
||||
# contain a <link> tag referring to it. The value of this option must be the
|
||||
# base URL from which the finished HTML is served.
|
||||
#html_use_opensearch = ''
|
||||
|
||||
# This is the file name suffix for HTML files (e.g. ".xhtml").
|
||||
#html_file_suffix = None
|
||||
|
||||
# Output file base name for HTML help builder.
|
||||
htmlhelp_basename = 'Zhondoc'
|
||||
|
||||
|
||||
# -- Options for LaTeX output ---------------------------------------------
|
||||
|
||||
latex_elements = {
|
||||
# The paper size ('letterpaper' or 'a4paper').
|
||||
#'papersize': 'letterpaper',
|
||||
|
||||
# The font size ('10pt', '11pt' or '12pt').
|
||||
#'pointsize': '10pt',
|
||||
|
||||
# Additional stuff for the LaTeX preamble.
|
||||
#'preamble': '',
|
||||
}
|
||||
|
||||
# Grouping the document tree into LaTeX files. List of tuples
|
||||
# (source start file, target name, title,
|
||||
# author, documentclass [howto, manual, or own class]).
|
||||
latex_documents = [
|
||||
('index', 'Zhon.tex', 'Zhon Documentation',
|
||||
'Thomas Roten', 'manual'),
|
||||
]
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top of
|
||||
# the title page.
|
||||
#latex_logo = None
|
||||
|
||||
# For "manual" documents, if this is true, then toplevel headings are parts,
|
||||
# not chapters.
|
||||
#latex_use_parts = False
|
||||
|
||||
# If true, show page references after internal links.
|
||||
#latex_show_pagerefs = False
|
||||
|
||||
# If true, show URL addresses after external links.
|
||||
#latex_show_urls = False
|
||||
|
||||
# Documents to append as an appendix to all manuals.
|
||||
#latex_appendices = []
|
||||
|
||||
# If false, no module index is generated.
|
||||
#latex_domain_indices = True
|
||||
|
||||
|
||||
# -- Options for manual page output ---------------------------------------
|
||||
|
||||
# One entry per manual page. List of tuples
|
||||
# (source start file, name, description, authors, manual section).
|
||||
man_pages = [
|
||||
('index', 'zhon', 'Zhon Documentation',
|
||||
['Thomas Roten'], 1)
|
||||
]
|
||||
|
||||
# If true, show URL addresses after external links.
|
||||
#man_show_urls = False
|
||||
|
||||
|
||||
# -- Options for Texinfo output -------------------------------------------
|
||||
|
||||
# Grouping the document tree into Texinfo files. List of tuples
|
||||
# (source start file, target name, title, author,
|
||||
# dir menu entry, description, category)
|
||||
texinfo_documents = [
|
||||
('index', 'Zhon', 'Zhon Documentation',
|
||||
'Thomas Roten', 'Zhon', 'One line description of project.',
|
||||
'Miscellaneous'),
|
||||
]
|
||||
|
||||
# Documents to append as an appendix to all manuals.
|
||||
#texinfo_appendices = []
|
||||
|
||||
# If false, no module index is generated.
|
||||
#texinfo_domain_indices = True
|
||||
|
||||
# How to display URL addresses: 'footnote', 'no', or 'inline'.
|
||||
#texinfo_show_urls = 'footnote'
|
||||
|
||||
# If true, do not generate a @detailmenu in the "Top" node's menu.
|
||||
#texinfo_no_detailmenu = False
|
||||
|
||||
|
||||
# Example configuration for intersphinx: refer to the Python standard library.
|
||||
intersphinx_mapping = {'http://docs.python.org/3': None}
|
@ -0,0 +1,413 @@
|
||||
.. Zhon documentation master file, created by
|
||||
sphinx-quickstart on Tue Jan 28 22:18:02 2014.
|
||||
You can adapt this file completely to your liking, but it should at least
|
||||
contain the root `toctree` directive.
|
||||
|
||||
Zhon
|
||||
====
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Zhon is a Python library that provides constants commonly used in Chinese text
|
||||
processing:
|
||||
|
||||
* CJK characters and radicals
|
||||
* Chinese punctuation marks
|
||||
* Chinese sentence regular expression pattern
|
||||
* Pinyin vowels, consonants, lowercase, uppercase, and punctuation
|
||||
* Pinyin syllable, word, and sentence regular expression patterns
|
||||
* Zhuyin characters and marks
|
||||
* Zhuyin syllable regular expression pattern
|
||||
* CC-CEDICT characters
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
Zhon supports Python 2.7 and 3. Install using pip:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
$ pip install zhon
|
||||
|
||||
If you want to download the latest source code, check out `Zhon's GitHub
|
||||
repository <https://github.com/tsroten/zhon>`_.
|
||||
|
||||
Be sure to `report any bugs <https://github.com/tsroten/zhon/issues>`_ you find.
|
||||
Thanks!
|
||||
|
||||
.. module:: zhon
|
||||
|
||||
Using Zhon
|
||||
----------
|
||||
|
||||
Zhon contains four modules that export helpful Chinese constants:
|
||||
|
||||
* :py:mod:`zhon.hanzi`
|
||||
* :py:mod:`zhon.pinyin`
|
||||
* :py:mod:`zhon.zhuyin`
|
||||
* :py:mod:`zhon.cedict`
|
||||
|
||||
Zhon's constants are formatted in one of three ways:
|
||||
|
||||
* Characters listed individually. These can be used with membership tests
|
||||
or used to build regular expression patterns. For example, ``'aeiou'``.
|
||||
* Character code ranges. These are used to build regular expression patterns.
|
||||
For example, ``'\u0041-\u005A\u0061-\u007A'``.
|
||||
* Regular expression pattern. These are regular expression patterns
|
||||
that can be used with the regular expression library directly. For
|
||||
example, ``'[\u0020-\u007E]+'``.
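
As a rough illustration of these three formats, the sketch below uses hypothetical stand-in strings (``VOWELS``, ``LATIN_RANGES``, and ``PRINTABLE_ASCII`` are made up for this example and are not Zhon constants), so it runs with only the standard library:

```python
import re

# Hypothetical stand-ins that mirror the three formats; not Zhon's own constants.
VOWELS = 'aeiou'                             # characters listed individually
LATIN_RANGES = '\u0041-\u005A\u0061-\u007A'  # character code ranges (A-Z, a-z)
PRINTABLE_ASCII = '[\u0020-\u007E]+'         # complete regular expression pattern

# 1. Individually listed characters support plain membership tests.
has_vowel = 'e' in VOWELS

# 2. Code ranges must be wrapped in a character class before use.
letters = re.findall('[{}]'.format(LATIN_RANGES), 'a1 B2')

# 3. Complete patterns can be handed to the re module unchanged.
runs = re.findall(PRINTABLE_ASCII, 'abc\ndef')
```

Only the second format needs the surrounding ``[...]``. Wrapping a complete pattern in brackets again would turn it into a character class and change its meaning.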
|
||||
|
||||
Using the constants listed below is simple. For constants that list the
|
||||
characters individually, you can perform membership tests or use them in
|
||||
regular expressions:
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> '车' in zhon.cedict.traditional
|
||||
False
|
||||
|
||||
>>> # This regular expression finds all characters that aren't considered
|
||||
... # traditional according to CC-CEDICT
|
||||
... re.findall('[^{}]'.format(zhon.cedict.traditional), '我买了一辆车')
|
||||
['买', '辆', '车']
|
||||
|
||||
For constants that contain character code ranges, you'll want to build a
|
||||
regular expression:
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall('[{}]'.format(zhon.hanzi.punctuation), '我买了一辆车。')
|
||||
['。']
|
||||
|
||||
For constants that are regular expression patterns, you can use them directly
|
||||
with the regular expression library, without formatting them:
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.hanzi.sentence, '我买了一辆车。妈妈做的菜,很好吃!')
|
||||
['我买了一辆车。', '妈妈做的菜,很好吃!']
|
||||
|
||||
.. module:: zhon.hanzi
|
||||
|
||||
``zhon.hanzi``
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
These constants can be used when working directly with Chinese characters.
|
||||
|
||||
These constants can be used in a variety of ways, but they can't directly
|
||||
distinguish between Chinese, Japanese, and Korean characters/words.
|
||||
Chapter 12 of The Unicode Standard
|
||||
(`PDF <http://www.unicode.org/versions/Unicode6.2.0/ch12.pdf>`_)
|
||||
has some useful information about this:
|
||||
|
||||
There is some concern that unifying the Han characters may lead to confusion because they are sometimes used differently by the various East Asian languages. Computationally, Han character unification presents no more difficulty than employing a single Latin character set that is used to write languages as different as English and French. Programmers do not expect the characters "c", "h", "a", and "t" alone to tell us whether chat is a French word for cat or an English word meaning “informal talk.” Likewise, we depend on context to identify the American hood (of a car) with the British bonnet. Few computer users are confused by the fact that ASCII can also be used to represent such words as the Welsh word ynghyd, which are strange looking to English eyes. Although it would be convenient to identify words by language for programs such as spell-checkers, it is neither practical nor productive to encode a separate Latin character set for every language that uses it.
|
||||
|
||||
.. py:data:: characters
|
||||
cjk_ideographs
|
||||
|
||||
Character codes and code ranges for pertinent CJK ideograph Unicode characters. This includes:
|
||||
|
||||
* `CJK Unified Ideographs <http://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)>`_
|
||||
* `CJK Unified Ideographs Extension A <http://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_A>`_
|
||||
* `CJK Unified Ideographs Extension B <http://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B>`_
|
||||
* `CJK Unified Ideographs Extension C <http://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_C>`_
|
||||
* `CJK Unified Ideographs Extension D <http://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_D>`_
|
||||
* `CJK Compatibility Ideographs <http://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs>`_
|
||||
* `CJK Compatibility Ideographs Supplement <http://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs_Supplement>`_
|
||||
* Ideographic number zero
|
||||
|
||||
Some of the characters in this constant will not be Chinese characters,
|
||||
but this is a convenient way to approach the issue. If you'd rather have
|
||||
an enormous string of Chinese characters from a Chinese dictionary, check
|
||||
out :py:mod:`zhon.cedict`.
|
||||
|
||||
.. py:data:: radicals
|
||||
|
||||
Character code ranges for the `Kangxi Radicals <http://en.wikipedia.org/wiki/Kangxi_radical#Unicode>`_
|
||||
and `CJK Radicals Supplement <http://en.wikipedia.org/wiki/CJK_Radicals_Supplement>`_
|
||||
Unicode blocks.
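
A minimal sketch of how such code ranges can be used, assuming a stand-in string built from the full extent of the two Unicode blocks (``RADICAL_RANGES`` and ``only_radicals`` are hypothetical names, and the library's actual value may differ):

```python
import re

# Assumed stand-in covering the CJK Radicals Supplement (U+2E80-U+2EFF) and
# Kangxi Radicals (U+2F00-U+2FDF) blocks; not the library's own constant.
RADICAL_RANGES = '\u2E80-\u2EFF\u2F00-\u2FDF'

# A negated character class matches anything outside the two blocks.
non_radical = re.compile('[^{}]'.format(RADICAL_RANGES))


def only_radicals(text):
    """Return True if every character falls inside the radical blocks."""
    return non_radical.search(text) is None
```

Radical code points are distinct from the look-alike ideographs: ``'\u2F00'`` (Kangxi radical one) passes this check, while ``'\u4E00'`` (the ideograph 一) does not.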
|
||||
|
||||
.. py:data:: punctuation
|
||||
|
||||
This is the concatenation of :py:data:`zhon.hanzi.non_stops` and
|
||||
:py:data:`zhon.hanzi.stops`.
|
||||
|
||||
.. py:data:: non_stops
|
||||
|
||||
The string ``'"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏'``.
|
||||
This contains Chinese punctuation marks, excluding punctuation marks that
|
||||
function as stops.
|
||||
|
||||
.. py:data:: stops
|
||||
|
||||
The string ``'!?。。'``. These punctuation marks function as stops.
|
||||
|
||||
.. py:data:: sent
|
||||
sentence
|
||||
|
||||
A regular expression pattern for a Chinese sentence. A sentence is defined
|
||||
as a series of CJK characters (as defined by
|
||||
:py:data:`zhon.hanzi.characters`) and non-stop punctuation marks followed
|
||||
by a stop and zero or more container-closing punctuation marks (e.g.
|
||||
apostrophe and brackets).
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.hanzi.sentence, '我买了一辆车。')
|
||||
['我买了一辆车。']
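
The definition above can be sketched as a pattern built from smaller pieces. This is only an illustration under stated assumptions: the four strings below are small hypothetical stand-ins, not the library's constants, and the real pattern may be assembled differently.

```python
import re

# Illustrative stand-ins only; the real constants cover far more characters.
CHARACTERS = '\u4E00-\u9FFF'   # CJK Unified Ideographs block
NON_STOPS = '\uFF0C\u3001'     # fullwidth comma, ideographic comma
STOPS = '\u3002\uFF01\uFF1F'   # ideographic full stop, fullwidth ! and ?
CLOSERS = '\u300D\u201D'       # closing corner bracket, right double quote

# Characters and non-stops, then one stop, then any closing marks.
SENTENCE = '[{chars}{non_stops}]*[{stops}][{closers}]*'.format(
    chars=CHARACTERS, non_stops=NON_STOPS, stops=STOPS, closers=CLOSERS)

text = '我买了一辆车\u3002妈妈做的菜\uFF0C很好吃\uFF01'
sentences = re.findall(SENTENCE, text)
```

With these stand-ins, ``sentences`` holds two items, one per stop. The fullwidth comma stays inside the second sentence because it is a non-stop.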
|
||||
|
||||
.. module:: zhon.pinyin
|
||||
|
||||
``zhon.pinyin``
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
These constants can be used when working with Pinyin.
|
||||
|
||||
.. py:data:: vowels
|
||||
|
||||
The string ``'aeiouvüāēīōūǖáéíóúǘǎěǐǒǔǚàèìòùǜAEIOUVÜĀĒĪŌŪǕÁÉÍÓÚǗǍĚǏǑǓǙÀÈÌÒÙǛ'``. This contains every Pinyin vowel (lowercase and uppercase).
|
||||
|
||||
.. py:data:: consonants
|
||||
|
||||
The string ``'bpmfdtnlgkhjqxzcsrwyBPMFDTNLGKHJQXZCSRWY'``. This
|
||||
contains every Pinyin consonant (lowercase and uppercase).
|
||||
|
||||
.. py:data:: lowercase
|
||||
|
||||
The string ``'bpmfdtnlgkhjqxzcsrwyaeiouvüāēīōūǖáéíóúǘǎěǐǒǔǚàèìòùǜ'``. This contains every lowercase Pinyin vowel and consonant.
|
||||
|
||||
.. py:data:: uppercase
|
||||
|
||||
The string ``'BPMFDTNLGKHJQXZCSRWYAEIOUVÜĀĒĪŌŪǕÁÉÍÓÚǗǍĚǏǑǓǙÀÈÌÒÙǛ'``.
|
||||
This contains every uppercase vowel and consonant.
|
||||
|
||||
.. py:data:: marks
|
||||
|
||||
The string ``"·012345:-'"``. This contains all Pinyin marks that have
|
||||
special meaning: a middle dot and numbers for indicating tone, a colon for
|
||||
easily writing ü ('u:'), a hyphen for connecting syllables within words,
|
||||
and an apostrophe for separating a syllable beginning with a vowel from
|
||||
the previous syllable in its word. All of these marks can be used within a
|
||||
valid Pinyin word.
|
||||
|
||||
.. py:data:: punctuation
|
||||
|
||||
The concatenation of :py:data:`zhon.pinyin.non_stops` and
|
||||
:py:data:`zhon.pinyin.stops`.
|
||||
|
||||
.. py:data:: non_stops
|
||||
|
||||
The string ``'"#$%&\'()*+,-/:;<=>@[\]^_`{|}~"'``. This contains every
|
||||
ASCII punctuation mark that doesn't function as a stop.
|
||||
|
||||
.. py:data:: stops
|
||||
|
||||
The string ``'.!?'``. This contains every ASCII punctuation mark that
|
||||
functions as a stop.
|
||||
|
||||
.. py:data:: printable
|
||||
|
||||
The concatenation of :py:data:`zhon.pinyin.vowels`,
|
||||
:py:data:`zhon.pinyin.consonants`, :py:data:`zhon.pinyin.marks`,
|
||||
:py:data:`zhon.pinyin.punctuation`, and :py:data:`string.whitespace`. This
|
||||
is essentially a Pinyin whitelist for complete Pinyin sentences -- it's
|
||||
every possible valid character a Pinyin string can use assuming all
|
||||
non-Chinese words that might be included (like proper nouns) use ASCII.
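
A whitelist like this can be applied with a simple membership check. The sketch below rebuilds a comparable set from the strings documented above; ``PRINTABLE`` and ``uses_only_pinyin_characters`` are local, hypothetical names for this example rather than part of the library.

```python
import string

# Strings copied from the descriptions above; PRINTABLE is a local stand-in.
VOWELS = 'aeiouvüāēīōūǖáéíóúǘǎěǐǒǔǚàèìòùǜAEIOUVÜĀĒĪŌŪǕÁÉÍÓÚǗǍĚǏǑǓǙÀÈÌÒÙǛ'
CONSONANTS = 'bpmfdtnlgkhjqxzcsrwyBPMFDTNLGKHJQXZCSRWY'
MARKS = "·012345:-'"
NON_STOPS = '"#$%&\'()*+,-/:;<=>@[\\]^_`{|}~'
STOPS = '.!?'

PRINTABLE = set(
    VOWELS + CONSONANTS + MARKS + NON_STOPS + STOPS + string.whitespace)


def uses_only_pinyin_characters(text):
    """Return True if every character of text is in the whitelist."""
    return all(char in PRINTABLE for char in text)
```

The check is necessary but not sufficient: ``'hello'`` passes because each of its letters is a Pinyin letter, even though it is not valid Pinyin.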
|
||||
|
||||
Validating and splitting Pinyin isn't as simple as checking that only
|
||||
valid characters exist or matching maximum-length valid syllables.
|
||||
The regular expression library's lookahead features are used in this
|
||||
module's regular expression patterns to ensure that only valid Pinyin
|
||||
syllables are matched. The approach used to segment a string into valid
|
||||
Pinyin syllables is roughly:
|
||||
|
||||
1. Match the longest possible valid syllable.
|
||||
2. If that match is followed directly by a vowel, drop that match and try
|
||||
again with the next longest possible valid syllable.
|
||||
|
||||
Additionally, lookahead assertions are used to ensure that hyphens and
|
||||
apostrophes are only accepted when they are used correctly. This helps to
|
||||
weed out non-Pinyin strings.
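
The two-step approach above can be sketched with an alternation ordered longest first plus a negative lookahead. The syllable inventory here is a tiny hypothetical one chosen for the example, and the module's actual patterns are considerably more involved.

```python
import re

# Hypothetical, tiny syllable inventory; real Pinyin has roughly 400 syllables.
SYLLABLES = ['fang', 'fan', 'gan', 'jin', 'ji', 'nan', 'na', 'an']

# Longest alternatives first, so the regex engine tries them in that order.
alternatives = '|'.join(sorted(SYLLABLES, key=len, reverse=True))

# The negative lookahead rejects a candidate that is directly followed by a
# vowel, which makes the engine fall back to the next (shorter) alternative.
syllable = re.compile('(?:{})(?![aeiou])'.format(alternatives))


def segment(text):
    """Split unaccented, unnumbered Pinyin into syllables."""
    return syllable.findall(text)
```

For ``'jinan'``, the longest candidate ``'jin'`` is followed by ``'a'``, so it is dropped in favour of ``'ji'``, and the result is ``['ji', 'nan']``. Likewise ``'fangan'`` yields ``['fan', 'gan']``, which is why an apostrophe is needed to write ``fang'an``.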
|
||||
|
||||
.. py:data:: syl
|
||||
syllable
|
||||
|
||||
A regular expression pattern for a valid Pinyin syllable (accented or
|
||||
numbered). Compile with :py:data:`re.IGNORECASE` (:py:data:`re.I`) to
|
||||
accept uppercase letters as well.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.syllable, 'Shū zài zhuōzi shàngmian. Shu1 zai4 zhuo1zi5 shang4mian5.', re.IGNORECASE)
|
||||
['Shū', 'zài', 'zhuō', 'zi', 'shàng', 'mian', 'Shu1', 'zai4', 'zhuo1', 'zi5', 'shang4', 'mian5']
|
||||
|
||||
.. py:data:: a_syl
|
||||
acc_syl
|
||||
accented_syllable
|
||||
|
||||
A regular expression for a valid accented Pinyin syllable. Compile with
|
||||
:py:data:`re.IGNORECASE` (:py:data:`re.I`) to accept uppercase letters as
|
||||
well.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.acc_syl, 'Shū zài zhuōzi shàngmian.', re.IGNORECASE)
|
||||
['Shū', 'zài', 'zhuō', 'zi', 'shàng', 'mian']
|
||||
|
||||
|
||||
.. py:data:: n_syl
|
||||
num_syl
|
||||
numbered_syllable
|
||||
|
||||
A regular expression for a valid numbered Pinyin syllable. Compile with
|
||||
:py:data:`re.IGNORECASE` (:py:data:`re.I`) to accept uppercase letters as
|
||||
well.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.num_syl, 'Shu1 zai4 zhuo1zi5 shang4mian5.', re.IGNORECASE)
|
||||
['Shu1', 'zai4', 'zhuo1', 'zi5', 'shang4', 'mian5']
|
||||
|
||||
.. py:data:: word
|
||||
|
||||
A regular expression pattern for a valid Pinyin word (accented or
|
||||
numbered). Compile with :py:data:`re.IGNORECASE` (:py:data:`re.I`) to
|
||||
accept uppercase letters as well.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.word, 'Shū zài zhuōzi shàngmian. Shu1 zai4 zhuo1zi5 shang4mian5.', re.IGNORECASE)
|
||||
['Shū', 'zài', 'zhuōzi', 'shàngmian', 'Shu1', 'zai4', 'zhuo1zi5', 'shang4mian5']
|
||||
|
||||
.. py:data:: a_word
|
||||
acc_word
|
||||
accented_word
|
||||
|
||||
A regular expression for a valid accented Pinyin word. Compile with
|
||||
:py:data:`re.IGNORECASE` (:py:data:`re.I`) to accept uppercase letters as
|
||||
well.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.acc_word, 'Shū zài zhuōzi shàngmian.', re.IGNORECASE)
|
||||
['Shū', 'zài', 'zhuōzi', 'shàngmian']
|
||||
|
||||
|
||||
.. py:data:: n_word
|
||||
num_word
|
||||
numbered_word
|
||||
|
||||
A regular expression for a valid numbered Pinyin word. Compile with
|
||||
:py:data:`re.IGNORECASE` (:py:data:`re.I`) to accept uppercase letters as
|
||||
well.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.num_word, 'Shu1 zai4 zhuo1zi5 shang4mian5.', re.IGNORECASE)
|
||||
['Shu1', 'zai4', 'zhuo1zi5', 'shang4mian5']
|
||||
|
||||
.. py:data:: sent
|
||||
sentence
|
||||
|
||||
A regular expression pattern for a valid Pinyin sentence (accented or
|
||||
numbered). Compile with :py:data:`re.IGNORECASE` (:py:data:`re.I`) to
|
||||
accept uppercase letters as well.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.sentence, 'Shū zài zhuōzi shàngmian. Shu1 zai4 zhuo1zi5 shang4mian5.', re.IGNORECASE)
|
||||
['Shū zài zhuōzi shàngmian.', 'Shu1 zai4 zhuo1zi5 shang4mian5.']
|
||||
|
||||
.. py:data:: a_sent
|
||||
acc_sent
|
||||
accented_sentence
|
||||
|
||||
A regular expression for a valid accented Pinyin sentence. Compile with
|
||||
:py:data:`re.IGNORECASE` (:py:data:`re.I`) to accept uppercase letters as
|
||||
well.
|
||||
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.acc_sent, 'Shū zài zhuōzi shàngmian.', re.IGNORECASE)
|
||||
['Shū zài zhuōzi shàngmian.']
|
||||
|
||||
|
||||
.. py:data:: n_sent
|
||||
num_sent
|
||||
numbered_sentence
|
||||
|
||||
A regular expression for a valid numbered Pinyin sentence. Compile with
|
||||
:py:data:`re.IGNORECASE` (:py:data:`re.I`) to accept uppercase letters as
|
||||
well.
|
||||
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.pinyin.num_sent, 'Shu1 zai4 zhuo1zi5 shang4mian5.', re.IGNORECASE)
|
||||
['Shu1 zai4 zhuo1zi5 shang4mian5.']
|
||||
|
||||
.. module:: zhon.zhuyin
|
||||
|
||||
``zhon.zhuyin``
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
These constants can be used when working with Zhuyin (Bopomofo).
|
||||
|
||||
.. py:data:: characters
|
||||
|
||||
The string ``'ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙㄚㄛㄝㄜㄞㄟㄠㄡㄢㄣㄤㄥㄦㄧ'``.
|
||||
This contains all Zhuyin characters as defined by the `Bopomofo Unicode
|
||||
block <http://en.wikipedia.org/wiki/Bopomofo_(Unicode_block)>`_. It does
|
||||
not include the
|
||||
`Bopomofo Extended block <http://en.wikipedia.org/wiki/Bopomofo_Extended_(Unicode_block)>`_
|
||||
that defines characters used in non-standard dialects or minority
|
||||
languages.
|
||||
|
||||
.. py:data:: marks
|
||||
|
||||
The string ``'ˇˊˋ˙'``. This contains the Zhuyin tone marks.
|
||||
|
||||
.. py:data:: syl
|
||||
syllable
|
||||
|
||||
A regular expression pattern for a valid Zhuyin syllable.
|
||||
|
||||
.. code:: python
|
||||
|
||||
>>> re.findall(zhon.zhuyin.syllable, 'ㄓㄨˋ ㄧㄣ ㄈㄨˊ ㄏㄠˋ')
|
||||
['ㄓㄨˋ', 'ㄧㄣ', 'ㄈㄨˊ', 'ㄏㄠˋ']
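
For intuition, a much-simplified pattern can be assembled from a letter range and the tone marks. This sketch assumes a syllable is one to three Bopomofo letters followed by an optional tone mark. It does not check that the letters form a real syllable, and ``SYLLABLE`` and ``find_syllables`` are hypothetical names rather than the library's definition.

```python
import re

# Assumed simplification: one to three letters, then an optional tone mark.
LETTERS = '\u3105-\u3129'                # ㄅ through ㄩ in the Bopomofo block
TONE_MARKS = '\u02C7\u02CA\u02CB\u02D9'  # the marks ˇ ˊ ˋ ˙

# Doubled braces survive str.format() as a literal {1,3} quantifier.
SYLLABLE = '[{}]{{1,3}}[{}]?'.format(LETTERS, TONE_MARKS)


def find_syllables(text):
    """Return every run of text that looks like a Zhuyin syllable."""
    return re.findall(SYLLABLE, text)
```

A first-tone syllable carries no mark, which is why the tone mark is optional in the pattern.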
|
||||
|
||||
.. module:: zhon.cedict
|
||||
|
||||
``zhon.cedict``
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
These constants are built from the `CC-CEDICT dictionary
|
||||
<http://cc-cedict.org/wiki/>`_.
|
||||
They aren't guaranteed to contain every possible Chinese character. They only
|
||||
provide characters that exist in the CC-CEDICT dictionary.
|
||||
|
||||
.. py:data:: all
|
||||
|
||||
A string containing all Chinese characters found in `CC-CEDICT
|
||||
<http://cc-cedict.org/wiki/>`_.
|
||||
|
||||
.. py:data:: trad
|
||||
traditional
|
||||
|
||||
A string containing characters considered by `CC-CEDICT
|
||||
<http://cc-cedict.org/wiki/>`_ to be Traditional Chinese characters.
|
||||
Some of these characters are also present in
|
||||
:py:data:`zhon.cedict.simplified` because many characters were left
|
||||
untouched by the simplification process.
|
||||
|
||||
.. py:data:: simp
|
||||
simplified
|
||||
|
||||
A string containing characters considered by `CC-CEDICT
|
||||
<http://cc-cedict.org/wiki/>`_ to be Simplified Chinese characters.
|
||||
Some of these characters are also present in
|
||||
:py:data:`zhon.cedict.traditional` because many characters were left
|
||||
untouched by the simplification process.
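
The overlap between the two strings can be explored with ordinary set operations. The miniature strings below are hypothetical stand-ins chosen for illustration; the real constants contain thousands of characters.

```python
# Hypothetical miniature character sets; not the library's constants.
TRADITIONAL = '我買了一輛車'
SIMPLIFIED = '我买了一辆车'

# Characters left untouched by simplification appear in both sets.
shared = sorted(set(TRADITIONAL) & set(SIMPLIFIED))

# Characters that only exist in the simplified set.
simplified_only = sorted(set(SIMPLIFIED) - set(TRADITIONAL))
```

Because of this overlap, a membership test against one string alone cannot tell you which script a text is written in. A character such as ``'我'`` is in both.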
|
@ -0,0 +1,242 @@
|
||||
@ECHO OFF
|
||||
|
||||
REM Command file for Sphinx documentation
|
||||
|
||||
if "%SPHINXBUILD%" == "" (
|
||||
set SPHINXBUILD=sphinx-build
|
||||
)
|
||||
set BUILDDIR=_build
|
||||
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
|
||||
set I18NSPHINXOPTS=%SPHINXOPTS% .
|
||||
if NOT "%PAPER%" == "" (
|
||||
set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
|
||||
set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
|
||||
)
|
||||
|
||||
if "%1" == "" goto help
|
||||
|
||||
if "%1" == "help" (
|
||||
:help
|
||||
echo.Please use `make ^<target^>` where ^<target^> is one of
|
||||
echo. html to make standalone HTML files
|
||||
echo. dirhtml to make HTML files named index.html in directories
|
||||
echo. singlehtml to make a single large HTML file
|
||||
echo. pickle to make pickle files
|
||||
echo. json to make JSON files
|
||||
echo. htmlhelp to make HTML files and a HTML help project
|
||||
echo. qthelp to make HTML files and a qthelp project
|
||||
echo. devhelp to make HTML files and a Devhelp project
|
||||
echo. epub to make an epub
|
||||
echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
|
||||
echo. text to make text files
|
||||
echo. man to make manual pages
|
||||
echo. texinfo to make Texinfo files
|
||||
echo. gettext to make PO message catalogs
|
||||
echo. changes to make an overview over all changed/added/deprecated items
|
||||
echo. xml to make Docutils-native XML files
|
||||
echo. pseudoxml to make pseudoxml-XML files for display purposes
|
||||
echo. linkcheck to check all external links for integrity
|
||||
echo. doctest to run all doctests embedded in the documentation if enabled
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "clean" (
|
||||
for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
|
||||
del /q /s %BUILDDIR%\*
|
||||
goto end
|
||||
)
|
||||
|
||||
|
||||
%SPHINXBUILD% 2> nul
|
||||
if errorlevel 9009 (
|
||||
echo.
|
||||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
|
||||
echo.installed, then set the SPHINXBUILD environment variable to point
|
||||
echo.to the full path of the 'sphinx-build' executable. Alternatively you
|
||||
echo.may add the Sphinx directory to PATH.
|
||||
echo.
|
||||
echo.If you don't have Sphinx installed, grab it from
|
||||
echo.http://sphinx-doc.org/
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
if "%1" == "html" (
|
||||
%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The HTML pages are in %BUILDDIR%/html.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "dirhtml" (
|
||||
%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "singlehtml" (
|
||||
%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "pickle" (
|
||||
%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can process the pickle files.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "json" (
|
||||
%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can process the JSON files.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "htmlhelp" (
|
||||
%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can run HTML Help Workshop with the ^
|
||||
.hhp project file in %BUILDDIR%/htmlhelp.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "qthelp" (
|
||||
%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can run "qcollectiongenerator" with the ^
|
||||
.qhcp project file in %BUILDDIR%/qthelp, like this:
|
||||
echo.^> qcollectiongenerator %BUILDDIR%\qthelp\Zhon.qhcp
|
||||
echo.To view the help file:
|
||||
echo.^> assistant -collectionFile %BUILDDIR%\qthelp\Zhon.qhc
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "devhelp" (
|
||||
%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "epub" (
|
||||
%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The epub file is in %BUILDDIR%/epub.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "latex" (
|
||||
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "latexpdf" (
|
||||
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
|
||||
cd %BUILDDIR%/latex
|
||||
make all-pdf
|
||||
cd %BUILDDIR%/..
|
||||
echo.
|
||||
echo.Build finished; the PDF files are in %BUILDDIR%/latex.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "latexpdfja" (
|
||||
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
|
||||
cd %BUILDDIR%/latex
|
||||
make all-pdf-ja
|
||||
cd %BUILDDIR%/..
|
||||
echo.
|
||||
echo.Build finished; the PDF files are in %BUILDDIR%/latex.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "text" (
|
||||
%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The text files are in %BUILDDIR%/text.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "man" (
|
||||
%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The manual pages are in %BUILDDIR%/man.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "texinfo" (
|
||||
%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "gettext" (
|
||||
%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "changes" (
|
||||
%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.The overview file is in %BUILDDIR%/changes.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "linkcheck" (
|
||||
%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Link check complete; look for any errors in the above output ^
|
||||
or in %BUILDDIR%/linkcheck/output.txt.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "doctest" (
|
||||
%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Testing of doctests in the sources finished, look at the ^
|
||||
results in %BUILDDIR%/doctest/output.txt.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "xml" (
|
||||
%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The XML files are in %BUILDDIR%/xml.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "pseudoxml" (
|
||||
%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
|
||||
goto end
|
||||
)
|
||||
|
||||
:end
|
@ -0,0 +1,6 @@
|
||||
coverage==4.3.4
|
||||
flake8==3.3.0
|
||||
Sphinx==1.5.3
|
||||
tox==2.6.0
|
||||
twine==1.8.1
|
||||
wheel==0.29.0
|
@ -0,0 +1,6 @@
|
||||
[bdist_wheel]
|
||||
universal = 1
|
||||
|
||||
[flake8]
|
||||
ignore = E731, P101
|
||||
exclude = zhon/cedict/*
|
@ -0,0 +1,49 @@
|
||||
#!/usr/bin/env python3

import os
import sys


enc_open = open

try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup


if sys.argv[-1] == 'publish':
    os.system('python setup.py sdist upload')
    sys.exit()

with enc_open('README.rst', 'r', encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='zhon',
    version='1.1.5',
    author='Thomas Roten',
    author_email='thomas@roten.us',
    url='https://github.com/tsroten/zhon',
    description=('Zhon provides constants used in Chinese text processing.'),
    long_description=long_description,
    packages=['zhon', 'zhon.cedict'],
    keywords=('chinese mandarin segmentation tokenization punctuation hanzi '
              'unicode radicals han cjk cedict cc-cedict traditional '
              'simplified characters pinyin zhuyin'),
    classifiers=[
        'Operating System :: OS Independent',
        'Intended Audience :: Developers',
        'Development Status :: 5 - Production/Stable',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Topic :: Software Development :: Libraries :: Python Modules',
        'Topic :: Text Processing :: Linguistic',
    ],
    platforms='Any',
    test_suite='tests',
)
|
@ -0,0 +1,33 @@
|
||||
|
||||
"""Tests for the zhon.cedict module."""
|
||||
|
||||
import re
import unittest
from zhon import cedict


class TestSimplified(unittest.TestCase):

    simplified_text = '有人丢失了一把斧子怎么找也没有找到'

    def test_re_complement_search(self):
        re_complement = re.compile('[^{}]'.format(cedict.simplified))
        self.assertEqual(re_complement.search(self.simplified_text), None)


class TestTraditional(unittest.TestCase):

    simplified_text = '有人丢失了一把斧子怎么找也没有找到'

    def test_re_complement_search(self):
        re_complement = re.compile('[^{}]'.format(cedict.traditional))
        self.assertNotEqual(re_complement.search(self.simplified_text), None)


class TestAll(unittest.TestCase):

    all_text = '车車'

    def test_re_complement_search(self):
        re_complement = re.compile('[^{}]'.format(cedict.all))
        self.assertEqual(re_complement.search(self.all_text), None)
|
@ -0,0 +1,49 @@
|
||||
|
||||
"""Tests for the zhon.hanzi module."""
|
||||
|
||||
import re
import unittest

from zhon import hanzi


class TestCharacters(unittest.TestCase):

    def test_all_chinese(self):
        c_re = re.compile('[^{}]'.format(hanzi.characters))
        t = '你我都很她它隹廿'
        self.assertEqual(c_re.search(t), None)

    def test_chinese_and_punc(self):
        c_re = re.compile('[^{}]'.format(hanzi.characters))
        t = '你我都很她它隹廿。,!'
        self.assertNotEqual(c_re.search(t), None)


class TestRadicals(unittest.TestCase):

    def test_only_radicals(self):
        r_re = re.compile('[^{}]'.format(hanzi.radicals))
        t = '\u2F00\u2F31\u2FBA\u2E98\u2EF3\u2ECF'
        self.assertEqual(r_re.search(t), None)

    def test_chinese_equivalents(self):
        r_re = re.compile('[^{}]'.format(hanzi.radicals))
        t = '\u4E00\u5E7F\u516B\u5165'
        self.assertNotEqual(r_re.search(t), None)


class TestPunctuation(unittest.TestCase):

    def test_split_on_punctuation(self):
        p_re = re.compile('[{}]'.format(hanzi.punctuation))
        t = '你好你好好好哈哈,米饭很好吃;哈哈!'
        self.assertEqual(len(p_re.split(t)), 4)

    def test_issue_19(self):
        self.assertTrue('《' in hanzi.punctuation)
        self.assertTrue('·' in hanzi.punctuation)
        self.assertTrue('〈' in hanzi.punctuation)
        self.assertTrue('〉' in hanzi.punctuation)
        self.assertTrue('﹑' in hanzi.punctuation)
        self.assertTrue('﹔' in hanzi.punctuation)
|
@ -0,0 +1,204 @@
|
||||
|
||||
"""Tests for the zhon.pinyin module."""
|
||||
|
||||
import random
|
||||
import re
|
||||
import unittest
|
||||
|
||||
from zhon import pinyin
|
||||
|
||||
|
||||
NUM_WORDS = 50 # Number of random words to test
|
||||
WORD_LENGTH = 4 # Length of random words (number of syllables)
|
||||
NUM_SENT = 10 # Number of random sentences to test
|
||||
SENT_LENGTH = 5 # Length of random sentences (number of words)
|
||||
|
||||
VALID_SYLS = ( # 411 total syllables, including 'r'
|
||||
'ba', 'pa', 'ma', 'fa', 'da', 'ta', 'na', 'la', 'ga', 'ka', 'ha', 'za',
|
||||
'ca', 'sa', 'zha', 'cha', 'sha', 'a', 'bo', 'po', 'mo', 'fo', 'yo', 'lo',
|
||||
'o', 'me', 'de', 'te', 'ne', 'le', 'ge', 'ke', 'he', 'ze', 'ce', 'se',
|
||||
'zhe', 'che', 'she', 're', 'e', 'bai', 'pai', 'mai', 'dai', 'tai',
|
||||
'nai', 'lai', 'gai', 'kai', 'hai', 'zai', 'cai', 'sai', 'zhai', 'chai',
|
||||
'shai', 'ai', 'bei', 'pei', 'mei', 'fei', 'dei', 'tei', 'nei', 'lei',
|
||||
'gei', 'kei', 'hei', 'zei', 'zhei', 'shei', 'ei', 'bao', 'pao', 'mao',
|
||||
'dao', 'tao', 'nao', 'lao', 'gao', 'kao', 'hao', 'zao', 'cao', 'sao',
|
||||
'zhao', 'chao', 'shao', 'rao', 'ao', 'pou', 'mou', 'fou', 'dou', 'tou',
|
||||
'nou', 'lou', 'gou', 'kou', 'hou', 'zou', 'cou', 'sou', 'zhou', 'chou',
|
||||
'shou', 'rou', 'ou', 'ban', 'pan', 'man', 'fan', 'dan', 'tan', 'nan',
|
||||
'lan', 'gan', 'kan', 'han', 'zan', 'can', 'san', 'zhan', 'chan',
|
||||
'shan', 'ran', 'an', 'bang', 'pang', 'mang', 'fang', 'dang', 'tang',
|
||||
'nang', 'lang', 'gang', 'kang', 'hang', 'zang', 'cang', 'sang',
|
||||
'zhang', 'chang', 'shang', 'rang', 'ang', 'ben', 'pen', 'men', 'fen',
|
||||
'den', 'nen', 'gen', 'ken', 'hen', 'zen', 'cen', 'sen', 'zhen', 'chen',
|
||||
'shen', 'ren', 'en', 'beng', 'peng', 'meng', 'feng', 'deng', 'teng',
|
||||
'neng', 'leng', 'geng', 'keng', 'heng', 'zeng', 'ceng', 'seng',
|
||||
'zheng', 'cheng', 'sheng', 'reng', 'eng', 'dong', 'tong', 'nong',
|
||||
'long', 'gong', 'kong', 'hong', 'zong', 'cong', 'song', 'zhong',
|
||||
'chong', 'rong', 'bu', 'pu', 'mu', 'fu', 'du', 'tu', 'nu', 'lu',
|
||||
'gu', 'ku', 'hu', 'zu', 'cu', 'su', 'zhu', 'chu', 'shu', 'ru', 'wu',
|
||||
'gua', 'kua', 'hua', 'zhua', 'chua', 'shua', 'rua', 'wa', 'duo', 'tuo',
|
||||
'nuo', 'luo', 'guo', 'kuo', 'huo', 'zuo', 'cuo', 'suo', 'zhuo', 'chuo',
|
||||
'shuo', 'ruo', 'wo', 'guai', 'kuai', 'huai', 'zhuai', 'chuai', 'shuai',
|
||||
'wai', 'dui', 'tui', 'gui', 'kui', 'hui', 'zui', 'cui', 'sui', 'zhui',
|
||||
'chui', 'shui', 'rui', 'wei', 'duan', 'tuan', 'nuan', 'luan', 'guan',
|
||||
'kuan', 'huan', 'zuan', 'cuan', 'suan', 'zhuan', 'chuan', 'shuan',
|
||||
'ruan', 'wan', 'guang', 'kuang', 'huang', 'zhuang', 'chuang', 'shuang',
|
||||
'wang', 'dun', 'tun', 'nun', 'lun', 'gun', 'kun', 'hun', 'zun', 'cun',
|
||||
'sun', 'zhun', 'chun', 'shun', 'run', 'wen', 'weng', 'bi', 'pi', 'mi',
|
||||
'di', 'ti', 'ni', 'li', 'zi', 'ci', 'si', 'zhi', 'chi', 'shi', 'ri',
|
||||
'ji', 'qi', 'xi', 'yi', 'dia', 'lia', 'jia', 'qia', 'xia', 'ya', 'bie',
|
||||
'pie', 'mie', 'die', 'tie', 'nie', 'lie', 'jie', 'qie', 'xie', 'ye',
|
||||
'biao', 'piao', 'miao', 'diao', 'tiao', 'niao', 'liao', 'jiao', 'qiao',
|
||||
'xiao', 'yao', 'miu', 'diu', 'niu', 'liu', 'jiu', 'qiu', 'xiu', 'you',
|
||||
'bian', 'pian', 'mian', 'dian', 'tian', 'nian', 'lian', 'jian', 'qian',
|
||||
'xian', 'yan', 'niang', 'liang', 'jiang', 'qiang', 'xiang', 'yang',
|
||||
'bin', 'pin', 'min', 'nin', 'lin', 'jin', 'qin', 'xin', 'yin', 'bing',
|
||||
'ping', 'ming', 'ding', 'ting', 'ning', 'ling', 'jing', 'qing', 'xing',
|
||||
'ying', 'jiong', 'qiong', 'xiong', 'yong', 'nü', 'lü', 'ju', 'qu',
|
||||
'xu', 'yu', 'nüe', 'lüe', 'jue', 'que', 'xue', 'yue', 'juan', 'quan',
|
||||
'xuan', 'yuan', 'jun', 'qun', 'xun', 'yun', 'er', 'r'
|
||||
)
|
||||
|
||||
SYL = re.compile(pinyin.syllable)
|
||||
A_SYL = re.compile(pinyin.a_syl)
|
||||
N_SYL = re.compile(pinyin.n_syl)
|
||||
WORD = re.compile(pinyin.word)
|
||||
N_WORD = re.compile(pinyin.n_word)
|
||||
A_WORD = re.compile(pinyin.a_word)
|
||||
SENT = re.compile(pinyin.sentence)
|
||||
N_SENT = re.compile(pinyin.n_sent)
|
||||
A_SENT = re.compile(pinyin.a_sent)
|
||||
|
||||
|
||||
VOWELS = 'aeiou\u00FC'
|
||||
VOWEL_MAP = {
|
||||
'a1': '\u0101', 'a2': '\xe1', 'a3': '\u01ce', 'a4': '\xe0', 'a5': 'a',
|
||||
'e1': '\u0113', 'e2': '\xe9', 'e3': '\u011b', 'e4': '\xe8', 'e5': 'e',
|
||||
'i1': '\u012b', 'i2': '\xed', 'i3': '\u01d0', 'i4': '\xec', 'i5': 'i',
|
||||
'o1': '\u014d', 'o2': '\xf3', 'o3': '\u01d2', 'o4': '\xf2', 'o5': 'o',
|
||||
'u1': '\u016b', 'u2': '\xfa', 'u3': '\u01d4', 'u4': '\xf9', 'u5': 'u',
|
||||
'\u00fc1': '\u01d6', '\u00fc2': '\u01d8', '\u00fc3': '\u01da',
|
||||
'\u00fc4': '\u01dc', '\u00fc5': '\u00fc'
|
||||
}
|
||||
|
||||
|
||||
def _num_vowel_to_acc(vowel, tone):
|
||||
"""Convert a numbered vowel to an accented vowel."""
|
||||
try:
|
||||
return VOWEL_MAP[vowel + str(tone)]
|
||||
except KeyError:
|
||||
raise ValueError("Vowel must be one of '{}' and tone must be an int "
|
||||
"1-5.".format(VOWELS))
|
||||
|
||||
|
||||
def num_syl_to_acc(syllable):
|
||||
"""Convert a numbered pinyin syllable to an accented pinyin syllable.
|
||||
|
||||
Implements the following algorithm:
|
||||
1. If the syllable has an 'a' or 'e', put the tone over that vowel.
|
||||
2. If the syllable has 'ou', place the tone over the 'o'.
|
||||
3. Otherwise, put the tone on the last vowel.
|
||||
|
||||
"""
|
||||
if syllable.startswith('r') and len(syllable) <= 2:
|
||||
return 'r' # Special case for 'r' syllable.
|
||||
if re.search('[{}]'.format(VOWELS), syllable) is None:
|
||||
return syllable
|
||||
syl, tone = syllable[:-1], syllable[-1]
|
||||
if tone not in '12345':
|
||||
# We did not find a tone number. Abort conversion.
|
||||
return syllable
|
||||
syl = re.sub('u:|v', '\u00fc', syl)
|
||||
if 'a' in syl:
|
||||
return syl.replace('a', _num_vowel_to_acc('a', tone))
|
||||
elif 'e' in syl:
|
||||
return syl.replace('e', _num_vowel_to_acc('e', tone))
|
||||
elif 'ou' in syl:
|
||||
return syl.replace('o', _num_vowel_to_acc('o', tone))
|
||||
last_vowel = syl[max(map(syl.rfind, VOWELS))] # Find last vowel index.
|
||||
return syl.replace(last_vowel, _num_vowel_to_acc(last_vowel, tone))
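The three placement rules in the `num_syl_to_acc` docstring can be restated as a small standalone sketch. The names `TONE_MARKS` and `place_tone` are hypothetical and not part of this test module. The sketch assumes the tone arrives as a separate int and that `ü` is already written as `\u00fc`.

```python
# Standalone sketch of the tone-placement rules; names are hypothetical.
TONE_MARKS = {
    'a': '\u0101\u00e1\u01ce\u00e0a',
    'e': '\u0113\u00e9\u011b\u00e8e',
    'i': '\u012b\u00ed\u01d0\u00eci',
    'o': '\u014d\u00f3\u01d2\u00f2o',
    'u': '\u016b\u00fa\u01d4\u00f9u',
    '\u00fc': '\u01d6\u01d8\u01da\u01dc\u00fc',
}


def place_tone(syllable, tone):
    """Return syllable with tone 1-5 marked on the vowel the rules select."""
    if tone not in (1, 2, 3, 4, 5):
        raise ValueError('tone must be an int from 1 to 5')
    if 'a' in syllable:        # Rule 1: 'a' takes the mark...
        pos = syllable.index('a')
    elif 'e' in syllable:      # ...otherwise 'e' does.
        pos = syllable.index('e')
    elif 'ou' in syllable:     # Rule 2: in 'ou', the 'o' takes the mark.
        pos = syllable.index('ou')
    else:                      # Rule 3: fall back to the last vowel.
        positions = [i for i, c in enumerate(syllable) if c in TONE_MARKS]
        if not positions:
            return syllable    # No vowel at all, e.g. the bare 'r' syllable.
        pos = positions[-1]
    marked = TONE_MARKS[syllable[pos]][tone - 1]
    return syllable[:pos] + marked + syllable[pos + 1:]
```

Working with an index rather than `str.replace` marks exactly one character, which matters only if a syllable were to repeat a vowel.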
|
||||
|
||||
|
||||
class TestPinyinSyllables(unittest.TestCase):
|
||||
|
||||
maxDiff = None
|
||||
|
||||
def test_number_syllables(self):
|
||||
vs = list(VALID_SYLS)
|
||||
_vs = []
|
||||
for n in range(0, len(vs)):
|
||||
vs[n] = vs[n] + str(random.randint(1, 5))
|
||||
_vs.append(vs[n])
|
||||
if _vs[n][0] in 'aeo':
|
||||
_vs[n] = "'{}".format(_vs[n])
|
||||
s = ''.join(_vs)
|
||||
self.assertEqual(SYL.findall(s), vs)
|
||||
self.assertEqual(N_SYL.findall(s), vs)
|
||||
|
||||
def test_accent_syllables(self):
|
||||
vs = list(VALID_SYLS)
|
||||
_vs = []
|
||||
for n in range(0, len(vs)):
|
||||
syl = vs[n]
|
||||
vs[n] = num_syl_to_acc(vs[n] + str(random.randint(1, 5)))
|
||||
_vs.append(vs[n])
|
||||
if syl[0] in 'aeo':
|
||||
_vs[n] = "'{}".format(_vs[n])
|
||||
s = ''.join(_vs)
|
||||
self.assertEqual(SYL.findall(s), vs)
|
||||
self.assertEqual(A_SYL.findall(s), vs)
|
||||
|
||||
|
||||
def create_word(accented=False):
|
||||
if accented:
|
||||
tone = lambda: str(random.randint(1, 5))
|
||||
vs = [num_syl_to_acc(s + tone()) for s in VALID_SYLS]
|
||||
else:
|
||||
vs = [s + str(random.randint(1, 5)) for s in VALID_SYLS]
|
||||
word = vs[random.randint(0, len(vs) - 1)]
|
||||
for n in range(1, WORD_LENGTH):
|
||||
num = random.randint(0, len(vs) - 1)
|
||||
word += ['-', ''][random.randint(0, 1)]
|
||||
if VALID_SYLS[num][0] in 'aeo' and word[-1] != '-':
|
||||
word += "'"
|
||||
word += vs[num]
|
||||
return word
|
||||
|
||||
|
||||
class TestPinyinWords(unittest.TestCase):
|
||||
|
||||
def test_number_words(self):
|
||||
for n in range(0, NUM_WORDS):
|
||||
word = create_word()
|
||||
self.assertEqual(WORD.match(word).group(0), word)
|
||||
self.assertEqual(N_WORD.match(word).group(0), word)
|
||||
|
||||
def test_accent_words(self):
|
||||
for n in range(0, NUM_WORDS):
|
||||
word = create_word(accented=True)
|
||||
self.assertEqual(WORD.match(word).group(0), word)
|
||||
self.assertEqual(A_WORD.match(word).group(0), word)
|
||||
|
||||
|
||||
def create_sentence(accented=False):
|
||||
_sent = []
|
||||
for n in range(0, SENT_LENGTH):
|
||||
_sent.append(create_word(accented=accented))
|
||||
sentence = [_sent.pop(0)]
|
||||
sentence.extend([random.choice([' ', ', ', '; ']) + w for w in _sent])
|
||||
return ''.join(sentence) + '.'
|
||||
|
||||
|
||||
class TestPinyinSentences(unittest.TestCase):
|
||||
|
||||
def test_number_sentences(self):
|
||||
for n in range(0, NUM_SENT):
|
||||
sentence = create_sentence()
|
||||
self.assertEqual(SENT.match(sentence).group(0), sentence)
|
||||
self.assertEqual(N_SENT.match(sentence).group(0), sentence)
|
||||
|
||||
def test_accent_sentences(self):
|
||||
for n in range(0, NUM_SENT):
|
||||
sentence = create_sentence(accented=True)
|
||||
self.assertEqual(SENT.match(sentence).group(0), sentence)
|
||||
self.assertEqual(A_SENT.match(sentence).group(0), sentence)
|
@ -0,0 +1,78 @@
|
||||
|
||||
"""Tests for the zhon.zhuyin module."""
|
||||
|
||||
import random
|
||||
import re
|
||||
import unittest
|
||||
|
||||
from zhon import zhuyin
|
||||
|
||||
VALID_SYLS = (
|
||||
'ㄓ', 'ㄔ', 'ㄕ', 'ㄖ', 'ㄗ', 'ㄘ', 'ㄙ', 'ㄚ', 'ㄅㄚ', 'ㄆㄚ', 'ㄇㄚ',
|
||||
'ㄈㄚ', 'ㄉㄚ', 'ㄊㄚ', 'ㄋㄚ', 'ㄌㄚ', 'ㄍㄚ', 'ㄎㄚ', 'ㄏㄚ', 'ㄓㄚ',
|
||||
'ㄔㄚ', 'ㄕㄚ', 'ㄗㄚ', 'ㄘㄚ', 'ㄙㄚ', 'ㄛ', 'ㄅㄛ', 'ㄆㄛ', 'ㄇㄛ',
|
||||
'ㄈㄛ', 'ㄌㄛ', 'ㄜ', 'ㄇㄜ', 'ㄉㄜ', 'ㄊㄜ', 'ㄋㄜ', 'ㄌㄜ', 'ㄍㄜ',
|
||||
'ㄎㄜ', 'ㄏㄜ', 'ㄓㄜ', 'ㄔㄜ', 'ㄕㄜ', 'ㄖㄜ', 'ㄗㄜ', 'ㄘㄜ', 'ㄙㄜ',
|
||||
'ㄝ', 'ㄞ', 'ㄅㄞ', 'ㄆㄞ', 'ㄇㄞ', 'ㄉㄞ', 'ㄊㄞ', 'ㄋㄞ', 'ㄌㄞ', 'ㄍㄞ',
|
||||
'ㄎㄞ', 'ㄏㄞ', 'ㄓㄞ', 'ㄔㄞ', 'ㄕㄞ', 'ㄗㄞ', 'ㄘㄞ', 'ㄙㄞ', 'ㄟ',
|
||||
'ㄅㄟ', 'ㄆㄟ', 'ㄇㄟ', 'ㄈㄟ', 'ㄉㄟ', 'ㄋㄟ', 'ㄌㄟ', 'ㄍㄟ', 'ㄏㄟ',
|
||||
'ㄓㄟ', 'ㄕㄟ', 'ㄗㄟ', 'ㄠ', 'ㄅㄠ', 'ㄆㄠ', 'ㄇㄠ', 'ㄉㄠ', 'ㄊㄠ',
|
||||
'ㄋㄠ', 'ㄌㄠ', 'ㄍㄠ', 'ㄎㄠ', 'ㄏㄠ', 'ㄓㄠ', 'ㄔㄠ', 'ㄕㄠ', 'ㄖㄠ',
|
||||
'ㄗㄠ', 'ㄘㄠ', 'ㄙㄠ', 'ㄡ', 'ㄆㄡ', 'ㄇㄡ', 'ㄈㄡ', 'ㄉㄡ', 'ㄊㄡ',
|
||||
'ㄋㄡ', 'ㄌㄡ', 'ㄍㄡ', 'ㄎㄡ', 'ㄏㄡ', 'ㄓㄡ', 'ㄔㄡ', 'ㄕㄡ', 'ㄖㄡ',
|
||||
'ㄗㄡ', 'ㄘㄡ', 'ㄙㄡ', 'ㄢ', 'ㄅㄢ', 'ㄆㄢ', 'ㄇㄢ', 'ㄈㄢ', 'ㄉㄢ',
|
||||
'ㄊㄢ', 'ㄋㄢ', 'ㄌㄢ', 'ㄍㄢ', 'ㄎㄢ', 'ㄏㄢ', 'ㄓㄢ', 'ㄔㄢ', 'ㄕㄢ',
|
||||
'ㄖㄢ', 'ㄗㄢ', 'ㄘㄢ', 'ㄙㄢ', 'ㄣ', 'ㄅㄣ', 'ㄆㄣ', 'ㄇㄣ', 'ㄈㄣ',
|
||||
'ㄋㄣ', 'ㄍㄣ', 'ㄎㄣ', 'ㄏㄣ', 'ㄓㄣ', 'ㄔㄣ', 'ㄕㄣ', 'ㄖㄣ', 'ㄗㄣ',
|
||||
'ㄘㄣ', 'ㄙㄣ', 'ㄤ', 'ㄅㄤ', 'ㄆㄤ', 'ㄇㄤ', 'ㄈㄤ', 'ㄉㄤ', 'ㄊㄤ',
|
||||
'ㄋㄤ', 'ㄌㄤ', 'ㄍㄤ', 'ㄎㄤ', 'ㄏㄤ', 'ㄓㄤ', 'ㄔㄤ', 'ㄕㄤ', 'ㄖㄤ',
|
||||
'ㄗㄤ', 'ㄘㄤ', 'ㄙㄤ', 'ㄥ', 'ㄅㄥ', 'ㄆㄥ', 'ㄇㄥ', 'ㄈㄥ', 'ㄉㄥ',
|
||||
'ㄊㄥ', 'ㄋㄥ', 'ㄌㄥ', 'ㄍㄥ', 'ㄎㄥ', 'ㄏㄥ', 'ㄓㄥ', 'ㄔㄥ', 'ㄕㄥ',
|
||||
'ㄖㄥ', 'ㄗㄥ', 'ㄘㄥ', 'ㄙㄥ', 'ㄦ', 'ㄧ', 'ㄅㄧ', 'ㄆㄧ', 'ㄇㄧ', 'ㄉㄧ',
|
||||
'ㄊㄧ', 'ㄋㄧ', 'ㄌㄧ', 'ㄐㄧ', 'ㄑㄧ', 'ㄒㄧ', 'ㄧㄚ', 'ㄉㄧㄚ', 'ㄌㄧㄚ',
|
||||
'ㄐㄧㄚ', 'ㄑㄧㄚ', 'ㄒㄧㄚ', 'ㄧㄛ', 'ㄧㄝ', 'ㄅㄧㄝ', 'ㄆㄧㄝ', 'ㄇㄧㄝ',
|
||||
'ㄉㄧㄝ', 'ㄊㄧㄝ', 'ㄋㄧㄝ', 'ㄌㄧㄝ', 'ㄐㄧㄝ', 'ㄑㄧㄝ', 'ㄒㄧㄝ',
|
||||
'ㄧㄞ', 'ㄧㄠ', 'ㄅㄧㄠ', 'ㄆㄧㄠ', 'ㄇㄧㄠ', 'ㄉㄧㄠ', 'ㄊㄧㄠ', 'ㄋㄧㄠ',
|
||||
'ㄌㄧㄠ', 'ㄐㄧㄠ', 'ㄑㄧㄠ', 'ㄒㄧㄠ', 'ㄧㄡ', 'ㄇㄧㄡ', 'ㄉㄧㄡ',
|
||||
'ㄋㄧㄡ', 'ㄌㄧㄡ', 'ㄐㄧㄡ', 'ㄑㄧㄡ', 'ㄒㄧㄡ', 'ㄧㄢ', 'ㄅㄧㄢ',
|
||||
'ㄆㄧㄢ', 'ㄇㄧㄢ', 'ㄉㄧㄢ', 'ㄊㄧㄢ', 'ㄋㄧㄢ', 'ㄌㄧㄢ', 'ㄐㄧㄢ',
|
||||
'ㄑㄧㄢ', 'ㄒㄧㄢ', 'ㄧㄣ', 'ㄅㄧㄣ', 'ㄆㄧㄣ', 'ㄇㄧㄣ', 'ㄋㄧㄣ',
|
||||
'ㄌㄧㄣ', 'ㄐㄧㄣ', 'ㄑㄧㄣ', 'ㄒㄧㄣ', 'ㄧㄤ', 'ㄋㄧㄤ', 'ㄌㄧㄤ',
|
||||
'ㄐㄧㄤ', 'ㄑㄧㄤ', 'ㄒㄧㄤ', 'ㄧㄥ', 'ㄅㄧㄥ', 'ㄆㄧㄥ', 'ㄇㄧㄥ',
|
||||
'ㄉㄧㄥ', 'ㄊㄧㄥ', 'ㄋㄧㄥ', 'ㄌㄧㄥ', 'ㄐㄧㄥ', 'ㄑㄧㄥ', 'ㄒㄧㄥ', 'ㄨ',
|
||||
'ㄅㄨ', 'ㄆㄨ', 'ㄇㄨ', 'ㄈㄨ', 'ㄉㄨ', 'ㄊㄨ', 'ㄋㄨ', 'ㄌㄨ', 'ㄍㄨ',
|
||||
'ㄎㄨ', 'ㄏㄨ', 'ㄓㄨ', 'ㄔㄨ', 'ㄕㄨ', 'ㄖㄨ', 'ㄗㄨ', 'ㄘㄨ', 'ㄙㄨ',
|
||||
'ㄨㄚ', 'ㄍㄨㄚ', 'ㄎㄨㄚ', 'ㄏㄨㄚ', 'ㄓㄨㄚ', 'ㄔㄨㄚ', 'ㄕㄨㄚ', 'ㄨㄛ',
|
||||
'ㄉㄨㄛ', 'ㄊㄨㄛ', 'ㄋㄨㄛ', 'ㄌㄨㄛ', 'ㄍㄨㄛ', 'ㄎㄨㄛ', 'ㄏㄨㄛ',
|
||||
'ㄓㄨㄛ', 'ㄔㄨㄛ', 'ㄕㄨㄛ', 'ㄖㄨㄛ', 'ㄗㄨㄛ', 'ㄘㄨㄛ', 'ㄙㄨㄛ',
|
||||
'ㄨㄞ', 'ㄍㄨㄞ', 'ㄎㄨㄞ', 'ㄏㄨㄞ', 'ㄓㄨㄞ', 'ㄔㄨㄞ', 'ㄕㄨㄞ', 'ㄨㄟ',
|
||||
'ㄉㄨㄟ', 'ㄊㄨㄟ', 'ㄍㄨㄟ', 'ㄎㄨㄟ', 'ㄏㄨㄟ', 'ㄓㄨㄟ', 'ㄔㄨㄟ',
|
||||
'ㄕㄨㄟ', 'ㄖㄨㄟ', 'ㄗㄨㄟ', 'ㄘㄨㄟ', 'ㄙㄨㄟ', 'ㄨㄢ', 'ㄉㄨㄢ',
|
||||
'ㄊㄨㄢ', 'ㄋㄨㄢ', 'ㄌㄨㄢ', 'ㄍㄨㄢ', 'ㄎㄨㄢ', 'ㄏㄨㄢ', 'ㄓㄨㄢ',
|
||||
'ㄔㄨㄢ', 'ㄕㄨㄢ', 'ㄖㄨㄢ', 'ㄗㄨㄢ', 'ㄘㄨㄢ', 'ㄙㄨㄢ', 'ㄨㄣ',
|
||||
'ㄉㄨㄣ', 'ㄊㄨㄣ', 'ㄌㄨㄣ', 'ㄍㄨㄣ', 'ㄎㄨㄣ', 'ㄏㄨㄣ', 'ㄓㄨㄣ',
|
||||
'ㄔㄨㄣ', 'ㄕㄨㄣ', 'ㄖㄨㄣ', 'ㄗㄨㄣ', 'ㄘㄨㄣ', 'ㄙㄨㄣ', 'ㄨㄤ',
|
||||
'ㄍㄨㄤ', 'ㄎㄨㄤ', 'ㄏㄨㄤ', 'ㄓㄨㄤ', 'ㄔㄨㄤ', 'ㄕㄨㄤ', 'ㄨㄥ',
|
||||
'ㄉㄨㄥ', 'ㄊㄨㄥ', 'ㄋㄨㄥ', 'ㄌㄨㄥ', 'ㄍㄨㄥ', 'ㄎㄨㄥ', 'ㄏㄨㄥ',
|
||||
'ㄓㄨㄥ', 'ㄔㄨㄥ', 'ㄖㄨㄥ', 'ㄗㄨㄥ', 'ㄘㄨㄥ', 'ㄙㄨㄥ', 'ㄩ', 'ㄋㄩ',
|
||||
'ㄌㄩ', 'ㄐㄩ', 'ㄑㄩ', 'ㄒㄩ', 'ㄩㄝ', 'ㄋㄩㄝ', 'ㄌㄩㄝ', 'ㄐㄩㄝ',
|
||||
'ㄑㄩㄝ', 'ㄒㄩㄝ', 'ㄩㄢ', 'ㄐㄩㄢ', 'ㄑㄩㄢ', 'ㄒㄩㄢ', 'ㄩㄣ', 'ㄌㄩㄣ',
|
||||
'ㄐㄩㄣ', 'ㄑㄩㄣ', 'ㄒㄩㄣ', 'ㄩㄥ', 'ㄐㄩㄥ', 'ㄑㄩㄥ', 'ㄒㄩㄥ'
|
||||
)
|
||||
|
||||
SYL = re.compile(zhuyin.syllable)
|
||||
|
||||
|
||||
def create_syllable():
|
||||
syl = random.choice(VALID_SYLS)
|
||||
return syl + random.choice(list(zhuyin.marks) + [' '])
|
||||
|
||||
|
||||
class TestZhuyinSyllables(unittest.TestCase):
|
||||
|
||||
def test_zhuyin_syllable(self):
|
||||
vs = []
|
||||
for n in range(0, len(VALID_SYLS)):
|
||||
vs.append(VALID_SYLS[n] + random.choice(list(zhuyin.marks) + ['']))
|
||||
s = ''.join(vs)
|
||||
self.assertEqual(''.join(SYL.findall(s)), s)
|
@ -0,0 +1,39 @@
|
||||
[tox]
|
||||
envlist = py27, py34, py35, py36, pep8, docs, packaging
|
||||
|
||||
[testenv]
|
||||
whitelist_externals = make
|
||||
setenv =
|
||||
PYTHONPATH = {toxinidir}:{toxinidir}/zhon
|
||||
commands = make test
|
||||
deps = -r{toxinidir}/requirements.txt
|
||||
|
||||
[testenv:pep8]
|
||||
whitelist_externals = make
|
||||
deps =
|
||||
flake8
|
||||
pep8-naming
|
||||
flake8-blind-except
|
||||
flake8-builtins
|
||||
flake8-pep3101
|
||||
flake8-string-format
|
||||
commands = make lint
|
||||
|
||||
[testenv:docs]
|
||||
changedir = docs
|
||||
deps =
|
||||
sphinx
|
||||
releases
|
||||
whitelist_externals = make
|
||||
commands =
|
||||
make clean
|
||||
make html
|
||||
make linkcheck
|
||||
|
||||
[testenv:packaging]
|
||||
deps =
|
||||
check-manifest
|
||||
readme_renderer
|
||||
commands =
|
||||
check-manifest
|
||||
python setup.py check -m -r -s
|
@ -0,0 +1,3 @@
|
||||
"""Provides constants used in Chinese text processing."""
|
||||
|
||||
__version__ = '1.1.5'
|
@ -0,0 +1,14 @@
|
||||
"""Provides CC-CEDICT character constants."""
|
||||
|
||||
from . import simplified
|
||||
from . import traditional
|
||||
from . import all
|
||||
|
||||
#: A string containing all Simplified characters according to CC-CEDICT.
|
||||
simp = simplified = simplified.CHARACTERS
|
||||
|
||||
#: A string containing all Traditional characters according to CC-CEDICT.
|
||||
trad = traditional = traditional.CHARACTERS
|
||||
|
||||
#: A string containing all Chinese characters found in CC-CEDICT.
|
||||
all = all.CHARACTERS
|
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@ -0,0 +1,91 @@
|
||||
|
||||
"""Constants for working with Chinese characters."""
|
||||
import sys
|
||||
|
||||
#: Character code ranges for pertinent CJK ideograph Unicode blocks.
|
||||
characters = cjk_ideographs = (
|
||||
'\u3007' # Ideographic number zero, see issue #17
|
||||
'\u4E00-\u9FFF' # CJK Unified Ideographs
|
||||
'\u3400-\u4DBF' # CJK Unified Ideographs Extension A
|
||||
'\uF900-\uFAFF' # CJK Compatibility Ideographs
|
||||
)
|
||||
if sys.maxunicode > 0xFFFF:
|
||||
characters += (
|
||||
'\U00020000-\U0002A6DF' # CJK Unified Ideographs Extension B
|
||||
'\U0002A700-\U0002B73F' # CJK Unified Ideographs Extension C
|
||||
'\U0002B740-\U0002B81F' # CJK Unified Ideographs Extension D
|
||||
'\U0002F800-\U0002FA1F' # CJK Compatibility Ideographs Supplement
|
||||
)
|
||||
|
||||
#: Character code ranges for the Kangxi radicals and CJK Radicals Supplement.
|
||||
radicals = (
|
||||
'\u2F00-\u2FD5' # Kangxi Radicals
|
||||
'\u2E80-\u2EF3' # CJK Radicals Supplement
|
||||
)
|
||||
|
||||
#: A string containing Chinese punctuation marks (non-stops).
|
||||
non_stops = (
|
||||
# Fullwidth ASCII variants
|
||||
'\uFF02\uFF03\uFF04\uFF05\uFF06\uFF07\uFF08\uFF09\uFF0A\uFF0B\uFF0C\uFF0D'
|
||||
'\uFF0F\uFF1A\uFF1B\uFF1C\uFF1D\uFF1E\uFF20\uFF3B\uFF3C\uFF3D\uFF3E\uFF3F'
|
||||
'\uFF40\uFF5B\uFF5C\uFF5D\uFF5E\uFF5F\uFF60'
|
||||
|
||||
# Halfwidth CJK punctuation
|
||||
'\uFF62\uFF63\uFF64'
|
||||
|
||||
# CJK symbols and punctuation
|
||||
'\u3000\u3001\u3003'
|
||||
|
||||
# CJK angle and corner brackets
|
||||
'\u3008\u3009\u300A\u300B\u300C\u300D\u300E\u300F\u3010\u3011'
|
||||
|
||||
# CJK brackets and symbols/punctuation
|
||||
'\u3014\u3015\u3016\u3017\u3018\u3019\u301A\u301B\u301C\u301D\u301E\u301F'
|
||||
|
||||
# Other CJK symbols
|
||||
'\u3030'
|
||||
|
||||
# Special CJK indicators
|
||||
'\u303E\u303F'
|
||||
|
||||
# Dashes
|
||||
'\u2013\u2014'
|
||||
|
||||
# Quotation marks and apostrophe
|
||||
'\u2018\u2019\u201B\u201C\u201D\u201E\u201F'
|
||||
|
||||
# General punctuation
|
||||
'\u2026\u2027'
|
||||
|
||||
# Overscores and underscores
|
||||
'\uFE4F'
|
||||
|
||||
# Small form variants
|
||||
'\uFE51\uFE54'
|
||||
|
||||
# Latin punctuation
|
||||
'\u00B7'
|
||||
)
|
||||
|
||||
#: A string of Chinese stops.
|
||||
stops = (
|
||||
'\uFF01' # Fullwidth exclamation mark
|
||||
'\uFF1F' # Fullwidth question mark
|
||||
'\uFF61' # Halfwidth ideographic full stop
|
||||
'\u3002' # Ideographic full stop
|
||||
)
|
||||
|
||||
#: A string containing all Chinese punctuation.
|
||||
punctuation = non_stops + stops
|
||||
|
||||
# A sentence end is defined by a stop followed by zero or more
|
||||
# container-closing marks (e.g. quotation or brackets).
|
||||
_sentence_end = '[{stops}]*'.format(stops=stops) + '[」﹂”』’》）］｝〕〗〙〛〉】]*'
|
||||
|
||||
#: A regular expression pattern for a Chinese sentence. A sentence is defined
|
||||
#: as a series of characters and non-stop punctuation marks followed by a stop
|
||||
#: and zero or more container-closing punctuation marks (e.g. apostrophe or
|
||||
#: brackets).
|
||||
sent = sentence = '[{characters}{radicals}{non_stops}]*{sentence_end}'.format(
|
||||
characters=characters, radicals=radicals, non_stops=non_stops,
|
||||
sentence_end=_sentence_end)
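As a rough illustration of how a sentence pattern of this shape behaves, here is a minimal sketch that uses a much smaller character set than the module does. The constants `CHARS`, `NON_STOPS`, `STOPS`, and `CLOSERS` and the helper `find_sentences` are assumptions made for this example. Unlike the pattern above, the sketch requires at least one character and exactly one stop, so it never produces empty matches.

```python
import re

# Simplified stand-ins for the module's constants (illustrative assumptions).
CHARS = '\u4E00-\u9FFF'            # CJK Unified Ideographs only
NON_STOPS = '\uFF0C\u3001\uFF1A'   # fullwidth comma, ideographic comma, fullwidth colon
STOPS = '\uFF01\uFF1F\u3002'       # fullwidth '!', fullwidth '?', ideographic full stop
CLOSERS = '\u300D\u201D\u300B'     # closing corner bracket, right quote, closing angle bracket

SENTENCE = re.compile(
    '[{chars}{non_stops}]+[{stops}][{closers}]*'.format(
        chars=CHARS, non_stops=NON_STOPS, stops=STOPS, closers=CLOSERS))


def find_sentences(text):
    """Return every run of characters that ends in a stop."""
    return SENTENCE.findall(text)
```

For example, `find_sentences('我买了一辆车。你呢？')` splits the text at each stop and keeps the stop with its sentence.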
|
@ -0,0 +1,181 @@
|
||||
|
||||
"""Constants for processing Pinyin strings."""
|
||||
from string import whitespace
|
||||
|
||||
_a = 'a\u0101\u00E0\u00E1\u01CE'
|
||||
_e = 'e\u0113\u00E9\u011B\u00E8'
|
||||
_i = 'i\u012B\u00ED\u01D0\u00EC'
|
||||
_o = 'o\u014D\u00F3\u01D2\u00F2'
|
||||
_u = 'u\u016B\u00FA\u01D4\u00F9'
|
||||
_v = 'v\u00FC\u01D6\u01D8\u01DA\u01DC'
|
||||
|
||||
_lowercase_vowels = _a + _e + _i + _o + _u + _v
|
||||
_uppercase_vowels = _lowercase_vowels.upper()
|
||||
_lowercase_consonants = 'bpmfdtnlgkhjqxzcsrwy'
|
||||
_uppercase_consonants = _lowercase_consonants.upper()
|
||||
|
||||
#: A string containing every Pinyin vowel (lowercase and uppercase).
|
||||
vowels = _lowercase_vowels + _uppercase_vowels
|
||||
|
||||
#: A string containing every Pinyin consonant (lowercase and uppercase).
|
||||
consonants = _lowercase_consonants + _uppercase_consonants
|
||||
|
||||
#: A string containing every lowercase Pinyin character.
|
||||
lowercase = _lowercase_consonants + _lowercase_vowels
|
||||
|
||||
#: A string containing every uppercase Pinyin character.
|
||||
uppercase = _uppercase_consonants + _uppercase_vowels
|
||||
|
||||
#: A string containing all Pinyin marks that have special meaning:
|
||||
#: middle dot and numbers for tones, colon for easily writing \u00FC ('u:'),
|
||||
#: hyphen for connecting syllables within words, and apostrophe for
|
||||
#: separating a syllable beginning with a vowel from the previous syllable
|
||||
#: in its word. All of these marks can be used within a valid Pinyin word.
|
||||
marks = "·012345:-'"
|
||||
|
||||
#: A string containing valid punctuation marks that are not stops.
|
||||
non_stops = """"#$%&'()*+,-/:;<=>@[\\]^_`{|}~"""
|
||||
|
||||
#: A string containing valid stop punctuation marks.
|
||||
stops = '.!?'
|
||||
|
||||
#: A string containing all punctuation marks.
|
||||
punctuation = non_stops + stops
|
||||
|
||||
#: A string containing all printable Pinyin characters, marks, punctuation,
|
||||
#: and whitespace.
|
||||
printable = vowels + consonants + marks[:-3] + whitespace + punctuation
|
||||
|
||||
_a_vowels = {'a': _a, 'e': _e, 'i': _i, 'o': _o, 'u': _u, 'v': _v}
|
||||
_n_vowels = {'a': 'a', 'e': 'e', 'i': 'i', 'o': 'o', 'u': 'u', 'v': 'v\u00FC'}
|
||||
|
||||
|
||||
def _build_syl(vowels, tone_numbers=False):
|
||||
"""Builds a Pinyin syllable re pattern.
|
||||
|
||||
Syllables can be preceded by a middle dot (tone mark). Syllables that end
|
||||
in a consonant are only valid if they aren't followed directly by a vowel
|
||||
with no apostrophe in between.
|
||||
|
||||
The rough approach used to validate a Pinyin syllable is:
|
||||
1. Get the longest valid syllable.
|
||||
2. If it ends in a consonant make sure it's not followed directly by a
|
||||
vowel (hyphens and apostrophes don't count).
|
||||
3. If the above didn't match, repeat for the next longest valid match.
|
||||
|
||||
Lookahead assertions are used to ensure that hyphens and apostrophes are
|
||||
only considered valid if used correctly. This helps to weed out non-Pinyin
|
||||
strings.
|
||||
|
||||
"""
|
||||
# This is the end-of-syllable-consonant lookahead assertion.
|
||||
consonant_end = '(?![{a}{e}{i}{o}{u}{v}]|u:)'.format(
|
||||
a=_a, e=_e, i=_i, o=_o, u=_u, v=_v
|
||||
)
|
||||
_vowels = vowels.copy()
|
||||
for v, s in _vowels.items():
|
||||
if len(s) > 1:
|
||||
_vowels[v] = '[{}]'.format(s)
|
||||
return (
|
||||
'(?:\u00B7|\u2027)?'
|
||||
'(?:'
|
||||
'(?:(?:[zcs]h|[gkh])u%(a)sng%(consonant_end)s)|'
|
||||
'(?:[jqx]i%(o)sng%(consonant_end)s)|'
|
||||
'(?:[nljqx]i%(a)sng%(consonant_end)s)|'
|
||||
'(?:(?:[zcs]h?|[dtnlgkhrjqxy])u%(a)sn%(consonant_end)s)|'
|
||||
'(?:(?:[zcs]h|[gkh])u%(a)si)|'
|
||||
'(?:(?:[zc]h?|[rdtnlgkhsy])%(o)sng%(consonant_end)s)|'
|
||||
'(?:(?:[zcs]h?|[rbpmfdtnlgkhw])?%(e)sng%(consonant_end)s)|'
|
||||
'(?:(?:[zcs]h?|[rbpmfdtnlgkhwy])?%(a)sng%(consonant_end)s)|'
|
||||
'(?:[bpmdtnljqxy]%(i)sng%(consonant_end)s)|'
|
||||
'(?:[bpmdtnljqx]i%(a)sn%(consonant_end)s)|'
|
||||
'(?:[bpmdtnljqx]i%(a)so)|'
|
||||
'(?:[nl](?:v|u:|\u00FC)%(e)s)|'
|
||||
'(?:[nl](?:%(v)s|u:))|'
|
||||
'(?:[jqxy]u%(e)s)|'
|
||||
'(?:[bpmnljqxy]%(i)sn%(consonant_end)s)|'
|
||||
'(?:[mdnljqx]i%(u)s)|'
|
||||
'(?:[bpmdtnljqx]i%(e)s)|'
|
||||
'(?:[dljqx]i%(a)s)|'
|
||||
'(?:(?:[zcs]h?|[rdtnlgkhxqjy])%(u)sn%(consonant_end)s)|'
|
||||
'(?:(?:[zcs]h?|[rdtgkh])u%(i)s)|'
|
||||
'(?:(?:[zcs]h?|[rdtnlgkh])u%(o)s)|'
|
||||
'(?:(?:[zcs]h|[rgkh])u%(a)s)|'
|
||||
'(?:(?:[zcs]h?|[rbpmfdngkhw])?%(e)sn%(consonant_end)s)|'
|
||||
'(?:(?:[zcs]h?|[rbpmfdtnlgkhwy])?%(a)sn%(consonant_end)s)|'
|
||||
'(?:(?:[zcs]h?|[rpmfdtnlgkhy])?%(o)su)|'
|
||||
'(?:(?:[zcs]h?|[rbpmdtnlgkhy])?%(a)so)|'
|
||||
'(?:(?:[zs]h|[bpmfdtnlgkhwz])?%(e)si)|'
|
||||
'(?:(?:[zcs]h?|[bpmdtnlgkhw])?%(a)si)|'
|
||||
'(?:(?:[zcs]h?|[rjqxybpmdtnl])%(i)s)|'
|
||||
'(?:(?:[zcs]h?|[rwbpmfdtnlgkhjqxwy])%(u)s)|'
|
||||
'(?:%(e)s(?:r%(consonant_end)s)?)|'
|
||||
'(?:(?:[zcs]h?|[rmdtnlgkhy])%(e)s)|'
|
||||
'(?:[bpmfwyl]?%(o)s)|'
|
||||
'(?:(?:[zcs]h|[bpmfdtnlgkhzcswy])?%(a)s)|'
|
||||
'(?:r%(consonant_end)s)'
|
||||
')' + ('[0-5]?' if tone_numbers else '')
|
||||
) % {
|
||||
'consonant_end': consonant_end, 'a': _vowels['a'], 'e': _vowels['e'],
|
||||
'i': _vowels['i'], 'o': _vowels['o'], 'u': _vowels['u'],
|
||||
'v': _vowels['v']
|
||||
}
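The "longest syllable first, then check what follows a final consonant" approach described in the `_build_syl` docstring can be seen in a toy pattern. The sketch below covers only a handful of syllables and plain ASCII vowels. `TOY_SYL` and `CONSONANT_END` are hypothetical names, not the module's.

```python
import re

# Negative lookahead: a syllable-final consonant must not be followed by a vowel.
CONSONANT_END = '(?![aeiou])'

# Toy alternation, longest candidates first (an assumption for illustration).
TOY_SYL = re.compile(
    '(?:fang{ce}|gan{ce}|fan{ce}|ang{ce}|an{ce}|fa|ge)'.format(
        ce=CONSONANT_END))

# 'fang' is rejected in 'fangan' because an 'a' follows the final 'g',
# so the pattern falls back to 'fan' and then reads 'gan'.
split_plain = TOY_SYL.findall('fangan')

# With an apostrophe, 'fang' is followed by a non-vowel and is accepted.
split_apostrophe = TOY_SYL.findall("fang'an")
```

The fallback happens through ordinary regex backtracking across the alternation, which is why the order of the alternatives matters.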
|
||||
|
||||
|
||||
def _build_word(syl, vowels):
|
||||
"""Builds a Pinyin word re pattern from a Pinyin syllable re pattern.
|
||||
|
||||
A word is defined as a series of consecutive valid Pinyin syllables
|
||||
with optional hyphens and apostrophes interspersed. Hyphens must be
|
||||
followed immediately by another valid Pinyin syllable. Apostrophes must be
|
||||
followed by another valid Pinyin syllable that starts with an 'a', 'e', or
|
||||
'o'.
|
||||
|
||||
"""
|
||||
return "(?:{syl}(?:-(?={syl})|'(?=[{a}{e}{o}])(?={syl}))?)+".format(
|
||||
syl=syl, a=vowels['a'], e=vowels['e'], o=vowels['o'])
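To see the hyphen and apostrophe lookaheads in isolation, the same word template can be applied to a toy syllable alternation. `TOY_SYL`, `TOY_WORD`, and `is_word` are hypothetical names for this sketch, and the five syllables are an arbitrary sample.

```python
import re

# Toy syllable alternation standing in for the real syllable pattern.
TOY_SYL = '(?:xi|an|ni|hao|er)'

# Same shape as the word template: a hyphen needs a syllable after it,
# an apostrophe needs a syllable that starts with 'a', 'e', or 'o'.
TOY_WORD = "(?:{syl}(?:-(?={syl})|'(?=[aeo])(?={syl}))?)+".format(syl=TOY_SYL)


def is_word(text):
    """Return True if the whole string is one valid toy word."""
    return re.fullmatch(TOY_WORD, text) is not None
```

Under these assumptions, `is_word("xi'an")` and `is_word('ni-hao')` hold, while a trailing hyphen (`'ni-'`) or an apostrophe before a consonant (`"xi'ni"`) is rejected.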
|
||||
|
||||
|
||||
def _build_sentence(word):
|
||||
"""Builds a Pinyin sentence re pattern from a Pinyin word re pattern.
|
||||
|
||||
A sentence is defined as a series of valid Pinyin words, punctuation
|
||||
(non-stops), and spaces followed by a single stop and zero or more
|
||||
container-closing punctuation marks (e.g. apostrophe and brackets).
|
||||
|
||||
"""
|
||||
return (
|
||||
"(?:{word}|[{non_stops}]|(?<![{stops} ]) )+"
|
||||
"[{stops}]['\"\\]\\}}\\)]*"
|
||||
).format(word=word, non_stops=non_stops.replace('-', '\\-'),
|
||||
stops=stops)
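The space handling in the sentence template relies on a negative lookbehind: a space is accepted only when the previous character is neither a stop nor another space. A minimal sketch with a toy word pattern follows. `TOY_WORD`, `TOY_NON_STOPS`, `TOY_STOPS`, and `TOY_SENT` are assumptions for illustration, not the module's values.

```python
import re

TOY_WORD = '[a-z]+'     # stand-in for a Pinyin word pattern
TOY_NON_STOPS = ',;'
TOY_STOPS = '.!?'

# Same shape as the sentence template, with backslashes doubled so the
# string contains no invalid escape sequences.
TOY_SENT = (
    "(?:{word}|[{non_stops}]|(?<![{stops} ]) )+"
    "[{stops}]['\"\\]\\}}\\)]*"
).format(word=TOY_WORD, non_stops=TOY_NON_STOPS, stops=TOY_STOPS)

sentences = re.findall(TOY_SENT, 'ni hao, shijie. zai jian!')
```

Because the space after `shijie.` follows a stop, it cannot begin the next sentence, so the second match starts at `zai`.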
|
||||
|
||||
|
||||
#: A regular expression pattern for a valid accented Pinyin syllable.
|
||||
a_syl = acc_syl = accented_syllable = _build_syl(_a_vowels, tone_numbers=False)
|
||||
|
||||
#: A regular expression pattern for a valid numbered Pinyin syllable.
|
||||
n_syl = num_syl = numbered_syllable = _build_syl(_n_vowels, tone_numbers=True)
|
||||
|
||||
#: A regular expression pattern for a valid Pinyin syllable.
|
||||
syl = syllable = _build_syl(_a_vowels, tone_numbers=True)
|
||||
|
||||
|
||||
#: A regular expression pattern for a valid accented Pinyin word.
|
||||
a_word = acc_word = accented_word = _build_word(a_syl, _a_vowels)
|
||||
|
||||
#: A regular expression pattern for a valid numbered Pinyin word.
|
||||
n_word = num_word = numbered_word = _build_word(n_syl, _n_vowels)
|
||||
|
||||
#: A regular expression pattern for a valid Pinyin word.
|
||||
word = _build_word(syl, _a_vowels)
|
||||
|
||||
|
||||
#: A regular expression pattern for a valid accented Pinyin sentence.
|
||||
a_sent = acc_sent = accented_sentence = _build_sentence(a_word)
|
||||
|
||||
#: A regular expression pattern for a valid numbered Pinyin sentence.
|
||||
n_sent = num_sent = numbered_sentence = _build_sentence(n_word)
|
||||
|
||||
#: A regular expression pattern for a valid Pinyin sentence.
|
||||
sent = sentence = _build_sentence(word)
|
@ -0,0 +1,47 @@
|
||||
|
||||
"""Constants for working with Zhuyin (Bopomofo)."""
|
||||
|
||||
#: A string containing all Zhuyin characters.
|
||||
characters = (
|
||||
'ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙ'
|
||||
'ㄚㄛㄝㄜㄞㄟㄠㄡㄢㄣㄤㄥㄦㄧㄨㄩㄭ'
|
||||
)
|
||||
|
||||
#: A string containing all Zhuyin tone marks.
|
||||
marks = (
|
||||
'\u02C7' # Caron
|
||||
'\u02CA' # Modifier letter acute accent
|
||||
'\u02CB' # Modifier letter grave accent
|
||||
'\u02D9' # Dot above
|
||||
)
|
||||
|
||||
#: A regular expression pattern for a Zhuyin syllable.
|
||||
syl = syllable = (
|
||||
'(?:'
|
||||
'[ㄇㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄜ|'
|
||||
'[ㄅㄆㄇㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄗㄘㄙㄧ]?ㄞ|'
|
||||
'[ㄅㄆㄇㄈㄉㄋㄌㄍㄏㄓㄕㄗ]?ㄟ|'
|
||||
'[ㄅㄆㄇㄈㄋㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄣ|'
|
||||
'[ㄉㄌㄐㄑㄒ]?ㄧㄚ|'
|
||||
'[ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄗㄘㄙ]?ㄚ|'
|
||||
'[ㄅㄆㄇㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄠ|'
|
||||
'[ㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄡ|'
|
||||
'[ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄢ|'
|
||||
'[ㄇㄉㄋㄌㄐㄑㄒ]?ㄧㄡ|'
|
||||
'[ㄅㄆㄇㄋㄌㄐㄑㄒ]?ㄧㄣ|'
|
||||
'[ㄐㄑㄒ]?ㄩ[ㄢㄥ]|'
|
||||
'[ㄌㄐㄑㄒ]?ㄩㄣ|'
|
||||
'[ㄋㄌㄐㄑㄒ]?(?:ㄩㄝ?|ㄧㄤ)|'
|
||||
'[ㄅㄆㄇㄈㄌㄧ]?ㄛ|'
|
||||
'[ㄅㄆㄇㄉㄊㄋㄌㄐㄑㄒ]?ㄧ[ㄝㄠㄢㄥ]?|'
|
||||
'[ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?[ㄤㄥ]|'
|
||||
'[ㄍㄎㄏㄓㄔㄕ]?ㄨ[ㄚㄞㄤ]|'
|
||||
'[ㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄨㄛ|'
|
||||
'[ㄉㄊㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄨㄟ|'
|
||||
'[ㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄨㄢ|'
|
||||
'[ㄉㄊㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄨㄣ|'
|
||||
'[ㄉㄊㄋㄌㄍㄎㄏㄓㄔㄖㄗㄘㄙ]?ㄨㄥ|'
|
||||
'[ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄓㄔㄕㄖㄗㄘㄙ]?ㄨ|'
|
||||
'[ㄓㄔㄕㄖㄗㄘㄙㄝㄦㄧ]'
|
||||
')[{marks}]?'
|
||||
).format(marks=marks)
|