remove useless

4 years ago · 8983880c75
parent 28974ab7ec
commit 8983880c75
3 changed files with 49 additions and 800 deletions
--- a/third_party/phkit/phkit/pinyinkit/README.md
+++ b/third_party/phkit/phkit/pinyinkit/README.md
@@ -1,294 +0,0 @@
-Chinese character to pinyin conversion tool (Python version)
-==============================================================
-
-|Build| |appveyor| |Coverage| |Pypi version| |DOI|
-
-
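-Converts Chinese characters to pinyin. Useful for phonetic annotation, sorting, and lookup of Chinese characters (`Russian translation`_).
-
-Developed on the basis of `hotoo/pinyin <https://github.com/hotoo/pinyin>`__.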
-
-* Documentation: http://pypinyin.rtfd.io/
-* GitHub: https://github.com/mozillazg/python-pinyin
-* License: MIT license
-* PyPI: https://pypi.org/project/pypinyin
-* Python version: 2.7, pypy, pypy3, 3.4, 3.5, 3.6, 3.7, 3.8
-
-.. contents::
-
-
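-Features
---------
-
-* Intelligently matches the most accurate pinyin based on phrases.
-* Supports heteronyms (characters with multiple readings).
-* Basic support for traditional Chinese characters and for bopomofo (zhuyin).
-* Supports many different pinyin and bopomofo styles.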
-
-
-Installation
--------------
-
-.. code-block:: bash
-
-    $ pip install pypinyin
-
-
-Usage examples
----------------
-
-Python 3 (under Python 2, replace ``'中心'`` with ``u'中心'``):
-
-.. code-block:: python
-
-    >>> from pypinyin import pinyin, lazy_pinyin, Style
-    >>> pinyin('中心')
-    [['zhōng'], ['xīn']]
-    >>> pinyin('中心', heteronym=True)  # enable heteronym mode
-    [['zhōng', 'zhòng'], ['xīn']]
-    >>> pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
-    [['z'], ['x']]
-    >>> pinyin('中心', style=Style.TONE2, heteronym=True)
-    [['zho1ng', 'zho4ng'], ['xi1n']]
-    >>> pinyin('中心', style=Style.TONE3, heteronym=True)
-    [['zhong1', 'zhong4'], ['xin1']]
-    >>> pinyin('中心', style=Style.BOPOMOFO)  # bopomofo (zhuyin) style
-    [['ㄓㄨㄥ'], ['ㄒㄧㄣ']]
-    >>> lazy_pinyin('中心')  # ignore heteronyms
-    ['zhong', 'xin']
-
-
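-**Notes**:
-
-* The pinyin output does not mark which finals carry the neutral tone; neutral-tone finals have no tone mark or digit (to mark the neutral tone with ``5``, see the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/contrib.html#neutraltonewith5mixin>`__).
-* In styles without tone marks, the result uses ``v`` to represent ``ü`` (to use ``ü`` instead of ``v``, see the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/contrib.html#v2umixin>`__).
-
-Command-line tool: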
-
-.. code-block:: console
-
-    $ pypinyin 音乐
-    yīn yuè
-    $ pypinyin -h
-
-
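-Documentation
---------------
-
-For detailed documentation, see http://pypinyin.rtfd.io/ .
-
-For questions about developing the project code, see the `development documentation <https://pypinyin.readthedocs.io/zh_CN/develop/develop.html>`__.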
-
-
-FAQ
---------
-
-Wrong pinyin for a heteronym inside a phrase?
-++++++++++++++++++++++++++++++++++++++++++++++++
-
-Heteronyms are currently resolved through a phrase pinyin dictionary. If a reading is wrong,
-you can define custom phrase pinyin to correct the pinyin within that phrase:
-
-.. code-block:: python
-
-    >>> from pypinyin import Style, pinyin, load_phrases_dict
-    >>> pinyin('步履蹒跚')
-    [['bù'], ['lǚ'], ['mán'], ['shān']]
-    >>> load_phrases_dict({'步履蹒跚': [['bù'], ['lǚ'], ['pán'], ['shān']]})
-    >>> pinyin('步履蹒跚')
-    [['bù'], ['lǚ'], ['pán'], ['shān']]
-
-See the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#custom-dict>`__ for details.
-
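-Why are y, w, and yu not returned as initials?
-++++++++++++++++++++++++++++++++++++++++++++++++
-
-.. code-block:: python
-
-    >>> from pypinyin import Style, pinyin
-    >>> pinyin('下雨天', style=Style.INITIALS)
-    [['x'], [''], ['t']]
-
-Because according to the `Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.edu.cn/s78/A19/yxs_left/moe_810/s230/195802/t19580201_186000.html>`__ ,
-y, w, and ü (yu) are not initials.
-
-    In the initials style (INITIALS), characters such as “雨”, “我”, and “圆” return an empty string, because according to the
-    `Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.edu.cn/s78/A19/yxs_left/moe_810/s230/195802/t19580201_186000.html>`__ ,
-    y, w, and ü (yu) are not initials; y or w is added only when certain finals have no initial, and ü has its own specific rules.    -- @hotoo
-
-    **If this causes you trouble, also watch out for characters that have no initial at all (such as “啊”, “饿”, “按”, and “昂”).
-    In that case what you may need is the first-letter style (FIRST_LETTER)**.    -- @hotoo
-
-    References: `hotoo/pinyin#57 <https://github.com/hotoo/pinyin/issues/57>`__,
-    `#22 <https://github.com/mozillazg/python-pinyin/pull/22>`__,
-    `#27 <https://github.com/mozillazg/python-pinyin/issues/27>`__,
-    `#44 <https://github.com/mozillazg/python-pinyin/issues/44>`__
-
-If this is not the behavior you want and you would like y to be treated as an initial, you can pass ``strict=False``,
-which may match your expectations:
-
-.. code-block:: python
-
-    >>> from pypinyin import Style, pinyin
-    >>> pinyin('下雨天', style=Style.INITIALS)
-    [['x'], [''], ['t']]
-    >>> pinyin('下雨天', style=Style.INITIALS, strict=False)
-    [['x'], ['y'], ['t']]
-
-See `Effect of the strict parameter <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#strict>`__ for details.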
-
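-How to reduce memory usage
-++++++++++++++++++++++++++++
-
-If pinyin accuracy is not a major concern, you can save memory by setting the environment variables ``PYPINYIN_NO_PHRASES``
-and ``PYPINYIN_NO_DICT_COPY``.
-See the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/faq.html#no-phrases>`__ for details.
-
-
-For more FAQs, see the
-`FAQ <https://pypinyin.readthedocs.io/zh_CN/master/faq.html>`__ section of the documentation.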
-
-
-.. _#13 : https://github.com/mozillazg/python-pinyin/issues/113
-.. _strict 参数的影响: https://pypinyin.readthedocs.io/zh_CN/master/usage.html#strict
-
-
-Pinyin data
-------------
-
-* Pinyin for single characters comes from `pinyin-data`_.
-* Pinyin for phrases comes from `phrase-pinyin-data`_.
-
-
-Related Projects
-----------------
-
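-* `hotoo/pinyin`__: the Node.js/JavaScript version of this Chinese character to pinyin conversion tool.
-* `mozillazg/go-pinyin`__: the Go version of this Chinese character to pinyin conversion tool.
-* `mozillazg/rust-pinyin`__: the Rust version of this Chinese character to pinyin conversion tool.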
-
-
-__ https://github.com/hotoo/pinyin
-__ https://github.com/mozillazg/go-pinyin
-__ https://github.com/mozillazg/rust-pinyin
-
-
-.. |Build| image:: https://img.shields.io/circleci/project/github/mozillazg/python-pinyin/master.svg
-   :target: https://circleci.com/gh/mozillazg/python-pinyin
-.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/ni8gdyextfa85yqo/branch/master?svg=true
-   :target: https://ci.appveyor.com/project/mozillazg/python-pinyin
-.. |Coverage| image:: https://img.shields.io/codecov/c/github/mozillazg/python-pinyin/master.svg
-   :target: https://codecov.io/gh/mozillazg/python-pinyin
-.. |PyPI version| image:: https://img.shields.io/pypi/v/pypinyin.svg
-   :target: https://pypi.org/project/pypinyin/
-.. |DOI| image:: https://zenodo.org/badge/12830126.svg
-   :target: https://zenodo.org/badge/latestdoi/12830126
-
-
-
-.. _Russian translation: https://github.com/mozillazg/python-pinyin/blob/master/README_ru.rst
-.. _pinyin-data: https://github.com/mozillazg/pinyin-data
-.. _phrase-pinyin-data: https://github.com/mozillazg/phrase-pinyin-data
-.. _开发文档: https://pypinyin.readthedocs.io/zh_CN/develop/develop.html
-
-
-
-# pinyin-data [![Build Status](https://travis-ci.org/mozillazg/pinyin-data.svg?branch=master)](https://travis-ci.org/mozillazg/pinyin-data)
-
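-Chinese character pinyin data.
-
-
-## Data description
-
-Format of the pinyin data:
-
-    {code point}: {pinyins}  # {hanzi} {comments}
-
-* Lines starting with `#` are comments, and any text after a `#` within a line is also a comment
-* Multiple pinyin readings in `{pinyins}` are separated by commas
-* Example:
-
-        # comment
-        U+4E2D: zhōng,zhòng  # 中
-
-
-[Unihan Database][unihan] data version: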
-
-> Date: 2018-11-09 21:36:19 GMT [JHJ]    
-> Unicode version: 12.0.0
-
-* `kHanyuPinyin.txt`: pinyin data from the [kHanyuPinyin](http://www.unicode.org/reports/tr38/#kHanyuPinyin) field of the [Unihan Database][unihan] (pinyin data sourced from《漢語大字典》)
-* `kXHC1983.txt`: pinyin data from the [kXHC1983](http://www.unicode.org/reports/tr38/#kXHC1983) field of the [Unihan Database][unihan] (pinyin data sourced from《现代汉语词典》)
-* `kHanyuPinlu.txt`: pinyin data from the [kHanyuPinlu](http://www.unicode.org/reports/tr38/#kHanyuPinlu) field of the [Unihan Database][unihan] (pinyin data sourced from《現代漢語頻率詞典》)
-* `kMandarin.txt`: pinyin data from the [kMandarin](http://www.unicode.org/reports/tr38/#kMandarin) field of the [Unihan Database][unihan] (the single most common Mandarin reading; zh-CN takes precedence, and the zh-TW reading is used when zh-CN has none)
-* `kMandarin_overwrite.txt`: manual corrections to incorrect pinyin data in `kMandarin.txt` (**may be edited**)
-* `GBK_PUA.txt`: Chinese characters in the [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) that have pinyin; see [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA) (**may be edited**)
-* `nonCJKUI.txt`: characters that are not [CJK Unified Ideographs](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) but nevertheless have pinyin (**may be edited**)
-* `kanji.txt`: pinyin data for [kanji coined in Japan](https://zh.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97#7_%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97%E7%9A%84%E6%B1%89%E8%AF%AD%E6%99%AE%E9%80%9A%E8%AF%9D%E8%A7%84%E8%8C%83%E8%AF%BB%E9%9F%B3%E8%A1%A8) (**may be edited**)
-* `kMandarin_8105.txt`: the single most common reading of each of the 8105 characters in [《通用规范汉字表》](https://zh.wikipedia.org/wiki/通用规范汉字表) (2013 edition) (**may be edited**)
-* `overwrite.txt`: manually corrected pinyin data (**may be edited**)
-* `pinyin.txt`: pinyin data produced by merging the files above
-* `zdic.txt`: pinyin data from [zdic.net (汉典网)](http://zdic.net) (**may be edited**)
-
-
-## References
-
-* [Scheme for the Chinese Phonetic Alphabet (汉语拼音方案)](http://www.moe.edu.cn/s78/A19/yxs_left/moe_810/s230/195802/t19580201_186000.html)
-* [Unihan Database Lookup](http://www.unicode.org/charts/unihan.html)
-* [汉典 zdic.net](http://www.zdic.net/)
-* [字海网，叶典网](http://zisea.com/)
-* [国学大师_国学网](http://www.guoxuedashi.com/)
-* [Chinese characters in Unicode, GB2312, GBK, and GB18030](http://www.fmddlmyy.cn/text24.html)
-* [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA)
-* [通用规范汉字表 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8)
-* [China’s 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo)](https://blogs.adobe.com/CCJKType/2014/03/china-8105.html)
-* [Standard Chinese readings of Japanese kanji (日本汉字的汉语读音规范)](http://www.moe.gov.cn/s78/A19/yxs_left/moe_810/s230/201001/t20100115_75698.html)
-* [Table of standard Mandarin readings of Japanese kanji - Wikipedia](https://zh.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97#7_%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97%E7%9A%84%E6%B1%89%E8%AF%AD%E6%99%AE%E9%80%9A%E8%AF%9D%E8%A7%84%E8%8C%83%E8%AF%BB%E9%9F%B3%E8%A1%A8)
-
-[unihan]: http://www.unicode.org/charts/unihan.html
-
-
-# phrase-pinyin-data [![Build Status](https://travis-ci.org/mozillazg/phrase-pinyin-data.svg?branch=master)](https://travis-ci.org/mozillazg/phrase-pinyin-data)
-
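-Phrase pinyin data.
-
-
-## Data description
-
-Format of the pinyin data:
-
-```
-{phrase}: {pinyin}
-```
-
-* Lines starting with `#` are comments
-* A `#` at the end of a line also starts a comment
-* `{phrase}`: a phrase written in Chinese characters
-* `{pinyin}`: the pinyin of the phrase, with the pinyin of each character separated by spaces
-* Each line holds one reading of one phrase; a phrase with several readings appears on several lines
-* Example:
-
-  ```
-  # comment
-  中国: zhōng guó
-  北京: běi jīng  # comment
-  ```
-
-File descriptions:
-
-* `overwrite.txt`: manually corrected pinyin data
-* `pinyin.txt`: pinyin data after `pinyin.txt + overwrite.txt`
-* `zdic_cibs.txt`: pinyin data from the [zdic.net (汉典网)](http://www.zdic.net/) Chinese word dictionary
-* `zdic_cybs.txt`: pinyin data from the [zdic.net (汉典网)](http://www.zdic.net/) idiom dictionary
-* `cc_cedict.txt`: pinyin data from [cc-cedict.org](https://cc-cedict.org/)
-* `large_pinyin.txt`: pinyin data after `zdic_cibs.txt + zdic_cybs.txt + cc_cedict.txt + pinyin.txt + overwrite.txt`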
-
-
-## References
-
-* The initial data is based on [phrases-dict.js](https://github.com/hotoo/pinyin/blob/05f74496c34ccb32db1a0fd0b358a798a22a51e5/data/phrases-dict.js) and [phrases_dict.py](https://github.com/mozillazg/python-pinyin/blob/366de0363ff1fb9a718ce668448bea59de09a4bf/pypinyin/phrases_dict.py)
-* [汉典 zdic.net](http://www.zdic.net/)
-* [字海网，叶典网](http://zisea.com/)
-* [国学大师_国学网](http://www.guoxuedashi.com/)
-* [CC-CEDICT download - MDBG English to Chinese dictionary](http://www.mdbg.net/chindict/chindict.php?page=cc-cedict)
-
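
The single-character data format described in the removed pinyin-data README (`{code point}: {pinyins}  # {hanzi} {comments}`) can be parsed with the standard library alone. The following is a minimal sketch under stated assumptions, not the project's code: the names `parse_pinyin_line` and `parse_pinyin_lines` are hypothetical, and the returned dict only mirrors the `{0x963F: u"ā,ē"}` shape shown in the `load_single_dict` docstring further down in this diff.

```python
import re

# Matches data lines such as "U+4E2D: zhōng,zhòng  # 中"; the trailing comment is optional.
_LINE_RE = re.compile(r"^U\+([0-9A-Fa-f]+):\s*([^#]*?)\s*(?:#.*)?$")


def parse_pinyin_line(line):
    """Return (code_point, [pinyin, ...]), or None for blank, comment, or malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _LINE_RE.match(line)
    if match is None:
        return None
    code_point = int(match.group(1), 16)  # "4E2D" -> 0x4E2D
    pinyins = [p.strip() for p in match.group(2).split(",") if p.strip()]
    return code_point, pinyins


def parse_pinyin_lines(lines):
    """Build a {code_point: "py1,py2"} dict from an iterable of text lines (hypothetical helper)."""
    result = {}
    for line in lines:
        parsed = parse_pinyin_line(line)
        if parsed is not None:
            code_point, pinyins = parsed
            result[code_point] = ",".join(pinyins)
    return result
```

Keeping the per-line parser separate from the loop makes it easy to test single lines and to decide independently how malformed lines should be reported.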
--- a/third_party/phkit/phkit/pinyinkit/__init__.py
+++ b/third_party/phkit/phkit/pinyinkit/__init__.py
@@ -3,60 +3,58 @@
 Text-to-pinyin module; depends on python-pinyin, jieba, and phrase-pinyin-data.
 """
 import re
-#from .core import lazy_pinyin, Style
-from .core import lazy_pinyin as lazy_pinyin_local
 from pypinyin import lazy_pinyin, Style, load_phrases_dict, load_phrases_dict


-def parse_pinyin_txt(inpath):
-    # U+4E2D: zhōng,zhòng  # 中
-    outs = []
-    with open(inpath, encoding="utf8") as fin:
-        for line in tqdm(fin, desc='load pinyin', ncols=80, mininterval=1):
-            if line.startswith("#"):
-                continue
-            res = _ziyin_re.search(line)
-            if res:
-                zi = res.group(3).strip()
-                if len(zi) == 1:
-                    outs.append([zi, res.group(2).strip().split(",")])
-                else:
-                    print(line)
-            elif line.strip():
-                print(line)
-    return {ord(z): ','.join(p) for z, p in outs}
-
-
-def parse_phrase_txt(inpath):
-    # 一一对应: yī yī duì yìng
-    outs = []
-    with open(inpath, encoding="utf8") as fin:
-        for line in tqdm(fin, desc='load phrase', ncols=80, mininterval=1):
-            if line.startswith("#"):
-                continue
-            parts = line.split(":")
-            zs = parts[0].strip()
-            ps = parts[1].strip().split()
-            if len(parts) == 2 and len(zs) == len(ps) and len(zs) >= 2:
-                outs.append([zs, ps])
-            elif line.strip():
-                print(line)
-    return {zs: [[p] for p in ps] for zs, ps in outs}
-
-
-def initialize():
-    # Load the data
-    inpath = Path(__file__).absolute().parent.joinpath('phrase_pinyin.txt.py')
-    _phrases_dict = parse_phrase_txt(inpath)
-    load_phrases_dict(_phrases_dict)  # big:398815 small:36776
-
-    inpath = Path(__file__).absolute().parent.joinpath('single_pinyin.txt.py')
-    _pinyin_dict = parse_pinyin_txt(inpath)
-    load_single_dict(_pinyin_dict)  # 41451
-
-    jieba.initialize()
-    # for word, _ in tqdm(_phrases_dict.items(), desc='jieba add word', ncols=80, mininterval=1):
-    #     jieba.add_word(word)
+# def parse_pinyin_txt(inpath):
+#     # U+4E2D: zhōng,zhòng  # 中
+#     outs = []
+#     with open(inpath, encoding="utf8") as fin:
+#         for line in tqdm(fin, desc='load pinyin', ncols=80, mininterval=1):
+#             if line.startswith("#"):
+#                 continue
+#             res = _ziyin_re.search(line)
+#             if res:
+#                 zi = res.group(3).strip()
+#                 if len(zi) == 1:
+#                     outs.append([zi, res.group(2).strip().split(",")])
+#                 else:
+#                     print(line)
+#             elif line.strip():
+#                 print(line)
+#     return {ord(z): ','.join(p) for z, p in outs}
+
+
+# def parse_phrase_txt(inpath):
+#     # 一一对应: yī yī duì yìng
+#     outs = []
+#     with open(inpath, encoding="utf8") as fin:
+#         for line in tqdm(fin, desc='load phrase', ncols=80, mininterval=1):
+#             if line.startswith("#"):
+#                 continue
+#             parts = line.split(":")
+#             zs = parts[0].strip()
+#             ps = parts[1].strip().split()
+#             if len(parts) == 2 and len(zs) == len(ps) and len(zs) >= 2:
+#                 outs.append([zs, ps])
+#             elif line.strip():
+#                 print(line)
+#     return {zs: [[p] for p in ps] for zs, ps in outs}
+
+
+# def initialize():
+#     # Load the data
+#     inpath = Path(__file__).absolute().parent.joinpath('phrase_pinyin.txt.py')
+#     _phrases_dict = parse_phrase_txt(inpath)
+#     load_phrases_dict(_phrases_dict)  # big:398815 small:36776
+
+#     inpath = Path(__file__).absolute().parent.joinpath('single_pinyin.txt.py')
+#     _pinyin_dict = parse_pinyin_txt(inpath)
+#     load_single_dict(_pinyin_dict)  # 41451
+
+#     jieba.initialize()
+#     # for word, _ in tqdm(_phrases_dict.items(), desc='jieba add word', ncols=80, mininterval=1):
+#     #     jieba.add_word(word)


 # Compatible with versions before 0.1.0.
@@ -74,8 +72,6 @@ def text2pinyin(text, errors=None, **kwargs):
    if errors is None:
        errors = default_errors
    pin = lazy_pinyin(text, style=Style.TONE3, errors=errors, strict=True, neutral_tone_with_five=True, **kwargs)
-    pino = lazy_pinyin_local(text, style=Style.TONE3, errors=errors, strict=True, neutral_tone_with_five=True, **kwargs)
-    assert pin == pino
    return pin


--- a/third_party/phkit/phkit/pinyinkit/core.py
+++ b/third_party/phkit/phkit/pinyinkit/core.py
@@ -1,453 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# author: kuangdd
-# date: 2020/5/30
-"""
-Base on python-pinyin(pypinyin), phrase-pinyin-data, pinyin-data and jieba.
-"""
-
-from itertools import chain
-
-from pypinyin.constants import (
-    PHRASES_DICT, PINYIN_DICT, Style
-)
-from pypinyin.converter import DefaultConverter, _mixConverter
-from pypinyin.seg import mmseg
-from pypinyin.seg.simpleseg import seg
-from pypinyin.utils import _replace_tone2_style_dict_to_default
-
-from tqdm import tqdm
-import jieba
-import re
-from pathlib import Path
-
-_ziyin_re = re.compile(r"^U\+(\w+?):(.+?)#(.+)$")
-_true_pin_re = re.compile(r"[^a-zA-Z]+")
-
-is_initialized = False
-
-def load_single_dict(pinyin_dict, style='default'):
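-    """Load a user-defined single-character pinyin dictionary.
-
-    :param pinyin_dict: single-character pinyin dictionary, for example ``{0x963F: u"ā,ē"}``
-    :param style: pinyin style used by the values of ``pinyin_dict``; supports 'default' and 'tone2'
-    :type pinyin_dict: dict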
-    """
-    if style == 'tone2':
-        for k, v in pinyin_dict.items():
-            v = _replace_tone2_style_dict_to_default(v)
-            PINYIN_DICT[k] = v
-    else:
-        PINYIN_DICT.update(pinyin_dict)
-
-    mmseg.retrain(mmseg.seg)
-
-
-def load_phrases_dict(phrases_dict, style='default'):
-    """Load a user-defined phrase pinyin dictionary.
-
-    :param phrases_dict: phrase pinyin dictionary, for example ``{u"阿爸": [[u"ā"], [u"bà"]]}``
-    :param style: pinyin style used by the values of ``phrases_dict``; supports 'default' and 'tone2'
-    :type phrases_dict: dict
-    """
-    if style == 'tone2':
-        for k, value in phrases_dict.items():
-            v = [
-                list(map(_replace_tone2_style_dict_to_default, pys))
-                for pys in value
-            ]
-            PHRASES_DICT[k] = v
-    else:
-        PHRASES_DICT.update(phrases_dict)
-
-    mmseg.retrain(mmseg.seg)
-
-
-def parse_pinyin_txt(inpath):
-    # U+4E2D: zhōng,zhòng  # 中
-    outs = []
-    with open(inpath, encoding="utf8") as fin:
-        for line in tqdm(fin, desc='load pinyin', ncols=80, mininterval=1):
-            if line.startswith("#"):
-                continue
-            res = _ziyin_re.search(line)
-            if res:
-                zi = res.group(3).strip()
-                if len(zi) == 1:
-                    outs.append([zi, res.group(2).strip().split(",")])
-                else:
-                    print(line)
-            elif line.strip():
-                print(line)
-    return {ord(z): ','.join(p) for z, p in outs}
-
-
-def parse_phrase_txt(inpath):
-    # 一一对应: yī yī duì yìng
-    outs = []
-    with open(inpath, encoding="utf8") as fin:
-        for line in tqdm(fin, desc='load phrase', ncols=80, mininterval=1):
-            if line.startswith("#"):
-                continue
-            parts = line.split(":")
-            zs = parts[0].strip()
-            ps = parts[1].strip().split()
-            if len(parts) == 2 and len(zs) == len(ps) and len(zs) >= 2:
-                outs.append([zs, ps])
-            elif line.strip():
-                print(line)
-    return {zs: [[p] for p in ps] for zs, ps in outs}
-
-
-def initialize():
-    # Load the data
-    inpath = Path(__file__).absolute().parent.joinpath('phrase_pinyin.txt.py')
-    _phrases_dict = parse_phrase_txt(inpath)
-    load_phrases_dict(_phrases_dict)  # big:398815 small:36776
-
-    inpath = Path(__file__).absolute().parent.joinpath('single_pinyin.txt.py')
-    _pinyin_dict = parse_pinyin_txt(inpath)
-    load_single_dict(_pinyin_dict)  # 41451
-
-    jieba.initialize()
-    # for word, _ in tqdm(_phrases_dict.items(), desc='jieba add word', ncols=80, mininterval=1):
-    #     jieba.add_word(word)
-    global is_initialized
-    is_initialized = True
-
-
-class Pinyin(object):
-
-    def __init__(self, converter=None, **kwargs):
-        self._converter = converter or DefaultConverter()
-
-    def pinyin(self, hans, style=Style.TONE, heteronym=False,
-               errors='default', strict=True, **kwargs):
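-        """Convert Chinese characters to pinyin and return the list of pinyin for the characters.
-
-        :param hans: a string of Chinese characters ( ``'你好吗'`` ) or a list ( ``['你好', '吗']`` ).
-                     You can segment the string with any word segmentation module you like
-                     and simply pass in the resulting list of segmented strings.
-        :type hans: unicode string or list of strings
-        :param style: the pinyin style; defaults to :py:attr:`~pypinyin.Style.TONE`.
-                      See :class:`~pypinyin.Style` for more pinyin styles.
-        :param errors: how to handle characters that have no pinyin. See :ref:`handle_no_pinyin`
-
-                       * ``'default'``: keep the original character
-                       * ``'ignore'``: skip the character
-                       * ``'replace'``: replace it with its unicode escape string without the ``\\u``
-                         (``'\\u90aa'`` => ``'90aa'``)
-                       * callable object: a callable such as a callback function.
-
-        :param heteronym: whether to enable heteronym mode
-        :param strict: for styles that return only initials or only finals, whether to
-                       handle initials and finals strictly according to《汉语拼音方案》;
-                       see :ref:`strict`
-        :return: list of pinyin
-        :rtype: list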
-
-        """
-        # 对字符串进行分词处理
-        if isinstance(hans, str):
-            han_list = self.seg(hans)
-        else:
-            han_list = chain(*(self.seg(x) for x in hans))
-
-        pys = []
-        for words in han_list:
-            pys.extend(
-                self._converter.convert(
-                    words, style, heteronym, errors, strict=strict))
-        return pys
-
-    def lazy_pinyin(self, hans, style=Style.NORMAL,
-                    errors='default', strict=True, **kwargs):
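-        """Convert Chinese characters to pinyin and return a list of pinyin without heteronym results.
-
-        Unlike :py:func:`~pypinyin.pinyin`, the pinyin of each character is a string,
-        and each character has exactly one reading.
-
-        :param hans: Chinese characters
-        :type hans: unicode or list
-        :param style: the pinyin style; defaults to :py:attr:`~pypinyin.Style.NORMAL`.
-                      See :class:`~pypinyin.Style` for more pinyin styles.
-        :param errors: how to handle characters that have no pinyin; for details see
-                       :py:func:`~pypinyin.pinyin`
-        :param strict: for styles that return only initials or only finals, whether to
-                       handle initials and finals strictly according to《汉语拼音方案》;
-                       see :ref:`strict`
-        :return: list of pinyin (e.g. ``['zhong', 'guo', 'ren']``)
-        :rtype: list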
-
-        """
-        return list(
-            chain(
-                *self.pinyin(
-                    hans, style=style, heteronym=False,
-                    errors=errors, strict=strict)))
-
-    def pre_seg(self, hans, **kwargs):
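-        """Before a string is segmented, ``pre_seg`` is called to preprocess the unsegmented string.
-
-        By default, returns the ``hans`` argument unchanged.
-
-        If the return value of this method is a ``list``, it is treated as an already segmented result, and
-        the ``seg`` method will not call ``seg_function`` to segment the string.
-
-        :param hans: the string before segmentation
-        :return: ``None`` or ``list``
-        """
-        outs = list(jieba.cut(hans))  # Segment with jieba by default, which splits on semantic boundaries.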
-        return outs
-
-    def seg(self, hans, **kwargs):
-        """Segment Chinese text into words.
-
-        ``pre_seg`` is called before segmentation and ``post_seg`` after it.
-
-        :param hans:
-        :return:
-        """
-        pre_data = self.pre_seg(hans)
-        if isinstance(pre_data, list):
-            seg_data = pre_data
-        else:
-            seg_data = self.get_seg()(hans)
-
-        post_data = self.post_seg(hans, seg_data)
-        if isinstance(post_data, list):
-            return post_data
-
-        return seg_data
-
-    def get_seg(self, **kwargs):
-        """Return the segmentation function.
-
-        :return: the segmentation function
-        """
-        return seg
-
-    def post_seg(self, hans, seg_data, **kwargs):
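-        """After a string is segmented, ``post_seg`` is called to process the segmentation result.
-
-        By default, returns the ``seg_data`` argument unchanged.
-
-        If the return value of this method is a ``list``, the segmentation result is treated as post-processed, and
-        the ``seg`` method uses the returned data as its return value.
-
-        :param hans: the string before segmentation
-        :param seg_data: the segmentation result
-        :type seg_data: list
-        :return: ``None`` or ``list``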
-        """
-        pass
-
-
-_default_convert = DefaultConverter()
-_default_pinyin = Pinyin(_default_convert)
-
-
-def to_fixed(pinyin, style, strict=True):
-    # Kept for backward compatibility. TODO: deprecate
-    return _default_convert.convert_style(
-        '', pinyin, style=style, strict=strict, default=pinyin)
-
-
-_to_fixed = to_fixed
-
-
-def handle_nopinyin(chars, errors='default', heteronym=True):
-    # Kept for backward compatibility. TODO: deprecate
-    return _default_convert.handle_nopinyin(
-        chars, style=None, errors=errors, heteronym=heteronym, strict=True)
-
-
-def single_pinyin(han, style, heteronym, errors='default', strict=True):
-    # Kept for backward compatibility. TODO: deprecate
-    return _default_convert._single_pinyin(
-        han, style, heteronym, errors=errors, strict=strict)
-
-
-def phrase_pinyin(phrase, style, heteronym, errors='default', strict=True):
-    # Kept for backward compatibility. TODO: deprecate
-    return _default_convert._phrase_pinyin(
-        phrase, style, heteronym, errors=errors, strict=strict)
-
-
-def pinyin(hans, style=Style.TONE, heteronym=False,
-           errors='default', strict=True,
-           v_to_u=False, neutral_tone_with_five=False):
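-    """Convert Chinese characters to pinyin and return the list of pinyin for the characters.
-
-    :param hans: a string of Chinese characters ( ``'你好吗'`` ) or a list ( ``['你好', '吗']`` ).
-                 You can segment the string with any word segmentation module you like
-                 and simply pass in the resulting list of segmented strings.
-    :type hans: unicode string or list of strings
-    :param style: the pinyin style; defaults to :py:attr:`~pypinyin.Style.TONE`.
-                  See :class:`~pypinyin.Style` for more pinyin styles.
-    :param errors: how to handle characters that have no pinyin. See :ref:`handle_no_pinyin`
-
-                   * ``'default'``: keep the original character
-                   * ``'ignore'``: skip the character
-                   * ``'replace'``: replace it with its unicode escape string without the ``\\u``
-                     (``'\\u90aa'`` => ``'90aa'``)
-                   * callable object: a callable such as a callback function.
-
-    :param heteronym: whether to enable heteronym mode
-    :param strict: for styles that return only initials or only finals, whether to
-                   handle initials and finals strictly according to《汉语拼音方案》;
-                   see :ref:`strict`
-    :param v_to_u: in styles without tone marks, whether to use ``ü`` in place of ``v``
-    :type v_to_u: bool
-    :param neutral_tone_with_five: in styles that represent tones with digits, whether to
-                                   mark the neutral tone with 5
-    :type neutral_tone_with_five: bool
-    :return: list of pinyin
-    :rtype: list
-
-    :raise AssertionError: raised when the input string is not a unicode string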
-
-    Usage::
-
-      >>> from pypinyin import pinyin, Style
-      >>> import pypinyin
-      >>> pinyin('中心')
-      [['zhōng'], ['xīn']]
-      >>> pinyin('中心', heteronym=True)  # enable heteronym mode
-      [['zhōng', 'zhòng'], ['xīn']]
-      >>> pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
-      [['z'], ['x']]
-      >>> pinyin('中心', style=Style.TONE2)
-      [['zho1ng'], ['xi1n']]
-      >>> pinyin('中心', style=Style.CYRILLIC)
-      [['чжун1'], ['синь1']]
-      >>> pinyin('战略', v_to_u=True, style=Style.NORMAL)
-      [['zhan'], ['lüe']]
-      >>> pinyin('衣裳', style=Style.TONE3, neutral_tone_with_five=True)
-      [['yi1'], ['shang5']]
-    """
-    global is_initialized
-    if not is_initialized:
-        initialize()
-        is_initialized = True
-    _pinyin = Pinyin(_mixConverter(
-        v_to_u=v_to_u, neutral_tone_with_five=neutral_tone_with_five))
-    return _pinyin.pinyin(
-        hans, style=style, heteronym=heteronym, errors=errors, strict=strict)
-
-
-def slug(hans, style=Style.NORMAL, heteronym=False, separator='-',
-         errors='default', strict=True):
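-    """Convert Chinese characters to pinyin, then build a slug string.
-
-    :param hans: Chinese characters
-    :type hans: unicode or list
-    :param style: the pinyin style; defaults to :py:attr:`~pypinyin.Style.NORMAL`.
-                  See :class:`~pypinyin.Style` for more pinyin styles.
-    :param heteronym: whether to enable heteronym mode
-    :param separator: the separator/joiner placed between two pinyin syllables
-    :param errors: how to handle characters that have no pinyin; for details see
-                   :py:func:`~pypinyin.pinyin`
-    :param strict: for styles that return only initials or only finals, whether to
-                   handle initials and finals strictly according to《汉语拼音方案》;
-                   see :ref:`strict`
-    :return: the slug string.
-
-    :raise AssertionError: raised when the input string is not a unicode string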
-
-    ::
-
-      >>> import pypinyin
-      >>> from pypinyin import Style
-      >>> pypinyin.slug('中国人')
-      'zhong-guo-ren'
-      >>> pypinyin.slug('中国人', separator=' ')
-      'zhong guo ren'
-      >>> pypinyin.slug('中国人', style=Style.FIRST_LETTER)
-      'z-g-r'
-      >>> pypinyin.slug('中国人', style=Style.CYRILLIC)
-      'чжун1-го2-жэнь2'
-    """
-    global is_initialized
-    if not is_initialized:
-        initialize()
-        is_initialized = True
-    return separator.join(
-        chain(
-            *_default_pinyin.pinyin(
-                hans, style=style, heteronym=heteronym,
-                errors=errors, strict=strict
-            )
-        )
-    )
-
-
-def lazy_pinyin(hans, style=Style.NORMAL, errors='default', strict=True,
-                v_to_u=False, neutral_tone_with_five=False):
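-    """Convert Chinese characters to pinyin and return a list of pinyin without heteronym results.
-
-    Unlike :py:func:`~pypinyin.pinyin`, each returned pinyin is a string,
-    and each character has exactly one reading.
-
-    :param hans: Chinese characters
-    :type hans: unicode or list
-    :param style: the pinyin style; defaults to :py:attr:`~pypinyin.Style.NORMAL`.
-                  See :class:`~pypinyin.Style` for more pinyin styles.
-    :param errors: how to handle characters that have no pinyin; for details see
-                   :py:func:`~pypinyin.pinyin`
-    :param strict: for styles that return only initials or only finals, whether to
-                   handle initials and finals strictly according to《汉语拼音方案》;
-                   see :ref:`strict`
-    :param v_to_u: in styles without tone marks, whether to use ``ü`` in place of ``v``
-    :type v_to_u: bool
-    :param neutral_tone_with_five: in styles that represent tones with digits, whether to
-                                   mark the neutral tone with 5
-    :type neutral_tone_with_five: bool
-    :return: list of pinyin (e.g. ``['zhong', 'guo', 'ren']``)
-    :rtype: list
-
-    :raise AssertionError: raised when the input string is not a unicode string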
-
-    Usage::
-
-      >>> from pypinyin import lazy_pinyin, Style
-      >>> import pypinyin
-      >>> lazy_pinyin('中心')
-      ['zhong', 'xin']
-      >>> lazy_pinyin('中心', style=Style.TONE)
-      ['zhōng', 'xīn']
-      >>> lazy_pinyin('中心', style=Style.FIRST_LETTER)
-      ['z', 'x']
-      >>> lazy_pinyin('中心', style=Style.TONE2)
-      ['zho1ng', 'xi1n']
-      >>> lazy_pinyin('中心', style=Style.CYRILLIC)
-      ['чжун1', 'синь1']
-      >>> lazy_pinyin('战略', v_to_u=True)
-      ['zhan', 'lüe']
-      >>> lazy_pinyin('衣裳', style=Style.TONE3, neutral_tone_with_five=True)
-      ['yi1', 'shang5']
-    """
-    global is_initialized
-    if not is_initialized:
-        initialize()
-        is_initialized = True
-    _pinyin = Pinyin(_mixConverter(
-        v_to_u=v_to_u, neutral_tone_with_five=neutral_tone_with_five))
-    return _pinyin.lazy_pinyin(
-        hans, style=style, errors=errors, strict=strict)
-
-
-if __name__ == "__main__":
-    print(__file__)
-    han = '老师很重视这个问题啊，请重说一遍。。。很难说有山难发生，理发师和会计谁会发财？'
-    out = _default_pinyin.seg(han)
-    assert out == ['老师', '很', '重视', '这个', '问题', '啊', '，', '请重', '说', '一遍', '。', '。', '。', '很难说', '有山难', '发生', '，',
-                   '理发师', '和', '会计', '谁', '会', '发财', '？']
-
-    out = lazy_pinyin(han, style=8, neutral_tone_with_five=True)
-    assert out == ['lao3', 'shi1', 'hen3', 'zhong4', 'shi4', 'zhe4', 'ge4', 'wen4', 'ti2', 'a5', '，', 'qing3', 'zhong4',
-                   'shuo1', 'yi1', 'bian4', '。', '。', '。', 'hen3', 'nan2', 'shuo1', 'you3', 'shan1', 'nan2', 'fa1',
-                   'sheng1', '，', 'li3', 'fa4', 'shi1', 'he2', 'kuai4', 'ji4', 'shui2', 'hui4', 'fa1', 'cai2', '？']
-
-    out = slug(han, style=8, separator=' ')
-    assert out == 'lao3 shi1 hen3 zhong4 shi4 zhe4 ge4 wen4 ti2 a ， qing3 zhong4 shuo1 yi1 bian4 。 。 。 hen3 nan2 shuo1 you3 shan1 nan2 fa1 sheng1 ， li3 fa4 shi1 he2 kuai4 ji4 shui2 hui4 fa1 cai2 ？'
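
The `parse_phrase_txt` helper that appears in this diff reads `parts[1]` before it checks `len(parts) == 2`, so a line without a colon (a blank line, for instance) would raise `IndexError`; a trailing `#` comment would also make the length check fail and cause the line to be skipped. A more defensive per-line parser might look like the sketch below. It is an illustration only: `parse_phrase_line` is a hypothetical name, and the output simply follows the `{phrase: [[pinyin], ...]}` shape that `load_phrases_dict` is documented to accept.

```python
def parse_phrase_line(line):
    """Return (phrase, [[pinyin], ...]), or None for blank, comment, or malformed lines."""
    # Drop full-line and trailing comments before looking at the data.
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    phrase, sep, pinyin_part = text.partition(":")
    phrase = phrase.strip()
    syllables = pinyin_part.split()
    # Require a colon, at least two characters, and exactly one syllable per character.
    if not sep or len(phrase) < 2 or len(phrase) != len(syllables):
        return None
    return phrase, [[s] for s in syllables]


def parse_phrase_lines(lines):
    """Collect parsed lines into a {phrase: [[pinyin], ...]} dict (hypothetical helper)."""
    result = {}
    for line in lines:
        parsed = parse_phrase_line(line)
        if parsed is not None:
            phrase, pinyins = parsed
            result[phrase] = pinyins
    return result
```

Using `str.partition` instead of `split(":")` means the presence of the separator is checked explicitly, so no index is read before the line has been validated.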