letiantian / chinesetone
[This project is no longer maintained] Converts Chinese characters to pinyin, with support for polyphonic characters. 拼音 -> pin yin
Python 3.6
Latest ChineseTone version
ChineseTone.PinyinHelper.hasMultiPinyin('鮟') True
ChineseTone.PinyinHelper.hasMultiPinyin('鱇') True
ChineseTone.PinyinHelper.hasMultiPinyin('徐') True
In addition, characters such as '检' and '海' are all reported as polyphonic characters.
Just testing.
An error is raised when opening the pinyin database:
File "D:\software\python\lib\site-packages\ChineseTone\chinesetone.py", line 68, in getPinyinResource
for line in open(os.path.join(CURRENT_DIR, 'data', 'pinyin.db')):
UnicodeDecodeError: 'gbk' codec can't decode byte 0x87 in position 2: illegal multibyte sequence
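The traceback above suggests that `open()` was called without an encoding, so Python fell back to the platform default codec (gbk here). A minimal sketch of a workaround follows, assuming the data file is UTF-8 encoded (the report does not confirm this). `read_lines_utf8` is a hypothetical helper name, not part of ChineseTone.

```python
import io


def read_lines_utf8(path):
    """Yield lines from a text file, decoding explicitly as UTF-8.

    Assumption: the file is UTF-8 encoded. Passing the encoding
    explicitly avoids depending on the platform default codec.
    """
    # io.open accepts an encoding argument on both Python 2 and Python 3.
    with io.open(path, encoding='utf-8') as f:
        for line in f:
            yield line.rstrip('\n')
```

With a helper like this, the loop in the traceback would iterate over `read_lines_utf8(os.path.join(CURRENT_DIR, 'data', 'pinyin.db'))` instead of a bare `open(...)`.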
I have included the word segmentation results as well:
财经法规与会计职业道德
['财经', '法规', '与', '会计', '职业道德']
['cai', 'jing', 'fa', 'gui', 'yu', 'hui', 'ji', 'zhi', 'ye', 'dao', 'de']
会计职业道德
['会计', '职业道德']
['kuai', 'ji', 'zhi', 'ye', 'dao', 'de']
Some misreadings I have collected:
Word Correct Incorrect
金馆长 jinguanzhang jinguanchang
长高 zhanggao changgao
如何长高 ruhezhanggao ruhechanggao
崴脚 waijiao weijiao
内分泌失调 neifenmishitiao neifenmishidiao
脸上长痘痘怎么办 lianshangzhangdoudouzenmeban lianshangchangdoudouzenmeban
水泊梁山 shuipoliangshan shuiboliangshan
桔色 juse jiese
海娜 haina hainuo
爱望着蓝天发呆 aiwangzhelantianfadai aiwangzhuolantianfadai
漂在外面的涿州人 piaozaiwaimiandezhuozhouren piaozaiwaimiandizhuozhouren
心里住着太阳 xinlizhuzhetaiyang xinlizhuzhuotaiyang
金莎 jinsha jinsuo
镜泊湖 jingpohu jingbohu
你是我年少时最美的梦 nishiwonianshaoshizuimeidemeng nishiwonianshaoshizuimeidimeng
暗夜中的亡灵 anyezhongdewangling anyezhongdiwangling
梦娜 mengna mengnuo
财经法规与会计职业道德 caijingfaguiyukuaijizhiyedaode caijingfaguiyuhuijizhiyedaode
脸上长斑 lianshangzhangban lianshangchangban
结伴同行 jiebantongxing jiebantonghang
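Until the dictionary itself is corrected, misreadings like those listed above could be patched on the caller's side with a phrase-level override table. The sketch below is only an illustration: `apply_overrides`, `overrides`, and `fallback` are hypothetical names, and `fallback` stands in for whatever converter is actually used (it must take a string and return a list of syllables).

```python
def apply_overrides(text, overrides, fallback):
    """Convert text to syllables, preferring known phrase readings.

    overrides: dict mapping a phrase to its list of syllables.
    fallback:  callable(str) -> list of syllables, used for every
               stretch of text not covered by an override.
    """
    max_len = max((len(k) for k in overrides), default=0)
    result, pending = [], []

    def flush():
        # Send the accumulated unmatched run to the fallback converter.
        if pending:
            result.extend(fallback(''.join(pending)))
            del pending[:]

    i = 0
    while i < len(text):
        # Greedy longest match against the override table.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in overrides:
                flush()
                result.extend(overrides[chunk])
                i += n
                break
        else:
            pending.append(text[i])
            i += 1
    flush()
    return result
```

For instance, with `overrides = {'长高': ['zhang', 'gao']}`, the phrase 长高 inside 如何长高 gets the fixed reading, while 如何 is still passed to the fallback converter as one run so that it keeps its own context.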
For example:
print('-'.join(PinyinHelper.convertToPinyinFromSentence('了解了,Mike')))
How can I get the output: liǎo-jiě-le,Mike
instead of: liǎo-jiě-le,M-i-k-e
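One possible way to get the output asked for above is to convert only the runs of Chinese characters and copy everything else through unchanged. This is a sketch under assumptions: `join_pinyin_keep_latin` is a hypothetical helper, `to_pinyin` stands in for the real converter (any callable that takes a string and returns a list of syllables), and only the basic CJK Unified Ideographs block is treated as Chinese.

```python
import re

# Assumption: only U+4E00..U+9FFF counts as "Chinese" for this sketch.
_HAN_RUN = re.compile(u'[\u4e00-\u9fff]+')


def join_pinyin_keep_latin(text, to_pinyin, sep=u'-'):
    """Join pinyin syllables with sep, leaving non-Chinese text intact."""
    out = []
    pos = 0
    for m in _HAN_RUN.finditer(text):
        out.append(text[pos:m.start()])              # non-Chinese text, verbatim
        out.append(sep.join(to_pinyin(m.group())))   # converted Chinese run
        pos = m.end()
    out.append(text[pos:])                           # trailing non-Chinese text
    return u''.join(out)
```

Because the separator is inserted only between syllables of a single Chinese run, a word such as Mike is never split into letters.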
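Thank you for open-sourcing this pinyin tool. Here are some errors I found while evaluating how well pinyin tools recognize polyphonic characters. Each entry has this form:
dǔ [correct reading] 拌肚丝 [original word]
bàndùsī [incorrect pinyin produced by the tool]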
bà 花把、刀把
huā bǎ 、 dāo bǎ
bèi 背篼、背运、
bēi dōu 、 bēi yùn
bāo 剥花生
bō huā shēng
bào 平刨、糟刨
píng páo 、 zāo páo
zhǎng 长膘、长脸、长亲
cháng biāo 、 cháng liǎn 、 cháng qīn
cēn 纵横参错
zòng héng cān cuò
shēn 参商
cān shāng
chāi 差事
chà shì
chèn 称身、称愿、称体裁衣
chēng shēn 、 chēng yuàn 、 chēng tǐ cái yī
zàng 藏蓝、藏历
cáng lán 、 cáng lì
chóng 重版、重播
zhòng bǎn 、 zhòng bō
dài 大王、大夫、大黄、大城
dà wáng、 dà fū、 dà huáng 、dà chéng
dì 有的放矢
yǒu de fàng shǐ
diào 调类
tiáo lèi
dǔ 拌肚丝
bàn dù sī
duó 测度
cè dù
fà 发妻
fā qī
fèn 分量
fēn liàng
fèng 门缝、缭缝儿
mén féng、 liáo féng ér
gú 骨头、骨节、懒骨头、硬骨头
gǔ tou 、 gǔ jié 、 lǎn gǔ tou 、 yìng gǔ tou
háo 怒号
nù hào
hè 吆喝、喝问
yāo he、 hē wèn
huà 划款
huá kuǎn
hè 奉和
fèng hé
huò 和弄
hé nòng
hú 和了
hé le
hèng 横死
héng sǐ
hǒng 哄弄
hōng nòng
jì 系围裙
xì wéi qún
jiān 间距
jiàn jù
jiàng 将令
jiāng lìng
jìn 尽量
jǐn liàng
jǐ 给付、给水、
gěi fù 、 gěi shuǐ
jiè 起解、解送、
qǐ jiě 、 jiě sòng
xiè 姓解、解池
xìng jiě 、 jiě chí
jué 口角
kǒu jiǎo
I've installed ChineseTone using
pip install ChineseTone
It automatically downloaded version ChineseTone-0.1.4, but this version doesn't include this commit 3b8fcd6 with encoding fixes, so I keep getting encoding errors. (I had to update manually in order to get rid of the errors).