
cn2an's Introduction

👋 Hey! I am an algorithm engineer.
My current work focuses on Natural Language Processing



🍵 Wanna chat? 👉 Talk to me on Zhihu



🐍 Python
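  • 📦 cn2an: Quickly convert between "Chinese numerals" and "Arabic numerals".
  • 🦅 en2an: Quickly convert between "English numerals" and "Arabic numerals".
  • 😏 two: A random "chuunibyou" line!
  • ashe: A super extension for the Python language.
  • 🐢 suo: A toolkit for "Chinese and English abbreviation conversion".
  • 📻 mulan: The essence of humanity: a Ballad of Mulan "repeater"~
  • 🔨 torbjorn: Provides some handy Python decorators~
  • ✂️ simjb: A simple version of jieba word segmentation implemented in 100 lines.
  • 🏆 award: A badge generator for displaying "data" and "links".
  • 🧪 roseta: From unstructured data to structured data!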
🚀 Julia
  • 👋 Hey Julia: An introduction to the Julia language.
  • 📦 Cn2An.jl: Convert Chinese Numerals To Arabic Numerals With Julia Language.
🐦 Swift

cn2an's People

Contributors

20071313, ailln, beants, yudong27


cn2an's Issues

Import error!

import cn2an
Traceback (most recent call last):
File "", line 1, in
File "/home/liuhe/anaconda3/lib/python3.6/site-packages/cn2an-0.1.2-py3.6.egg/cn2an/__init__.py", line 11, in
cn2an = cn2an.Cn2An().cn2an
File "/home/liuhe/anaconda3/lib/python3.6/site-packages/cn2an-0.1.2-py3.6.egg/cn2an/cn2an.py", line 11, in __init__
self.conf = utils.get_default_conf()
File "/home/liuhe/anaconda3/lib/python3.6/site-packages/cn2an-0.1.2-py3.6.egg/cn2an/utils.py", line 7, in get_default_conf
with codecs.open("./cn2an/config.yaml", "r", encoding="utf-8") as f_config:
File "/home/liuhe/anaconda3/lib/python3.6/codecs.py", line 895, in open
file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: './cn2an/config.yaml'

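Specify the width of output numbers

Suggest adding an option that outputs numbers with a specified number of digits, for example:

>>> cn2an.transform("第一章", "cn2an", 2)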
第01章
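
One way to picture the requested option is as a post-processing step. The following is a minimal sketch, assuming the conversion has already produced Arabic digits; the helper name `pad_numbers` is hypothetical and is not part of cn2an.

```python
import re


def pad_numbers(text: str, width: int) -> str:
    """Left-pad every run of ASCII digits in text with zeros to at least `width` digits."""
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), text)


# "第1章" stands for what a cn2an-style transform might return for "第一章".
print(pad_numbers("第1章", 2))
```

Digit runs that are already at least `width` long are left unchanged, so the padding never truncates a number.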

Specify encoding="UTF-8" in utils.py

Installing directly fails, and installing from a downloaded copy fails too

cd cn2an && python setup.py install
Traceback (most recent call last):
  File "setup.py", line 4, in <module>
    from cn2an import version
  \cn2an\cn2an\__init__.py", line 9, in <module>
    cn2an = cn2an.Cn2An().cn2an
  \cn2an\cn2an\cn2an.py", line 9, in __init__
    self.conf = utils.get_default_conf()
  File "\cn2an\cn2an\utils.py", line 8, in get_default_conf
    config_data = yaml.load(f_config, Loader=yaml.FullLoader)
  File "C:\Python\envs\37\lib\site-packages\yaml\__init__.py", line 112, in load
    loader = Loader(stream)
  File "C:\Python\envs\37\lib\site-packages\yaml\loader.py", line 24, in __init__
    Reader.__init__(self, stream)
  File "C:\Python\envs\37\lib\site-packages\yaml\reader.py", line 85, in __init__
    self.determine_encoding()
  File "C:\Python\envs\37\lib\site-packages\yaml\reader.py", line 124, in determine_encoding
    self.update_raw()
  File "C:\Python\envs\37\lib\site-packages\yaml\reader.py", line 178, in update_raw
    data = self.stream.read(size)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb6 in position 18: illegal multibyte sequence

Changing utils.py to specify the encoding fixed it:

import os
import yaml


def get_default_conf():
    with open(f"{os.path.dirname(__file__)}/config.yaml", "r", encoding="UTF-8") as f_config:
        config_data = yaml.load(f_config, Loader=yaml.FullLoader)
    return config_data

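Differences in how Chinese numbers are read

“一亿五千万零六千三百五十五” should be a normal way to read the number, and it ought to convert to 150006355. However, cn2an currently does not support this format; it only works once the 零 in the middle is removed. Yet converting 150006355 back to Chinese with an2cn yields “一亿五千万零六千三百五十五” again. I don't quite understand this behaviour.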

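Error when running the packaged build

Hello, I packaged a .py file with pyinstaller and then ran the resulting exe, which fails with the following error.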
(docparser) PS C:\Users\49476\glory\work\code_test\dist\main> .\main.exe
Traceback (most recent call last):
File "main.py", line 2, in
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 671, in _load_unlocked
File "PyInstaller\loader\pyimod03_importers.py", line 546, in exec_module
File "cn2an\__init__.py", line 7, in
File "cn2an\cn2an.py", line 9, in __init__
File "cn2an\utils.py", line 10, in get_default_conf
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\49476\glory\work\code_test\dist\main\cn2an/config.yaml'
[27324] Failed to execute script 'main' due to unhandled exception!

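To reproduce:
Create a new main.py file with the following contents:
import cn2an
def print_hi(name):
    print(f'Hi, {name}')
if __name__ == '__main__':
    print_hi('PyCharm')

Then run pyinstaller main.py.
This generates a dist/main folder; running main.exe from it then fails.

I tested on Windows 10. I'm not sure of the cause; my current guess is that packaging breaks cn2an's directory layout, so the file cannot be found.
Could you suggest a fix? Thanks!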

1百01 is recognized incorrectly

Describe the bug
1百01 is recognized incorrectly

Screenshots
Code_2022-10-16_21-32-19


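Runtime error after packaging an exe with pyinstaller: hidden dependencies

Similar to #43. My guess is that cn2an has hidden dependencies, which is a code-convention issue.
Test code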

import cn2an

t = cn2an.Transform()
print(t.transform("零八八八八"))

Packaging command
pyinstaller -F test.py
Warnings during the build

6634 WARNING: Hidden import "pkg_resources.py2_warn" not found!
6634 WARNING: Hidden import "pkg_resources.markers" not found!

Runtime error

Traceback (most recent call last):
  File "test.py", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
  File "cn2an\__init__.py", line 7, in <module>
  File "cn2an\cn2an.py", line 12, in __init__
  File "cn2an\utils.py", line 15, in get_default_conf
  File "pkg_resources\__init__.py", line 1142, in resource_stream
  File "pkg_resources\__init__.py", line 1390, in get_resource_stream
  File "pkg_resources\__init__.py", line 1393, in get_resource_string
  File "pkg_resources\__init__.py", line 1560, in _get
  File "PyInstaller\loader\pyimod03_importers.py", line 325, in get_data
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\****\\AppData\\Local\\Temp\\_MEI153042\\cn2an\\config.yaml'
[15460] Failed to execute script 'test' due to unhandled exception!

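Successfully packaged cn2an with pyinstaller

I searched the whole web and found no example of a successful build, so I had to work it out myself.

pyinstaller, Windows, Python 3.8.8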


├─dist
│ │ amain.exe
│ │ 程序运行错误记录.py
│ │
│ └─配置表
│ 产品配置表.xlsx

├─ui_file
│ load_file.ui
│ main_ui.ui
│ main_ui1.ui
│ zhcdict.json

├─__pycache__
│ amain.cpython-38.pyc
│ an2cn.cpython-38.pyc
│ an2cn_test.cpython-38.pyc
│ cn2an.cpython-38.pyc
│ cn2an_test.cpython-38.pyc
│ funciton_list.cpython-38.pyc
│ glo.cpython-38.pyc
│ main.cpython-38.pyc
│ performance.cpython-38.pyc
│ transform.cpython-38.pyc
│ transform_test.cpython-38.pyc
│ utils.cpython-38.pyc
│ __init__.cpython-38.pyc

├─当前查找得到的结果
│ transform.py
│ transform_test.py
│ __init__.py
│ 程序运行错误记录.py
│ 高效检索代码程序.py

└─配置表
产品配置表.xlsx

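Note: after packaging, two resources are actually needed: cn2an's own cn2an/config.yaml and zhconv/zhcdict.json. Both have to be placed manually into the temporary folder used after packaging; moving them with your own code is enough.
To get the temporary path after packaging, use this function:

import sys
import os

# Get the path after packaging.
def get_exe_path(relative_path=''):
    # PyInstaller exposes its temporary extraction folder as sys._MEIPASS.
    if hasattr(sys, '_MEIPASS'):
        return os.path.join(sys._MEIPASS, relative_path)
    # Not packaged: resolve against the current working directory;
    # normpath drops the trailing separator left by an empty relative_path.
    return os.path.normpath(os.path.join(os.path.abspath("."), relative_path))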

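You also have to change the packaging command. First build:
pyinstaller -F -import -i fff.ico amain.py -w
This generates an amain.spec file. In the spec file, change:
: datas=[("cn2an","cn2an"),("ui_file","ui_file")],

My approach was to relocate zhcdict.json in code: create a folder first, then copy the file into it.

Second build:
pyinstaller amain.spec
And that's it.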
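
The "create a folder, then copy the file into it" step mentioned above could look roughly like this. It is only a sketch: the helper names `bundle_base_dir` and `install_resource` are hypothetical, and it assumes the source copy of zhcdict.json is reachable from the running program.

```python
import os
import shutil
import sys
from typing import Optional


def bundle_base_dir() -> str:
    """Return PyInstaller's extraction folder when frozen, else the current directory."""
    return getattr(sys, "_MEIPASS", os.path.abspath("."))


def install_resource(src_file: str, relative_dir: str, base_dir: Optional[str] = None) -> str:
    """Copy src_file into base_dir/relative_dir, creating the folder first.

    Returns the path of the copied file.
    """
    base_dir = base_dir or bundle_base_dir()
    target_dir = os.path.join(base_dir, relative_dir)
    os.makedirs(target_dir, exist_ok=True)
    return shutil.copy(src_file, target_dir)
```

A call such as `install_resource("ui_file/zhcdict.json", "zhconv")` would then run before the first import that needs the file.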

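If you really can't get it working, feel free to contact me.

WeChat: wo15985300747
qq: 1975767630

This is my first time raising an issue with an official library, so I'm a little excited.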

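Hoping the maximum can be raised to 千兆

I'd like the maximum supported value to go up to 千兆. The current range is large enough in general, but the numbers I have to process are really, really big.
Thanks!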

Is this library no longer usable?

print(cn2an.cn2an("一百万零五十四"))
File "D:\old_machine\untitled\djando_env\lib\site-packages\cn2an\cn2an.py", line 26, in cn2an
self.check_input_data_is_valid(input_data)
File "D:\old_machine\untitled\djando_env\lib\site-packages\cn2an\cn2an.py", line 73, in check_input_data_is_valid
raise ValueError(u"输入的数据不在转化范围内:{}!".format(data))
ValueError: 输入的数据不在转化范围内:r!

When transcribing 【零一】, the result is 【1】; the expected result is 【01】


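On omitting 零 in Chinese numerals

Hello, ”一亿两千万三百万四百五十六“ cannot be parsed when I test it.
I found it has to be written as ”一亿两千三百万零四百五十六“ to work.
I know we were taught in primary school that the 零 is required here, but the client does not want to write it, and if anything goes wrong the contractor bears full responsibility. Is there any way you could support this?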

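Too many arguments

Problem description

When the cn2an library is imported into a program and the main program is run with command-line arguments, especially three or more, conversion always fails. Cause of the error: too many arguments.

My take

cn2an introduced a command-line mode and unconditionally reads arguments from the command line. When a program calls cn2an methods, even if that program's command-line arguments (a debug flag or a port number, for example) have nothing to do with cn2an, cn2an still treats them as input to its own command-line mode, which produces the too-many-arguments error.

Could the command-line mode and the library-call mode be separated?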
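
A common way to get this separation is to keep all argument parsing inside one entry-point function that is never run on import. The sketch below is generic Python, not cn2an's actual code; `convert` and `main` are hypothetical stand-ins.

```python
import argparse


def convert(text: str) -> str:
    """Stand-in for a library conversion function (hypothetical).

    It never looks at sys.argv, so the host program's own flags cannot affect it.
    """
    return text.upper()


def main(argv=None) -> int:
    """Command-line entry point: the only place where arguments are parsed."""
    parser = argparse.ArgumentParser(description="demo converter")
    parser.add_argument("text")
    args = parser.parse_args(argv)  # argv=None makes argparse read sys.argv[1:]
    print(convert(args.text))
    return 0


# Library use: works no matter how the host program was launched.
print(convert("abc"))
# CLI use: invoked explicitly, here with a hand-built argument list.
main(["abc"])
```

In a real package, `main()` would be called only from a console-script entry point or an `if __name__ == "__main__":` block, so importing the module from a program launched with `--debug --port 8080` would not trigger any parsing.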

Related code

no module named yaml when the program is packaged into Docker

@Ailln I have now packaged my program into Docker and run it with docker compose, but installing cn2an keeps failing with:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-c6k7n574/cn2an/setup.py", line 5, in
from cn2an import version
File "/tmp/pip-install-c6k7n574/cn2an/cn2an/__init__.py", line 2, in
from . import cn2an
File "/tmp/pip-install-c6k7n574/cn2an/cn2an/cn2an.py", line 3, in
from . import utils
File "/tmp/pip-install-c6k7n574/cn2an/cn2an/utils.py", line 3, in
import yaml
ModuleNotFoundError: No module named 'yaml'

My requirements file already includes pyyaml and cn2an.
gunicorn
uvicorn
loguru
pandas
sqlalchemy
redis
datetime
configparser
PyYAML>=5.1
cn2an
What is going on here?

Originally posted by @huanggengkeng in #10 (comment)

Simplified character ('参': \u53c2) is not in conf.py

Describe the bug
conf.py of the proces module maps the traditional Chinese character (參: \u53c3) to the simplified character ('参': \u53c2) through the dictionary T2S_DICT.

cn2an.py converts the traditional character (參: \u53c3) to simplified Chinese through proces.preprocess, then uses conf.py in the cn2an module to convert the Chinese numeral to an Arabic numeral.

However, conf.py in the cn2an module has no entry for the simplified character ('参': \u53c2), so it cannot recognize ('参': \u53c2) and convert it to an Arabic numeral.

To Reproduce
cn2an.cn2an('參','smart')

Desktop (please complete the following information):

  • OS: Windows
  • Browser : chrome

Problem with mixed digits and Chinese text

Describe the bug
"各地累计报告接种新冠病毒疫苗104974.4万剂次"
The Xinwen Lianbo news anchor read it as:
**十亿四千九百七十四**点四万剂次
but cn2an outputs:
十万零四千九百七十四点四万

Desktop (please complete the following information):

  • OS: Linux
  • Version cn2an==0.5.11

Error installing 0.3.2 on Ubuntu with Python 3.6

Full exception log:

  Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-0xvk4tfr/cn2an/setup.py", line 4, in <module>
        from cn2an import version
      File "/tmp/pip-build-0xvk4tfr/cn2an/cn2an/__init__.py", line 9, in <module>
        cn2an = cn2an.Cn2An().cn2an
      File "/tmp/pip-build-0xvk4tfr/cn2an/cn2an/cn2an.py", line 9, in __init__
        self.conf = utils.get_default_conf()
      File "/tmp/pip-build-0xvk4tfr/cn2an/cn2an/utils.py", line 8, in get_default_conf
        config_data = yaml.load(f_config, Loader=yaml.FullLoader)
      File "/usr/local/lib/python3.6/dist-packages/yaml/__init__.py", line 112, in load
        loader = Loader(stream)
      File "/usr/local/lib/python3.6/dist-packages/yaml/loader.py", line 24, in __init__
        Reader.__init__(self, stream)
      File "/usr/local/lib/python3.6/dist-packages/yaml/reader.py", line 85, in __init__
        self.determine_encoding()
      File "/usr/local/lib/python3.6/dist-packages/yaml/reader.py", line 124, in determine_encoding
        self.update_raw()
      File "/usr/local/lib/python3.6/dist-packages/yaml/reader.py", line 178, in update_raw
        data = self.stream.read(size)
      File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 15: ordinal not in range(128)
    
    ----------------------------------------

Switching to 0.3.1 works fine.

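Import error

Sorry, I just saw that only Python 3.6 and above is supported.

Versions below 3.6 do not support the f prefix (f-strings), so importing after a direct pip install raises an error.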

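Recognizing numbers that should not be converted

1.
Input: 原价都是全国统一零售价它是幺三八
Output: 原价都是全国统10售价它是138
统一零售价 should not be converted, right?
2.
Input: 卖到几十块钱
Output: 卖到几10块钱
As I understand it, 几十块钱 does not need converting either.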

Originally posted by @mengxifeng in #26 (comment)

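Amounts mixing digits with Chinese units are not supported

Background

Most speech-recognition software, when turning spoken Chinese amounts into text, outputs a mix of Arabic digits and Chinese units, for example:
Spoken “一百万” becomes the text “100万”
Spoken “十亿三千万” becomes the text “十亿3000万”
Spoken “一百万两千” becomes the text “100万两千”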

Expected feature

cn2an.cn2an("100万")
>>> 1000000
cn2an.cn2an("十亿3000万")
>>> 1030000000
cn2an.cn2an("100万两千")
>>> 1002000

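Current bug

In colloquial Chinese, when a number has only two adjacent units, people often drop the second unit, for example:
Standard: “三万五千”, colloquial: “三万五”
Standard: “两千六百”, colloquial: “两千六”
Standard: “一百二十”, colloquial: “一百二”
When the input has no unit at the end, cn2an currently has a bug:

cn2an.cn2an("三万五")
>>> 30005 # expected 35000
cn2an.cn2an("两千六")
>>> 2006 # expected 2600
cn2an.cn2an("一百二")
>>> 102 # expected 120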
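
One possible way to handle the colloquial forms is to restore the omitted unit before conversion. This is a minimal sketch under that assumption; `expand_colloquial` and `NEXT_LOWER_UNIT` are hypothetical names, and the function only rewrites the string without calling cn2an.

```python
import re

# Unit implied by a trailing bare digit, keyed by the unit right before it.
NEXT_LOWER_UNIT = {"亿": "千万", "万": "千", "千": "百", "百": "十"}
DIGITS = "一二两三四五六七八九"


def expand_colloquial(text: str) -> str:
    """Append the omitted unit when text ends with <unit><digit>, e.g. 三万五 -> 三万五千."""
    match = re.search(rf"([亿万千百])[{DIGITS}]$", text)
    if match is None:
        return text  # nothing was omitted, e.g. 三万零五 or 一百二十
    return text + NEXT_LOWER_UNIT[match.group(1)]


for sample in ("三万五", "两千六", "一百二", "三万零五"):
    print(sample, "->", expand_colloquial(sample))
```

Inputs with an explicit 零 before the last digit are left alone, since there the final digit really is in the ones place.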

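suggestion

suggestion_1

Use a regular expression to match the Chinese characters or Arabic digits that come before “个”, “十”, “百”, “千”, “万” and “亿”, and then convert them.

suggestion_2

The Chinese part could follow the implementation at this link.
Digit extraction could follow suggestion_1.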
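
The regex idea in suggestion_1 can be sketched as a small tokenizer plus an accumulator. This is an illustrative stand-alone parser, not cn2an's implementation; `parse_mixed` and the constants are hypothetical, and it only covers non-negative integers built from Arabic digit runs, Chinese digits, and the units 十, 百, 千, 万, 亿.

```python
import re

CN_DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
# One token is an Arabic digit run, a single Chinese digit, or a single unit.
TOKEN = re.compile(r"[0-9]+|[零一二两三四五六七八九]|[十百千万亿]")


def parse_mixed(text: str) -> int:
    """Convert a mix of Arabic digits and Chinese numerals/units to an int."""
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text:
        raise ValueError(f"unsupported characters in {text!r}")
    total = section = number = 0
    for token in tokens:
        if token in SMALL_UNITS:
            # A bare unit such as the leading 十 in 十亿 counts as one of that unit.
            section += (number or 1) * SMALL_UNITS[token]
            number = 0
        elif token == "万":
            total += (section + number) * 10_000
            section = number = 0
        elif token == "亿":
            total = (total + section + number) * 100_000_000
            section = number = 0
        elif token in CN_DIGITS:
            number = CN_DIGITS[token]
        else:
            number = int(token)  # Arabic digit run such as "3000"
    return total + section + number


for sample in ("100万", "十亿3000万", "100万两千"):
    print(sample, "->", parse_mixed(sample))
```

Decimals, negative numbers and colloquial omissions such as “三万五” are deliberately left out of this sketch.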

illegal multibyte sequence

File "C:\Users\testter\AppData\Local\Temp\pycharm-packaging\cn2an\setup.py", line 21, in
long_description=open("./README.md", "r").read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 62: illegal multibyte sequence

How do I fix this error? I'm currently on Python 3.6 and also tried 3.7, with the same error.
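
The traceback shows `open("./README.md", "r")` being decoded with the platform default codec (gbk here). Assuming README.md is stored as UTF-8, passing the encoding explicitly is the usual remedy. The sketch below demonstrates the idea on a throwaway file; `read_text_utf8` is a hypothetical helper, not code from cn2an's setup.py.

```python
import os
import tempfile


def read_text_utf8(path: str) -> str:
    """Read a text file as UTF-8 regardless of the platform's default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Demonstration with a temporary file that contains non-ASCII text.
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as f:
        f.write("快速转化")
    print(read_text_utf8(readme))
```

Without the `encoding` argument, `open` falls back to the locale's preferred encoding, which is why the same file can install cleanly on one machine and fail on another.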

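New feature requests 🎈, click here 👇

If you have a new requirement, you can discuss it in this issue.

For example: I have a need for converting to "RMB uppercase" numerals.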

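Suggestions for the library itself

A program is file-based, so I think of a program as a folder system; that is its most basic form.
It contains two parts: code and resources.
While packaging, I found that the folder system itself changes, which is a headache that whoever builds the exe has to deal with themselves.

It would be great if it just worked with a single line of code. You could keep the relevant resource files yourself and, if the library detects that it has been packaged, create the corresponding path structure in the temporary folder and place the resources there.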

Three examples from the official docs do not work

I tried
cn2an.cn2an("一二三")
cn2an.cn2an("1百23", "smart")
cn2an.an2cn("1百23", "smart")
None of these three work.

Here are the errors:

cn2an.cn2an("1百23", "smart")
Traceback (most recent call last):
File "", line 1, in
File "D:\softdown\python3.6.6\lib\site-packages\cn2an\cn2an.py", line 17, in cn2an
data_type = self.check_input_data_is_valid(inputs, mode)
File "D:\softdown\python3.6.6\lib\site-packages\cn2an\cn2an.py", line 42, in check_input_data_is_valid
raise ValueError(f"输入的数据不在转化范围内:{data}!")
ValueError: 输入的数据不在转化范围内:1!
cn2an.cn2an("一二三")
Traceback (most recent call last):
File "", line 1, in
File "D:\softdown\python3.6.6\lib\site-packages\cn2an\cn2an.py", line 17, in cn2an
data_type = self.check_input_data_is_valid(inputs, mode)
File "D:\softdown\python3.6.6\lib\site-packages\cn2an\cn2an.py", line 65, in check_input_data_is_valid
raise ValueError(f"不符合格式的数据:{integer_data}")
ValueError: 不符合格式的数据:一二三
cn2an.an2cn("1百23", "smart")
Traceback (most recent call last):
File "", line 1, in
File "D:\softdown\python3.6.6\lib\site-packages\cn2an\an2cn.py", line 11, in an2cn
raise ValueError("mode 仅支持 low up rmb smart 四种!")
ValueError: mode 仅支持 low up rmb smart 四种!

cn2an version: 0.3.6
Python version: 3.6.6
OS: Windows 10

Packaging succeeds but running fails

File "PyInstaller\loader\pyimod02_importers.py", line 344, in get_data
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\ADMINI~1\AppData\Local\Temp\_MEI200602\cn2an\config.yaml'
