
time_nlp's Issues

Runtime error: No module named 'regex._regex'

It runs fine locally, but after uploading to a cloud function it fails with No module named 'regex._regex':
{
"errorCode":-1,
"errorMessage":"Traceback (most recent call last):
File "/var/runtime/python3/bootstrap.py", line 133, in init_handler
func_handler = get_func_handler(file.rsplit(".", 1)[0], func)
File "/var/runtime/python3/bootstrap.py", line 159, in get_func_handler
mod = imp.load_module(mname, *imp.find_module(mname))
File "/var/lang/python3/lib/python3.6/imp.py", line 234, in load_module
return load_source(name, filename, file)
File "/var/lang/python3/lib/python3.6/imp.py", line 172, in load_source
module = _load(spec)
File "", line 675, in _load
File "", line 655, in _load_unlocked
File "", line 678, in exec_module
File "", line 205, in _call_with_frames_removed
File "/var/user/index.py", line 9, in <module>
from TimeNormalizer import TimeNormalizer # import the package
File "/var/user/TimeNormalizer.py", line 8, in <module>
import regex as re
File "/var/user/regex/__init__.py", line 1, in <module>
from .regex import *
File "/var/user/regex/regex.py", line 419, in <module>
import regex._regex_core as _regex_core
File "/var/user/regex/_regex_core.py", line 21, in <module>
import regex._regex as _regex
ModuleNotFoundError: No module named 'regex._regex'",
"requestId":"79e742a0-acdd-474d-bc2c-248fbaefdfdd",
"statusCode":443
}
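The issue does not say why the import fails. One plausible cause is that the regex package was copied from a local install built for a different platform or Python version. In that case the compiled _regex extension would be missing, or its filename suffix would not be one the runtime's Python 3.6 loads. This is an assumption, not a confirmed diagnosis. Below is a minimal diagnostic sketch built on the standard importlib.machinery.EXTENSION_SUFFIXES list. The function name inspect_extension is hypothetical.

```python
import importlib.machinery
import os
from typing import Dict, List


def inspect_extension(pkg_dir: str, module: str = "_regex") -> Dict[str, List[str]]:
    """Sort files named like '<module>.*' in pkg_dir into two groups:
    names the running interpreter would import as an extension module,
    and names it would ignore (e.g. a build for another platform)."""
    # Exact filenames the import system tries, e.g. "_regex.cpython-36m-x86_64-linux-gnu.so"
    wanted = {module + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES}
    report: Dict[str, List[str]] = {"loadable": [], "foreign": []}
    for fname in sorted(os.listdir(pkg_dir)):
        if not fname.startswith(module + "."):
            continue  # skips unrelated files such as _regex_core.py
        key = "loadable" if fname in wanted else "foreign"
        report[key].append(fname)
    return report
```

Inside the cloud function, you could call inspect_extension("/var/user/regex") (the path comes from the traceback). An empty "loadable" list would support this explanation. One way to address it would be to install regex into the deployment folder on Linux with the same Python version as the runtime.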

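Month-end (月末) and year-end (年末) are not recognized

3月末 (end of March) is recognized as 3月 (March), and 本月末 (end of this month) as 本月 (this month).

The regex appears to include a rule for these, so I'm not sure what the problem is.

PS: I'm running this on Windows.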
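Resolving these phrases requires knowing the length of the month. The sketch below is minimal and rests on two assumptions: 月末 maps to the last calendar day (time_nlp may prefer a span instead), and the base date supplies the year. The pattern and resolve_period_end are hypothetical, not time_nlp's own rule.

```python
import calendar
import re
from datetime import date
from typing import Optional

# Hypothetical pattern for "3月末" and "本月末"
_MONTH_END = re.compile(r"(本|(\d{1,2}))月末")


def resolve_period_end(text: str, base: date) -> Optional[date]:
    """Map N月末 / 本月末 / 年末 to the last day of that month or year."""
    m = _MONTH_END.search(text)
    if m:
        month = base.month if m.group(1) == "本" else int(m.group(2))
        if not 1 <= month <= 12:
            return None
        last_day = calendar.monthrange(base.year, month)[1]  # (weekday, ndays)[1]
        return date(base.year, month, last_day)
    if "年末" in text:
        return date(base.year, 12, 31)
    return None
```

Because the month length comes from calendar.monthrange, leap years are handled automatically: 2月末 in 2020 resolves to February 29.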

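Past and future time spans are not distinguished

Input: 未来一周 (the coming week)
(['未来1周'], '{"timedelta": {"second": 0, "minute": 0, "hour": 0, "day": 7, "year": 0, "month": 0}, "type": "timedelta"}')
Input: 过去一周 (the past week)
(['过去1周'], '{"timedelta": {"minute": 0, "hour": 0, "day": 7, "month": 0, "year": 0, "second": 0}, "type": "timedelta"}')
Both inputs produce the same unsigned 7-day timedelta, so this kind of expression still needs better handling.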
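One way to keep the direction is to sign the delta according to the leading 过去 (past) or 未来 (future). The sketch below is minimal and assumes the input is already normalized to Arabic numerals, as in the output above (未来1周). The unit table and signed_delta are hypothetical.

```python
import re
from datetime import timedelta
from typing import Optional

# Days per unit; only a few units are covered in this sketch
_UNIT_DAYS = {"天": 1, "日": 1, "周": 7, "星期": 7}
_SPAN = re.compile(r"(过去|未来)(\d+)(天|日|周|星期)")


def signed_delta(text: str) -> Optional[timedelta]:
    """Negative timedelta for 过去 (past), positive for 未来 (future)."""
    m = _SPAN.fullmatch(text)
    if m is None:
        return None
    sign = -1 if m.group(1) == "过去" else 1
    return timedelta(days=sign * int(m.group(2)) * _UNIT_DAYS[m.group(3)])
```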

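Suggested additions for time expressions

  1. Add common holidays, e.g. 春节 (Spring Festival), 元宵 (Lantern Festival), 五一 (Labor Day), 国庆 (National Day), 父亲节 (Father's Day).
  2. Add relative-time parsing, e.g. 1小时之后提醒我起床 ("remind me to get up in 1 hour"), with a pattern such as (?:再过|过|再过去|再经过)? 日期时间 ((?:之后|后|以后|过后)|(?:之前|前|以前)|(?:内|之内|以内|间|之间|里|中|中间))?, where 日期时间 stands for the date/time expression.
  3. Extend timedelta to support 3年半 (three and a half years) and 3年半个月 (three years and half a month).
  4. Where necessary, use word segmentation to decide whether a phrase is a time or a degree adverb, e.g. 一点都不喜欢你 ("I don't like you at all"), where 一点 does not mean one o'clock.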
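Suggestion 2 can be prototyped with Python's re module. The sketch below is minimal and makes two assumptions. First, it uses a deliberately simplified stand-in for 日期时间: an Arabic number followed by 小时, 分钟 or 天. Second, it treats 以内-style suffixes as a span starting now. resolve_relative is hypothetical. Longer alternatives are listed first so that, for example, 中间 matches whole rather than as 中.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Simplified stand-in for 日期时间: number + unit (assumption)
_UNITS = {"小时": "hours", "分钟": "minutes", "天": "days"}
_RELATIVE = re.compile(
    r"(?P<pre>再过去|再经过|再过|过)?"
    r"(?P<num>\d+)(?P<unit>小时|分钟|天)"
    r"(?:(?P<after>之后|以后|过后|后)"
    r"|(?P<before>之前|以前|前)"
    r"|(?P<within>之内|以内|之间|中间|内|间|里|中))?"
)


def resolve_relative(text: str, now: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end); start == end for a single point in time."""
    m = _RELATIVE.search(text)
    if m is None:
        return None
    if not (m.group("pre") or m.group("after") or m.group("before") or m.group("within")):
        return None  # a bare duration such as "3小时" is not a relative time
    delta = timedelta(**{_UNITS[m.group("unit")]: int(m.group("num"))})
    if m.group("before"):
        return now - delta, now - delta
    if m.group("within"):
        return now, now + delta
    return now + delta, now + delta  # 之后/后 suffix or a 再过-style prefix
```

Returning a (start, end) pair lets the three suffix families share one result type: points for 之后 and 之前, and a span for 以内.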

Not all time expressions are extracted when there are several

res = tn.parse(target="晚上八点到九点,明天中午给我")

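Output:

temp ['晚上8点', '9点', '明天中午']
{"type": "timespan", "timespan": ["2018-11-17 20:00:00", "2018-11-17 21:00:00"]}

Only the 20:00 to 21:00 span is returned. The extracted token 明天中午 (tomorrow noon) is missing from the result.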
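A possible workaround, untested against time_nlp, is to split the input at commas and semicolons and call tn.parse once per clause. Each clause's expressions would then be resolved separately. split_clauses is a hypothetical helper; only tn.parse comes from the issue.

```python
import re
from typing import List

# ASCII and full-width commas and semicolons (assumed clause boundaries)
_CLAUSE_SEP = re.compile(r"[,，;；]")


def split_clauses(text: str) -> List[str]:
    """Split text into non-empty, whitespace-trimmed clauses."""
    return [part.strip() for part in _CLAUSE_SEP.split(text) if part.strip()]
```

Usage would be something like: for clause in split_clauses("晚上八点到九点,明天中午给我"): print(tn.parse(target=clause)). Whether time_nlp then returns both results has not been checked.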

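The start time is missed in some cases

In the first case below, 明天 (tomorrow) becomes the start (00:00) and 上午9点 (9 a.m.) becomes the end, so 11点 is dropped. The other two cases give 09:00 to 11:00 as expected.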

temp ['明天', '上午9点', '11点']
{"type": "timespan", "timespan": ["2019-05-13 00:00:00", "2019-05-13 09:00:00"]}
2019-5-12-14-5-14
temp ['早上9点', '11点']
{"type": "timespan", "timespan": ["2019-05-13 09:00:00", "2019-05-13 11:00:00"]}
2019-5-12-14-5-14
temp ['上午9点', '11点']
{"type": "timespan", "timespan": ["2019-05-13 09:00:00", "2019-05-13 11:00:00"]}
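The failing case suggests that the date-only token 明天 is paired with the first time-of-day token. One possible pre-processing step is to merge a date-only token into the token that follows it. The sketch below is minimal and assumes the temp token list is available before the span is built. DATE_ONLY and merge_date_tokens are hypothetical.

```python
from typing import List

# Hypothetical set of tokens that carry only a date, no time of day
DATE_ONLY = {"今天", "明天", "后天"}


def merge_date_tokens(tokens: List[str]) -> List[str]:
    """Attach a date-only token to the next token, e.g. 明天 + 上午9点."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i] in DATE_ONLY and i + 1 < len(tokens):
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

After merging, ['明天', '上午9点', '11点'] has the same two-token shape as the working cases, ['明天上午9点', '11点'].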

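"半个小时以内" (within half an hour) is not recognized

In testing, "一个小时以内" (within one hour) and "三十分钟以内" (within thirty minutes) are recognized correctly, but "半个小时以内" is not.

Also, "0.5小时以内" is recognized, but wrongly as 5 hours.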
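Both symptoms concern fractional hours: 半 (half) is not handled, and the decimal point in 0.5 appears to be lost, which leaves 5. The sketch below reads an hour count with an optional decimal part and an optional 半. It assumes Chinese numerals have already been converted to Arabic ones (一个小时 → 1个小时). parse_hours is hypothetical.

```python
import re
from typing import Optional

# Optional number (with decimals), optional 个, optional 半, optional 个, then 小时
_HOURS = re.compile(r"(?P<num>\d+(?:\.\d+)?)?个?(?P<half>半)?个?小时")


def parse_hours(text: str) -> Optional[float]:
    """Return the number of hours, e.g. 半个小时 -> 0.5, 1个半小时 -> 1.5."""
    m = _HOURS.search(text)
    if m is None or not (m.group("num") or m.group("half")):
        return None
    hours = float(m.group("num") or 0)  # float() keeps the decimal part of "0.5"
    if m.group("half"):
        hours += 0.5
    return hours
```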

If you run into bugs, you could try another time-parsing toolkit, JioNLP.

JioNLP

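import time
import jionlp as jio

# Print each result; the original reassigned res and only printed the last one.
# Outputs that use time.time() as the base depend on when the code runs.
print(jio.parse_time('今年9月', time_base={'year': 2021}))
print(jio.parse_time('零三年元宵节晚上8点半', time_base=time.time()))
print(jio.parse_time('一万个小时'))
print(jio.parse_time('100天之后', time.time()))
print(jio.parse_time('每周五下午4点', time.time()))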

# {'type': 'time_span', 'definition': 'accurate', 'time': ['2021-09-01 00:00:00', '2021-09-30 23:59:59']}
# {'type': 'time_point', 'definition': 'accurate', 'time': ['2003-02-15 20:30:00', '2003-02-15 20:30:59']}
# {'type': 'time_delta', 'definition': 'accurate', 'time': {'hour': 10000.0}}
# {'type': 'time_span', 'definition': 'blur', 'time': ['2021-10-22 00:00:00', 'inf']}
# {'type': 'time_period', 'definition': 'accurate', 'time': {'delta': {'day': 7}, 
#  'point': {'time': ['2021-07-16 16:00:00', '2021-07-16 16:59:59'], 'string': '周五下午4点'}}}

jio.ner.extract_time('据央视新闻消息,10月12日,福建省莆田市政府召开疫情防控情况新闻发布会,介绍最新情况。据通报,从本月10日至12日16时,大约两天时间内,累计报告新冠病毒核酸阳性64例,平均每日新增病例30例,其中确诊病例32例、无症状感染者32例。')

Parsing 晚上8点半 (8:30 in the evening) returns tomorrow evening instead of today

temp ['晚上8点半']
{"type": "timestamp", "timestamp": "2019-05-13 20:30:00"}
2019-5-12-14-5-14
temp ['晚上9点半']
{"type": "timestamp", "timestamp": "2019-05-13 21:30:00"}
2019-5-12-14-5-14
temp ['8点半']
{"type": "timestamp", "timestamp": "2019-05-13 08:30:00"}
2019-5-12-14-5-14
temp ['2月28日11点半']
{"type": "timestamp", "timestamp": "2020-02-28 11:30:00"}
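The 2019-5-12-14-5-14 lines appear to be the base time (2019-05-12 14:05:14); that reading is an assumption. Under it, 晚上8点半 (20:30) is still ahead on the same day and would be expected to resolve to 2019-05-12. 8点半 (08:30) has already passed, so 2019-05-13 is reasonable for it. The sketch below implements that "nearest future occurrence" rule. next_occurrence is hypothetical and not time_nlp's code.

```python
from datetime import datetime, timedelta


def next_occurrence(base: datetime, hour: int, minute: int = 0) -> datetime:
    """Today at hour:minute if that is still after base, otherwise tomorrow."""
    candidate = base.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= base:
        candidate += timedelta(days=1)
    return candidate
```

With base = datetime(2019, 5, 12, 14, 5, 14), this gives 2019-05-12 20:30 for 晚上8点半 and 2019-05-13 08:30 for 8点半.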
