
time_nlp's Introduction

Introduction

A Python 3 version of Time-NLP.
Python version: https://github.com/sunfiyes/Time-NLPY
Java version: https://github.com/shinyke/Time-NLP
PHP version: https://github.com/crazywhalecc/Time-NLP-PHP

Features

Extracts time expressions from a sentence and converts them to normalized values.
See Test.py for details.

from TimeNormalizer import TimeNormalizer

tn = TimeNormalizer()  # assumed default constructor, as used in Test.py

# target is the sentence to analyze; timeBase is the reference time (defaults to the current time)
res = tn.parse(target='过十分钟')
print(res)
res = tn.parse(target='2013年二月二十八日下午四点三十分二十九秒', timeBase='2013-02-28 16:30:29')
print(res)
res = tn.parse(target='我需要大概33天2分钟四秒', timeBase='2013-02-28 16:30:29')
print(res)
res = tn.parse(target='今年儿童节晚上九点一刻')
print(res)
res = tn.parse(target='2个小时以前')
print(res)
res = tn.parse(target='晚上8点到上午10点之间')
print(res)

Results:

{"timedelta": "0 days, 0:10:00", "type": "timedelta"}
{"timestamp": "2013-02-28 16:30:29", "type": "timestamp"}
{"type": "timedelta", "timedelta": {"year": 0, "month": 1, "day": 3, "hour": 0, "minute": 2, "second": 4}}
{"timestamp": "2018-06-01 21:15:00", "type": "timestamp"}
{"error": "no time pattern could be extracted."}
{"type": "timespan", "timespan": ["2018-03-16 20:00:00", "2018-03-16 10:00:00"]}
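The results above can also be consumed programmatically. The following is a minimal sketch, assuming `tn.parse` returns a JSON string shaped like the samples shown; the helper name `decode_result` is hypothetical and not part of time_nlp.

```python
import json
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def decode_result(raw):
    """Turn a result string into a (kind, value) pair. Hypothetical helper."""
    data = json.loads(raw)
    if "error" in data:
        return "error", data["error"]
    kind = data["type"]
    if kind == "timestamp":
        return kind, datetime.strptime(data["timestamp"], FMT)
    if kind == "timespan":
        return kind, [datetime.strptime(t, FMT) for t in data["timespan"]]
    # timedelta payloads appear both as a string and as a dict in the samples,
    # so they are passed through unchanged.
    return kind, data[kind]


print(decode_result('{"timestamp": "2013-02-28 16:30:29", "type": "timestamp"}'))
```

Checking for the "error" key first matters because error results carry no "type" field in the samples.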

Usage

Demo: python3 Test.py

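Improvements

| Issue | Previous version | Current version |
| 下下周末 (the weekend after next) could not be parsed | "timestamp": "2018-04-01 00:00:00" | "timestamp": "2018-04-08 00:00:00" |
| 3月4 (March 4, written without 日) could not be parsed | "2018-03-01" | "2018-03-04" |
| 初一, 初二 (first and second days of the lunar month) could not be parsed | cannot parse | "2018-02-16" |
| 晚上8点到上午10点之间 (between 8 pm and 10 am): 上午 (morning) could not be parsed | ["2018-03-16 20:00:00", "2018-03-16 22:00:00"] | ["2018-03-16 20:00:00", "2018-03-16 10:00:00"] |
| 3月21号 (March 21) was wrongly parsed as 2019 | "2019-03-21" | "2018-03-21" |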

Thanks to @tianyuningmou, the 24 solar terms are now supported.

temp = ['今年春分']
"timestamp" : "2020-03-20 00:00:00"

TODO

| Issue | Current version | Expected |
| 晚上8点到上午10点之间 (between 8 pm and 10 am) | ["2018-03-16 20:00:00", "2018-03-16 22:00:00"] | ["2018-03-16 20:00:00", "2018-03-17 10:00:00"] |
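The expected value in the TODO rolls the end of the span over to the next day. One way to get there as a post-processing step is sketched below; it assumes a timespan whose end is not later than its start should be read as crossing midnight. The function name `fix_overnight_span` is hypothetical and is not how time_nlp itself is implemented.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def fix_overnight_span(span):
    """Push the end of a [start, end] span to the next day if it precedes the start."""
    start, end = (datetime.strptime(t, FMT) for t in span)
    if end <= start:
        end += timedelta(days=1)
    return [start.strftime(FMT), end.strftime(FMT)]


print(fix_overnight_span(["2018-03-16 20:00:00", "2018-03-16 10:00:00"]))
# ['2018-03-16 20:00:00', '2018-03-17 10:00:00']
```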

time_nlp's People

Contributors

crazywhalecc, tianyuningmou, tobytyx, zhanzecheng

time_nlp's Issues

Parsing 晚上8点半 (8:30 pm) returns 8:30 pm tomorrow

temp ['晚上8点半']
{"type": "timestamp", "timestamp": "2019-05-13 20:30:00"}
2019-5-12-14-5-14
temp ['晚上9点半']
{"type": "timestamp", "timestamp": "2019-05-13 21:30:00"}
2019-5-12-14-5-14
temp ['8点半']
{"type": "timestamp", "timestamp": "2019-05-13 08:30:00"}
2019-5-12-14-5-14
temp ['2月28日11点半']
{"type": "timestamp", "timestamp": "2020-02-28 11:30:00"}
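In the log above the base time is 14:05 on 2019-05-12, so 8:30 pm on the same day is still in the future. The sketch below shows one reading of the behavior the issue asks for: pick the next occurrence of a time of day after the base time. The function name `resolve_time_of_day` is hypothetical, and this is not taken from the time_nlp source.

```python
from datetime import datetime, time, timedelta


def resolve_time_of_day(base, tod):
    """Return the first datetime after `base` whose clock time equals `tod`."""
    candidate = datetime.combine(base.date(), tod)
    if candidate <= base:
        # Already passed today, so move to tomorrow.
        candidate += timedelta(days=1)
    return candidate


base = datetime(2019, 5, 12, 14, 5, 14)
print(resolve_time_of_day(base, time(20, 30)))  # 2019-05-12 20:30:00
print(resolve_time_of_day(base, time(8, 30)))   # 2019-05-13 08:30:00
```

Under this rule 晚上8点半 resolves to today, while 8点半 read as 08:30 still resolves to tomorrow, which matches the third log entry.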

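Additions for time expressions in logs

  1. Add common holidays, such as Spring Festival, Lantern Festival, Labor Day (May 1), National Day, and Father's Day.
  2. Add relative-time parsing, for example 1小时之后提醒我起床 ("wake me up in 1 hour"), along the lines of: (?再过|过|再过去|再经过)? date/time ((?之后|后|以后|过后)|(?之前|前|以前)|(?内|之内|以内|间|之间|里|中|中间))?
  3. timedelta could be improved to support 3年半 (three and a half years) and 3年半个月 (three years and half a month).
  4. Where necessary, combine with word segmentation to decide whether an expression is a time or a degree adverb, as in 一点都不喜欢你 ("I don't like you one bit", where 一点 does not mean one o'clock).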
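Item 2 above can be sketched with the standard `re` module. This is a simplified illustration, not the pattern used in time_nlp: it handles only Arabic digits and four units, treats a missing suffix as "after", and ignores the 内/之内 ("within") group. The names `PATTERN`, `UNITS` and `relative_time` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Map Chinese unit words to timedelta keyword arguments.
UNITS = {"天": "days", "小时": "hours", "分钟": "minutes", "秒": "seconds"}

PATTERN = re.compile(
    r"(?:再过去|再经过|再过|过)?"                      # optional prefix
    r"(?P<num>\d+)个?(?P<unit>天|小时|分钟|秒)"        # amount and unit
    r"(?:(?P<after>之后|以后|过后|后)|(?P<before>之前|以前|前))?"  # optional direction
)


def relative_time(text, base):
    """Return base shifted by the first relative offset found in text, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    delta = timedelta(**{UNITS[m.group("unit")]: int(m.group("num"))})
    return base - delta if m.group("before") else base + delta


base = datetime(2019, 5, 12, 14, 5, 14)
print(relative_time("1小时之后提醒我起床", base))  # 2019-05-12 15:05:14
print(relative_time("2个小时以前", base))          # 2019-05-12 12:05:14
```

Because the pattern requires digits, a phrase such as 一点都不喜欢你 does not match here; telling times from adverbs in general still needs the segmentation step mentioned in item 4.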

Past and future time spans are not distinguished

Input: 未来一周 (the coming week)
(['未来1周'], '{"timedelta": {"second": 0, "minute": 0, "hour": 0, "day": 7, "year": 0, "month": 0}, "type": "timedelta"}')
Input: 过去一周 (the past week)
(['过去1周'], '{"timedelta": {"minute": 0, "hour": 0, "day": 7, "month": 0, "year": 0, "second": 0}, "type": "timedelta"}')
Handling of this kind of expression still needs improvement.
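Both inputs yield the same 7-day timedelta, so the direction is lost. A caller could restore it by looking at the prefix of the original text, as in the sketch below. It assumes the timedelta dictionary shown in the output, ignores the year and month fields to avoid calendar arithmetic, and treats anything that does not start with 过去 as pointing to the future. The function name `span_from_delta` is hypothetical.

```python
from datetime import datetime, timedelta


def span_from_delta(text, delta, base):
    """Turn an unsigned timedelta dict into a (start, end) span around base."""
    step = timedelta(
        days=delta.get("day", 0),
        hours=delta.get("hour", 0),
        minutes=delta.get("minute", 0),
        seconds=delta.get("second", 0),
    )
    if text.startswith("过去"):
        return base - step, base   # span that ends at the base time
    return base, base + step       # span that starts at the base time


delta = {"second": 0, "minute": 0, "hour": 0, "day": 7, "year": 0, "month": 0}
base = datetime(2019, 5, 12, 14, 5, 14)
print(span_from_delta("过去一周", delta, base))
print(span_from_delta("未来一周", delta, base))
```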

The start time is dropped in some cases

temp ['明天', '上午9点', '11点']
{"type": "timespan", "timespan": ["2019-05-13 00:00:00", "2019-05-13 09:00:00"]}
2019-5-12-14-5-14
temp ['早上9点', '11点']
{"type": "timespan", "timespan": ["2019-05-13 09:00:00", "2019-05-13 11:00:00"]}
2019-5-12-14-5-14
temp ['上午9点', '11点']
{"type": "timespan", "timespan": ["2019-05-13 09:00:00", "2019-05-13 11:00:00"]}

Runtime error: No module named 'regex._regex'

It runs fine locally, but after uploading to a cloud function it fails with No module named 'regex._regex'
{
"errorCode":-1,
"errorMessage":"Traceback (most recent call last):
File "/var/runtime/python3/bootstrap.py", line 133, in init_handler
func_handler = get_func_handler(file.rsplit(".", 1)[0], func)
File "/var/runtime/python3/bootstrap.py", line 159, in get_func_handler
mod = imp.load_module(mname, *imp.find_module(mname))
File "/var/lang/python3/lib/python3.6/imp.py", line 234, in load_module
return load_source(name, filename, file)
File "/var/lang/python3/lib/python3.6/imp.py", line 172, in load_source
module = _load(spec)
File "", line 675, in _load
File "", line 655, in _load_unlocked
File "", line 678, in exec_module
File "", line 205, in _call_with_frames_removed
File "/var/user/index.py", line 9, in <module>
from TimeNormalizer import TimeNormalizer # import the package
File "/var/user/TimeNormalizer.py", line 8, in <module>
import regex as re
File "/var/user/regex/__init__.py", line 1, in <module>
from .regex import *
File "/var/user/regex/regex.py", line 419, in <module>
import regex._regex_core as _regex_core
File "/var/user/regex/_regex_core.py", line 21, in <module>
import regex._regex as _regex
ModuleNotFoundError: No module named 'regex._regex'",
"requestId":"79e742a0-acdd-474d-bc2c-248fbaefdfdd",
"statusCode":443
}
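The issue does not state a cause. The `regex` package ships a compiled extension module, `_regex`, and a common reason for this error (an assumption here, not confirmed in the thread) is that the uploaded copy was built for a different platform or Python version than the cloud runtime. A small check like the following, run inside the target environment, can confirm whether the extension can be located there. The helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate the named module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package is missing or itself fails to import.
        return False


print(module_available("regex._regex"))
```

If this prints False in the cloud function but True locally, installing `regex` in an environment that matches the runtime before packaging is the usual next step.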

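月末 (end of month) and 年末 (end of year) are not recognized

3月末 (end of March) is recognized as 3月 (March), and 本月末 (end of this month) as 本月 (this month).

The regular expressions do seem to contain checks for these, so I don't know what the problem is.

PS: I am running on Windows.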
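For reference, the date that 月末 points to can be computed with the standard library. This sketch assumes 月末 should resolve to the last calendar day of the month; the issue does not say whether a single day or a span of the final days is expected. The function name `month_end` is hypothetical.

```python
import calendar
from datetime import date


def month_end(year, month):
    """Return the last calendar day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


print(month_end(2019, 3))  # 2019-03-31
print(month_end(2020, 2))  # 2020-02-29
```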

"半个小时以内" (within half an hour) is not recognized

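In testing, “一个小时以内” (within one hour) and “三十分钟以内” (within thirty minutes) are recognized correctly, but “半个小时以内” (within half an hour) is not.

In addition, “0.5小时以内” (within 0.5 hours) is recognized, but wrongly, as 5 hours.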
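A plausible explanation for the 0.5 case (not verified against the time_nlp source) is a digit pattern that does not allow a decimal point, so only the trailing "5" is matched. The sketch below first reproduces that effect with a naive pattern, then shows a pattern that accepts both 半 and decimals. It handles Arabic digits only, and the function name `hours_to_minutes` is hypothetical.

```python
import re

# A digits-only pattern skips "0." and matches just the trailing digit.
print(re.search(r"\d+小时", "0.5小时以内").group())  # 5小时


def hours_to_minutes(text):
    """Read an hour amount written as 半, an integer, or a decimal; return minutes."""
    m = re.search(r"(半|\d+(?:\.\d+)?)个?小时", text)
    if m is None:
        return None
    value = 0.5 if m.group(1) == "半" else float(m.group(1))
    return round(value * 60)


print(hours_to_minutes("半个小时以内"))  # 30
print(hours_to_minutes("0.5小时以内"))   # 30
```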

If you run into a bug, you might try another time-parsing toolkit, JioNLP

JioNLP

import time
import jionlp as jio

# Print each result so the output matches the five sample lines below
res = jio.parse_time('今年9月', time_base={'year': 2021})
print(res)
res = jio.parse_time('零三年元宵节晚上8点半', time_base=time.time())
print(res)
res = jio.parse_time('一万个小时')
print(res)
res = jio.parse_time('100天之后', time.time())
print(res)
res = jio.parse_time('每周五下午4点', time.time())
print(res)

# {'type': 'time_span', 'definition': 'accurate', 'time': ['2021-09-01 00:00:00', '2021-09-30 23:59:59']}
# {'type': 'time_point', 'definition': 'accurate', 'time': ['2003-02-15 20:30:00', '2003-02-15 20:30:59']}
# {'type': 'time_delta', 'definition': 'accurate', 'time': {'hour': 10000.0}}
# {'type': 'time_span', 'definition': 'blur', 'time': ['2021-10-22 00:00:00', 'inf']}
# {'type': 'time_period', 'definition': 'accurate', 'time': {'delta': {'day': 7}, 
#  'point': {'time': ['2021-07-16 16:00:00', '2021-07-16 16:59:59'], 'string': '周五下午4点'}}}

jio.ner.extract_time('据央视新闻消息,10月12日,福建省莆田市政府召开疫情防控情况新闻发布会,介绍最新情况。据通报,从本月10日至12日16时,大约两天时间内,累计报告新冠病毒核酸阳性64例,平均每日新增病例30例,其中确诊病例32例、无症状感染者32例。')

Multiple time expressions are not all extracted

res = tn.parse(target="晚上八点到九点,明天中午给我")

Result:

temp ['晚上8点', '9点', '明天中午']
{"type": "timespan", "timespan": ["2018-11-17 20:00:00", "2018-11-17 21:00:00"]}
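In the result above all three expressions are found in temp, but only the first span is returned. A possible workaround on the caller side, sketched here, is to split the sentence on punctuation and parse each clause separately. This is an assumption about usage, not a fix inside time_nlp. The function name `parse_each_clause` is hypothetical, and `len` stands in for the real parser so that the block runs on its own.

```python
import re


def parse_each_clause(text, parse):
    """Split text on Chinese or ASCII punctuation and apply parse to each clause."""
    clauses = [c for c in re.split(r"[,,。;;]", text) if c]
    return [(c, parse(c)) for c in clauses]


# Stand-in parser for demonstration; with time_nlp one would pass something like
# lambda c: tn.parse(target=c) instead of len.
for clause, result in parse_each_clause("晚上八点到九点,明天中午给我", len):
    print(clause, result)
```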
