
hawking's Introduction

HAWKING

A natural language date parser written in Java


About

Given any date expression in a sentence, Hawking will apply standard language recognition and parser techniques to produce a list of corresponding dates with optional parse and syntax information.

It supports almost every common date format, both absolute and relative (on Sunday, at 12 PM, on 04/11/2021, etc.).

Here is a sample input:
HawkingTimeParser parser = new HawkingTimeParser();
String inputText = "Good morning, Have a nice day. Shall we meet on December 20 ?";
Date referenceDate = new Date(); // reference time, explained in detail below
HawkingConfiguration hawkingConfiguration = new HawkingConfiguration(); // explained in detail below
// "eng" refers to the English language
DatesFound datesFound = parser.parse(inputText, referenceDate, hawkingConfiguration, "eng");

The output object "datesFound" contains the following information and more:
Text : on December 20
Start : 2021-12-20T00:00:00.000+05:30
End : 2021-12-20T23:59:59.000+05:30

Check out the Demo file: HawkingDemo

Releases

Hawking date time parser releases are available in the Maven Central repository

Models

The models used in Hawking Time Parser in the latest code are listed below.

*Parser Model : Custom NER model to parse all dates in a sentence

src/main/resources/parser/parser.crf.ser.gz

*Stanford NLP POS Tagger: Used for Tense Prediction

src/main/resources/tense/english-left3words-distsim.tagger

Key Features

Here are some key features of Hawking Time Parser:

Tense Consideration :

The tense of the sentence is considered while parsing a date.

I met you on Monday
I will meet you on Monday
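
As a rough illustration of why tense matters, the sketch below resolves a bare weekday to the previous or the next occurrence depending on tense. It uses only java.time; the class TenseSketch, the Tense enum, and resolveWeekday are hypothetical names for this example and are not part of Hawking's API or internals.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical illustration only; Hawking's own tense handling is model-based.
final class TenseSketch {
    enum Tense { PAST, FUTURE }

    // Past tense looks back to the previous occurrence of the weekday,
    // future tense looks ahead to the next one (both strictly exclude the reference day).
    static LocalDate resolveWeekday(DayOfWeek day, Tense tense, LocalDate reference) {
        return tense == Tense.PAST
                ? reference.with(TemporalAdjusters.previous(day))
                : reference.with(TemporalAdjusters.next(day));
    }

    public static void main(String[] args) {
        LocalDate reference = LocalDate.of(2021, 4, 8); // a Thursday
        // "I met you on Monday"
        System.out.println(resolveWeekday(DayOfWeek.MONDAY, Tense.PAST, reference));   // 2021-04-05
        // "I will meet you on Monday"
        System.out.println(resolveWeekday(DayOfWeek.MONDAY, Tense.FUTURE, reference)); // 2021-04-12
    }
}
```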

Context Understanding :

Hawking Time Parser can understand and detect the date components based on context.

Good morning, I am so happy to see you(No dates)
Sun rises in the east(No dates) -- On sun, John met Lisa(Date: on sun)
I watched Tomorrow Never Dies(No dates) -- Can we go to the movie tomorrow ?(Date: tomorrow)

Multiple Dates :

It can detect multiple dates for a given input.

I am going for a trade show at evening. From December 2nd, most of the companies will be open in Tamil Nadu. (Dates: at evening, from December 2nd)

Date Time Relation and Duration :

Hawking Time Parser can parse durations and time spans.

Next 2 weeks, I am going to London for a conference.
We have been working on this project for 3 months.
Call me in 2 hours.

Prefix & Postfix Importance :

Prefixes and postfixes play a vital role in parsing date-time text. Each prefix or postfix refers to a different date and time.

Prefix -> Since Monday, Next Monday, within Monday, for Monday.
Postfix -> 2 weeks back, 10 months ago, 5 hours later.

Hawking Time Parser accounts for 30+ prefixes and postfixes while parsing and returns the date that the input refers to (since, from, till, until, within, for, at, on, in, next, last, past and so on).
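
To make the role of a postfix concrete, here is a minimal sketch that shifts a reference time backward for "ago" and "back" and forward for "later". It covers only these three postfixes and uses plain java.time; PostfixSketch and shift are hypothetical names, and Hawking's real handling of its 30+ prefixes and postfixes is not shown here.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical illustration only; not Hawking's implementation.
final class PostfixSketch {
    // Shifts the reference by `amount` units in the direction implied by the postfix.
    static LocalDateTime shift(LocalDateTime reference, long amount, ChronoUnit unit, String postfix) {
        switch (postfix.toLowerCase(Locale.ROOT)) {
            case "ago":
            case "back":
                return reference.minus(amount, unit);
            case "later":
                return reference.plus(amount, unit);
            default:
                throw new IllegalArgumentException("Unsupported postfix: " + postfix);
        }
    }

    public static void main(String[] args) {
        LocalDateTime reference = LocalDateTime.of(2020, 11, 1, 9, 0);
        System.out.println(shift(reference, 2, ChronoUnit.WEEKS, "back"));   // 2020-10-18T09:00
        System.out.println(shift(reference, 10, ChronoUnit.MONTHS, "ago"));  // 2020-01-01T09:00
        System.out.println(shift(reference, 5, ChronoUnit.HOURS, "later"));  // 2020-11-01T14:00
    }
}
```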

Complex cases :

Hawking can detect and parse date-time expressions that are more than five words long.

next year 1st weekend Sunday morning 9 am

Reference time :

A reference time is the time relative to which date text is parsed.

  *Let's say reference time = 01/11/2020 9:00 AM
    Input : "I will call at 2 PM"
    Output = 01/11/2020 2:00 PM

  *Let's say reference time = 01/11/2020 8:00 PM
    Input : "I will call at 2 PM"
    Output = 02/11/2020 2:00 PM (the input is in the future tense and 2 PM has
    already passed at 8 PM, so it refers to tomorrow)

Parsing dates relative to the input reference time is therefore the basis of Hawking Time Parser. If no reference time is passed, the current system time is used as the default reference time.
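
The two cases above can be reproduced with a few lines of java.time. This is only a sketch of the rule as the examples describe it (a future-tense clock time that has already passed rolls over to the next day); ReferenceTimeSketch and resolveFuture are hypothetical names, not Hawking code.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical illustration only; not Hawking's implementation.
final class ReferenceTimeSketch {
    // Resolves a bare clock time in a future-tense sentence:
    // today if the time is still ahead of the reference, otherwise tomorrow.
    static LocalDateTime resolveFuture(LocalTime spoken, LocalDateTime reference) {
        LocalDateTime sameDay = reference.toLocalDate().atTime(spoken);
        return sameDay.isAfter(reference) ? sameDay : sameDay.plusDays(1);
    }

    public static void main(String[] args) {
        LocalTime twoPm = LocalTime.of(14, 0); // "I will call at 2 PM"
        // Reference 01/11/2020 9:00 AM (day/month/year, as in the examples above)
        System.out.println(resolveFuture(twoPm, LocalDateTime.of(2020, 11, 1, 9, 0)));  // 2020-11-01T14:00
        // Reference 01/11/2020 8:00 PM
        System.out.println(resolveFuture(twoPm, LocalDateTime.of(2020, 11, 1, 20, 0))); // 2020-11-02T14:00
    }
}
```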

Time Zone Handling :

Hawking Time Parser can parse dates and times along with time zones. It currently supports 500+ time zones.

  Call me at 9 AM Singapore time
  I will schedule demo at 13:30 PST
  Meeting will start at 10:00 AM (Asia/Calcutta)
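
For comparison, the sketch below shows how the JDK itself resolves zone names like the ones in these inputs. It assumes that "Singapore time" maps to the region ID Asia/Singapore and relies on the JDK's ZoneId.SHORT_IDS alias map for "PST"; TimeZoneSketch and its methods are hypothetical names, and Hawking's own zone lookup may work differently.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical illustration only; not Hawking's implementation.
final class TimeZoneSketch {
    // SHORT_IDS maps legacy abbreviations such as "PST" to region IDs;
    // full region IDs such as "Asia/Calcutta" are looked up directly.
    static ZoneId zoneFor(String name) {
        return ZoneId.of(name, ZoneId.SHORT_IDS);
    }

    // Converts a local date and time in the named zone to a UTC instant.
    static Instant toInstant(LocalDate date, LocalTime time, String zoneName) {
        return ZonedDateTime.of(date, time, zoneFor(zoneName)).toInstant();
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2020, 11, 2);
        System.out.println(toInstant(day, LocalTime.of(9, 0), "Asia/Singapore")); // 2020-11-02T01:00:00Z
        System.out.println(toInstant(day, LocalTime.of(13, 30), "PST"));          // 2020-11-02T21:30:00Z
        System.out.println(toInstant(day, LocalTime.of(10, 0), "Asia/Calcutta")); // 2020-11-02T04:30:00Z
    }
}
```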

Configurations

Hawking Time Parser is versatile: users can configure it so that date text is parsed the way they need.

Needs for configuration :

Even though dates and times are referred to by the same common words everywhere in English, the meaning of date text can differ from person to person, locale to locale, country to country, and business to business.

When someone says "we will go out on weekend "
  -> In India it's Saturday and Sunday
  -> In UAE it's Friday and Saturday

Date formats :
  In Europe  = DD/MM/YYYY
  In United States = MM/DD/YYYY
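
The date-format ambiguity is easy to demonstrate with the JDK's DateTimeFormatter: the same text yields two different dates depending on which convention is assumed. DateFormatSketch and its parse method are hypothetical names for this example; they are not Hawking configuration options.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not Hawking's implementation.
final class DateFormatSketch {
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("dd/MM/uuuu");
    static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("MM/dd/uuuu");

    // Parses the same text under either convention.
    static LocalDate parse(String text, boolean dayFirst) {
        return LocalDate.parse(text, dayFirst ? DAY_FIRST : MONTH_FIRST);
    }

    public static void main(String[] args) {
        System.out.println(parse("04/11/2021", true));  // 2021-11-04 (DD/MM/YYYY)
        System.out.println(parse("04/11/2021", false)); // 2021-04-11 (MM/DD/YYYY)
    }
}
```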

Hawking configurations therefore enable users to control the date and time output based on their own perspective and requirements.

Cases for configurations :

Hawking Time Parser allows the user to configure 30+ date and time components based on their needs.

*Show me meeting schedules for next 2 days

The user can expect either the next 48 hours or the next 2 days (starting from tomorrow).

*Show me meeting schedules for next week

WeekStart and WeekEnd can be set by the user, for example Sunday to Saturday. For a business model, the user can set the week to cover only weekdays, Monday to Friday.

*I will call you at 12 am

The user can set their own time zone as the default time zone instead of mentioning a time zone in every input.

*Schedule meeting on 07/12/2020

The user can set a default date format such as DD/MM/YYYY or MM/DD/YYYY. More than 20 date formats are supported.
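
To show how a week-start setting changes the answer to "next week", the sketch below computes the range for a Sunday-to-Saturday week and for a Monday-to-Friday business week. WeekConfigSketch, nextWeek, and the daysInWeek parameter are hypothetical names chosen for this example; the actual setting names live in HawkingConfiguration and are not reproduced here.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical illustration only; not Hawking's implementation.
final class WeekConfigSketch {
    // Returns {first day, last day} of the week that follows the one containing `reference`.
    // `next` is strictly after the reference, so a reference that falls on the
    // week-start day still moves to the following week.
    static LocalDate[] nextWeek(LocalDate reference, DayOfWeek weekStart, int daysInWeek) {
        LocalDate start = reference.with(TemporalAdjusters.next(weekStart));
        return new LocalDate[] { start, start.plusDays(daysInWeek - 1) };
    }

    public static void main(String[] args) {
        LocalDate reference = LocalDate.of(2021, 4, 8); // a Thursday

        LocalDate[] full = nextWeek(reference, DayOfWeek.SUNDAY, 7);
        System.out.println(full[0] + " to " + full[1]);         // 2021-04-11 to 2021-04-17

        LocalDate[] business = nextWeek(reference, DayOfWeek.MONDAY, 5);
        System.out.println(business[0] + " to " + business[1]); // 2021-04-12 to 2021-04-16
    }
}
```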

Business Custom Cases:

Hawking time parser supports business date cases like Fiscal year, Financial year, Annual year, quarterly, Q3, etc.

* Revenue generated this quarter
* In the last fiscal year, our company reached more than 20 million users

Users can also configure the start and end of the fiscal year based on their business model (February to January, April to March, and so on).
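
As a small worked example of why the fiscal year start matters, the sketch below derives the fiscal quarter and the fiscal year start for a date under two different start months. FiscalSketch and its methods are hypothetical names; this is plain calendar arithmetic, not Hawking's configuration API.

```java
import java.time.LocalDate;
import java.time.Month;

// Hypothetical illustration only; not Hawking's implementation.
final class FiscalSketch {
    // Fiscal quarter (1 to 4) of `date` when the fiscal year begins in `fiscalStart`.
    static int fiscalQuarter(LocalDate date, Month fiscalStart) {
        int monthsIntoYear = Math.floorMod(date.getMonthValue() - fiscalStart.getValue(), 12);
        return monthsIntoYear / 3 + 1;
    }

    // First day of the fiscal year that contains `date`.
    static LocalDate fiscalYearStart(LocalDate date, Month fiscalStart) {
        LocalDate candidate = LocalDate.of(date.getYear(), fiscalStart, 1);
        return candidate.isAfter(date) ? candidate.minusYears(1) : candidate;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2021, 2, 15);
        // April-to-March fiscal year: mid-February falls in the last quarter.
        System.out.println(fiscalQuarter(date, Month.APRIL));      // 4
        System.out.println(fiscalYearStart(date, Month.APRIL));    // 2020-04-01
        // February-to-January fiscal year: the same date opens the first quarter.
        System.out.println(fiscalQuarter(date, Month.FEBRUARY));   // 1
        System.out.println(fiscalYearStart(date, Month.FEBRUARY)); // 2021-02-01
    }
}
```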

To understand the Hawking configuration input, check: HawkingConfiguration

hawking's People

Contributors

ajar19, arulvendhan, gooddima96, venkat007-dev


hawking's Issues

Is there a non-GPL3 option?

Although the docs say that the library is great for business apps, GPL3 isn't that useful for them. Does Zoho sell commercial licenses or otherwise provide some way to get a non-GPL3 version of the library?

One day difference

I was trying to get a date range based on the given example in the readme file in Hawking library.

My code:

        HawkingTimeParser parser = new HawkingTimeParser();
        String inputText = "Good morning, Have a nice day. Shall we meet on December 20 between 8 and 9 am ?";
        Date referenceDate = new Date();
        HawkingConfiguration hawkingConfiguration = new HawkingConfiguration();
        DatesFound datesFound = parser.parse(inputText, referenceDate, hawkingConfiguration, "eng");

        for (ParserOutput dateGroup : datesFound.getParserOutputs()) {
            System.out.println(dateGroup.getDateRange().getStart());
            System.out.println(dateGroup.getDateRange().getEnd());
        }

Expected:

2024-12-20T01:08:00.000+11:00
2024-12-20T10:09:00.000+11:00

Actual:

2024-12-20T01:00:00.000+11:00
2024-12-21T00:59:59.000+11:00
2024-12-20T10:00:00.000+11:00
2024-12-20T10:00:00.000+11:00

Any ideas why this may be happening or how I can overcome the issue?

Some observations that failed:

Note : All timezones are set to IST. Tested on 08-04-2021 (Thursday IST).

Input : "there was a conference day before yesterday"
Output -> "endDateFormat": "2021-04-08T05:29:59" (Failed)

Input : "there will be a conference from today till next monday"
Output -> "startDateFormat": "2021-04-12T05:30:00", "endDateFormat": "2021-04-13T05:29:59" (Failed for start date)

Input : "there will be a conference today and next friday" (there are 2 spaces between "and" and "next")
Output : "startDateFormat": "2021-04-16T05:30:00", "endDateFormat": "2021-04-17T05:29:59" (Failed for start date)

Could you please help me out in these?

from until parsing

Hi,
I was running the demo on a few scenarios, and for some reason durations are not coming out correctly.
For example:
Input = "He worked in Google from 2012 to 2020."
The expected output should capture the duration from 2012 to 2020 together.
However, the actual output =
Jul 10, 2021 2:50:39 AM com.zoho.hawking.utils.DateTimeProperties emptyDatesRemover
INFO: DateTimeExtractor :: Unparsed date is present :: to 2020
Label : D
Text : from 2012
IsTimeZonePresent : false
Time Zone : +00:00
IsExactTimePresent : false
Id : 0
End Index : 29
Start Index : 20
Match Type :
Start : 2011-12-31T19:00:00.000-05:00
End : null
Sequence :MULTIPLE
Expression :(0)between(1)

"to 2020" is left unparsed.

Similarly, if a sentence has multiple comma-separated examples, it picks up only part of each, either the start date or the end date, but not the complete start and end date together.

Relative input truncates the less specific details of start to 0

Hey, so I'm using this library to set up a reminder system, but when I put in text for a relative time frame, it loses precision. My expectation is that it'll use the exact moment for the calculations. For example:

The current date is 2022-08-22T16:50:10.000-07:00.

"In a year"

Expectation: Start : 2022-08-22T16:50:10.000-07:00. End : 2023-08-22T16:50:09.000-07:00.
Reality: The result is Start : 2022-12-31T23:00:00.000-08:00 End : 2023-12-31T22:59:59.000-08:00

"In a month"

Expectation: Start : 2022-08-22T16:50:10.000-07:00. End : 2022-09-22T16:50:09.000-07:00.
Reality: The result is Start : 2022-09-01T00:00:00.000-07:00 End : 2022-09-30T23:59:59.000-07:00

This even applies to lower values:

"In a minute"

Expectation: Start : 2022-08-22T16:50:10.000-07:00. End : 2022-08-22T16:51:09.000-07:00
Reality: The result is Start : 2022-08-22T16:51:00.000-07:00 End : 2022-08-22T16:51:59.000-07:00

I am unsure whether I am simply configuring this wrong or whether it is a bug. My configuration is:

HawkingTimeParser parser = new HawkingTimeParser();
Date referenceDate = new Date();
HawkingConfiguration hawkingConfiguration = new HawkingConfiguration();
hawkingConfiguration.setTimeZone("PST");
DatesFound datesFound = parser.parse(when, referenceDate, hawkingConfiguration, "eng");
System.out.println(datesFound.toString());

Please let me know if this is a problem with my configuration, but I don't believe it is, as it is set up pretty much the same way as the example. I really appreciate this library existing; it is one of the only ones of its sort in Java, and if this could be fixed it would make using it much easier. Thanks.
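
The two readings in this report can be contrasted in plain java.time. The sketch below is not a fix and not Hawking code: exactSpan follows the expectation described above (one unit, starting at the reference moment), while calendarSpan reproduces the reported results for "in a month" and "in a minute" by snapping to the next whole calendar unit. RelativeSpanSketch and its method names are hypothetical, and time zones are left out for brevity.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; not Hawking's implementation.
final class RelativeSpanSketch {
    // Expected reading: one unit long, starting at the exact reference moment.
    static LocalDateTime[] exactSpan(LocalDateTime reference, ChronoUnit unit) {
        return new LocalDateTime[] { reference, reference.plus(1, unit).minusSeconds(1) };
    }

    // Reported reading: the next whole calendar unit after the reference.
    static LocalDateTime[] calendarSpan(LocalDateTime reference, ChronoUnit unit) {
        LocalDateTime start = floor(reference.plus(1, unit), unit);
        return new LocalDateTime[] { start, start.plus(1, unit).minusSeconds(1) };
    }

    // Rounds down to the start of the unit.
    private static LocalDateTime floor(LocalDateTime time, ChronoUnit unit) {
        switch (unit) {
            case YEARS:
                return time.toLocalDate().withDayOfYear(1).atStartOfDay();
            case MONTHS:
                return time.toLocalDate().withDayOfMonth(1).atStartOfDay();
            default:
                return time.truncatedTo(unit); // valid for DAYS and smaller units
        }
    }

    public static void main(String[] args) {
        LocalDateTime reference = LocalDateTime.of(2022, 8, 22, 16, 50, 10);
        LocalDateTime[] exact = exactSpan(reference, ChronoUnit.MONTHS);
        System.out.println(exact[0] + " to " + exact[1]);       // 2022-08-22T16:50:10 to 2022-09-22T16:50:09
        LocalDateTime[] calendar = calendarSpan(reference, ChronoUnit.MONTHS);
        System.out.println(calendar[0] + " to " + calendar[1]); // 2022-09-01T00:00 to 2022-09-30T23:59:59
    }
}
```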

Increase In Heap Memory due to "wordToSubstrings"

During parser model training, the flag "cacheNGrams" is set to true in the training props so that NGram features can be reused; these are recorded in a HashMap named "wordToSubstrings". During prediction, any new words not seen in training are added to the same HashMap along with their NGram features, which in turn increases the object size and the heap memory used while running on a server.
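
The growth pattern described here, and one possible way to cap it, can be sketched with a size-bounded LinkedHashMap. This is an assumption-laden illustration, not the library's code: NGramCacheSketch, its maxEntries cap, and the allSubstrings helper (a stand-in for real NGram feature extraction) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the parser model's actual cache.
final class NGramCacheSketch {
    private final Map<String, List<String>> wordToSubstrings;

    NGramCacheSketch(final int maxEntries) {
        // Access-ordered map that evicts the least recently used word once the cap is exceeded.
        // An ordinary HashMap here would grow with every unseen word.
        this.wordToSubstrings = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns cached substrings for a word, computing and caching them on first use.
    List<String> substrings(String word) {
        return wordToSubstrings.computeIfAbsent(word, NGramCacheSketch::allSubstrings);
    }

    int size() {
        return wordToSubstrings.size();
    }

    // Stand-in feature extractor: every contiguous substring of the word.
    static List<String> allSubstrings(String word) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < word.length(); start++) {
            for (int end = start + 1; end <= word.length(); end++) {
                result.add(word.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NGramCacheSketch cache = new NGramCacheSketch(3);
        for (String word : new String[] { "one", "two", "three", "four", "five" }) {
            cache.substrings(word);
        }
        System.out.println(cache.size()); // 3
    }
}
```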

Timing issue with parsing strings

I am looking at implementing the parser in my code and I am running into issues with the amount of time the parser takes to parse a simple string.

The first time that the parser is called, it took about 18 seconds to parse the string. The second time the parser was called, it took less than one second to parse the string. Note that the first string was "Event one tomorrow" and the second string was "Event two tomorrow".

My log output:

// The very first time the parser is called, it is pretty slow to return the parsed string
2024-06-11 18:40:31.959  4477-4648  TIMING                    com.x      D  in parser: 1718145631959 // time difference between HERE
2024-06-11 18:40:50.417  4477-4648  TIMING                    com.x      D  after parsing: 1718145650417 // and HERE is significant (18 sec)
2024-06-11 18:40:50.417  4477-4648  TIMING                    com.x      D  after get outputs: 1718145650417
2024-06-11 18:40:50.417  4477-4648  TIMING                    com.x      D  after get date groups: 1718145650417

// The second to n... time the parser is called, it is relatively quick
2024-06-11 18:40:59.715  4477-4649  TIMING                    com.x      D  in parser: 1718145659715 // here to below is super quick (< 1 sec)
2024-06-11 18:40:59.851  4477-4649  TIMING                    com.x      D  after parsing: 1718145659851 
2024-06-11 18:40:59.851  4477-4649  TIMING                    com.x      D  after get outputs: 1718145659851
2024-06-11 18:40:59.851  4477-4649  TIMING                    com.x      D  after get date groups: 1718145659851

I have tested this multiple times and always get the same results. Here is my code:

    public String[] parseEvent(String inputText, HawkingTimeParser dateTimeParser, HawkingConfiguration parserConfig) {

        Date referenceDate = new Date();
        DatesFound datesFound = null;

        DateTime startDateAndTime = null;
        DateTime endDateAndTime = null;
        String event = inputText;

        Log.d("TIMING", "in parser: " + System.currentTimeMillis());

        try {
            datesFound = dateTimeParser.parse(inputText, referenceDate, parserConfig, "eng");
            Log.d("TIMING", "after parsing: " + System.currentTimeMillis());

            List<ParserOutput> parserOutputs = datesFound.getParserOutputs();
            Log.d("TIMING", "after get outputs: " + System.currentTimeMillis());

            List<DateGroup> dateGroups = datesFound.getDateGroups();
            Log.d("TIMING", "after get date groups: " + System.currentTimeMillis());

            if (parserOutputs != null && dateGroups != null) {
                .... do things .....

Am I doing something incorrectly? Or is there a reason that the parser is pretty slow to initialize the first time around? Also I'm not sure if this changes anything, but I am initializing the HawkingTimeParser and HawkingConfiguration objects in my Kotlin code, then passing it in to this Java method.
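
If the first-call delay comes from one-time initialization (an assumption; the log only shows that the first parse call is slow), one common way to hide it is to pay that cost once in the background at startup. The sketch below is generic: WarmUp is a hypothetical helper, and the supplier passed to it could, for example, construct the parser and run one throwaway parse.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper; not part of Hawking.
final class WarmUp<T> {
    private final CompletableFuture<T> future;

    // Starts the expensive initialization immediately on a background thread.
    WarmUp(Supplier<T> factory) {
        this.future = CompletableFuture.supplyAsync(factory);
    }

    // Returns the initialized value, waiting only if initialization has not finished yet.
    T get() {
        return future.join();
    }

    public static void main(String[] args) {
        // Stand-in for an expensive object; the supplier runs exactly once.
        WarmUp<String> warm = new WarmUp<>(() -> "ready");
        System.out.println(warm.get()); // ready
        System.out.println(warm.get()); // ready
    }
}
```

Creating the WarmUp instance as early as possible and calling get() only when the first real input arrives lets the initialization overlap with the rest of application startup.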

CVE-2020-15250 vulnerability

Currently, you are using the junit lib version 4.8.2 but this version has a security vulnerability CVE-2020-15250.
This has been fixed since version 4.13.1.
Do you have any plan to bump this dependency?

Is there a python wrapper for this ?

Hi,
I was hoping to see if there was a python wrapper for this as I am not familiar with Java. It would also be helpful to see if there is a guide on how to use this. Looking forward to hearing from you all!

A couple of things don't seem quite right

A few I tested:

next Wednesday - correct
next Wednesday morning - incorrect (adds a day)
next Wednesday before 3pm - no response
Tomorrow morning - incorrect (adds a day)
Tomorrow afternoon - incorrect (adds a day)
Tomorrow at 3pm - correct
yesterday evening - correct
yesterday morning - incorrect - becomes today?
later this week - incorrect - puts next week

I used timezone EST for all of these but the docker library was likely running UTC. Not sure if the docker machine should be in the user or server timezone.

Is there somewhere you'd recommend I start looking?

A couple of edge cases don't appear to be coming through correctly

can someone come out this weekend in the afternoon

Text: this weekend
ID: 0
IsExactTimePresent: No
Start: Saturday, October 9, 2021 at 1:00:00 AM EDT
End: Monday, October 11, 2021 at 12:59:59 AM EDT
Text: afternoon
ID: 0
IsExactTimePresent: No
Start: Wednesday, October 6, 2021 at 1:00:00 PM EDT
End: Wednesday, October 6, 2021 at 5:59:59 PM EDT

do you have any availability next week sometime in the morning?

Text: next week
ID: 0
IsExactTimePresent: No
Start: Sunday, October 10, 2021 at 1:00:00 AM EDT
End: Sunday, October 17, 2021 at 12:59:59 AM EDT
Text: morning
ID: 0
IsExactTimePresent: No
Start: Thursday, October 7, 2021 at 2:00:00 AM EDT
End: Thursday, October 7, 2021 at 12:59:59 PM EDT

I guess the problem is when there are compound statements.

Do you think this is something we could do with hawking or do some external parsing in these cases?
