Comments (16)

StephenOTT commented on August 17, 2024

This already exists
https://github.com/tmlee/time_difference

from chronic_duration.

natesire commented on August 17, 2024

Thanks. I emailed the founder of time_difference. I am actually looking for something that uses machine learning in a natural-language approach, because I need to parse human-written date ranges. I might fork your chronic and post the beginnings of it. I am still deciding which language to implement the machine learning in. Python has a great NLP toolkit (NLTK). Writing Ruby extensions in C++ might take a while, and I even like Scala. Any ideas?

StephenOTT commented on August 17, 2024

From a Ruby perspective, do you have an aversion to wrapping chronic with time_difference?

Something like this:

require 'chronic'
require 'time_difference'

humanStatement1 = "this tuesday 1pm"
humanStatement2 = "this tuesday 3pm"

humanStatement1Parsed = Chronic.parse(humanStatement1)
humanStatement2Parsed = Chronic.parse(humanStatement2)

# Very human-readable version
puts TimeDifference.between(humanStatement1Parsed, humanStatement2Parsed).in_hours  #=> 2.0

# Version without the intermediate parsed variables
puts TimeDifference.between(Chronic.parse(humanStatement1), Chronic.parse(humanStatement2)).in_hours  #=> 2.0

# Single-line version
puts TimeDifference.between(Chronic.parse("this tuesday 1pm"), Chronic.parse("this tuesday 3pm")).in_hours  #=> 2.0

StephenOTT commented on August 17, 2024

Use your NLP step to split each statement into a start-date token and an end-date token (humanStatement1 and humanStatement2).
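As a rough illustration of that split, here is a minimal Python sketch that uses a regular expression instead of a full NLP library. The separator list and the `split_range` name are assumptions made up for this example, not part of chronic or time_difference. A bare "-" is treated as a separator, so ISO-style dates such as 2015-01-16 would need different handling.

```python
import re

# Hypothetical separator list: runs of dashes, "&", and a few range words.
RANGE_SEPARATOR = re.compile(
    r"\s*(?:--+|-|&|\bto\b|\bthrough\b|\buntil\b)\s*", re.IGNORECASE
)

def split_range(statement):
    """Return (start_text, end_text), or None unless exactly one separator is found."""
    parts = [part for part in RANGE_SEPARATOR.split(statement.strip()) if part]
    if len(parts) != 2:
        return None
    return parts[0], parts[1]

print(split_range("this tuesday 1pm to this tuesday 3pm"))
print(split_range("June 9 -- August 1"))
```

Each half can then be handed to a date parser separately, which mirrors the humanStatement1 / humanStatement2 pair in the Ruby snippet.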

StephenOTT commented on August 17, 2024

For NLP have you looked at OpenNLP?
http://opennlp.apache.org

and then for the ruby bindings, use:
https://github.com/louismullie/open-nlp

natesire commented on August 17, 2024

I am testing time_difference. I didn't even know about OpenNLP. Awesome. I am checking all of this out.

natesire commented on August 17, 2024

I have to handle all kinds of weird characters like - / -- & etc... that can be inside and outside parts of the dates. I am going to write the more advanced parsing in Scala.

StephenOTT commented on August 17, 2024

This is what NLP tokenization is for: it lets you remove useless characters, or replace unneeded characters and words.
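A minimal sketch of that cleanup step, using plain whitespace tokens rather than an NLP tokenizer. The `RANGE_SEPARATORS` set and the `clean` function are hypothetical names for illustration. The assumption is that a token made only of punctuation is either a range separator (replaced with "to") or noise (dropped), while punctuation inside a token, such as the slash in 6/9, is left alone.

```python
import string

# Hypothetical set of stand-alone tokens that mean "from ... to ...".
RANGE_SEPARATORS = {"-", "--", "&", "/"}

def clean(text):
    cleaned = []
    for token in text.split():
        # Strip punctuation from the edges only; "6/9" keeps its slash.
        core = token.strip(string.punctuation)
        if core:
            cleaned.append(core)
        elif token in RANGE_SEPARATORS:
            # A pure-punctuation token from the separator set becomes "to".
            cleaned.append("to")
        # Any other pure-punctuation token is dropped.
    return " ".join(cleaned)

print(clean("Available: June 9 -- August 1!"))  # Available June 9 to August 1
print(clean("6/9 & 8/1"))                       # 6/9 to 8/1
```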

natesire commented on August 17, 2024

I see. Tokenization should work. Currently, my algorithm reads the sentence forward from index 0 until chronic returns nil. Then it reads the sentence backwards from the end until it reaches that nil point. I'll check how well tokenization can just provide me with two dates.
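One possible reading of that forward/backward scan, sketched in Python. `toy_parse` is a made-up stand-in for Chronic.parse that only understands "Month" or "Month day", and `ASSUMED_YEAR` and `find_range` are likewise hypothetical. The real algorithm may differ in how it picks the end date.

```python
import calendar
from datetime import date

ASSUMED_YEAR = 2015  # hypothetical default year for the toy parser
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}

def toy_parse(text):
    """Stand-in for a real parser: returns a date, or None when parsing fails."""
    words = text.lower().split()
    if not words or words[0] not in MONTHS or len(words) > 2:
        return None
    if len(words) == 1:
        return date(ASSUMED_YEAR, MONTHS[words[0]], 1)
    if not words[1].isdigit():
        return None
    try:
        return date(ASSUMED_YEAR, MONTHS[words[0]], int(words[1]))
    except ValueError:
        return None

def find_range(sentence, parse=toy_parse):
    words = sentence.split()
    # Forward pass: grow the prefix word by word until the parser first fails.
    stop = 0
    while stop < len(words) and parse(" ".join(words[:stop + 1])) is not None:
        stop += 1
    start_text = " ".join(words[:stop]) or None
    # Backward pass: grow the suffix from the end back to the failure point,
    # remembering the longest suffix that still parses.
    end_text = None
    for i in range(len(words) - 1, stop - 1, -1):
        candidate = " ".join(words[i:])
        if parse(candidate) is not None:
            end_text = candidate
    return start_text, end_text

print(find_range("June 9 -- August 1"))  # ('June 9', 'August 1')
```

The parser is passed in as an argument so the same scan could wrap any function that returns None on failure.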

natesire commented on August 17, 2024

Here's an example I am running into with chronic.
'Jan first week' is nil
'Jan first' is valid in chronic
'Jan' isn't valid, chronic returns 2015-01-16 12:00:00 -0500

So your idea is to erase 'week' and leave 'first', using tokenization?

natesire commented on August 17, 2024

I wrote a test in Python.

Here is the output:
[('Available', 'JJ'), ('June', 'NNP'), ('9', 'CD'), ('--', ':'), ('August', 'NNP'), ('first', 'JJ'), ('week', 'NN')]
['June', '9', 'August']
June 9 August

import re

import nltk

# Tokenize the sentence
sentence = 'Available June 9 -- August first week'
tokens = nltk.word_tokenize(sentence)

# Tag each token with its part of speech
parts_of_speech = nltk.pos_tag(tokens)
print(parts_of_speech)

# Keep only proper nouns (NNP) and cardinal numbers (CD)
approved_tags = ['NNP', 'CD']
filtered = []
for word, tag in parts_of_speech:
    if any(approved in tag for approved in approved_tags):
        filtered.append(word)

print(filtered)

# Normalize: replace whitespace followed by non-word characters with a single space
normalized = re.sub(r'\s\W+', ' ', ' '.join(filtered))
print(normalized)

natesire commented on August 17, 2024

I can write a white-list function for words like 'first'. I am really liking this solution. Great idea to tokenize. Now I need a different excuse to write something in Scala. hahahahaha
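A white-list on top of the tag filter might look like the sketch below. It takes the tagged pairs from the output above as literal input, so it runs without NLTK. The `WHITELIST` contents and the `keep_date_words` name are assumptions for illustration.

```python
APPROVED_TAGS = {"NNP", "CD"}
# Hypothetical white-list of words to keep regardless of their tag.
WHITELIST = {"first", "second", "third", "last"}

def keep_date_words(tagged):
    """Keep words whose tag is approved or that appear in the white-list."""
    return [
        word
        for word, tag in tagged
        if tag in APPROVED_TAGS or word.lower() in WHITELIST
    ]

tagged = [('Available', 'JJ'), ('June', 'NNP'), ('9', 'CD'), ('--', ':'),
          ('August', 'NNP'), ('first', 'JJ'), ('week', 'NN')]
print(keep_date_words(tagged))  # ['June', '9', 'August', 'first']
```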

StephenOTT commented on August 17, 2024

Here's an example I am running into with chronic.
'Jan first week' is nil
'Jan first' is valid in chronic
'Jan' isn't valid, chronic returns 2015-01-16 12:00:00 -0500

So your idea is to erase 'week' and leave 'first', using tokenization?

For examples like this, I would make assumptions about the date formats. For example, if someone writes "Jan First Week", use NLP to grab the month and recognize that they want week 1. Then use the Ruby date library to get day 1 and day 7 of week 1.

StephenOTT commented on August 17, 2024

Take a look at this for an example of grabbing the date of a day number in a week number: http://www.ruby-doc.org/stdlib-2.1.1/libdoc/date/rdoc/Date.html#method-c-commercial

Then use the time_difference library to get the duration.
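For anyone following along on the Python side, `datetime.date.fromisocalendar` (Python 3.8+) plays a role similar to Ruby's Date.commercial. The sketch below gets day 1 and day 7 of a week and the span between them. The `iso_week_bounds` helper is a hypothetical name, and plain date subtraction stands in for time_difference here.

```python
from datetime import date

def iso_week_bounds(year, week):
    """Return (Monday, Sunday) of the given ISO week, i.e. day 1 and day 7."""
    return date.fromisocalendar(year, week, 1), date.fromisocalendar(year, week, 7)

start, end = iso_week_bounds(2015, 1)
print(start, end)          # 2014-12-29 2015-01-04
print((end - start).days)  # 6
```

Note that these are ISO weeks of the year, so week 1 of 2015 begins in December 2014. If "Jan first week" should mean January 1 through 7, a week-of-month rule would be a closer fit.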

natesire commented on August 17, 2024

I wrote a white-list function. Python is handling things beautifully. I can feed the output into chronic, and I can call a Python script from Ruby. Let me know if chronic needs contributions.

StephenOTT commented on August 17, 2024

great
