akoumjian / datefinder

Find dates inside text using Python and get back datetime objects

Home Page: http://datefinder.readthedocs.org/en/latest/

License: MIT License

Python 36.13% HTML 63.87%

datetime parser nlp

datefinder's Introduction

datefinder - extract dates from text

A Python module for locating dates inside text. Use this package to extract all sorts of date-like strings from a document and turn them into datetime objects.

This module finds the likely datetime strings and then uses dateutil to convert them to datetime objects.

Installation

With pip

pip install datefinder

Note: I do not publish the version on conda forge and cannot verify its integrity.

How to Use

In [1]: string_with_dates = """
   ...: ...
   ...: entries are due by January 4th, 2017 at 8:00pm
   ...: ...
   ...: created 01/15/2005 by ACME Inc. and associates.
   ...: ...
   ...: """

In [2]: import datefinder

In [3]: matches = datefinder.find_dates(string_with_dates)

In [4]: for match in matches:
   ...:     print(match)
   ...:
2017-01-04 20:00:00
2005-01-15 00:00:00

Demo

🎞️ Video demo by Calmcode.io. ⭐

datefinder's Issues

Regression in v0.6.1

Hello, it seems that switching from dateparser to dateutil in 0.6.1 significantly reduces accuracy for languages other than English.

In 0.6.0, installed from PyPI, I get the correct output for these Bulgarian dates

1 май 1974  			 1974-05-01 00:00:00
1 януари 1970  		 1970-01-01 00:00:00

In the latest master, I get:

1 януари 1970  		 1970-08-10 00:00:00
1 май 1974  			 1974-08-10 00:00:00

Dates are correctly identified, but they are not parsed right. It is suspicious that both inputs resolve to 10 August.

Month abbreviation with period fails

Using Python 3.5.2 and datefinder 0.6.0.
textstring = "Oct. , 1881"
find_dates returns no matches.
If the period following "Oct" is removed, then find_dates returns a match.

Periods after date abbreviations are quite common. I would have expected this to work.

Unable to parse dates for format 08082018

The find_dates method returns no match if the string is set to 08082018 or any other string of similar format.
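
One possible workaround, sketched with only the standard library, is to pre-scan for standalone eight-digit runs and parse them with an explicit format. The helper name `parse_compact_date` and the assumed MMDDYYYY field order are illustrative assumptions, not part of datefinder.

```python
import re
from datetime import datetime

# Assumption: eight-digit runs are MMDDYYYY; pass another fmt for DDMMYYYY.
COMPACT_DATE = re.compile(r"(?<!\d)\d{8}(?!\d)")

def parse_compact_date(text, fmt="%m%d%Y"):
    """Return a datetime for every standalone 8-digit run that fits fmt."""
    found = []
    for token in COMPACT_DATE.findall(text):
        try:
            found.append(datetime.strptime(token, fmt))
        except ValueError:
            # The digits do not form a valid date in this format; skip them.
            pass
    return found

print(parse_compact_date("invoice 08082018 paid"))
```

The field order cannot be inferred from the digits alone, so the caller has to choose the format.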

Failure to recognise dates with time ranges

"\n\n Wednesday 5th April 09.00 - 11.30\n Wednesday 5th April 15.00 - 17.30\n\n Friday 7th April 09.00 - 11.30"

-->

2009-04-05 00:00:00
2030-04-05 15:00:00
2030-04-07 09:00:00

Error of infinite recursion in R(Shiny)

I am working on an R (Shiny) app for a loan prediction model. I am trying to calculate the loan amount for a particular method, but it fails with an infinite recursion error. As suggested in most posts, I have tried setting options(expressions = 1000), but I still get the error.

If I write my condition as
MDBB_LA<- reactive({ input$MDBB*10 })
then I get no error, but if I add another condition on top of it, as in

DSCR_Post<- reactive({ if (input$MU == "EMM" & (input$EMIM/12)+EMI()!=0) { EBITDA_EMM()/((input$EMIM/12) + EMI()) } else if (input$MU == "EMM" & (input$EMIM/12)+EMI()==0) { 0 } else if (input$MU != "EMM" & (input$EMIM/12)+EMI()!=0 ){ EBITDA()/((input$EMIM/12) + EMI()) }else{ 0 }})

MDBB_LA<- reactive({ if ((input$MU == "EMM" & DSCR_Post() >= 1) | (input$MU == "FAT1" & DSCR_Post() >= 0.8) | (input$MU == "FAT2" & DSCR_Post() >= 0.7) | (input$MU == "UAT" & DSCR_Post() >= 0.5)) { input$MDBB*10*2 } else if ((input$MU == "EMM" & DSCR_Post() < 1) | (input$MU == "FAT1" & DSCR_Post() < 0.8 ) | (input$MU == "FAT2" & DSCR_Post() < 0.7) | (input$MU == "UAT" & DSCR_Post() < 0.5)){ input$MDBB*10 } else if ((input$MU == "MDBB1" ) | (input$MU == "MDBB2" ) | (input$MU == "MDBB3") | (input$MU == "MDBB4") ){ input$MDBB*10 } else {input$MDBB*10} })
then I get this error:

Warning: Error in : evaluation nested too deeply: infinite recursion / options(expressions=)?

Can anyone help me work out what is wrong with the logical statement?

wrong interpretation

I have the following string:
str1= 'SvrCk: 21 3:13p 04/19/16 Separate checks: 8-of-8'
The date I want to extract is obviously 2016-04-19 (or, in full, 2016-04-19 15:13 - but I don't even need the time). Unfortunately, datefinder fails to recognize it:

matchesFuzzy = datefinder.find_dates(str1)
for match in matchesFuzzy: print(match)

gives this:

2016-08-21 03:13:00
2016-09-04 00:00:00
2016-08-08 00:00:00

and

matchesStrict = datefinder.find_dates(str1, strict=True)
for match in matchesStrict: print(match)

gives this:

2016-09-04 00:00:00

REPLACEMENTS keys should consider whitespace permutations

Issue #14 also highlights that REPLACEMENTS should perhaps match certain permutations of a key surrounded by whitespace characters.

    REPLACEMENTS = {
        "standard": "",
        "daylight": "",
        "savings": "",
        "time": "",
        "date": "",
        "by": "",
        "due": "",
        "on": "",
        ",": "",
    }

For example, consider the key = 'to'

should match ' to'
should match 'to '
should match ' to '
should never match 'to' in say 'october'

See draft implementation with tests here
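
As a rough illustration of the matching rule described above (not the linked draft implementation), the following sketch removes a key only when it is not embedded in a longer word. The function name `replace_whole_tokens` is hypothetical; the mapping is copied from the issue text.

```python
import re

# Keys copied from the REPLACEMENTS mapping quoted above.
REPLACEMENTS = {
    "standard": "", "daylight": "", "savings": "", "time": "",
    "date": "", "by": "", "due": "", "on": "", ",": "",
}

def replace_whole_tokens(date_string, replacements=REPLACEMENTS):
    """Replace keys only when they are not embedded in a longer word."""
    result = date_string.lower()
    for key, value in replacements.items():
        if key.isalpha():
            # Lookarounds keep 'on' from matching inside 'monday'.
            pattern = r"(?<![a-z])" + re.escape(key) + r"(?![a-z])"
        else:
            pattern = re.escape(key)
        result = re.sub(pattern, value, result)
    # Collapse the whitespace left behind by removed tokens.
    return " ".join(result.split())

print(replace_whole_tokens("On Monday, May 9, 2016"))  # monday may 9 2016
```

With a custom mapping such as `{"to": ""}`, the same function strips the trailing word in 'october to' but leaves 'october' itself intact.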

can't install datefinder

Failed building wheel for regex
Running setup.py clean for regex
Failed to build regex
Installing collected packages: regex, datefinder
Running setup.py install for regex ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-QCCGKo/regex/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-I_ImpH-record/install-record.txt --single-version-externally-managed --compile:
/usr/local/lib/python2.7/dist-packages/setuptools/dist.py:351: UserWarning: Normalizing '2016.01.10' to '2016.1.10'
normalized_version,
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
copying Python2/regex.py -> build/lib.linux-x86_64-2.7
copying Python2/_regex_core.py -> build/lib.linux-x86_64-2.7
copying Python2/test_regex.py -> build/lib.linux-x86_64-2.7
running build_ext
building '_regex' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/Python2
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c Python2/_regex.c -o build/temp.linux-x86_64-2.7/Python2/_regex.o
Python2/_regex.c:46:20: fatal error: Python.h: No such file or directory
#include "Python.h"
^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

----------------------------------------

Command "/usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-QCCGKo/regex/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-I_ImpH-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-QCCGKo/regex/

dateparser throws TypeError error with specific number combo

If datefinder attempts to parse a string such as "32 2016", it throws an error "TypeError: Required argument 'day' (pos 3) not found".

The combination needs to be a string with a combination of 2 and 4 digits, and the number with 2 digits needs to be greater than 31 or less than 1 (eg 00) to throw the error.

I found myself deep down the dateparser rabbit hole, but thought this scenario may need to be filtered in datefinder prior to passing to dateparser.parse. Eg something like the below added to parse_date_string (this is a bit ugly, but does the job)

date_string_split = date_string.split()
if len(date_string_split) == 2:
    if len(date_string_split[0]) == 2 and len(date_string_split[1]) == 4 and (int(date_string_split[0]) < 1 or int(date_string_split[0]) > 31):
        return None
    if len(date_string_split[1]) == 2 and len(date_string_split[0]) == 4 and (int(date_string_split[1]) < 1 or int(date_string_split[1]) > 31):
        return None

I can submit a pull request, but need to make my code prettier...
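
A tidier version of the same check might look like the sketch below. The function name `has_impossible_day` is hypothetical, and the digit guard is an addition so that non-numeric tokens do not raise inside `int()`.

```python
def has_impossible_day(date_string):
    """True for two-token strings like '32 2016' whose 2-digit part is not 1..31."""
    parts = date_string.split()
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return False
    # Check both orders: 'DD YYYY' and 'YYYY DD'.
    for short, long_ in (parts, parts[::-1]):
        if len(short) == 2 and len(long_) == 4:
            return not 1 <= int(short) <= 31
    return False

print(has_impossible_day("32 2016"))  # True
```

A caller in parse_date_string could return None whenever this predicate is true, before handing the string to the parser.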

Fails on a specific string

In [13]: def fd(s):
    ...:     for match in datefinder.find_dates(s):
    ...:         print(match)
    ...:

In [31]: b
Out[31]: '37.47.96.153 - - [09/Jun/2017:00:00:47 +0200] "GET /style/common/img/icons/friendmsg.png HTTP/1.1" 304 -'

In [32]: fd(b)
0304-06-23 00:00:00
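
For log lines in this layout, a format-specific parser may be more reliable than free-text detection. The sketch below uses only the standard library; the names `CLF_TIMESTAMP` and `parse_clf_timestamp` are hypothetical, and it assumes English month abbreviations.

```python
import re
from datetime import datetime

# Bracketed access-log timestamp, e.g. [09/Jun/2017:00:00:47 +0200]
CLF_TIMESTAMP = re.compile(
    r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
)

def parse_clf_timestamp(line):
    """Return the bracketed timestamp as an aware datetime, or None."""
    match = CLF_TIMESTAMP.search(line)
    if match is None:
        return None
    # %b assumes English month abbreviations (the default C locale).
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")

line = ('37.47.96.153 - - [09/Jun/2017:00:00:47 +0200] '
        '"GET /style/common/img/icons/friendmsg.png HTTP/1.1" 304 -')
print(parse_clf_timestamp(line))  # 2017-06-09 00:00:47+02:00
```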

not able to download the package using pip

Hi,
Can't access the module. I am using python 3.5.
Below is the complete traceback of the error.

Exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\commands\install.py", line 335, in run
    wb.build(autobuilding=True)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_set.py", line 554, in _prepare_file
    require_hashes
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 811, in get_page
    inst = cls(resp.content, resp.url, resp.headers)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 731, in __init__
    namespaceHTMLElements=False,
TypeError: parse() got an unexpected keyword argument 'transport_encoding'

Extract a date from a string with additional numbers

I have a problem with these examples:

"09/05/2009 14:40 06", "03/29/2009 11:03 am 1", "32402 05/19/13 05:37", ...

In general, when a text line contains a complete date plus some extra number (let's assume it's a number, because I've already filtered out the non-significant characters), the module can't find a date on that line.

I guess this is a little difficult to parse, so I just wanted to know if there are any ideas for solving it without going against the main pipeline of this engine. I'm able to post a PR to fix (or cover) this if it's desirable behavior for the datefinder engine.

Thanks!

Not able to extract dates from a string

I have tried running this code:
string_with_dates = """
I want to apply for leaves from 12/12/2017 to 12/18/2017"""
import datefinder
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)

But it does not extract any date values.

Unable to detect dates

list(datefinder.find_dates('date: 11-05-16'))
gives an empty list

while list(datefinder.find_dates('date 11-05-16')) (without the colon) gives the correct result [datetime.datetime(2016, 11, 5, 0, 0)]

Date parsing is just wrong.

This input text has no dates in it (the only number is a paragraph reference):

text = '''Notwithstanding Lender’s acceleration of the sums secured by this Mortgage due to Borrower's default, Borrower shall have the right to have any proceedings begun by Lender to enforce this Mortgage discontinued at any time prior to entry of a judgment enforcing this Mortgage if: (a) Borrower pays Lender all sums which would be then due under this Mortgage and the Credit Agreement had no acceleration occurred; (b) Borrower cures all events of default; (c) Borrower pays all reasonable expenses incurred by Lender in enforcing the covenants and agreements of Borrower contained in this Mortgage, and in enforcing Lender’s remedies as provided in paragraph 22 hereof, including, but not limited to, reasonable attorneys' fees; and (a) Borrower takes such action as Lender may reasonably require to assure that the lien of this Mortgage, Lender's interest in the Property and Borrower’s obligation to pay the sums secured by this Mortgage shall continue unimpaired.'''

Running list(datefinder.find_dates(text)) produces
[(datetime.datetime(2017, 5, 9, 0, 0), 'may')]

Why?

Wrong Interpretation with "to" in string

For string "transactions from 14 Aug 2016 to 18 Aug 2016." the result is "no dates found". If I replace "to" with "till", it gives the correct result.

List of dates not recognised

"We should arrange meetings on the following dates:
April 3rd
4th
5th
6th"

No dates found

Regexp identifies months as a part of the word

Hi,

I think that an example should give you a better understanding about the problem.

>>> from datefinder import *
>>> date_finder = DateFinder()
>>> list(date_finder.extract_date_strings('unable to separate'))
[('to sep', (6, 13), {'timezones': [], 'digits': [], 'hours': [], 'months': ['sep'], 'delimiters': [' ', ' '], 'extra_tokens': ['to'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]
>>>
>>> list(date_finder.extract_date_strings('copy of octocat.txt document'))
[('of oct', (4, 11), {'timezones': [], 'digits': [], 'hours': [], 'months': ['oct'], 'delimiters': [' ', ' '], 'extra_tokens': ['of'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]
>>>
>>> list(date_finder.extract_date_strings('of Octoberrrrr'))
[('of October', (0, 10), {'timezones': [], 'digits': [], 'hours': [], 'months': ['October'], 'delimiters': [' '], 'extra_tokens': ['of'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]

I'm not sure how to fix this issue. If you have any ideas, please let me know. I can try to make a pull request if that solves the problem.
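
One direction, shown here as a standalone sketch rather than a patch to datefinder's own pattern, is to require word boundaries around month names. The names `MONTHS`, `MONTH_WORD`, and `find_month_words` are hypothetical.

```python
import re

MONTHS = (
    "january|february|march|april|may|june|july|august|september|"
    "october|november|december|"
    "jan|feb|mar|apr|jun|jul|aug|sep|sept|oct|nov|dec"
)
# \b on both sides rejects 'sep' in 'separate' and 'oct' in 'octocat'.
MONTH_WORD = re.compile(r"\b(?:" + MONTHS + r")\b", re.IGNORECASE)

def find_month_words(text):
    """Return month names that appear as whole words in text."""
    return MONTH_WORD.findall(text)

print(find_month_words("unable to separate"))  # []
print(find_month_words("due Oct 5"))           # ['Oct']
```

Word boundaries do not help with real words that are also month names, such as the verb "may"; those would need context beyond the regex.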

Minutes / Months

Hi !

I just wanted to start by thanking you for datefinder, exactly what I was looking for.

I think I found a small bug but couldn't understand how to fix it in the source code.

When parsing strings such as:

>>> list(datefinder.find_dates('2 months'))[0]

it parses months as minutes, i.e. :

datetime.datetime(2016, 6, 10, 8, 31, 41, 469126)

If you could have a look at it, it'd be great ! Thanks a lot again !

String with leading digits not parsed

Hello,
a string which has digits before the date to be extracted is not correctly handled, and thus does not extract any dates (both in strict and non-strict mode).

Here is an example:
' sf.0008 05/04/17 21:34'
Kind regards

Identifies dates that are not present in the string

Below is the snippet:
import datefinder

s = '1.1 This Addendum applies to Software Maintenance & Support Services ("Services") for all copies of software designated on the attached Exhibit 1 ("S oftware") which you have licensed from ABC. The Services for specific Software products may be more fully described in attached Appendices. In the event of a conflict, the order of precedence will be the Appendices, this Addendum, and the Agreement, in that order.'
m = list(datefinder.find_dates(s))
print(m)

Printing m shows:
[datetime.datetime(2017, 8, 1, 0, 0), datetime.datetime(2017, 8, 1, 0, 0), datetime.datetime(2017, 5, 14, 0, 0)]

There is no date present in the string. Is this expected behaviour or a mistake on my end?

Parse range of dates

Find dates ranges, for example:

17 and 18 september 2016

from 7 to 14 of september 2016

from friday 2 of september to saturday 10 of 2016

september, from 7 to 10

from 31 august to 2 of september 2016

In Spanish, typical forms are:

del 17 al 20 de septiembre 2016

del 31 de agosto al 2 de septiembre 2016

Good job!!
Thanks!

Microsecond extraction different if delimited with ',' and '.'

Version 0.6.1 on Python3.5.2

>>> import datefinder
>>> df = datefinder.DateFinder()
>>> list(df.find_dates('2017-06-27 09:51:46,509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46, 509000)]
>>> list(df.find_dates('2017-06-27 09:51:46.509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46)]
>>>

On Python2.7.12, the ',' version is not found at all.

>>> import datefinder
>>> df = datefinder.DateFinder()
>>> list(df.find_dates('2017-06-27 09:51:46,509'))
[]
>>> list(df.find_dates('2017-06-27 09:51:46.509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46)]
>>>
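
Until both delimiters are handled consistently, one workaround is to normalize the comma before parsing. This is a standard-library sketch with a hypothetical helper name, `parse_with_fraction`, and it assumes the 'YYYY-MM-DD HH:MM:SS' layout shown above.

```python
import re
from datetime import datetime

def parse_with_fraction(stamp):
    """Parse 'YYYY-MM-DD HH:MM:SS' followed by ',fff' or '.fff'."""
    # Turn a comma that precedes the trailing fractional digits into a period.
    normalized = re.sub(r"(?<=\d),(?=\d+$)", ".", stamp)
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")

print(parse_with_fraction("2017-06-27 09:51:46,509"))
print(parse_with_fraction("2017-06-27 09:51:46.509"))
```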

Multiple Dates next to each other are lumped together

Issue #14 highlights that our DATE_REGEX can match more than one date string in certain situations.

One situation where this occurs is when two dates bookend an EXTRA_TOKEN such as:

datestring = 'june 5th 2012 to january 1st 2014'

There is a test setup for this use case in this unmerged branch
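
A minimal sketch of one way to separate the two dates, independent of the unmerged branch: split the matched string on range words before parsing each side. The names `RANGE_SEPARATOR` and `split_date_range` are hypothetical, and the choice of 'to' and 'until' as separators is an assumption.

```python
import re

# Assumed separators; they must be surrounded by whitespace to count.
RANGE_SEPARATOR = re.compile(r"\s+(?:to|until)\s+", re.IGNORECASE)

def split_date_range(date_string):
    """Split 'A to B' into ['A', 'B'] so each side can be parsed alone."""
    parts = RANGE_SEPARATOR.split(date_string)
    return [part.strip() for part in parts if part.strip()]

print(split_date_range("june 5th 2012 to january 1st 2014"))
# ['june 5th 2012', 'january 1st 2014']
```

Because the separator requires whitespace on both sides, the 'to' inside 'october' is left alone.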

Unable to recognize dates with keywords 'before' and 'after'

Despite being in the relative patterns list, these two keywords aren't working.

import datefinder as df
text = "check for flights between pune and mumbai before 10 PM"
matches = df.find_dates(text)
for match in matches:
    print(match)

This prints nothing.

Missing day returns last day of the month, prefer 0.

Using Python 3.5.2 and datefinder 0.6.0.
textstring = "Oct , 1881"
find_dates returns '1881-10-31 00:00:00'
I would have preferred '1881-10-00 00:00:00'

Is there a way to force the default day?
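
One caveat on the preferred output: a Python datetime cannot hold day 0, so '1881-10-00' would need a different representation. The short sketch below demonstrates the constraint; the tuple is only one hypothetical way to mark the day as unknown.

```python
from datetime import datetime

try:
    datetime(1881, 10, 0)
except ValueError as exc:
    # The constructor rejects day 0 outright.
    day_zero_error = str(exc)
    print(day_zero_error)

# A plain tuple can carry "day unknown" explicitly instead.
partial_date = (1881, 10, None)
```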

datefinder can't find a date with "Monday" because all "on" substrings are removed, while dateparser can parse it

Example: date_string= 'On Monday, May 9, 2016'
_find_and_replace() should remove all DateFinder.REPLACEMENT words, but in fact it removes all substrings, i.e. within words too,
so, _find_and_replace() wrongly mutates this into " mday may 9 2016"
i.e. it wrongly removes "on" from "monday".
dateparser can't parse " mday may 9 2016"
but it successfully parses 'On Monday, May 9, 2016'
How to correct: make _find_and_replace() leave substrings within words untouched, or at least not touch "monday".

Cannot find the following scenarios

and this is handpunched on 2017/08/27 @ 7:24 AM --- COULD NOT FIND

and this is handpunched on 08/24/2017 @ 7:24 AM --- COULD NOT FIND

and this is handpunched on August 27, 2017 at 7:57 AM === INCORRECT FIND 2027-08-03 07:57:00

test_find_dates test fails to detect erroneous None return

In some cases, find_dates returns None, but the unit test fails to detect this because it relies on a for loop construct. I'll submit a pull request to fix this.
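
The general pitfall can be shown without datefinder: assertions placed inside a for loop never run when the iterable is empty. The sketch below uses a stand-in generator, `fake_find_dates`, and a hypothetical helper, `collect_matches`, that materializes results before checking them.

```python
def collect_matches(generator):
    """Materialize results so an empty or None-yielding run cannot pass silently."""
    results = list(generator)
    assert results, "no dates were returned"
    assert all(r is not None for r in results), "a None result was returned"
    return results

def fake_find_dates(text):
    # Stand-in for datefinder.find_dates; yields nothing, like a missed match.
    return iter(())

# This loop body runs zero times, so the assertion inside it never executes.
for match in fake_find_dates("no dates"):
    assert match is not None
```

Calling `collect_matches(fake_find_dates("no dates"))` raises AssertionError, which is the failure the loop-based test would miss.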

update 'regex' dependency version

Why does the dependency list in setup.py point to an old version of the regex library?

How fast to parse a text ?

Hello

I have this very small text and it seems it's very slow to parse it, am I doing something wrong?

text = "hello 2014/02/15 love you"

import datefinder

matches = datefinder.find_dates(text, False, False, True)

print("datefinder = ")

for match in matches:
    if match.year > 1800:
        print(match, match.year, match.month, match.day)


$ time python test.py 
datefinder = 
2014-02-15 00:00:00 2014 2 15

real    0m1.380s
user    0m0.976s
sys 0m0.080s

Dates ending with Z timezone not recognized if there is no millisecond

Let me start with: this is exactly the module I was looking for, thanks a lot!

I'm hitting an annoying bug, however. I have dates ending with the "Z" timezone designator, which is ISO 8601 compliant, but the module fails to parse them:

>>> list(datefinder.find_dates('INFO[2017-02-03T09:04:08Z] Done job'))
[]

>>> list(datefinder.find_dates('INFO[2017-02-03T09:04:08] Done job'))
[datetime.datetime(2017, 2, 3, 9, 4, 8)]
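
As a stopgap outside datefinder, a trailing "Z" can be handled explicitly with the standard library. The helper name `parse_iso_utc` is hypothetical, and the sketch covers only the second-resolution layout shown above.

```python
from datetime import datetime, timezone

def parse_iso_utc(stamp):
    """Parse 'YYYY-MM-DDTHH:MM:SS' with an optional trailing 'Z' (UTC)."""
    if stamp.endswith("Z"):
        naive = datetime.strptime(stamp[:-1], "%Y-%m-%dT%H:%M:%S")
        # 'Z' means UTC, so attach that timezone to the result.
        return naive.replace(tzinfo=timezone.utc)
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")

print(parse_iso_utc("2017-02-03T09:04:08Z"))  # 2017-02-03 09:04:08+00:00
```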

Wrong Detection

When I search this string: "[[[uSaturday 04 January 2014 y 27 December 2013 uBooked u23:20 u1"

I get correctly -
2014-01-04 00:00:00
2013-12-27 00:00:00

But when I run it on "[[uMonday 30 December 2013 y 27 December 2013 u23:20 u1"
I get -
2013-12-27 00:00:00
2016-10-07 23:20:00

Why am I getting wrong results despite only a change in date?

test_parse_date_string fails comparing offset-naive and offset-aware types

>           assert actual_datetime == expected_date
E           TypeError: can't compare offset-naive and offset-aware datetimes

tests/test_parse_date_string.py:61: TypeError
----------------------------- Captured stdout call -----------------------------
DEBUG:tests.test_parse_date_string:acutal=2015-11-20 18:00:00-06:00  expected=2015-11-20 18:00:00

It looks like the problem is the test itself, unless I misunderstand what is supposed to be happening.

diff --git a/tests/test_parse_date_string.py b/tests/test_parse_date_string.py
index 68e8e6b..eee6366 100644
--- a/tests/test_parse_date_string.py
+++ b/tests/test_parse_date_string.py
@@ -41,7 +41,7 @@ logger = logging.getLogger(__name__)
     (' on 11-20-2015 6pm CST ',
      '11-20-2015 6pm',
      {'timezones':['CST']},
-     datetime(2015, 11, 20, 18, 0)
+     datetime(2015, 11, 20, 18, 0).replace(tzinfo=tz.gettz('CST'))
     ),
     # test a tz abbreviation that
     # dateutile.tz.gettz cannot find
@@ -49,7 +49,7 @@ logger = logging.getLogger(__name__)
     (' on 11-20-2015 6am IRST ',
      '11-20-2015 6am',
      {'timezones':['IRST']},
-     datetime(2015, 11, 20, 6, 0)
+     datetime(2015, 11, 20, 6, 0).replace(tzinfo=tz.gettz('IRST'))
     )
 ])
 def test_parse_date_string_find_replace(date_string, expected_parse_arg, expected_captures, expected_date):
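
The underlying mismatch can be reproduced with the standard library alone. The sketch below uses a fixed UTC-6 offset as a stand-in for `tz.gettz('CST')`; that offset and the name `CST` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")  # fixed offset, assumed for the sketch

actual = datetime(2015, 11, 20, 18, 0, tzinfo=CST)
expected_naive = datetime(2015, 11, 20, 18, 0)

# Give the expected value the same tzinfo before comparing.
expected_aware = expected_naive.replace(tzinfo=CST)
assert actual == expected_aware

# Ordering a naive value against an aware one raises TypeError.
try:
    actual < expected_naive
except TypeError as exc:
    print(exc)
```

On Python 3, an equality check between naive and aware values returns False instead of raising, so such a test would fail with an assertion error, not a TypeError.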

Support for following format: e.g. 7th day of May, 2013

1:00 pm is not recognized but 1:00pm is...

Please allow for spaces between the time and the time period (am/pm).

Problem finding dates with EXTRA_TOKENS_PATTERNS words in sentence

Hi,
Thanks for writing this module, I've been playing around with it and have found that there seems to be an issue finding dates when words like "to", "by" and "until" are in the string. I notice these words are included in EXTRA_TOKENS_PATTERNS in datefinder.py but I'm not really familiar with dateutil module so not sure why this should cause an issue. Below is some output showing some examples where dates aren't identified and how swapping the word "to" for the word "so" means dates are correctly identified:

>>> chk = "i am looking for a date june 4th 1996 to july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[]
>>> chk = "i am looking for a date june 4th 1996 so july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1996, 6, 4, 0, 0), datetime.datetime(2013, 7, 3, 0, 0)]
>>> chk = "october 27 1994 to be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1995, 6, 1, 0, 0)]
>>> chk = "october 27 1994 so be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1994, 10, 27, 0, 0), datetime.datetime(1995, 6, 1, 0, 0)]

Parse "next" "last" "upcoming"

Hey @akoumjian,

Could you add "next", "last", "upcoming" and "this" to the parser? Would like to parse things like "this Friday", "next Wednesday", "Yesterday", "Tomorrow" etc.?

How to debug DateFinder().find_dates ?

I've tried PyCharm's built-in debugger and ipdb.

None of them could step into the find_dates function.

dateparser fails miserably on ISO 8601 string!

See https://github.com/akoumjian/datefinder/blob/master/tests/test_find_dates.py#L33

Issue with datefinder module python

I have used the datefinder module to read unformatted dates. It works great for some formats, but when I read "from 21st Oct 2012 to 30/11/2012" it does not give the proper dates.
Expected: 2012-10-21 00:00:00 and 2012-11-30 00:00:00, but I am getting the following dates:
2262-08-19 19:14:50, 2016-12-30 00:00:00

But when I remove "to" from the string above, it gives the proper dates. It would be great if anyone could solve this issue.

Is it possible to use a language other than English?

If yes, how do I set up datefinder to do it?

Thank you

dateutil.parser.parse throws ValueError on crud date strings

Issue #14 highlights we need to wrap dateutil.parser.parse:

>>> from dateutil import parser
>>> parser.parse('to blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/src/datefinder/venv/lib/python3.4/site-packages/python_dateutil-2.4.2-py3.4.egg/dateutil/parser.py", line 1008, in parse
  File "/usr/local/src/datefinder/venv/lib/python3.4/site-packages/python_dateutil-2.4.2-py3.4.egg/dateutil/parser.py", line 395, in parse
ValueError: Unknown string format

Fixed and added a passing test in this draft implementation
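
The shape of such a wrapper, independent of the draft implementation, might look like this. The names `safe_parse` and `strict_parser` are hypothetical, and `strict_parser` stands in for dateutil.parser.parse so the sketch needs no third-party package.

```python
from datetime import datetime

def safe_parse(parse, date_string):
    """Call parse(date_string); return None instead of raising on bad input."""
    try:
        return parse(date_string)
    except (ValueError, OverflowError, TypeError):
        return None

def strict_parser(s):
    # Stand-in for dateutil.parser.parse; raises ValueError on junk input.
    return datetime.strptime(s, "%Y-%m-%d")

print(safe_parse(strict_parser, "to blah"))     # None
print(safe_parse(strict_parser, "2016-01-02"))  # 2016-01-02 00:00:00
```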

Day format not parsed

A day specified as in 12th day of December, 2001 or in regex geek:

[0-9][0-9]?(st|nd|rd|th) day of

is not parsed.

Thanks for the great work!
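
For reference, the requested form can be recognized with a small standalone pattern. This is a sketch with hypothetical names (`DAY_OF`, `parse_day_of`), not datefinder's regex, and it assumes full English month names.

```python
import re
from datetime import datetime

DAY_OF = re.compile(
    r"(\d{1,2})(?:st|nd|rd|th)\s+day\s+of\s+([A-Za-z]+),?\s+(\d{4})",
    re.IGNORECASE,
)

def parse_day_of(text):
    """Parse '12th day of December, 2001'; return None if absent or invalid."""
    match = DAY_OF.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    try:
        # %B assumes full English month names.
        return datetime.strptime(f"{day} {month} {year}", "%d %B %Y")
    except ValueError:
        return None

print(parse_day_of("12th day of December, 2001"))  # 2001-12-12 00:00:00
```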

Isn't able to recognize months after 'from'.

Eg: stringtotest = 'from July'.
OR
Eg: stringtotest = 'July'
And if it is able to do it using EXTRA_TOKENS_PATTERNS, how do I use it?
I downloaded using pip.

Which EXTRA_TOKENS should have matching REPLACEMENTS?

Issue #14 has highlighted a discrepancy between extra tokens we use to help us locate dates:

EXTRA_TOKENS_PATTERN = 'due|by|on|standard|daylight|savings|time|date|of|to|until|z|at|t'

and those tokens which are later replaced because dateutil.parser cannot accept them:

    REPLACEMENTS = {
        "standard": "",
        "daylight": "",
        "savings": "",
        "time": "",
        "date": "",
        "by": "",
        "due": "",
        "on": "",
        ",": "",
    }

Currently 'to' is not in the REPLACEMENTS for example.

My guess is that not all the extra tokens need to be replaced, meaning, REPLACEMENTS is a subset of EXTRA_TOKENS. We should at least have a test that shows which extra tokens dateutil.parse handles and which need to be in REPLACEMENTS.
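
A quick set comparison of the two definitions quoted above shows where they differ. It uses only the values from this issue and does not test what dateutil accepts.

```python
EXTRA_TOKENS_PATTERN = 'due|by|on|standard|daylight|savings|time|date|of|to|until|z|at|t'
REPLACEMENTS = {
    "standard": "", "daylight": "", "savings": "", "time": "",
    "date": "", "by": "", "due": "", "on": "", ",": "",
}

extra_tokens = set(EXTRA_TOKENS_PATTERN.split("|"))
replaced = set(REPLACEMENTS)

# Extra tokens that are located but never stripped before parsing.
print(sorted(extra_tokens - replaced))  # ['at', 'of', 't', 'to', 'until', 'z']
# Replacement keys that are not extra tokens.
print(sorted(replaced - extra_tokens))  # [',']
```

Because of the comma key, REPLACEMENTS is not a strict subset of the extra tokens as quoted.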

unable to detect today, tomorrow

Hey, I have been using datefinder, but it is unable to detect words like 'today' and 'tomorrow', 'this week' etc.
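
Single-word relative terms can be resolved against a base date with a small lookup, as in the sketch below. The table `RELATIVE_DAYS` and the function `find_relative_dates` are hypothetical; multi-word phrases such as 'this week' would need more than this.

```python
from datetime import date, timedelta

# Hypothetical keyword table; offsets are in days relative to base.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}

def find_relative_dates(text, base=None):
    """Return a date for each relative keyword, in order of appearance."""
    base = base or date.today()
    # Strip surrounding punctuation so 'tomorrow,' still matches.
    words = (w.strip(".,!?;:'\"") for w in text.lower().split())
    return [base + timedelta(days=RELATIVE_DAYS[w])
            for w in words if w in RELATIVE_DAYS]

print(find_relative_dates("Call me tomorrow, not today.", base=date(2017, 1, 31)))
# [datetime.date(2017, 2, 1), datetime.date(2017, 1, 31)]
```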

Getting false positives if input text is not English

Hi,
I tried using datefinder to find dates that are together with Portuguese texts. I figured that although the package is originally made for English, it can get the dates in certain formats like YYYY-MM-DD but not say Janeiro 22, 2016 since "janeiro" is not in the RE pattern.

For input text like:
one_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 2012-2-21."
it can parse 2012-2-21 to 2012-02-21 00:00:00.

However, if the text is like:
sec_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 6 2 Ricardo Salgado está impedido de sair do país e de contactar com os outros arguidos da operação Marquês."

the package parses "6 2" as a date 2017-06-02 00:00:00.

If I have a string with a number that has two digits like:
third_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 22 Ricardo Salgado"
it will ignore 22.

If I have a string with a number that has three digits like:
"O Benfica está nas meias-finais da Taça de Portugalo Leixões por 6 2 Ricardo Salgado está impedido de sair do país e de contactar 233 com os outros arguidos da operação Marquês."
it will consider "233" as a year.

I would like to ask if it should behave this way, and/or pointers to extending your package to another language.

Thank you very much.

Allow passing the dayfirst and yearfirst arguments

The dateutil module has options to treat the day or the year as the first field.
For example:
Input: 11-12-2017 with dayfirst=True
Output: 11-DEC-2017 in DD-MMM-YYYY

Input: 11-12-2017 with dayfirst=False (default)
Output: 12-NOV-2017.

Please allow these arguments to be passed through from datefinder.
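
The ambiguity can be illustrated with the standard library alone. `parse_numeric_date` is a hypothetical helper that mimics the effect of a dayfirst flag for the 'NN-NN-YYYY' layout only; it is not the dateutil or datefinder API.

```python
from datetime import datetime

def parse_numeric_date(date_string, dayfirst=False):
    """Parse 'NN-NN-YYYY', reading the first field as day or month."""
    fmt = "%d-%m-%Y" if dayfirst else "%m-%d-%Y"
    return datetime.strptime(date_string, fmt)

print(parse_numeric_date("11-12-2017", dayfirst=True))   # 2017-12-11 00:00:00
print(parse_numeric_date("11-12-2017", dayfirst=False))  # 2017-11-12 00:00:00
```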