
datefinder's Introduction

datefinder - extract dates from text

A Python module for locating dates inside text. Use this package to extract all sorts of date-like strings from a document and turn them into datetime objects.

This module finds the likely datetime strings and then uses dateutil to convert them to datetime objects.

Installation

With pip

pip install datefinder

Note: I do not publish the version on conda forge and cannot verify its integrity.

How to Use

In [1]: string_with_dates = """
   ...: ...
   ...: entries are due by January 4th, 2017 at 8:00pm
   ...: ...
   ...: created 01/15/2005 by ACME Inc. and associates.
   ...: ...
   ...: """

In [2]: import datefinder

In [3]: matches = datefinder.find_dates(string_with_dates)

In [4]: for match in matches:
   ...:     print(match)
   ...:
2017-01-04 20:00:00
2005-01-15 00:00:00


datefinder's People

Contributors

akoumjian, andreycorelli, diannamcallister, ecatkins, janto, jonafato, jsenecal, llambeau, ranchodeluxe, rawouter, reddalexx, sugatoray, synapticarbors, tomgobravo


datefinder's Issues

Regression in v0.6.1

Hello, it seems that changing from dateparser to dateutil in 0.6.1 significantly reduces accuracy in languages other than English.

In 0.6.0, installed from PyPI, I get the correct output for these Bulgarian dates

1 май 1974  			 1974-05-01 00:00:00
1 януари 1970  		 1970-01-01 00:00:00

In the latest master, I get:

1 януари 1970  		 1970-08-10 00:00:00
1 май 1974  			 1974-08-10 00:00:00

Dates are correctly identified, but they are not parsed right. It is suspicious that both inputs resolve to 10 August.

Multiple dates causing issues

Dates are not detected in a sentence that contains more than one date.

import datefinder
matches = datefinder.find_dates(u'He was in hospital from Aug-2001 to Feb-2002.')
for match in matches:
    print match

However, if the sentence contains only one date (in the same format), it is detected.

import datefinder
matches = datefinder.find_dates(u'He was in hospital from Aug-2001.')
for match in matches:
    print match

Env:
Windows, 64bit, Python 2.7

Month abbreviation with period fails

Using Python 3.5.2 and datefinder 0.6.0.
textstring = "Oct. , 1881"
find_dates returns no matches.
If the period following "Oct" is removed, then find_dates returns a match.

Periods after month abbreviations are quite common. I would have expected this to work.
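Since the report says the match succeeds once the period is removed, one possible workaround is to strip that period before calling find_dates. The sketch below uses only the standard library; the helper name `strip_month_periods` and the abbreviation list are illustrative assumptions, not part of datefinder.

```python
import re

# Hypothetical helper, not part of datefinder: drops the period that
# follows an English month abbreviation, so "Oct." becomes "Oct".
MONTH_ABBR = r"(?:jan|feb|mar|apr|jun|jul|aug|sep|sept|oct|nov|dec)"
_ABBR_WITH_PERIOD = re.compile(r"\b(" + MONTH_ABBR + r")\.", re.IGNORECASE)


def strip_month_periods(text):
    """Return text with the period after month abbreviations removed."""
    return _ABBR_WITH_PERIOD.sub(r"\1", text)


cleaned = strip_month_periods("Oct. , 1881")
print(cleaned)  # Oct , 1881
```

The cleaned string could then be passed to find_dates in place of the original.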

Failure to recognise dates with time ranges

"\n\n Wednesday 5th April 09.00 - 11.30\n Wednesday 5th April 15.00 - 17.30\n\n Friday 7th April 09.00 - 11.30"

-->

2009-04-05 00:00:00
2030-04-05 15:00:00
2030-04-07 09:00:00

Error of infinite recursion in R(Shiny)

I am working on an R (Shiny) app for a loan prediction model. I am trying to calculate the loan amount for a particular method, but I get an infinite recursion error. As suggested in most posts about this error, I have tried setting options(expressions = 1000), but I still get the error.

If I write my condition as
MDBB_LA<- reactive({ input$MDBB*10 })
I get no error, but if I add another condition to it:

DSCR_Post<- reactive({ if (input$MU == "EMM" & (input$EMIM/12)+EMI()!=0) { EBITDA_EMM()/((input$EMIM/12) + EMI()) } else if (input$MU == "EMM" & (input$EMIM/12)+EMI()==0) { 0 } else if (input$MU != "EMM" & (input$EMIM/12)+EMI()!=0 ){ EBITDA()/((input$EMIM/12) + EMI()) }else{ 0 }})

MDBB_LA<- reactive({ if ((input$MU == "EMM" & DSCR_Post() >= 1) | (input$MU == "FAT1" & DSCR_Post() >= 0.8) | (input$MU == "FAT2" & DSCR_Post() >= 0.7) | (input$MU == "UAT" & DSCR_Post() >= 0.5)) { input$MDBB*10*2 } else if ((input$MU == "EMM" & DSCR_Post() < 1) | (input$MU == "FAT1" & DSCR_Post() < 0.8 ) | (input$MU == "FAT2" & DSCR_Post() < 0.7) | (input$MU == "UAT" & DSCR_Post() < 0.5)){ input$MDBB*10 } else if ((input$MU == "MDBB1" ) | (input$MU == "MDBB2" ) | (input$MU == "MDBB3") | (input$MU == "MDBB4") ){ input$MDBB*10 } else {input$MDBB*10} })
I get this error:

Warning: Error in : evaluation nested too deeply: infinite recursion / options(expressions=)?

Can anyone help me work out what is wrong with the logical statement?

wrong interpretation

I have the following string:
str1= 'SvrCk: 21 3:13p 04/19/16 Separate checks: 8-of-8'
The date I want to extract is obviously 2016-04-19 (or, in full, 2016-04-19 15:13, but I don't even need the time). Unfortunately, datefinder fails to recognize it:

matchesFuzzy = datefinder.find_dates(str1)
for match in matchesFuzzy: print(match)

gives this:

2016-08-21 03:13:00
2016-09-04 00:00:00
2016-08-08 00:00:00

and

matchesStrict = datefinder.find_dates(str1, strict=True)
for match in matchesStrict: print(match)

gives this:

2016-09-04 00:00:00
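When the input format is known in advance, a targeted pattern can sidestep the ambiguity shown above. This is only a sketch of a workaround, assuming the date of interest always appears as MM/DD/YY. It uses the standard library rather than datefinder, and the name `US_SHORT_DATE` is made up for the example.

```python
import re
from datetime import datetime

# Assumption: the date of interest always appears as MM/DD/YY.
US_SHORT_DATE = re.compile(r"\b(\d{2}/\d{2}/\d{2})\b")

str1 = 'SvrCk: 21 3:13p 04/19/16 Separate checks: 8-of-8'

# Parse every MM/DD/YY token; the other numbers in the line are ignored.
found = [datetime.strptime(m, "%m/%d/%y") for m in US_SHORT_DATE.findall(str1)]
print(found[0])  # 2016-04-19 00:00:00
```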

REPLACEMENTS keys should consider whitespace permutations

Issue #14 also highlights that REPLACEMENTS should perhaps match a key only when it is surrounded by whitespace, in any of the following permutations.

    REPLACEMENTS = {
        "standard": "",
        "daylight": "",
        "savings": "",
        "time": "",
        "date": "",
        "by": "",
        "due": "",
        "on": "",
        ",": "",
    }

For example, consider the key = 'to'

  • should match ' to'
  • should match 'to '
  • should match ' to '
  • should never match 'to' in say 'october'

See draft implementation with tests here
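The matching rules listed above can be sketched with regular-expression lookarounds. This is an illustration under assumptions, not the draft implementation referred to: `remove_replacement_words` is a hypothetical helper, and the key list is the word keys of REPLACEMENTS plus the 'to' example.

```python
import re

# Word keys of REPLACEMENTS as quoted above, plus 'to' from the example.
REPLACEMENT_KEYS = ["standard", "daylight", "savings", "time", "date",
                    "by", "due", "on", "to"]

# (?<!\w) and (?!\w) require that the key is not glued to another word
# character, so ' to', 'to ' and ' to ' match but the 'to' in 'october'
# does not.
_KEY_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(map(re.escape, REPLACEMENT_KEYS)) + r")(?!\w)",
    re.IGNORECASE,
)


def remove_replacement_words(text):
    """Blank out whole-word replacement keys (hypothetical helper)."""
    return _KEY_PATTERN.sub("", text)


result = remove_replacement_words("june 4th to october 3rd")
print(repr(result))
```

Only the standalone 'to' is removed; 'october' is left intact because its 'to' is surrounded by word characters.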

can't install datefinder


Failed building wheel for regex
Running setup.py clean for regex
Failed to build regex
Installing collected packages: regex, datefinder
Running setup.py install for regex ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-QCCGKo/regex/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-I_ImpH-record/install-record.txt --single-version-externally-managed --compile:
/usr/local/lib/python2.7/dist-packages/setuptools/dist.py:351: UserWarning: Normalizing '2016.01.10' to '2016.1.10'
normalized_version,
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
copying Python2/regex.py -> build/lib.linux-x86_64-2.7
copying Python2/_regex_core.py -> build/lib.linux-x86_64-2.7
copying Python2/test_regex.py -> build/lib.linux-x86_64-2.7
running build_ext
building '_regex' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/Python2
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c Python2/_regex.c -o build/temp.linux-x86_64-2.7/Python2/_regex.o
Python2/_regex.c:46:20: fatal error: Python.h: No such file or directory
#include "Python.h"
^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

----------------------------------------

Command "/usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-QCCGKo/regex/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-I_ImpH-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-QCCGKo/regex/

dateparser throws TypeError error with specific number combo

If datefinder attempts to parse a string such as "32 2016", it throws an error "TypeError: Required argument 'day' (pos 3) not found".

The combination needs to be a string with a combination of 2 and 4 digits, and the number with 2 digits needs to be greater than 31 or less than 1 (eg 00) to throw the error.

I found myself deep down the dateparser rabbit hole, but thought this scenario may need to be filtered in datefinder prior to passing to dateparser.parse. Eg something like the below added to parse_date_string (this is a bit ugly, but does the job)

date_string_split = date_string.split()
if len(date_string_split) == 2:
    if len(date_string_split[0]) == 2 and len(date_string_split[1]) == 4 and (int(date_string_split[0]) < 1 or int(date_string_split[0]) > 31):
            return None
    if len(date_string_split[1]) == 2 and len(date_string_split[0]) == 4 and (int(date_string_split[1]) < 1 or int(date_string_split[1]) > 31):
            return None

I can submit a pull request, but need to make my code prettier...
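A tidier version of the guard proposed above might look like the following sketch. The function name `is_implausible_day_year` is hypothetical, and where it would be called inside parse_date_string is left open. Unlike the snippet above, it also returns False for non-numeric tokens instead of failing in int().

```python
def is_implausible_day_year(date_string):
    """Return True for strings such as '32 2016' or '2016 00'.

    Hypothetical pre-filter: a 2-digit token outside 1..31 paired with a
    4-digit token, the combination reported to raise TypeError.
    """
    tokens = date_string.split()
    if len(tokens) != 2 or not all(t.isdecimal() for t in tokens):
        return False
    # Check both orders: (2-digit, 4-digit) and (4-digit, 2-digit).
    for short, long_ in (tokens, tokens[::-1]):
        if len(short) == 2 and len(long_) == 4 and not 1 <= int(short) <= 31:
            return True
    return False


print(is_implausible_day_year("32 2016"))  # True
print(is_implausible_day_year("12 2016"))  # False
```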

Fails on a specific string

In [13]: def fd(s):
    ...:     for match in datefinder.find_dates(s):
    ...:         print match
    ...:

In [31]: b
Out[31]: '37.47.96.153 - - [09/Jun/2017:00:00:47 +0200] "GET /style/common/img/icons/friendmsg.png HTTP/1.1" 304 -'

In [32]: fd(b)
0304-06-23 00:00:00

not able to download the package using pip

Hi,
Can't access the module. I am using python 3.5.
Below is the complete traceback of the error.

Exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\commands\install.py", line 335, in run
    wb.build(autobuilding=True)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_set.py", line 554, in _prepare_file
    require_hashes
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 811, in get_page
    inst = cls(resp.content, resp.url, resp.headers)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 731, in __init__
    namespaceHTMLElements=False,
TypeError: parse() got an unexpected keyword argument 'transport_encoding'

Extract a date from a string with additional numbers

I have a problem with these examples:

"09/05/2009 14:40 06", "03/29/2009 11:03 am 1", "32402 05/19/13 05:37", ...

In general, when a line of text contains a complete date plus some extra number (let's assume it's a number, because I've already filtered out the non-significant characters), the module can't find a date on that line.

I guess this is a little bit difficult to parse, so I just wanted to know if there are some ideas to solve this without going against the main pipeline of this engine. I'm able to post a PR to fix (or cover) this if it's a desirable behavior for the DateFinder engine.

Thanks!

Not able to extract dates from a string

I have tried running this code as:
string_with_dates = """
I want to apply for leaves from 12/12/2017 to 12/18/2017"""
import datefinder
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)

But it is not extracting any date values

Unable to detect dates

list(datefinder.find_dates('date: 11-05-16'))
gives an empty list

while list(datefinder.find_dates('date 11-05-16')) (without the colon) gives the correct result [datetime.datetime(2016, 11, 5, 0, 0)]

Date parsing is just wrong.

This input text has no dates in it (not even numbers!):

text = '''Notwithstanding Lender’s acceleration of the sums secured by this Mortgage due to Borrower's default, Borrower shall have the right to have any proceedings begun by Lender to enforce this Mortgage discontinued at any time prior to entry of a judgment enforcing this Mortgage if: (a) Borrower pays Lender all sums which would be then due under this Mortgage and the Credit Agreement had no acceleration occurred; (b) Borrower cures all events of default; (c) Borrower pays all reasonable expenses incurred by Lender in enforcing the covenants and agreements of Borrower contained in this Mortgage, and in enforcing Lender’s remedies as provided in paragraph 22 hereof, including, but not limited to, reasonable attorneys' fees; and (a) Borrower takes such action as Lender may reasonably require to assure that the lien of this Mortgage, Lender's interest in the Property and Borrower’s obligation to pay the sums secured by this Mortgage shall continue unimpaired.'''

Running list(datefinder.find_dates(text)) produces
[(datetime.datetime(2017, 5, 9, 0, 0), 'may')]

Why?

Regexp identifies months as a part of the word

Hi,

I think that an example should give you a better understanding about the problem.

>>> from datefinder import *
>>> date_finder = DateFinder()
>>> list(date_finder.extract_date_strings('unable to separate'))
[('to sep', (6, 13), {'timezones': [], 'digits': [], 'hours': [], 'months': ['sep'], 'delimiters': [' ', ' '], 'extra_tokens': ['to'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]
>>>
>>> list(date_finder.extract_date_strings('copy of octocat.txt document'))
[('of oct', (4, 11), {'timezones': [], 'digits': [], 'hours': [], 'months': ['oct'], 'delimiters': [' ', ' '], 'extra_tokens': ['of'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]
>>>
>>> list(date_finder.extract_date_strings('of Octoberrrrr'))
[('of October', (0, 10), {'timezones': [], 'digits': [], 'hours': [], 'months': ['October'], 'delimiters': [' '], 'extra_tokens': ['of'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]

I'm not sure how to fix this issue. If you have any ideas, please let me know. I can try to make a pull request if it solves the problem.
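One common way to stop a pattern from matching inside longer words is to anchor it with `\b` word boundaries. The sketch below uses a simplified stand-in for the month part of the pattern (an assumption; datefinder's real expression is larger) to compare the two behaviours on the strings from this report.

```python
import re

# Simplified stand-in for the month alternatives; not datefinder's pattern.
MONTH_NAMES = (
    "january|february|march|april|may|june|july|august|september|october|"
    "november|december|jan|feb|mar|apr|jun|jul|aug|sept|sep|oct|nov|dec"
)

# Without boundaries the month may match inside a longer word.
loose = re.compile(r"(?:" + MONTH_NAMES + r")", re.IGNORECASE)
# With \b on both sides the month must be a whole word.
strict = re.compile(r"\b(?:" + MONTH_NAMES + r")\b", re.IGNORECASE)

for text in ("unable to separate", "copy of octocat.txt document", "of Octoberrrrr"):
    print(text, loose.findall(text), strict.findall(text))
```

With the boundaries in place none of the three strings yields a month, while a standalone token such as "Oct" in "due 3 Oct 2016" still matches.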

Minutes / Months

Hi !

I just wanted to start by thanking you for datefinder, exactly what I was looking for.

I think I found a small bug but couldn't understand how to fix it in the source code.

When parsing strings such as:

>>> list(datefinder.find_dates('2 months'))[0]

it parses months as minutes, i.e. :

datetime.datetime(2016, 6, 10, 8, 31, 41, 469126)

If you could have a look at it, it'd be great ! Thanks a lot again !

String with leading digits not parsed

Hello,
a string which has digits before the date to be extracted is not correctly handled, and thus does not extract any dates (both in strict and non-strict mode).

Here is an example:
' sf.0008 05/04/17 21:34'
Kind regards

Identifies dates that are not present in the string

Below is the snippet:
import datefinder

s = '1.1 This Addendum applies to Software Maintenance & Support Services ("Services") for all copies of software designated on the attached Exhibit 1 ("S oftware") which you have licensed from ABC. The Services for specific Software products may be more fully described in attached Appendices. In the event of a conflict, the order of precedence will be the Appendices, this Addendum, and the Agreement, in that order.'
m = list(datefinder.find_dates(s))
print m

On printing m this is seen
[datetime.datetime(2017, 8, 1, 0, 0), datetime.datetime(2017, 8, 1, 0, 0), datetime.datetime(2017, 5, 14, 0, 0)]

There is no date present in the string. Is this an expected behaviour or a mistake from my end?

Parse range of dates

Find date ranges, for example:

17 and 18 september 2016

from 7 to 14 of september 2016

from friday 2 of september to saturday 10 of 2016

september, from 7 to 10

from 31 august to 2 of september 2016

In Spanish, these are typical:

del 17 al 20 de septiembre 2016

del 31 de agosto al 2 de septiembre 2016

Good job!!
Thanks!

Microsecond extraction different if delimited with ',' and '.'

Version 0.6.1 on Python3.5.2

>>> import datefinder
>>> df = datefinder.DateFinder()
>>> list(df.find_dates('2017-06-27 09:51:46,509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46, 509000)]
>>> list(df.find_dates('2017-06-27 09:51:46.509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46)]
>>> 

On Python2.7.12, the ',' version is not found at all.

>>> import datefinder
>>> df = datefinder.DateFinder()
>>> list(df.find_dates('2017-06-27 09:51:46,509'))
[]
>>> list(df.find_dates('2017-06-27 09:51:46.509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46)]
>>> 
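For log-style timestamps like these, a workaround outside datefinder is to treat the comma as a decimal separator before parsing. The sketch below uses only `datetime.strptime`; the helper name `parse_log_timestamp` is hypothetical and the fixed format string is an assumption about the input.

```python
from datetime import datetime


def parse_log_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' followed by ',mmm' or '.mmm'."""
    # Treat the comma as a decimal separator so one format covers both.
    return datetime.strptime(text.replace(",", "."), "%Y-%m-%d %H:%M:%S.%f")


with_comma = parse_log_timestamp("2017-06-27 09:51:46,509")
with_dot = parse_log_timestamp("2017-06-27 09:51:46.509")
print(with_comma)  # 2017-06-27 09:51:46.509000
print(with_dot)    # 2017-06-27 09:51:46.509000
```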

Unable to recognize dates with keywords 'before' and 'after'

Somehow, in spite of being in the relative patterns list, the above two keywords aren't working.

import datefinder as df
text="check for flights between pune and mumbai before 10 PM"
matches = df.find_dates(text)
for match in matches:
 print match

Gives no return.

datefinder can't find date with Monday, b'cause all "on" substrings are removed, while dateparser could parse it

Example: date_string= 'On Monday, May 9, 2016'
_find_and_replace() should remove all DateFinder.REPLACEMENT words, but in fact it removes all substrings, i.e. within words too,
so, _find_and_replace() wrongly mutates this into " mday may 9 2016"
i.e. it wrongly removes "on" from "monday".
dateparser can't parse " mday may 9 2016"
but it successfully parses 'On Monday, May 9, 2016'
How to correct: force _find_and_replace() function to not touch substrings within words, at least not touch "monday"s
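The mutation described above can be reproduced without datefinder. The sketch assumes that _find_and_replace lower-cases the string and then applies plain substring replacement for each REPLACEMENTS key; that assumption is consistent with the " mday may 9 2016" result quoted in this report, but it is not taken from the source code.

```python
# REPLACEMENTS as quoted in the related issues.
REPLACEMENTS = {"standard": "", "daylight": "", "savings": "", "time": "",
                "date": "", "by": "", "due": "", "on": "", ",": ""}


def naive_find_and_replace(date_string):
    """Assumed behaviour: lower-case, then plain substring replacement."""
    result = date_string.lower()
    for key, replacement in REPLACEMENTS.items():
        # str.replace has no notion of word boundaries, so the "on"
        # inside "monday" is removed along with the standalone "on".
        result = result.replace(key, replacement)
    return result


mutated = naive_find_and_replace('On Monday, May 9, 2016')
print(repr(mutated))  # ' mday may 9 2016'
```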

Cannot find the following scenarios

and this is handpunched on 2017/08/27 @ 7:24 AM --- COULD NOT FIND

and this is handpunched on 08/24/2017 @ 7:24 AM --- COULD NOT FIND

and this is handpunched on August 27, 2017 at 7:57 AM === INCORRECT FIND 2027-08-03 07:57:00

How fast to parse a text ?

Hello

I have this very small text and it seems it's very slow to parse it, am I doing something wrong?

text = "hello 2014/02/15 love you"

import datefinder

matches = datefinder.find_dates(text,False,False,True);

print "datefinder = "

for match in matches:
    if match.year>1800 :
        print match,match.year,match.month,match.day


$ time python test.py 
datefinder = 
2014-02-15 00:00:00 2014 2 15

real    0m1.380s
user    0m0.976s
sys 0m0.080s

Dates ending with Z timezone not recognized if there is no millisecond

Let me start with: this is exactly the module I was looking for, thanks a lot!

However, I've hit an annoying bug: I have dates ending with the "Z" timezone designator, which is ISO 8601 compliant, but the module fails to parse them:

>>> list(datefinder.find_dates('INFO[2017-02-03T09:04:08Z] Done job'))
[]

>>> list(datefinder.find_dates('INFO[2017-02-03T09:04:08] Done job'))
[datetime.datetime(2017, 2, 3, 9, 4, 8)]
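Until the "Z" suffix is handled, timestamps of this exact shape can be pulled out with the standard library. This is a workaround sketch, not datefinder behaviour: the helper name `find_iso_utc` is hypothetical, and the pattern only covers second-precision timestamps like the one above.

```python
import re
from datetime import datetime, timezone

ISO_WITH_Z = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def find_iso_utc(text):
    """Return aware datetimes for every 'YYYY-MM-DDTHH:MM:SSZ' in text."""
    # 'Z' means UTC; rewrite it as an explicit offset, which
    # datetime.fromisoformat accepts on Python 3.7 and later.
    return [datetime.fromisoformat(m[:-1] + "+00:00")
            for m in ISO_WITH_Z.findall(text)]


hits = find_iso_utc('INFO[2017-02-03T09:04:08Z] Done job')
print(hits[0])  # 2017-02-03 09:04:08+00:00
```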

Wrong Detection

When I search this string: "[[[uSaturday 04 January 2014 y 27 December 2013 uBooked u23:20 u1"

I get correctly -
2014-01-04 00:00:00
2013-12-27 00:00:00

But when I run it on "[[uMonday 30 December 2013 y 27 December 2013 u23:20 u1"
I get -
2013-12-27 00:00:00
2016-10-07 23:20:00

Why am I getting wrong results despite only a change in date?

test_parse_date_string fails comparing offset-naive and offset-aware types

>           assert actual_datetime == expected_date
E           TypeError: can't compare offset-naive and offset-aware datetimes

tests/test_parse_date_string.py:61: TypeError
----------------------------- Captured stdout call -----------------------------
DEBUG:tests.test_parse_date_string:acutal=2015-11-20 18:00:00-06:00  expected=2015-11-20 18:00:00

It looks like the problem is the test itself, unless I misunderstand what is supposed to be happening.

diff --git a/tests/test_parse_date_string.py b/tests/test_parse_date_string.py
index 68e8e6b..eee6366 100644
--- a/tests/test_parse_date_string.py
+++ b/tests/test_parse_date_string.py
@@ -41,7 +41,7 @@ logger = logging.getLogger(__name__)
     (' on 11-20-2015 6pm CST ',
      '11-20-2015 6pm',
      {'timezones':['CST']},
-     datetime(2015, 11, 20, 18, 0)
+     datetime(2015, 11, 20, 18, 0).replace(tzinfo=tz.gettz('CST'))
     ),
     # test a tz abbreviation that
     # dateutile.tz.gettz cannot find
@@ -49,7 +49,7 @@ logger = logging.getLogger(__name__)
     (' on 11-20-2015 6am IRST ',
      '11-20-2015 6am',
      {'timezones':['IRST']},
-     datetime(2015, 11, 20, 6, 0)
+     datetime(2015, 11, 20, 6, 0).replace(tzinfo=tz.gettz('IRST'))
     )
 ])
 def test_parse_date_string_find_replace(date_string, expected_parse_arg, expected_captures, expected_date):
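The failure reported above comes down to how Python compares naive and aware datetimes. The standalone sketch below uses a fixed UTC-6 offset as a stand-in for CST (an assumption; the test itself uses dateutil's tz.gettz). On current Python 3, `==` between the two returns False rather than raising, while ordering comparisons still raise TypeError.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2015, 11, 20, 18, 0)
# Fixed UTC-6 offset used as a stand-in for CST.
aware = datetime(2015, 11, 20, 18, 0, tzinfo=timezone(timedelta(hours=-6)))

try:
    naive < aware
    ordering_error = None
except TypeError as exc:
    ordering_error = str(exc)

print(ordering_error)
# Equality between naive and aware is simply False on Python 3.
print(naive == aware)
# Attaching the same tzinfo to the expected value makes them comparable.
print(naive.replace(tzinfo=aware.tzinfo) == aware)
```

This mirrors the change in the diff: the expected value is given a tzinfo so that both sides of the assertion are offset-aware.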

Problem finding dates with EXTRA_TOKENS_PATTERNS words in sentence

Hi,
Thanks for writing this module, I've been playing around with it and have found that there seems to be an issue finding dates when words like "to", "by" and "until" are in the string. I notice these words are included in EXTRA_TOKENS_PATTERNS in datefinder.py but I'm not really familiar with dateutil module so not sure why this should cause an issue. Below is some output showing some examples where dates aren't identified and how swapping the word "to" for the word "so" means dates are correctly identified:

>>> chk = "i am looking for a date june 4th 1996 to july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[]
>>> chk = "i am looking for a date june 4th 1996 so july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1996, 6, 4, 0, 0), datetime.datetime(2013, 7, 3, 0, 0)]
>>> chk = "october 27 1994 to be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1995, 6, 1, 0, 0)]
>>> chk = "october 27 1994 so be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1994, 10, 27, 0, 0), datetime.datetime(1995, 6, 1, 0, 0)]

Parse "next" "last" "upcoming"

Hey @akoumjian,

Could you add "next", "last", "upcoming" and "this" to the parser? Would like to parse things like "this Friday", "next Wednesday", "Yesterday", "Tomorrow" etc.?
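To make the request concrete, here is a rough sketch of how such phrases could be resolved against a reference date. It is not part of datefinder. The helper `resolve_relative_day` is hypothetical, and it assumes "next" means the first matching weekday strictly after the reference date and "last" the most recent one strictly before it. Other readings are possible.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_relative_day(phrase, today):
    """Resolve 'yesterday', 'tomorrow', 'next <weekday>' or 'last <weekday>'."""
    words = phrase.lower().split()
    if words == ["yesterday"]:
        return today - timedelta(days=1)
    if words == ["tomorrow"]:
        return today + timedelta(days=1)
    if len(words) == 2 and words[0] in ("next", "last") and words[1] in WEEKDAYS:
        target = WEEKDAYS.index(words[1])
        if words[0] == "next":
            # Days until the first matching weekday strictly after today.
            delta = (target - today.weekday() - 1) % 7 + 1
            return today + timedelta(days=delta)
        # Days back to the most recent matching weekday strictly before today.
        delta = (today.weekday() - target - 1) % 7 + 1
        return today - timedelta(days=delta)
    return None  # phrase not recognised


reference = date(2016, 9, 7)  # a Wednesday
print(resolve_relative_day("next Wednesday", reference))  # 2016-09-14
print(resolve_relative_day("last Monday", reference))     # 2016-09-05
```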

Issue with datefinder module python

I have used the datefinder module to read unformatted dates. It works well for some formats, but when I read from 21st Oct 2012 to 30/11/2012 it does not return the proper dates.
Expected: 2012-10-21 00:00:00 and 2012-11-30 00:00:00, but I am getting the following dates:
2262-08-19 19:14:50, 2016-12-30 00:00:00

When I remove "to" from the string above, it returns the proper dates. It would be great if anyone could solve this issue.

dateutil.parser.parse throws ValueError on crud date strings

Issue #14 highlights we need to wrap dateutil.parser.parse:

>>> from dateutil import parser
>>> parser.parse('to blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/src/datefinder/venv/lib/python3.4/site-packages/python_dateutil-2.4.2-py3.4.egg/dateutil/parser.py", line 1008, in parse
  File "/usr/local/src/datefinder/venv/lib/python3.4/site-packages/python_dateutil-2.4.2-py3.4.egg/dateutil/parser.py", line 395, in parse
ValueError: Unknown string format

Fixed and added a passing test in this draft implementation
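The wrapping idea can be sketched as follows. This is not the draft implementation mentioned above: `safe_parse` is a hypothetical helper, and `datetime.fromisoformat` stands in for dateutil.parser.parse so the example runs with the standard library alone. Both raise ValueError on input they cannot parse.

```python
from datetime import datetime


def safe_parse(parse, date_string):
    """Call parse(date_string); return None instead of raising ValueError."""
    try:
        return parse(date_string)
    except ValueError:
        return None


# Stand-in parser; in datefinder the callable would be dateutil.parser.parse.
print(safe_parse(datetime.fromisoformat, "to blah"))     # None
print(safe_parse(datetime.fromisoformat, "2015-11-20"))  # 2015-11-20 00:00:00
```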

Day format not parsed

A day specified as in 12th day of December, 2001 or in regex geek:

[0-9][0-9]?(st|nd|rd|th) day of

is not parsed.

Thnx for the great work!
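One way to support this form would be to normalise it before parsing. The sketch below reuses the expression from the report. `normalise_day_of` is a hypothetical helper, not an existing datefinder function.

```python
import re

# The pattern from the report, e.g. "12th day of"; group 1 keeps the day.
DAY_OF = re.compile(r"\b([0-9][0-9]?)(?:st|nd|rd|th) day of\b", re.IGNORECASE)


def normalise_day_of(text):
    """Rewrite '12th day of December, 2001' as '12 December, 2001'."""
    return DAY_OF.sub(r"\1", text)


normalised = normalise_day_of("12th day of December, 2001")
print(normalised)  # 12 December, 2001
```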

Which EXTRA_TOKENS should have matching REPLACEMENTS?

Issue #14 has highlighted a discrepancy between extra tokens we use to help us locate dates:

EXTRA_TOKENS_PATTERN = 'due|by|on|standard|daylight|savings|time|date|of|to|until|z|at|t'

and those tokens which are later replaced because dateutil.parser cannot accept them:

    REPLACEMENTS = {
        "standard": "",
        "daylight": "",
        "savings": "",
        "time": "",
        "date": "",
        "by": "",
        "due": "",
        "on": "",
        ",": "",
    }

Currently 'to' is not in the REPLACEMENTS for example.

My guess is that not all the extra tokens need to be replaced, meaning REPLACEMENTS is a subset of EXTRA_TOKENS. We should at least have a test that shows which extra tokens dateutil.parser.parse handles and which need to be in REPLACEMENTS.
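The discrepancy can be listed mechanically with a set comparison over the two definitions quoted above. The variable names `not_replaced` and `not_tokens` are only for this sketch.

```python
EXTRA_TOKENS_PATTERN = 'due|by|on|standard|daylight|savings|time|date|of|to|until|z|at|t'

REPLACEMENTS = {
    "standard": "",
    "daylight": "",
    "savings": "",
    "time": "",
    "date": "",
    "by": "",
    "due": "",
    "on": "",
    ",": "",
}

extra_tokens = set(EXTRA_TOKENS_PATTERN.split('|'))

# Extra tokens that are never replaced before parsing.
not_replaced = sorted(extra_tokens - set(REPLACEMENTS))
# Replacement keys that are not extra tokens.
not_tokens = sorted(set(REPLACEMENTS) - extra_tokens)

print(not_replaced)  # ['at', 'of', 't', 'to', 'until', 'z']
print(not_tokens)    # [',']
```

Apart from the comma, every REPLACEMENTS key is also an extra token, so the subset guess holds for the word keys.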

Getting false positives if input text is not English

Hi,
I tried using datefinder to find dates that are together with Portuguese texts. I figured that although the package is originally made for English, it can get the dates in certain formats like YYYY-MM-DD but not say Janeiro 22, 2016 since "janeiro" is not in the RE pattern.

For input text like:
one_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 2012-2-21.",
it can parse 2012-2-21 to 2012-02-21 00:00:00.

However if the text is like:
sec_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 6 2 Ricardo Salgado está impedido de sair do país e de contactar com os outros arguidos da operação Marquês.",

the package parses "6 2" as a date 2017-06-02 00:00:00.

If I have a string with a number that has two digits like:
third_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 22 Ricardo Salgado",
it will ignore 22.

If I have a string with a number that has three digits like:
O Benfica está nas meias-finais da Taça de Portugalo Leixões por 6 2 Ricardo Salgado está impedido de sair do país e de contactar 233 com os outros arguidos da operação Marquês.,
it will consider "233" as a year.

I would like to ask if it should behave this way, and/or pointers to extending your package to another language.

Thank you very much.

Enable to pass arguments dayfirst and yearfirst

In the dateutil module, there are options to interpret the first field of an ambiguous date as the day (dayfirst) or as the year (yearfirst).
For example
Input: 11-12-2017 with dayfirst=True
Output: 11-DEC-2017 in DD-MMM-YYYY

Input: 11-12-2017 with dayfirst=False (default)
Output: 12-NOV-2017.

Please enable this argument to be passed from datefinder.
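The two readings of the example input can be shown with the standard library alone. The sketch uses explicit `strptime` formats to stand in for the dayfirst choice. It does not call dateutil or datefinder.

```python
from datetime import datetime

raw = "11-12-2017"

# Reading the first field as the day (what dayfirst=True asks for).
day_first = datetime.strptime(raw, "%d-%m-%Y")
# Reading the first field as the month (the default interpretation).
month_first = datetime.strptime(raw, "%m-%d-%Y")

print(day_first.date())    # 2017-12-11
print(month_first.date())  # 2017-11-12
```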
