akoumjian / datefinder
Find dates inside text using Python and get back datetime objects
Home Page: http://datefinder.readthedocs.org/en/latest/
License: MIT License
Hi !
I just wanted to start by thanking you for datefinder; it is exactly what I was looking for.
I think I found a small bug but couldn't understand how to fix it in the source code.
When parsing strings such as:
>>> list(datefinder.find_dates('2 months'))[0]
it parses months as minutes, i.e. :
datetime.datetime(2016, 6, 10, 8, 31, 41, 469126)
If you could have a look at it, it'd be great ! Thanks a lot again !
Hi,
I think that an example should give you a better understanding about the problem.
>>> from datefinder import *
>>> date_finder = DateFinder()
>>> list(date_finder.extract_date_strings('unable to separate'))
[('to sep', (6, 13), {'timezones': [], 'digits': [], 'hours': [], 'months': ['sep'], 'delimiters': [' ', ' '], 'extra_tokens': ['to'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]
>>>
>>> list(date_finder.extract_date_strings('copy of octocat.txt document'))
[('of oct', (4, 11), {'timezones': [], 'digits': [], 'hours': [], 'months': ['oct'], 'delimiters': [' ', ' '], 'extra_tokens': ['of'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]
>>>
>>> list(date_finder.extract_date_strings('of Octoberrrrr'))
[('of October', (0, 10), {'timezones': [], 'digits': [], 'hours': [], 'months': ['October'], 'delimiters': [' '], 'extra_tokens': ['of'], 'minutes': [], 'time_periods': [], 'time': [], 'seconds': [], 'days': [], 'digits_modifier': []})]
I'm not sure how to fix this issue. If you have any ideas, please let me know. I can try to make a pull request if it solves the problem.
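One possible direction, sketched with the standard `re` module rather than datefinder's own pattern: anchor month tokens with word boundaries so that "sep" inside "separate" no longer counts. `MONTHS`, `UNBOUNDED`, `BOUNDED`, and `month_tokens` are hypothetical names used only for illustration; they are not part of datefinder.

```python
import re

# Hypothetical month pattern for illustration; datefinder's real pattern differs.
MONTHS = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|"
    r"jul(?:y)?|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|"
    r"nov(?:ember)?|dec(?:ember)?)"
)

# Without boundaries, a month abbreviation can match inside a longer word.
UNBOUNDED = re.compile(MONTHS, re.IGNORECASE)
# With \b on both sides, the token must be a whole word.
BOUNDED = re.compile(r"\b" + MONTHS + r"\b", re.IGNORECASE)


def month_tokens(pattern, text):
    """Return every month-like token that the given pattern finds in text."""
    return [m.group(0) for m in pattern.finditer(text)]
```

With this sketch, the three strings from the report produce no month tokens under `BOUNDED`, while a real abbreviation such as the one in "due 5 Oct 2016" is still found.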
Failed building wheel for regex
Running setup.py clean for regex
Failed to build regex
Installing collected packages: regex, datefinder
Running setup.py install for regex ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-QCCGKo/regex/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-I_ImpH-record/install-record.txt --single-version-externally-managed --compile:
/usr/local/lib/python2.7/dist-packages/setuptools/dist.py:351: UserWarning: Normalizing '2016.01.10' to '2016.1.10'
normalized_version,
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
copying Python2/regex.py -> build/lib.linux-x86_64-2.7
copying Python2/_regex_core.py -> build/lib.linux-x86_64-2.7
copying Python2/test_regex.py -> build/lib.linux-x86_64-2.7
running build_ext
building '_regex' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/Python2
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c Python2/_regex.c -o build/temp.linux-x86_64-2.7/Python2/_regex.o
Python2/_regex.c:46:20: fatal error: Python.h: No such file or directory
#include "Python.h"
^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-QCCGKo/regex/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-I_ImpH-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-QCCGKo/regex/
Issue #14 also highlights that REPLACEMENTS should perhaps match certain permutations of a key surrounded by whitespace characters.
REPLACEMENTS = {
    "standard": "",
    "daylight": "",
    "savings": "",
    "time": "",
    "date": "",
    "by": "",
    "due": "",
    "on": "",
    ",": "",
}
For example, consider the key = 'to'
See draft implementation with tests here
For string "transactions from 14 Aug 2016 to 18 Aug 2016." the result is "no dates found". If I replace "to" with "till", it gives the correct result.
"We should arrange meetings on the following dates:
April 3rd
4th
5th
6th"
No dates found
Hi,
I can't access the module. I am using Python 3.5.
Below is the complete traceback of the error.
Exception:
Traceback (most recent call last):
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\basecommand.py", line 215, in main
status = self.run(options, args)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\commands\install.py", line 335, in run
wb.build(autobuilding=True)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\wheel.py", line 749, in build
self.requirement_set.prepare_files(self.finder)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
ignore_dependencies=self.ignore_dependencies))
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_set.py", line 554, in _prepare_file
require_hashes
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\req\req_install.py", line 278, in populate_link
self.link = finder.find_requirement(self, upgrade)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 465, in find_requirement
all_candidates = self.find_all_candidates(req.name)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 423, in find_all_candidates
for page in self._get_pages(url_locations, project_name):
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 568, in _get_pages
page = self._get_page(location)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 683, in _get_page
return HTMLPage.get_page(link, session=self.session)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 811, in get_page
inst = cls(resp.content, resp.url, resp.headers)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pip\index.py", line 731, in __init__
namespaceHTMLElements=False,
TypeError: parse() got an unexpected keyword argument 'transport_encoding'
In some cases, find_dates returns None, but the unit test fails to detect this because it relies on a for loop construct. I'll submit a pull request to fix this.
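A minimal sketch of how a loop-based test can pass vacuously, assuming the finder ends up yielding nothing. `fake_find_dates`, `loop_style_check`, and `strict_check` are hypothetical stand-ins, not datefinder code or its actual tests.

```python
def fake_find_dates(text):
    """Hypothetical stand-in for a finder that yields no matches."""
    return iter(())


def loop_style_check(matches, expected):
    """Assert inside a for loop; with zero matches the body never runs."""
    for match in matches:
        assert match == expected
    return True


def strict_check(matches, expected_list):
    """Materialize the matches first so an empty result fails loudly."""
    results = list(matches)
    assert results, "no dates were found"
    assert results == expected_list
    return True
```

Collecting the matches into a list before asserting makes the "nothing found" case a test failure instead of a silent pass.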
I have used the datefinder module to read unformatted dates. It works well for some formats, but when I read "from 21st Oct 2012 to 30/11/2012" it does not return the proper dates.
Expected: 2012-10-21 00:00:00 and 2012-11-30 00:00:00, but I am getting the following dates:
2262-08-19 19:14:50, 2016-12-30 00:00:00
When I remove "to" from the string above, it returns the proper dates. It would be great if someone could solve this issue.
> assert actual_datetime == expected_date
E TypeError: can't compare offset-naive and offset-aware datetimes
tests/test_parse_date_string.py:61: TypeError
----------------------------- Captured stdout call -----------------------------
DEBUG:tests.test_parse_date_string:acutal=2015-11-20 18:00:00-06:00 expected=2015-11-20 18:00:00
It looks like the problem is the test itself, unless I misunderstand what is supposed to be happening.
diff --git a/tests/test_parse_date_string.py b/tests/test_parse_date_string.py
index 68e8e6b..eee6366 100644
--- a/tests/test_parse_date_string.py
+++ b/tests/test_parse_date_string.py
@@ -41,7 +41,7 @@ logger = logging.getLogger(__name__)
(' on 11-20-2015 6pm CST ',
'11-20-2015 6pm',
{'timezones':['CST']},
- datetime(2015, 11, 20, 18, 0)
+ datetime(2015, 11, 20, 18, 0).replace(tzinfo=tz.gettz('CST'))
),
# test a tz abbreviation that
# dateutile.tz.gettz cannot find
@@ -49,7 +49,7 @@ logger = logging.getLogger(__name__)
(' on 11-20-2015 6am IRST ',
'11-20-2015 6am',
{'timezones':['IRST']},
- datetime(2015, 11, 20, 6, 0)
+ datetime(2015, 11, 20, 6, 0).replace(tzinfo=tz.gettz('IRST'))
)
])
def test_parse_date_string_find_replace(date_string, expected_parse_arg, expected_captures, expected_date):
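The mismatch behind this failure can be reproduced with the standard library alone. This is a sketch under two assumptions: a fixed UTC-06:00 offset stands in for `tz.gettz('CST')`, and `same_instant` is a hypothetical helper, not part of the test suite. On Python 3, `==` between a naive and an aware datetime returns False instead of raising, so the helper raises explicitly to mirror the reported error.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset stands in for dateutil's tz.gettz('CST').
CST = timezone(timedelta(hours=-6), "CST")

naive_expected = datetime(2015, 11, 20, 18, 0)
aware_expected = naive_expected.replace(tzinfo=CST)


def same_instant(actual, expected):
    """Compare two datetimes only when both are naive or both are aware."""
    if (actual.tzinfo is None) != (expected.tzinfo is None):
        raise TypeError("can't compare offset-naive and offset-aware datetimes")
    return actual == expected
```

Attaching the timezone to the expected value, as the diff does, keeps both sides of the assertion aware.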
Eg: stringtotest = 'from July'.
OR
Eg: stringtotest = 'July'
And if it is able to do it using EXTRA_TOKENS_PATTERNS, how do I use it?
I downloaded using pip.
Somehow, in spite of being in the relative patterns list, the above two keywords aren't working.
import datefinder as df
text = "check for flights between pune and mumbai before 10 PM"
matches = df.find_dates(text)
for match in matches:
    print match
Gives no return.
Hey, I have been using datefinder, but it is unable to detect words like 'today' and 'tomorrow', 'this week' etc.
Issue #14 highlights that our DATE_REGEX can match more than one date string in certain situations.
One situation where this occurs is when two dates bookend an EXTRA_TOKEN such as:
datestring = 'june 5th 2012 to january 1st 2014'
There is a test setup for this use case in this unmerged branch
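A rough sketch of one way to handle the bookend case: split a candidate string on a range word before handing each half to the parser. The separator list and the names `RANGE_SEPARATOR` and `split_date_range` are assumptions for illustration and do not come from datefinder or the unmerged branch.

```python
import re

# Assumed set of range words; adjust to whatever tokens the matcher allows.
RANGE_SEPARATOR = re.compile(r"\s+(?:to|until|till|through)\s+", re.IGNORECASE)


def split_date_range(candidate):
    """Split 'A to B' into ['A', 'B']; a string without a separator stays whole."""
    parts = RANGE_SEPARATOR.split(candidate)
    return [part.strip() for part in parts if part.strip()]
```

For the example above this returns the two halves 'june 5th 2012' and 'january 1st 2014', each of which could then be parsed on its own.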
Let me start with: this is exactly the module I was looking for, thanks a lot!
I'm hitting an annoying bug, however: I have dates ending with the "Z" timezone designator, which is ISO 8601 compliant, but the module fails to parse them:
>>> list(datefinder.find_dates('INFO[2017-02-03T09:04:08Z] Done job'))
[]
>>> list(datefinder.find_dates('INFO[2017-02-03T09:04:08] Done job'))
[datetime.datetime(2017, 2, 3, 9, 4, 8)]
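Until the trailing "Z" is handled, a possible workaround is to pick out such timestamps directly. This is a standard-library sketch, not datefinder behavior; `ISO_UTC` and `find_iso_utc` are hypothetical names, and the pattern only covers the exact `YYYY-MM-DDTHH:MM:SSZ` shape.

```python
import re
from datetime import datetime, timezone

# Matches only the exact shape shown in the report, e.g. 2017-02-03T09:04:08Z.
ISO_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def find_iso_utc(text):
    """Return timezone-aware UTC datetimes for every ISO 8601 'Z' timestamp."""
    return [
        datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        for m in ISO_UTC.finditer(text)
    ]
```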
and this is handpunched on 2017/08/27 @ 7:24 AM --- COULD NOT FIND
and this is handpunched on 08/24/2017 @ 7:24 AM --- COULD NOT FIND
and this is handpunched on August 27, 2017 at 7:57 AM === INCORRECT FIND 2027-08-03 07:57:00
"\n\n Wednesday 5th April 09.00 - 11.30\n Wednesday 5th April 15.00 - 17.30\n\n Friday 7th April 09.00 - 11.30"
-->
2009-04-05 00:00:00
2030-04-05 15:00:00
2030-04-07 09:00:00
Example: date_string= 'On Monday, May 9, 2016'
_find_and_replace() should remove all DateFinder.REPLACEMENT words, but in fact it removes all substrings, i.e. within words too,
so, _find_and_replace() wrongly mutates this into " mday may 9 2016"
i.e. it wrongly removes "on" from "monday".
dateparser can't parse " mday may 9 2016"
but it successfully parses 'On Monday, May 9, 2016'
How to correct: force _find_and_replace() function to not touch substrings within words, at least not touch "monday"s
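A sketch of the suggested correction, assuming the replacement step lowercases the string and then removes each key: alphanumeric keys are only removed when they stand alone as whole words. `find_and_replace_whole_words` is a hypothetical function for illustration, not datefinder's `_find_and_replace()`.

```python
import re

REPLACEMENTS = {
    "standard": "", "daylight": "", "savings": "", "time": "",
    "date": "", "by": "", "due": "", "on": "", ",": "",
}


def find_and_replace_whole_words(date_string, replacements=REPLACEMENTS):
    """Remove replacement keys without touching substrings inside other words."""
    cleaned = date_string.lower()
    for key, replacement in replacements.items():
        if key.isalnum():
            # Word keys must be delimited by word boundaries.
            pattern = r"\b" + re.escape(key) + r"\b"
        else:
            # Punctuation keys such as "," are replaced wherever they occur.
            pattern = re.escape(key)
        cleaned = re.sub(pattern, replacement, cleaned)
    # Collapse the whitespace left behind by removed tokens.
    return " ".join(cleaned.split())
```

With the example above, the leading "On" and the commas are dropped while "monday" stays intact.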
In the dateutil module, there is an option to treat the day as first or the year as first.
For example:
Input: 11-12-2017 with dayfirst=True
Output: 11-DEC-2017 in DD-MMM-YYYY
Input: 11-12-2017 with dayfirst=False (default)
Output: 12-NOV-2017.
Please enable this argument to be passed from datefinder.
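To make the two readings concrete, here is a standard-library sketch of the same choice. `parse_numeric_date` is a hypothetical helper that mimics the effect of a `dayfirst` flag for the `NN-NN-YYYY` shape only; it is not dateutil or datefinder code.

```python
from datetime import datetime


def parse_numeric_date(text, dayfirst=False):
    """Parse 'NN-NN-YYYY', reading the first field as the day when dayfirst is set."""
    fmt = "%d-%m-%Y" if dayfirst else "%m-%d-%Y"
    return datetime.strptime(text, fmt)
```

The same input string then resolves to 11 December 2017 with `dayfirst=True` and to 12 November 2017 otherwise.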
Using Python 3.5.2 and datefinder 0.6.0.
textstring = "Oct. , 1881"
find_dates returns no matches.
If the period following "Oct" is removed, then find_dates returns a match.
Periods after date abbreviations are quite common. I would have expected this to work.
Version 0.6.1 on Python3.5.2
>>> import datefinder
>>> df = datefinder.DateFinder()
>>> list(df.find_dates('2017-06-27 09:51:46,509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46, 509000)]
>>> list(df.find_dates('2017-06-27 09:51:46.509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46)]
>>>
On Python2.7.12, the ',' version is not found at all.
>>> import datefinder
>>> df = datefinder.DateFinder()
>>> list(df.find_dates('2017-06-27 09:51:46,509'))
[]
>>> list(df.find_dates('2017-06-27 09:51:46.509'))
[datetime.datetime(2017, 6, 27, 9, 51, 46)]
>>>
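A possible workaround while the two separators behave differently: normalize a comma before the fractional seconds to a period, then parse. This is a standard-library sketch for the exact timestamp shape in the report; `parse_log_timestamp` is a hypothetical name.

```python
import re
from datetime import datetime


def parse_log_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS,fff' or 'YYYY-MM-DD HH:MM:SS.fff'."""
    # Turn "46,509" into "46.509" only when it follows an HH:MM:SS group.
    normalized = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{1,6})\b", r"\1.\2", text)
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")
```

Both spellings then give the same datetime, including the 509 milliseconds.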
Using Python 3.5.2 and datefinder 0.6.0.
textstring = "Oct , 1881"
find_dates returns '1881-10-31 00:00:00'
I would have preferred '1881-10-00 00:00:00'
Is there a way to force the default day?
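As a rough illustration of forcing a default day, here is a standard-library sketch for month-and-year strings like the one above. `parse_month_year` and its `default_day` parameter are hypothetical; note that a `datetime` cannot hold day 0, so the smallest usable default is 1. The sketch assumes English month names and the default C/English locale for `%b`.

```python
from datetime import datetime


def parse_month_year(text, default_day=1):
    """Parse strings such as 'Oct , 1881' or 'Oct. , 1881' with a chosen day."""
    # Treat commas and periods as plain separators.
    cleaned = text.replace(",", " ").replace(".", " ")
    month_name, year = cleaned.split()
    parsed = datetime.strptime(month_name[:3] + " " + year, "%b %Y")
    return parsed.replace(day=default_day)
```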
Hello, it seems that changing from dateparser to dateutil in 0.6.1 significantly reduces accuracy in languages other than English.
In 0.6.0, installed from PyPI, I get the correct output for these Bulgarian dates
1 май 1974 1974-05-01 00:00:00
1 януари 1970 1970-01-01 00:00:00
In the latest master, I get:
1 януари 1970 1970-08-10 00:00:00
1 май 1974 1974-08-10 00:00:00
Dates are correctly identified, but they are not parsed right. It is suspicious that both inputs resolve to 10 August.
Hey @akoumjian,
Could you add "next", "last", "upcoming" and "this" to the parser? Would like to parse things like "this Friday", "next Wednesday", "Yesterday", "Tomorrow" etc.?
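For the simplest of these words, the resolution step could look like the sketch below. It only covers single-word offsets relative to a given day; `RELATIVE_DAYS` and `resolve_relative_day` are hypothetical names, and phrases such as "next Wednesday" would need additional weekday logic that is not shown.

```python
from datetime import date, timedelta

# Assumed keyword table; datefinder does not ship one like this.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_relative_day(word, today):
    """Map a relative keyword to a date, or return None for unknown words."""
    offset = RELATIVE_DAYS.get(word.strip().lower())
    if offset is None:
        return None
    return today + timedelta(days=offset)
```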
I have tried running this code as:
string_with_dates = """
I want to apply for leaves from 12/12/2017 to 12/18/2017"""
import datefinder
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print match
But it is not extracting any date values
I have the following string:
str1= 'SvrCk: 21 3:13p 04/19/16 Separate checks: 8-of-8'
The date I want to extract is obviously 2016-04-19 (or, in full, 2016-04-19 15:13, but I don't even need the time). Unfortunately, datefinder fails to recognize it:
matchesFuzzy = datefinder.find_dates(str1)
for match in matchesFuzzy: print(match)
gives this:
2016-08-21 03:13:00
2016-09-04 00:00:00
2016-08-08 00:00:00
and
matchesStrict = datefinder.find_dates(str1, strict=True)
for match in matchesStrict: print(match)
gives this:
2016-09-04 00:00:00
Why does the setup.py dependency list point to an old version of the regex library?
In [13]: def fd(s):
    ...:     for match in datefinder.find_dates(s):
    ...:         print match
    ...:
In [31]: b
Out[31]: '37.47.96.153 - - [09/Jun/2017:00:00:47 +0200] "GET /style/common/img/icons/friendmsg.png HTTP/1.1" 304 -'
In [32]: fd(b)
0304-06-23 00:00:00
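When the input is known to be an access log in this bracketed format, the timestamp can be parsed explicitly instead of guessed. This is a standard-library sketch (Python 3, English month abbreviations assumed); `CLF_TIME` and `parse_access_log_time` are hypothetical names.

```python
import re
from datetime import datetime

# Bracketed timestamp such as [09/Jun/2017:00:00:47 +0200].
CLF_TIME = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_access_log_time(line):
    """Return the timezone-aware request time from a log line, or None."""
    match = CLF_TIME.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
```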
Find dates ranges, for example:
17 and 18 september 2016
from 7 to 14 of september 2016
from friday 2 of september to saturday 10 of 2016
september, from 7 to 10
from 31 august to 2 of september 2016
In Spanish, these are typical:
del 17 al 20 de septiembre 2016
del 31 de agosto al 2 de septiembre 2016
Good job!!
Thanks!
I have a problem with these examples:
"09/05/2009 14:40 06", "03/29/2009 11:03 am 1", "32402 05/19/13 05:37", ...
In general, when a text line comes with a complete date inside and some extra number (let's assume it's a number, because I've already filtered those non significant characters), the module can't find a date on that line.
I guess this is a little difficult to parse, so I just wanted to know if there are any ideas for solving it without going against the main pipeline of this engine. I can post a PR to fix (or cover) this if it is desirable behavior for the datefinder engine.
Thanks!
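One idea, sketched with the standard library only: match slash-separated numeric dates with guards on both sides so that neighboring numbers cannot be absorbed into the date. The month-first order and the "two-digit years are 20xx" rule are assumptions, and `NUMERIC_DATE` and `find_numeric_dates` are hypothetical names, not datefinder internals.

```python
import re
from datetime import datetime

# A date must not be glued to other digits or slashes on either side.
NUMERIC_DATE = re.compile(r"(?<![\d/])(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})(?![\d/])")


def find_numeric_dates(text):
    """Return datetimes for MM/DD/YY or MM/DD/YYYY tokens, ignoring stray numbers."""
    found = []
    for match in NUMERIC_DATE.finditer(text):
        month, day, year = (int(group) for group in match.groups())
        if year < 100:
            year += 2000  # Assumption: two-digit years belong to the 2000s.
        try:
            found.append(datetime(year, month, day))
        except ValueError:
            continue  # Skip impossible dates such as 13/45/2017.
    return found
```

On the examples above, the extra numbers before or after the date are simply skipped and only the date itself is returned.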
If yes how to set datefinder to do it?
Thank you
The find_dates method does not return a match if the string is set to 08082018 or any other string of a similar format.
Below is the snippet:
import datefinder
s = '1.1 This Addendum applies to Software Maintenance & Support Services ("Services") for all copies of software designated on the attached Exhibit 1 ("S oftware") which you have licensed from ABC. The Services for specific Software products may be more fully described in attached Appendices. In the event of a conflict, the order of precedence will be the Appendices, this Addendum, and the Agreement, in that order.'
m = list(datefinder.find_dates(s))
print m
On printing m this is seen
[datetime.datetime(2017, 8, 1, 0, 0), datetime.datetime(2017, 8, 1, 0, 0), datetime.datetime(2017, 5, 14, 0, 0)]
There is no date present in the string. Is this an expected behaviour or a mistake from my end?
Issue #14 highlights that we need to wrap dateutil.parser.parse:
>>> from dateutil import parser
>>> parser.parse('to blah')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/src/datefinder/venv/lib/python3.4/site-packages/python_dateutil-2.4.2-py3.4.egg/dateutil/parser.py", line 1008, in parse
File "/usr/local/src/datefinder/venv/lib/python3.4/site-packages/python_dateutil-2.4.2-py3.4.egg/dateutil/parser.py", line 395, in parse
ValueError: Unknown string format
Fixed and added a passing test in this draft implementation
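The shape of such a wrapper, as a generic sketch rather than the draft implementation itself: call the parser and turn its failures into None. `safe_parse` is a hypothetical name; it accepts any parse callable, so `dateutil.parser.parse` could be passed in where this example does not import it.

```python
def safe_parse(parse, date_string):
    """Call parse(date_string) and return None instead of raising on bad input."""
    try:
        return parse(date_string)
    except (ValueError, OverflowError):
        # dateutil signals unparseable strings with ValueError and
        # out-of-range values with OverflowError.
        return None
```

A caller could then write `safe_parse(parser.parse, 'to blah')` and skip the candidate when the result is None.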
To consider spaces between the time and the time period(am/pm)
A sentence with multiple dates is not getting detected.
import datefinder
matches = datefinder.find_dates(u'He was in hospital from Aug-2001 to Feb-2002.')
for match in matches:
    print match
However if the sentence has only one date component(same date format), it is detected.
import datefinder
matches = datefinder.find_dates(u'He was in hospital from Aug-2001.')
for match in matches:
    print match
Env:
Windows, 64bit, Python 2.7
I've tried PyCharm's built-in debugger and ipdb.
Neither of them could step into the find_dates function.
A day specified as in 12th day of December, 2001 or in regex geek:
[0-9][0-9]?(st|nd|rd|th) day of
is not parsed.
Thnx for the great work!
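A standard-library sketch of how the "Nth day of Month, Year" form could be recognized, extending the regex fragment above. `ORDINAL_DAY_OF` and `parse_ordinal_day` are hypothetical names; English month names and the default locale for `%B` are assumed.

```python
import re
from datetime import datetime

ORDINAL_DAY_OF = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)\s+day\s+of\s+([A-Za-z]+),?\s+(\d{4})\b",
    re.IGNORECASE,
)


def parse_ordinal_day(text):
    """Parse phrases like '12th day of December, 2001'; return None if absent."""
    match = ORDINAL_DAY_OF.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    try:
        return datetime.strptime("%s %s %s" % (day, month_name, year), "%d %B %Y")
    except ValueError:
        return None  # Unknown month name or impossible day.
```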
I am working on R(Shiny) that is working on a loan prediction model. I am trying to calculate loan amount for a particular method but it is showing me an error for infinite recursion and just like mentioned in most of the posts even I have tried options(expressions = 1000) value, but am still experiencing the error.
If I write my condition as
MDBB_LA<- reactive({ input$MDBB*10 })
then it is giving me no error but if I add another condition on this as
DSCR_Post<- reactive({ if (input$MU == "EMM" & (input$EMIM/12)+EMI()!=0) { EBITDA_EMM()/((input$EMIM/12) + EMI()) } else if (input$MU == "EMM" & (input$EMIM/12)+EMI()==0) { 0 } else if (input$MU != "EMM" & (input$EMIM/12)+EMI()!=0 ){ EBITDA()/((input$EMIM/12) + EMI()) }else{ 0 }})
MDBB_LA<- reactive({ if ((input$MU == "EMM" & DSCR_Post() >= 1) | (input$MU == "FAT1" & DSCR_Post() >= 0.8) | (input$MU == "FAT2" & DSCR_Post() >= 0.7) | (input$MU == "UAT" & DSCR_Post() >= 0.5)) { input$MDBB*10*2 } else if ((input$MU == "EMM" & DSCR_Post() < 1) | (input$MU == "FAT1" & DSCR_Post() < 0.8 ) | (input$MU == "FAT2" & DSCR_Post() < 0.7) | (input$MU == "UAT" & DSCR_Post() < 0.5)){ input$MDBB*10 } else if ((input$MU == "MDBB1" ) | (input$MU == "MDBB2" ) | (input$MU == "MDBB3") | (input$MU == "MDBB4") ){ input$MDBB*10 } else {input$MDBB*10} })
then it is showing me error as :
Warning: Error in : evaluation nested too deeply: infinite recursion / options(expressions=)?
Can anyone help me figure out what is going wrong with the logical statement?
Hi,
Thanks for writing this module, I've been playing around with it and have found that there seems to be an issue finding dates when words like "to", "by" and "until" are in the string. I notice these words are included in EXTRA_TOKENS_PATTERNS in datefinder.py but I'm not really familiar with dateutil module so not sure why this should cause an issue. Below is some output showing some examples where dates aren't identified and how swapping the word "to" for the word "so" means dates are correctly identified:
>>> chk = "i am looking for a date june 4th 1996 to july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[]
>>> chk = "i am looking for a date june 4th 1996 so july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1996, 6, 4, 0, 0), datetime.datetime(2013, 7, 3, 0, 0)]
>>> chk = "october 27 1994 to be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1995, 6, 1, 0, 0)]
>>> chk = "october 27 1994 so be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1994, 10, 27, 0, 0), datetime.datetime(1995, 6, 1, 0, 0)]
Issue #14 has highlighted a discrepancy between extra tokens we use to help us locate dates:
EXTRA_TOKENS_PATTERN = 'due|by|on|standard|daylight|savings|time|date|of|to|until|z|at|t'
and those tokens which are later replaced because dateutil.parser cannot accept them:
REPLACEMENTS = {
    "standard": "",
    "daylight": "",
    "savings": "",
    "time": "",
    "date": "",
    "by": "",
    "due": "",
    "on": "",
    ",": "",
}
Currently 'to' is not in REPLACEMENTS, for example.
My guess is that not all the extra tokens need to be replaced, meaning REPLACEMENTS is a subset of EXTRA_TOKENS. We should at least have a test that shows which extra tokens dateutil.parser.parse handles and which need to be in REPLACEMENTS.
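As a starting point for such a test, the discrepancy can be listed mechanically. This sketch copies the two constants quoted above; `unreplaced_extra_tokens` is a hypothetical helper, and it only reports which tokens lack a replacement, not whether the parser actually accepts them.

```python
EXTRA_TOKENS_PATTERN = 'due|by|on|standard|daylight|savings|time|date|of|to|until|z|at|t'

REPLACEMENTS = {
    "standard": "", "daylight": "", "savings": "", "time": "",
    "date": "", "by": "", "due": "", "on": "", ",": "",
}


def unreplaced_extra_tokens(pattern=EXTRA_TOKENS_PATTERN, replacements=REPLACEMENTS):
    """Return the extra tokens that have no entry in the replacements table."""
    return sorted(set(pattern.split("|")) - set(replacements))
```

Each token in the result is a candidate to check against the parser individually.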
Hi,
I tried using datefinder to find dates that are together with Portuguese texts. I figured that although the package is originally made for English, it can get the dates in certain formats like YYYY-MM-DD but not say Janeiro 22, 2016 since "janeiro" is not in the RE pattern.
For input text like:
one_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 2012-2-21."
it can parse 2012-2-21 to 2012-02-21 00:00:00.
However, if the text is like:
sec_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 6 2 Ricardo Salgado está impedido de sair do país e de contactar com os outros arguidos da operação Marquês."
the package parses "6 2" as the date 2017-06-02 00:00:00.
If I have a string with a two-digit number like:
third_str = "O Benfica está nas meias-finais da Taça de Portugalo Leixões por 22 Ricardo Salgado"
it will ignore 22.
If I have a string with a three-digit number like:
"O Benfica está nas meias-finais da Taça de Portugalo Leixões por 6 2 Ricardo Salgado está impedido de sair do país e de contactar 233 com os outros arguidos da operação Marquês."
it will consider "233" as a year.
I would like to ask if it should behave this way, and/or pointers to extending your package to another language.
Thank you very much.
list(datefinder.find_dates('date: 11-05-16')) gives an empty list,
while list(datefinder.find_dates('date 11-05-16')) (without the colon) gives the correct result [datetime.datetime(2016, 11, 5, 0, 0)]
Hello
I have this very small text and it seems it's very slow to parse it, am I doing something wrong?
text = "hello 2014/02/15 love you"
import datefinder
matches = datefinder.find_dates(text,False,False,True);
print "datefinder = "
for match in matches:
    if match.year>1800 :
        print match,match.year,match.month,match.day
$ time python test.py
datefinder =
2014-02-15 00:00:00 2014 2 15
real 0m1.380s
user 0m0.976s
sys 0m0.080s
Okay, so when I search this string: "[[[uSaturday 04 January 2014 y 27 December 2013 uBooked u23:20 u1"
I correctly get -
2014-01-04 00:00:00
2013-12-27 00:00:00
But when I run it on "[[uMonday 30 December 2013 y 27 December 2013 u23:20 u1"
I get -
2013-12-27 00:00:00
2016-10-07 23:20:00
Why am I getting wrong results despite only a change in date?
If datefinder attempts to parse a string such as "32 2016", it throws an error "TypeError: Required argument 'day' (pos 3) not found".
The combination needs to be a string with a combination of 2 and 4 digits, and the number with 2 digits needs to be greater than 31 or less than 1 (eg 00) to throw the error.
I found myself deep down the dateparser rabbit hole, but thought this scenario may need to be filtered in datefinder prior to passing to dateparser.parse. Eg something like the below added to parse_date_string (this is a bit ugly, but does the job)
date_string_split = date_string.split()
if len(date_string_split) == 2:
    if len(date_string_split[0]) == 2 and len(date_string_split[1]) == 4 and (int(date_string_split[0]) < 1 or int(date_string_split[0]) > 31):
        return None
    if len(date_string_split[1]) == 2 and len(date_string_split[0]) == 4 and (int(date_string_split[1]) < 1 or int(date_string_split[1]) > 31):
        return None
I can submit a pull request, but need to make my code prettier...
Hello,
a string which has digits before the date to be extracted is not correctly handled, and thus does not extract any dates (both in strict and non-strict mode).
Here is an example:
' sf.0008 05/04/17 21:34'
Kind regards
See this input text has no dates in it (not even numbers!):
text = '''Notwithstanding Lender’s acceleration of the sums secured by this Mortgage due to Borrower's default, Borrower shall have the right to have any proceedings begun by Lender to enforce this Mortgage discontinued at any time prior to entry of a judgment enforcing this Mortgage if: (a) Borrower pays Lender all sums which would be then due under this Mortgage and the Credit Agreement had no acceleration occurred; (b) Borrower cures all events of default; (c) Borrower pays all reasonable expenses incurred by Lender in enforcing the covenants and agreements of Borrower contained in this Mortgage, and in enforcing Lender’s remedies as provided in paragraph 22 hereof, including, but not limited to, reasonable attorneys' fees; and (a) Borrower takes such action as Lender may reasonably require to assure that the lien of this Mortgage, Lender's interest in the Property and Borrower’s obligation to pay the sums secured by this Mortgage shall continue unimpaired.'''
Running list(datefinder.find_dates(text)) produces
[(datetime.datetime(2017, 5, 9, 0, 0), 'may')]
Why?
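A possible caller-side filter for this kind of false positive, assuming results come back as `(datetime, source_text)` pairs like the one shown: drop any match whose source text contains no digit. `drop_digitless` is a hypothetical helper, not a datefinder option, and it would also discard legitimate digit-free dates.

```python
def drop_digitless(matches):
    """Keep only (datetime, source_text) pairs whose source text has a digit."""
    return [
        (found, source)
        for found, source in matches
        if any(ch.isdigit() for ch in source)
    ]
```

Applied to the result above, the lone 'may' match is removed, while a match sourced from text such as 'May 9, 2016' would be kept.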