date-extractor's Issues

Error in extraction of dates for upcoming years

When I try to extract dates that fall in upcoming years, I get unexpected results. Here are a few failing cases.

>>> from date_extractor import extract_dates

>>> extract_dates('can you find correct date here 2033')
[datetime.datetime(1920, 3, 3, 0, 0)]

>>> extract_dates('can you find correct date here june 2033')
[]

>>> extract_dates('can you find correct date here 2 june 2033')
[datetime.datetime(1920, 6, 2, 0, 0)]

>>> extract_dates('can you find correct date here 12 january 2018')
[datetime.datetime(1920, 1, 12, 0, 0)]

>>> extract_dates('can you find correct date here 1 january 2018')
[datetime.datetime(1920, 1, 1, 0, 0)]
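Every failing case above comes back with the year 1920, which suggests that only the leading "20" of the four-digit year is being used. The sketch below shows one way a year normalizer could keep four-digit years intact and expand only two-digit years. The function body and the `pivot` parameter are assumptions for illustration, not the library's actual code.

```python
from datetime import datetime


def normalize_year(year, pivot=50):
    """Hypothetical helper: return a four-digit year as an int.

    Input with four or more digits is returned unchanged. Two-digit input
    is expanded around an assumed pivot: 00-49 -> 2000s, 50-99 -> 1900s.
    """
    text = str(year).strip()
    value = int(text)
    if len(text) >= 4:
        return value
    return 2000 + value if value < pivot else 1900 + value


# A future year survives normalization instead of collapsing to 1920.
print(datetime(normalize_year("2033"), 6, 2))
```

With a helper like this, "2 june 2033" would keep its year, provided the matching step captures all four digits in the first place.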

Inconsistency in the following cases

I am seeing inconsistent behavior in the following cases.

In the first case I get ValueError: day is out of range for month:

>>> extract_dates('can you find correct date here 31 april 2017')
Traceback (most recent call last):
  File "", line 1, in
  File "/home/pranavwaila/anaconda2/lib/python2.7/site-packages/date_extractor/__init__.py", line 192, in extract_dates
    completes = [datetime(normalize_year(d['year']),int(d['month']),int(d['day'])) for d in completes]
ValueError: day is out of range for month

whereas when I pass an out-of-range day for December in the same way, it is handled:

>>> extract_dates('can you find correct date here 32 december 2017')
[]
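One way to make both cases behave the same is to treat a `ValueError` from the `datetime` constructor as "not a date" and drop the candidate. This is a minimal sketch; `build_date` and the sample `candidates` list are hypothetical names, and the dictionary keys simply mirror those visible in the traceback.

```python
from datetime import datetime


def build_date(year, month, day):
    """Return a datetime, or None if the calendar date does not exist."""
    try:
        return datetime(int(year), int(month), int(day))
    except ValueError:
        return None


# Hypothetical parsed candidates, shaped like the dicts in the traceback.
candidates = [
    {"year": 2017, "month": 4, "day": 31},   # April has 30 days
    {"year": 2017, "month": 12, "day": 32},  # no month has 32 days
    {"year": 2017, "month": 4, "day": 30},   # valid
]

dates = [build_date(c["year"], c["month"], c["day"]) for c in candidates]
completes = [d for d in dates if d is not None]
print(completes)
```

Filtering this way returns an empty result for "31 april 2017", which matches what already happens for "32 december 2017".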

2015 is interpreted incorrectly

Currently, 2015 is interpreted as year: 20, month: 1, day: 5, but it should be interpreted as the year 2015.

Changing

p["date"] = (
"(?P"
+ "|".join(
[p["iso"] , p["mdy"], p["dmy"], p["ymd"], p["my"] , p["y"]]
)
+ ")"
)

to

p["date"] = (
"(?P"
+ "|".join(
[p["iso"] , p["y"], p["mdy"], p["dmy"], p["ymd"], p["my"] ]
)
+ ")"
)

i.e. putting p["y"] near the start, right after p["iso"], solves this. Please share your thoughts.
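The reordering matters because Python's `re` module tries alternatives from left to right and takes the first one that matches at a given position, not the longest. The sketch below shows that behavior with two simplified stand-in patterns. They are not the library's real `p[...]` entries, and the group name `date` is an assumption.

```python
import re

# Simplified stand-ins for two competing sub-patterns.
two_digit = r"\d{2}"
four_digit = r"\d{4}"

short_first = re.compile("(?P<date>" + "|".join([two_digit, four_digit]) + ")")
long_first = re.compile("(?P<date>" + "|".join([four_digit, two_digit]) + ")")

# With the shorter alternative listed first, only part of the year is consumed.
print(short_first.search("in 2015").group("date"))
# With the four-digit alternative listed first, the whole year is matched.
print(long_first.search("in 2015").group("date"))
```

Moving a pattern forward can change the results for other inputs that the later alternatives used to win, so a reordering like the one proposed is worth checking against the existing test cases.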

Extract Only Year from text

Thanks for this great project.
Currently I am able to extract dates, but when the text contains only a year, for example "In year 2011 the incident happened.", the program returns "2011-01-01 00:00:00+00".

But we need it to return "2011-01-01 12:14:12+00".
Could you please let me know what I should change in the library to achieve this?

The basic aim is to differentiate an original "1st Jan 2011" from a bare "2011".

Thanks
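An alternative to encoding the distinction in a sentinel time is to return the precision alongside the value. The following is a rough sketch of that idea using its own small regular expressions. `extract_with_precision`, `MONTHS`, and both patterns are hypothetical and are not part of date-extractor.

```python
import re
from datetime import datetime, timezone

MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# "1st Jan 2011", "12 January 2018", ...
DAY_MONTH_YEAR = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{4})\b"
)
# A bare four-digit year in the 1900s or 2000s.
YEAR_ONLY = re.compile(r"\b((?:19|20)\d{2})\b")


def extract_with_precision(text):
    """Return (datetime, precision) where precision is "day" or "year"."""
    m = DAY_MONTH_YEAR.search(text)
    if m and m.group(2).lower() in MONTHS:
        month = MONTHS[m.group(2).lower()]
        value = datetime(int(m.group(3)), month, int(m.group(1)), tzinfo=timezone.utc)
        return value, "day"
    m = YEAR_ONLY.search(text)
    if m:
        return datetime(int(m.group(1)), 1, 1, tzinfo=timezone.utc), "year"
    return None


print(extract_with_precision("In year 2011 the incident happened."))
print(extract_with_precision("It happened on 1st Jan 2011."))
```

Both inputs produce the same `datetime`, and the second tuple element tells the caller whether the day and month were actually present in the text.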

detect_format, train on column before extraction

Assuming every value in a CSV column is formatted the same way, I should be able to train on a column of dates before extracting dates from it.

Two new methods:

from date_extractor import detect_format

data = [None, "", "10/31/23", "1/2/23"]

detect_format(data)
"%m/%d/%Y"


from date_extractor import prepare

extract_date = prepare(data)
for date in data:
    extract_date(date)
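A minimal sketch of how the two proposed functions could work is to try a fixed list of `strptime` formats and keep the first one that parses every non-empty value. The candidate list and both function bodies are assumptions, not an existing date-extractor API. Note that this sketch reports `%m/%d/%y` for the sample data, because `strptime` reserves `%Y` for four-digit years.

```python
from datetime import datetime

# Assumed candidate formats, tried in order.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%m/%d/%y", "%d/%m/%Y", "%d/%m/%y", "%Y-%m-%d"]


def detect_format(values, formats=CANDIDATE_FORMATS):
    """Return the first format that parses every non-empty value, else None."""
    samples = [v for v in values if v]
    if not samples:
        return None
    for fmt in formats:
        try:
            for sample in samples:
                datetime.strptime(sample, fmt)
        except ValueError:
            continue  # at least one value did not fit this format
        return fmt
    return None


def prepare(values):
    """Detect the column format once and return a parser bound to it."""
    fmt = detect_format(values)

    def extract_date(value):
        if not value or fmt is None:
            return None
        return datetime.strptime(value, fmt)

    return extract_date


data = [None, "", "10/31/23", "1/2/23"]
print(detect_format(data))
extract_date = prepare(data)
print([extract_date(value) for value in data])
```

Because "10/31/23" cannot be day-first, the column as a whole settles the ambiguity that "1/2/23" would have on its own.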

Return matched text

Hi,

I think it would be great if the extract_dates function could also return the original matched text, i.e.:
extract_dates('This happened 2020-01-01')
would return the matches together with the original date text (2020-01-01).
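To illustrate the requested behavior, the sketch below uses `re.finditer`, whose match objects expose both the matched substring and its position. It handles only ISO-style dates, and `extract_dates_with_text` is a hypothetical name rather than a function the library provides.

```python
import re
from datetime import datetime

# ISO-style YYYY-MM-DD that is not embedded in a longer run of digits.
ISO_DATE = re.compile(r"(?<!\d)(\d{4})-(\d{2})-(\d{2})(?!\d)")


def extract_dates_with_text(text):
    """Return a list of (datetime, matched_text, (start, end)) tuples."""
    results = []
    for m in ISO_DATE.finditer(text):
        try:
            value = datetime(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:
            continue  # looks like a date but is not a real calendar day
        results.append((value, m.group(0), m.span()))
    return results


print(extract_dates_with_text("This happened 2020-01-01"))
```

Returning the span as well as the text lets callers highlight or remove the date in the source string.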

New date formatting

I was processing a bunch of text blobs and the date/time is written like this:
23:49:58 on 11/9/2020.
Would it be hard to add support in date-extractor for a time that comes before the date?
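Until the library supports this layout, text in this exact shape can be handled with a small pre-processing step. The sketch below assumes the date part is month-first (11/9/2020 read as 9 November) and that the literal word "on" separates the two parts. `parse_time_on_date` is a hypothetical helper.

```python
import re
from datetime import datetime

# HH:MM:SS, the word "on", then M/D/YYYY (month-first is an assumption).
TIME_ON_DATE = re.compile(r"(\d{1,2}:\d{2}:\d{2}) on (\d{1,2}/\d{1,2}/\d{4})")


def parse_time_on_date(text):
    """Return a datetime for the first 'TIME on DATE' occurrence, or None."""
    m = TIME_ON_DATE.search(text)
    if m is None:
        return None
    try:
        return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%H:%M:%S %m/%d/%Y")
    except ValueError:
        return None  # matched the shape but not a valid time or date


print(parse_time_on_date("23:49:58 on 11/9/2020."))
```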

UnicodeDecodeError on Import

Getting this error during import, running Python 3.6.3:

  File "myproject\myfile.py", line 6, in <module>
    from date_extractor import extract_dates
  File "myproject\venv\lib\site-packages\date_extractor\__init__.py", line 6, in <module>
    from . import enumerations
  File "myproject\venv\lib\site-packages\date_extractor\enumerations.py", line 83, in <module>
    lines = f.read().split("\n")
  File "myproject\venv\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 129: character maps to <undefined>

Works fine on another machine running 3.5.?. Any clues?
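The traceback shows the file being decoded with cp1252, which is the locale default that `open()` falls back to on many Windows machines when no `encoding` argument is given. If the data file is UTF-8 (an assumption, since the file itself is not shown here), passing the encoding explicitly avoids the dependency on the machine's locale. The sketch reproduces the failure with a temporary file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # U+3041 encodes to the UTF-8 bytes E3 81 81; byte 0x81 is undefined in cp1252.
    with open(path, "wb") as f:
        f.write("\u3041\nb".encode("utf-8"))

    # Explicit encoding: behaves the same on every platform.
    with open(path, encoding="utf-8") as f:
        lines = f.read().split("\n")

    # Forcing cp1252 reproduces the error from the traceback.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

print(lines, cp1252_failed)
```

This would also explain why the import works on another machine: a system whose default encoding is UTF-8 never goes through the cp1252 codec.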

Issue with numbers

Hi,
Thanks a lot for your great work.
I have an issue with numbers: most of the numbers in the text are converted to dates.
For example:
text2="The meeting will be held at paris Allé 6, 0208 paris. Election 30 of a chairperson in france. page 18 of 20"

Then we get:
[datetime.datetime(2008, 2, 6, 0, 0, tzinfo=), datetime.datetime(1930, 1, 1, 0, 0, tzinfo=), datetime.datetime(2018, 1, 1, 0, 0, tzinfo=), datetime.datetime(1920, 1, 1, 0, 0, tzinfo=)]
As you can see, none of the numbers here should be extracted as dates.
Is there any solution?

Thanks

day and month coming out swapped

For the string "5/1/2016 ", the result is "date": "2016-05-01", so the day and month are swapped relative to what I expect. Is there a parameter that lets me control this manually? If not, could you please provide a fix?
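A string such as "5/1/2016" is ambiguous on its own, so the caller has to state which convention applies. The sketch below shows the two readings with the standard library's `strptime`. The `day_first` flag and the `parse_slash_date` helper are hypothetical and are not an existing date-extractor option.

```python
from datetime import datetime


def parse_slash_date(text, day_first=False):
    """Parse D/M/YYYY when day_first is True, otherwise M/D/YYYY."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text.strip(), fmt)


print(parse_slash_date("5/1/2016 "))                  # month-first reading
print(parse_slash_date("5/1/2016 ", day_first=True))  # day-first reading
```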

Date sometimes not recognized in v3.9.1

>>> extract_date("some_text_20140205") or print("Uhoh...")
datetime.datetime(2014, 2, 5, 0, 0, tzinfo=<UTC>)

>>> extract_date("some_text20140205") or print("Uhoh...")
Uhoh...
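The cause inside the library is not shown here, but the difference between the two inputs is the character immediately before the digits. As a workaround sketch, a pattern that constrains only neighboring digits (via lookarounds) finds a compact YYYYMMDD date whether or not it is glued to letters. `COMPACT_DATE` and `find_compact_date` are hypothetical names.

```python
import re
from datetime import datetime

# Exactly eight digits, not preceded or followed by another digit.
COMPACT_DATE = re.compile(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)")


def find_compact_date(text):
    """Return the first valid YYYYMMDD date found in text, or None."""
    for m in COMPACT_DATE.finditer(text):
        try:
            return datetime(*map(int, m.groups()))
        except ValueError:
            continue  # eight digits that do not form a real date
    return None


print(find_compact_date("some_text_20140205"))
print(find_compact_date("some_text20140205"))
```

The digit lookarounds also keep the pattern from picking an eight-digit window out of a longer number such as a ten-digit identifier.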
