Giter Club home page Giter Club logo

Comments (14)

zed avatar zed commented on August 18, 2024

Whether the test succeeds depends on PYTHONHASHSEED envvar e.g., if it equals 12 then all tests pass:

$ PYTHONHASHSEED=12 python3 run_tests.py
.....................................................
----------------------------------------------------------------------
Ran 53 tests in 0.292s

OK

but with PYTHONHASHSEED=123 testMonths fails. The hash value is also may depend on the python version. The above values are for Python 3.4.

The result (as expected) doesn't depend on the PyICU version installed. I've tried with 1.5 and 1.8.

from parsedatetime.

zed avatar zed commented on August 18, 2024

testMonths continues to fail on the earliest parsedatetime version that I can run 7c5e6a0

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

Can confirm this issue. I'm seeing strings like 1 month ago fail to be parsed at all in the current version from pip.

Edit: Just checked out the HEAD. I'm seeing the exact same behaviour as zed describes.

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

The output of self.cal.parse('1 month', start) is (time.struct_time(tm_year=2014, tm_mon=10, tm_mday=19, tm_hour=6, tm_min=3, tm_sec=0, tm_wday=6, tm_yday=292, tm_isdst=-1), 2), so it looks like the parser is incorrectly classifying the string as a time, rather then a date.

This is supported by comparing the output of self.cal.parse('1 month', start) and start:

Parsed:  
time.struct_time(tm_year=2014, tm_mon=10, tm_mday=19, tm_hour=6, tm_min=4, tm_sec=25, tm_wday=6, tm_yday=292, tm_isdst=-1)
Now:  
time.struct_time(tm_year=2014, tm_mon=10, tm_mday=19, tm_hour=6, tm_min=3, tm_sec=25, tm_wday=6, tm_yday=292, tm_isdst=-1)

So, it appears that the parser is incorrectly parsing the "1" as a minute quantity with a positive offset.

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

Ok, I turned on lots of logging. Interesting results:

herp@mainnas:/media/Storage/Scripts/parsedatetime$ python3 run_tests.py parsedatetime.tests.TestUnits.test.testMonths
daysInMonth(11, 2014)

parse()
parse (top of loop): [1 month][]
parse 1 [][][]
parse 2 [][][]
CRE_UNITS matched
parse 3 [1 m][][onth]
parse 4 [1 m][][onth]
parse 5 [1 m][][onth]
parse 6 [1 m][][onth]
parse 7 [1 m][][onth]
parse 8 [1 m][][onth]
parse 9 [1 m][][onth]
parse A [1 m][][onth]
parse (bottom) [ onth][1 m][][onth]
weekday False, dateStd False, dateStr False, time False, timeStr False, meridian False
dayStr False, modifier False, modifier2 False, units True, qunits False
_evalString(1 m, time.struct_time(tm_year=2014, tm_mon=10, tm_mday=19, tm_hour=6, tm_min=12, tm_sec=49, tm_wday=6, tm_yday=292, tm_isdst=-1))
unitsFlag is set
_buildTime: [1 ][][m]
units m --> realunit minutes
parse (top of loop): [ onth][]
parse 1 [][][]
parse 2 [][][]
parse 3 [][][]
parse 4 [][][]
parse 5 [][][]
parse 6 [][][]
parse 7 [][][]
parse 8 [][][]
parse 9 [][][]
parse A [][][]
parse (bottom) [][][][]
weekday False, dateStd False, dateStr False, time False, timeStr False, meridian False
dayStr False, modifier False, modifier2 False, units False, qunits False
parse() return dateFlag 0 timeFlag 2 totalTime time.struct_time(tm_year=2014, tm_mon=10, tm_mday=19, tm_hour=6, tm_min=13, tm_sec=49, tm_wday=6, tm_yday=292, tm_isdst=-1)
F
======================================================================
FAIL: testMonths (parsedatetime.tests.TestUnits.test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/Storage/Scripts/parsedatetime/parsedatetime/tests/TestUnits.py", line 110, in testMonths
    self.assertTrue(_compareResults(self.cal.parse('1 month',  start), (target, 1)))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 0.144s

So it looks like, /sometimes only/, the "m" is being interpreted as the prefix for "minutes", rather then as part of the value "months".

It always happens in the parse 3 stage.

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

Ok, I've figured out where the issue is.

Basically, it appears that the ordering of the contents of self.locale.re_values['numbers'] is non-determinstic. self.locale.re_values['numbers'] contains both the literal m and months.

When the literal m sorts before months, the test fails, because the regex binds to the m preferentially. When the months sorts first, everything works fine.

I think the solution here would be to add a word boundary match to the self.RE_UNITS regex, so the regex has to match an entire word, rather then just the starting boundary and the first n characters.

In other words:

        self.RE_UNITS     = r'''(?P<qty>(-?(\b(%(numbers)s)\b|\d+)\s*
                                         (?P<units>((\b%(units)s)s?))
                                        ))''' % self.locale.re_values

Becomes

        self.RE_UNITS     = r'''(?P<qty>(-?(\b(%(numbers)s)\b|\d+)\s*
                                         (?P<units>((\b(%(units)s))s?\b))
                                        ))''' % self.locale.re_values

Note that I had to add an additional set of parenthesis, because as it is, (\b%(units)s) is evaluating to (\bsecond|seconds|sec|s|month|months|mth|year|years|yr|y|hour|hours|hr|h|minute|minutes|min|m|week|weeks|wk|w|day|days|dy|d), and as such only the first item is actually even using the \b boundary.

The boundary grouping issue is also occuring in the RE_QUNITS regex, so there are certain conditions where it improperly matches as well:

parse() return dateFlag 1 timeFlag 0 totalTime time.struct_time(tm_year=2014, tm_mon=11, tm_mday=19, tm_hour=6, tm_min=34, tm_sec=0, tm_wday=2, tm_yday=323, tm_isdst=-1)
parse()
parse (top of loop): [1month][]
parse 1 [][][]
parse 2 [][][]
M None
parse 3 [][][]
CRE_QUNITS matched
parse 4 [1m][][onth]
parse 5 [1m][][onth]
parse 6 [1m][][onth]
parse 7 [1m][][onth]
parse 8 [1m][][onth]
parse 9 [1m][][onth]
parse A [1m][][onth]
parse (bottom) [ onth][1m][][onth]
weekday False, dateStd False, dateStr False, time False, timeStr False, meridian False

(with PYTHONHASHSEED=152)

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

Actually, considering some of the comments, I think the core of the issue might be the values of self.locale.re_values['units'] which both seems to include single char date characters, which contradicts some of the comments in the code:

units year|years|yr|y|day|days|dy|d|hour|hours|hr|h|minute|minutes|min|m|second|seconds|sec|s|week|weeks|wk|w|month|months|mth

        # Given string is a time string with units like "5 hrs 30 min"
        if self.unitsFlag:
--- SNIP ---
        # Given string is a time string with single char units like "5 h 30 m"
        if self.qunitsFlag:

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

Ok, so the root of the issue is that single-character duration labels are getting into self.locale.units somehow.

I filtered them out:

(line 2226... in __init__.py)

            l = []
            for s in self.locale.units:
                l = l + self.locale.units[s]
---- to ----
            l = []
            for s in self.locale.units:
                l = l + [x for x in self.locale.units[s] if len(x) > 1]

And it fixes the issue initially described. I still see periodic failures in the 1month match, but I suspect that's a deeper issue with the regex.


Sorry if this thread has gotten a bit long. This is an interesting problem, and it's affecting a project I'm working on, so I've been poking at it.

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

Ok, the fix as I described above fixes the month parsing issue, but breaks a number of the other tests, primarily those using the single-char time notation.

I suspect what it's actually doing is preventing the self.RE_UNITS regex from occationally working in some contexts, which is exposing an existing issue with the self.RE_QUNITS regex that was already there, and just papered over by the accidental functionality of the self.RE_UNITS regex, but the regexes are melting my brain at this point, so I'm going to stop for the moment.

from parsedatetime.

zed avatar zed commented on August 18, 2024

Note: locale.units is a dictionary that is why the order depends on PYTHONHASHSEED. I've created PR #57 that fixes this issue.

from parsedatetime.

bear avatar bear commented on August 18, 2024

Let me know if either of you would like to be added as contributors - your both actively using and fixing bugs.

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

I'd be willing to spend some time trying to make the regex stuff more concise.

Particularly, is the use of named %(name)s string formatting a legacy compatibility thing? Moving to '{val}'.format(val=something) makes things MUCH more readable, and makes scoping much easier to understand.

The .format() method was added in python 2.6. Is < 2.6 still a support target?

from parsedatetime.

bear avatar bear commented on August 18, 2024

< 2.6 is not longer supported in the main branch - 2.7+ and 3.0 are it's goals

from parsedatetime.

fake-name avatar fake-name commented on August 18, 2024

Ok, cool. The .format() method was introduced in 2.6 anyways, so it's a bit of a moot point.

from parsedatetime.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.