r1chardj0n3s / parse

Parse strings using a specification based on the Python format() syntax.

Home Page: http://pypi.python.org/pypi/parse

License: MIT License

Python 100.00%

parse's People

kennethreitz-archive jnrowe-retired-forks jenisys maisano wojons jkmacc pombredanne vallsv lkilcher lucien2k moonbot scooterman sparkslabs kivio benthomasson titussanchez moreati amigadave raylore2000 mjmvisser nivir lguyogiro es-so timofurrer ricyteach morabaraba richard-reece jfrfonseca mpagel techalchemy adam-meya grishaspektor nokusukun bermanmaxim wdv4758h danshorstein bellyfat briancknight tuna25 tuksik mjfitzge kyluca martian111 dialneus akashdesarda stjordanis jab reynoldsnlp adamchainz gridl bionictk abhijeetmanhas rrosajp wrmsr xrosliang jarvan40 wasdee desabel suyujun91 maxxk a29107a qyttools eric-seekas jonike jonathangjertsen orions-stardom traceofpoem tomviner ii0 erichear tomerha menesis silygose tommyj83 ihfazhillah technodiver luffbee kaanizgi jamie-chang wandrys-dev python-repository-hub sdementen regmetrics tobynance mkokryashkin catalystneuro wimglenn arpitjain799 presteddy56 bendichter hussein-l-almadhachi krrt7 yetingli hellobrobro bbertincourt 150520 fxwhu yeatry yanorepuser4 blablatdinov

parse's Issues

Can't install v1.8.1/v1.8.2 with python3

Can't install parse into a virtualenv on Debian testing.
1.8.0 installs fine; 1.8.1 and 1.8.2 do not install.

/tmp$ virtualenv -p python3 virtualenv
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /tmp/virtualenv/bin/python3
Also creating executable in /tmp/virtualenv/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.

/tmp$  . virtualenv/bin/activate

(virtualenv) /tmp$ pip install parse
Collecting parse
  Using cached parse-1.8.2.tar.gz
Building wheels for collected packages: parse
  Running setup.py bdist_wheel for parse ... error
  Complete output from command /tmp/virtualenv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-kz7sa7u4/parse/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpdxlborhzpip-wheel- --python-tag cp35:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib
  copying parse.py -> build/lib
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64
  creating build/bdist.linux-x86_64/wheel
  copying build/lib/parse.py -> build/bdist.linux-x86_64/wheel
  running install_egg_info
  running egg_info
  writing parse.egg-info/PKG-INFO
  writing top-level names to parse.egg-info/top_level.txt
  writing dependency_links to parse.egg-info/dependency_links.txt
  reading manifest file 'parse.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'parse.egg-info/SOURCES.txt'
  Copying parse.egg-info to build/bdist.linux-x86_64/wheel/parse-1.8.2-py3.5.egg-info
  running install_scripts
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-build-kz7sa7u4/parse/setup.py", line 35, in <module>
      'License :: OSI Approved :: BSD License',
    File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
      self.run_command(cmd)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/tmp/virtualenv/lib/python3.5/site-packages/wheel/bdist_wheel.py", line 215, in run
      self.run_command('install')
    File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/install.py", line 61, in run
      return orig.install.run(self)
    File "/usr/lib/python3.5/distutils/command/install.py", line 595, in run
      self.run_command(cmd_name)
    File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/install_scripts.py", line 17, in run
      import setuptools.command.easy_install as ei
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 49, in <module>
      from setuptools.py27compat import rmtree_safe
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/py27compat.py", line 7, in <module>
      import six
  ImportError: No module named 'six'

Issue with quoting and question marks: parse('"{}"?', '"teststr"?')

>>> '"{}"?'.format("teststr")
'"teststr"?'
>>> parse('"{}"?', '"teststr"?')
<Result ('teststr"?',) {}>

I would like to match only "teststr". Am I doing something wrong?
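A likely cause is that the trailing `?` in the format's literal text reaches the underlying regular expression unescaped, where it makes the preceding `"` optional. The sketch below reproduces that effect with plain `re`; it illustrates the mechanism under this assumption and is not parse's own translation code.

```python
import re

text = '"teststr"?'

# Literal text copied into the regex as-is: '"?' means "an optional quote".
naive = re.compile('"' + '(.+?)' + '"?' + r'\Z')

# Literal text passed through re.escape first: '"?' now means a quote followed by '?'.
escaped = re.compile(re.escape('"') + '(.+?)' + re.escape('"?') + r'\Z')

print(naive.match(text).group(1))    # teststr"?
print(escaped.match(text).group(1))  # teststr
```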

Difference in `parse` and `search`

I know that parse.search should be used to match a pattern at any position in the string whereas parse.parse has to match the string exactly.

The following issue came up some days ago in the radish project: radish-bdd/radish#106
Especially this comment might be interesting: radish-bdd/radish#106 (comment)

However, this gives an interesting outcome:

>>> patt = parse.compile('I have a {}')
>>> patt.search('I have a apple')
<Result ('a',) {}>
>>> patt.parse('I have a apple')
<Result ('apple',) {}>

and

>>> patt = parse.compile('I {} a {}')
>>> patt.parse('I have a apple')
<Result ('have', 'apple') {}>
>>> patt.search('I have a apple')
<Result ('have', 'a') {}>

As you can see, search and parse give different results. In this example it would indeed be possible to just use parse, but in many of the cases we use this library for it is not.

Is this intended behavior?

Adding @rscrimojr
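The difference is consistent with an unnamed field being translated to a lazy pattern such as `(.+?)`: `parse()` has to consume the whole string, which forces the lazy group to keep growing, while `search()` stops at the first (shortest) success. A plain-`re` sketch of that assumption; `search_like` and `parse_like` are illustrative names only:

```python
import re

text = 'I have a apple'

for body in ('I have a (.+?)', 'I (.+?) a (.+?)'):
    search_like = re.compile(body)          # may stop anywhere in the string
    parse_like = re.compile(body + r'\Z')   # must reach the end of the string
    print(search_like.search(text).groups(), parse_like.match(text).groups())

# ('a',) ('apple',)
# ('have', 'a') ('have', 'apple')
```

If this is what happens, `search()` is not "`parse()` at any position" but "the shortest match at the first position", so the two can legitimately disagree whenever the last field is unconstrained.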

Can I parse a JSON string with any dotted names?

I want to parse a JSON string,
but I get an error and cannot parse it.

from parse import *

pattern = '{"name": {data.name}, "age": {data.age}}'
print(parse(pattern, '{"name": "test", "age": 25}'))

result

ValueError: format spec 'name":' not recognised

Is it impossible to parse a JSON string with dotted names?
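The error comes from the JSON braces being read as replacement fields. In `str.format()` syntax a literal brace is written by doubling it (`{{` and `}}`), and since parse follows that syntax the same escaping is worth trying there. The stdlib sketch below shows the idea end to end: `template_to_regex` is a hypothetical helper that splits the template with `string.Formatter().parse`, escapes the literal text and turns each field into a lazy named group. Plain field names are used because regex group names cannot contain dots.

```python
import re
import string

def template_to_regex(template):
    """Hypothetical helper: build an anchored regex from a str.format()-style
    template. Doubled braces come back from Formatter.parse as literal text;
    format specs are ignored in this sketch."""
    parts = []
    for literal, field, _spec, _conversion in string.Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append("(?P<%s>.+?)" % field)
    return re.compile("".join(parts) + r"\Z")

template = '{{"name": {name}, "age": {age}}}'
text = '{"name": "test", "age": 25}'

print(template.format(name='"test"', age=25) == text)  # True: the template round-trips
m = template_to_regex(template).match(text)
print(m.groupdict())  # {'name': '"test"', 'age': '25'}
```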

Add search() and findall()

More than just match! findall() should be an iterator.

Support text alignment

format() supports text alignment, but the parser does not. It'd be nice if it did :-)

Example:

from parse import parse
fmt = "{:>6}{:>7}"
result = parse(fmt, fmt.format("three", "four"))
print(result is not None and result.fixed == ("three", "four"))  # this should be True
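Because the formatted fields run together, the fill characters alone do not say where one field ends and the next begins. A stdlib sketch, assuming the widths in the format spec are treated as exact field sizes (an assumption; the request does not say how widths should be interpreted):

```python
import re

fmt = "{:>6}{:>7}"
text = fmt.format("three", "four")   # ' three   four'

# One fixed-width group per field, then strip the right-alignment padding.
rx = re.compile(r"(.{6})(.{7})\Z")
values = tuple(group.lstrip() for group in rx.match(text).groups())
print(values)  # ('three', 'four')
```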

Parse chokes on parse("{n} {n}", "x x")

The parser is apparently in an infinite loop.

I think it should at least say that this format is not parsable.
Ideally, it should do:

parse("{n} {n}", "x x") -> {"n": "x"}
parse("{n} {n}", "x y") -> None
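In regular-expression terms, "the same name must match the same text" is a backreference. A minimal `re` sketch of the requested behaviour, assuming the field translates to a lazy group:

```python
import re

# (?P=n) must repeat exactly what the group named n captured.
rx = re.compile(r"(?P<n>.+?) (?P=n)\Z")

print(rx.match("x x").groupdict())  # {'n': 'x'}
print(rx.match("x y"))              # None
```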

Parsing a continuous string

I was trying to parse a continuous string of letters and numbers in the simplest way:

parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA5.2M')
<Result ('G', 3.8, 'XA', 5.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA4.2M')
<Result ('G', 3.8, 'XA', 4.2, 'M') {}>

So far so good, but

parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA04.2M')
<Result ('G', 3.8, 'XA0', 4.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA44.2M')
<Result ('G', 3.8, 'XA4', 4.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA40M')
<Result ('G', 3.8, 'XA4', 0.0, 'M') {}>

changing the input string a little gives bad results.

Am I missing something?

Igor
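According to the parse documentation, `w` accepts letters, numbers and underscore, so a `w` field can swallow the leading digits of the following number (`XA0`, `XA4`). If the text parts are letters only, a stricter character class avoids that. A stdlib sketch (the regex is an assumption about this data, not a parse type):

```python
import re

# Letters-only name followed by a decimal number; '\w' would also accept digits.
token = re.compile(r"([A-Za-z]+)(\d+(?:\.\d+)?)")

pairs = token.findall("G3.80XA04.2M")
print(pairs)                                        # [('G', '3.80'), ('XA', '04.2')]
print([(name, float(num)) for name, num in pairs])  # [('G', 3.8), ('XA', 4.2)]
```

The trailing `M` has no number after it, so it is not captured here; it would need its own group.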

Problem with findall in parse module

Reference link : https://pypi.python.org/pypi/parse

import parse

data = """ 0 2 PRLI Def 666E00 B A 0 0
1 2 PRLI Def 017000 B A 0 0"""

for x in parse.findall("{Id:^d} {Index:^d} {State:^} {Emulation:^} {ID:^} {NN:^} {PN:^} {ABTS:^d} {SRR:^d}\n", data):
    print("====", x.named)

The code above hangs after printing the first line.
If we reduce the number of columns, it works fine.
Also, if the number of columns in the data differs from the pattern, findall hangs.
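If the columns are plain whitespace-separated fields, splitting each line sidesteps the regex entirely. This is a swapped-in alternative, not parse's `findall`; the column names are taken from the pattern above. Note also that the pattern ends in `\n` while the last line of `data` has no trailing newline, so that pattern cannot match the final row as written.

```python
data = """ 0 2 PRLI Def 666E00 B A 0 0
1 2 PRLI Def 017000 B A 0 0"""

columns = ["Id", "Index", "State", "Emulation", "ID", "NN", "PN", "ABTS", "SRR"]
int_columns = {"Id", "Index", "ABTS", "SRR"}   # the fields typed :d in the pattern

rows = []
for line in data.splitlines():
    fields = line.split()
    if len(fields) != len(columns):   # skip malformed rows instead of hanging
        continue
    row = dict(zip(columns, fields))
    for name in int_columns:
        row[name] = int(row[name])
    rows.append(row)

for row in rows:
    print("====", row)
```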

Unicode support?

Ctrl+F "unicode" on README.rst here and on https://pypi.python.org/pypi/parse doesn't find anything. parse appears to hang indefinitely on unicode strings (as with e.g. from __future__ import unicode_literals). Is that the case? Is there any expectation it'll work in the future?

Several DeprecationWarning: invalid escape sequence

As noted in PR #71
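These warnings come from backslash sequences such as `\s` inside ordinary (non-raw) string literals, which Python does not recognise as string escapes. Writing regex literals as raw strings (`r'...'`) is the usual fix. The sketch compiles a snippet each way and records the warnings raised; the category is `DeprecationWarning` before Python 3.12 and `SyntaxWarning` from 3.12 on.

```python
import warnings

def escape_warnings(source):
    """Compile a snippet and return the categories of any warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return [w.category for w in caught]

plain = "'[-/\\s]'"    # source text '[-/\s]': \s is not a valid string escape
raw = "r'[-/\\s]'"     # source text r'[-/\s]': raw string, no escape processing

print(escape_warnings(plain))  # one DeprecationWarning (SyntaxWarning on 3.12+)
print(escape_warnings(raw))    # []

# Both spellings denote the same regex text.
assert r'[-/\s]' == '[-/\\s]'
```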

Please add a license

I want to use this project but can't, because there is no license. No license on a project means all rights are reserved by the author of the code, which prevents any use of the code by other people. Please consider adding a license. You can use https://choosealicense.com/ or https://tldrlegal.com/ to determine which license is right for you.

Thanks!

Request: Add a copy of the license in a separate file

Hey, we are vendoring this library over in https://github.com/pypa/pipenv and we are automating our vendoring process. As part of the broader distribution process we are trying to handle our current licensing issues by including explicit license files for each of our vendored dependencies. Would you be receptive to adding an additional LICENSE file with the text of the license of your software (MIT license I believe?)

If so I don't mind tossing a PR in this direction

Add an optional cardinality field for parsing at end of the parse schema

parse currently uses the same field schema as the str.format() function.
But parsing problems are often slightly different (and more complicated) compared to output formatting problems. I stumbled over a use case where it would be rather nice to have an optional cardinality field after the type field.

EXAMPLE:

schema = "I met {person:Person?} ..."  #< OPTIONAL DATA: Zero or one cardinality
schema = "I am meeting with {persons:Person+}"  #< MANY: One or more cardinality
schema = "I am meeting with {persons:Person*}"  #< MANY: Zero or more cardinality

The "many solution" is basically a comma-separated list of items for this datatype, like:

"I am meeting with Alice, Bob, Charly"

I have a working solution ready (if my pull request for the pattern attribute is accepted) that solves the underlying cardinality problem: it generates the regular expression for the cardinality from the regular expression of a user-defined (or built-in) data type.
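The approach described above, deriving the cardinality regex from the item type's regex, can be sketched with plain `re`. The helper names `one_or_more`, `zero_or_one` and `zero_or_more`, the comma separator and the `person` pattern are all hypothetical:

```python
import re

SEPARATOR = r"\s*,\s*"   # assumption: items are comma-separated

def one_or_more(item):
    """Regex for 'item', 'item, item', ... built from the item's own regex."""
    return r"(?:{0})(?:{1}(?:{0}))*".format(item, SEPARATOR)

def zero_or_one(item):
    return r"(?:{0})?".format(item)

def zero_or_more(item):
    return zero_or_one(one_or_more(item))

person = r"[A-Z][a-z]+"   # hypothetical pattern for the Person type
rx = re.compile(r"I am meeting with (?P<persons>" + one_or_more(person) + r")\Z")

m = rx.match("I am meeting with Alice, Bob, Charly")
print(re.split(SEPARATOR, m.group("persons")))  # ['Alice', 'Bob', 'Charly']
```

Wrapping the item pattern in non-capturing groups keeps the group numbering of the surrounding expression unchanged, whatever the item type is.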

pip installation

Hello,
Would it be possible to have the package available via pip ?
I used pip install git+https://github.com/r1chardj0n3s/parse that works well but something like pip install parse would be nice !
Thanks

Greedy matching causes subsequent specifiers to be included in first match


In [22]: p = compile('I make a POST request to "{url_path_segment}"')

In [23]: p.parse('I make a POST request to "{url_path}" with file "{filename}" as "{key}"')
Out[23]: <Result () {'url_path_segment': '{url_path}" with file "{filename}" as "{key}'}>

expected

None
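The field is matched lazily, but because `parse()` has to consume the whole string, the lazy group keeps growing across the later quotes. Restricting what the field may contain changes that. The sketch shows the idea with plain `re`; in parse itself the equivalent would presumably be a custom type whose pattern excludes `"` (for example via `with_pattern`), which is an untested assumption here.

```python
import re

text = 'I make a POST request to "{url_path}" with file "{filename}" as "{key}"'

# Lazy "anything" field: it grows past the closing quote to reach the end.
loose = re.compile(r'I make a POST request to "(?P<url_path_segment>.+?)"\Z')
# Field that may not contain a double quote.
strict = re.compile(r'I make a POST request to "(?P<url_path_segment>[^"]+)"\Z')

print(loose.match(text).group("url_path_segment"))
# {url_path}" with file "{filename}" as "{key}
print(strict.match(text))  # None
```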

LICENSE file in pypi tarball

I know it may sound tiresome after two issues already filed about the license, but could you please also consider adding the LICENSE file to the PyPI tarball?

Way to fall back on default parsing behavior for an overridden type spec

Edit

After looking a bit closer at the Custom Type Conversions section of the pypi page, I think I can probably get things working the way I need using these. The page contains this statement:

Your custom type conversions may override the builtin types if you supply one with the same identifier.

Is there an exception I can raise, or a method I can call, in the custom type conversion so as to make it fall back on the default behavior for the type conversion (either the one supplied, or a modified version of the one that was supplied)?

Original Question

Is there a way to force the API to fall back on to the default behavior for formatting types when they have been overridden by extra_types? Below is an example of what I mean.

Desired float parsing behavior:

>>> parse('{: >f}{: >f}', '   1.025      1.033')
<Result (1.025, 1.033) {}> # expected result

Actual behavior (when overridden):

>>> parse('{: >f}{: >f}', '   1.025      1.033', extra_types=dict(f=float))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 1117, in parse
    return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 699, in parse
    return self.evaluate_result(m)
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 766, in evaluate_result
    fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 882, in f
    return type_converter(string)
ValueError: could not convert string to float: '.025      1.033'

Use Case

I have a need for a couple of custom numerical types that can handle empty strings and spaces in a manner similar to zeroes. I've implemented them something like this:

class Blank():
    def __new__(cls, value):
        try:
            return super().__new__(cls, value)
        except ValueError:
            if value == '' or value == ' ':
                return super().__new__(cls, 0.0)
            else:
                raise
    def __str__(self):
        return '' if self==0 else super().__str__()
    def __format__(self, spec):
        if (spec.endswith('d') or spec.endswith('f') or spec.endswith('n')) and self==0:
            spec = spec[:-1]+'s'
            return format('',spec)
        else:
            return super().__format__(spec)

class BlankInt(Blank, int):
    '''An int that prints blank when zero.'''
    pass
        
class BlankFloat(Blank, float):
    '''A float that prints blank when zero.'''
    pass

This seems to partially work the way I had in mind:

>>> from parse import parse 
>>> parse('{: >5f}',  '     ', extra_types=dict(f=BlankFloat))
<Result (0.0,) {}>
>>> parse('{: >5f}'*5,  '     ', extra_types=dict(f=BlankFloat))
<Result (0.0, 0.0, 0.0, 0.0, 0.0) {}>

However, this doesn't work (since plain float doesn't work either, and BlankFloat is a subclass of float):

>>> parse('{: >f}{: >f}', '   1.025      1.033', extra_types=dict(f=BlankFloat))
ValueError: could not convert string to float: '.025      1.033'
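The traceback suggests that, once `f` is overridden, the field falls back to a generic lazy pattern, so the first field stops after `1` and the second receives the rest. A custom converter can carry a `pattern` attribute that replaces the default field regex, so giving the override a number pattern is one option. The effect of such a narrower pattern, shown with plain `re`; the number regex is an assumption:

```python
import re

text = "   1.025      1.033"

# Generic lazy field: the first value stops after '1', the second takes the rest.
generic = re.compile(r" *(.+?) *(.+?)\Z")
print(generic.match(text).groups())  # ('1', '.025      1.033')

# Field restricted to a decimal number (assumed pattern for the override).
NUMBER = r"[-+]?\d*\.?\d+"
typed = re.compile(r" *(%s) *(%s)\Z" % (NUMBER, NUMBER))
print([float(g) for g in typed.match(text).groups()])  # [1.025, 1.033]
```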

Type converter signature should be changed (or extended)

The type converter function signature should be changed to:

def type_converter(text, match=None, match_start=0):
    # -- NEW: match_start : int = 0, 
    # refers to the first group in the match object for a field where the converter is used.
    pass

This change would allow a generic Parser to be provided without type knowledge.
In addition, user-defined types should also support this signature. Currently, only the first parameter is supported there, which rules out more complex type converters.

Due to backward compatibility reasons, the old user-defined signature should be supported, too, at least for some time.
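One way to keep old one-argument converters working alongside the proposed signature is to inspect each converter before calling it. A hypothetical sketch (`call_converter` and `hex_pair` are illustrative names, not parse API):

```python
import inspect
import re

def call_converter(converter, text, match=None, match_start=0):
    """Call with (text, match, match_start) if the converter accepts three
    positional arguments, otherwise fall back to the old converter(text)."""
    try:
        params = list(inspect.signature(converter).parameters.values())
    except (TypeError, ValueError):   # some builtins expose no signature
        return converter(text)
    positional = [p for p in params
                  if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)]
    if len(positional) >= 3 or any(p.kind == p.VAR_POSITIONAL for p in params):
        return converter(text, match, match_start)
    return converter(text)

def hex_pair(text, match=None, match_start=0):
    """New-style converter: reads its two sub-groups relative to match_start."""
    return (int(match.group(match_start + 1), 16),
            int(match.group(match_start + 2), 16))

m = re.match(r"((\w\w)(\w\w))", "ff10")
print(call_converter(hex_pair, m.group(1), m, 1))  # (255, 16)
print(call_converter(int, "42"))                   # 42
```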

Problem with integer number parsing

Hi there.
I'm having a problem with the 1.6.6 version in both python 2.7.9 and python 3.4.3.

I was able to reproduce it with the following example:

>>> import parse
>>> parse.parse("blablabla {x:d}", "blablabla 12")
<Result () {'x': 12}>
>>> parse.parse("blablabla {x:d}", "blablabla jdhhd")
>>> parse.parse("blablabla {x:d}", "blablabla cdc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 1044, in parse
    return Parser(format, extra_types=extra_types).parse(string)
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 681, in parse
    return self._generate_result(m)
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 731, in _generate_result
    m)
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 398, in f
    return sign * int(string, base)
ValueError: invalid literal for int() with base 10: ''

Is this a bug or am I missing some kind of edge case when {:d} might match cdc that I'm not aware of?

Thanks for your help.

Unmatched brace can still result in a match

Certain patterns can still result in a match even when there are unmatched braces. For example:

>>> parse("{who.txt", "hello")
<Result () {'who.tx': 'hello'}>

Even though there is no closing }, parse assumes the final character is the closing brace and matches the pattern accordingly. In this case, I'd expect parse to return None since there is no direct match.
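The standard library already rejects this template: `string.Formatter().parse`, which follows the same syntax as `str.format()`, raises `ValueError` for an unclosed brace. A sketch of a pre-check along those lines (`check_template` is a hypothetical helper, not part of parse):

```python
import string

def check_template(template):
    """Raise ValueError if the template's braces are malformed."""
    # Formatter.parse is lazy, so iterate it fully to surface errors.
    list(string.Formatter().parse(template))

check_template("{who}.txt")        # fine
try:
    check_template("{who.txt")
except ValueError as exc:
    print("rejected:", exc)
```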

Document behaviour when the template is ambiguous

Example:

>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> parse(pattern, data).named
{'dir1': 'root', 'dir2': 'parent/subdir'}

But {'dir1': 'root/parent', 'dir2': 'subdir'} also fits the pattern. Is this behaviour reliable, or should it be considered an implementation detail? I couldn't find it anywhere in the docs.

Is there any way to coerce the result one way or the other?
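The observed result is what a lazy pattern for each field would produce: the first field takes as little as possible. Whether parse guarantees this is exactly the question asked. The plain-`re` sketch below only shows how the two readings differ, and how a greedy first field (a hypothetical alternative, not a parse option) would produce the other one:

```python
import re

data = "root/parent/subdir"

lazy = re.compile(r"(?P<dir1>.+?)/(?P<dir2>.+?)\Z")         # first field as short as possible
greedy_first = re.compile(r"(?P<dir1>.+)/(?P<dir2>.+?)\Z")  # first field as long as possible

print(lazy.match(data).groupdict())          # {'dir1': 'root', 'dir2': 'parent/subdir'}
print(greedy_first.match(data).groupdict())  # {'dir1': 'root/parent', 'dir2': 'subdir'}
```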

Curious: why wasn't stdlib `string.Formatter().parse` used

Mostly just curious, not an actual bug.

I'm working on a (tangentially) related idea of mine to rewrite format strings.

https://docs.python.org/3/library/string.html#string.Formatter.parse

Cannot parse concatenated string items

I don't know if this is a bug or just me, but here is my use case. I have two items that are concatenated in a filename: a date (YYYYMMDD) and a 2-digit string (model_run). I can't find a way to specify a pattern that will allow parsing of the two concatenated elements:

>>> filename_pattern = 'Some_string_{wx_variable}_ps2.5km_{YYYYMMDD}{model_run}_P{forecast_hour}-00.extension'
>>> afile = 'Some_string_A_variable_ps2.5km_1999121100_P012-00.extension'
>>> r = parse(filename_pattern, afile)
>>> r
<Result () {'forecast_hour': '012', 'wx_variable': 'A_variable', 'model_run': '999121100', 'YYYYMMDD': '1'}>
>>> r.named['YYYYMMDD']
'1'
>>> r.named['model_run']
'999121100'

How can I get the parsed items I expect, i.e. r.named['YYYYMMDD'] = '19991211' and r.named['model_run'] = '00' ?

I would like to avoid fiddling with the filename_pattern before parsing (e.g. add some format options like width, if that is even possible) because that would defeat the purpose of having a file pattern to begin with IMO.

Thanx !
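Because the two values are adjacent digit runs, nothing in an untyped template says where one ends, so the split point is genuinely ambiguous. Fixing their lengths resolves it. The stdlib sketch below encodes the widths (8 digits for the date, 2 for the run, both taken from the description above) directly in the pattern, which is the trade-off the question hoped to avoid:

```python
import re

pattern = re.compile(
    r"Some_string_(?P<wx_variable>.+?)_ps2\.5km_"
    r"(?P<YYYYMMDD>\d{8})(?P<model_run>\d{2})"
    r"_P(?P<forecast_hour>\d+)-00\.extension\Z"
)

m = pattern.match("Some_string_A_variable_ps2.5km_1999121100_P012-00.extension")
print(m.group("YYYYMMDD"), m.group("model_run"))  # 19991211 00
```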

Pipe symbol not escaped

Literal text with a | symbol in it is not handled correctly:

>>> search('| {:d}', '| 10')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 1041, in search
    return Parser(format, extra_types=extra_types).search(string, pos, endpos)
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 678, in search
    return self._generate_result(m)
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 699, in _generate_result
    fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 375, in f
    if string[0] == '-':
TypeError: 'NoneType' object has no attribute '__getitem__'

Escaping the | character explicitly makes it work:

>>> search('\\| {:d}', '| 10')
<Result (10,) {}>

Looking over the code base, it should be trivial to fix by adding | to the REGEX_SAFETY pattern. However, I do wonder why re.escape() isn't used here to escape regular expression metacharacters. Am I missing something? Does re.escape() escape too much?
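The traceback is consistent with the unescaped `|` turning the pattern into an alternation whose first branch is empty: the search succeeds at position 0 without ever entering the number group, so the converter receives `None`. A plain-`re` sketch, with `re.escape` applied to the literal text:

```python
import re

text = "| 10"

# Unescaped: '| (\d+)' means "empty string OR ' (\d+)'", and the empty branch wins at position 0.
naive = re.compile("| " + r"(\d+)")
m = naive.search(text)
print(m.span(), m.group(1))   # (0, 0) None

# Escaped literal text: the '|' and the space are now matched literally.
safe = re.compile(re.escape("| ") + r"(\d+)")
print(safe.search(text).group(1))  # 10
```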

Some hex values mistakenly parsed as zeroes

For some reason, parsing certain hex values with leading zeroes produces buggy and unreliable results.
Here's a quick demonstration using the latest version of parse (1.12.0):

>>> import parse
>>> parse.parse('${:x}','$0b67')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B67')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B6')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B')
<Result (11,) {}>
>>> parse.parse('${:x}','$B67')
<Result (2919,) {}>

It appears that parse() recognizes, but fails to parse certain combinations of hex digits, returning zero instead.
Interestingly, whether parsing fails depends on the digit next to zero.
Based on my testing, it always happens with numbers starting with "0Bxxx" (excluding "0B"; I know there was a separate [closed] bug report on that one, but it appears that the underlying issue is still there).
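Python's own `int()` already handles these inputs once the base is fixed at 16: `b` is simply a hex digit there, and only an explicit `0x` prefix is accepted as a prefix. A sketch of a converter that trusts the requested base instead of sniffing a `0b` prefix (`hex_convert` is a hypothetical name):

```python
def hex_convert(text):
    """Convert a hex field; the base is fixed, so '0b...' is not read as binary."""
    return int(text.strip(), 16)   # int() also accepts an optional '0x' prefix here

for sample in ("0b67", "0B67", "0B6", "0B", "B67", "0x1f"):
    print(sample, hex_convert(sample))
# 0b67 2919
# 0B67 2919
# 0B6 182
# 0B 11
# B67 2919
# 0x1f 31
```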

Brackets Break Date Parsing

import parse
def a(a):
    return a
a.pattern = '((3))'
parse.parse('{:a}q{:ti}', '3q2017-12-31', dict(a=a))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-9cc16e0f1e39> in <module>()
      3     return a
      4 a.pattern = '((3))'
----> 5 parse.parse('{:a}q{:ti}', '3q2017-12-31', dict(a=a))

~/miniconda3/lib/python3.6/site-packages/parse.py in parse(format, string, extra_types, evaluate_result)
   1115     In the case there is no match parse() will return None.
   1116     '''
-> 1117     return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
   1118 
   1119 

~/miniconda3/lib/python3.6/site-packages/parse.py in parse(self, string, evaluate_result)
    697 
    698         if evaluate_result:
--> 699             return self.evaluate_result(m)
    700         else:
    701             return Match(self, m)

~/miniconda3/lib/python3.6/site-packages/parse.py in evaluate_result(self, m)
    764         for n in self._fixed_fields:
    765             if n in self._type_conversions:
--> 766                 fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
    767         fixed_fields = tuple(fixed_fields[n] for n in self._fixed_fields)
    768 

~/miniconda3/lib/python3.6/site-packages/parse.py in date_convert(string, match, ymd, mdy, dmy, d_m_y, hms, am, tz, mm, dd)
    482         d=groups[dd]
    483     elif ymd is not None:
--> 484         y, m, d = re.split('[-/\s]', groups[ymd])
    485     elif mdy is not None:
    486         m, d, y = re.split('[-/\s]', groups[mdy])

ValueError: not enough values to unpack (expected 3, got 1)
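The final `ValueError` (one value instead of three) fits a group-numbering problem: the custom pattern `((3))` adds two capturing groups, so a converter that looks up the date by a fixed group index would read `'3'` instead of the date. That reading is an inference from the traceback; the sketch only shows the index shift with plain `re`, and that non-capturing groups `(?:...)` avoid it:

```python
import re

date = r"(\d{4}-\d\d?-\d\d?)"
text = "3q2017-12-31"

capturing = re.compile(r"((3))q" + date)        # custom pattern adds 2 groups
noncapturing = re.compile(r"(?:(?:3))q" + date)  # adds none

print(capturing.match(text).group(1))     # 3
print(noncapturing.match(text).group(1))  # 2017-12-31
print(re.split(r"[-/\s]", capturing.match(text).group(1)))  # ['3']: only one value
```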

Cannot use name with ti specifier

This doesn't work:

>>> parse.parse('on {date:ti}', 'on 2012-09-17')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\parse.py", line 849, in parse
  File "build\bdist.win-amd64\egg\parse.py", line 526, in parse
  File "build\bdist.win-amd64\egg\parse.py", line 572, in _generate_result
KeyError: 'date'

This does work:

>>> parse.parse('on {:ti}', 'on 2012-09-17')
<Result (datetime.datetime(2012, 9, 17, 0, 0),) {}>

Doesn't matter what name you use or what else is in the pattern, it always throws a KeyError.

parse of '0B' as an hexadecimal fails

While trying to parse a list of HID usages, I have a field that is represented by '0B' that I'd like to convert to the value 11.

When parsing the values, I am using parse.parse('{value:x}\t{name}', line), so I am specifying that I want the int value to be an hex. However, I am hitting https://github.com/r1chardj0n3s/parse/blob/master/parse.py#L441 (in int_convert), and parse decides that my hex value is a base 2 one, and returns 0.

One solution could be to require a length of at least 3 in int_convert when the value starts with '0' followed by a known prefix. But I think that if the user provides the base for the conversion, the int_convert function should not try to be smart and should simply use the provided base.

hex letters are considered "digits", really?

Just got bitten by this, I think it's a bug...

>>> parse('wat {:d} wat', 'wat 12345 wat')
<Result (12345,) {}>
>>> parse('wat {:d} wat', 'wat 12f45 wat')
<Result (1245,) {}>
>>> parse('wat {:d} wat', 'wat 12g45 wat')
>>> parse('wat {:d} wat', 'wat ff3ff wat')
<Result (3,) {}>
>>> parse('wat {:d} wat', 'wat fffff wat')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-91053c27d470> in <module>()
----> 1 parse('wat {:d} wat', 'wat fffff wat')

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in parse(format, string, extra_types, evaluate_result)
   1113     In the case there is no match parse() will return None.
   1114     '''
-> 1115     return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
   1116 
   1117 

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in parse(self, string, evaluate_result)
    695 
    696         if evaluate_result:
--> 697             return self.evaluate_result(m)
    698         else:
    699             return Match(self, m)

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in evaluate_result(self, m)
    762         for n in self._fixed_fields:
    763             if n in self._type_conversions:
--> 764                 fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
    765         fixed_fields = tuple(fixed_fields[n] for n in self._fixed_fields)
    766 

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in f(string, match, base)
    412         chars = CHARS[:base]
    413         string = re.sub('[^%s]' % chars, '', string.lower())
--> 414         return sign * int(string, base)
    415     return f
    416 

ValueError: invalid literal for int() with base 10: ''
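The last frame shows what happens: the converter deletes every character that is not a valid digit for the base and converts what remains, so `12f45` becomes `1245` and `fffff` becomes an empty string. The sketch reproduces that step (assuming the base-10 digit set is `0123456789`) next to a strict alternative that rejects such input; `strip_then_int` and `strict_int` are hypothetical helpers:

```python
import re

def strip_then_int(text, base=10):
    """Mimics the stripping step in the traceback (digit set assumed)."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:base]
    return int(re.sub("[^%s]" % digits, "", text.lower()), base)

def strict_int(text):
    """Accept only an optional sign followed by decimal digits."""
    if re.fullmatch(r"[-+]?[0-9]+", text) is None:
        raise ValueError("not a decimal integer: %r" % text)
    return int(text)

print(strip_then_int("12f45"))  # 1245
print(strip_then_int("ff3ff"))  # 3
print(strict_int("12345"))      # 12345
```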

Allow a field in the parse format to be optional

The suggested syntax from jenisys is to suffix the type with "?" which I believe is reasonable. Thus:

{honorific:s?} {given:s} {sur:s}

would match both of:

"Mr Richard Jones"
"Jens Engels"

The "honorific" element in the result object would have the value None.
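In regex terms the proposed `?` suffix maps naturally onto an optional group, which yields `None` when it does not take part in the match. A plain-`re` sketch of the two example inputs, with `\S+` standing in for a single word (an assumption):

```python
import re

# Optional leading word: the group is None when it does not take part in the match.
name = re.compile(r"(?:(?P<honorific>\S+) )?(?P<given>\S+) (?P<sur>\S+)\Z")

print(name.match("Mr Richard Jones").groupdict())
# {'honorific': 'Mr', 'given': 'Richard', 'sur': 'Jones'}
print(name.match("Jens Engels").groupdict())
# {'honorific': None, 'given': 'Jens', 'sur': 'Engels'}
```

With a pattern this loose, any three-word name is read as having an honorific; an optional field is only as reliable as the pattern behind it.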

Parser constructor should allow to set the regular expression flags (re flags)

Currently, the Parser class always applies "re.IGNORECASE" internally where needed. This may not be always desired. Therefore, it would be best, if the constructor of Parser would allow to provide own "re flags" or disable the "re.IGNORECASE" flag.

Supported versions problem: python2.6, python2.5 (minor problem)

The "setup.py" file currently still states that python2.5 and python2.6 are supported.
This may be true, but it cannot currently be verified, because the test suite contains tests that run only on Python 2.7 and newer.

EXAMPLE:

$ pytest
platform ... -- Python 2.6.9, pytest-3.2.5, ...
...
self = <test_parse.TestParseType testMethod=test_decimal_value>
    def test_decimal_value(self):
        value = Decimal('5.5')
>       str_ = 'test {}'.format(value)
E       ValueError: zero length field name in format

NOTE: string-format without index or named args work only for Python2.7.x or newer AFAIK.

POSSIBLE SOLUTIONS:

Drop support for older python versions
Ensure that tests pass on all supported Python versions (and fix the test)
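For the second option, the failing test only needs an explicit field index: Python 2.6's `str.format()` accepts `{0}`, while automatic numbering (`{}`) arrived in 2.7, so a test written this way should run on both:

```python
from decimal import Decimal

value = Decimal('5.5')
str_ = 'test {0}'.format(value)   # explicit index instead of '{}'
print(str_)  # test 5.5
```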

Bug in {:ta} and {:tg} parsing for AM/PM

When using the :ta parsing format, the hour between Noon and 13:00 (aka 1:00PM) generates a ValueError in datetime because hour must be in the range 0..23.

Example:

>>> parse.version
'1.6.2'

>>> parse.parse('Meet at {:tg}', 'Meet at 1/2/2011 1:00 PM')
<Result (datetime.datetime(2011, 2, 1, 13, 0),) {}>

This is the example from the documentation. Changing the time to 12:45 PM, however, doesn't work:

>>> parse.parse('Meet at {:tg}', 'Meet at 1/2/2011 12:45 PM')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 983, in parse
    return Parser(format, extra_types=extra_types).parse(string)
  File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 640, in parse
    return self._generate_result(m)
  File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 678, in _generate_result
    fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
  File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 518, in date_convert
    d = datetime(y, m, d, H, M, S, u, tzinfo=tz)
ValueError: hour must be in 0..23

Add beginning / end of string indicator

Would be great to be able to indicate a match at beginning or end of string. E.g. if a pattern matches some records at the beginning and other records in the middle but you only want to target those at the beginning of the string, I don’t see an easy way to do that currently.

PM handling

the AM/PM are optional, and if PM is found then 12 hours will be added to the datetime object's hours amount - even if the hour is greater than 12 (for consistency.)

I realize you've already chosen your poison here, but the "add 12 always" rule either needs an explicit exception, or it should just do away with the "add 12 to PM values." Why?

Noon is 12:00PM. 15 minutes after noon is 12:15PM. 12 should NOT be added to these values, or they'll register as later than e.g. 9:24PM.

In fact, 12AM (and 12:15AM) should result in a SUBTRACTION of 12 hours, as it occurs prior to 1AM on the given day.

Both issues can be handled by first subtracting 12 from an hour of 12 (so it becomes 0) whenever an AM/PM indicator is present. The later addition of 12 hours for "PM" then restores the correct timeline.

Silly ancient peoples not inventing the concept of "0". Silly us for continuing to stick with a counter-intuitive notation.
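The rule argued for above fits in a few lines: take the hour modulo 12, then add 12 for PM. The sketch cross-checks it against the standard library's own `%I`/`%p` handling in `datetime.strptime`; `to_24_hour` is a hypothetical helper, not parse code.

```python
from datetime import datetime

def to_24_hour(hour, meridiem):
    """Hypothetical helper: 12-hour clock value plus 'AM'/'PM' -> 0..23."""
    hour %= 12                       # 12 AM and 12 PM both start from 0
    return hour + 12 if meridiem.upper() == "PM" else hour

for hour, meridiem in [(12, "AM"), (1, "AM"), (12, "PM"), (1, "PM"), (9, "PM")]:
    # Cross-check against the standard library's %I / %p handling.
    expected = datetime.strptime("%d %s" % (hour, meridiem), "%I %p").hour
    print(hour, meridiem, to_24_hour(hour, meridiem), expected)

print(datetime.strptime("1/2/2011 12:45 PM", "%d/%m/%Y %I:%M %p"))  # 2011-02-01 12:45:00
```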

Getting UnicodeEncodeError during installation

We are installing parse as a dependency of behave,
and this is the traceback we get during installation:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-sme0se0x/parse/setup.py", line 10, in <module>
    f.write(__doc__)
UnicodeEncodeError: 'ascii' codec can't encode character '\xeb' in position 13017: ordinal not in range(128)
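The traceback shows `setup.py` writing the module docstring through a file object that encodes as ASCII, presumably the build environment's locale default. Passing an explicit encoding when opening the file removes the dependency on the locale. A minimal sketch (file name and text are placeholders):

```python
import os
import tempfile

doc = "Docstring with a non-ASCII character: \xeb"   # placeholder text

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.rst")           # placeholder file name
    # An explicit encoding makes the write independent of the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(doc)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == doc)  # True
```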

Cannot use { Character?

So.... you didn't list an escape character... so if my string contains "{", it just doesn't work?

GREAT work!

Add parsing of other things (add requests here)

It'd be neat if it could parse:

URLs, producing the same result as urlparse.urlparse()
email addresses, producing a (realname, email address) pair like email.utils.parseaddr()
IPv4 and IPv6 addresses (producing .. what?)

...?
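Each of these already has a standard-library parser that a custom type converter could delegate to, which also suggests one possible answer for the IP case (an `ipaddress` object). The converter names below are hypothetical; only the stdlib calls are real:

```python
import email.utils
import ipaddress
from urllib.parse import urlparse

def url_convert(text):
    return urlparse(text)                # ParseResult(scheme, netloc, path, ...)

def email_convert(text):
    return email.utils.parseaddr(text)   # (realname, address)

def ip_convert(text):
    return ipaddress.ip_address(text)    # IPv4Address or IPv6Address

print(url_convert("http://example.com/path?q=1").netloc)   # example.com
print(email_convert("Richard Jones <rj@example.com>"))     # ('Richard Jones', 'rj@example.com')
print(repr(ip_convert("192.0.2.1")), repr(ip_convert("::1")))
```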

add support for datetime.strftime directives?

There are some edge cases that this module does not cover, and rather than reinventing the wheel I would like to discuss a way to support datetime.strftime directives.

The basic strategy I am imagining would be to preprocess the given string to replace these directives with appropriate format definitions from a hard-coded table so that they are loaded into the named set. These values can then be used to set a datetime on the named set after everything is parsed.

Walkthrough example:

FMT_STR="string with {stuff}, {}, and strftime directives like %Y, %d, and %b"
parse(FMT_STR, "string with myStuff, also_this, and strftime directives like 2018, 03, and Feb").named
>> {
    "stuff": "myStuff",
    "__Y": 2018,
    "__b": "Feb",
    "__d": 3,
    "__datetime": datetime(2018, 2, 3)
}

In the above example the format string would be pre-parsed into something like :

"string with {stuff}, {}, and strftime directives like {:4d}, {:2d}, and {:3w}"

using a mapping like:

directive_map = {
    "%Y": "{:4d}",
    "%d": "{:2d}",
    "%b": "{:3w}",
}
for directive, fmt in directive_map.items():
    FMT_STR = FMT_STR.replace(directive, fmt)

Does this seem reasonable? I may try an implementation unless there are potential issues with this I am overlooking.
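For comparison, here is a minimal sketch of a slightly different route. It is not the `{:4d}` substitution proposed above. Instead, it turns each known directive into a named regex group and then passes only the captured directive values back through `datetime.strptime`, which builds the datetime. The table covers just four directives, and the names `DIRECTIVES` and `parse_with_directives` are hypothetical. Values stay strings here, unlike the integers in the walkthrough.

```python
import re
from datetime import datetime

# Hypothetical table: directive -> (group name, regex). Only a few directives.
DIRECTIVES = {
    "%Y": ("__Y", r"\d{4}"),
    "%m": ("__m", r"\d{1,2}"),
    "%d": ("__d", r"\d{1,2}"),
    "%b": ("__b", r"[A-Za-z]{3}"),
}

def directives_to_regex(fmt):
    """Turn known strftime directives into named groups; escape everything else."""
    out = []
    for part in re.split(r"(%[A-Za-z])", fmt):
        if part in DIRECTIVES:
            name, pattern = DIRECTIVES[part]
            out.append("(?P<%s>%s)" % (name, pattern))
        else:
            out.append(re.escape(part))
    return "".join(out)

def parse_with_directives(fmt, text):
    match = re.fullmatch(directives_to_regex(fmt), text)
    if match is None:
        return None
    fields = match.groupdict()
    # Rebuild a strptime call from the matched directive values, in format order
    used = [d for d in re.findall(r"%[A-Za-z]", fmt) if d in DIRECTIVES]
    values = " ".join(fields[DIRECTIVES[d][0]] for d in used)
    fields["__datetime"] = datetime.strptime(values, " ".join(used))
    return fields

fields = parse_with_directives("released %d %b %Y", "released 03 Feb 2018")
print(fields["__datetime"])  # 2018-02-03 00:00:00
```

Two caveats apply. A directive that appears twice would create a duplicate group name, and `%b` is locale-dependent in `strptime`. A real integration would also need to coexist with parse's own `{}` fields, which this sketch does not attempt.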

Optional "group_count" attribute for user-defined type converters

The current "parse" module has a small deficiency (or bug).
When a user-defined type converter uses regular-expression groups in its pattern attribute, some of the extracted result values are wrong, because the offset these extra groups add to the group index is not taken into account.
NOTE: This problem occurs only for fixed (unnamed) fields, named fields are OK.

# FILE: parse.py
# NECESSARY CHANGES:
…
        # -- Parser._handle_field()
        ...
        if type in self._extra_types:
            type_converter = self._extra_types[type]
            s = getattr(type_converter, 'pattern', r'.+?')
            # -- EXTENSION: group_count attribute
            group_count = getattr(type_converter, 'group_count', 0)
            self._group_index += group_count
            # -- EXTENSION-END
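The effect of the missing offset can be reproduced with plain `re`. In this sketch, each unnamed field is wrapped in its own group, roughly as parse does. A converter pattern that contains a group, here `(yes|no)`, shifts every later field's group number by one. The helper `fixed_field_values` is hypothetical. It applies a per-field count the way the proposed `group_count` attribute would, and it is not parse's internal code.

```python
import re

def fixed_field_values(match, group_counts):
    """Collect one value per unnamed field from a regex match.

    group_counts[i] is the number of capturing groups inside field i's own
    pattern (0 for plain patterns), i.e. what group_count would report.
    """
    values = []
    index = 1  # regex groups are numbered from 1
    for extra in group_counts:
        values.append(match.group(index))
        index += 1 + extra  # skip this field's wrapping group and its inner groups
    return values

# Field 1 uses a converter pattern with its own group; field 2 is plain digits.
m = re.match(r"((yes|no)) (\d+)", "yes 42")
print(fixed_field_values(m, [1, 0]))  # ['yes', '42']
print([m.group(1), m.group(2)])       # naive numbering: ['yes', 'yes']
```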

possible to specify "hungry" format specs?

I think this is more of a question, but it may be an issue as well.

If I format the following format spec string with values:

fmat='{}{}'
values=['a','b']

I of course get this result:

>>> fmat.format(*values)
'ab'

And parse handles this as expected.

>>> list(parse(fmat,'ab'))
['a', 'b']

However, I could get the same result by supplying these arguments (the final arg just being an empty string):

>>> values=['ab','']
>>> fmat.format(*values)
'ab'

The default parse behavior becomes a bit more clear when I do this:

>>> list(parse('{}{}', 'abcdef'))
['a', 'bcdef']

So it seems that the format fields "eat" as little as possible. This definitely makes sense as a default behavior.

If I naively supply the optional third argument to parse, in an attempt to signal that a field should be as hungry as possible and "eat" any string (str) that it finds (if it can), I get the same result:

>>> list(parse('{:s}{:s}', 'abcdef', dict(s=str)))
['a', 'bcdef'] # rather than ['abcdef', '']

Any ideas on how to get the last argument to be the empty string using the existing API?

I do understand that an option like this would be tough to implement. For example: how should this be handled?

>>> list(parse('{:s}{:s}{:s}{:d}{:d}', 'abc123', dict(s=str), hungry=True))

Should it return None, since the fields would have to come out like this?

['abc123','','', ERROR, ERROR] # errors because no integers to eat after first string eats everything

Or should it pick out the integers first and leave the leftovers for the other fields (i.e., integers are hungrier than strings)?

['abc','','', 12, 3]

(NOTE in the last example the integers would also be "hungry", but not hungry enough to cause an error; i.e., integers are hungrier than strings, but not more hungry than each other.)
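At the regex level, the observed behaviour corresponds to lazy quantifiers. As I understand parse, an untyped field becomes a lazy group such as `.+?`, which explains `['a', 'bcdef']`. A "hungry" field is then just a greedy quantifier. The sketch below uses plain `re` and a hypothetical helper, `split_two`.

```python
import re

def split_two(text, first, second):
    """Fully match text against two adjacent capture groups with the given patterns."""
    match = re.fullmatch("(%s)(%s)" % (first, second), text)
    return list(match.groups()) if match else None

# Lazy first field: takes as little as possible
print(split_two("abcdef", r".+?", r".*?"))  # ['a', 'bcdef']
# Greedy first field, lazy second field that is allowed to be empty
print(split_two("abcdef", r".*", r".*?"))   # ['abcdef', '']
```

parse reads a `pattern` attribute from user-defined converters. A converter whose pattern is `.*`, followed by one whose pattern is `.*?`, may therefore give the hungry result through the existing API. I have not verified this against parse itself. The mixed `{:s}`/`{:d}` question is a policy decision that regex backtracking alone does not settle.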

Fixed point should return Decimal

fixed point numbers should return a Decimal

Format f is currently used for fixed-point numbers and returns a float. Changing that would break a lot of existing code, so I suggest giving this new mode the upper-case letter F.
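Until such a mode exists, the same effect can probably be had with a user-defined converter that returns `Decimal`, registered under any name in the extra-types dict (for example `dict(Dec=fixed_decimal)`). The name `fixed_decimal` and its pattern are hypothetical. The point of the example is that `Decimal` keeps exactly the digits written in the text, whereas `float` rounds them to binary.

```python
from decimal import Decimal

def fixed_decimal(text):
    # Keep the exact decimal digits from the input instead of a binary float
    return Decimal(text)
fixed_decimal.pattern = r"[-+]?\d+\.\d+"  # hypothetical fixed-point pattern

print(fixed_decimal("0.1") + fixed_decimal("0.2"))  # 0.3
print(float("0.1") + float("0.2"))                  # 0.30000000000000004
```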

Result.__contains__ not implemented

Result.__getitem__ performs lookup by index or by key, but Result doesn't implement __contains__ and doesn't inherit from abc.Mapping.

As a result, the following fails:

    if 'foo' in result:
        blah['foo'] = result['foo']

Instead, the following has to be used:

    try:
        blah['foo'] = result['foo']
    except KeyError:
        pass
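A minimal sketch of the requested method follows. `ResultSketch` is a hypothetical stand-in, not parse's actual Result class. In this design, membership checks only the named fields, which matches Mapping semantics.

```python
class ResultSketch:
    """Hypothetical stand-in for parse's Result, only to show __contains__."""

    def __init__(self, fixed, named):
        self.fixed = tuple(fixed)
        self.named = dict(named)

    def __getitem__(self, item):
        # Integer -> positional (fixed) field, anything else -> named field
        if isinstance(item, int):
            return self.fixed[item]
        return self.named[item]

    def __contains__(self, name):
        # Membership checks named fields, matching Mapping semantics
        return name in self.named


result = ResultSketch(("x",), {"foo": 1})
if "foo" in result:
    print(result["foo"])  # 1
```

In a class like this without `__contains__` or `__iter__`, Python falls back to calling `__getitem__(0)`, `__getitem__(1)` and so on, comparing against the fixed values. As a result, `'foo' in result` quietly returns the wrong answer rather than raising an error.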

a = "{hello} world"
b = "hello world"
parse(a, 'well hello there world')  # matches
parse(b, 'well hello there world')  # fails

Is there a way to get a to fail without specifying custom formats?

Alternatively, is there a way to override the default format type/matching behavior?
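The difference comes down to what a bare field is allowed to match. My reading is that a bare `{hello}` behaves like a lazy catch-all group, so it can swallow spaces. parse's built-in `w` type (letters, numbers and underscore) restricts that. A plain-`re` sketch of the two behaviours follows. In parse terms, it suggests `{hello:w} world`, although that is a format spec, which the question hoped to avoid.

```python
import re

ANY = r"(.+?) world"   # lazy catch-all, like a bare {hello} field
WORD = r"(\w+) world"  # word characters only, like a {hello:w} field

print(re.fullmatch(ANY, "well hello there world").group(1))  # well hello there
print(re.fullmatch(WORD, "well hello there world"))          # None
```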

Allow element indexes in field names

Format field names can include element indexes; see the Python documentation:

field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | integer]
attribute_name    ::=  identifier
element_index     ::=  integer | index_string

But parse doesn't support it. Example:

>>> "test: {dict[0]}".format(dict=["red"])
'test: red'
>>> "test: {dict[color0]}".format(dict={"color0":"red"})
'test: red'

>>> parse.parse("test: {dict}", "test: blue")
<Result () {'dict': 'blue'}>
>>> parse.parse("test: {dict[0]}", "test: blue")
None
# expected: <Result () {'dict[0]': 'blue'}>
>>> parse.parse("test: {dict[color0]}", "test: blue")
None
# expected: <Result () {'dict[color0]': 'blue'}>
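The standard library already splits such field names out correctly. `string.Formatter().parse` yields the full name, `dict[0]`, for each field. Regex group names must be identifiers, however, so one way to support these names is to alias each field to a generated group name and map the name back afterwards. `parse_indexed` is a hypothetical sketch of that idea. It ignores format specs and treats every field as `.+?`.

```python
import re
from string import Formatter

def parse_indexed(fmt, text):
    """Match text against fmt, keeping names like 'dict[0]' as result keys."""
    parts, aliases = [], {}
    for i, (literal, name, _spec, _conv) in enumerate(Formatter().parse(fmt)):
        parts.append(re.escape(literal))
        if name is not None:
            alias = "g%d" % i  # regex group names must be identifiers
            aliases[alias] = name
            parts.append("(?P<%s>.+?)" % alias)
    match = re.fullmatch("".join(parts), text)
    if match is None:
        return None
    return {aliases[a]: value for a, value in match.groupdict().items()}

print(parse_indexed("test: {dict[0]}", "test: blue"))  # {'dict[0]': 'blue'}
```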

Percent format with decimal limit isn't supported

The following works: {field:%}, but adding a decimal limit, as in {field:.2%}, does not and raises an exception:

  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 983, in parse
    return Parser(format, extra_types=extra_types).parse(string)
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 586, in __init__
    self._expression = self._generate_expression()
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 717, in _generate_expression
    e.append(self._handle_field(part))
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 770, in _handle_field
    format = extract_format(format, self._extra_types)
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 562, in extract_format
    raise ValueError('type %r not recognised' % type)

Unsure whether this is a bug or intended.

Thanks for the extremely quick fix on the logging issue by the way!
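Whether `.2%` is intended to be rejected is not something I can confirm. As a workaround, a user-defined converter with its own `pattern` can parse what `format(x, ".2%")` produces. The name `parse_percent` and its regex are hypothetical. The result is a float, so it equals the original value only up to float rounding.

```python
def parse_percent(text):
    # "12.34%" -> 0.1234 (approximately; float division)
    return float(text.rstrip("%")) / 100
parse_percent.pattern = r"[-+]?\d+(?:\.\d+)?%"  # hypothetical percent pattern

print(format(0.1234, ".2%"))  # 12.34%
```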
