r1chardj0n3s / parse Goto Github PK
View Code? Open in Web Editor NEWParse strings using a specification based on the Python format() syntax.
Home Page: http://pypi.python.org/pypi/parse
License: MIT License
Parse strings using a specification based on the Python format() syntax.
Home Page: http://pypi.python.org/pypi/parse
License: MIT License
Can't install parse to virtualenv in Debian/tesing.
1.8.0 is ok, 1.8.1 and 1.8.2 does not install.
/tmp$ virtualenv -p python3 virtualenv
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /tmp/virtualenv/bin/python3
Also creating executable in /tmp/virtualenv/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
/tmp$ . virtualenv/bin/activate
(virtualenv) /tmp$ pip install parse
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /tmp/virtualenv/bin/python3
Also creating executable in /tmp/virtualenv/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
Collecting parse
Using cached parse-1.8.2.tar.gz
Building wheels for collected packages: parse
Running setup.py bdist_wheel for parse ... error
Complete output from command /tmp/virtualenv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-kz7sa7u4/parse/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpdxlborhzpip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying parse.py -> build/lib
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
copying build/lib/parse.py -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
writing parse.egg-info/PKG-INFO
writing top-level names to parse.egg-info/top_level.txt
writing dependency_links to parse.egg-info/dependency_links.txt
reading manifest file 'parse.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'parse.egg-info/SOURCES.txt'
Copying parse.egg-info to build/bdist.linux-x86_64/wheel/parse-1.8.2-py3.5.egg-info
running install_scripts
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-kz7sa7u4/parse/setup.py", line 35, in <module>
'License :: OSI Approved :: BSD License',
File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/tmp/virtualenv/lib/python3.5/site-packages/wheel/bdist_wheel.py", line 215, in run
self.run_command('install')
File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/install.py", line 61, in run
return orig.install.run(self)
File "/usr/lib/python3.5/distutils/command/install.py", line 595, in run
self.run_command(cmd_name)
File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/install_scripts.py", line 17, in run
import setuptools.command.easy_install as ei
File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 49, in <module>
from setuptools.py27compat import rmtree_safe
File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/py27compat.py", line 7, in <module>
import six
ImportError: No module named 'six'
>>> '"{}"?'.format("teststr")
'"teststr"?'
>>> parse('"{}"?', '"teststr"?')
<Result ('teststr"?',) {}>
I would like to match only "teststr". Am I doing something wrong?
I know that parse.search
should be used to match a pattern at any position in the string whereas parse.parse
has to match the string exactly.
The following issue came up some days ago in the radish project: radish-bdd/radish#106
Especially this comment might be interesting: radish-bdd/radish#106 (comment)
However, this gives as interesting outcome:
>>> patt = parse.compile('I have a {}') >>> patt.search('I have a apple') <Result ('a',) {}> >>> patt.parse('I have a apple') <Result ('apple',) {}>and
>>> patt = parse.compile('I {} a {}') >>> patt.parse('I have a apple') <Result ('have', 'apple') {}> >>> patt.search('I have a apple') <Result ('have', 'a') {}>
As you can see search
and parse
are giving different results. In this example it indeed be possible to just use parse
- but in a lot of cases we use this library for is not.
Is this intended behavior?
Adding @rscrimojr
I want to parse JSON string.
But I got an error and I can not parse.
from parse import *
pattern = '{"name": {data.name}, "age": {data.age}}'
print(parse(pattern, '{"name": "test", "age": 25}'))
result
ValueError: format spec 'name":' not recognised
Is it impossible to parse a JSON character string with any dotted names?
More than just match! findall() should be an iterator.
Format supports text aligning, but parser does not. It'd be nice if it did :-)
Example:
from parse import parse
fmt = "{:>6}{:>7}"
print(("three", "four") == parse(fmt, fmt.format("three", "four"))) #this should be True
The parser is apparently in an infinite loop.
I think it should at least say that this format is not parsable.
At best, it should do:
parse("{n} {n}", "x x") -> {"n", "x"}
parse("{n} {n}", "x y") -> None
I was trying to parse continous string having letters and numbers by the easiest way:
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA5.2M')
<Result ('G', 3.8, 'XA', 5.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA4.2M')
<Result ('G', 3.8, 'XA', 4.2, 'M') {}>
as far so good, but
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA04.2M')
<Result ('G', 3.8, 'XA0', 4.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA44.2M')
<Result ('G', 3.8, 'XA4', 4.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA40M')
<Result ('G', 3.8, 'XA4', 0.0, 'M') {}>
changing input string a little bit I was getting bad results.
I am missing something?
Igor
Reference link : https://pypi.python.org/pypi/parse
data = """ 0 2 PRLI Def 666E00 B A 0 0
1 2 PRLI Def 017000 B A 0 0"""
for x in parse.findall("{Id:^d} {Index:^d} {State:^} {Emulation:^} {ID:^} {NN:^} {PN:^} {ABTS:^d} {SRR:^d}\n", data):
print ("====", x.named)
Above code will and and gets hang(stuck) after printing first line.
But if we reduce number of columns then it will work fine.
also if there is difference in number of columns in data and pattern string then findall will hang.
Ctrl+F "unicode" on README.rst here and on https://pypi.python.org/pypi/parse doesn't find anything. parse appears to hang indefinitely on unicode strings (as with e.g. from __future__ import unicode_literals
). Is that the case? Is there any expectation it'll work in the future?
As noted in PR #71
I want to use this project but can't because there is no license. No license on a project means all rights are reserved the author of the code, that prevents any use of code by other people. Please consider adding a license. You can use https://choosealicense.com/ or https://tldrlegal.com/ to determine what license is right for you.
Thanks!
Hey, we are vendoring this library over in https://github.com/pypa/pipenv and we are automating our vendoring process. As part of the broader distribution process we are trying to handle our current licensing issues by including explicit license files for each of our vendored dependencies. Would you be receptive to adding an additional LICENSE file with the text of the license of your software (MIT license I believe?)
If so I don't mind tossing a PR in this direction
parse
uses currently the same field schema like the str.format()
function.
But parsing problems are often slightly different (and more complicated) compared to output formatting problems. I stumbled over a use case where it would be rather nice to have an optional cardinality field after the type field.
EXAMPLE:
#!python
schema = "I met {person:Person?} ..." #< OPTIONAL DATA: Zero or one cardinality
schema = "I am meeting with {persons:Person+}" #< MANY: One or more cardinality
schema = "I am meeting with {persons:Person*}" #< MANY: Zero or more cardinality
The "many solution" is basically a comma-separated list of items for this datatype, like:
"I am meeting with Alice, Bob, Charly"
I have a canned, working solution (if my pull-request for the pattern
attribute is accepted) that will allow to solve the underlying cardinality problem above: Generating a regular expression for the cardinality by using the regular expression of a user-defined (or built-in) data type.
Hello,
Would it be possible to have the package available via pip ?
I used pip install git+https://github.com/r1chardj0n3s/parse
that works well but something like pip install parse
would be nice !
Thanks
In [22]: p = compile('I make a POST request to "{url_path_segment}"')
In [23]: p.parse('I make a POST request to "{url_path}" with file "{filename}" as "{key}"')
Out[23]: <Result () {'url_path_segment': '{url_path}" with file "{filename}" as "{key}'}>
expected
None
I know it may sound tiresome after two issues filled because of license already. Could you please still consider to add LICENSE file also in pypi tarball?
After looking a bit closer at the Custom Type Conversions section of the pypi page, I think I can probably get things working the way I need using these. The page contains this statement:
Your custom type conversions may override the builtin types if you supply one with the same identifier.
Is there an exception I can raise, or a method I can call, in the custom type conversion so as to make it fall back on the default behavior for the type conversion (either the one supplied, or a modified version of the one that was supplied)?
Is there a way to force the API to fall back on to the default behavior for formatting types when they have been overridden by extra_types
? Below is an example of what I mean.
Desired float
parsing behavior:
>>> parse('{: >f}{: >f}', ' 1.025 1.033')
<Result (1.025, 1.033) {}> # expected result
Actual behavior (when overridden):
>>> parse('{: >f}{: >f}', ' 1.025 1.033', extra_types=dict(f=float))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 1117, in parse
return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 699, in parse
return self.evaluate_result(m)
File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 766, in evaluate_result
fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 882, in f
return type_converter(string)
ValueError: could not convert string to float: '.025 1.033'
I have a need for a couple of custom numerical types that can handle empty strings and spaces in a manner similar to zeroes. I've implemented them something like this:
class Blank():
def __new__(cls, value):
try:
return super().__new__(cls, value)
except ValueError:
if value == '' or value == ' ':
return super().__new__(cls, 0.0)
else:
raise
def __str__(self):
return '' if self==0 else super().__str__()
def __format__(self, spec):
if (spec.endswith('d') or spec.endswith('f') or spec.endswith('n')) and self==0:
spec = spec[:-1]+'s'
return format('',spec)
else:
return super().__format__(spec)
class BlankInt(Blank, int):
'''An int that prints blank when zero.'''
pass
class BlankFloat(Blank, float):
'''A float that prints blank when zero.'''
pass
This seems to partially work the way I had in mind:
>>> from parse import parse
>>> parse('{: >5f}', ' ', extra_types=dict(f=BlankFloat))
<Result (0.0,) {}>
>>> parse('{: >5f}'*5, ' ', extra_types=dict(f=BlankFloat))
<Result (0.0, 0.0, 0.0, 0.0, 0.0) {}>
However, this doesn't work (since float
doesn't work, either and BlankFloat
is a subclass):
>>> parse('{: >f}{: >f}', ' 1.025 1.033', extra_types=dict(f=BlankFloat))
ValueError: could not convert string to float: '.025 1.033'
The type converter function signature should be changed to:
def type_converter(text, match=None, match_start=0):
# -- NEW: match_start : int = 0,
# refers to the first group in the match object for a field where the converter is used.
pass
This change would allow to provide a generic Parser without type knowledge.
In addition, user-defined types should also provide this signature. Currently, only the first parameter is supported there which prevents to use complexes type converter cases.
Due to backward compatibility reasons, the old user-defined signature should be supported, too, at least for some time.
Hi there.
I'm having a problem with the 1.6.6 version in both python 2.7.9 and python 3.4.3.
I was able to reproduce it with the following example:
>>> import parse
>>> parse.parse("blablabla {x:d}", "blablabla 12")
<Result () {'x': 12}>
>>> parse.parse("blablabla {x:d}", "blablabla jdhhd")
>>> parse.parse("blablabla {x:d}", "blablabla cdc")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 1044, in parse
return Parser(format, extra_types=extra_types).parse(string)
File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 681, in parse
return self._generate_result(m)
File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 731, in _generate_result
m)
File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 398, in f
return sign * int(string, base)
ValueError: invalid literal for int() with base 10: ''
Is this a bug or am I missing some kind of edge case when {:d}
might match cdc
that I'm not aware of?
Thanks for your help.
Certain patterns can still result in a match even when there are unmatched braces. For example:
>>> parse("{who.txt", "hello")
<Result () {'who.tx': 'hello'}>
Even though there is no closing }
, parse
assumes the final character is the closing brace and matches the pattern accordingly. In this case, I'd expect parse
to return None
since there is no direct match.
Example:
>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> parse(pattern, data).named
{'dir1': 'root', 'dir2': 'parent/subdir'}
But {'dir1': 'root/parent', 'dir2': 'subdir'}
is also fitting the pattern. Is this behaviour is reliable, or should it be considered implementation detail? I couldn't find it in the docs anywhere.
Is there anyway to coerce the result one way or the other?
Mostly just curious, not an actual bug.
I'm working on a (tangentially) related idea of mine to rewrite format strings.
https://docs.python.org/3/library/string.html#string.Formatter.parse
I don't know if this is a bug or just me but here is my use case. I have two items that are concatenated in a filename : a date (YYYMMDD) and a 2-digit string (model_run). I can't find a way to specify a pattern that will allow parsing of the two concatenated elements :
>>> filename_pattern = 'Some_string_{wx_variable}_ps2.5km_{YYYYMMDD}{model_run}_P{forecast_hour}-00.extension'
>>> afile = 'Some_string_A_variable_ps2.5km_1999121100_P012-00.extension'
>>> r = parse(filename_pattern, afile)
>>> r
<Result () {'forecast_hour': '012', 'wx_variable': 'A_variable', 'model_run': '999121100', 'YYYYMMDD': '1'}>
>>> r.named['YYYYMMDD']
'1'
>>> r.named['model_run']
'999121100'
How can I get the parsed items I expect, i.e. r.named['YYYYMMDD'] = '19991211'
and r.named['model_run'] = '00'
?
I would like to avoid fiddling with the filename_pattern before parsing (e.g. add some format options like width
, if that is even possible) because that would defeat the purpose of having a file pattern to begin with IMO.
Thanx !
Literal text with a |
symbol in it is not handled correctly:
>>> search('| {:d}', '| 10')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 1041, in search
return Parser(format, extra_types=extra_types).search(string, pos, endpos)
File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 678, in search
return self._generate_result(m)
File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 699, in _generate_result
fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 375, in f
if string[0] == '-':
TypeError: 'NoneType' object has no attribute '__getitem__'
Escaping the |
character explicitly makes it work:
>>> search('\\| {:d}', '| 10')
<Result (10,) {}>
Looking over the code base it should be trivial to fix by adding |
to he REGEX_SAFETY
pattern. However, I do wonder why re.escape()
isn't used instead to escape regular expression metacharacters instead here. Am I missing something, does re.escape()
escape too much?
For some reason, parsing certain hex values with leading zeroes produces buggy and unreliable results.
Here's a quick demonstration using the latest version of parse (1.12.0):
>>> from parse import parse
>>> parse.parse('${:x}','$0b67')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B67')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B6')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B')
<Result (11,) {}>
>>> parse.parse('${:x}','$B67')
<Result (2919,) {}>
It appears that parse() recognizes, but fails to parse certain combinations of hex digits, returning zero instead.
Interestingly, whether parsing fails depends on the digit next to zero.
Based on my testing, it always happens with numbers starting with "0Bxxx" (excluding "0B"; I know there was a separate [closed] bug report on that one, but it appears that the underlying issue is still there).
import parse
def a(a):
return a
a.pattern = '((3))'
parse.parse('{:a}q{:ti}', '3q2017-12-31', dict(a=a))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-9cc16e0f1e39> in <module>()
3 return a
4 a.pattern = '((3))'
----> 5 parse.parse('{:a}q{:ti}', '3q2017-12-31', dict(a=a))
~/miniconda3/lib/python3.6/site-packages/parse.py in parse(format, string, extra_types, evaluate_result)
1115 In the case there is no match parse() will return None.
1116 '''
-> 1117 return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
1118
1119
~/miniconda3/lib/python3.6/site-packages/parse.py in parse(self, string, evaluate_result)
697
698 if evaluate_result:
--> 699 return self.evaluate_result(m)
700 else:
701 return Match(self, m)
~/miniconda3/lib/python3.6/site-packages/parse.py in evaluate_result(self, m)
764 for n in self._fixed_fields:
765 if n in self._type_conversions:
--> 766 fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
767 fixed_fields = tuple(fixed_fields[n] for n in self._fixed_fields)
768
~/miniconda3/lib/python3.6/site-packages/parse.py in date_convert(string, match, ymd, mdy, dmy, d_m_y, hms, am, tz, mm, dd)
482 d=groups[dd]
483 elif ymd is not None:
--> 484 y, m, d = re.split('[-/\s]', groups[ymd])
485 elif mdy is not None:
486 m, d, y = re.split('[-/\s]', groups[mdy])
ValueError: not enough values to unpack (expected 3, got 1)
This doesn't work:
>>> parse.parse('on {date:ti}', 'on 2012-09-17')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build\bdist.win-amd64\egg\parse.py", line 849, in parse
File "build\bdist.win-amd64\egg\parse.py", line 526, in parse
File "build\bdist.win-amd64\egg\parse.py", line 572, in _generate_result
KeyError: 'date'
This does work:
>>> parse.parse('on {:ti}', 'on 2012-09-17')
<Result (datetime.datetime(2012, 9, 17, 0, 0),) {}>
Doesn't matter what name you use or what else is in the pattern, it always throws a KeyError.
While trying to parse a list of HID usages, I have a field that is represented by '0B' that I'd like to convert to the value 11.
When parsing the values, I am using parse.parse('{value:x}\t{name}', line)
, so I am specifying that I want the int value to be an hex. However, I am hitting https://github.com/r1chardj0n3s/parse/blob/master/parse.py#L441 (in int_convert
), and parse decides that my hex value is a base 2 one, and returns 0
.
One solution could be to enforce the size to be at least 3 in int_convert if the value starts with a '0' and a known prefix. But I think if the users provides the base for the conversion, the int_convert
function should not try to be smart and simply use the provided base.
Just got bitten by this, I think it's a bug...
>>> parse('wat {:d} wat', 'wat 12345 wat')
<Result (12345,) {}>
>>> parse('wat {:d} wat', 'wat 12f45 wat')
<Result (1245,) {}>
>>> parse('wat {:d} wat', 'wat 12g45 wat')
>>> parse('wat {:d} wat', 'wat ff3ff wat')
<Result (3,) {}>
>>> parse('wat {:d} wat', 'wat fffff wat')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-91053c27d470> in <module>()
----> 1 parse('wat {:d} wat', 'wat fffff wat')
/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in parse(format, string, extra_types, evaluate_result)
1113 In the case there is no match parse() will return None.
1114 '''
-> 1115 return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
1116
1117
/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in parse(self, string, evaluate_result)
695
696 if evaluate_result:
--> 697 return self.evaluate_result(m)
698 else:
699 return Match(self, m)
/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in evaluate_result(self, m)
762 for n in self._fixed_fields:
763 if n in self._type_conversions:
--> 764 fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
765 fixed_fields = tuple(fixed_fields[n] for n in self._fixed_fields)
766
/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in f(string, match, base)
412 chars = CHARS[:base]
413 string = re.sub('[^%s]' % chars, '', string.lower())
--> 414 return sign * int(string, base)
415 return f
416
ValueError: invalid literal for int() with base 10: ''
The suggested syntax from jenisys is to suffix the type with "?" which I believe is reasonable. Thus:
{honorific:s?} {given:s} {sur:s}
would match both of:
"Mr Richard Jones"
"Jens Engels"
The "honorific" element in the result object would have the value None.
Currently, the Parser class always applies "re.IGNORECASE" internally where needed. This may not be always desired. Therefore, it would be best, if the constructor of Parser would allow to provide own "re flags" or disable the "re.IGNORECASE" flag.
The "setup.py" file currently still states that python2.5 and python2.6 are supported.
This may be true, but can currently not be proven because the test suite contains at least tests that run only on python2.7 and newer.
EXAMPLE:
$ pytest
platform ... -- Python 2.6.9, pytest-3.2.5, ...
...
self = <test_parse.TestParseType testMethod=test_decimal_value>
def test_decimal_value(self):
value = Decimal('5.5')
> str_ = 'test {}'.format(value)
E ValueError: zero length field name in format
NOTE: string-format without index or named args work only for Python2.7.x or newer AFAIK.
POSSIBLE SOLUTIONS:
When using the :ta parsing format, the hour between Noon and 13:00 (aka 1:00PM) generates a ValueError in datetime because hour must be in the range 0..23.
Example:
parse.version
'1.6.2'parse.parse('Meet at {:tg}', 'Meet at 1/2/2011 1:00 PM')
<Result (datetime.datetime(2011, 2, 1, 13, 0),) {}>
example from the documentation. Now, changing this to 12:45 pm doesn't work.
parse.parse('Meet at {:tg}', 'Meet at 1/2/2011 12:45 PM')
Traceback (most recent call last):
File "", line 1, in
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 983, in parse
return Parser(format, extra_types=extra_types).parse(string)
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 640, in parse
return self._generate_result(m)
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 678, in _generate_result
fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 518, in date_convert
d = datetime(y, m, d, H, M, S, u, tzinfo=tz)
ValueError: hour must be in 0..23
Would be great to be able to indicate a match at beginning or end of string. E.g. if a pattern matches some records at the beginning and other records in the middle but you only want to target those at the beginning of the string, I don’t see an easy way to do that currently.
the AM/PM are optional, and if PM is found then 12 hours will be added to the datetime object's hours amount - even if the hour is greater than 12 (for consistency.)
I realize you've already chosen your poison here, but the "add 12 always" rule either needs an explicit exception, or it should just do away with the "add 12 to PM values." Why?
noon is 12:00PM. 15 minutes after noon is 12:15PM. 12 should NOT be added to these value, or they'll register as later than e.g. 9:24PM.
In fact, 12AM (and 12:15AM) should result in a SUBTRACTION of 12 hours, as it occurs prior to 1AM on the given day.
Both issues can be worked around by assuming in the presence of AM/PM indicators, a subtraction of 12 is done. So, the later adding of 12 hours for having "PM" would restore the timeline.
Silly ancient peoples not inventing the concept of "0". Silly us for continuing to stick with a counter-intuitive notation.
We are installing the parse as the dependency of the behave
And this is the traceback that we getting during the installation.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-sme0se0x/parse/setup.py", line 10, in <module>
f.write(__doc__)
UnicodeEncodeError: 'ascii' codec can't encode character '\xeb' in position 13017: ordinal not in range(128)
So.... you didn't list an escape character... so if my string contains "{", it just doesn't work?
GREAT work!
It'd be neat if it could parse:
...?
There are some edge cases that this module does not cover, and rather than recreating the wheel I would like to discuss a method to support datetime.strftime
directives.
The basic strategy I am imagining would be to preprocess the given string to replace these directives with appropriate format definitions from a hard-coded table so that they are loaded into the named
set. These values can then be used to set a datetime on the named
set after everything is parsed.
Walkthrough example:
FMT_STR="string with {stuff}, {}, and strftime directives like %Y, %d, and %b"
parse(FMT_STR, "string with myStuff, also_this, and strftime directives like 2018, 03 and Feb").named
>> {
"stuff": "myStuff",
"__Y": 2018,
"__b": "Feb",
"__d": 3,
"__datetime": datetime(2018, 2, 3)
}
In the above example the format string would be pre-parsed into something like :
"string with {stuff}, {}, and strftime directives like {:4d}, {:2d}, and {:3w}"
using a mapping like:
map = {
"%Y": "{:4d}",
"%d": "{:2d}",
"%b": "{:3w}"
}
for directive, fmt in map.items():
string = string.replace(directive, fmt)
Does this seem reasonable? I may try an implementation unless there are potential issues with this I am overlooking.
The current "parse" module has as small deficiency (or bug).
When user-defined type converter uses regular expression grouping in its pattern (attribute), the extracted result parameters are partly wrong in some params because this group index offset is not considered.
NOTE: This problem occurs only for fixed (unnamed) fields, named fields are OK.
# FILE: parse.py
# NECESSARY CHANGES:
…
# -- Parser._handle_field()
...
if type in self._extra_types:
type_converter = self._extra_types[type]
s = getattr(type_converter, 'pattern', r'.+?')
# -- EXTENSION: group_count attribute
group_count = getattr(type_converter, 'group_count', 0)
self._group_index += group_count
# -- EXTENSION-END
I think this is more of a question, but it may be an issue as well.
If I format the following format spec string with values
:
fmat='{}{}'
values=['a','b']
I of course get this result:
>>> fmat.format(*values)
'ab'
And parse
handles this as expected.
>>> list(parse(fmat,'ab'))
['a', 'b']
However, I could get the same result by supplying these arguments (the final arg just being an empty string):
>>> values=['ab','']
>>> fmat.format(*values)
'ab'
The default parse
behavior becomes a bit more clear when I do this:
>>> list(parse('{}{}', 'abcdef'))
['a', 'bcdef']
So it seems that the format fields "eat" as little as possible. This definitely makes sense as a default behavior.
If I naively supply the optional third argument to parse
, in an attempt to signal that a field should be as hungry as possible and "eat" any string (str
) that it finds (if it can), I get the same result:
>>> list(parse('{:s}{:s}', 'abcdef', dict(s=str))
['a', 'bcdef'] # rather than ['abcdef', '']
Any ideas on how to get the last argument to be the empty string using the existing API?
I do understand that an option like this would be tough to implement. For example: how should this be handled?
>>> list(parse('{:s}{:s}{:s}{:d}{:d}', 'abc123', dict(s=str), hungry=True))
Should it return None
, e.g.?:
['abc123','','', ERROR, ERROR] # errors because no integers to eat after first string eats everything
Or should it pick out the integers first and leave the leftovers for the other fields (i.e., integers are hungrier than strings)?
['abc','','', 12, 3]
(NOTE in the last example the integers would also be "hungry", but not hungry enough to cause an error; i.e., integers are hungrier than strings, but not more hungry than each other.)
fixed point numbers should return a Decimal
format f
is currently used for fixed point numbers, and returns a float. Changing that would break many things, so suggest this new mode is given the upper case letter F
.
Result.__getitem__
is implemented to perform lookup using index or key, but Result
doesnt implement __contains__
and doesnt inherit from abc.Mapping
As a result, the following fails:
if 'foo' in result:
blah['foo'] = result['foo']
instead the following needs to be used:
try:
blah['foo'] = result['foo']
except KeyError:
pass
Currently only English days/months are recognised. Could be better.
Currently parse uses logging.debug()
; this sends the message to the root logger making it hard to filter. It would be great if parse could use logging.getLogger(__name__).debug()
or log.debug()
(log is assigned near the start but never used).
Thanks for that great module!
Is there a way to search for a string but not generate the results?
This would be especially useful when I have custom type converters but want to evaluate those later. I just want to make search my string is valid.
Consider the patterns:
a = "{hello} world"
b = "hello world"
parse(a, 'well hello there world') # matches
parse(b, 'well hello there world') # fails
Is there a way to get a to fail without specifying custom formats?
Alternatively, is there a way to override the default format type/matching behavior?
The format field names can have element indexes. See the python documentation.
field_name ::= arg_name ("." attribute_name | "[" element_index "]")*
arg_name ::= [identifier | integer]
attribute_name ::= identifier
element_index ::= integer | index_string
But parse doesn't support it. Example:
>>> "test: {dict[0]}".format(dict=["red"])
'test: red'
>>> "test: {dict[color0]}".format(dict={"color0":"red"})
'test: red'
>>> parse.parse("test: {dict}", "test: blue")
<Result () {'dict': 'blue'}>
>>> parse.parse("test: {dict[0]}", "test: blue")
None
# return must be <Result () {'dict[0]': 'blue'}>
>>> parse.parse("test: {dict[color0]}", "test: blue")
None
# return must be <Result () {'dict[color0]': 'blue'}>
The following works: {field:%}
but adding a decimal limit like {field:.2%}
does not and throws an exception:
File "/usr/local/lib/python3.2/dist-packages/parse.py", line 983, in parse
return Parser(format, extra_types=extra_types).parse(string)
File "/usr/local/lib/python3.2/dist-packages/parse.py", line 586, in __init__
self._expression = self._generate_expression()
File "/usr/local/lib/python3.2/dist-packages/parse.py", line 717, in _generate_expression
e.append(self._handle_field(part))
File "/usr/local/lib/python3.2/dist-packages/parse.py", line 770, in _handle_field
format = extract_format(format, self._extra_types)
File "/usr/local/lib/python3.2/dist-packages/parse.py", line 562, in extract_format
raise ValueError('type %r not recognised' % type)
Unsure whether this is a bug or intended.
Thanks for the extremely quick fix on the logging issue by the way!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.