taocpp / pegtl
Parsing Expression Grammar Template Library
License: Boost Software License 1.0
There seems to be some internal wrapping mechanism when working with large input strings. I see several lines in the trace which look like this:
pegtl: success flags 1 rule 1 nest 1 at 15,23 expression ...
pegtl: start flags 2 rule 1 nest 1 at 1,1 expression ...
I'm occasionally experiencing a bug where a rule which works otherwise doesn't work immediately following one of these discontinuities. I can try to work up a minimal example if that would help.
PEGTL uses the MAP_FILE compatibility flag; according to the Linux mmap(2) man page, this flag is ignored. Worth removing?
Not sure whether this is an upstream clang bug or a bug in the library, but either way it doesn't build.
System: ArchLinux CURRENT
$ clang++ -v
clang version 4.0.0 (tags/RELEASE_400/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-pc-linux-gnu/7.1.1
Found candidate GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1
Found candidate GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1
Found candidate GCC installation: /usr/lib64/gcc/x86_64-pc-linux-gnu/7.1.1
Selected GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc-multilib/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 7.1.1 20170528 (GCC)
// test.cpp
#include <tao/pegtl.hpp>
int main() {
return 0;
}
-std= | g++ | clang++ |
---|---|---|
c++11 | ✅ | ✅ |
c++14 | ✅ | ✅ |
c++1z | ✅ | ❌ |
c++17 | ✅ | (N/A) |
clang++ -std=c++1z test.cpp:
In file included from test.cpp:1:
In file included from PEGTL/include/tao/pegtl.hpp:10:
In file included from PEGTL/include/tao/pegtl/ascii.hpp:8:
In file included from PEGTL/include/tao/pegtl/eol.hpp:28:
In file included from PEGTL/include/tao/pegtl/internal/eol.hpp:11:
In file included from PEGTL/include/tao/pegtl/internal/../analysis/generic.hpp:9:
In file included from PEGTL/include/tao/pegtl/internal/../analysis/grammar_info.hpp:7:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/map:60:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/bits/stl_tree.h:72:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/bits/node_handle.h:39:
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:1032:27: error: use of class template 'optional' requires template arguments
template <typename _Tp> optional(_Tp) -> optional<_Tp>;
^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:451:11: note: template is declared here
class optional
^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:1032:40: error: expected ';' at end of declaration
template <typename _Tp> optional(_Tp) -> optional<_Tp>;
^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:1032:41: error: cannot use arrow operator on a type
template <typename _Tp> optional(_Tp) -> optional<_Tp>;
^
3 errors generated.
Thank you for maintaining PEGTL.
My understanding is that file_input and read_input currently don't work with Unicode filenames on Windows, because internal::file_reader uses fopen(_s) rather than _wfopen(_s).
If that's true, this is a feature request to add support for Unicode filenames.
Hello again,
I get an error that I don't understand while compiling the example s_expression_2:
c++ -I. -std=c++11 -stdlib=libc++ -pedantic -Wall -Wextra -Werror -O3 examples/s_expression_2.cc -o build/examples/s_expression_2
examples/s_expression_2.cc:74:32: error: no matching member function for call to 'parse'
read_parser( fn, in ).parse< main, action >( f2 );
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
./pegtl/read_parser.hh:39:12: note: candidate template ignored: invalid explicitly-specified argument
for template parameter 'Action'
void parse( States && ... st )
^
1 error generated.
make: *** [build/examples/s_expression_2] Error 1
pegtl::input does not seem to exist.
GCC does not support _assume.
Is there any chance to enable actions in the at rule?
I have tried to use enable< at<my_rule> > and at< enable<my_rule> >, but for both grammar expressions the corresponding action is never called...
The calculator does not yield the same result for 100-100x3+111 (which gives -311) and for 100-(100x3)+111 (which gives -89).
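The two results are consistent with the + and - chain being folded from the right instead of from the left. A minimal sketch of that difference, independent of PEGTL and of the calculator example; apply_op, eval_left and eval_right are hypothetical helpers, and the operands stand for 100 - 300 + 111 after the multiplication has been reduced:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers, not part of PEGTL: evaluate a chain
// v0 op0 v1 op1 v2 ... where each op is '+' or '-'.
static long apply_op(long lhs, char op, long rhs) {
    return op == '+' ? lhs + rhs : lhs - rhs;
}

// Left-associative fold: ((v0 op0 v1) op1 v2) ...
long eval_left(const std::vector<long>& v, const std::vector<char>& ops) {
    long acc = v.front();
    for (std::size_t i = 0; i < ops.size(); ++i) {
        acc = apply_op(acc, ops[i], v[i + 1]);
    }
    return acc;
}

// Right-associative fold: v0 op0 (v1 op1 (v2 ...))
long eval_right(const std::vector<long>& v, const std::vector<char>& ops) {
    long acc = v.back();
    for (std::size_t i = ops.size(); i-- > 0;) {
        acc = apply_op(v[i], ops[i], acc);
    }
    return acc;
}
```

With the operands {100, 300, 111} and the operators {'-', '+'}, the left fold gives -89 and the right fold gives -311, which matches the two values reported above.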
Hi, I appear to have found a slight bug in internal::file_reader.
If a file with multiple newlines is created on Windows, file_reader fails with "unable to fread() file size errno 0". Creating the same file on Linux, or converting with 'dos2unix', causes file_reader to successfully parse the file. Conversely, a single line Windows file will also parse successfully. The core of the problem seems to be due to Windows using "\r\n" to indicate newlines, or more specifically, the different ways fseek/ftell and fread count the "\r\n" sequence.
In file_reader::read(), fread is given the number of characters it is expected to read. That number is calculated in file_reader::size(), which uses fseek/ftell to count the number of characters in the file (counting "\r\n" as two characters). However, std::fread automatically converts all "\r\n" to '\n', causing it to actually read in a smaller memory chunk than told. Since it is unable to read in the specified number of characters, it returns 0, causing the if check to fail and throwing the error shown above. In all other respects, the read completed successfully.
Also, if you stop execution right before the throw and look at the constructed string, the last N characters of the string are '\0', with N being the exact number of newlines in the file.
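One common remedy, assuming the file is currently opened in text mode, is to open it in binary mode so that fseek/ftell and fread count the same bytes. The sketch below is not PEGTL's file_reader; read_whole_file and file_closer are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Closes the FILE* when the owning unique_ptr goes out of scope.
struct file_closer {
    void operator()(std::FILE* f) const { std::fclose(f); }
};

// Hypothetical helper: read a whole file into a string.
std::string read_whole_file(const char* path) {
    // "rb" disables newline translation, so "\r\n" stays two bytes
    // for both ftell() and fread().
    std::unique_ptr<std::FILE, file_closer> f(std::fopen(path, "rb"));
    if (!f) throw std::runtime_error("unable to fopen() file");
    if (std::fseek(f.get(), 0, SEEK_END) != 0) throw std::runtime_error("unable to fseek() to end");
    const long size = std::ftell(f.get());
    if (size < 0) throw std::runtime_error("unable to ftell() file size");
    if (std::fseek(f.get(), 0, SEEK_SET) != 0) throw std::runtime_error("unable to fseek() to start");
    std::string out(static_cast<std::size_t>(size), '\0');
    if (size > 0 && std::fread(&out[0], 1, out.size(), f.get()) != out.size()) {
        throw std::runtime_error("unable to fread() file");
    }
    return out;
}
```

The trade-off is that the grammar then sees "\r\n" unchanged and has to accept it as a line ending itself.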
I have been using parse_tree.hpp to generate ASTs but there are some rules that always get duplicated in the child nodes. Two consecutive nodes represent exactly the same node.
Is that normal? Am I doing something wrong?
Example:
std::unique_ptr<parse_tree::node> ast = parse_tree::parse<pop::grammar, pop::store>(in);
print_node( *ast );
Output (summarized):
ROOT
pop::cpp_function at /LevyProblem.pop:15:0(220)
pop::function_keyword "minimize" at /LevyProblem.pop:15:0(220) to /LevyProblem.pop:15:8(228)
pop::var_name "levyFunction" at /LevyProblem.pop:15:9(229) to /LevyProblem.pop:15:21(241)
pop::cpp_brackets "{ expression; }" at /LevyProblem.pop:15:24(244) to /LevyProblem.pop:38:1(948)
pop::cpp_function at /LevyProblem.pop:15:0(220)
pop::function_keyword "minimize" at /LevyProblem.pop:15:0(220) to /LevyProblem.pop:15:8(228)
pop::var_name "levyFunction" at /LevyProblem.pop:15:9(229) to /LevyProblem.pop:15:21(241)
pop::cpp_brackets "{ expression; }" at /LevyProblem.pop:15:24(244) to /LevyProblem.pop:38:1(948)
Is it possible to raise a global error from an action?
E.g., a grammar accepts 1..3 digits, but there's an additional constraint on the value of these digits: a TTL can be 0..255 but not 256 or 999, which the grammar itself accepts.
In the action I can add a range check, but how do I trigger an error?
I tried to throw tao::pegtl::parse_error, but it doesn't compile because:
const class tao::pegtl::internal::action_input<tao::pegtl::lf_crlf_eol, (tao::pegtl::tracking_mode)1u>' has no member named 'position'
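As a library-independent sketch of the range check itself: the function below stands in for the body of an action and throws a standard exception, on the assumption that an exception thrown inside an action propagates out of the parse call. parse_ttl is a hypothetical name, not PEGTL API, and it does not address how to attach a position to the error:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an action body; not PEGTL API.
// Assumes the grammar has already guaranteed 1 to 3 decimal digits.
unsigned parse_ttl(const std::string& digits) {
    unsigned value = 0;
    for (char c : digits) {
        value = value * 10 + static_cast<unsigned>(c - '0');
    }
    // The grammar accepts 000..999; the semantic limit is 0..255.
    if (value > 255) {
        throw std::out_of_range("TTL out of range (0..255): " + digits);
    }
    return value;
}
```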
Would you like to replace any double quotes by angle brackets around file names for include statements?
Hi !
I was unable to compile on OSX because the compiler could not find the type_traits header; it needs the -stdlib option set to libc++ to look in the right places.
I suggest adding it to the Makefile, as I've done, if it doesn't break compilation on other OSs.
~> cat q.abnf
quoted-pair = "\" (%x00-09 / %x0B-0C / %x0E-7F)
~> ./abnf2pegtl q.abnf
struct quoted_pair : pegtl::seq< pegtl::one< '\' >, pegtl::sor< pegtl::range< 0x00, 0x09 >, pegtl::range< 0x0B, 0x0C >, pegtl::range< 0x0E, 0x7F > > > {};
Note that "\" from the ABNF was converted to '\' with just one backslash, i.e. a malformed char literal.
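A minimal sketch of the escaping a generator needs when it emits a character as a C++ char literal. to_char_literal is a hypothetical helper and is not taken from abnf2pegtl; it only handles the two characters that must be escaped inside single quotes:

```cpp
#include <string>

// Hypothetical helper: render a character as C++ char literal source text.
std::string to_char_literal(char c) {
    switch (c) {
        case '\\': return "'\\\\'";  // emits '\\' (escaped backslash)
        case '\'': return "'\\''";   // emits '\'' (escaped single quote)
        default:   return std::string("'") + c + "'";
    }
}
```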
Hi,
I found that discard breaks input matching in the action. This seems to be a bug? Below you can find a test case that shows two behaviours. I'd have expected either to work.
The guts of it is in this rule:
struct word1 : seq<discard, bytes<4>> {};
In the action, I try to match the four-byte input string, but find that the discard (which should throw away the three preceding characters) throws away the first three characters of bytes<4>.
$ ./discard-test
Unexpected:
INPUT: 3/3: one
INPUT: 4/4: vase
WORD: e
INPUT: 3/3: two
INPUT: 4/4: pots
WORD: s
INPUT: 0/3:
Expected:
INPUT: 3/3: one
INPUT: 4/4: vase
WORD: vase
INPUT: 3/3: two
INPUT: 4/4: pots
WORD: pots
INPUT: 0/3:
#include <tao/pegtl.hpp>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
using namespace tao::pegtl;
struct word1 : seq<discard, bytes<4>> {};
struct grammar1 : star<discard, bytes<3>, word1> {};
struct word2 : seq<bytes<4>> {};
struct grammar2 : star<discard, bytes<3>, discard, word2> {};
template <typename Rule>
struct xaction : nothing<Rule> {};
template <>
struct xaction<word1> {
template <typename Input>
static void apply(const Input& in) {
std::string str = in.string();
std::cerr << "WORD: " << str << "\n";
}
};
template <>
struct xaction<word2> : xaction<word1> {};
int main(int argc, char* argv[]) {
std::stringstream data;
data << "onevasetwopots";
using reader_t =
std::function<std::size_t(char* buffer, const std::size_t length)>;
auto reader = [&data](char* buffer, const std::size_t length) mutable {
std::streamsize sz = data.read(buffer, length).gcount();
std::cerr << "INPUT: " << sz << "/" << length << ": ";
std::cerr.write(buffer, sz);
std::cerr << "\n";
return sz;
};
std::cerr << "Unexpected:\n";
buffer_input<reader_t> input1("reader", 1024, reader);
parse<grammar1, xaction>(input1);
std::cerr << "\nExpected:\n";
data.clear();
data.seekg(0, data.beg);
buffer_input<reader_t> input2("reader", 1024, reader);
parse<grammar2, xaction>(input2);
return 0;
}
Hi !
First and foremost, it's a real pleasure to use PEGTL, thank you !
When using string_input like:
std::string s{ "something to parse" };
tao::pegtl::string_input<> in(s);
a compiler error occurs.
Apparently, in string_input.hpp:42 the second parameter to memory_input is not what is expected.
Changing it to data.data() + data.size() fixes the problem.
Hope this helps.
Best.
I think it might be a good idea, in terms of convenience, to extend the available match rules with must_err<R, ERR>.
The thought behind that is defining an error message for an anonymous set of rules.
The implementation of must<R> is described as:
sor< R, raise< R > >
Having a scenario as following:
struct foo : seq< Spacing, string<'v', 'a', 'r'>, must< Spacing, Identifier, Spacing > > {};
I would need to define an error_control for this anonymous segment. Either that or name the sequence, which I would really like to avoid. This is also ambiguous, as Spacing, Identifier, Spacing is a very generic sequence that may need very different error messages depending on where it appears inside a rule.
What I would suggest instead is:
struct foo : seq< Spacing, string<'v', 'a', 'r'>, must_err< seq< Spacing, Identifier, Spacing >, ERR_MSG("var must be followed by an Identifier") > > {};
Where ERR_MSG( "..." ) is a macro like TAO_PEGTL_STRING( "..." ), just returning something simpler than string<> (without a match method).
must_err<> then raises a tao::pegtl::parse_error with the message ERR_MSG( "..." ). Besides overwriting the content of msg with the custom message (together with filename, line number, etc.), I would suggest that parse_error get some more methods/members that give access to:
I tried it out but as always my template programming skills are very limited.
For the sake of simplicity I extended the given string<> template with a static data() method and created my own rule must_err<>.
template< char... Cs >
struct string
{
...
static const std::string data() {
const std::initializer_list< char >& l = { Cs... };
const std::string str (l.begin(), l.end());
return str;
}
};
And copied and modified the must<> rule
template< typename Rule, class ERR_MSG = string<'f', 'o', 'o'> >
struct must_err
{
using analyze_t = typename Rule::analyze_t;
template< apply_mode A,
rewind_mode,
template< typename... > class Action,
template< typename... > class Control,
typename Input,
typename... States >
static bool match( Input& in, States&&... st )
{
//Actually a raise should happen here
std::cout << "Error: " << ERR_MSG::data() << std::endl;
return true;
}
};
Looking forward to your thoughts on this topic.
Hi,
I am currently trying to find out why my seemingly correct grammar won't parse. Here's the first issue that I don't understand: take the JSON grammar as an example and start with 16 spaces:
{
}
It fails with:
1 1 source:1:0(0) start tao::pegtl::disable<tao::pegtl::json::text>
2 2 source:1:0(0) start tao::pegtl::json::text
3 3 source:1:0(0) start tao::pegtl::star<tao::pegtl::json::ws>
4 4 source:1:0(0) start tao::pegtl::json::ws
5 4 source:1:0(0) failure tao::pegtl::json::ws
6 3 source:1:0(0) success tao::pegtl::star<tao::pegtl::json::ws>
7 5 source:1:0(0) start tao::pegtl::json::value
8 6 source:1:0(0) start tao::pegtl::sor<tao::pegtl::json::string, tao::pegtl::json::number, tao::pegtl::json::object, tao::pegtl::json::array, tao::pegtl::json::false_, tao::pegtl::json::true_, tao::pegtl::json::null>
9 7 source:1:0(0) start tao::pegtl::json::string
10 8 source:1:0(0) start tao::pegtl::ascii::one<(char)34>
11 8 source:1:0(0) failure tao::pegtl::ascii::one<(char)34>
12 7 source:1:0(0) failure tao::pegtl::json::string
13 9 source:1:0(0) start tao::pegtl::json::number
14 10 source:1:0(0) start tao::pegtl::opt<tao::pegtl::ascii::one<(char)45> >
15 11 source:1:0(0) start tao::pegtl::ascii::one<(char)45>
16 11 source:1:0(0) failure tao::pegtl::ascii::one<(char)45>
17 10 source:1:0(0) success tao::pegtl::opt<tao::pegtl::ascii::one<(char)45> >
18 12 source:1:0(0) start tao::pegtl::json::int_
19 13 source:1:0(0) start tao::pegtl::ascii::one<(char)48>
20 13 source:1:0(0) failure tao::pegtl::ascii::one<(char)48>
21 14 source:1:0(0) start tao::pegtl::json::digits
22 15 source:1:0(0) start tao::pegtl::abnf::DIGIT
23 15 source:1:0(0) failure tao::pegtl::abnf::DIGIT
24 14 source:1:0(0) failure tao::pegtl::json::digits
25 12 source:1:0(0) failure tao::pegtl::json::int_
26 9 source:1:0(0) failure tao::pegtl::json::number
27 16 source:1:0(0) start tao::pegtl::json::object
28 17 source:1:0(0) start tao::pegtl::json::begin_object
29 18 source:1:0(0) start tao::pegtl::ascii::one<(char)123>
30 18 source:1:0(0) failure tao::pegtl::ascii::one<(char)123>
31 17 source:1:0(0) failure tao::pegtl::json::begin_object
32 16 source:1:0(0) failure tao::pegtl::json::object
33 19 source:1:0(0) start tao::pegtl::json::array
34 20 source:1:0(0) start tao::pegtl::json::begin_array
35 21 source:1:0(0) start tao::pegtl::ascii::one<(char)91>
36 21 source:1:0(0) failure tao::pegtl::ascii::one<(char)91>
37 20 source:1:0(0) failure tao::pegtl::json::begin_array
38 19 source:1:0(0) failure tao::pegtl::json::array
39 22 source:1:0(0) start tao::pegtl::json::false_
40 22 source:1:0(0) failure tao::pegtl::json::false_
41 23 source:1:0(0) start tao::pegtl::json::true_
42 23 source:1:0(0) failure tao::pegtl::json::true_
43 24 source:1:0(0) start tao::pegtl::json::null
44 24 source:1:0(0) failure tao::pegtl::json::null
45 6 source:1:0(0) failure tao::pegtl::sor<tao::pegtl::json::string, tao::pegtl::json::number, tao::pegtl::json::object, tao::pegtl::json::array, tao::pegtl::json::false_, tao::pegtl::json::true_, tao::pegtl::json::null>
46 5 source:1:0(0) failure tao::pegtl::json::value
47 2 source:1:0(0) failure tao::pegtl::json::text
48 1 source:1:0(0) failure tao::pegtl::disable<tao::pegtl::json::text>
Note 5 4 source:1:0(0) failure tao::pegtl::json::ws: it fails already at the first space! If you delete one space, it works.
Using a modified version of the 'ID and sum comma-separated digits' example:
peg_test.cpp:29:68: error: template argument 3 is invalid
seq< plus< D >, opt< dot, star< D > > > > {};
^
In file included from ../inst/include/pegtl/internal/action.hpp:9:0,
from ../inst/include/pegtl/internal/rules.hpp:7,
from ../inst/include/pegtl/ascii.hpp:11,
from ../inst/include/pegtl.hpp:10,
from peg_test.cpp:3:
C++11 via GCC 4.9.3; works fine on clang, however!
Hi, everyone! Thank you very much for the absolutely amazing library!
I have encountered a problem:
I have a simple code:
#include <iostream>
#include <tao/pegtl.hpp>
#include <tao/pegtl/contrib/http.hpp>
#include <tao/pegtl/contrib/tracer.hpp>
using namespace std::string_literals;
using namespace tao::pegtl;
namespace rule {
using grammar = must<http::start_line>;
}
int main() {
auto response =
"HTTP/1.1 206 Partial content\r\n"s;
string_input<> input(response, "test");
try {
parse<rule::grammar, nothing, tracer>(input);
} catch (const std::exception &e) {
std::cerr << "\nERROR: " << e.what() << std::endl;
}
return 0;
}
It fails like this:
test:1:0(0) start tao::pegtl::must<tao::pegtl::http::start_line>
test:1:0(0) start tao::pegtl::http::start_line
test:1:0(0) start tao::pegtl::http::request_line
test:1:0(0) start tao::pegtl::http::method
test:1:0(0) start tao::pegtl::http::tchar
test:1:0(0) start tao::pegtl::abnf::ALPHA
test:1:1(1) success tao::pegtl::abnf::ALPHA
test:1:1(1) success tao::pegtl::http::tchar
test:1:1(1) start tao::pegtl::http::tchar
test:1:1(1) start tao::pegtl::abnf::ALPHA
test:1:2(2) success tao::pegtl::abnf::ALPHA
test:1:2(2) success tao::pegtl::http::tchar
test:1:2(2) start tao::pegtl::http::tchar
test:1:2(2) start tao::pegtl::abnf::ALPHA
test:1:3(3) success tao::pegtl::abnf::ALPHA
test:1:3(3) success tao::pegtl::http::tchar
test:1:3(3) start tao::pegtl::http::tchar
test:1:3(3) start tao::pegtl::abnf::ALPHA
test:1:4(4) success tao::pegtl::abnf::ALPHA
test:1:4(4) success tao::pegtl::http::tchar
test:1:4(4) start tao::pegtl::http::tchar
test:1:4(4) start tao::pegtl::abnf::ALPHA
test:1:4(4) failure tao::pegtl::abnf::ALPHA
test:1:4(4) start tao::pegtl::abnf::DIGIT
test:1:4(4) failure tao::pegtl::abnf::DIGIT
test:1:4(4) start tao::pegtl::ascii::one<(char)33, (char)35, (char)36, (char)37, (char)38, (char)39, (char)42, (char)43, (char)45, (char)46, (char)94, (char)95, (char)96, (char)124, (char)126>
test:1:4(4) failure tao::pegtl::ascii::one<(char)33, (char)35, (char)36, (char)37, (char)38, (char)39, (char)42, (char)43, (char)45, (char)46, (char)94, (char)95, (char)96, (char)124, (char)126>
test:1:4(4) failure tao::pegtl::http::tchar
test:1:4(4) success tao::pegtl::http::method
test:1:4(4) start tao::pegtl::abnf::SP
test:1:4(4) failure tao::pegtl::abnf::SP
ERROR: test:1:4(4): parse error matching tao::pegtl::abnf::SP
As far as I understand, it tries to parse the first alternative of the rule struct start_line : sor< request_line, status_line > {};, which is request_line, but I'm actually parsing a status_line. If I'm not mistaken, it should try the second sor alternative when the first one fails, because of the "OR" nature of the rule, but it just fails at the first one.
If my understanding is wrong, what's the problem with that code?
I'm writing a parser for the Verilog language using PEGTL and wanted to share a problem I'm running into and a possible solution. This is my first time using PEG-style parsing, although I am far from a novice at parsing, having done work for my PhD thesis on approximate parsing in a network security context (google "flowsifter").
I can appreciate the simplicity and elegance (and incredible run-time efficiency) of having actions associated with each rule run when that rule is matched; this combines with the control system to make a very powerful way to write code that gets triggered by parsing. But I'm particularly annoyed at how unnecessarily difficult it is to write side-effect heavy code that works correctly without these pieces of code having a hierarchical context to work in.
In the best parsers I've worked with (and written), when part of the grammar matches, it can return a value to the next level up, and that value can be used as part of constructing the result of that higher-level rule's parse result. This ends up resulting in a sequence of function calls that are nested in exactly the same structure as the parse tree of the text being parsed. In PEGTL, I can decide exactly what code runs when a particular rule matches, but I get no nesting of function calls, I only get a flat space of code executions. Of course it's possible to implement that hierarchy by using an explicit stack and pushing and popping from the stack as rules are matched, but this has a high degree of complexity and is prone to user error. The expression parser example in this repository even goes as far as using a stack of stacks to handle parentheses in expressions.
There's got to be a reasonable way to stitch together rules and actions in such a way that actions can return a value and the action for a rule receives/can access values produced by child rules. Maybe this will require each child rule to have an action that returns a value; that seems a reasonable price to pay for this feature.
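For reference, the explicit-stack workaround mentioned above can be sketched without PEGTL as follows. value_stack, on_number and on_sum are hypothetical names standing in for a shared state object and two actions; the point is only that a parent's action consumes what its children's actions pushed:

```cpp
#include <string>
#include <vector>

// Hypothetical state object shared by all actions; not part of PEGTL.
struct value_stack {
    std::vector<long> values;

    // Action for a leaf rule (e.g. a number): push the value it produced.
    void on_number(const std::string& text) {
        values.push_back(std::stol(text));
    }

    // Action for a parent rule (e.g. a binary sum): pop the two child
    // results and push the combined result for the next level up.
    void on_sum() {
        const long rhs = values.back();
        values.pop_back();
        const long lhs = values.back();
        values.pop_back();
        values.push_back(lhs + rhs);
    }
};
```

The error-prone part is visible even at this size: on_sum silently assumes that exactly two child values are on the stack, which is the kind of bookkeeping a return-value mechanism would make unnecessary.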
Hello
I want to define rules in plain files and parse input accordingly.
Is this possible? If not, please consider adding it.
Allowing
pegtl::string<"Hello, ">
would obviously be simpler than
pegtl::string<'H', 'e', 'l', 'l', 'o', ',', ' '>
but implementing this is a chore. I am not versed enough in C++ to know if upcoming standard versions make it simpler, but for now I found a workaround with boost. This issue is a placeholder so others can find it until pegtl supports it natively (so feel free to close the issue if it is not a priority for now).
#include <boost/metaparse/string.hpp>
template<typename T>
struct literal_string {};
template<char... Cs>
struct literal_string<boost::metaparse::string<Cs...> > : pegtl::string<Cs...> {};
#define STRING(str) literal_string< BOOST_METAPARSE_STRING(str) >
struct prefix
: STRING("Hello, ")
{};
The example grammar in double.hh requires a dot in the input, so it won't parse 2 or 2e0.
(Is this an intentional limitation? numeral from lua53_parse.cc handles it correctly.)
Here is what the sum.cc program shows:
$ ./sum
Give me a comma separated list of numbers.
The numbers are added using the PEGTL.
Type [q or Q] to quit
2.0
parsing OK; sum = 2
2.
parsing OK; sum = 2
2.e0
parsing OK; sum = 2
2e0
parsing failed
2
parsing failed
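For comparison, here is a hand-written recognizer in which both the fraction and the exponent are optional, so that all five inputs above are accepted. It is a sketch of the desired behaviour only: is_decimal is a hypothetical function, it is not the grammar from double.hh or lua53_parse.cc, and it deliberately ignores signs, a leading dot, and special values:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical recognizer for:
//   digits+ [ '.' digits* ] [ ('e' | 'E') ['+' | '-'] digits+ ]
bool is_decimal(const std::string& s) {
    std::size_t i = 0;
    // Consumes a run of digits and returns how many were consumed.
    auto digits = [&]() {
        const std::size_t start = i;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
        return i - start;
    };
    if (digits() == 0) return false;  // integer part is required
    if (i < s.size() && s[i] == '.') {  // optional fraction
        ++i;
        digits();
    }
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {  // optional exponent
        ++i;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
        if (digits() == 0) return false;  // exponent needs at least one digit
    }
    return i == s.size();  // no trailing characters allowed
}
```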
[ 22%] Building CXX object src/test/pegtl/CMakeFiles/internal_file_opener.dir/internal_file_opener.cpp.o
cd /var/tmp/portage/dev-libs/pegtl-2.1.3/work/pegtl-2.1.3_build/src/test/pegtl && /usr/lib64/ccache/bin/x86_64-pc-linux-gnu-g++ -I/var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/include -DNDEBUG -O2 -pipe -march=native -fomit-frame-pointer -pedantic -Wall -Wextra -Wshadow -Werror -std=c++11 -o CMakeFiles/internal_file_opener.dir/internal_file_opener.cpp.o -c /var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/src/test/pegtl/internal_file_opener.cpp
/var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/src/test/pegtl/contrib_raw_string.cpp:59:42: error: declaration of template parameter ‘Rule’ shadows template parameter
template< typename Rule, template< typename Rule > class Action, unsigned M, unsigned N >
^~~~~~~~
/var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/src/test/pegtl/contrib_raw_string.cpp:59:17: note: template parameter ‘Rule’ declared here
template< typename Rule, template< typename Rule > class Action, unsigned M, unsigned N >
^~~~~~~~
make[2]: *** [src/test/pegtl/CMakeFiles/contrib_raw_string.dir/build.make:63: src/test/pegtl/CMakeFiles/contrib_raw_string.dir/contrib_raw_string.cpp.o] Error 1
Details here: build.txt
Hey Colin, great library! I'm looking forward to using it. I actually got here by googling C++ CSV Parser, which took me here where you wrote:
CSV isn't a precisely defined format; on multiple occasions I slapped together a simple CSV parser for whatever file format was thrown my way with our PEGTL parser library (PEG based, C++11, header-only, production quality, small and light - and with documentation).
Unfortunately, I didn't see a CSV parser in your examples folder, and I was hoping you could add one so that I could see how it would be done idiomatically with PEGTL.
I want a parser that translates the syntax introduced in Bryan Ford's paper into PEGTL, as I find this syntax very readable. To do so, I am trying to use the parse_tree.h provided with this library. My approach was to use a custom node struct that implements a to_String method like this:
struct node {
std::vector< std::unique_ptr< node > > children;
std::string str_before, str_after;
std::string to_String()
{
std::string result = str_before;
for(auto it = children.begin(); it != children.end();) {
std::string child_str = it->get()->to_String();
result += child_str;
if(++it != children.end() && child_str != "") {
result += ", ";
}
}
return result += str_after;
}
...
};
To make the actual magic happen I thought to use the transform function like so:
template< typename > struct store : std::false_type {
};
template<> struct store< Sequence > : std::true_type {
static void transform(std::unique_ptr< PEG2PEGTL::node >& n)
{
n.get()->str_before = "sor< ";
n.get()->str_after = " >";
}
};
And it works for me until it gets a little tricky: I would need to nest siblings inside each other if I have a rule like this:
struct Element : seq< opt< Prefix >, Primary, opt< Suffix > > {
};
If only I could access Element's children from Prefix and/or Suffix and move them inside itself. I suppose I could do this by using the node's id_ and moving the logic into Element's transform function. Is that the intended usage? How far off the intended usage am I with this approach?
Hi,
First, wow! Kudos to you, this is very cool.
Your documentation says that there should be an examples/expression.cc which is the same as the calculator example, but builds a parse tree and operates on that. I'm either incredibly blind, or it's not in the examples directory.
I'd really like to see a good example of how to build a parse tree with this parser.
Hi,
Currently, I have action<> specializations with apply() method to acquire the matched text, eg:
template<> struct action<number>
{
template <typename Input>
static void apply(const Input &in, SomeState &state)
{
// here I capture the text via "in.string()", and the starting position via "in.position()"
...
}
};
Is there an easy way to obtain the position at the end of the matched text?
Right now, I'm iterating over the string, increment byte, and byte_in_line, (unless I see a newline), in which case, I bump the line counter, and reset the byte_in_line counter. It occurs to me that the parser must already have this information somewhere, and I'm just wasting CPU cycles. :(
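The manual workaround described above can be sketched as follows. text_position and end_of_match are hypothetical names; the three fields mirror the byte, line and byte_in_line counters mentioned in the question, but the struct is not PEGTL's position type:

```cpp
#include <cstddef>
#include <string>

// Hypothetical position record with the three counters mentioned above.
struct text_position {
    std::size_t byte;          // offset from the start of the input
    std::size_t line;          // line number
    std::size_t byte_in_line;  // offset within the current line
};

// Advance a start position over the matched text to get the end position.
text_position end_of_match(text_position pos, const std::string& matched) {
    for (char c : matched) {
        ++pos.byte;
        if (c == '\n') {
            ++pos.line;            // a newline starts the next line
            pos.byte_in_line = 0;  // and resets the in-line offset
        } else {
            ++pos.byte_in_line;
        }
    }
    return pos;
}
```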
Also, I have a small "calculator" project which creates an AST from the input, then evaluates the AST to perform the calculations. Let me know if you're interested in seeing it.
--Rich
For parsing binary data, it would be useful to have range<C, D> for unsigned char. Otherwise, using a range above 0x7f fails to match properly due to signed conversion (e.g., (char)3 is not matched by range<0, (char)0xbf>).
Should be a straightforward application of internal range with an appropriate peek function.
Of course, there is a workaround by writing a custom rule.
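The signed-conversion pitfall can be reproduced without PEGTL. In this sketch, in_range_signed and in_range_unsigned are hypothetical helpers, not PEGTL rules; signed char is used explicitly so that the behaviour does not depend on whether plain char is signed on the platform:

```cpp
// Comparison on signed char: on common platforms 0xbf converts to a
// negative value, so the upper bound ends up below the lower bound.
bool in_range_signed(signed char c, signed char lo, signed char hi) {
    return lo <= c && c <= hi;
}

// Comparison after converting to unsigned char: bytes compare as 0..255.
bool in_range_unsigned(char c, unsigned char lo, unsigned char hi) {
    const unsigned char u = static_cast<unsigned char>(c);
    return lo <= u && u <= hi;
}
```

With these helpers the byte 3 is rejected by the signed comparison against the range 0 to 0xbf and accepted by the unsigned one, which is the behaviour a range rule for unsigned char would provide.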
When running the hello world in the README.md I get:
/Users/rfonseca/Documents/workspace-CPP/SeqScan/src/testpegtl.cc:84:35: error: no type named 'input' in namespace 'pegtl'
static void apply( const pegtl::input & in, std::string & name )
~~~~~~~^
I noticed that position() on a buffer_input returns the offset relative to the last discard, which is fair I guess, but it wasn't obvious from the documentation.
Often, the user will be interested in the absolute offset from the start of parsing. It's easy enough to keep track of that in the read callback, but I wonder if this is something you want to provide in pegtl.
Today I tried to use the file_input<> in class to find out some information about the characters surrounding the byte position I got from a thrown parse_error.
My goal is to create an error message like this:
[Error] ./test/testfile.js:2:14(58): Unexpected character in variable statement
>>> var Bar = 0xE var Foo;
^
With the parse_error thrown I've got access to:
Do you have any suggestions on how to get the length of the line containing the error?
The only thing I can imagine is using in.bump() with the error's offset and then bumping by 1 until eof or until in.position.line() != e.positions().front().line, but I can't seem to find a reset operation. Basically, I'm not able to iterate through the input. What I have tried so far is:
auto err_pos = e.positions.front();
std::cout << "[ERROR] " << e.what() << std::endl;
std::cout << ">>> " << std::string(in.current(), err_pos.byte - err_pos.byte_in_line, err_pos.byte_in_line + 1 ) << std::endl;
std::cout << std::string(4 + err_pos.byte_in_line, ' ') << '^'<< std::endl;
But it obviously ends with the first character of the parse error.
Any suggestions?
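One way to avoid iterating the input object at all is to work on the raw text, assuming the whole input is available in memory (for example, read separately into a std::string). The helper below is a hypothetical sketch, not PEGTL API; its byte and byte_in_line parameters correspond to the fields of the same name used in the snippet above:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build a two-line excerpt with a caret under the
// error column. Requires byte <= text.size() and byte_in_line <= byte.
std::string error_excerpt(const std::string& text, std::size_t byte, std::size_t byte_in_line) {
    const std::size_t begin = byte - byte_in_line;  // start of the error line
    std::size_t end = text.find('\n', byte);        // scan forward to the line end
    if (end == std::string::npos) end = text.size();
    const std::string prefix = ">>> ";
    return prefix + text.substr(begin, end - begin) + "\n" +
           std::string(prefix.size() + byte_in_line, ' ') + "^";
}
```

The line length asked about above is then simply end - begin.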
Is it possible to parse true (unseekable, possibly infinite) streams with PEGTL?
For example, could the calculator demo be modified to parse stdin as a stream, with 'quit' being a parser action?
In my particular case, I have a growing number (1000s) of large (~200GB) files. The files are stored compressed and accessed over a network. I would like to decompress these and pipe the uncompressed stream directly to the parser, to avoid the time penalty associated with decompressing to temporary files just so that they can be parsed. Sadly, I can't decompress entire files into memory and parse them that way, and "chunking" them is not as simple as splitting on newlines (as in the calculator demo).
I tried out the abnf2pegtl2.cpp example after I couldn't find the point at which error_control is passed through tao::pegtl::parse_tree::parse to tao::pegtl::parse, and it seems it just isn't. Therefore custom error messages are not thrown.
I've downloaded this on my laptop (running Ubuntu) and I was wondering: how can I install it?
Using CMake and MinGW to build the library, I get the following error message:
Scanning dependencies of target abnf2pegtl
[ 80%] Building CXX object src/example/pegtl/CMakeFiles/abnf2pegtl.dir/abnf2pegtl.cpp.obj
D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp: In lambda function:
D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp:177:98: error: '::strcasecmp' has not been declared
return std::find_if( rbegin, rules.rend(), [&]( const rules_t::value_type& p ) { return ::strcasecmp( p.first.c_str(), v.c_str() ) == 0; } );
^~
In file included from c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algobase.h:71:0,
from c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\algorithm:61,
from D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp:4:
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\predefined_ops.h: In instantiation of 'bool __gnu_cxx::__ops::_Iter_pred<_Predicate>::operator()(_Iterator) [with _Iterator = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)>]':
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algo.h:120:14: required from '_RandomAccessIterator std::__find_if(_RandomAccessIterator, _RandomAccessIterator, _Predicate, std::random_access_iterator_tag) [with _RandomAccessIterator = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = __gnu_cxx::__ops::_Iter_pred<abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)> >]'
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algo.h:161:23: required from '_Iterator std::__find_if(_Iterator, _Iterator, _Predicate) [with _Iterator = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = __gnu_cxx::__ops::_Iter_pred<abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)> >]'
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algo.h:3817:28: required from '_IIter std::find_if(_IIter, _IIter, _Predicate) [with _IIter = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)>]'
D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp:177:149: required from here
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\predefined_ops.h:234:11: error: void value not ignored as it ought to be { return bool(_M_pred(*__it)); }
^~~~~~~~~~~~~~~~~~~~
src\example\pegtl\CMakeFiles\abnf2pegtl.dir\build.make:62: recipe for target 'src/example/pegtl/CMakeFiles/abnf2pegtl.dir/abnf2pegtl.cpp.obj' failed
mingw32-make[2]: *** [src/example/pegtl/CMakeFiles/abnf2pegtl.dir/abnf2pegtl.cpp.obj] Error 1
CMakeFiles\Makefile2:3260: recipe for target 'src/example/pegtl/CMakeFiles/abnf2pegtl.dir/all' failed
mingw32-make[1]: *** [src/example/pegtl/CMakeFiles/abnf2pegtl.dir/all] Error 2
Makefile:139: recipe for target 'all' failed
mingw32-make: *** [all] Error 2
<cstring> seems to be included in the source, so I don't know why it can't resolve strcasecmp
...
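For context, strcasecmp is a POSIX function declared in <strings.h> rather than part of standard <cstring>, which may be why it is not visible in this MinGW build. A minimal sketch of a portable replacement, assuming a plain byte-wise ASCII comparison is good enough for rule names; the helper name iequals is hypothetical and not part of PEGTL:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of PEGTL): case-insensitive equality
// implemented with standard C++ only, so it does not rely on POSIX strcasecmp.
inline bool iequals( const std::string& a, const std::string& b )
{
   return a.size() == b.size() &&
          std::equal( a.begin(), a.end(), b.begin(),
                      []( unsigned char x, unsigned char y ) {
                         // unsigned char avoids undefined behaviour in std::tolower for negative char values
                         return std::tolower( x ) == std::tolower( y );
                      } );
}
```

The lambda in find_rule could then, for example, return iequals( p.first, v ) instead of comparing the result of ::strcasecmp against 0.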
Hi, great to see the new project structure & hosting! Have you any interest in making pegtl work on win32? Here's a quick sample of some of the issues I found:
Warnings:
Most other issues seem to be related to partial c++11 support.
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 5
11>s:\source\pegtl\pegtl\internal\sor.hh(26): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\sor.hh(26): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 5
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 6
pegtl/internal/file_opener.hpp is now using O_CLOEXEC.
According to a man page on Linux:
O_CLOEXEC (since Linux 2.6.23)
and I was just told that it doesn't compile on CentOS 5.
Perhaps it could be wrapped in #ifdefs?
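The #ifdef idea could look roughly like the sketch below. This is not the actual file_opener code; the function name open_readonly is hypothetical, and the fallback branch assumes that setting FD_CLOEXEC via fcntl after opening is acceptable (unlike O_CLOEXEC, that is not atomic with the open call).

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not PEGTL's file_opener): open a file read-only and
// request close-on-exec, using O_CLOEXEC only where the platform defines it.
inline int open_readonly( const char* path )
{
#if defined( O_CLOEXEC )
   return ::open( path, O_RDONLY | O_CLOEXEC );
#else
   // Older systems without O_CLOEXEC: set the flag in a second step.
   const int fd = ::open( path, O_RDONLY );
   if( fd >= 0 ) {
      ::fcntl( fd, F_SETFD, FD_CLOEXEC );
   }
   return fd;
#endif
}
```

Either branch returns -1 on failure, so callers can keep the same error handling regardless of which one was compiled.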
When a rule uses must, the exception reveals error information about the position and the rule where parsing failed. However, there seems to be no way to get such or similar information for rules without must. If it doesn't hurt the current design of the PEGTL, it would be a valuable addition to provide some way to obtain error information whenever the user needs it.
Many, many thanks for your brilliant code!
Just discovered it and it seems to be a killer solution for my task.
However, I need to parse messages and can't afford to wrap decoding in try/catch blocks since that is costly. It would be enough to get true/false from parse. Perhaps I missed it in the docs, but providing a custom control doesn't help, since the PEGTL code will call std::abort if no exception is thrown.
Could you please consider some control policy to inhibit exceptions?
I'm trying to use TAOCPP_PEGTL_KEYWORD but it does not compile. I think the macro is missing something.
This is the offending code.
struct ARRAY : TAOCPP_PEGTL_KEYWORD("array") {};
I'm using version 2.0 installed via brew on macOS.
The compiler output can be found here: https://gist.github.com/agustingianni/2503c7f83c0ff9617bda0e74a51d52d5
Hello,
I am using PEGTL as a submodule of my own repository and link it using "add_subdirectory" (which works perfectly fine). The annoying part is that this adds all of the PEGTL examples and tests to my own project. Currently there is no CMake option to disable this. I'd recommend something like this:
option(BUILD_TESTS "Whether or not to build PEGTL test cases" TRUE)
which would build the tests unless explicitly disabled beforehand.
Hi,
I pulled down the code, and tried to build on OS X 10.9.5.
It failed, though, as shown below. I didn't have time to look at it more,
but I thought it might be useful to know.
Regards
Jiri
System details:
[2015.01.19:11.11][Jiri@jirimbp:~/lib/cpp/PEGTL]$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
It fails with the following message:
c++ -I. -std=c++11 -pedantic -Wall -Wextra -Werror -O3 source/modulus_match.cc -o build/source/modulus_match
In file included from source/modulus_match.cc:4:
In file included from ./pegtl.hh:8:
In file included from ./pegtl/parse.hh:14:
In file included from ./pegtl/internal/rule_match_help.hh:11:
In file included from ./pegtl/internal/rule_match_impl.hh:10:
./pegtl/internal/rule_match_call.hh:17:88: error: 'match' following the 'template' keyword does not
refer to a template
...auto match( Input & in, States && ... st ) -> decltype( Rule::template match< E, Action, Con...
~~~~~~ ^
./pegtl/internal/rule_match_impl.hh:43:18: note: in instantiation of template class
'pegtl::internal::rule_match_call<modulus::my_rule<3, 0>, 1, nothing, normal>' requested here
if ( rule_match_call< Rule, error_mode::THROW, Action, Control >::match( in, st ...
^
./pegtl/internal/rule_match_help.hh:20:173: note: in instantiation of function template
specialization 'pegtl::internal::rule_match_impl<modulus::my_rule<3, 0>, 1, nothing, normal,
1>::match<pegtl::input>' requested here
...Action< Rule > >::value ? apply_here::NOTHING : apply_here::ACTION >::template match( in, s...
^
./pegtl/internal/rule_conjunction_impl.hh:31:20: note: in instantiation of function template
specialization 'pegtl::internal::rule_match_help<modulus::my_rule<3, 0>, 1, nothing, normal,
pegtl::input>' requested here
return rule_match_help< Rule, E, Action, Control >( in, st ... ) && rule_conjunc...
^
./pegtl/internal/until.hh:47:88: note: in instantiation of function template specialization
'pegtl::internal::rule_conjunction_impl<modulus::my_rule<3, 0> >::match<1, nothing, normal,
pegtl::input>' requested here
...if ( in.empty() || ! rule_conjunction_impl< Rule, Rules ... >::template match< E, Action, Co...
^
./pegtl/internal/rule_match_call.hh:19:35: note: in instantiation of function template
specialization 'pegtl::internal::until<pegtl::ascii::eolf, modulus::my_rule<3, 0> >::match<1,
nothing, normal, pegtl::input>' requested here
return Rule::template match< E, Action, Control >( in, st ... );
^
./pegtl/internal/rule_match_impl.hh:43:79: note: in instantiation of function template
specialization 'pegtl::internal::rule_match_call<modulus::grammar, 1, nothing,
normal>::match<pegtl::input>' requested here
if ( rule_match_call< Rule, error_mode::THROW, Action, Control >::match( in, st ...
^
./pegtl/internal/rule_match_help.hh:20:173: note: in instantiation of function template
specialization 'pegtl::internal::rule_match_impl<modulus::grammar, 1, nothing, normal,
1>::match<pegtl::input>' requested here
...Action< Rule > >::value ? apply_here::NOTHING : apply_here::ACTION >::template match( in, s...
^
./pegtl/parse.hh:21:17: note: in instantiation of function template specialization
'pegtl::internal::rule_match_help<modulus::grammar, 1, nothing, normal, pegtl::input>'
requested here
internal::rule_match_help< Rule, error_mode::THROW, Action, Control >( in, st ... );
^
./pegtl/parse.hh:28:7: note: in instantiation of function template specialization
'pegtl::parse<modulus::grammar, nothing, normal>' requested here
parse< Rule, Action, Control >( in, st ... );
^
source/modulus_match.cc:33:14: note: in instantiation of function template specialization
'pegtl::parse<modulus::grammar, nothing, normal>' requested here
pegtl::parse< modulus::grammar >( 1, argv );
^
1 error generated.
make: *** [build/source/modulus_match] Error 1
The VS2015 technology preview is the only version that almost manages to build PEGTL. However, it fails to understand the decltype+comma SFINAE trick in rule_match_call.hh:
1>c:\users\sam\pegtl\pegtl\internal\rule_match_call.hh(26): error C2535: 'unknown-type
pegtl::internal::rule_match_call<Rule,E,Action,Control>::match(Input &,States &&...)':
member function already defined or declared
1>c:\users\sam\pegtl\pegtl\internal\rule_match_call.hh(17): note: see declaration of
'pegtl::internal::rule_match_call<Rule,E,Action,Control>::match'
1>c:\users\sam\pegtl\pegtl\internal\rule_match_call.hh(27): note: see reference to class
template instantiation 'pegtl::internal::rule_match_call<Rule,E,Action,Control>' being
compiled
Even though this is a compiler bug, do you think a workaround is possible? I have tried a few things but failed to come up with something that works.
I understand that it's possible to abort parsing by throwing an exception, but it would be nice to have the ability to stop parsing depending on an action's result.
E.g., instead of the current signature, an action could also return a result:
template<> struct action< SOME >
{
   template <class T, class... ARGS>
   static bool apply(T const& in, ARGS&&...)
   {
      if (CONDITION) return false;
      return true;
   }
};
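To illustrate how a library could accept both kinds of action, here is a toy sketch that is independent of PEGTL's real internals. The names call_apply, count_action and limit_action are all hypothetical. It dispatches on the return type of apply and treats a void return as "continue parsing":

```cpp
#include <string>
#include <type_traits>

// Toy sketch only; none of these names belong to PEGTL.
// Overload selected when Action::apply returns void: always continue.
template< typename Action, typename Input, typename... States >
auto call_apply( const Input& in, States&&... st )
   -> typename std::enable_if< std::is_same< decltype( Action::apply( in, st... ) ), void >::value, bool >::type
{
   Action::apply( in, st... );
   return true;
}

// Overload selected when Action::apply returns bool: forward the result,
// so returning false can be treated by the caller as a local failure.
template< typename Action, typename Input, typename... States >
auto call_apply( const Input& in, States&&... st )
   -> typename std::enable_if< std::is_same< decltype( Action::apply( in, st... ) ), bool >::value, bool >::type
{
   return Action::apply( in, st... );
}

// Example action with the existing void signature.
struct count_action
{
   static void apply( const std::string&, int& n ) { ++n; }
};

// Example action with the proposed bool signature: rejects long matches.
struct limit_action
{
   static bool apply( const std::string& in, int& n )
   {
      ++n;
      return in.size() <= 3;
   }
};
```

With this shape, existing void actions keep working unchanged, and only actions that opt in to a bool return can stop the parse.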
It would be nice if PEGTL supported (configurable) memoization to increase performance in situations where backtracking happens.
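As a rough illustration of what is meant by memoization here, the following is a toy packrat-style cache, not anything PEGTL provides. The class memo_table and its interface are hypothetical, and the sketch assumes that rules are identified by an integer id and have no side effects (cached results would skip any attached actions):

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Toy packrat-style cache; hypothetical, not part of PEGTL.
// Key: (rule id, input position). Value: (matched?, position after the match).
class memo_table
{
public:
   using result = std::pair< bool, std::size_t >;

   // Returns the cached result for (rule, pos), or runs `match` once and stores its result.
   result lookup( const int rule, const std::size_t pos, const std::function< result( std::size_t ) >& match )
   {
      const auto key = std::make_pair( rule, pos );
      const auto it = cache_.find( key );
      if( it != cache_.end() ) {
         return it->second;  // Backtracking re-visit: no need to re-run the rule.
      }
      const result r = match( pos );
      cache_.emplace( key, r );
      return r;
   }

private:
   std::map< std::pair< int, std::size_t >, result > cache_;
};
```

The trade-off is memory: the table grows with the number of distinct (rule, position) pairs visited, which is why making it configurable per rule seems sensible.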
Clang-cl has an issue with these two lines. It gets confused and somehow thinks that the code is calling the constructor for the class it's in (peek_char and remove_content) and not the method of the variable in or n, respectively.
I don't know how you'd like to handle this (if at all). I'd appreciate it if the bug were fixed, but that is up to you of course. You could fix it in a few ways:
noexcept.
#ifdefs to not try to use noexcept for this compiler.
FYI, I've created a minimal test case that reproduces the bug, which I will submit to LLVM once they've given me an account. If you're interested, it's attached.