orangeduck / mpc
A Parser Combinator library for C
License: Other
I get unexpected link errors.
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_err_delete(struct mpc_err_t *)" (?mpc_err_delete@@YAXPAUmpc_err_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_err_print(struct mpc_err_t *)" (?mpc_err_print@@YAXPAUmpc_err_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "int __cdecl mpc_parse(char const *,char const *,struct mpc_parser_t *,union mpc_result_t *)" (?mpc_parse@@YAHPBD0PAUmpc_parser_t@@PATmpc_result_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "struct mpc_parser_t * __cdecl mpc_new(char const *)" (?mpc_new@@YAPAUmpc_parser_t@@PBD@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_cleanup(int,...)" (?mpc_cleanup@@YAXHZZ) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_ast_delete(struct mpc_ast_t *)" (?mpc_ast_delete@@YAXPAUmpc_ast_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_ast_print(struct mpc_ast_t *)" (?mpc_ast_print@@YAXPAUmpc_ast_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "struct mpc_err_t * __cdecl mpca_lang(int,char const *,...)" (?mpca_lang@@YAPAUmpc_err_t@@HPBDZZ) referenced in function _main
1>D:\Code\C++\Aueb\TonyCC\TonyCC\..\bin\Win32\Debug\TonyCC.exe : fatal error LNK1120: 8 unresolved externals
I checked and every function I am trying to use is in its right place in both mpc.h and mpc.c files.
Does anyone have any idea what's going on?
Hi,
I've been expanding lispy (from your book, which is awesome, btw), to include support for doubles. However, when I finished my implementation, I suddenly get SIGSEGV errors when creating the grammar. Is there something I'm doing wrong?
I'm using the latest (master) version on the repo.
Grammar:
#define GRAMMAR " \
\
long : /-?[0-9]+/ ; \
double : /-?[0-9]+\\.?[0-9]+/; \
symbol : /[a-zA-Z0-9_+\\-*\\/\\\\=<>!&\\|\\:]+/ ; \
string : /\"(\\\\.|[^\"])*\"/ ; \
comment : /;[^\\r\\n]*/ ; \
sexpr : '(' <expr>* ')' ; \
qexpr : '{' <expr>* '}' ; \
expr : <number> | <symbol> | <string> \
| <comment> | <sexpr> | <qexpr> ; \
lispy : /^/ <expr>* /$/ ; \
\
"
static mpc_parser_t* number_l;
static mpc_parser_t* number_d;
static mpc_parser_t* symbol;
static mpc_parser_t* string;
static mpc_parser_t* comment;
static mpc_parser_t* sexpr;
static mpc_parser_t* qexpr;
static mpc_parser_t* expr;
static mpc_parser_t* lispy;
mpc_parser_t* grammar_create() {
number_l = mpc_new("long");
number_d = mpc_new("double");
symbol = mpc_new("symbol");
string = mpc_new("string");
comment = mpc_new("comment");
sexpr = mpc_new("sexpr");
qexpr = mpc_new("qexpr");
expr = mpc_new("expr");
lispy = mpc_new("lispy");
mpca_lang(MPCA_LANG_DEFAULT, GRAMMAR,
number_l, number_d, symbol, string, comment,
sexpr, qexpr, expr, lispy);
return lispy;
}
GDB backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x74538eb4 in strcmp () from C:\WINDOWS\SysWOW64\msvcrt.dll
(gdb) backtrace
#0 0x74538eb4 in strcmp () from C:\WINDOWS\SysWOW64\msvcrt.dll
#1 0x0040de42 in mpca_grammar_find_parser ()
#2 0x0040de74 in mpcaf_grammar_id ()
#3 0x004081f5 in mpc_parse_apply_to ()
#4 0x00408607 in mpc_parse_run ()
#5 0x00408d22 in mpc_parse_run ()
#6 0x00408e6d in mpc_parse_run ()
#7 0x00408a66 in mpc_parse_run ()
#8 0x00408e6d in mpc_parse_run ()
#9 0x00408e6d in mpc_parse_run ()
#10 0x0040891a in mpc_parse_run ()
#11 0x004086d8 in mpc_parse_run ()
#12 0x00408e6d in mpc_parse_run ()
#13 0x00408e6d in mpc_parse_run ()
#14 0x004085db in mpc_parse_run ()
#15 0x00408feb in mpc_parse_input ()
#16 0x0040e8df in mpca_lang_st ()
#17 0x0040ea59 in mpca_lang ()
#18 0x00403e5c in grammar_create ()
#19 0x00401451 in main ()
(gdb)
When I tried compiling MPC for windows using Visual Studio 2015 Update 2 I got a few warnings regarding the usage of functions like strcpy. I'm not particularly worried about those warnings so I just use a few flags to remove them. But two other warnings still show up.
One of the warnings is in this function. strtod returns a double, but the result is stored in a float. To solve this we can just use strtof instead.
The other is in this line. A size_t is implicitly converted to char. This is not incorrect in terms of usage, but the compiler still complains. To remove this warning we can use an explicit cast like so:
range[strlen(range) + 0] = (char) j;
That will work unless there is an error in the function and j should be a char.
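A minimal sketch of the two ways to silence the first warning, not mpc's actual code. The helper names are hypothetical. strtof assumes a C99 (or later) standard library, while the strtod variant with an explicit cast also works with C89.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of mpc: parse text as a float.
   strtof (C99) returns a float, so nothing is narrowed. */
float parse_float(const char *text) {
  return strtof(text, NULL);
}

/* Same parse through strtod (C89). The explicit cast documents the
   narrowing from double to float, so the compiler has nothing to flag. */
float parse_float_via_double(const char *text) {
  return (float) strtod(text, NULL);
}
```

The cast variant may be the safer choice if the library has to keep building with -ansi, where strtof is not guaranteed to be declared.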
I think I have stumbled upon a possible violation of POLA here..
This works as expected:
mpc_parser_t *p = mpc_count(3, mpcf_strfold, mpc_digit(), free);
int r = mpc_parse("test", "046", p, &mr);
If there's any non-digit input past count chars (3 in my example), the parser also succeeds, as expected:
mpc_parser_t *p = mpc_count(3, mpcf_strfold, mpc_digit(), free);
int r = mpc_parse("test", "046aa", p, &mr);
But, and this is the behaviour that surprised me, if there are any digits beyond count, the parser fails, like so:
mpc_parser_t *p = mpc_count(3, mpcf_strfold, mpc_digit(), free);
int r = mpc_parse("test", "04632", p, &mr);
$ test:1:6: error: expected 3 of digit at end of input
Is this on purpose? I can get around this issue by creating an mpc_many1 parser and splitting the digits externally; I just want to make sure that I'm not missing something here.
Oh, and thanks for this awesome library!
Working through http://www.buildyourownlisp.com/chapter6_parsing I found that replacing the decimal-number rule you suggested in #12 (comment) with the following causes a segfault when the REPL is given any input that causes mpc_parse("<stdin>", input, Lispy, &r) to evaluate as true:
diff --git a/repl.c b/repl.c
index 5aa2cf9..b2f26fb 100644
--- a/repl.c
+++ b/repl.c
@@ -12,7 +12,7 @@ int main (int argc, char** argv) {
mpc_parser_t* Lispy = mpc_new("lispy");
mpca_lang(MPCA_LANG_DEFAULT, " \
-number : /-?[0-9]+(\\.?[0-9]*)?/ ; \
+number : /-?[0-9]*\\.?[0-9]+/ ; \
operator: '+' | '-' | '*' | '/'; \
expr : <number> | '(' <operator> <expr>+ ')' ;\
lispy : /^/ <operator> <expr>+ /$/; \
(Full code of repl.c: https://gist.github.com/dunn/1176c4ed7b2ba6c5b68e)
Remove the second \ in front of the decimal and that's a valid regular expression in most contexts, so I don't think this is just a silly typo on my part.
OSX: 10.11.1; same result with Clang and GCC:
$ clang --version
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.0.0
Thread model: posix
$ gcc-5 --version
gcc-5 (Homebrew gcc 5.2.0) 5.2.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Hello,
The grammar below fails on basic math expressions such as 9 + 8:
mpca_lang(MPCA_LANG_DEFAULT,
"
number : /-?[0-9]+/ ;
operator : '*' | '/' ;
oper : '+' | '-' ;
factor : | '(' ')' ;
term : | ;
expr : | ;
lispy : /^/ /$/ ;
",
Number, Operator, Oper,Factor, Term, Expr, Lispy);
The version below passes. The only difference is the position of "term" and "factor" around the union operator in the grammar. Could someone explain why?
mpca_lang(MPCA_LANG_DEFAULT,
"
number : /-?[0-9]+/ ;
operator : '*' | '/' ;
oper : '+' | '-' ;
factor : | '(' ')' ;
term : | ;
expr : | ;
lispy : /^/ /$/ ;
",
Number, Operator, Oper,Factor, Term, Expr, Lispy);
So, I have a grammar and parser in this file: https://github.com/ikbenlike/Xenon/blob/master/CXenon/src/parser.c. When I try to parse this file:
float:fib(int n){
if (n == 0){
return 0;
}
if (n == 1){
return 1;
}
return fib(n - 1) + fib(n - 2);
}
int:main() {
bool no = false;
int n = 11;
int i = 1.1;
str stuff = "stuff"[1];
int n = fib(stuff[1])[1];
stuff[1];
for(a in b){print(stuff);}
while (stuff[1] == stuff[1]) {
n = fib(10);
stuff();
print(n);
i = i + 1;
}
loop {
print(n);
}
return 0;
}
the parser hangs. When I run it in valgrind and press ctrl+c while it's running, it says this:
==21138== Process terminating with default action of signal 2 (SIGINT)
==21138== at 0x4C2DBB0: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21138== by 0x40231F: mpc_input_terminated (mpc.c:365)
==21138== by 0x402847: mpc_input_char (mpc.c:487)
==21138== by 0x402ABB: mpc_input_string (mpc.c:521)
==21138== by 0x40475C: mpc_parse_run (mpc.c:1026)
==21138== by 0x404996: mpc_parse_run (mpc.c:1056)
==21138== by 0x40526D: mpc_parse_run (mpc.c:1206)
==21138== by 0x4048B1: mpc_parse_run (mpc.c:1041)
==21138== by 0x40491B: mpc_parse_run (mpc.c:1048)
==21138== by 0x40526D: mpc_parse_run (mpc.c:1206)
==21138== by 0x4050F2: mpc_parse_run (mpc.c:1185)
==21138== by 0x40526D: mpc_parse_run (mpc.c:1206)
The string passed to the parse function is properly terminated with a \0. This issue did not arise in previous versions.
cheng@ada ~/mpc $ make
gcc -ansi -pedantic -O3 -g -Wall -Werror -Wextra -Wformat=2 -Wshadow -Wno-long-long -Wno-overlength-strings -Wno-format-nonliteral -Wcast-align -Wwrite-strings -Wstrict-prototypes -Wold-style-definition -Wredundant-decls -Wnested-externs -Wmissing-include-dirs -Wswitch-default examples/doge.c mpc.c -lm -o examples/doge
mpc.c: In function ‘mpcf_float’:
mpc.c:2259:3: error: implicit declaration of function ‘strtof’ [-Werror=implicit-function-declaration]
*y = strtof(x, NULL);
^
mpc.c:2259:3: error: nested extern declaration of ‘strtof’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [examples/doge] Error 1
After cloning the repo, I got this error even though <stdlib.h> is included. I don't know why.
Looks like r.output isn't an mpc_ast_t*, it is an mpc_val_t*.
Not only that, I'm not sure how to convert a val_t to an ast_t.
Am I missing something obvious, or does the very first example not compile?
mpc_result_t r;
if(mpc_parse("input", input, Maths, &r)) {
mpc_ast_print(r.output);
mpc_ast_delete(r.output);
} else {
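For what it's worth, here is a sketch of why a call like the one above can compile as C but not as C++. It assumes the value type is an alias for void, which I believe is how mpc.h declares mpc_val_t. All demo_* names are stand-ins invented for illustration, not mpc's types.

```c
#include <stddef.h>

/* Stand-in types for illustration only. They mimic the shape I assume
   mpc.h uses: an untyped value pointer stored inside a result union. */
typedef void demo_val_t;
typedef struct { const char *tag; } demo_ast_t;
typedef union {
  void *error;
  demo_val_t *output;
} demo_result_t;

/* Takes a typed AST pointer, like mpc_ast_print in the snippet above. */
const char *demo_ast_tag(const demo_ast_t *ast) {
  return ast != NULL ? ast->tag : NULL;
}

/* In C a void pointer converts implicitly to any object pointer.
   The explicit cast is what a C++ compiler would insist on, and it is
   harmless in C. */
const char *demo_result_tag(const demo_result_t *r) {
  const demo_ast_t *ast = (const demo_ast_t *) r->output;
  return demo_ast_tag(ast);
}
```

So if the example is being built with a C++ compiler, adding a cast at the call site is probably all that is needed.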
Hi,
I am testing the mpc library and I have a question about regex rules in mpca_lang. I want to implement comments like in C, /* ..... */. I found a possible regex string:
/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/
The site where I found it is this:
http://ostermiller.org/findcomment.html
When I add it to the mpca_lang rule list, compile, and run it, every command prints:
<stdin>: error: Parser Undefined!
Is there something I need to do or understand here?
P.S Good library, and amazing work
So, I'm trying to parse a file. However, I'm having some troubles with my grammar. This is my grammar:
" ident : /[a-zA-Z_][a-zA-Z0-9_]*/ ; \n"
" number : /-?[0-9]+(\\.[0-9]*)?/ ; \n"
" character : /'.' | \".\"/ ; \n"
" string : /\"(\\\\.|[^\"])*\"/ ; \n"
" boolean : /true | false/ ; \n"
" \n"
" print : /\"print\" (<ident> | <string>)/ ; \n"
" factor : '(' <lexp> ')' \n"
" | <number> \n"
" | <character> \n"
" | <string> \n"
" | <ident> '(' <lexp>? (',' <lexp>)* ')' \n"
" | <ident> ; \n"
" \n"
" term : <factor> (('*' | '/' | '%') <factor>)* ; \n"
" lexp : <term> <index>* (('+' | '-') <term> <index>* )* ; \n"
" \n"
" index : '[' <number> ']' ; \n"
" stmt : '{' <stmt>* '}' \n"
" | \"while\" '(' <exp> <index>* ')' <stmt> \n"
" | \"for\" '(' <exp> <index>* ')' <stmt> \n"
" | \"if\" '(' <exp> ')' <stmt> \n"
" | \"loop\" <stmt> \n"
" | <ident> '=' <lexp> <index>* ';' \n"
" | \"print\" '(' <lexp>? ')' ';' \n"
" | \"return\" <lexp>? ';' \n"
" | <ident> <index>* ';' \n"
" | <ident> '(' <ident>? (',' <ident>)* ')' <index>* ';'; \n"
" \n"
" exp : <lexp> '>' <lexp> \n"
" | <lexp> '<' <lexp> \n"
" | <lexp> \">=\" <lexp> \n"
" | <lexp> \"<=\" <lexp> \n"
" | <lexp> \"!=\" <lexp> \n"
" | <lexp> \"==\" <lexp> \n"
" | <lexp> \"in\" <lexp> ; \n"
" \n"
" typeident : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) ; \n"
" procedure : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) ':' <ident> '(' <args> ')' <body> ; \n"
" decls : <typeident> <ident> '=' ( <number> | <character> | <string> | <boolean> | <term> ) <index>* ';' ; \n"
" args : <typeident>? (',' <typeident>)* ; \n"
" body : '{' <decls> <stmt>* '}' ; \n"
" use : (\"use\" /[a-zA-Z_\\/\\.][a-zA-Z0-9_\\/\\.]*/)* ; \n"
" xenon : /^/ <use> <decls> <procedure>* /$/ ; \n"
I'm trying to parse this file:
int:stuff(str string_to_print){
int a = 1;
float b = 1.1;
str c = "stuff";
}
But, it's giving this error:
print.pxe:1:4: error: expected one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_' at ':'
I have a version of the parser here but that one has problems where it doesn't make a node the child of another node (while for everything else to work correctly, it should be).
Hello, I've been trying to learn to use this library for a personal project so this problem is probably just a result of this learning process, but it's had me scratching my head for a while now.
I modified the maths.c example to the following grammar:
...
mpc_parser_t *New_Expr = mpc_new("new_expr");
mpca_lang(MPCA_LANG_PREDICTIVE,
" expression : <product> (('+' | '-') <product>)*; "
" product : <value> (('*' | '/') <value>)*; "
" value : /[0-9]+/ | <new_expr> | \"repeat\" | \"rand\" | \"sin\" <new_expr> | \"cos\" <new_expr>; "
" new_expr : '(' <expression> ')'; "
" maths : /^/ <expression> /$/; ",
Expr, Prod, Value, Maths, New_Expr, NULL);
...
mpc_cleanup(5, Expr, Prod, Value, Maths, New_Expr);
That is, I added a few more valid values (repeat, rand, sin <new_expr>, cos <new_expr>) and added a new rule new_expr.
It works correctly for inputs that contain repeat, such as 1 + 2 + repeat, but doesn't work for others such as 1 + 2 + rand. I have no idea why this is happening. The error message (for the second expression) reads:
test.txt:1:10: error: expected "repeat", "rand", "sin", "cos" or end of input at 'a'
Thanks in advance.
Just adding an extern "C" block inside an #ifdef guard in the header would be nice. I can't think of any downsides personally.
I tried using mpc in C++, and because I wasn't aware of this, I spent half an hour wondering why the references were undefined.
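A minimal sketch of the guard being asked for, using a made-up function (demo_add) rather than mpc's real declarations. __cplusplus is only defined by C++ compilers, so a C compiler skips the extern "C" lines and sees plain declarations.

```c
/* Header part (hypothetical names, not mpc's header). */
#ifdef __cplusplus
extern "C" {
#endif

int demo_add(int a, int b);

#ifdef __cplusplus
}
#endif

/* Source part, compiled as C. With the guard in place, a C++ caller
   links against the unmangled symbol demo_add. */
int demo_add(int a, int b) {
  return a + b;
}
```

Until the header has such a guard, a C++ caller can get the same effect by wrapping the include itself: extern "C" { #include "mpc.h" }. This only helps if mpc.c is still compiled as C.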
How would you ignore content that isn't recognized by a rule?
Supposing I have something like this:
string : /"(.|[^"])*"/ ;
lang : /^/ /$/
This should recognize quoted strings, but would fail on anything else. So how to define the language such that it collects or discards anything that isn't a string?
Hello.
I've been exploring MPC because I needed it for a project. MPC is quite cool because it's easy to build a parser and work with it, but one thing has been bothering me.
I need to build a parser for a programming language and after that I need to do several operations over it, such as Type Analysis, its actual evaluation, or generation of some other intermediate code from the resulting AST. The thing is, to do this I need to traverse the resulting AST. This means that at each node I need to do string comparisons with the tag field of the AST.
This makes traversing the tree a slow process.
Is there any way to use MPC that lets me traverse an AST faster, avoiding so many string comparisons? Alternatively, do you have any plans to implement an identification method for nodes other than string tags?
One way to make things faster would be to associate an integer value with every tag. The question is how to integrate that into MPC in a nice way. How about something like this:
#include <mpc.h>
typedef enum {
tag_int,
tag_expr_mul,
tag_expr_add,
tag_input
} MyTags;
int main() {
mpc_parser_t *mpc_int = mpc_new("int", tag_int);
mpc_parser_t *mpc_expr_mul = mpc_new("expr_mul", tag_expr_mul);
mpc_parser_t *mpc_expr_add = mpc_new("expr_add", tag_expr_add);
mpc_parser_t *mpc_input = mpc_new("input", tag_input);
/* ... */
return 0;
}
This doesn't look as nice as the current implementation, but while traversing the tree we can have comparisons like:
int traverse(mpc_ast_t *ast) {
if(ast->tag_i == tag_expr_add) {
return traverse(ast->children[L_OPR]) + traverse(ast->children[R_OPR]);
}
/* ... */
}
What are your thoughts on this?
Regards!
(Notice that the example I've provided is just a quick and dirty example I made using MPC to build a simple integer calculator)
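Without changing the library, something similar can be approximated on the user side by translating each tag string into an enum once and switching on the result. This is only a sketch: the enum, the table and tag_lookup are hypothetical names, and it assumes tags are '|'-separated rule names such as "expr|number|regex", as in the AST dumps elsewhere on this page.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
  TAG_UNKNOWN = 0,
  TAG_INT,
  TAG_EXPR_MUL,
  TAG_EXPR_ADD,
  TAG_INPUT
} tag_id;

/* Hypothetical table mapping rule names to integer ids. */
static const struct { const char *name; tag_id id; } tag_table[] = {
  { "int",      TAG_INT },
  { "expr_mul", TAG_EXPR_MUL },
  { "expr_add", TAG_EXPR_ADD },
  { "input",    TAG_INPUT }
};

/* Walk the '|'-separated segments of a tag and return the id of the
   first segment that exactly matches a known rule name. */
tag_id tag_lookup(const char *tag) {
  const char *seg = tag;
  while (*seg != '\0') {
    size_t len = strcspn(seg, "|");
    size_t i;
    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++) {
      if (strlen(tag_table[i].name) == len &&
          strncmp(seg, tag_table[i].name, len) == 0) {
        return tag_table[i].id;
      }
    }
    seg += len;
    if (*seg == '|') { seg++; }
  }
  return TAG_UNKNOWN;
}
```

The string comparisons still happen, but only once per node if the result is cached, for example in a tree of your own built from the mpc AST in a first pass.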
I have defined a rule like this:
/(?<comment>[<][*](?(?=[<][*])\g<comment>|.)*?[*][>])/
but it doesn't seem to work. I use this rule to find comments in a program like this:
def hello ():
<* a comment nananana
<* a comment nananana
BATMAN!!!! *>
BATMAN!!!! *>
puts("Hello world!\n")
<* a comment nananana
BATMAN!!!! *>
end
I was wondering. Does your library support advanced regular expressions like that?
Note: I found the regex I used here: https://regex101.com/r/pP0kG4/3 and modified it to work with my code. It seems to work on that website, but not in my code.
Am I doing something wrong, or do you just not support this?
I've been trying to write some code to ignore anything inside a scope (including nested brackets)
The grammar looks like this:
ignore_body "Body" : '{' <ignore_body>* '}' | /[^{}]*/ ;
func "Function" : <func_decl> '{' <ignore_body>* '}' ;
Any ideas why this would loop? I find debugging this hard, because the mpc core code is difficult to unpick....
I'm having an issue with this grammar:
" number : /-?[0-9]+(\\.[0-9]*)?/ ; \n"
" character : /'.' | \".\"/ ; \n"
" string : /\"(\\\\.|[^\"])*\"/ ; \n"
" boolean : /\"true\" | \"false\"/ ; \n"
" \n"
" factor : '(' <lexp> ')' \n"
" | <number> \n"
" | <character> \n"
" | <string> \n"
" | <ident> '(' <lexp>? (',' <lexp>)* ')' \n"
" | <ident> ; \n"
" \n"
" term : <factor> (('*' | '/' | '%') <factor>)* ; \n"
" lexp : <term> <index>* (('+' | '-') <term> <index>* )* ; \n"
" \n"
" index : '[' <number> ']' ; \n"
" stmt : '{' <stmt>* '}' \n"
" | \"while\" '(' <exp> <index>* ')' <stmt> \n"
" | \"for\" '(' <exp> <index>* ')' <stmt> \n"
" | \"if\" '(' <exp> ')' <stmt> \n"
" | \"loop\" <stmt> \n"
" | <ident> '=' <lexp> <index>* ';' \n"
" | \"return\" <lexp>? ';' \n"
" | <ident> <index>* ';' \n"
" | <ident> '(' <ident>? (',' <ident>)* ')' <index>* ';'; \n"
" \n"
" exp : <lexp> '>' <lexp> \n"
" | <lexp> '<' <lexp> \n"
" | <lexp> \">=\" <lexp> \n"
" | <lexp> \"<=\" <lexp> \n"
" | <lexp> \"!=\" <lexp> \n"
" | <lexp> \"==\" <lexp> \n"
" | <lexp> \"in\" <lexp> ; \n"
" \n"
" typeident : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) <ident> ; \n"
" decls : (<typeident> '=' ( <number> | <character> | <string> | <boolean> | <term> ) <index>* ';')* ; \n"
" args : <typeident>? (',' <typeident>)* ; \n"
" body : '{' <decls> <stmt>* '}' ; \n"
" procedure : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) ':' <ident> '(' <args> ')' <body> ; \n"
" use : (\"use\" <string>)* ; \n"
" xenon : /^/ <use> <decls> <procedure>* /$/ ; \n"
The issue is at this part: body: '{' <decls> <stmt>* '}';. The grammar is fine, but when parsing a file, it complains when I have a decls after a stmt. How do I make MPC not care about the order in which they occur?
Hello.
Suppose I have an AST with tag "Root" and three children ASTs with the following tags:
Suppose I want to get the child AST "Operator" while traversing a tree. Currently I do something like:
mpc_ast_t *operator = ast->children[OP_IND]; /* OP_IND == 1 */
But if I'm building a big parser and the AST gets very complex, it will be cumbersome to hardcode so many indexes for every node/child combination.
A good way to solve this in the current version is to implement a function that returns the index number given a tag, or returns the pointer to the AST given the tag, like:
int mpc_ast_get_index(mpc_ast_t *ast, char *tag);
mpc_ast_t *mpc_ast_get_child(mpc_ast_t *ast, char *tag);
As far as I explored, MPC doesn't have such a feature. Would this be something practical to have?
(Note that the "char *tag" parameter could be replaced by an integer tag identifier, as discussed in my previous issue.)
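To make the proposal concrete, here is a rough sketch of the two helpers written against a stand-in node type. demo_ast_t and both function names are hypothetical, and the children_num field is my assumption about how the child count is stored. Matching with strstr is also just one possible choice, since tags look like "operator|char".

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the AST node, with only the fields the lookup needs. */
typedef struct demo_ast {
  const char *tag;
  int children_num;
  struct demo_ast **children;
} demo_ast_t;

/* Index of the first child whose tag contains `tag`, or -1 if none. */
int demo_ast_get_index(const demo_ast_t *ast, const char *tag) {
  int i;
  for (i = 0; i < ast->children_num; i++) {
    if (strstr(ast->children[i]->tag, tag) != NULL) { return i; }
  }
  return -1;
}

/* Pointer to that child, or NULL if no child matches. */
demo_ast_t *demo_ast_get_child(const demo_ast_t *ast, const char *tag) {
  int i = demo_ast_get_index(ast, tag);
  return i >= 0 ? ast->children[i] : NULL;
}
```

With helpers like these the traversal code would read demo_ast_get_child(ast, "operator") instead of ast->children[OP_IND], at the cost of a linear scan over the children.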
Hi!
I've been working through the examples in the readme (which, by the way, are not really up to date) and buildyourownlisp. Updating mpc today led to a segfault on my mpca_lang instantiation.
It seems to be a NULL pointer problem, but I don't really have the time to dig into it right now. Any clue?
example source
#include <stdio.h>
#include <stdlib.h>
#include "mpc.h"
int main(int argc, char** argv) {
mpc_parser_t *Expr = mpc_new("expr");
mpc_parser_t *Value = mpc_new("value");
mpc_parser_t *Maths = mpc_new("maths");
mpc_result_t result;
char* input = "123";
mpca_lang(MPCA_LANG_DEFAULT,
" \
expression : <value> ; \
value : /[0-9]+/ | <expression> ; \
maths : /^/ <expression> /$/ ; \
",
Expr, Value, Maths);
if (mpc_parse("input", input, Value, &result)) {
mpc_ast_print(result.output);
mpc_ast_delete(result.output);
} else {
mpc_err_print(result.error);
mpc_err_delete(result.error);
}
mpc_cleanup(3, Expr, Value, Maths);
}
gdb session
(gdb) b main
Breakpoint 1 at 0x400dfc: file src/main.c, line 7.
(gdb) r
Starting program: /home/oleiade/Dev/Sandbox/C/config/./bin/debug/config
Breakpoint 1, main (argc=1, argv=0x7fffffffe148) at src/main.c:7
7 mpc_parser_t *Expr = mpc_new("expr");
(gdb) n
8 mpc_parser_t *Value = mpc_new("value");
(gdb)
9 mpc_parser_t *Maths = mpc_new("maths");
(gdb)
11 char* input = "123";
(gdb)
13 mpca_lang(MPCA_LANG_DEFAULT,
(gdb)
Program received signal SIGSEGV, Segmentation fault.
0x0000000000409e46 in mpca_grammar_find_parser (x=0x618c10 "expression", st=0x7fffffffdf40) at src/mpc.c:2886
2886 if (p->name && strcmp(p->name, x) == 0) { return p; }
Hello,
I'm trying to parse decimal numbers using this regexp:
mpca_lang(MPCA_LANG_DEFAULT,
"
number : /-?[0-9]+(.[0-9]*)?/ ;
"
This works, but I always have to specify a decimal point.
lispy> list 1.0 2.0 3.0 4.0
regex
expr|symbol|string:1:1 'list'
expr|number|regex:1:6 '1.0'
expr|number|regex:1:10 '2.0'
expr|number|regex:1:14 '3.0'
expr|number|regex:1:18 '4.0'
regex
{1.000000 2.000000 3.000000 4.000000}
but if I try
lispy> list 1 2 3 4
regex
expr|symbol|string:1:1 'list'
expr|number|regex:1:6 '1 2'
expr|number|regex:1:10 '3 4'
regex
{1.000000 3.000000}
so it seems the number is not parsed correctly.
I'm sure it's my fault, but I cannot understand the error in my regexp.
thanks
Fausto
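One thing that might explain the '1 2' token: an unescaped . in a regex matches any character, including a space. The sketch below shows the effect with POSIX regex.h rather than mpc's own regex engine, so it assumes mpc treats . the same way (the output above suggests it does). full_match and the two macro names are made up for the example.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` matches the POSIX extended regex `pattern`,
   0 if it does not, and -1 if the pattern fails to compile.
   The patterns below carry their own ^ and $ anchors. */
int full_match(const char *pattern, const char *text) {
  regex_t re;
  int status;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) { return -1; }
  status = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  return status == 0;
}

/* Unescaped '.' matches any character, so "1 2" is one number. */
#define NUMBER_LOOSE  "^-?[0-9]+(.[0-9]*)?$"
/* "\\." in C source is the two characters \. in the regex, which
   matches a literal dot only. */
#define NUMBER_STRICT "^-?[0-9]+(\\.[0-9]*)?$"
```

Inside the grammar string passed to mpca_lang the dot would then be written as \\. so that the C compiler leaves a single backslash in front of it.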
I wrote the following simple test program, test.c:
#include <stdio.h>
#include <stdlib.h>
#include "mpc/mpc.h"
int main(int argc, char **argv) {
mpc_result_t result;
if (argc != 2) {
fprintf(stderr, "Usage: test filename\n");
exit(1);
}
mpc_parser_t *asdf = mpc_new("asdf");
mpc_parser_t *jkl = mpc_new("jkl");
mpc_parser_t *line = mpc_new("line");
mpc_parser_t *braceLine = mpc_new("braceLine");
mpc_parser_t *program = mpc_new("program");
mpca_lang(MPCA_LANG_DEFAULT,
"asdf : \"asdf\";"
"jkl : \"jkl;\";"
"line : <asdf>* <jkl>;"
"braceLine : <line> | ('{' <program> '}');"
"program : <braceLine>*;",
asdf, jkl, line, braceLine, program, NULL);
if (mpc_parse_contents(argv[1], program, &result)) {
mpc_ast_print(result.output);
mpc_ast_delete(result.output);
}
else {
fprintf(stderr, "Error!\n");
mpc_err_print(result.error);
mpc_err_delete(result.error);
}
mpc_cleanup(5, asdf, jkl, line, braceLine, program);
return 0;
}
This program runs correctly when the input is valid, and crashes with a segfault when the input is invalid, before reaching the "Error!" fprintf, so the segfault is in mpc_parse_contents. I ran gdb and got the following result:
Program received signal SIGSEGV, Segmentation fault.
0x080502c3 in mpc_ast_print_depth (a=0x0, d=0, fp=0xb374e0) at mpc/mpc.c:2604
2604 if (strlen(a->contents)) {
Let me know if you need any more information.
I have a grammar in C-style string:
identifier : /[a-zA-Z0-9_]+/ ;
lambda : '\\' <identifier> '.' <expr> ;
application : <lambda> ' ' <expr> ;
expr : <lambda> | <application> | <identifier> ;
lizp : /^/ <expr>* /$/ ;
I want to parse codes like:
\x.x y
\x.x \y.y z
...
I use this grammar in mpca_lang() with the flag MPCA_LANG_WHITESPACE_SENSITIVE and build a binary, but the binary reports a "Parser Undefined" error.
I think it's caused by the '\\' in the second rule; when I replace the '\\' with another notation, it works.
Is this a bug? If not, how should I handle it?
Hi,
first of all thanks for your wonderful book!
I am having difficulty adding support for doubles and wondered if you could offer any tips?
I've added a parser for doubles and all the rest of the plumbing as per number but can't seem to get past the first stage, defining the regex for the parser:
mpca_lang(MPCA_LANG_DEFAULT,
"
number : /-?[0-9]+/ ;
doub : /-?[0-9]+.[0-9]+/ ;
symbol : "list" | "head" | "tail"
| "join" | "eval" | "len"
...
I also get the following error message when typing in say '1.0' into my interpreter.
:1:2: error: expected one of '0123456789', whitespace, '-', one or more of one of '0123456789', "list", "head", "tail", "join", "eval", "len", "cons", "init", '+', '*', '/', '(', '{' or end of input at '.'
thanks in advance,
Jude
Please check out my gist:
https://gist.github.com/yihuang/0af450e858daf2d99138
So this issue is a little out of sorts, I guess. I've tried to modify mpc to parse <base> { <factor> } instead of <base> { <digits> }, as I would like to specify n-separated lists of x by writing x{n} in the grammar.
To begin, I modified mpcaf_grammar_repeat() like so:
static mpc_val_t *mpcaf_grammar_repeat(int n, mpc_val_t **xs) {
(void) n;
if (xs[1] == NULL) { return xs[0]; }
if (strcmp(xs[1], "*") == 0) { free(xs[1]); return mpca_many(xs[0]); }
if (strcmp(xs[1], "+") == 0) { free(xs[1]); return mpca_many1(xs[0]); }
if (strcmp(xs[1], "?") == 0) { free(xs[1]); return mpca_maybe(xs[0]); }
if (strcmp(xs[1], "!") == 0) { free(xs[1]); return mpca_not(xs[0]); }
return mpca_and(2, xs[0], mpca_many(mpca_and(2, xs[1], xs[0])));
}
And modified mpca_lang_st to define Factor as:
mpc_define(Factor, mpc_and(2, mpcaf_grammar_repeat,
Base,
mpc_or(6,
mpc_sym("*"),
mpc_sym("+"),
mpc_sym("?"),
mpc_sym("!"),
mpc_tok_brackets(Factor, free),
mpc_pass()),
mpc_soft_delete
));
Sample grammar and the result of mpc_print()'ing the generated parser:
list : "text"{","} ;
->
(<:> ((<S> ("text" whitespace)) ((<S> ("," whitespace)) (<S> ("text" whitespace)))*))
So the above looks good, but my problem is that parsing this new grammar seems to be causing a segfault after a few iterations of mpc_undefine_unretained() (upon accessing the passed in parser):
Looking at the stack trace, it looks like the calls to mpc_undefine_and() match up with the generated parser. The only thing I can think of is that xs[0] in the first snippet is being freed and then accessed again in the later and() parser that combines xs[1] and xs[0]. I'm still not super comfortable with mpc's source so I'm not really sure what to do at this point.
Any ideas?
I'm trying to replace an awful pile of buggy (legacy) regex hacks with a proper parser, and I hit an odd situation. Consider the following minimal example:
mpc_result_t r;
mpc_parser_t *comment = mpc_many(mpcf_strfold, mpc_noneof("*#"));
char *input = strdup("not a #comment");
if(mpc_parse("input", input, comment, &r)) {
printf("\"%s\"\n", r.output); // Quoting to display whitespace bounds
} else { mpc_err_print(r.error); mpc_err_delete(r.error); }
That works, but leaves behind any space between the end of the statement and the beginning of the comment. I thought I could simply remove them with one of the provided parsers, but they don't seem to work as I'd expect. I tried this first:
mpc_parser_t *comment = mpc_stripr(mpc_many(mpcf_strfold, mpc_noneof("*#")));
...and then the same with mpc_strip() and mpc_tok(), but the output continues to be
"not a "
I don't know if this is just a doc bug or a library bug or what, but something certainly feels wrong here. What can we do about it?
I am currently using your (awesome) project for a doodle of mine. While browsing through it and trying a rewrite (to understand it better and to fit my personal coding style), I saw that you do not cast the result of malloc and the like to the appropriate pointer types, which does not play well with the picky extensions I normally use. There are a few more occurrences of missing casts (namely when using sizeof). Should I create a pull request for this, or do you deem casts too ugly?
Also, you are not checking the return value of realloc. Fair enough, but I rewrote that to at least check it in the mpc_stack_parsers_reserve_* functions, where failing to allocate more memory could happen. Should I create a pull request for that, too?
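For reference, the kind of check meant here could look roughly like the sketch below. It is not the actual change to the mpc_stack_parsers_reserve_* functions. grow_ints is a hypothetical helper, and it shows both the cast and the realloc check.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *buf so it can hold new_count ints.
   Returns 1 on success. On failure it returns 0 and leaves *buf
   untouched, because realloc does not free the original block when
   it fails. */
int grow_ints(int **buf, size_t new_count) {
  int *tmp;
  /* Reject empty requests and sizes whose byte count would overflow. */
  if (new_count == 0 || new_count > (size_t) -1 / sizeof(int)) { return 0; }
  tmp = (int *) realloc(*buf, new_count * sizeof(int));
  if (tmp == NULL) { return 0; }
  *buf = tmp;
  return 1;
}
```

Assigning the result to a temporary first is the important part: writing buf = realloc(buf, n) directly would lose the only pointer to the old block if the call fails.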
Hi ,
First of all, thank you for the awesome tutorial and the library. I have reached the Polish evaluation portion and I did the evaluation with the grammar
expr : |'(' +')';
Now I want to change the grammar to this:
expr : | +;
But I am getting confused: I am unable to visualise what the AST will look like or to understand the tree structure that is generated.
(+ 2 3)(+ 2 3) <- For the first grammar
-> lispy|>
regex
operator|char:1:1 '*'
expr|>
char:1:3 '('
operator|char:1:5 '+'
expr|number|regex:1:7 '2'
expr|number|regex:1:9 '3'
char:1:10 ')'
expr|>
char:1:12 '('
operator|char:1:13 '+'
expr|number|regex:1:15 '2'
expr|number|regex:1:17 '3'
char:1:18 ')'
regex
lispy|>
regex
operator|char:1:1 '*'
expr|>
operator|char:1:3 '+'
expr|number|regex:1:5 '2'
expr|number|regex:1:7 '3'
expr|>
operator|char:1:9 '+'
expr|number|regex:1:11 '2'
expr|number|regex:1:13 '3'
regex
Hoping that you will help me .
Hi!
Just wanted to say that I really like the library, but I had two issues with it. There appear to be a few memory-related problems. I worked around this in my project, which uses mpc, with a wrapper around realloc, replacing all calls to realloc with it:
static void *reallocate(void *p, size_t n)
{
/**@bug something is not allocating enough memory, the +1 fixes the Valgrind issues*/
errno = 0;
void *r = realloc(p, n+1);
if(!r)
fatal("reallocate failed: %s", errno ? strerror(errno) : "unknown reason");
return r;
}
As a temporary fix.
This problem is reproducible with mpc's "test" program. I've attached the output of valgrind. Presumably one of the reallocs somewhere is missing a +1 or something.
The second, minor, issue: I get the same build error as in #60. Since strtof was introduced in C99, specifying the -ansi flag means strtof is never declared, which is allowed behavior. The fix would be to specify -std=c99 in the makefile, as C99 functions are used.
Output of uname -a:
Linux dhcppc2 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux
This was on commit 37c12b1.
Thanks!
In my attempt to extend Lispy for a double type, I created the following grammar (full source code):
mpc_parser_t* Integer = mpc_new("integer");
mpc_parser_t* Double = mpc_new("double");
mpc_parser_t* Symbol = mpc_new("symbol");
mpc_parser_t* Sexpr = mpc_new("sexpr");
mpc_parser_t* Expr = mpc_new("expr");
mpc_parser_t* Lispy = mpc_new("lispy");
const char *grammar =
" \n\
double : /-?[0-9]+.[0-9]+/ | ; \n\
integer : /-?[0-9]+/ | ; \n\
symbol : '+' | '-' | '*' | '/' | '%' | '^' | \"min\" | \"max\" | \"inc\" | \"dec\" ; \n\
sexpr : '(' <expr>* ')' ; \n\
expr : <double> | <integer> | <symbol> | <sexpr> ; \n\
lispy : /^/ <expr>* /$/ ; \n\
\n";
printf("Grammar: %s\n", grammar);
mpca_lang(MPCA_LANG_DEFAULT, grammar,
Double, Integer, Symbol, Sexpr, Expr, Lispy);
Unfortunately, this gets my program stuck in mpc_parse(); (despite running on Linux;).
I've written an expression like so "decimal : /-?[0-9]+.[0-9]+/;", however it will treat 4 4 and 4.0 both as 4.0.
Hi orangeduck,
I'm a reader of your Build Your Own Lisp; thank you for this great book first :)
I am also a user of clib.
When I install mpc via clib in a clean C project, everything works fine.
Then I created a package.json according to clib's spec, like:
{
"dependencies": {
"orangeduck/mpc": "0.8.6"
}
}
But when I execute clib install with the package.json above, an error shows up:
~> clib install
fetch : orangeduck/mpc:package.json
error : unable to fetch orangeduck/mpc:package.json
So I dug into this problem and found out that clib tries to fetch the package.json from
[https://raw.githubusercontent.com/orangeduck/mpc/0.8.6/package.json HTTP/1.1]
Check this out: https://github.com/clibs/clib/blob/d05a64a6c40add19d314b2fc639ec58c9a014309/deps/clib-package/clib-package.c#L460
Personally, I don't think this is a bug in clib, but the problem can easily be fixed by setting a git tag for mpc.
What do you think?
I am reading your book and I would like to know whether this parser could be used to build a compiler, or whether the library is only useful for interpreting a language and not for compiling it.
First off, thanks for the amazing little library!
I have been working with a friend on an academic project, in which we had to implement an interpreter for a tiny given language. This language has to support compound array and list data types. This means we want to be able to define, for example, this kind of variable:
list[int][] x
(an array of integer lists)
The grammar that we defined at the start was this:
singletype : "int" | "bool" | "char" ;
array : <type> "[]" ;
list : "list" '[' <type> ']' ;
type : <singletype> | <list> | <array> ;
But mpc keeps falling into an infinite loop and we have no idea why...
For now we have removed the complex array support, leaving only arrays of primitive types with N depth, using this:
singletype : "int" | "bool" | "char" ;
array : <singletype> ("[]")+ ;
list : "list" '[' <type> ']' ;
type : <list> | <array> | <singletype> ;
Still, we cannot work out whether the problem is in the library or in our grammar definition.
Any ideas are appreciated.
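One probable cause is left recursion: in the first grammar, array begins with <type>, and type can in turn try <array> at the same input position, so a recursive-descent parser can re-enter the same rule without consuming anything. A common workaround is to parse a base type first and then consume any number of "[]" suffixes in a loop. The following is a hand-written sketch of that idea, not mpc code; all function names are hypothetical and whitespace is not handled.

```c
#include <assert.h>
#include <string.h>

/* Each parse function returns the number of characters consumed,
   or -1 on failure. */
static int parse_type(const char *s);

/* base : "int" | "bool" | "char" | "list" '[' type ']' */
static int parse_base(const char *s) {
  static const char *prims[] = { "int", "bool", "char" };
  size_t i;
  int inner;
  for (i = 0; i < sizeof prims / sizeof prims[0]; i++) {
    size_t n = strlen(prims[i]);
    if (strncmp(s, prims[i], n) == 0) { return (int)n; }
  }
  if (strncmp(s, "list[", 5) != 0) { return -1; }
  inner = parse_type(s + 5);
  if (inner < 0 || s[5 + inner] != ']') { return -1; }
  return 5 + inner + 1;
}

/* type : base ("[]")*
   The array suffix is consumed in a loop after the base type, so
   parse_type never calls itself before some input has been consumed. */
static int parse_type(const char *s) {
  int n = parse_base(s);
  if (n < 0) { return -1; }
  while (s[n] == '[' && s[n + 1] == ']') { n += 2; }
  return n;
}

/* Returns 1 if the whole string is a valid type, 0 otherwise. */
int is_type(const char *s) {
  int n;
  assert(s != NULL);
  n = parse_type(s);
  return n >= 0 && s[n] == '\0';
}
```

In mpc grammar notation the same shape would be roughly a base rule followed by ("[]")*, which also covers list[int][] from the example above.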
Yeah, I was very happy to see this project a few minutes ago, thanks for your work.
Useful code.
I'm writing code using mpc.
#define STRUCTURE \
" \n" \
"number : /-?[0-9]+(\\.[0-9]*)?(e[0-9]+)?/ ; \n" \
"factor : '(' <lexp> ')' \n" \
" | <number> \n" \
" | <string> \n" \
" | <array> \n" \
" | <hash> \n" \
" | <lambda> \n" \
" | <call> \n" \
" | <item> \n" \
" | <ident> ; \n" \
"string : /\"[^\"]*\"/ ; \n" \
"array : '[' <lexp>? (',' <lexp>)* ']' ; \n" \
"pair : <string> ':' <lexp> ; \n" \
"hash : '{' <pair>? (',' <pair>)* '}' ; \n" \
"ident : /[a-zA-Z][a-zA-Z0-9_]*/ ; \n" \
" \n" \
"term : <factor> (('*' | '/' | '%') <factor>)* ; \n" \
"lexp : <term> (('+' | '-') <term>)* ; \n" \
"let_v : <ident> '=' <lexp> ';' ; \n" \
"item : <factor> '[' <lexp> ']' ; \n" \
"let_a : <item> '=' <lexp> ';' ; \n" \
"var : \"var\" <ident> '=' <lexp> ';' ; \n" \
"vararg : \"...\" ; \n" \
"stmts : <stmt>* ; \n" \
" \n" \
"lambda : \"func\" " \
" '(' <ident>? (<vararg> | (',' <ident>)*) ')' '{' <stmts> '}' ; \n" \
"func : \"func\" <ident> " \
" '(' <ident>? (<vararg> | (',' <ident>)*) ')' '{' <stmts> '}' ; \n" \
" \n" \
"call : <ident> '(' <lexp>? (',' <lexp>)* ')' ; \n" \
"return : \"return\" <lexp> ';' ; \n" \
"comment : /#[^\n]*/ ; \n" \
"eof : /$/ ; \n" \
"stmt : (<let_v> | <let_a> | <var> | (<lexp> ';') " \
" | <func> | <return> | <comment>) ; \n" \
"program : <stmts> <eof> ; \n"
It seems <item> makes it hang.
Hi,
Instead of a detailed error message, a generic "Unknown Error" is displayed in some cases. The bug appeared in commit 227dd44.
To be specific:
I'm working on the code from chapter 6 of Build Your Own Lisp.
Example:
lispy>xxx
<stdinput>: error: Unknown Error
This works:
lispy> + 1 x
<stdinput>:1:5: error: expected '-', one or more of one of '0123456789', '(' or end of input at 'x'
Hello,
While playing with the code from your book Build Your Own Lisp (which is great, by the way!), using Clang's address sanitizer, I noticed that a buffer overflow was detected upon use of mpc_err_print. Consider the following MWE:
#include "mpc.h"
int main(int argc, char** argv) {
mpc_parser_t* Foobar = mpc_new("foobar");
mpca_lang(MPCA_LANG_DEFAULT, "foobar : \"foo\" | \"bar\";", Foobar);
mpc_result_t r;
if (mpc_parse("<stdin>", argv[1], Foobar, &r)) {
mpc_ast_print(r.output);
mpc_ast_delete(r.output);
} else {
mpc_err_print(r.error);
mpc_err_delete(r.error);
}
mpc_cleanup(1, Foobar);
return 0;
}
When this is compiled (on Ubuntu 14.04 with clang-3.5.1) with
clang -fsanitize=address -std=c99 -Wall test.c mpc.c -lm -o test
I get the following results:
$ ./test foo
string:1:1 'foo'
$ ./test baz
=================================================================
==21438==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000001371ca3 at pc 0x0000004462b0 bp 0x7fffe8244c10 sp 0x7fffe82443d0
READ of size 4 at 0x000001371ca3 thread T0
#0 0x4462af (/home/<snip>/test+0x4462af)
#1 0x4471ee (/home/<snip>/test+0x4471ee)
#2 0x4bcb0c (/home/<snip>/test+0x4bcb0c)
#3 0x4bc2dc (/home/<snip>/test+0x4bc2dc)
#4 0x4bb12e (/home/<snip>/test+0x4bb12e)
#5 0x4baf84 (/home/<snip>/test+0x4baf84)
#6 0x4ba7e3 (/home/<snip>/test+0x4ba7e3)
#7 0x7fb57c3bbec4 (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
#8 0x4ba36c (/home/<snip>/test+0x4ba36c)
0x000001371ca3 is located 0 bytes to the right of global variable 'char_unescape_buffer' defined in 'mpc.c:125:13' (0x1371ca0) of size 3
SUMMARY: AddressSanitizer: global-buffer-overflow ??:0 ??
Shadow bytes around the buggy address:
0x000080266340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080266350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080266360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080266370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080266380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000080266390: 00 00 00 00[03]f9 f9 f9 f9 f9 f9 f9 00 00 00 00
0x0000802663a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000802663b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000802663c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000802663d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000802663e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
ASan internal: fe
==21438==ABORTING
I think I have narrowed the problem down to the call to mpc_err_char_unescape(x->recieved) on line 181 of mpc.c, but unfortunately I'm just a beginner at C and haven't got any further than that!
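The report says the read runs past a 3-byte global. One plausible reading, offered only as a guess, is that a quoted character such as 'x' needs three visible characters plus a terminating NUL, which is four bytes. The sketch below is a hypothetical helper, not mpc's implementation, showing a buffer sized for that case.

```c
#include <assert.h>
#include <stdio.h>

/* Longest output is a quoted character such as 'x':
   three visible characters plus the terminating NUL. */
#define CHAR_DESC_SIZE 4

/* Hypothetical helper: returns a printable description of `c`.
   Escapes come from string literals; anything else is quoted into
   the caller-supplied buffer, which must hold CHAR_DESC_SIZE bytes. */
const char *describe_char(char c, char *buf) {
  assert(buf != NULL);
  switch (c) {
    case '\n': return "\\n";
    case '\t': return "\\t";
    case '\0': return "\\0";
    default:
      snprintf(buf, CHAR_DESC_SIZE, "'%c'", c);
      return buf;
  }
}
```

Passing the buffer in from the caller also avoids sharing one static buffer between calls.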
Hello! I am trying to write a parser for a simple scripting language. I completed my grammar and tried to run it, but mpc_parse runs for about 25 seconds and then crashes for no apparent reason.
The biggest problem is that the error is not always in the same spot. Sometimes it is in mpc_error, sometimes in mpc_delete. It seems random.
I will give you part of my code and I hope you can tell me whether it is a problem in my code or whether I should start debugging mpc!
My code:
int main()
{
mpc_parser_t* Int = mpc_new("int");
mpc_parser_t* Char = mpc_new("char");
mpc_parser_t* String = mpc_new("string");
mpc_parser_t* Id = mpc_new("id");
mpc_parser_t* Type = mpc_new("type");
mpc_parser_t* Formal = mpc_new("formal");
mpc_parser_t* Header = mpc_new("header");
mpc_parser_t* FuncDecl = mpc_new("funcdecl");
mpc_parser_t* VarDef = mpc_new("vardef");
mpc_parser_t* Expr = mpc_new("expr");
mpc_parser_t* Call = mpc_new("call");
mpc_parser_t* Atom = mpc_new("atom");
mpc_parser_t* Simple = mpc_new("simple");
mpc_parser_t* SimpleList = mpc_new("simplelist");
mpc_parser_t* Stmt = mpc_new("stmt");
mpc_parser_t* FuncDef = mpc_new("funcdef");
mpc_parser_t* Program = mpc_new("program");
// Define them with the following Language
mpca_lang(MPCA_LANG_DEFAULT,
" \
int : /-?[0-9]+/ ; \
char : /'[a-zA-Z0-9!@#$%^&*()\\_+-,.\\/<>?;'|\"`~]'/ ; \
string : /\"(\\\\.|[^\"])*\"/ ; \
id : /[a-zA-Z][a-zA-Z0-9_-]*/ ; \
type : \"int\" | \"bool\" | \"char\" | <type> '[' ']' | \"list\" '[' <type> ']' ; \
formal : (\"ref\")? <type> <id> (',' <id>)* ; \
header : <type>? <id> '(' (<formal> (';' <formal>)*)? ')' ; \
funcdecl : \"decl\" <header> ; \
vardef : <type> <id> (',' <id>)* ; \
expr : <atom> | <int> | <char> | '(' <expr> ')' \
| ('+' | '-') <expr> | <expr> ('+' | '-' | '*' | '/' | \"mod\") <expr> \
| <expr> ('=' | \"<>\" | '<' | '>' | \"<=\" | \">=\") <expr> \
| \"true\" | \"false\" | \"not\" <expr> | <expr> (\"and\" | \"or\") <expr> \
| \"new\" <type> '[' <expr> ']' | \"nil\" | \"nil?\" '(' <expr> ')' \
| <expr> '#' <expr> | \"head\" '(' <expr> ')' | \"tail\" '(' <expr> ')' ; \
call : <id> '(' (<expr> (',' <expr>)*)? ')' ; \
atom : <id> | <string> | <atom> '[' <expr> ']' | <call> ; \
simple : \"skip\" | <atom> \":=\" <expr> | <call> ; \
simplelist : <simple> (',' <simple>)* ; \
stmt : <simple> | \"exit\" | \"return\" <expr> \
| \"if\" <expr> ':' <stmt>+ (\"elif\" <expr> ':' <stmt>+)* \
(\"else\" ':' <stmt>+)? \"end\" \
| \"for\" <simplelist> ';' <expr> ';' <simplelist> ':' <stmt>+ \"end\" ; \
funcdef : \"def\" <header> ':' (<funcdef> | <funcdecl> | <vardef>)* <stmt>+ \"end\" ; \
program : /^/ <funcdef> /$/ ; \
",
Int, Char, String, Id, Type, Formal, Header, FuncDecl, VarDef, Expr,
Call, Atom, Simple, SimpleList, Stmt, FuncDef, Program);
mpc_result_t r;
char* input = "def hey () : return 1 end";
if(mpc_parse("input", input, Program, &r))
{
mpc_ast_print((mpc_ast_t*)r.output);
mpc_ast_delete((mpc_ast_t*)r.output);
}
else
{
mpc_err_print(r.error);
mpc_err_delete(r.error);
}
PAUSE("Press any key to continue . . .");
// Undefine and Delete our Parsers
mpc_cleanup(17, Int, Char, String, Id, Type, Formal, Header, FuncDecl, VarDef, Expr,
Call, Atom, Simple, SimpleList, Stmt, FuncDef, Program);
return 0;
}
The grammar I used in a less obfuscated version (it's hard to read on code):
I am really sorry for the long post. I just thought it would be better to post my error before spending hours trying to fix it. Maybe you have seen it before and can pinpoint the problem immediately.
Thanks for your hard work. Mpc is a great piece of code and very helpful.
Hello,
I am trying to implement a double type in my Build Your Own Lisp language. I am using this regex to parse them: /-?[0-9]+\\.[0-9]+/. When I try to use a double in my REPL, I get a parser error. I am very confused, because I have the dot escaped properly.
The exact error is <stdin>:1:10: error: expected one of '0123456789', '-', one or more of one of '0123456789', one or more of one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV WXYZ0123456789_+-^%*/\=<>!&', '"', '#', '(', '[' or end of input at '.'
Why am I getting this error? Is this a bug?
I do not know whether POSIX compliance is an issue to you, but according to them, the typenames ending with _t are reserved for them (you can find a complete list of all reserved names here).
I could make this a pull request, but I figured that it is one substitution command and you might want to change the names completely then (because without the ending _t they sound a bit awkward, I guess).
Feel free to close this issue if it is not a problem to you.
Like we've started discussing in #48, the idea is to have a simple way to traverse a tree in either pre order or post order.
To start we have the following save point:
typedef struct mpc_ast_trav_t {
mpc_ast_t *curr_node;
struct mpc_ast_trav_t *parent;
int curr_child;
mpc_ast_trav_order_t order;
} mpc_ast_trav_t;
The order type is the following enum:
typedef enum {
mpc_ast_trav_order_pre,
mpc_ast_trav_order_post
} mpc_ast_trav_order_t;
Using the start function:
mpc_ast_trav_t *mpc_ast_traverse_start(mpc_ast_t *ast,
mpc_ast_trav_order_t order);
We get a save point which just keeps information about the current location in the tree. To iterate through it we can use the "next" function:
mpc_ast_t *mpc_ast_traverse_next(mpc_ast_trav_t *trav);
The idea is that we can traverse the whole tree using only the next function. It will be practical in some applications. For example, to convert the whole tree to another specification (like we discussed in a previous issue, where we wanted to convert from mpc_ast_t to another tree specification).
I've already started implementing these functions, but haven't finished yet.
Note that for now I've only thought about two orders, pre and post. Pre order will make the "next" function yield nodes "x", "a", "b" in the following tree:
X
/ \
A B
While post order would make "next" yield "a", "b", "x". Note also that because the trees are not necessarily binary, we don't get infix, prefix, or postfix.
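The two orders can be illustrated with a small recursive walk over an n-ary tree. This is only a sketch of the visiting order: it uses a hypothetical node type standing in for mpc_ast_t and plain recursion instead of the proposed start/next iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical n-ary node, standing in for mpc_ast_t. */
typedef struct node {
  char label;
  int children_num;
  struct node **children;
} node;

typedef enum { ORDER_PRE, ORDER_POST } trav_order;

/* Appends labels to `out` in the requested order and returns the new
   write position. The caller must supply a large enough buffer. */
static size_t walk(const node *n, trav_order order, char *out, size_t pos) {
  int i;
  if (order == ORDER_PRE) { out[pos++] = n->label; }
  for (i = 0; i < n->children_num; i++) {
    pos = walk(n->children[i], order, out, pos);
  }
  if (order == ORDER_POST) { out[pos++] = n->label; }
  return pos;
}

/* Writes the visit order of the tree rooted at `root` into `out`
   as a NUL-terminated string. */
void visit_order(const node *root, trav_order order, char *out) {
  size_t end;
  assert(root != NULL && out != NULL);
  end = walk(root, order, out, 0);
  out[end] = '\0';
}
```

For the three-node tree above, ORDER_PRE gives "xab" and ORDER_POST gives "abx". An iterator version would keep the same order but store the position explicitly, as the save point struct does.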
I'm using the Lizard plugin (https://github.com/terryyin/lizard) to analyze cyclomatic complexity in mpc. The analysis reported the following:
NLOC  CCN  token  PARAM  length  location
 170   62   1790      4     221  mpc_parse_run
  34   19    241      2      46  mpc_undefine_unretained
  60   21    631      1      74  mpc_copy
  21   17    222      1      21  mpc_re_escape_char
  95   30    863      2     113  mpc_print_unretained
  28   17    300      2      37  mpcf_fold_ast
  28   16    379      2      38  mpc_nodecount_unretained
 137   58   2304      2     166  mpc_optimise_unretained
How can I reduce the risk of future defects? Please advise. Thank you.
I tried creating your example. When I try:
mpc_ast_print(r.output);
I get syntax errors.
I use a Windows compiler, though. Am I missing something?
Hi ...
While coding along with the buildyourownlisp.com tutorial, I stumbled upon an endless loop.
The code is the following:
#include <stdlib.h>
#include <stdio.h>
/* The parser combinator lib */
#include "mpc.h"
#ifdef _WIN32
#include <string.h>
/* Declare a buffer for user input size 2048 */
static char buffer[2048];
/* Fake readline function */
char * readline(char* prompt)
{
fputs(prompt, stdout);
fgets(buffer, 2048, stdin);
char* cpy = malloc(strlen(buffer) + 1);
strcpy(cpy, buffer);
cpy[strlen(cpy) - 1] = '\0';
return cpy;
}
/* Fake add_history function */
void add_history(char* unused) {}
/* Otherwise include the editline headers */
#else
#include <readline/readline.h>
#include <readline/history.h>
#endif
int main(int argc, char ** argv)
{
/* Create some parsers */
mpc_parser_t* Number = mpc_new("number");
mpc_parser_t* Operator = mpc_new("operator");
mpc_parser_t* Expr = mpc_new("expr");
mpc_parser_t* Lispy = mpc_new("lispy");
/* Define them with the following Language */
mpca_lang(MPCA_LANG_DEFAULT,
" \
number : /-?[0-9]?/ ; \
operator : '+' | '-' | '*' | '/' ; \
expr : <number> | '(' <operator> <expr>+ ')' ; \
lispy : /^/ <operator> <expr>+ /$/ ; \
",
Number, Operator, Expr, Lispy);
/* Print version and exit information */
puts("Lispy Version 0.0.0.0.1");
puts("Press Ctrl+c to Exit.\n");
while (1)
{
char* input = readline("lispy> ");
add_history(input);
/* Attempt to parse the user input */
mpc_result_t r;
if (mpc_parse("<stdin>", input, Lispy, &r))
{
/* On success print the AST */
mpc_ast_print(r.output);
mpc_ast_delete(r.output);
}
else
{
/* Otherwise print the error */
mpc_err_print(r.error);
mpc_err_delete(r.error);
}
free(input);
}
/* Clean up the parser */
mpc_cleanup(4, Number, Operator, Expr, Lispy);
return 0;
}
The endless loop happens in mpc.c in the function mpc_parse_input, in the while loop on line 887, in the case MPC_TYPE_MANY1. I put a printf into the while() body, printing j and results_slots, and those numbers increase steadily. (Tested under Cygwin and OSX 10.10).
I'm not sure if this is correct behavior or not, but it's causing me some execution issues.
My rule expects <inner_block>, which contains (<statement> | <comment>)*. When there is only one child in <inner_block>, the <inner_block> is optimized out in favor of the child, which I do not expect to interpret at that level. Here's the input:
[my_func]
echo (a b c)
...and the grammar as defined by mpca_lang():
qscript : /^/ (<comment> | <resource>)* /$/ ;
comment : '#' /[^\\n]*/ ;
resource : '[' (<rtype> <rname>) ']' <inner_block> ;
rtype : /[*]*/ ;
rname : <qstring> ;
inner_block : (<comment> | <statement>)* ;
statement : <function> '(' (<comment> | <parameter> | <block>)* ')' <seperator> ;
function : <qstring> ;
parameter : (<statement> | <literal>) ;
literal : (<number> | <qstring>) <seperator> ;
block : '{' <inner_block> '}' ;
seperator : ',' | \"\" ;
qstring : (<complexstr> | <simplestr>) <qstring>* ;
simplestr : /[a-zA-Z0-9_!@#$%^&\\*_+\\-\\.=\\/<>]+/ ;
complexstr : (/\"[^\"]*\"/ | /'[^']*'/) ;
number : (<float> | <int>) ;
float : /[-+]?[0-9]+\\.[0-9]+/ ;
int : /[-+]?[0-9]+/ ;
Here are the results from mpc_ast_print():
>
regex
resource|>
char:1:1 '['
rtype|regex
rname|qstring|simplestr|regex:1:2 'my_func'
char:1:9 ']'
statement|> <--------------- shouldn't expect a statement in <resource>!
function|qstring|simplestr|regex:2:4 'echo'
char:2:9 '('
literal|>
qstring|>
simplestr|regex:2:10 'a'
qstring|>
simplestr|regex:2:12 'b'
qstring|simplestr|regex:2:14 'c'
seperator|string
char:2:15 ')'
seperator|string
regex
I've noticed that when I comment out a certain optimization at mpc.c +2695:
if (a->children_num == 1) { return a; }
...I get the desired results:
(snip)
char:1:9 ']'
inner_block|>
statement|>
function|qstring|simplestr|regex:2:4 'echo'
(etc)
I've found that changing the <inner_block> to:
inner_block : (<comment> | <statement>)* \"\";
...which forces a token after <inner_block>, which is a nice work-around, but doesn't feel like a solution.
Edit: Otherwise awesome library, btw :)
Edit 2: Formatting
While working through Build Your Own Lisp, I mistakenly put two '+' in the grammar. The program failed with a segfault.
The problem was easy to spot, but a simple error message would be preferred.
I changed the test program examples/doge.c
if (argc > 1) {
mpc_result_t r;
int i;
for (i = 0; i < 10000; ++i)
if (mpc_parse_contents(argv[1], Doge, &r)) {
/* mpc_ast_print(r.output); */
mpc_ast_delete(r.output);
} else {
mpc_err_print(r.error);
mpc_err_delete(r.error);
}
to parse a file 10000 times.
Running examples/doge.c with an empty input file needs 0.9sec on a 1Ghz x86.
Therefore a single iteration needs ~ 0.09msec for doing nothing (except opening and closing).
Running examples/doge.c with input
so c so c so c so c so c so c so c so c so c
so c so c so c so c so c so c so c so c so c
needs 12.8sec. A single iteration needs ~ 1.3msec.
Running examples/doge.c with 8 input lines of the following form
so c so c so c so c so c so c so c so c so c
needs 54.4sec. A single iteration needs ~ 5.4msec parsing
(4 times the time needed for 2 lines of input).
A file with 2000 lines of input would need more than a second to be parsed.
That is too slow to be used in production.
Hello. I'm writing code and a blog post about building a small scripting language using mpc.
I want to implement stack traces in the next blog entry. Is there a way to get the code location?