Giter Club home page Giter Club logo

lrpeg's Introduction

lrpeg's People

Contributors

oovm avatar seanyoung avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

lrpeg's Issues

Optional `;`

definition <- XID_IDENTIFIER ws "<-" ws expression ";"? ws;

Here I can see ; should be optional, but in fact it is not, ; is required.

sequence <- alternative+;

It seems that sequence will eat the beginning of next rule.

x <- a b c 
y <- d
// equivalent to
x <- a b c y <- d
// should equivalent to
x <- a b c; y <- d

In principle, can this problem be solved?

ambiguous tokens

since the operator "~" doesn't seem to be supported and the regex crate can't do look-ahead operations what is the best way to disambiguate between identifiers and keywordsI have tried to reorder them but it's not ideal because it will either fail to parse or end up in the wrong branch.

also thank you for making lrpeg available!

NodeAPI suggestion

This is a relatively private request, only for my project.

All the interfaces I need are listed here

/// It's a node contained in the Lossless Concrete Syntax Tree
/// All subsequent required information will be retained
/// Including spaces, line breaks, and comments or other semantically irrelevant content.
/// Macros and formatting can start at this level
pub trait CSTNode where Self: Sized {
    /// get str of the node
    fn get_str(&self) -> &str;
    /// Provide basic location information
    /// (start_offset, end_offset)
    fn get_span(&self) -> (usize, usize);
    /// Provide detailed row and column ranges
    /// (start_line, start_col, end_line, end_col)
    fn get_range(&self) -> (usize, usize, usize, usize);
    /// Get the tag of the current node
    fn get_node_tag(&self) -> Option<&'static str>;
    /// Get the tag of the current branch
    fn get_branch_tag(&self) -> Option<&'static str>;
    /// Find node tags in all of the children
    /// Then collect then into the vec, and store in hashmap with tag name
    fn get_tag_map(&self) -> HashMap<String, Vec<Self>>;
}

This version is used in pest, lrpeg itself does not store the input reference, so it may look like this

pub trait CSTNode where Self: Sized {
    fn get_str<'i>(&self,input: &'i str) -> &'i str;
    fn get_span(&self, input:&str) -> (usize, usize);
    fn get_range(&self, input:&str) -> (usize, usize, usize, usize);
    fn get_node_tag(&self) -> Option<&'static str>;
    fn get_branch_tag(&self) -> Option<&'static str>;
    fn get_tag_map(&self) -> HashMap<String, Vec<Self>>;
}

Hope this helps refer to which field in Node should keep.


Why do I need two ranges (span, range)?

because the language server protocol is like this, most interfaces are rows and columns, and a few cases are offsets.

Are rules related to order?

// everything ok
IGNORE <- WHITESPACE / COMMENT;

COMMENT <- "//" re#[^\r\n]# .;

If you change the order, it will go wrong:

// failed to run custom build command
COMMENT <- "//" re#[^\r\n]# .;

IGNORE <- WHITESPACE / COMMENT;
error: failed to run custom build command for `lrpeg-test v0.1.0 (D:\Rust\lrpeg\lrpeg-test)`

Bad case report of missing label

Simplified Example

calculator <- <atom:atom> EOI;

atom <-
    Num:/ <num:num>
    Str:/ <str:str>;

num <- re#0|[1-9][0-9]*#;
str <- re#"([^\\"]|\\.)*"#;

Bad Result

(omit EOI)

Node {
    rule: calculator,
    start: 0,
    end: 1,
    children: [
        Node {
            rule: atom,
            start: 0,
            end: 1,
            children: [
                Node {
                    rule: num,
                    start: 0,
                    end: 1,
                    children: [],
                    label: None,
                    alternative: None,
                },
            ],
            label: Some("atom"),
            alternative: Some("Num"),
        },
    ],
    label: None,
    alternative: None,
}

Excepted Result

(omit EOI)

Node {
    rule: calculator,
    start: 0,
    end: 1,
    children: [
        Node {
            rule: atom,
            start: 0,
            end: 1,
            children: [
                Node {
                    rule: num,
                    start: 0,
                    end: 1,
                    children: [],
                    label: Some("num"),
                    alternative: None,
                },
            ],
            label: Some("atom"),
            alternative: Some("Num"),
        },
    ],
    label: None,
    alternative: None,
}

Bad case report of indirect left recursion

Input text

x = a | b | c
    Finished test [unoptimized + debuginfo] target(s) in 0.10s
     Running tests\main.rs (target\debug\deps\main-c0d26e7fa1d49033.exe)

called `Result::unwrap()` on an `Err` value: (2, 6)
thread 'peg::assign' panicked at 'called `Result::unwrap()` on an `Err` value: (2, 6)', projects\ygg-bootstrap\tests\peg\mod.rs:5:38
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\/library\std\src\panicking.rs:517
   1: core::panicking::panic_fmt
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\/library\core\src\panicking.rs:100
   2: core::result::unwrap_failed
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\/library\core\src\result.rs:1616
   3: enum$<core::result::Result<yggdrasil_bootstrap::ygg::ygg::Node,tuple$<usize,usize> > >::unwrap<yggdrasil_bootstrap::ygg::ygg::Node,tuple$<usize,usize> >
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\library\core\src\result.rs:1298
   4: main::peg::peg_assert
             at .\tests\peg\mod.rs:5
   5: main::peg::assign
             at .\tests\peg\mod.rs:47
   6: main::peg::assign::closure$0
             at .\tests\peg\mod.rs:42
   7: core::ops::function::FnOnce::call_once<main::peg::assign::closure$0,tuple$<> >
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\library\core\src\ops\function.rs:227
   8: core::ops::function::FnOnce::call_once
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\library\core\src\ops\function.rs:227
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: test failed, to rerun pass '--test main'

Process finished with exit code 101

Bad PEG

Forgive me for not minimizing the problem, there are some other problems I am figuring out.

program  <- IGNORE (statement IGNORE)* EOI;

statement  <- empty_statement
  / grammar_statement IGNORE eos?
  / fragment_statement IGNORE eos?
  / import_statement IGNORE eos?
  / ignore_statement IGNORE eos?
  / assign_statement IGNORE eos?
  / macro_define IGNORE eos?
;

empty_statement <- eos;

eos  <- ";";


grammar_statement  <-
    grammar IGNORE symbol IGNORE "{" IGNORE (string IGNORE ("," IGNORE string)* IGNORE ","?)? IGNORE "}"
  / grammar IGNORE symbol IGNORE string?
;
grammar  <- "grammar!";

fragment_statement <- fragment IGNORE symbol;
fragment <- "fragment!";


import_statement  <-
    import IGNORE string IGNORE "{" IGNORE (symbol_alias IGNORE ("," IGNORE symbol_alias)* IGNORE ","?)? IGNORE "}"
  / import IGNORE string
;
import  <- "import!";

symbol_alias <- (symbol IGNORE "as" IGNORE)? symbol;

ignore_statement  <-
    ignore IGNORE symbol
  / ignore IGNORE "{" IGNORE (symbol IGNORE ("," IGNORE symbol)* IGNORE ","?)? IGNORE "}"
;
ignore  <- "ignore!";



assign_statement  <- symbol IGNORE assign_kind IGNORE ("|"/"/")? IGNORE expr;
assign_kind  <- ("^"/"_"/"@")*  "=";

expr  <- expr0;
expr0 <- symbol IGNORE mark_type? IGNORE "<-" IGNORE expr1
    / expr1;
expr1 <- choice_expr IGNORE ("|"/"/") IGNORE choice_expr
    / expr2;
expr2 <- expr2 IGNORE "~" IGNORE expr3
    / expr3;
expr3 <- expr3 IGNORE slice
    / expr4;
expr4 <- expr4 IGNORE suffix
    / expr5;
expr5 <- prefix IGNORE expr5
    / expr6;
expr6 <- "(" IGNORE "/"? IGNORE expr IGNORE ")"
    / data;

choice_expr  <- expr1 IGNORE (mark_branch IGNORE mark_type?)?;
mark_branch  <- ("^"/"!"/"<"/">")? "#" symbol;
mark_type  <- ":" IGNORE symbol;
prefix  <- "!" / "&" / "^" / "*" / "%";
suffix  <- "?" / "+" / "-"/ "*";

data   <- macro_call
  / regex_range
  / list
  / symbol_path
  / integer
  / special
  ;

list   <- "{" IGNORE (data IGNORE ("," IGNORE data)* IGNORE ","?)? IGNORE "}";
slice  <- "{" IGNORE integer IGNORE "," IGNORE integer IGNORE  "}";

regex_range  <- "[" IGNORE (!"]" IGNORE . / "\\" IGNORE .)* IGNORE "]";


macro_call  <-
    "@" symbol_path IGNORE "(" IGNORE (macro_kv IGNORE ("," IGNORE macro_kv)* IGNORE ","?)? IGNORE")"
;
macro_kv  <-  symbol IGNORE " <-" IGNORE expr / expr;



macro_define  <-
    "macro!" IGNORE symbol IGNORE  "(" IGNORE(macro_arg IGNORE ("," IGNORE macro_arg)* IGNORE ","?)? IGNORE ")" IGNORE block
;
macro_arg  <- symbol IGNORE (":" IGNORE symbol)? IGNORE ("=" IGNORE expr)?;
block  <- "{" IGNORE "return"? IGNORE expr IGNORE "}";


string  <-
    re#'([^\\']|\\.)*'#
  / re#"([^\\"]|\\.)*"#
;

integer  <- "0" / re#[1-9]# ("_"? re#[0-9]#)*;

special  <- "true"/"false"/"null";

symbol_path  <-
    symbol IGNORE (("::"/".") IGNORE symbol)*
;
symbol  <- XID_IDENTIFIER;


IGNORE <-  (SPACE / NEWLINE / COMMENT)?;

SPACE <- re#[\s]+#;
NEWLINE <- re#[\r\n]+#;

COMMENT <- "//" re#[^\r\n]*#

Expected behavior

Inline choice_expr into expr1 then everything is ok

expr1 <- expr1 IGNORE (mark_branch IGNORE mark_type?)? IGNORE ("|"/"/") IGNORE choice_expr
    / expr2;

It should be handled correctly without inline here.

Idea: Support `node_tag` and `branch_tag`

Add two fields to identify nodes

#[derive(Clone, Debug)]
pub struct Node<'i> {
    pub rule: Rule,
    pub start: usize,
    pub end: usize,
    pub node_tag: Option<&'i str>,   // The lifetime is the same as the input text
    pub branch_tag: Option<&'i str>, // The lifetime is the same as the input text
    pub children: Vec<Node<'i>>,
    pub alternative: Option<u16>,
}

Basically like this:

expr <-
    "(" expr ")"            #Priority
  / lhs=expr "*" rhs=expr   #Mul
  / lhs=expr "/" rhs=expr   #Div
  / lhs=expr "+" rhs=expr   #Add
  / lhs=expr "-" rhs=expr   #Sub
  / num                     #Atom
  ;
num <- re#[0-9]+#;

PEG.js has this feature

Stack based matching

I have encountered some difficulties, and it doesn’t seem to be easy when pair matching is required.

Such as these situations:

HTML/XML

<a> push into the stack , wait until </a> pop out the stack

TeX

\begin{name} push into the stack , wait until \end{name} pop out the stack.

Markdown

`Γ—4 push into the stack
````markdown
do something
````
wait until  `Γ—4 pop out the stack

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.