kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.

License: MIT License

parslet's Introduction

INTRODUCTION

Parslet makes developing complex parsers easy. It does so by

* providing the best error reporting possible
* not generating reams of code for you to debug

Parslet takes the long way around to make your job easier. It allows for
incremental language construction. Often, you start out small, implementing
the atoms of your language first; _parslet_ takes pride in making this
possible.

Eager to try this out? Please see the associated web site:
https://kschiess.github.io/parslet/

SYNOPSIS

  require 'parslet'
  include Parslet

  # parslet parses strings
  str('foo').
    parse('foo') # => "foo"@0

  # it matches character sets
  match['abc'].parse('a') # => "a"@0
  match['abc'].parse('b') # => "b"@0
  match['abc'].parse('c') # => "c"@0

  # and it annotates its output
  str('foo').as(:important_bit).
    parse('foo') # => {:important_bit=>"foo"@0}

  # you can construct parsers with just a few lines
  quote = str('"')
  simple_string = quote >> (quote.absent? >> any).repeat >> quote

  simple_string.
    parse('"Simple Simple Simple"') # => "\"Simple Simple Simple\""@0

  # or by making a fuss about it 
  class Smalltalk < Parslet::Parser
    root :smalltalk

    rule(:smalltalk) { statements }
    rule(:statements) { 
      # insert smalltalk parser here (outside of the scope of this readme)
    }
  end

  # and then
  Smalltalk.new.parse('smalltalk')

FEATURES

  * Tools for every part of the parser chain
  * Transformers generate Abstract Syntax Trees
  * Accelerators transform parsers, making them quite a bit faster
  * Pluggable error reporters
  * Graphviz export for your parser
  * Rspec testing support rig
  * Simply Ruby, composable and hackable

COMPATIBILITY

This library is intended to work with Ruby variants >= 1.9. I've tested it on 
MRI 1.9, rbx-head, jruby. Please report as a bug if you encounter issues.

STATUS 

Production worthy.

(c) 2010-today Kaspar Schiess

parslet's Issues

Hanging Parser

Hi,

I stumbled onto a problem with the parser hanging endlessly.
I can reduce it to the following code example:

require 'parslet'
require 'pp'

class Parser < Parslet::Parser
  root :statements
  rule(:statements) { statement.repeat }
  rule(:statement) { text.as(:t1) >> space >> text.as(:t2) >> separator }
  rule(:text) { match('[a-z]').repeat }
  rule(:space) { match('\s').repeat } 
  rule(:separator) { eof | line_break }
  rule(:line_break) { match('[\r\n]').repeat }
  rule(:eof) { any.absent? }
end

parser = Parser.new

pp parser.statement.parse("first second")
 # => {:t1=>"first"@0, :t2=>"second"@6}

pp parser.parse("first second")
# => this one causes the parser to hang

pp parser.parse("first second\ntest test")
# => this one causes the parser to hang

Am I doing something wrong?
Is this a bug of the parser?
This behaviour occurs with parslet version 1.3.0 on ruby 1.9.3p125

Thanks
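A likely cause, sketched without parslet: every part of statement (text, space, and the eof branch of separator) can succeed without consuming any input. At the end of the input, statement therefore matches the empty string over and over, and statement.repeat never stops. The toy below uses Ruby's StringScanner as a stand-in. The repeat helper, its max_iterations guard, and the two regular expressions are illustrative assumptions, not parslet's implementation.

```ruby
require 'strscan'

# Toy stand-in for `atom.repeat`: keep matching until the pattern fails.
# The iteration cap exists only so that the demonstration terminates.
def repeat(scanner, pattern, max_iterations: 1_000)
  matches = []
  max_iterations.times do
    piece = scanner.scan(pattern)
    return matches if piece.nil?   # pattern failed: the repetition is done
    matches << piece
  end
  :runaway                         # pattern kept succeeding without progress
end

# Rough regex equivalents of the `statement` rule.
empty_ok  = /[a-z]*\s*[a-z]*(?:\z|[\r\n]*)/   # like .repeat everywhere
non_empty = /[a-z]+\s+[a-z]+(?:\z|[\r\n]+)/   # like .repeat(1) everywhere

looping = repeat(StringScanner.new("first second"), empty_ok)
fixed   = repeat(StringScanner.new("first second\ntest test"), non_empty)

p looping
p fixed
```

In parslet terms this suggests trying repeat(1) for text, space and line_break, so that each iteration of statement has to consume something.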

Create specs that run all examples

We should automatically check the examples in examples/*. This provides us with a nice regression spec. Perhaps run them against stored output?

Write a parser for Ruby HEREdocs.

Write a parser for Ruby's heredocs. We might have to extend parslet with something like the semantic predicates of Treetop. A start in that direction can be found here: https://github.com/kschiess/parslet/tree/20110203-exp-predicates

maybe vs repeat(0,1)

I kept getting a 'duplicate subtrees...' error while trying to implement a rule for C selection statements. By accident I realized that the error goes away if I replace maybe() with repeat(0,1). My understanding was that maybe() was effectively an alias for repeat(0,1)?

The selection_statement rule is simplified below to illustrate the problem.

Thanks.

rule(:identifier) do
  space? >> (
    (letter | underscore) >> (digit | letter | underscore).repeat
  ).as(:identifier) >> space?
end

rule(:selection_statement) do
  ( str('if') >> space? >> identifier >> space? >> (str('else') >> space? >> identifier).maybe ).as(:selection_statement)
end

p parser.selection_statement.parse_with_debug 'if i else i'
> Duplicate subtrees while merging result of 
>   selection_statement:('if' SPACE? IDENTIFIER SPACE? ('else' SPACE? IDENTIFIER)?)
> only the values of the latter will be kept. (keys: [:identifier])
> {:selection_statement=>{:identifier=>"i"@10}}

rule(:selection_statement) do
  ( str('if') >> space? >> identifier >> space? >> (str('else') >> space? >> identifier).repeat(0,1) ).as(:selection_statement)
end

p parser.selection_statement.parse_with_debug 'if i else i'
> {:selection_statement=>[{:identifier=>"i"@3}, {:identifier=>"i"@10}]}
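The warning text suggests that hash results inside a sequence are merged into a single hash, so two captures under the same key collide and the later one wins. Plain Ruby hashes behave the same way. The key names :condition and :else_branch below are hypothetical, chosen only to show that distinct names avoid the collision; the strings stand in for the slices.

```ruby
# Two captures that share a key: merging keeps only the later value.
if_part   = { identifier: "i@3" }
else_part = { identifier: "i@10" }
merged = if_part.merge(else_part)

# Distinct keys survive the merge side by side.
renamed = { condition: "i@3" }.merge({ else_branch: "i@10" })

p merged
p renamed
```

If that reading is right, wrapping each identifier in its own .as(...) name should keep both values regardless of whether maybe or repeat(0,1) is used.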

Permit execution of partial transformations

str('foo').transform(transformer) >> str('bar')

This is not as easy as it looks, but can be made to work.

#parslet IRC network not specified

From http://kschiess.github.com/parslet/contribute.html
"Join us on IRC in #parslet."

What network is this? I don't see anyone there on Freenode. Thanks.

Mailing list is down?

Apparently, our mailing list has stopped working. I've tried to contact the powers that be about this, but I've yet to receive an answer. Sorry for the inconvenience.

Steal the JS menu idea from coffeescript

http://jashkenas.github.com/coffee-script/, Table of contents

Most of our help pages could profit from a TOC.

Match beginning of line?

Sorry to keep putting questions in your issues, couldn't find another place for them :) Is there a way to match the beginning of a line that will also match the beginning of the file? I could construct a regular expression for this, but don't see a way to match a fully specified regex. I can do a look-behind using prsnt? to check for a previous line termination, but I can't figure how to also treat the first characters in a file as the beginning of a line. My instinct is to use something like (eol | match("^")).prsnt? but "^" isn't valid in a character class, so it doesn't work. Thanks for any help.

include rules in stacktrace

Currently debugging my parser, the stacktrace is somewhat worthless, as I have no clue why it's trying to match the pattern that fails in the first place. Really compiling rules to methods could probably offer that.

Transform: Easier rule writing

The key I match the parse tree with is almost always the name I want for my variable in the rule.

class Parslet::Transform
    class << self
        def expression_from_args(args)
            if args[0].kind_of? Hash
                args[0]
            else
                names = args.collect {|binder| binder.variable_name}
                expression = {}
                names.zip(args) { |key, value|
                    expression[key] = value
                }
                expression
            end
        end

        alias orig_rule rule
        def rule(*args, &block)
            orig_rule(expression_from_args(args), &block)
        end
    end

    alias orig_rule rule
    def rule(*args, &block)
        expression = self.class.expression_from_args(args)
        orig_rule(expression, &block)
    end
end

This allows me to write rules like:

    rule(simple(:proto_name),
             hash_sequence(:args, [:arg_name])) {
        args.collect! {|x| x[:arg_name]}
        Prototype.new(proto_name, args)
    }

    rule(simple(:call_name),
             sequence(:args)) {
        CallExpr.new(call_name, args)
    }

    rule(simple(:variable_name)) {
        VariableExpr.new(variable_name);
    }

    rule(simple(:number)) {
        NumberExpr.new(number.to_f)
    }

    rule(simple(:condition), simple(:then_), simple(:else_)) {
        IfExpr.new(condition, then_, else_)
    }

I will clean the code and make a patch and pull request if there is enough interest.

absnt? -> absent?, prsnt? -> present?

Parslet is great so far. I like how it embraces Ruby syntax. However, abbreviating absent? as absnt? and present? as prsnt? is decidedly not in the spirit of Ruby. It would be better to spell them out. It is easier to remember and easier to read. It only requires [1,2] extra characters.

compatibility with mathn library

I'm using another gem (ruby-units) that happens to require the mathn library from Ruby's standard library. One of the effects of this library is to override some very basic operations, such as division on Integer. When mathn is required, 99/100 doesn't return the Integer 0 as one would normally expect, but rather the Rational (99/100). This is somewhat contrary to general expectations as to how Ruby should behave.

I ran into this issue while constructing my parslet grammar, where it would on some expressions get caught in an infinite loop. I assumed at first that I just had some left-recursion that I wasn't aware of, but digging into parslet's source I found that it had actually finished its parsing attempt and was getting caught in a loop while trying to generate the message for the parsing exception. I eventually traced it down to RangeSearch's lbound method. The algorithm expects the division by 2 to eventually result in right being less than or equal to left, but in some cases (when the number that's being divided by 2 is not even) right instead ends up becoming a fraction that grows closer and closer to left without ever actually reaching it.

The "problem" is easy enough to fix, just by adding a mid = mid.floor after doing the division.

However, as you might guess by my scare quotes, I don't really think this is parslet's problem, and I'm not quite sure if a fix for such an arcane issue (seriously, who uses mathn?) belongs in parslet. I certainly know my first thought upon coming across x = y/2; x = x.floor in a codebase would be that the programmer obviously didn't know Ruby.

Thoughts? If you don't feel that having mid = mid.floor in there would just be code noise, I'd be happy to submit a pull request with a test.

Here's a close-to-minimal case that causes the infinite loop:

require 'parslet'
include Parslet

def attempt_parse
  possible_whitespace = match('\s').repeat

  cephalopod =
    str('octopus') |
    str('squid')

  parenthesized_cephalopod =
    str('(') >>
    possible_whitespace >>
    cephalopod >>
    possible_whitespace >>
    str(')')

  parser =
    possible_whitespace >>
    parenthesized_cephalopod >>
    possible_whitespace

  parser.parse %{(\nsqeed)\n}
end

attempt_parse and puts 'it terminates before we require mathn'
require 'mathn'
attempt_parse # but it doesn't terminate after requiring mathn

(Note that if you change sqeed to squeed, it does successfully terminate with mathn because the dividend just doesn't happen to end up as an odd number.)
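The effect can be reproduced without mathn. What follows is a hypothetical lower-bound search in the spirit of the description above, not parslet's actual RangeSearch code. Dividing by Rational(2) imitates what Integer division does once mathn is loaded, and the floor_mid flag and max_steps cap are assumptions added so that both variants can be compared and the demonstration terminates.

```ruby
# Returns the index of the first element greater than `bound`,
# assuming such an element exists in the sorted array.
def lbound(sorted, bound, floor_mid:, max_steps: 50)
  left = 0
  right = sorted.size - 1
  max_steps.times do
    # Under mathn, (right - left) / 2 yields a Rational; dividing by
    # Rational(2) reproduces that without requiring mathn.
    mid = left + (right - left) / Rational(2)
    mid = mid.floor if floor_mid
    if sorted[mid.to_i] > bound
      right = mid
    else
      left = mid + 1
    end
    return right if right <= left
  end
  :no_convergence   # gave up: `right` keeps shrinking toward `left`
end

line_ends = [10, 20, 30, 40]
with_floor    = lbound(line_ends, 5, floor_mid: true)
without_floor = lbound(line_ends, 5, floor_mid: false)

p with_floor
p without_floor
```

Without the floor, right goes 3/2, 3/4, 3/8 and so on, and never becomes less than or equal to left.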

allowing escaped quotes in strings

Hey, I just started playing around with a JSON parser example, and ran into a problem dealing with escaped characters in strings. See my Gist for an example: https://gist.github.com/776081

My String parser looks something like this:

class Str < Parslet::Parser
  rule(:space)      { match('\s').repeat(1) }
  rule(:space?)     { space.maybe }
  rule(:quote)      { str('"') }
  rule(:nonquote)   { str('"').absnt? >> any }
  rule(:escape)     { str('\\') >> any }
  rule(:string)     { quote >> (escape | nonquote).repeat(1).as(:str) >> quote }
  root :string
end

My test cases give me these results

parsing "test"
{:str=>"test"}

parsing "\"test\""
{:str=>"\\\"test\\\""}

I'd like to get the contents of the 'escape' rule without the backslash. I did find one way of doing this, but it creates an object per character:

rule(:escape)     { str('\\') >> any.as(:esc) }
rule(:string)     { quote >> (
    escape | 
    nonquote.as(:char)
  ).repeat(1)
  .as(:str) >> quote }

parsing "test"
{:str=>[{:char=>"t"}, {:char=>"e"}, {:char=>"s"}, {:char=>"t"}]}

parsing "\"test\""
{:str=>
  [{:esc=>"\""},
   {:char=>"t"},
   {:char=>"e"},
   {:char=>"s"},
   {:char=>"t"},
   {:esc=>"\""}]}

Any thoughts on that?
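One option is to keep the single :str capture and strip the backslashes afterwards, for instance inside a transform rule. The helper name unescape is hypothetical, and the sketch treats every backslash-escaped character as itself, so JSON escapes such as \n or \uXXXX would need extra handling.

```ruby
# Replace each backslash-plus-character pair with the character alone.
def unescape(raw)
  raw.gsub(/\\(.)/m) { Regexp.last_match(1) }
end

quoted  = unescape('\"test\"')   # input text is \"test\" (backslashes included)
slashed = unescape('a\\\\b')     # input text is a\\b (two backslashes)

puts quoted
puts slashed
```

This keeps the parse tree at one object per string instead of one per character.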

horribly slow

I retract my verbal comments about speed not mattering much. I wrote a smalltalk parser and parsing 5 lines of somewhat complex code takes 132.37 on the latest release and 41.42 on master. While the difference is extremely impressive, that is still not acceptable.

Site. Code examples bug

http://savepic.org/1217820.png

WinXP, FireFox 3.6, Chrome.

Tuning branch is broken.

Just tried the tuning branch:

require 'parslet'
LoadError: no such file to load -- parslet/source
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from <internal:lib/rubygems/custom_require>:29:in `require'
    from /home/hal/.rvm/gems/ruby-1.9.2-p136/gems/parslet-1.1.0/lib/parslet.rb:230:in `<top (required)>'

Fix documentation for Transform

It should say that:

WHOLE hashes are matched, not partial. data loss...
If it gets complicated, you're doing it wrong
Just a way to get a nice AST with having the user choose
Execution environment is left unclear on purpose

Transform Documentation should detail injectable context

I've been toying with parslet for 2 days now and only found how to inject context via transformer.apply(tree, context) by looking through past issues via github. Would be better to publicise this in the documentation.
Note this is different from the y = 12 example.

To me this was essential to have, sure it's important for others too.

BTW Absolutely awesome parser - a powerful and elegant design.

Detect indirect or direct loops in grammar execution

When reentering one of the rules already tried for a given position in the input, we might abort with an 'endless loop' warning.
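A minimal sketch of that idea in plain Ruby, independent of parslet's internals: remember which (rule, position) pairs are currently being evaluated and raise when one of them is entered again. ToyGrammar, LoopDetected and the lambda-based rule format are all hypothetical names made up for this illustration.

```ruby
require 'set'

class LoopDetected < StandardError; end

# Toy rule evaluator: each rule is a lambda (grammar, input, pos)
# that returns the new position on success or nil on failure.
class ToyGrammar
  def initialize(rules)
    @rules = rules
    @active = Set.new   # [rule name, position] pairs currently in progress
  end

  def apply(name, input, pos)
    key = [name, pos]
    raise LoopDetected, "#{name} re-entered at position #{pos}" if @active.include?(key)

    @active << key
    begin
      @rules.fetch(name).call(self, input, pos)
    ensure
      @active.delete(key)
    end
  end
end

rules = {
  # expr <- expr '+' digit / digit   (left recursive on purpose)
  expr: lambda { |g, input, pos|
    after_expr = g.apply(:expr, input, pos)
    if after_expr && input[after_expr] == '+'
      g.apply(:digit, input, after_expr + 1)
    else
      g.apply(:digit, input, pos)
    end
  },
  digit: ->(_g, input, pos) { input[pos].to_s.match?(/\d/) ? pos + 1 : nil }
}

grammar = ToyGrammar.new(rules)
outcome =
  begin
    grammar.apply(:expr, "1+2", 0)
  rescue LoopDetected => e
    e.message
  end

puts outcome
```

Note that this only catches re-entry at the same position. A repetition whose body succeeds without consuming input is a different kind of loop and would need a separate progress check.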

Allow injecting context into transformations

Something like:

transform.apply(tree, :document => MyDuckument.new)

Transform: Match sequence of hashes

More than once in writing my parsers I hit the issue of finding it easiest to generate an array of hashes, such as:

    rule(:var_expr) {

        init = str('=') >> space >> expression
        var = ident.as(:var_name) >> space >> init.maybe.as(:var_init)
        list = str(',') >> space >> var


        str('var') >> space >> (var >> list.repeat(0)).as(:var_list) >> space >>
        str('in') >> space >> expression.as(:var_expr)
    }

Notice var_list will consist of an Array of Hashes in the parse tree. This means var_list can't be matched by the 'sequence' matcher. Since I hit this issue more than once I decided to extend Parslet with this 'hash_sequence' matcher:

module MyTransformers
    class HashSequenceBind < Parslet::Pattern::SubtreeBind

        def initialize(symbol, keys, key_binder=nil)
            self.symbol = symbol
            @keys = keys
            @key_binder = key_binder || Parslet::Pattern::SimpleBind.new
        end

        def can_bind?(subtree)
            subtree.kind_of?(Array) and
                subtree.all? { |el|
                    el.is_a?(Hash) and
                    el.keys == @keys and
                    el.all? { |k, v|
                        @key_binder.can_bind? v
                    }
                }
        end
    end
end

module Parslet
    def hash_sequence(symbol, keys)
        MyTransformers::HashSequenceBind.new(symbol, keys)
    end
    module_function :hash_sequence
end

Used in a transform like:

    rule(hash_sequence(:var_list, [:var_name, :var_init]),
             simple(:var_expr)) {
        VarExpr.new(var_list, var_expr)
    }

Notice var_list will match only an Array whose elements are all Hashes with the keys :var_name and :var_init, and then only if all those values are matched by Parslet::Pattern::SimpleBind. This last behavior is modifiable by the optional third parameter to hash_sequence.

I've used this feature more than once and will make a patch and pull request if others find it useful.

Avoid creating object structures that aren't needed afterwards

Inside the parser, large structures of Ruby objects are created and then freed again. Can we find a way only to create what we return to the user?

Tuning branch retains previous parse trees.

I noticed that the results from a previous parse stay within the parser object.

parser.logical_and_expression.parse('1 && 2')
 => {:binary_and=>{:left=>{:constant=>"1 "}, :right=>{:constant=>"2"}}} 
parser.logical_and_expression.parse('1 && 3')
 => {:binary_and=>{:left=>{:constant=>"1 "}, :right=>{:constant=>"2"}}}

Example Parser: Mailing Addresses

Here's a parser I wrote last night: https://gist.github.com/1008044

It just parses mailing addresses. The parser uses a ruby method to parse state abbreviations that I think is pretty sweet.

If you like this parser, feel free to include it in the Parslet examples.

Thanks for Parslet, Andy

error_tree not reaching the "real issue"

Hello,

I have a small language and I was hoping Parslet and its error_tree would help me pinpoint my errors, but I found it's hastily rejecting my string and indicates the error as "upstream"...

Here is my gist, tested against the latest Parslet (98425d3...), and ruby 1.8.7 (2010-01-10 patchlevel 249) [i686-darwin10.8.0] (and 1.9.2-p290):

https://gist.github.com/2374233

In the last test, the code looks like:

    define f()
      begin
        @res.name
      end
    end

The error output looks like:

Failed to match sequence (LINE_SEPARATOR? BLOCK LINE_SEPARATOR?) at line 2 char 5.

`- Failed to match sequence (LINE_SEPARATOR? BLOCK LINE_SEPARATOR?) at line 2 char 5.
   `- Expected one of [DEFINE_BLOCK, BEGIN_BLOCK] at line 2 char 5.
      |- Failed to match sequence (define:'define' SPACE name:IDENTIFIER '()' BODY 'end') at line 3 char 7.
      |  `- Expected "end", but got "beg" at line 3 char 7.
      `- Failed to match sequence (pre:((type:'concurrent' SPACE)?) begin:'begin' BODY 'end') at line 2 char 5.
         `- Expected "begin", but got "defin" at line 2 char 5.

Expected "end", but got "beg" at l3 c7

I was hoping it'd tell me something like Expected "define", but got "@res.n" but it complains about the enclosing begin. When the code gets bigger, it gets too fuzzy: "hey somewhere in the middle of that fat block, there is an issue...".

Is it an issue with my parser or is it Parslet giving up too quickly when preparing the error_tree?

Anyway, as usual, many thanks for Parslet, cheers,

John

rake not working

I am having issues with getting rake to work:

Trevors-iMac:parslet trevor$ rake spec --trace
rake/rdoctask is deprecated.  Use rdoc/task instead (in RDoc 2.4.2+)
rake/gempackagetask is deprecated.  Use rubygems/package_task instead
NOTE: Gem.available? is deprecated, use Specification::find_by_name. It will be removed on or after 2011-11-01.
Gem.available? called from /Users/trevor/.rvm/gems/ruby-1.9.2-p290/gems/sdoc-0.3.7/lib/sdoc/generator.rb:6.
** Invoke spec (first_time)
** Execute spec
/Users/trevor/.rvm/rubies/ruby-1.9.2-p290/bin/ruby -S bundle exec rspec ./spec/acceptance/examples_spec.rb ./spec/acceptance/regression_spec.rb ./spec/acceptance/repetition_and_maybe_spec.rb ./spec/parslet/atom_results_spec.rb ./spec/parslet/atoms/alternative_spec.rb ./spec/parslet/atoms/base_spec.rb ./spec/parslet/atoms/combinations_spec.rb ./spec/parslet/atoms/dsl_spec.rb ./spec/parslet/atoms/entity_spec.rb ./spec/parslet/atoms/named_spec.rb ./spec/parslet/atoms/re_spec.rb ./spec/parslet/atoms/sequence_spec.rb ./spec/parslet/atoms/transform_spec.rb ./spec/parslet/atoms/visitor_spec.rb ./spec/parslet/atoms_spec.rb ./spec/parslet/convenience_spec.rb ./spec/parslet/error_tree_spec.rb ./spec/parslet/export_spec.rb ./spec/parslet/expression/treetop_spec.rb ./spec/parslet/parser_spec.rb ./spec/parslet/parslet_spec.rb ./spec/parslet/pattern_spec.rb ./spec/parslet/rig/rspec_spec.rb ./spec/parslet/slice_spec.rb ./spec/parslet/source/line_cache_spec.rb ./spec/parslet/source_spec.rb ./spec/parslet/transform_spec.rb
/Users/trevor/.rvm/rubies/ruby-1.9.2-p290/bin/ruby: No such file or directory -- bundle (LoadError)
rake aborted!
ruby -S bundle exec rspec ./spec/acceptance/examples_spec.rb ./spec/acceptance/regression_spec.rb ./spec/acceptance/repetition_and_maybe_spec.rb ./spec/parslet/atom_results_spec.rb ./spec/parslet/atoms/alternative_spec.rb ./spec/parslet/atoms/base_spec.rb ./spec/parslet/atoms/combinations_spec.rb ./spec/parslet/atoms/dsl_spec.rb ./spec/parslet/atoms/entity_spec.rb ./spec/parslet/atoms/named_spec.rb ./spec/parslet/atoms/re_spec.rb ./spec/parslet/atoms/sequence_spec.rb ./spec/parslet/atoms/transform_spec.rb ./spec/parslet/atoms/visitor_spec.rb ./spec/parslet/atoms_spec.rb ./spec/parslet/convenience_spec.rb ./spec/parslet/error_tree_spec.rb ./spec/parslet/export_spec.rb ./spec/parslet/expression/treetop_spec.rb ./spec/parslet/parser_spec.rb ./spec/parslet/parslet_spec.rb ./spec/parslet/pattern_spec.rb ./spec/parslet/rig/rspec_spec.rb ./spec/parslet/slice_spec.rb ./spec/parslet/source/line_cache_spec.rb ./spec/parslet/source_spec.rb ./spec/parslet/transform_spec.rb failed
/Users/trevor/.rvm/gems/ruby-1.9.2-p290/gems/rspec-core-2.6.4/lib/rspec/core/rake_task.rb:139:in `rescue in block (2 levels) in initialize'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290/gems/rspec-core-2.6.4/lib/rspec/core/rake_task.rb:135:in `block (2 levels) in initialize'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/file_utils_ext.rb:57:in `verbose'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290/gems/rspec-core-2.6.4/lib/rspec/core/rake_task.rb:131:in `block in initialize'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/task.rb:205:in `call'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/task.rb:205:in `block in execute'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/task.rb:200:in `each'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/task.rb:200:in `execute'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'
/Users/trevor/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/task.rb:144:in `invoke'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:112:in `invoke_task'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:90:in `block (2 levels) in top_level'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:90:in `each'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:90:in `block in top_level'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:129:in `standard_exception_handling'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:84:in `top_level'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:62:in `block in run'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:129:in `standard_exception_handling'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/lib/rake/application.rb:59:in `run'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290@global/gems/rake-0.9.2/bin/rake:32:in `<top (required)>'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290/bin/rake:19:in `load'
/Users/trevor/.rvm/gems/ruby-1.9.2-p290/bin/rake:19:in `<main>'
Tasks: TOP => spec

I don't have much experience with the tools of "Ruby Land," and currently have too much else on my plate to investigate further. Right now this issue is keeping me from writing and running specs so I can patch parslet with the features mentioned in: #52 and #53

Stateful error trees are buggy

Saving the '@offending_parslet' in alternatives (but not only there) in the parslet instance itself is not correct. In cases where one parslet can be nested in itself, the @offending_parslet will contain only the innermost error after the parse.

This can be seen in John's gist in issue #64 (https://gist.github.com/2374233). In the sample

define foo()
  begin
     @bar.baz
  end
end

the error message will contain why the define-rule didn't match '@bar.baz' and not why it didn't match 'begin ... end'.

Storing this kind of stuff in instance variables is a design smell anyway. This issue is why we need to take that out.

Parslet doesn't support serializing

tree = MyParser.parse(q)
Marshal.dump(tree) #=> TypeError: no marshal_dump is defined for class StringIO

For example, it doesn't let me cache the tree with memcached because it can't be serialized.

It would be nice to have this.
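A possible workaround until the tree can be dumped directly is to convert it to plain strings first. The sketch assumes the leaves respond to to_s the way slices do. FakeSlice is a hypothetical stand-in that holds a StringIO, as the error message suggests the real leaves do, and plain_tree is an illustrative helper name.

```ruby
require 'stringio'

# Hypothetical stand-in for a slice: a string plus a reference to its source.
FakeSlice = Struct.new(:text, :source) do
  def to_s
    text
  end
end

# Recursively replace every leaf with a plain String (nil stays nil).
def plain_tree(node)
  case node
  when Hash  then node.each_with_object({}) { |(k, v), out| out[k] = plain_tree(v) }
  when Array then node.map { |v| plain_tree(v) }
  when nil   then nil
  else node.to_s
  end
end

source = StringIO.new("q")
tree = { query: [FakeSlice.new("q", source), nil] }

# Dumping the original tree fails because of the StringIO reference.
dump_error =
  begin
    Marshal.dump(tree)
    nil
  rescue TypeError => e
    e.class
  end

# The plain version round-trips.
restored = Marshal.load(Marshal.dump(plain_tree(tree)))

p dump_error
p restored
```

The price is that offsets and line information are lost, so this only fits caches that need the text alone.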

Support left recursion

It seems that there is a trick to permit left recursive rules in PEG grammars. Research into that and make parslet do it.

Multibyte strings aren't matched correctly

pry(main)> str("あああ").parse("あああ")
Parslet::ParseFailed: Premature end of input at line 1 char 4.

Reported by rogerbraun as part of pull request #55.

Parsing space separated word list hangs

The following tiny parser to parse a space separated word list hangs, but I can't seem to understand why. Could this be a bug?

module Rinfi
  class Parser < Parslet::Parser
    root(:clause)    
    rule(:ws1)           { match[" \t\r\n"]                                   }
    rule(:ws)            { ws1.repeat                                         }
    rule(:ws?)           { ws.maybe                                           }
    rule(:crlf)          { match["\r"]  >> match["\n"]                        }
    rule(:cr)            { match["\r"]                                        }
    rule(:lf)            { match["\n"]                                        }
    rule(:cr_or_lf)      { lf  | crlf | cr                                    }



    rule :word do
      (match["A-Za-z"].repeat).as(:word)
    end

    rule :spaced_word do
      word >> ws?
    end

    rule :spaced_word? do
      spaced_word.repeat(0)
    end

    rule :clause do
      (word >> (ws >> word).repeat(0)).as(:clause)
    end
  end
end

# This hangs, probably an eternal loop somewhere:
res = Rinfi::Parser.new.clause.parse("hello world")

line number tracking

Is there an easy way to add line number tracking to a grammar (for a language that does not care about lines) in order to improve error messages and such? Basically, I'm currently migrating my largely handwoven compiler (except for the bytecode generator) to the Rubinius compiler tool chain, which actually cares about line numbers. A short peek into the Parslet source reveals that there is actually line number tracking built in (in Parslet::Source) but it seems there is no way to add those to the parse tree.
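If the offsets of the parse results are available, line and column can also be recovered outside the grammar. A minimal sketch in plain Ruby, assuming 0-based character offsets and \n line endings; LineIndex is a hypothetical helper, not part of parslet.

```ruby
class LineIndex
  def initialize(text)
    # Offsets at which each line begins; line 1 always starts at 0.
    @line_starts = [0]
    text.each_char.with_index do |ch, i|
      @line_starts << i + 1 if ch == "\n"
    end
  end

  # Returns [line, column], both 1-based, for a 0-based character offset.
  def line_and_column(offset)
    next_line = @line_starts.bsearch_index { |start| start > offset }
    line = next_line || @line_starts.size
    [line, offset - @line_starts[line - 1] + 1]
  end
end

index = LineIndex.new("define f()\n  begin\n    @res.name\n")
position = index.line_and_column(23)   # offset of the "@"

p position
```

The binary search keeps each lookup logarithmic in the number of lines, which matters when every node of a large tree is annotated.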

string encodings on Ruby 1.9

Many thanks for Parslet, it's a wonderful tool.

# encoding: UTF-8

require 'rubygems'
require 'parslet'

class Parser < Parslet::Parser
  rule(:string) { any.repeat(1).as(:string) }
  root(:string)
end

strings = [ "whatever", "pour le mérite", "電車" ]

strings.each do |string|
  tree = Parser.new.parse(string)
  slice = tree[:string]
  s = slice.to_s
  p [ s, s == string, s.respond_to?(:encoding) ? s.encoding : '' ]
end

on Ruby 1.8.7-p249 (SnoLeo, installed via RVM), it yields :

["whatever", true, ""]
["pour le m\303\251rite", true, ""]
["\351\233\273\350\273\212", true, ""]

on Ruby 1.9.2-p180 :

["whatever", true, #<Encoding:ASCII-8BIT>]
["pour le m\xC3\xA9rite", false, #<Encoding:ASCII-8BIT>]
["\xE9\x9B\xBB\xE8\xBB\x8A", false, #<Encoding:ASCII-8BIT>]

On Ruby 1.9, the resulting strings are tagged ASCII-8BIT (the source is UTF-8).

Could the strings generated by Parslet::Slice#to_s pick up the encoding of the source string? There is this force_encoding(enc) method for Ruby 1.9 strings.

Cheers,
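The behaviour in the report can be reproduced with plain strings. The sketch below only illustrates what force_encoding does, which is to re-tag the same bytes; it makes no claim about where parslet loses the encoding.

```ruby
source = "\u96FB\u8ECA"   # the third test string from above, UTF-8 encoded

# Same bytes, but tagged ASCII-8BIT, as in the Ruby 1.9 output.
binary = source.dup.force_encoding(Encoding::ASCII_8BIT)
equal_before = (binary == source)

# Re-tag the bytes with the source's encoding; no bytes are changed.
retagged = binary.dup.force_encoding(source.encoding)
equal_after = (retagged == source)

# Characters and bytes differ for multibyte text.
counts = [source.length, source.bytesize]

p equal_before
p equal_after
p counts
```

The gap between character count and byte count is also worth keeping in mind for any code that mixes character offsets with byte offsets.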

Parslet - Get Started typos

Hi,

I just worked through the Get Started guide and noticed a few typos in the code. For example under the heading Making the parser complete

class MiniP < Parslet::Parser
  # Single character rules
  rule(:lparen)     { str('(') >> space? }
  rule(:rparen)     { str(')') >> space? }
  rule(:comma)      { str(',') >> space? }
  # etc
end

Looks like you need to escape the brackets to get it working.

Is there a copy of this page somewhere I could clone and provide a patch to fix these issues?

Implicitly ignore whitespace?

Is there a simple way to make it such that rules like the following assume there may be whitespace between the tokens, without the need to explicitly spell it out in every single rule? For example, using lex and yacc, you just make WSP a no-op and the lexer skips over it, meaning it's not visible in your parse tree.

# simply
rule(:math_expr) { t_int.as(:left) >> operator.as(:op) >> t_int.as(:right) }

# rather than the verbose
rule(:math_expr) { t_int.as(:left) >> wsp? >> operator.as(:op) >> wsp? >> t_int.as(:right) }

Make 'return values' for repetitions really simple to understand.

We already have logic that makes

something.maybe

either return something or nil. And

something.repeat(...)

will return an array either way (even if 0 matches). Now (naively said) we need the same thing for

string.repeat(0)

to return '' (empty string) when no match succeeds.

The implementation idea is that we produce the output the programmer probably wants and simplify his transformations at the source.

Parslet::Slice#slice only works correctly on parent

class S < Parslet::Parser
  rule(:string){
    str('"') >> ( (str('\\') >> any) | (str('"').absnt? >> any)).repeat >> str('"')
  }
end

s = S.new.string.parse(%("foo")) #=> "\"foo\""@0
s.slice(1, s.size - 2) #=> "\"fo"@0
s.parent.slice(1, s.size - 2) #=> "foo"@1

push 1.1.0 gem

also, create tag.

Should allow unconsumed input after a successful match

I'm trying to use Parslet to repeatedly parse statements in an input stream. For better error recovery, I don't want to just use repeat; if one statement is malformed, I want to be able to recover by, say, throwing away all characters up to a delimiter, then resuming my normal "next statement" mode.

There seems to be no good way to do this without replicating most of apply (or VM::run). Even just a flag to not error on unconsumed input would be nice, but something more fully-formed would probably include reusing a Parslet::Source (which you can't do just by passing it to parse, because it will try to wrap it in another Parslet::Source instance).
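To make the desired control flow concrete, here is a sketch of that recovery loop written against Ruby's StringScanner rather than parslet. The name=digits; statement syntax and the parse_statements helper are made up for the illustration.

```ruby
require 'strscan'

# Parses statements of the form name=digits; and recovers from malformed
# ones by discarding input up to and including the next ";".
def parse_statements(input)
  scanner = StringScanner.new(input)
  statements = []
  error_offsets = []

  until scanner.eos?
    scanner.skip(/\s+/)
    break if scanner.eos?

    if scanner.scan(/([a-z]+)=(\d+);/)
      statements << [scanner[1], scanner[2].to_i]
    else
      error_offsets << scanner.pos
      # Throw away everything up to the delimiter, or the rest of the input.
      scanner.skip_until(/;/) || scanner.terminate
    end
  end

  [statements, error_offsets]
end

statements, error_offsets = parse_statements("a=1; b=?; c=3;")

p statements
p error_offsets
```

The key point is that a successful match leaves the scanner positioned after the statement, so the caller decides what happens next. That is the hook being asked for here.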

>> mutates the atom, sometimes

It looks like the >> operator mutates Sequence atoms. This makes it difficult to compose atoms:

require 'parslet'
include Parslet

def listed(arg)
  # just like the example: http://kschiess.github.com/parslet/parser.html
  arg >> (str(',') >> arg).repeat
end

class MiniP < Parslet::Parser
  rule(:turtle) { str('{') >> listed(turtle >> str(':') >> turtle).maybe >> str('}') }
  root :turtle
end

p MiniP.new.parse('{}')
p MiniP.new.parse('{{}:{}}')

outputs:

parslet-1.2.1/lib/parslet/atoms/base.rb:208:in `to_s': stack level too deep (SystemStackError)

I believe this is the cause:

irb(main):001:0> require 'parslet'
=> true
irb(main):002:0> include Parslet
=> Object
irb(main):003:0> a = str('1')
=> '1'
irb(main):004:0> b = a >> str('2')
=> '1' '2'
irb(main):005:0> c = b >> str('3')
=> '1' '2' '3'
irb(main):006:0> a
=> '1'
irb(main):007:0> b
=> '1' '2' '3'
irb(main):008:0> c
=> '1' '2' '3'

Notice that after c is created, the value of b changed.
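For comparison, here is a minimal sketch of a sequence whose >> does not mutate the receiver. Seq is a hypothetical stand-in written for this illustration, not Parslet's actual Sequence class; it only shows the behaviour one would expect from the irb session above.

```ruby
# Hypothetical stand-in for a sequence atom; not Parslet's implementation.
class Seq
  attr_reader :parts

  def initialize(*parts)
    @parts = parts.freeze
  end

  # Builds a new Seq rather than appending to the receiver's parts.
  def >>(other)
    Seq.new(*parts, other)
  end

  def to_s
    parts.join(' ')
  end
end

b = Seq.new("'1'") >> "'2'"
c = b >> "'3'"

puts b # prints: '1' '2'
puts c # prints: '1' '2' '3'
```

With this shape, b keeps its two elements after c is built, so an atom can be reused in several compositions without one affecting the other.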

Transform rules only matching whole objects

When writing rules they only match whole objects and not nodes of objects. For an example please see this Gist.

I understand that multiple KV pairs can be passed to a rule but I would argue that this is an exception rather than the normal behaviour. In my example I expected the multi rule to match both whole objects and attributes of objects.

Issue when binding 'y' in ruby 1.9.3

This isn't really a bug in Parslet, but this is where it manifested and I figured somebody else might run into it at some point. See: ruby/psych#50. The fix from the Parslet side is to have Context extend BasicObject instead of BlankSlate, but BasicObject was only added in 1.9.3, so it'll break backwards compatibility if it's switched. It might be worth adding some sort of warning for people upgrading to 1.9.3.

automatic whitespace handling

Parslet grammars are littered with whitespace checks, making them harder to read. Leaving them out fails to parse valid things properly. Take the javascript parser as an example: https://github.com/matthewd/capuchin/blob/d47f4b19eb888b6a4fc5428d3d1fdfcdb551b183/lib/capuchin/parser.rb

There is sp? everywhere. There are very few cases where whitespace is not allowed, and decorating those cases with a different operator to join the atoms seems sufficient.

So, this is a feature request for some sort of functionality like this. pyPEG has a skipws option which seems to work ok.

Matching css comments /* - not sure if this is a bug or my ignorance

Hi there. I'm loving this library.

I'm working with CSS parsing, and I've come across this confusing issue:

∴ irb
irb(main):001:0> require "parslet"
=> true
irb(main):002:0> Parslet.match('\*').parse('*')
=> "*"@0
irb(main):003:0> Parslet.match('/').parse('/')
=> "/"@0
irb(main):004:0> Parslet.match('/\*').parse('/*')
Parslet::ParseFailed: Don't know what to do with "*" at line 1 char 2.
        from /Users/zmoazeni/.rbenv/versions/1.9.3-p385/lib/ruby/gems/1.9.1/gems/parslet-1.5.0/lib/parslet/cause.rb:63:in `raise'
        from /Users/zmoazeni/.rbenv/versions/1.9.3-p385/lib/ruby/gems/1.9.1/gems/parslet-1.5.0/lib/parslet/atoms/base.rb:46:in `parse'
        from (irb):4
        from /Users/zmoazeni/.rbenv/versions/1.9.3-p385/bin/irb:12:in `<main>'
irb(main):005:0> (Parslet.match('/') >> Parslet.match('\*')).parse('/*')
=> "/*"@0
irb(main):006:0> '/*' =~ Parslet.match('/\*').re
=> 0

Am I doing something boneheaded with combining the regex Parslet.match('/\*') or is this a subtle bug?

Allow creating rules in Parslet::Parser on an instance level, like it's possible in Parslet::Transform

First the code.

I'm able to call instance methods from the constructor of the Transform, but not the Parser. I want to do this so I can modify rules of instances of the parser at runtime.

If you can just tell me where to look for a fix for this, I can try to solve it myself :).

Raise an exception if the whole tree is not transformed

For now, the transformer just skips pieces it doesn't have a rule for. It could instead raise an exception (configurable) when that happens.

Best way to match case insensitive strings?

I am trying to match a set of keywords which are case insensitive, but am having trouble seeing from the various atom implementations how this might be done, as it doesn't appear that flags can be passed into match or string. In past grammars I have had to resort to things like

[Mm][Yy] [Kk][Ee][Yy][Ww][Oo][Rr][Dd]

But I was very much hoping there was something I'm missing that would make that unnecessary? :)

Parslet::Slice should be kind_of String

Parslet::Slice does a good job mimicking String, except that it does not inherit from String. This breaks code that explicitly checks for obj.kind_of?(String).

There should be a way to distinguish error messages

When parslet doesn't consume its input, it outputs a message like this:
Don't know what to do with ... at line 3 char 1.

In that case it makes no sense to look at error_tree. There should be a way to detect this; parse_with_debug should use that to not print the tree. Here's what the (useless) tree looks like:

`- Unknown error in AN_EXPRESSION{0, } EOL / AN_EXPRESSION
   |- Unknown error in AN_EXPRESSION{0, } EOL
   |  `- Unknown error in LINE_COMMENT / SPACE? [\n\r]{0, } SPACE?
   |     |- Unknown error in '//' (![\n\r] .){0, } [\n\r]
   |     |  `- Unknown error in [\n\r]
   |     `- Unknown error in SPACE? [\n\r]{0, } SPACE?
   `- Failed to match sequence ('a' SPACE?) at line 1 char 3.
      `- Expected "a", but got "/" at line 1 char 3.