Giter Club home page Giter Club logo

goparsec's Introduction

Parser combinator library in Golang

Build Status Coverage Status GoDoc Sourcegraph Go Report Card

A library to construct top-down recursive backtracking parsers using parser-combinators. Before proceeding you might want to take at peep at theory of parser combinators. As for this package, it provides:

  • A standard set of combinators.
  • Regular expression based simple-scanner.
  • Standard set of tokenizers based on the simple-scanner.

To construct syntax-trees based on detailed grammar try with AST struct

  • Standard set of combinators are exported as methods to AST.
  • Generate dot-graph EG: dotfile for html.
  • Pretty print on the console.
  • Make debugging easier.

NOTE that AST object is a recent development and expect user to adapt to newer versions

Quick links

Combinators

Every combinator should confirm to the following signature,

    // ParsecNode type defines a node in the AST
    type ParsecNode interface{}

    // Parser function parses input text, higher order parsers are
    // constructed using combinators.
    type Parser func(Scanner) (ParsecNode, Scanner)

    // Nodify callback function to construct custom ParsecNode.
    type Nodify func([]ParsecNode) ParsecNode

Combinators take a variable number of parser functions and return a new parser function.

Using the builtin scanner

Builtin scanner library manages the input buffer and implements a cursor into the buffer. Create a new scanner instance,

    s := parsec.NewScanner(text)

The scanner library supplies method like Match(pattern), SkipAny(pattern) and Endof(), refer to for more information on each of these methods.

Panics and Recovery

Panics are to be expected when APIs are misused. Programmers might choose to ignore errors, but not panics. For example:

  • Kleene and Many combinators take one or two parsers as arguments. Less than one or more than two will throw a panic.
  • ManyUntil combinator take two or three parsers as arguments. Less than two or more than three will throw a panic.
  • Combinators accept Parser function or pointer to Parser function. Anything else will panic.
  • When using invalid regular expression to match a token.

Examples

  • expr/expr.go, implements a parsec grammar to parse arithmetic expressions.
  • json/json.go, implements a parsec grammar to parse JSON document.

Clone the repository run the benchmark suite

    $ cd expr/
    $ go test -test.bench=. -test.benchmem=true
    $ cd json/
    $ go test -test.bench=. -test.benchmem=true

To run the example program,

    # to parse expression
    $ go run tools/parsec/parsec.go -expr "10 + 29"

    # to parse JSON string
    $ go run tools/parsec/parsec.go -json '{ "key1" : [10, "hello", true, null, false] }'

Projects using goparsec

  • Monster, production system in golang.
  • GoLedger, ledger re-write in golang.

If your project is using goparsec you can raise an issue to list them under this section.

Articles

How to contribute

Issue Stats Issue Stats

  • Pick an issue, or create an new issue. Provide adequate documentation for the issue.
  • Assign the issue or get it assigned.
  • Work on the code, once finished, raise a pull request.
  • Goparsec is written in golang, hence expected to follow the global guidelines for writing go programs.
  • If the changeset is more than few lines, please generate a report card.
  • As of now, branch master is the development branch.

goparsec's People

Contributors

gitter-badger avatar joneshf avatar leovailati avatar prataprc avatar ptxmac avatar tokenshift avatar tompaton avatar wiiittttt avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

goparsec's Issues

Minor: Comments are wrong in JSON example

// False is alias for string type denoting JSON `null`
type False string

// Num is alias for string type denoting JSON `null`
type Num string

// String is alias for string type denoting JSON `null`
type String string

When parsec.String() attempts to scan at the end of the input buffer, a panic occurs.

I have a grammar of the form (simplified):

expr = term+
term = ident | string

Where ident = parsec.Ident() and string = parsec.String(). term+ is implemented using parsec.Many(). Terms are repeatedly consumed from the input until the buffer is completely consumed; at which point parsec.String() panics at terminals.go:24, due to the cursor having passed the end of the buffer.

Unicode support.

Default Scanner (SimpleScanner) and Terminal parser functions should have unicode support, along with utf8 encoding.

how to define recursive grammar?

var (
    UnaryExpr  = Ord(And(LPAREN, Expr, RPAREN))
    Expr       = Ord(UnaryExpr, BinaryExpr)
)

as you see, UnaryExpr and Expr refer to each other.
this library won't work because when we define UnaryExpr, Expr is still uninitialized.

Feature Request: Expose ability to set whitespace pattern for SimpleScanner.SkipWS()

It seems like if you want to use the SimpleScanner, but override the default whitespace pattern, you need to copy all of the code for the scanner locally and change the pattern. It would be great if there was a simple way to change the pattern via the packages API (either a global variable, public struct member, etc.)

If there is currently a way to do this already, please let me know.

Tab completion.

Investigate whether goparsec can be used for tab completion when used for parsing text inputs, like in command line or web forms.

AST Terminals and Non-Terminals.

Create a new AST struct that supports all combinators, in addition to that it can treat Non-terminals as a separate interface.

Error reporting

Is there a way to report errors when input cannot be parsed?

Thoughts on an assertion library?

Your tests are littered with assertions that look like this:

node, _ := ast.Parsewith(y, s)
if node.GetValue() != "onetwothree" {
	t.Errorf("unexpected: %v", node.GetValue())
} else if _, ok := node.GetChildren()[2].(MaybeNone); ok == false {
	t.Errorf("expected MaybeNone")
}

they could look like this:

node, _ := ast.Parsewith(y, s)
require.Equal(t, "onetwothree", node.GetValue())
require.IsType(t, MaybeNone{}, node.GetChildren()[2])

which is much, much easier to read.

Personally I'm a fan of https://github.com/stretchr/testify. Unlike some of the other "testing frameworks" it is still just using the standard go testing signatures.

Implement Atom() parser similar to Token().

@leovailati demonstrated that MatchString() method, that matches with i/p string
byte-by-byte, is lot faster than the Match() method from Scanner.

Token() parser takes a regular expression, and matches it with i/p token.
Similarly implement an Atom() parser that uses MatchString() that can match faster
than Token's Match().

NewScanner with SetWsPattern does not skip whitespace when used with And

I was expecting that two should be matched since it should escape the whitespace but it did not. Also the whitespace should be already be set by default which means that there should be no need to call SetWSPattern

	s := NewScanner([]byte("one two three")).SetWSPattern(" ")
	v, e := p(s)
	if v == nil {
		t.Errorf("End() didn't match %q", e)
	}```


Error message:
End() didn't match &{"one two three" '\x00' '\x01' map["^ ":%!q(*regexp.Regexp=&{^  0xc0003bef00 <nil> 0 65536 []  [] 0 0 0 2 false 4 1 false})] " " %!q(bool=false)}

There are few places in tokeniser where it calls SkipWS but not used that Scanner returned by that function. For example
https://github.com/prataprc/goparsec/blob/master/tokeniser.go#L85

Review downstream pull request from 5b7abca

a. parsec_test.go:TestStrEOF(), changes where removed check whether it needs to be brought back.
b. terminal.go:String(), implementation has changed this seem to correlate with (a.)
c. terminal.go:Token(): check on the implication of using SkipAny() instead of SkipWS().
d. terminal.go:TokenOrdChoice().

And(x, Maybe(And(y, z))) is broken

Consider this simplified grammar for addition:

Add ::= Int ("+" Int)?

Trying to express it with goparsec as follows:

package test

import (
	"testing"
	"github.com/prataprc/goparsec"
)

func TestParsec(t *testing.T) {
	intP := goparsec.Int()
	plusP := goparsec.Token("+", "op")
	addP := goparsec.And(nil, intP, goparsec.Maybe(nil, goparsec.And(nil, plusP, intP)))
	//addP := goparsec.OrdChoice(nil, goparsec.And(nil, intP, plusP, intP), intP) // (1)
	//addP := goparsec.And(nil, intP, plusP, intP) // (2)
	ast := goparsec.NewAST("expr", 32)
	s := goparsec.NewScanner([]byte("1 + 2"))
	_, err := ast.Parsewith(addP, s)
	if err != nil {
		t.Fatal(err)
	}
}

Gives the following error:

=== RUN   TestParsec
--- FAIL: TestParsec (0.00s)
panic: interface conversion: []parsec.ParsecNode is not parsec.Queryable: missing method GetAttribute [recovered]
	panic: interface conversion: []parsec.ParsecNode is not parsec.Queryable: missing method GetAttribute

goroutine 6 [running]:
testing.tRunner.func1.1(0x52cb80, 0xc00007cdb0)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:940 +0x2f5
testing.tRunner.func1(0xc000136120)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:943 +0x3f9
panic(0x52cb80, 0xc00007cdb0)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/runtime/panic.go:969 +0x166
PATH/vendor/github.com/prataprc/goparsec.(*AST).Parsewith(0xc00008df38, 0xc00007cb70, 0x583140, 0xc00007a230, 0xc00007cb70, 0x55, 0x37e, 0x4c0c30)
	PATH/vendor/github.com/prataprc/goparsec/ast.go:90 +0xae
PATH/test.TestParsec(0xc000136120)
	PATH/parser_test.go:19 +0x31e
testing.tRunner(0xc000136120, 0x55f4d0)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:991 +0xdc
created by testing.(*T).Run
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:1042 +0x357

Process finished with exit code 1

Replacing the parser with (1) gives the same error, replacing with (2) works, but it is not equivalent (obviously).

How to write parser to extract elements from an array

Hi,

I am writing a service where I have to parse a array of bytes. Now, different portions of the array contain different fields which I need to extract. I am wondering if parser-combinator is a good candidate to approach this.

For eg- if I have array of 10 elements. 1-5 signify field1, 6-7 is field2, 8-10 is field3. And I want to pass the array and get those 3 fields. How would I do it using this library ?

P.S. - Sorry if this is a n00b question. I am trying with familiarize myself with the concept.

Thanks !

Allow Nodify functions to decide a match.

In the present implementation, standard combinators like AND, OR etc.. try to match the input text with parser function, and if there is a match it dispatches the Nodify function subscribed for that combinator.

Proposed solution is to allow the Nodify callback to return a nil if it decides that the matching nodes, though they succeed the combinator rule, fall under exceptional cases. In such cases, the standard combinator should proceed in such a manner that the match is a failure. And importantly I should not consume the scanned text.

Project status; Panic in parsec.String() with unterminated string

Hi @prataprc, first off: is this project still maintained? Im guessing that the Travis CI is no longer working and that the version of go (1.13) means that it wont build with modern toolchain?

If I run an unterminated string through the String() parser, it panics with runtime error: index out of range [6] with length 6:

data := []byte(`"hello`)
scanner := parsec.NewScanner(data)
node, _ := parsec.String()(scanner)
print(node)

I think the issue is in tokenizer.go, in method scanString (suggested changes in comments):

func scanString(txt []byte) (tok []byte, readn int) {
	if len(txt) < 2 {
		return nil, 0
	}

	e := 1
	for txt[e] != '"' {
		c := txt[e]
		if c == '\\' || c == '"' || c < ' ' {
			break
		}
		if c < utf8.RuneSelf {
			e++
			// SUGGESTED CHANGE 1
			// if e >= len(txt) {
			// 	return nil, 0
			// }
			continue
		}
		r, size := utf8.DecodeRune(txt[e:])
		if r == utf8.RuneError && size == 1 {
			return nil, 0
		}
		e += size

		// SUGGESTED CHANGE 2
		// if e >= len(txt) {
		// 	return nil, 0
		// }
	}

	// ...

Is there an appetite for bringing this project up-to-date and fixing this issue?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.