Giter Club home page Giter Club logo

parsey's Introduction

Parsey

Swift Parser Combinator Framework

In addition to simple combinators, Parsey supports source location/range tracking, backtracking prevention, and custom error messages.

Parsey Playground

Features

  • Combinator interface

    • |, ~~, ~~>, <~~, ^^ combinator operators
  • Lexer primitives:

    • Lexer.whitespace, Lexer.signedInteger, ...
  • Regex-like combinators:

    • Postfix .+ for .many().
      • Example: let arrayLiteral = "[" ~~> expression.+ <~~ "]"
    • Postfix .* for .manyOrNone().
      • Example: let classDef = (attribute | method).*
    • Postfix .? for .optional().
      • Example: let declaration = "let" ~~> id ~~ (":" ~~> type).? ~~ ("=" ~~> expression)
    • Postfix + for .manyConcatenated().
      • Example: let skippedSpaces = (Lexer.space | Lexer.tab)+
    • Infix + for .concatenatingResult(with:).
      • Example: let type = Lexer.upperLetter + Lexer.letter*
    • Lexer.regex(_:) for directly applying regular expressions.
      • Example: let id = Lexer.regex("[a-zA-Z][a-zA-Z0-9]*")
  • Backtracking prevention

    • .! postfix operator or .nonbacktracking()
  • Parser tagging for error messages

    • <!-- operator or .tagged(_:)
  • Rich error messages with source location

    • For example:
    Parse failure at 2:4 ----
    (+ %% 1 -20) 2 3)
       ^~~~~~~~~~~~~~
    Expecting an expression, but found "%"
    
  • Source range tracking

    • ^^^ operator or .mapParse(_:)
    • For example, S-expression \n(+ \n\n(+ +1 -20) 2 3) gets parsed to the following range-tracked AST:
    Expr:(2:1..<4:16):[
        ID:(2:2..<2:3):+,
        Expr:(4:1..<4:11):[
            ID:(4:2..<4:3):+,
            Int:(4:4..<4:6):1,
            Int:(4:7..<4:10):-20],
        Int:(4:12..<4:13):2,
        Int:(4:14..<4:15):3]
    

Requirements

  • Swift 3

  • Any operating system

Package

To use it in your Swift project, add the following dependency to your Swift package description file.

    .Package(url: "https://github.com/rxwei/Parsey", majorVersion: 1)

⚙ Examples

0️⃣ An LLVM Compiler Frontend written in Swift using Parsey

The COOL Programming Language

1️⃣ Parse Left-associative Infix Expressions with Operator Precedence

indirect enum Expression {
    case integer(Int)
    case symbol(String)
    case infix(String, Expression, Expression)
}

enum Grammar {
    static let integer = Lexer.signedInteger
        ^^ {Int($0)!} ^^ Expression.integer

    static let symbol = Lexer.regex("[a-zA-Z][0-9a-zA-Z]*")
        ^^ Expression.symbol

    static let addOp = Lexer.anyCharacter(in: "+-")
        ^^ { op in { Expression.infix(op, $0, $1) } }
    
    static let multOp = Lexer.anyCharacter(in: "*/")
        ^^ { op in { Expression.infix(op, $0, $1) } }

    /// Left-associative multiplication
    static let multiplication = (integer | symbol).infixedLeft(by: multOp)

    /// Left-associative addition
    static let addition = multiplication.infixedLeft(by: addOp)

    static let expression: Parser<Expression> = addition
}

try print(Grammar.expression.parse("2"))
/// Output:
/// Expression.integer(2)

try print(Grammar.expression.parse("2+1+2*a"))
/// Output:
/// Expression.infix("+",
///                  .infix("+", .integer(2), .integer(1)),
///                  .infix("*", .integer(2), .symbol("a")))

2️⃣ Parse S-Expressions

indirect enum Expr {
    case sExp([Expr])
    case int(Int)
    case id(String)
}

enum Grammar {
    static let whitespaces = (Lexer.space | Lexer.tab | Lexer.newLine)+
    static let anInt = Lexer.signedInteger ^^ { Int($0)! } ^^ Expr.int
    static let anID = Lexer.regex("[a-zA-Z_+\\-*/][0-9a-zA-Z_+\\-*/]*") ^^ Expr.id
    static let aSExp: Parser<Expr> =
        "(" ~~> (anExp.!).many(separatedBy: whitespaces).amid(whitespaces.?) <~~ ")"
        ^^ Expr.sExp
    static let anExp = anInt | anID | aSExp <!-- "an expression"
}

/// Success
try Grammar.anExp.parse("(+ (+ 1 -20) 2 3)")
/// Output: Expr.sExp(...)

/// Failure
try Grammar.anExp.parse("(+ \n(+ %% 1 -20) 2 3)")
/// Output: Parse failure at 2:4 ----
///         (+ %% 1 -20) 2 3)
///            ^~~~~~~~~~~~~~
///         Expecting an expression, but found "%"

3️⃣ Parse S-Expressions with Source Range Tracking

indirect enum Expr {
    case sExp([Expr], SourceRange)
    case int(Int, SourceRange)
    case id(String, SourceRange)
}

enum Grammar {
    static let whitespaces = (Lexer.space | Lexer.tab | Lexer.newLine)+

    static let anInt = Lexer.signedInteger 
        ^^^ { Expr.int(Int($0.target)!, $0.range) }

    static let anID = Lexer.regex("[a-zA-Z_+\\-*/][0-9a-zA-Z_+\\-*/]*")
        ^^^ { Expr.id($0.target, $0.range) }

    static let aSExp: Parser<Expr> =
        "(" ~~> (anExp.!).many(separatedBy: whitespaces).amid(whitespaces.?) <~~ ")"
        ^^^ { Expr.sExp($0.target, $0.range) }

    static let anExp = anInt | anID | aSExp <!-- "an expression"
}

/// Success
try Grammar.anExp.parse("(+ (+ 1 -20) 2 3)")
/// Output: Expr.sExp(...)

/// Failure
try Grammar.anExp.parse("(+ \n(+ %% 1 -20) 2 3)")
/// Output: Parse failure at 2:4 ----
///         (+ %% 1 -20) 2 3)
///            ^~~~~~~~~~~~~~
///         Expecting an expression, but found "%"

Dependency

License

MIT License

parsey's People

Contributors

rxwei avatar kyouko-taiga avatar

Stargazers

Simon Solnes avatar ebigram avatar xiaohu avatar ptruser avatar  avatar anyinfa avatar Manuel Maly avatar GAURAV avatar Brad Barrows avatar Omar Gómez avatar Bastian Kusserow avatar June French avatar Arne Bahlo avatar Jegor Gołowin avatar Peter Keše avatar  avatar  avatar tuqp53 avatar Jacob Relkin avatar Bart Chrzaszcz avatar Paul Gowder avatar abdul dakkak avatar Austin Rude avatar  avatar Arnau Siches avatar  avatar Ian Keen avatar Kaweh Kazemi avatar Alex avatar Ales Tsurko avatar Lane Schwartz avatar Jin Mingjian avatar Bliss Chapman avatar  avatar Masahiko Tsujita avatar Bernhard Loibl avatar Don Sleeter avatar John D. Pope avatar Krzysztof Magosa avatar @rubynerd avatar Ian Ownbey avatar Yeechan Lu avatar Donghua Li avatar Ravi Ranjan avatar Mateusz Szklarek avatar  avatar Yoichi Tagaya avatar Dominique d'Argent avatar Daniel Gräfe avatar Yasuhiro Inami avatar sablib avatar  avatar Nan-Jiang Jiang avatar  avatar Konstantin Pavlikhin avatar Tim Kersey avatar Jasdev Singh avatar Chris Eidhof avatar

Watchers

 avatar James Cloos avatar  avatar  avatar Brad Barrows avatar

parsey's Issues

Is it possible to completely backtrack after `suffixed(by:)`?

Let's say I have two parsers, one for call expressions, and one for subscript expressions:

static let callExpr = atom.suffixed(by: "(" ~~> callArgs <~~ ")")
static let callArgs =
    (callArg.many(separatedBy: comma) <~~ comma.?).?
    ^^^ { (args, loc) in
        return { CallExpr(callee: $0, arguments: args ?? [], location: loc) as Node }
    }

static let subscriptExpr = atom.suffixed(by: "[" ~~> subscriptArgs <~~ "]")
static let subscriptArgs =
    callArg.many(separatedBy: comma) <~~ comma.?
    ^^^ { (args, loc) in
        return { SubscriptExpr(callee: $0, arguments: args, location: loc) as Node }
    }

static let atom = ident | literal | "(" ~~> expr <~~ ")"
static let expr = callExpr | subscriptExpr | atom

Both those parsers can successfully parse things like f(1, 2) and m[0, 8] respectively. My problem is to find a way to combine them in the expr parser.

The way I presented it above, the expr parser can no longer parse subscript expressions. It first tries to parse a call expression, successfully parsing the callee but then failing to parse the arguments. And because it doesn't backtrack until before the callee, it is left with an input of the form [0, 8], which can't be parsed by the remaining parsers.

Maybe I could create a function Parser<(Node) -> Node>, that could choose between call and subscript expressions and I'd be left with a single atom.suffixed(by ...) parser. But this solution seems rather complex and difficult to maintain if I add things like dot and suffix expressions later on.

I guess the best would be if I could somehow tell the expr parser to completely backtrack after any of its options failed.

swift 3.1 build errors in release configuration

Running swift build -c release
Compile Swift Module 'Funky' (10 sources)
Compile Swift Module 'Parsey' (5 sources)
sourceRoot/Packages/Parsey/Sources/Combinators.swift:71:62: error: var 'tagged' is internal and cannot be referenced from an '@inline(__always)' function
            catch var failure as ParseFailure where !failure.tagged {
                                                             ^
sourceRoot/Packages/Parsey/Sources/Error.swift:19:18: note: var 'tagged' is not '@_versioned' or public
    internal var tagged: Bool = false
                 ^
sourceRoot/Packages/Parsey/Sources/Combinators.swift:72:25: error: instance method 'tag' is internal and cannot be referenced from an '@inline(__always)' function
                failure.tag(tag)
                        ^
sourceRoot/Packages/Parsey/Sources/Error.swift:32:28: note: instance method 'tag' is not '@_versioned' or public
    internal mutating func tag(_ tag: String) {
                           ^
sourceRoot/Packages/Parsey/Sources/Combinators.swift:71:62: error: var 'tagged' is internal and cannot be referenced from an '@inline(__always)' function
            catch var failure as ParseFailure where !failure.tagged {
                                                             ^
sourceRoot/Packages/Parsey/Sources/Error.swift:19:18: note: var 'tagged' is not '@_versioned' or public
    internal var tagged: Bool = false
                 ^
sourceRoot/Packages/Parsey/Sources/Combinators.swift:72:25: error: instance method 'tag' is internal and cannot be referenced from an '@inline(__always)' function
                failure.tag(tag)
                        ^
sourceRoot/Packages/Parsey/Sources/Error.swift:32:28: note: instance method 'tag' is not '@_versioned' or public
    internal mutating func tag(_ tag: String) {
                           ^
<unknown>:0: error: build had 1 command failures

Using swift 3.1 snapshot.

LMK if there's any more information that would be useful.

error: 'Parse<Target>' initializer is inaccessible due to 'internal' protection level

I am trying to write a parser for IP numbers, e.g. "192.168.140.1". The parser should fail, if one of the numbers is invalid. I thought I could write a combinator to define the constraints. Then it would be possible to write:
Lexer.unsignedInteger.validRange(0,255)

What I am getting is:
error: 'Parse' initializer is inaccessible due to 'internal' protection level

Strange issue related to the use of substring in ParserInput

I discovered a strange issue that seems to be linked to the use of substring in ParserInput.

My apologies, I wasn't able to create a minimal example. But the issue can be observed by running the tests of my LogicKit library, with the latest changes merged in master. At line 44 of Parser.swift, I parse conjunction of terms, using infixedLeft(by:). Everything works fine with two terms, but if I attempt to parse three, the parser produces completely invalid results (In that particular example, an invalid value of the enum) that may even trigger runtime errors.

Using a String rather than a Substring for the property ParserInput.stream seems to solve the problem. Therefore, my guess is that something wrong is happening with the pointers of the substring when the parser backtracks. Unfortunately I don't have any evidence to support that claim, and I don't know how the current implementation differs from the previous one (which was based on character views).

Any idea on the source of the problem?
Also, I understand some concerns were raised about the use of strings in ParserInput, as they may involve a lot of unnecessary copies. But maybe there is an intermediate solution that could rely on Swift's default copy-on-write behavior since strings are value-types? Maybe some benchmarks could shed some light on the issue.

Struggling to understand `suffixed(by:)`

I'm trying to write a grammar for a language that supports function calls with the usual syntax, i.e.

f()
f(a, b)
f(a, b)(c)
...

Since I'm not very familiar with parser combinators, the approach I took is quite similar to what I've done with LALR grammars, but couldn't managed to get where I wanted. In the search for a solution, I came across the suffix operation parser (Parser.suffixed(by:)) in the code, which seems quite promising. But unfortunately, it didn't bring me any more success so far.

Trying to strip as much as I can, here's a minimal example:

static let callExpr = atom.suffixed(by: "(" ~~> callArgs <~~ ")")
static let callArgs =
        expr.many(separatedBy: comma) <~~ comma.?
        ^^^ { (args, loc) in { CallExpr(callee: $0, arguments: args, location: loc) as Node } }
static let ident = Lexer.regex("[a-zA-Z_]\\w*")
        ^^^ { (val, loc) in Ident(name: val, location: loc) }
static let atom: Parser<Node> = callExpr | ident ^^ { $0 as Node }

This compiles just fines, but fails miserably with signal 4.

Any help would be amazingly appreciated ;)

Tag 2.1.0 is missing files

If I include

.package(url: "https://github.com/rxwei/Parsey.git", from: "2.1.0"),

... everything breaks :-(

It appears that git tag 2.1.0 is missing all source files except 'Functional' and 'Operators' -- see:
https://github.com/rxwei/Parsey/tree/2.1.0/Sources/Parsey

The only way to make swift compile the code is to go to .build/checkouts/... and manually check out master branch.

I'd kindly ask to include missing files and make a new release.

Issue "closure argument was escaped..."

Hello,

I use the library "LogicKit" (By @kyouko-taiga ) which has dependencies on your library.
There are a lot of tests in "ProofKit" that I launched and all worked before (I suppose before swift 4.2).

An issue is detected in the file parser.swift at the line 317.

Here a screen of what I have.

If you want to check the test, you just need to launch tests of "LogicKit", in the file LogicKitParserTests.swift

I tried to find with the debugger which line can throw this error, and it seems there is one (but not the only one): XCTAssertEqual(try Grammar.variable.parse(".var(\"x\")") , .var("x")) (Line 8 in the link above)

If you need more informations tell me. Maybe it's just a problem with the new version of swift.

Implement staging

Implement a staging back-end code generator / interpreter as a seamless plugin to support the current tagless front-end. Contributions are welcome.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.