
parser-tools's Introduction

Racket is a general-purpose programming language and an ecosystem for language-oriented programming.

This repository holds the source code for the core of Racket plus some related packages. The rest of the Racket distribution source code is in other repositories, mostly under the Racket GitHub organization.

Quick Start

Pre-built versions of Racket for a variety of operating systems and architectures, as well as convenient source distributions, are available at

https://download.racket-lang.org

Racket comes with extensive documentation, including several tutorials. You can read all of this documentation, as well as documentation for third-party packages, at

https://docs.racket-lang.org

Building from Source

For information on building Racket from this repository, see the Build Guide.

Contributing

Contribute to Racket by

By making a contribution, you are agreeing that your contribution is licensed under the LGPLv3, Apache 2.0, and MIT licenses. Those licenses are available in this repository in the files racket/src/LICENSE-LGPL.txt, racket/src/LICENSE-APACHE.txt, and racket/src/LICENSE-MIT.txt.

See the Building, Distributing, and Contributing to Racket for more guidance on contributing.

The Friendly Environment Policy contains guidelines on expected behavior within the Racket community.

License

Racket is free software; see LICENSE for more details.

parser-tools's People

Contributors

bnoordhuis, dyoo, elibarzilay, jbclements, jeapostrophe, leifandersen, liberalartist, mbutterick, mfelleisen, mflatt, mikesperber, puercopop, rfindler, rmculpepper, rrthomas, samth, sorawee, tov, williewillus


parser-tools's Issues

Parser raises read error on empty source

I think there might be an issue with the parser generator, or at least a missing detail in the documentation. It is not clear how to handle empty input; the following code shows the issue using a simple lexer and parser definition.

#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-tokens tokens (ANY))
(define-empty-tokens empty-tokens (EOF))

(define sample-lexer
  (lexer
   [(eof) (token-EOF)]
   [any-string (token-ANY lexeme)]))

(define sample-parser
  (parser
   [start expr]
   [end EOF]
   [error void]
   [tokens tokens empty-tokens]
   [grammar
    [expr [(ANY) $1]]]))

(define (parse s)
  (define in (open-input-string s))
  (sample-parser (lambda () (sample-lexer in))))

(parse "abc")
(parse "")

The output is:

"abc"
parser: Cannot continue after error
  context...:
   /snap/racket/current/usr/share/racket/collects/syntax/readerr.rkt:15:2: -raise-read-error
   /snap/racket/current/usr/share/racket/pkgs/parser-tools-lib/parser-tools/yacc.rkt:356:16: parsing-loop
   body of "/home/davide/repos/sample-racket-language/sample.rkt"

The issue is more complex when the parser has multiple end tokens, for example:

#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-tokens tokens (ANY))
(define-empty-tokens empty-tokens (EOF NEWLINE))

(define sample-lexer
  (lexer
   [(eof) (token-EOF)]
   [";" (token-NEWLINE)]
   [(repetition 1 +inf.0 (char-complement ";")) (token-ANY lexeme)]))

(define sample-parser
  (parser
   [start expr]
   [end EOF NEWLINE]
   [error void]
   [tokens tokens empty-tokens]
   [grammar
    [expr [(ANY) $1]]]))


(define in (open-input-string "abc;def"))
(sample-parser (lambda () (sample-lexer in)))
(sample-parser (lambda () (sample-lexer in)))
(sample-parser (lambda () (sample-lexer in)))

The output is:

"abc"
"def"
parser: Cannot continue after error
  context...:
   /snap/racket/current/usr/share/racket/collects/syntax/readerr.rkt:15:2: -raise-read-error
   /snap/racket/current/usr/share/racket/pkgs/parser-tools-lib/parser-tools/yacc.rkt:356:16: parsing-loop
   body of "/home/davide/repos/sample-racket-language/sample-2.rkt"

What I would like to achieve is the same behavior as read, for example:

(define in (open-input-string "1 2"))
(read in)
(read in)
(read in)

The output is:

1
2
#<eof>

Is this possible with the parser generator? Am I missing something?
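A possible workaround, sketched here under the assumption that the grammar may return a sentinel value for "no statement", is to give the start nonterminal an empty production that returns `eof`. When the first token the parser sees is an end token, it can then reduce the empty production and accept instead of raising an error. The nonterminal name `maybe-expr` and the helper `parse-next` are hypothetical; the tokens and lexer are those of the second example above. Repeated calls on the same port should then behave like the `read` example: "abc", "def", and finally the eof object.

```racket
#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-tokens tokens (ANY))
(define-empty-tokens empty-tokens (EOF NEWLINE))

(define sample-lexer
  (lexer
   [(eof) (token-EOF)]
   [";" (token-NEWLINE)]
   [(repetition 1 +inf.0 (char-complement ";")) (token-ANY lexeme)]))

(define sample-parser
  (parser
   [start maybe-expr]
   [end EOF NEWLINE]
   [error void]
   [tokens tokens empty-tokens]
   [grammar
    ;; Hypothetical nonterminal: the empty production lets the parser
    ;; accept immediately when the first token is an end token, and
    ;; returning eof mimics what `read` does at end of input.
    [maybe-expr [() eof]
                [(expr) $1]]
    [expr [(ANY) $1]]]))

;; Parse one statement per call from a shared port.
(define in (open-input-string "abc;def"))
(define (parse-next)
  (sample-parser (lambda () (sample-lexer in))))

(parse-next)
(parse-next)
(parse-next)
```

One design consequence: with this grammar an empty statement, such as the gap in "abc;;def", also yields eof, so a real language would probably return a distinct value from the empty production and reserve eof for the case where the lexer actually reached end of input.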

cfg-parser binds production names in scope of their expressions

The following file should fail to compile because foo should be unbound. However, foo is bound to some value internal to the parser:

#lang racket

(require parser-tools/lex parser-tools/cfg-parser)

(define-empty-tokens my-tokens (eof BLAH))

(define x
  (cfg-parser
   (start foo)
   (tokens my-tokens)
   (end eof)
   (grammar
    (foo ((BLAH)
          ;; this should be unbound but it isn't.
          foo)))))

For an example of how this might be harmful, consider:

(define (foo x) ...)
(define my-parser (cfg-parser ... (grammar ... (foo ((bar) (foo $1))))))
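Until production names stop leaking into the scope of action expressions, one workaround, sketched here as an assumption rather than a confirmed fix, is to refer to the outer helper through a name that is not also a nonterminal. The helper `foo`, the alias `double-it`, the `NUM` token and `parse-tokens` are all hypothetical illustration names.

```racket
#lang racket

(require parser-tools/lex
         parser-tools/cfg-parser)

(define-tokens value-tokens (NUM))
(define-empty-tokens op-tokens (eof))

;; A helper whose name collides with the nonterminal `foo` below.
(define (foo x) (* 2 x))

;; Hypothetical workaround: alias the helper under a name that is not a
;; production name, so the action cannot pick up the parser-internal
;; binding of `foo`.
(define double-it foo)

(define my-parser
  (cfg-parser
   (start foo)
   (tokens value-tokens op-tokens)
   (end eof)
   ;; Accept any number of arguments so the handler fits any error arity.
   (error (lambda args (error 'my-parser "parse error: ~e" args)))
   (grammar
    (foo ((NUM) (double-it $1))))))

;; Feed a fixed list of tokens to the parser, then the end token.
(define (parse-tokens toks)
  (define remaining toks)
  (my-parser
   (lambda ()
     (if (null? remaining)
         (token-eof)
         (begin0 (car remaining)
                 (set! remaining (cdr remaining)))))))

(parse-tokens (list (token-NUM 21)))
```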

Feature request: support for Happy-style error recovery

Here is a small program that uses an error nonterminal to implement an error recovery rule:

#lang racket/base

(require parser-tools/lex
         parser-tools/yacc)

(define-empty-tokens my-tokens
  [t1 t2 t3 done])

(define (parse lst)
  (define-values [has-next? get-next] (sequence-generate lst))
  ((parser
    (tokens my-tokens)
    (error (λ (a b c) (void)))
    (debug "parser-debug.out")
    (start go)
    (end done)
    (grammar
     [go  [(nt1 nt2 nt3) (list $1 $3)]]
     [nt1 [(t1)          't1]]
     [nt2 [(t2)          't2]
          [(error)       'error]]
     [nt3 [(t3)          't3]]))
   get-next))

(parse '(t1 t2 t3 done))
(parse '(t1 t3 done))

I believe both uses of parse ought to produce the same output, but they do not. The first one succeeds, while the second one fails:

'(t1 t3)
(#(struct:stack-frame 4 #f #f #f) #(struct:stack-frame 0 #f #f #f))
parser: Cannot continue after error

The printed stack says the parser is in state 4, so let’s look at that state in parser-debug.out:

State 4
        nt1 -> t1 . 

        error                                   reduce  3
        t2                                      reduce  3

The error rule is clearly there! Yet it does not fire.


Looking at the source code, the issue appears to be that the error-recovery code only handles shift actions, not reduce actions:

(let ((a (find-action stack 'error #f start-pos end-pos)))
  (cond
    ((runtime-shift? a)
     ;; (printf "shift:~a\n" (runtime-shift-state a))
     (set! stack
           (cons
            (make-stack-frame (runtime-shift-state a)
                              #f
                              start-pos
                              end-pos)
            stack))
     (remove-input tok val start-pos end-pos))
    (else
     ;; (printf "discard state:~a\n" (car stack))
     (cond
       ((< (length stack) 2)
        (raise-read-error "parser: Cannot continue after error"
                          #f #f #f #f #f))
       (else
        (set! stack (cdr stack))
        (remove-states)))))))))

As far as I can tell, this seems like a simple oversight.

Parser not passing required keyword argument #:stack to error function

As the title says, I'm getting this error:

application: required keyword argument not supplied
  procedure: ...error-repr.rkt:20:15
  required keyword: #:stack
  arguments...:
   #t
   'b
   #f
   (position 1 1 0)
   (position 2 1 1)
  context...:
   /Applications/Racket v7.7/collects/racket/private/kw.rkt:1551:14
   /Applications/Racket v7.7/share/pkgs/parser-tools-lib/parser-tools/yacc.rkt:347:16: parsing-loop
   "...error-repr.rkt": [running body]
   temp35_0
   for-loop
   run-module-instance!
   perform-require!

Code to reproduce:

#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-empty-tokens the-tokens (a b EOF))

(define test-lexer
  (lexer-src-pos
   ["a" 'a]
   ["b" 'b]
   [(eof) 'EOF]))

(define sad-parser
  (parser
   (tokens the-tokens)
   (start q)
   (end EOF)
   (src-pos)
   (error (lambda (invalid-token? token-name token-value start-pos end-pos #:stack s)
            '()))
   (grammar
    (q [(a) '()]))))

(define (sad-parse input-port)
  (port-count-lines! input-port)
  (sad-parser (lambda () (test-lexer input-port))))

(sad-parse (open-input-string "b"))

I'm hoping I'm just missing something here... it's difficult to debug my parser without the stack information :|
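A defensive workaround, assuming only that a given parser-tools version may or may not pass `#:stack`, is to declare the keyword as optional with a default. The handler then works both when the parser supplies the stack and when it calls the procedure without it, as in the error above. The names `tolerant-parser` and `tolerant-parse` are hypothetical; the handler raises its own error so the message reports the offending token instead of the generic "Cannot continue after error".

```racket
#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-empty-tokens the-tokens (a b EOF))

(define test-lexer
  (lexer-src-pos
   ["a" 'a]
   ["b" 'b]
   [(eof) 'EOF]))

(define tolerant-parser
  (parser
   (tokens the-tokens)
   (start q)
   (end EOF)
   (src-pos)
   ;; #:stack is optional: `stack` is #f when the parser does not supply it.
   (error (lambda (tok-ok? tok-name tok-value start-pos end-pos
                           #:stack [stack #f])
            (error 'tolerant-parser
                   "unexpected token ~a at offset ~a (stack ~a)"
                   tok-name
                   (position-offset start-pos)
                   (if stack "supplied" "not supplied"))))
   (grammar
    (q [(a) '()]))))

(define (tolerant-parse input-port)
  (port-count-lines! input-port)
  (tolerant-parser (lambda () (test-lexer input-port))))
```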

Feature request: adjust-srcloc

Producing syntax objects with the parser is tedious, because one needs to reference the appropriate $k-start-pos and $k-end-pos, subtract positions to get the span, etc.

It would be nice if there were a function or macro adjust-srcloc such that:

(adjust-srcloc <syntax-object>)

modifies the syntax object to have the syntax location from $1-start-pos to $n-end-pos, where n is the number of elements on the right-hand side of the production.

(adjust-srcloc <syntax-object> <k>)

modifies the syntax object to have the syntax location from $k-start-pos to $k-end-pos.

(adjust-srcloc <syntax-object> <a> <b>)

modifies the syntax object to have the syntax location from $a-start-pos to $b-end-pos.

That way, the action expression can be written concisely as:

(adjust-srcloc #`(foo #,(adjust-srcloc #'bar 2)) 1 3)
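Until something like this exists in parser-tools, a plain function gets part of the way there. The sketch below assumes a hypothetical helper, also called `adjust-srcloc`, that takes explicit start and end `position` structs rather than implicitly reading $k-start-pos, since those names are only bound inside a parser action; it is not the macro requested above.

```racket
#lang racket

(require parser-tools/lex)

;; Hypothetical helper (not part of parser-tools): rebuild `stx` so its
;; source location spans from `start` to `end`, both `position` structs
;; such as $1-start-pos and $3-end-pos inside a parser action.
(define (adjust-srcloc stx start end [source (syntax-source stx)])
  (datum->syntax stx
                 (syntax-e stx)
                 ;; srcloc vector: source, line, column, position, span
                 (vector source
                         (position-line start)
                         (position-col start)
                         (position-offset start)
                         (- (position-offset end) (position-offset start)))
                 stx))

;; (position offset line col): a span covering offsets 1 through 7.
(define stx (adjust-srcloc #'(foo bar) (position 1 1 0) (position 8 1 7)))
(list (syntax-line stx) (syntax-column stx) (syntax-position stx) (syntax-span stx))
```

Inside an action it would be called as (adjust-srcloc #'bar $2-start-pos $2-end-pos). A macro version that expands (adjust-srcloc stx 1 3) into references to $1-start-pos and $3-end-pos would additionally need to build those identifiers with the lexical context of the action, which this sketch does not attempt.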

`cfg-parser` BNF inaccurate wrt implementation

IIUC, the syntax block for cfg-parser says that it takes zero or more clauses, where each clause is one of the six forms listed. In practice, cfg-parser seems to require at least one clause, and specifically a grammar clause, to generate valid syntax at all. Furthermore, the documented grammar form admits parse rules for zero or more non-terminals, but the implementation requires at least one parse rule, and I believe each right-hand side actually requires one or more grammar-ids rather than zero as stated.

See:

@defform/subs[#:literals (grammar tokens start end precs src-pos
                          suppress expected-SR-conflicts expected-RR-conflicts
                          debug yacc-output prec)
              (cfg-parser clause ...)
              ([clause (grammar (non-terminal-id
                                 ((grammar-id ...) maybe-prec expr)
                                 ...)
                                ...)
                       (tokens group-id ...)
                       (start non-terminal-id ...)
                       (end token-id ...)
                       (@#,racketidfont{error} expr)
                       (src-pos)])]{

*IIUC

cannot use `start` or `atok` as terminals

This bug was first reported on my fork of parser-tools. As best I could tell, the problem occurs at this location, where this maneuver happens:

#`(grammar (start [() null]
                  [(atok start) (cons $1 $2)])
           (atok [(tok) (make-tok 'tok-id 'tok $e pos ...)] ...)))
#`(start start)

The solution I came up with (well, it passes all tests; I can’t claim to understand why it works) involves renaming some identifiers like so:

(with-syntax ([%start start-id-temp]
              [%atok atok-id-temp])
  #`(grammar (%start [() null]
                     [(%atok %start) (cons $1 $2)])
             (%atok [(tok) (make-tok 'tok-id 'tok $e pos ...)] ...)))
(with-syntax ([%start start-id-temp])
  #`(start %start))

I’d like to contribute a patch but there’s an ugliness I don’t know how to fix:

One can’t literally use %start and %atok as the new identifiers, because then %start and %atok can’t be used as terminals. Hence the use of the variables start-id-temp and atok-id-temp, which hold the temp identifiers. But where to get these identifiers?

“Dude, just use generate-temporary.” Turns out that won’t work. IIUC the parser spawns multiple threads. If these identifiers are different across threads, then the results can’t be recombined. So although the names don’t have to be consistent across every run of the parser, they apparently do have to be consistent among threads.

I have no idea how to make this happen. My so-terrible-it-works idea was to assign very long, very weird names for start-id-temp and atok-id-temp. But I’m not prepared to pollute the Racket codebase with that.
