racket / parser-tools

parser-tools's Introduction

Racket is a general-purpose programming language and an ecosystem for language-oriented programming.

This repository holds the source code for the core of Racket plus some related packages. The rest of the Racket distribution source code is in other repositories, mostly under the Racket GitHub organization.

Quick Start

Pre-built versions of Racket for a variety of operating systems and architectures, as well as convenient source distributions, are available at

https://download.racket-lang.org

Racket comes with extensive documentation, including several tutorials. You can read all of this documentation, as well as documentation for third-party packages, at

https://docs.racket-lang.org

Building from Source

For information on building Racket from this repository, see the Build Guide.

Contributing

Contribute to Racket by

- submitting a pull request
- visiting the Racket forums:
  - Racket Discussions Discourse
  - Racket Discord #internals
  - Slack

By making a contribution, you are agreeing that your contribution is licensed under the LGPLv3, Apache 2.0, and MIT licenses. Those licenses are available in this repository in the files racket/src/LICENSE-LGPL.txt, racket/src/LICENSE-APACHE.txt, and racket/src/LICENSE-MIT.txt.

See Building, Distributing, and Contributing to Racket for more guidance on contributing.

The Friendly Environment Policy contains guidelines on expected behavior within the Racket community.

License

Racket is free software; see LICENSE for more details.

parser-tools's Issues

Parser raises read error on empty source

I think there might be an issue with the parser generator, or at least a missing detail in the documentation: it is not clear how to handle empty input. The following code shows the issue using a simple lexer and parser definition.

#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-tokens tokens (ANY))
(define-empty-tokens empty-tokens (EOF))

(define sample-lexer
  (lexer
   [(eof) (token-EOF)]
   [any-string (token-ANY lexeme)]))

(define sample-parser
  (parser
   [start expr]
   [end EOF]
   [error void]
   [tokens tokens empty-tokens]
   [grammar
    [expr [(ANY) $1]]]))

(define (parse s)
  (define in (open-input-string s))
  (sample-parser (lambda () (sample-lexer in))))

(parse "abc")
(parse "")

The output

"abc"
parser: Cannot continue after error
  context...:
   /snap/racket/current/usr/share/racket/collects/syntax/readerr.rkt:15:2: -raise-read-error
   /snap/racket/current/usr/share/racket/pkgs/parser-tools-lib/parser-tools/yacc.rkt:356:16: parsing-loop
   body of "/home/davide/repos/sample-racket-language/sample.rkt"

The issue is more complex when the parser has multiple end tokens, for example:

#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-tokens tokens (ANY))
(define-empty-tokens empty-tokens (EOF NEWLINE))

(define sample-lexer
  (lexer
   [(eof) (token-EOF)]
   [";" (token-NEWLINE)]
   [(repetition 1 +inf.0 (char-complement ";")) (token-ANY lexeme)]))

(define sample-parser
  (parser
   [start expr]
   [end EOF NEWLINE]
   [error void]
   [tokens tokens empty-tokens]
   [grammar
    [expr [(ANY) $1]]]))


(define in (open-input-string "abc;def"))
(sample-parser (lambda () (sample-lexer in)))
(sample-parser (lambda () (sample-lexer in)))
(sample-parser (lambda () (sample-lexer in)))

The output is

"abc"
"def"
parser: Cannot continue after error
  context...:
   /snap/racket/current/usr/share/racket/collects/syntax/readerr.rkt:15:2: -raise-read-error
   /snap/racket/current/usr/share/racket/pkgs/parser-tools-lib/parser-tools/yacc.rkt:356:16: parsing-loop
   body of "/home/davide/repos/sample-racket-language/sample-2.rkt"

What I would like to achieve is the same behavior as read, for example:

(define in (open-input-string "1 2"))
(read in)
(read in)
(read in)

Output is

1
2
#<eof>

Is this possible with the parser generator? Am I missing something?
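One direction that might give read-like behavior is an empty production that returns the eof object, so that an end token arriving with nothing before it is a valid parse rather than an error. The following is only a sketch under that assumption, reusing the second lexer above; read-three is a hypothetical helper added for illustration, and the sketch cannot tell an empty segment (as in ";;") apart from the real end of input.

```racket
#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-tokens tokens (ANY))
(define-empty-tokens empty-tokens (EOF NEWLINE))

(define sample-lexer
  (lexer
   [(eof) (token-EOF)]
   [";" (token-NEWLINE)]
   [(repetition 1 +inf.0 (char-complement ";")) (token-ANY lexeme)]))

(define sample-parser
  (parser
   [start expr]
   [end EOF NEWLINE]
   [error void]
   [tokens tokens empty-tokens]
   [grammar
    ;; The empty alternative lets the parser accept an end token
    ;; that arrives with no ANY token before it.
    [expr [() eof]
          [(ANY) $1]]]))

;; Hypothetical helper: invoke the parser three times on one port,
;; mirroring the three calls to read in the example above.
(define (read-three s)
  (define in (open-input-string s))
  (for/list ([_ (in-range 3)])
    (sample-parser (lambda () (sample-lexer in)))))

(read-three "abc;def")
```

The design choice here is to move the "nothing left" case into the grammar instead of checking the port before each call; whether that is the intended way to use the parser generator is exactly what the question above asks.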

cfg-parser binds production names in scope of their expressions

The following file should fail to compile because foo should be unbound. However, foo is bound, to some value internal to the parser:

#lang racket

(require parser-tools/lex parser-tools/cfg-parser)

(define-empty-tokens my-tokens (eof BLAH))

(define x
  (cfg-parser
   (start foo)
   (tokens my-tokens)
   (end eof)
   (grammar
    (foo ((BLAH)
          ;; this should be unbound but it isn't.
          foo)))))

For an example of how this might be harmful, consider:

(define (foo x) ...)
(define my-parser (cfg-parser ... (grammar ... (foo ((bar) (foo $1))))))

Feature request: support for Happy-style error recovery

Here is a small program that uses an error nonterminal to implement an error recovery rule:

#lang racket/base

(require parser-tools/lex
         parser-tools/yacc)

(define-empty-tokens my-tokens
  [t1 t2 t3 done])

(define (parse lst)
  (define-values [has-next? get-next] (sequence-generate lst))
  ((parser
    (tokens my-tokens)
    (error (λ (a b c) (void)))
    (debug "parser-debug.out")
    (start go)
    (end done)
    (grammar
     [go  [(nt1 nt2 nt3) (list $1 $3)]]
     [nt1 [(t1)          't1]]
     [nt2 [(t2)          't2]
          [(error)       'error]]
     [nt3 [(t3)          't3]]))
   get-next))

(parse '(t1 t2 t3 done))
(parse '(t1 t3 done))

I believe both uses of parse ought to produce the same output, but they do not. The first one succeeds, while the second one fails:

'(t1 t3)
(#(struct:stack-frame 4 #f #f #f) #(struct:stack-frame 0 #f #f #f))
parser: Cannot continue after error

The printed stack says the parser is in state 4, so let’s look at that state in parser-debug.out:

State 4
        nt1 -> t1 . 

        error                                   reduce  3
        t2                                      reduce  3

The error rule is clearly there! Yet it does not fire.

Looking at the source code, the issue appears to be that the code that looks up an action for the error token only handles shift actions, not reduce actions:

parser-tools/parser-tools-lib/parser-tools/yacc.rkt

Lines 318 to 338 in d76cb89

(let ((a (find-action stack 'error #f start-pos end-pos)))
  (cond
    ((runtime-shift? a)
     ;; (printf "shift:~a\n" (runtime-shift-state a))
     (set! stack
           (cons
            (make-stack-frame (runtime-shift-state a)
                              #f
                              start-pos
                              end-pos)
            stack))
     (remove-input tok val start-pos end-pos))
    (else
     ;; (printf "discard state:~a\n" (car stack))
     (cond
       ((< (length stack) 2)
        (raise-read-error "parser: Cannot continue after error"
                          #f #f #f #f #f))
       (else
        (set! stack (cdr stack))
        (remove-states)))))))))

As far as I can tell, this seems like a simple oversight.

Parser not passing required keyword argument #:stack to error function

As the title says, I'm getting this error:

application: required keyword argument not supplied
  procedure: ...error-repr.rkt:20:15
  required keyword: #:stack
  arguments...:
   #t
   'b
   #f
   (position 1 1 0)
   (position 2 1 1)
  context...:
   /Applications/Racket v7.7/collects/racket/private/kw.rkt:1551:14
   /Applications/Racket v7.7/share/pkgs/parser-tools-lib/parser-tools/yacc.rkt:347:16: parsing-loop
   "...error-repr.rkt": [running body]
   temp35_0
   for-loop
   run-module-instance!
   perform-require!

Code to reproduce:

#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-empty-tokens the-tokens (a b EOF))

(define test-lexer
    (lexer-src-pos
        ["a" 'a]
        ["b" 'b]
        [(eof) 'EOF]))

(define sad-parser
    (parser
        (tokens the-tokens)
        (start q)
        (end EOF)
        (src-pos)
        (error (lambda (invalid-token? token-name token-value start-pos end-pos #:stack s)
                    '()))
        (grammar
            (q [(a) '()]))
    ))

(define (sad-parse input-port)
    (port-count-lines! input-port)
    (sad-parser (lambda () (test-lexer input-port))))

(sad-parse (open-input-string "b"))

I'm hoping I'm just missing something here...it's difficult to debug my parser without the stack information :|
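A possible stopgap, relying only on Racket's keyword-argument rules: give #:stack a default value, so the handler can still be called when the parser supplies just the five positional arguments. This is a sketch, not a fix for the underlying problem; the names tolerant-parser, tolerant-parse, and seen are made up for illustration, and whether the installed parser-tools version ever passes #:stack is not something this example establishes.

```racket
#lang racket

(require parser-tools/lex
         parser-tools/yacc)

(define-empty-tokens the-tokens (a b EOF))

(define test-lexer
  (lexer-src-pos
   ["a" 'a]
   ["b" 'b]
   [(eof) 'EOF]))

;; Hypothetical: records what the error handler received.
(define seen (box #f))

(define tolerant-parser
  (parser
   (tokens the-tokens)
   (start q)
   (end EOF)
   (src-pos)
   ;; #:stack is optional here, so a call with only the five
   ;; positional arguments no longer raises an application error.
   ;; s stays #f unless the caller passes the keyword.
   (error (lambda (tok-ok? tok-name tok-value start-pos end-pos
                   #:stack [s #f])
            (set-box! seen (list tok-name s))))
   (grammar
    (q [(a) '()]))))

(define (tolerant-parse input-port)
  (port-count-lines! input-port)
  ;; After the handler returns, the parser still raises a read error
  ;; because the grammar has no error productions to recover with.
  (with-handlers ([exn:fail:read? (lambda (e) 'parse-failed)])
    (tolerant-parser (lambda () (test-lexer input-port)))))

(tolerant-parse (open-input-string "b"))
```

With this shape the handler runs and the stack argument simply stays #f when it is not provided, which at least separates "handler crashed" from "parse failed".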

Feature request: adjust-srcloc

Producing syntax objects with the parser is really tedious, because one has to reference the appropriate $k-start-pos and $k-end-pos, subtract positions to get the span, and so on.

It would be nice if there were a function/macro adjust-srcloc where:

(adjust-srcloc <syntax-object>)

modifies the syntax object to have the syntax location from $1-start-pos to $n-end-pos

(adjust-srcloc <syntax-object> <k>)

modifies the syntax object to have the syntax location from $k-start-pos to $k-end-pos.

(adjust-srcloc <syntax-object> <a> <b>)

modifies the syntax object to have the syntax location from $a-start-pos to $b-end-pos.

That way, the action expression can be written concisely with:

(adjust-srcloc #`(foo #,(adjust-srcloc #'bar 2)) 1 3)
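For comparison, the manual bookkeeping that adjust-srcloc would hide can be sketched as a plain function over position structs. The name relocate is hypothetical and not part of parser-tools; inside a real grammar action the two positions would be $k-start-pos and $k-end-pos, while here they are built by hand so the example runs on its own.

```racket
#lang racket/base

(require parser-tools/lex) ; for the position struct and its accessors

;; Hypothetical helper: rebuild stx with a source location that runs
;; from start-pos to end-pos, keeping its lexical context and
;; properties.
(define (relocate stx start-pos end-pos)
  (datum->syntax stx
                 (syntax-e stx)
                 (list (syntax-source stx)
                       (position-line start-pos)
                       (position-col start-pos)
                       (position-offset start-pos)
                       ;; span = distance between the two offsets
                       (- (position-offset end-pos)
                          (position-offset start-pos)))
                 stx))

;; Positions built by hand, as (position offset line col).
(define relocated
  (relocate #'bar (position 5 1 4) (position 8 1 7)))

(syntax-position relocated) ; 5
(syntax-span relocated)     ; 3
```

The requested adjust-srcloc would need to be a macro (or be expanded by the parser) to see $1-start-pos and $n-end-pos implicitly, which a function like this cannot do.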

`cfg-parser` BNF inaccurate wrt implementation

IIUC, the syntax block for cfg-parser says that it takes zero or more clauses, where a clause is any one of those six options. In practice, cfg-parser seems to require at least one clause, and specifically a grammar clause, to even generate valid syntax. Furthermore, I believe the grammar form claims to admit parse rules for zero or more non-terminals, but the implementation requires at least one parse rule, and I believe that each RHS actually requires one or more grammar-ids, rather than zero or more as stated.

See:

parser-tools/parser-tools-doc/parser-tools/parser-tools.scrbl

Lines 722 to 734 in 4afd434

@defform/subs[#:literals (grammar tokens start end precs src-pos
                          suppress expected-SR-conflicts expected-RR-conflicts
                          debug yacc-output prec)
              (cfg-parser clause ...)
              ([clause (grammar (non-terminal-id
                                 ((grammar-id ...) maybe-prec expr)
                                 ...)
                                ...)
                       (tokens group-id ...)
                       (start non-terminal-id ...)
                       (end token-id ...)
                       (@#,racketidfont{error} expr)
                       (src-pos)])]{

* IIUC: if I understand correctly

cannot use `start` or `atok` as terminals

This bug was first reported on my fork of parser-tools. As best I could tell, the problem occurs at this location, where this maneuver happens:

#`(grammar (start [() null]
                  [(atok start) (cons $1 $2)])
           (atok [(tok) (make-tok 'tok-id 'tok $e pos ...)] ...)))
#`(start start)

The solution I came up with — well, it passes all tests, I can’t claim to understand why it works — involves renaming some identifiers like so:

(with-syntax ([%start start-id-temp]
              [%atok atok-id-temp])
  #`(grammar (%start [() null]
                     [(%atok %start) (cons $1 $2)])
             (%atok [(tok) (make-tok 'tok-id 'tok $e pos ...)] ...)))
(with-syntax ([%start start-id-temp])
  #`(start %start))

I’d like to contribute a patch but there’s an ugliness I don’t know how to fix:

One can’t literally use %start and %atok as the new identifiers, because then %start and %atok can’t be used as terminals. Hence the use of the variables start-id-temp and atok-id-temp, which hold the temp identifiers. But where to get these identifiers?

“Dude, just use generate-temporary.” Turns out that won’t work. IIUC the parser spawns multiple threads. If these identifiers are different across threads, then the results can’t be recombined. So although the names don’t have to be consistent across every run of the parser, they apparently do have to be consistent among threads.

I have no idea how to make this happen. My so-terrible-it-works idea was to assign very long, very weird names for start-id-temp and atok-id-temp. But I’m not prepared to pollute the Racket codebase with that.
