elixir-tools / spitfire

Error tolerant parser for Elixir

Home Page: https://www.elixir-tools.dev

License: MIT License

Elixir 62.31% Nix 0.77% Erlang 36.31% Euphoria 0.61%

elixir erlang otp parser pratt-parser recursive-descent-parser

spitfire's People

davydog187 lucacervello belaba zachallaun njichev biletskyy

spitfire's Issues

bug: failed to parse a bracketed keyword list

Description

There was an instance where I was writing an ExUnit test, and was adding another key into a @tag. The parser raised the following exception.

[Error] ** (FunctionClauseError) no function clause matching in Spitfire.parse_kw_identifier/1
    (spitfire 0.1.0) lib/spitfire.ex:501: Spitfire.parse_kw_identifier(%{tokens: [{:kw_identifier, {60, 10, ~c"timeout"}, :timeout}, {:int, {60, 19, 120000}, ~c"120_000"}, {:eol, {60, 26, 1}}, {:identifier, {63, 5, ~c"test"}, :test}, {:bin_string, {63, 10, nil}, ["somestring"]}, {:",", {63, 91, 0}}, {:%{}, {63, 93, nil}}, {:"{", {63, 94, nil}}, {:eol, {63, 95, 1}}, {:kw_identifier, {64, 7, ~c"foo"}, :foo}, {:identifier, {64, 17, ~c"foo"}, :foo}, {:eol, {64, 25, 1}}, {:"}", {65, 5, 1}}, {:do, {65, 7, nil}}, {:eol, {65, 9, 1}}, {:identifier, {66, 7, ~c"file"}, :file}, {:match_op, {66, 12, nil}, :=}, {:bin_string, {66, 14, nil}, ["somestring", {{66, 46, nil}, {66, 56, nil}, [{:identifier, {66, 48, ~c"foo"}, :foo}]}, "somestring"]}, {:eol, {66, 70, 2}}, {:identifier, {68, 7, ~c"foo"}, :foo}, {:match_op, {68, 18, nil}, :=}, {:alias, {68, 20, ~c"foo"}, :foo}, {:., {68, 30, nil}}, {:paren_identifier, {68, 31, ~c"load"}, :load}, {:"(", {68, 35, nil}}, {:identifier, {68, 36, ~c"file"}, :file}, {:",", {68, 40, 0}}, {:atom, {68, 42, ~c"local"}, :local}, {:")", {68, 48, nil}}, {:eol, {68, 49, 2}}, {:identifier, {70, 7, ~c"bar"}, :bar}, {:match_op, {70, 19, nil}, :=}, {:paren_identifier, {70, 21, ~c"bar"}, :bar}, {:"(", {70, 32, nil}}, {:identifier, {70, 33, ~c"foo"}, :foo}, {:")", {70, 41, nil}}, {:eol, {70, 42, 1}}, {:"{", {71, 7, nil}}, {:identifier, {71, 8, ~c"messages"}, :messages}, {:",", {71, 16, 0}}, {:identifier, {71, 18, ~c"_baz"}, :_baz}, {:"}", {71, 26, nil}}, {:match_op, {71, 28, nil}, :=}, {:paren_identifier, {71, 30, ~c"alice"}, :alice}, {:"(", {71, 45, ...}}, {:identifier, {71, ...}, :foo}, {:")", {...}}, {:eol, ...}, {...}, ...], literal_encoder: #Function<0.120709541/2 in NextLS.DocumentSymbol.fetch/1>, errors: [], current_token: {:at_op, {60, 5, nil}, :@}, fuel: 150, peek_token: {:identifier, {60, 6, ~c"tag"}, :tag}, nesting: 1})
    (spitfire 0.1.0) lib/spitfire/while.ex:5: Spitfire.While2.recurse/3
    (spitfire 0.1.0) lib/spitfire.ex:533: Spitfire.parse_bracketless_kw_list/1
    (spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire.ex:1871: Spitfire.parse_identifier/1
    (spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire.ex:618: Spitfire.parse_prefix_expression/1
    (spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6

refactor: on demand lexing

Description

Recursive descent parsers tend to lex the document as they go, rather than all up front. When a bad token is lexed, this lets the parser decide how to recover using the context of the current parsing state.

The existing lexer (elixir_tokenizer) is designed to work with a parser generator (yecc), so it lexes the whole document up front, and if it reaches a bad token, it bails and returns an error.

Solution

Refactor the existing :elixir_tokenizer (vendored as :spitfire_tokenizer) to enable on-demand lexing.

The API of the module should basically consist of

new - creates a new lexer state instance from a source code string
next_token - returns the next token in the document and the new lexer state

The token structure should stay the same and contain the same semantics.

Considerations

the existing implementation has a special interpolation module
conversion from Erlang to Elixir is not necessary, but might eventually be done
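A minimal sketch of the proposed new/next_token shape follows. It is written in Elixir for readability (the vendored tokenizer is Erlang), the OnDemandLexer module name is hypothetical, and it only handles a tiny subset of tokens. The token tuples imitate the shapes visible in the stack traces on this page; the :error token is an assumption about how a bad character could be surfaced to the parser instead of aborting.

```elixir
defmodule OnDemandLexer do
  # Hypothetical sketch: module name and token subset are illustrative
  # assumptions, not the actual :elixir_tokenizer or Spitfire code.
  defstruct rest: "", line: 1, column: 1

  # new: creates a lexer state from a source code string.
  def new(source) when is_binary(source), do: %__MODULE__{rest: source}

  # next_token: returns {token, new_state}, lexing only as far as needed.
  def next_token(%__MODULE__{rest: ""} = s),
    do: {{:eof, {s.line, s.column, nil}}, s}

  # Skip horizontal whitespace.
  def next_token(%__MODULE__{rest: <<c, rest::binary>>} = s) when c in [?\s, ?\t],
    do: next_token(%{s | rest: rest, column: s.column + 1})

  def next_token(%__MODULE__{rest: <<?\n, rest::binary>>} = s),
    do: {{:eol, {s.line, s.column, 1}}, %{s | rest: rest, line: s.line + 1, column: 1}}

  def next_token(%__MODULE__{rest: <<c, _::binary>>} = s) when c in ?0..?9 do
    {digits, rest} = take_while(s.rest, &(&1 in ?0..?9))
    token = {:int, {s.line, s.column, String.to_integer(digits)}, String.to_charlist(digits)}
    {token, %{s | rest: rest, column: s.column + byte_size(digits)}}
  end

  def next_token(%__MODULE__{rest: <<c, _::binary>>} = s) when c in ?a..?z or c == ?_ do
    {name, rest} =
      take_while(s.rest, &(&1 in ?a..?z or &1 in ?A..?Z or &1 in ?0..?9 or &1 == ?_))

    token = {:identifier, {s.line, s.column, String.to_charlist(name)}, String.to_atom(name)}
    {token, %{s | rest: rest, column: s.column + byte_size(name)}}
  end

  # Assumption: anything else becomes an error token, so the parser can
  # decide how to recover instead of the lexer bailing out.
  def next_token(%__MODULE__{rest: <<c::utf8, rest::binary>>} = s),
    do: {{:error, {s.line, s.column, nil}, <<c::utf8>>}, %{s | rest: rest, column: s.column + 1}}

  # Collects leading bytes for which fun returns true.
  defp take_while(binary, fun, acc \\ "")

  defp take_while(<<c, rest::binary>>, fun, acc) do
    if fun.(c), do: take_while(rest, fun, acc <> <<c>>), else: {acc, <<c, rest::binary>>}
  end

  defp take_while("", _fun, acc), do: {acc, ""}
end

lexer = OnDemandLexer.new("foo 42\n$")
{tok1, lexer} = OnDemandLexer.next_token(lexer) # {:identifier, {1, 1, ~c"foo"}, :foo}
{tok2, lexer} = OnDemandLexer.next_token(lexer) # {:int, {1, 5, 42}, ~c"42"}
{tok3, lexer} = OnDemandLexer.next_token(lexer) # {:eol, {1, 7, 1}}
{tok4, lexer} = OnDemandLexer.next_token(lexer) # {:error, {2, 1, nil}, "$"}
{tok5, _lexer} = OnDemandLexer.next_token(lexer) # {:eof, {2, 2, nil}}
```

Because the state is an immutable value threaded through each call, a parser could keep an earlier state around and re-lex from it when trying a recovery strategy.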

Proposal: Introduce some hook to change encoding of parsed result

Hi Mitch! First off, thanks for your work on Spitfire. It's shaping up really well and is already a valuable tool!

Second, to hedge a bit, I don't know whether implementing this proposal would be appropriate right now given Spitfire's current state of development, but I wanted to bring it up now regardless.

Proposal

Introduce some hook, be it a callback, behaviour, or something else, that controls the encoding of the parsed result. The default would be to emit Macro.t(), as Spitfire currently does, but the hook would allow parsing Elixir code into a bespoke AST of arbitrary format.

As a concrete example, you might imagine the following:

Spitfire.parse!("1 + 2.5", encoder: &custom_encoder/1) # not sure what actual encoder arity would be
#=>
%BinaryCall{
  op: :+,
  meta: %{...},
  left: %Integer{value: 1, meta: %{...}},
  right: %Float{value: 2.5, meta: %{...}}
}

Context

Elixir's AST is intentionally minimal. One reason is to facilitate authoring macros. For example:

foo(x: 1, y: 2)

# parses to:
{:foo, [], [[{:x, 1}, {:y, 2}]]}

# as opposed to:
{:foo, [],
 [[{:{}, [], [:x, 1]},
   {:{}, [], [:y, 2]}]]}

This makes it trivial to use keyword lists as options in macros, but also means that code processing an AST has to handle two forms of tuple. This is only one example of where the default AST can be cumbersome, but there are many situations where complex pattern matching, guards, or even metadata inspection are required to precisely differentiate syntax.
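The two tuple forms can be observed directly with Code.string_to_quoted!/1. The TupleElems helper below is a hypothetical name, added only to show the kind of double handling that AST-processing code ends up writing.

```elixir
# Keyword arguments stay as literal two-element tuples in the AST.
kw_call = Code.string_to_quoted!("foo(x: 1, y: 2)")
{:foo, _meta, [[x: 1, y: 2]]} = kw_call

# A two-element tuple is its own AST representation...
pair = Code.string_to_quoted!("{1, 2}")

# ...while a tuple of any other size becomes a :{} call node.
triple = Code.string_to_quoted!("{1, 2, 3}")

defmodule TupleElems do
  # Hypothetical helper: code that walks the AST has to cover both forms.
  def elements({:{}, _meta, elems}) when is_list(elems), do: elems
  def elements({left, right}), do: [left, right]
end

pair_elems = TupleElems.elements(pair)
triple_elems = TupleElems.elements(triple)
```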

Sourceror, for instance, has a long-standing issue for an enriched AST.

Additional Considerations

The perhaps obvious alternative is to transform Elixir AST into whatever format you want after the fact. This has two downsides that I can think of:
1. It is slower to parse, walk, and transform than it would be to parse and output the desired result in one shot.
2. There is additional metadata/context during parsing that is not included in the Elixir AST but that could be valuable. As a concrete example: when parsing Foo.Bar.\nBaz (note the newline), it's not possible to determine which line Bar occurs on without inspecting the source, but with token data, it would be.
It could be valuable to allow this or another hook to maintain and return an accumulator as well. This might be used to collect lint violations (in addition to parse errors). Based on this comment, it seems like you're already planning to return an accumulated value in addition to the parse result.
I expect, if implemented, parsing in the default case would be measurably slower due to the overhead of the additional call whenever a node is being constructed. I'm not sure what an acceptable amount of performance loss is, but I acknowledge that there is a line somewhere. (My gut says something like 1.1-1.2x would be acceptable, while 2x would almost certainly not be.)
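For comparison, the after-the-fact alternative described above might look like the following sketch. It does not use any Spitfire API: it parses with Code.string_to_quoted!/1 and then walks the result. The EnrichedAST module and the map shapes are hypothetical, and only integers, floats, and binary operator calls are covered.

```elixir
defmodule EnrichedAST do
  # Hypothetical post-hoc transformation; not a Spitfire hook.
  def from_quoted(int) when is_integer(int), do: %{type: :integer, value: int}
  def from_quoted(float) when is_float(float), do: %{type: :float, value: float}

  def from_quoted({op, meta, [left, right]}) when is_atom(op) do
    %{
      type: :binary_call,
      op: op,
      meta: Map.new(meta),
      left: from_quoted(left),
      right: from_quoted(right)
    }
  end
end

enriched =
  "1 + 2.5"
  |> Code.string_to_quoted!()
  |> EnrichedAST.from_quoted()
```

Note that literals in the default AST carry no metadata, so the leaves here have no position information. That is one instance of the second downside listed above.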

Grouped expression with single expression sometimes is a block

Description

Currently the following snippet is parsed as a :__block__ by Code.string_to_quoted, but I'm not sure why:

(!false)

I have asked a question to core to resolve what the behavior should be: elixir-lang/elixir#13324
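The behavior can be reproduced with a few calls, shown here as a small sketch. The exact metadata may differ between Elixir versions, so the matches below ignore it.

```elixir
{:ok, grouped} = Code.string_to_quoted("(!false)")
{:ok, bare} = Code.string_to_quoted("!false")
{:ok, grouped_literal} = Code.string_to_quoted("(1)")

# The parenthesized unary operator is wrapped in a block...
{:__block__, _, [{:!, _, [false]}]} = grouped

# ...while the bare form and a parenthesized literal are not.
{:!, _, [false]} = bare
1 = grouped_literal
```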

multi aliases

Description

alias Foo.{Bar, Baz}

alias Foo.{
  Bar,
  Baz
}
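For reference, this is a sketch of the AST shape that Code.string_to_quoted!/1 produces for both spellings. Metadata is stripped before comparing, because the line numbers differ between the one-line and multi-line forms.

```elixir
one_line = Code.string_to_quoted!("alias Foo.{Bar, Baz}")
multi_line = Code.string_to_quoted!("alias Foo.{\n  Bar,\n  Baz\n}")

# The multi alias is a call to :{} on the Foo alias.
{:alias, _,
 [
   {{:., _, [{:__aliases__, _, [:Foo]}, :{}]}, _,
    [{:__aliases__, _, [:Bar]}, {:__aliases__, _, [:Baz]}]}
 ]} = one_line

# Drop all metadata so only the structure is compared.
strip = fn ast -> Macro.prewalk(ast, &Macro.update_meta(&1, fn _meta -> [] end)) end
same_shape? = strip.(one_line) == strip.(multi_line)
```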

bug: node could be dropped in certain circumstances

Description

The following code snippet seemed to drop the __cursor__() node from the result.

"def handle_call({:foo, foo}, _from, state) do\n    {:reply, :ok,\n     %{state |\n        foo: s\n__cursor__()\n,\n        bar: Foo.Bar.load(state.foo, state.baz)}}\n  end\n"

Spitfire doesn't handle nested access patterns

Spitfire doesn't handle

foo[bar["baz"]]

Code.string_to_quoted/2 returns

{:ok,
 {{:.,
   [from_brackets: true, closing: [line: 1, column: 15], line: 1, column: 4],
   [Access, :get]},
  [from_brackets: true, closing: [line: 1, column: 15], line: 1, column: 4],
  [
    {:foo, [line: 1, column: 1], nil},
    {{:.,
      [from_brackets: true, closing: [line: 1, column: 14], line: 1, column: 8],
      [Access, :get]},
     [from_brackets: true, closing: [line: 1, column: 14], line: 1, column: 8],
     [{:bar, [line: 1, column: 5], nil}, "baz"]}
  ]}}

bug: peek_token

** (FunctionClauseError) no function clause matching in Spitfire.peek_token/1
    (spitfire 0.1.0) lib/spitfire.ex:2167: Spitfire.peek_token(%{tokens: [nil | :eot], literal_encoder: #Function<49.133785804/2 in NextLS.handle_request/2>, errors: [{[line: 4, column: 13], "missing closing bracket for list"}], current_token: {:fake_closing_bracket, nil}, fuel: 150, peek_token: nil, nesting: 1})
    (spitfire 0.1.0) lib/spitfire.ex:295: anonymous fn/7 in Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire/while.ex:49: Spitfire.While.do_while/2
    (spitfire 0.1.0) lib/spitfire.ex:1875: Spitfire.parse_identifier/1
    (spitfire 0.1.0) lib/spitfire.ex:283: Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire.ex:936: anonymous fn/1 in Spitfire.parse_do_block/2
    (spitfire 0.1.0) lib/spitfire/while.ex:5: Spitfire.While2.recurse/3
    (spitfire 0.1.0) lib/spitfire/while.ex:10: Spitfire.While2.recurse/3

Spitfire doesn't handle unquote inside specs

This snippet is handled just fine by Code.string_to_quoted/2 but Spitfire returns an error

defmodule Foo do
  @possible_results [:a, :b, :c]

  @type result_type ::
          unquote(
            for result <- @possible_results, reduce: [] do
              acc -> {:|, [], [result, acc]}
            end
          )
end

bug: trailing commas

Trailing commas need to be parsed in many places.
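A few of the places where Code.string_to_quoted!/1 accepts a trailing comma are sketched below. This list is not exhaustive, and it says nothing about which of these Spitfire handled at the time of the issue.

```elixir
# Lists, tuples, and maps all accept a trailing comma.
list = Code.string_to_quoted!("[1, 2,]")
tuple = Code.string_to_quoted!("{1, 2,}")
map = Code.string_to_quoted!("%{a: 1,}")

# The trailing comma leaves no trace in the resulting AST.
[1, 2] = list
{1, 2} = tuple
{:%{}, _meta, [a: 1]} = map
```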

Add repo description, website, and topics

😄

Error-tolerant tokenization

I've been thinking more about error tolerance in Elixir parsing and wanted to start a discussion about errors that come from the tokenizer.

Currently, the behavior of the tokenizer is to stop tokenizing when an error is encountered, emitting the error and the token accumulator. This means that Spitfire can only be error-tolerant to the degree that the tokenizer is. Consider this bit of example code:

defmodule Foo do
  def bar(x) do
<<<<<<< HEAD
    x + 1
=======
    x + 2
>>>>>>> main
  end

  def baz(x), do: x + 3
end

When the tokenizer encounters the merge conflict, it bails out, so the best that Spitfire can do is to give us context about the module and function head, but nothing about the body and nothing about the definition for baz below.

If the tokenizer were to accumulate errors instead, Spitfire might be able to give us the AST for the following, as well as errors that VCS merge conflict markers were present on lines 3/5/7:

defmodule Foo do
  def bar(x) do

    x + 1

    x + 2

  end

  def baz(x), do: x + 3
end

While I haven't reviewed every tokenizer error case, there are a significant number where an error could be accumulated and tokenization could continue. Alternatively, the tokenizer could emit "error tokens" of some kind, e.g. {{:error, :version_control_marker}, {3, 1, nil}, ~c"<<<<<<< HEAD"}, which would then be handled in the parser.
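The accumulate-and-continue idea can be approximated outside the tokenizer with a line-based pre-pass, sketched below. This is not how the tokenizer works or would work: the ConflictMarkerScan module, the error tuple shape, and the message text are all assumptions made for illustration.

```elixir
defmodule ConflictMarkerScan do
  # Hypothetical pre-pass: blanks out VCS conflict marker lines and
  # accumulates one error per marker instead of stopping at the first.
  @markers ["<<<<<<<", "=======", ">>>>>>>"]

  def scan(source) do
    {lines, errors} =
      source
      |> String.split("\n")
      |> Enum.with_index(1)
      |> Enum.map_reduce([], fn {line, line_no}, errors ->
        if String.starts_with?(line, @markers) do
          {"", [{[line: line_no, column: 1], "version control marker"} | errors]}
        else
          {line, errors}
        end
      end)

    {Enum.join(lines, "\n"), Enum.reverse(errors)}
  end
end

source = """
defmodule Foo do
  def bar(x) do
<<<<<<< HEAD
    x + 1
=======
    x + 2
>>>>>>> main
  end

  def baz(x), do: x + 3
end
"""

{cleaned, errors} = ConflictMarkerScan.scan(source)

# With the markers blanked out, the remaining source parses, and the
# line numbers of the surviving code are unchanged.
{:ok, ast} = Code.string_to_quoted(cleaned)
marker_lines = Enum.map(errors, fn {meta, _message} -> meta[:line] end)
```

Blanking the lines rather than deleting them keeps every remaining token on its original line, which matters for tools that map AST nodes back to the editor buffer.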

Spitfire doesn't handle `%__MODULE__.Foo{bar: "foo"}`

Spitfire doesn't handle

       %__MODULE__.Foo{bar: "foo"}

__MODULE__ prefixed aliases and function calls

Description

__MODULE__.Foo

__MODULE__.foo()
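For reference, the shapes that Code.string_to_quoted!/1 produces for these forms, together with the struct form from the previous issue, are sketched below with metadata ignored. In each case __MODULE__ appears as an ordinary variable-like node inside the alias or dot call.

```elixir
alias_ast = Code.string_to_quoted!("__MODULE__.Foo")
call_ast = Code.string_to_quoted!("__MODULE__.foo()")
struct_ast = Code.string_to_quoted!(~S(%__MODULE__.Foo{bar: "foo"}))

# __MODULE__ becomes the first segment of the alias.
{:__aliases__, _, [{:__MODULE__, _, nil}, :Foo]} = alias_ast

# A remote call with __MODULE__ as the left-hand side of the dot.
{{:., _, [{:__MODULE__, _, nil}, :foo]}, _, []} = call_ast

# The struct wraps the same alias node.
{:%, _, [{:__aliases__, _, [{:__MODULE__, _, nil}, :Foo]}, {:%{}, _, [bar: "foo"]}]} =
  struct_ast
```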
