elixir-tools / spitfire
Error tolerant parser for Elixir
Home Page: https://www.elixir-tools.dev
License: MIT License
While writing an ExUnit test, I was adding another key to a @tag, and the parser raised the following exception:
[Error] ** (FunctionClauseError) no function clause matching in Spitfire.parse_kw_identifier/1
(spitfire 0.1.0) lib/spitfire.ex:501: Spitfire.parse_kw_identifier(%{tokens: [{:kw_identifier, {60, 10, ~c"timeout"}, :timeout}, {:int, {60, 19, 120000}, ~c"120_000"}, {:eol, {60, 26, 1}}, {:identifier, {63, 5, ~c"test"}, :test}, {:bin_string, {63, 10, nil}, ["somestring"]}, {:",", {63, 91, 0}}, {:%{}, {63, 93, nil}}, {:"{", {63, 94, nil}}, {:eol, {63, 95, 1}}, {:kw_identifier, {64, 7, ~c"foo"}, :foo}, {:identifier, {64, 17, ~c"foo"}, :foo}, {:eol, {64, 25, 1}}, {:"}", {65, 5, 1}}, {:do, {65, 7, nil}}, {:eol, {65, 9, 1}}, {:identifier, {66, 7, ~c"file"}, :file}, {:match_op, {66, 12, nil}, :=}, {:bin_string, {66, 14, nil}, ["somestring", {{66, 46, nil}, {66, 56, nil}, [{:identifier, {66, 48, ~c"foo"}, :foo}]}, "somestring"]}, {:eol, {66, 70, 2}}, {:identifier, {68, 7, ~c"foo"}, :foo}, {:match_op, {68, 18, nil}, :=}, {:alias, {68, 20, ~c"foo"}, :foo}, {:., {68, 30, nil}}, {:paren_identifier, {68, 31, ~c"load"}, :load}, {:"(", {68, 35, nil}}, {:identifier, {68, 36, ~c"file"}, :file}, {:",", {68, 40, 0}}, {:atom, {68, 42, ~c"local"}, :local}, {:")", {68, 48, nil}}, {:eol, {68, 49, 2}}, {:identifier, {70, 7, ~c"bar"}, :bar}, {:match_op, {70, 19, nil}, :=}, {:paren_identifier, {70, 21, ~c"bar"}, :bar}, {:"(", {70, 32, nil}}, {:identifier, {70, 33, ~c"foo"}, :foo}, {:")", {70, 41, nil}}, {:eol, {70, 42, 1}}, {:"{", {71, 7, nil}}, {:identifier, {71, 8, ~c"messages"}, :messages}, {:",", {71, 16, 0}}, {:identifier, {71, 18, ~c"_baz"}, :_baz}, {:"}", {71, 26, nil}}, {:match_op, {71, 28, nil}, :=}, {:paren_identifier, {71, 30, ~c"alice"}, :alice}, {:"(", {71, 45, ...}}, {:identifier, {71, ...}, :foo}, {:")", {...}}, {:eol, ...}, {...}, ...], literal_encoder: #Function<0.120709541/2 in NextLS.DocumentSymbol.fetch/1>, errors: [], current_token: {:at_op, {60, 5, nil}, :@}, fuel: 150, peek_token: {:identifier, {60, 6, ~c"tag"}, :tag}, nesting: 1})
(spitfire 0.1.0) lib/spitfire/while.ex:5: Spitfire.While2.recurse/3
(spitfire 0.1.0) lib/spitfire.ex:533: Spitfire.parse_bracketless_kw_list/1
(spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6
(spitfire 0.1.0) lib/spitfire.ex:1871: Spitfire.parse_identifier/1
(spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6
(spitfire 0.1.0) lib/spitfire.ex:618: Spitfire.parse_prefix_expression/1
(spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6
Recursive descent parsers tend to lex the document as they go, rather than all up front. This lets the parser decide what to do when a bad token is lexed, based on the current parsing state.
The existing lexer (elixir_tokenizer) is designed to work with a parser generator (yecc), so it tokenizes the whole document up front, and if it reaches a bad token, it bails out and returns an error.
Refactor the existing :elixir_tokenizer (vendored as :spitfire_tokenizer) to enable on-demand lexing.
The API of the module should basically consist of:
- new: creates a new lexer state from a source code string
- next_token: returns the next token in the document and the new lexer state

The token structure should stay the same and keep the same semantics.
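A minimal sketch of the new/next_token shape, assuming a toy grammar (integers, lowercase identifiers, + and -, spaces, newlines). The LazyLexer module, its struct fields, and the :eot and {:error, ...} return values are hypothetical; only the token tuples mirror the shapes visible in the stack traces here ({:int, {line, col, value}, chars}, {:identifier, {line, col, chars}, atom}, {:eol, {line, col, count}}).

```elixir
defmodule LazyLexer do
  @moduledoc """
  Hypothetical on-demand lexer sketch; not the real :spitfire_tokenizer API.
  Supports only integers, lowercase identifiers, + and -, spaces and newlines.
  """
  defstruct rest: [], line: 1, column: 1

  # Creates a lexer state from a source string.
  def new(source) when is_binary(source), do: %__MODULE__{rest: String.to_charlist(source)}

  # Returns {token, new_state}, or {:eot, state} once the input is exhausted.
  def next_token(%__MODULE__{rest: []} = state), do: {:eot, state}

  # Spaces are skipped without producing a token.
  def next_token(%__MODULE__{rest: [?\s | rest]} = state),
    do: next_token(%{state | rest: rest, column: state.column + 1})

  def next_token(%__MODULE__{rest: [?\n | rest], line: line, column: col} = state),
    do: {{:eol, {line, col, 1}}, %{state | rest: rest, line: line + 1, column: 1}}

  def next_token(%__MODULE__{rest: [op | rest], line: line, column: col} = state)
      when op in [?+, ?-],
      do: {{:dual_op, {line, col, nil}, List.to_atom([op])}, %{state | rest: rest, column: col + 1}}

  def next_token(%__MODULE__{rest: [c | _]} = state) when c in ?0..?9 do
    {digits, rest} = Enum.split_while(state.rest, &(&1 in ?0..?9 or &1 == ?_))
    value = digits |> Enum.reject(&(&1 == ?_)) |> List.to_integer()
    token = {:int, {state.line, state.column, value}, digits}
    {token, %{state | rest: rest, column: state.column + length(digits)}}
  end

  def next_token(%__MODULE__{rest: [c | _]} = state) when c in ?a..?z or c == ?_ do
    {chars, rest} = Enum.split_while(state.rest, &(&1 in ?a..?z or &1 in ?0..?9 or &1 == ?_))
    token = {:identifier, {state.line, state.column, chars}, List.to_atom(chars)}
    {token, %{state | rest: rest, column: state.column + length(chars)}}
  end

  # Unknown character: emit an error token and keep going, leaving the decision to the parser.
  def next_token(%__MODULE__{rest: [c | rest], line: line, column: col} = state),
    do: {{:error, {line, col, nil}, [c]}, %{state | rest: rest, column: col + 1}}
end
```

Because every call hands back the remaining state, a parser could stop, skip, or resynchronize after an error token instead of losing everything after it.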
Hi Mitch! First off, thanks for your work on Spitfire. It's shaping up really well and is already a valuable tool!
Second, to hedge a bit, I don't know whether implementing this proposal would be appropriate right now given Spitfire's current state of development, but I wanted to bring it up now regardless.
Introduce some hook, be it a callback, behaviour, or something else, that controls the encoding of the parsed result. The default would be to emit Macro.t(), as Spitfire currently does, but the hook would allow parsing Elixir code into a bespoke AST of arbitrary format.
As a concrete example, you might imagine the following:
Spitfire.parse!("1 + 2.5", encoder: &custom_encoder/1) # not sure what actual encoder arity would be
#=>
%BinaryCall{
  op: :+,
  meta: %{...},
  left: %Integer{value: 1, meta: %{...}},
  right: %Float{value: 2.5, meta: %{...}}
}
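Until such a hook exists, the same result can be approximated as a second pass over the standard AST. The sketch below assumes hypothetical CustomAST.* structs (named IntegerLit/FloatLit so they don't clash with Elixir's own Integer and Float modules) and uses Macro.postwalk/2, so children are encoded before their parent. Bare literals carry no metadata in the default AST, so their meta stays empty here; a hook inside the parser, or the :literal_encoder option that Code.string_to_quoted/2 accepts, could supply positions instead.

```elixir
# Hypothetical node structs for a bespoke AST.
defmodule CustomAST.BinaryCall, do: defstruct([:op, :meta, :left, :right])
defmodule CustomAST.IntegerLit, do: defstruct([:value, meta: %{}])
defmodule CustomAST.FloatLit, do: defstruct([:value, meta: %{}])

defmodule CustomAST do
  alias CustomAST.{BinaryCall, FloatLit, IntegerLit}

  @binary_ops [:+, :-, :*, :/]

  # Parses with the standard parser, then re-encodes the AST bottom-up.
  def parse!(source) do
    source
    |> Code.string_to_quoted!(columns: true)
    |> Macro.postwalk(&encode/1)
  end

  # Operands were already encoded by postwalk before we reach their parent call.
  defp encode({op, meta, [left, right]}) when op in @binary_ops,
    do: %BinaryCall{op: op, meta: Map.new(meta), left: left, right: right}

  defp encode(int) when is_integer(int), do: %IntegerLit{value: int}
  defp encode(float) when is_float(float), do: %FloatLit{value: float}

  # Everything else is left in Macro.t() form.
  defp encode(other), do: other
end
```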
Elixir's AST is intentionally minimal. One reason is to facilitate authoring macros. For example:
foo(x: 1, y: 2)
# parses to:
{:foo, [], [[{:x, 1}, {:y, 2}]]}
# as opposed to:
{:foo, [],
 [[{:{}, [], [:x, 1]},
   {:{}, [], [:y, 2]}]]}
This makes it trivial to use keyword lists as options in macros, but it also means that code processing an AST has to handle two forms of tuples: two-element tuples appear as themselves, while every other tuple is represented as a {:{}, meta, elements} node. This is only one example of where the default AST can be cumbersome; there are many situations where complex pattern matching, guards, or even metadata inspection are required to precisely differentiate syntax.
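As a small illustration of the two tuple forms, here is a sketch of a helper (TupleForms is a hypothetical name) that any AST consumer ends up writing to treat both the same way:

```elixir
defmodule TupleForms do
  # Tuples of any size other than two are {:{}, meta, elements} nodes.
  def tuple_elements({:{}, _meta, elements}) when is_list(elements), do: {:ok, elements}

  # Two-element tuples are literals in the AST, so they appear as-is.
  # (AST call nodes are three-element tuples, so they never reach this clause.)
  def tuple_elements({left, right}), do: {:ok, [left, right]}

  def tuple_elements(_other), do: :error
end
```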
Sourceror, for instance, has a long-standing issue requesting an enriched AST.
For example, in Foo.Bar.\nBaz (note the newline), it's not possible to determine which line Bar occurs on without inspecting the source, but with token data, it would be.

Currently, the following snippet is parsed as a :__block__ by Code.string_to_quoted, but I'm not sure why:
(!false)
I have asked the core team what the behavior should be: elixir-lang/elixir#13324
alias Foo.{Bar, Baz}
alias Foo.{
  Bar,
  Baz
}
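For reference, Code.string_to_quoted/2 represents both of these forms as a call to :{} on the base alias, with one __aliases__ node per child. The sketch below (MultiAlias is a hypothetical name) pattern matches that shape and expands it into module names, which could double as a regression check that both the single-line and multi-line forms produce the same result:

```elixir
defmodule MultiAlias do
  # Expands `alias Base.{A, B}` into [Base.A, Base.B] by matching the quoted shape:
  # {:alias, _, [{{:., _, [{:__aliases__, _, base}, :{}]}, _, children}]}
  def expand(source) do
    {:alias, _, [{{:., _, [{:__aliases__, _, base}, :{}]}, _, children}]} =
      Code.string_to_quoted!(source)

    for {:__aliases__, _, segments} <- children, do: Module.concat(base ++ segments)
  end
end
```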
The following code snippet seemed to drop the __cursor__() node from the result:
"def handle_call({:foo, foo}, _from, state) do\n {:reply, :ok,\n %{state |\n foo: s\n__cursor__()\n,\n bar: Foo.Bar.load(state.foo, state.baz)}}\n end\n"
Spitfire doesn't handle
foo[bar["baz"]]
Code.string_to_quoted/2 returns
{:ok,
{{:.,
[from_brackets: true, closing: [line: 1, column: 15], line: 1, column: 4],
[Access, :get]},
[from_brackets: true, closing: [line: 1, column: 15], line: 1, column: 4],
[
{:foo, [line: 1, column: 1], nil},
{{:.,
[from_brackets: true, closing: [line: 1, column: 14], line: 1, column: 8],
[Access, :get]},
[from_brackets: true, closing: [line: 1, column: 14], line: 1, column: 8],
[{:bar, [line: 1, column: 5], nil}, "baz"]}
]}}
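In that output, each bracket access becomes a call to Access.get on the atom Access (not an __aliases__ node, which is what an explicit Access.get(...) in source produces), and the inner call sits in the outer call's argument list. A minimal sketch of a check for that nesting, assuming a hypothetical BracketAccess module and using Macro.prewalk/3 to count the calls:

```elixir
defmodule BracketAccess do
  # Counts {{:., _, [Access, :get]}, _, [container, key]} nodes, the shape that
  # foo[key] produces. Explicit Access.get(...) calls use {:__aliases__, _, [:Access]}
  # instead, so they are not counted.
  def count(source) do
    {_ast, count} =
      source
      |> Code.string_to_quoted!()
      |> Macro.prewalk(0, fn
        {{:., _, [Access, :get]}, _, [_container, _key]} = node, acc -> {node, acc + 1}
        node, acc -> {node, acc}
      end)

    count
  end
end
```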
** (FunctionClauseError) no function clause matching in Spitfire.peek_token/1
(spitfire 0.1.0) lib/spitfire.ex:2167: Spitfire.peek_token(%{tokens: [nil | :eot], literal_encoder: #Function<49.133785804/2 in NextLS.handle_request/2>, errors: [{[line: 4, column: 13], "missing closing bracket for list"}], current_token: {:fake_closing_bracket, nil}, fuel: 150, peek_token: nil, nesting: 1})
(spitfire 0.1.0) lib/spitfire.ex:295: anonymous fn/7 in Spitfire.parse_expression/6
(spitfire 0.1.0) lib/spitfire/while.ex:49: Spitfire.While.do_while/2
(spitfire 0.1.0) lib/spitfire.ex:1875: Spitfire.parse_identifier/1
(spitfire 0.1.0) lib/spitfire.ex:283: Spitfire.parse_expression/6
(spitfire 0.1.0) lib/spitfire.ex:936: anonymous fn/1 in Spitfire.parse_do_block/2
(spitfire 0.1.0) lib/spitfire/while.ex:5: Spitfire.While2.recurse/3
(spitfire 0.1.0) lib/spitfire/while.ex:10: Spitfire.While2.recurse/3
This snippet is handled just fine by Code.string_to_quoted/2, but Spitfire returns an error:
defmodule Foo do
  @possible_results [:a, :b, :c]
  @type result_type ::
          unquote(
            for result <- @possible_results, reduce: [] do
              acc -> {:|, [], [result, acc]}
            end
          )
end
Trailing commas need to be parsed in many places.
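One way to track this is a small table of snippets with and without a trailing comma that Code.string_to_quoted/2 parses to the same AST. The sketch below covers only lists, tuples, keyword lists and maps; the TrailingCommas module is hypothetical, and the parse function is a parameter so the same table could be run against another parser, assuming it returns {:ok, ast} on success.

```elixir
defmodule TrailingCommas do
  # Each pair should parse to the same AST: {with trailing comma, without}.
  @cases [
    {"[1, 2,]", "[1, 2]"},
    {"{1, 2, 3,}", "{1, 2, 3}"},
    {"[a: 1, b: 2,]", "[a: 1, b: 2]"},
    {"%{a: 1,}", "%{a: 1}"}
  ]

  # Returns the cases where the trailing-comma form fails or parses differently.
  def mismatches(parse \\ &Code.string_to_quoted/1) do
    Enum.reject(@cases, fn {with_comma, without} ->
      result = parse.(with_comma)
      match?({:ok, _}, result) and result == parse.(without)
    end)
  end
end
```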
I've been thinking more about error tolerance in Elixir parsing and wanted to start a discussion about errors that come from the tokenizer.
Currently, the behavior of the tokenizer is to stop tokenizing when an error is encountered, emitting the error and the token accumulator. This means that Spitfire can only be error-tolerant to the degree that the tokenizer is. Consider this bit of example code:
defmodule Foo do
  def bar(x) do
<<<<<<< HEAD
    x + 1
=======
    x + 2
>>>>>>> main
  end
  def baz(x), do: x + 3
end
When the tokenizer encounters the merge conflict, it bails out, so the best Spitfire can do is give us context about the module and function head, but nothing about the body and nothing about the definition of baz below.
If the tokenizer were to accumulate errors instead, Spitfire might be able to give us the AST for the following, as well as errors that VCS merge conflict markers were present on lines 3/5/7:
defmodule Foo do
  def bar(x) do
    x + 1
    x + 2
  end
  def baz(x), do: x + 3
end
While I haven't reviewed every tokenizer error case, there are a significant number of them where an error could be accumulated and tokenization could continue. Alternatively, the tokenizer could emit "error tokens" of some kind, e.g. {{:error, :version_control_marker}, {3, 1, nil}, ~c"<<<<<<< HEAD"}, which would then be handled in the parser.
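To make the accumulate-and-continue idea concrete, here is a toy, line-based sketch (MarkerScan and the {:source_line, ...} token are hypothetical; a real tokenizer works on characters). Conflict-marker lines become error tokens in the proposed shape, a matching entry is appended to an errors list in the {[line: l, column: c], message} form Spitfire already reports, and scanning moves on to the next line instead of stopping:

```elixir
defmodule MarkerScan do
  @markers ["<<<<<<<", "=======", ">>>>>>>"]

  # Returns {tokens, errors}. Marker lines (which git always writes at column 1)
  # produce error tokens; every other line is kept as a placeholder token.
  def scan(source) do
    {tokens, errors} =
      source
      |> String.split("\n")
      |> Enum.with_index(1)
      |> Enum.reduce({[], []}, fn {text, line}, {tokens, errors} ->
        if marker?(text) do
          token = {{:error, :version_control_marker}, {line, 1, nil}, String.to_charlist(text)}
          error = {[line: line, column: 1], "VCS merge conflict marker"}
          {[token | tokens], [error | errors]}
        else
          {[{:source_line, {line, 1, nil}, text} | tokens], errors}
        end
      end)

    {Enum.reverse(tokens), Enum.reverse(errors)}
  end

  defp marker?(text), do: Enum.any?(@markers, &String.starts_with?(text, &1))
end
```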
Spitfire doesn't handle
%__MODULE__.Foo{bar: "foo"}
__MODULE__.Foo
__MODULE__.foo()
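For comparison, Code.string_to_quoted/2 parses all three with __MODULE__ as a plain variable node, {:__MODULE__, meta, nil}, at the head of an __aliases__ list or as the receiver of a remote call. A sketch that pins those shapes (DynamicAliases is a hypothetical name), usable as a reference when checking Spitfire's output:

```elixir
defmodule DynamicAliases do
  # Classifies the quoted shapes produced when __MODULE__ heads an alias or call.
  def classify(source) do
    case Code.string_to_quoted!(source) do
      # %__MODULE__.Foo{...}: struct node whose alias starts with the __MODULE__ var.
      {:%, _, [{:__aliases__, _, [{:__MODULE__, _, nil} | rest]}, {:%{}, _, fields}]} ->
        {:struct, rest, fields}

      # __MODULE__.Foo: dynamic alias.
      {:__aliases__, _, [{:__MODULE__, _, nil} | rest]} ->
        {:alias, rest}

      # __MODULE__.foo(): remote call with the __MODULE__ var as receiver.
      {{:., _, [{:__MODULE__, _, nil}, fun]}, _, args} ->
        {:remote_call, fun, args}

      other ->
        {:other, other}
    end
  end
end
```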