
spitfire's People

Contributors

davydog187, lucacervello, mhanberg, njichev


spitfire's Issues

bug: failed to parse a bracketed keyword list

Description

While writing an ExUnit test and adding another key to a @tag, the parser raised the following exception.

[Error] ** (FunctionClauseError) no function clause matching in Spitfire.parse_kw_identifier/1
    (spitfire 0.1.0) lib/spitfire.ex:501: Spitfire.parse_kw_identifier(%{tokens: [{:kw_identifier, {60, 10, ~c"timeout"}, :timeout}, {:int, {60, 19, 120000}, ~c"120_000"}, {:eol, {60, 26, 1}}, {:identifier, {63, 5, ~c"test"}, :test}, {:bin_string, {63, 10, nil}, ["somestring"]}, {:",", {63, 91, 0}}, {:%{}, {63, 93, nil}}, {:"{", {63, 94, nil}}, {:eol, {63, 95, 1}}, {:kw_identifier, {64, 7, ~c"foo"}, :foo}, {:identifier, {64, 17, ~c"foo"}, :foo}, {:eol, {64, 25, 1}}, {:"}", {65, 5, 1}}, {:do, {65, 7, nil}}, {:eol, {65, 9, 1}}, {:identifier, {66, 7, ~c"file"}, :file}, {:match_op, {66, 12, nil}, :=}, {:bin_string, {66, 14, nil}, ["somestring", {{66, 46, nil}, {66, 56, nil}, [{:identifier, {66, 48, ~c"foo"}, :foo}]}, "somestring"]}, {:eol, {66, 70, 2}}, {:identifier, {68, 7, ~c"foo"}, :foo}, {:match_op, {68, 18, nil}, :=}, {:alias, {68, 20, ~c"foo"}, :foo}, {:., {68, 30, nil}}, {:paren_identifier, {68, 31, ~c"load"}, :load}, {:"(", {68, 35, nil}}, {:identifier, {68, 36, ~c"file"}, :file}, {:",", {68, 40, 0}}, {:atom, {68, 42, ~c"local"}, :local}, {:")", {68, 48, nil}}, {:eol, {68, 49, 2}}, {:identifier, {70, 7, ~c"bar"}, :bar}, {:match_op, {70, 19, nil}, :=}, {:paren_identifier, {70, 21, ~c"bar"}, :bar}, {:"(", {70, 32, nil}}, {:identifier, {70, 33, ~c"foo"}, :foo}, {:")", {70, 41, nil}}, {:eol, {70, 42, 1}}, {:"{", {71, 7, nil}}, {:identifier, {71, 8, ~c"messages"}, :messages}, {:",", {71, 16, 0}}, {:identifier, {71, 18, ~c"_baz"}, :_baz}, {:"}", {71, 26, nil}}, {:match_op, {71, 28, nil}, :=}, {:paren_identifier, {71, 30, ~c"alice"}, :alice}, {:"(", {71, 45, ...}}, {:identifier, {71, ...}, :foo}, {:")", {...}}, {:eol, ...}, {...}, ...], literal_encoder: #Function<0.120709541/2 in NextLS.DocumentSymbol.fetch/1>, errors: [], current_token: {:at_op, {60, 5, nil}, :@}, fuel: 150, peek_token: {:identifier, {60, 6, ~c"tag"}, :tag}, nesting: 1})
    (spitfire 0.1.0) lib/spitfire/while.ex:5: Spitfire.While2.recurse/3
    (spitfire 0.1.0) lib/spitfire.ex:533: Spitfire.parse_bracketless_kw_list/1
    (spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire.ex:1871: Spitfire.parse_identifier/1
    (spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire.ex:618: Spitfire.parse_prefix_expression/1
    (spitfire 0.1.0) lib/spitfire.ex:277: Spitfire.parse_expression/6

refactor: on demand lexing

Description

Recursive descent parsers tend to lex the document as they go rather than all up front. This lets the parser decide what to do when a bad token is lexed, in the context of the current parsing state.

The existing lexer (elixir_tokenizer) is designed to work with a parser generator (yecc), so it tokenizes everything up front, and if it reaches a bad token, it bails and returns an error.

Solution

Refactor the existing :elixir_tokenizer (vendored as :spitfire_tokenizer) to enable on-demand lexing.

The API of the module should basically consist of

  • new - creates a new lexer state instance from a source code string
  • next_token - returns the next token in the document and the new lexer state

The token structure should stay the same and contain the same semantics.
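
As a rough sketch of that shape, the following toy module (everything here is an illustrative assumption, not the actual :spitfire_tokenizer interface) only recognizes identifiers and integers, but shows one token being produced per call while the lexer state is threaded through:

defmodule OnDemandLexer do
  # Hypothetical sketch of the API shape only. A real implementation would
  # carry the full tokenizer state (terminators, interpolation, etc.) and
  # reuse the existing scanning logic rather than this toy scanner.

  defstruct [:rest, :line, :column]

  def new(source) when is_binary(source) do
    %__MODULE__{rest: source, line: 1, column: 1}
  end

  def next_token(%__MODULE__{rest: ""} = lexer), do: {:eof, lexer}

  def next_token(%__MODULE__{rest: <<c, rest::binary>>} = lexer) when c in [?\s, ?\t] do
    next_token(%{lexer | rest: rest, column: lexer.column + 1})
  end

  def next_token(%__MODULE__{rest: <<"\n", rest::binary>>} = lexer) do
    next_token(%{lexer | rest: rest, line: lexer.line + 1, column: 1})
  end

  def next_token(%__MODULE__{} = lexer) do
    {text, rest} = split_word(lexer.rest, "")

    # Token tuples mirror the shapes the existing tokenizer emits,
    # e.g. {:identifier, {line, column, ~c"foo"}, :foo} and
    # {:int, {line, column, 42}, ~c"42"}.
    token =
      case Integer.parse(text) do
        {int, ""} ->
          {:int, {lexer.line, lexer.column, int}, String.to_charlist(text)}

        _ ->
          {:identifier, {lexer.line, lexer.column, String.to_charlist(text)}, String.to_atom(text)}
      end

    {token, %{lexer | rest: rest, column: lexer.column + String.length(text)}}
  end

  defp split_word(<<c, rest::binary>>, acc) when c not in [?\s, ?\t, ?\n] do
    split_word(rest, acc <> <<c>>)
  end

  defp split_word(rest, acc), do: {acc, rest}
end

Usage would look like:

lexer = OnDemandLexer.new("foo 42")
{tok1, lexer} = OnDemandLexer.next_token(lexer)
# => {:identifier, {1, 1, ~c"foo"}, :foo}
{tok2, _lexer} = OnDemandLexer.next_token(lexer)
# => {:int, {1, 5, 42}, ~c"42"}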

Considerations

  • the existing implementation has a special interpolation module
  • conversion from Erlang to Elixir is not necessary, but might eventually be done

Proposal: Introduce some hook to change encoding of parsed result

Hi Mitch! First off, thanks for your work on Spitfire. It's shaping up really well and is already a valuable tool!

Second, to hedge a bit, I don't know whether implementing this proposal would be appropriate right now given Spitfire's current state of development, but I wanted to bring it up now regardless.

Proposal

Introduce some hook, be it a callback, behaviour, or something else, that controls the encoding of the parsed result. The default would be to emit Macro.t(), as Spitfire currently does, but the hook would allow parsing Elixir code into a bespoke AST of arbitrary format.

As a concrete example, you might imagine the following:

Spitfire.parse!("1 + 2.5", encoder: &custom_encoder/1) # not sure what actual encoder arity would be
#=>
%BinaryCall{
  op: :+,
  meta: %{...},
  left: %Integer{value: 1, meta: %{...}},
  right: %Float{value: 2.5, meta: %{...}}
}
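
One hedged sketch of what such a hook could look like as a behaviour (all names here are hypothetical; none of this is an existing Spitfire API):

defmodule AstEncoder do
  # Hypothetical behaviour: the parser would call encode_node/3 whenever it
  # is about to construct a node, passing the kind of node, the metadata it
  # has at that point (which could include raw token data), and the
  # already-encoded children, then splice the returned term into the result.
  @callback encode_node(kind :: atom(), meta :: keyword() | map(), children :: [term()]) ::
              term()
end

defmodule DefaultEncoder do
  @behaviour AstEncoder

  # Roughly the default Macro.t() shape (ignoring literals and the
  # two-element-tuple special case), so the default output is unchanged.
  @impl true
  def encode_node(kind, meta, children), do: {kind, meta, children}
end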

Context

Elixir's AST is intentionally minimal. One reason is to facilitate authoring macros. For example:

foo(x: 1, y: 2)

# parses to:
{:foo, [], [[{:x, 1}, {:y, 2}]]}

# as opposed to:
{:foo, [],
 [[{:{}, [], [:x, 1]},
   {:{}, [], [:y, 2]}]]}

This makes it trivial to use keyword lists as options in macros, but also means that code processing an AST has to handle two forms of tuple. This is only one example of where the default AST can be cumbersome, but there are many situations where complex pattern matching, guards, or even metadata inspection are required to precisely differentiate syntax.
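
A small illustrative helper (not from any particular library) shows the kind of dual-clause handling this forces on AST consumers:

defmodule KeywordPairs do
  # Normalize an AST element that represents a key-value pair, whether it was
  # parsed as a literal two-element tuple (the keyword-list case above) or as
  # an explicit {:{}, meta, [key, value]} node.
  def normalize({key, value}), do: {key, value}
  def normalize({:{}, _meta, [key, value]}), do: {key, value}
end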

Sourceror, for instance, has a long-standing issue asking for an enriched AST.

Additional Considerations

  • The perhaps obvious alternative is to transform Elixir AST into whatever format you want after the fact. This has two downsides that I can think of:
    1. It is slower to parse, walk, and transform than it would be to parse and output the desired result in one shot.
    2. There is additional metadata/context during parsing that is not included in the Elixir AST but that could be valuable. As a concrete example: when parsing Foo.Bar.\nBaz (note the newline), it's not possible to determine which line Bar occurs on without inspecting the source, but with token data, it would be.
  • It could be valuable to allow this or another hook to maintain and return an accumulator as well. This might be used to collect lint violations (in addition to parse errors). Based on this comment, it seems like you're already planning to return an accumulated value in addition to the parse result.
  • I expect, if implemented, parsing in the default case would be measurably slower due to the overhead of the additional call whenever a node is being constructed. I'm not sure what an acceptable amount of performance loss is, but I acknowledge that there is a line somewhere. (My gut says something like 1.1-1.2x would be acceptable, while 2x would almost certainly not be.)

multi aliases

Description

alias Foo.{Bar, Baz}

alias Foo.{
  Bar,
  Baz
}
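
For reference, Code.string_to_quoted!/1 parses the first form as roughly the following (metadata elided):

{:alias, [line: 1],
 [
   {{:., [line: 1], [{:__aliases__, [line: 1], [:Foo]}, :{}]}, [line: 1],
    [
      {:__aliases__, [line: 1], [:Bar]},
      {:__aliases__, [line: 1], [:Baz]}
    ]}
 ]}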

bug: node could be dropped in certain circumstances

Description

Parsing the following code snippet seemed to drop the __cursor__() node from the result.

"def handle_call({:foo, foo}, _from, state) do\n    {:reply, :ok,\n     %{state |\n        foo: s\n__cursor__()\n,\n        bar: Foo.Bar.load(state.foo, state.baz)}}\n  end\n"

Spitfire doesn't handle nested access patterns

Spitfire doesn't handle

foo[bar["baz"]]

Code.string_to_quoted/2 returns

{:ok,
 {{:.,
   [from_brackets: true, closing: [line: 1, column: 15], line: 1, column: 4],
   [Access, :get]},
  [from_brackets: true, closing: [line: 1, column: 15], line: 1, column: 4],
  [
    {:foo, [line: 1, column: 1], nil},
    {{:.,
      [from_brackets: true, closing: [line: 1, column: 14], line: 1, column: 8],
      [Access, :get]},
     [from_brackets: true, closing: [line: 1, column: 14], line: 1, column: 8],
     [{:bar, [line: 1, column: 5], nil}, "baz"]}
  ]}}

bug: peek_token

** (FunctionClauseError) no function clause matching in Spitfire.peek_token/1
    (spitfire 0.1.0) lib/spitfire.ex:2167: Spitfire.peek_token(%{tokens: [nil | :eot], literal_encoder: #Function<49.133785804/2 in NextLS.handle_request/2>, errors: [{[line: 4, column: 13], "missing closing bracket for list"}], current_token: {:fake_closing_bracket, nil}, fuel: 150, peek_token: nil, nesting: 1})
    (spitfire 0.1.0) lib/spitfire.ex:295: anonymous fn/7 in Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire/while.ex:49: Spitfire.While.do_while/2
    (spitfire 0.1.0) lib/spitfire.ex:1875: Spitfire.parse_identifier/1
    (spitfire 0.1.0) lib/spitfire.ex:283: Spitfire.parse_expression/6
    (spitfire 0.1.0) lib/spitfire.ex:936: anonymous fn/1 in Spitfire.parse_do_block/2
    (spitfire 0.1.0) lib/spitfire/while.ex:5: Spitfire.While2.recurse/3
    (spitfire 0.1.0) lib/spitfire/while.ex:10: Spitfire.While2.recurse/3

Spitfire doesn't handle unquote inside specs

This snippet is handled just fine by Code.string_to_quoted/2, but Spitfire returns an error:

defmodule Foo do
  @possible_results [:a, :b, :c]

  @type result_type ::
          unquote(
            for result <- @possible_results, reduce: [] do
              acc -> {:|, [], [result, acc]}
            end
          )
end

Error-tolerant tokenization

I've been thinking more about error tolerance in Elixir parsing and wanted to start a discussion about errors that come from the tokenizer.

Currently, the behavior of the tokenizer is to stop tokenizing when an error is encountered, emitting the error and the token accumulator. This means that Spitfire can only be error-tolerant to the degree that the tokenizer is. Consider this bit of example code:

defmodule Foo do
  def bar(x) do
<<<<<<< HEAD
    x + 1
=======
    x + 2
>>>>>>> main
  end

  def baz(x), do: x + 3
end

When the tokenizer encounters the merge conflict, it bails out, so the best that Spitfire can do is to give us context about the module and function head, but nothing about the body and nothing about the definition for baz below.

If the tokenizer were to accumulate errors instead, Spitfire might be able to give us the AST for the following, as well as errors that VCS merge conflict markers were present on lines 3/5/7:

defmodule Foo do
  def bar(x) do

    x + 1

    x + 2

  end

  def baz(x), do: x + 3
end

While I haven't reviewed every tokenizer error case, there are a significant number where an error could be accumulated and tokenization could continue. Alternatively, the tokenizer could emit "error tokens" of some kind, e.g. {{:error, :version_control_marker}, {3, 1, nil}, ~c"<<<<<<< HEAD"}, which the parser could then handle.
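
As a hedged sketch of that second option (the module name and token shape here are assumptions, not Spitfire internals), the parser side could split such a stream into significant tokens and accumulated diagnostics:

defmodule ErrorTokens do
  # Hypothetical helper: given a token stream that may contain error tokens
  # of the shape {{:error, reason}, meta, text}, return the significant
  # tokens plus a list of diagnostics, so parsing can continue past things
  # like merge-conflict markers instead of bailing out.
  def split(tokens) do
    {toks, errors} =
      Enum.reduce(tokens, {[], []}, fn
        {{:error, reason}, meta, _text}, {toks, errors} ->
          {toks, [{meta, reason} | errors]}

        token, {toks, errors} ->
          {[token | toks], errors}
      end)

    {Enum.reverse(toks), Enum.reverse(errors)}
  end
end

A more integrated approach would consume these lazily alongside an on-demand lexer, but the accumulation idea is the same.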
