Giter Club home page Giter Club logo

Comments (9)

shwestrick avatar shwestrick commented on May 25, 2024 1

Some additional discussion from Slack:

It seems pretty important to not screw up typical uses of comments such as commenting/uncommenting code. A good example is this, where the code author may later reactive the Baz branch:

case exp of
  Foo x => ...
| Bar y => ...
(*| Baz z => ...*)
| _ => ...

To handle this behavior, one idea would be to ensure that comments don't move across tokens when pretty-printed. We could implement this by attaching comments to individual tokens. The AST currently stores all tokens, so this actually wouldn't be a huge change. Then the pretty-printer just has to handle "reflowing" the comment, e.g. adjusting indentation.

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

I've done a little digging. Unsurprisingly, lots of people have dealt with the same issue. The umbrella concept is a concrete syntax tree aka CST.

For example:

The most popular solution appears to just be to store everything explicitly. Nothing fancy. I'm inclined to go in this direction for the sake of simplicity, and because it aligns well with the current strategy of "storing every token". In some sense, our Ast type is already halfway to a CST.

The simplest thing to do now would be to start by adding an additional Token variant for whitespace. (We already have a Comment variant.) Then we put this "extra" stuff into the Ast, and perhaps change its name to Cst.

The type of each "extra" element would be Token.t Seq.t where each token is a comment or whitespace.

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

One way to put extra stuff into the Ast is to identify it at the shallowest possible node. The result is that we see extra elements "between" items.

For example, Ast.Exp.IfThenElse could be changed to look like below. Note that there isn't any extra at the beginning, nor at the end. This is pretty verbose, and would require changing every Ast node. But certainly it is very self explanatory, and the span (leftmost and rightmost source pos) of each node is as "compact" as possible.

datatype exp =
  ...
| IfThenElse of
    { iff: Token.t
    , extra_after_iff: extra
    , exp1: exp
    , extra_after_exp1: extra
    , thenn: Token.t
    , extra_after_thenn: extra
    , exp2: exp
    , extra_after_exp2: extra
    , elsee: Token.t
    , extra_after_elsee: extra
    , exp3: exp
    }
| ...

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

To cut down on the verbosity, we could reuse the same trick as how "delimiters" are stored in the AST, where we use a sequence of N-1 things to go between each of N things.

With the approach, most Ast nodes get just a single additional field of extra_between items.

For example with IfThenElse:

| IfThenElse of
    { iff: Token.t
    , exp1: exp
    , thenn: Token.t
    , exp2: exp
    , elsee: Token.t
    , exp3: exp
    , extra_between: extra Seq.t   (** length 5 *)
    }

The alignment is:

if   exp1   then   exp2   else   exp3
   ^      ^      ^      ^      ^           (5 things between)

A more interesting example might be the Ast.Exp.DecMultiple node which is for sequences of core declarations. The "elems" are individual declarations and the "delims" are the (optional) semicolons between them. We would then have an additional sequence of "extra" stuff:

    (** dec [[;] dec ...] *)
    | DecMultiple of
        { elems: dec Seq.t
        , delims: Token.t option Seq.t
        , extra_between: extra Seq.t    (** whitespace and comments between every elem and delim *)
        }

The idea here is that for N declarations, there would be N-1 delimiters, and 2*(N-1) extra elements:

elems         dec1   dec2   dec3      ...  dec(N-1)   decN
delims             ;      ;       ;   ...           ; 
extra_between     ^ ^    ^ ^     ^ ^  ...          ^ ^

Note that this is equivalent to what I described in the previous comment, but is much less verbose in how the type is written. I'm not convinced it's much of an improvement, though.

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

Okay, it's been a few days. These previous ideas were a bit too deep-fried... the effort required would be massive for such a small gain. I'm beginning to believe that whitespace/comments really aren't tree-structured at all, and might not belong in the CST.

So, here's a nice alternative idea. We augment tokens with these additional functions (when the Lexer creates tokens, it needs to set up support for these):

val prevToken: token -> token option
val nextToken: token -> token option
val commentsAndWhitespaceBeforeToken: token -> token seq
val commentsAndWhitespaceAfterToken: token -> token seq

Then, the CST will be unchanged, and yet we still have the same functionality as putting whitespace/comments into the CST: at any token, we can look up any comments/whitespace immediately before and after.

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

So, here's a nice alternative idea. We augment tokens with these additional functions (when the Lexer creates tokens, it needs to set up support for these):

val prevToken: token -> token option
val nextToken: token -> token option
val commentsAndWhitespaceBeforeToken: token -> token seq
val commentsAndWhitespaceAfterToken: token -> token seq

This is now implemented in 4860dcd.

The next step is to use these functions in the pretty-printer to preserve comments in the output.

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

I'm now moving forward with inserting comments automatically in pretty-printing. To do so, I'm introducing an
"intermediate representation" for pretty-printing whose structure is similar to a PrettySimpleDoc.t, but whose leaf elements have type Token.t. This will allow us to do a pass over this "IR" to insert comments.

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

This is now fully implemented! See e17721d

from smlfmt.

shwestrick avatar shwestrick commented on May 25, 2024

There is still more work to do (e.g. #28), but it will be useful to keep those as separate issue threads.

Progress!

from smlfmt.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.