Currently, comments are ignored for pretty-printing and are not tracked by the AST.</p

This is now fully implemented! See <a class="commit-link" data-hovercard-type="commit"

There is still more work to do (e.g. <a class="issue-link js-issue-link" data-error-te

Handling comments for pretty-printing about smlfmt HOT 9 CLOSED

shwestrick commented on May 25, 2024

Handling comments for pretty-printing

from smlfmt.

Comments (9)

shwestrick commented on May 25, 2024 1

Some additional discussion from Slack:

It seems pretty important to not screw up typical uses of comments such as commenting/uncommenting code. A good example is this, where the code author may later reactive the Baz branch:

case exp of
  Foo x => ...
| Bar y => ...
(*| Baz z => ...*)
| _ => ...

To handle this behavior, one idea would be to ensure that comments don't move across tokens when pretty-printed. We could implement this by attaching comments to individual tokens. The AST currently stores all tokens, so this actually wouldn't be a huge change. Then the pretty-printer just has to handle "reflowing" the comment, e.g. adjusting indentation.

from smlfmt.

shwestrick commented on May 25, 2024

I've done a little digging. Unsurprisingly, lots of people have dealt with the same issue. The umbrella concept is a concrete syntax tree aka CST.

For example:

Python LibCST https://github.com/Instagram/LibCST
Javascript CST https://github.com/cst/cst

The most popular solution appears to just be to store everything explicitly. Nothing fancy. I'm inclined to go in this direction for the sake of simplicity, and because it aligns well with the current strategy of "storing every token". In some sense, our Ast type is already halfway to a CST.

The simplest thing to do now would be to start by adding an additional Token variant for whitespace. (We already have a Comment variant.) Then we put this "extra" stuff into the Ast, and perhaps change its name to Cst.

The type of each "extra" element would be Token.t Seq.t where each token is a comment or whitespace.

from smlfmt.

shwestrick commented on May 25, 2024

One way to put extra stuff into the Ast is to identify it at the shallowest possible node. The result is that we see extra elements "between" items.

For example, Ast.Exp.IfThenElse could be changed to look like below. Note that there isn't any extra at the beginning, nor at the end. This is pretty verbose, and would require changing every Ast node. But certainly it is very self explanatory, and the span (leftmost and rightmost source pos) of each node is as "compact" as possible.

datatype exp =
  ...
| IfThenElse of
    { iff: Token.t
    , extra_after_iff: extra
    , exp1: exp
    , extra_after_exp1: extra
    , thenn: Token.t
    , extra_after_thenn: extra
    , exp2: exp
    , extra_after_exp2: extra
    , elsee: Token.t
    , extra_after_elsee: extra
    , exp3: exp
    }
| ...

from smlfmt.

shwestrick commented on May 25, 2024

To cut down on the verbosity, we could reuse the same trick as how "delimiters" are stored in the AST, where we use a sequence of N-1 things to go between each of N things.

With the approach, most Ast nodes get just a single additional field of extra_between items.

For example with IfThenElse:

| IfThenElse of
    { iff: Token.t
    , exp1: exp
    , thenn: Token.t
    , exp2: exp
    , elsee: Token.t
    , exp3: exp
    , extra_between: extra Seq.t   (** length 5 *)
    }

The alignment is:

if   exp1   then   exp2   else   exp3
   ^      ^      ^      ^      ^           (5 things between)

A more interesting example might be the Ast.Exp.DecMultiple node which is for sequences of core declarations. The "elems" are individual declarations and the "delims" are the (optional) semicolons between them. We would then have an additional sequence of "extra" stuff:

    (** dec [[;] dec ...] *)
    | DecMultiple of
        { elems: dec Seq.t
        , delims: Token.t option Seq.t
        , extra_between: extra Seq.t    (** whitespace and comments between every elem and delim *)
        }

The idea here is that for N declarations, there would be N-1 delimiters, and 2*(N-1) extra elements:

elems         dec1   dec2   dec3      ...  dec(N-1)   decN
delims             ;      ;       ;   ...           ; 
extra_between     ^ ^    ^ ^     ^ ^  ...          ^ ^

Note that this is equivalent to what I described in the previous comment, but is much less verbose in how the type is written. I'm not convinced it's much of an improvement, though.

from smlfmt.

shwestrick commented on May 25, 2024

Okay, it's been a few days. These previous ideas were a bit too deep-fried... the effort required would be massive for such a small gain. I'm beginning to believe that whitespace/comments really aren't tree-structured at all, and might not belong in the CST.

So, here's a nice alternative idea. We augment tokens with these additional functions (when the Lexer creates tokens, it needs to set up support for these):

val prevToken: token -> token option
val nextToken: token -> token option
val commentsAndWhitespaceBeforeToken: token -> token seq
val commentsAndWhitespaceAfterToken: token -> token seq

Then, the CST will be unchanged, and yet we still have the same functionality as putting whitespace/comments into the CST: at any token, we can look up any comments/whitespace immediately before and after.

from smlfmt.

shwestrick commented on May 25, 2024

So, here's a nice alternative idea. We augment tokens with these additional functions (when the Lexer creates tokens, it needs to set up support for these):
val prevToken: token -> token option
val nextToken: token -> token option
val commentsAndWhitespaceBeforeToken: token -> token seq
val commentsAndWhitespaceAfterToken: token -> token seq

This is now implemented in 4860dcd.

The next step is to use these functions in the pretty-printer to preserve comments in the output.

from smlfmt.

shwestrick commented on May 25, 2024

I'm now moving forward with inserting comments automatically in pretty-printing. To do so, I'm introducing an
"intermediate representation" for pretty-printing whose structure is similar to a PrettySimpleDoc.t, but whose leaf elements have type Token.t. This will allow us to do a pass over this "IR" to insert comments.

from smlfmt.

shwestrick commented on May 25, 2024

This is now fully implemented! See e17721d

from smlfmt.

shwestrick commented on May 25, 2024

There is still more work to do (e.g. #28), but it will be useful to keep those as separate issue threads.

Progress!

from smlfmt.

Handling comments for pretty-printing about smlfmt HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent