Comments (9)
Some additional discussion from Slack:
It seems pretty important to not screw up typical uses of comments such as commenting/uncommenting code. A good example is this, where the code author may later reactive the Baz
branch:
case exp of
Foo x => ...
| Bar y => ...
(*| Baz z => ...*)
| _ => ...
To handle this behavior, one idea would be to ensure that comments don't move across tokens when pretty-printed. We could implement this by attaching comments to individual tokens. The AST currently stores all tokens, so this actually wouldn't be a huge change. Then the pretty-printer just has to handle "reflowing" the comment, e.g. adjusting indentation.
from smlfmt.
I've done a little digging. Unsurprisingly, lots of people have dealt with the same issue. The umbrella concept is a concrete syntax tree aka CST.
For example:
- Python LibCST https://github.com/Instagram/LibCST
- Javascript CST https://github.com/cst/cst
The most popular solution appears to just be to store everything explicitly. Nothing fancy. I'm inclined to go in this direction for the sake of simplicity, and because it aligns well with the current strategy of "storing every token". In some sense, our Ast
type is already halfway to a CST.
The simplest thing to do now would be to start by adding an additional Token variant for whitespace. (We already have a Comment
variant.) Then we put this "extra" stuff into the Ast
, and perhaps change its name to Cst
.
The type of each "extra" element would be Token.t Seq.t
where each token is a comment or whitespace.
from smlfmt.
One way to put extra stuff into the Ast
is to identify it at the shallowest possible node. The result is that we see extra elements "between" items.
For example, Ast.Exp.IfThenElse
could be changed to look like below. Note that there isn't any extra at the beginning, nor at the end. This is pretty verbose, and would require changing every Ast
node. But certainly it is very self explanatory, and the span (leftmost and rightmost source pos) of each node is as "compact" as possible.
datatype exp =
...
| IfThenElse of
{ iff: Token.t
, extra_after_iff: extra
, exp1: exp
, extra_after_exp1: extra
, thenn: Token.t
, extra_after_thenn: extra
, exp2: exp
, extra_after_exp2: extra
, elsee: Token.t
, extra_after_elsee: extra
, exp3: exp
}
| ...
from smlfmt.
To cut down on the verbosity, we could reuse the same trick as how "delimiters" are stored in the AST, where we use a sequence of N-1 things to go between each of N things.
With the approach, most Ast
nodes get just a single additional field of extra_between
items.
For example with IfThenElse
:
| IfThenElse of
{ iff: Token.t
, exp1: exp
, thenn: Token.t
, exp2: exp
, elsee: Token.t
, exp3: exp
, extra_between: extra Seq.t (** length 5 *)
}
The alignment is:
if exp1 then exp2 else exp3
^ ^ ^ ^ ^ (5 things between)
A more interesting example might be the Ast.Exp.DecMultiple
node which is for sequences of core declarations. The "elems" are individual declarations and the "delims" are the (optional) semicolons between them. We would then have an additional sequence of "extra" stuff:
(** dec [[;] dec ...] *)
| DecMultiple of
{ elems: dec Seq.t
, delims: Token.t option Seq.t
, extra_between: extra Seq.t (** whitespace and comments between every elem and delim *)
}
The idea here is that for N declarations, there would be N-1 delimiters, and 2*(N-1) extra elements:
elems dec1 dec2 dec3 ... dec(N-1) decN
delims ; ; ; ... ;
extra_between ^ ^ ^ ^ ^ ^ ... ^ ^
Note that this is equivalent to what I described in the previous comment, but is much less verbose in how the type is written. I'm not convinced it's much of an improvement, though.
from smlfmt.
Okay, it's been a few days. These previous ideas were a bit too deep-fried... the effort required would be massive for such a small gain. I'm beginning to believe that whitespace/comments really aren't tree-structured at all, and might not belong in the CST.
So, here's a nice alternative idea. We augment tokens with these additional functions (when the Lexer creates tokens, it needs to set up support for these):
val prevToken: token -> token option
val nextToken: token -> token option
val commentsAndWhitespaceBeforeToken: token -> token seq
val commentsAndWhitespaceAfterToken: token -> token seq
Then, the CST will be unchanged, and yet we still have the same functionality as putting whitespace/comments into the CST: at any token, we can look up any comments/whitespace immediately before and after.
from smlfmt.
So, here's a nice alternative idea. We augment tokens with these additional functions (when the Lexer creates tokens, it needs to set up support for these):
val prevToken: token -> token option val nextToken: token -> token option val commentsAndWhitespaceBeforeToken: token -> token seq val commentsAndWhitespaceAfterToken: token -> token seq
This is now implemented in 4860dcd.
The next step is to use these functions in the pretty-printer to preserve comments in the output.
from smlfmt.
I'm now moving forward with inserting comments automatically in pretty-printing. To do so, I'm introducing an
"intermediate representation" for pretty-printing whose structure is similar to a PrettySimpleDoc.t
, but whose leaf elements have type Token.t
. This will allow us to do a pass over this "IR" to insert comments.
from smlfmt.
This is now fully implemented! See e17721d
from smlfmt.
There is still more work to do (e.g. #28), but it will be useful to keep those as separate issue threads.
Progress!
from smlfmt.
Related Issues (20)
- Grouping of function application HOT 2
- Infix style with large expressions HOT 1
- Style for nested `if` expressions?
- SuccessorML syntax HOT 8
- Long single-line comments can force large indentation increases
- Support mlb option "allowExtendedTextConsts true" HOT 4
- Format of `andalso` and `orelse` HOT 5
- Support top-level expressions HOT 5
- Rightward drift in deep function application
- Revisit `Promise` overhead in document structure
- Non-idempotent formatting
- Parser bug: unexpected functor HOT 1
- Heuristic for placing fun decl body on separate line HOT 1
- Automatically detect if stdin isn't a terminal HOT 2
- Remove `--stdio` flag
- Comments immediately before end of file don't format well HOT 3
- Editor integration HOT 6
- Continuous Integration Support HOT 6
- TODO: new release HOT 1
- Unusual command line interface HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from smlfmt.