redprl / asai Goto Github PK
View Code? Open in Web Editor NEW๐ฉบ A library for compiler diagnostics
Home Page: https://ocaml.org/p/asai
License: Apache License 2.0
๐ฉบ A library for compiler diagnostics
Home Page: https://ocaml.org/p/asai
License: Apache License 2.0
The following design, upon approval, should be carefully documented and applied everywhere:
Currently, the STLC example is violating the design, and documentation is lacking.
I have two proposals for the replacement:
โก
๏ฟฝ
One exception is CR. To handle CRLF smoothly, CR should probably just be removed. Another is (horizontal) tabs, which is discussed in #66.
Inspired by @mikeshulman's suggestion, perhaps it's useful to display newline characters and/or the end-of-file. I can think of a few choices:
โค
โ
โฒ
โต
โ
โ
ยถ
โน
@johnyob has implemented the Rust-style diagnostic rendering at https://gitlab.com/alistair.obrien/grace https://github.com/johnyob/grace.
On the surface, it looks straightforward to adapt it into an asai diagnostic handler. Of course, the renderer will not satisfy our Level-2 display stability requirements, but I think we can implement a renderer with lower display stability requirements, which has the benefits of not relying solely on colors to highlight spans.
Reasons:
eio
and lsp
, preventing the "core" asai
from being integrated into lightweight applications.As a general principle I think we should design our interface to match what we think is correct, not what the current LSP protocol wants us to do. For example, an LSP diagnostic always requires the following data:
For 1, if no file URIs make sense, the current workaround is to use the project file or a folder. For 2, if no ranges make sense, the current workaround is to use the fake span [0,0)
(the very beginning of the file). All of these hacks are LSP-specific and I propose burying them in Asai_lsp
not in Asai
. This applies to the entire library---I believe it is better to question the LSP protocol (or any existing protocol) more.
Actionable item: move all LSP-related hacks to Asai_lsp
, or at least hide them from the public interface.
Changes:
info
and warning
into print
Based on the Discord discussion:
E.tracef ?loc "when@ loading@ module@ %s" name @@ fun () -> ...
;;
E.printf ~code:`GalaxyNumber ?loc "number@ %i@ is@ too@ large" very_small_number
;;
E.fatalf ~code:`TypeError ?loc "%a@ does@ not@ have@ type@ %a" pp_tm tm pp_ty ty
;;
E.printf ~code:`EmojiError ?loc ~marks:all_occurrences
"emoji@ %a@ is@ used@ more@ than@ %i@ times."
pp_emoji emoji threshold
;;
E.messagef ~code:`TypeError ?loc
"%s@ has@ type@ type@ %a,@ but@ we@ expected@ it@ to@ have@ type@ %a."
var pp_tp actual pp_tp expected
|> E.mark [binding_loc1; binding_loc2]
|> E.fatal
;;
E.messagef ?loc ~code:`ChiError "variable@ name@ %s@ does@ not@ have@ any@ emojis." var
|> E.fatal ~marks:[]
;;
E.printf ?loc ~code:`ChiInfo "raise@ %s@ here." "CCHM"
In sum, we should have these functions
tracef
to construct a backtracemessagef
to construct a messagemark
to add, well, marksfatal(f)
and print(f)
to log something, and all four variants can take ?marks
fatal(f)
intends to end the program after printing out the message.-f
functions always take ?loc
.?marks
and/or calls of E.mark
).In the LSP protocol, "related info" is closer to multi-spans, while "causes" are currently used as the backtrace in the unix backend. These two seem to be semantically distinct, and ideally we should capture semantic differences in the Ultimate API.
Currently, the unix backend prints out something like this:
Error [E123]: This is the high-level description of the error
that is hopefully more understandable.
[cool Unicode art]
**dubious code**
^^^^^^^^^^^^
This line is wrong!
[cool Unicode art]
I think we can do the following instead for diagnostics with ranges:
[cool Unicode art]
**dubious code**
^^^^^^^^^^^^
[E123] This is the high-level description of
the error that is hopefully more understandable.
[cool Unicode art]
Benefits:
Bonus: compatibility with LSP.
Yet another great question/suggestion from @mikeshulman. The library is expecting a file_path
in a position, but there's no "file" at interactive prompts. While the internal code is highly modularized and file I/O is isolated, the current external API bundles the file I/O. Here are some concrete fixes I can think of:
Position.file_path
to Position.source
and make its type polymorphic (so that we will have 'source Span.t
). Downsides: too many type variables can reduce usability.Position.file_path
to Position.source
but fix its type to the following:
type source = [`File of string | `String of string | `Uri of string ]
In either way, a more generic backend which can handle strings from interactive prompts should be made available. This can be as simple as taking yet another optional argument to handle URIs (if we want to allow them).
Range
instead of Span
.For 0.2.0, Span
will only be deprecated, before its removal in later versions.
U+000A
U+000D
U+000D U+000A
U+000B
U+000C
U+0085
(UTF-8: 0xC2 0x85
)U+2028
(UTF-8: 0xE2 0x80 0xA8
)U+2029
(UTF-8: 0xE2 0x80 0xA9
)In either mode, report any inconsistency.
It's one of the design goals to trivialize the embedding of error messages from a library into the main application. The end users should not be able to (easily) tell whether it's some library or the main program generating the error message. The current API already provides the mechanism to do so, but adding an example should help us examine the current design.
This is a redesign of src/core/Loc.mli
. Changes:
Asai.Span
).filename
to file_path
.height
does not make sense in the presence of wrapping, and line_numbers
will be useless once wrapping is supported in the terminal backend.)utf8_
because we will not support other encodings anyways.(** {1 Types} *)
(** The type of positions *)
type position = {
file_path : string;
(** The absolute file path of the file that contains the position. *)
offset : int;
(** The byte offset of the position relative to the beginning of the file. *)
start_of_line : int;
(** The byte offset pointing to the beginning of the line that contains the position. *)
line_num : int;
(** The 1-indexed line number of the line that contains the position. *)
}
(** The abstract type of spans. *)
type t
(** {1 Builders} *)
(** [make start end_] builds the span [[start, end_)] from a pair of positions [start] and [end_].
@raise Invalid_argument if the positions do not share the same file path or if [end_] comes before [start]. The comparison of file paths is done by [String.equal] without any path normalization.
*)
val make : position -> position -> t
(** [of_lex_pos pos] conversion [pos] of type {!type:Lexing.position} to a {!type:position}. The input [pos] must be in byte-indexed. (Therefore, [ocamllex] is compatible, but [sedlex] is not because it uses code points.) *)
val of_lex_pos : Lexing.position -> position
(** [to_positions span] returns the pair of the start and end positions of [span]. *)
val to_positions : t -> position * position
(** {1 Accessors} *)
(** [file_path span] returns the file path associated with [span]. *)
val file_path : t -> string
(** [start_line_num span] returns the 1-indexed line number of the start position. *)
val start_line_num : t -> int
(** [end_line_num span] returns the 1-indexed line number of the end position. *)
val end_line_num : t -> int
(** {1 Auxiliary types} *)
type 'a located = { span : t option; value : 'a }
Currently, the terminal backend is not happy with control characters (for good reasons), but maybe it is more user-friendly to turn \n
into pp_force_newline
especially for beginners.
@mmcqd Maybe we can have a raw
option to turn off the conversion?
This is another good question/suggestion by @mikeshulman. These two I believe were copied from the language server protocol (LSP) @TOTBWF. Is there a source of these terms? The official LSP specification does not seem to explain them?!
Let me just write down what I thought they are:
Currently, the registered printer and Asai_tty do not handle multi-line messages well/correctly.
Using Format.kdprintf
, it should be possible to provide an alternative interface that takes format strings directly. For example:
E.fatal "number %i is too cool" 42
We now use @include
, an odoc extension, in multiple places.
open Bwd
(** Styles *)
type style = [`Context | `Highlight | `Mark]
(** A segment is a styled string without control characters. *)
type marked_string = style * string
(** A line is a list of segments. *)
type marked_line = marked_string list
(** A block is a collection of consecutive lines. *)
type marked_block = { start_line_num : int; text : marked_line list }
(** A file consists of multiple blocks. *)
type marked_file = { file_path : string; blocks : marked_block list }
(** A multi-span consists of all formatted spans across multiple files. *)
type marked_message = marked_file list * Asai.Diagnostic.message
(** a message *)
type t =
{ code : string
; severity : Asai.Severity.t
; message : marked_message
; traces : marked_file bwd
}
U+000A
U+000D
U+000D U+000A
U+000B
U+000C
U+0085
(UTF-8: 0xC2 0x85
)U+2028
(UTF-8: 0xE2 0x80 0xA8
)U+2029
(UTF-8: 0xE2 0x80 0xA9
)I consider it memory leak when whole content of closed files still occupy the memory. I wonder if we could switch to a Unix.lseek
-based implementation?
The Tty backend should have explicitly printed out the severity (e.g., "ERROR") instead of only relying on highlighting colors. This was suggested by @mikeshulman and I strongly agreed! I guess the remaining question is to find a good place in the Unicode art to add it:
Current output with location information:
โโโ examples/stlc/example.lambda
โ
1 โ line1
2 โ line2
โ
20 โ line20
21 โ line21
โท
[E002] Message line 1
Message line 2
Current output without location information:
[E002] Message line 1
Message line 2
It would be a good idea to add some example configurations for various editors, so we could just copy+paste when using asai.lsp
in other projects.
Unfortunately, the situation with lsp-mode
is pretty grim; see emacs-lsp/lsp-mode#3625
It should be noted that autocompletion, semantic tokens, document synchronization, go to definition/declaration, find references, hover, etc. are all out of the scope of asai
. The current LSP backend is exclusive and makes it difficult to create an LSP server that has error reporting and other features. This has to be changed.
I'm a bit concerned that someone will try to use this library together with another library for logging (e.g., syslog), rendering our use of the name Logger
at odds. @TOTBWF suggested using the name Reporter
instead, which I think is very reasonable. I wonder if there's any other suggestion? The resolution of this issue shall be implemented in 0.2.0.
The current handler is not ideal:
Concrete suggestions: remove "empty lines" between blocks; instead, mix remarks/messages with content from end users (while maintaining Level-2 display stability).
We would like to print Python and Go code!
I recently learned that the position type from Fmlib_parse does not contain byte offsets. While I think it is more efficient if the parser library can provide byte offsets directly, we can also provide some (inefficient) helper function recovering the information by scanning the content from the start, counting the newline sequences.
This function should be added along with, or after the resolution of issue #87.
Yetยฒ another good suggestion/question from @mikeshulman.
Currently, there's a quickstart tutorial. Is it good enough? Only a real experiment can tell. @jonsterling I think it's time to try out this library!
I have not created a minimal reproduction of this yet.
This library is heavily influenced by LSP and is not 100% compatible with the Rust design. Perhaps it's good to review the Rust design document and note the differences:
Based on some private conversation, I believe what we want is some support for developers to debug. @mmcqd
The current design is to make users happy, but we may also want to support spewing out more debugging information.
This should solve some rendering issue
Yetยณ another question/suggestion from @mikeshulman. The current asai library wants the implementer to directly write out the message when sending it. The other design is usid in cooltt, whene we send "structured" errors of this type:
type t =
| MalformedCase
| InvalidTypeExpression of CS.con
| ExpectedSynthesizableTerm of CS.con_
| CannotEliminate of Pp.env * S.tp
| ExpectedSimpleInductive of Pp.env * S.tp
| InvalidModifier of CS.con
| ExpectedFailure of CS.decl
And then there's a pretty printer to turn these structured errors into elaborated messages. I am not sure how to have an API that works (nicely) for both, but at very least the design decision to "write the (unstructured) message when sending it" should be documented. This reminded me of the old debate between catgets
and gettext
, except that we are using high-level, rich constructors from OCaml types instead of integers in the case of catgets
.
Any suggestions for redesigning the API?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.