
ezno's Introduction

A JavaScript compiler and TypeScript checker written in Rust with a focus on static analysis and runtime performance.

Important

Ezno is in active development and currently does not support enough features to check existing projects. Check out the getting started guide for experimenting with what it currently supports.

What Ezno is

  • A type checker for JavaScript usable through a CLI (with an LSP also in the works)
  • A high level library that allows type checking to be added to other tools!
  • Checks programs with guaranteed type safety (no runtime TypeErrors) (as long as definitions are sound)
  • Types aimed at soundness and tracing for better static analysis
  • An imperative type system that tracks and evaluates the side effects of functions and control flow structures. It is similar to an interpreter, but acts on types instead of values and does not run IO side effects, etc.
  • A collection of experiments with types. Many are still being worked out and are at the prototype stage. Some of the new behaviors benefit JavaScript specifically and others could be applied to other languages
  • Written in Rust
  • Fast and Small
  • Open source! You can help build Ezno!
  • A challenge to the status quo of type checking, optimisations and compilation through deeper static analysis beyond syntax analysis

What Ezno is not

  • eNZo, the Z is in front of the N (pronounce as 'Fresno' without the 'fr') 😀
  • On parity with TSC or 1:1; it has some different behaviors but should work in existing projects that use TSC
  • Faster as a means to serve large codebases. Cut out bloat and complex code first!
  • Smarter as a means to allow more dynamic patterns. Keep things simple!
  • A binary executable compiler. It takes in JavaScript (or a TypeScript or Ezno superset) and does similar processes to traditional compilers, but at the end emits JavaScript. However, in the future it could generate a lower-level format using its event (side-effect) representation

Read more about Ezno


This project is a workspace consisting of a few crates:

  • checker: stores for types and contexts, type checking logic and optional synthesis over the parser AST
  • parser: AST definitions, logic for parsing, AST to string and visiting

Help contribute

Check out good first issues and comment on discussions! Feel free to ask questions about parts of the code or the checking implementation.

Read CONTRIBUTING.md for information about building and testing.

ezno's People

Contributors

addisoncrump, charlestaylor7, h-plus-time, jasikpark, joshuakgoldberg, julesguesnon, kaleidawave, lemueldls, markthree, nathanbabcock, wzwywx


ezno's Issues

feat(ci): auto-fix formatting and linting on new PRs

This is in response to your Cool things in '23 post:

I also really want a GitHub action/bot to apply clippy --fix and rustfmt format on the current PR. I don't know how I feel about asking the author to do that sort of bookkeeping in a project if simple checks fail. It would be nice if I could comment @my-bot try fix lints and formatting and it runs it and commits the fixes (rather than me having to checkout and do it manually). Is this even a good idea, does it change who authored it when it appears in git blame? If you know a simple solution feel free to PR it.

I do not know much about Rust tooling, but in JavaScript-land there exists lint-staged to run pre-configured linters on files that were changed between main and the head of the branch the PR is on.

There is also husky for running lint-staged as a pre-commit hook in Git (so that devs get feedback even before a commit is made, even if they don't have rustfmt in their IDEs).

I am not opening a PR for this as I am not sure if you would be open to having npm-based tools added in this repository, but here is how the above could be configured:

Create lint-staged.config.js:

const autoFix = !process.env.NO_AUTO_FIX;

module.exports = {
  // Use a function entry so lint-staged does not append the file list to the
  // cargo command (clippy lints the whole crate); rustfmt formats just the
  // changed files
  "*.rs": (files) => [
    `cargo clippy ${autoFix ? "--fix --allow-dirty" : ""}`,
    `rustfmt ${files.join(" ")}`,
  ],
};

Install dependencies:

npm install --save-dev lint-staged husky

Add this to scripts section in package.json:

    "lint": "lint-staged --config lint-staged.config.js",

Create GitHub Action workflow lint.yml:

name: Lint code

on:
  push:
  pull_request:

jobs:
  linting:
    runs-on: ubuntu-latest
    if: github.event_name != 'pull_request' || github.event.pull_request.draft == false
    steps:

      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
          # Fetch a branch, not detached HEAD (to allow committing changes)
          ref: ${{ github.event_name == 'pull_request' && github.head_ref || github.ref_name }}

      - uses: actions/setup-node@v3

      - name: Install dependencies
        run: npm ci

      - name: Lint packages
        run: |
          if [ "${{ github.event_name }}" == 'pull_request' ]; then
            npm run lint -- --diff="origin/${{ github.base_ref }}...origin/${{ github.head_ref }}"
          else
            npm run lint -- --diff="HEAD~1...HEAD"
          fi

      - name: Commit linted files (if made any changes)
        if: github.event_name == 'pull_request'
        # Creates a new commit. Amending an existing commit is a bad idea
        # because:
        # - It would require the original committer to force pull, and may
        #   cause merge conflicts
        # - Changes that require linting might have been made over several
        #   commits. If you amend the last commit, then the fixes for all
        #   commits would end up in the last commit.
        run: |
          # Check if the autofix changed any files
          git add .
          if git diff-index --quiet HEAD --; then
            echo "Linters did not detect any issues. Good job!"
          else
            # Keep the committer as the person who made the last commit on this branch:
            export GIT_COMMITTER_NAME="$( git log -1 --pretty=format:'%cn' )"
            export GIT_COMMITTER_EMAIL="$( git log -1 --pretty=format:'%ce' )"
            # Author the commit as the GitHub Actions bot. Commits authored by
            # github-actions do not trigger GitHub Action workflows. This
            # avoids a cycle of the Action triggering itself in a loop.
            git config --global user.name "github-actions[bot]"
            git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"

            git commit \
              --message "chore(lint): lint code with clippy and rustfmt" \
              --message "Triggered by ${{ github.event.pull_request.head.sha }} on branch ${{ github.head_ref }}" \
              --no-verify
            git push --no-verify
          fi

Alternatively, instead of using npm's lint-staged, there is the GitHub action dorny/paths-filter. Example usage: https://github.com/specify/specify7/blob/production/.github/workflows/test.yml

Handling explicit file extension (Node16 + NodeNext)

Description

Hello again!

When tsconfig has "module": "node16" | "nodenext", all imports and exports will end with .js even when importing a TypeScript file. See the documentation.

Solution

I guess it's either reading the tsconfig file to know how to handle imports and exports, or the easiest way of doing it: if an import with an explicit .js fails, it can fall back to trying to read the same file but with .ts.
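A minimal sketch of that fallback (the function is hypothetical, not the checker's actual resolver):

use std::path::{Path, PathBuf};

// If an import with an explicit `.js` extension does not resolve, retry the
// same path with `.ts`, matching how TypeScript treats these specifiers
// under "module": "node16" | "nodenext".
fn resolve_import(specifier: &Path) -> Option<PathBuf> {
    if specifier.exists() {
        return Some(specifier.to_path_buf());
    }
    if specifier.extension().is_some_and(|ext| ext == "js") {
        // `import "./a.js"` may refer to `./a.ts` before compilation
        let with_ts = specifier.with_extension("ts");
        if with_ts.exists() {
            return Some(with_ts);
        }
    }
    None
}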

Fix and figure out how multiple properties should be represented

Currently there is no difference in the representation of properties that are dependent once OR multiple times.

aka { [a: number]: "value" } has, under a context, properties as [(PropertyKey::Type(*number type*), Property::Value(TypeId -> Type::Constant(Constant::String("value"))))]. There is no difference between this being an object with many keys that are number-like OR a single key which is of the type number.

I think this can be solved by a separate variant of Property on the RHS called Property::Multiple(Box<Property>) which denotes that the RHS property is dependent. This only makes sense if the LHS is PropertyKey::Type though 🤔. Also the RHS needs to depend on types from the LHS for type mappings in TypeScript 🤔🤔
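A sketch of that representation (the surrounding types are simplified stand-ins for the checker's own definitions):

struct TypeId(u32);

enum PropertyKey {
    String(String),
    // a dependent key, e.g. the [a: number] in { [a: number]: "value" }
    Type(TypeId),
}

enum Property {
    Value(TypeId),
    // proposed: the RHS stands for multiple dependent entries rather than a
    // single one; only meaningful when the LHS is PropertyKey::Type
    Multiple(Box<Property>),
}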

Automatic Semicolon Insertion

Currently the parser parses invalid script

$ cargo run -- ast-explorer full-ast
full_ast> const x = 2 const y = 3
ParseOutput(
    Module {
        statements: [
            VariableDeclaration(
                ConstDeclaration {
                    keyword: Keyword(
                        Const,
                        0..5,
                    ),
                    declarations: [
                        VariableDeclaration {
                            name: None(
                                Name(
                                    Standard(
                                        "x",
                                        VariableId(1),
                                        6..7,
                                    ),
                                ),
                            ),
                            type_reference: None,
                            expression: NumberLiteral(
                                Number(
                                    2.0,
                                ),
                                10..11,
                                ExpressionId(1),
                            ),
                        },
                    ],
                },
            ),
            VariableDeclaration(
                ConstDeclaration {
                    keyword: Keyword(
                        Const,
                        12..17,
                    ),
                    declarations: [
                        VariableDeclaration {
                            name: None(
                                Name(
                                    Standard(
                                        "y",
                                        VariableId(2),
                                        18..19,
                                    ),
                                ),
                            ),
                            type_reference: None,
                            expression: NumberLiteral(
                                Number(
                                    3.0,
                                ),
                                22..23,
                                ExpressionId(2),
                            ),
                        },
                    ],
                },
            ),
        ],
        ..
    },
    ParsingState { .. },
)

Here it should fail; there needs to be a semicolon after the 2 expression. This currently works because the check is not there. The check is not there because I wanted to get non-semicolon lines working before I added automatic semicolon insertion.

JavaScript accepts places where semicolons aren't necessary and automatically inserts them.

Other links:

Steps to fix

  1. Add an AutoSemiColon variant to TSXToken
  2. The lexer should insert it in places according to the rule
  3. Statement parsing should check that either an AutoSemiColon or a SemiColon token terminates the statement (sketch below)
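A sketch of steps 1 and 3 (the helper and error names here are stand-ins, not the parser's actual API):

enum TSXToken {
    SemiColon,
    // inserted by the lexer where the ASI rules apply
    AutoSemiColon,
    // ... other variants
}

enum ParseError {
    ExpectedSemiColon,
}

// statement parsing accepts either token as a terminator
fn expect_statement_end(token: &TSXToken) -> Result<(), ParseError> {
    match token {
        TSXToken::SemiColon | TSXToken::AutoSemiColon => Ok(()),
        _ => Err(ParseError::ExpectedSemiColon),
    }
}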

Span improvements

All AST nodes contain information about which slice and the position in the slice they were parsed from, through Span. Currently:

  • Spans are a start byte index into the string and an end byte index into the string.
  • Binary expressions (and some other combined structures) don't have their own position information; instead they combine the left and right spans.

There are opportunities for improvements here, which may require changes to source-map.

Changes with lexing:

  • Token sends a Span. There are two redundant pieces of information:
    • SourceId. This is the same for every token sent. It should be on ParsingState instead
    • Is an end position needed? It could maybe be worked out from the start + the length of the token (the token length won't change so isn't an issue)
    • Instead tokens would be of type Token<TSXToken, /* start */ usize>. To construct an AST position it would be Span { start: token_start, end: token_start + token_length, source_id: state.current_source_id }

Changes to the AST, parsing and spans:

  • Are byte-slice-based spans a good idea? Do line-column spans offer benefits?
    • Some of #1 could benefit from knowing about line breaks; while possible with slice-based spans, it adds a bit of overhead
  • Should the end marker be relative or absolute?
  • Do computed positions cause issues? Are errors ever produced on modified AST? Aka swapping binary operands leads to a structure with broken position information.
  • For literals, do they need an end byte marker? It could be more efficient if the end is computed via start + literal length. Does that suffer from the problems directly above?

Libraries that use bytewise start-end spans:

Libraries that use line-column spans:

  • Source maps are line-column based
  • LSP and so LSP types
  • Rollup reporting (used by the JS WASM edition) is currently line-column

Add positions to events

When a function's returned value (which is looked up via events) doesn't match, it errors. This error doesn't contain the location of the return.

Many of the Events could benefit from having a position (using SourceWithSpan) to mark where they occurred.

To do this, add a field of type SourceWithSpan to an event and modify the synthesis and other code to make it compile. Then repeat for every variant of Event.

Try catch AST

Somehow along the way I have missed out parsing try-catch (finally) statements: https://developer.mozilla.org/en-US/docs/web/javascript/reference/statements/try...catch

TSXToken::Keyword(TSXKeyword::Try) => {
    todo!()
}

This should be very similar logic to statements like if and while, however the spec says the clauses have to be Blocks and does not allow single statements.

The while statement code is a good reference for how it should be implemented. You can copy that and then adjust the keywords and such. Then just import it back into statements/mod.rs, create a similar variant and hook it up to its parse, position and to_string implementations.
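A sketch of what the AST node could look like (field names are assumptions; the first three structs are stand-ins for the parser's existing types):

struct Block;
struct VariableField;
struct Span;

struct TryCatchStatement {
    try_inner: Block,
    // the spec requires Blocks (not single statements) for each clause
    catch_inner: Option<(Option<VariableField>, Block)>,
    finally_inner: Option<Block>,
    position: Span,
}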

If anyone wants to tackle this and needs more information/help, let me know 👍

Record mutated variables / objects that might be mutated by a unknown loop or function call

Given something like

function run(cb: Function) {
    const obj = { a: 2 }
    cb(obj)
    console.log(obj.a)
}

This might fail if the function passed as cb deletes a from obj.

As there is no annotation syntax for marking a function as pure/without side effects (at the moment 👀), the safety isn't known inside the function.

This currently isn't caught in Ezno, so needs fixing.

Instead:

  • Objects passed to functions (and some other items that might be mutated by calling an unknown function) must be marked as unstable or something
  • Uses of these unstable poly types should require checking in apply_effect (as it evaluates side effects and is aware of what stage the function is at)

For / while loops

  • Add iterators for array based on length etc
  • Add effects running with some iteration limit

Multiple files as entry point

Update: after #102, multiple entry points have been added to the checker. However it is not fully wired up to Ezno's CLI:

ezno/src/cli.rs

Lines 170 to 172 in 1aa77e2

CompilerSubCommand::Check(check_arguments) => {
    let CheckArguments { input, watch: _, definition_file, timings } = check_arguments;
    let entry_points = vec![input];

The next part (that you can contribute to):

  • Get a Vec<PathBuf> from an input path (see the sketch below)
    • This could be a glob pattern or just a comma separated list
    • Should consider whether this method will work with the notify crate. While check --watch isn't currently implemented, it would be good if this addition could work with a watch mode in the future
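A minimal sketch of the path expansion (assuming the glob crate; the function name is hypothetical):

use std::path::PathBuf;

// Split the single CLI input on commas, expanding entries that contain glob
// characters and keeping plain paths as-is.
fn expand_entry_points(input: &str) -> Vec<PathBuf> {
    input
        .split(',')
        .flat_map(|entry| {
            if entry.contains(|c| matches!(c, '*' | '?' | '[')) {
                glob::glob(entry)
                    .into_iter() // Ok(matches) -> zero or one iterators
                    .flatten()   // each match is a Result<PathBuf, GlobError>
                    .flatten()   // drop unreadable entries
                    .collect()
            } else {
                vec![PathBuf::from(entry)]
            }
        })
        .collect()
}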

Note I have not checked side-effect edge cases in #102


Currently there is only one entry path. OP wants to check all files in the directory. Note each entry point also type checks its imports, so this isn't necessary for checking imports and exports.


Rather than running entry_points.for_each(|path| check_project(path, ...)) this needs to be built into the internal ModuleData. So it should be callable as check_project(vec![...]).

Things to think about

  • Side effects in modules (want to reuse some existing data, but make it distinct)
  • How to get a Vec<PathBuf> from the glob passed to the CLI
    • How this works in the CLI watch mode

Discussed in #80

Originally posted by o-az November 11, 2023
How can I run oxidation-compiler to typecheck multiple files by passing a glob string / directory path?

Works:

npx oxidation-compiler@latest check src/file.tsx

None of these commands check all files:

npx oxidation-compiler@latest check src/*.tsx
npx oxidation-compiler@latest check src/
npx oxidation-compiler@latest check src/*


Parallel module (and maybe other item) checking

Maybe modules (and maybe functions) could be checked in parallel. It just needs the CheckingData stuff to be behind a Mutex and such...

Some things:

  • Blocks still need to be done in sequence as statements above can have an effect on things below
  • This needs to be opt-out at build time (as WASM doesn't support threads)
    • The parser currently has threaded code, but it is opt-out using generics and cfg (which is how the parser still works in WASM)
  • Even if this works, if it has no significant impact then it won't be added/merged immediately

Add setters

At the time of writing Ezno supports getters on objects and classes.

It does this by treating getters the same as regular functions. Then when 'getting' the property it evaluates the getter function

PropertyValue::Getter(getter) => {
    let state = ThisValue::Passed(on);
    let call = getter.call(
        CalledWithNew::None,
        state,
        None,
        // TODO
        None,
        &[],
        SpanWithSource::NULL_SPAN,
        environment,
        behavior,
        types,
        true,
    );
    match call {
        Ok(res) => Some((PropertyKind::Getter, res.returned_type)),
        Err(_) => {
            todo!()
        }
    }
}

This maintains all the behaviour of the event system in detecting side effects, etc.

At the time of writing the setter part is not yet implemented. It should be almost the same code as 'getting', but calling the setter this time, AND the setter should be called with one argument: the assigned value! It should return the assigned value, as setters (confusingly) ignore the returned value and just use the RHS value. Similar to the getter logic, the value of this should be on (the type the property is being looked up on). To handle the restriction, the Result::Err from .call should be pulled apart and any parameter mismatches should be turned into assignment diagnostics (the other kinds of errors can be ignored for now).

At the end, this should type check

let a = 2;
const obj = {
   set value(v) {
       a = v;
   }
}

let b: 80 = (obj.value = 80);
let c: 80 = a;

Proxy support

As much as I want it not to exist, I guess it needs to be supported.

Steps

  • Add a variant to ObjectNature which is Proxy { trap: TypeId } (sketch below)
  • Modify get_property and set_property to call a function if on has this structure
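A sketch of the first step (other variants elided; TypeId here is a stand-in for the checker's type handle):

struct TypeId(u32);

enum ObjectNature {
    // ... existing variants
    // a Proxy object: `trap` points at the handler type, which get_property
    // and set_property should call into instead of reading properties directly
    Proxy { trap: TypeId },
}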

TODO

  • Way to annotate something that creates a proxy...

Improve and fix parser fuzzing

The two fuzzing tests are currently failing.

The module_roundtrip_naive test is currently catching an issue in TemplateLiteral. This is great as it highlights where the invalid assertion is, but currently it only shows the following:

Failing input:

	artifacts/module_roundtrip_naive/crash-b858cb282617fb0956d960215c8e84d1ccf909c6

Output of `std::fmt::Debug`:

	" "

Reproduce with:

	cargo fuzz run --sanitizer=none module_roundtrip_naive artifacts/module_roundtrip_naive/crash-b858cb282617fb0956d960215c8e84d1ccf909c6

Minimize test case with:

	cargo fuzz tmin --sanitizer=none module_roundtrip_naive artifacts/module_roundtrip_naive/crash-b858cb282617fb0956d960215c8e84d1ccf909c6

Is this identifier b858cb282617fb0956d960215c8e84d1ccf909c6 deterministic and able to be used locally, or is it lost after the run? If not, it would be great if it was printed afterwards or else uploaded using upload-artifact (the 90 day retention is fine), so that the input can be used to help fix the issue.


And the second fuzzing suite, module_roundtrip_structured, currently cannot find arbitrary. I think maybe a dependency changed in boa-ast (which we currently use as an agent for generating AST)?

Read from imports

Imports are not currently implemented :/

They should read from the JavaScript source 99% of the time (Ezno has some problems with existing .d.ts files, JS is a better source of truth).

To do

  • Think about how import resolving works (look at whether there is an existing library that can do this for us)
  • Have a Module struct that contains information (I think there is one from my previous attempt, which needs improving)
  • Think about the side effects of module imports
  • Record things around export declarations
  • Dynamic import?

Lexing of spaces breaks attribute parsing

When evaluating an expression that has an attribute, the current lexer in the parser-fixes2 branch allows matching spaces as part of the tag name. This prevents the lexing state from moving on to lexing the attributes, instead giving an error when it reaches the = character.

Example: running ezno ast-explorer ast and using the expression <div className="test"></div>
Output:

error:
  ┌─ INPUT:1:2
  │
1 │ <div className="test"></div>
  │  ^ Invalid character '=' in JSX tag

When removing the match on spaces here, the lexer continues on to the attribute lexing state, giving the following output for the same example:

JSXRoot(
    Element(
        JSXElement {
            tag_name: "div",
            attributes: [
                Static(
                    "className",
                    "test",
                    5..21,
                ),
            ],
            children: Children(
                [],
            ),
            expression_id: ExpressionId(1),
            position: 0..28,
        },
    ),
)

I have a fork with the changes made to the parser-fixes2 branch here, but wanted to double check that there wasn't a reason to be lexing spaces as part of the tag name first.

Thank you :)

Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT

Hi!

Recently I did many Profile-Guided Optimization (PGO) benchmarks on multiple projects (including static analysis tools and compilers like Rustc, Clang, Clangd, Clang Tidy, and many others) - the results are available here. So that's why I think it's worth trying to apply PGO to Ezno.

I can suggest the following things to do:

  • Evaluate PGO's applicability to Ezno.
  • If PGO helps to achieve better performance - add a note to Ezno's documentation about that. In this case, users and maintainers will be aware of another optimization opportunity for Ezno.
  • Provide PGO integration into the build scripts. It can help users and maintainers easily apply PGO for their own workloads.
  • Optimize prebuilt Ezno binaries with PGO.

After PGO, I can suggest evaluating LLVM BOLT as an additional optimization step after PGO.

For the Rust projects, I recommend starting with cargo-pgo.

[feature-request] Is it possible to integrate this into compile time check?

For example, in sqlx, we can do compile-time syntax checks on SQL. It would be fantastic if we could do the same for JS/TS. This would be very useful in

  1. wasm_bindgen_test
    where you could just write correct JS code with syntax checking to run your tests. Allows for better CI testing.
  2. wasm_bindgen(typescript_custom_section)
    where you want to have customized types for the interfaces you provide.

I wonder if this project has matured to this point yet?

I want to build a general metadata extraction platform based on this project. In addition to generating metadata for TypeScript libraries, this platform mainly generates metadata for front-end framework paradigms such as React and Vue. I wonder if this project has matured to this point yet?

Add a LSP (language service protocol) extension

need to update the 2k lines of code I wrote last year to the new Ezno

There is an example video near the top of the page

Things to consider:

Future things

  • Not rechecking the whole source every write

Add more specification tests

If you haven't seen it already, Ezno has a document that outlines all its current checking capabilities!


Each section contains the error diagnostics that arise when checking the code block above. And it's not all aesthetic: these code blocks are tested against the checker with a custom testing crate.

The aim of this document is to break down the features into parts so that:

  • we know what is currently implemented
  • we know what is not currently implemented (via to_implement.md)
  • we can find regressions in behaviour after fixing or adding features (newer features breaking older checker features)
  • we can test the performance of parts or of the whole suite

At the time of writing, 129 tests pass.

Important

Ezno will hopefully be useful on new projects with few dependencies at around the ~250 tests passing mark. At around the 400-500 mark it might be ready for existing projects.

I need your help in writing more

  1. Write tests in staging; merge them into to_implement.md if they don't pass, or into specification.md if they do
  2. Optionally write corresponding issues for failing specification tests
  3. Check existing specification.md for improvements to coverage or invalid cases

Tip

Maybe you have a favourite JS feature, or something you think TSC does a great job of checking, that isn't currently covered. It would be great to add it!

This is a general issue that can relate to many pull requests.

Type of arrays and other collection types

Given an array like

let x = [1, 4, "item"]

It is typed (or will be) as { [0]: 1, [1]: 4, [2]: "item" }. The problem is how to reference its prototype.

Array is a generic structure so it could be Array<1 | 4 | "item">. However it would have to be modified on every push...

It is needed to test equality

let x: Array<number> = [1, 4, "item"]

Alternatively the T type could be figured out from items. Along the lines of interface Array<T is this[number]>...

This also affects Set, Map, which could be done like

interface Map<K is this.#items[number][0], V is this.#items[number][1]> {
    #items: Array<[K, V]>
}

and also Promise!!!

interface Promise<T is ReturnType<this.#func>> {
    #func: Function
}

WASM edition

Want to be able to interface with the Ezno compiler from JavaScript. This will be useful for using in JS based build tools and for using in the browser for REPLs. Also would be cool to use in other languages with WASM runtimes.

wasm-pack looks like a good tool for taking Rust code and generating JS.

I attempted this a while ago but haven't got it to the point of being ready to publish:

ezno/parser/src/lib.rs

Lines 192 to 212 in c834e10

#[cfg(target_arch = "wasm32")]
fn from_string(
    string: String,
    settings: ParseSettings,
    source_id: SourceId,
    offset: Option<usize>,
    cursors: Vec<(usize, EmptyCursorId)>,
) -> ParseResult<ParseOutput<Self>> {
    let lex_settings = lexer::LexSettings {
        include_comments: false,
        lex_jsx: settings.jsx,
        ..Default::default()
    };
    let mut reader = BufferedTokenQueue::new();
    lexer::lex_source(&string, &mut reader, &lex_settings, Some(source_id), offset, cursors)?;
    let ret = Self::from_reader(&mut reader, &settings);
    if ret.is_ok() {
        reader.expect_next(TSXToken::EOS)?;
    }
    ret
}

Steps:

  1. Wasm doesn't currently support threads, which means the default ParallelTokenQueue won't work. Instead it should use a BufferedTokenQueue, which should work as a drop-in. It just requires the lexing to be completed before starting parsing, as shown in the above code snippet where the function is called in the same thread. Need to change the dependency specification so that the parallel feature is replaced with the buffered feature:

    ezno/parser/Cargo.toml

    Lines 29 to 32 in c834e10

    # TODO needs buffered for WASM
    tokenizer-lib = { version = "1.5.0", features = [
    "parallel",
    ], default_features = false }
  2. Make some basic functions for taking a source string and returning it whitespace-minified or prettified, like in the ast-explorer command. Ideally written in Rust using wasm-pack's glue code; it shouldn't require writing JavaScript to interface with it (see the sketch after this list).
  3. Deploy the NPM package to be used as a library. Somehow do this in the publish CI.
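For step 2, the exported function could look something like this sketch (parse_module and print_module are hypothetical wrappers over the parser's from_string/to-string entry points, not real API):

use wasm_bindgen::prelude::*;

// wasm-pack generates the JavaScript glue for this export, so no JS needs
// to be written by hand
#[wasm_bindgen]
pub fn minify_source(source: String) -> Result<String, String> {
    let module = parse_module(source).map_err(|err| err.to_string())?;
    Ok(print_module(&module, /* pretty = */ false))
}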

Further steps:

  1. If we want to read and write AST, it needs to be able to be serialized and deserialized as a JSON-formatted string. This is possible with Serde. Should use its derive macro (maybe under cfg_attr behind a feature flag in the parser). This is also important if it is to be added to https://astexplorer.net/
  2. See if can write AST visitors in JS as function callbacks
  3. Investigate https://napi.rs/, although it looks like it is designed just for Node.js

Improve operators

There are currently some issues with operators

Firstly, this should not .unwrap, it should raise an error:

op_type.unwrap().prop_to_type(),
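The shape of the fix could be something like the following sketch (the diagnostic call and names are assumptions, not the checker's actual API):

// instead of panicking when the operand has no known operator behaviour,
// raise a checker diagnostic and recover with the error type so checking
// can continue
let op_type = match op_type {
    Some(ty) => ty.prop_to_type(),
    None => {
        checking_data.diagnostics_container.add_error(/* unsupported operand */);
        TypeId::ERROR_TYPE
    }
};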

Secondly all operators should be added.

  • The simple ones are bitwise logic and all the mathematical operators (modulo, exponent...)
  • Initially I thought subtraction could be done by composing addition and unary negation. On second thought, and to just get it working, it should be its own operator.
  • Each operator needs an associated constant function (which is implemented here)

Logical operators are a bit different as they can short-circuit, which can conditionally run code. They could maybe be implemented the same way as the others once I work out the best way to approach conditional contexts.

The latter part is root contexts, which I will fully add at some point.

Improve CI testing and publishing

Testing

  • Run fuzzing tests in parallel. Will show red light correctly and be faster
  • Get cargo fuzz from: https://github.com/rust-fuzz/cargo-fuzz/releases/tag/0.11.2
  • Split up testing step by crates (eventually this should mean that it should only run tests for crates that have changed)
  • Have some sort of feature matrix for testing
  • Add hyperfine and flamegraph performance tracing
  • Enforce Cargo.toml formats

need to check whether caching is working correctly

Publishing

  • Automatically add ezno-ast-generator if ezno-parser is updated

Check JS Doc / other type annotations

#57 Added basic support for reading comments as type annotations in certain positions:

error:
  โ”Œโ”€ ./private/test-files/demo.ts:1:24
  โ”‚
1 โ”‚ const x /* string */ = 4;
  โ”‚            ------      ^ Type 4 is not assignable to type string
  โ”‚            โ”‚
  โ”‚            Variable declared with type string

There is also JS Doc, which contains type annotations. Ezno's synthesis could parse these comments (so done inside the checker, rather than the parser) and pass them off as the type annotation if there isn't one.

Additionally, could there be changes Ezno could make to support things that currently aren't representable in JS Doc?

More configuration and per file options/configuration

Currently there exists TypeCheckOptions, which offers some very basic configuration that isn't really implemented.

Thinking further about configuring the checker for certain things: I think it needs to be more granular than a bunch of bools that apply to every piece of code in the project.

Maybe one package allows console + 2 etc. Maybe another needs to parse comments for JSDoc. Things such as calling new (function x() {}) might be disallowed for user functions (which could provide speedups for the compiler etc).

Some things to think about:

  • Figuring out where speedups can occur from configuration
  • Figuring out what is type checking and what is more linting (allowed in the runtime, but discouraged)
    • I think very few actually come under type checking and most are linting, which should not be added to ezno-checker, as its role is just a checker (based on the runtime)
  • Figuring out what TSC's configuration is and how it corresponds

Implement destructuring assignment (declaration works already)

While declaration destructuring is implemented (aka you can declare some variables where the LHS destructures the RHS value), the assignment part is not implemented:

Assignable::ObjectDestructuring(_) => todo!(),
Assignable::ArrayDestructuring(_) => todo!(),

For

let array1 = [1, 2, 3];
let a = 0, b = 0;
[a, b] = array1;

a satisfies 1;
b satisfies "hello world";

to work. It should be very similar (but different, as it is acting on an intermediate level rather than the AST) to the assign_fields function used for the declaration kind:

fn assign_to_fields<T: crate::ReadFromFS>(

The implementation should be very similar to the current behaviour for assigning to a variable, however it needs to do the recursive assignment and get_property things from destructuring.
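A rough sketch of that recursive shape (all names here are hypothetical stand-ins for the checker's types):

fn assign_to_assignable(assignable: Assignable, rhs: TypeId, env: &mut Environment) {
    match assignable {
        // a plain variable/property target assigns like today
        Assignable::Reference(reference) => env.assign_to_reference(reference, rhs),
        // [a, b] = array1 reads array1[0], array1[1], ... then recurses
        Assignable::ArrayDestructuring(members) => {
            for (index, member) in members.into_iter().enumerate() {
                let value = env.get_property(rhs, PropertyKey::from_usize(index));
                assign_to_assignable(member, value, env);
            }
        }
        // ({ a, b } = value) reads each named property then recurses
        Assignable::ObjectDestructuring(fields) => {
            for (key, member) in fields {
                let value = env.get_property(rhs, key);
                assign_to_assignable(member, value, env);
            }
        }
    }
}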

Add an option NOT to bind the value of `this` to a returned value when getting a property to account for destructuring

While . access does bind the value of this to the value that is being accessed, destructuring does not.


Ezno does bind the value of this. It currently does not account for this difference during destructuring.

Steps to fix:

  • Add a bind_this parameter to get_property in checker/src/types/properties.rs
  • Don't do the binding logic if false
  • Pass false during destructuring cases and true elsewhere

Unfortunately doesn't look like it is documented here

NPM release

At the moment ezno-cli is only published on crates.io. Also, the publish GitHub action should build a bunch of platform-specific binaries and upload them to the releases page. However, as of writing, there hasn't been a fully successful deploy to verify that this works 😆.

It would be nice if, like similar tools, ezno-cli was published as an npm executable. This is done with the bin field in package.json.

Ideally it would be done by uploading the assets in the release action step.

The only problem is that these binaries are platform specific. Currently not sure how to communicate the platform the binary is built for to NPM, so that npx and npm install only install and run the binary for the user's platform.

Also might need to be in step with the WASM library code.

This would be great to get done and tested before the checker gets merged, so that when it is it can be installed from NPM.

Assigning to field with a union type across functions

Where can I try Ezno? Please add an online sandbox where I can try Ezno online.
Here is an interesting example that makes you think about the whole soundness of type checking:

function anotherFunc<T extends {field: string | null}>(obj: T){
    obj.field = null;
};

function someFunc<T extends {field: string}>(obj: T){
    obj.field;
    anotherFunc(obj);
    obj.field.length;
};

Does Ezno handle the error in this example? (TypeScript doesn't.) Here we have a runtime error because the caller sets "obj.field = null" but the callee does not expect this value. I think the type checker needs to calculate the difference between the passed type and the type declared in the parameter signature, and make sure nothing from this difference set of values is assigned to the parameter in the body of the function, in order to prevent the runtime error. What do you think?

Source map bindings

Generating source maps when turning AST back into strings should already be possible because it is built on source-map. However no bindings are currently registered, because add_mapping is never called.

A while ago I built a small CSS parser and it could generate reasonable maps. Here is where it adds mappings:

https://github.com/kaleidawave/css-parser/blob/cb81b0b22a3be0aa7073f89f86af2cda87f25039/src/selectors.rs#L131-L133

Need to do the same here. The question is at what points? Not everything has to be mapped. Maybe just start with variable references and adjust until the output of https://evanw.github.io/source-map-visualization/ looks useful

Getting started section?

This looks interesting and I'd like to try it out. I've gone through both blog posts, the readme, some code and all of the issues. From my reading of the readme, I infer it to ultimately be a tsc drop-in replacement.

I know you're probably very busy, but could you write/point me to a short "Getting Started" or "Migrating From Tsc" explainer for (incredibly) lazy people like me just to check it out on a barebones minimal TS example project? (preferably using esm!)

aside

It seems interesting to me that you're combining JS+TS+WASM here (and Rust!), especially in light of the now-archived prism lib (looked fabulous!!). I saw that it got no love in the issues section, which I can definitely relate to, but that lib (and I'm guessing this related compiler+checker) is at the very least related to what I'm looking for.

And all of this stems from my wanting to create a lower level reactive front end from my DLT architecture, and heavier frameworks have a lot of baggage that wouldn't leverage the unique content addressing.

Expected type during value synthesis

Currently the mapper function's a parameter is synthesised as generic:

declare const array1: Array<number>;
array1.map(function mapper(a) { return a + 2 })

then the function is checked based on the restriction.

A better way would be to synthesise the function with the knowledge/expectation that it is expected to be a function of the form number => U. This means that new_function can then create parameter restrictions eagerly.

This could be considered as traditional type inference

Unified reference 'setting' API

Envisioning a Reference enum (renaming the existing Reference from events to RootReference) that is enum Reference { Variable(VariableId), Property { on: TypeId, key: TypeId } }. Then there could be a set_reference method on Context like the following:

fn set_reference<T: crate::FSResolver, U: Operator>(
   &mut self,
   reference: Reference,
   operation: U,
   rhs: &U::RHS,
   checking_data: &mut CheckingData<T>
) -> TypeId

Where Operator is the following

trait Operator {
    type RHS: SynthesiseToType;

    fn return_behavior(&self) -> ReturnBehavior;
}

trait SynthesiseToType {
    fn to_type(&self, ...) -> TypeId;
}

enum ReturnBehavior {
    NewValue,
    OldValue
}

This would handle all the update/assignment operations and (hopefully) be the simplest way to add this.

Inferred generic (poly type) constraints

An aim of Ezno is to work without writing type annotations. Variables don't need them because of the value system; return types are based on the body. The final piece is the unknown-ness of inputs to functions.

This can be in several places:

Explicit parameter

function func1(a) {
    return a.x // The constraint of `a` should be inferred as `{ x: any }`
}

Non restricted variable

let a = 4;
function func2() {
    return Math.sin(a) // The constraint of closed over references `a` should be inferred as number (with strict casts at least)
}

Cycles (recursion)

interface Node {
    parent: Node | null
}

function depth(obj: Node) {
    if (obj.parent) {
        // `depth` hasn't been checked yet. A restriction should be made
        return depth(obj.parent) + 1
    } else {
        return 0
    }
}

See #29

Others

  • Throw...

any as unknown

To better support existing codebases, any should be treated as unknown. A missing annotation should also be treated as unknown.

Combining restrictions

When two restrictions are made they are combined using &

interface X {
    readable: boolean
}

interface Y {
    name: string
}

function x(p: X) {}
function y(p: Y) {}

function z(param1) {
    x(param1)
    y(param1)
}

param1 is now restricted to X & Y.

This might create impossible types (string & number), which is fine, but a warning should be raised.

Places where constraint inference is done

  • Calling a type
  • Getting a property
  • Subtyping. If x has an unknown restriction and x <: number is evaluated, then x's restriction becomes number

I had everything up to here implemented at one point; the hard part is now:

Nesting

function x(obj) {
     return obj.prop1.prop2.prop3
}

This needs to place an inference condition on an inference restriction 😅

The non-local unknown issue

If unknown is found on a fixed restriction, then a nested dynamic restriction might have to be on a fixed restriction.

interface Something {
    prop: any
}

function doSomethingWithSomething(s: Something) {
    return Math.sin(s.prop)
}

extended from https://github.com/kaleidawave/ezno/blob/main/checker/docs/inference.md

The best way to tackle this is still being worked out; I am interested if people have real-world parameter inference examples they want supported:

Binary operator checking

Currently the functions doing binary operations don't check the types of either side. First a bit of background:

There are two functions for binary operators, split between:

  • Mathematical and bitwise operations. Both sides have to be checked as there is no short-circuiting behavior
  • Logical operators. These can short-circuit (and in the future the RHS can be narrowed for && operations)

in and instanceof operators are handled specially on Environment

There is also a current option strict_casts (it should probably be renamed to no_implicit_casts) in TypeCheckOptions. When this is true, it should not allow operations like "hi" + 2.

These two functions are currently set up to return a Result. However both the logic for checking the sides and the diagnostics have not been implemented. For example:

pub fn evaluate_mathematical_operation(
    lhs: TypeId,
    operator: MathematicalAndBitwise,
    rhs: TypeId,
    types: &mut TypeStore,
    strict_casts: bool,
) -> Result<TypeId, ()> {
    fn attempt_constant_math_operator(
        lhs: TypeId,
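A sketch of where the operand checking could start (the helper is hypothetical):

// under strict casts, mathematical/bitwise operators should only accept
// number operands; "hi" + 2 would return Err for the caller to diagnose
if strict_casts && !(is_number_like(lhs, types) && is_number_like(rhs, types)) {
    return Err(());
}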

There are several things to carefully consider here

  • Still want to enable Symbol.toPrimitive behaviour with hints and such (that will require these functions to have all the context to be able to do a call_type)
    • This should disallow Object toPrimitive fall-through, as its result is not useful

Exporting type from another file

Description

Hello!
First of all thank you for your work, I found out recently about this project and what you've done is really awesome!

I tried to run ezno against a codebase as a test, and I found that export type { ... } from '...' is not supported and is not in to_implement.md either.

AFAIK, this expression should be parsed but ignored, as I think it brings nothing to the analysis.

Improve type printing

Types are printed for error messages etc. Currently printing of types is temporary

pub fn print_type(types: &TypeStore, id: TypeId, env: &GeneralEnvironment) -> String {

Improvements to be made:

  • Take a &mut impl Write so as not to create a String for each type, instead adding to an existing buffer (like the parser's printer/to_string_from_buffer)
  • Take a &mut HashSet<TypeId> and avoid recursion, instead printing ... in loops or smth (both combined in the sketch below)
  • In { "x": ... }, while "x" is the key of an object, it should be prettified to just { x: ... }

Others:

  • Could this method also have a debug mode, which does the same logic but prints additional internal logic?

Hold operators on special struct on root contexts

Operators are handled as methods on an interface (for ease of use and other reasons):

https://github.com/Boshen/oxc/blob/3263f2b654f6e43df321523a71f66e0cb0265ff9/crates/oxc_cli/src/type_check/mod.rs#L14-L20

It may be faster and simpler, when they are found, to add them to a struct like

struct OperatorFunctions {
    add: Option<FunctionType>,
    ...
}

that exists on a root context, rather than every mathematical operation having to do a property lookup.

This would mean that different behaviour for operators is per project and couldn't get more granular, but that shouldn't be a problem.

Type checking recursive/cyclic functions

Eventually want to support

function myRecursiveLoop(i, x) {
    if (i <= 0) {
        return x * 2;
    }
    console.log(i);
    return myRecursiveLoop(i - 1, x);
}

This is one of the most complex type checking problems but is possible with the following

  • Hoisting functions (and variables referencing functions) will mean that references are known while checking the function body.
  • There should be a poly type for the result of calling a function whose body hasn't been synthesized (in the cases of recursion/cycles)
  • For recursion without a type annotation, it has a mutable/inferred restriction. The inferred type should be collected, so that when the recursive body is checked its return type can be checked to satisfy the recursion...
  • The effects under recursive functions need to be handled the same way as for for statements (which needs a writeup, but means backing out of evaluating large loops and shortening to less precise types). apply_effect knows that it is a cycle and needs to not get stuck in an infinite loop
  • When calling mutually recursive functions it also needs to take the poly type into account and not register the two functions as called in a way that breaks tree shaking. e.g.
    function x() { y() }
    function y() { x() }
    both should have no call count

Feel free to comment any code examples you have that you want to work with Ezno's checking features

Make global identifiers relative to source

ExpressionId and other *Id structures can be used as map keys to associate data with AST nodes. In the checker this is used to associate type information with the AST.

Currently, to create them there is a program wide (atomic) counter.

static EXPRESSION_ID_COUNTER: AtomicU16 = AtomicU16::new(1);

With incremental checking/watching, this means the counter could overflow after enough changes, as the program never ends to reset it. It also makes incremental compilation really hard because there is data local to the program's lifecycle that isn't easy to restore.

Fixing this shouldn't make anything faster though, as the atomic increment overhead (compared to non-atomic) is negligible.

A better method is to make the identifiers relative to the source id. Hopefully this could be packaged together with the Span position information, to reuse its source_id field. This would mean re-parsing would write over existing identifiers, so existing identifier-based data should be dropped as it is then invalid. The local counter state would be stored on ParsingState.
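A sketch of the source-relative shape (field sizes illustrative):

struct SourceId(u16);

struct ExpressionId {
    source: SourceId,
    // allocated from a counter on ParsingState, reset when the source is
    // re-parsed
    index: u16,
}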

Function .bind. Then .call and .apply

Function .bind should be implemented as a constant function which creates a new Function type with the ThisValue set to the first argument.

Then once that is added .call and .apply can be added as constant functions

interface Function {
    bind(value): Function performs const bind;

    call(this_ty, ...arguments): any performs {
        return this.bind(this_ty)(...arguments)
    }

    apply(this_ty, arguments): any performs {
        return this.bind(this_ty)(...arguments)
    }
}

something like that

Not sure how checking will work...?

Also, the this_value parameter of call_type should be removed. The only this comes from the type itself.

Add binary type caching back and think about TypeId

Ezno contains a way to represent types (and subsequent other internals) in a binary way. For example the literal type "hello" can be serialized (and stored on disk) as an array of bytes such as [0x01, 0x68, 0x65, 0x6c, 0x6c, 0x6f]. Here 0x01 might be the discriminant/tag of the Type enum and the rest is the UTF-8 (or whatever Rust's .as_bytes() representation is).

This is a substitute for .d.ts files and can have the benefits

  • Represent events and other Ezno-specific things (that are more complicated to write in definition files)
  • Faster
    • Smaller, trivial parsing
    • Can be split apart, run in parallel without hoisting (not doing ATM tho)

.d.ts files are still useful and used for non-cached checking and as a way to write definitions (as the binary is not intuitive to write). These files do not have to be generated by library authors or added to package managers etc.

Handling the serialization and deserialization logic is done with the binary-serialize-derive crate (which is currently under this repo, although might be moved in the future if it isn't specific to ezno-checker). It is a procedural macro that generates the logic for Type <-> [u8]

This was implemented a while ago. But got disabled sometime around the release as type information was split between TypeStore and Environment.

This needs to be added back, but needs some thinking and big adjustments before

  • Function types can reference positions of parameters. If it is split from definition files then there needs to be a way to reference positions in the existing definition files. Not sure how that works with the current SourceId and source_map::filesystem setup. Maybe information from the filesystem also needs to be stored from checking

AST to type mappings

There should be a way to get the type from an AST expression. While initially this was done using unique identifiers, I have a new idea.

Instead of 'id's on the AST, use their source position data! It is unique between AST nodes (of the same kind) and already exists, so there is no need to make AST structs larger. When synthesising expression AST it can use the position/span as a key for assigning a type value. A span is equivalent to a Range<u32>, so it will work with anything that can convert to it. And any AST formats given to Ezno will have position information.

Retrieving these types is useful when doing optimisations after the type synthesis.

And they are also used for retrieving data in the LSP...

Finding a type mapping through looking in a range

In the LSP, a hover request sends a position in the source. A naive way of mapping positions would be HashMap<Span, TypeId>; however when searching it would have to iterate through each key in the map. While it would be per-source in the LSP, it still might be a costly iteration. Instead I am thinking of a data structure which keys values not on whether they match the range exactly, but on whether they fall in the range. For example, in the following there are several ranges.

let x = 3, y = 5;
Type   x + y + 2
3      -
5          -
2              -
8      -----
10     ---------

Given a scalar index it should be able to find the closest mapping.

A quick prototype of the data structure

Here is what I came up with to solve the problem

use std::collections::BTreeMap;
use std::fmt::Debug;
use std::ops::Range;

#[derive(Debug)]
pub struct RangeMap<T> {
    entries: BTreeMap<u32, Vec<(u32, T)>>,
}

impl<T> RangeMap<T> {
    pub fn new() -> Self {
        Self {
            entries: Default::default(),
        }
    }

    pub fn push(&mut self, range: Range<u32>, item: T) {
        if let Some(existing) = self.entries.get_mut(&range.start) {
            existing.push((range.end, item));
        } else {
            self.entries.insert(range.start, vec![(range.end, item)]);
        }
    }

    pub fn get(&self, point: u32) -> Option<&T> {
        self.entries
            .range(0..(point + 1))
            .rev()
            .find_map(|(_, v)| v.iter().find_map(|(e, v)| (*e > point).then_some(v)))
    }

    pub fn get_exact(&self, range: Range<u32>) -> Option<&T> {
        self.entries
            .get(&range.start)
            .and_then(|v| v.iter().find_map(|(e, v)| (*e == range.end).then_some(v)))
    }
}
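Usage for the example above (byte offsets illustrative). get walks the start keys at or before the point from highest to lowest, returning the first pushed entry whose end is past the point:

let mut map: RangeMap<u32> = RangeMap::new();
// spans for `x + y + 2`
map.push(0..1, 3);  // x
map.push(4..5, 5);  // y
map.push(8..9, 2);  // 2
map.push(0..5, 8);  // x + y
map.push(0..9, 10); // x + y + 2

assert_eq!(map.get(0), Some(&3));  // on x
assert_eq!(map.get(2), Some(&8));  // on the first +
assert_eq!(map.get(6), Some(&10)); // on the second +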

After writing this I found btree-range-map, but I think it is more complicated than the above.

Note this works because while they can intersect, any intersections are encapsulated (no overhangs).

$$ S_1 \cap S_2 \neq \emptyset \implies S_1 \subset S_2 \vee S_2 \subset S_1 $$

Diagnostics container as a trait / callback

Currently all diagnostics (type checking errors, warnings, information items) are buffered into a vector which is output.

There could instead be a case where the errors are passed to a callback that immediately prints them or raises them to some process.

This could:

  • Use references or format_args! to reduce allocation
  • Use threads. Type checking could continue while another writes to stdout
  • Reduce allocating a large vector (in the case where a codebase has A LOT of errors)

The ReadFromFS trait / generic parameter used throughout the checker could be made into a broader CheckingStuff-style trait that is custom to the checking environment (CLI, WASM project, LSP etc). A more general trait would mean less code rewriting and avoid a third (unnecessary) generic parameter.
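A sketch of the callback idea (trait and type names hypothetical):

struct Diagnostic; // stand-in for the checker's diagnostic type

trait DiagnosticsHandler {
    fn report(&mut self, diagnostic: Diagnostic);
}

// the current buffering behaviour is one implementation...
impl DiagnosticsHandler for Vec<Diagnostic> {
    fn report(&mut self, diagnostic: Diagnostic) {
        self.push(diagnostic);
    }
}

// ...while a CLI could print immediately instead of allocating a large vector
struct PrintImmediately;

impl DiagnosticsHandler for PrintImmediately {
    fn report(&mut self, _diagnostic: Diagnostic) {
        eprintln!("error: ...");
    }
}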

Excess property checks (including the effects of spreads)

It would be nice to allow Ezno to have more guards and checks than TypeScript. One of those is performing excess property checks on spreading (which TypeScript doesn't do); link to the issue: microsoft/TypeScript#39998

I'm pretty sure there are other common footguns that TypeScript has which would be nice to avoid here with stricter typing. Great work!

Or / sum on the RHS type equality

subtyping.rs currently implements subtyping. Subtyping (sometimes expressed as type1 <: type2) represents whether the type on the RHS satisfies all the properties of the LHS type.

For or / sum / union types the following logic happens.
If the LHS is an or type, then the RHS has to subtype either the left or the right side of the union.

So for a | b <: b (let x: number | string = 4);

a | b <: b --> a <: b || b <: b

The following is true as the RHS is true.

The problem is what happens if the Or type is on the RHS. I will elaborate on cases later, but it is currently broken.

Record which context closed over variables are in

This probably creates issues, or at least isn't particularly optimal:

// TODO temp, might need to set something else. Doesn't work deep
let facts = target.get_top_level_facts(environment);
for closure_id in type_arguments
    .closure_id
    .iter()
    .chain(type_arguments.structure_arguments.iter().flat_map(|s| s.closures.iter()))
{
    facts
        .closure_current_values
        .insert((*closure_id, RootReference::Variable(variable)), new_value);
}
