
pikelet-lang / pikelet


A friendly little systems language with first-class types. Very WIP! 🚧 🚧 🚧

Home Page: https://pikelet-lang.github.io/pikelet/

License: Apache License 2.0

Languages: Rust 99.51%, JavaScript 0.49%

Topics: bidirectional-typechecking, compiler, dependent-record-types, dependent-records, dependent-types, programming-language, rust, systems-language, systems-programming, type-system, type-theory, typechecker

pikelet's People

Contributors

adrianwong, boomshroom, brendanzab, dependabot-support, dependabot[bot], heyrutvik, memoryruins, phase, ratmice, repnop, tzemanovic


pikelet's Issues

Dependent Records (first pass)

This is a cut-down set of requirements for dependent records, in contrast to the more lofty goals described in issue #2.

The following additions will be made to the core syntax:

R ::= l : T, R  -- field type annotation
    | ε         -- empty record type

r ::= l = t, r  -- field value definition
    | ε         -- empty record value

t,T ::= ...
      | Record { R }   -- record types
      | record { r }   -- record values
      | t.l            -- projection
  • The same label cannot be used multiple times in the same record.
  • For now we will be picky about field order, i.e. Record { x : I8, y : I8 } will not be compatible with Record { y : I8, x : I8 }, and record { x = 1, y = 2 } : Record { y : I8, x : I8 } is a type error.
  • Field type annotations introduce new values into the type checking context, and may be used in the type annotations of later fields.
  • Field value definitions introduce new values into the evaluation context, to be used when constructing later fields.
  • Care must be taken to preserve these substitutions when projecting fields (see the sketch below).
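
As a rough illustration of the last three points, here is a minimal sketch of order-sensitive record checking in Rust, with hypothetical, drastically simplified types (the real type checker works over the core syntax and a proper context). Earlier field values are substituted into later field types, so checking record { len = 2, data = .. } : Record { len : U64, data : Array String len } first solves len and then checks data at Array String 2:

use std::collections::HashMap;

// Hypothetical simplified types: a field type may mention an earlier label.
#[derive(Clone, Debug)]
enum Ty {
    U64,
    String,
    Array(Box<Ty>, Len), // element type and a length index
}

#[derive(Clone, Debug)]
enum Len {
    Lit(u64),
    Label(&'static str), // refers to an earlier field by label
}

#[derive(Clone, Debug)]
enum Val {
    U64(u64),
    Str(String),
    Arr(Vec<Val>),
}

// Check record fields left to right, threading an environment of the field
// values defined so far, and substituting them into later field types.
fn check_record(
    fields: &[(&'static str, Val)],
    field_tys: &[(&'static str, Ty)],
) -> Result<(), String> {
    if fields.len() != field_tys.len() {
        return Err("mismatched number of fields".to_owned());
    }
    let mut env: HashMap<&str, Val> = HashMap::new();
    for ((label, value), (ty_label, ty)) in fields.iter().zip(field_tys.iter()) {
        if label != ty_label {
            return Err(format!("expected field `{}`, found `{}`", ty_label, label));
        }
        let ty = subst(ty, &env);
        check(value, &ty)?;
        env.insert(*label, value.clone()); // later field types may mention it
    }
    Ok(())
}

// Substitute known field values into a type (only array lengths, for brevity).
fn subst(ty: &Ty, env: &HashMap<&str, Val>) -> Ty {
    match ty {
        Ty::Array(elem, Len::Label(l)) => match env.get(l) {
            Some(Val::U64(n)) => Ty::Array(elem.clone(), Len::Lit(*n)),
            _ => ty.clone(),
        },
        _ => ty.clone(),
    }
}

fn check(value: &Val, ty: &Ty) -> Result<(), String> {
    match (value, ty) {
        (Val::U64(_), Ty::U64) => Ok(()),
        (Val::Str(_), Ty::String) => Ok(()),
        (Val::Arr(elems), Ty::Array(elem_ty, Len::Lit(len)))
            if elems.len() as u64 == *len =>
        {
            elems.iter().try_for_each(|e| check(e, elem_ty))
        }
        _ => Err(format!("mismatched value and type: {:?} : {:?}", value, ty)),
    }
}

fn main() {
    // record { len = 2, data = .. } : Record { len : U64, data : Array String len }
    let fields = [
        ("len", Val::U64(2)),
        ("data", Val::Arr(vec![Val::Str("hi".into()), Val::Str("pikelet".into())])),
    ];
    let field_tys = [
        ("len", Ty::U64),
        ("data", Ty::Array(Box::new(Ty::String), Len::Label("len"))),
    ];
    assert!(check_record(&fields, &field_tys).is_ok());
}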

User-defined infix operators

It would be nice to have operators for:

  • forward and reverse function composition: (<<), (>>)
  • forward and reverse function application: (<|), (|>)
  • mathematical operations: (+), (-), (*), (/)
  • string concatenation: (++)

Resources

Use "goldenfiles" for some tests

At the moment I'm explicitly writing out ASTs manually in some of my tests. This is tedious to update when I change the AST, and it would be handy to just get a diff of the changes automatically generated instead. This would allow us to be more liberal with our testing too. 'Goldenfiles' (also known as 'snapshot testing') seem like a good answer to this. There is a crate called goldenfile that does this for Rust.

Here are some tests that I think could be improved with this method:

We could probably add some more tests after that once we set up a nice, repeatable process for this.
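
A minimal sketch of what such a test could look like with the goldenfile crate, reusing the parse_term helper from the existing tests (the tests/goldenfiles layout and the AST implementing Debug are assumptions):

use std::io::Write;

use goldenfile::Mint;

#[test]
fn parse_lambda_golden() {
    let mut codemap = CodeMap::new();
    let mut mint = Mint::new("tests/goldenfiles");
    let mut golden = mint.new_goldenfile("parse_lambda.txt").unwrap();

    let ast = parse_term(&mut codemap, r"\(a : Type) (x : a) => x");
    // Write the debug-printed AST: when `mint` is dropped it diffs this
    // output against the checked-in golden file and fails on any mismatch,
    // so changing the AST only requires regenerating the golden files.
    write!(golden, "{:#?}", ast).unwrap();
}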

Error recovery in the type checker

We should be able to run programs that:

  • are broken syntactically
  • are partially type-incorrect

We should also be able to see more than one type error message.

At the moment we recover from syntactic errors, inserting concrete::Term::Error variants in the places that we fail to parse. But then we blow up when we try to desugar these into RawTerms. Instead we should probably insert a static-error : {a : Type} -> String -> a function in those places...
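
A hedged sketch of that recovery, using hypothetical stand-in AST types rather than the real concrete and raw modules:

// Hypothetical stand-ins for the real `concrete` and `raw` syntax trees.
#[derive(Debug)]
enum ConcreteTerm {
    Var(String),
    // The parser inserts this placeholder where it failed to parse.
    Error(std::ops::Range<usize>),
}

#[derive(Debug)]
enum RawTerm {
    Var(String),
    App(Box<RawTerm>, Box<RawTerm>),
    StringLit(String),
}

fn desugar(term: &ConcreteTerm) -> RawTerm {
    match term {
        ConcreteTerm::Var(name) => RawTerm::Var(name.clone()),
        // Instead of blowing up, elaborate the placeholder to an application
        // of `static-error : {a : Type} -> String -> a`, which checks at any
        // type and lets the rest of the program keep type checking.
        ConcreteTerm::Error(span) => RawTerm::App(
            Box::new(RawTerm::Var("static-error".to_owned())),
            Box::new(RawTerm::StringLit(format!("parse error at {:?}", span))),
        ),
    }
}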

Rename signed integer types

I'm proposing that we rename the signed integer types from I8, I16, I32, I64 to S8, S16, S32, S64. I was originally following Rust's lead with the naming of signed integers, but I feel like an S prefix is more consistent with the unsigned U8, U16, U32, U64 types.

If you do help out on this one, please be sure to update the book as well! 😄

Turn theory appendix in book into the Pikelet specification

I've been looking at the Web Assembly specification, and have been very impressed. I'm thinking it might be nice to follow its lead with regard to notation and approach. At the moment I just use the formal notation on the theory page, but it would be nice to be able to expand it into a larger specification.

Static site generators

I'm not sure what we should use for documentation. mdBook has been nice, but maths notation support in pulldown-cmark is lacking right now. It would be lovely if pulldown-cmark supported setting the class of inline and block elements, kind of like: {katex}`\Gamma \vdash x : \tau` - this way it would also render ok in GitHub's previews, but I don't know what it would take to get that merged.

The features I need are:

  • Math support
  • The ability to add custom syntax highlighting - currently I use a fork of highlight.js for this

So yeah, here are some other options available to us:

| Tool       | Language   | Markup language  | Math support             | Custom syntax highlighting | Docs theme | Boots to SPA | Versions | PDF export |
|------------|------------|------------------|--------------------------|----------------------------|------------|--------------|----------|------------|
| Gutenberg  | Rust       | Markdown         | No (could be added)      | Awkward                    | Yes        | No           | No       | No         |
| Hakyll     | Haskell    | Markdown         | MathJax or KaTeX         | Unknown                    | No         | No           | No       | No         |
| Slick      | Haskell    | Markdown         | MathJax or KaTeX         | Unknown                    | No         | No           | No       | No         |
| Sphinx     | Python     | reStructuredText | MathJax or KaTeX         | Unknown                    | Yes        | No           | Yes      | Yes        |
| VuePress   | JavaScript | Markdown         | KaTeX (server rendered)  | Yes                        | Yes        | Yes          | No       | No         |
| Docusaurus | JavaScript | Markdown         | KaTeX (server rendered)  | Yes                        | Yes        | No           | Yes      | No         |

Separate repository

Should we put the spec and/or site in a separate repository? Or continue with the mono-repo approach?

Existing Specs

Add facilities for debugging/visualising the typechecker

Currently it's hard to get much insight into what the type checker is doing when it is checking a term, even with a debugger. This is a barrier to entry for new developers trying to gain an understanding of how the Pikelet type checking algorithm works. That understanding is important if we want more developers to be able to confidently contribute fixes and features to the type checker. Better visibility could also help us fix bugs and spot opportunities for optimizations.

Such a debugging facility could output a directed graph showing how a Pikelet term is checked. We could potentially use Graphviz, or perhaps HTML, for this. Console logging is another option.
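
A minimal sketch of the Graphviz option: a tracer that records one node per judgement and one edge per recursive call, then prints a DOT graph (hypothetical names, not wired into the actual type checker):

struct Tracer {
    next_id: usize,
    lines: Vec<String>,
}

impl Tracer {
    fn new() -> Self {
        Tracer { next_id: 0, lines: Vec::new() }
    }

    /// Record a node for one judgement (e.g. `check t : T`), returning its id.
    fn node(&mut self, label: &str) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        self.lines.push(format!("  n{} [label={:?}];", id, label));
        id
    }

    /// Record that judgement `parent` invoked judgement `child`.
    fn edge(&mut self, parent: usize, child: usize) {
        self.lines.push(format!("  n{} -> n{};", parent, child));
    }

    fn finish(&self) -> String {
        format!("digraph typecheck {{\n{}\n}}", self.lines.join("\n"))
    }
}

fn main() {
    let mut t = Tracer::new();
    let root = t.node("check (\\x => x) : String -> String");
    let body = t.node("infer x");
    t.edge(root, body);
    println!("{}", t.finish()); // pipe into `dot -Tsvg` to visualise
}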

Finish implementing Core->Concrete resugaring

This will allow us to remove all the excess pretty printers that we currently have to write for the core representations, by just pretty printing the concrete representation. Perhaps we could keep the core pretty printers around for debugging purposes, but use a simpler s-expression syntax that doesn't require stuff like precedences to be calculated.

  • Insert parens in the proper places
  • Use a simpler s-expression representation for the core syntax (see the sketch below)
  • Rename generated names (#6), taking care to avoid accidental name captures
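
A minimal sketch of the s-expression idea, over a hypothetical simplified core syntax: every node prints uniformly, so no precedence or parenthesisation logic is needed:

// Hypothetical stand-in for the real core syntax.
enum Term {
    Var(String),
    Lam(String, Box<Term>),
    App(Box<Term>, Box<Term>),
    Universe(u32),
}

fn to_sexpr(term: &Term) -> String {
    match term {
        Term::Var(name) => name.clone(),
        Term::Lam(name, body) => format!("(lam {} {})", name, to_sexpr(body)),
        Term::App(f, arg) => format!("(app {} {})", to_sexpr(f), to_sexpr(arg)),
        Term::Universe(level) => format!("(type {})", level),
    }
}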

Effects: Pikelet should actually be useful!

At the moment Pikelet does not do anything useful. In order to be more useful, we probably need some way to perform effects.

Initially we could use an Io : Type -> Type wrapper type for this.

Eventually I would like us to move to something like an algebraic effect system. The limiting factor is that it must be zero-cost, i.e. we need to make sure that any reification of effects can be reliably erased or partially evaluated away at compile time. This should result in codegen identical to what one would get from the same code written in an imperative language.

Resources

yallop/effects-bibliography has lots of links.

Here are some papers that are more specifically about integrating dependent types and effects:

  • D. Ahman, Handling Fibred Algebraic Effects. (Paper, Agda code)
  • D. Ahman, N. Ghani, G. Plotkin. Dependent Types and Fibred Computational Effects. (Paper)
  • Y. Cong, K. Asai, Handling Delimited Continuations with Dependent Types (Paper, Video)
  • C. McBride. Do be do be do. (Paper)
  • E. Miquey. A Classical Sequent Calculus with Dependent Types (Paper)
  • F. Nielson, H. R. Nielson. Type and Effect Systems. (Paper)
  • P.-M. Pédrot. Taming Effects in a Dependent World (Slides)
  • P.-M. Pédrot, N. Tabareau. Failure is Not an Option: An Exceptional Type Theory. (Paper, Slides)
  • A. Rossberg. 1ML with Special Effects: F-ing Generativity Polymorphism. (Paper)

Refactor the compiler to use visitors

I poke fun at expression parser/evaluator written in OO style, because really the visitor pattern is just asking for it. But I have a weird, almost morbid fondness for such things written in plain old ANSI C. It’s complicated between me and C don’t ask me about it I’m not ready

- Tran Ma 🦖

At the moment we create lots of intermediate data structures between the stages in the compiler, and it would be nice to cut that down!

Using the visitor pattern might make things more efficient by allowing us to fuse various traversals into single passes. Ultimately it might even be a good idea (or even necessary) to push the visitor abstraction back to the nameless library as well. The challenge would be to continue to maintain a nice, readable codebase in the process.
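
A minimal sketch of what a visitor over a (hypothetical, simplified) core syntax might look like, with a one-pass variable counter as an example; the real design would also need to thread binding information from the nameless/moniker machinery through the traversal:

enum Term {
    Var(String),
    Lam(String, Box<Term>),
    App(Box<Term>, Box<Term>),
}

trait TermVisitor {
    type Output;
    fn visit_var(&mut self, name: &str) -> Self::Output;
    fn visit_lam(&mut self, name: &str, body: &Term) -> Self::Output;
    fn visit_app(&mut self, f: &Term, arg: &Term) -> Self::Output;
}

impl Term {
    fn accept<V: TermVisitor>(&self, v: &mut V) -> V::Output {
        match self {
            Term::Var(name) => v.visit_var(name),
            Term::Lam(name, body) => v.visit_lam(name, body),
            Term::App(f, arg) => v.visit_app(f, arg),
        }
    }
}

// Example visitor: count variable occurrences in a single traversal.
struct CountVars(usize);

impl TermVisitor for CountVars {
    type Output = usize;
    fn visit_var(&mut self, _name: &str) -> usize {
        self.0 += 1;
        self.0
    }
    fn visit_lam(&mut self, _name: &str, body: &Term) -> usize {
        body.accept(self)
    }
    fn visit_app(&mut self, f: &Term, arg: &Term) -> usize {
        f.accept(self);
        arg.accept(self)
    }
}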

Resources

Variable not bound during evaluation

Found during a demo at the Melbourne Haskell User Group:

Pikelet> (\(t : Type) (f : (a : Type) -> a -> a) => f t) String (\(a : Type) (x : a) => x)
error: internal compiler error: cannot find `t$1` in scope
- <repl>:1:46: not found in this scope

If we manually apply the first argument it seems to be fine:

Pikelet> (\(f : (a : Type) -> a -> a) => f String) (\(a : Type) (x : a) => x)
\x$12 : #String => x$12 : ($7 : #String) -> #String

Thanks @andykitchen for finding this!

Don't panic on mismatched number of record fields

The following expressions illustrate the problem:

record { x = "hello" } : Record { x : String; y : String }
record { x = "hello"; y = "hello" } : Record { x : String }

These both cause the following panic:

panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `1`', /../moniker-0.3.2/src/scope.rs:87:13

This is due to the following call to Scope::unbind2 in the type checker:

        // C-RECORD
        (&raw::Term::Record(span, ref raw_scope), &Value::RecordType(ref raw_ty_scope)) => {
            let (raw_fields, (), raw_ty_fields, ()) =
                Scope::unbind2(raw_scope.clone(), raw_ty_scope.clone());

Scope::unbind2 should probably return a result in this case, but for now we could work around this by adding the following check:

        // C-RECORD
        (&raw::Term::Record(span, ref raw_scope), &Value::RecordType(ref raw_ty_scope)) => {
            if raw_scope.unsafe_pattern.unsafe_patterns.len()
                != raw_ty_scope.unsafe_pattern.unsafe_patterns.len()
            {
                return Err(unimplemented!("mismatched fields"));
            }

            let (raw_fields, (), raw_ty_fields, ()) =
                Scope::unbind2(raw_scope.clone(), raw_ty_scope.clone());

unimplemented!("mismatched fields") should be replaced with an appropriate error diagnostic, and we'll need some tests to cover this as well! Here are some tests that could go in the check_term.rs file:

#[test]
fn record_field_mismatch_lt() {
    let mut codemap = CodeMap::new();
    let tc_env = TcEnv::default();

    let expected_ty = r"Record { x : String; y : String }";
    let given_expr = r#"record { x = "hello" }"#;

    let expected_ty = parse_nf_term(&mut codemap, &tc_env, expected_ty);
    match check_term(&tc_env, &parse_term(&mut codemap, given_expr), &expected_ty) {
        Err(TypeError::/* your error variant here! */ { .. }) => {},
        Err(err) => panic!("unexpected error: {:?}", err),
        Ok(term) => panic!("expected error but found: {}", term),
    }
}

#[test]
fn record_field_mismatch_gt() {
    let mut codemap = CodeMap::new();
    let tc_env = TcEnv::default();

    let expected_ty = r"Record { x : String }";
    let given_expr = r#"record { x = "hello"; y = "hello" }"#;

    let expected_ty = parse_nf_term(&mut codemap, &tc_env, expected_ty);
    match check_term(&tc_env, &parse_term(&mut codemap, given_expr), &expected_ty) {
        Err(TypeError::/* your error variant here! */ { .. }) => {},
        Err(err) => panic!("unexpected error: {:?}", err),
        Ok(term) => panic!("expected error but found: {}", term),
    }
}

Implement a Language Server and build an extension for VS Code

Currently Pikelet is implemented in a very traditional way - i.e. it consumes a source string, does some processing, and spits something out. This isn't a very good approach for supporting interactive editing, however, and the more modern, query-driven approaches to language implementation are still sparsely documented.

One way to make some of the requirements for this more concrete is to try our hand at implementing a language server that communicates with a VS Code extension via the LSP. This could help direct these efforts before we accumulate too much tech debt!

Plan

  1. create a skeleton language extension for VS Code
  2. implement a syntax highlighter
  3. build a rudimentary language server using languageserver-types and codespan-lsp, possibly upstreaming helpful stuff to codespan. Others are currently working on similar problems (which is a good thing!), so we should coordinate our work on the codespan Gitter and on brendanzab/codespan#8.
  4. Add integration tests for the language server

TODO: Improve this plan!

Resources

Terminology bikeshed: Universe shifting/lifting/embiggening

So I now have some sort of McBride-style universe shifting implemented in Pikelet (excuse my poor technical writing). I was wondering about terminology here. Should we call the foo^n a:

Wondering if you have any strong thoughts @jonsterling, @pigworker?

Prior discussion and implementation here: #10, #125

cc. @LightAndLight

Add some primitive operations

Here's a list of some primitive functions that would be useful to have:

  • #String-eq : #String -> #String -> #Bool
  • #String-lt : #String -> #String -> #Bool
  • #String-le : #String -> #String -> #Bool
  • #String-ge : #String -> #String -> #Bool
  • #String-gt : #String -> #String -> #Bool
  • #String-append : #String -> #String -> #String
  • #Char-eq : #Char -> #Char -> #Bool
  • #Char-lt : #Char -> #Char -> #Bool
  • #Char-le : #Char -> #Char -> #Bool
  • #Char-ge : #Char -> #Char -> #Bool
  • #Char-gt : #Char -> #Char -> #Bool
  • #Char-to-string : #Char -> #String
  • #U8-eq : #U8 -> #U8 -> #Bool
  • #U8-lt : #U8 -> #U8 -> #Bool
  • #U8-le : #U8 -> #U8 -> #Bool
  • #U8-ge : #U8 -> #U8 -> #Bool
  • #U8-gt : #U8 -> #U8 -> #Bool
  • #U8-add : #U8 -> #U8 -> #U8
  • #U8-sub : #U8 -> #U8 -> #U8
  • #U8-mul : #U8 -> #U8 -> #U8
  • #U8-div : #U8 -> #U8 -> #U8
  • #U8-to-string : #U8 -> #String
  • #U16-eq : #U16 -> #U16 -> #Bool
  • #U16-lt : #U16 -> #U16 -> #Bool
  • #U16-le : #U16 -> #U16 -> #Bool
  • #U16-ge : #U16 -> #U16 -> #Bool
  • #U16-gt : #U16 -> #U16 -> #Bool
  • #U16-add : #U16 -> #U16 -> #U16
  • #U16-sub : #U16 -> #U16 -> #U16
  • #U16-mul : #U16 -> #U16 -> #U16
  • #U16-div : #U16 -> #U16 -> #U16
  • #U16-to-string : #U16 -> #String
  • #U32-eq : #U32 -> #U32 -> #Bool
  • #U32-lt : #U32 -> #U32 -> #Bool
  • #U32-le : #U32 -> #U32 -> #Bool
  • #U32-ge : #U32 -> #U32 -> #Bool
  • #U32-gt : #U32 -> #U32 -> #Bool
  • #U32-add : #U32 -> #U32 -> #U32
  • #U32-sub : #U32 -> #U32 -> #U32
  • #U32-mul : #U32 -> #U32 -> #U32
  • #U32-div : #U32 -> #U32 -> #U32
  • #U32-to-string : #U32 -> #String
  • #U64-eq : #U64 -> #U64 -> #Bool
  • #U64-lt : #U64 -> #U64 -> #Bool
  • #U64-le : #U64 -> #U64 -> #Bool
  • #U64-ge : #U64 -> #U64 -> #Bool
  • #U64-gt : #U64 -> #U64 -> #Bool
  • #U64-add : #U64 -> #U64 -> #U64
  • #U64-sub : #U64 -> #U64 -> #U64
  • #U64-mul : #U64 -> #U64 -> #U64
  • #U64-div : #U64 -> #U64 -> #U64
  • #I8-eq : #I8 -> #I8 -> #Bool
  • #I8-lt : #I8 -> #I8 -> #Bool
  • #I8-le : #I8 -> #I8 -> #Bool
  • #I8-ge : #I8 -> #I8 -> #Bool
  • #I8-gt : #I8 -> #I8 -> #Bool
  • #I8-add : #I8 -> #I8 -> #I8
  • #I8-sub : #I8 -> #I8 -> #I8
  • #I8-mul : #I8 -> #I8 -> #I8
  • #I8-div : #I8 -> #I8 -> #I8
  • #I8-to-string : #I8 -> #String
  • #I16-eq : #I16 -> #I16 -> #Bool
  • #I16-lt : #I16 -> #I16 -> #Bool
  • #I16-le : #I16 -> #I16 -> #Bool
  • #I16-ge : #I16 -> #I16 -> #Bool
  • #I16-gt : #I16 -> #I16 -> #Bool
  • #I16-add : #I16 -> #I16 -> #I16
  • #I16-sub : #I16 -> #I16 -> #I16
  • #I16-mul : #I16 -> #I16 -> #I16
  • #I16-div : #I16 -> #I16 -> #I16
  • #I16-to-string : #I16 -> #String
  • #I32-eq : #I32 -> #I32 -> #Bool
  • #I32-lt : #I32 -> #I32 -> #Bool
  • #I32-le : #I32 -> #I32 -> #Bool
  • #I32-ge : #I32 -> #I32 -> #Bool
  • #I32-gt : #I32 -> #I32 -> #Bool
  • #I32-add : #I32 -> #I32 -> #I32
  • #I32-sub : #I32 -> #I32 -> #I32
  • #I32-mul : #I32 -> #I32 -> #I32
  • #I32-div : #I32 -> #I32 -> #I32
  • #I32-to-string : #I32 -> #String
  • #I64-eq : #I64 -> #I64 -> #Bool
  • #I64-lt : #I64 -> #I64 -> #Bool
  • #I64-le : #I64 -> #I64 -> #Bool
  • #I64-ge : #I64 -> #I64 -> #Bool
  • #I64-gt : #I64 -> #I64 -> #Bool
  • #I64-add : #I64 -> #I64 -> #I64
  • #I64-sub : #I64 -> #I64 -> #I64
  • #I64-mul : #I64 -> #I64 -> #I64
  • #I64-div : #I64 -> #I64 -> #I64
  • #F32-eq : #F32 -> #F32 -> #Bool
  • #F32-lt : #F32 -> #F32 -> #Bool
  • #F32-le : #F32 -> #F32 -> #Bool
  • #F32-ge : #F32 -> #F32 -> #Bool
  • #F32-gt : #F32 -> #F32 -> #Bool
  • #F32-add : #F32 -> #F32 -> #F32
  • #F32-sub : #F32 -> #F32 -> #F32
  • #F32-mul : #F32 -> #F32 -> #F32
  • #F32-div : #F32 -> #F32 -> #F32
  • #F32-to-string : #F32 -> #String
  • #F64-eq : #F64 -> #F64 -> #Bool
  • #F64-lt : #F64 -> #F64 -> #Bool
  • #F64-le : #F64 -> #F64 -> #Bool
  • #F64-ge : #F64 -> #F64 -> #Bool
  • #F64-gt : #F64 -> #F64 -> #Bool
  • #F64-add : #F64 -> #F64 -> #F64
  • #F64-sub : #F64 -> #F64 -> #F64
  • #F64-mul : #F64 -> #F64 -> #F64
  • #F64-div : #F64 -> #F64 -> #F64

Allow files to be loaded from the REPL

We should be able to load files in the REPL, either by passing a list of filenames on startup, or via a :load/:l command while the REPL is running:

cargo run repl ./src/library/prelude.pi
Pikelet> :load ./src/library/prelude.pi
Pikelet> id String "hi"
"hi" : #String

We should also be able to reload files using :reload/:r:

Pikelet> :load ./src/library/prelude.pi
Pikelet> :reload
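
A minimal sketch of how the command dispatch could look (Repl and load_file are hypothetical stand-ins for the real REPL state):

struct Repl {
    loaded_files: Vec<String>,
}

impl Repl {
    fn handle_command(&mut self, line: &str) {
        let mut parts = line.split_whitespace();
        match parts.next() {
            Some(":load") | Some(":l") => {
                for path in parts {
                    self.loaded_files.push(path.to_owned());
                    self.load_file(path);
                }
            }
            Some(":reload") | Some(":r") => {
                // Re-check every previously loaded file, in order.
                for path in self.loaded_files.clone() {
                    self.load_file(&path);
                }
            }
            _ => println!("unknown command: {}", line),
        }
    }

    fn load_file(&mut self, path: &str) {
        // Hypothetical: parse and type check the file, adding its
        // declarations to the REPL environment.
        println!("loading {}", path);
    }
}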

Use arena for allocating AST nodes

This might be faster than the reference counting we currently use in the core AST. We'll have to take care not to let it adversely impact the readability of the type checker, however.
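
A minimal sketch of the idea using the typed-arena crate: nodes borrow from the arena rather than being reference counted, and the whole AST is freed at once when the arena is dropped:

use typed_arena::Arena;

// Nodes hold references into the arena instead of Rc pointers.
enum Term<'a> {
    Var(&'a str),
    App(&'a Term<'a>, &'a Term<'a>),
}

fn build<'a>(arena: &'a Arena<Term<'a>>) -> &'a Term<'a> {
    let f = arena.alloc(Term::Var("f"));
    let x = arena.alloc(Term::Var("x"));
    arena.alloc(Term::App(f, x))
}

fn main() {
    let arena = Arena::new();
    let term = build(&arena);
    if let Term::App(_, _) = term {
        println!("built `f x` in the arena");
    }
} // arena (and every Term in it) is dropped here in one go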

Remove block comments

PR #32 introduced block comments to the lexer. It would be nice if these nested too! This would then admit the following input:

{- 
    {- ... -} 
    {- 
       {- ... -} 
    -}
-}

We have decided instead to remove the block comments entirely!

Dependent records

One big motivation I've had for going with dependent types in the first place is the idea of being able to use the same language feature (dependent records) for both modules and data types. This is similar to what is proposed in 1ML, but I would prefer not to jump through the hoops of forcing the language into System Fω, and rather just go straight to dependent types, seeing as they are such a handy tool to have in the first place.

These would be super handy for representing modules. Eg.

Graph (a : Type) = Record {
    Node : Type,
    Edge : Type,

    has-edge : a -> Node -> Node -> Bool,
    edges : a -> Node -> List Edge,
}

MyGraph = DArray (Int, Int)

MyGraph-graph : Graph MyGraph
MyGraph-graph = record {
    Node = Int,
    Edge = (Int, Int),

    has-edge = _,
    edges = _,
}

This would also allow us to hide the length of arrays, to get something like Rust's Vec type:

DArray (a : Type) = Record {
    len : U64,
    data : Array a len,
};

Relationship with Sigma types

A very simple way of representing dependent records would be to use dependent pairs (sigma types):

Unit : Type
{} : Unit
{e1, e2} : (x : t1) * t2
left : (x : t1) * t2 -> t1
right : (x : t1) * t2 -> t2

Where t2 may depend on the value x of type t1.

The trouble with this formulation is that labels are not significant when comparing for equality at the type and value level.

Structural vs Nominal

I would kind of prefer to go structural rather than nominal with this, but it seems that most dependently typed languages use nominal records. I'm guessing this is because it makes checking if two record types are compatible much easier. If we went structural we might need to do some complex subtyping and coercions to handle the fact that fields might be mentioned out of order.

In his thesis on Agda, Ulf Norell says:

In practise, however, it is a good idea to let each record declaration introduce a new type. This means that two record types declared to have the same fields will be different, but they will have the same elements. One advantage of this is that it significantly improves the efficiency of checking equality between record types—instead of comparing the types of all the fields, it is enough to compare the names. It is also good programming practise to keep intentionally different types separate in the type system.

Data layout

It would be nice if records were efficiently packed in memory by default, like Rust does it. The unboxed types issue #18 is related.

A potential issue with a non-uniform memory representation is that, as @ollef points out, if you have a record { x : A, y : B x, z : C }, then the memory offset of z depends on x. We might be able to do some size analysis though, and require non-statically known sizes to have a pointer indirection.

If we allow structural types, data layout might become trickier. We might need to have a consistent 'normalized field order' for records. Things would become even trickier if row polymorphism were brought into the mix.

Other stuff to consider wrt. data layout is structs of arrays vs. arrays of structs (see the sketch below). Perhaps our compile-time reflection would be powerful enough to auto-derive this, but languages like Jai have it built in.

This paper talks about some interesting ways to parameterise over data layout: https://www.doc.ic.ac.uk/~scd/ShapesOnwards.pdf

Here is a book going through Data Oriented Design.
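
For concreteness, here is the struct-of-arrays vs. array-of-structs distinction in Rust; in Rust the transformation is done by hand (or by a derive macro), whereas the hope above is that compile-time reflection could derive it:

// Array of structs: each particle's fields are adjacent in memory.
struct Particle {
    position: [f32; 3],
    velocity: [f32; 3],
}
type ParticlesAos = Vec<Particle>;

// Struct of arrays: each field gets its own densely packed array, which can
// be friendlier to caches and SIMD when a pass touches only one field.
struct ParticlesSoa {
    positions: Vec<[f32; 3]>,
    velocities: Vec<[f32; 3]>,
}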

Row polymorphism

It would be nice to support row polymorphism, but I have no idea how this would look.

Haven't thought much about it. That paper uses subtyping. For row typing you might be able to do like { Γ; Θ={ y₀:B₀; ⋯ yₙ:Bₙ }; Δ[Γ, Θ] } where Γ and Δ are treated as two separate row variables, Δ depends on Γ and Θ, and Θ is closed with respect to Γ. Not sure though.

First class labels

Labels are also interesting - I've never really liked the 'type level string' nature of record labels, especially once row polymorphism comes into play. Would be nice to have these be namespaced, sort of like how clojure.spec does it. Not sure what the ergonomics would look like though.

Lenses

Would we want to auto-generate these? If we had first class labels and row polymorphism I could see us generating polymorphic lenses in a library, like how PureScript does it.

References

Let and where bindings

I've implemented concrete syntax for let and where bindings, but not hooked them up to the core syntax yet. We need to do this at some stage!

I've been thinking - could we perhaps desugar them into dependent records like 1ML does? Eg:

  • let x = y; .. in z to (record { x = y, .., body = z }).body
  • z where x = y; .. to (record { x = y, .., body = z }).body

Not sure what the implications for alpha equality would be though. Also whether this is compatible with how we want bindings to be evaluated. 🤔

Allow fields to be shifted

It would be nice to be able to shift field lookups too!

At the moment this just gives:

Pikelet> prelude.id^1 Type
error: expected one of "(", ")", "->", ".", ":", ";", "=", "=>", "?", "Record", "Type", "[", "]", "binary literal", "character literal", "decimal literal", "else", "float literal", "hex literal", "identifier", "import", "octal literal", "of", "record", "string literal", "then", "where", or "}", found `^`
- <repl>:1:11
1 | prelude.id^1
  |           ^ unexpected token

Fix ambiguous dependent function syntax

Currently we use the following syntax for functions:

(x : t1) -> t2  -- dependent functions
t1 -> t2        -- non-dependent functions

Sadly, once we add in the type annotation syntax, i.e. e : t, these notations become ambiguous in a context-free grammar. This means we have to do an additional pass over the AST, fixing this after the fact.

We should either:

  • Live with this ambiguity and the hack
  • Switch to a non-context free parser generator, like some kind of PEG with ordered choice
  • Choose a different dependent function syntax, perhaps something like: Fn (x : t1) -> t2
  • Remove the type annotation syntax, perhaps replacing it with a function like: the : (a : Type) -> a -> a (this is what Idris does)

Compiler backend

It would be nice to actually compile Pikelet programs!

Initial sketch of passes

We'll probably want some intermediate representations, like MLton has, to get the higher-level Pikelet code into better shape for handing off to a back-end.

  • Raw - check/infer/elaborate -> Core
    • fill in holes
    • pass instances and implicit args explicitly
    • add explicit coercions
  • Core - A-Normalize -> A-Normal Form (#91)
    • make evaluation order and control flow explicit
  • A-Normal Form - Closure Conversion -> Closure Converted
    • separate anonymous functions into a pair of:
      • a function pointer
      • the data containing the environment
  • Closure Converted - ??? -> SSA
  • SSA - ??? -> ...
  • ... - Codegen -> LLVM/Cretonne/Bytecode

Unresolved questions

We would like to do more work on the type system to make efficient code-gen easier, for example tracking coeffects such as usage multiplicities (#7) and data flow annotations (allowing us to partially evaluate code at compile time). Regions would also be handy at some stage.

Another issue we'll have to face at some point is garbage collection. Ultimately it would be nice not to force users into a specific allocation strategy, but figuring out how this interacts with multiplicities might be tricky. Until we have a better idea it might be easier to just use something like the Boehm-Demers-Weiser GC in the interim.

Possible codegen targets

Some possible targets for codegen are as follows:

  • LLVM (the inkwell crate looks handy!)
  • Cretonne
  • Web Assembly (could be covered by Cretonne or LLVM)
  • Some sort of Typed Assembly Language (TAL)?
  • Custom JIT (see runtimes-WG for some ideas)

A JIT might be handy as it would allow us to embed Pikelet in existing Rust codebases. This could help us gain adoption without needing to convince folks to switch to an entirely new language in the mid-term.

Resources:

Example compilers in Rust

Sixten is another functional language with dependent types and unboxed data types. Might be worth looking there for inspiration too!

Cumulative Universe Hierarchy

At the moment we use a stratified universe hierarchy to avoid Girard's Paradox (endemic to systems where Type : Type). Unfortunately this means that we lose some polymorphism. For example:

Pikelet> (\(x : Type) (x: x) => x) Type String
error: found a term of type `Type 1`, but expected a term of type `Type`
- <repl>:1:27: the term
Pikelet> (\(x : Type) (x: x) => x) (Type 1) Type
error: found a term of type `Type 2`, but expected a term of type `Type`
- <repl>:1:28: the term

It would be nice to be able to write the identity function once and have it work for all levels in the universe hierarchy.
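
For reference, cumulativity is usually presented as a rule that lets a term inhabiting one universe also inhabit every universe above it (this is the standard textbook formulation, not necessarily the exact rule Pikelet would adopt):

\[
\frac{\Gamma \vdash t : \mathrm{Type}_i \qquad i \le j}{\Gamma \vdash t : \mathrm{Type}_j}
\]

Under such a rule, an identity function taking an a : Type 1 argument would also accept types from Type, since every inhabitant of Type is also an inhabitant of Type 1.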

Resources:

Get Pikelet to build on wasm

@sordina was trying to get Pikelet to build on wasm, but was running into issues with some system dependencies:

error[E0432]: unresolved import `sys`
 --> /Users/../.cargo/registry/src/github.com-1ecc6299db9ec823/mortal-0.1.5/src/screen.rs:9:5
  |
9 | use sys;
  |     ^^^ no `sys` in the root

error[E0432]: unresolved import `sys`
  --> /Users/../.cargo/registry/src/github.com-1ecc6299db9ec823/mortal-0.1.5/src/terminal.rs:10:5
   |
10 | use sys;
   |     ^^^ no `sys` in the root

error: aborting due to 2 previous errors

Getting Pikelet to build on wasm would be great, because it would let us build a 'try' website in the browser without needing to run it on a server, and also support use cases like the one above.

Plans for formalizing the type system and implementation

Had a nice chat with @pythonesque on Twitter:

I think at this point most people involved in Coq development, at least, agree that a majority of the stuff in it should be in a proof assistant (maybe should have been from the beginning). I strongly recommend you try to prove as much as you possibly can; it'll save you later.

https://twitter.com/awesomeintheory/status/983592526232932352

Apparently lots of the trouble comes in the form of optimizations to the type checking algorithm.

I think it might make sense in the short term to keep the implementation as close as possible to the syntax-directed algorithm described in the appendix of the book. For now our programs should be small, and speed should be less of a concern for us.

As a precaution, we should also try to ensure certain aspects of our implementation avoid known trouble areas, for example the use of implicit substitutions, and building up a tangle of mutually recursive definitions/modules/types (although that can be hard to avoid at times in dependent type systems).

In parallel we could:

  1. Define the declarative semantics (non-syntax-directed) and prove soundness theorems about that
  2. Ensure that the syntax directed semantics match the properties of the declarative semantics
  3. Use something fancy like Iris to design an imperative, optimized version of the type checker
  4. Ensure that the optimized version matches the properties of the declarative semantics as well

Then we could start carefully porting over the optimizations to the Rust implementation.

Resources

Papers

Projects

Implement a pass to compile Core terms to A-Normal Form

So, it seems that the first stage of compilation that we might want to do is to compile to A-Normal Form (ANF). I'm guessing we could start a new module called pikelet::compiler::anf for this. We'll probably want a new set of IR types for this (a rough sketch follows below).

A-Normal Form syntactically sequentializes computations, and it partitions expressions into two classes: atomic expressions and complex expressions.

Because it (implicitly) simplifies the internal structure of continuations, it is easier to construct an interpreter for A-Normal Form.

For the same reason, it is easier to generate machine code or construct a static analyzer.

A-Normalization: Why and How

Type Preservation

I ultimately want each of our passes to be type preserving, so we'll want a new type system and type checker, but it's fine if the initial version leaves this out. The paper Compiling Dependent Types Without Continuations outlines how this can be done for dependent types.
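
A hedged sketch of what such IR types might look like (hypothetical and untyped; the type-preserving version would carry types, as per the paper above):

// Atomic expressions: evaluating them is trivial and has no effects.
enum Atom {
    Var(String),
    Literal(i64),
    Lam(String, Box<Expr>),
}

// Complex expressions: at most one computation step, over atoms only,
// so evaluation order is already explicit in the syntax.
enum Complex {
    App(Atom, Vec<Atom>),
    Atom(Atom),
}

// `let` is the only place where computations are sequenced.
enum Expr {
    Let(String, Complex, Box<Expr>),
    Tail(Complex),
}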

Resources

Don't feel that you need to read and understand all of these in their entirety to tackle this issue, but I've included them with the hope that they might be helpful or interesting! Let me know if there are any more I should add!

Implicit (inferred) arguments

Currently type parameters must always be supplied to polymorphic functions. Eg.

id : (a : Type) -> a -> a;
id a x = x;

const : (a b : Type) -> a -> b -> a;
const a b x y = x;

This can get quite annoying! Eg.

id String "hi"
id I32 42
const String Char "hi" '\s'
const I32 Char 42 '\s'

Implicit arguments will allow users to omit some arguments. Eg.

id : {a : Type} -> a -> a;
id a x = x;

const : {a b : Type} -> a -> b -> a;
const a b x y = x;

This would make using the above functions much less tedious:

id "hi"
id 42
const "hi" '\s'
const 42 '\s'

We will probably have to add some simple unification to the type checker to enable this!
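
A minimal sketch of the kind of first-order unification this needs: metavariables stand for the omitted implicit arguments and get solved against the types of the explicit arguments (hypothetical and drastically simplified; no binders, and the occurs check is omitted):

use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Meta(u32),         // an unsolved implicit argument, e.g. `{a : Type}`
    Con(&'static str), // a type constructor such as `String` or `I32`
    Arrow(Box<Ty>, Box<Ty>),
}

fn unify(t1: &Ty, t2: &Ty, subst: &mut HashMap<u32, Ty>) -> Result<(), String> {
    match (resolve(t1, subst), resolve(t2, subst)) {
        (Ty::Meta(m), t) | (t, Ty::Meta(m)) => {
            subst.insert(m, t); // solve the metavariable
            Ok(())
        }
        (Ty::Con(a), Ty::Con(b)) if a == b => Ok(()),
        (Ty::Arrow(a1, r1), Ty::Arrow(a2, r2)) => {
            unify(&a1, &a2, subst)?;
            unify(&r1, &r2, subst)
        }
        (a, b) => Err(format!("cannot unify {:?} with {:?}", a, b)),
    }
}

// Follow the substitution one step so solved metas compare structurally.
fn resolve(ty: &Ty, subst: &HashMap<u32, Ty>) -> Ty {
    match ty {
        Ty::Meta(m) => subst.get(m).cloned().unwrap_or_else(|| ty.clone()),
        _ => ty.clone(),
    }
}

fn main() {
    // Checking `id "hi"` against `id : {a : Type} -> a -> a` amounts to
    // unifying `?0 -> ?0` with `String -> ?1`.
    let mut subst = HashMap::new();
    let id_ty = Ty::Arrow(Box::new(Ty::Meta(0)), Box::new(Ty::Meta(0)));
    let app_ty = Ty::Arrow(Box::new(Ty::Con("String")), Box::new(Ty::Meta(1)));
    unify(&id_ty, &app_ty, &mut subst).unwrap();
    assert_eq!(resolve(&Ty::Meta(1), &subst), Ty::Con("String"));
}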

Variable numbering doesn't reset

Each time a term is printed in the REPL the variable numbering is different. It looks like the variable numbers don't reset at any time. This is confusing if you enter the same or similar term twice and expect to get the same or similar output.

Move to an incremental/query driven architecture

Before we go too far down the path of building a traditional compiler (#9), it probably makes sense to start thinking about how we might incrementalise things. This will be super important for supporting a good editor experience (#97). If we put this off for too long we might end up having to rebuild a bunch - this is what Rust is facing, for example.

Without knowing much about it, perhaps something like the Incremental Lambda Calculus would be handy for this. We could try to find the derivative of each pass of our compiler, based on the result of a previous run. This could also be a helpful framework for formalising our incremental compiler as well (#39)!

CRDT-style data structures could also be of potential use, and perhaps projects like timely-dataflow and differential-dataflow.

Unresolved Questions

What is the difference between 'incremental' and 'query-driven'?

Resources

Record keyword/sigil ambiguity bikeshedding

I'd like to use bare braces for implicit arguments (see #8), so we'll need a keyword or sigil for dependent record syntax (see #2). Now, if I just used a single identifier, like record, for record types and record values, I would run into the following ambiguity:

Pikelet> :t record {}

Is this an empty record value, or the type of an empty record?

If we were going to have default fields then we would run into a similar issue:

Pikelet> :t record { x : I32 = 23 }

Is this a record value with a field annotated with a type I32 and a value 23, or is it a record type with a field of type I32 and a default value 23?

One way to resolve this would be to defer this choice to type checking, and require annotations in ambiguous circumstances. Bidirectional checking should make this trivial, but I'm not sure how nice it would be ergonomically.

Idris has a similar ambiguity around (), which can stand for the 'unit value' or the 'unit type'. It however defaults to the unit value in ambiguous circumstances, and coerces to a type or value when it has the necessary type information:

Idris> ()
() : ()
Idris> the () ()
() : ()
Idris> the Type ()
() : Type

Another option would be to have two different keywords, but then I have to decide on that:

  • Record/record is one way, using a capital to signify the change in universe level
  • Struct/struct - would convey more of a feeling of that fact we want a non-uniform memory representation
  • sig/struct, like in ML?
  • sig/record?
  • signature/record - maybe a bit verbose?

Implement 'Let Arguments Go First'

It would be nice to improve the bidirectional type checking algorithm to also pull information from the arguments. This could also help us eventually implement implicit arguments (see #8). A nice approach for doing this was given in the paper Let Arguments Go First. In the introduction the authors state (emphasis added):

We believe that, similarly to standard bi-directional type checking, bi-directional type checking with an application mode can be applied to a wide range of type systems. Our work shows two particular and non-trivial applications. Other potential areas of applications are other type systems with subtyping, static overloading, implicit parameters or dependent types.

In section 3 (the Hindley-Milner example) of the paper they show the application context pulling type information from the arguments in order to fill in the 'hidden type arguments' of the foralls via the subtyping rules. Perhaps we could do something similar for implicit arguments! This would effectively involve combining sections 3, 4, and 5.3 into one system.

Resources

  • N. Xie, B. C. d. S. Oliveira, Let Arguments Go First. (Paper, Slides)
  • C. Jenkins, A. Stump, Spine Local Type Inference. (Paper, Slides)

Figure out a strategy for handling common (usually ignored) errors

Here are some common errors that are often ignored, but that we would like to either rule out statically or ensure are accounted for at runtime:

  • Out of memory errors
  • Integer overflow/underflow (ranged integers?)
  • Divide by zero
  • Loss of numerical precision
  • Units of measure mismatches (compile time units of measure lib?)
  • Index out of bounds (track array/matrix dimensions at the type level?)

Pattern match compilation

Core languages like PiSigma and MiniTT have simple constructs for pattern matching, allowing for easier verification of type checking.

I'm not sure where this should be done, however. MLton seems to do it during its defunctorise pass.

We can do it in a naive way for now, but there is an interesting paper showing how to do it well: Compiling Pattern Matching to Good Decision Trees. Do note that we have the added complication of perhaps wanting to add dependent patterns etc.

Compiling non-dependent pattern matching to case trees is described in chapter 5 of The Implementation of Functional Programming Languages. This is what Sixten currently uses, "with some dependent pattern matching hacked in".

Compiling dependent pattern matching to case trees has been described by Jesper Cockx in:

Quantitative Type Theory (ie. Linear Types + Erasure)

Quantitative type theory promises to get us both erasure and linear types in the same feature.

The idea is that we could add usage count annotations (multiplicities) to each binder in order to track how many times the parameters can be used at runtime:

alloc : {a : Type} (1 op : (1 data : T) -> IO a) -> IO a
free : {a : Type} (1 data : T) -> IO ()
newMArray : {a b : Type} -> Int -> (1 f : (1 data : MArray a) -> b) -> b
write : {a : Type} (1 data : MArray a) -> (Int, a) -> MArray a
read : {a : Type} (1 data : MArray a) -> Int -> (MArray a, a)
freeze : {a : Type} (1 data : MArray a) -> Array a

There would be the following multiplicities initially (sketched as a semiring below):

  • 0: erased
  • 1: linear - must be used exactly once
  • ω: unbounded - can be used any number of times
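
For concreteness, these three multiplicities form a semiring: usages of a variable in different subterms are added together, and usage is scaled by multiplication when it flows through an annotated binder. A minimal sketch:

#[derive(Clone, Copy, Debug, PartialEq)]
enum Mult {
    Zero, // erased: not available at runtime
    One,  // linear: must be used exactly once
    Many, // ω: unrestricted
}

impl Mult {
    fn add(self, other: Mult) -> Mult {
        match (self, other) {
            (Mult::Zero, m) | (m, Mult::Zero) => m,
            // Using a value once in each of two places is unrestricted use.
            _ => Mult::Many,
        }
    }

    fn mul(self, other: Mult) -> Mult {
        match (self, other) {
            (Mult::Zero, _) | (_, Mult::Zero) => Mult::Zero,
            (Mult::One, m) | (m, Mult::One) => m,
            _ => Mult::Many,
        }
    }
}

fn main() {
    // `x` used once in the function and once in the argument: 1 + 1 = ω,
    // so a linear `x` used twice would be rejected.
    assert_eq!(Mult::One.add(Mult::One), Mult::Many);
    // Anything used under an erased (multiplicity 0) binder stays erased.
    assert_eq!(Mult::Zero.mul(Mult::Many), Mult::Zero);
}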

Here are some resources:
