gelisam / klister Goto Github PK

an implementation of stuck macros

License: BSD 3-Clause "New" or "Revised" License

Haskell 98.04% Emacs Lisp 1.66% Nix 0.31%

klister's Introduction

Klister

Klister [TyDe 2020, video] is a programming language, a research prototype which combines features from Racket, ML, and a strict Haskell into a single language. It is named after its most distinguishing feature, "stuck macros" [Compose NYC 2019], as "Klister" is Danish for "adhesive".

#lang "prelude.kl"

-- do notation is not builtin syntax, it's implemented as a library!
(import "monad.kl")

-- An effectful action whose inferred type is (-> String (IO Unit))
(defun putStrLn (str)
  (write stdout (string-append str "\n")))

-- "run" is like main, except you can have more than one.
(run
  -- Klister doesn't have type classes yet, so "do" needs an explicit
  -- dictionary argument.
  (do io-monad
    (putStrLn "hello")
    (putStrLn "world")))

You can run the above program using either stack or cabal:

$ cabal run klister -- run examples/hello.kl
hello
world

Features

Features we borrow from Racket:

Custom syntax, via hygienic macros with easy-to-override hygiene.
Custom languages (#lang), via macros which reinterpret terms into those of an existing #lang.
Syntax objects, that is, s-expressions annotated with source locations and lexical information.
A module system which respects the phase system. Thus, if Klister one day supports generating binaries, those binaries will not be unnecessarily clogged with dependencies which were only needed at compile-time.

Features we borrow from ML:

A type system with parametric polymorphism, algebraic datatypes, and Hindley-Milner type inference.

Features we borrow from Haskell:

Monadic macros; our macros have type (-> Syntax (Macro Syntax)), where Macro Syntax is similar to Q Exp in TemplateHaskell. Note that this type implies that a macro is allowed to generate ill-typed code; this error is caught where the macro is called, not where the macro is defined. We thus aim for the expressivity of Template Haskell, not the extra guarantees of Typed Template Haskell.
Purely functional; primitives with compile-time side-effects (e.g. comparing identifiers while taking into account the current set of bindings) run in the Macro monad, while primitives with runtime side-effects (e.g. printing to stdout) run in the IO monad.
Higher-kinded types; for example monads are defined as a library.

Features which make Klister special (but not necessarily unique; see the bibliography for languages with similar features):

Type-providing macros; a macro can provide type information about the code it plans to generate.
Type-aware macros; a macro can obtain type requirements about the code it needs to generate.
Stuck macros; the above two features make it possible for macros to communicate, and thus to affect what each other generates. The language primitives are designed so that the order in which the macros are expanded cannot affect their results, and indeed the same is true for the order in which the macro expansion and type-inference steps are interleaved. This means that the order in which the type checker traverses a program and generates constraints is not visible to the authors of macros, providing a predictable programming model. This makes Klister code more robust to refactorings which affect that order.
Problem-aware macros; in addition to the type, a macro can learn which "problem" it needs to solve, namely whether it must generate an expression, a type, a pattern, etc. Each problem would correspond to a form of judgment if the language was formalized, e.g. a typing judgment for the expression problem, a well-formed type judgment for the type problem, etc.

Cool things which can be built using the above features:

Macros communicating via types
Custom type-driven code generation, via macros which generate code from a type.
Languages with custom type systems, via macros which reinterpret types into those of an existing #lang, and which contribute to type inference by providing type information about the code they generate. The variety of type-systems which can be implemented this way is unforunately limited by the core type system to which everything must be rewritten.
Languages with custom implicit terms, via macros which generate terms of an existing #lang based on a type in the new #lang.

While we think Klister demonstrates some neat ideas, there are some limitations which make Klister an impractical choice for most real-life projects. If you want to help make Klister a more practical language, please reach out!

Here are the most prominent Racket features which are missing from Klister:

Klister does not yet support custom readers, and thus every #lang looks like a Lisp. This also limits languages to Integer literals and String literals.
local-expand is planned, but not yet implemented.
Syntax parameters are planned, but not yet implemented.

Here are the most prominent Haskell features which are missing from Klister:

Type classes are planned as a library, but are not yet implemented.
Type annotations containing foralls are planned, but not yet implemented. Currently, Klister only supports type ascriptions, e.g. (+ (the Integer (* 2 3)) 1), for giving the type of a sub-expression.
Klister does not support GADTs nor type families.

Here are the most prominent features which Racket and Haskell both have but which are missing from Klister:

Klister is missing commonly-expected datatypes like Map, Set, and Double.
Klister requires functions and datatypes to be defined before they are used.
Klister does not support concurrency. It might be possible to implement a #lang with a green thread scheduler.
Klister does not support exception-handling. error and syntax-error both terminate the program immediately, like panic! in Rust. It is definitely possible to implement Either-based error handling, and it should be possible to implement a #lang in which exceptions are an ambient effect.
Klister does not have a rich ecosystem of libraries. It does not have a package repository where individual contributors can release their own packages. Please upload your Klister code to the examples folder, it currently contains all the Klister code which was ever written.
Klister does not have a rich set of IO primitives out of which you could build all the libraries you need yourself. Currently, you can only print to stdout.
A Foreign-Function-Interface (FFI), to reuse Haskell's rich ecosystem of libraries (and its own FFI to C), is planned but not yet implemented.
Expanding modules separately, to speed up expansion times, is planned but not yet implemented.
Klister does not produce binary executables.

Guide and Reference

The Klister Guide consists of the various commented examples linked from the above feature list, plus the extra information in the sub-sections below.

The Klister Reference covers every identifier in the "prelude.kl" language, but doesn't currently say much about each. It consists of a list of examples showing how to use the macros, and a list of type signatures documenting how to use the values and functions.

Imports

The import form will search for modules in the same directory as the importing module, and in directories listed in the KLISTERPATH environment variable, a :-separated list of directories.

Setting KLISTERPATH is needed for all tests to pass, as some rely on modules from the examples directory;

$ cabal test

Test suite klister-tests: RUNNING...
All tests
  Golden tests
    keyword-used-incorrectly:                          FAIL
      ...
      +Module not found: keyword-test.kl
      +Searched in directory at /.../examples/failing-examples
      +And on $KLISTERPATH:
      ...
    wrong-keyword:                                     FAIL
      ...
      +Module not found: keyword-test.kl
      +Searched in directory at /.../examples/failing-examples
      +And on $KLISTERPATH:
      ...
    bound-vs-free:                                     FAIL
      ...
      +Module not found: do.kl
      +Searched in directory at /.../examples/failing-examples
      +And on $KLISTERPATH:
      ...

3 out of 132 tests failed
Test suite klister-tests: FAIL

$ KLISTERPATH=examples cabal test
...
Test suite klister-tests: PASS

klister's People

Contributors

Stargazers

Watchers

Forkers

langston-barrett jameshaydon silky sckelemen mariszo icodein lkuper doyougnu typechecker up-do alxbnct

klister's Issues

On syntax design

I just heard your talk at ICFP. You are using a s-expression syntax, and expressed interest in an "extensible syntax" your language could use. I'm of the opinion that, usually, an extensible syntax is not what you want, so for your consideration I would like to show you the approach I prefer to use in programming language design.

LES is a language* designed to feel extensible while having a fixed syntax. It is built on the syntax-agnostic and language-agnostic concept of Loyc trees.

A fixed syntax offers the following advantages relative to a extensible syntax:

Syntax highlighting can show structure in isolation, without access to referenced libraries or any other files. And similarly for structure editors.
There is no feedback between semantic analysis and the parser, which makes parsing of large projects embarrasingly parallel; all source files can be parsed simultaneously. It also makes parsing cacheable, e.g. a memory-resident compiler can cache the parsed version of all files to improve compilation speed, or a non-resident compiler can cache blittable data structures in files that represent source code.

LES has the following additional advantages/characteristics:

LES version 2 is a superset of JSON
Unlimited set of binary operators, unary operators and "superexpressions"
The parser is relatively simple, and therefore can be included in the language's standard library for use at runtime, enabling homoiconicity
LES parses to Loyc trees. Loyc trees are syntax-agnostic, so it offers "separation of concerns" between syntax and semantics. In principle this allows you to support additional syntaxes like Python, Java, etc., although the only parser I've written for a mainstream language that produces Loyc trees is for C# (Enhanced C#).
Less wheel reinvention: it is conventional to design a new syntax for every new language. It's easier if you don't.

* really a family of languages, as LES3 will replace LES2; I'm planning to split LES3 into multiple versions while sunsetting LES2. Also Enhanced C# (and its language-agnostic macro processor, LeMP) is based on the same Loyc trees as LES.

Let me know if this is an interesting concept to you (and if not, why not).

KLISTERPATH

In support of #54, it would be nice to import modules by something other than their actual filepath. I'll work on something like PYTHONPATH for Klister.

Error when exporting the same identifier twice

License file

We need a license file, or our code may not be effectively open source! From http://choosealicense.com/no-license:

You’ll have to check with your own legal counsel regarding your particular project, but generally speaking, the absence of a license means that default copyright laws apply. This means that you retain all rights to your source code and that nobody else may reproduce, distribute, or create derivative works from your work.

FWIW, I really don't care what license we have.

implement Ballantyne et al's upcoming paper

I'm not sure how much I can write here since this is a public forum and the paper isn't published yet, so all I'll say is that this is a placeholder ticket for implementing a typed variant of the paper in question.

Type errors should contain Types

This is a TODO in the code, and we should do it 😄

Disallow incomplete pattern matches

The Core language has no source locations attached to it, because it should be failure-free. But it's not failure-free, you can have incomplete CoreCase expressions that fail. There are a few possible solutions:

Disallow incomplete syntax-case statements: Exhaustive pattern-matching checking is difficult, and we probably don't want to implement a special case of it here just for this purpose.
Attach a source location to CoreCase: This is basically a hack, and should be avoided.
Require a catch-all pattern: This is probably what I'll do.
Just make the user create a macro that adds a bunch of extra catch-all patterns at the end of their case statements: This seems like a hard macro to write, so let's avoid this solution until we've written it.

Integers, not signals

Now that we have type-aware macros, the signals we were playing with at the beginning as a proof of concept have outlived their usefulness. Let's drop the signalling machinery, but keep the parser and treat them as proper integers. A few primitive operations like +, -, *, and integer=? would be nice.

Kind-aware type macros

Just as expression macros can block on unsolved type metas, once #92 is done, we should get type macros that can block on unsolved kind metas.

Golden testing for expected failures

The non-examples directory contains files we expect to fail. We should augment the testing infrastructure to ensure they continue to fail in the way we expect.

Example: Simple ad-hoc polymorphism

After the merge of #53, we should try to implement some simple ad-hoc polymorphism via type-aware macros as a warm-up for typeclasses. A good example might be a version of Haskell's fmap that works on the macro monad, List, Maybe, and a few other prelude/kernel datatypes.

Get rid of hpack

From my perspective, hpack is causing more problems than it solves.

With common stanzas, the repetetiveness of the cabal file format is much reduced, and I keep having to remember to run hpack when I'm building the project. It's also been causing some intermittent CI issues.

How do you all feel about replacing it with a .cabal file with a common stanza?

Use primitive values whenever possible

For example, it is not possible to pass >>= as an argument to a higher-order function, we have to write (lambda (mx cc) (>>= mx cc)) instead because >>= is a macro.

We only did this because at the time, we were only able to define primitive macros, we did not yet have the ability to define primitive values. Now we do!

Error primitive

We should have something like Haskell's error. The argument should be a syntax object with a message in it, so that it can report a source location for the error - this will be useful to make macros that expand to error cases. RE #36 .

clarify meta variables in error messages

A type error like

Expected
  ((META(MetaPtr 8524) → Bool) → (META(MetaPtr 8512) → META(MetaPtr 8403)))
but got
  ((META(MetaPtr 8524) → Bool) →
   ((List META(MetaPtr 8524)) → (Maybe META(MetaPtr 8524))))
Specifically, Syntax doesn't match (List META(MetaPtr 8524))

Is pretty hard to read. Let's do like GHC and pick short names for those meta variables, like this:

Expected
  ((a → Bool) → (b → c))
but got
  ((a → Bool) → ((List a) → (Maybe a)))
Specifically, Syntax doesn't match (List a)

Now that the error is readable, we can actually see a more important problem: we didn't even replace the meta variable b with its value Syntax, thus resulting in a seemingly-inconsistent error message!

support organizing modules into folders

The import statement takes a string containing a filename, which makes it look like we can specify any path, but it only works for files which are directly in the KLISTERPATH. For example, if the current file is at examples/advent-2020/day1.kl, I cannot use (import "../list-datatype.kl") to import examples/list-datatype.kl nor (import "day1/input.kl") to import examples/advent-2020/day1/input.kl. More importantly, even if KLISTERPATH points to examples, I cannot use (import "advent-2020/day1/input.kl") to import examples/advent-2020/day1/input.kl.

hygiene for helper functions

When I define a public-facing macro in terms of an auxiliary helper, I often prefer to define my helper as a local function, this way they can take arguments of type other than Syntax, and they can access the variables which are lexically-bound inside the public-facing macros, I don't have to encode them inside the auxiliary macro's Syntax argument. Unfortunately, this means that all the recursive iterations of that helper function are within the same macro invocation, and therefore that binders introduced at different iterations may shadow each other. It is possible to bypass the problem using make-introducer to generate a unique identifier, gensym-style. But how about defining special hygiene-aware functions which go through the dance of adding and flipping a Scope in order to get the same automatic hygiene we already get from macros?

To recap, before a macro is invoked, the expander adds a fresh Scope to its input Syntax, runs the macro, and then flips that Scope on the resulting output Syntax. The result is that all the identifiers which were passed by the caller don't have that Scope anymore (it was added and then flipped off), while the identifiers which were introduced by the macro do (they did not have that Scope and then it was flipped on). We could thus define a syntax-aware function as one which emulates that behaviour by adding a Scope to its input Syntax, runs its body, and then flips that Scope on the output Syntax.

migrate away from Travis CI

As this reddit post explains, Travis is dropping support for open-source projects, so we should consider switching to a different CI provider. The post recommends Github Actions, so that's what I plan to look at, but I'm open to other providers, e.g. Circle CI, if others have had a good experience with something.

examples/io.kl fails on CI, but only for macOS

The error message says

  Test output was different from 'examples/io.golden'. It was:
  #<output port> : Output-Port
  #<IO action> : (IO Integer)
  hello world!

which is weird because:

this is exactly the expected output
this is macOS, not Windows, so there shouldn't be any newline issues
I am running macOS on my laptop and that test passes for me

dot-dot-dot outside of macros

in #115 , I generated dot-dot-dot.kl, which provides a variant of syntax-case which supports nested patterns, matching keywords in a hygienic way, and binding the tail of a list to a variable using ,(xs ...). It is nice, but it only works inside macros! The reason is that the keyword comparison uses free-identifier=?, which itself only works inside macros. Since Syntax is a very convenient (albeit not very precise) type for representing just about any kind of data, it would be quite convenient to also provide a variant of syntax-case which does work outside macros, at the cost of dropping support for keywords.

Use -Werror in CI

How do we feel about enforcing GHC warnings in CI? It speeds up my workflows because I know that if I see a warning, it's my fault and I need to fix it (I don't have to worry about "oh, was that there before?").

Strings

It would be nice to have a string datatype! There are two ways I can see doing this:

Introduce a Char datatype and have String := List Char, a la Haskell. This has all the same benefits and drawbacks as it did in Haskell, namely all the operations on such can already be defined in Klister, but the performance is horrible.
Introduce a primitive String type and a few operations on it. What would be a good basis set? We could simply have cons and uncons, but that reduces this approach back to the above.

Get rid of 'Unique'

The use of Unique in Core means that we can't generate ASTs outside of IO, which is inconvenient for things like testing with Hedgehog.

Negative integer literals

The reader currently parses -5 as an identifier rather than an integer literal.

"expected" and "actual" are still swapped?

I remember that our error messages used to say "expected ACTUAL but found EXPECTED". We fixed that, but the error message is still backwards:

-- Type mismatch at unknown location. Expected Syntax but got (List Integer)
(example (the (List Integer) 'foo))

If I had written the type-checker myself, I would definitely have used unification so that I wouldn't have to think about when to check and when to synthesize. And then I would expect to see bugs like the above, because there is no way to know which side of a unification is the expected type and which side is the actual type. But since @david-christiansen wrote the type-checker, I'm pretty sure he used a bidirectional algorithm, in which we do know which side is expected and which side is actual. So what's going on?

Also, why is the location unknown? The very similar (example (+ 'foo 1)) does give a location.

Circular imports aren't handled

This will just spin:

$ cat examples/circular-1.kl
#lang kernel
(import "circular-2.kl")

$ cat examples/circular-2.kl
#lang kernel
(import "circular-1.kl")

demonstrate that separate compilation is possible

Klister is an interpreter, not a compiler, but we want to demonstrate that its ideas are applicable in a wider setting in the hope that they will be adopted by a larger project such as ghc. To this end, we should serialize each module after expansion, so that we can load this binary artifact (which would be compiled code in a compiler, but would consist of Maps of bindings in our context) instead of re-traversing a module.

This would also improve the performance of our test suite, as several examples happen to load the same modules (e.g. the prelude).

One thing we need to be careful about is our newUnique calls: Uniques are only guaranteed to be unique within one invocation of a Haskell program, not between runs! Therefore, we need to use a slightly more complex system of references. For example, we could write our own variant of newUnique which increments a different counter for each module, and each reference would include both a module identifier and that unique number.

The first argument to vec-syntax should be in parens

The first arugment to vec-syntax has an arbitrary number of elements in it, so it should be wrapped in ().

IO monad

Yay interactive programs!

Installation and usage documentation

As @david-christiansen pointed out in #57, we should have some documentation on installation and usage of Klister

We should have syntax parameters

This would make a number of macros nicer to write. See http://scheme2011.ucombinator.org/papers/Barzilay2011.pdf for details.

Actually use #%module

Right now, the system is ignoring #%module. This should be fixed by making the reader work more like Racket's in the presence of #lang.

Ditch paren-square bracket distinction in syntax

Now that I'm writing some more macros, the limitations of our approach to distinguishing fixed-arity vs variable-arity operators is becoming more clear.

In particular, I've been trying to write a quasiquoting macro. This would be used for writing other macros much more conveniently. However, the fixed-arity nature of our square bracket pattern matching operator means that I essentially need to copy-paste the pattern-match cases, once for each length of square-bracketed-list that I want to be able to support. I can't even generate all the cases with a metaprogram :-)

The way I see it, there's two ways forward: (1) eliminate the distinction, or (2) provide variable-arity metaprogramming operators for fixed-arity programs (e.g. a list-like API to the square-bracketed vectors).

Replicating the variable-arity API for fixed-arity things seems to be a failure, so I propose that we do number 1. Any objections?

Module search order

@david-christiansen suggested on #59 that imports be resolved thusly:

What about resolving ambiguities thusly: first, check for adjacent files with the given name. Then, check for stdlib. Only after that, fall back on KLISTERPATH.

Literate Klister

As we build out some examples of cool type-aware stuck macros, maybe it would make sense to write about them at the same time. Such commentary could be reused in slides or papers.

One nice way to do this (IMO) would be to embed Klister in, say Markdown or Org, like Literatw Agda or Literate Haskell.

Simpler primitives than `syntax-case`

Now that we have datatypes and integers, I think we can do better than the built-in syntax-case operator. Here's a proposal:

First, we reflect our ExprF functor in the kernel language as the equivalent of:

(datatype (List A) ...)
(datatype (Syntax-Contents A)
  (list-syntax (List A))
  (integer-syntax Integer)
  (string-syntax String)
  (identifier String))

This requires making lists built-in like Bools.

Next, we can have two operations: open-syntax : (-> Syntax (Syntax-Contents Syntax)) and close-syntax : (-> Syntax Syntax (Syntax-Contents Syntax) Syntax). The former exposes one level down into the syntax object, and the latter takes some contents along with a source of scopes and a source of position information, creating a syntax object.

This is enough to implement something like our current syntax-case, but it also allows macros to manufacture identifiers (e.g. if we wanted to have a macro that created a datatype's signature functor with a -F suffix). It allows us to greatly simplify our evaluator and expander, because we replace builtins with primitive functions. And it ought to allow for some simpler destructuring patterns in some cases.

Thoughts?

Deep pattern matching in kernel language

After working a bit with pattern macros, I think we should have deep pattern matching in the core language rather than as a library.

This is mainly because it makes pattern macros much more useful. Right now, we can't define e.g. list as a pattern macro, because it won't work here. I don't think it's so hard to implement, either, especially if we're not caring too much about optimal performance (although proper case trees are not that hard to do either, and probably worth it).

Another thing to consider is whether pattern macros should actually expand to patterns, or whether we should expose some sort of more fundamental interface (e.g. success/failure continuations), which would enable things like view patterns or guards to be defined, I think. But that's for another time, I suppose.

The REPL is mostly useless

Right now, expressions entered in the REPL don't have an appropriate lexical context associated with them. This means that there's no way to associate names at the REPL to actual defined names.

We should fix this. There's a few design possibilities:

The REPL is always in #lang kernel, but it can import and export things from modules. In this model, the REPL is basically an anonymous module. In this case, REPL commands should be declarations. This way is pretty easy to set up in the code - we just need to maintain a set of scopes to enrich each expression with prior to expanding it.
The REPL is started in a particular context. For instance, one could say "give me a REPL for foo.kl line 30 column 20". This would probably have to find the next module-context position to avoid complications with local bindings (or we could implement computation with open terms, but that seems a bit out-of-scope for an ML). To do this, we'd have to instrument the expander to load that module, but watch for the set of scopes to use at that site. Then, we'd stick those scopes on each REPL command, and treat REPL forms as being examples.
We wing it, and invent a language that's almost-but-not-quite Klister with REPL semantics. This is the approach taken in GHC (where ghci allows things that aren't really Haskell expressions, and has a collection of meta-commands as well) and in Racket (where the top level is hopelessly confusing with respect to modules, bindings, and effects if you scratch it a bit).

I like number 2, but REPL-in-context is also one of my hobby horses. How do you all feel about it?

Logical module paths?

At the moment, module names are simply names of files in the same directory as the importing module. With #57, they will be names of files in some set of directories. We don't yet have any support for organizing and laying out modules on the filesystem. This results in potential for conflicts between module names, and doesn't encourage organizing modules into hierarchies.

Many languages (Haskell, Coq, Python) have a notion of a "logical module path", usually like TopLevelName.SubName.ModuleName, which corresponds to a directory structure like TopLevelName/SubName/ModuleName.lang on disk. This is a pretty well-understood way to mitigate the above issues. Should we do something similar in Klister?

Imports should fail if the name isn't exported from the module

This should fail because vec-syntax isn't exported from n-ary-app.kl

#lang kernel
[import [shift (only "n-ary-app.kl" define lambda pure quote syntax-case vec-syntax) 1]]

Redundant pattern match in Expander

https://travis-ci.org/gelisam/klister/jobs/595776213#L770

src/Expander.hs:918:9: warning: [-Woverlapping-patterns]
    Pattern match is redundant
    In a case alternative: EUserMacro _otherVal -> ...
    |
918 |         EUserMacro _otherVal ->
    |         ^^^^^^^^^^^^^^^^^^^^^^^...

add Set and Map to the standard library

I'm trying to use Klister to implement some adventofcode.com problems in order to discover some rough edges, and oh my does it have a lot! Here is the first of many.

Most algorithms involve basic data structures like Set and Map, but all we provide out of the box is lists, numbers, strings, boolean, and syntax objects.

I see several ways forward:

don't bother; Klister is a research language, no need to make it practical.
reimplement Set and Map in Klister
add Set and Map as new primitives
add FFI support so we can wrap Haskell's Set and Map as a library

Personally, I prefer (4) even though it is the hardest to implement, because Set and Map are only the tip of the iceberg. One thing I would like to note is that in some languages, importing from a source language via the FFI yields functions which don't fit well in the target language or whose types aren't as precise as we can express in the target language. In Klister, however, our innovation is in our macro system, not in our type system, so we have the opposite problem: there are many Haskell types which we cannot express! I think this makes FFI a rather good approach to fleshing out our standard library, at least for the functions whose type we can express.

Expand declaration group macros to multiple declarations

It should be possible to generate multiple declarations from one declaration macro. Here's a suggested API:

(group
  d1
  d2
  d3 ...)

causes the results of expanding d1, d2, d3, etc to be inlined into the module at that spot. This is analogous to top-level begin in Racket.

We could also implement this with two simpler primitives:

(no-decls)

has no effect on the module, and

(decls d1 d2)

are d1 and d2 next to each other. Nested uses of decls ending in (no-decls) can simulate group.

More support for passing types around in the expander

Right now, almost all types (with the exception of annotations using the) are inferred. Even with macrology,the is insufficient, because it only supports ascribing types, rather than type schemes.

Here's a sketch of a design for definition annotations.

Desiderata:

Allow partial specification of types/schemes. We don't want to have an all-or-nothing approach, because part of the definition may come from one macro, and part from another. Something like Haskell's PartialTypeSignatures seems much more important here.
Respect lexical scoping. A hack like ScopedTypeVariables is not great, because definition forms really bind type variables simultaneously in the equivalent of a System F forall and big-lambda.
Don't require large algorithmic changes in the type checker. Right now, the generalization machinery uses the notion of binding levels that I learned from @sestoft rather than scanning the context for free variables, and I'd like to stick with that mechanism for the sake of both efficiency and implementation simplicity.

Here's how I think we can accomplish these goals.

First off, we enrich each syntactic form that's subject to generalization (that is, define, flet, let, and example) with an extra set of parens that binds rigid variables, which we model as fresh type constants in the expander, with an encoding like the one used for names of datatypes. Example syntax in the kernel language:

(define (A) id (the (-> A A) (lambda (x) x)))

We can make this optional in non-kernel languages using a little macro. These bound type variables can only unify with themselves and with metas from their own scope (which means our notion of binding level might need to become a bit more informative, and instead talk about nested scopes rather than just being natural numbers - I'll think about this a bit). When it comes time to generalize, these turn into bound variables from the type scheme. Things to be aware of: 1. rigid type vars that don't get used don't get generalized - so it's "free" to bind a few. 2. unconstrained metas still get generalized, so (define (A) (lambda (x) (lambda (y) (pair (the A x) y)))) will be a (forall (A B) (-> A (-> B (Pair A B)))).

Next, we need a way to provide partial type information to sub-invocations of macros (see @gelisam's talk for the setPartialType operator and its usefulness). I think we can do this with another expression form, with-unknown-type. An expression

(with-unknown-type (M) EXPR)

elaborates EXPR, but with M bound to a new type metavariable. This will allow something like:

(with-unknown-type (A)
  (the (-> (List A) (List A)) (my-macro)))

which lets my-macro do a bit of type-casing without getting blocked. These fresh metas are created at the current binding level, and are subject to generalization from enclosing (but not enclosed) scopes.

We could do something like:

foo :: forall a . a -> _ -> _
foo x y = (x, y)

by expanding to:

(define foo (A)
  (with-unknown-type (B)
    (with-unknown-type (C)
      (the (-> A (-> B C))
        (lambda (x) (lambda (y) (pair x y)))))))

Thoughts?

Example: Type-directed synthesis (AKA tactics)

A fun type-aware macro example would be a macro synth that could be given a type as an argument, and would attempt to create a value of that type. I imagine it would be most useful for function types, e.g.

(the (-> A (-> B A)) -- const (for some concrete types A and B)
     (synth))

Prelude/stdlib

We have a lot of useful syntax, datatypes, and functions them scattered throughout examples. We should collect these into a cohesive, organized set of modules that could be easily imported.

pmatch port

We should port Oleg's pmatch: https://github.com/webyrd/quines/blob/master/pmatch.scm

Scoping rules broken when importing many modules

The modules examples/list-datatype.kl and examples/lispy-do.kl each type check independently. Thus, importing them should never break.

However, here's two files that break:

#lang "n-ary-app.kl"

(import "list-datatype.kl")

(import "lispy-do.kl")

gives

No progress was possible:
    (ExpandVar Ty {unTy = TFun (Ty {unTy = TMetaVar (MetaPtr 3286)}) (Ty {unTy = TFun (Ty {unTy = TMetaVar (MetaPtr 3272)}) (Ty {unTy = TSyntax})})} (SplitCorePtr 3287) Syntax {_unSyntax = Stx {_stxScopeSet = ScopeSet {_universalScopes = fromList [Scope {scopeNum = 307, scopePurpose = "Module root for /home/davidc/Code/Haskell/stuck-macros/examples/let.kl"},Scope {scopeNum = 308, scopePurpose = "For import at les/let.kl:3.1-4.1"},Scope {scopeNum = 309, scopePurpose = "For import at les/let.kl:4.1-5.1"},Scope {scopeNum = 477, scopePurpose = "For import at les/let.kl:5.1-6.1"},Scope {scopeNum = 478, scopePurpose = "For import at les/let.kl:6.1-8.1"}], _phaseScopes = fromList [(Phase {phaseNum = 0},fromList [Scope {scopeNum = 0, scopePurpose = "Root for phase p0"},Scope {scopeNum = 479, scopePurpose = "For macros at les/let.kl:8.1-27.1"}]),(Phase {phaseNum = 1},fromList [Scope {scopeNum = 3, scopePurpose = "Root for phase p1"},Scope {scopeNum = 482, scopePurpose = "For variable \"stx\""},Scope {scopeNum = 484, scopePurpose = "For variable \"_\""},Scope {scopeNum = 485, scopePurpose = "For variable \"args\""},Scope {scopeNum = 486, scopePurpose = "For variable \"body\""}])]}, _stxSrcLoc = SrcLoc {_srcLocFilePath = "/home/davidc/Code/Haskell/stuck-macros/examples/let.kl", _srcLocStart = SrcPos {_srcPosLine = 21, _srcPosCol = 23}, _srcLocEnd = SrcPos {_srcPosLine = 21, _srcPosCol = 26}}, _stxValue = Id "map"}} (Var 1252))
    (ExpandVar Ty {unTy = TFun (Ty {unTy = TMetaVar (MetaPtr 3294)}) (Ty {unTy = TFun (Ty {unTy = TMetaVar (MetaPtr 3280)}) (Ty {unTy = TSyntax})})} (SplitCorePtr 3295) Syntax {_unSyntax = Stx {_stxScopeSet = ScopeSet {_universalScopes = fromList [Scope {scopeNum = 307, scopePurpose = "Module root for /home/davidc/Code/Haskell/stuck-macros/examples/let.kl"},Scope {scopeNum = 308, scopePurpose = "For import at les/let.kl:3.1-4.1"},Scope {scopeNum = 309, scopePurpose = "For import at les/let.kl:4.1-5.1"},Scope {scopeNum = 477, scopePurpose = "For import at les/let.kl:5.1-6.1"},Scope {scopeNum = 478, scopePurpose = "For import at les/let.kl:6.1-8.1"}], _phaseScopes = fromList [(Phase {phaseNum = 0},fromList [Scope {scopeNum = 0, scopePurpose = "Root for phase p0"},Scope {scopeNum = 479, scopePurpose = "For macros at les/let.kl:8.1-27.1"}]),(Phase {phaseNum = 1},fromList [Scope {scopeNum = 3, scopePurpose = "Root for phase p1"},Scope {scopeNum = 482, scopePurpose = "For variable \"stx\""},Scope {scopeNum = 484, scopePurpose = "For variable \"_\""},Scope {scopeNum = 485, scopePurpose = "For variable \"args\""},Scope {scopeNum = 486, scopePurpose = "For variable \"body\""}])]}, _stxSrcLoc = SrcLoc {_srcLocFilePath = "/home/davidc/Code/Haskell/stuck-macros/examples/let.kl", _srcLocStart = SrcPos {_srcPosLine = 20, _srcPosCol = 33}, _srcLocEnd = SrcPos {_srcPosLine = 20, _srcPosCol = 36}}, _stxValue = Id "map"}} (Var 1252))
    (AwaitingMacro ExprDest (Ty {unTy = TMetaVar (MetaPtr 3199)}) (SplitCorePtr 3196) (TaskAwaitMacro Stx {_stxScopeSet = ScopeSet {_universalScopes = fromList [Scope {scopeNum = 307, scopePurpose = "Module root for /home/davidc/Code/Haskell/stuck-macros/examples/let.kl"},Scope {scopeNum = 308, scopePurpose = "For import at les/let.kl:3.1-4.1"},Scope {scopeNum = 309, scopePurpose = "For import at les/let.kl:4.1-5.1"},Scope {scopeNum = 477, scopePurpose = "For import at les/let.kl:5.1-6.1"},Scope {scopeNum = 478, scopePurpose = "For import at les/let.kl:6.1-8.1"},Scope {scopeNum = 479, scopePurpose = "For macros at les/let.kl:8.1-27.1"}], _phaseScopes = fromList [(Phase {phaseNum = 0},fromList [Scope {scopeNum = 0, scopePurpose = "Root for phase p0"}])]}, _stxSrcLoc = SrcLoc {_srcLocFilePath = "/home/davidc/Code/Haskell/stuck-macros/examples/let.kl", _srcLocStart = SrcPos {_srcPosLine = 16, _srcPosCol = 5}, _srcLocEnd = SrcPos {_srcPosLine = 16, _srcPosCol = 8}}, _stxValue = "let"} [(SplitCorePtr 3295),(SplitCorePtr 3287)] #[let.kl:30.10-33.24]
     <(let
       ((x (quote five)))
       (let ((x (quote two)) (y x)) (quasiquote ((unquote y) (unquote x)))))>))
    (GeneralizeType (SplitCorePtr 3196) _ _)
    (GeneralizeType (SplitCorePtr 3184) _ _)
    (AwaitingMacro ExprDest (Ty {unTy = TMetaVar (MetaPtr 3187)}) (SplitCorePtr 3184) (TaskAwaitMacro Stx {_stxScopeSet = ScopeSet {_universalScopes = fromList [Scope {scopeNum = 307, scopePurpose = "Module root for /home/davidc/Code/Haskell/stuck-macros/examples/let.kl"},Scope {scopeNum = 308, scopePurpose = "For import at les/let.kl:3.1-4.1"},Scope {scopeNum = 309, scopePurpose = "For import at les/let.kl:4.1-5.1"},Scope {scopeNum = 477, scopePurpose = "For import at les/let.kl:5.1-6.1"},Scope {scopeNum = 478, scopePurpose = "For import at les/let.kl:6.1-8.1"},Scope {scopeNum = 479, scopePurpose = "For macros at les/let.kl:8.1-27.1"}], _phaseScopes = fromList [(Phase {phaseNum = 0},fromList [Scope {scopeNum = 0, scopePurpose = "Root for phase p0"}])]}, _stxSrcLoc = SrcLoc {_srcLocFilePath = "/home/davidc/Code/Haskell/stuck-macros/examples/let.kl", _srcLocStart = SrcPos {_srcPosLine = 16, _srcPosCol = 5}, _srcLocEnd = SrcPos {_srcPosLine = 16, _srcPosCol = 8}}, _stxValue = "let"} [(SplitCorePtr 3295),(SplitCorePtr 3287)] #[let.kl:27.10-27.29]<(let ((x (quote five))) x)>))

and

#lang "n-ary-app.kl"

(import (shift "list-datatype.kl" 1))

(import (shift "lispy-do.kl" 1))

gives

Type mismatch at let.kl:21.23-21.26.
Expected (List META(MetaPtr 3303)) but got Syntax

which indicates that the wrong binding for map is being chosen (the one from list-datatype.kl instead of the one from list.kl).

I suspect that there's some state that needs to be cleared to the side when recursively expanding, but that this isn't happening.

macros should map between typed ASTs

A macro is currently defined by a function of type Syntax -> Macro Syntax. This requires encoding arguments to other macros as Syntax values, and then parsing them back into values in the callee. This also precludes passing opaque values such as functions and Macro actions to another macro.

Inspired by this Twitter discussion, I suggest to generalize: a macro should instead be a function of type a -> Macro b. This implies that the (generated) code would have regions, a bit like in PHP where some regions contain HTML, others contain PHP code, others contain CSS, JavaScript, etc. Each region uses a different type (Syntax, or some other alternative) to represent its AST. A macro of type a -> Macro b must appear in a region whose AST is b, and delimits a region inside of which the AST is a.

One day this might be directly exposed to the user via reader macros, but as a first step, let's say that the user writes a program using Syntax's s-expressions as a surface syntax, that every form in that program is a macro, and that once everything is expanded, the result is a Core program which uses the Core as its AST, not Syntax. In between macros may introduce various other intermediate AST types as they find it convenient. For example, a public-facing macro may accept a Syntax, but delegate most of the work to an auxiliary macro which takes in the part of the Syntax which remains to be processed plus a Macro Syntax action as an accumulator argument. No problem: that auxiliary function can take a (Syntax, Macro Syntax) as an argument, and return whatever AST the public-facing macro returns.

Error primitive should use strings, not syntax

After the merge of #82 we should be able to construct better errors using string-append.