nasso / koak

Kind Of Advanced Kaleidoscope

License: BSD 3-Clause "New" or "Revised" License

Haskell 93.03% Makefile 0.25% C 0.10% Python 6.63%

koak's Introduction

hiii 😋

im a 🐐🐉 who writes 🦀 while listening to 🐀👑

you can find me on 🦋, 🐦 and 🐘

koak's Issues

The parser should respect the precedence of the binary operators
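One common way to get this behaviour is precedence climbing. The sketch below is written in Python rather than in the project's Haskell, purely as an illustration; the precedence levels, the left-associativity of every operator and the tuple-based tree are assumptions, not the project's actual design.

```python
import re

# Hypothetical precedence table: a higher number binds tighter.
# Every operator is treated as left-associative (an assumption).
PRECEDENCE = {
    "<": 1, "<=": 1, ">": 1, ">=": 1,
    "+": 2, "-": 2,
    "*": 3, "/": 3,
}

TOKEN = re.compile(r"\s*(\d+|<=|>=|[-+*/<>()])")


def tokenize(src):
    """Split a source string into number, operator and parenthesis tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parse a token list into nested (operator, lhs, rhs) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        token = advance()
        if token == "(":
            inner = expression(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    def expression(min_prec):
        lhs = atom()
        # Keep consuming operators that bind at least as tightly as min_prec.
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # Parsing the right-hand side one level higher gives
            # left-associativity for operators of equal precedence.
            rhs = expression(PRECEDENCE[op] + 1)
            lhs = (op, lhs, rhs)
        return lhs

    tree = expression(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With this table, `parse(tokenize("1 + 2 * 3"))` groups the multiplication first, and `8 - 3 - 2` groups to the left.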

Add a way to split code into multiple files ("modules")

Right now, there's no way to call a function defined in another file. We should provide an import/export mechanism to split work across multiple files.

Figure out how files are "imported"
Figure out how definitions are "exported"
Figure out how a standard library might be provided using this feature
Write a specification for the "modules" feature
Implement it in the parser, analyser and compiler
Implement a small library for demonstration purposes

Integration tests are still run when compilation fails

When running integration tests with tests.py, the compiler is compiled with stack build. However, if the compilation fails, the tests are still run (somehow?).

To Reproduce
Steps to reproduce the behaviour:

Run ./tests.py at the root of a clean repository
See no compilation error and tests running
Modify any Haskell source file to introduce a compilation error
Run ./tests.py again
See compilation error but tests are still run

Expected behaviour
The compilation error prevents tests from being run.

Actual behaviour
Tests are run despite the compilation having failed.
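A possible fix is to check the exit status of the build before running anything else. The sketch below does not reflect how tests.py is actually written; `build_compiler`, `main` and the `run_tests` callback are hypothetical names used for illustration.

```python
import subprocess
import sys


def build_compiler(command=("stack", "build")):
    """Run the build command and report whether it exited with status 0."""
    result = subprocess.run(list(command))
    return result.returncode == 0


def main(build_command=("stack", "build"), run_tests=lambda: 0):
    """Build first; only run the tests if the build succeeded."""
    if not build_compiler(build_command):
        print("build failed, skipping integration tests", file=sys.stderr)
        return 1
    # run_tests is a placeholder for the existing test loop; it is
    # expected to return the process exit code.
    return run_tests()
```

The script would then end with `sys.exit(main())`, so a failed build also makes the test run itself fail.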

Add comments

Some people say they add "comments" to their code... No idea what they mean by that. We should support them, maybe. Or maybe not, I don't know.

// IEEE-vetted random number generator
// See RFC 1149.5
fn random(seed: i32): i32 {
  4
}

Add more primitive types

The set of primitive types should be expanded.

Type errors aren't easy to read

Right now, errors emitted by the type-checker are simply printed using their Show instance. This is "cringe".

We should provide information about the file, line, and syntactic constructs related to the error. It would be even better to show more context and help/suggestions that can better explain what is wrong with the code.

Currently, the AST does not store any information about the location of the tokens in the source code. This is required to produce error messages after parsing.
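To make the idea concrete, here is a minimal sketch of what a source location and a rendered error could look like. It is in Python for brevity, not Haskell; `Span`, `render_error`, the file name and the message wording are all hypothetical and only meant to show which information the AST would need to carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical source location attached to an AST node."""
    file: str
    line: int    # 1-based line number
    column: int  # 1-based column of the first character
    length: int  # number of characters covered


def render_error(source, span, message):
    """Format an error with the offending line and a caret underline."""
    text = source.splitlines()[span.line - 1]
    gutter = f"{span.line} | "
    # Pad by the gutter width plus the column offset so the carets
    # line up under the offending characters.
    underline = " " * (len(gutter) + span.column - 1) + "^" * span.length
    header = f"{span.file}:{span.line}:{span.column}: error: {message}"
    return "\n".join([header, gutter + text, underline])
```

Calling `render_error(source, Span("main.koa", 2, 3, 4), "...")` prints the header, the second line of `source`, and four carets under columns 3 to 6.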

Add the ability to omit the `;` in some situations

Sometimes a ; feels out of place:

if foo {
  bar();
};

It would be nice to be able to omit it when an if, while, for or any other control structure is used as a statement instead of an expression (basically wherever a statement is expected).

if foo {
  bar();
}

This could be implemented by trying to parse such structures before trying to parse an expression wrapped into an SExpr.

The parser needs to parse Binop operations

We need a complete parser for binary operations (+, -, *, /, >, >=, <, <=).

We can't run one specific integration test

There's currently no way to run a single integration test (or a group of tests). Being able to do so would avoid running the whole test suite every time, which will become very slow as the suite grows larger.
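One way this could work is to let the runner take name patterns on the command line and filter the test list with them. The sketch below assumes tests are identified by a string name such as `binop/add`; that naming scheme and the `select_tests` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def select_tests(all_tests, patterns):
    """Keep the tests whose name matches at least one shell-style pattern.

    An empty pattern list selects every test, so the default behaviour
    of running the whole suite is preserved.
    """
    if not patterns:
        return list(all_tests)
    return [name for name in all_tests
            if any(fnmatchcase(name, pattern) for pattern in patterns)]
```

The runner could then pass `sys.argv[1:]` as `patterns`, so that something like `./tests.py 'binop/*'` only runs the matching group.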

Parser MVP

We need a complete parser able to parse the MVP grammar defined in grammar.min.w3c-ebnf.

AST type definitions

Add more advanced types

"Weak" type aliases (type a ~ b, where a and b can be used interchangeably)
"Strong" type aliases (type a = b, where a and b are semantically distinct types, aka "newtype")
Tuples ((a, b, c, ...), the special case of the empty tuple () is already implemented)
Structures
Arrays
Algebraic data types
Pointers?
References?

With all that, it might become interesting to have some more advanced type-system features, such as traits, type classes, generics, move semantics, ownership/borrowing, polymorphism...

No strings?

There should be a "built-in" string type. We will also probably want to support string literals. This probably implies some sort of char type? Of course, strings should be UTF-8. I think. Good luck!

We should provide a small tour/guide/quick start to use the language

It can't do much for now, so there isn't a lot to say, but some documentation that explains how to use the language, its semantics, and how to compile programs written in it would be a nice thing to have.

GitHub's Wiki feature can be used for that.

Add a link to the wiki in the README
Explain how to get the compiler
Basic "Hello World!" tutorial
Variables (mutability, primitive types, scopes...)
Functions and arguments
Control flow (if, else, while, for...)

Add the ability to write `return;` instead of `return ();`

Having to write return (); is a bit verbose; most languages allow omitting the return value in functions returning nothing.

Add a Foreign-Function Interface

We should provide a way to use functions that are defined in a separate library (e.g. to use write from the libc). These functions should only have to be "declared" (not defined), and their name shouldn't be mangled.

Come up with a syntax
Add support for "external" function declaration (without a definition) in the IRs
Implement the feature in the parser, static analyser and compiler

Add command-line flags to keep source information in the binary

Something like cc -g would greatly help for debugging. No idea what implementing this implies, but it might have some work in common with #85.

Analyser MVP

Add project skeleton

Establishing a common base upon which the rest of the language can be implemented will allow us to rapidly develop an MVP by efficiently parallelising work.

Dead-code elimination

Right now, dead LLVM IR code is still being generated. The analyser could remove a lot of it by performing constant propagation:

fn main() {
  while false { <dead code> }

  if false { <dead code> }

  let a = false;

  if a && { <dead code> } {
    <dead code>
  } else {
    let b = !a;

    if b { } else { <dead code> }
  }

  return ();
  <dead code>
}

Note however that it should be possible to skip this optimisation step, as most (all?) integration tests can be reduced to a single constant value at compile-time! Ideally, the test suite would be run with and without such optimisations.
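As a rough illustration of the simplest cases above, the following Python sketch removes branches guarded by a literal `true`/`false` and anything following a `return`. The tuple-based statement representation is made up for this example and is not the project's IR; propagating constants through `let` bindings (the `a` and `b` cases) is deliberately left out, and splicing a branch body into its parent block ignores scoping.

```python
def eliminate(block):
    """Return a copy of `block` without statically unreachable statements.

    Statements are tuples: ("while", cond, body), ("if", cond, then, else),
    ("return",) or anything else, which is kept as-is. A condition is either
    the literal True/False or an opaque value that is not analysed.
    """
    result = []
    for stmt in block:
        kind = stmt[0]
        if kind == "while":
            _, cond, body = stmt
            if cond is False:
                continue  # the loop body can never run
            result.append(("while", cond, eliminate(body)))
        elif kind == "if":
            _, cond, then, otherwise = stmt
            if cond is True:
                result.extend(eliminate(then))
            elif cond is False:
                result.extend(eliminate(otherwise))
            else:
                result.append(
                    ("if", cond, eliminate(then), eliminate(otherwise)))
        else:
            result.append(stmt)
        if result and result[-1][0] == "return":
            break  # nothing after an unconditional return can run
    return result
```

Making this pass optional would then amount to calling `eliminate` only when optimisations are enabled.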

Add `==` and `!=` comparisons for empty type

They are constant, but make sense! I don't know if it should be handled by the compiler or the analyser though?

Compiler MVP

A compiler able to generate LLVM assembly or bitcode for basic programs with an arbitrary exit code (to verify program behaviour).

A way to get command-line arguments

Add a way to get command-line arguments, something like:

import args from "sys"

fn main() {
  let a = args.get(1);
  let b = args.get(2);

  let a = i32.parse(a);
  let b = i32.parse(b);

  print(a + b);
}

Command line interface

We should provide a CLI to parse, analyse and compile programs. We could also link object files together by calling an external linker such as clang's, ld or link.exe.

Add a way to chain conditional branches

Currently, conditional branches cannot be chained using a form of else if/elif construct. It would be nice to have a way to write something like:

if a {
  x
} else if b {
  y
} else {
  z
}

And be semantically equivalent to:

if a {
  x
} else {
  if b {
    y
  } else {
    z
  }
}
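This equivalence suggests the parser could desugar the chain directly into nested conditionals, so that later stages never see an else if. A minimal sketch in Python, using a hypothetical tuple representation `("if", cond, then, else)` rather than the project's AST:

```python
def desugar(branches, fallback):
    """Turn [(cond, body), ...] plus a final else body into nested ifs.

    Each remaining branch becomes the else part of the one before it.
    """
    if not branches:
        return fallback
    (cond, body), rest = branches[0], branches[1:]
    return ("if", cond, body, desugar(rest, fallback))
```

For the example above, `desugar([("a", "x"), ("b", "y")], "z")` yields `("if", "a", "x", ("if", "b", "y", "z"))`.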

The command line interface should provide a way to compile and link object files

Motivation

Most compilers (like gcc or clang) provide a way to compile and link a program consisting of many source files.
They actually just issue a call to an external linker (such as ld, lld, gold or mold) with the correct arguments (library search paths, libraries to link to, whether to link statically or dynamically...).

Possible implementation

We should provide a similar interface: koak a.koa b.koa should compile a.koa and b.koa, and then link them together to produce an executable a.out binary. If neither a.koa nor b.koa defines an entry point (fn main), a linker error can be produced (unless a shared library is to be created).

Ideally, it should be possible to link to static libraries too (foo.a). This could be achieved by specifying -l/-L arguments or by directly specifying the path to foo.a as part of the input files:

koak main.koa -lfoo -L./lib
koak main.koa lib/foo.a

gcc/clang can also take object files to link them directly (skipping compilation). This is useful to perform incremental compilation (only compiling object files when their source has changed). koak a.o b.o should do the same as koak a.koa b.koa.
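The input handling described above could be sketched as a first pass that sorts the command-line arguments into what needs compiling, what goes straight to the linker, and which flags are forwarded. This is a Python illustration only; `classify` and the shape of the returned plan are hypothetical, and only the -l/-L flags and the .koa, .o and .a extensions mentioned above are recognised.

```python
from pathlib import PurePath


def classify(arguments):
    """Sort command-line arguments into compile inputs, link inputs and flags."""
    plan = {"compile": [], "link": [], "linker_flags": []}
    for argument in arguments:
        suffix = PurePath(argument).suffix
        if argument.startswith(("-l", "-L")):
            plan["linker_flags"].append(argument)  # forwarded to the linker
        elif suffix == ".koa":
            plan["compile"].append(argument)       # compiled, then linked
        elif suffix in (".o", ".a"):
            plan["link"].append(argument)          # linked as-is
        else:
            raise ValueError(f"unrecognised input: {argument}")
    return plan
```

The objects produced from the `compile` list would then be appended to `link` before invoking the external linker.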

Entry point

The entry point of a Koa program is the main function. However, this isn't the actual entry point of a program on many platforms (e.g. it's _start on Linux). One way to make main the "entry point" is to link the object files to a small (platform dependent) runtime (here written in x86_64 assembly):

bits 64

section .text
  global _start
  extern main

_start:
  xor rax, rax  ; clear rax
  call main     ; call main
  mov rdi, rax  ; save return value (if main returns (), rax will stay 0)
  mov rax, 60   ; 60 = exit
  syscall       ; exit

This could also be included as LLVM inline assembly in modules containing a main function.