
Ravi Programming Language

Ravi is a dialect of Lua with limited optional static typing, and features a JIT compiler powered by MIR as well as support for AOT compilation to native code. The name Ravi comes from the Sanskrit word for the Sun. Interestingly, a precursor to Lua was Sol, which had support for static types; Sol means the Sun in Portuguese.

Lua is perfect as a small embeddable dynamic language, so why a derivative? Ravi extends Lua with static typing for improved performance when JIT compilation is enabled. However, the static typing is optional, and therefore Lua programs are also valid Ravi programs.
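The paragraph above can be illustrated with a small sketch of Ravi's optional annotations (the number annotation is a documented Ravi feature; the function itself is made up for illustration):

```lua
-- Ravi type annotations: plain Lua would reject this syntax,
-- but in Ravi the JIT can use the declared types to emit
-- unboxed floating-point arithmetic.
local function scale(x: number, factor: number)
  return x * factor
end
print(scale(2.0, 1.5))
```

Because the annotations are optional, removing them leaves a valid Lua 5.3 program.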

There are other attempts to add static typing to Lua - e.g. Typed Lua - but these efforts are mostly about adding static type checks to the language while leaving the VM unmodified. The Typed Lua effort is very similar to the approach taken by TypeScript in the JavaScript world: the static typing is there to aid programming in the large, and the code is eventually translated to standard Lua and executed in the unmodified Lua VM.

My motivation is somewhat different - I want to enhance the VM to support more efficient operations when types are known. Type information can be exploited by JIT compilation technology to improve performance. At the same time, I want to keep the language safe and therefore usable by non-expert programmers.

Of course there is the fantastic LuaJIT implementation. Ravi has a different goal compared to LuaJIT: it prioritizes ease of maintenance and support, language safety, and compatibility with Lua 5.3 over maximum performance. For a more detailed comparison, please refer to the documentation links below.

Features

Articles about Ravi

Documentation

Lua Goodies

Lua 5.4 Position Statement

Ravi's relationship to Lua 5.4 is as follows:

  • Generational GC - back-ported to Ravi.
  • New random number generator - back-ported to Ravi.
  • Multiple user values can be associated with userdata - under consideration.
  • <const> variables - not planned.
  • <close> variables - Ravi has a 'defer' statement, which in my opinion is the better option; hence there are no plans to support <close> variables.
  • Interpreter performance improvements - these benefit the Lua interpreter but not the JIT backends, hence there is not much point in back-porting them.
  • Table implementation changes - under consideration.
  • String to number coercion is now part of string library metamethods - back-ported to Ravi.
  • utf8 library accepts codepoints up to 2^31 - back-ported to Ravi.
  • Removal of compatibility layers for 5.1 and 5.2 - not implemented, as Ravi continues to provide these layers as per Lua 5.3.
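Since the 'defer' statement is mentioned above as Ravi's alternative to <close>, here is a minimal sketch of how it reads (based on my understanding of the documented syntax; the file handling is illustrative):

```lua
-- 'defer <statements> end' schedules the block to run when the
-- enclosing scope exits, whether normally or via an error.
local function readall(name)   -- illustrative function
  local f = assert(io.open(name, "r"))
  defer f:close() end          -- f is closed on every exit path
  return f:read("a")
end
```

Unlike <close>, no metamethod on the value is needed; the cleanup action is written at the point where the resource is acquired.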

Compatibility with Lua 5.3

Ravi should be able to run all Lua 5.3 programs in interpreted mode, but the following should be noted:

  • Ravi supports optional typing and enhanced types such as arrays (see the documentation). Programs using these features cannot be run by standard Lua. However all types in Ravi can be passed to Lua functions; operations on Ravi arrays within Lua code will be subject to restrictions as described in the section above on arrays.
  • Values crossing from Lua to Ravi will be subjected to type checks if they are assigned to typed variables.
  • Upvalues cannot subvert the static typing of local variables (issue #26) when types are annotated.
  • Certain Lua limits are reduced due to changed byte code structure. These are described below.
  • Ravi uses an extended bytecode which means it is not compatible with Lua 5.x bytecode.
  • Ravi incorporates the new Generational GC from Lua 5.4, hence the GC interface has changed.
  Limit name      Lua value  Ravi value
  MAXUPVAL        255        125
  LUAI_MAXCCALLS  200        125
  MAXREGS         255        125
  MAXVARS         200        125
  MAXARGLINE      250        120

When JIT compilation is enabled, the following additional constraints apply:

  • Ravi will only execute JIT-compiled code from the main Lua thread; any secondary threads (coroutines) execute in interpreter mode.
  • In JIT-compiled code, tail calls are implemented as regular calls, so unlike the interpreter VM, which supports infinite tail recursion, JIT-compiled code only supports tail recursion to a depth of about 110 (issue #17).
  • The debug API and hooks are not supported in JIT mode.

History

  • 2015
  • 2016
  • 2017
    • Embedded C compiler using dmrC project (C JIT compiler) (now discontinued)
    • Additional type-annotations
  • 2018
  • 2019
    • New language feature - defer statement
    • New JIT backend MIR.
  • 2020
  • 2021
  • Current Priorities
    • Improve Embedded C support with more validation
    • Improve tests and documentation overall
    • Ensure new compiler is production grade (i.e. always generates correct code)

License

MIT License

ravi's People

Contributors

dibyendumajumdar, mingodad, niaow, wdv4758h, xmiliah


ravi's Issues

Array initializer with multiple items fails

This is because Lua sets these values from the largest index to the smallest - presumably to optimise performance. But we need to do the reverse when setting Ravi arrays, as we only grow the array when the len+1 element is set.
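A minimal sketch of the failing case, assuming the documented integer[] array syntax (the values are arbitrary):

```lua
-- A Ravi typed-array initializer with multiple items.
-- Internally the slots must be filled from index 1 upwards,
-- because the array only grows when slot len+1 is written -
-- the reverse of the order standard Lua uses for table
-- constructors.
local a: integer[] = {10, 20, 30}
print(a[1], a[2], a[3])
```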

Too much memory being consumed during some tests

When auto compiling all Lua functions, the memory usage increases greatly during the running of the Lua tests. The problem became apparent when running tests with -port=true; previously I was running tests with -U=true.

The issue seems to be that in JIT mode each compiled function has an associated LLVM module, execution engine, and generated code. But Lua's garbage collector is unaware of the size of these function objects - hence the rate at which they are collected / finalized is not fast enough.

Putting a call to lua_gc(L, LUA_GCSTEP, 200) seems to help but then causes tests in gc.lua to fail.

For now I have put in a call to lua_gc(L, LUA_GCCOLLECT, 0) after every n JIT compilations - n can be configured. This is crude but appears to limit the amount of memory being used. A better solution is required - one that doesn't cause a full GC.

Crash on Windows when throwing error

On Windows there is occasionally a crash when Lua attempts a longjmp because an error has been raised. Windows complains about a misaligned or invalid stack.

The error does not occur on Linux or Mac OS X (Yosemite).

Incorrect compilation of expression like (10)[3]

In Lua test events.lua line 395 there is an expression such as:

(10)[3]

This is incorrectly identified as being of type integer, as the sub-expression's type is propagated.

Because even primitive types can have a metatable assigned via the debug API, the above snippet is actually valid Lua code.
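For illustration, a pure-Lua sketch showing why the snippet is valid: debug.setmetatable (unlike plain setmetatable) can attach a metatable to a primitive type such as numbers:

```lua
-- Give all numbers an __index metamethod via the debug API.
debug.setmetatable(0, { __index = function(n, k) return n + k end })
-- The expression from events.lua now resolves through __index:
print((10)[3])  -- prints 13
```

The static type of (10)[3] therefore cannot be assumed to be integer just because the sub-expression's type is.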

Implement slices of arrays

Slices will allow efficient access to a subset of an array; for example, each column of a matrix could be accessed via a slice.

Initial work checked in - outstanding issues:
a) Need to move the API to the Ravi module
b) After a slice has been created, if the original array is reallocated the slice will become invalid. A simple solution is to disallow size changes after slices are created
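A hypothetical sketch of how a column slice might be used once the API moves to the Ravi module - ravi.slice and its (array, start, length) signature are assumptions here, not the final API:

```lua
-- 3x3 column-major matrix stored in a Ravi number array
-- (table.numarray is Ravi's documented array constructor).
local m: number[] = table.numarray(9, 0.0)
-- Hypothetical: view the second column (elements 4..6)
-- without copying.
local col2 = ravi.slice(m, 4, 3)
col2[1] = 1.5  -- would write through to m[4]
```

Note that the invalidation problem from (b) applies here: if m were reallocated after the slice is taken, col2 would dangle unless resizing is disallowed.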

Enhance performance of FORNUM loops

Performance can be improved by generating different opcodes for the common case of an integer for loop where the step is a constant and therefore known to be <0 or >0.
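The common case the issue refers to, sketched in Ravi syntax (the function is illustrative):

```lua
-- The implicit step of 1 is a compile-time constant known to
-- be > 0, so the loop-direction check can be dropped and a
-- specialised integer FORNUM opcode emitted.
local function sumto(n: integer)
  local total: integer = 0
  for i = 1, n do
    total = total + i
  end
  return total
end
```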

The array type coercion operations are not working in JIT compiled code

The following opcodes are misbehaving in JITed code:

OP_RAVI_TOARRAYI
OP_RAVI_TOARRAYF
OP_RAVI_MOVEAI
OP_RAVI_MOVEAF

There are two bugs: first, the wrong variable is used in a comparison; second, the type of a table is LUA_TTABLE | BIT_ISCOLLECTABLE = 69, so the type check always fails.

Implement NaN tagging approach for Values to improve performance of floating point variables

Lua 5.3 uses a separate type code field in values to identify types. This means that when performing arithmetic operations on numbers (doubles) there is a need to update two fields: the value and the type.

The overhead of updating the type field can be reduced for floating point operations by using the NaN tagging approach. Basically, we can split the value into two parts.

The first part will be a double, holding either an actual double or a type code encoded in a NaN.
The second part will be the actual value.

Fix tbaa meta data

I misunderstood the tbaa metadata usage - it seems that when loading or storing primitive types, the metadata should be set to a path node, e.g.

store double %3, double* %5, align 8, !tbaa !11

!11 = metadata !{metadata !12, metadata !12, i64 0}
!12 = metadata !{metadata !"double", metadata !3, i64 0}

automate tests with CI

You are implementing opcodes and changing the code, so something can be accidentally broken. Since Lua itself and Ravi have some tests, it is worth using a continuous integration platform for this project.

You can use lua-travis-example as an example.

Debug API does not work in JIT mode

The main issue is that the 'savedpc' in the call frame is not updated in JIT code. Updating it on every bytecode instruction would inhibit optimization. However, one potential solution is to update it when one of the following happens:
a) OP_CALL or OP_TAILCALL is invoked
b) A metamethod is invoked
c) Apart from the above, we also need to handle hooks. Since this adds a large overhead (a function call), we need to enable it only conditionally

Event        Implementation Status
OP_CALL      Done
OP_TAILCALL  Done
OP_TFORPREP  Done
__add        Done
__sub        Done
__mul        Done
__div        Done
__mod        Done
__pow        Done
__unm        Done
__idiv       Done
__band       Not JITed
__bor        Not JITed
__bxor       Not JITed
__bnot       Not JITed
__shl        Done
__shr        Done
__concat     Done
__len        Done
__eq         Done
__lt         Done
__le         Done
__index      Done
__newindex   Done
__call       Done
Hook         Done

Do not hard-code the optimization level

You currently seem to hardcode the LLVM optimization level:

llvm::PassManagerBuilder pmb;
pmb.OptLevel = 1;
pmb.SizeLevel = 0;

Maybe it is a good idea to provide a command-line parameter or some other means of configuring it?
Have you tried OptLevel=2 or OptLevel=3? I can imagine that they require more time for JITting, but maybe they produce much better code?

Generate inline code for comparison operations when types are known

Currently all comparison ops call a function, which means that comparisons are inefficient; I think they particularly affect the performance of benchmarks such as fannkuchen.

When types are known and both types are the same (the common case), we should be able to generate more specialised comparison opcodes that can be inlined.
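A sketch of the common case in Ravi syntax (the function is illustrative): both operands of the comparison are statically known to be numbers, so the JIT could emit an inline floating-point compare instead of calling into the VM's generic comparison routine:

```lua
local function count_less(a: number[], limit: number)
  local n: integer = 0
  for i = 1, #a do
    -- number < number: a candidate for an inlined,
    -- specialised comparison opcode
    if a[i] < limit then
      n = n + 1
    end
  end
  return n
end
```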

How to infer static types for logical operators

The type of the logical operators and/or is determined by their arguments, so this poses a problem when determining the return type statically.

Possible solutions:
We could specialise for situations where both operands of the boolean operator are of known type, because in this case the resulting type will be the same as that of the operands. However, when the operands are of different types, the resulting type can be determined at compile time only if the comparison is between constants.
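A pure-Lua illustration of the problem: and/or return one of their operands, so the result type follows the runtime values rather than anything knowable statically:

```lua
print(1 and "x")     -- "x": first operand truthy, returns the second
print(false or 2.5)  -- 2.5: first operand falsy, returns the second
print(nil and "x")   -- nil: first operand falsy, returns the first
```

Only when both operands share a known type (or are constants) is the result type fixed at compile time.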

Generated LLVM is inefficient

When I look at the dumped LLVM IR, it looks pretty inefficient.

Let me try to explain what I mean:

One of the main reasons seems to be the fact that Ravi currently very closely models how the Lua VM works with a stack. As a result, Ravi JIT always loads/stores operands of any opcode from/to memory, which prohibits many LLVM optimizations. While this is required upon entering a JITted function or leaving it (e.g. calling into non-JITted code or returning), I'm not sure it is required inside the function being JITted.

Overall, I think it is important to closely model LuaVM stack only when there is a chance it is being "observed" from outside of a JITted function and the stack is expected to be exactly as if it was prepared by LuaVM.

But if you have a temporary value or a local variable which is not "observable" from outside, you are free to work with it without mapping it to a LuaVM stack. You can most likely simply model it as an "alloca" (of a TValue?) in LLVM IR and perform operations directly on it.

Eventually, you may even try to map all incoming LuaVM values to local LLVM "alloca" upon entering a function and store them back to the LuaVM stack upon leaving a function.

IMHO, keeping values in LLVM IR %-values would allow LLVM to perform common-subexpression-elimination, hoisting of values out of loops, load-store optimizations, copy forwarding and much more.

Maybe my view is rather naive and I am missing an obvious reason why this cannot be done and one should always closely model the Lua VM stack inside the JITted function.

A Lua test in calls.lua (5.3.0 test pack) crashes on Windows

LLVM version 3.6.0
Windows 8.1 64-bit
VS 2013

The crash occurs in the following code in calls.lua:

function err_on_n (n)
  if n == 0 then error(); exit(1);
  else err_on_n(n-1); exit(1);
  end
end

do
  function dummy (n)
    if n > 0 then
      assert(not pcall(err_on_n, n))
      dummy(n-1)
    end
  end
end

-- FIXME - this causes a fault
dummy(10)
