gilbo / ebb

DSL for physical simulation

Home Page: http://ebblang.org

License: Other

Lua 0.43% Perl 14.09% Perl 6 1.61% Makefile 0.50% C++ 3.74% C 1.27% Python 0.20% Shell 0.10% Cuda 0.23% Terra 77.84%

ebb's Issues

LVector __tostring

It would be nice to be able to easily pretty-print LVectors from Lua. This is a low priority enhancement.

Hook to explicitly invoke compilation

The rub is that the user might need to provide some extra information to make sure we're fully typed when this happens

Phase detection in semantic checking

Semantic checker should be updated to keep track of how global Liszt objects are being accessed in the kernel. This will require:

Semantic checker needs to be able to detect reductions. Check runtime/common/liszt_runtime.h for the list of reduction phases we support.

(Note that L_REDUCE_PLUS is used for both plus and minus, and L_REDUCE_MULTIPLY is used for both multiplication and division.)

Right now we will only detect reductions if they are in the form "lisztobj = lisztobj op expr". Specifically, the global variable being reduced must show up as the left-hand side of the binary operation.

Semantic checker needs to be able to detect when a global liszt object changes phase within a kernel and should issue an error.
Semantic checker should produce a symbol table that maps all fields / scalars to their phase in the kernel, to be used for codegen. The symbol table should be indexed by the objects themselves instead of their names so that we don't inadvertently miss phase errors because someone referred to the same object with two different variables in the global scope.
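
A rough model of such a phase table, written in Python purely for illustration (the real change belongs in the Terra/Lua semantic checker). PhaseTable, record, phase_of, Field and the "READ" phase are hypothetical names; only L_REDUCE_PLUS and L_REDUCE_MULTIPLY come from the issue text. The point of the sketch is that keying by object identity makes two aliases of one field land on the same entry.

```python
# Hypothetical model of a phase table keyed by object identity, not by name.
REDUCE_PHASE = {            # operator -> reduction phase (names from the issue)
    "+": "L_REDUCE_PLUS", "-": "L_REDUCE_PLUS",
    "*": "L_REDUCE_MULTIPLY", "/": "L_REDUCE_MULTIPLY",
}

class PhaseError(Exception):
    pass

class PhaseTable:
    def __init__(self):
        # id(obj) -> (obj, phase); storing obj keeps it alive so ids stay unique
        self._phases = {}

    def record(self, obj, phase):
        """Record an access; raise if the object changes phase in the kernel."""
        key = id(obj)
        if key in self._phases and self._phases[key][1] != phase:
            raise PhaseError(
                f"object used in phase {self._phases[key][1]} and {phase}")
        self._phases[key] = (obj, phase)

    def phase_of(self, obj):
        return self._phases[id(obj)][1]

class Field:                # stand-in for a global Liszt object
    pass

temperature = Field()
alias = temperature         # same object under a second global name

table = PhaseTable()
table.record(temperature, REDUCE_PHASE["+"])
table.record(alias, REDUCE_PHASE["-"])      # still L_REDUCE_PLUS: no error
try:
    table.record(alias, "READ")             # hypothetical phase: conflict
    conflict_detected = False
except PhaseError:
    conflict_detected = True
```

A name-keyed table would have treated temperature and alias as two unrelated entries and missed the conflict.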

Discuss Polymorphic Functions

Fix Phase Checking

Remove ExprStatement

Arbitrary expression statements are not allowed in either lua or terra and, since we haven't yet decided on how we are going to allow function calls from a liszt kernel, we should just remove it for now.

Vector arithmetic in lua

The vector class should be able to support arithmetic operations in lua scope (e.g. 'v1 + v2' should return a new Vector, etc.) The Vector metatable should be modified so that Vector objects support all vector operators inline.

Operator type checking:
'+' and '-' should be vector [op] vector -> vector operations
'*' is vector [op] number -> vector, or number [op] vector -> vector
'/' is vector [op] number -> vector (you cannot divide a number by a vector)

Other useful methods:
Vector.norm(v) -> returns the 2-norm of v (can also be called as v:norm())
Vector.dot(v1, v2) -> returns v1 . v2 (can also be called as v1:dot(v2))
Vector.normalize()
Vector[i] should return/write to the i'th element of the vector
Vector.init({x, x, x}) or some such.
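
The operator rules above can be sketched as follows. This is a Python model for illustration only; in Lua the same checks would live in the Vector metatable (__add, __sub, __mul, __div, __index, __newindex). The class layout and error messages are assumptions, and indexing here is 0-based, unlike Lua.

```python
import math
from numbers import Number

class Vector:
    def __init__(self, data):
        self.data = [float(x) for x in data]

    def _check(self, other):
        if not isinstance(other, Vector) or len(other.data) != len(self.data):
            raise TypeError("expected a Vector of the same length")

    def __add__(self, other):       # vector + vector -> vector
        self._check(other)
        return Vector(a + b for a, b in zip(self.data, other.data))

    def __sub__(self, other):       # vector - vector -> vector
        self._check(other)
        return Vector(a - b for a, b in zip(self.data, other.data))

    def __mul__(self, k):           # vector * number -> vector
        if not isinstance(k, Number):
            return NotImplemented
        return Vector(a * k for a in self.data)

    __rmul__ = __mul__              # number * vector -> vector

    def __truediv__(self, k):       # vector / number only; no reflected form,
        if not isinstance(k, Number):   # so number / vector is rejected
            return NotImplemented
        return Vector(a / k for a in self.data)

    def __getitem__(self, i):
        return self.data[i]

    def __setitem__(self, i, x):
        self.data[i] = float(x)

    def dot(self, other):
        self._check(other)
        return sum(a * b for a, b in zip(self.data, other.data))

    def norm(self):                 # 2-norm
        return math.sqrt(self.dot(self))
```

Every operator returns a new Vector rather than mutating an operand, which matches the "'v1 + v2' should return a new Vector" requirement.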

Are Vega examples being slowed down by GPU round-trip blocking/latency?

Float constants

Add ability to declare a single-precision floating point constant as e.g. 0.5f.

Some alternate options:

type all non-integral numeric constants as single-precision by default
infer types of numeric constants (this could get real complicated...)

Also worth considering, regardless of the above:

add an L.NewConstant(...) statement that mirrors the L.NewGlobal declaration. This would allow us to type values from the Lua Scope while still letting the compiler specialize them into the code.

Port lulesh to liszt

Lulesh site: https://codesign.llnl.gov/lulesh.php

We have been asked by our LANL friends to port lulesh to liszt-in-terra. (For reference, there is an existing implementation in Liszt-in-scala.)

Verify out-of-order test case output on GPU

Print statements on the GPU are not guaranteed to come out in any order with respect to thread_id, block_id, etc. Thus, we need a more intelligent diff that can verify that the output of a GPU test case has exactly one of each line of the test output, in any order, and nothing else, in order to verify the results of GPU tests.
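
One possible shape for such a diff is a multiset comparison of lines. The sketch below is in Python and is not tied to the existing test harness; same_lines_any_order and explain_mismatch are hypothetical helper names.

```python
from collections import Counter

def same_lines_any_order(actual: str, expected: str) -> bool:
    """True when both outputs contain exactly the same lines with the
    same multiplicities, regardless of order."""
    return Counter(actual.splitlines()) == Counter(expected.splitlines())

def explain_mismatch(actual: str, expected: str):
    """Return (missing, unexpected) line counts for a failure report."""
    a, e = Counter(actual.splitlines()), Counter(expected.splitlines())
    return e - a, a - e
```

Comparing counts rather than sets matters here: a line printed twice by the GPU when it is expected once should still fail the test.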

Add semantic checking for writes to lua scope

Semantic checker should report an error when an object from Lua scope that is not of a Liszt data type (Scalar or Field) is written to.

Vector type during semantic checking

Vector type has kind "vector" currently. Should change this and handle it accordingly during semantic checking, since a vector type should not be an actual vector.

Enforce Liszt function parameter type annotations

When we added function polymorphism and replaced kernels with just plain Liszt functions, we also introduced support in the parser for optionally type annotating any argument to a Liszt function. However, these type annotations are currently ignored.

Work Item:
Add support to the type-checker to complain when these explicit annotations are violated.

3d grid

We want a 3d grid implementation. This should be a fairly straightforward adaptation of the 2d grid code. I recommend against trying to somehow generalize the 2d/3d grid code into one form. While abstracting across 2d/3d will reduce code duplication, the implementation will also get considerably more difficult to read. (my prediction)

Simple "Literal Analysis" in Typechecker

We want to have a way to throw type-checking errors when certain accesses use non-constant values.

We want this for Affine-Indexing, and also for indexing into vectors and matrices. In the latter case, we want to be able to have loops over constant ranges generate indices with known bounds (i.e. not constant, but something we can certify is ok to index with statically).

Also, this issue may be related to #34 which handles a special form of constants: strings, which can't be manipulated with computation/arithmetic anyway.

Efficient Field Polymorphism

The initial proposal for field polymorphism ( #34 ) suggests fixing field names/identities via typechecking. This is fine. However, we would eventually like to be able to avoid re-typechecking / re-compiling when field names/identities change but their types/parent relationships remain unchanged.

(As an example, this is an important step to being able to efficiently support temporary fields; since it's not reasonable to re-compile every time the identity of a temporary field changes.)

Periodic Boundary Conditions

There are multiple approaches to solving this problem. Pick one and implement it.

We did decide that it would be better to make whatever features we add for this modal. That is, the user needs to explicitly turn on periodic boundary condition support in the code by calling some method. This allows us to capture that user choice and throw an error if we're being deployed on GPU or Cluster. This way we can choose not to implement periodic boundary conditions on non-CPU runtimes if we want to.

HIGH PRIORITY: Ivan asked for this.

String Literals for format strings in the Print statement

See the other issue on Field Polymorphism #34 . This issue depends on it.

Once support for string literals is added we'd like to support some of printf string formatting in Liszt.

Maybe introduce novel syntax (e.g. %v, %m ? are those conflicted?) for printing out vector or matrix values? (Don't get caught up on this though)

Fix example FEM codes

The hexIntForce and rectangleSqueeze examples are broken because they rely on a deprecated field storage implementation. These should be fixed as time permits.

GPU Insert/Delete

This requires building some kind of GPU write-buffer for the INSERT support. DELETE support could be improved by having a GPU-resident compaction kernel, but Defragging can maybe be ignored for the time being?

Suggested design: (Suppose a function has insert statements)

Each warp/block should set up a local write-buffer in shared memory.
Whenever (a) that write-buffer is full or (b) the block reaches the end of execution, the local write-buffer should be flushed into a global write-buffer, probably immediately at the end of the relation's fields.
Space in the global write buffer should be requested/reserved by performing a GPU atomic addition on a global write-buffer tail pointer/counter value.
We will need to have some kind of expensive re-allocation and copy step if/when the global write-buffer fills up. For an experimental implementation, we can probably get away with just crashing when this happens.

The above design seems optimal, up to some slight variations, for the following reasons:

Without local write-buffers, the whole execution will have very bad global contention at every insert statement on the global buffer counter, which is likely to be very inefficient.
If write space is statically provisioned prior to execution rather than relying on some kind of synchronization, we'll have to allocate an unreasonable and pessimistically large amount of space to potentially write into.
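
To make the contention argument concrete, here is a sequential CPU-side model of the suggested design, in Python. It is a sketch only: GlobalWriteBuffer, LocalWriteBuffer and their methods are hypothetical names, reserve() stands in for the GPU atomic addition on the tail counter, and a plain list stands in for shared memory.

```python
# CPU-side model of the buffering scheme; all names are hypothetical.
class GlobalWriteBuffer:
    def __init__(self, capacity):
        self.rows = [None] * capacity
        self.tail = 0               # stands in for the atomic tail counter
        self.reservations = 0       # how often the counter was touched

    def reserve(self, n):
        """Model of an atomic fetch-and-add on the tail counter."""
        start = self.tail
        if start + n > len(self.rows):
            # experimental version: just crash when the buffer fills up
            raise MemoryError("global write-buffer full")
        self.tail += n
        self.reservations += 1
        return start

class LocalWriteBuffer:
    """Per-block buffer, standing in for shared memory."""
    def __init__(self, global_buf, capacity):
        self.global_buf = global_buf
        self.capacity = capacity
        self.rows = []

    def insert(self, row):
        self.rows.append(row)
        if len(self.rows) == self.capacity:     # (a) local buffer is full
            self.flush()

    def flush(self):                            # also called at (b) block end
        if not self.rows:
            return
        start = self.global_buf.reserve(len(self.rows))
        self.global_buf.rows[start:start + len(self.rows)] = self.rows
        self.rows = []

g = GlobalWriteBuffer(capacity=64)
for block_id in range(3):
    local = LocalWriteBuffer(g, capacity=4)
    for i in range(10):
        local.insert((block_id, i))
    local.flush()                               # end of block execution
```

In this run 30 rows are inserted with 9 touches of the shared counter (three flushes per block), where inserting directly into the global buffer would touch it 30 times.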

Globals in Legion: Use small regions instead of Futures?

Should we use tiny regions or futures?

Problem with tiny regions: Does Legion support this with reasonable performance or will it overload the system?

Problem with futures: Futures don't have any notion of memory residency. In the case of GPU tasks, this introduces spurious blocks on copying global data to/from GPU memory.

Reducing / Field Write into part of a vector/matrix field is erroneously allowed

Here's a snippet of code with the problem:

for r = 0,3 do
  t.e[i,v].stiffness[r,k] += dH[r,i]
end

The problem is that the left-hand side is not correctly resolved as a field write (actually a reduction). The conversion of Assignment AST nodes into GlobalWrite or FieldWrite AST nodes relies on matching the pattern Assignment(Global, rhs) or Assignment(FieldAccess, rhs) and rewriting the matched Assignment into a GlobalWrite or FieldWrite node.

However, in the above snippet, the immediate left-hand side of the assignment is actually a SquareIndex AST node (the indexing into the stiffness matrix), so neither pattern matches.

In general, the phenomenon which will trigger this problem is trying to reduce/write into only one entry of a vector or matrix field/global value. We should either disallow this entirely (not recommended) or fix it so that this is correctly detected as a write/reduce by the typechecker (recommended).
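
The recommended fix could look roughly like the sketch below: strip any SquareIndex wrappers from the left-hand side before pattern matching. This is a Python model, not the compiler's code. The node names (Assignment, Global, FieldAccess, SquareIndex, GlobalWrite, FieldWrite) come from the description above, while the node fields and classify_write are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Global:
    name: str

@dataclass
class FieldAccess:
    key: str          # hypothetical: the key expression, e.g. "t.e[i,v]"
    field: str

@dataclass
class SquareIndex:
    base: object      # the node being indexed
    indices: tuple

@dataclass
class Assignment:
    lvalue: object
    rhs: object
    reduceop: str = ""

def classify_write(assign):
    """Strip SquareIndex wrappers from the left-hand side, then classify."""
    node = assign.lvalue
    while isinstance(node, SquareIndex):
        node = node.base
    if isinstance(node, FieldAccess):
        return "FieldWrite"
    if isinstance(node, Global):
        return "GlobalWrite"
    return "Assignment"        # plain local assignment, left unchanged

# t.e[i,v].stiffness[r,k] += dH[r,i]
lhs = SquareIndex(FieldAccess("t.e[i,v]", "stiffness"), ("r", "k"))
kind = classify_write(Assignment(lhs, "dH[r,i]", "+"))
```

A real rewrite would also have to carry the stripped indices along in the resulting FieldWrite or GlobalWrite node so codegen knows which entry is being reduced.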

Affine Indexing Expression

This is a language feature which is necessary to perform stencil analysis on grid code. It would involve translating all the current Grid Macro implementations (e.g. constant relative offset c(1,2)).

This is very low priority, since it doesn't really matter until we start doing stencil analysis. However, it's a simple task for anyone who wants something simple to do.

Field Polymorphism / String Literals

We want to support polymorphism of Liszt functions over different possible fields. This is important for writing solvers or other kinds of generic numeric code in Liszt.

Proposed User Syntax:

-- definition of field parameters as untyped variables
-- may be possible to add a L.string type annotation
liszt foo( c : cells, in_field, out_field )
  ...
  c[in_field]
  ...
end

-- calling convention
cells:map(foo, 'temperature', 'temperature_shadow')

Proposed semantics & typechecking:
Field parameters should just be string values.

The actual string value will be supplied at type-checking time. This string value will be used to type the parameter variable with the type L.string('the_actual_string'), a type which should not be coercible to any other string type. As a result, a string-typed variable isn't actually inhabited by values; it's an opaque carrier object whose purpose is only to propagate the string type containing the constant. Whenever a string constant is expected at any point in the code, the typechecker attempts to find a string value in the type. (This same mechanism should make it possible to add formatting strings to the print statement.) To be clear, the proposal is that strings should be incorporated into Liszt, but not as proper first-class values.

Also note that this proposed design results in having to re-type-check a function for every possible assignment of string arguments. We can certainly revisit this decision at a later point, but it seems reasonable and expedient for now.

Don't Inline All Functions

This requires having some kind of functional call abstraction that's consistent, along with function types of some sort in the typechecker. The function call abstraction is complicated because there are a lot of implicit arguments to handle passing all of the fields that need accessing.

potential name collision issues in liszt.lua, liszt.t

The liszt library files that try to keep imports out of their public namespaces by making them local neglect to check and see if the shadowed variable existed in the global scope before it was over-written by the imports. Thus, we may be hiding previously visible libraries in the global namespace when we set _G.runtime to nil. This is unlikely to cause problems in the near future, but should eventually be fixed.

QuoteExpr => Let Expr

There's currently an AST node called QuoteExpr (I think) that conflates quoted code with let expressions (since we only allow liszt code to explicitly write let expressions via the syntactic form liszt quote stmts in expr end).

It would be good to clean this up in the compiler at some point.

Disassembly Reflection Feature

pipe this through from Terra, including working out something for GPUs etc.

Support for Global variables on GPU

Field reading and writing mostly works on GPUs. Following that example, extract the global phase analysis data, and set up the Bran/Germ to provide dynamic global locations to the kernel. (Another design is reasonable. Recommended to talk to Gilbert first if you're deviating from the field approach significantly)

This item requires developing a good understanding of the new codegen & execution details. (Bran/Germ, etc.) It's also easy to mostly copy an existing example (fields). So, this makes a good task to familiarize new/returning people with the code.

HIGH PRIORITY: Main obstacle to GPU being feature complete.

Kernel Invocation Error Reporting

Right now, any error that happens while a kernel is running will cause a stack dump to report the error as originating within our compiler. That's bad for us.

There are two stages to fixing this:

Wrap all kernel invocations in a Lua xpcall() that produces a slightly more useful stack dump. That is, the stack dump should locate the error at the kernel call site instead.
Eventually, we'd like to plumb line-number information from the parser through to the Terra code-gen and then somehow have terra dump more useful debug information. This may be difficult and requires Zach's input.

Insert/Delete don't work on Legion

We need to add back in Insert/Delete support.

It's deprecated everywhere and not supported under Legion at all.

See #37 for getting insert/delete working on GPUs

Functions (simple version inlined)

Min/Max Reduction

Add support for reducing fields or globals with the reduction operators min= and max=. This requires touching a lot of different parts of the compiler very lightly. Good familiarizing task.

HIGH PRIORITY: Ivan requested

Audit: terralib.new ffi.new terra global

We need to find all allocation sites in the compiler and possibly change them to eliminate unnecessary uses of Terra globals and .new declarations

I recently learned from Zach that we should be avoiding Terra global declarations because they invoke the LLVM toolchain. Notably, they will NOT be garbage collected.

I also learned that terralib.new/ffi.new both allocate memory on the Lua heap, which is probably also not what we want to do in many cases, though it does ensure that the memory will be garbage collected.

One of the main use cases for abusing these features has been the need for a pattern like the following:

allocate space for some structure used/defined by a C-API
(maybe) fill out data in that struct
pass a pointer for that struct to the C-API call
(maybe) read data out of that struct

Sometimes, we may also rely on the struct persisting beyond the call-site, which frequently makes it wise to rely on garbage collection to be safe.

The following snippet of code will

allocate space on the C-heap.
return a pointer to that data (so now we can pass a pointer to the C-API; note that if we access a field on the object pointed to from Lua code, Terra/LuaJIT will dereference the pointer for us, so we can read/write using the pointer too)
install a finalizer on the pointer, which DOES live in the LuaJIT heap.
The finalizer will just free the data by default, but other behaviors can be installed too.

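-- Assumed imports: the original snippet leaves ffi and C undefined.
local ffi = require 'ffi'
local C   = terralib.includec('stdlib.h')  -- provides malloc and free

-- Allocate one ttype on the C heap. The finalizer (default: C.free) runs
-- when LuaJIT garbage collects the returned pointer.
local function SafeHeapAlloc(ttype, finalizer)
  if not finalizer then finalizer = C.free end
  local ptr = terralib.cast( &ttype, C.malloc(terralib.sizeof(ttype)) )
  ffi.gc( ptr, finalizer )
  return ptr
end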

Insert Statement

Talk to Gilbert for the design. Need to be able to insert particles using code like the following:

liszt kernel ( c : cells )
  ...
  insert { cell = c, pos = c.center } into particles
  ...
end

Lower Priority: This is needed for full particle support, but we can kludge around it for a while still.

Update builtins to work on GPU

B.length and B.print builtins will need to be updated to generate GPU-specific code because they currently generate code that makes calls to functions in C standard libraries. I expect this will be a trivial fix.

Add double data type to Liszt, change semantic checker to assign double type to numeric literals

Terra is already inferring numeric types to be doubles from the generated code, so we should add doubles to our list of accepted types and make sure that we are correctly inferring what types we end up passing to runtime functions.

Implement Scalar type

Liszt kernels can only write to or perform reductions on global variables if they are Liszt Field or Scalar objects, so users must be able to create, manipulate, and access Scalars from Lua code.

The Scalar object needs to be implemented so that it:

creates a new scalar object, and allocates a new lScalar object through a runtime call, with Scalar.new(type)
type-checks when created and keeps track of its own data type
include/liszt.lua and runtime/liszt.t need to be updated to make sure the runtime initScalar function is being called properly, and not with the dummy values that are being used now.
Scalar Objtype needs to be integrated into semantic type checking
Scalar needs methods that allow the user to update and access the scalar value in lua scope, e.g. "scalar:setTo(3.5)" or "local x = 4 + scalar.value()"

Fix sphere_cloth benchmark

The sphere_cloth benchmark is currently failing to run, presumably due to an interface change in the grid class. This should be fixed as time permits.

GPU reduction strategy using global atomics

We had a reviewer recommend that on Kepler GPUs the global reduction tree might be more efficient if we replace the second kernel invocation with atomic operations.

To make this clear, consider that implementing a tree reduction in CUDA involves potentially 3 different granularities of parallelism:

Warp-level parallelism
Block-level parallelism
Kernel-level parallelism

Our current scheme uses a tree to perform the warp-level reduction, a sync_threads() barrier (__syncthreads() in CUDA) at the end of the primary kernel execution to aggregate the values written to shared memory at block level, and then a second kernel to perform the kernel-level reduction.

This proposal is to either (a) replace the second kernel entirely by writing block-level reduction values into a common variable using atomic adds, or (b) replace both kernel and block-level by writing the result of a warp-level tree reduction directly into a global common variable using an atomic add.
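
The difference between options (a) and (b) is how many atomic adds hit the common variable. The Python sketch below models this sequentially. It is not GPU code: tree_reduce and reduce_with_atomics are hypothetical names, and the block size of 256 and warp size of 32 are assumed values for the example.

```python
def tree_reduce(values):
    """Pairwise tree reduction, as done within a warp or a block."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:            # pad odd levels with the identity
            vals.append(0)
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0] if vals else 0

def reduce_with_atomics(data, group_size):
    """Tree-reduce each group, then combine group results with one
    'atomic add' per group. A group of block size models option (a);
    a group of warp size models option (b)."""
    total, atomic_adds = 0, 0
    for start in range(0, len(data), group_size):
        total += tree_reduce(data[start:start + group_size])  # atomic add
        atomic_adds += 1
    return total, atomic_adds

data = list(range(1024))
per_block = reduce_with_atomics(data, group_size=256)   # option (a)
per_warp = reduce_with_atomics(data, group_size=32)     # option (b)
```

Both variants give the same integer sum, with 4 atomic adds for (a) and 32 for (b) on this input. For floating-point fields the order in which atomic adds land is not fixed, so results may differ in the last bits from run to run.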

Add vector literal syntax to liszt

Update parser and semantic checker to be able to recognize and build vectors from a statement like:

"var v = {1, 3, 5}" or some such.

Fix Numeric Literal parsing of floats/doubles

Liszt should parse 1.0 / 12.0 as double / double, not int / int. (Terra now has extensions for parsing floats and doubles, so this bug can be fixed.)

Write only permission for fields

There are some tricky issues about the write-only permission

Support variable blocksize based on number of required registers/ number of fields used and field sizes.

Blocksize is hardcoded right now for GPUs. For kernels that touch large fields, like small matrices and vectors, it may make sense to use a smaller block size, compared to kernels that use only scalars.

Multiple Return Values

Ivan requested this feature. We can do it, but it will require adding new value-types to the compiler for tuples. Those have to be plumbed all the way through then.

Dynamic Scoping Bug

This is an instance of the classic PL bug. The problem arises due to name re-use in combination with subtree substitution provoked by macro expansion or user-defined-function inlining.

I believe this may be solvable by translating all of the names into symbols during the specialization pass. However, (I haven't given this enough thought) it may be necessary to do a full, proper beta-renaming step to ensure correct behavior. This requires careful, careful thought.

This is relatively high priority b/c any bugs it causes will be really hard to diagnose. However, it's also somewhat unlikely to crop up soon, so we may be able to postpone fixing it a while yet.

Almost certainly Gilbert will fix this. If someone else thinks they have a good enough handle on the problem then they're welcome to give it a go, but make sure to talk to Gilbert or Zach first to make sure you understand the subtleties of why/how this problem crops up.

Re-add support for multiple global reductions from a kernel

We stripped this out to get Legion working

liszt wrapper script is directory dependent

Modify liszt script and setup.sh to make liszt work when it is not invoked from the top level directory of the project. This includes fixing the path to terra and making sure that the proper library paths are fed to terra as options in the liszt script.

Delete Statement

Talk to Gilbert for the design details. Need to be able to delete particles using code like the following:

liszt kernel ( p : particles )
  if bad_particle(p) then
    delete p
  end
end