
knossos-ksc's Introduction

Knossos-KSC

Compile a lisp-like IR with automatic differentiation and user-defined rewrites.

This project is a compiler and code-generation tool intended to make writing AI algorithms both faster and easier.
The core is a purely functional lisp-like IR that can be translated from high-level languages, and can be linked to a variety of backends to generate code.

Currently implemented frontends are

  • TorchScript: ts2k
  • Julia: j2ks
  • F#: f2k
  • ONNX: onnx2ks
  • KS-Lisp: The IR itself is exchanged in a lisp-like text format (see below).

Current backends:

  • CPU/C++: In src/ksc and src/runtime
  • GPU/Futhark: Also in src/ksc
  • MLIR: In mlir

Current transformers:

  • KSC: Various Autodiff and optimization transforms, in Haskell
  • RLO: A rewriting-based optimizer using reinforcement learning

Docs are published to https://microsoft.github.io/knossos-ksc/

KS-Lisp: A low-sugar IR

The text form of the IR is "low sugar", focused more on simplicity than user-friendliness. However, it is human-writable, and Lispers in particular may like to play with it. There's a VS Code syntax-highlighting extension in etc/ks-vscode.

The AST has just a few concepts: Lambda, Call, Let binding, Conditionals, Constants, Assert, Top-level function definitions, and Rewrite rules.

It's also strongly typed, using a bare set of types: Tensor, Tuple, and Values (Float, Integer, String).
However, unlike, say, MLIR or TorchScript, these are composable, so e.g. we can have a (Tensor 1 (Tuple Float (Tensor 2 Float))). Tensors are annotated with their "rank", i.e. number of dimensions.

The IR is purely functional, so functions may be called more than once or not at all, depending on the use of their outputs. Values are passed and returned "as if" by value, although the optimizer reserves the right to implement this using refcounting/copy/move, as long as the same computation is performed.

The lisp-like IR is extremely simple -- all the language builtins are in this code:

;; Externally defined function "atan2", which returns a Float, and takes two Floats
(edef atan2 Float (Float Float)) 

#| Block comments
 -- User-defined function myfun 
 takes an Integer and Tensor of (Float Float) pairs
 and returns a pair of String and Float
|#
(def myfun                                       ; function name
  (Tuple String Float)                           ; return type
  ((i : Integer)                                 ; argument 1: int
   (v : Tensor 3 (Tuple Float Float)))           ; argument 2: 3D tensor of tuple
  (assert (gt i 0)                               ; (assert TEST BODY)
     (if (eq i 0)                                ; (if TEST TEXPR FEXPR)
        ; "then" br
        (tuple "fred"                            ; tuple constructor
           (let (tmp (index 0 v))                ; (let (VAR VAL) BODY)
             (mul (get$1$2 tmp) 2.0)) )          ; no builtins -- even mul is a function
        ; "else" br
        (let ((t1 (index 0 v))                   ; (let ((VAR1 VAL1) ... (VARn VALn)) BODY)
              (t2 (myfun (sub i 1) v)))          ; Recursive call to myfun
          t2))))

;; Rewrite rule
(rule "mul.commute"                     ; rule name
    ((a : Float) (b : Float))           ; "template_vars": free variables in the template
    (mul a b)                           ; "template": a pattern to match
    (mul b a))                          ; replacement
; Rules are *not* applied exhaustively; that is very intentionally left up to the compiler.
; For example, exhaustive application of the rule above would not terminate.

;; Compilation produces myfun and various derivatives, as if the following had been declared:
(edef rev$myfun ; classic reverse mode.  If f :: S -> T, then rev$f :: (S, dT) -> dS
    (Tuple (Tuple) (Tensor 1 (Tuple Float Float))) ; dS is tangent-type of inputs (dInteger = void)
    ((#|s  |# (Tuple Integer (Tensor 1 (Tuple Float Float)))) ; inputs in a single tuple
     (#|dt |# (Tuple (Tuple) Float)))) ; dT is tangent type of returns

(edef D$myfun  ; linear map as per Elliott.  If f :: S->T, D$f :: S -> (LM dS dT) where LM is linear map
    (LM
      (Tuple (Tuple) (Tensor 1 (Tuple Float Float))) ; dS
      (Tuple (Tuple) Float) ; dT
    )
    (Integer (Tensor 1 (Tuple Float Float)))) ; inputs as normal (not single-tupled)

See the ksc syntax primer for a longer introduction to the ks language. The ksc test directory provides more examples of the constructs available when writing .ks files.

INSTALLATION/BUILDING

Installation

See ksc or mlir

Running the ksc executable

Running

Run the ksc executable as follows to differentiate, compile and run a .ks program. This example runs hello-world.ks.

./build/bin/ksc --compile-and-run \
  --ks-source-file src/runtime/prelude.ks \
  --ks-source-file test/ksc/hello-world.ks \
  --ks-output-file obj/test/ksc/hello-world.kso \
  --cpp-include prelude.h \
  --cpp-output-file obj/test/ksc/hello-world.cpp \
  --c++ g++ \
  --exe-output-file obj/test/ksc/hello-world.exe

or with PowerShell syntax:

./build/bin/ksc --compile-and-run `
  --ks-source-file src/runtime/prelude.ks `
  --ks-source-file test/ksc/hello-world.ks `
  --ks-output-file obj/test/ksc/hello-world.kso `
  --cpp-include prelude.h `
  --cpp-output-file obj/test/ksc/hello-world.cpp `
  --c++ g++ `
  --exe-output-file obj/test/ksc/hello-world.exe

This should produce output like the following:

read decls
Linted Typechecked defs
Linted Grad
Linted Grad tupled
Linted Optgrad
Linted Optgrad tupled
Linted Diffs
Linted OptDiffs
Linted CSE
Writing to obj/test/ksc/hello-world.kso
Writing to obj/test/ksc/hello-world.cpp
Compiling: g++ -fmax-errors=5 -fdiagnostics-color=always -Wall -Wno-unused -Wno-maybe-uninitialized -Isrc/runtime -O3 -g -std=c++17 -o obj/test/ksc/hello-world.exe obj/test/ksc/hello-world.cpp
Running

Hello world!
If you are seeing this output then knossos-ksc has successfully compiled and run the hello-world.ks program!

PyTorch frontend

See doc/sphinx

Tests

To run the ksc self-tests use the command line

./build/bin/ksc --test --fs-test out.fs

(Don't worry if the final test, of out.fs, fails. It is a test for F#-to-ks, which most users will not have set up.)

Generating a .kso file from a .ks file without differentiating

To generate a .kso file from a .ks file without differentiating, i.e. to type check and apply ksc's heuristic optimisations, use the command line

./build/bin/ksc --generate-cpp \
  --ks-source-file src/runtime/prelude.ks \
  --ks-source-file input.ks \
  --ks-output-file output.ks \
  --cpp-include prelude.h \
  --cpp-output-file output.cpp

KSC Internals

Syntax of .ks files

In the compiler, the IR is defined in Lang.hs. The syntax is defined by the parser in Parse.hs and the pretty-printer in Lang.hs. testRoundTrip in Main.hs checks that they agree.

Continuous integration

knossos-ksc has a continuous integration build set up on Azure DevOps. To manually run a CI build, click "Run pipeline", type the name of your branch under "Branch/tag", and click "Run".

Code of Conduct

Collaboration on this project is subject to the Microsoft Open Source Code of Conduct.

knossos-ksc's People

Contributors

acl33, athas, awf, cgravill, dcrc2, dependabot[bot], grvillic, jinnaiyuu, martin-luecke, rengolin, ryotatomioka, siddharth-krishna, timhutton, toelli-msft, tomjaguarpaw


knossos-ksc's Issues

optSumBuild rule is wrong

It seems that one of our optSumBuild rules is wrong.

knossos-ksc/src/ksc/Opt.hs, lines 532 to 536 at 326c0a9:

optSumBuild n i (Call deltaVec [_n1, Var i1, e])
| deltaVec `isThePrimFun` "deltaVec"
, i == i1
-- TODO n == sz
= Just $ pBuild n (Lam i e)

The original expression creates a vector of length _n1, but the rewritten expression will make a vector of length n. Have I understood this correctly? I'm surprised we didn't notice this earlier.

I guess that a fix would be to check that n == _n1, i.e. simply check for syntactic equality of the length expressions.
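A sketch of that fix, assuming the two length expressions can be compared for syntactic equality (e.g. via an Eq instance or an equivalent helper on expressions):

optSumBuild n i (Call deltaVec [_n1, Var i1, e])
  | deltaVec `isThePrimFun` "deltaVec"
  , i == i1
  , n == _n1    -- only rewrite when the two length expressions are syntactically equal
  = Just $ pBuild n (Lam i e)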

v2-repl command is missing an argument

In the README it says:
cabal v2-repl --ghc-option=-Wwarn
But this gives me the following error:

cabal.exe: No targets given and there is no package in the current directory.
Use the target 'all' for all packages in the project or specify packages or
components by name or location. See 'cabal build --help' for more details on
target options.

The command works if I put all or Knossos at the end.

kspy-related things

There are some things we should bear in mind about ks2py. We don't need to do them now but it's good to keep a note of them.

  • We should install jax only once (mnistcnn installs it too)
  • I would prefer the tests not to depend on external data. That's hard to resolve though, if we want to do a real test of ResNet inference.
  • We should run ks2py on all our other test cases too (just to check that they compile, not necessarily that they run)

Better C++ codegen when passing multiple arguments as tuple

blas_slow.ks contains code implementing the BLAS routine "gemm" and a small benchmark. Execution time (after building with ksc and g++ 7.4) is between 4.8 and 5.1 s, consistent over at least 20 runs on a Linux VM. ksc commit a45667a, GHC 8.4.4, cabal 2.4.1.0, Ubuntu 18.04.

blas_fast.ks contains the same code, but passing the 5 arguments in a tuple. Execution time is between 3.8 and 4.1 s on the same VM.

blas_test2.cpp contains the C++ code generated from blas_fast.ks, but modified by hand to pass the 5 arguments separately as 5 args rather than 1 tuple. It also runs in around 4 s, so argument passing itself is not the difference; rather, the C++ code generated by ksc in response to the tuple is (somehow) better.

blas_slow.ks.txt

blas_fast.ks.txt

blas_test2.cpp.txt

Add explicit type annotations to IR?

Today, the serialized IR has minimal type annotations. Every function declares its argument and return types, and a type propagation (TP) stage gives a unique type to every expression. It might be easier for some tooling to avoid having to run TP, reading the types from a serialized format instead.

Questions: is this useful? What syntax?

For syntax: these annotations are essentially comments -- they do not coerce or construct, they just annotate -- so I imagine they should look a little like comments.

Semicolons introduce comments, and colons look like semicolons, so today's

1.2 ; Float
v ; (Tuple Float Float)
(build n (lam (i : Integer) (add 1.2 v))) ; Vec Float

could become

1.2 : Float
v : (Tuple Float Float)
(build n (lam (i : Integer) (add 1.2 v))) : Vec Float

Because types are all LL-parseable, this adds no ambiguity to the grammar.

There's also the option of not annotating constants or lambdas, or indeed anything but calls.

And while doing this, we might tidy the function decls too:

(def f : Vec Float 
    ((x : Float) (y : Integer))
...
)

------ other option ------

Given that block comments are #| |#, I used to consider some sort of square bracket thing, e.g. today's

1.2
v
(build n (lam (i : Integer) (add 1.2 v)))

would become

1.2 [Float]
v [Vec Float]
(build n [Integer] (lam (i : Integer) (add 1.2 [Float] v [Float]) [Float]) [Lambda Integer Float]) [Vec Float] ; this looks like it might be easily lost in debug output
; that still might be easily lost in debug output, could try...
(build [Vec Float]  n [Integer] (lam (i : Integer) (add [Float] 1.2 [Float] v [Float]) ) ) ; alternative for call-like forms
; or....
([Vec Float] build n [Integer] (lam (i : Integer) ([Float] add 1.2 [Float] v [Float]) ) [Lambda Integer Float]) ; further alternative for call-like forms

$ranhashdoub type inconsistency

The C++ definition of $ranhashdoub is as follows:

        inline double $ranhashdoub(uint64_t v) {
          return 5.42101086242752217E-20 * $ranhash(v);
        }

The ksc type checker assigns $ranhashdoub the type Integer -> Float. Everywhere else, Integer seems to be translated to type int (meaning int32_t on most platforms). Because of C++'s delightful implicit type conversions from int32_t to uint64_t, this does not prevent the generated C++ code from compiling, but it does cause issues for the Futhark backend, which is more picky with respect to types.

I suggest changing $ranhashdoub to simply take an int32_t as argument instead:

        inline double $ranhashdoub(int32_t v) {
          return 5.42101086242752217E-20 * $ranhash(v);
        }

The Great Renaming (includes Vec -> Tensor)

Our names are pretty nasty at the moment. "build" is not special, it's just a way to construct an object of type Tensor.
We should fix these when we get a good window.

Here's a list of suggested rewrites. Some are polymorphic so currently need to be in Prim.hs, others should move to prelude.ks

build -> Tensor.init    ; `.` is allowed in identifiers, so this is not special, just a convention
rand -> (Tensor.rand  sizes)
to_float -> Integer.init  ; Integer.init might also be overloaded from bool
get$m$n -> get$m   ; No need for $n, as names need not be unique
etc..

Other renamings: ks types in C++ to ks::Float, ks::Vec, or, uglier but safer, to ks_Float, ks_Vec to prepare for C backend. Note that or->ks_or and abs->ks_abs are better than ks::or etc because of reserved words and clashes with global namespace names like abs.

Add tangent-space add and scale

A differentiable programming language needs two main primitives: one to add a tangent to a value, and one to scale a tangent. We should add:

ts_add :: T x dT -> T
ts_scale :: Real x dT -> dT

noting that we will ultimately need multiple float types (e.g. 32-bit, 64-bit).
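For intuition, here is a hypothetical instantiation at T = Float, taking the tangent type of Float to be Float (a sketch only, not ksc's actual primitives):

-- Hypothetical Float instantiation of the two proposed primitives.
ts_add :: (Float, Float) -> Float    -- (value, tangent) -> value
ts_add (x, dx) = x + dx

ts_scale :: (Float, Float) -> Float  -- (scale factor, tangent) -> tangent
ts_scale (r, dx) = r * dx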

Add union types

We need union types. Why?

  • Very practically, AD might require if to return different things in different branches.
  • And we would like to construct Lists and Trees

A starting point might be "why not use Inl, Inr, case?"
At first glance, it's a lot of new infrastructure, duplicating "if", and it uses concepts not in our lowering targets (e.g. C).

(Inl "fred") -- What type is this?  String + 'b is quite fancy for KS

We might overcome this by explicitly allowing function calls to name their return type, e.g.

(: (Union String Float) Inl "fred")

This mentions types in a way that no other KS constructs do, but there has been consideration of other uses, e.g. creating zeros or random tensors.
Then one might create such a union using

(def foo (Union String Float) ((x : Float))
  (if (gt x 3.0)
     (: (Union String Float) init.l "fred")
     (: (Union String Float) init.r 1.1)))

And deconstruct it using getters named after the get$M of tuples

(def oof ... ((x : Float) (u : (Union String Float)))
  (if (gt x 3.0) # user ensures this is the same cond, or (get$tag u)
     (get$l (Union String Float) "fred")
     (get$r (Union String Float) 1.1))

Boolean tagged union.

Possibly staying closer to the rest of KS concepts, we could replace "l" and "r" with "t" and "f"
A boolean tagged union would look like a 3-tuple (cond if_true if_false).
It would be constructed using "if":

typeof (if p t f) =  Union (typeof t) (typeof f)

and accessed using getters named after the get$M of tuples

(if (get$cond u)
    .. do something with (get$t u)
    .. do something with (get$f u))

A first implication of this scheme is that all ifs would produce unions, which would very quickly yield unwieldy types.
One might try to avoid this using an explicit union constructor

(Union.if cond true_branch false_branch)

Or one might aggressively try to collapse unions, i.e.

  mkUnion s t 
     | s == t = s
     | otherwise = TypeUnion s t
  -- ? mkUnion (Tuple as) (Tuple bs) = TypeTuple $ zipWith mkUnion as bs
  -- mkUnion (Vec as) (Vec bs)  -- not this one -- Vecs need not have same size

but this feels messy, and may have implications for auto-generated code, which would need to know whether such a collapse had happened.

Untagged

As one alternative to consider, the simplest union type might be untagged, constructed only using "if":

typeof (if p t f) =  Union (typeof t) (typeof f)

So, we make an e.g. (Union String Float) using

(if p
   "fred"
   1.1)

If users want to add a tag, they can do so

(if n == 0
   (tuple true "fred")
   (tuple false 1.1))

We might simply declare TypeUnion TypeX TypeX in Lang.hs with codegen emitting a C union. I believe it's very valuable to think about the C/C++ lowering of the type. We would lower (Union T F) to

union {
  T t;
  F f;
};

Remember we have no types with interesting destructors… This cements that.

Note that this is an untagged union -- there's no case statement, you need to get$t or get$f depending what you put in. If you get the wrong thing, Bad Things will happen.

This will also allow us to codegen LMs again....

Using a union without a tag:

(let (u (if p "fred" 4))
  ....
     (assert q_was_implied_by_b
         (strlen (get$t u))
))

More constructionally?

(if-union p "fred" 1.1) 

Here are some older ideas, which I think are not good:

Here’s a very minimalist union. Just as we eschew field names in structs/records, and desugar them all to tuples, we will use an order dependence in switching on unions

(def f String (u : Union Int String)
      (union_case u
	(format "int %d" u) ;; u evaluates as int in the Int branch
	(append "string " u))) ;; u evaluates as string in the String branch

But that's pretty grotty, so it may be worth a little sugar. Just as we might consider pattern-matching tuples in let, e.g. of the form (let ((ui us) (Tuple 2 "3")) body) we could have

      (union_case ((ui us) u) ;; number of names in (ui us) must match, easy to check
	(format "int %d" ui)
	(append "string " us))

We could go more verbose

      (union_case  u : ((ui : Int) (us : String))
	(format "int %d" ui)
	(append "string " us))

but KINAUFL.

I also don't think we should somehow switch on the type, like

      (union_case u
	Int (format "int %d" u) ;; u evaluates as int in the Int branch
	String (append "string " u)) ;; u evaluates as string in the String branch

because we don't need to, the type might be long, and we might in some sensible cases have the same type in different branches.

Mutable sumbuild aliases

There's a bug in sumbuild that can arise due to aliasing between the accumulator variable and the summand. I think that to fix it we will have to explicitly copy the first summand (or explicitly create a fresh zero vector of the right size, but that is strictly less efficient).

Our recipe for sumbuild N (lam i body) is:

Loop N times, performing the following

  • Run body(i) to come up with the summand s
  • If it's the first iteration round the loop, set the accumulator acc = s
  • If it's a subsequent iteration, in-place accumulate acc += s

This recipe doesn't work if s references the same C++ value each time. In such a case the first iteration causes acc to reference that value too, so that value gets in-place added to itself N-1 times, resulting in s * 2^(N-1) (rather than s * N)! (Not only is this the wrong result, but it also destroys the original value of s, which we may have wanted to use elsewhere.) For example, see the following code, which should return the sum of 20 length-5 vectors consisting entirely of 1s, i.e. it should return [20, 20, 20, 20, 20].

(def o2 (Vec n Float)
     (qs : Vec n Float)
       (sumbuild 20 (lam (i : Integer)
         (if (ne i 1) qs qs))))

(def main Integer ()
     (let (qs (build 5 (lam (i : Integer) 1.0)))
     (pr (o2 qs))))

Output (NB 524288 = 2^(20-1))

[524288, 524288, 524288, 524288, 524288]
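The 2^(N-1) behaviour can be modelled directly: because the summand is the same object as the accumulator, every in-place "acc += s" doubles acc. A pure-Haskell model of the faulty recipe, for illustration only:

-- Model of the aliased accumulation: start from the first summand, then
-- double (N-1) times instead of adding a fresh summand each iteration.
faultyAliasedSum :: Int -> Double -> Double
faultyAliasedSum n s0 = foldl (\acc _ -> acc + acc) s0 [2 .. n]

-- e.g. faultyAliasedSum 20 1.0 == 524288.0 (the output above), rather than 20.0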

This was exposed by #133. That PR loses some ksc optimisation possibilities, making code compiled with it more likely to trigger this bug.

(cc @simonpj)

Bug in TupleAD

There is a bug in TupleAD. See the branch toelli/tuplead-bug (72d01f6). To reproduce, run

ghci -isrc/ksc -e 'demoFFilter (Just 999) (snd . moveMain) TupleAD \
   ["src/runtime/prelude.ks", "test/ksc/test3.ks"]' src/ksc/Ksc/Pipeline.hs

It will say

--------------------------
Type errors in The full Jacobian (unoptimised)
  Variable occurrence mis-match for a1
    Expected type: Vec Float
    Actual type:   ((Vec Float),
                    (LM (Float, Integer, (Vec Float)) (Vec Float)))
  In the call of: Dt$size
  In the rhs of the binding for: r
  In the rhs of the binding for: gt$t156
  In the definition of Dt$q
End of type errors
--------------------------

paramSizeBinders has effects that it shouldn't

paramSizeBinders shouldn't really need to do this. Any variables that are brought into scope because they are def arguments should be handled some other way.

paramSizeBinders (TVar TypeInteger v)

However, removing this clause breaks things. For example

>>>>> TEST: "test/ksc/gmm.ks"

Type errors in Type checking
  Not in scope: local var/tld: n
  In the definition of mkvec
End of type errors

It feels like the problem must come from

extendLclSTM (paramsSizeBinders vars) $
because the other usages of paramsSizeBinders seem irrelevant. At the moment it's unclear why the problem arises there.

Consistent name-mangling convention

KSC needs to know about very few primitive functions, and all functions are in principle edef-able except, today, those taking lambdas as arguments.

Furthermore, almost all functions should be monomorphic, i.e. no generics, no templates.

That means that for common functions like adding, we need to provide different definitions for float+vector, vector+vector, etc. Given that this will generate many similar names, I propose a naming convention that lets us easily write function names that don't clash (including machine-generated names). I believe the following scheme works:

(def fname S ((x1 : T1) ... (xn : Tn)) =>
M[fname] -> fname "@" M[T1] "," .. "," M[Tn]
M[Integer] -> i64
M[Float] -> f64
M[String] -> s
M[Bool] -> b
M[Tuple (T1, ...,Tn)] -> "<" M[T1] "," .. "," M[Tn] ">"
M[Vec T] -> "v" M[T]

so if one wanted to define add of (Tuple (Vec Float) (Vec (Tuple Int Float))) to Vec (Tuple (Tuple Float Int) Float) one would write

(def add@<vf64,v<i64,f64>>,v<<f64,i64>>,f64>> (...)

The commas are actually unnecessary, and the bitwidths are just to ensure we can do them.

Without them, the above would have been

(def add@<vfv<if>>v<<fi>f>> (...)

which is certainly shorter :-|
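For concreteness, here is a minimal, self-contained sketch of the scheme as a Haskell function. The Type constructors are simplified stand-ins (not those in Lang.hs), and only argument types are mangled, as in the scheme above:

import Data.List (intercalate)

-- Simplified stand-in for ksc's types, just enough to express the scheme.
data Type = TInteger | TFloat | TString | TBool | TTuple [Type] | TVec Type

mangleType :: Type -> String
mangleType TInteger    = "i64"
mangleType TFloat      = "f64"
mangleType TString     = "s"
mangleType TBool       = "b"
mangleType (TTuple ts) = "<" ++ intercalate "," (map mangleType ts) ++ ">"
mangleType (TVec t)    = "v" ++ mangleType t

-- Mangle a function name against its argument types.
mangleFun :: String -> [Type] -> String
mangleFun f argtys = f ++ "@" ++ intercalate "," (map mangleType argtys)

-- e.g. mangleFun "add" [TTuple [TVec TFloat, TVec (TTuple [TInteger, TFloat])]]
--        == "add@<vf64,v<i64,f64>>"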

Warning: The install command is a part of the legacy v1 style of cabal usage.

Following the instructions in the README:

PS C:\dev\knossos\knossos-ksc> cabal install hspec parsec mtl hashable filepath
Warning: The install command is a part of the legacy v1 style of cabal usage.

Please switch to using either the new project style and the new-install
command or the legacy v1-install alias as new-style projects will become the
default in the next version of cabal-install. Please file a bug if you cannot
replicate a working v1- use case with the new-style commands.
...

Can we fix this?

Things break when there's a vector size variable called "a"

The code below fails with the error below, because the code has a vector size variable called a and there's also a special variable called a in the optimisation rules.

I know this is the cause because the latter is the only place that argVar is used, and after changing argVar to Simple "aa" the type error no longer arises!

(def strange Float (x : Vec a Float)
    (sumbuild a (lam (j : Integer) 3.0)))
--------------------------
Type errors in OptDiffs
  Ill-typed call to primitive: build
    In a call of: build (Fun (PrimFun "build"))
     Arg types: [Vec a Float, (Lam Integer Float)]
     Args: [d$x,
            (lam (j : Integer)
                 0.0)]
    ST lookup: Nothing
  In the definition of fwd$strange

  Ill-typed call to primitive: build
    In a call of: build (Fun (PrimFun "build"))
     Arg types: [Float, (Lam Integer Float)]
     Args: [d$r,
            (lam (sum$i : Integer)
                 d$r)]
    ST lookup: Nothing
  In the call of: constVec
  In the definition of rev$strange
End of type errors
--------------------------

/cc @ryotatomioka

Update transcript

  • Update the transcript of running ghci/cabal repl so that it refers to a version of ghc that we support
  • Indicate how to install/update ghc/cabal on Ubuntu (hvr's PPA)

Rename argVar and resVar

Change argVar from "a" to "ksc$argVar" so that occurrences of it are easy to spot in generated code.
Similarly resVar. See #282

Python backend broken for polymorphic functions

The Python backend assumed that there cannot be more than one function with the same name but different argument types. This will not be the case when #276 goes in. To fix this, we need to run type propagation. It would make sense to move sparser.py and expression.py from RLO and let them handle the parsing and type propagation instead of doing it half-baked in translate.py.

error: ‘fwd$deltaVec’ was not declared in this scope

I am trying to see if I can manually rewrite a rev$ function while preserving correctness. However, it seems that I cannot use deltaVec in my program.

Minimal reproducer

(def delta (Vec k Float) ((k : Integer) (val : Float))
    (deltaVec k 0 val))

Full error

read decls
Linted Typechecked defs
Linted Grad
Linted Grad tupled
Linted Optgrad
Linted Optgrad tupled
Linted Diffs
Linted OptDiffs
Linted CSE
Writing to obj/rev_generated_deltavecs.kso
Writing to obj/rev_generated_deltavecs.cpp
Compiling: g++ -fmax-errors=5 -fdiagnostics-color=always -Wall -Wno-unused -Wno-maybe-uninitialized -Isrc/runtime -O3 -g -std=c++17 -o obj/rev_generated_deltavecs.exe obj/rev_generated_deltavecs.cpp
obj/rev_generated_deltavecs.cpp: In function ‘ks::ty$fwd$delta ks::fwd$delta(int, double, std::tuple<>, double)’:
obj/rev_generated_deltavecs.cpp:170:19: error: ‘fwd$deltaVec’ was not declared in this scope
 vec<double> c$0 = fwd$deltaVec(k,0,val,std::make_tuple(),std::make_tuple(),d$val);
                   ^~~~~~~~~~~~
obj/rev_generated_deltavecs.cpp:170:19: note: suggested alternative: ‘fwd$delta’
 vec<double> c$0 = fwd$deltaVec(k,0,val,std::make_tuple(),std::make_tuple(),d$val);
                   ^~~~~~~~~~~~
                   fwd$delta
obj/rev_generated_deltavecs.cpp: In function ‘ks::ty$rev$delta ks::rev$delta(int, double, ks::vec<double>)’:
obj/rev_generated_deltavecs.cpp:313:37: error: ‘rev$deltaVec’ was not declared in this scope
 tuple<tuple<>,tuple<>,double> c$0 = rev$deltaVec(k,0,val,d$r);
                                     ^~~~~~~~~~~~
obj/rev_generated_deltavecs.cpp:313:37: note: suggested alternative: ‘rev$delta’
 tuple<tuple<>,tuple<>,double> c$0 = rev$deltaVec(k,0,val,d$r);
                                     ^~~~~~~~~~~~
                                     rev$delta
*** Exception: Compilation failed
CallStack (from HasCallStack):
  error, called at src/ksc/Cgen.hs:766:36 in main:Cgen

Add more rules

Hi @jinnaiyuu, can you add to the rules document all of the rules you showed us today that are not already there, and open a PR for it?

Lams of tuple arguments don't Cgen correctly

Lams didn’t get converted to one-arg properly. We don’t have any lams in our testcases outside of build/sumbuild/folds so I never noticed.

Cgened lams should have their single ksc argument packed up from multiple C++ arguments, like cgened Defs. In fact we can probably reuse the same code for both.

I guess it’s an hour or two of extra work.

Not reverse-mode AD

When I run test1.ksc, I see that the generated code contains functions named rev$foo that compute gradients, however they are not using reverse-mode AD. These generated functions are simply computing forward AD multiple times and tupling up the results. How do I get the generated code for reverse-mode AD?

Polymorphic declarations

 
We should sketch out a design for polymorphic functions and rules.

If we take the example of to_device, it should allow rewrites as follows:

	(to_device N (f x)) -> (f (to_device N x))

which requires to_device to have signature something like

	to_device :: (Int, 't) -> 't

and today that essentially means edefing it on all types it might see

	(edef to_device Float (Integer Float))
	(edef to_device String (Integer String))
	…
	(edef to_device (Tuple Float (Vec ...)) (Integer (Tuple Float (Vec ...))))

This might be fine, but it might be nicer to say something like

	(forall (Type 't)
	    (edef to_device 't (Integer 't))

Of course this is somewhat all-or-nothing; I don't have a way to say forall t for some subset of all types. But maybe it covers an important common case just well enough. And if we're only doing this unconstrained polymorphism then we could just do it without introducing forall and Type,
by letting an apostrophe introduce a new type variable...
Sort of ideologically ugly, but I like the practice of visually tagging type variables anyway...

	(edef to_device 't (Integer 't))

This would allow function declarations of e.g. build to live outside Prim.hs

	(edef build (Vec 't) (Integer (Lam Integer 't)))
	(edef size Integer (Vec 't))

and if we introduced Tangent as a type keyword, we could declare

	(edef ts_add 't ('t (Tangent 't)))

Note that this is not a proposal to introduce generics à la .NET; it's just a parameterized source of method declarations. Method lookup first looks up the function name, then unifies against the set of accepted argument types.

Polymorphic rules

I can imagine some form of polymorphic rule definition of the form

	(rule "to_device.sink" ((x : 't) (f : (Lam 't 's)) (N : Integer)) 
		(to_device N (f x)) 
		(f (to_device N x)))

This doesn't feel like it needs horrendous unification, but maybe there are cans of worms here?
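Such matching only needs one-sided unification of a template type against a concrete type. A minimal, self-contained sketch with toy types and names (not a proposal for ksc's actual implementation):

import qualified Data.Map as Map

-- Toy types with a single flavour of type variable, standing in for 't-style variables.
data Type = TInteger | TFloat | TVar String | TVec Type | TTuple [Type]
  deriving (Eq, Show)

type Subst = Map.Map String Type

-- Match a template (which may contain type variables) against a concrete type,
-- threading an accumulated substitution.
matchTy :: Type -> Type -> Maybe Subst -> Maybe Subst
matchTy _ _ Nothing = Nothing
matchTy (TVar v) t (Just s) =
  case Map.lookup v s of
    Nothing           -> Just (Map.insert v t s)
    Just t' | t' == t -> Just s
    _                 -> Nothing
matchTy (TVec a) (TVec b) ms = matchTy a b ms
matchTy (TTuple as) (TTuple bs) ms
  | length as == length bs = foldr (uncurry matchTy) ms (zip as bs)
matchTy a b ms
  | a == b    = ms
  | otherwise = Nothing

-- e.g. matchTy (TVec (TVar "t")) (TVec TFloat) (Just Map.empty)
--        == Just (Map.fromList [("t", TFloat)])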

Add Note on Global function ad-hoc overloading

Add somewhere a Note like this, and point to it where necessary:

-- Note [Global function ad-hoc overloading]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A global function is uniquely identified by a combination of
* Its name (e.g. 'mul')
* Its argument type (e.g. (Float, Float))

Many global functions can be called 'mul', provided they have different
argument types.  Indeed you can see this in src/runtime/prelude.ks

The global symbol table is keyed by (name, arg-type) pairs.

All of this would go out of the window if we had polymorphism,
because then the argument type could be a type variable.
But we don't!

Moreover, for type inference, we can work out the type of
the arguments before looking up the function to find its
result type.  That would fail if we were inferring the type
of a recursive function -- but fortunately all functions have
explicitly-declared types.

Originally posted by @simonpj in #266
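For illustration, a minimal sketch of such a table keyed by (name, argument type); the Type here is a simplified stand-in, not ksc's own:

import qualified Data.Map as Map

-- Stand-in types, just enough to key the table.
data Type = TInteger | TFloat | TTuple [Type] | TVec Type
  deriving (Eq, Ord, Show)

-- Global symbol table: (function name, argument type) -> result type.
type GblSymTab = Map.Map (String, Type) Type

lookupGblFun :: String -> Type -> GblSymTab -> Maybe Type
lookupGblFun name argTy = Map.lookup (name, argTy)

-- e.g. two functions both called "mul", distinguished by argument type:
--   Map.fromList [ (("mul", TTuple [TFloat, TFloat]),     TFloat)
--                , (("mul", TTuple [TInteger, TInteger]), TInteger) ]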

Call Futhark via API?

Is it possible to link in Futhark and call it directly via a Haskell API? That would make the tests slightly more robust than shelling out to the binary.

Print tuple types right

Currently we pretty-print the type (TypeTuple [Int, Bool]) as

  (Tuple Int, Bool)

The commas here are extremely confusing, making it look very like Tuple [Tuple [Int], Bool]. This was introduced inadvertently in

commit 2b2af6668c084ccaf204a90529a6e75a2a81b2a7
Author: Andrew Fitzgibbon <[email protected]>
Date:   Tue Apr 23 17:56:33 2019 +0100

We should choose one of

   (Tuple Int Bool)
or
  (Int, Bool)

In fact we already have a pretty printer flag that lets us choose when to print in s-expr format. Let's just use it to choose between these two.

Agreed? Tom, might you look at this. It should be a 5-min job.
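For reference, a sketch of the s-expression option, rendering a tuple type from its already-printed components (the real change should go through ksc's existing pretty printer and its flag):

-- Sketch only: print a tuple type in s-expression form.
pprTupleTy :: [String] -> String
pprTupleTy tys = "(Tuple " ++ unwords tys ++ ")"

-- e.g. pprTupleTy ["Int", "Bool"] == "(Tuple Int Bool)"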

Error when subexpression has type naming variable bound within the subexpression

On master bffa9e4, this fails to typecheck:

(edef mul$Mat$Vec (Vec m Float) ((Vec m (Vec n Float)) (Vec n Float)))

(def reduced
    (Float)
    ()
        (sum (mul$Mat$Vec 
            (let (var238 (21))
                (build var238 (lam (var239 : Integer) (build 10 (lam (var240 : Integer) (to_float var239))))))
            (build 10 (lam (j : Integer) (to_float j))))))

Error is:

Type errors in Type checking
  Locally bound variable var238
    appears free in result type Vec var238 (Vec 10 Float)
  in the let binding for var238
  In the call of: mul$Mat$Vec
  In the call of: sum
  In the definition of reduced
End of type errors

Error is similar (without the mul$Mat$Vec line) if we lift the let (var238 (21)) to outside the mul$Mat$Vec (inside the sum).
This feels unnecessarily restrictive; we should declare var238 as a bound (existential) within the result type, or anonymous. We don't really need syntax for the user to write this down: in this example, the variable disappears as soon as the result of the mul$Mat$Vec is fed into sum. (Indeed, if we ever need to write the type down, then maybe raising an error is valid: the programmer would have to lift the let further out.)

Substitution broken in the presence of shadowing

Do we expect substitution to be broken in the presence of shadowing? If I make the following change to demoN (before the change, things are jigged around enough that no bug is exposed) and then run ghci -isrc/ksc -e 'demoF TupleAD "test/ksc/sum.ks"' src/ksc/Main.hs, I get the result below. In "The full Jacobian (optimised)" the two a1s have become confused.

index 2dedc64c..83445786 100644
--- a/src/ksc/Main.hs
+++ b/src/ksc/Main.hs
@@ -58,11 +58,14 @@ demoN adp decls

        ; disp "Typechecked declarations" env defs

-       ; (env1, opt_defs) <- optDefs rulebase env defs
-       ; disp "Optimized original definition" env1 opt_defs
+--       ; (env1, opt_defs) <- optDefs rulebase env defs
+--       ; disp "Optimized original definition" env1 opt_defs

-       ; anf_defs <- anfDefs opt_defs
-       ; disp "Anf-ised original definition" env1 anf_defs
+--       ; anf_defs <- anfDefs opt_defs
+--       ; disp "Anf-ised original definition" env1 anf_defs
+
+       ; let env1 = env
+             anf_defs = defs

        ; let grad_defs = gradDefs adp anf_defs
              env2      = env1 `extendGblST` grad_defs
----------------------------
The full Jacobian (unoptimised)
----------------------------

def Dt$vsum (Float, LM (Integer, Vec n Float) Float)
  (i : Integer, v : Vec n Float)
  = (let { gt$i : (Integer, LM (Integer, Vec n Float) Integer)
             = (i,
                lmHCat
                  ( lmOne( (_ : Integer) ),
                    lmZero( (_ : Vec n Float), (_ : Integer) ) )) }
     (let { gt$v : (Vec n Float,
                    LM (Integer, Vec n Float) (Vec n Float))
              = (v,
                 lmHCat
                   ( lmZero( (_ : Integer), (_ : Vec n Float) ),
                     lmOne( (_ : Vec n Float) ) )) }
      (let { gt$i : (Integer, LM (Integer, Vec n Float) Integer)
               = (i, lmZero( (_ : (Integer, Vec n Float)), (_ : Integer) )) }
       (let { gt$n : (Integer, LM (Integer, Vec n Float) Integer)
                = (n, lmZero( (_ : (Integer, Vec n Float)), (_ : Integer) )) }
        (if i == 0
         then (0.0, lmZero( (_ : (Integer, Vec n Float)), (_ : Float) ))
         else (let { a1 : (Float, LM (Integer, Vec n Float) Float)
                       = (let { r : (Float, LM (Integer, Vec n Float) Float)
                                  = Dt$index( fst( gt$i ), fst( gt$v ) ) }
                          (fst( r ),
                           lmCompose( snd( r ), lmVCat( snd( gt$i ), snd( gt$v ) ) ))) }
               (let { a2 : (Float, LM (Integer, Vec n Float) Float)
                        = (let { a1 : (Integer, LM (Integer, Vec n Float) Integer)
                                   = (let { a2 : (Integer, LM (Integer, Vec n Float) Integer)
                                              = (1,
                                                 lmZero
                                                   ( (_ : (Integer, Vec n Float)),
                                                     (_ : Integer) )) }
                                      (let { r : (Integer, LM (Integer, Integer) Integer)
                                               = Dt$+( fst( gt$i ), fst( a2 ) ) }
                                       (fst( r ),
                                        lmCompose( snd( r ), lmVCat( snd( gt$i ), snd( a2 ) ) )))) }
                           (let { r : (Float, LM (Integer, Vec n Float) Float)
                                    = Dt$vsum( fst( a1 ), fst( gt$v ) ) }
                            (fst( r ),
                             lmCompose( snd( r ), lmVCat( snd( a1 ), snd( gt$v ) ) )))) }
                (let { r : (Float, LM (Float, Float) Float)
                         = Dt$+( fst( a1 ), fst( a2 ) ) }
                 (fst( r ),
                  lmCompose( snd( r ), lmVCat( snd( a1 ), snd( a2 ) ) ))))))))))
Linted The full Jacobian (unoptimised)

...

----------------------------
The full Jacobian (optimised)
----------------------------

def Dt$vsum (Float, LM (Integer, Vec n Float) Float)
  (i : Integer,primCallResultTy: Could not determine result type for
  lmHCat [lmOne( (_ : Integer) ), lmOne( (_ : Float) )]
  [LM Integer Integer, LM Float Float]
 v : Vec n Float)
  = (if i == 0
     then (0.0, lmZero( (_ : (Integer, Vec n Float)), (_ : Float) ))
     else (let { t21 : Integer = i + 1 }
           (let { r_6 : (Float, LM (Integer, Vec n Float) Float)
                    = Dt$vsum( t21, v ) }
            (t21 + fst( r_6 ),
             lmCompose
               ( lmHCat( lmOne( (_ : Integer) ), lmOne( (_ : Float) ) ),
                 lmVCat
                   ( lmZero( (_ : (Integer, Vec n Float)), (_ : Integer) ),
                     lmCompose
                       ( snd( r_6 ),
                         lmVCat
                           ( lmZero( (_ : (Integer, Vec n Float)), (_ : Integer) ),
                             lmHCat
                               ( lmZero( (_ : Integer), (_ : Vec n Float) ),
                                 lmOne( (_ : Vec n Float) ) ) ) ) ) )))))

--------------------------
Type errors in The full Jacobian (optimised)
  Ill-typed call to primitive: +
    In a call of: + (Fun (PrimFun "+"))
     Arg types: [Integer, Float]
     Args: [t21, fst( r_6 )]
    ST lookup: Nothing
  In the definition of Dt$vsum
End of type errors
--------------------------

Add linear map constructors

We should allow users to define linear maps, for when we need real Jacobians. This will require some constructors:

(lmZero s t)  ;; (S, T) -> LM S T
(lmOne t)     ;; T -> LM T T
(lmVCat m1 m2) ;;  (LM S1 T, LM S2 T) -> LM (S1, S2) T
;etc
(lmMatrix m)        ;; Vec n (Vec m T) -> LM (Vec m T) (Vec n T) ; sizes just for annotation

How to suppress unused variable warning in futhark check?

Currently we use futhark check in our CI pipeline to check that the Futhark code we emit is at least type-correct. Because it's generated code, we end up with quite a lot of unused-variable warnings. Is it possible to suppress them? futhark check --help doesn't seem very promising:

~/futhark-0.14.1-linux-x86_64/bin/futhark check --help
Futhark 0.14.1
git: HEAD @ 7b1c9ac (Fri Jan 24 14:09:09 2020 +0100)
Copyright (C) DIKU, University of Copenhagen, released under the ISC license.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Usage: futhark check program
Options:

  -V  --version  Print version information and exit.
  -h  --help     Print help and exit.
