diospyros's People

Contributors

agent-lee, avanhatt, dependabot[bot], jamesbornholt, jasperxfliang, jonathandltran, rachitnigam, sampsyo

diospyros's Issues

Emit include header file to allow #include of kernel into codebase

Generated kernels need an accompanying header file so that they can be compiled into larger code bases. A quick fix might be to emit kernel.h and preprocessor.h files for now, containing only the signature definitions, which a build system can then pick up.

Control namespace for generated kernels

Generated kernels should probably be wrapped in a namespace, both to avoid polluting the global namespace and so that we can call different implementations of the same function by qualified name when testing. A default namespace such as dios or diospyros is probably fine for now, but to facilitate automation we may eventually want to make it user-controlled, since we'll be generating override kernels.

For instance, we may need to call:
// base definition
Eigen::foo()
// our optimized override
OurNamespace::Eigen::foo()

Aligned writes into output matrix

Instead of using vector-shuffle-set! to set the computation on the output array, create a sketch where the write is done at locations $i, $i+1, ... and so on.

This forces the sketch to never shuffle the output.

Runtime validation failing for various matrix multiply configurations

After some initial results from running coverage over different matrix multiply dimensions, the following configurations appear to be failing validation. I think this may again be related to how we store the results back to memory, but I'm not sure. The failing cases appear to be corner cases with very small dimensions. The remaining results are still being generated.

Failing configurations:
input_rows, input_cols, output_cols
1, 1, 2
1, 2, 3
1, 1, 5
1, 1, 7
1, 2, 1
1, 2, 2
1, 2, 3
1, 3, 1
1, 3, 2
1, 3, 3
1, 4, 2
1, 6, 1

Repro from 1054f25:

  • eigen --kernel multiply --input_rows <input_rows> --input_cols <input_cols> --output_cols <output_cols>
  • Validation when running the test harness will throw a failure when validating the synthesized kernel against the specification

cdios: support for sqrt

Needed for vector norms and unit quaternion calculations.

Example specification:

  float acc = 0.0f;
  for (int i = 0; i < 4; i++) {
    acc += a_in[i] * a_in[i];
  }
  for (int i = 0; i < 4; i++) {
    b_out[i] = a_in[i] / std::sqrt(acc);
  }
}

Emit PDX_SAPOS_FP for final stores

I think at some point we fixed this issue with the flush instruction but it appears to have resurfaced again on my end.

Repro from diospyros/utils:

  • eigen.py --kernel multiply
  • By default produces a 3x3x3x3 matrix multiply kernel
  • Running test should throw a failure since matrices do not match

Fix for kernel is to add this at the bottom of the generated kernel:
PDX_SAPOS_FP(align_c_out, (xb_vec4Mx8 *)c_out);

Diffing the kernel generated by master with the kernel that was generated after commit 9e0119b only shows changes to the DRAM section. The code body appears to be the same but both cases seem to be failing on my machine now without the flush instruction.

Support early returns/break/continue in cdios

Need continuations to translate the following patterns in C:

  • continue
  • break
  • (early) return
for (i = 0; i < N; i++) {
    if (i > N/2) continue; ...
}
for (i = 0; i < N; i++) {
    if (i > N/2) break; ...
}
if (foo < bar) return;

Racket's for construct does not support these out of the box, so we'll want to use explicit continuations.

raco pkg install c possibly missing from setup

Repro:

  • Set up according to the README
  • cdios demo/matrix-multiply.c

Throws:
standard-module-name-resolver: collection not found
for module path: c
collection: "c"
in collection directories:
/home/vtlee/.racket/7.3/collects
/usr/local/share/racket/collects
... [171 additional linked and package directories]

Fixed it with:
raco pkg install c-utils

multiple functions in `cdios`

Support compiling multiple functions in one module in cdios. For simplicity, enforce the specification restrictions on the last function (which can call into other functions).

Pre-calculate reg-of table

quotient and bsdiv are expensive. Avoid exposing them to the solver by pre-calculating the register-of function up to the maximum index size.

Profile to check that this is faster.
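
As a rough illustration of the pre-calculation idea, here is a minimal C sketch. The lane count (VEC_WIDTH), the index bound (MAX_INDEX), and all function names are assumptions made for the example; the actual change would live in the project's Racket sources.

```c
#include <stddef.h>

#define VEC_WIDTH 4   /* assumed number of lanes per vector register */
#define MAX_INDEX 64  /* assumed upper bound on element indices */

int reg_of_table[MAX_INDEX];

/* Fill the table once, up front: entry i holds i / VEC_WIDTH. */
void init_reg_of_table(void) {
  for (size_t i = 0; i < MAX_INDEX; i++) {
    reg_of_table[i] = (int)(i / VEC_WIDTH);
  }
}

/* A lookup replaces the division; returns -1 for out-of-range indices. */
int reg_of(int index) {
  if (index < 0 || index >= MAX_INDEX) {
    return -1;
  }
  return reg_of_table[index];
}
```

Whether the table actually beats the direct division inside the solver is exactly what the profiling step would need to confirm.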

cdios: disallow modification of for loop variable in body

If the body of a Racket for loop modifies the index variable, the change does not persist to the next iteration. The front end of cdios should raise an error in this case (or do the smart thing and translate to an equivalent recursive while). Currently FFT is rewritten manually to a while loop to work around this.
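
To make the pattern concrete, here is a small C sketch of a loop whose body modifies its own index, next to the equivalent while loop that a manual rewrite would produce. The function names are made up for the example; in C the in-body update persists, which is the behavior a direct translation to a Racket for loop would lose.

```c
/* Sums every other element by bumping the index inside the body. */
float sum_skip_for(const float *a, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; i++) {
    acc += a[i];
    i++; /* body modifies the loop variable: skips the next element */
  }
  return acc;
}

/* Equivalent while loop: every index update is explicit. */
float sum_skip_while(const float *a, int n) {
  float acc = 0.0f;
  int i = 0;
  while (i < n) {
    acc += a[i];
    i++; /* update from the original loop body */
    i++; /* update from the for-loop header */
  }
  return acc;
}
```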

Incremental synthesis

The first query should not use the cost model at all. The subsequent queries can then use the resulting program as an upper bound on cost.
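
The query sequence can be sketched in C as follows. The solver interface (solve_fn), the toy solver, and its candidate costs are all hypothetical stand-ins; the real queries would go through the project's Racket code, and the sketch assumes each later query asks for a cost strictly below the previous one.

```c
#include <stdbool.h>

/* Hypothetical solver interface: tries to find a program whose cost is
   strictly below `bound` (a negative bound means "no cost constraint").
   On success it writes the cost of the program found and returns true. */
typedef bool (*solve_fn)(int bound, int *cost_out);

/* Returns the best cost found, or -1 if even the unconstrained query fails. */
int incremental_synthesis(solve_fn solve) {
  int best;
  if (!solve(-1, &best)) { /* first query: no cost model */
    return -1;
  }
  int cost;
  while (solve(best, &cost)) { /* later queries: previous cost is the bound */
    best = cost;
  }
  return best;
}

/* Toy stand-in for a real solver: candidate costs are 9, 7, 4. */
bool toy_solve(int bound, int *cost_out) {
  static const int costs[] = {9, 7, 4};
  for (int i = 0; i < 3; i++) {
    if (bound < 0 || costs[i] < bound) {
      *cost_out = costs[i];
      return true;
    }
  }
  return false;
}
```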

Tensilica header imports missing from generated kernel.c

The generated kernel.c appears to be missing the Tensilica header includes that provide some of these typedefs and macros. Maybe we can emit all the assumed headers in the generated kernel.c file by default to make it self-contained?

Compiled by itself, kernel.c produces a bunch of errors like these on our build system:

stderr: arvr/projects/surreal/flash/diospyros/src/MatMult3x3x3x3.cpp:36:3: error: unknown type name 'valign'
valign align_a_in;
^
arvr/projects/surreal/flash/diospyros/src/MatMult3x3x3x3.cpp:37:33: error: use of undeclared identifier 'xb_vecMxf32'
align_a_in = PDX_LA_MXF32_PP((xb_vecMxf32 *)a_in);
^
arvr/projects/surreal/flash/diospyros/src/MatMult3x3x3x3.cpp:37:46: error: expected expression
align_a_in = PDX_LA_MXF32_PP((xb_vecMxf32 *)a_in);

CDIOS: Compiling C->Racket failed

cdios cdios-tests/matrix-multiply.c
Standard C compilation successful
Writing intermediate files to: compile-out
CDIOS: Compiling C->Racket failed
src/utils.rkt:47:25: current-oracle: unbound identifier
in: current-oracle
context...:
/home/diospyros/Downloads/rosette/rosette/base/form/module.rkt:16:0

I just started learning Racket. With Racket 8.0, cdios fails with the error above. I don't think it is a problem with the code itself. I installed the necessary components according to the instructions, but I may have missed some. Please give me some suggestions, thank you!

Need control over emitted kernel function name

We need control over the name of the generated kernel function to facilitate integration. Currently the tool defaults to void kernel() in the generated file. For automation flows, the name will eventually need to be supplied by the user, so that it can be set to something arbitrary without manually copying and adjusting the function call.

This will probably require plumbing an argument through the tool so that it reaches the code generator.
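
Until such an argument exists, one possible stopgap is a preprocessor knob in the generated file. The sketch below is only an illustration of that alternative, not the argument plumbing described above: the macro name DIOS_KERNEL_NAME is hypothetical, and the loop body is a placeholder for the generated intrinsics.

```c
/* Hypothetical knob: a build system could pass -DDIOS_KERNEL_NAME=my_name. */
#ifndef DIOS_KERNEL_NAME
#define DIOS_KERNEL_NAME kernel /* matches the current default name */
#endif

/* Placeholder body; a generated kernel would contain intrinsics here. */
void DIOS_KERNEL_NAME(const float *a_in, const float *b_in, float *c_out) {
  for (int i = 0; i < 4; i++) {
    c_out[i] = a_in[i] + b_in[i];
  }
}
```

With no override the function is still called kernel, so existing callers keep working.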

2x1 x 1x4 matrix multiply causes ISS to abort

While running coverage over different matrix multiply dimensions, the following pathological case appears to generate a kernel that causes the instruction set simulator validation to abort.

Repro configurations:

Specification definition specification.c :

/*!

  Specification file of the target kernel to be consumed by the Diospyros tool

*/

#define A_ROWS 2
#define A_COLS 1
#define B_COLS 4

void MatMult2x1x1x4(
    float a_in[A_ROWS * A_COLS],
    float b_in[A_COLS * B_COLS],
    float c_out[A_ROWS * B_COLS]) {
  for (int i = 0; i < A_ROWS; i++) {
    for (int j = 0; j < B_COLS; j++) {
      c_out[j * A_ROWS + i] = 0;

      for (int k = 0; k < A_COLS; k++) {
        c_out[j * A_ROWS + i] += a_in[k * A_ROWS + i] * b_in[j * A_COLS + k];
      }
    }
  }
}

Manifest definition file diospyros.json:
{"inputs": {"a": "Eigen::Matrix<float, 2, 1>", "b": "Eigen::Matrix<float, 1, 4>"}, "outputs": {"c": "Eigen::Matrix<float, 2, 4>"}, "test": "c = a * b", "name": "MatMult2x1x1x4", "namespace": "Eigen", "specification": "specification.c", "specification_kernel": "MatMult2x1x1x4", "manifest_path": "build_2x1x1x4/spec", "build": "build_2x1x1x4", "src_dir": "build_2x1x1x4/src", "include_dir": "build_2x1x1x4/include", "bin": "build_2x1x1x4/bin", "test_dir": "build_2x1x1x4/test"}

Call diospyros --manifest diospyros.json and run test and/or benchmark code. Throws the following exception during ISS:
*WARNING* Unhandled user exception: LoadStoreAlignmentCause (0xbf26daea)

scalars.h emitted as relative path breaks self-contained build

For the generated kernel, the scalars.h import uses a relative path in the file. Recommend forcing the build to copy scalars.h into compile-out/include or something similar so that the compile output is self-contained.

#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <xtensa/sim.h>
#include <xtensa/tie/xt_pdxn.h>
#include <xtensa/tie/xt_timer.h>
#include <xtensa/xt_profiling.h>
#include "../../../../src/scalars.h" // scalars.h requires a relative path; would rather have #include <scalars.h> or <diospyros/scalars.h> or something similar.

Ternary operator support

Enable support for ternary operators like the following to improve quality of life:

Repro specification:

void abs_test(float a_in[4], float b_out[4]) {
  for (int i = 0; i < 4; i++) {
    b_out[i] = (a_in[i] < 0) ? -a_in[i] : a_in[i];
  }
}

Compiler aborts with:

can't handle expr #s((expr:if expr 1) #s(src 95 3 15 129 3 49 #f) #s((expr:binop expr 1) #s(src 96 3 16 107 3 27 #f) #s((expr:array-ref expr 1) #s(src 96 3 16 103 3 23 #f) #s((expr:ref expr 1) #s(src 96 3 16 100 3 20 #f) #s((id:var id 1) #s(src 96 3 16 100 3 20 #f) a_in)) #s((expr:ref expr 1) #s(src 101 3 21 102 3 22 #f) #s((id:var id 1) #s(src 101 3 21 102 3 22 #f) i))) #s((id:op id 1) #s(src 104 3 24 105 3 25 #f) <) #s((expr:int expr 1) #s(src 106 3 26 107 3 27 #f) 0 ())) #s((expr:unop expr 1) #s(src 111 3 31 119 3 39 #f) #s((id:op id 1) #s(src 111 3 31 112 3 32 #f) -) #s((expr:array-ref expr 1) #s(src 112 3 32 119 3 39 #f) #s((expr:ref expr 1) #s(src 112 3 32 116 3 36 #f) #s((id:var id 1) #s(src 112 3 32 116 3 36 #f) a_in)) #s((expr:ref expr 1) #s(src 117 3 37 118 3 38 #f) #s((id:var id 1) #s(src 117 3 37 118 3 38 #f) i)))) #s((expr:array-ref expr 1) #s(src 122 3 42 129 3 49 #f) #s((expr:ref expr 1) #s(src 122 3 42 126 3 46 #f) #s((id:var id 1) #s(src 122 3 42 126 3 46 #f) a_in)) #s((expr:ref expr 1) #s(src 127 3 47 128 3 48 #f) #s((id:var id 1) #s(src 127 3 47 128 3 48 #f) i))))
  context...:
   /home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:30:25
   /home/vtlee/diospyros/src/c-meta.rkt:135:12: for-loop
   /home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:31:25
   /home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:30:25
   /home/vtlee/diospyros/src/c-meta.rkt:135:12: for-loop
   /home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:31:25
   /home/vtlee/diospyros/src/c-meta.rkt:205:0: translate-fn-decl
   "/home/vtlee/diospyros/src/c-meta.rkt": [running body]
   temp37_0
   for-loop
   run-module-instance!125
   perform-require!78
Error: Compilation aborted. cdios return error code 1.

Would enable macros and other compact code like:

#define abs(x) (((x) < 0) ? -(x) : (x))

Probably a nice-to-have feature, but not absolutely necessary. This one can probably be ignored if if-else support works.
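
For reference, the if-else form that the ternary would reduce to might look like the sketch below. The function name abs_test_ifelse is made up for the example, and the sketch assumes data-dependent if-else is accepted by the front end.

```c
/* Same behavior as the ternary version, written with if/else. */
void abs_test_ifelse(float a_in[4], float b_out[4]) {
  for (int i = 0; i < 4; i++) {
    if (a_in[i] < 0) {
      b_out[i] = -a_in[i];
    } else {
      b_out[i] = a_in[i];
    }
  }
}
```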

Better error message for failed synth with run-experiment

When a configuration passed to run-experiment fails to synthesize, the error message is pretty opaque:

application: not a procedure;
 expected a procedure that can be applied to arguments
  given: #<void>
  arguments...:
   #f

We should fail more gracefully.

Enable scalar inputs to specification

Need something like this supported:

void foo(float a_in, float b_in, float c_in, float *x0_out, float *x1_out) {
// function body
}

This can probably be worked around with a 1x1 pointer for now, but native support would improve quality of life.
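
A minimal sketch of that workaround, assuming one-element arrays are accepted where scalars are not. The name foo_workaround and the arithmetic in its body are illustrative only, since the issue leaves the function body unspecified.

```c
/* Workaround sketch: each scalar is passed as a one-element array. */
void foo_workaround(float a_in[1], float b_in[1], float c_in[1],
                    float x0_out[1], float x1_out[1]) {
  /* Illustrative body only; the issue does not specify one. */
  x0_out[0] = a_in[0] + b_in[0];
  x1_out[0] = b_in[0] * c_in[0];
}

/* Thin wrapper that restores the scalar signature from the issue. */
void foo(float a_in, float b_in, float c_in, float *x0_out, float *x1_out) {
  foo_workaround(&a_in, &b_in, &c_in, x0_out, x1_out);
}
```

The wrapper keeps call sites unchanged, so it could be dropped once scalar parameters are supported directly.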

Partial PDX_SAV_MXF32_XP stores must be flushed to memory

The intrinsic that saves to memory works when it fills all 16 bytes, as in:
PDX_SAV_MXF32_XP(v_43, align_c_out, (xb_vecMxf32 *)c_out, 16);
However, when it fills fewer than 16 bytes, the store does not propagate to memory. For instance, invoking this instruction by itself will not actually write the first word of memory:
PDX_SAV_MXF32_XP(v_43, align_c_out, (xb_vecMxf32 *)c_out, 4);
The solution is either to add the following flush after the PDX_SAV_MXF32_XP instruction or, when writing a single word, to store directly through the pointer:
PDX_SAPOS_FP(align_c_out, (xb_vec4Mx8 *)c_out); // added after PDX_SAV_MXF32_XP
or
*c_out = ; // just directly save value

Currently the compiler-generated kernel does not flush the data, so it produces partially incorrect results because some values never reach memory.

Implement predicated VMAC

The SDK has a predicated vector MAC instruction, which might be easier to compile to than a predicated store instruction (which doesn't seem to exist).

Adding __XTENSA__ guards to generated intrinsic kernels

For generated intrinsic kernels, maybe we should add an #ifdef __XTENSA__ ... #endif guard to ensure that the kernel is transparent to non-Tensilica compilation targets. If we want, there can also be a fallback to the specification code. If we need both specification and intrinsic code, we may need to go with some sort of namespace solution. Perhaps we should discuss more.

The use case is regression testing and CI. Not all CI machines likely will have access to the Tensilica tool chain. But we can still validate any functionality we care about using a CPU compiler.
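
One way the guard and fallback could fit together is sketched below. The names kernel_intrinsics, kernel_reference, and kernel_dispatch are hypothetical, and the element-wise multiply is just a stand-in for the specification code; the sketch relies only on the __XTENSA__ macro being defined by the Tensilica toolchain.

```c
#if defined(__XTENSA__)
/* Hypothetical: the generated intrinsics kernel, defined in kernel.c. */
void kernel_intrinsics(const float *a_in, const float *b_in, float *c_out);
#endif

/* Portable reference, standing in for the specification code. */
void kernel_reference(const float *a_in, const float *b_in, float *c_out) {
  for (int i = 0; i < 4; i++) {
    c_out[i] = a_in[i] * b_in[i];
  }
}

/* Single entry point: intrinsics on Tensilica targets, reference elsewhere. */
void kernel_dispatch(const float *a_in, const float *b_in, float *c_out) {
#if defined(__XTENSA__)
  kernel_intrinsics(a_in, b_in, c_out);
#else
  kernel_reference(a_in, b_in, c_out);
#endif
}
```

On a CI machine without the Tensilica toolchain, callers of kernel_dispatch exercise the reference path with an ordinary CPU compiler.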

Enable support for data-dependent if-else

Needed for catching corner cases regarding pathological values.

Repro specification:

  float acc = 0.0f;
  for (int i = 0; i < 4; i++) {
    acc += a_in[i] * a_in[i];
  }
  for (int i = 0; i < 4; i++) {
    if (acc == 0.0f) {
      b_out[i] = 0.0f;
    } else {
      b_out[i] = a_in[i] / acc;
    }
  }
}

Compilation aborts with:

Writing intermediate files to: compile-out
==: this match expander must be used inside match
in: (== acc 0.0)
context...:
do-raise-syntax-error
apply-transformer-in-context
apply-transformer52
dispatch-transformer41
for-loop
[repeats 1 more time]
finish-bodys
for-loop
finish-bodys
for-loop
[repeats 1 more time]
finish-bodys
for-loop
[repeats 1 more time]
finish-bodys
for-loop
...
Error: Compilation aborted. cdios return error code 1.
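
For anyone who wants to check the expected behavior with a regular C compiler, here is a self-contained version of the repro body. The function name scale_by_sum_sq and its signature are assumptions, since the repro as posted does not show them.

```c
/* Hypothetical signature; the body follows the repro specification. */
void scale_by_sum_sq(float a_in[4], float b_out[4]) {
  float acc = 0.0f;
  for (int i = 0; i < 4; i++) {
    acc += a_in[i] * a_in[i];
  }
  for (int i = 0; i < 4; i++) {
    if (acc == 0.0f) {
      b_out[i] = 0.0f; /* guard against division by zero */
    } else {
      b_out[i] = a_in[i] / acc;
    }
  }
}
```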

Demo requires pypy3 installation

Repro:

cdios demo/matrix-multiply.c
Standard C compilation successful
Writing intermediate files to: compile-out
cat compile-out/spec.rkt | pypy3 src/dios-egraphs/vec-dsl-conversion.py -w 4 -p > compile-out/spec-egg.rkt
/bin/sh: pypy3: command not found
make: *** [Makefile:52: compile-out/spec-egg.rkt] Error 127
cat: compile-out/kernel.c: No such file or directory

Fixed by installing pypy3 following: https://doc.pypy.org/en/latest/install.html
After installation:

  • pypy3 -m ensurepip
  • pypy3 -mpip install sexpdata

Kernel fails with contract violation (support scalar outputs)

Minimal repro:

void foo(float x_in, float y_in[3], float z_out) {
  float acc = 0.0f;
  for (int a = 0; a < 3; a++) {
    acc = acc + x_in * y_in[a];
  }
  z_out = acc;
}

Error message:

[vtlee@vtlee-fedora-IT2277812 ifelse0]$ diospyros.py --manifest spec/diospyros.json 
Standard C compilation successful
Writing intermediate files to: compile-out
in-range: contract violation
  expected: real?
  given: #f
Error: Compilation aborted. cdios return error code 1.
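
Note that in plain C the assignment z_out = acc only changes the callee's local copy, so the caller never sees the result. A variant that passes the output by pointer is sketched below; the name foo_ptr is made up for the example, and whether cdios accepts this form is not confirmed here.

```c
/* Variant of the repro with the output passed by pointer. */
void foo_ptr(float x_in, float y_in[3], float *z_out) {
  float acc = 0.0f;
  for (int a = 0; a < 3; a++) {
    acc = acc + x_in * y_in[a];
  }
  *z_out = acc; /* visible to the caller, unlike assigning a by-value parameter */
}
```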

Generated code comment fields to improve tracking

It could be useful to add more information to the comment at the top of the generated code file. Right now it just includes a Git hash and status, but at some point we may want to add some of these:

  • generated code timestamp
  • something along the lines of "This code was automatically generated by "
  • code path to the original specification file location

2D matrices in `cdios`

Support multidimensional (but still statically defined) matrices in cdios. Plan is to still unroll to single-dimensional Racket.
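
The index mapping behind such an unrolling can be sketched in C as follows, assuming row-major layout. The dimensions and function names are illustrative; element [i][j] of a ROWS x COLS array maps to flat index i * COLS + j.

```c
#define ROWS 2
#define COLS 3

/* 2D form, as a user might write the specification. */
void transpose_2d(float a_in[ROWS][COLS], float b_out[COLS][ROWS]) {
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
      b_out[j][i] = a_in[i][j];
    }
  }
}

/* Flattened form: element [i][j] of an R x C array lives at i * C + j. */
void transpose_flat(const float a_in[ROWS * COLS], float b_out[COLS * ROWS]) {
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
      b_out[j * ROWS + i] = a_in[i * COLS + j];
    }
  }
}
```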
