cucapra / diospyros
Search-based compiler for high-performance DSP programming
License: MIT License
Generated kernels need an accompanying header file so that they can be included and compiled into larger code bases. A quick fix might be to emit kernel.h and preprocessor.h files for now, containing just the signature definitions, which a build system can pick up.
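A minimal sketch of what such a header could contain, assuming the generated entry point keeps the default name kernel and takes two input pointers and one output pointer. The guard name DIOSPYROS_KERNEL_H and the stub body are hypothetical and exist only so the sketch compiles on its own:

```c
#include <assert.h>
#include <stddef.h>

/* ---- kernel.h (sketch): signature declarations only ---- */
#ifndef DIOSPYROS_KERNEL_H /* hypothetical guard name */
#define DIOSPYROS_KERNEL_H

#ifdef __cplusplus
extern "C" {
#endif

/* Assumed signature: two inputs and one output. */
void kernel(float *a_in, float *b_in, float *c_out);

#ifdef __cplusplus
}
#endif

#endif /* DIOSPYROS_KERNEL_H */

/* ---- stand-in for the generated kernel.c, not real generated code ---- */
void kernel(float *a_in, float *b_in, float *c_out) {
  assert(a_in != NULL && b_in != NULL && c_out != NULL);
  c_out[0] = a_in[0] * b_in[0]; /* placeholder 1x1 multiply */
}
```

The extern "C" wrapper is there because the surrounding code base appears to be C++ (Eigen); drop it if the kernels are only ever compiled as C.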
We should run DCE after register allocation and shuffle truncation. We generate quite a few unnecessary registers.
Generated kernels should probably be encapsulated in a namespace, both to avoid polluting the base namespace and so that we can precisely call different implementations of the same function for testing. We can probably get away with a default namespace like dios or diospyros for now, but to facilitate automation we may eventually want it to be user-controlled, since we'll be generating override kernels.
For instance, we may need to call:
// base definition
Eigen::foo()
// our optimized override
OurNamespace::Eigen::foo()
We can support squares written as n * n; it should be fairly simple to extend this to target Racket's expt.
float powf(float x, float y);
Instead of using vector-shuffle-set! to set the computation on the output array, create a sketch where the write is done at locations $i, $i+1, ... and so on.
This forces the sketch to never shuffle the output.
For readability and parsing specs, it would be nice for (make-symbolic-bv-list ty size) to take an optional name parameter instead of naming all symbolic values v.
After some initial results from running coverage over different matrix multiply dimensions, the following configurations appear to be failing validation. I think this may again be related to how we store the results back to memory, but I'm not sure. The failing cases appear to be corner cases with very small dimensions. The remaining results are still being generated.
Failing configurations:
input_rows, input_cols, output_cols
1, 1, 2
1, 2, 3
1, 1, 5
1, 1, 7
1, 2, 1
1, 2, 2
1, 2, 3
1, 3, 1
1, 3, 2
1, 3, 3
1, 4, 2
1, 6, 1
Repro from 1054f25:
eigen --kernel multiple --input_rows <input_rows> --input_cols <input_cols> --output_cols <output_cols>
Needed for vector norms and unit quaternion calculations.
Example specification:
float acc = 0.0f;
for (int i = 0; i < 4; i++) {
  acc += a_in[i] * a_in[i];
}
for (int i = 0; i < 4; i++) {
  b_out[i] = a_in[i] / std::sqrt(acc);
}
}
I think at some point we fixed this issue with the flush instruction but it appears to have resurfaced again on my end.
Repro from diospyros/utils:
eigen.py --kernel multiply
The fix is to add this at the bottom of the generated kernel:
PDX_SAPOS_FP(align_c_out, (xb_vec4Mx8 *)c_out);
Diffing the kernel generated by master with the kernel that was generated after commit 9e0119b only shows changes to the DRAM section. The code body appears to be the same but both cases seem to be failing on my machine now without the flush instruction.
So that for each benchmark, we only need to change parameters and reference names
Compile basic data-independent conditionals (i.e., 2dconv) with cdios.
Need continuations to translate the following patterns in C:
for (i = 0; i < N; i++) {
  if (i > N/2) continue; ...
}
for (i = 0; i < N; i++) {
  if (i > N/2) break; ...
}
if (foo < bar) return;
Racket's for construct does not support these out of the box, so we'll want to use explicit continuations.
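As a point of comparison, and not what the issue proposes, the same loop patterns can also be expressed with no early exits by guarding the body with a condition or a flag. The sketch below shows each original pattern next to a guarded equivalent; all function names are hypothetical.

```c
#include <assert.h>

/* Original pattern: skip the rest of the body once i > n / 2. */
float sum_with_continue(const float *x, int n) {
  assert(n >= 0);
  float acc = 0.0f;
  for (int i = 0; i < n; i++) {
    if (i > n / 2) continue;
    acc += x[i];
  }
  return acc;
}

/* Same result: the body runs only when the skip condition is false. */
float sum_guarded(const float *x, int n) {
  assert(n >= 0);
  float acc = 0.0f;
  for (int i = 0; i < n; i++) {
    if (!(i > n / 2)) {
      acc += x[i];
    }
  }
  return acc;
}

/* Original pattern: leave the loop once i > n / 2. */
float sum_with_break(const float *x, int n) {
  assert(n >= 0);
  float acc = 0.0f;
  for (int i = 0; i < n; i++) {
    if (i > n / 2) break;
    acc += x[i];
  }
  return acc;
}

/* Same result: a sticky flag disables the body for all later iterations. */
float sum_flagged(const float *x, int n) {
  assert(n >= 0);
  float acc = 0.0f;
  int done = 0;
  for (int i = 0; i < n; i++) {
    if (i > n / 2) done = 1;
    if (!done) acc += x[i];
  }
  return acc;
}
```

The guarded forms keep a fixed trip count, which may suit data-independent kernels, but they still execute the condition check on every iteration; explicit continuations avoid that at the cost of a more involved translation.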
Repro:
Throws:
standard-module-name-resolver: collection not found
for module path: c
collection: "c"
in collection directories:
/home/vtlee/.racket/7.3/collects
/usr/local/share/racket/collects
... [171 additional linked and package directories]
Fixed it with:
raco pkg install c-utils
Implement a sketch where the compute gets to select one of the last n shuffle vectors defined before it. Having a parameter n allows us to make trade-offs with the size of the formula and the flexibility of the sketch.
We use:
(define (all-args)
(hash-keys map))
This does not keep the desired argument order in the kernel (the order should be I, F, O):
void kernel(float * input_F, float * input_I, float * input_O)
Support compiling multiple functions in one module in cdios. For simplicity, enforce the specification restrictions on the last function (which can call into other functions).
quotient and bsdiv are expensive; avoid exposing them to the solver by pre-calculating the register-of function up to the maximum index size.
Profile to check that this is faster.
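The pre-calculation idea can be sketched as a lookup table. The real change would live in the Racket/Rosette code; this C version only illustrates the shape of it, and REG_WIDTH, MAX_INDEX, and both function names are hypothetical.

```c
#include <assert.h>

#define REG_WIDTH 4  /* hypothetical number of lanes per register */
#define MAX_INDEX 16 /* hypothetical maximum element index (exclusive) */

static int reg_of_table[MAX_INDEX];

/* Fill the table once, up front, so later lookups need no division. */
void init_reg_of_table(void) {
  for (int i = 0; i < MAX_INDEX; i++) {
    reg_of_table[i] = i / REG_WIDTH;
  }
}

/* Table lookup standing in for i / REG_WIDTH. */
int reg_of(int i) {
  assert(i >= 0 && i < MAX_INDEX);
  return reg_of_table[i];
}
```

Whether the table beats the division inside the solver is exactly what the profiling step is meant to confirm.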
If the body of a Racket for loop modifies the index variable, it does not persist to the next iteration: the front end of cdios should error on this (or do the smart thing and translate to an equivalent recursive while). Currently FFT is rewritten manually to a while loop to work around this.
The first query should not use the cost model at all. The subsequent queries can then use the resulting program as an upper bound on cost.
Then remove continuous-aligned-vec
It looks like the generated kernel.c is missing the Tensilica header includes that provide some of these typedefs and macros. Maybe we can emit all the assumed headers in the generated kernel.c file by default to make it self-contained?
Compiled by itself, kernel.c generates a bunch of errors like these on our build system:
stderr: arvr/projects/surreal/flash/diospyros/src/MatMult3x3x3x3.cpp:36:3: error: unknown type name 'valign'
valign align_a_in;
^
arvr/projects/surreal/flash/diospyros/src/MatMult3x3x3x3.cpp:37:33: error: use of undeclared identifier 'xb_vecMxf32'
align_a_in = PDX_LA_MXF32_PP((xb_vecMxf32 *)a_in);
^
arvr/projects/surreal/flash/diospyros/src/MatMult3x3x3x3.cpp:37:46: error: expected expression
align_a_in = PDX_LA_MXF32_PP((xb_vecMxf32 *)a_in);
cdios cdios-tests/matrix-multiply.c
Standard C compilation successful
Writing intermediate files to: compile-out
CDIOS: Compiling C->Racket failed
src/utils.rkt:47:25: current-oracle: unbound identifier
in: current-oracle
context...:
/home/diospyros/Downloads/rosette/rosette/base/form/module.rkt:16:0
I just started learning Racket programming. In a Racket 8.0 environment, CDIOS fails with the above error. I don't think it is a problem with the code itself. I installed the necessary components according to the instructions, but I may have missed some of them. Please give me some suggestions, thank you!
In a point product kernel, we want to write:
float qvec[3] = {q_in[0], q_in[1], q_in[2]};
But currently need to write:
float qvec[3];
qvec[0] = q_in[0];
qvec[1] = q_in[1];
qvec[2] = q_in[2];
Need control over the name of the generated kernel function to facilitate integration. Currently the tool defaults to void kernel() in the generated file. For automation flows, the name will eventually need to be supplied by the user, so that it can be set to something arbitrary without manually copying and adjusting the function call.
This probably will need us to plumb through arguments to the tool to propagate it through to the code generator.
While running coverage over different matrix multiply dimensions, the following pathological case appears to generate a kernel that causes the instruction set simulator validation to abort.
Repro configurations:
Specification definition specification.c:
/*!
Specification file of the target kernel to be consumed by the Diospyros tool
*/
#define A_ROWS 2
#define A_COLS 1
#define B_COLS 4
void MatMult2x1x1x4(
    float a_in[A_ROWS * A_COLS],
    float b_in[A_COLS * B_COLS],
    float c_out[A_ROWS * B_COLS]) {
  for (int i = 0; i < A_ROWS; i++) {
    for (int j = 0; j < B_COLS; j++) {
      c_out[j * A_ROWS + i] = 0;
      for (int k = 0; k < A_COLS; k++) {
        c_out[j * A_ROWS + i] += a_in[k * A_ROWS + i] * b_in[j * A_COLS + k];
      }
    }
  }
}
Manifest definition file diospyros.json:
{"inputs": {"a": "Eigen::Matrix<float, 2, 1>", "b": "Eigen::Matrix<float, 1, 4>"}, "outputs": {"c": "Eigen::Matrix<float, 2, 4>"}, "test": "c = a * b", "name": "MatMult2x1x1x4", "namespace": "Eigen", "specification": "specification.c", "specification_kernel": "MatMult2x1x1x4", "manifest_path": "build_2x1x1x4/spec", "build": "build_2x1x1x4", "src_dir": "build_2x1x1x4/src", "include_dir": "build_2x1x1x4/include", "bin": "build_2x1x1x4/bin", "test_dir": "build_2x1x1x4/test"}
Call diospyros --manifest diospyros.json and run test and/or benchmark code. This throws the following exception during ISS:
*WARNING* Unhandled user exception: LoadStoreAlignmentCause (0xbf26daea)
Debug the C-based implementation of QR decomposition to reach parity with the Racket DSL.
Would make sense to throw in timing data per query here, too.
Right now loads/stores are to multiples of the full register width, but need to handle inputs that are not aligned to that size.
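One common way to handle lengths that are not a multiple of the register width is to process full-width chunks first and finish with a scalar tail loop. The sketch below illustrates that split in portable C; it is not the generated code, and VEC_WIDTH and the scale function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_WIDTH 4 /* hypothetical number of float lanes per register */

/* Multiply every element by k: full-width chunks first, then a scalar tail. */
void scale(const float *in, float *out, size_t n, float k) {
  assert(in != NULL && out != NULL);
  size_t full = n - (n % VEC_WIDTH); /* largest multiple of VEC_WIDTH <= n */
  size_t i = 0;
  for (; i < full; i += VEC_WIDTH) {
    /* Stands in for one vector load, multiply, and store. */
    for (size_t lane = 0; lane < VEC_WIDTH; lane++) {
      out[i + lane] = in[i + lane] * k;
    }
  }
  for (; i < n; i++) {
    out[i] = in[i] * k; /* leftover elements that do not fill a register */
  }
}
```

On the actual target, the tail could instead use a partial store with an explicit byte count, which ties into the flush behavior discussed for PDX_SAV_MXF32_XP elsewhere in these issues.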
For the generated kernel, the scalars.h import uses a relative path in the file. Recommend forcing the build to copy scalars.h into compile-out/include or something similar so that the compile output is self-contained.
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <xtensa/sim.h>
#include <xtensa/tie/xt_pdxn.h>
#include <xtensa/tie/xt_timer.h>
#include <xtensa/xt_profiling.h>
#include "../../../../src/scalars.h" // scalars.h requires relative path -> would rather have #include <scalar.h> or <diosypros/scalar.h> or something similar.
Investigate whether we can improve the baseline's performance with "super software pipelining"
See section 4.5 Software Pipelining of the Xtensa C and C++ Compiler user guide.
Enable support for ternary operators like the following to improve quality of life:
Repro specification:
void abs_test(float a_in[4], float b_out[4]) {
  for (int i = 0; i < 4; i++) {
    b_out[i] = (a_in[i] < 0) ? -a_in[i] : a_in[i];
  }
}
Compiler aborts with:
can't handle expr #s((expr:if expr 1) #s(src 95 3 15 129 3 49 #f) #s((expr:binop expr 1) #s(src 96 3 16 107 3 27 #f) #s((expr:array-ref expr 1) #s(src 96 3 16 103 3 23 #f) #s((expr:ref expr 1) #s(src 96 3 16 100 3 20 #f) #s((id:var id 1) #s(src 96 3 16 100 3 20 #f) a_in)) #s((expr:ref expr 1) #s(src 101 3 21 102 3 22 #f) #s((id:var id 1) #s(src 101 3 21 102 3 22 #f) i))) #s((id:op id 1) #s(src 104 3 24 105 3 25 #f) <) #s((expr:int expr 1) #s(src 106 3 26 107 3 27 #f) 0 ())) #s((expr:unop expr 1) #s(src 111 3 31 119 3 39 #f) #s((id:op id 1) #s(src 111 3 31 112 3 32 #f) -) #s((expr:array-ref expr 1) #s(src 112 3 32 119 3 39 #f) #s((expr:ref expr 1) #s(src 112 3 32 116 3 36 #f) #s((id:var id 1) #s(src 112 3 32 116 3 36 #f) a_in)) #s((expr:ref expr 1) #s(src 117 3 37 118 3 38 #f) #s((id:var id 1) #s(src 117 3 37 118 3 38 #f) i)))) #s((expr:array-ref expr 1) #s(src 122 3 42 129 3 49 #f) #s((expr:ref expr 1) #s(src 122 3 42 126 3 46 #f) #s((id:var id 1) #s(src 122 3 42 126 3 46 #f) a_in)) #s((expr:ref expr 1) #s(src 127 3 47 128 3 48 #f) #s((id:var id 1) #s(src 127 3 47 128 3 48 #f) i))))
context...:
/home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:30:25
/home/vtlee/diospyros/src/c-meta.rkt:135:12: for-loop
/home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:31:25
/home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:30:25
/home/vtlee/diospyros/src/c-meta.rkt:135:12: for-loop
/home/vtlee/.racket/7.3/pkgs/rosette/rosette/base/form/control.rkt:31:25
/home/vtlee/diospyros/src/c-meta.rkt:205:0: translate-fn-decl
"/home/vtlee/diospyros/src/c-meta.rkt": [running body]
temp37_0
for-loop
run-module-instance!125
perform-require!78
Error: Compilation aborted. cdios return error code 1.
Would enable macros and other compact code like:
#define abs(x) (((x) < 0) ? -(x) : (x))
Probably a nice-to-have feature but not absolutely necessary. This one can probably be ignored if if-else support works.
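For reference, the ternary in abs_test can be written as an explicit if-else, as sketched below. Whether cdios accepts this form depends on its if-else support, which is not confirmed here; the function name abs_test_if_else is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same computation as abs_test, with the ternary expanded to if-else. */
void abs_test_if_else(float a_in[4], float b_out[4]) {
  assert(a_in != NULL && b_out != NULL);
  for (int i = 0; i < 4; i++) {
    if (a_in[i] < 0) {
      b_out[i] = -a_in[i];
    } else {
      b_out[i] = a_in[i];
    }
  }
}
```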
When a configuration passed to run-experiment fails to synthesize, the error message is pretty opaque:
application: not a procedure;
expected a procedure that can be applied to arguments
given: #<void>
arguments...:
#f
We should fail more gracefully.
Codegen currently always emits #include "../../../../src/scalars.h"; it should only do this when necessary (when the code uses scalar negation or sgn).
Need something like this supported:
void foo(float a_in, float b_in, float c_in, float *x0_out, float *x1_out) {
// function body
}
Can probably be implemented by a 1x1 pointer right now but would improve quality of life.
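The 1x1 workaround might look like the sketch below, assuming each scalar is passed as a one-element array. The body of foo is a placeholder, and foo_scalar is a hypothetical hand-written wrapper that restores the scalar signature for callers.

```c
#include <assert.h>
#include <stddef.h>

/* Spec written with every scalar as a one-element array. */
void foo(float a_in[1], float b_in[1], float c_in[1],
         float x0_out[1], float x1_out[1]) {
  x0_out[0] = a_in[0] + b_in[0]; /* placeholder body for illustration */
  x1_out[0] = a_in[0] * c_in[0];
}

/* Hypothetical wrapper exposing the scalar signature from the request. */
void foo_scalar(float a, float b, float c, float *x0, float *x1) {
  assert(x0 != NULL && x1 != NULL);
  foo(&a, &b, &c, x0, x1);
}
```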
The intrinsic that saves to memory works if filling all 16 bytes when called as such:
PDX_SAV_MXF32_XP(v_43, align_c_out, (xb_vecMxf32 *)c_out, 16);
However, when filling less than 16 bytes, it will cause the store to fail and not propagate. So for instance, invoking this instruction by itself will not actually propagate the store to the first word of the memory:
PDX_SAV_MXF32_XP(v_43, align_c_out, (xb_vecMxf32 *)c_out, 4);
The solution is to either add the following flush after the PDX_SAV_MXF32_XP instruction or in the case of writing one word, to just store to the pointer:
PDX_SAPOS_FP(align_c_out, (xb_vec4Mx8 *)c_out); // added after PDX_SAV_MXF32_XP
or
*c_out = ; // just directly save value
Currently the compiler generated kernel does not flush the data out which causes it to produce partially incorrect results because values are not propagated.
The SDK instruction set has a predicated vector MAC instruction, which might be easier to compile to than a predicated store instruction (which doesn't seem to exist).
For generated intrinsic kernels, maybe we should add an #ifdef XTENSA / #endif guard to ensure that the kernel is transparent to non-Tensilica compilation targets. If we want, there can also be a fallback to the specification code. If we need both specification and intrinsic code, we may need to go with some sort of namespace solution. Perhaps we should discuss more.
The use case is regression testing and CI. Not all CI machines likely will have access to the Tensilica tool chain. But we can still validate any functionality we care about using a CPU compiler.
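A minimal sketch of such a guard with a fallback to the specification code. The issue writes the macro as XTENSA; the sketch tests __XTENSA__ on the assumption that the Tensilica toolchain predefines it, and scale2, scale2_spec, and the two-argument kernel signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N 4

/* Scalar specification, compilable with any C compiler. */
void scale2_spec(const float *a_in, float *b_out) {
  for (int i = 0; i < N; i++) {
    b_out[i] = 2.0f * a_in[i];
  }
}

#if defined(__XTENSA__)
/* Generated intrinsic kernel, assumed to be defined in kernel.c. */
void kernel(float *a_in, float *b_out);
#endif

/* Single entry point: intrinsics on Tensilica, specification elsewhere. */
void scale2(float *a_in, float *b_out) {
  assert(a_in != NULL && b_out != NULL);
#if defined(__XTENSA__)
  kernel(a_in, b_out);
#else
  scale2_spec(a_in, b_out);
#endif
}
```

With this shape, a CI machine without the Tensilica toolchain still builds and tests scale2 through the specification path.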
Needed for catching corner cases regarding pathological values.
Repro specification:
float acc = 0.0f;
for (int i = 0; i < 4; i++) {
  acc += a_in[i] * a_in[i];
}
for (int i = 0; i < 4; i++) {
  if (acc == 0.0f) {
    b_out[i] = 0.0f;
  } else {
    b_out[i] = a_in[i] / acc;
  }
}
}
Compilation aborts with:
Writing intermediate files to: compile-out
==: this match expander must be used inside match
in: (== acc 0.0)
context...:
do-raise-syntax-error
apply-transformer-in-context
apply-transformer52
dispatch-transformer41
for-loop
[repeats 1 more time]
finish-bodys
for-loop
finish-bodys
for-loop
[repeats 1 more time]
finish-bodys
for-loop
[repeats 1 more time]
finish-bodys
for-loop
...
Error: Compilation aborted. cdios return error code 1.
Repro:
cdios demo/matrix-multiply.c
Standard C compilation successful
Writing intermediate files to: compile-out
cat compile-out/spec.rkt | pypy3 src/dios-egraphs/vec-dsl-conversion.py -w 4 -p > compile-out/spec-egg.rkt
/bin/sh: pypy3: command not found
make: *** [Makefile:52: compile-out/spec-egg.rkt] Error 127
cat: compile-out/kernel.c: No such file or directory
Fixed by installing pypy3 following https://doc.pypy.org/en/latest/install.html
After installation:
Shuffle/select should be added by later compiler passes, after register allocation.
We should import as (require (prefix-in c: c)) so as to not overwrite the definition of struct.
Minimal repro:
void foo(float x_in, float y_in[3], float z_out) {
  float acc = 0.0f;
  for (int a = 0; a < 3; a++) {
    acc = acc + x_in * y_in[a];
  }
  z_out = acc;
}
Error message:
[vtlee@vtlee-fedora-IT2277812 ifelse0]$ diospyros.py --manifest spec/diospyros.json
Standard C compilation successful
Writing intermediate files to: compile-out
in-range: contract violation
expected: real?
given: #f
Error: Compilation aborted. cdios return error code 1.
Could be useful to add additional comment information at the top of the generated code file. Right now it just includes a Git hash and status but at some point we may want to add some of these:
Support multidimensional (but still statically defined) matrices in cdios. Plan is to still unroll to single-dimensional Racket.
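The unrolling to a single dimension amounts to index flattening, sketched below for a statically sized 2-D matrix. Row-major order is an assumption made for the sketch (the matrix-multiply specification elsewhere in these issues indexes column-major), and ROWS, COLS, and all function names are hypothetical.

```c
#include <assert.h>

#define ROWS 2
#define COLS 3

/* Row-major flattening: element (i, j) lives at i * COLS + j. */
static int flat_index(int i, int j) {
  assert(i >= 0 && i < ROWS && j >= 0 && j < COLS);
  return i * COLS + j;
}

/* 2-D form a user might want to write. */
void transpose_2d(float a_in[ROWS][COLS], float b_out[COLS][ROWS]) {
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
      b_out[j][i] = a_in[i][j];
    }
  }
}

/* Equivalent 1-D form after flattening both matrices. */
void transpose_flat(const float a_in[ROWS * COLS], float b_out[COLS * ROWS]) {
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
      b_out[j * ROWS + i] = a_in[flat_index(i, j)];
    }
  }
}
```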