spcl / dace
DaCe - Data Centric Parallel Programming
Home Page: http://dace.is/fast
License: BSD 3-Clause "New" or "Revised" License
Describe the bug
When opening a Dace program with Diode, the SDFG is not drawn and an error appears "ValueError: No SDFGs found in file. SDFGs are only recognized when @dace.programs or SDFG objects are found in the global scope"
This happens with some of the polybench samples (not all of them)
To Reproduce
Steps to reproduce the behavior:
Describe the bug
When importing DaCe, there's a cyclic dependency: dace.libraries.blas -> dace.library -> dace -> dace.frontend.python.newast -> dace.libraries.blas. This is causing some weird errors in unexpected places, and we need to find a resolution to this.
To Reproduce
No minimal test case has been produced yet.
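Until a minimal test case exists, the chain quoted above can at least be checked mechanically. The following is only a sketch: the `imports` dictionary is typed in by hand from the report (it is not produced by inspecting DaCe), and `find_cycle` is a hypothetical helper doing a plain depth-first search.

```python
def find_cycle(graph):
    """Return one import cycle as a list of module names, or None."""
    visiting, done = [], set()

    def visit(module):
        if module in done:
            return None
        if module in visiting:
            # Close the loop: slice from the first occurrence and repeat it.
            return visiting[visiting.index(module):] + [module]
        visiting.append(module)
        for dep in graph.get(module, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(module)
        return None

    for module in graph:
        cycle = visit(module)
        if cycle:
            return cycle
    return None


# Edges copied by hand from the report: "A imports B" is written A -> [B].
imports = {
    "dace.libraries.blas": ["dace.library"],
    "dace.library": ["dace"],
    "dace": ["dace.frontend.python.newast"],
    "dace.frontend.python.newast": ["dace.libraries.blas"],
}

cycle = find_cycle(imports)
print(" -> ".join(cycle))
```

Removing any single edge from `imports` (for example by deferring one import into a function body) makes `find_cycle` return None, which is one way to reason about where the cycle could be broken.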
We need to upgrade the Xilinx compilation flow to target Vitis, which is the rebranded/repackaged version of SDx/SDAccel. This requires:
Changing xocc to v++
Test-case: call_sdfg_test.py
Focus on the 'printf("hello world %f\\n", i)'
Lines 1 to 20 in ed882c4
This is misgenerated as
[ 25%] Building CXX object CMakeFiles/caller.dir....//dace/.dacecache/caller/src/cpu/caller.cpp.o
....//dace/.dacecache/caller/src/cpu/caller.cpp:21:20: warning: character constant too long for its type
21 | printf('hello world %f\n', i);
| ^~~~~~~~~~~~~~~~~~
....//dace/.dacecache/caller/src/cpu/caller.cpp: In function ‘void __program_caller_internal(float*)’:
....//dace/.dacecache/caller/src/cpu/caller.cpp:10:51: warning: ‘new’ of type ‘float’ with extended alignment 64 [-Waligned-new=]
10 | float *__tmp0 = new float DACE_ALIGN(64)[2];
| ^
....//dace/.dacecache/caller/src/cpu/caller.cpp:10:51: note: uses ‘void* operator new [](std::size_t)’, which does not have an alignment parameter
....//dace/.dacecache/caller/src/cpu/caller.cpp:10:51: note: use ‘-faligned-new’ to enable C++17 over-aligned new support
....//dace/.dacecache/caller/src/cpu/caller.cpp:21:20: error: invalid conversion from ‘int’ to ‘const char*’ [-fpermissive]
21 | printf('hello world %f\n', i);
| ^~~~~~~~~~~~~~~~~~
| |
| int
In file included from /usr/include/c++/9.2.0/cstdio:42,
from /usr/lib/python3.8/site-packages/dace/codegen/../runtime/include/dace/dace.h:5,
from ....//dace/.dacecache/caller/src/cpu/caller.cpp:2:
/usr/include/stdio.h:332:43: note: initializing argument 1 of ‘int printf(const char*, ...)’
332 | extern int printf (const char *__restrict __format, ...);
| ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
make[2]: *** [CMakeFiles/caller.dir/build.make:63: CMakeFiles/caller.dir....//dace/.dacecache/caller/src/cpu/caller.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:78: CMakeFiles/caller.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
The " quotes for a string became ' quotes for a char.
On Python 3.8.1 and python-astunparse 1.6.2
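To illustrate the mismatch: Python's own literal syntax prefers single quotes, while C needs double quotes for a string. The sketch below is not DaCe's code generator; `c_string_literal` is a hypothetical helper that borrows `json.dumps` to get a double-quoted literal, which is only safe for plain ASCII text with common escapes.

```python
import json


def c_string_literal(text):
    # json.dumps always uses double quotes and escapes newlines, tabs,
    # quotes and backslashes in a form that C also accepts. Non-ASCII
    # or control characters would become \uXXXX escapes, which this
    # sketch does not try to make valid C.
    return json.dumps(text)


fmt = "hello world %f\n"
print(repr(fmt))              # Python-style literal, single-quoted
print(c_string_literal(fmt))  # double-quoted literal usable in C
```

The design choice here is simply to avoid relying on how Python unparses string constants when the target language is C++.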
Describe the bug
DIODE does not interact well with SDFGs following the merge of the serialization_cleanup branch.
Among broken features:
To Reproduce
Steps to reproduce the behavior:
Right now, we are specifying the compiler with an executable name (e.g., "xocc"). Instead, we should pass the root folder of the installation (e.g., /opt/Xilinx/Vitis/2019.2). Furthermore, we should default to not setting this in the config, and instead let the CMake script find the local installation.
Is your feature request related to a problem? Please describe.
Currently SDFG.name is implemented as a Python property, with an advanced setter and validation. This requires it to be handled manually during serialization, which causes weird issues.
Describe the solution you'd like
SDFG.name is used as a property, so it should be one.
Describe the bug
Applying strict transformations and then compiling this SDFG produces an error.
But only compiling without the strict transformations produces the desired output.
To Reproduce
Steps to reproduce the error:
sdfg.apply_strict_transformations()
sdfg.compile(optimizer="")
Steps to reproduce the working equivalent program:
sdfg.compile(optimizer="")
Expected behavior
Get programs that produce the same output, with and without sdfg.apply_strict_transformations().
Desktop
Error log
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/build
Scanning dependencies of target coriolis_stencil
[ 25%] Building CXX object CMakeFiles/coriolis_stencil.dir/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp.o
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp: In function ‘void __program_coriolis_stencil_internal(double*, double*, double*, double*, double*, int, int, int, int)’:
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:17:230: error: no matching function for call to ‘dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(int, int, int)’
auto __v_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (v + ((w + (((8 * (K + 1)) * (u - 1)) * int_ceil(I, 8))) + ((8 * (k + v)) * int_ceil(I, 8))), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
^
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note: candidate: ‘template<class ... Dim> dace::ArrayViewIn<T, DIMS, VECTOR_LEN, NUM_ACCESSES, ALIGNED, OffsetT>::ArrayViewIn(const T*, const Dim& ...)’
explicit DACE_HDFI ArrayViewIn(T const* ptr, const Dim&... strides) :
^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note: template argument deduction/substitution failed:
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:17:104: note: cannot convert ‘(v + ((w + (((8 * (K + 1)) * (u - 1)) * int_ceil<int, int>(I, 8))) + ((8 * (k + v)) * int_ceil<int, int>(I, 8))))’ (type ‘int’) to type ‘const double*’
auto __v_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (v + ((w + (((8 * (K + 1)) * (u - 1)) * int_ceil(I, 8))) + ((8 * (k + v)) * int_ceil(I, 8))), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(const dace::ArrayViewIn<double, 2, 1, 0>&)’
class ArrayViewIn
^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate expects 1 argument, 3 provided
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(dace::ArrayViewIn<double, 2, 1, 0>&&)’
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate expects 1 argument, 3 provided
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:18:64: error: expected primary-expression before ‘)’ token
auto *v_in = __v_in.ptr<1>();
^
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:20:43: warning: unused variable ‘fc_in’ [-Wunused-variable]
auto *fc_in = __fc_in.ptr<1>();
^~~~~
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:45:230: error: no matching function for call to ‘dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(int, int, int)’
auto __u_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (u + ((((((8 * u) * (K + 1)) * int_ceil(I, 8)) + w) + ((8 * (k + v)) * int_ceil(I, 8))) - 1), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
^
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note: candidate: ‘template<class ... Dim> dace::ArrayViewIn<T, DIMS, VECTOR_LEN, NUM_ACCESSES, ALIGNED, OffsetT>::ArrayViewIn(const T*, const Dim& ...)’
explicit DACE_HDFI ArrayViewIn(T const* ptr, const Dim&... strides) :
^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note: template argument deduction/substitution failed:
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:45:104: note: cannot convert ‘(u + ((((((8 * u) * (K + 1)) * int_ceil<int, int>(I, 8)) + w) + ((8 * (k + v)) * int_ceil<int, int>(I, 8))) - 1))’ (type ‘int’) to type ‘const double*’
auto __u_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (u + ((((((8 * u) * (K + 1)) * int_ceil(I, 8)) + w) + ((8 * (k + v)) * int_ceil(I, 8))) - 1), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(const dace::ArrayViewIn<double, 2, 1, 0>&)’
class ArrayViewIn
^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate expects 1 argument, 3 provided
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(dace::ArrayViewIn<double, 2, 1, 0>&&)’
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate expects 1 argument, 3 provided
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:46:64: error: expected primary-expression before ‘)’ token
auto *u_in = __u_in.ptr<1>();
^
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:48:43: warning: unused variable ‘fc_in’ [-Wunused-variable]
auto *fc_in = __fc_in.ptr<1>();
^~~~~
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:16,
from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/math.h: At global scope:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/math.h:108:43: warning: ‘dace::math::pi’ defined but not used [-Wunused-variable]
static DACE_CONSTEXPR typeless_pi pi{};
^~
CMakeFiles/coriolis_stencil.dir/build.make:62: recipe for target 'CMakeFiles/coriolis_stencil.dir/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp.o' failed
make[2]: *** [CMakeFiles/coriolis_stencil.dir/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp.o] Error 1
CMakeFiles/Makefile2:77: recipe for target 'CMakeFiles/coriolis_stencil.dir/all' failed
make[1]: *** [CMakeFiles/coriolis_stencil.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Describe the bug
error: cannot convert ‘dace::ArrayViewIn<float, 0, 1, 1>’ to ‘float*’ in assignment
__tmp0 = dace::ArrayViewIn<float, 0, 1, 1> (a + 0);
To Reproduce
import numpy as np
import dace

@dace.program
def foo123(a : dace.float32[2], b : dace.float32[2]):
    b[0] = a[0]

A = np.array([1,2], dtype=np.float32)
B = np.array([3,4], dtype=np.float32)
foo123(A, B)
print(A)
print(B)
To test, run immaterial_test.py or immaterial_range_test.py without strict transformations.
Describe the bug
The semantics of passing arrays as function arguments in the NumPy interface do not match Python semantics.
To Reproduce
import numpy as np
import dace

# dace semantics
M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]

@dace.program
def transpose_test_fail(C : dace.float32[20, 20], D : dace.float32[20, 20]):
    sdfg_transpose(C[:], D[:])

@dace.program
def transpose_test_success(C : dace.float32[20, 20], D : dace.float32[20, 20]):
    sdfg_transpose(C[:], D)

c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((20, 20), dtype=np.float32)
e = np.zeros((20, 20), dtype=np.float32)
transpose_test_fail(c, d, K=20, M=20)
transpose_test_success(c, e, K=20, M=20)
print('dace 1', np.linalg.norm(c.transpose() - d))
print('dace 2', np.linalg.norm(c.transpose() - e))

# python semantics
c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((20, 20), dtype=np.float32)
e = np.zeros((20, 20), dtype=np.float32)

def transpose(a, b):
    b[:] = a[:].transpose()

transpose(c[:], d)
transpose(c[:], e[:])
print('python 1', np.linalg.norm(c.transpose() - d))
print('python 2', np.linalg.norm(c.transpose() - e))
Output
dace 1 11.521441
dace 2 0.0
python 1 0.0
python 2 0.0
Expected output
dace 1 0.0
dace 2 0.0
python 1 0.0
python 2 0.0
Additional context
A possible reason for this problem is that D[:] creates a copy, which is then passed into sdfg_transpose. This makes it difficult to assign to a subset of an array, because any subsetting operation (like D[3:7]) will create a copy.
Issue
import numpy as np
import dace

M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]

@dace.program
def transpose_test(C : dace.float32[20, 20], D : dace.float32[5, 5], E : dace.float32[10, 10]):
    sdfg_transpose(C[0:5,0:5], D)
    sdfg_transpose(C[0:10,0:10], E)

c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((5, 5), dtype=np.float32)
e = np.zeros((10, 10), dtype=np.float32)
transpose_test(c, d, e, K=???, M=???) # which K and M should I use here?
print(np.linalg.norm(c[0:5,0:5].transpose() - d))
print(np.linalg.norm(c[0:10,0:10].transpose() - e))
Proposed solution 1
Automatic derivation of symbolic values
import numpy as np
import dace

M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]

@dace.program
def transpose_test(C : dace.float32[20, 20], D : dace.float32[5, 5], E : dace.float32[10, 10]):
    sdfg_transpose(C[0:5,0:5], D)
    sdfg_transpose(C[0:10,0:10], E)

c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((5, 5), dtype=np.float32)
e = np.zeros((10, 10), dtype=np.float32)
transpose_test(c, d, e) # <<< THIS
print(np.linalg.norm(c[0:5,0:5].transpose() - d))
print(np.linalg.norm(c[0:10,0:10].transpose() - e))
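As a rough illustration of what automatic derivation could mean, the sketch below matches symbolic dimensions against the concrete shapes of the passed arrays. It does not use DaCe at all: `infer_symbols` and its plain-tuple representation of a signature are assumptions made for this example only.

```python
def infer_symbols(declared_shapes, actual_shapes):
    """Match symbolic dimensions against concrete shapes.

    declared_shapes: one tuple per argument; entries are symbol names (str)
                     or fixed sizes (int).
    actual_shapes:   the concrete shape tuple of each passed array.
    """
    values = {}
    for declared, actual in zip(declared_shapes, actual_shapes):
        if len(declared) != len(actual):
            raise ValueError(f"rank mismatch: {declared} vs {actual}")
        for dim, size in zip(declared, actual):
            if isinstance(dim, int):
                if dim != size:
                    raise ValueError(f"expected size {dim}, got {size}")
            elif values.setdefault(dim, size) != size:
                # The same symbol was already bound to a different size.
                raise ValueError(f"conflicting values for symbol {dim}")
    return values


# Stand-in for sdfg_transpose(A: float32[M, K], B: float32[K, M])
signature = [("M", "K"), ("K", "M")]
print(infer_symbols(signature, [(5, 5), (5, 5)]))      # C[0:5,0:5], D
print(infer_symbols(signature, [(10, 10), (10, 10)]))  # C[0:10,0:10], E
```

Each call site gets its own binding for M and K, which is what the two nested calls in transpose_test would need.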
Proposed solution 2
import numpy as np
import dace

M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]

@dace.program
def transpose_test(C : dace.float32[20, 20], D : dace.float32[5, 5], E : dace.float32[10, 10]):
    sdfg_transpose(C[0:5,0:5], D, K=5, M=5) # <<< THIS
    sdfg_transpose(C[0:10,0:10], E, K=10, M=10) # <<< THIS

c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((5, 5), dtype=np.float32)
e = np.zeros((10, 10), dtype=np.float32)
transpose_test(c, d, e)
print(np.linalg.norm(c[0:5,0:5].transpose() - d))
print(np.linalg.norm(c[0:10,0:10].transpose() - e))
Currently, streams are not properly handled in fpga_transform_state.
This causes the code generation phase to fail if one tries to convert DaCe programs that contain streams (e.g., samples/simple/filter.py) to FPGA.
In the case of transient streams, this could be fixed by changing the storage class (plus, I think, some changes in sdfg_nesting.py).
We are missing many tests (e.g., running code, transformations) in DIODE, which could be tested in two ways:
diode_client, sending HTTP requests to the server
In the Python frontend, a global variable with the same name as a map variable will override it.
Transformations broke since the merge of the serialization cleanup
Describe the bug
Segfault during illegal memory access from CPU to cudaMalloc allocated memory. Codegen creates code for WCR on CPU instead of GPU.
To Reproduce
Describe the bug
I am trying to use an input variable (not a symbolic value) to define the range of map iterations. It doesn't work (probably due to incorrect memlet propagation).
To Reproduce
import dace
import numpy as np

N = dace.symbol('N')

@dace.program
def plus_1(X_in: dace.float32[N], num: dace.float32[1], X_out: dace.float32[N]):
    @dace.map
    def p1(i : _[0:num[0]]):
        x_in << X_in[i]
        x_out >> X_out[i]
        x_out = x_in + 1

X = np.random.rand(10).astype(np.float32)
Y = np.zeros(10)
num = np.zeros(1)
num[0] = 7
plus_1(X_in=X, num=num, X_out=Y, N=10)
print(Y)
It gives an error: KeyError: 'Missing program argument "__p1_e0"'
Expected behavior
The first 7 elements of Y should be filled with the corresponding (non-zero, random) values from X, each incremented by 1.
(another problem: can't show all text on memlets simultaneously, this is why there are two screenshots)
Symbols are now per-SDFG
When a program segfaults and crashes when run through DIODE, DIODE also dies.
Instead, DIODE should run the program in a separate process, and realize that the process crashed, and report this to the user.
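A minimal sketch of that idea, using only the standard library: the program runs in a child interpreter, and the parent inspects the return code. This is not DIODE code; `run_isolated` is a hypothetical helper, and the signal-based crash detection assumes a POSIX system.

```python
import subprocess
import sys


def run_isolated(source, timeout=30):
    """Run Python source in a child interpreter and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, timeout=timeout,
    )
    # On POSIX, a negative return code means the child was killed by a
    # signal (e.g. -11 for SIGSEGV). Windows reports crashes differently.
    crashed = proc.returncode < 0
    return crashed, proc.returncode, proc.stdout


# Simulate a segfaulting program by having the child signal itself.
crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
ok = "print('done')"

for name, source in (("crashing", crash), ("healthy", ok)):
    crashed, code, out = run_isolated(source)
    print(name, "crashed" if crashed else "exited normally", code)
```

The parent survives both runs, so it can report the crash to the user instead of dying with the program.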
When multiple tests are running in Jenkins concurrently, the MPI test can fail sporadically. This shows up as false negatives for commits/pull requests that don't actually contain any new bugs.
/opt/mpich3.2.11/bin/mpirun
Running python3
Traceback (most recent call last):
File "/var/lib/jenkins/workspace/dace_intel_fpga/tests/../tests/immaterial_test.py", line 4, in <module>
import dace
File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/__init__.py", line 4, in <module>
from .frontend.python.decorators import *
File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/frontend/python/decorators.py", line 7, in <module>
from dace.frontend.python import parser
File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/frontend/python/parser.py", line 8, in <module>
from dace.config import Config
File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 266, in <module>
Config.initialize()
File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 89, in initialize
Config.load()
File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 111, in load
Config._config_metadata['required'])
File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 24, in _add_defaults
if k not in config:
TypeError: argument of type 'NoneType' is not iterable
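The traceback shows the membership test running against None instead of a dictionary. One possible explanation, which is an assumption and not confirmed by the log, is that a concurrent job left the configuration file empty or half-written, so it parsed to None. The sketch below only addresses the symptom; `add_defaults` is a hypothetical stand-in and not the signature of DaCe's `_add_defaults`.

```python
def add_defaults(config, defaults):
    """Recursively copy missing keys from defaults into config."""
    if config is None:
        # An empty or half-written file can parse to None rather than a dict.
        config = {}
    for key, value in defaults.items():
        if isinstance(value, dict):
            config[key] = add_defaults(config.get(key), value)
        elif key not in config:
            config[key] = value
    return config


defaults = {"compiler": {"build_type": "Release"}, "debugprint": False}
print(add_defaults(None, defaults))
print(add_defaults({"debugprint": True}, defaults))
```

The keys in `defaults` are made up for the example. A guard like this would hide the race rather than remove it, so giving each concurrent test its own configuration file may still be the better fix.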
Request for a demo or usage example of running double buffering in Python.
| source code | | xforms | history
files +-----------------+ SDFG +------------------
| generated code | | properties
Memlets that become scalars do not generate proper nested SDFGs. This usually happens with maps of size 1.
subset and other_subset, but src_subset and dst_subset
or subset/reindex
num_accesses going into the node (number of inputs) and the sum of num_accesses going out of the node (outputs)
After that:
To Reproduce
Try to compile:
@dace.program
def linear(x: dace.float32[N, N, N], w: dace.float32[N, N]):
    out = np.ndarray(x.shape, x.dtype)
    for i in dace.map[0:N]:
        out[i] = x[i] @ w
    return out

Compiling this, however, works:

@dace.program
def linear(x: dace.float32[N, N, N], w: dace.float32[N, N]):
    out = np.ndarray(x.shape, x.dtype)
    for i in dace.map[0:N]:
        out[i] = x[i] @ w[:]
    return out
Use a testing framework like pytest
Describe the bug
Compiling this SDFG produces an error.
But first applying strict transformations and then compiling produces the desired output.
To Reproduce
Steps to reproduce the error:
sdfg.compile(optimizer="")
Steps to reproduce the workaround:
sdfg.apply_strict_transformations()
sdfg.compile(optimizer="")
Expected behavior
Get programs with the same output with and without sdfg.apply_strict_transformations().
Desktop
Describe the bug
If the first run of a DaCe program fails in a Jupyter notebook, the second run will complain that the shared library is already loaded.
To Reproduce
Steps to reproduce the behavior (see screenshot):
K and M
Expected behavior
Cell 4 should be executed without any problems.
Describe the bug
SDFG.from_file calls SDFG.validate, which fails if there are unexpanded library nodes.
To Reproduce
Load an SDFG from file that has unexpanded library nodes.
Expected behavior
This failure should only happen when we're doing/about to do code generation. We need to move this validation somewhere else, or distinguish between the two cases when calling SDFG.validate.
Running this code:
import dace
import numpy as np

n = dace.symbol("n")

@dace.program
def dot(x: dace.float32[n], y: dace.float32[n], result: dace.float32[1]):
    @dace.map(_[0:n])
    def product(i):
        x_in << x[i]
        y_in << y[i]
        result_out >> result(1, lambda a, b: a + b)
        result_out = x_in * y_in

# ----------
# MAIN
# ----------
if __name__ == "__main__":
    a = np.array([1,2,3,4,5,6], dtype=np.float32)
    b = np.array([1,2,3,4,5,6], dtype=np.float32)
    c = np.array([0], dtype=np.float32)
    dot_sdfg = dot.to_sdfg()
    dot_sdfg(x=a, y=b, result=c, n=a.shape[0])
    print("Vec a: ", a)
    print("Vec b: ", b)
    print(c)
After applying "FPGATransformSDFG", the tasklet's in-connector and the source memlet of the inner state have a name clash, i.e., they produce a shadowing issue. See also the attached image of the SDFG generated by the code after applying the FPGA transformation.
Last lines of error output:
File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 464, in _emit_copy
" " + self.memlet_definition(sdfg, memlet, False, vconn),
File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 975, in memlet_definition
allow_shadowing=allow_shadowing)
File "/home/burgerm/dace/dace/codegen/targets/target.py", line 226, in add
raise dace.codegen.codegen.CodegenError(err_str)
dace.codegen.codegen.CodegenError: Shadowing variable x_in from type DefinedType.Pointer to DefinedType.Scalar
DIODE should work with more than one file, and be able to be part of a Python workflow with saving files.
Describe the bug
If two different SDFGs with the same name are loaded inside the same .py or .ipynb script:
sdfg1 = dace.SDFG('unique_name')
x = sdfg1.compile(optimizer=False)
sdfg2 = dace.SDFG('unique_name')
y = sdfg2.compile(optimizer=False)
An error appears:
... dace/codegen/compiler.py:104: UserWarning: Library ... already loaded, renaming file self._library_filename)
Segmentation fault (core dumped)
Reproduce
Sometimes more than two SDFGs with the same name are needed to reproduce the error.
To Reproduce
@dace.program
def subrange_of_subrange(A: dace.float32[2, 3, 4, 5], B: dace.float32[4]):
    i = 0
    j = 0
    k = 0
    B[:] = A[:, i, :, j][k, :]

i, j, and k do not appear in the generated code
Integration of a minimal set of SMI functionalities (p2p communications for the time being).
Introduce the concept of remote streams.
Whether SMI should be used is determined at code generation by checking whether remote streams are used.
The use of SMI is detected in the code-generation phase. In this case, proper Make targets are created to support compilation/emulation of SMI-based programs.
This requires defining a topology file (that contains the mapping program <-> rank) for the sake of emulation. In this first implementation, this is not so meaningful, but it will be required for full SMI integration.
Defined a target_name field, which can be set when returning a code generation object and we want to have the field initialized.
For the sake of enabling an easy emulation toolchain, the host generated code will assume the presence of the following attributes:
smi_rank: current rank (int)
smi_num_ranks: total number of ranks (int)
smi_device: device used (int, useful for running on Noctua)
These must be defined by specializing the SDFG.
TODO: this must be cleaned up
Describe the bug
When floor division (//) instead of division (/) is used inside the index, compilation fails.
To Reproduce
Try to run this program:
@dace.program(dace.float64[N], dace.float64[N])
def floor_div(Input, Output):
    @dace.map(_[0:N])
    def div(i):
        inp << Input[i//N]
        out >> Output[i]
        out = inp
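One plausible cause, not confirmed by this report, is that the `//` operator has no direct counterpart in the generated C++ (where `//` starts a comment). A possible approach is to rewrite floor division into an explicit function call before emitting code. The sketch below uses only the standard `ast` module (Python 3.9+ for `ast.unparse`); the target name `int_floor` is an assumption for illustration.

```python
import ast


class FloorDivToCall(ast.NodeTransformer):
    """Replace every `a // b` with a call `int_floor(a, b)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested operands first
        if isinstance(node.op, ast.FloorDiv):
            call = ast.Call(
                func=ast.Name(id="int_floor", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def rewrite_floor_div(expression):
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(FloorDivToCall().visit(tree))
    return ast.unparse(tree)


print(rewrite_floor_div("i // N"))
print(rewrite_floor_div("(i + 1) // N + j"))
```

Ordinary division and other operators pass through unchanged, so the rewrite only touches the case that fails here.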
When compiling and running multiple SDFGs with the same name from the same Python executable, the SDFG is usually deleted in between (I assume by garbage collection?), which unloads the dynamic libraries.
However, when CUDA is involved, it seems that the SDFGs are not deleted even when they are no longer referenced by Python, which causes the loaded library to stick around. In practice, this can result in DuplicateDLLError when trying to load a new SDFG using the same name.
SDFGs should be cleaned up when they are no longer referenced, and it should be possible to run multiple SDFGs with the same name from the same Python executable without explicitly deleting them.
Run test tests/library/blas_dot.py on the library_nodes branch, but change the initialization of dace.SDFG to always have the same name, then run the test including the cuBLAS runs.
Name each SDFG differently, or explicitly call del my_sdfg between executions.
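For the first workaround, a small naming helper is enough. This is only a sketch: `unique_sdfg_name` is hypothetical and not part of DaCe, and the resulting string would be passed wherever the SDFG name is chosen.

```python
import itertools

# Process-wide counter; every call hands out the next number.
_counter = itertools.count()


def unique_sdfg_name(base):
    """Append a running number so repeated builds never share a name."""
    return f"{base}_{next(_counter)}"


first = unique_sdfg_name("dot")
second = unique_sdfg_name("dot")
print(first, second)
```

Each distinct name also gets its own build output, so this trades the name clash for extra compilation.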
Describe the bug
Adding operators to element-wise statements, or augmented assignment, fails to run through the frontend.
To Reproduce
Run the program below:
@dace.program
def transpose(A: dace.float32[M, K], B: dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j] + 1
Describe the bug
When I disable strict transformations, generated code doesn't compile.
error: invalid types ‘dace::vec<float, 1> {aka float}[int]’ for array subscript
X_out[0] = __tmpout;
To Reproduce
Steps to reproduce the behavior:
Set automatic_strict_transformations: false in ~/.dace.conf

import numpy as np
import dace

N = dace.symbol('N')

@dace.program
def dace_sum(X_in: dace.float32[N], X_out: dace.float32[1]):
    dace.reduce(lambda a, b: a + b, X_in, X_out, identity=0)

@dace.program
def dace_max(X_in: dace.float32[N], X_out: dace.float32[1]):
    dace.reduce(lambda a, b: max(a, b), X_in, X_out)

@dace.program
def dace_softmax(X_in : dace.float32[N], X_out : dace.float32[N]):
    tmp_max = dace.define_local([1], dtype=dace.float32)
    tmp_sum = dace.define_local([1], dtype=dace.float32)
    dace_max(X_in, tmp_max)

    @dace.map
    def softmax_tasklet_sub(i : _[0:N]):
        x_in << X_in[i]
        x_max << tmp_max
        x_out >> X_out[i]
        x_out = exp(x_in - x_max)

    dace_sum(X_out, tmp_sum)

    @dace.map
    def softmax_tasklet_div(i : _[0:N]):
        x_in << X_out[i]
        x_sum << tmp_sum
        x_out >> X_out[i]
        x_out = x_in / x_sum

X = np.array([1,2,3,4,5], dtype=np.float32)
Y = np.zeros(X.shape, dtype=np.float32)
dace_softmax(X_in=X, X_out=Y, N=X.shape[0])
Expected behavior
Everything should work as with automatic_strict_transformations: true in ~/.dace.conf

import numpy as np
import dace

@dace.program
def foo123(a : dace.float32[2,3], b : dace.float32[2,3]):
    b[0,:] = a[0,:]

A = np.full((2,3), 3, dtype=np.float32)
B = np.full((2,3), 4, dtype=np.float32)
foo123(A, B)
print(A)
print(B)
Error:
InvalidSDFGEdgeError: Dimensionality mismatch between src/dst subsets (at state assign_6_4, edge b[0, 0:3] -> [0:2] (__tmp0:None -> b:None))
Expected output:
If you replace b[0,:] = a[0,:] by b[0] = a[0], everything works as expected.
Currently DIODE relies on automatic library node expansion to work. The workflow can be improved by having the buttons to expand individual library nodes for further transformation right in the UI. This should be part of the transformation chain as well, so that it can be undone and saved as part of the DIODE workspace.