sx-aurora / veda
VEDA (VE Driver API)
License: Other
It seems that the VE10 has up to 10 virtual cores. "cores_enable" is a hex value in which each bit represents one active core; we need to read that file and build a per-VE map of the active cores.
"numa0_cores" and "numa1_cores" show which cores are dedicated to which NUMA partition.
It seems that CMake performs the git clone with --depth 1,
so now that aveo has one more commit than before, git is unable to check out d2b04de. I think we need to set GIT_SHALLOW to FALSE.
[egonzalez@XAIJPVE1 build]$ make
[ 3%] Performing download step (git clone) for 'aveo'
Cloning into 'src'...
error: pathspec 'd2b04de' did not match any file(s) known to git.
CMake Error at tmp/aveo-gitclone.cmake:40 (message):
Failed to checkout tag: 'd2b04de'
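Assuming the aveo download is driven by ExternalProject_Add in VEDA's CMake (the surrounding arguments here are illustrative), the proposed fix would look like:

```cmake
ExternalProject_Add(aveo
    GIT_REPOSITORY https://github.com/sx-aurora/aveo
    GIT_TAG        d2b04de
    # Fetch the full history instead of a --depth 1 clone, so the
    # pinned commit can always be checked out even after new commits:
    GIT_SHALLOW    FALSE
    # ... remaining download/build steps ...
)
```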
I'm using one device and one stream in OMP mode.
My VEDA code roughly corresponds to the lines below:
//once: vedaCtxCreate(&ctx, VEDA_CONTEXT_MODE_OMP, 0)
vedaDevicePrimaryCtxRetain(&ctx, device);
vedaCtxPushCurrent(ctx);
vedaMemAllocAsync
vedaMemcpyHtoDAsync
...
call kernel
vedaCtxPopCurrent(&ctx);
I keep the VEDA calls asynchronous and synchronize them only when I need the results inside the CPU operations.
To my surprise, my async VEDA operations took more time than expected: vedaMemcpyHtoDAsync was synchronizing internally.
I'm getting a linking error in my test project, trying to use VERA:
[ 85%] Linking CXX executable veda_test
libfill_lib.so: undefined reference to `veraInit()'
I understand that I should link my host library against libvera.so
if I'm going to use VERA functionality. For the VEDA functionality, CMake defines a VEDA_LIBRARY variable:
grep -r "FIND_LIBRARY"
veda-0.9.5.1/cmake/FindVE.cmake: FIND_LIBRARY(VEDA_LIBRARY "libveda.so" "libveda.a" PATHS "${VEDA_DIR}/lib64")
but I see no equivalent for the VERA_LIBRARY variable.
Am I doing something wrong?
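A sketch of what the missing entry could look like, mirroring the VEDA line above (VERA_LIBRARY is my suggested variable name; whether a static libvera.a exists is an assumption):

```cmake
FIND_LIBRARY(VERA_LIBRARY "libvera.so" "libvera.a" PATHS "${VEDA_DIR}/lib64")
```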
Some header files seem to be missing when compiling code that uses vera.h.
My CMake compilation output is:
/usr/local/ve/veda-0.9.5/include/vera.h:4:24: fatal error: vera_enums.h: No such file or directory
#include "vera_enums.h"
As a side note, when searching the Aurora system (and the GitHub repository), I also can't find the other file included by vera.h:
vera_types.h
Currently, users who build a .vso file with CMake won't notice that libveda.vso (the VE-side runtime) is linked into it. Users with other build setups need to explicitly add libveda.vso to the link line, which is fine.
However, when using VEDA for something like a fast JIT (creating, loading, and unloading .vso objects all the time), linking libveda.vso into each VE-side shared object increases latency. It would be sufficient to load the libveda.vso runtime just once, when the proc is created (or when the first context for a device is opened in VEDA). This issue was encountered while porting JuliaLang to VE.
A thread_local std::list can cause segfaults.
I hit this problem when I had a singleton wrapping my vedaInit and vedaExit;
after a little investigation, I found that it is due to the combination of thread_local and std::list.
I believe the std::list could either be replaced with a std::vector, or the thread_local could simply be removed.
#0 0x00007ffff5e21143 in std::__cxx11::_List_base<veda::Context*, std::allocator<veda::Context*> >::_M_clear (this=0x7ffff7fcb980) at /usr/include/c++/8/bits/list.tcc:74
#1 0x00007ffff5e21030 in std::__cxx11::list<veda::Context*, std::allocator<veda::Context*> >::clear (this=0x7ffff7fcb980) at /usr/include/c++/8/bits/stl_list.h:1508
#2 0x00007ffff5e2093f in veda::Contexts::shutdown () at /home/qwr/veda/src/veda/Contexts.cpp:54
#3 0x00007ffff5e09041 in vedaExit () at /home/qwr/veda/src/veda.cpp:36
#4 0x0000000001e500e6 in VEDA_HANDLE::~VEDA_HANDLE() ()
#5 0x00007ffff4ae4b0c in __run_exit_handlers () from /lib64/libc.so.6
#6 0x00007ffff4ae4c40 in exit () from /lib64/libc.so.6
#7 0x00007ffff4ace49a in __libc_start_main () from /lib64/libc.so.6
Here is a simple thread_local + std::list example that demonstrates the problem:
https://godbolt.org/z/7jPWWexGP
We have a Java project (dl4j) that calls a C++ library which uses VEDA.
I am facing strange behaviour with VEDA; I don't know whether I'm using it incorrectly or whether it's something else.
Each time on exit I get this annoying lock/wait, with the messages below:
[VH] [TID 3480243] ERROR: wait_req_ack() timeout waiting for ACK req=1604
[VH] [TID 3480243] ERROR: close() child sent no ACK to EXIT. Killing it.
In addition, when I run my inference more than once, I get an error for a method call that works just fine in a single inference session:
[VE] ERROR: sigactionHandler() Interrupt signal 11 received
0x600fffe00000
0x60001004cf40 -> (null)
0x600c01583b60 -> __vthr$_pcall_va
[VH] [TID 3611133] ERROR: unpack_call_result() VE exception 11
�@$+U?�^\s�?(*f@xϾ?�v�A]
[VH] [TID 3611133] ERROR: _progress_nolock() Internal error on executing a command(-4)
[VEDA_ERROR_VEO_COMMAND_EXCEPTION] /home/qwr/veda/src/veda/Context.cpp (435)
I'm using VEDA this way: for now I have integrated your vednn library through VEDA, so it supports a few ops. The call chain is:
java -> javacpp -> nd4j C++ + VEDA -> device vednn lib
// VEDA handle class:
struct VedaHandle {
    function;
    module;
    ctx;
};

// Singleton:
struct Veda {
    std::list<VedaHandle> handles; // for now I'm using one device

    init() {
        // init and load the libraries per device,
        // then also pop the context:
        vedaCtxPopCurrent(&ctx);
        // this way I work around the vedaExit problem I had
        // because of the thread_local std::list
    }
    exit() { vedaExit(); }
};
Here is how I'm calling it (for now I'm using device 0, with the context created in OMP mode):
vedaDevicePrimaryCtxRetain(&ctx, device);
vedaCtxPushCurrent(ctx);
veda Mem calls..
veda Launch
sync
vedaCtxPopCurrent(&ctx);
Instead of using reinterpret_cast-style type punning and enable_if,
overload vedaArgsSet for each type with its proper C setter (int, ..., float, double):
inline VEDAresult vedaArgsSet(VEDAargs args, const int idx, const int32_t value) {
return vedaArgsSetI32(args, idx, value);
}
inline VEDAresult vedaArgsSet(VEDAargs args, const int idx, const int64_t value) {
return vedaArgsSetI64(args, idx, value);
}
inline VEDAresult vedaArgsSet(VEDAargs args, const int idx, const float value) {
return vedaArgsSetF32(args, idx, value);
}
inline VEDAresult vedaArgsSet(VEDAargs args, const int idx, const double value) {
return vedaArgsSetF64(args, idx, value);
}
.....
....
template<typename T, typename... Args>
inline VEDAresult __vedaLaunchKernel(VEDAfunction func, VEDAstream stream, uint64_t* result, VEDAargs args, const int idx, const T value, Args... vargs) {
static_assert(!std::is_same<T, bool>::value, "Don't use bool as data-type when calling a VE function, as it is defined as 1B on VH and 4B on VE!");
vedaArgsSet(args, idx, value);
return __vedaLaunchKernel(func, stream, result, args, idx+1, vargs...);
}
I have been using delayed_malloc as inspiration for a simple test of my own. Instead of copying a char array, I copy two vectors/arrays, add them in a kernel, and write the result into a third array.
When delay-allocating this third array on the VE, and keeping the memcpy, memFree, and ctx synchronization in the same order as in the example, everything seems to work fine. So the order is:
VEDA(vedaMemcpyDtoHAsync(cpu_return, kern_sum, N*sizeof(int), stream));
VEDA(vedaMemFreeAsync(x_device, stream));
VEDA(vedaMemFreeAsync(y_device, stream));
VEDA(vedaMemFreeAsync(kern_sum, stream));
vedaCtxSynchronize();
Now, if I replace the device allocation with a pre-allocation, the returned array has garbage in its first few elements. The only way to get correct data back is to put the vedaCtxSynchronize() call before the vedaMemFreeAsync() calls.
Did I do something wrong, or is this a bug? Intuitively I would have put a sync() before a free() to begin with, but perhaps that misses the point of the vedaMemFreeAsync() functionality?
See attachment for a minimal test case.
Hi,
when using your injection, the MPI_&lt;lang&gt;_COMPILER variables are not set to a valid MPI compiler.
For instance
CMAKE_MINIMUM_REQUIRED(VERSION 3.9)
PROJECT(TEST C CXX)
FIND_PACKAGE(MPI REQUIRED)
message(${MPI_C_COMPILER})
returns /opt/nec/ve/ncc/3.0.8/bin/ncc/mpincc
as the path for the MPI C compiler, which is not a valid compiler path.
Best regards,
Severin
Is there any way to somehow embed the device library in the host binary and load it from within?
Thanks
Hi,
I noticed that with multiple MPI versions installed, VEDA doesn't always select the newest version when using the injection.
With versions 1.3.0, 2.0.0, 2.2.0, 2.3.0, 2.5.0, 2.7.0 and 2.10.0 installed, 2.7.0 ends up being selected (presumably because the version strings are compared lexicographically, so "2.7.0" sorts after "2.10.0"). It would be nice if VEDA would either select the newest MPI version or, even better, use the MPI version that is in PATH. (At RWTH Aachen we use a module system on the Aurora to switch between multiple compiler/MPI versions; it would be nice if VEDA took that into account.)
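The 2.7.0 pick is consistent with a plain lexicographic comparison of version strings, which a version-aware sort avoids (a sketch of the two orderings, not VEDA's actual selection code):

```shell
# Plain lexicographic sort picks 2.7.0 (the reported behavior):
printf '%s\n' 1.3.0 2.0.0 2.2.0 2.3.0 2.5.0 2.7.0 2.10.0 | sort | tail -1
# -> 2.7.0

# Version-aware sort picks the true newest:
printf '%s\n' 1.3.0 2.0.0 2.2.0 2.3.0 2.5.0 2.7.0 2.10.0 | sort -V | tail -1
# -> 2.10.0
```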
Best regards,
Severin