Comments (7)
If I remember correctly, there is a bit of scheduling overhead involved with dirty NIFs?
What about a selective approach for the NIFs that are problematic? (Or are the majority of the allocation calls causing problems?)
Possibly using enif_system_info to read dirty_scheduler_support, plus a compile-time macro to check whether we can actually call enif_schedule_nif?
Yeah, there is some overhead, but it's measured in nanoseconds on modern hardware best I can tell. The "yielding NIF" approach is probably not going to be very tractable as most of the NIF functions map pretty much 1:1 to the OpenCL functions with the exception of unpacking terms.
I'll build some test cases where I can isolate and time the individual functions on thousands of runs on all three boxes and report back. There was a great presentation a couple years ago at ElixirConf US where they were performing timing studies on different NIF-handling approaches, maybe I can find the test scaffolding for that somewhere.
It's possible that it's only a subset of problem children, and also that it might be an OpenCL driver-vendor issue. After all, my MacPro is Mid-2010 vintage and I can't imagine that the bus between RAM and the GPU on it is that much faster than an i7-7800X with server-speed RAM and Nvidia Pascal GPUs...
I could wrap the nif table entries with something like:
//-------------------------------
#if (ERL_NIF_MAJOR_VERSION > 2) || ((ERL_NIF_MAJOR_VERSION == 2) && (ERL_NIF_MINOR_VERSION >= 12))
//#define NIF_FUNC(name,arity,fptr) {(name),(arity),(fptr),(ERL_NIF_DIRTY_JOB_CPU_BOUND)}
#define NIF_FUNC(name,arity,fptr) {(name),(arity),(fptr),(0)}
#elif (ERL_NIF_MAJOR_VERSION > 2) || ((ERL_NIF_MAJOR_VERSION == 2) && (ERL_NIF_MINOR_VERSION >= 7))
#define NIF_FUNC(name,arity,fptr) {(name),(arity),(fptr),(0)}
#else
#define NIF_FUNC(name,arity,fptr) {(name),(arity),(fptr)}
#endif
//-------------------------------
This way it would be fairly easy to switch to an all-dirty-NIF approach if it turns out
that the overhead is OK. Or at least it would let anyone who wants dirty NIFs opt in
by compiling with the -DUSE_DIRTY_SCHEDULER flag?
Perhaps even have a NIF_DIRTY_FUNC entry that is backward compatible?
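To make the NIF_DIRTY_FUNC idea concrete, here is a rough sketch of how the two table macros might sit side by side. The version thresholds (2.7 and 2.12) are copied from the snippet above. NIF_VSN_AT_LEAST and NIF_DIRTY_FLAG are hypothetical helper names, and the stand-in ErlNifFunc, the stand-in version macros, and the placeholder functions exist only so the sketch compiles without erl_nif.h. In the real module you would include erl_nif.h and use real NIF functions.

```c
#include <stddef.h>

#ifndef ERL_NIF_MAJOR_VERSION
/* Stand-ins (assumptions) so this compiles without <erl_nif.h>. */
#define ERL_NIF_MAJOR_VERSION 2
#define ERL_NIF_MINOR_VERSION 12
#define ERL_NIF_DIRTY_JOB_CPU_BOUND 1
typedef struct {
    const char *name;
    unsigned arity;
    int (*fptr)(void);   /* the real field is a NIF function pointer */
    unsigned flags;
} ErlNifFunc;
#endif

/* Hypothetical helper: true when the NIF API version is at least maj.min. */
#define NIF_VSN_AT_LEAST(maj, min) \
    ((ERL_NIF_MAJOR_VERSION > (maj)) || \
     ((ERL_NIF_MAJOR_VERSION == (maj)) && (ERL_NIF_MINOR_VERSION >= (min))))

/* Dirty flag is used only when the API is new enough AND the user opted in. */
#if NIF_VSN_AT_LEAST(2, 12) && defined(USE_DIRTY_SCHEDULER)
#define NIF_DIRTY_FLAG ERL_NIF_DIRTY_JOB_CPU_BOUND
#else
#define NIF_DIRTY_FLAG 0
#endif

#if NIF_VSN_AT_LEAST(2, 7)
/* Four-field table entry (has a flags member). */
#define NIF_FUNC(name, arity, fptr)       {(name), (arity), (fptr), 0}
#define NIF_DIRTY_FUNC(name, arity, fptr) {(name), (arity), (fptr), NIF_DIRTY_FLAG}
#else
/* Three-field table entry: NIF_DIRTY_FUNC degrades to a normal entry. */
#define NIF_FUNC(name, arity, fptr)       {(name), (arity), (fptr)}
#define NIF_DIRTY_FUNC(name, arity, fptr) NIF_FUNC(name, arity, fptr)
#endif

/* Placeholder functions for the sketch; real NIFs take
 * (ErlNifEnv *, int, const ERL_NIF_TERM[]). */
static int ecl_noop(void) { return 0; }
static int ecl_create_image(void) { return 0; }

static ErlNifFunc nif_funcs[] = {
    NIF_FUNC("noop", 0, ecl_noop),
    NIF_DIRTY_FUNC("create_image", 5, ecl_create_image),
};
```

With this layout only the entries marked NIF_DIRTY_FUNC change behaviour when -DUSE_DIRTY_SCHEDULER is passed, so the problematic calls can be picked one at a time.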
And just to clarify: my idea with using enif_schedule_nif was not meant to break up the NIF into several pieces, but rather to be a way to dynamically decide when to run a NIF on a dirty scheduler. The idea is to have one entry point in the NIF (for cl:create_image/5, say ecl_create_image_dyn) where you can check the parameters to decide whether to call create_image/5 indirectly, using enif_schedule_nif with the ERL_NIF_DIRTY_JOB_CPU_BOUND flag, or just call ecl_create_image directly.
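A minimal sketch of that dynamic entry point, under several assumptions: the size threshold, the helper ecl_should_run_dirty, and the guard macro ECL_SKETCH_WITH_ERL_NIF are all hypothetical, and treating the last argument as the image data binary is a guess about the argument layout. The NIF glue also assumes an erl_nif.h new enough (2.12 or later) that ERL_NIF_DIRTY_JOB_CPU_BOUND is always declared. The decision itself is kept in a pure function so it can be exercised outside the VM.

```c
#include <stddef.h>

/* Hypothetical threshold: payloads at or above this size go to a dirty scheduler. */
#define ECL_DIRTY_THRESHOLD_BYTES ((size_t)1 << 20)

/* Pure decision helper (hypothetical name): returns 1 to run dirty, else 0. */
static int ecl_should_run_dirty(int dirty_supported, size_t bytes)
{
    return dirty_supported && bytes >= ECL_DIRTY_THRESHOLD_BYTES;
}

#ifdef ECL_SKETCH_WITH_ERL_NIF  /* define this when building against erl_nif.h */
#include <erl_nif.h>

/* Assumed to be defined elsewhere in the NIF: the ordinary implementation. */
ERL_NIF_TERM ecl_create_image(ErlNifEnv *env, int argc,
                              const ERL_NIF_TERM argv[]);

/* Table entry point: runs the real NIF inline or reschedules it as dirty. */
static ERL_NIF_TERM ecl_create_image_dyn(ErlNifEnv *env, int argc,
                                         const ERL_NIF_TERM argv[])
{
    ErlNifSysInfo info;
    ErlNifBinary data;
    size_t bytes = 0;

    /* Ask the runtime whether dirty schedulers are available. */
    enif_system_info(&info, sizeof(info));

    /* Assumption: the last argument carries the image data as a binary. */
    if (argc > 0 && enif_inspect_binary(env, argv[argc - 1], &data))
        bytes = data.size;

    if (ecl_should_run_dirty(info.dirty_scheduler_support, bytes))
        return enif_schedule_nif(env, "create_image",
                                 ERL_NIF_DIRTY_JOB_CPU_BOUND,
                                 ecl_create_image, argc, argv);

    /* Small payload or no dirty scheduler support: call directly. */
    return ecl_create_image(env, argc, argv);
}
#endif
```

Whether payload size is the right signal is an open question; the timing runs might suggest a different criterion per function.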
I prepared the NIF table so you can switch between dirty and non-dirty. I also added the examples cl:noop_/0, which is dynamically dirty, and cl:dirty_noop/0, which is always dirty (if supported). You can find a small, simple benchmark in test/cl_noop that checks the call overhead.
Nice, thanks! Once I finish up the 1.2 wrappers, docs and unit tests I'll take a swing at this.
OK, just forked your latest to play around with dirty scheduler support and timings. So far I've made one tiny change to c_src/Makefile to allow USE_DIRTY_SCHEDULER to be set from the environment when being included as a dependency:
ifeq ($(USE_DIRTY_SCHEDULER), 1)
$(info Compiling with support for dirty schedulers)
CFLAGS += -DUSE_DIRTY_SCHEDULER
endif
I'll keep you posted on what I find out. It might wind up being a compile directive that will be enabled only for certain projects that know they are going to spend a lot of time in OpenCL calls...