Comments (22)
The example works on my machine with both Intel and NVIDIA SDKs:
./sort
input: [ 83, 86, 77, 15, 93, 35, 86, 92, 49, 21, 62, 27, 90, 59, 63, 26, 40, 26, 72, 36, 11, 68, 67, 29, 82, 30, 62, 23, 67, 35, 29, 2, 22 ]
output: [ 2, 11, 15, 21, 22, 23, 26, 26, 27, 29, 29, 30, 35, 35, 36, 40, 49, 59, 62, 62, 63, 67, 67, 68, 72, 77, 82, 83, 86, 86, 90, 92, 93 ]
compute::sort also works for me with much larger inputs (a couple of million elements). It seems that your OpenCL implementation has problems with kernel compilation. Can you successfully run other examples from Boost.Compute, or even something as simple as https://gist.github.com/ddemidov/2925717?
from compute.
Yes, I can run other examples. For instance, changing the input to 32 items does not fail, so the problem is somewhere in radix_sort.
I can't try your example because 1) it requires OpenCL 1.2 (correct me if I'm wrong) and I don't have the NVIDIA OpenCL 1.2 headers, i.e. CL/cl.hpp, and 2) it does not compile on OS X.
from compute.
My mistake, I needed cl.hpp.
I can run it, but neither of my setups has double-precision-capable hardware:
$ ./hello
GPUs with double precision not found.
from compute.
It should be enough to delete lines 10-16 and replace double with float throughout the example. Sorry for the inconvenience.
from compute.
Lines 55-60 should also be removed.
from compute.
as expected, does not fail:
./hello
GeForce 9600 GT
3
from compute.
The 9600 GT is compute capability 1.0, which does not support atomic operations, and the kernels generated for the example do use atomics.
from compute.
And the error message for the HD 4000 says "device not available". Are you able to run the example there?
from compute.
Yes,
./hello.osx
HD Graphics 4000
3
from compute.
I am able to reproduce this as well... The problem is with radix_sort. The CL_DEVICE_NOT_AVAILABLE error usually occurs when the program source for the kernel fails to compile.
from compute.
Yes, like Denis said, radix_sort() uses atomics, which are not supported by your hardware. The reason you see a difference when running with different numbers of values is that internally sort() will use an insertion sort algorithm (which doesn't require atomics) for tiny inputs (32 or fewer values) and the radix sort algorithm for anything larger.
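The size-based dispatch described above can be sketched on the host as follows. This is only an illustration, not Boost.Compute's actual implementation: the names hybrid_sort, kSmallInputThreshold and Algorithm are hypothetical, and the radix sort here is a plain CPU version standing in for the GPU kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cutoff, mirroring the "32 or fewer values" mentioned above.
constexpr std::size_t kSmallInputThreshold = 32;

enum class Algorithm { Insertion, Radix };

// In-place insertion sort, used for tiny inputs.
void insertion_sort(std::vector<std::uint32_t>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        std::uint32_t key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}

// Least-significant-digit radix sort, 4 bits per pass (CPU stand-in).
void radix_sort(std::vector<std::uint32_t>& v) {
    constexpr unsigned kBits = 4;
    constexpr std::size_t kBuckets = 1u << kBits;
    std::vector<std::uint32_t> tmp(v.size());
    for (unsigned shift = 0; shift < 32; shift += kBits) {
        std::size_t count[kBuckets + 1] = {0};
        // Histogram of the current digit, stored one slot to the right.
        for (std::uint32_t x : v) ++count[((x >> shift) & (kBuckets - 1)) + 1];
        // Prefix sum: count[d] becomes the first output index for digit d.
        for (std::size_t b = 0; b < kBuckets; ++b) count[b + 1] += count[b];
        // Stable scatter into the temporary buffer.
        for (std::uint32_t x : v) tmp[count[(x >> shift) & (kBuckets - 1)]++] = x;
        v.swap(tmp);
    }
}

// Picks the algorithm by input size and reports which one ran.
Algorithm hybrid_sort(std::vector<std::uint32_t>& v) {
    if (v.size() <= kSmallInputThreshold) {
        insertion_sort(v);
        return Algorithm::Insertion;
    }
    radix_sort(v);
    return Algorithm::Radix;
}
```

With this structure, a 32-element input never touches the radix path, which is consistent with the failure only showing up from 33 elements onward.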
from compute.
Just to verify, my nvidia 9600gt as well as the intel hd4000 do not have support for atomics?
from compute.
Can you compile and run the following code? It should list your devices with their supported extensions. If the device supports atomics, it should have cl_khr_local_int32_base_atomics and cl_khr_local_int32_extended_atomics in the output.
#include <iostream>
#include <vector>
#include <string>
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>

int main() {
    try {
        // Get list of OpenCL platforms.
        std::vector<cl::Platform> platform;
        cl::Platform::get(&platform);
        if (platform.empty()) {
            std::cerr << "OpenCL platforms not found." << std::endl;
            return 1;
        }
        for(auto p = platform.begin(); p != platform.end(); p++) {
            std::vector<cl::Device> pldev;
            try {
                p->getDevices(CL_DEVICE_TYPE_ALL, &pldev);
                for(auto d = pldev.begin(); d != pldev.end(); d++) {
                    std::cout << d->getInfo<CL_DEVICE_NAME>() << ":\n\t"
                              << d->getInfo<CL_DEVICE_EXTENSIONS>() << std::endl;
                }
            } catch(...) {
            }
        }
    } catch (const cl::Error &err) {
        std::cerr
            << "OpenCL error: "
            << err.what() << "(" << err.err() << ")"
            << std::endl;
        return 1;
    }
}
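Rather than scanning the printed list by eye, the extension string can be checked programmatically. The following is a minimal sketch that works on the plain string returned for CL_DEVICE_EXTENSIONS. The helper names has_extension and supports_local_atomics are hypothetical and not part of OpenCL or Boost.Compute.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in the
// space-separated extension list.
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) return true;
    }
    return false;
}

// Checks for the two local 32-bit atomics extensions named above.
bool supports_local_atomics(const std::string& extensions) {
    return has_extension(extensions, "cl_khr_local_int32_base_atomics") &&
           has_extension(extensions, "cl_khr_local_int32_extended_atomics");
}
```

Matching whole tokens instead of using a substring search avoids false positives from extension names that share a prefix.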
from compute.
Seems it's supported
Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz:
cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority
HD Graphics 4000:
cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images
from compute.
Those should work. Can you add #define BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION to the top of your file (before any <boost/compute/*> includes)? This will then print out information indicating why the kernel failed to build.
from compute.
added the line to the sort_vector example:
//---------------------------------------------------------------------------//
// Copyright (c) 2013 Kyle Lutz
//
// Distributed under the Boost Software License, Version 1.0
// See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt
//
// See http://kylelutz.github.com/compute for more information.
//---------------------------------------------------------------------------//
#include <algorithm>
#include <iostream>
#include <vector>
#define BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION
#include <boost/compute/system.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/sort.hpp>
#include <boost/compute/container/vector.hpp>
...
Output is the same as without the #define
./sort_vector.osx
input: [ 7, 49, 73, 58, 30, 72, 44, 78, 23, 9, 40, 65, 92, 42, 87, 3, 27, 29, 40, 12, 3, 69, 9, 57, 60, 33, 99, 78, 16, 35, 97, 26, 12 ]
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::context_error> >'
what(): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[1] 19241 abort ./sort_vector.osx
from compute.
That's strange. Can you add the following two lines to the top of main:
const boost::compute::device device = boost::compute::system::default_device();
std::cout << "device: " << device.name() << std::endl;
This should find the default OpenCL device (or any OpenCL device) and print its name to stdout. If that doesn't work then there is something wrong with the OpenCL implementation installed on your system. Ensure that the correct OpenCL library/framework is being linked.
from compute.
printing the device works:
./sort_vector.osx
device: HD Graphics 4000
input: [ 7, 49, 73, 58, 30, 72, 44, 78, 23, 9, 40, 65, 92, 42, 87, 3, 27, 29, 40, 12, 3, 69, 9, 57, 60, 33, 99, 78, 16, 35, 97, 26, 12 ]
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::context_error> >'
what(): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[1] 42740 abort ./sort_vector.osx
from compute.
This is very strange. I found this post on the Intel forums (https://software.intel.com/en-us/forums/topic/505677) but with no resolution (and the error message from the OpenCL implementation isn't very helpful).
from compute.
I commented on the Intel forum to ask the topic starter whether there has been any follow-up.
from compute.
@hansbogert
I'm no longer able to reproduce this (tried with 33 and 50 items) on either my Intel HD 4000 or my GeForce 650M. Perhaps an OS update fixed it (I'm running Yosemite DP 8 now). Check whether it is fixed for you as well.
from compute.
Indeed solved, bisected it and came down to commit: a78212f
Rename K to K_BITS in radix_sort()
This should fix the following error seen on the Apple OpenCL
implementation when compiling the radix_sort program: "error:
definition of macro 'K' conflicts with an identifier used in
the precompiled header".
So solved long ago.
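To illustrate why a short macro name like K is fragile, here is a small host-side C++ sketch. It is not the radix_sort kernel source, and the struct Params and both functions are hypothetical. It only shows the general class of problem: a macro named K rewrites every later use of that identifier, while a more specific name such as K_BITS leaves other code alone. The Apple compiler reports the clash with its own diagnostic, quoted in the commit message above.

```cpp
// Stand-in for a declaration the kernel source does not control,
// for example one coming from a precompiled header (hypothetical).
struct Params {
    int K;
};

// If this were `#define K 4`, the expression `p.K` below would be
// rewritten to `p.4` by the preprocessor and fail to compile.
#define K_BITS 4

// Number of buckets per radix pass; 16 when K_BITS is 4.
int buckets_per_pass() {
    return 1 << K_BITS;
}

// Still refers to the struct member, untouched by the macro.
int read_k(const Params& p) {
    return p.K;
}
```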
from compute.