compute::sort fails with as little as 33 items (closed)

boostorg commented on May 16, 2024

compute::sort fails with as little as 33 items

from compute.

Comments (22)

ddemidov commented on May 16, 2024

The example works on my machine with both Intel and NVIDIA SDKs:

./sort 
input: [ 83, 86, 77, 15, 93, 35, 86, 92, 49, 21, 62, 27, 90, 59, 63, 26, 40, 26, 72, 36, 11, 68, 67, 29, 82, 30, 62, 23, 67, 35, 29, 2, 22 ]
output: [ 2, 11, 15, 21, 22, 23, 26, 26, 27, 29, 29, 30, 35, 35, 36, 40, 49, 59, 62, 62, 63, 67, 67, 68, 72, 77, 82, 83, 86, 86, 90, 92, 93 ]

compute::sort also works for me with much larger inputs (a couple of million elements). It seems that your OpenCL implementation has problems with kernel compilation. Can you successfully run other examples from Boost.Compute, or even something as simple as https://gist.github.com/ddemidov/2925717?

hansbogert commented on May 16, 2024

Yes, I can run other examples. For instance, changing the input to 32 items does not fail, so the problem is somewhere in radix_sort.
I can't try your example because 1) it is OpenCL 1.2 (correct me if I'm wrong) and I don't have the NVIDIA OpenCL 1.2 headers, i.e. CL/cl.hpp, and 2) it does not compile on OS X.

hansbogert commented on May 16, 2024

My mistake, I needed cl.hpp.

I can run it, but both my setups do not contain double precision capable hardware:

$ ./hello
GPUs with double precision not found.

ddemidov commented on May 16, 2024

It should be enough to delete lines 10-16 and replace double with float throughout the example. Sorry for the inconvenience.

ddemidov commented on May 16, 2024

Lines 55-60 should also be removed.

hansbogert commented on May 16, 2024

As expected, it does not fail:

./hello
GeForce 9600 GT
3

ddemidov commented on May 16, 2024

The 9600 GT is compute capability 1.0, which does not support atomic operations. And the kernels generated for the example do use atomics.

ddemidov commented on May 16, 2024

And the error message for the HD 4000 says the device is not available. Are you able to run the example there?

hansbogert commented on May 16, 2024

Yes,

./hello.osx
HD Graphics 4000
3

roshanr95 commented on May 16, 2024

I am able to reproduce this as well... The problem is with radix_sort. The CL_DEVICE_NOT_AVAILABLE error usually occurs when the program source for the kernel fails to compile.

kylelutz commented on May 16, 2024

Yeah, like Denis said, radix_sort() uses atomics, which are not supported by your hardware. The reason you see a difference when running different numbers of values is that internally sort() will use an insertion sort algorithm (which doesn't require atomics) for tiny inputs (32 or fewer values) and the radix sort algorithm for anything larger.
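
To make the size-based switch concrete, here is a minimal host-side sketch in plain C++. It is not the Boost.Compute implementation: the names `sort_dispatch`, `Algo`, and `kSmallInputThreshold` are hypothetical, the cutoff of 32 is taken from the remark above, and the 4-bit digit width is an assumption chosen for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cutoff, taken from the "32 or fewer values" remark above.
constexpr std::size_t kSmallInputThreshold = 32;

enum class Algo { Insertion, Radix };

// Plain insertion sort, used for tiny inputs.
void insertion_sort(std::vector<std::uint32_t>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        std::uint32_t key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}

// Least-significant-digit radix sort with 4-bit digits (assumed width).
void radix_sort(std::vector<std::uint32_t>& v) {
    constexpr unsigned bits = 4;
    constexpr std::size_t buckets = std::size_t(1) << bits;
    std::vector<std::uint32_t> tmp(v.size());
    for (unsigned shift = 0; shift < 32; shift += bits) {
        std::array<std::size_t, buckets> count{};
        for (std::uint32_t x : v) ++count[(x >> shift) & (buckets - 1)];
        // Exclusive prefix sum turns per-bucket counts into start offsets.
        std::size_t offset = 0;
        for (std::size_t b = 0; b < buckets; ++b) {
            std::size_t c = count[b];
            count[b] = offset;
            offset += c;
        }
        // Stable scatter into the temporary buffer, then swap buffers.
        for (std::uint32_t x : v) tmp[count[(x >> shift) & (buckets - 1)]++] = x;
        v.swap(tmp);
    }
}

// Sorts v and reports which algorithm the size check selected.
Algo sort_dispatch(std::vector<std::uint32_t>& v) {
    Algo used = Algo::Radix;
    if (v.size() <= kSmallInputThreshold) {
        insertion_sort(v);
        used = Algo::Insertion;
    } else {
        radix_sort(v);
    }
    assert(std::is_sorted(v.begin(), v.end()));
    return used;
}
```

With this sketch, a 32-element vector takes the insertion path and a 33-element vector takes the radix path, which mirrors why the failure only shows up from 33 items onward.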

hansbogert commented on May 16, 2024

Just to verify: neither my NVIDIA 9600 GT nor my Intel HD 4000 has support for atomics?

ddemidov commented on May 16, 2024

Can you compile and run the following code? It should list your devices with supported extensions. If the device supports atomics, it should have cl_khr_local_int32_base_atomics, cl_khr_local_int32_extended_atomics in the output.

#include <iostream>
#include <vector>
#include <string>

#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>

int main() {
    try {
        // Get list of OpenCL platforms.
        std::vector<cl::Platform> platform;
        cl::Platform::get(&platform);

        if (platform.empty()) {
            std::cerr << "OpenCL platforms not found." << std::endl;
            return 1;
        }

        for(auto p = platform.begin(); p != platform.end(); p++) {
            std::vector<cl::Device> pldev;

            try {
                p->getDevices(CL_DEVICE_TYPE_ALL, &pldev);

                for(auto d = pldev.begin(); d != pldev.end(); d++) {
                    std::cout << d->getInfo<CL_DEVICE_NAME>() << ":\n\t"
                        << d->getInfo<CL_DEVICE_EXTENSIONS>() << std::endl;
                }
            } catch(...) {
                // Ignore platforms whose devices cannot be queried.
            }
        }
    } catch (const cl::Error &err) {
        std::cerr
            << "OpenCL error: "
            << err.what() << "(" << err.err() << ")"
            << std::endl;
        return 1;
    }
}

hansbogert commented on May 16, 2024

It seems they are supported:

Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz:
    cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority
HD Graphics 4000:
    cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images
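
Rather than scanning the lists by eye, the check can be sketched in a few lines of C++. This is only an illustration: `has_extension` and `has_local_atomics` are hypothetical helper names, and the input is assumed to be the space-separated string returned for CL_DEVICE_EXTENSIONS.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in the space-separated
// extension list (substring matches do not count).
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream in(extensions);
    std::string token;
    while (in >> token) {
        if (token == name) return true;
    }
    return false;
}

// True if both local 32-bit atomics extensions mentioned above are listed.
bool has_local_atomics(const std::string& extensions) {
    return has_extension(extensions, "cl_khr_local_int32_base_atomics") &&
           has_extension(extensions, "cl_khr_local_int32_extended_atomics");
}
```

Passing the string from `d->getInfo<CL_DEVICE_EXTENSIONS>()` to `has_local_atomics` would give a yes/no answer per device. Matching whole tokens avoids false positives when one extension name is a prefix of another.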

kylelutz commented on May 16, 2024

Those should work. Can you add #define BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION to the top of your file (before any <boost/compute/*> includes)? This will then print out information indicating why the kernel failed to build.

hansbogert commented on May 16, 2024

I added the line to the sort_vector example:

//---------------------------------------------------------------------------//
// Copyright (c) 2013 Kyle Lutz <[email protected]>
//
// Distributed under the Boost Software License, Version 1.0
// See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt
//
// See http://kylelutz.github.com/compute for more information.
//---------------------------------------------------------------------------//

#include <algorithm>
#include <iostream>
#include <vector>

#define BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION
#include <boost/compute/system.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/sort.hpp>
#include <boost/compute/container/vector.hpp>
...

The output is the same as without the #define:

./sort_vector.osx
input: [ 7, 49, 73, 58, 30, 72, 44, 78, 23, 9, 40, 65, 92, 42, 87, 3, 27, 29, 40, 12, 3, 69, 9, 57, 60, 33, 99, 78, 16, 35, 97, 26, 12 ]
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::context_error> >'
  what():  [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[1]    19241 abort      ./sort_vector.osx

kylelutz commented on May 16, 2024

That's strange. Can you add the following two lines to the top of main:

const boost::compute::device device = boost::compute::system::default_device();
std::cout << "device: " << device.name() << std::endl;

This should find the default OpenCL device (or any OpenCL device) and print its name to stdout. If that doesn't work then there is something wrong with the OpenCL implementation installed on your system. Ensure that the correct OpenCL library/framework is being linked.

hansbogert commented on May 16, 2024

Printing the device works:

./sort_vector.osx
device: HD Graphics 4000
input: [ 7, 49, 73, 58, 30, 72, 44, 78, 23, 9, 40, 65, 92, 42, 87, 3, 27, 29, 40, 12, 3, 69, 9, 57, 60, 33, 99, 78, 16, 35, 97, 26, 12 ]
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::context_error> >'
  what():  [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[1]    42740 abort      ./sort_vector.osx

kylelutz commented on May 16, 2024

This is very strange. I found this post on the Intel forums (https://software.intel.com/en-us/forums/topic/505677) but with no resolution (and the error message from the OpenCL implementation isn't very helpful).

hansbogert commented on May 16, 2024

I commented on the Intel forum to ask the topic starter whether there is any follow-up.

roshanr95 commented on May 16, 2024

@hansbogert
I'm no longer able to reproduce this (tried with 33 and 50 items) on either my Intel HD 4000 or my GeForce 650M. Perhaps an OS update fixed it (I'm running Yosemite DP 8 now). Check if it is fixed for you as well.

hansbogert commented on May 16, 2024

Indeed solved. I bisected it and it came down to commit a78212f:

Rename K to K_BITS in radix_sort()

    This should fix the following error seen on the Apple OpenCL
    implementation when compiling the radix_sort program: "error:
    definition of macro 'K' conflicts with an identifier used in
    the precompiled header".

So it was solved long ago.
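
For anyone wondering why a one-letter macro is fragile, here is a rough C++ analogy of the clash the commit message describes. It is not the actual OpenCL kernel source: the `prebuilt` namespace stands in for code the compiler has already seen (the role played by the precompiled header), and `twice` and `bucket_count` are hypothetical names. Reading K_BITS as the number of bits per radix digit is also an assumption.

```cpp
#include <cstddef>

// Stand-in for previously compiled code that uses K as an ordinary
// identifier (hypothetical; plays the role of the precompiled header).
namespace prebuilt {
inline int twice(int K) { return 2 * K; }
}

// A one-letter macro such as
//     #define K 4
// placed before the declaration above would rewrite `int K` into `int 4`
// and break the build. A longer, more specific name avoids the clash.
#define K_BITS 4

// Number of buckets per pass for a digit that is K_BITS wide (assumed use).
inline std::size_t bucket_count() { return std::size_t(1) << K_BITS; }
```

The rename changes no behavior. It only moves the macro out of the namespace of short identifiers that other headers are likely to use.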
