Comments (15)

ptillet commented on July 22, 2024

Sorry for the delay.

Right now, ISAAC's test suite uses prime numbers for M, N and K, but caffe calls GEMM on corner cases (M=1, K=1 or N=1). This has caused some issues in the past.

I'll add tests for corner cases ASAP so I can pinpoint the problem. The only hardware I can test this on is a Broadwell 5500U iGPU. What hardware have you been using? I've not been using Beignet; I'll try that too.

Thanks for the report!

from triton.

gongzg commented on July 22, 2024

@ptillet The Broadwell 5500U is fine for reproducing this issue. Beignet's performance is not as good as the OpenCL SDK's, but its test suite pass rate is very good. Please use Beignet from git master. Thanks for your support!

ptillet commented on July 22, 2024

GEMM-xT fails for M=1. That's probably the cause of all your issues. I'm on it :)

ptillet commented on July 22, 2024

@gongzg I have just pushed two small fixes for BLAS, for an uninitialized variable in BLAS-2 and the BLAS3 failure when M==1. Does it help?

gongzg commented on July 22, 2024

@ptillet the crash issue is fixed, but the test failures still exist.

ptillet commented on July 22, 2024

I have fixed more dot() corner cases in dbfaef8. There were still some problems left for the cases where SGEMV degenerates to SDOT or SAXPY... Hopefully this should fix some more issues with caffe.
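
As an illustration of why those degenerate shapes matter, here is a minimal sketch of the equivalence in plain C++. The helpers `gemv_ref`, `dot_ref` and `axpy_ref` are hypothetical reference routines written for this example, not ISAAC's API: a one-row SGEMV (M = 1) reduces to a dot product, and a one-column SGEMV (N = 1) with beta = 1 reduces to an AXPY update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference y = alpha * A * x + beta * y for a row-major M x N matrix A.
// Hypothetical helper for illustration only.
std::vector<float> gemv_ref(std::size_t M, std::size_t N, float alpha,
                            const std::vector<float>& A,
                            const std::vector<float>& x, float beta,
                            std::vector<float> y) {
    assert(A.size() == M * N && x.size() == N && y.size() == M);
    for (std::size_t i = 0; i < M; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < N; ++j) acc += A[i * N + j] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
    return y;
}

// Reference dot product of two equally sized vectors.
float dot_ref(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) acc += a[i] * b[i];
    return acc;
}

// Reference y = alpha * x + y.
std::vector<float> axpy_ref(float alpha, const std::vector<float>& x,
                            std::vector<float> y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
    return y;
}
```

With M = 1 and beta = 0, `gemv_ref` returns a single element equal to `alpha * dot_ref(A, x)`; with N = 1 and beta = 1, it returns `axpy_ref(alpha * x[0], A, y)`. A test suite that only uses prime sizes never exercises either path.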

gongzg commented on July 22, 2024

@ptillet There are still many failures. I picked one case, shown below:

build/test/test.testbin --gtest_filter=NetTest/2.TestSharedWeightsResume

It fails with the latest ISAAC. If I switch to ViennaCL's GEMM while the other math functions still use ISAAC, it passes. For your reference. Thanks.

ptillet commented on July 22, 2024

I've fixed some bugs in the master branch. Now all the OpenCL caffe tests pass on my machine.

gongzg commented on July 22, 2024

@ptillet I dug into reduce_2d::generate_impl and found a barrier issue.

element_wise_loop_1D(stream, p_.fetch_policy, (reduction_type_==REDUCE_ROWS)?p_.simd_width:1, "r", upper.str(), "$GLOBAL_IDX_1", "$GLOBAL_SIZE_1", device, [&](unsigned int cwidth)
{
...
stream << "#pragma unroll" << std::endl;
stream << "for($SIZE_T stride = " << p_.local_size_0/2 << "; stride >0; stride /=2)" << std::endl;
stream << "{" << std::endl;
stream.inc_tab();
stream << "$LOCAL_BARRIER;" << std::endl;
stream << "if (lidx < stride)" << std::endl;
stream << "{" << std::endl;
stream.inc_tab();
}

You can see that $LOCAL_BARRIER is inside the loop. One example of the generated loop head is:
for(unsigned int r = get_global_id(1)*4; r < (M +3)/4*4/4*4; r += get_global_size(1)*4)
Here r starts at a different value for each work-item, so not all work-items enter each iteration of the loop together. That violates the barrier usage rules defined in the OpenCL spec and causes Intel's OpenCL compiler to hang. Please refer to the following description of the barrier function in the OpenCL spec.

Description
All work-items in a work-group executing the kernel on a processor must execute this function before any are allowed to continue execution beyond the barrier. This function must be encountered by all work-items in a work-group executing the kernel.

If barrier is inside a conditional statement, then all work-items must enter the conditional if any work-item enters the conditional statement and executes the barrier.

If barrier is inside a loop, all work-items must execute the barrier for each iteration of the loop before any are allowed to continue execution beyond the barrier.

ptillet commented on July 22, 2024

Thanks for investigating the generated kernels! The latest commit did introduce a synchronization issue not caught by the unit tests (dammit!). I'll try to fix this tonight.

For reference, here is how ISAAC normally handles this issue:

The upper-bound on the outer loop for r should be rounded up to the next multiple of local_size_1, to ensure that every work group does enter all the iterations of the loop:

for(size_t r = get_global_id(1); r < (M + LOCAL_SIZE_1 - 1)/LOCAL_SIZE_1*LOCAL_SIZE_1; r += get_global_size(1))

All bounds checks are done in this loop in parts that do not include a barrier.

The entire work group enters not only the first loop but also all iterations of the second one:

for(size_t stride = LOCAL_SIZE_0/2; stride >0; stride /=2)
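
The effect of rounding the bound can be checked on the host with a small simulation. This is only a sketch under assumptions: `round_up`, `trip_count` and `uniform_trip_counts` are hypothetical helpers written for this example, and the global size is assumed to be `local_size * num_groups`. It counts how many times each work-item would run the outer loop and reports whether every work-item in a group runs it the same number of times, which is what a barrier inside the loop requires.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of `multiple`.
std::size_t round_up(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}

// Number of iterations of: for (r = gid; r < bound; r += global_size)
std::size_t trip_count(std::size_t gid, std::size_t bound,
                       std::size_t global_size) {
    assert(global_size > 0);
    std::size_t trips = 0;
    for (std::size_t r = gid; r < bound; r += global_size) ++trips;
    return trips;
}

// True if, within every work-group, all work-items run the loop the same
// number of times, so a barrier in the loop body is reached by all of them.
bool uniform_trip_counts(std::size_t bound, std::size_t local_size,
                         std::size_t num_groups) {
    const std::size_t global_size = local_size * num_groups;
    for (std::size_t g = 0; g < num_groups; ++g) {
        const std::size_t expected =
            trip_count(g * local_size, bound, global_size);
        for (std::size_t l = 1; l < local_size; ++l) {
            if (trip_count(g * local_size + l, bound, global_size) != expected)
                return false;
        }
    }
    return true;
}
```

For example, with M = 10, a local size of 4 and 2 work-groups, `uniform_trip_counts(10, 4, 2)` is false (work-items 0 and 1 run twice, work-items 2 and 3 run once), while `uniform_trip_counts(round_up(10, 4), 4, 2)` is true.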

ptillet commented on July 22, 2024

Just pushed a quick and dirty fix that seems to resolve the synchronization issues on my machine, at the price of a performance hit. Will investigate and try to get a proper fix ASAP.

gongzg commented on July 22, 2024

@ptillet The latest fix works. Thanks for your quick fix.

gongzg commented on July 22, 2024

@ptillet Unfortunately there are still some failures on my machine. One of them is
build/test/test.testbin --gtest_filter=InnerProductLayerTest/2.TestBackwardTranspose
It passes with 6ac5e1f....

No GPU hang this time.

ptillet commented on July 22, 2024

Interesting, the test fails randomly on my machine, about half of the time. Maybe an uninitialized variable... Looking into it...

ptillet commented on July 22, 2024

fd5c6d3 seems to have taken care of it. There is one more issue I'm fixing with the Intel OCL Driver -- there's a segfault on deinitialization, and I suspect it causes some tests to crash.
