The latest isaac code triggers many test failures with caffe's opencl branch. The good

Regression for caffe opencl branch. about triton HOT 15 CLOSED

triton-lang commented on July 22, 2024

Regression for caffe opencl branch.

from triton.

Comments (15)

ptillet commented on July 22, 2024

Sorry for the delay.

Right now, ISAAC's test suite uses prime numbers for M, N and K, but caffe calls GEMM on corner cases (M=1, K=1 or N=1). This has caused some issues in the past.

I'll add tests for corner cases ASAP so I can pinpoint the problem. The only hardware I can test this on is a Broadwell 5500U iGPU. What hardware have you been using? I've not been using Beignet; I'll try that too.

Thanks for the report!
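
As a rough illustration of the corner-case tests mentioned above, the sketch below enumerates GEMM shapes in which each of M, N and K is either 1 or a prime, and pairs them with a naive reference GEMM to compare against. The names `GemmShape`, `corner_case_shapes` and `ref_gemm` are hypothetical and are not part of ISAAC's test suite; row-major storage is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type, not taken from ISAAC.
struct GemmShape { std::size_t M, N, K; };

// All 8 combinations where each dimension is either degenerate (1) or `prime`.
std::vector<GemmShape> corner_case_shapes(std::size_t prime) {
  const std::size_t sizes[] = {1, prime};
  std::vector<GemmShape> shapes;
  for (std::size_t M : sizes)
    for (std::size_t N : sizes)
      for (std::size_t K : sizes)
        shapes.push_back({M, N, K});
  return shapes;
}

// Naive reference: C = A * B with A (M x K), B (K x N), C (M x N), all row-major.
std::vector<float> ref_gemm(const GemmShape& s,
                            const std::vector<float>& A,
                            const std::vector<float>& B) {
  assert(A.size() == s.M * s.K && B.size() == s.K * s.N);
  std::vector<float> C(s.M * s.N, 0.0f);
  for (std::size_t i = 0; i < s.M; ++i)
    for (std::size_t j = 0; j < s.N; ++j)
      for (std::size_t k = 0; k < s.K; ++k)
        C[i * s.N + j] += A[i * s.K + k] * B[k * s.N + j];
  return C;
}
```

A test loop would run the library GEMM on every shape returned by `corner_case_shapes` and compare the result with `ref_gemm`, so that M=1, N=1 and K=1 are exercised alongside the prime-sized case.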

from triton.

gongzg commented on July 22, 2024

@ptillet A Broadwell 5500U is fine for reproducing this issue. Beignet's performance is not as good as the OpenCL SDK's, but its test suite pass rate is very good; please use Beignet's git master. Thanks for your support!

from triton.

ptillet commented on July 22, 2024

GEMM-xT fails for M=1. That's probably the cause of all your issues. I'm on it :)

from triton.

ptillet commented on July 22, 2024

@gongzg I have just pushed two small fixes for BLAS, for an uninitialized variable in BLAS-2 and the BLAS3 failure when M==1. Does it help?

from triton.

gongzg commented on July 22, 2024

@ptillet the crash issue is fixed, but the test failures still exist.

from triton.

ptillet commented on July 22, 2024

I have fixed more dot() corner cases in dbfaef8. There were still some problems left for the cases where SGEMV degenerates to SDOT or SAXPY... Hopefully this should fix some more issues with caffe.
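
To make the degenerate cases concrete, here is a minimal reference sketch (not ISAAC's code) of how y = alpha*A*x + beta*y collapses: with M == 1 the single output element is a dot product, and with N == 1 and beta == 1 the update is an AXPY with scale alpha*x[0]. The `ref_` function names and the row-major layout are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = alpha * A * x + beta * y, with A stored row-major as M x N.
void ref_sgemv(std::size_t M, std::size_t N, float alpha,
               const std::vector<float>& A, const std::vector<float>& x,
               float beta, std::vector<float>& y) {
  assert(A.size() == M * N && x.size() == N && y.size() == M);
  for (std::size_t i = 0; i < M; ++i) {
    float acc = 0.0f;
    for (std::size_t j = 0; j < N; ++j)
      acc += A[i * N + j] * x[j];
    y[i] = alpha * acc + beta * y[i];
  }
}

// Dot product of two equally sized vectors.
float ref_sdot(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float acc = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i)
    acc += a[i] * b[i];
  return acc;
}

// y += a * x
void ref_saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] += a * x[i];
}
```

A kernel generator that tiles over both M and N has to handle both shapes explicitly, which is presumably why these cases needed separate fixes.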

from triton.

gongzg commented on July 22, 2024

@ptillet There are still many failures. I picked one case, shown below:

build/test/test.testbin --gtest_filter=NetTest/2.TestSharedWeightsResume

It fails with the latest ISAAC. If I switch to ViennaCL's GEMM while the other math functions still use ISAAC, it passes. For your reference. Thanks.

from triton.

ptillet commented on July 22, 2024

I've fixed some bugs in the master branch. Now all the OpenCL caffe tests pass on my machine.

from triton.

gongzg commented on July 22, 2024

@ptillet I dug into reduce_2d::generate_impl and found a barrier issue.

element_wise_loop_1D(stream, p_.fetch_policy, (reduction_type_==REDUCE_ROWS)?p_.simd_width:1, "r", upper.str(), "$GLOBAL_IDX_1", "$GLOBAL_SIZE_1", device, [&](unsigned int cwidth)
{
...
stream << "#pragma unroll" << std::endl;
stream << "for($SIZE_T stride = " << p_.local_size_0/2 << "; stride >0; stride /=2)" << std::endl;
stream << "{" << std::endl;
stream.inc_tab();
stream << "$LOCAL_BARRIER;" << std::endl;
stream << "if (lidx < stride)" << std::endl;
stream << "{" << std::endl;
stream.inc_tab();
}

You can see that the LOCAL_BARRIER is inside the loop. One example of the generated loop head is:
for(unsigned int r = get_global_id(1)*4; r < (M +3)/4*4/4*4; r += get_global_size(1)*4)
Here r starts from get_global_id(1), so its value differs between work-items and not all work-items execute the same loop iterations. This breaks the barrier usage rules defined in the OpenCL spec and causes Intel's OpenCL compiler to hang. Please refer to the following description of the barrier function in the OpenCL spec.

Description
All work-items in a work-group executing the kernel on a processor must execute this function before any are allowed to continue execution beyond the barrier. This function must be encountered by all work-items in a work-group executing the kernel.

If barrier is inside a conditional statement, then all work-items must enter the conditional if any work-item enters the conditional statement and executes the barrier.

If barrier is inside a loop, all work-items must execute the barrier for each iteration of the loop before any are allowed to continue execution beyond the barrier.

from triton.

ptillet commented on July 22, 2024

Thanks for investigating the generated kernels! The latest commit did introduce a synchronization issue not caught by the unit tests (dammit!). I'll try to fix this tonight.

For reference, here is how ISAAC normally handles this issue:

The upper-bound on the outer loop for r should be rounded up to the next multiple of local_size_1, to ensure that every work group does enter all the iterations of the loop:

for(size_t r = get_global_id(1); r < (M + LOCAL_SIZE_1 - 1)/LOCAL_SIZE_1*LOCAL_SIZE_1; r += get_global_size(1))

All bounds checks are done in this loop in parts that do not include a barrier.

The entire work group enters not only the first loop but also all iterations of the second one:

for(size_t stride = LOCAL_SIZE_0/2; stride >0; stride /=2)
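
The effect of rounding the upper bound can be checked on the host with a small simulation. The sketch below is an illustrative model, not ISAAC code: it counts how many iterations each work-item of a one-dimensional launch executes in `for (r = gid; r < bound; r += global_size)`, and reports whether the count is uniform inside every work-group. It assumes the global size is an exact multiple of the local size; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Iterations executed by the work-item with global id `gid`.
std::size_t trip_count(std::size_t gid, std::size_t bound, std::size_t global_size) {
  std::size_t n = 0;
  for (std::size_t r = gid; r < bound; r += global_size)
    ++n;
  return n;
}

// Round x up to the next multiple of `multiple`.
std::size_t round_up(std::size_t x, std::size_t multiple) {
  return (x + multiple - 1) / multiple * multiple;
}

// True if, in every work-group, all work-items run the same number of
// iterations, i.e. a barrier inside the loop body is reached by the whole group.
bool uniform_per_group(std::size_t bound, std::size_t local_size, std::size_t num_groups) {
  const std::size_t global_size = local_size * num_groups;
  for (std::size_t g = 0; g < num_groups; ++g) {
    const std::size_t first = trip_count(g * local_size, bound, global_size);
    for (std::size_t lid = 1; lid < local_size; ++lid)
      if (trip_count(g * local_size + lid, bound, global_size) != first)
        return false;
  }
  return true;
}
```

With M = 5, a local size of 4 and two work-groups, `uniform_per_group(5, 4, 2)` is false, because only one work-item of the second group enters the loop. Using `round_up(5, 4)` as the bound makes the check pass, and the bounds test on M then has to happen inside the loop body, outside any barrier.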

from triton.

ptillet commented on July 22, 2024

Just pushed a quick and dirty fix that seems to resolve the synchronization issues on my machine, at the price of a performance hit. Will investigate and try to get a proper fix ASAP.

from triton.

gongzg commented on July 22, 2024

@ptillet The latest fix works. Thanks for your quick fix.

from triton.

gongzg commented on July 22, 2024

@ptillet Unfortunately there are still some failures on my machine. One of them is:
build/test/test.testbin --gtest_filter=InnerProductLayerTest/2.TestBackwardTranspose
It passes with 6ac5e1f....

No GPU hang this time.

from triton.

ptillet commented on July 22, 2024

Interesting, the test fails randomly on my machine, about half of the time. Maybe an uninitialized variable... Looking into it...

from triton.

ptillet commented on July 22, 2024

fd5c6d3 seems to have taken care of it. There is one more issue I'm fixing with the Intel OCL Driver -- there's a segfault on deinitialization, and I suspect it causes some tests to crash.

from triton.
