Comments (4)
There is obviously something wrong in the compiler here, as it should emit valid SPIR-V.
But it also feels to me that this source is not correct. The compiler has no way to know that `grid` is a multiple of 4.
Thus `global uint4* histAddr4 = (global uint4*)(hist+grid);` results in a potentially unaligned pointer, which leads to undefined behavior according to the specification.
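To make the alignment argument concrete, here is a minimal host-side sketch in plain C (not OpenCL C, and not taken from clspv). The helper names `is_uint4_aligned` and `count_aligned_offsets` are hypothetical, and the sketch assumes that `uint4` needs 16-byte alignment and that the base buffer itself is 16-byte aligned. It counts how many values of `grid` would make `hist + grid` a valid `uint4` address.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when p sits on a 16-byte boundary, which is
 * the alignment assumed here for a vector of four 32-bit values.
 * Relies on the usual flat pointer-to-integer mapping of mainstream hosts. */
static bool is_uint4_aligned(const uint32_t *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Counts the offsets grid in [0, n) for which hist + grid stays 16-byte
 * aligned, using a local 16-byte-aligned buffer as the stand-in for hist. */
static unsigned count_aligned_offsets(unsigned n)
{
    alignas(16) uint32_t hist[64] = {0};
    assert(n <= 64);
    assert(is_uint4_aligned(hist));

    unsigned count = 0;
    for (unsigned grid = 0; grid < n; ++grid) {
        if (is_uint4_aligned(hist + grid))
            ++count;
    }
    return count;
}
```

Under these assumptions only every fourth offset passes the check, which is why a cast of `hist + grid` to a `uint4` pointer is only safe when `grid` is known to be a multiple of 4.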
A workaround for this bug would be to either:
- define `hist` as `global uint4* hist` and change the cast, if `grid` is known to be a multiple of 4:

      - global uint4* histAddr4 = (global uint4*)(hist+grid);
      + global uint4* histAddr4 = (global uint4*)(hist + grid/4);

- perform all the loads from `hist` with a `vload4`, which supports unaligned access.
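The index arithmetic behind the first workaround can be sketched in plain host C. This is only an illustration: `uint4_like`, `first_scalar_index` and `load_vec` are hypothetical names, and the struct is a stand-in that assumes `uint4` is four packed 32-bit lanes. Indexing the vector-typed buffer with `grid / 4` lands on the same data as `hist + grid` only when `grid` is a multiple of 4. Otherwise it rounds down to the previous vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for OpenCL's uint4 (assumption: four packed 32-bit lanes). */
typedef struct { uint32_t s[4]; } uint4_like;

/* Scalar index at which vector number grid / 4 starts. */
static size_t first_scalar_index(size_t grid)
{
    return (grid / 4) * 4;
}

/* Reads vector number grid / 4 from a buffer typed as vectors,
 * mirroring the hist + grid/4 form of the first workaround. */
static uint4_like load_vec(const uint4_like *hist4, size_t grid)
{
    assert(hist4 != NULL);
    return hist4[grid / 4];
}
```

With a buffer holding the values 0 to 15, `grid = 8` reads the vector starting at element 8, but `grid = 10` also reads the vector starting at element 8, not the one starting at element 10. That is the reason for the "known to be a multiple of 4" condition.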
from clspv.
In fact, `grid = get_global_id(0);` will not always be a multiple of 4. So a proper implementation should use `vload4`. Something like this:
    kernel void postprocessHistogram(global uint *hist, int grid_height, int grid_stride)
    {
        // Flat work-item index; not guaranteed to be a multiple of 4.
        const size_t grid = get_global_id(0);
        int grid_x_idx = grid % grid_stride;
        int grid_y_idx = grid / grid_stride;
        int i = grid_y_idx * grid_height + grid_height / 2;
        ulong4 sum4 = { 0, 0, 0, 0 };
        hist += grid;
        for (uint i = 0; i < 4086; i += 4) {
            // vload4 reads four consecutive uints without casting hist to uint4*.
            const uint4 val4 = vload4(0, hist);
            const uint4 indices = { i, i + 1, i + 2, i + 3 };
            sum4 += convert_ulong4(val4 * indices);
            hist += 4;
        }
        ulong sum = sum4.x + sum4.y + sum4.z + sum4.w;
        const ushort avg = convert_ushort_rte(sum / (float)(grid_height));
        ulong var = 0;
        for (int i = 0; i < 4086; i += 4) {
            ulong4 diff = { avg - i, avg - i - 1, avg - i - 2, avg - i - 3 };
            diff *= diff;
            diff *= convert_ulong4(vload4(0, hist));
            var += diff.x;
            var += diff.y;
        }
    }
This seems to be compiled correctly by `clspv`, according to `spirv-val`.
It leads to this kind of pattern in the SPIR-V:

    %37 = OpLoad %uint %34
    %39 = OpPtrAccessChain %_ptr_StorageBuffer_uint %34 %uint_1
    %40 = OpLoad %uint %39
    %42 = OpPtrAccessChain %_ptr_StorageBuffer_uint %34 %uint_2
    %43 = OpLoad %uint %42
    %45 = OpPtrAccessChain %_ptr_StorageBuffer_uint %34 %uint_3
    %46 = OpLoad %uint %45
    %47 = OpCompositeConstruct %v4uint %37 %40 %43 %46

It is then up to the Vulkan driver's SPIR-V compiler to know whether or not the hardware supports unaligned loads and whether it can coalesce those four loads into one, I guess.
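For readers less used to SPIR-V, the same pattern can be written as a host-side C sketch. It is only an emulation under stated assumptions: `uint4_like` and `vload4_scalarized` are hypothetical names, not clspv or OpenCL API, and the code simply mirrors the four scalar loads followed by a vector construction. Each load only needs the 4-byte alignment of a `uint`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for OpenCL's uint4 (assumption: four 32-bit lanes). */
typedef struct { uint32_t x, y, z, w; } uint4_like;

/* Emulates vload4(offset, p) as four scalar loads, so p only has to be
 * aligned for uint32_t, not for a 16-byte vector. */
static uint4_like vload4_scalarized(size_t offset, const uint32_t *p)
{
    assert(p != NULL);
    const uint32_t *base = p + offset * 4; /* vload4 reads from p + offset * 4 */

    uint4_like v;
    v.x = base[0]; /* plain load from the base pointer */
    v.y = base[1]; /* access chain with index 1, then load */
    v.z = base[2]; /* access chain with index 2, then load */
    v.w = base[3]; /* access chain with index 3, then load */
    return v;      /* corresponds to the composite construction */
}
```

Calling `vload4_scalarized(0, hist + 1)` is fine here even though `hist + 1` is not 16-byte aligned, which is the property the `vload4` version of the kernel relies on.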
from clspv.
The point that the cast may cause a run-time alignment issue is well taken. But I am not sure whether similar invalid SPIR-V (a phi with inconsistent types) will show up with a different kind of cast.
We'll do the workaround in the source code. Thanks.
from clspv.
I agree, there is a bug to fix here.
Two ideas for it:
- Return a proper error from `clspv` when this kind of pattern is detected.
- Find a way to force lowering the users of the GEP to use `i32` instead of `v4i32`.
from clspv.