Comments (8)
The first observation is that there are some wrong values in the ymm register for the input values of the Equal op: m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}, vs the weights, which have correct values: m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}.
Maybe the reason is the conversion op. But they both go through a conversion node...
UPD. So, in the case of the CPU, the weights go in as constant nodes and are loaded correctly, but the conversion node for the input uses
static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound));
where ubound and lbound are numeric_limits::max and numeric_limits::lowest respectively, which are positive and negative finite values, so this conversion clamps infinities to finite values. This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating-point types, then just a static_cast can be used.
@DannyVlasenko , thank you for discovering the root cause of the test failure when it's run on the CPU plugin. I would agree that such clamping doesn't really reproduce the intermediate conversion behavior for floating-point types. Thus, at least for the template specializations for the ov::float16 type, it does make sense to remove this clamping, just like it's been done for bfloat16.
from openvino.
@maxnick Should I make a separate PR for the CPU case? The GPU and CPU cases seem to be unrelated, and I'm still thinking about how to solve the GPU case.
@DannyVlasenko , yes, definitely; there is no point in making a joint PR for GPU and CPU.
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
The first observation is that there are some wrong values in the ymm register for the input values of the Equal op:
m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}
vs weights that have correct values:
m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}
Maybe the reason is the conversion op. But they both go through a conversion node...
UPD.
So, in the case of the CPU, the weights go in as constant nodes and are loaded correctly, but the conversion node for the input uses
static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound));
where ubound and lbound are numeric_limits::max and numeric_limits::lowest respectively, which are positive and negative finite values, so this conversion clamps infinities to finite values.
This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating-point types, then just a static_cast can be used.
@DannyVlasenko while you wait for a reply from the CPU team, have you checked the GPU behavior?
Looks like I have a solution for the CPU, so I can make a separate PR for it, if that is ok.
For the GPU, the only findings so far are that the kernel selector sets the EnableFP16Emulation flag because engineInfo.supports_fp16 is false. I suppose the cl_khr_fp16 extension should be enabled as a result, but I cannot find any code that does so, or even checks for this flag, further down the line. If I comment out EnableFP16Emulation, then OpenCL says, as expected, that "declaring variable of type 'half' is not allowed". I'm currently investigating this; didn't do anything here last week tbh.
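For reference, an OpenCL C kernel can only declare variables of type half after enabling the cl_khr_fp16 extension in the kernel source. An illustrative fragment (not the actual clDNN kernel code; the kernel name and signature are made up):

```c
// Illustrative OpenCL C kernel fragment, not the actual clDNN source.
// 'half' is only usable as a variable/pointer element type after this pragma,
// and only on devices that actually report cl_khr_fp16 support.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void eq_fp16(__global const half* a,
                      __global const half* b,
                      __global uchar* out) {
    const size_t i = get_global_id(0);
    out[i] = (a[i] == b[i]) ? 1 : 0;
}
```

On devices without cl_khr_fp16 (which matches the engineInfo.supports_fp16 == false case above), the pragma itself fails, which is consistent with the emulation path being selected instead.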