
Comments (8)

maxnick commented on May 24, 2024

> The first observation is that there are some wrong values in the ymm register for the input values of the Equal op: `m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}`, whereas the weights have the correct values: `m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}`.
>
> Maybe the reason is the conversion op. But they both go through a conversion node...
>
> UPD. So, in the case of the CPU, the weights go as constant nodes and are loaded correctly, but the conversion node for the input uses `static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound));`, where `ubound` and `lbound` are `numeric_limits::max` and `numeric_limits::lowest` respectively. These are positive and negative finite values, so the conversion clamps infinities to finite values.
>
> This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating-point types, then a plain `static_cast` can be used.

@DannyVlasenko, thank you for discovering the root cause of the test failure when it's run on the CPU plugin. I agree that such clamping doesn't really reproduce the intermediate conversion behavior for floating-point types. Thus, at least for the template specializations for the ov::float16 type, it makes sense to remove this clamping, just as has been done for bfloat16.
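A minimal sketch of the shape such a change could take (illustrative names only, not the actual OpenVINO conversion code; since `ov::float16` is a class type rather than a built-in one, the real fix would need an explicit specialization instead of the `std::is_floating_point_v` test used here):

```cpp
#include <algorithm>
#include <limits>
#include <type_traits>

// Hypothetical conversion helper: clamp only when the destination type is
// integral; for floating-point destinations a plain static_cast preserves
// inf/-inf/nan instead of pinning them to the finite range.
template <typename dst_t, typename src_t>
dst_t convert_value(src_t v) {
    if constexpr (std::is_floating_point_v<dst_t>) {
        return static_cast<dst_t>(v);  // no clamping: inf and nan survive
    } else {
        const src_t lbound = static_cast<src_t>(std::numeric_limits<dst_t>::lowest());
        const src_t ubound = static_cast<src_t>(std::numeric_limits<dst_t>::max());
        return static_cast<dst_t>(std::max(std::min(v, ubound), lbound));
    }
}
```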


maxnick commented on May 24, 2024

> @maxnick Should I make a separate PR for the CPU case? The GPU and CPU cases seem to be unrelated, and I'm still thinking about how to solve the GPU case.

@DannyVlasenko, yes, definitely; there is no point in making a joint PR for GPU and CPU.


DannyVlasenko commented on May 24, 2024

.take


github-actions commented on May 24, 2024

Thank you for looking into this issue! Please let us know if you have any questions or require any help.


DannyVlasenko commented on May 24, 2024

The first observation is that there are some wrong values in the ymm register for the input values of the Equal op:
`m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}`
whereas the weights have the correct values:
`m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}`

Maybe the reason is the conversion op.
But they both go through a conversion node...

UPD.
So, in the case of the CPU, the weights go as constant nodes and are loaded correctly, but the conversion node for the input uses
`static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound));`
where `ubound` and `lbound` are `numeric_limits::max` and `numeric_limits::lowest` respectively. These are positive and negative finite values, so the conversion clamps infinities to finite values.

This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating-point types, then a plain `static_cast` can be used.
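A standalone sketch of the effect (plain floats, with 65504 standing in for `numeric_limits<ov::float16>::max()`): `std::min`/`std::max` pin the infinities to the finite bounds, while NaN happens to slip through because every comparison involving NaN is false — which matches the register dump above, where nan survived but inf turned into 65504.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    // Finite float16 range, standing in for numeric_limits<dst_t>::lowest()/max().
    const float lbound = -65504.0f;
    const float ubound = 65504.0f;

    for (float v : {NAN, INFINITY, -INFINITY}) {
        // The clamped conversion path described above.
        float clamped = std::max(std::min(v, ubound), lbound);
        std::printf("%f -> %f\n", v, clamped);
        // Prints: nan -> nan, inf -> 65504, -inf -> -65504.
    }
}
```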


p-durandin commented on May 24, 2024

@DannyVlasenko while you wait for a reply from the CPU side, have you checked the GPU behavior?


DannyVlasenko commented on May 24, 2024

> @DannyVlasenko while you wait for a reply from the CPU side, have you checked the GPU behavior?

Looks like I have a solution for the CPU, so I can make a separate PR for it, if that's OK.

For the GPU, the only finding so far is that the kernel selector sets the `EnableFP16Emulation` flag because `engineInfo.supports_fp16` is false. I suppose the `cl_khr_fp16` extension should be enabled as a result, but I cannot find any code further down the line that does so, or that even checks this flag. If I comment out `EnableFP16Emulation`, then OpenCL unsurprisingly reports that "declaring variable of type 'half' is not allowed". I'm currently investigating this; to be honest, I didn't do anything here last week.
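For reference, OpenCL C only permits declaring `half` variables (as opposed to `half` pointers accessed via `vload_half`/`vstore_half`) when `cl_khr_fp16` is enabled in the kernel source, and the device must list that extension in `CL_DEVICE_EXTENSIONS`. A minimal sketch of what an fp16-capable kernel would need (hypothetical kernel, not OpenVINO's actual Equal kernel):

```cpp
// Hypothetical OpenCL C kernel source embedded in C++; real host code would
// first check that the device reports cl_khr_fp16 before compiling this.
const char* equal_fp16_src = R"CLC(
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void equal_fp16(__global const half* a,
                         __global const half* b,
                         __global uchar* out) {
    size_t i = get_global_id(0);
    // Without the pragma above, this declaration fails with
    // "declaring variable of type 'half' is not allowed".
    half x = a[i];
    // IEEE semantics: nan != nan, inf == inf, so Equal only works if
    // the inputs weren't clamped to the finite range on the way in.
    out[i] = (x == b[i]) ? 1 : 0;
}
)CLC";
```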


DannyVlasenko commented on May 24, 2024

> @DannyVlasenko, thank you for discovering the root cause of the test failure when it's run on the CPU plugin. I agree that such clamping doesn't really reproduce the intermediate conversion behavior for floating-point types. Thus, at least for the template specializations for the ov::float16 type, it makes sense to remove this clamping, just as has been done for bfloat16.

@maxnick Should I make a separate PR for the CPU case? The GPU and CPU cases seem to be unrelated, and I'm still thinking about how to solve the GPU case.

