Comments (8)
The first observation is that there are some wrong values in the ymm register for the input values of the Equal op: m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}, vs the weights, which have correct values: m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}.
Maybe the reason is the conversion op. But they both go through a conversion node...
UPD. So, in the case of the CPU, the weights go in as constant nodes and are loaded correctly, but the conversion node for the input uses
static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound));
where ubound and lbound are numeric_limits::max and numeric_limits::lowest respectively, which are positive and negative finite values, so this conversion clamps infinities to finite values. This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating-point types, then just a static_cast can be used.
@DannyVlasenko , thank you for discovering the root cause of the test failure when it's run on the CPU plugin. I would agree that such clamping doesn't really reproduce the intermediate conversion behavior for floating-point types. Thus, at least for the template specializations for the ov::float16 type, it does make sense to remove this clamping, just like it's been done for bfloat16.
from openvino.
@maxnick Should I make a separate PR for the CPU case? The GPU and CPU cases seem to be unrelated, and I'm still thinking about how to solve the GPU case.
@DannyVlasenko , yes, definitely; there is no point in making a joint PR for GPU and CPU.
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
The first observation is that there are some wrong values in the ymm register for the input values of the Equal op:
m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}
vs weights that have correct values:
m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}
Maybe the reason is the conversion op. But they both go through a conversion node...
UPD.
So, in the case of the CPU, the weights go in as constant nodes and are loaded correctly, but the conversion node for the input uses
static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound));
where ubound and lbound are numeric_limits::max and numeric_limits::lowest respectively, which are positive and negative finite values, so this conversion clamps infinities to finite values.
This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating-point types, then just a static_cast can be used.
@DannyVlasenko while you wait for a reply from the CPU team, have you checked the GPU behavior?
Looks like I have a solution for the CPU, so I can make a separate PR for it, if that is ok.
For the GPU, the only findings so far are that the kernel selector sets the EnableFP16Emulation flag because engineInfo.supports_fp16 is false. I suppose the cl_khr_fp16 extension should be enabled as a result, but I cannot find any code that does so, or even checks for this flag, further down the line. If I comment out EnableFP16Emulation, then OpenCL says, as expected, that "declaring variable of type 'half' is not allowed". I'm currently investigating this; didn't do anything here last week tbh.
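For reference, an OpenCL C kernel can only declare variables of type half after enabling the cl_khr_fp16 extension in the kernel source. An illustrative fragment (not the actual clDNN kernel code; the kernel name and signature are made up):

```c
// Illustrative OpenCL C kernel fragment, not the actual clDNN source.
// 'half' is only usable as a variable/pointer element type after this pragma,
// and only on devices that actually report cl_khr_fp16 support.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void eq_fp16(__global const half* a,
                      __global const half* b,
                      __global uchar* out) {
    const size_t i = get_global_id(0);
    out[i] = (a[i] == b[i]) ? 1 : 0;
}
```

On devices without cl_khr_fp16 (which matches the engineInfo.supports_fp16 == false case above), the pragma itself fails, which is consistent with the emulation path being selected instead.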