Comments (8)
Any Intel SoC laptop should work, it typically has iGPU on it...
Hi @awayzjj, thank you for checking the issue!
Let me add more details. The issue is connected with an incorrect performance profiling report for the IF operation. It is expected that the IF operation aggregates the execution time of all operations contained in its inner body. However, as I can see, it reports the execution time only for the last primitive in the IF’s inner body.
Regarding your questions:
- I believe the original description meant not "the reported time is almost zero", but rather "the IF operator doesn’t display the real execution time of all underlying operations".
- For checking the correctness of performance profiling, I think it's better to use the benchmark_app with the “-hint latency” option. This will run only a single inference request, and in that case, we can expect that the reported “Total time” metric from per-layer profiling would be more or less close to the final latency and throughput results reported by benchmark_app.
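The expected aggregation described above can be illustrated with a short stdlib sketch; the primitive names and timings below are made up purely for illustration, not taken from a real profile:

```python
# Hypothetical realTime values (ms) for the primitives inside an If op's
# inner body; names and numbers are illustrative only.
inner_body_times = {"Add_1": 0.120, "MatMul_2": 0.450, "Relu_3": 0.030}

# Expected report: the If primitive aggregates its whole inner body.
expected_if_time = sum(inner_body_times.values())

# Observed (buggy) report: only the last primitive's time shows up.
reported_if_time = inner_body_times["Relu_3"]

print(f"expected: {expected_if_time:.3f} ms, reported: {reported_if_time:.3f} ms")
# expected: 0.600 ms, reported: 0.030 ms
```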
So, to debug this issue, you should use a simple model with an If operation. The model should contain at least two operations in its else/then body (the type of operations doesn't matter here, but it's better to use bigger shapes just to make the timings more visible).
@p-durandin Hi, I wonder if I can work on this issue without Intel GPU?
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
@awayzjj - since you've taken it - I assume you have an Intel GPU? :)
@yury-gorbachev Tomorrow I will try to find a computer with an Intel GPU, or I will give up this ticket.:)
@yury-gorbachev @p-durandin @sshlyapn I conducted benchmarking on a model with only one If operation as follows; however, I have 2 questions:
- What is meant by "the reported time is almost zero" in the issue context, given that the "Total time" marked with underscores in the provided picture is 566.967 milliseconds?
- Why do you think "Performance counter numbers work incorrectly"? What should be the correct relationship between them? In the example you provided, they are 566.967 ms vs. 4398.12 ms respectively, while in my example, they are 0.002 ms vs. 2.79 ms.
$ ./bin/intel64/Release/benchmark_app -m ./tmp/if.xml -d GPU -pc
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.2.0-15181-7ca4f07a3c4
[ INFO ]
[ INFO ] Device info:
[ INFO ] GPU
[ INFO ] Build ................................. 2024.2.0-15181-7ca4f07a3c4
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(GPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 3.36 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ] cond (node: cond) : boolean / [...] / []
[ INFO ] Network outputs:
[ INFO ] res (node: res) : f32 / [...] / [5]
[Step 5/11] Resizing model to match image sizes and given batch
[ WARNING ] cond: layout is not set explicitly. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ WARNING ] No batch dimension was found, asssuming batch to be 1. Beware: this might affect FPS calculation.
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ] cond (node: cond) : boolean / [...] / []
[ INFO ] Network outputs:
[ INFO ] res (node: res) : f32 / [...] / [5]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 599.08 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: test_if
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] PERF_COUNT: YES
[ INFO ] ENABLE_CPU_PINNING: NO
[ INFO ] MODEL_PRIORITY: MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: YES
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: NO
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: optimize_speed
[ INFO ] PERFORMANCE_HINT: THROUGHPUT
[ INFO ] EXECUTION_MODE_HINT: PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 8
[ INFO ] NUM_STREAMS: 2
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: f16
[ INFO ] DEVICE_ID: 0
[ INFO ] EXECUTION_DEVICES: GPU.0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] cond ([...], boolean, [], static): random (binary data/numpy array is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 0.35 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Performance counts for 1-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Performance counts for 2-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Performance counts for 3-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Execution Devices: [ GPU.0 ]
[ INFO ] Count: 686692 iterations
[ INFO ] Duration: 60000.73 ms
[ INFO ] Latency:
[ INFO ] Median: 0.38 ms
[ INFO ] Average: 0.34 ms
[ INFO ] Min: 0.22 ms
[ INFO ] Max: 2.79 ms
[ INFO ] Throughput: 11444.73 FPS
I would greatly appreciate any guidance or suggestions you can offer. Thank you!
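As a cross-check of the per-layer report above, the "Total time" can be compared against the sum of realTime over the listed primitives with a short stdlib sketch (the sample lines are copied verbatim from the `-pc` output of the 0-th infer request):

```python
import re

# Per-layer lines from the benchmark_app -pc report above.
report = """\
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
"""

# Sum realTime across all primitives; this should roughly match the
# reported "Total time", and an If primitive should in turn carry the
# aggregated time of its inner body rather than near-zero.
pattern = re.compile(r"realTime \(ms\): ([0-9.]+)")
total_real = sum(float(m.group(1)) for m in pattern.finditer(report))
print(f"summed realTime: {total_real:.3f} ms")  # summed realTime: 0.002 ms
```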