Comments (8)
Any Intel SoC laptop should work, it typically has iGPU on it...
Hi @awayzjj, thank you for checking the issue!
Let me add more details. The issue is connected with an incorrect performance profiling report for the IF operation. It is expected that the IF operation aggregates the execution time of all operations contained in its inner body. However, as I can see, it reports the execution time only for the last primitive in the IF’s inner body.
Regarding your questions:
- I believe the original description meant not "the reported time is almost zero", but rather "the IF operator doesn’t display the real execution time of all underlying operations".
- For checking the correctness of performance profiling, I think it's better to use the benchmark_app with the “-hint latency” option. This will run only a single inference request, and in that case, we can expect that the reported “Total time” metric from per-layer profiling would be more or less close to the final latency and throughput results reported by benchmark_app.
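The expected aggregation described above can be illustrated with a short stdlib sketch; the primitive names and timings below are made up purely for illustration, not taken from a real profile:

```python
# Hypothetical realTime values (ms) for the primitives inside an If op's
# inner body; names and numbers are illustrative only.
inner_body_times = {"Add_1": 0.120, "MatMul_2": 0.450, "Relu_3": 0.030}

# Expected report: the If primitive aggregates its whole inner body.
expected_if_time = sum(inner_body_times.values())

# Observed (buggy) report: only the last primitive's time shows up.
reported_if_time = inner_body_times["Relu_3"]

print(f"expected: {expected_if_time:.3f} ms, reported: {reported_if_time:.3f} ms")
# expected: 0.600 ms, reported: 0.030 ms
```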
So, to debug this issue, you should use a simple model with an If operation. The model should contain at least two operations in its else/then body (the type of operations doesn't matter here, but it's better to use bigger shapes just to make the timings more visible).
@p-durandin Hi, I wonder if I can work on this issue without Intel GPU?
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
@awayzjj - since you've taken it - I assume you have an Intel GPU? :)
@yury-gorbachev Tomorrow I will try to find a computer with an Intel GPU, or I will give up this ticket.:)
@yury-gorbachev @p-durandin @sshlyapn I conducted benchmarking on a model with only one If operation as follows; however, I have 2 questions:
- What is meant by "the reported time is almost zero" in the issue context, given that the "Total time" marked with underscores in the provided picture is 566.967 milliseconds?
- Why do you think "Performance counter numbers work incorrectly"? What should be the correct relationship between them? In the example you provided, they are 566.967 ms vs. 4398.12 ms respectively, while in my example, they are 0.002 ms vs. 2.79 ms.
$ ./bin/intel64/Release/benchmark_app -m ./tmp/if.xml -d GPU -pc
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.2.0-15181-7ca4f07a3c4
[ INFO ]
[ INFO ] Device info:
[ INFO ] GPU
[ INFO ] Build ................................. 2024.2.0-15181-7ca4f07a3c4
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(GPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 3.36 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ] cond (node: cond) : boolean / [...] / []
[ INFO ] Network outputs:
[ INFO ] res (node: res) : f32 / [...] / [5]
[Step 5/11] Resizing model to match image sizes and given batch
[ WARNING ] cond: layout is not set explicitly. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ WARNING ] No batch dimension was found, asssuming batch to be 1. Beware: this might affect FPS calculation.
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ] cond (node: cond) : boolean / [...] / []
[ INFO ] Network outputs:
[ INFO ] res (node: res) : f32 / [...] / [5]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 599.08 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: test_if
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] PERF_COUNT: YES
[ INFO ] ENABLE_CPU_PINNING: NO
[ INFO ] MODEL_PRIORITY: MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: YES
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: NO
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: optimize_speed
[ INFO ] PERFORMANCE_HINT: THROUGHPUT
[ INFO ] EXECUTION_MODE_HINT: PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 8
[ INFO ] NUM_STREAMS: 2
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: f16
[ INFO ] DEVICE_ID: 0
[ INFO ] EXECUTION_DEVICES: GPU.0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] cond ([...], boolean, [], static): random (binary data/numpy array is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 0.35 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Performance counts for 1-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Performance counts for 2-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Performance counts for 3-th infer request:
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
Total time: 0.002 milliseconds
Total CPU time: 0.009 milliseconds
Full device name: Intel(R) UHD Graphics (iGPU)
[ INFO ] Execution Devices: [ GPU.0 ]
[ INFO ] Count: 686692 iterations
[ INFO ] Duration: 60000.73 ms
[ INFO ] Latency:
[ INFO ] Median: 0.38 ms
[ INFO ] Average: 0.34 ms
[ INFO ] Min: 0.22 ms
[ INFO ] Max: 2.79 ms
[ INFO ] Throughput: 11444.73 FPS
I would greatly appreciate any guidance or suggestions you can offer. Thank you!
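As a cross-check of the per-layer report above, the "Total time" can be compared against the sum of realTime over the listed primitives with a short stdlib sketch (the sample lines are copied verbatim from the `-pc` output of the 0-th infer request):

```python
import re

# Per-layer lines from the benchmark_app -pc report above.
report = """\
Convert_2253 OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
If_36 EXECUTED layerType: If execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
cond EXECUTED layerType: Parameter execType: wait_for_events__u8 realTime (ms): 0.000 cpuTime (ms): 0.002
res OPTIMIZED_OUT layerType: Convert execType: undef realTime (ms): 0.000 cpuTime (ms): 0.000
res/sink_port_0 EXECUTED layerType: Result execType: reorder_data__f32 realTime (ms): 0.002 cpuTime (ms): 0.007
"""

# Sum realTime across all primitives; this should roughly match the
# reported "Total time", and an If primitive should in turn carry the
# aggregated time of its inner body rather than near-zero.
pattern = re.compile(r"realTime \(ms\): ([0-9.]+)")
total_real = sum(float(m.group(1)) for m in pattern.finditer(report))
print(f"summed realTime: {total_real:.3f} ms")  # summed realTime: 0.002 ms
```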