Comments (8)

yury-gorbachev commented on May 25, 2024

@yury-gorbachev Tomorrow I will try to find a computer with an Intel GPU, or I will give up this ticket. :)

Any Intel SoC laptop should work; it typically has an iGPU on it...

from openvino.

sshlyapn commented on May 25, 2024

Hi @awayzjj , thank you for checking the issue!

Let me add more details. The issue is connected with an incorrect performance profiling report for the IF operation. The IF operation is expected to aggregate the execution time of all operations contained in its inner body. However, as far as I can see, it reports the execution time only for the last primitive in the IF’s inner body.
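The difference between the expected and the observed report can be sketched as follows. This is only an illustration of the arithmetic: the primitive names and timings are invented, and the two helper functions are hypothetical, not part of OpenVINO.

```python
# Hypothetical GPU timings (ms) for the primitives inside one If body.
# These values are invented for illustration; they are not measurements.
inner_body = [("multiply", 1.20), ("relu", 0.45), ("add", 0.30)]


def expected_if_time(body):
    """Expected behavior: the If node reports the sum over its inner body."""
    return sum(ms for _, ms in body)


def last_primitive_time(body):
    """Behavior described above: only the last primitive's time is reported."""
    return body[-1][1]


expected_ms = expected_if_time(inner_body)
observed_ms = last_primitive_time(inner_body)
print(f"expected: {expected_ms:.2f} ms, reported: {observed_ms:.2f} ms")
```

With more than one operation in the body, the two numbers diverge, which is why a body with at least two operations is needed to see the problem.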

Regarding your questions:

  1. I believe the original description meant not "the reported time is almost zero", but rather "the IF operator doesn’t display the real execution time of all underlying operations".
  2. For checking the correctness of performance profiling, I think it's better to use the benchmark_app with the “-hint latency” option. This will run only a single inference request, and in that case, we can expect that the reported “Total time” metric from per-layer profiling would be more or less close to the final latency and throughput results reported by benchmark_app.
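One way to make the comparison from point 2 less manual is a small script that sums the per-layer realTime values and puts them next to the reported "Total time" and the latency. The sketch below assumes the per-layer line format shown in the benchmark_app output in this thread; the regular expressions and the parse_perf_counts helper are my own and not part of benchmark_app. The sample lines and the 0.38 ms median are copied from that run, which used the THROUGHPUT hint, so treat the latency comparison as illustrative only.

```python
import re

# One per-layer line of `benchmark_app -pc` output, in the format seen in this thread.
LAYER_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<status>\S+)\s+layerType:\s+(?P<type>\S+)\s+"
    r"execType:\s+(?P<exec>\S+)\s+realTime \(ms\):\s+(?P<real>[\d.]+)\s+"
    r"cpuTime \(ms\):\s+(?P<cpu>[\d.]+)\s*$"
)
TOTAL_RE = re.compile(r"^Total time:\s+(?P<total>[\d.]+) milliseconds")


def parse_perf_counts(text):
    """Return (layers, reported_total_ms) for the first infer request in text."""
    layers, total = [], None
    for raw in text.splitlines():
        line = raw.strip()
        m = LAYER_RE.match(line)
        if m:
            layers.append({
                "name": m.group("name"),
                "status": m.group("status"),
                "type": m.group("type"),
                "real_ms": float(m.group("real")),
                "cpu_ms": float(m.group("cpu")),
            })
            continue
        m = TOTAL_RE.match(line)
        if m:
            total = float(m.group("total"))
            break  # stop after the first request's summary
    return layers, total


SAMPLE = """\
Convert_2253         OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
If_36                EXECUTED             layerType: If                   execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
cond                 EXECUTED             layerType: Parameter            execType: wait_for_events__u8  realTime (ms): 0.000      cpuTime (ms): 0.002
res                  OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
res/sink_port_0      EXECUTED             layerType: Result               execType: reorder_data__f32    realTime (ms): 0.002      cpuTime (ms): 0.007
Total time:              0.002 milliseconds
Total CPU time:          0.009 milliseconds
"""

layers, reported_total = parse_perf_counts(SAMPLE)
summed = sum(l["real_ms"] for l in layers if l["status"] == "EXECUTED")
MEDIAN_LATENCY_MS = 0.38  # median latency from the same run in this thread

print(f"sum of realTime: {summed:.3f} ms, reported total: {reported_total:.3f} ms")
print(f"median latency:  {MEDIAN_LATENCY_MS} ms")
```

A large gap between the summed per-layer time and the latency, together with a zero realTime on the If node, is the symptom being discussed here.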

So, to debug this issue, you should use a simple model with an IF operation. The model should contain at least two operations in its then/else body (the type of operations doesn’t matter here, but it's better to use bigger shapes to make the difference more visible).

awayzjj commented on May 25, 2024

@p-durandin Hi, I wonder if I can work on this issue without an Intel GPU?

awayzjj commented on May 25, 2024

.take

github-actions commented on May 25, 2024

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

yury-gorbachev commented on May 25, 2024

@awayzjj - since you have taken it - I assume you have an Intel GPU? :)

awayzjj commented on May 25, 2024

@yury-gorbachev Tomorrow I will try to find a computer with an Intel GPU, or I will give up this ticket. :)

awayzjj commented on May 25, 2024

@yury-gorbachev @p-durandin @sshlyapn I benchmarked a model with only one If operation, as shown below. However, I have two questions:

  1. What does "the reported time is almost zero" mean in the context of the issue, given that the Total time marked with underscores in the provided picture is 566.967 milliseconds?

  2. Why do you think "Performance counter numbers work incorrectly"? What should the correct relationship between the two numbers be? In the example you provided, they are 566.967 ms vs. 4398.12 ms, while in my example they are 0.002 ms vs. 2.79 ms.

$ ./bin/intel64/Release/benchmark_app -m ./tmp/if.xml -d GPU -pc
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.2.0-15181-7ca4f07a3c4
[ INFO ]
[ INFO ] Device info:
[ INFO ] GPU
[ INFO ] Build ................................. 2024.2.0-15181-7ca4f07a3c4
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(GPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 3.36 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ]     cond (node: cond) : boolean / [...] / []
[ INFO ] Network outputs:
[ INFO ]     res (node: res) : f32 / [...] / [5]
[Step 5/11] Resizing model to match image sizes and given batch
[ WARNING ] cond: layout is not set explicitly. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ WARNING ] No batch dimension was found, asssuming batch to be 1. Beware: this might affect FPS calculation.
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ]     cond (node: cond) : boolean / [...] / []
[ INFO ] Network outputs:
[ INFO ]     res (node: res) : f32 / [...] / [5]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 599.08 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: test_if
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ]   PERF_COUNT: YES
[ INFO ]   ENABLE_CPU_PINNING: NO
[ INFO ]   MODEL_PRIORITY: MEDIUM
[ INFO ]   GPU_HOST_TASK_PRIORITY: MEDIUM
[ INFO ]   GPU_QUEUE_PRIORITY: MEDIUM
[ INFO ]   GPU_QUEUE_THROTTLE: MEDIUM
[ INFO ]   GPU_ENABLE_LOOP_UNROLLING: YES
[ INFO ]   GPU_DISABLE_WINOGRAD_CONVOLUTION: NO
[ INFO ]   CACHE_DIR:
[ INFO ]   CACHE_MODE: optimize_speed
[ INFO ]   PERFORMANCE_HINT: THROUGHPUT
[ INFO ]   EXECUTION_MODE_HINT: PERFORMANCE
[ INFO ]   COMPILATION_NUM_THREADS: 8
[ INFO ]   NUM_STREAMS: 2
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]   INFERENCE_PRECISION_HINT: f16
[ INFO ]   DEVICE_ID: 0
[ INFO ]   EXECUTION_DEVICES: GPU.0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] cond  ([...], boolean, [], static):	random (binary data/numpy array is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 0.35 ms

[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request:
Convert_2253         OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
If_36                EXECUTED             layerType: If                   execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
cond                 EXECUTED             layerType: Parameter            execType: wait_for_events__u8  realTime (ms): 0.000      cpuTime (ms): 0.002
res                  OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
res/sink_port_0      EXECUTED             layerType: Result               execType: reorder_data__f32    realTime (ms): 0.002      cpuTime (ms): 0.007
Total time:              0.002 milliseconds
Total CPU time:          0.009 milliseconds

Full device name: Intel(R) UHD Graphics (iGPU)

[ INFO ] Performance counts for 1-th infer request:
Convert_2253         OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
If_36                EXECUTED             layerType: If                   execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
cond                 EXECUTED             layerType: Parameter            execType: wait_for_events__u8  realTime (ms): 0.000      cpuTime (ms): 0.002
res                  OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
res/sink_port_0      EXECUTED             layerType: Result               execType: reorder_data__f32    realTime (ms): 0.002      cpuTime (ms): 0.007
Total time:              0.002 milliseconds
Total CPU time:          0.009 milliseconds

Full device name: Intel(R) UHD Graphics (iGPU)

[ INFO ] Performance counts for 2-th infer request:
Convert_2253         OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
If_36                EXECUTED             layerType: If                   execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
cond                 EXECUTED             layerType: Parameter            execType: wait_for_events__u8  realTime (ms): 0.000      cpuTime (ms): 0.002
res                  OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
res/sink_port_0      EXECUTED             layerType: Result               execType: reorder_data__f32    realTime (ms): 0.002      cpuTime (ms): 0.007
Total time:              0.002 milliseconds
Total CPU time:          0.009 milliseconds

Full device name: Intel(R) UHD Graphics (iGPU)

[ INFO ] Performance counts for 3-th infer request:
Convert_2253         OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
If_36                EXECUTED             layerType: If                   execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
cond                 EXECUTED             layerType: Parameter            execType: wait_for_events__u8  realTime (ms): 0.000      cpuTime (ms): 0.002
res                  OPTIMIZED_OUT        layerType: Convert              execType: undef                realTime (ms): 0.000      cpuTime (ms): 0.000
res/sink_port_0      EXECUTED             layerType: Result               execType: reorder_data__f32    realTime (ms): 0.002      cpuTime (ms): 0.007
Total time:              0.002 milliseconds
Total CPU time:          0.009 milliseconds

Full device name: Intel(R) UHD Graphics (iGPU)

[ INFO ] Execution Devices: [ GPU.0 ]
[ INFO ] Count:               686692 iterations
[ INFO ] Duration:            60000.73 ms
[ INFO ] Latency:
[ INFO ]    Median:           0.38 ms
[ INFO ]    Average:          0.34 ms
[ INFO ]    Min:              0.22 ms
[ INFO ]    Max:              2.79 ms
[ INFO ] Throughput:          11444.73 FPS

I would greatly appreciate any guidance or suggestions you can offer. Thank you!
