Comments (9)

bstollenvidia commented on July 17, 2024

Could you please elaborate in what case we will see a fractional number?

In all cases. I would suggest trying out the dcgmproftester12 tool that's installed with DCGM to understand how workloads relate to profiling metrics.
See docs here: CUDA Test Generator (dcgmproftester)

And also how to interpret it?

You can think of it as a percentage, except that it ranges from 0 to 1.0 instead of 0 to 100. Multiply it by 100 if you want a percentage.
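A minimal Python sketch of that conversion, averaging a few made-up samples of a profiling field such as DCGM_FI_PROF_GR_ENGINE_ACTIVE. The sample values and the helper name `ratio_to_percent` are hypothetical and not part of DCGM:

```python
from statistics import mean


def ratio_to_percent(ratio: float) -> float:
    """Convert a 0.0-1.0 profiling ratio into a 0-100 percentage."""
    return ratio * 100.0


# Hypothetical DCGM_FI_PROF_GR_ENGINE_ACTIVE samples, one per polling interval.
samples = [0.25, 0.5, 0.75]
print(f"average: {ratio_to_percent(mean(samples)):.1f}%")  # average: 50.0%
```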

bstollenvidia commented on July 17, 2024

DCGM_FI_DEV_GPU_UTIL is roughly equal to DCGM_FI_PROF_GR_ENGINE_ACTIVE. DCGM_FI_PROF_GR_ENGINE_ACTIVE has higher precision and also works on MIG.

DCGM_FI_DEV_MEM_COPY_UTIL is the utilization of the GPU's copy engine. I would shy away from it, as it may not capture all memory bandwidth: CUDA memory copies sometimes use CUDA kernels rather than the copy engine, and those copies would not be picked up by this metric. This metric also does not work on MIG.

DCGM_FI_PROF_DRAM_ACTIVE is DRAM bandwidth relative to the theoretical maximum. This metric is accurate and captures all transfers to and from the GPU's DRAM.

"Memory utilization" is ambiguous. Do you mean bandwidth or allocation? For bandwidth, use DCGM_FI_PROF_DRAM_ACTIVE. For allocation, use DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_FREE, and DCGM_FI_DEV_FB_TOTAL.
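To make the bandwidth-vs-allocation distinction concrete, here is a small Python sketch that computes both from one hypothetical set of sampled values. It assumes the FB fields share the same unit (MiB is assumed here) and uses used/total for allocation rather than total minus free, since the two need not add up exactly. The `MemorySample` type and the numbers are illustrative only:

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    dram_active: float  # DCGM_FI_PROF_DRAM_ACTIVE, ratio 0.0-1.0
    fb_used: int        # DCGM_FI_DEV_FB_USED (unit assumed to be MiB)
    fb_total: int       # DCGM_FI_DEV_FB_TOTAL (same unit as fb_used)


def bandwidth_percent(s: MemorySample) -> float:
    """Memory bandwidth: DRAM throughput relative to the theoretical peak."""
    return s.dram_active * 100.0


def allocation_percent(s: MemorySample) -> float:
    """Memory allocation: share of the framebuffer currently in use."""
    return 100.0 * s.fb_used / s.fb_total


# Hypothetical sample: half the framebuffer is allocated, DRAM is 30% busy.
s = MemorySample(dram_active=0.3, fb_used=20480, fb_total=40960)
print(f"bandwidth {bandwidth_percent(s):.0f}%, allocation {allocation_percent(s):.0f}%")
```

The two numbers answer different questions: a job can hold most of the framebuffer while barely touching DRAM bandwidth, or the reverse.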

Also, the DCGM documentation says not all fields can be queried in parallel.

This is no longer true. We will remove this from our documentation.

More specifically, I wanted to know whether DCGM_FI_DEV_GPU_UTIL can be assigned to any group or whether it needs to be in its own group.

DCGM_FI_DEV_GPU_UTIL can be in any group. The same is true for all other fieldIds.

starry91 commented on July 17, 2024

Thanks @bstollenvidia! Could you please help me with the following questions as well?

  1. Is PROF_SM_OCCUPANCY the right field for understanding effective SM usage, i.e., whether all the cores in an SM are being utilized? (I do understand this could be limited by SM shared/L1 memory and registers.)
  2. Is PROF_SM_ACTIVE * PROF_SM_OCCUPANCY * Total_cores indicative of the number of cores used (as a rough estimate)?
  3. Is there a way I can get DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_FREE, and DCGM_FI_DEV_FB_TOTAL for MIG instances as well? Is this planned for future releases? The documentation says only the *_PROF_* fields are available for MIG.
  4. W.r.t. PROF_SM_OCCUPANCY: say a GPU has 128 cores per SM. Does that mean its maximum number of concurrent warps is 128/32 = 4?

bstollenvidia commented on July 17, 2024

Think of them as 3 dimensions of utilization:
PROF_GR_ENGINE_ACTIVE - Is any kernel running on any SM?
PROF_SM_ACTIVE - What fraction of SMs are active?
PROF_SM_OCCUPANCY - How many warps are resident vs the theoretical maximum (2048 threads, i.e. 64 warps, per SM on GPUs such as A100).

A rough estimate of the number of SMs in use is PROF_SM_ACTIVE * numSMs.
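The estimate above as a tiny Python helper. The 108-SM figure matches an A100 and the 0.25 reading is made up; `estimate_active_sms` is a hypothetical name. Because PROF_SM_ACTIVE is averaged over the sampling window, the result is an average count, not an instantaneous one: 0.25 could mean a quarter of the SMs busy the whole time or all SMs busy a quarter of the time.

```python
def estimate_active_sms(sm_active: float, num_sms: int) -> float:
    """Rough, time-averaged number of SMs in use: PROF_SM_ACTIVE * numSMs."""
    return sm_active * num_sms


# Hypothetical reading on a 108-SM GPU (e.g. an A100).
print(estimate_active_sms(0.25, 108))  # 27.0
```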

W.r.t. PROF_SM_OCCUPANCY: say a GPU has 128 cores per SM. Does that mean its maximum number of concurrent warps is 128/32 = 4?

There aren't cores per SM. I would read up on the CUDA programming model here:
https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
or here:
https://en.wikipedia.org/wiki/Thread_block_(CUDA_programming)
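A short Python sketch of where the warp limit comes from: it is set by the maximum number of resident threads per SM divided by the 32-thread warp size, not by a core count. It assumes 2048 resident threads per SM (the A100 value mentioned above); other architectures differ, and the real value can be read from `cudaDeviceProp::maxThreadsPerMultiProcessor`. The helper names are hypothetical:

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def max_warps_per_sm(max_threads_per_sm: int) -> int:
    """Maximum resident warps per SM, derived from the thread limit (not cores)."""
    return max_threads_per_sm // WARP_SIZE


def resident_warps(sm_occupancy: float, max_threads_per_sm: int) -> float:
    """Average resident warps per SM implied by a PROF_SM_OCCUPANCY ratio."""
    return sm_occupancy * max_warps_per_sm(max_threads_per_sm)


print(max_warps_per_sm(2048))     # 64
print(resident_warps(0.5, 2048))  # 32.0
```

So for a 2048-thread SM, the ceiling is 64 warps, and an occupancy of 0.5 corresponds to about 32 resident warps per SM on average.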

The FB metrics are available at the MIG level. See how GPU instances are handled in the DCGM source:

case DCGM_FE_GPU_I: // Fall through

Make sure you're using our latest release (DCGM 3.1.3) from here https://developer.nvidia.com/dcgm#Downloads

starry91 commented on July 17, 2024

@bstollenvidia The image in https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/ does show an SM containing multiple cores. Am I understanding it the wrong way?
[Image from the linked blog post not preserved]

Another reference says the same thing.

starry91 commented on July 17, 2024

PROF_GR_ENGINE_ACTIVE - Is any kernel running on any SM?

@bstollenvidia Does this mean it can only take a discrete set of values, either 0 or 1? 0 if nothing is running, 1 if the GPU is being used in any way (even if only 1 core or the device memory is being used).
Also, does the value include memory utilization as well, i.e., would it be 1 even if the compute cores are not being used but data is being transferred to device memory?

starry91 commented on July 17, 2024

@bstollenvidia Could you please look into the questions posted in the above 2 comments?

bstollenvidia commented on July 17, 2024

SMs don't have independent cores. They can run CUDA threads in parallel, but every thread in a warp executes the same instruction each cycle. When threads in a warp take different branches, the warp executes each branch path in turn, with the threads not on that path masked off.
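A toy Python model of branch divergence inside one 32-thread warp, assuming the classic SIMT behavior in which the warp issues each taken path of an if/else in turn with the other lanes masked off. It is a hypothetical illustration that only counts instruction issues and ignores details such as independent thread scheduling on newer architectures:

```python
WARP_SIZE = 32


def simt_issue_count(conds: list[bool], if_cost: int, else_cost: int) -> int:
    """Toy model: instructions issued by one warp for an if/else.

    All lanes share one instruction stream, so if the lanes disagree on the
    condition, the warp runs both paths back to back with inactive lanes masked.
    """
    assert len(conds) == WARP_SIZE
    issues = 0
    if any(conds):       # at least one lane takes the 'if' path
        issues += if_cost
    if not all(conds):   # at least one lane takes the 'else' path
        issues += else_cost
    return issues


uniform = [True] * WARP_SIZE
divergent = [lane % 2 == 0 for lane in range(WARP_SIZE)]
print(simt_issue_count(uniform, 10, 10))    # 10
print(simt_issue_count(divergent, 10, 10))  # 20
```

A uniform warp pays for one path; a divergent warp pays for both, even though each thread only does the work of one.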

Does this mean it can only take a discrete set of values, either 0 or 1? 0 if nothing is running, 1 if the GPU is being used in any way (even if only 1 core or the device memory is being used).

No. The values range from 0 to 1. It could be 0.522922, for example.
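One way to see how a fractional value such as 0.522922 arises: assuming, as the DCGM profiling documentation describes, that PROF_GR_ENGINE_ACTIVE is the fraction of the sampling window during which the engine was busy, a sketch with a hypothetical one-second window and made-up kernel intervals looks like this (`engine_active_ratio` is an illustrative name, not a DCGM function):

```python
def engine_active_ratio(busy_intervals_ms: list[tuple[float, float]], window_ms: float) -> float:
    """Fraction of a sampling window during which the engine was busy.

    busy_intervals_ms holds (start, end) times in ms, assumed non-overlapping
    and inside the window.
    """
    busy = sum(end - start for start, end in busy_intervals_ms)
    return busy / window_ms


# Hypothetical 1000 ms window: two kernels ran for 300 ms and 222.922 ms.
ratio = engine_active_ratio([(0.0, 300.0), (500.0, 722.922)], 1000.0)
print(round(ratio, 6))  # 0.522922
```

In other words, the engine was busy for about 52% of the window; multiply by 100 to read it as a percentage.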

Also, does the value include memory utilization as well, i.e., would it be 1 even if the compute cores are not being used but data is being transferred to device memory?

Yes. Waiting on memory transfers counts as busy. Technically, the SM is busy but blocked on a DRAM transfer.

starry91 commented on July 17, 2024

No. The values range from 0 to 1. It could be 0.522922, for example.

@bstollenvidia Could you please elaborate in what case we will see a fractional number? And also how to interpret it?
