Comments (9)
Could you please elaborate in what case we will see a fractional number?
In all cases. I would suggest trying out the dcgmproftester12 tool that's installed with DCGM to understand how workloads relate to profiling metrics.
See docs here: CUDA Test Generator (dcgmproftester)
And also how to interpret it?
You can think of it as a percentage, but instead of ranging from 0 to 100 it ranges from 0 to 1.0. You can multiply it by 100 if you want a percentage.
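As a concrete sketch of that interpretation (the sample values below are hypothetical, not queried from DCGM):

```python
# Hypothetical sample values, as DCGM reports them (ratios from 0.0 to 1.0).
samples = {
    "DCGM_FI_PROF_GR_ENGINE_ACTIVE": 0.522922,
    "DCGM_FI_PROF_SM_ACTIVE": 0.87,
}

# Multiply by 100 to express the same ratios as percentages.
percentages = {name: value * 100.0 for name, value in samples.items()}
```

So a reported 0.522922 reads as roughly 52.3% activity over the sampling window.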
from dcgm.
DCGM_FI_DEV_GPU_UTIL is roughly equal to DCGM_FI_PROF_GR_ENGINE_ACTIVE. DCGM_FI_PROF_GR_ENGINE_ACTIVE is higher precision and works on MIG.
DCGM_FI_DEV_MEM_COPY_UTIL is utilization of the copy engine of the GPU. I would shy away from it as it may not capture all memory bandwidth. Sometimes cuda mem copies use cuda kernels rather than the copy engine, which would not be picked up by this metric. Also, this metric does not work on MIG.
DCGM_FI_PROF_DRAM_ACTIVE is dram bandwidth vs theoretical maximum. This metric is accurate and captures all transfers to and from the GPU's DRAM.
"memory utilization" is ambiguous. Do you mean bandwidth or allocation? For bandwidth, use DCGM_FI_PROF_DRAM_ACTIVE. For allocation, use DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_FREE, and DCGM_FI_DEV_FB_TOTAL.
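A sketch of the bandwidth-vs-allocation distinction; the peak bandwidth and sample values here are hypothetical stand-ins, not live DCGM queries:

```python
# Bandwidth: DCGM_FI_PROF_DRAM_ACTIVE is a 0.0-1.0 ratio of theoretical peak.
dram_active = 0.42            # hypothetical sample
peak_bw_gbs = 2039.0          # e.g. A100 80GB SXM theoretical DRAM bandwidth (GB/s)
achieved_bw_gbs = dram_active * peak_bw_gbs

# Allocation: the FB (framebuffer) fields are reported in MiB.
fb_used_mib = 40960.0         # hypothetical DCGM_FI_DEV_FB_USED sample
fb_total_mib = 81920.0        # hypothetical DCGM_FI_DEV_FB_TOTAL sample
fb_used_fraction = fb_used_mib / fb_total_mib
```

The two answers can diverge arbitrarily: a GPU can have nearly all of its memory allocated while moving almost no data, and vice versa.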
Also, the DCGM documentation says not all fields can be queried in parallel.
This is no longer true. We will remove this from our documentation.
More specifically I wanted to know if the DCGM_FI_DEV_GPU_UTIL can be allotted to any group or does it need to be part of its own group?
DCGM_FI_DEV_GPU_UTIL can be in any group. The same is true for any other fieldIds.
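A minimal sketch of mixing device-level and profiling fields in one watch list; the numeric IDs below are the published values from dcgm_fields.h, and the grouping itself is illustrative rather than a live DCGM call:

```python
# Published DCGM field IDs (from dcgm_fields.h).
DCGM_FI_DEV_GPU_UTIL = 203
DCGM_FI_DEV_FB_USED = 252
DCGM_FI_PROF_GR_ENGINE_ACTIVE = 1001
DCGM_FI_PROF_SM_OCCUPANCY = 1003

# One field group can freely mix DEV and PROF fields; no field needs
# a group of its own.
mixed_field_group = [
    DCGM_FI_DEV_GPU_UTIL,
    DCGM_FI_DEV_FB_USED,
    DCGM_FI_PROF_GR_ENGINE_ACTIVE,
    DCGM_FI_PROF_SM_OCCUPANCY,
]
```

The equivalent one-shot query on the command line would be along the lines of `dcgmi dmon -e 203,252,1001,1003`, which watches all four fields in a single request.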
from dcgm.
Thanks @bstollenvidia! Could you please help me with the following questions as well?
- Is PROF_SM_OCCUPANCY the right field to understand effective SM usage? i.e., are all the cores in an SM being utilized? (I do understand this could be limited by SM shared/L1 memory and registers.)
- Is PROF_SM_ACTIVE * PROF_SM_OCCUPANCY * Total_cores indicative of the number of cores used (rough estimate)?
- Is there a way I can get DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_FREE, and DCGM_FI_DEV_FB_TOTAL for MIG instances as well? Is this something being planned for future releases? The documentation says only the *_PROF_* fields are available for MIG.
- w.r.t. PROF_SM_OCCUPANCY: say a GPU has 128 cores/SM, does that mean its maximum number of concurrent warps is 128/32 = 4?
from dcgm.
Think of them as 3 dimensions of utilization:
PROF_GR_ENGINE_ACTIVE - Is any kernel running on any SM?
PROF_SM_ACTIVE - What ratio of SMs are active?
PROF_SM_OCCUPANCY - How many warps are resident vs the theoretical maximum (2048 resident threads, i.e. 64 warps, per SM).
Rough estimate of SMs used would be PROF_SM_ACTIVE * numSMs.
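Putting those dimensions together numerically (the SM count matches an A100, and the metric samples are hypothetical):

```python
num_sms = 108                 # e.g. an A100 has 108 SMs
max_warps_per_sm = 64         # 2048 resident threads / 32 threads per warp

sm_active = 0.50              # hypothetical PROF_SM_ACTIVE sample
sm_occupancy = 0.25           # hypothetical PROF_SM_OCCUPANCY sample

# Rough estimate of SMs with at least one warp in flight.
est_sms_used = sm_active * num_sms

# Rough estimate of resident warps across the whole GPU.
est_warps = sm_occupancy * num_sms * max_warps_per_sm
```

With these samples, roughly 54 SMs are active and about 1728 warps are resident; both are coarse averages over the sampling window, not instantaneous counts.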
w.r.t PROF_SM_OCCUPANCY Say a GPU has 128 cores/SM, does that mean its maximum number of concurrent warps is 128/32 = 4?
There aren't cores per SM. I would read up on the cuda programming model here:
https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
or here:
https://en.wikipedia.org/wiki/Thread_block_(CUDA_programming)
The FB metrics are available at the MIG level. See DCGM/dcgmlib/src/DcgmCacheManager.cpp, line 10585 (commit 661d939).
Make sure you're using our latest release (DCGM 3.1.3) from here https://developer.nvidia.com/dcgm#Downloads
from dcgm.
@bstollenvidia The image in https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/ does show an SM with multiple cores. Am I misunderstanding it?
Another reference conveys the same meaning.
from dcgm.
PROF_GR_ENGINE_ACTIVE - Is any kernel running on any SM?
@bstollenvidia Does this mean this can only have a discrete set of values, either 0 or 1? 0 if nothing is running, 1 if the GPU is being used in any way (even if only one core or device memory is being used).
Also, is the value inclusive of memory utilization as well? i.e., would it be 1 even if the compute cores are not being used but data is being transferred to device memory?
from dcgm.
@bstollenvidia Could you please look into the questions posted in the above 2 comments?
from dcgm.
SMs don't have independent cores. They can do cuda threads in parallel but every thread is doing exactly the same instruction per cycle. They re-run kernels for any branches taken.
Does this mean this can only have a discrete set of values, either 0 or 1? 0 if nothing is running, 1 if the GPU is being used in any way (even if only one core or device memory is being used).
No. The values are from 0 - 1. Could be 0.522922 or something.
Also, is the value inclusive of memory utilization as well, i.e, it would be 1 even if the compute core is not being used but data is being transferred to the device memory?
Yes. Waiting on memory transfers = busy. Technically, the SM is busy but blocked on a dram transfer.
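So a fractional value is simply a duty cycle over the sampling window; a sketch with hypothetical cycle counts:

```python
# GR_ENGINE_ACTIVE averages over the sampling window: the fraction of
# cycles during which the graphics engine had any work in flight.
active_cycles = 522_922        # hypothetical cycles with work in flight
total_cycles = 1_000_000       # hypothetical cycles in the sampling window
gr_engine_active = active_cycles / total_cycles
```

A value like 0.522922 therefore means the engine was busy (including busy-but-blocked on DRAM) for about 52% of the window, not that half the cores were used.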
from dcgm.
No. The values are from 0 - 1. Could be 0.522922 or something.
@bstollenvidia Could you please elaborate in what case we will see a fractional number? And also how to interpret it?
from dcgm.