Update Frequency - how often DCGM reads the hardware counters (i.e., samples the metric values).
Max Keep Age - how long a timestamped metric value is kept in the internal cache. This matters because there is an API to fetch all metric values since a given timestamp in the past, not just the latest value.
Max Keep Samples - how many metric values are kept in the internal cache.
DCGM converts both limits into a number of cached values and uses whichever is larger, i.e., whichever gives the longer history.
For example, if you specify frequency = 1 sec, maxKeepAge = 300 sec, and maxKeepSamples = 500, DCGM keeps 500 metric values, which at a 1 sec frequency is 500 sec of history.
If instead you specify maxKeepSamples = 1, DCGM keeps 300 sec of history, which at a 1 sec frequency is 300 metric values.
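To make the rule concrete, here is a small Python sketch that models it. The function names are hypothetical and this is not DCGM code or part of its API; it only encodes the "convert both limits to a sample count and keep the larger one" rule described above. All times are in seconds for readability; the real DCGM watch calls may expect different units, so check the API reference before reusing these numbers.

```python
import math


def retained_samples(update_freq_s: float, max_keep_age_s: float, max_keep_samples: int) -> int:
    """Hypothetical model of how many values the cache holds.

    Both limits are expressed as a sample count and the larger one wins,
    mirroring the explanation above (not DCGM's actual implementation).
    """
    if update_freq_s <= 0:
        raise ValueError("update frequency must be positive")
    # Samples needed to cover max_keep_age_s at the given update rate.
    samples_from_age = math.ceil(max_keep_age_s / update_freq_s)
    # Keep whichever limit yields more history.
    return max(samples_from_age, max_keep_samples)


def retained_window_s(update_freq_s: float, max_keep_age_s: float, max_keep_samples: int) -> float:
    """Length of history (in seconds) covered by the retained samples."""
    return retained_samples(update_freq_s, max_keep_age_s, max_keep_samples) * update_freq_s


if __name__ == "__main__":
    # First example: the 500-sample limit wins, giving 500 s of history.
    print(retained_samples(1, 300, 500), retained_window_s(1, 300, 500))
    # Second example: the 300 s age limit wins, giving 300 samples.
    print(retained_samples(1, 300, 1), retained_window_s(1, 300, 1))
```

Running it reproduces both examples from the comment: 500 samples (500 s) in the first case and 300 samples (300 s) in the second.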
I hope that clarifies the meaning of the arguments.