Comments (1)
Maybe some advanced discussion regarding accuracy/WDDM, the impact of CUDA event timers, etc.
CUDA event timers have a resolution of "around 0.5 microseconds", and timing only behaves as intended when the events are recorded in the NULL (default) stream:
Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.
Under WDDM, due to how the WDDM command buffers work, `cudaEvent`-based timing is only meaningful for pure device code (unless you add an immediate stream/event/device sync after recording). See FLAMEGPU/FLAMEGPU2#451.
The current implementation in FLAME GPU uses `std::steady_clock` timers when the GPU is running under WDDM.
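As a rough illustration of host-side timing with `std::steady_clock`, the sketch below wraps a callable between two `now()` calls. The helper name `elapsed_ms` is hypothetical and this is not FLAME GPU's actual implementation; it also assumes the callable itself performs any device synchronisation needed, so that the GPU work has really finished before the second timestamp is taken.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the wall-clock time it took,
// in milliseconds, measured with the monotonic std::steady_clock.
// Assumption: `work` blocks until everything that should be timed has completed
// (for GPU work, that means it ends with a device/stream/event synchronisation).
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Unlike event timers, a measurement taken this way also includes host-side launch and synchronisation overhead, so very short steps or layers may be dominated by that overhead.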
`std::steady_clock` timers are generally not as good, but they are implementation- and hardware-specific, so we can't document a known accuracy or precision. It might be possible to calculate one at runtime, though. Depending on the model, they might not be precise enough to give useful per-step or per-layer timing.
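One possible way to estimate this at runtime is sketched below. It reports the nominal tick period declared by the clock type, and separately the smallest step actually observed between consecutive `now()` calls. Both function names are hypothetical, and the observed value is only an upper bound on the usable resolution, since it includes the cost of calling `now()` itself.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Nominal tick period of std::steady_clock in nanoseconds.
// This is a compile-time property of the type; the OS may deliver coarser steps.
inline double nominal_tick_ns() {
    using period = std::chrono::steady_clock::period;
    return 1e9 * static_cast<double>(period::num) / static_cast<double>(period::den);
}

// Smallest non-zero step seen between consecutive now() calls, in nanoseconds.
// Hypothetical helper: the result includes the overhead of calling now().
inline double observed_min_step_ns(int samples = 1000) {
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < samples; ++i) {
        const auto t0 = clock::now();
        auto t1 = clock::now();
        while (t1 == t0) {
            t1 = clock::now();  // spin until the clock visibly advances
        }
        const double dt = std::chrono::duration<double, std::nano>(t1 - t0).count();
        best = std::min(best, dt);
    }
    return best;
}
```

If the observed step is a significant fraction of a typical step or layer duration, per-step or per-layer timings from this clock are unlikely to be useful.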
`std::high_resolution_clock` sounds like it should be better, but it is implementation-defined. With MSVC it is just an alias for `std::steady_clock`, but GCC (libstdc++) uses `std::system_clock`, which is not good for performance timing (it's not monotonic).
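The alias behaviour described above can be checked at compile time. The following is a minimal sketch using only standard `<chrono>` and `<type_traits>` facilities; the names `hrc_is_steady`, `hrc_aliases_steady`, `hrc_aliases_system` and `timing_clock` are hypothetical. It selects `std::high_resolution_clock` only when the implementation marks it as steady, and falls back to `std::steady_clock` otherwise.

```cpp
#include <chrono>
#include <type_traits>

// True if this implementation's high_resolution_clock is monotonic.
constexpr bool hrc_is_steady = std::chrono::high_resolution_clock::is_steady;

// Which standard clock, if any, high_resolution_clock is an alias for here.
constexpr bool hrc_aliases_steady =
    std::is_same<std::chrono::high_resolution_clock, std::chrono::steady_clock>::value;
constexpr bool hrc_aliases_system =
    std::is_same<std::chrono::high_resolution_clock, std::chrono::system_clock>::value;

// Hypothetical alias: use high_resolution_clock only when it is steady,
// otherwise fall back to steady_clock so timings stay monotonic.
using timing_clock = std::conditional<hrc_is_steady,
                                      std::chrono::high_resolution_clock,
                                      std::chrono::steady_clock>::type;

static_assert(timing_clock::is_steady, "timing_clock must be monotonic");
```

Printing the three flags on a given toolchain shows which case applies, without relying on compiler-specific knowledge.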
from flamegpu2-docs.