

ptheywood commented on June 24, 2024

Maybe some advanced discussion regarding accuracy/WDDM, the impact of CUDA event timers, etc.

CUDA event timers have a resolution of "around 0.5 microseconds", and timing only behaves as intended when the events are recorded in the NULL (default) stream, as the cudaEventElapsedTime() documentation states:

Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.

source

Under WDDM, because of how the WDDM command buffers work, cudaEvent-based timing is only meaningful for pure device code (unless you add an immediate stream/event/device sync after recording). See FLAMEGPU/FLAMEGPU2#451.
The current implementation in FLAME GPU uses std::chrono::steady_clock timers when the GPU is running under WDDM.
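
As a rough illustration of the host-side fallback, here is a minimal sketch of a timer built on std::chrono::steady_clock. The class name `SteadyClockTimer` and its methods are hypothetical and are not taken from the FLAME GPU source; the sketch only shows the general shape such a timer could take.

```cpp
#include <chrono>
#include <stdexcept>

// Hypothetical host-side timer (illustrative only, not FLAME GPU's class).
class SteadyClockTimer {
 public:
    // Record the start time point.
    void start() {
        start_ = std::chrono::steady_clock::now();
        started_ = true;
        stopped_ = false;
    }
    // Record the stop time point; start() must have been called first.
    void stop() {
        if (!started_) {
            throw std::logic_error("stop() called before start()");
        }
        stop_ = std::chrono::steady_clock::now();
        stopped_ = true;
    }
    // Elapsed time between start() and stop(), in milliseconds.
    double elapsedMilliseconds() const {
        if (!started_ || !stopped_) {
            throw std::logic_error("timer has not been started and stopped");
        }
        return std::chrono::duration<double, std::milli>(stop_ - start_).count();
    }

 private:
    std::chrono::steady_clock::time_point start_{};
    std::chrono::steady_clock::time_point stop_{};
    bool started_ = false;
    bool stopped_ = false;
};
```

When a timer like this wraps GPU work, the device would need to be synchronised (for example with cudaDeviceSynchronize()) before start() and before stop(), otherwise asynchronous kernel launches may not be included in the measured interval.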

std::chrono::steady_clock timers are generally not as good as CUDA event timers, and their accuracy and precision are implementation- and hardware-specific, so a known value can't be documented. It might be possible to estimate one at runtime though. Depending on the model, they might not be precise enough to give useful per-step or per-layer timing.
std::chrono::high_resolution_clock sounds like it should be better, but it's implementation-defined. In MSVC it is just an alias for std::chrono::steady_clock, but GCC (libstdc++) aliases std::chrono::system_clock, which is not good for performance timing (it's not monotonic).
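
A minimal sketch of how the clock properties discussed above could be inspected, assuming nothing beyond the standard library. The helper names `nominalTickNanoseconds` and `observedTickNanoseconds` are hypothetical. The observed value includes the cost of calling now(), so it is best read as an upper bound on the usable resolution rather than a documented precision.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>
#include <type_traits>

// Nominal tick period declared by the clock's duration type, in nanoseconds.
// This is what the type promises, not necessarily what the hardware delivers.
template <typename Clock>
double nominalTickNanoseconds() {
    return std::chrono::duration<double, std::nano>(
        typename Clock::duration(1)).count();
}

// Smallest positive step observed between consecutive now() calls, in
// nanoseconds. Returns infinity if no positive step was observed.
template <typename Clock>
double observedTickNanoseconds(int samples = 1000) {
    double smallest = std::numeric_limits<double>::infinity();
    for (int i = 0; i < samples; ++i) {
        const auto t0 = Clock::now();
        auto t1 = Clock::now();
        while (t1 == t0) {
            t1 = Clock::now();  // spin until the clock reports a new value
        }
        const double step =
            std::chrono::duration<double, std::nano>(t1 - t0).count();
        if (step > 0.0) {  // ignore backwards jumps from non-monotonic clocks
            smallest = std::min(smallest, step);
        }
    }
    return smallest;
}

// Compile-time checks of what high_resolution_clock is on this toolchain.
constexpr bool kHighResIsSteady =
    std::chrono::high_resolution_clock::is_steady;
constexpr bool kHighResIsSystemClock =
    std::is_same<std::chrono::high_resolution_clock,
                 std::chrono::system_clock>::value;
```

For example, `observedTickNanoseconds<std::chrono::steady_clock>()` could be called once at startup and compared against typical per-step times to decide whether per-step or per-layer timing is likely to be meaningful.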

from flamegpu2-docs.
