There should probably be some coverage of timing.

Timing (flamegpu2-docs issue, open)

flamegpu commented on June 24, 2024

Timing

from flamegpu2-docs.

Comments (1)

ptheywood commented on June 24, 2024

Maybe some advanced discussion regarding accuracy, WDDM, the impact of CUDA event timers, etc.

CUDA event timers have a resolution of "around 0.5 microseconds", and timing only behaves as intended when the events are recorded in the NULL (default) stream:

Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.

Source: CUDA Runtime API documentation for cudaEventElapsedTime().

Under WDDM, because of how WDDM command buffers work, cudaEvent-based timing is only meaningful for pure device code (unless an immediate stream, event, or device synchronisation is added after recording). See FLAMEGPU/FLAMEGPU2#451.
The current FLAME GPU implementation uses std::chrono::steady_clock timers when the GPU is running under WDDM.
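A minimal sketch of host-side timing with std::chrono::steady_clock, the kind of fallback described above. SteadyTimer is a hypothetical class written for illustration, not FLAME GPU's actual implementation. One assumption worth stating loudly: if the timed region launches GPU work asynchronously, a host timer only measures launch overhead unless the host synchronises (for example with cudaDeviceSynchronize()) before calling stop().

```cpp
#include <cassert>
#include <chrono>

// Hypothetical host-side timer based on std::chrono::steady_clock.
// Illustrative only; not FLAME GPU's timer class.
class SteadyTimer {
public:
    void start() {
        begin_ = std::chrono::steady_clock::now();
        running_ = true;
    }

    // If the timed work ran on the GPU, synchronise the host with the
    // device before calling stop(), or only launch overhead is measured.
    void stop() {
        assert(running_ && "stop() called before start()");
        end_ = std::chrono::steady_clock::now();
        running_ = false;
    }

    // Elapsed time between the last start() and stop(), in milliseconds.
    double elapsed_ms() const {
        return std::chrono::duration<double, std::milli>(end_ - begin_).count();
    }

private:
    std::chrono::steady_clock::time_point begin_{};
    std::chrono::steady_clock::time_point end_{};
    bool running_ = false;
};
```

Because steady_clock is monotonic, the difference between the two time points can never be negative, even if the wall clock is adjusted while the model is running.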

std::chrono::steady_clock timers are generally not as good as CUDA event timers. Their behaviour is implementation- and hardware-specific, so no known accuracy or precision can be documented, although it may be possible to measure one at runtime. Depending on the model, they may not be precise enough to give useful per-step or per-layer timing.
std::chrono::high_resolution_clock sounds like it should be better, but it is implementation-defined. With MSVC it is simply an alias for std::chrono::steady_clock, but with GCC (libstdc++) it is std::chrono::system_clock, which is not suitable for performance timing because it is not monotonic.
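A minimal sketch of both points, assuming a C++17 compiler. PerfClock is a hypothetical alias that selects high_resolution_clock only when it reports is_steady, and otherwise falls back to steady_clock. estimate_resolution() is a hypothetical helper that measures the smallest step the chosen clock actually reports on the current machine. The nominal period is a property of the clock type. The measured step reflects both the real tick size and the cost of calling now(), so it is an empirical estimate rather than a guaranteed precision.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <type_traits>

// Hypothetical alias: use high_resolution_clock only if it is monotonic,
// otherwise fall back to steady_clock (with libstdc++, high_resolution_clock
// is system_clock, which is not monotonic).
using PerfClock = std::conditional_t<std::chrono::high_resolution_clock::is_steady,
                                     std::chrono::high_resolution_clock,
                                     std::chrono::steady_clock>;
static_assert(PerfClock::is_steady, "performance timing needs a monotonic clock");

// Nominal tick period advertised by the clock type, in nanoseconds.
// This describes the type's units, not the accuracy actually delivered.
inline double nominal_period_ns() {
    using period = PerfClock::period;
    return 1e9 * static_cast<double>(period::num) / static_cast<double>(period::den);
}

// Empirical estimate of the smallest observable step of PerfClock:
// spin until now() changes and keep the smallest difference over several
// trials. The result includes the overhead of calling now() itself.
inline std::chrono::nanoseconds estimate_resolution(int trials = 100) {
    assert(trials > 0 && "need at least one trial");
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < trials; ++i) {
        const auto t0 = PerfClock::now();
        auto t1 = PerfClock::now();
        while (t1 == t0) {
            t1 = PerfClock::now();  // busy-wait until the clock visibly advances
        }
        best = std::min(best, std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0));
    }
    return best;
}
```

Comparing estimate_resolution() against the expected duration of a single step or layer gives a rough sense of whether per-step timing from a host clock is meaningful for a given model.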
