
Comments (4)

tmcdonell commented on May 27, 2024

Asynchronous execution entails using non-default stream(s), and waiting on events to enforce dependencies between them.

With support for streams and events, we should also (correctly) support asynchronous memory transfer, which additionally requires:

  • The host memory is page-locked (pinned), so the CUDA driver can DMA directly from it. Currently Accelerate (base) allocates pageable memory that is pinned only with respect to the Haskell RTS's GC, so internally the CUDA driver must first copy the data to a pinned staging region before performing the DMA.
  • Data transfers and kernels are issued in distinct non-default streams; they will then overlap on all devices that support the feature (almost all compute capability 1.1 and later devices).

from accelerate.

tmcdonell commented on May 27, 2024

See also:


robstewart57 commented on May 27, 2024

Note: this issue is further discussed in June/July 2014 on the accelerate mailing list here.


tmcdonell commented on May 27, 2024

This is all possible now, just not exposed very nicely yet. See this profiler output, where compute and data transfer overlap nicely, with full-speed DMA to pinned memory:

[Image: screenshot 2016-02-10 15 55 31]

Note this example too, however, where the CUDA pinned memory allocator (a) is not concurrent, and (b) can be teeeerribly slow:

[Image: screenshot 2016-02-10 15 56 31]

So we may want a nursery-style caching allocator. These screenshots are from different machines, and the latter is a 2-GPU box, so there may be further strangeness going on...
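
To make the caching idea concrete, here is a minimal sketch of what a nursery-style allocator could look like. It is not Accelerate's implementation: the class name `PinnedNursery`, the 256-byte minimum size class, and the power-of-two rounding are all assumptions chosen for illustration. The backing functions default to `malloc`/`free` so the sketch runs without a GPU; in real use one would presumably pass wrappers around `cudaMallocHost` and `cudaFreeHost` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical nursery-style cache placed in front of a slow allocator.
// Released blocks are kept in per-size-class free lists and handed out
// again, so the slow backing allocator is only hit on a cache miss.
class PinnedNursery {
public:
    using AllocFn = std::function<void*(std::size_t)>;
    using FreeFn  = std::function<void(void*)>;

    // Assumption: malloc/free stand in for cudaMallocHost/cudaFreeHost.
    explicit PinnedNursery(
        AllocFn backingAlloc = [](std::size_t n) { return std::malloc(n); },
        FreeFn backingFree = [](void* p) { std::free(p); })
        : alloc_(std::move(backingAlloc)), free_(std::move(backingFree)) {}

    // Only blocks currently sitting in the cache are freed here;
    // blocks still held by callers are the callers' responsibility.
    ~PinnedNursery() {
        for (auto& bucket : cache_)
            for (void* p : bucket.second) free_(p);
    }

    PinnedNursery(const PinnedNursery&) = delete;
    PinnedNursery& operator=(const PinnedNursery&) = delete;

    // Return a block of at least `bytes` bytes, reusing a cached one if possible.
    void* acquire(std::size_t bytes) {
        const std::size_t cls = sizeClass(bytes);
        // The lock also serialises calls into the backing allocator,
        // which the thread above reports is not concurrent anyway.
        std::lock_guard<std::mutex> lock(mutex_);
        auto& bucket = cache_[cls];
        if (!bucket.empty()) {
            void* p = bucket.back();
            bucket.pop_back();
            return p;
        }
        ++slowCalls_;
        return alloc_(cls);
    }

    // Return a block to the cache; `bytes` must match the size passed to acquire.
    void release(void* p, std::size_t bytes) {
        assert(p != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        cache_[sizeClass(bytes)].push_back(p);
    }

    // Number of times the slow backing allocator was actually called.
    std::size_t slowCalls() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return slowCalls_;
    }

private:
    // Round up to a power of two (minimum 256 bytes, an arbitrary choice)
    // so that requests of similar size can share cached blocks.
    static std::size_t sizeClass(std::size_t bytes) {
        std::size_t cls = 256;
        while (cls < bytes) cls <<= 1;
        return cls;
    }

    AllocFn alloc_;
    FreeFn free_;
    mutable std::mutex mutex_;
    std::unordered_map<std::size_t, std::vector<void*>> cache_;
    std::size_t slowCalls_ = 0;
};
```

With this sketch, acquiring 1000 bytes, releasing the block, and then acquiring 900 bytes returns the same pointer, since both requests round up to the 1024-byte class, and the backing allocator is called only once. Rounding up trades some wasted pinned memory for a higher hit rate; whether that trade is right depends on the transfer sizes a real workload produces.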

