Comments (4)
Asynchronous execution entails using non-default stream(s) and event waiting for dependencies.
With support for streams and events, we should also (correctly) support asynchronous memory transfer, which additionally requires:
- The host memory must be pinned (page-locked), so the CUDA driver can DMA directly from it. Currently Accelerate (base) allocates arrays in pageable memory that is pinned only with respect to the Haskell RTS's GC, so the CUDA driver must internally copy the data to a page-locked staging region before performing the DMA.
- Data transfers and kernels must be issued in distinct non-default streams; they will then overlap on all devices which support the feature (almost all devices of compute capability 1.1 and later).
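To make the two requirements concrete, here is a minimal sketch in plain CUDA C++ (runtime API), not Accelerate's actual implementation. The kernel `scale`, the helper `run_overlap_demo`, the `CHECK` macro, and the chunk sizes are all made up for illustration. The host buffer is allocated with `cudaMallocHost` so it is page-locked, uploads go into one non-default stream, and kernels go into another that waits on an event recorded after each upload.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(1);                                               \
    }                                                             \
  } while (0)

// Hypothetical kernel: multiply every element by k.
__global__ void scale(float *xs, int n, float k) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) xs[i] *= k;
}

// Returns the number of elements that did not come back as 2.0f.
int run_overlap_demo() {
  const int chunks = 4;
  const int chunkLen = 1 << 20;
  const int n = chunks * chunkLen;
  const size_t chunkBytes = chunkLen * sizeof(float);

  float *host = nullptr, *dev = nullptr;
  CHECK(cudaMallocHost((void **)&host, n * sizeof(float)));  // page-locked host memory
  CHECK(cudaMalloc((void **)&dev, n * sizeof(float)));
  for (int i = 0; i < n; ++i) host[i] = 1.0f;

  cudaStream_t copyStream, computeStream;  // two distinct non-default streams
  CHECK(cudaStreamCreate(&copyStream));
  CHECK(cudaStreamCreate(&computeStream));
  cudaEvent_t uploaded[chunks];

  for (int c = 0; c < chunks; ++c) {
    float *h = host + (size_t)c * chunkLen;
    float *d = dev + (size_t)c * chunkLen;
    CHECK(cudaEventCreateWithFlags(&uploaded[c], cudaEventDisableTiming));
    // Upload chunk c, then mark its completion with an event.
    CHECK(cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, copyStream));
    CHECK(cudaEventRecord(uploaded[c], copyStream));
    // The compute stream waits only for this chunk's upload, so the upload
    // of chunk c+1 is free to overlap with the kernel on chunk c.
    CHECK(cudaStreamWaitEvent(computeStream, uploaded[c], 0));
    scale<<<(chunkLen + 255) / 256, 256, 0, computeStream>>>(d, chunkLen, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, computeStream));
  }
  CHECK(cudaStreamSynchronize(copyStream));
  CHECK(cudaStreamSynchronize(computeStream));

  int mismatches = 0;
  for (int i = 0; i < n; ++i)
    if (host[i] != 2.0f) ++mismatches;

  for (int c = 0; c < chunks; ++c) CHECK(cudaEventDestroy(uploaded[c]));
  CHECK(cudaStreamDestroy(copyStream));
  CHECK(cudaStreamDestroy(computeStream));
  CHECK(cudaFree(dev));
  CHECK(cudaFreeHost(host));
  return mismatches;
}

int main() {
  int bad = run_overlap_demo();
  std::printf("mismatched elements: %d\n", bad);
  return bad == 0 ? 0 : 1;
}
```

If `host` were allocated with plain `malloc` instead, the same calls would still succeed, but the copies would go through the driver's staging buffer and would generally not overlap with the kernels.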
from accelerate.
See also:
- https://developer.nvidia.com/content/how-optimize-data-transfers-cuda-cc
- https://developer.nvidia.com/content/how-overlap-data-transfers-cuda-cc
Note: this issue was discussed further on the accelerate mailing list in June/July 2014.
This is all possible now, just not exposed very nicely yet. See this profiler output, where compute and data transfer overlap nicely, with full-speed DMA to pinned memory:
Also note this example, however, where the CUDA pinned memory allocator (a) is not concurrent, and (b) can be teeeerribly slow:
So we may want to do a nursery-style caching allocator. These screenshots are from different machines, and the latter is a 2-GPU box, so may have further strangeness going on...
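One possible reading of a caching allocator for pinned memory is sketched below in CUDA C++ host code. This is an assumption about the idea, not what Accelerate does: the class `PinnedCache`, its `acquire`/`release` methods, and `run_cache_demo` are hypothetical names. Released buffers are kept on a free list keyed by size and handed out again, so the slow `cudaMallocHost` call is only paid when nothing cached is large enough.

```cuda
#include <cstddef>
#include <cstdio>
#include <map>
#include <mutex>
#include <cuda_runtime.h>

// Hypothetical cache of page-locked host buffers.
class PinnedCache {
 public:
  // Return a pinned buffer of at least `bytes`, reusing a cached one if any fits.
  // Returns nullptr if a fresh allocation is needed and fails.
  void *acquire(size_t bytes) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = free_.lower_bound(bytes);  // smallest cached buffer that fits
      if (it != free_.end()) {
        void *p = it->second;
        free_.erase(it);
        return p;
      }
    }
    // Cache miss: pay for a real pinned allocation (outside the lock).
    void *p = nullptr;
    if (cudaMallocHost(&p, bytes) != cudaSuccess) return nullptr;
    std::lock_guard<std::mutex> lock(mutex_);
    sizes_[p] = bytes;
    return p;
  }

  // Put a buffer back on the free list instead of calling cudaFreeHost.
  // Note: this sketch does not guard against releasing the same pointer twice.
  void release(void *p) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sizes_.find(p);
    if (it != sizes_.end()) free_.emplace(it->second, p);
  }

  // Actually free everything this cache ever allocated.
  ~PinnedCache() {
    for (auto &kv : sizes_) cudaFreeHost(kv.first);
  }

 private:
  std::mutex mutex_;
  std::multimap<size_t, void *> free_;  // size -> buffers available for reuse
  std::map<void *, size_t> sizes_;      // every buffer owned by the cache
};

// Returns 0 if a released buffer is reused by a later, smaller request.
int run_cache_demo() {
  PinnedCache cache;
  void *a = cache.acquire(1 << 20);
  if (a == nullptr) return 2;
  cache.release(a);
  void *b = cache.acquire(1 << 16);  // fits in the cached 1 MiB buffer
  return a == b ? 0 : 1;
}

int main() {
  int rc = run_cache_demo();
  std::printf("cache demo result: %d\n", rc);
  return rc;
}
```

A real design would also need a policy for trimming the cache and for avoiding handing out buffers far larger than requested; the sketch ignores both.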