Comments (2)
That text is quite old and is incorrect for coherent platforms, e.g. GH200 and POWER9+V100, where the GPU memory can be cached by the CPU. In that case, entire cache lines are exchanged and good performance is obtained.
Why can't the GPU BAR be prefetched?
On non-coherent platforms, the CPU can create MMIO mappings of the GPU BAR.
Prefetching per se applies to cache-coherent platforms; it does not apply to MMIO (uncached, UC) mappings.
Current CPU ISAs do not support reading or writing larger than 8B/16B granules when targeting MMIO mappings, even when using vector extensions.
On x86 platforms, Write-Combining (WC) mappings can improve write bandwidth, but do not help read performance.
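To make the granule argument concrete, here is a minimal arithmetic sketch. The helper name `transfers_needed` is hypothetical, and the 64-byte cache line used in the note below is an assumption about a typical CPU, not a figure from this thread. It only counts how many CPU accesses are needed to move a buffer at a given access size; it does not touch GPU memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of accesses needed to move `bytes`
 * when each CPU load or store moves at most `granule_bytes`. */
size_t transfers_needed(size_t bytes, size_t granule_bytes)
{
    assert(granule_bytes > 0);                           /* avoid division by zero */
    return (bytes + granule_bytes - 1) / granule_bytes;  /* ceiling division */
}
```

For a 4096-byte copy, `transfers_needed(4096, 8)` is 512, while `transfers_needed(4096, 64)` is 64, so 8-byte MMIO accesses need eight times as many transfers as whole (assumed 64-byte) cache lines would.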
Note that here I refer to CPU loads and stores, i.e. reads and writes generated by CPU instructions, which are interesting because of their low latency.
Besides that, there can be DMA engines, similar to the GPU Copy Engines, which provide full bandwidth in exchange for higher latency.
The README says the GPU BAR cannot be prefetched, so no PCIe burst read transactions are generated. Why does the former lead to the latter?
That is an experimental statement.
With cached mappings, loads and stores would trigger read/write transactions of entire cache lines, leading to large PCIe transactions.
As of today, PCIe does not support a cache coherency protocol, so on non-coherent platforms, the CPU mappings of the GPU BAR cannot be cached (C). The best you can do there is to use MMIO or WC mappings.
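The choice of mapping described above can be summarized as a small decision function. This is only a sketch of the reasoning in this comment: the enum, the function names, and the `wc_available` parameter are hypothetical and are not part of the gdrcopy API.

```c
#include <assert.h>
#include <stdbool.h>

/* Mapping types named in the discussion; the enum itself is hypothetical. */
enum cpu_mapping { MAPPING_CACHED, MAPPING_WC, MAPPING_UC };

/* Pick the best CPU mapping of GPU memory for a platform.
 * coherent_platform: the CPU can cache GPU memory (e.g. GH200, POWER9+V100).
 * wc_available: Write-Combining mappings exist (assumed true on x86). */
enum cpu_mapping pick_mapping(bool coherent_platform, bool wc_available)
{
    if (coherent_platform)
        return MAPPING_CACHED;                   /* whole cache lines are exchanged */
    return wc_available ? MAPPING_WC             /* helps writes, not reads */
                        : MAPPING_UC;            /* plain MMIO, small granules */
}

/* Human-readable name, for logging. */
const char *mapping_name(enum cpu_mapping m)
{
    switch (m) {
    case MAPPING_CACHED: return "cached";
    case MAPPING_WC:     return "write-combining";
    case MAPPING_UC:     return "uncached (MMIO)";
    }
    assert(0 && "unknown mapping");
    return "?";
}
```

On a non-coherent x86 host, `pick_mapping(false, true)` returns `MAPPING_WC`; on a coherent platform the function returns `MAPPING_CACHED` regardless of the second argument.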
I hope that helps.
from gdrcopy.
You have already solved my problem. Thanks a lot for your response!!!