Comments (3)
I just had the driver on the machine updated to 370. I think this may have fixed the issue. I'll keep you posted.
A related question: do you know if anyone is working on NCCL/TensorFlow integration?
from nccl.
It appears the issue is completely resolved in the 370 driver. I'd close the issue, but I'm curious about the answer to my TensorFlow question.
@scott-gray adding NCCL to an existing computation-graph framework is pretty clean, e.g. https://github.com/caffe2/caffe2/tree/master/caffe2/contrib/nccl