Comments (4)
NCCL does support multi-threaded control. In fact, by calling ncclCommInitRank() from each thread, you can even perform the communicator initialization in multi-threaded mode. The code above contains only a couple of NCCL calls, neither of which can deadlock. The hang most likely occurs during the communicator initialization step. Can you include the listing of main?
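For illustration, the one-thread-per-rank pattern could be sketched roughly as follows. This is only a minimal sketch, not the code from this issue: `initRanksInThreads` and the `initRank` callback are hypothetical names, and the NCCL calls are described only in comments so that the block compiles without CUDA or NCCL installed.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Runs initRank(rank, nranks) on one thread per rank and returns each
// thread's status code. initRank is a hypothetical callback, not an NCCL API.
//
// In a real NCCL program the callback would typically (assumption):
//   1. cudaSetDevice(<device chosen for this rank>);
//   2. ncclCommInitRank(&comms[rank], nranks, id, rank);
// where `id` is a single ncclUniqueId obtained with ncclGetUniqueId(&id)
// before any thread starts, and shared by all ranks.
template <typename InitFn>
std::vector<int> initRanksInThreads(int nranks, InitFn initRank) {
    assert(nranks > 0);
    std::vector<int> status(nranks, -1);
    std::vector<std::thread> threads;
    threads.reserve(nranks);
    for (int rank = 0; rank < nranks; ++rank) {
        // Each thread writes only its own slot, so no locking is needed.
        threads.emplace_back([&status, &initRank, rank, nranks] {
            status[rank] = initRank(rank, nranks);
        });
    }
    // ncclCommInitRank is a collective call: all ranks must be launched
    // before any thread is joined, otherwise the ranks already inside the
    // call would wait for the missing ones indefinitely.
    for (auto& t : threads) t.join();
    return status;
}
```

If main starts the threads one at a time and waits for each before starting the next, or if one rank never reaches the init call, the remaining ranks block inside initialization, which would look like the hang described here.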
from nccl.
I have a similar deadlock issue when I call ncclCommInitRank from multiple threads.
I found that some threads are stuck in syncRingDirect and closeGather.
ladiemon, the latest commit (dba3ec9) should fix the deadlock in ncclCommInitRank.
Closing old bug.