Comments (1)
The ncclLaunchKernel function is responsible for starting NCCL kernels. Its implementation relies on CUDA's cudaLaunchKernel API, which enqueues the NCCL kernel for execution on a stream.
To understand this process, it helps to look at how CUDA kernels are launched. cudaLaunchKernel takes a pointer as its first argument; that pointer must refer to a CUDA kernel, that is, a function declared with the __global__ attribute. The remaining arguments supply the grid and block dimensions, an array of pointers to the kernel's arguments, the dynamic shared memory size, and the stream.
In the NCCL source code, header files such as all_reduce.h and all_gather.h define functions with the __device__ attribute. These functions can only execute on the device, and they are called by the __global__ functions defined in common.cu. The __global__ functions are the kernel entry points: they are what the first argument of cudaLaunchKernel points to.
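The relationship described above can be sketched with a small standalone CUDA program. This is not NCCL code: scaleElement, scaleKernel, and launchAndCheck are hypothetical names chosen for illustration. The sketch only shows the general pattern of a __device__ helper called from a __global__ entry point, which is then launched through cudaLaunchKernel.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Device-only helper (hypothetical). It plays the role of the __device__
// functions found in headers such as all_reduce.h.
__device__ void scaleElement(float* data, float factor, int i) {
  data[i] *= factor;
}

// Kernel entry point (hypothetical). Its address is what gets passed as the
// first argument of cudaLaunchKernel.
__global__ void scaleKernel(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) scaleElement(data, factor, i);
}

// Launches scaleKernel via cudaLaunchKernel and verifies the result.
// Returns 0 on success, nonzero on any failure.
int launchAndCheck() {
  const int n = 1024;
  std::vector<float> host(n, 1.0f);
  float* dev = nullptr;
  if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return 1;
  if (cudaMemcpy(dev, host.data(), n * sizeof(float),
                 cudaMemcpyHostToDevice) != cudaSuccess) {
    cudaFree(dev);
    return 2;
  }

  // cudaLaunchKernel expects an array of pointers, one per kernel parameter,
  // in declaration order.
  float factor = 2.0f;
  int count = n;
  void* args[] = { &dev, &factor, &count };

  dim3 block(256);
  dim3 grid((n + block.x - 1) / block.x);
  cudaError_t err = cudaLaunchKernel((const void*)scaleKernel, grid, block,
                                     args, /*sharedMem=*/0, /*stream=*/0);
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err == cudaSuccess)
    err = cudaMemcpy(host.data(), dev, n * sizeof(float),
                     cudaMemcpyDeviceToHost);
  cudaFree(dev);
  if (err != cudaSuccess) {
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 3;
  }

  // Every element started at 1.0f and was multiplied by 2.0f.
  for (int i = 0; i < n; ++i) {
    if (host[i] != 2.0f) return 4;
  }
  return 0;
}

int main() {
  int rc = launchAndCheck();
  std::printf("%s\n", rc == 0 ? "ok" : "failed");
  return rc;
}
```

The launch is equivalent to scaleKernel<<<grid, block, 0, 0>>>(dev, factor, count); the explicit API form is useful when the kernel to run is chosen at run time through a function pointer, which is the situation the comment describes for NCCL.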