
nccl's Introduction

nccl

Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes.

Supported Development Environments:

* Visual Studio 2022 & CUDA 11.8
* Visual Studio 2022 & CUDA 11.7

MyCaffe uses the nccl64_134.dll library for multi-GPU communication during multi-GPU training.

nccl's People

Contributors

zorodervoncodier


nccl's Issues

Is there any guide about installation?

Hi there,
Thank you so much for the repo. I ran into an NCCL problem on my Windows machine, but I'm a bit confused about how to use what's in your repo.
Thanks again.

Can we build a pip package from these?

It would be great if we could build a pip package for this.
I'm not sure where to start, though, since I don't know where things should go under "normal" circumstances.

Any suggestions?

.EXE files not in directory after build

Hello, I'm having an issue where the .exe files are not showing up in the directory after I build the project. I'm trying to build the project for CUDA version 10.1. I get no errors from VS, and it tells me the build is successful. It creates nccl64_134.10.1.dll and copies cudart64_101.dll correctly, it just isn't creating the .exe files for some reason. I thought they might be somewhere else, but they're nowhere to be found. Any ideas?

More detailed steps for installation

I can't figure out how to build this NCCL on Windows. Could you give more detailed installation steps?

I would be very grateful ;)

Error in CheckDelta

I had to comment out CheckDelta in the test files to make them work. What are the implications of this? I am extremely new to C++ and to building with Visual Studio (a few days of experience), so my debugging skills are not strong, but please let me know your thoughts.

Additionally, when I use Task Manager to track GPU activity, only my original GPU shows any processing. The new, second GPU shows copy activity and its VRAM usage increases, so I know something is working. I wonder whether it isn't doing any processing because of the commented-out code.

I would greatly appreciate any help. I bought a second 3090, a beefy PSU, and an NVLink bridge, and redid all the wiring for my Windows system without knowing that cross-GPU communication is not straightforward on Windows. Your repo gives me hope, though.

If there is anything I can provide, please let me know. There really isn't much of an error code, it just says:

(base) [User]\GitHub\NCCL\windows\test\single\x64\Debug>all_reduce_test 300000000
Using devices
Rank 0 uses device 0 [0x24] NVIDIA GeForce RTX 3090
Rank 1 uses device 1 [0x2d] NVIDIA GeForce RTX 3090

                                             out-of-place                    in-place
  bytes             N    type      op     time  algbw  busbw      res     time  algbw  busbw      res
 300000000     300000000    char     sumCuda failure [User]\GitHub\NCCL\test\include\test_utilities.h:313 'invalid argument'

I traced this back to the CheckDelta calls in the test code.

When I build nccl.10.1.vcxproj, I get an error. The output says:

nccl.10.1.vcxproj -> D:\NCCL-master\windows\x64\Release\nccl64_134.10.1.dll
copy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudart*.dll" "D:\NCCL-master\windows\x64\Release\"
The system cannot find the path specified.

In fact, the file 'cudart64_101.dll' does exist in 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin'.

Do you know how to fix it? Thanks!
