
cuda-programming's Introduction

CUDA programming C++

The most common deep learning frameworks, such as TensorFlow and PyTorch, rely on GPU kernel launches to run computations in parallel and speed up these networks. The best-known interface for programming the GPU is CUDA, created by NVIDIA. This repository tracks my progress in this area. It is based mainly on what I am learning, step by step, in my master's program in deep learning run by Deep Learning Italia Academy, in the Udemy course CUDA programming Masterclass with C++, and, of course, in the NVIDIA documentation.

My purpose is to deepen my knowledge about parallel programming!

parallel_cube

In this repository:

  • Hello World

    I learned key concepts such as host (CPU) and device (GPU) computation, and context switching, which lets a single CPU core give the appearance of parallel execution. I also learned the difference between a process and a thread and how threads share memory. There are two levels of parallelism: (1) task-level and (2) data-level. I also learned the difference between parallelism and concurrency. Finally, I was able to launch a kernel using the grid and block launch parameters.

  • Threads Organization

    Working out which threads execute a kernel, and how, is often difficult. I learned to identify them with the built-in variables threadIdx and blockIdx (of type uint3) and blockDim and gridDim (of type dim3).

  • Unique Index Calculation

    Identifying a unique ID for each thread can be tricky, especially with 2D or even 3D grids and blocks. This section shows how to compute a unique global index for every thread.

  • Memory Transfer

    In addition to processing data on the GPU, we also need to copy the input data from host (CPU) memory to device (GPU) memory and copy the results back.

  • Sum Array

    Transfer two arrays to the GPU and sum them there. Measure the time taken using clocks, and handle CUDA errors by writing a macro that wraps every CUDA function call.

  • Device Query

    A simple program that queries the device at run time and prints its properties.

  • Intro to Warps

    It helps to relate the software view of parallelism (threads and blocks) to the hardware view. An SM executes the threads of a block in groups of 32 called warps, so the number of threads in a block should be a multiple of 32. If a block contains a single thread, the hardware still allocates a whole warp with resources for 32 threads. However, 31 of those threads are inactive, which wastes resources.

  • Warp Divergence

    Warp divergence hurts parallel performance. When threads of the same warp take different branches, the warp executes each path in turn with the threads on the other path disabled. Part of the SM therefore sits idle and resources are wasted. Pay attention to if-else statements. You can check the branch_efficiency metric by compiling with nvcc and profiling with nvprof.

cuda-programming's People

Contributors

march-08
