
Gauss Seidel using CUDA

This small project contains an implementation of a Gauss-Seidel linear equation solver that uses NVIDIA CUDA to parallelize the computations on a GPU. The Gauss-Seidel method for solving systems of linear equations is iterative: the values of the unknowns are repeatedly updated until the change between iterations falls below a given threshold.

The goal of this project was to achieve a faster resolution time than the sequential version of the code, which is also provided in the repository.


How does it work?

General considerations:

To parallelize the calculations, all data-dependency constraints are dropped. Relaxing these constraints only means that the solver needs more iterations to reach the predefined threshold. This is an acceptable trade-off: the speed-up achieved by parallelizing the calculations far outweighs the cost of the extra iterations.

Parallelization steps:

1. Decomposition: as the CUDA architecture favours fine-grained parallelism, the computational unit is defined as all the iterations a CUDA thread needs to perform in order to achieve a value delta smaller than the threshold.

2. Assignment: the assignment is straightforward: each matrix cell is paired with a CUDA thread. However, given that CUDA threads are grouped into blocks of configurable size, how many threads per block should we define? (This determines the required number of blocks for a given matrix size.)

The number of blocks is obtained by rounding up, so that every interior cell is covered:

int blocks_num = ceil((float) ((n - 2) * (n - 2)) / threads_per_block);

Note that the cast is needed: a plain integer division would truncate before ceil() could round up.

3. Orchestration: first of all, the initial matrix needs to be copied from the host (CPU) to the device (GPU) so that the computations can begin.

The orchestration between the different CUDA threads is literally zero: each thread is given a cell to transform, and no interaction happens between threads. There is one special case to consider though: since threads are launched in whole blocks, some threads of the last block may be left with no cell to compute.

Finally, the computed results are copied from device memory (GPU) to host memory (CPU).

4. Mapping: all CUDA threads are executed on the same device (GPU). No multi-GPU support is provided.
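The steps above might look roughly like this on the host and device sides. This is a hedged sketch, not the repository's actual gs_cuda.cu; the kernel and variable names are invented:

```cuda
// Hypothetical sketch of the flow above; names are invented.
__global__ void gs_kernel(float *m, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int cells = (n - 2) * (n - 2);
    if (idx >= cells) return;      // surplus threads in the last block do nothing
    int row = idx / (n - 2) + 1;   // map the flat index to an interior cell
    int col = idx % (n - 2) + 1;
    // ... iterate on cell (row, col) until its delta is below the threshold
}

void solve(float *host_m, int n, int threads_per_block) {
    int cells = (n - 2) * (n - 2);
    int blocks = (cells + threads_per_block - 1) / threads_per_block;
    float *dev_m;
    cudaMalloc(&dev_m, n * n * sizeof(float));
    // Step 3: copy the initial matrix from host to device
    cudaMemcpy(dev_m, host_m, n * n * sizeof(float), cudaMemcpyHostToDevice);
    gs_kernel<<<blocks, threads_per_block>>>(dev_m, n);
    // Copy the computed results back from device to host
    cudaMemcpy(host_m, dev_m, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_m);
}
```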


What is in the repository?

The repository contains:

  • Sequential code version (gc_seq.cu).
  • CUDA code version (gs_cuda.cu).
  • Bash script for executing the program with multiple configurations (exec_script.sh).
  • Makefile: used to compile and clean the project.

Usage

First of all, the NVCC compiler needs to be installed ⚠️. Then, the executable is generated by running:

$ make

Once the code is compiled, the executable takes 2 arguments:

  • Matrix size: usually a power of 2 (64, 128, 256, 1024...).
  • Threads per block: better a multiple of 32 (32, 64, 128...).

The execution command will be as follows:

$ ./gs_cuda <matrix_size> <threads_per_block>

Example:

$ ./gs_cuda 1024 64

Results

As the whole goal of this project was to speed up the resolution of the linear equation system, it is important to state the obtained acceleration. The speed-up values have been computed using Amdahl's law:


Authors

This project was developed by:

Sinclert Pérez

Silvia Barbero

Pablo León

