
Gauss Seidel using CUDA

This small project contains an implementation of a Gauss-Seidel linear equation solver that uses NVIDIA CUDA to parallelize the computations on a GPU. The Gauss-Seidel method for solving systems of linear equations is iterative: the values of the unknowns are repeatedly updated until the change between iterations falls below a given threshold.

The goal of this project was to achieve a faster resolution time than the sequential version of the code, which is also provided in the repository.


How does it work?

General considerations:

To parallelize the calculations, all data-dependency constraints are dropped. Relaxing these constraints only means that the solver needs more iterations to reach the predefined threshold. This is an acceptable trade-off: the speed-up achieved by parallelizing the calculations far outweighs the cost of the extra iterations.

Parallelization steps:

1. Decomposition: as the CUDA architecture favours fine-grained parallelism, the computational unit is defined as all the iterations a CUDA thread needs to perform in order to achieve a value delta smaller than the threshold.

2. Assignment: the assignment is straightforward: each matrix cell is paired with a CUDA thread. However, given that CUDA threads are grouped into blocks of configurable size, how many threads per block should we define? (This determines the required number of blocks for a given matrix size.)

The number of blocks is obtained by rounding up, so that every interior cell is covered:

int blocks_num = ceil((float) ((n - 2) * (n - 2)) / threads_per_block);

Note that the cast is needed: a plain integer division would truncate before ceil() could round up.

3. Orchestration: first of all, the initial matrix needs to be copied from the host (CPU) to the device (GPU) so that the computations can begin.

The orchestration between the different CUDA threads is literally zero: each thread is given a cell to transform, and no interaction happens between threads. There is one special case to consider though: since threads are launched in whole blocks, some threads of the last block may be left with no cell to compute.

Finally, the computed results are copied from device memory (GPU) to host memory (CPU).

4. Mapping: all CUDA threads are executed on the same device (GPU). No multi-GPU support is provided.
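The steps above might look roughly like this on the host and device sides. This is a hedged sketch, not the repository's actual gs_cuda.cu; the kernel and variable names are invented:

```cuda
// Hypothetical sketch of the flow above; names are invented.
__global__ void gs_kernel(float *m, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int cells = (n - 2) * (n - 2);
    if (idx >= cells) return;      // surplus threads in the last block do nothing
    int row = idx / (n - 2) + 1;   // map the flat index to an interior cell
    int col = idx % (n - 2) + 1;
    // ... iterate on cell (row, col) until its delta is below the threshold
}

void solve(float *host_m, int n, int threads_per_block) {
    int cells = (n - 2) * (n - 2);
    int blocks = (cells + threads_per_block - 1) / threads_per_block;
    float *dev_m;
    cudaMalloc(&dev_m, n * n * sizeof(float));
    // Step 3: copy the initial matrix from host to device
    cudaMemcpy(dev_m, host_m, n * n * sizeof(float), cudaMemcpyHostToDevice);
    gs_kernel<<<blocks, threads_per_block>>>(dev_m, n);
    // Copy the computed results back from device to host
    cudaMemcpy(host_m, dev_m, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_m);
}
```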


What is in the repository?

The repository contains:

  • Sequential code version (gc_seq.cu).
  • CUDA code version (gs_cuda.cu).
  • Bash script for executing the program with multiple configurations (exec_script.sh).
  • Makefile: used to compile and clean the project.

Usage

First of all, the NVCC compiler needs to be installed ⚠️. Then, the executable is generated by running:

$ make

Once the code is compiled, the executable takes 2 arguments:

  • Matrix size: usually a power of 2 (64, 128, 256, 1024...).
  • Threads per block: better a multiple of 32 (32, 64, 128...).

The execution command will be as follows:

$ ./gs_cuda <matrix_size> <threads_per_block>

Example:

$ ./gs_cuda 1024 64

Results

As the whole goal of this project was to speed up the resolution of the linear equation system, it is important to state the obtained acceleration. The speed-up values have been computed using Amdahl's law:


Authors

This project was developed by:

Sinclert Pérez

Silvia Barbero

Pablo León

