This repository documents my process of learning CUDA programming by implementing parallel algorithms for matrix decomposition.
CUDA programming refers to writing code using the CUDA (Compute Unified Device Architecture) platform developed by NVIDIA. It allows developers to harness the computational power of NVIDIA GPUs (Graphics Processing Units) for general-purpose processing tasks, beyond just graphics rendering. With CUDA programming, tasks can be parallelized and executed efficiently on GPU cores, enabling faster processing of data-intensive applications such as scientific simulations, deep learning, and data analytics. This approach is particularly beneficial for algorithms that can be decomposed into parallelizable tasks, as it can significantly accelerate their execution compared to traditional CPU-based computing.
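As a minimal illustration of this programming model (a generic sketch, not code from this repository), the canonical vector-addition example below launches one GPU thread per element: data is copied to device memory, a kernel runs across many threads in parallel, and the result is copied back to the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of the output vector.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the grid may contain more threads than elements
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Allocate and initialize host data.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device memory and copy inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back to the host.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // expected 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compile with `nvcc vec_add.cu -o vec_add` on a machine with an NVIDIA GPU. The same pattern of mapping independent work items to threads underlies the matrix algorithms in this repository, though their dependency structures require more care.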
- [1] J. T. Jia, R. Xie, S. Ni, et al., "An efficient breakdown-free algorithm for numerically evaluating the determinants of (p, q)-pentadiagonal matrices," Numerical Algorithms, 2024, pp. 1–19.
- [2] J. T. Jia, R. Xie, X. Y. Xu, et al., "A bidiagonalization-based numerical algorithm for computing the inverses of (p, q)-tridiagonal matrices," Numerical Algorithms, vol. 93, no. 2, 2023, pp. 899–917.