
gpuacceleratedtracking's Issues

Untested kernel algorithms

Kernels 2 and upwards remain untested. A robust way to test the correctness of these new algorithms is needed for the benchmarks to have any validity.

CPU benchmarks allow for `algorithm` parameter

Currently the CPU benchmarks also accept the `algorithm` parameter, which means the same CPU code is benchmarked once per algorithm even though it doesn't change.

The KernelAlgorithm distinction should only influence GPU benchmarks.
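One way to address this (a hypothetical sketch; the function and symbol names are mine, not the repository's) is to expand the `algorithm` axis only for GPU runs, so the CPU implementation is benchmarked exactly once:

```julia
# Hypothetical sketch: build the benchmark case list so that the
# `algorithm` parameter only varies for GPU benchmarks.
function benchmark_cases(processors, algorithms)
    cases = Tuple{Symbol,Union{Int,Nothing}}[]
    for proc in processors
        if proc == :GPU
            for alg in algorithms
                push!(cases, (proc, alg))   # one case per kernel algorithm
            end
        else
            push!(cases, (proc, nothing))   # CPU: algorithm is irrelevant
        end
    end
    return cases
end

benchmark_cases([:CPU, :GPU], [1, 2, 3])
# 4 cases: (:CPU, nothing) plus (:GPU, 1), (:GPU, 2), (:GPU, 3)
```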

Review

Hi @coezmaden! I was asked to review your JuliaCon proceedings submission in JuliaCon/proceedings-review#128, so here goes 🙂

In general, your paper was nicely written and a pleasant read, so kudos for that! I have a few comments on the paper itself, but also on the code and the repository. I hope this is helpful!

(Also a caveat: I'm going to focus on the things I'm familiar with, which is Julia and GPUs. I hope somebody else can take a look at the SDR-related pieces.)

Reproducibility

The README should mention a Julia version to use, as 1.8 seemed incompatible (I took 1.7 from the Manifest, which works)

It would also be good if the README contained some information on how to reproduce the measurements. I gather I first need to run the scripts/run_benchmark scripts before scripts/plot_benchmarks? But that doesn't actually run on the GPU, so I have to customize the params? Doing so I still don't get the actual plots from the paper, so it'd be good to add some instructions on that.

Are there plans to upstream this work into Tracking.jl? I'm missing a bit how this work is to be used with the rest of the ecosystem, as the GPUAcceleratedTracking repository is focussed on the benchmarks related to the paper. The template in JuliaCon/proceedings-review#128 specifically asks about example usage / functionality documentation / tests etc, and that does seem to be missing a bit.

Performance

Generally I was surprised by the desktop GPU being outperformed by the CPU, and the explanation is fairly short. I see in the repository that you did actually profile the code using NVTX/nsys/ncu, so what were the conclusions from that work? What makes the GPU implementation slow (memory copy, kernels, etc)? Launch overhead shouldn't be it when we're doing >1ms of processing.

The charts in the paper also show some suspicious processing time drops at the highest sampling frequency, as well as some generally very noisy measurements for certain GPU implementations. What's going on there?

It also wasn't clear to me whether your measurements are fully end-to-end. I notice you're using BenchmarkTools, so you're not 'cheating' by only measuring kernel execution time, but do your measurements include the time to upload memory to the GPU and download results back (if that matters)?
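For reference, a minimal sketch of what such an end-to-end measurement could look like with BenchmarkTools and CUDA.jl (this is not the paper's actual harness; the `abs2.` broadcast is just a stand-in for the tracking workload, and a CUDA-capable device is required):

```julia
using BenchmarkTools, CUDA

signal = rand(ComplexF32, 2_500_000)   # host-side input

# CUDA.@sync blocks until the GPU has finished, so the timing covers
# the upload, the kernel(s), and the download -- not just the
# asynchronous kernel launches.
@benchmark CUDA.@sync begin
    d = CuArray($signal)   # host -> device upload
    r = abs2.(d)           # placeholder for the tracking kernels
    Array(r)               # device -> host download
end
```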

Finally, the template asks for a comparison to approaches with similar goals. Do you know of any?

Paper

In general, the paper is nicely written 👍 There are a couple of typos (s/tieing/tying/, s/hadrware/hardware/), so I'd recommend going through it with a LaTeX-aware spell checker.

The template does ask for a more explicit 'statement of need' though, so you should extract some of the introduction into such a section. As part of that, I would elaborate some more on the need for GPU power in the context of SDR/GNSS processing, as that's only mentioned in passing.

I would also suggest using \texttt for code-like names instead of quoting them as you currently do (e.g. an evaluation of the "cplx_multi" reduction).
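Concretely, the change would be along these lines (note that the underscore needs escaping in text mode):

```latex
% current: quoting the identifier
an evaluation of the ``cplx_multi'' reduction
% suggested: monospace via \texttt
an evaluation of the \texttt{cplx\_multi} reduction
```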

Finally, there's some missing DOIs that the bot spotted in the other thread.

Now for some more detailed comments:

  1. Methodology

I'd mention the CUDA.jl (3.8) and Julia (1.7) versions.

You mention 'this ensures a coalesced memory access ... discussed in more detail later', but I don't actually find those details later in the paper.

  2. Algorithm

Wrt. device limits, I would mention that each dimension has its own limit, and then elaborate that thread blocks are special because there's an additional limit on x*y*z.

The grid limit is currently shown in the overview table, and isn't terribly interesting, especially because it's fairly stable: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications-technical-specifications-per-compute-capability
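As an aside, these limits can be queried at runtime via CUDA.jl's device-attribute API (a CUDA-capable device is required; typical values shown in the comments):

```julia
using CUDA

dev = device()
# Each block dimension has its own limit...
attribute(dev, CUDA.DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X)        # typically 1024
attribute(dev, CUDA.DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y)        # typically 1024
attribute(dev, CUDA.DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z)        # typically 64
# ...but the product x*y*z is additionally capped:
attribute(dev, CUDA.DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK)  # typically 1024
```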

s/first-grade support/first-class support/

About the parallel reduction, it's unclear to me if you're describing CUDA's existing mapreducedim here or whether this is a new implementation or extension. I think it would be good to elaborate on that.

You also mention cooperative groups, but I don't see those used in the code? Note that, despite what your explanation suggests, they don't fully obviate a multi-pass reduction: you need to be able to fit all thread blocks on the device at once when doing a grid-wide sync.

  3. Experiment Evaluation

Generally, see the comments on performance above. It would be good to put some of those clarifications in the paper.
