Giter Club home page Giter Club logo

x86-simd-sort's Introduction

x86-simd-sort

C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type sorting algorithms on x86 processors. Source header files are available in src directory. We currently only have AVX-512 based implementation of quicksort, argsort, quickselect, paritalsort and key-value sort. This repository also includes a test suite which can be built and run to test the sorting algorithms for correctness. It also has benchmarking code to compare its performance relative to std::sort. The following API's are currently supported:

Quicksort

void avx512_qsort<T>(T* arr, int64_t arrsize)

Supported datatypes: uint16_t, int16_t, _Float16, uint32_t, int32_t, float, uint64_t, int64_t and double.

For floating-point types, if arr contains NaNs, they are moved to the end and replaced with a quiet NaN. That is, the original, bit-exact NaNs in the input are not preserved.

Argsort

std::vector<int64_t> arg = avx512_argsort<T>(T* arr, int64_t arrsize)
void avx512_argsort<T>(T* arr, int64_t *arg, int64_t arrsize)

Supported datatypes: uint32_t, int32_t, float, uint64_t, int64_t and double.

The algorithm resorts to scalar std::sort if the array contains NaNs.

Quickselect

void avx512_qselect<T>(T* arr, int64_t arrsize)
void avx512_qselect<T>(T* arr, int64_t arrsize, bool hasnan)

Supported datatypes: uint16_t, int16_t, _Float16, uint32_t, int32_t, float, uint64_t, int64_t and double.

For floating-point types, if bool hasnan is set, NaNs are moved to the end of the array, preserving the bit-exact NaNs in the input. If NaNs are present but hasnan is false, the behavior is undefined.

Partialsort

void avx512_partial_qsort<T>(T* arr, int64_t arrsize)
void avx512_partial_qsort<T>(T* arr, int64_t arrsize, bool hasnan)

Supported datatypes: uint16_t, int16_t, _Float16, uint32_t, int32_t, float, uint64_t, int64_t and double.

For floating-point types, if bool hasnan is set, NaNs are moved to the end of the array, preserving the bit-exact NaNs in the input. If NaNs are present but hasnan is false, the behavior is undefined.

Key-value sort

void avx512_qsort_kv<T>(T* key, uint64_t* value , int64_t arrsize)

Supported datatypes: uint64_t, int64_t and double

Algorithm details

The ideas and code are based on these two research papers [1] and [2]. On a high level, the idea is to vectorize quicksort partitioning using AVX-512 compressstore instructions. If the array size is < 128, then use Bitonic sorting network implemented on 512-bit registers. The precise network definitions depend on the size of the dtype and are defined in separate files: avx512-16bit-qsort.hpp, avx512-32bit-qsort.hpp and avx512-64bit-qsort.hpp. Article [4] is a good resource for bitonic sorting network. The core implementations of the vectorized qsort functions avx512_qsort<T>(T*, int64_t) are modified versions of avx2 quicksort presented in the paper [2] and source code associated with that paper [3].

Example to include and build this in a C++ code

Sample code main.cpp

#include "src/avx512-32bit-qsort.hpp"

int main() {
    const int ARRSIZE = 1000;
    std::vector<float> arr;

    /* Initialize elements is reverse order */
    for (int ii = 0; ii < ARRSIZE; ++ii) {
        arr.push_back(ARRSIZE - ii);
    }

    /* call avx512 quicksort */
    avx512_qsort(arr.data(), ARRSIZE);
    return 0;
}

Build using gcc

g++ main.cpp -mavx512f -mavx512dq -O3

This is a header file only library and we do not provide any compile time and run time checks which is recommended while including this your source code. A slightly modified version of this source code has been contributed to NumPy (see this pull request for details). This NumPy pull request is a good reference for how to include and build this library with your source code.

Build requirements

None, its header files only. However you will need make or meson to build the unit tests and benchmarking suite. You will need a relatively modern compiler to build.

gcc >= 8.x

Build using Meson

meson is the recommended build system to build the test and benchmark suite.

meson setup builddir && cd builddir && ninja

It build two executables:

  • testexe: runs a bunch of tests written in ./tests directory.
  • benchexe: measures performance of these algorithms for various data types.

Build using Make

Makefile uses -march=sapphirerapids as a global compile flag and hence it will require g++-12. make command builds two executables:

  • testexe: runs a bunch of tests written in ./tests directory.
  • benchexe: measures performance of these algorithms for various data types and compares them to std::sort.

You can use make test and make bench to build just the testexe and benchexe respectively.

Requirements and dependencies

The sorting routines relies only on the C++ Standard Library and requires a relatively modern compiler to build (gcc 8.x and above). Since they use the AVX-512 instruction set, they can only run on processors that have AVX-512. Specifically, the 32-bit and 64-bit require AVX-512F and AVX-512DQ instruction set. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VMBI2 instruction set. The test suite is written using the Google test framework. The benchmark is written using the google benchmark framework.

References

x86-simd-sort's People

Contributors

r-devulap avatar sterrettm2 avatar mosullivan93 avatar bkmgit avatar rdower avatar zbjornson avatar ruclz avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.