
ruapu's Introduction

ruapu


Detect CPU ISA features with a single file

CPU
✅ x86, x86-64
✅ arm, aarch64
✅ mips
✅ powerpc
✅ s390x
✅ loongarch
✅ risc-v
✅ openrisc

OS
✅ Windows
✅ Linux
✅ macOS
✅ Android
✅ iOS
✅ FreeBSD
✅ NetBSD
✅ OpenBSD
✅ DragonflyBSD
✅ Solaris
✅ SyterKit

Compiler
✅ GCC
✅ Clang
✅ MSVC
✅ MinGW

#include <stdio.h>

#define RUAPU_IMPLEMENTATION
#include "ruapu.h"

int main()
{
    // initialize ruapu once
    ruapu_init();

    // now, tell me if this cpu has avx2
    int has_avx2 = ruapu_supports("avx2");
    fprintf(stderr, "avx2 = %d\n", has_avx2);

    // loop all supported features (the list is NULL-terminated)
    const char* const* supported = ruapu_rua();
    while (*supported)
    {
        fprintf(stderr, "%s\n", *supported);
        supported++;
    }

    return 0;
}

Best practice for using ruapu.h in multiple compilation units

  1. Create one ruapu.c for your project
  2. ruapu.c contains ONLY #define RUAPU_IMPLEMENTATION and #include "ruapu.h"
  3. Other sources #include "ruapu.h" WITHOUT #define RUAPU_IMPLEMENTATION
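These rules follow the common single-header ("stb-style") convention: the header always provides declarations, while the function bodies are compiled only where the implementation macro is defined, so defining it in two units would cause duplicate-symbol link errors. The sketch below shows that pattern in one file using a hypothetical library "mini.h"; the names MINI_IMPLEMENTATION and mini_answer() are made up for illustration and are not the contents of ruapu.h.

```c
/* Sketch of the one-implementation-unit pattern, using a hypothetical
 * single-header library "mini.h" inlined into this file.
 *
 * mini.c (exactly ONE file in the project) would contain only:
 *     #define MINI_IMPLEMENTATION
 *     #include "mini.h"
 * every other source file would contain only:
 *     #include "mini.h"
 */
#define MINI_IMPLEMENTATION  /* this file plays the role of mini.c */

/* ---- begin hypothetical "mini.h" ---- */
#ifndef MINI_H
#define MINI_H
/* Declarations: any number of translation units may see these. */
int mini_answer(void);
#endif /* MINI_H */

#if defined(MINI_IMPLEMENTATION) && !defined(MINI_IMPLEMENTED)
#define MINI_IMPLEMENTED
/* Definitions: emitted only where MINI_IMPLEMENTATION is defined, so a
 * second unit defining it would give a duplicate-symbol link error. */
int mini_answer(void)
{
    return 42;
}
#endif
/* ---- end hypothetical "mini.h" ---- */
```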

Features

  • Detect CPU ISA with a single file: sse2, avx, avx512f, neon, etc.
  • Detect vendor-extended ISA: Apple AMX, RISC-V vendor ISA, etc.
  • Detect richer ISA on Windows ARM: IsProcessorFeaturePresent() returns little ISA information
  • Detect x86-avx512 on macOS correctly: macOS hides it in cpuid
  • Detect a new CPU's ISA on old systems: such ISAs are usually not exposed in auxv or MISA
  • Detect hidden CPU ISA: fma4 on zen1, ISA in a hypervisor, etc.

Supported ISA (more are coming... :)

CPU: ISA
x86: mmx sse sse2 sse3 ssse3 sse41 sse42 sse4a xop avx f16c fma fma4 avx2 avx512f avx512bw avx512cd avx512dq avx512vl avx512vnni avx512bf16 avx512ifma avx512vbmi avx512vbmi2 avx512fp16 avx512er avx5124fmaps avx5124vnniw avxvnni avxvnniint8 avxvnniint16 avxifma amxfp16 amxbf16 amxint8 amxtile aesni sha
arm: half edsp neon vfpv4 idiv
aarch64: neon vfpv4 lse cpuid asimdrdm asimdhp asimddp asimdfhm bf16 i8mm frint jscvt fcma mte mte2 sve sve2 svebf16 svei8mm svef32mm svef64mm sme smef16f16 smef64f64 smei64i64 pmull crc32 aes sha1 sha2 sha3 sha512 sm3 sm4 svepmull svebitperm sveaes svesha3 svesm4 amx
mips: msa mmi sx asx msa2 crypto
powerpc: vsx
s390x: zvector
loongarch: lsx lasx
risc-v: i m a f d c v zba zbb zbc zbs zbkb zbkc zbkx zcb zfa zfbfmin zfh zfhmin zicond zicsr zifencei zmmul zvbb zvbc zvfh zvfhmin zvfbfmin zvfbfwma zvkb zvl32b zvl64b zvl128b zvl256b zvl512b zvl1024b xtheadba xtheadbb xtheadbs xtheadcondmov xtheadfmemidx xtheadfmv xtheadmac xtheadmemidx xtheadmempair xtheadsync xtheadvdot spacemitvmadot spacemitvmadotn spacemitvfmadot
openrisc: orbis32 orbis64 orfpx32 orfpx64 orvdx64

Let's ruapu

ruapu with C

Compile ruapu test program

# GCC / MinGW
gcc main.c -o ruapu
# Clang
clang main.c -o ruapu
# MSVC
cl.exe /Fe: ruapu.exe main.c

Run ruapu from the command line

./ruapu
mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
... more lines omitted ...

ruapu with Python

Compile and install ruapu library

# from pypi
pip3 install ruapu
# from source code
pip3 install ./python

Use ruapu in Python

import ruapu

ruapu.supports("avx2")
# True

ruapu.supports(isa="avx2")
# True

ruapu.rua()
# ('mmx', 'sse', 'sse2', 'sse3', 'ssse3', 'sse41', 'sse42', 'avx', 'f16c', 'fma', 'avx2')

ruapu with Rust

Compile ruapu library

# from source code
cd rust
cargo build --release

Use ruapu in Rust

extern crate ruapu;

fn main() {
    println!("supports neon: {}", ruapu::supports("neon").unwrap());
    println!("supports avx2: {}", ruapu::supports("avx2").unwrap());
    println!("rua: {:?}", ruapu::rua());
}

ruapu with Lua

Compile ruapu library

# from source code
cd lua
# lua binding has been tested on Lua 5.2~5.4
luarocks make

Use ruapu in Lua

ruapu = require "ruapu";
print(ruapu.supports("mmx"));
for _, ext in ipairs(ruapu.rua()) do
    print(ext);
end

ruapu with Erlang

Compile ruapu library

% add this to deps list 
% in your rebar.config
{ruapu, "0.1.0"}

Use ruapu in the Erlang rebar3 shell

> ruapu:rua().
{ok,["neon","vfpv4","asimdrdm","asimdhp","asimddp",
     "asimdfhm","bf16","i8mm","pmull","crc32","aes","sha1",
     "sha2","sha3","sha512","amx"]}
> ruapu:supports("neon").
true
> ruapu:supports(neon).
true
> ruapu:supports(<<"neon">>).
true
> ruapu:supports("avx2").
false
> ruapu:supports(avx2).
false
> ruapu:supports(<<"avx2">>).
false

ruapu with Fortran

Compile ruapu library

# from source code
cd fortran
cmake -B build
cmake --build build

Use ruapu in Fortran

program main
    use ruapu, only: ruapu_init, ruapu_supports, ruapu_rua
    implicit none

    character(len=:), allocatable :: isa_supported(:)
    integer :: i

    call ruapu_init()

    print *, "supports sse: ", ruapu_supports("sse")
    print *, "supports neon: ", ruapu_supports("neon")

    isa_supported = ruapu_rua()
    do i = 1, size(isa_supported)
        print *, trim(isa_supported(i))
    end do
end program main

ruapu with Golang

Compile ruapu library

cd go
go build -o ruapu-go

Use ruapu in Golang

package main

import (
	"fmt"
	"ruapu-go/ruapu"
	"strconv"
)

func main() {
	ruapu.Init()
	avx2Status := ruapu.Supports("avx2")
	fmt.Println("avx2:" + strconv.Itoa(avx2Status))
	rua := ruapu.Rua()
	fmt.Println(rua)
}

ruapu with Haskell

Add ruapu library to your project

Copy haskell/Ruapu.hs, haskell/ruapu.c and ruapu.h into your project.

Use ruapu in Haskell

import Ruapu
-- Ruapu.rua :: IO [String]
-- Ruapu.supports :: String -> IO Bool
main = do
    Ruapu.init
    Ruapu.supports "mmx" >>= print
    Ruapu.rua >>= mapM_ putStrLn

ruapu with Vlang

Compile ruapu library

cd vlang
v .

Use ruapu in Vlang

module main

import ruapu

fn main() {
    ruapu.ruapu_init()
    avx2_status := ruapu.ruapu_supports('avx2')
    if avx2_status {
        println('avx2: ' + avx2_status.str())
    }

    println(ruapu.ruapu_rua())
}

ruapu with Pascal

Compile ruapu library

cd pascal
sudo apt install fpc
cmake .
make
fpc ruapu.lpr

Use ruapu in Pascal

program ruapu;

uses ruapu_pascal;

var
  has_avx2: integer;
  supported: PPAnsiChar;
begin
  // initialize ruapu once
  ruapu_init();

  // now, tell me if this cpu has avx2
  has_avx2 := ruapu_supports('avx2');

  // loop all supported features
  supported := ruapu_rua();
  while supported^ <> nil do
  begin
      writeln(supported^);
      inc(supported);
  end;

  readln();
end.
      

ruapu with Java

Compile ruapu library and example

./gradlew build

Run example

java -cp \
    ./build/libs/ruapu-1.0-SNAPSHOT.jar \
    ./Example.java

Use ruapu in Java

import ruapu.Ruapu;
import java.util.*;

class Example {
    public static void main(String[] args) {
        Ruapu ruapu = new Ruapu();
        
        System.out.println("avx: " + ruapu.supports("avx")); 
        // avx: 1
        System.out.println(Arrays.toString(ruapu.rua())); 
        // [mmx, sse, sse2, sse3, ssse3, sse41, sse42, avx, f16c, fma, avx2]
    }
}
      

ruapu with Cangjie

Compile ruapu library

cd cangjie
cd c-src
cmake .
make

Run the example

cd cangjie
cjpm run

Or compile the example and run it

cd cangjie
cjpm build
./target/release/bin/main

Use ruapu in Cangjie

import ruapu.*
main(): Int64 {
    ruapu_init()
    let neon_supported = ruapu_supports("neon")
    println("supports neon: ${neon_supported}") 
    let d = ruapu_rua()
    for (i in d) {
        println(i)
    }
    return 0
}

GitHub-hosted runner result (Linux)
mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
avx = 1
f16c = 1
fma = 1
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avx512er = 0
avx5124fmaps = 0
avx5124vnniw = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0
amxfp16 = 0
amxbf16 = 0
amxint8 = 0
amxtile = 0

GitHub-hosted runner result (macOS)
mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 0
xop = 0
avx = 1
f16c = 1
fma = 1
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avx512er = 0
avx5124fmaps = 0
avx5124vnniw = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0
amxfp16 = 0
amxbf16 = 0
amxint8 = 0
amxtile = 0

GitHub-hosted runner result (macOS M1)
neon = 1
vfpv4 = 1
cpuid = 0
asimdhp = 1
asimddp = 1
asimdfhm = 1
bf16 = 0
i8mm = 0
sve = 0
sve2 = 0
svebf16 = 0
svei8mm = 0
svef32mm = 0

GitHub-hosted runner result (Windows)
mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
avx = 1
f16c = 1
fma = 1
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avx512er = 0
avx5124fmaps = 0
avx5124vnniw = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0
amxfp16 = 0
amxbf16 = 0
amxint8 = 0
amxtile = 0

FreeBSD/NetBSD/OpenBSD VM result (x86_64)
mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
avx = 1
f16c = 1
fma = 1
fma4 = 0
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avx512er = 0
avx5124fmaps = 0
avx5124vnniw = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0
amxfp16 = 0
amxbf16 = 0
amxint8 = 0
amxtile = 0

Techniques inside ruapu

ruapu is implemented in C to ensure the widest possible portability.

ruapu determines whether the CPU supports an instruction set by trying to execute one of its instructions and detecting whether an illegal instruction exception occurs. It does not rely on the cpuid instruction or on architecture-specific feature registers, nor on MISA information or operating system calls. This lets ruapu obtain more detailed CPU ISA information than those sources expose.
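On POSIX systems, the probing idea can be sketched with sigaction() and sigsetjmp()/siglongjmp(): install a temporary SIGILL handler, run a candidate instruction, and jump back if the CPU rejects it. This is a simplified illustration, not ruapu's own code, and Windows would presumably need a different mechanism such as structured exception handling. To keep the sketch runnable on any machine, the hypothetical probe_always_faults() stands in for an unsupported instruction by calling raise(SIGILL); a real probe would execute the actual instruction encoding.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf g_probe_env;

/* SIGILL handler: abandon the faulting probe and jump back into detect_isa(). */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(g_probe_env, 1);
}

/* Runs probe() once; returns 1 if it completes, 0 if it raises SIGILL. */
static int detect_isa(void (*probe)(void))
{
    struct sigaction sa;
    struct sigaction old_sa;
    int supported;

    assert(probe != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old_sa) != 0)
        return 0; /* could not install the handler: report "unsupported" */

    /* savemask = 1, so the SIGILL blocked during the handler is unblocked
     * again when siglongjmp() returns here. */
    if (sigsetjmp(g_probe_env, 1) == 0)
    {
        probe();       /* a real probe executes the candidate instruction */
        supported = 1; /* reached only if no SIGILL was delivered */
    }
    else
    {
        supported = 0; /* arrived here via siglongjmp() from on_sigill() */
    }

    sigaction(SIGILL, &old_sa, NULL); /* restore the previous handler */
    return supported;
}

/* Hypothetical stand-ins so the sketch runs on any CPU. */
static void probe_always_works(void)
{
    /* a real probe would execute an instruction the CPU is known to have */
}

static void probe_always_faults(void)
{
    raise(SIGILL); /* simulates executing an unsupported instruction */
}
```

In this sketch detect_isa(probe_always_works) returns 1 and detect_isa(probe_always_faults) returns 0. Probing once at startup and caching the results, as the "initialize ruapu once" comment in the C example suggests ruapu_init() does, keeps the signal-handling cost out of later feature queries.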

FAQ

Why is the project named ruapu?

ruapu is short for rua-cpu, which means harassing and amusing the CPU (rua!) with various extended instructions. Whether the CPU reacts violently (throws an illegal instruction exception) tells us whether it supports a given extended instruction set.

Why is the ruapu API designed like this?

We consider the GCC builtin functions, such as __builtin_cpu_init() and __builtin_cpu_supports(), to be good practice. ruapu follows this design, so it can serve as a 1:1 replacement for the GCC functions while supporting more operating systems and compilers, which gives it better portability.
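Because the two APIs line up one-to-one, a project can hide the choice behind a small wrapper. The sketch below uses hypothetical macro names CPU_INIT()/CPU_HAS() and a hypothetical USE_RUAPU switch. The GCC/Clang branch is limited to x86 here, where __builtin_cpu_supports() is well established and requires a string literal argument; the ruapu branch assumes only the ruapu_init()/ruapu_supports() calls shown in the C example above.

```c
#include <stdio.h>

#if defined(USE_RUAPU)
/* ruapu backend: works with the compilers listed above; isa may be a runtime string. */
#define RUAPU_IMPLEMENTATION
#include "ruapu.h"
#define CPU_INIT() ruapu_init()
#define CPU_HAS(isa) (ruapu_supports(isa) != 0)
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC/Clang builtin backend (x86 only in this sketch); isa must be a string literal. */
#define CPU_INIT() __builtin_cpu_init()
#define CPU_HAS(isa) (__builtin_cpu_supports(isa) != 0)
#else
#error "no CPU feature-detection backend for this compiler/target"
#endif

/* Prints "name = 0/1" lines in the same style as the ruapu test program output. */
static void report_simd(void)
{
    CPU_INIT();
    printf("sse2 = %d\n", CPU_HAS("sse2"));
    printf("avx2 = %d\n", CPU_HAS("avx2"));
}
```

Both backends are normalized to 0/1 with != 0, since __builtin_cpu_supports() only promises a nonzero value for a supported feature. On MSVC, or on non-x86 targets, only the ruapu branch compiles, which is the portability gap the answer above describes.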

Why does SIGILL occur when running under a debugger or simulator, such as gdb, lldb, qemu-user, or sde?

Debuggers and simulators intercept the signal by default and stop before ruapu's signal handler runs. You can simply continue execution at that point, or configure the tool to let the signal through, for example with handle SIGILL nostop in gdb. ruapu technically cannot prevent programs from stopping under debuggers and emulators.

How to add detection capabilities for new instructions to ruapu

Assume that the new extended instruction set is named rua

  1. Add RUAPU_INSTCODE(rua, rua-inst-hex), where rua-inst-hex is the hex encoding of a sample rua instruction such as rua r0,r0, and RUAPU_ISAENTRY(rua) to ruapu.h
  2. Add PRINT_ISA_SUPPORT(rua) in main.c to print the detection result
  3. Add entries about rua in README.md
  4. Create a pull request!

https://godbolt.org/ is a handy way to view the compiled machine code of instructions.
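The registration steps above suggest a table that pairs each ISA name with a probe, plus a print macro that stringifies the name. The sketch below is a hypothetical reconstruction of that shape, not the real RUAPU_INSTCODE, RUAPU_ISAENTRY or PRINT_ISA_SUPPORT expansions: the isa_entry type, the demo_* functions, DEMO_ISAENTRY and DEMO_PRINT_ISA_SUPPORT are made up, and the probes are stubs. The lookups only mirror the behaviour documented earlier: ruapu_supports() answers 0/1 for a name, and ruapu_rua() returns a NULL-terminated list of supported names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stub probes; real ones would execute an instruction encoding. */
static int probe_rua(void) { return 1; }  /* pretend the new "rua" ISA works */
static int probe_nope(void) { return 0; } /* pretend this ISA faults */

struct isa_entry
{
    const char* name;
    int (*probe)(void);
    int supported; /* filled in by demo_init() */
};

/* One line per ISA: adding an ISA means adding one entry here. */
#define DEMO_ISAENTRY(isa, fn) { #isa, fn, 0 },
static struct isa_entry g_isa_table[] = {
    DEMO_ISAENTRY(rua, probe_rua)
    DEMO_ISAENTRY(nope, probe_nope)
};
#define ISA_COUNT (sizeof(g_isa_table) / sizeof(g_isa_table[0]))

/* NULL-terminated list of supported names, shaped like ruapu_rua()'s result. */
static const char* g_supported[ISA_COUNT + 1];

/* Runs every probe once and records the results. */
static void demo_init(void)
{
    size_t n = 0;
    for (size_t i = 0; i < ISA_COUNT; i++)
    {
        g_isa_table[i].supported = g_isa_table[i].probe();
        if (g_isa_table[i].supported)
            g_supported[n++] = g_isa_table[i].name;
    }
    g_supported[n] = NULL;
}

/* Returns 1 if the named ISA was detected, 0 if not detected or unknown. */
static int demo_supports(const char* isa)
{
    for (size_t i = 0; i < ISA_COUNT; i++)
        if (strcmp(g_isa_table[i].name, isa) == 0)
            return g_isa_table[i].supported;
    return 0;
}

static const char* const* demo_rua(void)
{
    return g_supported;
}

/* Stringifies the ISA token so it serves as both the label and the lookup key. */
#define DEMO_PRINT_ISA_SUPPORT(isa) printf("%s = %d\n", #isa, demo_supports(#isa))
```

After demo_init(), DEMO_PRINT_ISA_SUPPORT(rua) prints "rua = 1", and demo_rua() yields a list containing only "rua". In this sketch a new ISA costs one probe plus one table line, which appears to be the property the contribution steps rely on.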

Repos that use ruapu

  • ncnn: High-performance neural network inference framework
  • libllm: Efficient inference of large language models


License

MIT License

ruapu's People

Contributors

alexhjh, apachiww, bakacai, cocoa-xu, cyyself, deepdive543443, dependabot[bot], dreamcmi, dtcxzyw, eastonman, junchao-loongson, kernelbin, ljoson, mizu-bai, mollysophia, monkeyking, nihui, scarsty, strongtz, synodriver, tianzerl, whyb, yoh-z, yuzukitsuru, zchrissirhcz, ziyao233


ruapu's Issues

more information

Will the library expose more CPU information in the future, or only ISA information? For example, I would like to query the CPU's cache sizes.

Please add more AI-related x64 ISAs

AVX512(Intel® Advanced Vector Extensions 512)

AVX512_BITALG (expect to add)
AVX512_VPOPCNTDQ (expect to add)
AVX512_VP2INTERSECT (expect to add)

AVX512PF (Intel® Xeon Phi™ only.)
AVX512ER (Intel® Xeon Phi™ only.)
AVX512_4VNNIW (Intel® Xeon Phi™ only.)
AVX512_4FMAPS (Intel® Xeon Phi™ only.)

AVX (VEX-encoded) versions of the Vector Neural Network Instructions

AVX512_VNNI (already exists)
AVX_VNNI (already exists)
AVX_VNNI_INT8 (already exists)
AVX_VNNI_INT16 (expect to add)

Intel® AMX(Intel® Advanced Matrix Extensions)

AMX_FP16 (expect to add)
AMX_COMPLEX (expect to add)
AMX_BF16 (expect to add)
AMX_TILE (expect to add)
AMX_INT8 (expect to add)

Reference: Intel Architecture Instruction Set Extensions Programming Reference

It seems that ruapu breaks the std::chrono clock on mingw64

I'm using mingw64 with GCC 13.2 on Windows 11. After calling ruapu_init(), std::chrono no longer measures time correctly and the clock returns nan.

mingw64 with Clang has the same issue; see the example on godbolt.


#include <chrono>
#include <cstdio>

#define RUAPU_IMPLEMENTATION
#include "ruapu.h"

struct Timer {
    std::chrono::time_point<std::chrono::high_resolution_clock> start = std::chrono::high_resolution_clock::now();
    ~Timer()
    {
        auto end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> duration = end - start;
        std::printf("Timer took %lfms\n", duration.count());
    }
};

int main()
{
    ruapu_init();
    {
        Timer timer{};
    }
    return 0;
}


Segmentation fault on s390x with `-O0`

A weird segmentation fault occurs on an s390x machine when compiled with -O0 (the default optimization level). It works fine with any other optimization level, such as -O3 or -Os.

$ gcc main.c -o ruapu
$ ./ruapu
Segmentation fault (core dumped)
$ gcc main.c -O3 -o ruapu
$ ./ruapu
zvector = 1
$ gcc main.c -Os -o ruapu
$ ./ruapu
zvector = 1

Tested on an s390x virtual machine on IBM Cloud.

GCC version:

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/s390x-linux-gnu/11/lto-wrapper
Target: s390x-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=s390x-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch=z13 --with-tune=z15 --enable-s390-excess-float-precision --with-long-double-128 --enable-multilib --enable-checking=release --build=s390x-linux-gnu --host=s390x-linux-gnu --target=s390x-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
