agenium-scale / nsimd
Agenium Scale vectorization library for CPUs and GPUs
License: MIT License
Hey! I am working on a research paper that has benchmarks that use NSIMD for vectorization. I use the open-source version. I could not find this repository on JOSS or Zenodo.
Is there a set way to cite your repository? How do you guys prefer to be cited in a paper?
-Thanks
There is no copysign intrinsic; it should be straightforward to add.
The test case generator generates special code for rsqrt11. It should probably do the same for rsqrt8.
I find the non-standard flipsign function (https://docs.julialang.org/en/v1/base/math/#Base.flipsign) convenient, e.g. to implement upwind finite differencing stencils. It can be defined as
flipsign(x, y) = copysign(1, y) * x
but can be implemented more efficiently as
flipsign(x, y) = x ^ (y & SIGNMASK)
It seems that I need to add -DNSIMD_AVX2 -DNSIMD_FMA when I compile.
Instead, nsimd should be able to detect this from CPU flags at compile time, or remember how it was configured with cmake.
cbrt is an ANSI C math function. It would be convenient if it were supported in nsimd, instead of having to fall back to scalar code.
The variable NSIMD_CXX is misused with the MSVC compiler. E.g. NSIMD_CXX == 2014 for Visual Studio 2019, so one cannot check values like NSIMD_CXX < 201103L to switch on allocator flavor.
The mechanism used in the get_impl functions, which stores C code for intrinsics in the impls dictionary, regenerates code for all (!) the intrinsics every time code for one of them is generated. This slows down code generation.
To see this effect, you need to disable clang-format, which is otherwise the slowest part of code generation.
I suggest storing lambdas in the impls dictionary, so that code generation is delayed until a dictionary entry is actually used.
Hello,
We are facing a compilation issue using nsimd on a 32-bit target (VS2015 toolset v140_xp):
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/nsimd.h(419): error C3861: '__popcnt64': identifier not found
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse2/storeu.h(47): error C2719: 'a1': formal parameter with requested alignment of 16 will not be aligned
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse2/storeu.h(75): error C2719: 'a1': formal parameter with requested alignment of 16 will not be aligned
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse42/storeu.h(47): error C2719: 'a1': formal parameter with requested alignment of 16 will not be aligned
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse42/storeu.h(75): error C2719: 'a1': formal parameter with requested alignment of 16 will not be aligned
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse2/eq.h(47): error C2719: 'a0': formal parameter with requested alignment of 16 will not be aligned
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse2/eq.h(47): error C2719: 'a1': formal parameter with requested alignment of 16 will not be aligned
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse2/eq.h(56): error C2719: 'a0': formal parameter with requested alignment of 16 will not be aligned
1>D:\Projets\DevSources\platform_sw_nsimd\sw_pico..\common\misc_lib\nsimd\include\nsimd/x86/sse2/eq.h(56): error C2719: 'a1': formal parameter with requested alignment of 16 will not be aligned
The first line (error C3861: '__popcnt64': identifier not found) makes me think that this has never been tested on such a target.
Thanks in advance
It seems that nsimd defines types i64 etc. in the global namespace (see file nsimd.h, lines 793 ff.). These should probably be prefixed with nsimd_.
// C
int is_aligned(const void* const ptr) {
#if NSIMD_WORD_SIZE == 32
  const u32 val = (u32)ptr;
#else
  const u64 val = (u64)ptr;
#endif
  return val % (NSIMD_MAX_ALIGNMENT / CHAR_BIT) == 0;
}

// C++
template <typename T>
bool is_aligned(const T* const ptr) {
#if NSIMD_WORD_SIZE == 32
  u32 val = (u32)ptr;
#else
  u64 val = (u64)ptr;
#endif
  return val % (NSIMD_MAX_ALIGNMENT / CHAR_BIT) == 0;
}
Certain math functions such as copysign can be implemented efficiently with bitwise operations. It would be convenient to have these available:
copysign
isfinite
isinf
isnan
isnormal
signbit
I find the non-standard flipsign function (https://docs.julialang.org/en/v1/base/math/#Base.flipsign) convenient, e.g. to implement upwind finite differencing stencils. It can be defined as
flipsign(x, y) = copysign(1, y) * x
but can be implemented more efficiently as
flipsign(x, y) = x ^ (y & SIGNMASK)
While trying the following code:
#include <nsimd/nsimd-all.hpp>
#include <vector>
int main() {
  std::vector<nsimd::pack<float> > vect;
  return 0;
}
I run into a compilation error:
gupta2@juawei-a27:~/codes/test_codes$ armclang++ -DNSIMD_SVE -march=armv8-a+sve -ftree-vectorize -I$HOME/install/arm/nsimd/include -L/$HOME/install/arm/nsimd/lib sve.cpp
In file included from sve.cpp:2:
In file included from /opt/ohpc/pub/ARM/opt/arm/gcc-8.2.0_Generic-AArch64_RHEL-7_aarch64-linux/lib/gcc/aarch64-linux-gnu/8.2.0/../../../../include/c++/8.2.0/vector:64:
/opt/ohpc/pub/ARM/opt/arm/gcc-8.2.0_Generic-AArch64_RHEL-7_aarch64-linux/lib/gcc/aarch64-linux-gnu/8.2.0/../../../../include/c++/8.2.0/bits/stl_vector.h:286:35: error: arithmetic on a pointer to an incomplete type
'nsimd::pack<float, 1, nsimd::sve>'
_M_impl._M_end_of_storage - _M_impl._M_start);
~~~~~~~~~~~~~~~~~~~~~~~~~ ^
/opt/ohpc/pub/ARM/opt/arm/gcc-8.2.0_Generic-AArch64_RHEL-7_aarch64-linux/lib/gcc/aarch64-linux-gnu/8.2.0/../../../../include/c++/8.2.0/bits/stl_vector.h:391:7: note: in instantiation of member function 'std::_Vector_base<nsimd::pack<float,
1, nsimd::sve>, std::allocator<nsimd::pack<float, 1, nsimd::sve> > >::~_Vector_base' requested here
vector()
^
sve.cpp:6:38: note: in instantiation of member function 'std::vector<nsimd::pack<float, 1, nsimd::sve>, std::allocator<nsimd::pack<float, 1, nsimd::sve> > >::vector' requested here
std::vector<nsimd::pack<float> > vect;
^
In file included from sve.cpp:2:
In file included from /opt/ohpc/pub/ARM/opt/arm/gcc-8.2.0_Generic-AArch64_RHEL-7_aarch64-linux/lib/gcc/aarch64-linux-gnu/8.2.0/../../../../include/c++/8.2.0/vector:62:
/opt/ohpc/pub/ARM/opt/arm/gcc-8.2.0_Generic-AArch64_RHEL-7_aarch64-linux/lib/gcc/aarch64-linux-gnu/8.2.0/../../../../include/c++/8.2.0/bits/stl_construct.h:136:25: error: incomplete type '_Value_type'
(aka 'nsimd::pack<float, 1, nsimd::sve>') used in type trait expression
std::_Destroy_aux<__has_trivial_destructor(_Value_type)>::
^
/opt/ohpc/pub/ARM/opt/arm/gcc-8.2.0_Generic-AArch64_RHEL-7_aarch64-linux/lib/gcc/aarch64-linux-gnu/8.2.0/../../../../include/c++/8.2.0/bits/stl_construct.h:206:7: note: in instantiation of function template specialization
'std::_Destroy<nsimd::pack<float, 1, nsimd::sve> *>' requested here
_Destroy(__first, __last);
^
/opt/ohpc/pub/ARM/opt/arm/gcc-8.2.0_Generic-AArch64_RHEL-7_aarch64-linux/lib/gcc/aarch64-linux-gnu/8.2.0/../../../../include/c++/8.2.0/bits/stl_vector.h:567:7: note: in instantiation of function template specialization
'std::_Destroy<nsimd::pack<float, 1, nsimd::sve> *, nsimd::pack<float, 1, nsimd::sve> >' requested here
std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
^
sve.cpp:6:38: note: in instantiation of member function 'std::vector<nsimd::pack<float, 1, nsimd::sve>, std::allocator<nsimd::pack<float, 1, nsimd::sve> > >::~vector' requested here
std::vector<nsimd::pack<float> > vect;
^
2 errors generated.
How do I work with SVE vector packs?
Hey! I'm using this repository to optimize my code, but I don't know how to use 128-bit registers in my program. I'm compiling and running on Linux, and my CPU is an "Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz". I'm sure it supports SSE2. Here are my operations.
First, I use this command to generate the files:
python3 egg/hatch.py -Af
My CMakeLists.txt looks like this:
set(NSIMD_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/nsimd/include)
ExternalProject_Add(nsimd
SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/nsimd"
BINARY_DIR "${CMAKE_BINARY_DIR}/third_party/nsimd"
CMAKE_CACHE_ARGS "-DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=true"
CMAKE_ARGS "-DCMAKE_INSTALL_PREFIX=${CMAKE_BINARY_DIR}/External/ -DSIMD=SSE2 -DSIMD_OPTIONALS=FMA"
)
My project code looks like this:
#include <nsimd/nsimd-all.hpp>
...some codes...
using BaseType = int8_t;
using PackType = nsimd::pack<BaseType>;
uint64_t packLen = nsimd::len(PackType());
std::cout << packLen << "\n";
The output is "8". Does this mean the pack only holds 64 bits of data? How can I get a pack that holds 128 bits or more? Thank you.
I could not find a way to use the size() function of the C++ pack structure in a constexpr manner. For example, this fails:
constexpr int vsize = pack<double>().size();
because the constructor pack<double>() is not constexpr.
I believe one way to obtain the size of a fixed-size container is via tuple_size. This works e.g. for std::array as well. One could then write
constexpr size_t vsize = std::tuple_size_v<pack<double>>;
While using the right-shift and left-shift bitwise operators, I'm getting an error:
/home/nk/opt/nsimd/include/nsimd/cxx_adv_api_functions.hpp:998:16: error: no matching function for call to ‘shr(const simd_vector&, int&, float, nsimd::cpu)’
998 | ret.car = shr(a0.car, a1, T(), SimdExt());
Here is a minimal program to reproduce the issue:
#include <iostream>
#include <nsimd/nsimd-all.hpp>

int main() {
  nsimd::pack<float> f(42.0f);
  nsimd::pack<float> f2 = nsimd::shr(f, 1);
  std::cout << f2 << std::endl;
  return 0;
}
Is there something I'm doing wrong?
Essentially I wish to change a SIMD vector, say [1, 2, 3, 4], to look like [0, 1, 2, 3] using a right-shift operation.
Does nsimd provide masked store functions (e.g. vstoreu_masked or similar)? I could not find any.
I am maintaining the SIMD library of the Einstein Toolkit (see https://bitbucket.org/cactuscode/cactusutils/src/master/Vectors/), and am interested in exploring a community supported approach. If there are no masked store intrinsics in nsimd, would you be interested in accepting a contribution? Could you provide a few rough guidelines for implementing this?
Hi
So, following the instructions, I ran cmake
cmake .. -DSIMD=AVX2 -DDEV=1 -DBOOST_ROOT=/**/boost_1_72_0 -GNinja
It says:
CMake Warning:
Manually-specified variables were not used by the project:
BOOST_ROOT
DEV
Then ninja -j1 update fails with an "unknown target" error.
However, running:
ninja -j 4 tests
ctest
has worked, apparently successfully: 100% tests passed, 0 tests failed out of 2691.
I was running on macOS. I know it is not a supported target, but I hope it will work since the tests passed.
Hi, I am migrating from boost::simd to nsimd.
It seems that nsimd lacks saturated operators for addition/subtraction (or I missed something).
They would be useful for image processing algorithms.
I will try to add them on my side and make a pull request.
When I check out a new copy of nsimd, configure with
python3 egg/hatch.py --all --force
mkdir build && cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_FLAGS='-D_DARWIN_C_SOURCE' -DCMAKE_CXX_FLAGS='-D_DARWIN_C_SOURCE' -DSIMD=AVX2 -DSIMD_OPTIONALS=FMA -DCMAKE_INSTALL_PREFIX=$HOME/nsimd ..
cmake --build .
cmake --build . --target tests
ctest . -V
then three tests are failing:
The following tests FAILED:
505 - tests.c_base.rec11.f16.c (Failed)
1211 - tests.cxx_adv.rec11.f16.cpp (Failed)
1913 - tests.cxx_base.rec11.f16.cpp (Failed)
The files nsimd.cpp and nsimd-all.cpp output results via printf, e.g. with the lines
for (int i = 0; i < n; i++) {
  fprintf(stdout, "%f vs %f\n", double(buf[i]), double(-i * i));
}
In the file nsimd-all.cpp, this output also differs from the condition that decides whether the test case succeeds or fails (a +1 is missing).
Unless I am missing the correct naming, it looks like nsimd misses some functions like shuffle, split, tofloat, toint.
Are those planned?
Thanks!
When I download the 2.0 release and follow the instructions, I receive this error:
$ bash scripts/build.sh for sse2 sse42 avx avx2 with gcc
+ set -e
+ SETUP_SH=/tmp/nsimd-2.0/scripts/setup.sh
+ NSCONFIG=/tmp/nsimd-2.0/scripts/../nstools/bin/nsconfig
+ HATCH_PY=/tmp/nsimd-2.0/scripts/../egg/hatch.py
+ BUILD_ROOT=/tmp/nsimd-2.0/scripts/..
+ sh /tmp/nsimd-2.0/scripts/setup.sh
+ set -e
+ NSTOOLS_DIR=/tmp/nsimd-2.0/scripts/../nstools
+ NSTOOLS_URL1=git@github.com:agenium-scale/nstools.git
+ NSTOOLS_URL2=https://github.com/agenium-scale/nstools.git
+ '[' -e /tmp/nsimd-2.0/scripts/../nstools/README.md ']'
+ cd /tmp/nsimd-2.0/scripts/..
++ git remote get-url origin
++ sed s/nsimd/nstools/g
fatal: not a git repository (or any of the parent directories): .git
+ git clone
fatal: You must specify a repository to clone.
The pack class in the C++ API does not provide += etc. operators. One has to write x = x + y instead of the shortcut x += y.
Hi, and thanks a lot for this library, and congrats on v2.
I'm facing an issue building nstools on Windows (MSVC 2019 14.28).
For fetching nstools, the URL
git@github.com:agenium-scale/nstools.git
is used, which seems to require a user-specific key.
Changing it to
https://github.com/agenium-scale/nstools
solves the problem.
While building nsimd with -DSIMD=AARCH64 and arm-hpc-compiler, I am getting errors as described here.
I do not face any issues when building with gcc. The errors come up only with clang-based compilers (e.g. arm-hpc-compiler, clang). I see the error with Clang 9.0.1 as well, which is fairly new.
Here is the output of lscpu:
gupta2@juawei-a19:~/2d_stencil/benchmark/builds/arm_trace(master)$ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 4
Vendor ID: ARM
Model: 2
Model name: Cortex-A72
Stepping: r0p2
BogoMIPS: 100.00
L1d cache: 2 MiB
L1i cache: 3 MiB
L2 cache: 16 MiB
L3 cache: 64 MiB
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s): 16-31
NUMA node2 CPU(s): 32-47
NUMA node3 CPU(s): 48-63
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
Here is the output of uname -a:
gupta2@juawei-a19:~/2d_stencil/benchmark/builds/arm_trace(master)$ uname -a
Linux juawei-a19 4.18.0-80.7.2.el7.aarch64 #1 SMP Thu Sep 12 16:13:20 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
The builds were done with the current master.
First, size-agnostic shuffles, such as reverse, unpack, zip, unzip, ..., will be added. Then it seems (to be confirmed) that all architectures supported, or yet to be supported, by NSIMD have SIMD vector lengths that are multiples of 128 bits. Therefore, custom shuffles whose pattern is 128 bits wide can be repeated on all 128-bit lanes of a SIMD vector, which makes those shuffles length-agnostic as well. In any case, one can follow the same reasoning with 64-bit patterns, as SIMD vector lengths are, a priori, multiples of 64 bits, which is sizeof(float64_t) * 8.
When regenerating files, only write those files that changed. This would make it much faster to rebuild everything, especially the tests.
The operator name round_to_even sounds strange. I assume that this is the usual "round to the nearest integer, breaking ties towards even numbers". However, the name reads as if it always rounded towards the nearest even number, i.e. as if it never returned an odd number.
I just installed nsimd (master branch) on OS X, and the include files ended up in $INSTALLDIR/include/include/nsimd. Note the double include/include that probably shouldn't be there.
I notice that nsimd rounds towards zero (instead of towards the nearest representable number) when converting f32 to f16 on AVX512 architectures. See the constant _MM_FROUND_TO_ZERO in the function store in platform_x86.py. All things being equal, nsimd should use the constant _MM_FROUND_TO_NEAREST_INT instead.
It can easily be done using if_else, but we should offer a simpler way of doing it.
I have downloaded and built the master branch of nsimd. This is macOS with an Intel CPU. I configured with
../nstools/bin/nsconfig .. -Dsimd=avx2 -Dmpfr='-I/opt/local/include -L/opt/local/lib -lmpfr'
and then ran the self-tests with
../nstools/bin/nstest -j$(nproc)
This failed with the errors
-- SUMMARY: 8 fails out of 3059 tests
-- FAILED: ./tests.cxx_adv.notl.u8.cpp98
-- FAILED: ./tests.cxx_base.gather.f16.cpp98
-- FAILED: ./tests.cxx_base.gather.f32.cpp98
-- FAILED: ./tests.cxx_base.maskz_loadu1.i8.cpp11
-- FAILED: ./tests.cxx_base.upcvt.u32_to_f64.cpp98
-- FAILED: ./tests.modules.fixed_point.abs.fp_8_7.cpp11
-- FAILED: ./tests.modules.fixed_point.andl.fp_4_1.cpp11
-- FAILED: ./tests.modules.fixed_point.ne.fp_8_4.cpp98
I believe nsimd chose my MacPorts-installed Clang 11.0.0 as the compiler.
Most NSIMD SVE intrinsics generate a movprfx instruction. This is caused by the use of the *_z intrinsics, which put zeros in inactive lanes; the compiler must emit this instruction to generate correct code. But since all SVE intrinsics in NSIMD are used with svptrue_*(), we can simply use the *_x intrinsics, which put undefined values in inactive lanes and do not generate this instruction.
We do not know whether movprfx slows down execution; this is to be tested, but less code to execute seems better (at first glance at least).
For details on movprfx see https://developer.arm.com/documentation/ddi0596/2021-03/SVE-Instructions/MOVPRFX--unpredicated---Move-prefix--unpredicated--
I'm trying to modify nsimd, and I find it difficult to get started since it's not obvious which files are autogenerated and which are not. It would be nice if all autogenerated code was safely "stashed away" into its own subdirectory.
Hi there,
Are you intending on adding support for WebAssembly SIMD?
BFloat16 is a truncated standard float32, so conversion between the two is cheap. This is OK for all supported architectures.
Reference: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format.
I think that mask_for_loop_tail calculates the mask via scalar code, which is then assembled into a mask vector (I checked with AVX2). Using iota() and a vector comparison instead of computing the mask element by element would prevent this.
Currently, only a small part of the basic NSIMD operators are implemented in the fixed_point module. However, most of the other operators, such as multiple loads/stores, zip/unzip, or casts, can easily be wrapped too.
The C++ standard provides fabs, fmax, and fmin, with the same meanings as abs, max, and min. It would be convenient to have these available for nsimd::pack as well.
File gen_tests.py, line 405, is
code += ['nsimd::store{}u(&vout_nsimd[i], vc);'.format(logical, typ)]
which has one {} in the format string but two arguments.
Is there a FindNsimd.cmake file or similar structure to find nsimd through CMake, or a method to ease searching nsimd when using a CMake project?
The C++ standard does not allow using unions to reinterpret data as a different type, e.g. to access the bit pattern of a float (see e.g. https://en.wikipedia.org/wiki/Type_punning#Use_of_union). One has to use memcpy instead.
While GCC allows this as an extension to the C++ standard, other compilers do not. I don't recall exactly which compilers these are, but I have had trouble in the past while using the IBM XL or PGI compilers on non-Intel architectures.
Fix missing documentation for:
to_pack
to_pack_interleave
get_pack
scoped_aligned_mem
NSIMD currently lacks "big" math functions such as cos, sin, exp, and many others. We plan to use the excellent Sleef instead of providing our own, for several reasons:
Hi, I am migrating from boost::simd to nsimd.
It seems that nsimd doesn't provide high-level STL-like algorithms (transform, reduce, etc.).
I could try to implement them on my side.
Questions:
Where to put them?
Where to put the tests?