
Comments (4)

gquintin commented on June 12, 2024

Hi ustcsq,

I am not sure how you get the segfault --> https://godbolt.org/z/z1se7oGe3
To help you, I will need a fully functional code snippet along with its compilation command line.

My advice

Do not write code like this. The code you wrote won't even compile for certain SIMD extensions such as Arm SVE. You have to separate concerns: data structure and computations. You should use a std::vector<float> and, in your computation code, something like this:

template <typename T>
void computations(std::vector<T> &tmp) {
    typedef nsimd::pack<T> vec; // shortcut
    int s = nsimd::len<vec>(); // number of elements of type T in one pack
    int n = (int)tmp.size();  // shortcut
    T *ptr = tmp.data(); // always work with raw pointers
    for (int i = 0; i + s <= n; i += s) {
        vec v = nsimd::loadu<vec>(ptr + i);
        // ... do some work with v
        nsimd::storeu(ptr + i, v);
    }
    // the last n % s elements, if any, are not processed by this loop
}
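The loop above stops at the last full pack, so any leftover elements need separate handling. Here is a minimal, library-free sketch of the same load/compute/store shape with a scalar tail. The names chunk8, kLanes, loadu, storeu and double_all are hypothetical stand-ins for illustration, not nsimd APIs, and the fixed width of 8 floats is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed pack width for this sketch (8 floats, as with a 256-bit register).
const std::size_t kLanes = 8;

// Hypothetical stand-in for a SIMD register; this is NOT nsimd::pack.
struct chunk8 {
    float lane[8];
};

// Unaligned load: memcpy makes no alignment assumption about src.
inline chunk8 loadu(const float *src) {
    chunk8 c;
    std::memcpy(c.lane, src, sizeof(c.lane));
    return c;
}

// Unaligned store, same idea in the other direction.
inline void storeu(float *dst, const chunk8 &c) {
    std::memcpy(dst, c.lane, sizeof(c.lane));
}

// Doubles every element: full chunks first, then a scalar tail.
inline void double_all(std::vector<float> &data) {
    const std::size_t n = data.size();
    float *ptr = data.data(); // work with raw pointers, as advised above
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        chunk8 v = loadu(ptr + i);
        for (std::size_t k = 0; k < kLanes; ++k) {
            v.lane[k] *= 2.0f;
        }
        storeu(ptr + i, v);
    }
    for (; i < n; ++i) {
        ptr[i] *= 2.0f; // leftover elements that do not fill a chunk
    }
}
```

The data stays in a plain std::vector<float>, and the vector type appears only inside the computation, which is the separation of concerns recommended above.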

from nsimd.

ustcsq commented on June 12, 2024

https://godbolt.org/z/z9fMd3o8n
I got the segfault with AVX2 and GCC versions <= 10.3.
Thanks.


gquintin commented on June 12, 2024

It seems to be a bug in those versions of GCC. Disassembling the resulting binary and running it under GDB shows the problem. A lot of people think that using a std::vector<nsimd::pack<float>> or a std::vector<__m256> will be faster because no loads/stores are needed. But this is of course wrong. When you write tmp[0], the compiler generates a load (or store) instruction, and this is where the problem is. You cannot assume that the data is properly aligned, and here it is not, yet the compiler wrongly generated an aligned move (the vmovaps from (%rsi)), as you can see below:

  friend std::ostream &operator<<(std::ostream &os, pack const &a0) {
    12b0:       4c 8d 54 24 08          lea    0x8(%rsp),%r10
    12b5:       48 83 e4 e0             and    $0xffffffffffffffe0,%rsp
        __ostream_insert(__out, __s,
    12b9:       ba 02 00 00 00          mov    $0x2,%edx
    12be:       41 ff 72 f8             pushq  -0x8(%r10)
    12c2:       55                      push   %rbp
    12c3:       48 89 e5                mov    %rsp,%rbp
    12c6:       41 56                   push   %r14
    12c8:       41 55                   push   %r13
    12ca:       41 54                   push   %r12
    12cc:       49 89 fc                mov    %rdi,%r12
    12cf:       41 52                   push   %r10
    12d1:       53                      push   %rbx
    12d2:       48 81 ec 28 01 00 00    sub    $0x128,%rsp
    T buf[max_len_t<T>::value];
    storeu(buf, a0.car, T(), SimdExt());
    12d9:       c5 fc 28 06             vmovaps (%rsi),%ymm0
    12dd:       48 8d 35 20 0d 00 00    lea    0xd20(%rip),%rsi        # 2004 <_IO_stdin_used+0x4>
}

extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm256_storeu_ps (float *__P, __m256 __A)
{
  *(__m256_u *)__P = __A;
    12e4:       c5 fc 29 85 d0 fe ff    vmovaps %ymm0,-0x130(%rbp)
    12eb:       ff 
    12ec:       c5 fc 29 85 b0 fe ff    vmovaps %ymm0,-0x150(%rbp)
    12f3:       ff 
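
The fault comes down to an address check: an aligned 256-bit move such as vmovaps requires its memory operand to sit at a multiple of 32 bytes. A minimal sketch of that check follows, where is_aligned is a hypothetical helper and not part of nsimd:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when p is a multiple of `alignment` bytes.
// `alignment` is assumed to be non-zero.
inline bool is_aligned(const void *p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Calling is_aligned(&tmp[0], 32) on the vector from the failing snippet is one way to confirm whether the storage handed out by the default allocator happens to meet the requirement of the generated instruction.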

To bypass this GCC bug you can do the following:

#include <nsimd/nsimd-all.hpp>
#include <iostream>
#include <vector>

int main() {
    // the allocator's value_type must match the vector's element type
    std::vector<nsimd::pack<float>, nsimd::allocator<nsimd::pack<float> > > tmp(8);
    std::cout << "tmp : " << tmp.size() << ", " << tmp.capacity() << std::endl;
    std::cout << tmp[0] << std::endl;
    return 0;
}

The nsimd::allocator forces proper alignment of the data, so the movaps wrongly generated by GCC is given aligned pointers and works.
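
To illustrate what an aligning allocator does, here is a minimal sketch under stated assumptions. aligned_allocator is a hypothetical class written for this example and is not the implementation of nsimd::allocator. It over-allocates with malloc, rounds the address up to a multiple of Align bytes, and keeps the original pointer just before the block so it can be freed later. Align is assumed to be a power of two no smaller than sizeof(void *).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator for illustration only, not nsimd::allocator.
template <typename T, std::size_t Align>
struct aligned_allocator {
    typedef T value_type;

    // Explicit rebind is needed because Align is a non-type parameter.
    template <typename U>
    struct rebind {
        typedef aligned_allocator<U, Align> other;
    };

    aligned_allocator() {}
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Align> &) {}

    T *allocate(std::size_t n) {
        // Room for the payload, worst-case padding and the saved raw pointer.
        std::size_t bytes = n * sizeof(T) + Align + sizeof(void *);
        void *raw = std::malloc(bytes);
        if (raw == nullptr) throw std::bad_alloc();
        std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
        // Round up to the next multiple of Align.
        std::uintptr_t aligned = (start + Align - 1) / Align * Align;
        // Remember the malloc pointer just before the aligned block.
        reinterpret_cast<void **>(aligned)[-1] = raw;
        return reinterpret_cast<T *>(aligned);
    }

    void deallocate(T *p, std::size_t) {
        std::free(reinterpret_cast<void **>(p)[-1]);
    }
};

// All instances are interchangeable: they hold no state.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A> &,
                const aligned_allocator<U, A> &) {
    return true;
}

template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A> &,
                const aligned_allocator<U, A> &) {
    return false;
}
```

With such an allocator, std::vector<float, aligned_allocator<float, 32> > hands out storage whose data() pointer is a multiple of 32, so an aligned 256-bit load on its first element is legal.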

But again, please, do not write code like this.


ustcsq commented on June 12, 2024

Thanks.

