hello, first a big thanks for simde! i'm trying to use it to compile

Add operator overloading for C++, about simd-everywhere/simde

Comments (21)

nemequ commented on May 20, 2024

First off, thanks for the kind words. I'm glad someone has found SIMDe useful :)

Interesting; I don't think ~ is technically part of the SSE2 API (nor are any other operators, AFAIK), but I know some compilers support it because they just typedef __m128i to something using the compiler's vector extensions. There's not a lot we can do about that in C, but since you're using C++ I think it should be possible to use operator overloading to achieve what you're after.

If you don't want to rely on compiler-specific features, you could convert it to use intrinsics. There is no bitwise not; GCC generates a pcmpeqd and a pxor, so instead of ~val you would have _mm_xor_si128(val, _mm_cmpeq_epi32(val, val)). ~~Since you have several bitwise not operations in there it would probably be wise to save the value with all bits set~~ (edit: nevermind, looked at the code again and the values aren't necessarily reusable).

simde__m128i tmp = simde_mm_and_si128(simde_mm_and_si128(simde_mm_xor_si128(__a32, simde_mm_cmpeq_epi32(__a32, __a32)), simde_mm_xor_si128(__b32, simde_mm_cmpeq_epi32(__b32, __b32))), simde_mm_castps_si128(__chance32));
__c32 = simde_mm_and_si128(__c32, simde_mm_xor_si128(tmp, simde_mm_cmpeq_epi32(tmp, tmp))); // Make 0 if a == 0 AND b == 0
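
The identity behind that rewrite is that comparing a register with itself yields all bits set, and XOR with all ones flips every bit. Here is a minimal scalar sketch of it, assuming a hypothetical four-lane `demo_vec4` struct and `demo_*` helpers that only model `__m128i`, `_mm_cmpeq_epi32` and `_mm_xor_si128` (none of these names come from SIMDe):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for __m128i: four 32-bit lanes.
struct demo_vec4 { std::uint32_t lane[4]; };

// Models _mm_cmpeq_epi32: a lane becomes all ones where a == b, else zero.
inline demo_vec4 demo_cmpeq_epi32(demo_vec4 a, demo_vec4 b) {
  demo_vec4 r;
  for (int i = 0; i < 4; i++) {
    r.lane[i] = (a.lane[i] == b.lane[i]) ? 0xFFFFFFFFu : 0u;
  }
  return r;
}

// Models _mm_xor_si128.
inline demo_vec4 demo_xor_si128(demo_vec4 a, demo_vec4 b) {
  demo_vec4 r;
  for (int i = 0; i < 4; i++) {
    r.lane[i] = a.lane[i] ^ b.lane[i];
  }
  return r;
}

// Bitwise NOT without operator~: v XOR (v == v), i.e. v XOR all-ones.
inline demo_vec4 demo_not_si128(demo_vec4 v) {
  return demo_xor_si128(v, demo_cmpeq_epi32(v, v));
}

// Usage: every bit of every lane is flipped.
inline void demo_not_usage() {
  demo_vec4 v = {{0u, 1u, 0xFFFFFFFFu, 0x12345678u}};
  demo_vec4 r = demo_not_si128(v);
  assert(r.lane[0] == 0xFFFFFFFFu);
  assert(r.lane[3] == 0xEDCBA987u);
}
```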

Of course, a SIMDe implementation would be a bit less abstruse since we would just use the bitwise not operator, which could be a bit faster for architectures which support bitwise not (e.g., NEON with vmvnq_s32). First, I think a prefixed extension would be in order… maybe something like (untested):

SIMDE__FUNCTION_ATTRIBUTES
simde__m128i
simde_x_mm_not_si128 (simde__m128i a) {
#if defined(SIMDE_SSE2_NEON)
  return SIMDE__M128I_NEON_C(i32, vmvnq_s32(a.neon_i32));
#else
  simde__m128i r;
  SIMDE__VECTORIZE
  for (size_t i = 0 ; i < (sizeof(r.i32) / sizeof(r.i32[0])) ; i++) {
    r.i32[i] = ~(a.i32[i]);
  }
  return r;
#endif
}

That way people can take advantage of it from C, too. Then, just slap on something like

#if defined(__cplusplus)
SIMDE__FUNCTION_ATTRIBUTES
simde__m128i
operator~(simde__m128i v) {
  return simde_x_mm_not_si128(v);
}
#endif

And everything should work.
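
The two-layer pattern (a named function for the operation plus a thin C++ operator that forwards to it) can be tried out without SIMDe. This is a simplified model, assuming a hypothetical `demo_m128i` struct and `demo_x_not_si128` function in place of the real `simde__m128i` and `simde_x_mm_not_si128`:

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the portable form of simde__m128i. */
typedef struct { int32_t i32[4]; } demo_m128i;

/* Named function: the part that C callers could use as well. */
static inline demo_m128i demo_x_not_si128(demo_m128i a) {
  demo_m128i r;
  for (size_t i = 0; i < (sizeof(r.i32) / sizeof(r.i32[0])); i++) {
    r.i32[i] = ~(a.i32[i]);
  }
  return r;
}

#if defined(__cplusplus)
/* C++ only: the operator simply forwards to the named function. */
static inline demo_m128i operator~(demo_m128i v) {
  return demo_x_not_si128(v);
}

/* Usage: code written with ~ keeps compiling unchanged. */
static inline void demo_not_usage() {
  demo_m128i v = {{0, -1, 5, 0x0F0F0F0F}};
  demo_m128i r = ~v;
  assert(r.i32[0] == -1);
  assert(r.i32[2] == -6);
}
#endif
```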

This one function wouldn't be a big deal, but if SIMDe is going to support operator overloading in C++ I'd like to do it all at once, or at least for all the x86 ISA extensions. I'm pretty busy right now but I'll try to add operator overloading as soon as I have some time, unless you'd like to take a stab at a patch.

from simde.

hexdump0815 commented on May 20, 2024

hi nemequ,

thanks for the quick response. i think i'll have to reread your response a few more times to understand and then will try what you suggested. please let me know if i should test some potential future patches as well. i'll report back the results of your above proposal.

btw. i made another suggestion for simde in #38 too :)

once more a lot of thanks and best wishes - hexdump

nemequ commented on May 20, 2024

I've created an operator-overloading branch which adds operator overloading to simde__m128i; we would need something similar for at least the other Intel types (simde__m64, simde__m128, simde__m128d, simde__m256, simde__m256i, simde__m256d, …). Note that I also had to implement a couple of extensions (for mul and mod as well as not) to fill in the holes.

nemequ commented on May 20, 2024

BTW, using ~ doesn't work on PGI.

hexdump0815 commented on May 20, 2024

i have tried to compile against your operator-overloading branch. first, can it be that there is still a small typo in it?

diff --git a/simde/x86/sse2.h b/simde/x86/sse2.h
index 1819b1d..5cc0bfb 100644
--- a/simde/x86/sse2.h
+++ b/simde/x86/sse2.h
@@ -4044,7 +4044,7 @@ operator&(simde__m128i v) {
 SIMDE__FUNCTION_ATTRIBUTES
 simde__m128i
 operator~(simde__m128i v) {
-  return simde_mm_not_si128(v);
+  return simde_x_mm_not_si128(v);
 }
 
 #endif

afterwards i'm getting the following compile errors:

g++ -O3 -Wsuggest-override -std=c++11  -DSLUG=Valley -fPIC -I/compile/Rack/include -I/compile/Rack/dep/include -DVERSION=0.6.16 -MMD -MP -g -O3 -march=armv7 -mfpu=neon -ffast-math -fno-finite-math-only -Wall -Wextra -Wno-unused-parameter -DARCH_LIN -c -o build/src/Plateau/Dattorro.cpp.o src/Plateau/Dattorro.cpp
In file included from src/Plateau/../Common/DSP/InterpDelay.hpp:4:0,
                 from src/Plateau/../Common/DSP/AllpassFilter.hpp:2,
                 from src/Plateau/Dattorro.hpp:6,
                 from src/Plateau/Dattorro.cpp:1:
/compile/Rack/include/x86/sse2.h: In function 'simde__m128i operator+(simde__m128i)':
/compile/Rack/include/x86/sse2.h:4011:30: error: too few arguments to function 'simde__m128i simde_mm_add_epi64(simde__m128i, simde__m128i)'
   return simde_mm_add_epi64(v);
                              ^
In file included from src/Plateau/../Common/DSP/InterpDelay.hpp:4:0,
                 from src/Plateau/../Common/DSP/AllpassFilter.hpp:2,
                 from src/Plateau/Dattorro.hpp:6,
                 from src/Plateau/Dattorro.cpp:1:
/compile/Rack/include/x86/sse2.h:238:1: note: declared here
 simde_mm_add_epi64 (simde__m128i a, simde__m128i b) {
 ^~~~~~~~~~~~~~~~~~
In file included from src/Plateau/../Common/DSP/InterpDelay.hpp:4:0,
                 from src/Plateau/../Common/DSP/AllpassFilter.hpp:2,
                 from src/Plateau/Dattorro.hpp:6,
                 from src/Plateau/Dattorro.cpp:1:
/compile/Rack/include/x86/sse2.h: In function 'simde__m128i operator-(simde__m128i)':
/compile/Rack/include/x86/sse2.h:4017:30: error: too few arguments to function 'simde__m128i simde_mm_sub_epi64(simde__m128i, simde__m128i)'
   return simde_mm_sub_epi64(v);
                              ^
/compile/Rack/include/x86/sse2.h:3482:1: note: declared here
 simde_mm_sub_epi64 (simde__m128i a, simde__m128i b) {
 ^~~~~~~~~~~~~~~~~~
/compile/Rack/include/x86/sse2.h: In function 'simde__m128i operator*(simde__m128i)':
/compile/Rack/include/x86/sse2.h:4023:32: error: too few arguments to function 'simde__m128i simde_x_mm_mul_epi64(simde__m128i, simde__m128i)'
   return simde_x_mm_mul_epi64(v);
                                ^
/compile/Rack/include/x86/sse2.h:2067:1: note: declared here
 simde_x_mm_mul_epi64 (simde__m128i a, simde__m128i b) {
 ^~~~~~~~~~~~~~~~~~~~
/compile/Rack/include/x86/sse2.h: At global scope:
/compile/Rack/include/x86/sse2.h:4028:25: error: 'simde__m128i operator/(simde__m128i)' must take exactly two arguments
 operator/(simde__m128i v) {
                         ^
/compile/Rack/include/x86/sse2.h:4034:25: error: 'simde__m128i operator%(simde__m128i)' must take exactly two arguments
 operator%(simde__m128i v) {
                         ^
/compile/Rack/include/x86/sse2.h: In function 'simde__m128i operator&(simde__m128i)':
/compile/Rack/include/x86/sse2.h:4041:30: error: too few arguments to function 'simde__m128i simde_mm_and_si128(simde__m128i, simde__m128i)'
   return simde_mm_and_si128(v);
                              ^
In file included from src/Plateau/../Common/DSP/InterpDelay.hpp:4:0,
                 from src/Plateau/../Common/DSP/AllpassFilter.hpp:2,
                 from src/Plateau/Dattorro.hpp:6,
                 from src/Plateau/Dattorro.cpp:1:
/compile/Rack/include/x86/sse2.h:396:1: note: declared here
 simde_mm_and_si128 (simde__m128i a, simde__m128i b) {
 ^~~~~~~~~~~~~~~~~~
/compile/Rack/compile.mk:64: recipe for target 'build/src/Plateau/Dattorro.cpp.o' failed
make: *** [build/src/Plateau/Dattorro.cpp.o] Error 1

any idea what is happening here?

a lot of thanks in advance and best wishes - hexdump

nemequ commented on May 20, 2024

Oops. Can you try this (on top of the operator-overloading branch)?

diff --git a/simde/x86/sse2.h b/simde/x86/sse2.h
index 1819b1d..b7e1d7b 100644
--- a/simde/x86/sse2.h
+++ b/simde/x86/sse2.h
@@ -4007,38 +4007,38 @@ simde_x_mm_not_si128 (simde__m128i a) {
 
 SIMDE__FUNCTION_ATTRIBUTES
 simde__m128i
-operator+(simde__m128i v) {
-  return simde_mm_add_epi64(v);
+operator+(simde__m128i a, simde__m128i b) {
+  return simde_mm_add_epi64(a, b);
 }
 
 SIMDE__FUNCTION_ATTRIBUTES
 simde__m128i
-operator-(simde__m128i v) {
-  return simde_mm_sub_epi64(v);
+operator-(simde__m128i a, simde__m128i b) {
+  return simde_mm_sub_epi64(a, b);
 }
 
 SIMDE__FUNCTION_ATTRIBUTES
 simde__m128i
-operator*(simde__m128i v) {
-  return simde_x_mm_mul_epi64(v);
+operator*(simde__m128i a, simde__m128i b) {
+  return simde_x_mm_mul_epi64(a, b);
 }
 
 SIMDE__FUNCTION_ATTRIBUTES
 simde__m128i
-operator/(simde__m128i v) {
-  return simde_mm_div_epi64(v);
+operator/(simde__m128i a, simde__m128i b) {
+  return simde_mm_div_epi64(a, b);
 }
 
 SIMDE__FUNCTION_ATTRIBUTES
 simde__m128i
-operator%(simde__m128i v) {
-  return simde_x_mm_mod_epi64(v);
+operator%(simde__m128i a, simde__m128i b) {
+  return simde_x_mm_mod_epi64(a, b);
 }
 
 SIMDE__FUNCTION_ATTRIBUTES
 simde__m128i
-operator&(simde__m128i v) {
-  return simde_mm_and_si128(v);
+operator&(simde__m128i a, simde__m128i b) {
+  return simde_mm_and_si128(a, b);
 }
 
 SIMDE__FUNCTION_ATTRIBUTES

nemequ commented on May 20, 2024

I just pushed a new operator-overloading branch which I actually tested (well, made sure it compiled).

This was a bit more painful than I expected; even though the _mm_div_epi* functions are annotated as only requiring SSE (so I thought it was already in SIMDe), they're actually part of SVML. I'll have to add a C++ test case.
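
Since SSE2 itself has no packed integer divide instruction, a portable version has to divide lane by lane. Here is a rough sketch of such a fallback, assuming a hypothetical two-lane `demo_m128i` struct and a hypothetical `demo_x_div_epi64` name (this is not the code from the branch):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for simde__m128i viewed as two signed 64-bit lanes.
struct demo_m128i { std::int64_t i64[2]; };

// Lane-wise signed division; each lane truncates toward zero like scalar C++.
inline demo_m128i demo_x_div_epi64(demo_m128i a, demo_m128i b) {
  demo_m128i r;
  for (int i = 0; i < 2; i++) {
    assert(b.i64[i] != 0);  // dividing by zero is undefined, as for scalars
    r.i64[i] = a.i64[i] / b.i64[i];
  }
  return r;
}

// Binary operator: takes both operands and forwards to the named function.
inline demo_m128i operator/(demo_m128i a, demo_m128i b) {
  return demo_x_div_epi64(a, b);
}
```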

hexdump0815 commented on May 20, 2024

thanks a lot - i just recompiled it based on the latest operator-overloading and can confirm, that it is compiling perfectly fine now. once more a lot of thanks!

Flix01 commented on May 20, 2024

Hey! I think I need this too for __m128 operations.
I'm trying to convert this physics library to simde and I'm getting something like:

../simd_emu/nudge.cpp:841:20: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                simd4_float c = a*b;
                                ~^~
../simd_emu/nudge.cpp:842:41: error: invalid operands to binary expression
      ('simde__m128' and 'simde__m128')
  ...return simd128::shuffle32<0,0,0,0>(c) + simd128::shuffle32<1,1,1,1>(c) ...
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../simd_emu/nudge.cpp:846:50: error: invalid operands to binary expression
      ('simde__m128' and 'simde__m128')
  ...c = simd128::shuffle32<1,2,0,0>(a) * simd128::shuffle32<2,0,1,0>(b);
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../simd_emu/nudge.cpp:847:50: error: invalid operands to binary expression
      ('simde__m128' and 'simde__m128')
  ...d = simd128::shuffle32<2,0,1,0>(a) * simd128::shuffle32<1,2,0,0>(b);
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../simd_emu/nudge.cpp:848:12: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                return c - d;
                       ~ ^ ~
../simd_emu/nudge.cpp:854:10: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                rx = ay*bz - az*by;
                     ~~^~~
../simd_emu/nudge.cpp:854:18: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                rx = ay*bz - az*by;
                             ~~^~~
../simd_emu/nudge.cpp:855:10: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                ry = az*bx - ax*bz;
                     ~~^~~
../simd_emu/nudge.cpp:855:18: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                ry = az*bx - ax*bz;
                             ~~^~~
../simd_emu/nudge.cpp:856:10: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                rz = ax*by - ay*bx;
                     ~~^~~
../simd_emu/nudge.cpp:856:18: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                rz = ax*by - ay*bx;
                             ~~^~~
../simd_emu/nudge.cpp:860:38: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                simd4_float f = simd_float::rsqrt(x*x + y*y + z*z);
                                                  ~^~
../simd_emu/nudge.cpp:860:44: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                simd4_float f = simd_float::rsqrt(x*x + y*y + z*z);
                                                        ~^~
../simd_emu/nudge.cpp:860:50: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
                simd4_float f = simd_float::rsqrt(x*x + y*y + z*z);
                                                              ~^~
../simd_emu/nudge.cpp:861:5: error: no viable overloaded '*='
                x *= f;
                ~ ^  ~
../simd_emu/nudge.cpp:862:5: error: no viable overloaded '*='
                y *= f;
                ~ ^  ~
../simd_emu/nudge.cpp:863:5: error: no viable overloaded '*='
                z *= f;
                ~ ^  ~
../simd_emu/nudge.cpp:1249:51: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
  ...simd4_float relative_rotation_x = a_rotation_x * b_rotation_s - b_rotati...
                                       ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
../simd_emu/nudge.cpp:1104:22: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to 'float' for
      1st argument
static inline float3 operator * (float a, float3 b) {
                     ^
../simd_emu/nudge.cpp:1109:22: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::(anonymous namespace)::float3' for 1st argument
static inline float3 operator * (float3 a, float b) {
                     ^
../simd_emu/nudge.cpp:1134:22: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::(anonymous namespace)::Rotation' for 1st argument
static inline float3 operator * (Rotation lhs, float3 rhs) {
                     ^
../simd_emu/nudge.cpp:1139:24: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::(anonymous namespace)::Rotation' for 1st argument
static inline Rotation operator * (Rotation lhs, Rotation rhs) {
                       ^
../simd_emu/nudge.cpp:1182:25: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::Transform' for 1st argument
static inline Transform operator * (Transform lhs, Transform rhs) {
                        ^
../simd_emu/nudge.cpp:1249:81: error: invalid operands to binary expression
      ('nudge::simd4_float' (aka 'simde__m128') and 'nudge::simd4_float')
  ...= a_rotation_x * b_rotation_s - b_rotation_x * a_rotation_s - t_x;
                                     ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
../simd_emu/nudge.cpp:1104:22: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to 'float' for
      1st argument
static inline float3 operator * (float a, float3 b) {
                     ^
../simd_emu/nudge.cpp:1109:22: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::(anonymous namespace)::float3' for 1st argument
static inline float3 operator * (float3 a, float b) {
                     ^
../simd_emu/nudge.cpp:1134:22: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::(anonymous namespace)::Rotation' for 1st argument
static inline float3 operator * (Rotation lhs, float3 rhs) {
                     ^
../simd_emu/nudge.cpp:1139:24: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::(anonymous namespace)::Rotation' for 1st argument
static inline Rotation operator * (Rotation lhs, Rotation rhs) {
                       ^
../simd_emu/nudge.cpp:1182:25: note: candidate function not viable: no known
      conversion from 'nudge::simd4_float' (aka 'simde__m128') to
      'nudge::Transform' for 1st argument
static inline Transform operator * (Transform lhs, Transform rhs) {
                        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.

Do you think it's the same issue?

P.S. I'm not sure about it, but maybe we can just add operator overloads in blocks like:

#ifdef __cplusplus
/* overloads here */
#endif

and normal C code in:

#ifdef __cplusplus
extern "C" {
#endif
/* Plain C code here */
#ifdef __cplusplus
}
#endif

So a single branch could work for both C and C++ (please correct me if I'm wrong).

[Edit] Never mind. I've already given up the conversion attempt (too difficult for me). So it's OK to me. Feel free to ignore this post.

nemequ commented on May 20, 2024

Yep, you're right that this looks like the same issue. FWIW, you could fix this pretty easily by using simde_mm_mul_ps(a,b) instead of a * b. I'd actually suggest that you do that even if you're not porting to SIMDe since it should increase portability to other compilers (like PGI).

You're also right about putting the implementation in an ifdef; that's what I've done for simde__m128i in the operator-overloading branch, just need to do the same thing for simde__m128.
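
A header laid out that way might look roughly like the sketch below. The `demo_m128` type and `demo_mul_ps` function are hypothetical stand-ins for `simde__m128` and `simde_mm_mul_ps`; only the `#ifdef __cplusplus` structure is the point:

```cpp
#include <assert.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Plain C part: the type and the named function. */
typedef struct { float f32[4]; } demo_m128;

static inline demo_m128 demo_mul_ps(demo_m128 a, demo_m128 b) {
  demo_m128 r;
  for (int i = 0; i < 4; i++) {
    r.f32[i] = a.f32[i] * b.f32[i];
  }
  return r;
}

#ifdef __cplusplus
} /* extern "C" */

/* C++-only part: overloads are kept outside the extern "C" block. */
static inline demo_m128 operator*(demo_m128 a, demo_m128 b) {
  return demo_mul_ps(a, b);
}

static inline demo_m128& operator*=(demo_m128& a, demo_m128 b) {
  a = demo_mul_ps(a, b);
  return a;
}

/* Usage: a * b and demo_mul_ps(a, b) give the same lanes. */
static inline void demo_mul_usage() {
  demo_m128 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
  demo_m128 b = {{2.0f, 2.0f, 2.0f, 2.0f}};
  demo_m128 r = a * b;
  assert(r.f32[2] == 6.0f);
  assert(r.f32[2] == demo_mul_ps(a, b).f32[2]);
}
#endif
```

Calling the named function directly, as suggested above, stays valid in both C and C++ builds; the operators are only a convenience for existing C++ code.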

Flix01 commented on May 20, 2024

@nemequ: thanks for your quick answer.

You're also right about putting the implementation in an ifdef; that's what I've done for simde__m128i in the operator-overloading branch, just need to do the same thing for simde__m128.

Sorry, I'm actually using the master branch with some modifications in the caller .cpp file, so I haven't checked the operator-overloading branch yet.

I'd actually suggest that you do that even if you're not porting to SIMDe since it should increase portability to other compilers (like PGI).

Yes, but there are other errors. Here is my progress so far:

In my .cpp file that includes <x86/sse2.h> (please remember that I'm using your master branch), I've added something like:

//#include <immintrin.h>
//#define SIMDE_ENABLE_OPENMP	// -fopenmp
#include <x86/sse2.h>	// SIMDe library
[...]
#ifndef SIMDE_MM_TRANSPOSE4_PS
#define SIMDE_MM_TRANSPOSE4_PS(a,b,c,d) _MM_TRANSPOSE4_PS(a,b,c,d)
#endif //SIMDE_MM_TRANSPOSE4_PS

SIMDE__FUNCTION_ATTRIBUTES
simde__m128
operator+(simde__m128 a, simde__m128 b) {
	return simde_mm_add_ps(a, b);
}
 
SIMDE__FUNCTION_ATTRIBUTES
simde__m128
operator-(simde__m128 a, simde__m128 b) {
	return simde_mm_sub_ps(a, b);
}
 
SIMDE__FUNCTION_ATTRIBUTES
simde__m128
operator*(simde__m128 a, simde__m128 b) {
	return simde_mm_mul_ps(a, b);
}
 
SIMDE__FUNCTION_ATTRIBUTES
simde__m128
operator/(simde__m128 a, simde__m128 b) {
	return simde_mm_div_ps(a, b);
}
 
SIMDE__FUNCTION_ATTRIBUTES
simde__m128&
operator+=(simde__m128& a, simde__m128 b) {
	return (a=simde_mm_add_ps(a, b));
}

SIMDE__FUNCTION_ATTRIBUTES
simde__m128&
operator-=(simde__m128& a, simde__m128 b) {
	return (a=simde_mm_sub_ps(a, b));
}

SIMDE__FUNCTION_ATTRIBUTES
simde__m128&
operator*=(simde__m128& a, simde__m128 b) {
	return (a=simde_mm_mul_ps(a, b));
}

SIMDE__FUNCTION_ATTRIBUTES
simde__m128
operator-(simde__m128 a) {
	// Unary minus returns a new value and leaves its operand unchanged
	return simde_mm_sub_ps(simde_mm_set_ps1 (0.0f), a);
}

Not sure these are correct, but they fix a lot of compilation errors.
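
One conventional way to structure overloads like these is to define each compound assignment in terms of the named function, and to have unary minus take its operand by value and return a new vector. Here is a small sketch on a hypothetical `demo_m128` struct (a stand-in for `simde__m128`; all `demo_*` names are illustrative only):

```cpp
#include <cassert>

// Hypothetical stand-in for simde__m128: four float lanes.
struct demo_m128 { float f32[4]; };

inline demo_m128 demo_set1_ps(float x) {
  demo_m128 r = {{x, x, x, x}};
  return r;
}

inline demo_m128 demo_sub_ps(demo_m128 a, demo_m128 b) {
  demo_m128 r;
  for (int i = 0; i < 4; i++) {
    r.f32[i] = a.f32[i] - b.f32[i];
  }
  return r;
}

// Binary minus returns a fresh value.
inline demo_m128 operator-(demo_m128 a, demo_m128 b) {
  return demo_sub_ps(a, b);
}

// Compound assignment updates the left operand and returns a reference to it.
inline demo_m128& operator-=(demo_m128& a, demo_m128 b) {
  a = demo_sub_ps(a, b);
  return a;
}

// Unary minus works on a copy, so the caller's vector is not modified.
inline demo_m128 operator-(demo_m128 a) {
  return demo_sub_ps(demo_set1_ps(0.0f), a);
}

// Usage: negating v leaves v itself untouched.
inline void demo_negate_usage() {
  demo_m128 v = {{1.0f, -2.0f, 3.0f, -4.0f}};
  demo_m128 n = -v;
  assert(v.f32[0] == 1.0f);
  assert(n.f32[1] == 2.0f);
}
```
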
There are also some other macros that are not implemented in SIMDe (but luckily for me, I've been told that they are just optimizations and can be skipped):

// I've added the SIMDE prefix here
    SIMDE_MM_SET_FLUSH_ZERO_MODE(SIMDE_MM_FLUSH_ZERO_ON);
    SIMDE_MM_SET_DENORMALS_ZERO_MODE(SIMDE_MM_DENORMALS_ZERO_ON);

After skipping (commenting out) the two lines above, I still have calls to simde_mm_malloc and simde_mm_free; but on POSIX systems these can be easily implemented this way (don't know about Windows):

static inline void* simde_mm_malloc (size_t size, size_t alignment)	{
  // This works on posix systems
  // For Windows users: C11 should have aligned_alloc(...) that could replace simde_mm_malloc(...), but simde requires C99
  void *ptr;
  if (alignment == 1) return malloc (size);
  if (alignment == 2 || (sizeof (void *) == 8 && alignment == 4)) alignment = sizeof (void *);
  if (posix_memalign (&ptr, alignment, size) == 0) return ptr;
  else return NULL;
}
static inline void  simde_mm_free (void * ptr) {free (ptr);}
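
For platforms without posix_memalign, one well-known alternative is to over-allocate with malloc and keep the original pointer just below the aligned block. Here is a sketch of that technique, with hypothetical `demo_aligned_malloc` and `demo_aligned_free` names (it is not taken from SIMDe, and memory obtained this way must be released with the matching free function):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns size bytes aligned to alignment (a power of two), or nullptr.
inline void* demo_aligned_malloc(std::size_t size, std::size_t alignment) {
  if (alignment == 0 || (alignment & (alignment - 1)) != 0) return nullptr;
  // Keep the slot that stores the base pointer itself properly aligned.
  if (alignment < sizeof(void*)) alignment = sizeof(void*);
  // Payload + worst-case padding + room for the saved base pointer.
  std::size_t total = size + (alignment - 1) + sizeof(void*);
  if (total < size) return nullptr;  // size_t overflow
  void* base = std::malloc(total);
  if (base == nullptr) return nullptr;
  std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base) + sizeof(void*);
  std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  std::uintptr_t aligned = (start + mask) & ~mask;
  // Stash what malloc returned right below the aligned block.
  reinterpret_cast<void**>(aligned)[-1] = base;
  return reinterpret_cast<void*>(aligned);
}

// Frees a block obtained from demo_aligned_malloc.
inline void demo_aligned_free(void* ptr) {
  if (ptr != nullptr) std::free(static_cast<void**>(ptr)[-1]);
}

// Usage: the returned address is a multiple of the requested alignment.
inline void demo_aligned_usage() {
  void* p = demo_aligned_malloc(100, 64);
  assert(p != nullptr);
  assert(reinterpret_cast<std::uintptr_t>(p) % 64 == 0);
  demo_aligned_free(p);
}
```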

After all this I still get some errors (and at this point I gave up the conversion attempt):

/.../simde/simde/x86/sse2.h:3220:81: error: expected ‘)’ before ‘;’ token
  define simde_mm_srli_epi16(a, imm8) SIMDE__M128I_C(_mm_srli_epi16(a.n, imm8));
                                                                               ^
../simd_emu/nudge.cpp:2436:94: note: in expansion of macro ‘simde_mm_srli_epi16’
 imde_mm_add_epi16(simde_mm_add_epi16(edge, simde_mm_set1_epi16(1)), simde_mm_srli_epi16(edge, 1)); // Calculates 1 << edge (valid for 0-2).
                                                                     ^~~~~~~~~~~~~~~~~~~
../simd_emu/nudge.cpp:2436:122: error: expected primary-expression before ‘)’ token
 dd_epi16(edge, simde_mm_set1_epi16(1)), simde_mm_srli_epi16(edge, 1)); // Calculates 1 << edge (valid for 0-2).

Hope this helps.

Flix01 commented on May 20, 2024

Just a little update on this.
The last error I reported in my last post is actually a real error in <x86/sse2.h>, at line 3220:

#  define simde_mm_srli_epi16(a, imm8) SIMDE__M128I_C(_mm_srli_epi16(a.n, imm8));

should be:

#  define simde_mm_srli_epi16(a, imm8) SIMDE__M128I_C(_mm_srli_epi16(a.n, imm8))

without the semicolon at the end.
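
To see why the stray semicolon only bites at some call sites, here is a reduced sketch with hypothetical `DEMO_*` and `demo_*` names. It mirrors the shape of the failing line in nudge.cpp, where the macro is used as a nested function argument:

```cpp
#include <cassert>

inline int demo_shift_right(int a, int imm8) { return a >> imm8; }
inline int demo_add(int a, int b) { return a + b; }

// Wrong: the trailing ';' becomes part of every expansion, so
//   demo_add(x, DEMO_SRLI_BAD(x, 1))
// would expand to demo_add(x, demo_shift_right((x), (1));) and not compile.
// #define DEMO_SRLI_BAD(a, imm8) demo_shift_right((a), (imm8));

// Right: no semicolon; the caller supplies the statement punctuation.
#define DEMO_SRLI(a, imm8) demo_shift_right((a), (imm8))

// Same expression shape as the original call site: computes 1 << edge
// for edge in 0..2 using an add, an add and a shift.
inline int demo_one_shl(int edge) {
  return demo_add(demo_add(edge, 1), DEMO_SRLI(edge, 1));
}

// Usage: the macro composes as an ordinary expression.
inline void demo_macro_usage() {
  assert(demo_one_shl(0) == 1);
  assert(demo_one_shl(2) == 4);
}
```

Used as a standalone statement the bad form merely produces an empty extra statement, which is why the typo can go unnoticed until the macro appears inside a larger expression.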

Now the demo seems to work (even if I have to dig a little bit more inside it... to understand how I can disable my native sse extensions: I'm using gcc without any -march option and I'm not sure it's enough; but in any case the demo seems to run correctly!).

[Edit] In fact I've just discovered that both SIMDE_SSE2_NATIVE and SIMDE_SSE_NATIVE are implicitly defined. I'd like to force SIMD emulation, but I'm not sure it's possible...
[Update] I've just discovered the definition: SIMDE_NO_NATIVE that seems to force SIMD emulation, and the demo seems to work with this definition using gcc.
Too bad that the emscripten compiler can't compile the code (LLVM ERROR: Unsupported integer vector type with numElems: 2, primitiveSize: 64!)! Please see this link for further info about it. [Update] By using emscripten LLVM backend (1.38.40-upstream) it works!

[Edit] There might be other errors like the one I've spotted (i.e. definitions ending with a semicolon are also present here, here and maybe in other places).

nemequ commented on May 20, 2024

Sorry about the delay.

@Flix01 I'm not sure if you're still interested in this or not, but we have an emscripten build in CI now and it's working pretty well with the standard options. If it's still causing you problems please let me know and I'll look into it right away.

Sorry, I didn't see the edit about the semicolons. I just fixed a few (60458c1) that I found with grep -PR '^ *# *define .+;$' .. Hopefully that's all of them; the multi-line macros are generally pretty well tested, it's the native aliases that are a problem and they're pretty much all single-line.

In the future, if you could open up a new issue for this type of thing it should speed things up; I forgot there was other stuff attached to this and thought it was just about operator overloading.

On the subject of operator overloading, we now use native types (on GCC-style vector extensions) instead of our union, so for compilers where operators worked they should work on other architectures now, too.

Flix01 commented on May 20, 2024

Thank you for your feedback (and for the work you've done on simde).

In the future, if you could open up a new issue for this type of thing it should speed things up; I forgot there was other stuff attached to this and thought it was just about operator overloading.

I'm sorry, you're right.

On the subject of operator overloading, we now use native types (on GCC-style vector extensions) instead of our union, so for compilers where operators worked they should work on other architectures now, too.

I've just checked that I have to comment out the operator overloading code inside the "simde-version" of nudge.cpp to make it compile again using the master branch of simde (with -DSIMDE_NO_NATIVE).

@Flix01 I'm not sure if you're still interested in this or not, but we have an emscripten build in CI now and it's working pretty well with the standard options. If it's still causing you problems please let me know and I'll look into it right away.

Good news! Does it work with -DSIMDE_NO_NATIVE?

nemequ commented on May 20, 2024

I've just checked that I have to comment out the operator overloading code inside the "simde-version" of nudge.cpp to make it compile again using the master branch of simde (with -DSIMDE_NO_NATIVE).

What was the error message? For emscripten SIMDe should be using GCC-style vector extensions for the return value, so operators should work just like they do for the native SSE2 API. Maybe the lane types don't match (SIMDe uses a vector of int_fast32_t for __m128i)?

Good news! Does it work with -DSIMDE_NO_NATIVE?

Yes, it works with SIMDE_NO_NATIVE. But you should never need to define that yourself except for testing. SIMDe should automatically detect native support and, if available, target it. Emscripten doesn't have native support for SSE/SSE2/etc., so portable fallbacks will be used. All SIMDE_NO_NATIVE does is disable the checks for native support forcing it to always fall back.
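
The general shape of that behaviour (use native intrinsics when the compiler advertises them, fall back otherwise, and let one macro force the fallback) can be illustrated with a small sketch. The `DEMO_*` macros and the `demo_m128i` type are hypothetical, and the `__SSE2__` check is only an assumption about how detection might look on GCC and Clang; SIMDe's real detection logic is not reproduced here:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical detection: native only if available and not disabled.
#if defined(__SSE2__) && !defined(DEMO_NO_NATIVE)
#  define DEMO_SSE2_NATIVE 1
#  include <emmintrin.h>
#endif

struct demo_m128i { std::int32_t i32[4]; };

inline demo_m128i demo_add_epi32(demo_m128i a, demo_m128i b) {
  demo_m128i r;
#if defined(DEMO_SSE2_NATIVE)
  // Native path: one SSE2 add over all four lanes.
  __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.i32));
  __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.i32));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(r.i32), _mm_add_epi32(va, vb));
#else
  // Portable fallback: plain loop the autovectorizer may still pick up.
  for (int i = 0; i < 4; i++) {
    r.i32[i] = a.i32[i] + b.i32[i];
  }
#endif
  return r;
}

// Usage: callers see the same result on either path.
inline void demo_add_usage() {
  demo_m128i a = {{1, 2, 3, 4}};
  demo_m128i b = {{10, 20, 30, 40}};
  demo_m128i r = demo_add_epi32(a, b);
  assert(r.i32[0] == 11);
  assert(r.i32[3] == 44);
}
```

Compiling this with `-DDEMO_NO_NATIVE` would exercise the fallback loop even on an x86 machine, which is the kind of testing use described above.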

If you look at the test results from emscripten you'll see that there are no native tests being run, because native support isn't detected. Note that this is a new feature; we used to build all the tests and the native functions would effectively just be a copy of the emulated ones.

Flix01 commented on May 20, 2024

it works with SIMDE_NO_NATIVE. But you should never need to define that yourself except for testing.

As far as I can see (and can remember), I included simde/x86/sse2.h directly here, instead of simde/simde-common.h, and used -DSIMDE_NO_NATIVE in the compilation command-line.

I guess the reason was that nudge can use sse2 or avx, and since Emscripten (WASM) didn't have any native SIMD support, I wanted to choose the minimal requirement (sse2) explicitly in the code.

What was the error message? For emscripten SIMDe should be using GCC-style vector extensions for the return value, so operators should work just like they do for the native SSE2 API. Maybe the lane types don't match (SIMDe uses a vector of int_fast32_t for __m128i)?

No, I think it's correct this way: I've just commented out this part, that was necessary when operator overloading was not present (I've never used simde operator-overloading branch); now that code is no longer necessary as far as I can see (*).

So, everything seems to work fine now with your master branch.

Thank you again.

(*) Just for reference, the errors were all like:

../simde/nudge.cpp:88:31: error: overloaded 'operator-' must have at least one parameter of class or enumeration type
NUDGE_FORCEINLINE simde__m128 operator - (simde__m128 a) {
                              ^

nemequ commented on May 20, 2024

As far as I can see (and can remember), I included simde/x86/sse2.h directly here, instead of simde/simde-common.h, and used -DSIMDE_NO_NATIVE in the compilation command-line.

That's good. The implementations will pull in simde-common.h and whatever else they need.

So, everything seems to work fine now with your master branch.

Excellent. For what it's worth, depending on that means you're creating a dependency on GCC-style vector extensions, so you're limiting your pool of available compilers a bit (no more so than before using SIMDe, though). For example, MSVC doesn't support them.
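
For reference, this is roughly what relying on GCC-style vector extensions looks like. The sketch assumes GCC or Clang; `demo_v4f` and `demo_cross_term` are hypothetical names. Because the vector type is a built-in type and not a class, the arithmetic operators come from the compiler, and user-defined overloads on it are rejected with the "must have at least one parameter of class or enumeration type" error quoted earlier:

```cpp
#include <cassert>

#if !defined(__GNUC__)
#  error "This sketch relies on GCC-style vector extensions (GCC or Clang)."
#endif

// Four float lanes in a 16-byte vector, using the compiler extension.
typedef float demo_v4f __attribute__((vector_size(16)));

// Lane-wise ay*bz - az*by using the built-in operators; no overloads needed.
inline demo_v4f demo_cross_term(demo_v4f ay, demo_v4f bz, demo_v4f az, demo_v4f by) {
  return ay * bz - az * by;
}

// Usage: lanes can be read back with subscripts.
inline void demo_vector_usage() {
  demo_v4f ay = {1.0f, 2.0f, 3.0f, 4.0f};
  demo_v4f bz = {5.0f, 6.0f, 7.0f, 8.0f};
  demo_v4f az = {1.0f, 1.0f, 1.0f, 1.0f};
  demo_v4f by = {2.0f, 2.0f, 2.0f, 2.0f};
  demo_v4f r = demo_cross_term(ay, bz, az, by);
  assert(r[0] == 3.0f);
  assert(r[3] == 30.0f);
}
```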

By the way, I noticed you have a NUDGE_ALIGNED macro; since you're already using SIMDe you might want to consider using SIMDE_ALIGN. Similarly, instead of NUDGE_FORCEINLINE there is HEDLEY_ALWAYS_INLINE (Hedley is included in SIMDe).

Please let me know if you run into any more issues.

Edit: also, to be clear, if you just pass -fopenmp-simd to the compiler SIMDe will have no way of knowing; you have to either pass -fopenmp (in which case the compiler will link in the OpenMP runtime), or -fopenmp-simd and define SIMDE_ENABLE_OPENMP. If you just pass -fopenmp-simd but don't define SIMDE_ENABLE_OPENMP SIMDe won't generate the OpenMP SIMD pragmas, so the generated code may not be quite as fast.

Flix01 commented on May 20, 2024

Edit: also, to be clear, if you just pass -fopenmp-simd to the compiler SIMDe will have no way of knowing; you have to either pass -fopenmp (in which case the compiler will link in the OpenMP runtime), or -fopenmp-simd and define SIMDE_ENABLE_OPENMP. If you just pass -fopenmp-simd but don't define SIMDE_ENABLE_OPENMP SIMDe won't generate the OpenMP SIMD pragmas, so the generated code may not be quite as fast.

Thank you, but I don't use openmp, because the SIMDe version is mainly used for browser support (Emscripten), and I don't think using openmp there is easy.

nemequ commented on May 20, 2024

I wouldn't suggest trying to use the full OpenMP on emscripten, but -fopenmp-simd should work well.

It's basically just hints for the compiler's autovectorizer and has no dependency on the rest of OpenMP (which is why gcc, icc, and clang have an option to enable it independently of the rest of OpenMP). It works fine on emscripten, and the only effect should be that it generates better code.
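
As a rough illustration of what such a hint looks like, the sketch below guards `#pragma omp simd` behind a macro. `DEMO_ENABLE_OPENMP` and `DEMO_VECTORIZE` are hypothetical names chosen to mirror `SIMDE_ENABLE_OPENMP`; this is not SIMDe's actual macro definition:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opt-in: emit the pragma under full OpenMP (_OPENMP is defined
// by the compiler) or when the build explicitly defines DEMO_ENABLE_OPENMP.
#if defined(_OPENMP) || defined(DEMO_ENABLE_OPENMP)
#  define DEMO_VECTORIZE _Pragma("omp simd")
#else
#  define DEMO_VECTORIZE
#endif

// Element-wise add; the pragma only tells the compiler that the loop
// iterations are safe to vectorize, and needs no OpenMP runtime.
inline void demo_add_f32(const float* a, const float* b, float* r, std::size_t n) {
  DEMO_VECTORIZE
  for (std::size_t i = 0; i < n; i++) {
    r[i] = a[i] + b[i];
  }
}

// Usage: results are identical with or without the hint.
inline void demo_add_usage() {
  const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
  float r[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  demo_add_f32(a, b, r, 4);
  assert(r[0] == 11.0f);
  assert(r[3] == 44.0f);
}
```

Built with `-fopenmp-simd -DDEMO_ENABLE_OPENMP` the loop carries the pragma; built without either, the macro expands to nothing and the loop is left to the regular autovectorizer.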

Flix01 commented on May 20, 2024

but -fopenmp-simd should work well.

Good to know. Thank you again!

nemequ commented on May 20, 2024

Operator overloading no longer makes sense. We now use the native types in the API if available, so adding overloads to the simde_*_private types wouldn't do any good.

OTOH, compiler/architectures which worked before switching to SIMDe will still work as before without our intervention; code which relies on that just won't be as portable as using the proper functions.
