I'm trying to test some of the Intel intrinsics to see how they work, so I created a function to do that. This is the code:
void test_intel_256()
{
    __m256 res, vec1, vec2;
    __M256_MM_SET_PS(vec1, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0);
    __M256_MM_SET_PS(vec2, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0);
    __M256_MM_ADD_PS(res, vec1, vec2);
    if (res[0] == 9 && res[1] == 9 && res[2] == 9 && res[3] == 9
     && res[4] == 9 && res[5] == 9 && res[6] == 9 && res[7] == 9)
        printf("Addition : OK!\n");
    else
        printf("Addition : FAILED!\n");
}
But I'm getting these errors:
error: unknown type name ‘__m256’
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector 
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
error: subscripted value is neither array nor pointer nor vector
This means the compiler isn't recognizing the __m256 type, and consequently it can't treat res as an array of floats. I'm including the headers mmintrin.h, emmintrin.h and xmmintrin.h, and I'm using Eclipse Mars.
So what I want to know is whether the problem comes from the compiler, the hardware, or something else, and how I can solve it. Thank you!
MMX and SSE2 are baseline for x86-64, but AVX is not. You need to enable AVX explicitly, which you didn't have to do for SSE2.
Build with -march=haswell or whatever CPU you actually have.  Or just use -mavx.
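To confirm the flags actually took effect, you can check the __AVX__ macro, which GCC and Clang predefine when AVX code generation is enabled (via -mavx, -mavx2, or an -march value that includes AVX such as haswell), and compare that with what the CPU reports at runtime. This is a minimal sketch assuming GCC or Clang on x86-64; the function names are hypothetical.

```c
#include <stdio.h>

// 1 if this file was compiled with AVX enabled, else 0.
// GCC and Clang define __AVX__ under -mavx, -mavx2, -march=haswell, etc.
int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

// 1 if the CPU running the program supports AVX, else 0.
// __builtin_cpu_supports is a GCC/Clang builtin for x86 targets.
int cpu_has_avx(void)
{
    return __builtin_cpu_supports("avx") ? 1 : 0;
}

void report_avx_status(void)
{
    printf("compiled with AVX: %s\n", built_with_avx() ? "yes" : "no");
    printf("CPU supports AVX:  %s\n", cpu_has_avx() ? "yes" : "no");
}
```

Compiling the same file with and without -mavx should flip the first line; the second line tells you whether this machine can run AVX code at all.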
Beware that gcc -mavx with the default tune=generic will split 256-bit loadu/storeu intrinsics into a 128-bit vmovups xmm plus vinsertf128 (vextractf128 for stores), which is bad if your data is actually aligned most of the time, and especially bad on Haswell with its limited shuffle-port throughput.
It's good for Sandybridge and Bulldozer-family if your data really is unaligned, though. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80568: it even affects AVX2 vector-integer code, even though all AVX2 CPUs (except maybe Excavator and Ryzen) are harmed by this tuning. tune=generic doesn't take into account which instruction-set extensions are enabled, and there's no tune=generic-avx2.
You could use -mavx2 -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store. That still doesn't enable other tuning options, like optimizing for macro-fusion of compare and branch, which all modern x86 CPUs (except low-power ones) support but which gcc's tune=generic doesn't enable (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78855).
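For reference, this is the kind of code those options affect: an array add built from 256-bit unaligned loads and stores. It's a sketch assuming GCC or Clang on x86-64, and add_arrays is a hypothetical name. Whether the loads actually get split depends on your GCC version and tuning, so check the generated asm (e.g. with gcc -O2 -S) rather than trusting the comments. The target("avx") attribute enables AVX for this one function as an alternative to -mavx, so callers must check CPU support before calling it.

```c
#include <immintrin.h>
#include <stddef.h>

// dst[i] = a[i] + b[i] using 256-bit unaligned loads/stores.
// With plain -mavx and tune=generic, GCC may split each 256-bit loadu/storeu
// into two 128-bit halves; -mno-avx256-split-unaligned-load and
// -mno-avx256-split-unaligned-store ask it to keep single 256-bit moves.
__attribute__((target("avx")))
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   // no alignment requirement
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)                        // scalar tail for the last n % 8 elements
        dst[i] = a[i] + b[i];
}
```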
Also:
I'm including these libraries mmintrin.h, emmintrin.h, xmmintrin.h
Don't do that. Just include immintrin.h in SIMD code; it pulls in all the Intel SSE/AVX extensions. None of the three headers you listed defines the AVX types, which is why you get error: unknown type name ‘__m256’.
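Putting the two fixes together, here is a sketch of the test function rewritten with the standard AVX intrinsics (_mm256_set1_ps, _mm256_add_ps, _mm256_storeu_ps) in place of the __M256_MM_* macros, storing the result to a plain float array instead of subscripting res. It assumes GCC or Clang on x86-64, and avx_add_ok is a hypothetical helper. Building with -mavx as suggested above is enough; the target("avx") attribute is an alternative that enables AVX for that one function, which is why test_intel_256 checks __builtin_cpu_supports before calling it.

```c
#include <immintrin.h>   // one header for all SSE/AVX intrinsics
#include <stdio.h>

// Adds eight 7.0f to eight 2.0f and checks that every element is 9.0f.
// target("avx") enables AVX for this function only; redundant under -mavx.
__attribute__((target("avx")))
static int avx_add_ok(void)
{
    __m256 vec1 = _mm256_set1_ps(7.0f);       // broadcast 7.0f to all 8 elements
    __m256 vec2 = _mm256_set1_ps(2.0f);       // broadcast 2.0f to all 8 elements
    __m256 res  = _mm256_add_ps(vec1, vec2);

    float out[8];
    _mm256_storeu_ps(out, res);               // copy to a real array instead of res[i]
    for (int i = 0; i < 8; i++)
        if (out[i] != 9.0f)
            return 0;
    return 1;
}

void test_intel_256(void)
{
    // Only execute AVX instructions if this CPU supports them.
    if (!__builtin_cpu_supports("avx")) {
        printf("Addition : SKIPPED (no AVX)\n");
        return;
    }
    if (avx_add_ok())
        printf("Addition : OK!\n");
    else
        printf("Addition : FAILED!\n");
}
```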
Keep in mind that subscripting vector types like __m256 is non-standard and non-portable. They're not arrays, and there's no reason to expect [] to work on them the way it does on an array. Extracting the 3rd element (or any single element) from a SIMD vector held in a register requires a shuffle instruction, not a load.
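As an illustration of the shuffle approach, this sketch pulls element 6 out of a __m256 using shuffle intrinsics rather than a store and reload: _mm256_extractf128_ps takes the upper 128-bit half, _mm_shuffle_ps moves the wanted element into lane 0, and _mm_cvtss_f32 reads lane 0 as a scalar. The element number must be a compile-time constant because the shuffle control is an immediate. Function names are hypothetical; it assumes GCC or Clang on x86-64.

```c
#include <immintrin.h>

// Element 6 (counting from 0) of an 8-float vector, via shuffles.
__attribute__((target("avx")))
static float lane6(__m256 v)
{
    __m128 hi = _mm256_extractf128_ps(v, 1);                      // elements 4..7
    __m128 s  = _mm_shuffle_ps(hi, hi, _MM_SHUFFLE(2, 2, 2, 2));  // element 6 -> lane 0
    return _mm_cvtss_f32(s);                                      // lane 0 as a float
}

// Hypothetical demo: builds {0, 1, ..., 7} and returns element 6.
__attribute__((target("avx")))
float lane6_demo(void)
{
    __m256 v = _mm256_setr_ps(0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f);
    return lane6(v);   // 6.0f
}
```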
If you want handy wrappers for vector types that let you do stuff like use operator[] to extract scalars from elements of vector variables, have a look at Agner Fog's Vector Class Library.  It's GPLed, so you'll have to look at other wrapper libraries if that's a problem.
It lets you do stuff like
#include "vectorclass.h"   // Agner Fog's VCL
// example from the manual for operator[]
Vec4i a(10,11,12,13);
int b = a[2];   // b = 12
You can use normal intrinsics on VCL types.  Vec8f is a transparent wrapper on __m256, so you can use it with _mm256_mul_ps.
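If you're staying in plain C without VCL, a helper in the spirit of operator[] can take a runtime index by storing the vector to a small array and reading the element back. That costs a store plus a reload rather than a shuffle, which is the price of a variable index. This is a sketch assuming GCC or Clang on x86-64; vec8f_get is a hypothetical name, not part of VCL or of the Intel intrinsics.

```c
#include <immintrin.h>

// Hypothetical helper: element i of v (index wrapped to 0..7) via a spill to memory.
__attribute__((target("avx")))
static float vec8f_get(__m256 v, int i)
{
    float tmp[8];
    _mm256_storeu_ps(tmp, v);   // store all 8 elements
    return tmp[i & 7];          // read the requested one back
}

// Demo mirroring the VCL manual example: elements 10..17.
__attribute__((target("avx")))
float vec8f_get_demo(int i)
{
    __m256 v = _mm256_setr_ps(10.0f, 11.0f, 12.0f, 13.0f, 14.0f, 15.0f, 16.0f, 17.0f);
    return vec8f_get(v, i);
}
```

For example, vec8f_get_demo(2) returns 12.0f, matching b = a[2] in the VCL snippet.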