What's the difference between _mm_broadcast_ss() and _mm_load_ps1()?
#include <immintrin.h>
#include <iostream>

void example() {
    __declspec(align(32)) const float num = 20;

    // AVX-only broadcast load
    __m128 a1 = _mm_broadcast_ss(&num);
    __declspec(align(32)) float f1[4];
    _mm_store_ps(f1, a1);
    std::cout << f1[0] << " " << f1[1] << " " << f1[2] << " " << f1[3] << "\n";

    // SSE broadcast load
    __m128 a2 = _mm_load_ps1(&num);
    __declspec(align(32)) float f2[4];
    _mm_store_ps(f2, a2);
    std::cout << f2[0] << " " << f2[1] << " " << f2[2] << " " << f2[3] << "\n";
}
I get the same output both ways, so why do both exist?
The immintrin.h header file defines a set of data types that represent different kinds of vectors. One of them is __m256: a vector of eight single-precision floating-point numbers (8 x 32 = 256 bits).
_mm256_maskstore_epi32(int *addr, __m256i mask, __m256i a): stores 32-bit values from a at addr, but only the elements that mask selects. An element is stored if the most significant (i.e. sign) bit of the corresponding 32-bit integer in mask is set; the other memory locations are left unchanged.
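The masked-store rule described above can be sketched as a scalar reference model. This is only an illustration of the semantics, not how the instruction is implemented; `maskstore_epi32_model` and `check_maskstore` are hypothetical names, and the real intrinsic is only exercised when the build defines `__AVX2__`.

```cpp
#include <cassert>
#ifdef __AVX2__
#include <immintrin.h>
#endif

// Hypothetical scalar model of the masked-store semantics:
// element i is written only when the sign bit of mask[i] is set.
inline void maskstore_epi32_model(int* addr, const int mask[8], const int a[8]) {
    for (int i = 0; i < 8; ++i) {
        if (mask[i] < 0) {   // sign (most significant) bit set: store
            addr[i] = a[i];
        }                    // otherwise addr[i] is left untouched
    }
}

// Hypothetical self-check comparing the model (and, if available,
// the real intrinsic) against a hand-computed result.
inline void check_maskstore() {
    const int mask[8]     = {-1, 0, -1, 0, 1, 0, -5, 0};
    const int a[8]        = {1, 2, 3, 4, 5, 6, 7, 8};
    const int expected[8] = {1, 0, 3, 0, 0, 0, 7, 0};

    int dst[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    maskstore_epi32_model(dst, mask, a);
    for (int i = 0; i < 8; ++i) assert(dst[i] == expected[i]);

#ifdef __AVX2__
    int dst2[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    __m256i m = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(mask));
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    _mm256_maskstore_epi32(dst2, m, v);
    for (int i = 0; i < 8; ++i) assert(dst2[i] == expected[i]);
#endif
}
```

Note that a positive mask value such as 1 does not select its element: only the sign bit matters.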
_mm_broadcast_ss only compiles for AVX targets.
_mm_load1_ps / _mm_load_ps1 will compile to multiple instructions (movss / shufps) when compiling for targets that don't support AVX. When you are compiling for an AVX target, any good compiler will use a vbroadcastss to implement them.
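To make the movss / shufps pair concrete, here is a minimal sketch that spells the broadcast out by hand with SSE intrinsics and checks it against `_mm_load1_ps`. The helper names (`load1_manual`, `same_lanes`, `check_load1`) are made up for illustration, and the exact instructions a compiler emits may differ.

```cpp
#include <xmmintrin.h>  // SSE: _mm_load_ss, _mm_shuffle_ps, _mm_load1_ps
#include <cassert>

// Hypothetical helper: broadcast by hand, roughly what a non-AVX
// build of _mm_load1_ps amounts to.
inline __m128 load1_manual(const float* p) {
    __m128 v = _mm_load_ss(p);       // movss: lane 0 = *p, lanes 1..3 = 0
    return _mm_shuffle_ps(v, v, 0);  // shufps imm8 = 0: copy lane 0 to all lanes
}

// Hypothetical helper: true if all four lanes of a and b compare equal
// (NaN lanes never compare equal).
inline bool same_lanes(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}

// Hypothetical self-check: the manual sequence matches the convenience intrinsic.
inline void check_load1(float x) {
    assert(same_lanes(load1_manual(&x), _mm_load1_ps(&x)));
}
```

This builds with plain SSE, so it needs no AVX flag on x86-64.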
load1 / set1 and other convenience functions were introduced early on, because it's often good to let the compiler pick the optimal strategy for moving data around.
_mm_broadcast_* intrinsics were introduced as direct wrappers around the vbroadcastss / vbroadcastsd instructions. (AVX2 has integer vpbroadcast..., and the reg-reg forms of vbroadcastss. AVX1 only has vbroadcastss x/ymm, [mem].)
Use _mm_load1_ps or _mm_set1_ps. It makes no difference to the compiled code when you target AVX, and lets the same source build for non-AVX targets.
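As a rough sketch of that portability point: a wrapper built on `_mm_load1_ps` compiles with or without AVX enabled, while `_mm_broadcast_ss` has to sit behind a feature check. The names `splat`, `lanes`, and `check_splat` are hypothetical, and `__AVX__` is assumed to be the macro your compiler defines when AVX is enabled (gcc and clang with -mavx, MSVC with /arch:AVX).

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical portable wrapper: builds for SSE-only and AVX targets alike.
inline __m128 splat(const float* p) {
    return _mm_load1_ps(p);
}

// Hypothetical helper: copy the four lanes to a plain array.
// Uses an unaligned store, so `out` needs no special alignment.
inline void lanes(__m128 v, float out[4]) {
    _mm_storeu_ps(out, v);
}

// Hypothetical self-check: every lane holds x, and on AVX builds
// _mm_broadcast_ss produces the same vector.
inline void check_splat(float x) {
    float out[4];
    lanes(splat(&x), out);
    for (int i = 0; i < 4; ++i) assert(out[i] == x);

#ifdef __AVX__
    float out_avx[4];
    lanes(_mm_broadcast_ss(&x), out_avx);  // only compiled for AVX targets
    for (int i = 0; i < 4; ++i) assert(out_avx[i] == out[i]);
#endif
}
```

Calling `check_splat(20.0f)` exercises both paths when AVX is enabled and only the SSE path otherwise.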
The choice might make a difference to the asm output at -O0, but IDK. If you care about the asm output in an un-optimized build, then 1: that's weird, and 2: you'll have to see what your compiler does.
As you can see from the asm output on godbolt (for gcc):
With -mno-avx:

    // bcast: compile error, so I #ifdef it out

    __m128 load1(const float*p) { return _mm_load1_ps(p); }
        movss   xmm0, DWORD PTR [rdi]
        shufps  xmm0, xmm0, 0
        ret

With -mavx:

    __m128 bcast(const float*p) { return _mm_broadcast_ss(p); }
        vbroadcastss    xmm0, DWORD PTR [rdi]
        ret

    __m128 load1(const float*p) { return _mm_load1_ps(p); }
        vbroadcastss    xmm0, DWORD PTR [rdi]
        ret