I'm taking my first steps with SSE2 in C++. Here's the intrinsic I'm learning right now:
__m128d _mm_add_pd (__m128d a, __m128d b)
The document says: Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
But I never pass dst to that function. So how can it add the two doubles I pass (via pointer) and write them to a result array if I never pass that array?
The intrinsic returns the result of the computation: dst in the documentation is simply the __m128d return value. You can store it in a variable or pass it as an argument to another intrinsic.
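To make the "return value" point concrete, here is a minimal sketch. The helper names add3 and to_array are made up for illustration and are not part of any library; only _mm_add_pd and _mm_storeu_pd come from <emmintrin.h>.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: lane-wise sum of three vectors.
inline __m128d add3(__m128d a, __m128d b, __m128d c) {
    __m128d ab = _mm_add_pd(a, b); // the result is kept in a variable...
    return _mm_add_pd(ab, c);      // ...and then passed on as an argument
}

// Hypothetical helper: copies both lanes into a plain array for inspection.
inline void to_array(__m128d v, double out[2]) {
    _mm_storeu_pd(out, v); // out[0] receives the low lane, out[1] the high lane
}
```

Note that if you build test vectors with _mm_set_pd(hi, lo), the second argument is the low lane, so it ends up in out[0].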
An important thing to note here is that most SIMD intrinsics don't operate directly on memory. They work on __m128d values, so you need to explicitly load (_mm_load_pd or _mm_loadu_pd) and store (_mm_store_pd or _mm_storeu_pd) the double values, much as you would in assembly. The intermediate values will most likely be kept in SSE registers or, if too many registers are in use, spilled to the stack.
So if you wanted to sum up two double arrays, you would do something like this:
#include <emmintrin.h> // SSE2 intrinsics
constexpr int N = 8; // Example size; this loop assumes N is even
double a[N];
double b[N];
double c[N];
for (int i = 0; i < N; i += 2) { // We process two doubles per iteration
    auto x = _mm_loadu_pd(a + i); // We don't know anything about alignment,
    auto y = _mm_loadu_pd(b + i); // so we use the unaligned load
    auto sum = _mm_add_pd(x, y); // Compute the vector sum
    _mm_storeu_pd(c + i, sum); // The store is unaligned as well
}
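One thing the loop above glosses over is an odd element count: the last iteration would then load and store one double past the end of each array. A possible way to handle that, as a sketch only (add_arrays is a hypothetical name, and the scalar tail loop is my assumption about how you would want to finish the leftover element), is:

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstddef>     // std::size_t

// Hypothetical helper: c[i] = a[i] + b[i] for every i in [0, n).
void add_arrays(const double* a, const double* b, double* c, std::size_t n) {
    std::size_t i = 0;
    // Vector part: two doubles per iteration while a full pair remains.
    for (; i + 2 <= n; i += 2) {
        __m128d x = _mm_loadu_pd(a + i); // unaligned loads, no alignment assumed
        __m128d y = _mm_loadu_pd(b + i);
        _mm_storeu_pd(c + i, _mm_add_pd(x, y));
    }
    // Scalar tail: at most one leftover element when n is odd.
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

If you can guarantee 16-byte alignment for all three arrays, the aligned variants _mm_load_pd and _mm_store_pd could be used in the vector part instead.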