I am having a problem with a SSE method I am writing that performs audio processing. I have implemented a SSE random function based on Intel's paper here:
http://software.intel.com/en-us/articles/fast-random-number-generator-on-the-intel-pentiumr-4-processor/
I also have a method that is performing conversions from Float to S16 using SSE also, the conversion is performed quite simply as follows:
unsigned int Float_S16LE(float *data, const unsigned int samples, uint8_t *dest)
{
  int16_t *dst = (int16_t*)dest;
  const __m128 mul = _mm_set_ps1((float)INT16_MAX);
   __m128 rand;
  const uint32_t even = count & ~0x3;
  for(uint32_t i = 0; i < even; i += 4, data += 4, dst += 4)
  {
    /* random round to dither */
    FloatRand4(-0.5f, 0.5f, NULL, &rand);
    __m128 rmul = _mm_add_ps(mul, rand);
    __m128 in = _mm_mul_ps(_mm_load_ps(data),rmul);
    __m64 con = _mm_cvtps_pi16(in);
    memcpy(dst, &con, sizeof(int16_t) * 4);
  }
}
FloatRand4 is defined as follows:
static inline void FloatRand4(const float min, const float max, float result[4], __m128 *sseresult = NULL)
{
  const float delta  = (max - min) / 2.0f;
  const float factor = delta / (float)INT32_MAX;
  ...
}
If sseresult != NULL the __m128 result is returned and result is unused. 
This performs perfectly on the first loop, but on the next loop delta becomes -1.#INF instead of 1.0. If I comment out the line __m64 con = _mm_cvtps_pi16(in); the problem goes away.
I think that the FPU is getting into an unknown state or something.
Mixing SSE Integer arithmetic and (regular) Floating point math. Can produce weird results because both are operating on the same registers. If you use:
_mm_empty()
the FPU is reset into a correct state. Microsoft has Guidelines for When to Use EMMS
http://msdn.microsoft.com/en-us/library/bytwczae.aspx
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With