In the application I'm currently developing, I have to sum fairly large arrays of vectors efficiently. Here's my code:
public List<double[, ,]> normalMaps;

public double[, ,] Mix(double[] weights, double gain)
{
    int w = normalMaps[0].GetLength(0);
    int h = normalMaps[0].GetLength(1);
    double[, ,] ret = new double[w, h, 3];
    int normcount = normalMaps.Count;

    //for (int y = 0; y < h; y++)
    Parallel.For(0, h, y =>
    {
        for (int x = 0; x < w; x++)
        {
            // Weighted sum of all maps.
            for (int z = 0; z < normcount; z++)
            {
                ret[x, y, 0] += normalMaps[z][x, y, 0] * weights[z];
                ret[x, y, 1] += normalMaps[z][x, y, 1] * weights[z];
                ret[x, y, 2] += normalMaps[z][x, y, 2] * weights[z];
            }

            // Apply gain and clamp each component to [-1, 1].
            ret[x, y, 0] *= gain;
            ret[x, y, 1] *= gain;
            ret[x, y, 2] *= gain;
            ret[x, y, 0] = Math.Max(-1, Math.Min(1, ret[x, y, 0]));
            ret[x, y, 1] = Math.Max(-1, Math.Min(1, ret[x, y, 1]));
            ret[x, y, 2] = Math.Max(-1, Math.Min(1, ret[x, y, 2]));

            // Renormalize to unit length.
            double retnorm = Math.Sqrt(ret[x, y, 0] * ret[x, y, 0] + ret[x, y, 1] * ret[x, y, 1] + ret[x, y, 2] * ret[x, y, 2]);
            ret[x, y, 0] /= retnorm;
            ret[x, y, 1] /= retnorm;
            ret[x, y, 2] /= retnorm;
        }
    });
    return ret;
}
Now, when I sum seven 1024×1024 arrays of 3-component vectors, the operation takes 320 ms on my laptop. Making the code multithreaded already gave me a huge performance boost, but I need it even faster. How can I optimize it further? I can already see that using a plain array instead of a List<> would make the code faster, but not by much. Is there really nothing left to optimize? I was also thinking about moving this to the GPU, but that's just an idea. Can somebody help me out? Thanks in advance.
You can get a large speed-up once you notice that you are iterating the dimensions in an inefficient order, which causes false sharing: you are essentially parallelizing over "width" instead of "height". You may be confusing how multidimensional arrays are laid out in memory.
False sharing is not the only problem, either: because of how CPU caches work, you are also iterating over the data in a cache-unfriendly order.
Array definitions should usually be myArray[HEIGHT, WIDTH] to keep them consistent with row-major memory storage, and when iterating, the height index should be the outermost loop. Your maps are declared as [width, height, 3], so the first index should be outermost instead:
Parallel.For(0, w, x =>
{
    for (int y = 0; y < h; y++)
    {
        ...
    }
});
That took me from 800 ms to 150 ms with the same dimensions, just by swapping those loops.
As you mentioned, swapping that List<> out for an array will give a noticeable performance boost.
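Putting those two changes together, a sketch of the rewritten method might look like this (same variable names as in your question; the gain/clamp/normalize steps are unchanged and elided here):

    public double[, ,] Mix(double[] weights, double gain)
    {
        double[][, ,] maps = normalMaps.ToArray(); // plain array instead of List<>
        int w = maps[0].GetLength(0);
        int h = maps[0].GetLength(1);
        int normcount = maps.Length;
        double[, ,] ret = new double[w, h, 3];

        // Parallelize over the first (slowest-varying) index so each
        // thread writes to its own contiguous slab of ret.
        Parallel.For(0, w, x =>
        {
            for (int y = 0; y < h; y++)
            {
                double r0 = 0, r1 = 0, r2 = 0; // accumulate in locals
                for (int z = 0; z < normcount; z++)
                {
                    double[, ,] m = maps[z];
                    r0 += m[x, y, 0] * weights[z];
                    r1 += m[x, y, 1] * weights[z];
                    r2 += m[x, y, 2] * weights[z];
                }
                ret[x, y, 0] = r0;
                ret[x, y, 1] = r1;
                ret[x, y, 2] = r2;
                // ... apply gain, clamp and normalize as in the question ...
            }
        });
        return ret;
    }

Accumulating into locals also avoids re-reading ret[x, y, *] from memory on every z iteration.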
If you switch to arrays, you could also use pointers to iterate the values. You'll take a small performance hit for pinning them so they don't get moved by the GC, but considering the size, the pros should outweigh the cons. You can see this done a fair bit within the .NET Framework's source to squeeze every drop of performance out of hefty iterations.
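For example, if you flatten each map into a one-dimensional double[] of length w * h * 3 (a layout change I'm assuming here; [,,] arrays aren't as convenient to walk by pointer), the per-map accumulation could look like this:

    // Sketch only; requires compiling with /unsafe.
    static unsafe void Accumulate(double[] dst, double[] src, double weight)
    {
        fixed (double* pDst = dst, pSrc = src) // pin both arrays for the GC
        {
            double* d = pDst;
            double* s = pSrc;
            double* end = pDst + dst.Length;
            while (d < end)
                *d++ += *s++ * weight; // no bounds checks inside the loop
        }
    }

You'd call this once per map with its weight, then do the gain/clamp/normalize pass at the end.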
You may be able to utilize the new SIMD support for the actual calculations, but I don't know enough about the subject to give more details. I should also mention that the new SIMD features in .NET aren't fully complete yet and are still in beta.
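If it does work out, the idea would be something like the following with System.Numerics' Vector&lt;T&gt; (a sketch I haven't benchmarked, again assuming the maps are flattened into double[] buffers; the lane count depends on the hardware):

    using System.Numerics;

    static void ScaleAdd(double[] dst, double[] src, double weight)
    {
        int lanes = Vector<double>.Count;
        Vector<double> w = new Vector<double>(weight);
        int i = 0;
        for (; i <= dst.Length - lanes; i += lanes)
        {
            Vector<double> d = new Vector<double>(dst, i);
            Vector<double> s = new Vector<double>(src, i);
            (d + s * w).CopyTo(dst, i);
        }
        for (; i < dst.Length; i++) // scalar tail
            dst[i] += src[i] * weight;
    }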