System.Numerics.Vector brings SIMD support to .NET Core and to .NET Framework 4.6 and later.
    // Baseline
    public void SimpleSumArray()
    {
        for (int i = 0; i < left.Length; i++)
            results[i] = left[i] + right[i];
    }

    // Using Vector<T> for SIMD support
    public void SimpleSumVectors()
    {
        int ceiling = left.Length / floatSlots * floatSlots;
        for (int i = 0; i < ceiling; i += floatSlots)
        {
            Vector<float> v1 = new Vector<float>(left, i);
            Vector<float> v2 = new Vector<float>(right, i);
            (v1 + v2).CopyTo(results, i);
        }
        for (int i = ceiling; i < left.Length; i++)
        {
            results[i] = left[i] + right[i];
        }
    }
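The snippets here do not show how `floatSlots`, `left`, `right` and `results` are set up. The following is a minimal, self-contained sketch of one plausible setup, assuming `floatSlots` is `Vector<float>.Count`; the class name `SumSetup`, the array length and the fill values are hypothetical and not taken from the sample application. On .NET Framework, `Vector<T>` typically comes from the System.Numerics.Vectors NuGet package.

```csharp
using System;
using System.Numerics;

public class SumSetup
{
    // Assumption: floatSlots is the number of floats that fit in one Vector<float>.
    private static readonly int floatSlots = Vector<float>.Count;

    private readonly float[] left;
    private readonly float[] right;
    private readonly float[] results;

    public SumSetup(int length)
    {
        left = new float[length];
        right = new float[length];
        results = new float[length];
        for (int i = 0; i < length; i++)
        {
            left[i] = i;       // hypothetical test data
            right[i] = 2 * i;  // hypothetical test data
        }
    }

    public float[] Results => results;

    public void SimpleSumVectors()
    {
        // Largest multiple of floatSlots that fits in the array.
        int ceiling = left.Length / floatSlots * floatSlots;
        for (int i = 0; i < ceiling; i += floatSlots)
        {
            Vector<float> v1 = new Vector<float>(left, i);
            Vector<float> v2 = new Vector<float>(right, i);
            (v1 + v2).CopyTo(results, i);
        }
        // Scalar loop for the elements that do not fill a whole vector.
        for (int i = ceiling; i < left.Length; i++)
        {
            results[i] = left[i] + right[i];
        }
    }

    public static void Main()
    {
        Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");
        Console.WriteLine($"Floats per vector: {floatSlots}");

        // 1003 is not a multiple of any common vector width, so the scalar tail runs too.
        var sum = new SumSetup(1003);
        sum.SimpleSumVectors();
        Console.WriteLine(sum.Results[1002]);
    }
}
```

If `Vector.IsHardwareAccelerated` reports false, the JIT is not emitting SIMD instructions and timings for the vectorized methods say little about SIMD itself.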
Unfortunately, initializing the Vector can be the limiting step. To work around this, several sources recommend using MemoryMarshal.Cast to reinterpret the source array as a span of Vectors, without copying [1][2]. For example:
    // Improving Vector<T> Initialization Performance
    public void SimpleSumVectorsNoCopy()
    {
        int numVectors = left.Length / floatSlots;
        int ceiling = numVectors * floatSlots;
        // leftMemory is simply a ReadOnlyMemory<float> referring to the "left" array
        ReadOnlySpan<Vector<float>> leftVecArray = MemoryMarshal.Cast<float, Vector<float>>(leftMemory.Span);
        ReadOnlySpan<Vector<float>> rightVecArray = MemoryMarshal.Cast<float, Vector<float>>(rightMemory.Span);
        // resultsMemory must be a writable Memory<float> so that its Span can be cast to Span<Vector<float>>
        Span<Vector<float>> resultsVecArray = MemoryMarshal.Cast<float, Vector<float>>(resultsMemory.Span);
        for (int i = 0; i < numVectors; i++)
            resultsVecArray[i] = leftVecArray[i] + rightVecArray[i];
        // Scalar loop for the elements that do not fill a whole vector
        for (int i = ceiling; i < left.Length; i++)
            results[i] = left[i] + right[i];
    }
This brings a dramatic improvement in performance when running on .NET Core:
| Method | Mean | Error | StdDev |
|----------------------- |----------:|----------:|----------:|
| SimpleSumArray | 165.90 us | 0.1393 us | 0.1303 us |
| SimpleSumVectors | 53.69 us | 0.0473 us | 0.0443 us |
| SimpleSumVectorsNoCopy | 31.65 us | 0.1242 us | 0.1162 us |
Unfortunately, on .NET Framework, this way of initializing the vector has the opposite effect. It actually leads to worse performance:
| Method | Mean | Error | StdDev |
|----------------------- |----------:|---------:|---------:|
| SimpleSumArray | 152.92 us | 0.128 us | 0.114 us |
| SimpleSumVectors | 52.35 us | 0.041 us | 0.038 us |
| SimpleSumVectorsNoCopy | 77.50 us | 0.089 us | 0.084 us |
Is there a way to optimize the initialization of Vector on .NET Framework and get similar performance to .NET Core? Measurements have been performed using this sample application [1].
[1] https://github.com/CBGonzalez/SIMDPerformance
[2] https://stackoverflow.com/a/62702334/430935
As far as I know, the only efficient way to load a vector in .NET Framework 4.6 or 4.7 (presumably this will all change in 5.0) is with unsafe code, for example using Unsafe.Read<Vector<float>> (or its unaligned variant, Unsafe.ReadUnaligned, if applicable):
    public unsafe void SimpleSumVectors()
    {
        int ceiling = left.Length / floatSlots * floatSlots;
        fixed (float* leftp = left, rightp = right, resultsp = results)
        {
            for (int i = 0; i < ceiling; i += floatSlots)
            {
                Unsafe.Write(resultsp + i,
                    Unsafe.Read<Vector<float>>(leftp + i) + Unsafe.Read<Vector<float>>(rightp + i));
            }
        }
        for (int i = ceiling; i < left.Length; i++)
        {
            results[i] = left[i] + right[i];
        }
    }
This uses the System.Runtime.CompilerServices.Unsafe package, which you can get via NuGet, but the same thing could be done without it.