I'm attempting to parallelize a task but I'm not getting good performance. At most I'm getting 50% CPU usage.
Below is a toy example where I build a list of lists of random words, and then go through every word and string reverse it. The reversing part is what I'm trying to parallelize.
To me this looks like a CPU-bound problem, so I don't really understand why CPU usage is so low.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
namespace Papa
{
class Program
{
static void Main(string[] args)
{
// Build source
var random = new Random();
var alphabet = "abcdefghijklmnopqrstuvwxyz";
Console.WriteLine("Building data source");
var source = new List<string[]>();
foreach (var i in Enumerable.Range(0, 5000000))
{
var words = random.Next(3, 50);
var line = new List<string>();
for (var j = 0; j < words; ++j)
{
line.Add(
string.Concat(
Enumerable
.Range(0, random.Next(3, 9))
.Select(w => alphabet[random.Next(0, alphabet.Length - 1)])
)
);
}
source.Add(line.ToArray());
}
// Process source
Console.WriteLine("Processing source");
var processed = new List<string[]>();
Parallel.ForEach(
source,
() => new List<string[]>(),
(line, loop, local) =>
{
var processedLine = new List<string>();
for (var i = 0; i < line.Count(); ++i)
{
processedLine.Add(string.Concat(line[i].Reverse()));
}
local.Add(processedLine.ToArray());
return local;
},
partitionResult =>
{
lock (processed)
{
processed.AddRange(partitionResult);
}
});
}
}
}
Parallel.ForEach
will use as many threads as the scheduler provides by default (reference). Be aware that the documentation states:
Executes a foreach (For Each in Visual Basic) operation on a Partitioner in which iterations may run in parallel and loop options can be configured.
So it might be the case it just does not seem fit with the default options.
After running your code I found the following (when hitting the parallelized part):
As you can see it already utilized 15Gb of memory but it's running on all cores. I'm quite sure you run into memory/GC bottlenecks rather than CPU bottlenecks.
I run some additional performance analysis. This is what I found (String::Concat
can be ignored because it's part of the initialization):
Line 45 is taking up ~50% of the CPU time.
I played a bit with the code and made the following changes:
Parallel.ForEach(
source,
() => new List<string[]>(),
(line, loop, local) =>
{
var length = line.Length;
var processedLine = new string[length];
for (var i = 0; i < length; ++i)
{
processedLine[i] = line[i].Reverse() + "";
}
local.Add(processedLine);
return local;
},
partitionResult =>
{
lock (processed)
{
processed.AddRange(partitionResult);
}
});
With a doubled data-size (10000000
~32Gb of data) this shows at least a small peak in CPU usage but it is done in less than 7 seconds so I'm not quite sure that TaskManager really gets the peak correctly. Never the less this does also not resolve the problem and the GC also runs like crazy.
My "final" conclusion would be, that it's either (or multiple) of:
which is throttling you.
In my opinion, this is a nice example of why parallelization will not yield x-times the performance even if you can use x-threads.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With