I began reading through the information posted at http://www.albahari.com/threading/
The author stated that:
Sleep(0) or Yield is occasionally useful in production code for advanced performance tweaks. It's also an excellent diagnostic tool for helping to uncover thread safety issues: if inserting Thread.Yield() anywhere in your code makes or breaks the program, you almost certainly have a bug.
According to the MSDN documentation, Thread.Yield() is defined as follows:
Causes the calling thread to yield execution to another thread that is ready to run on the current processor. The operating system selects the thread to yield to.
To me, this describes the half of software development that says that race conditions can't be solved.
Is this a standard debugging practice in threading?
It's good advice, but I usually use Sleep(1) instead.
One of the things that makes concurrency bugs so hard to fix is that they are hard to reproduce -- most problems manifest when you are unlucky and the OS suspends your thread at the worst possible time.
When debugging problems like this, you'll often need to test a hypothesis like "maybe it happens when my thread gets suspended here, and...". At that point you can insert a yield or sleep, which will suspend your thread and greatly increase the likelihood of reproducing the error.
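For example, here's a minimal sketch of the idea (the Withdraw method and _balance field are made up for illustration): a strategically placed Sleep(1) widens a check-then-act window and turns an intermittent race into one that fails on almost every run.
using System;
using System.Threading;
using System.Threading.Tasks;

class RaceRepro
{
    // Hypothetical shared state with a check-then-act race.
    private static int _balance = 100;

    static void Withdraw(int amount)
    {
        if (_balance >= amount)
        {
            // Widening the gap between the check and the update makes the
            // race far easier to reproduce; remove the sleep and the bug
            // still exists, it just hides most of the time.
            Thread.Sleep(1);
            _balance -= amount;
        }
    }

    static void Main()
    {
        // Two concurrent withdrawals of 100 should never both succeed,
        // but with the widened window the balance routinely goes negative.
        Parallel.Invoke(() => Withdraw(100), () => Withdraw(100));
        Console.WriteLine(_balance);
    }
}
If the program only misbehaves with the sleep in place, you haven't created a bug, you've exposed one that was already there.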
Using Thread.Sleep() or Thread.Yield() won't solve your bugs, but they might hide them in some cases. While this seems like a good thing - stopping bugs from popping up is better than having them kill your program - the reality is that you're not resolving the underlying issue.
Yes, stomping bugs in a multi-threaded program can be damned hard. This is where you really have to understand how your threads are interacting and what happens when you have threads running simultaneously on different CPU cores, etc. Without that understanding you'll likely never find the error in your program logic that is causing the problem in the first place.
When writing a multi-threaded program you have to make sure that every operation on shared data is atomic. Even a simple increment operation becomes a problem when you are doing it on a shared value, which is why we have the Interlocked.Increment() method. For everything else there are locks and so on to help you manage your thread interactions.
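As a quick illustration (the counter fields and the loop count here are just for demonstration), compare a plain increment with Interlocked.Increment() when hammered from many threads:
using System;
using System.Threading;
using System.Threading.Tasks;

class CounterDemo
{
    private static int _unsafeCount;
    private static int _safeCount;

    static void Main()
    {
        // A plain increment is really a read, an add, and a write; two threads
        // can interleave those steps and lose updates. Interlocked.Increment
        // performs the whole operation atomically.
        Parallel.For(0, 1000000, _ =>
        {
            _unsafeCount++;                          // lost updates likely
            Interlocked.Increment(ref _safeCount);   // always exact
        });

        Console.WriteLine($"unsafe: {_unsafeCount}, safe: {_safeCount}");
    }
}
The plain counter usually comes up short of one million because interleaved increments get lost; the interlocked counter always lands on exactly one million.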
Examine every interaction that your threads have with shared data and make sure that there is a lock in place on the data while you are using it. For instance, let's say you're queuing jobs for a set of worker threads to do:
public class WorkerThread
{
    public static readonly Queue<Job> jobs = new Queue<Job>();

    public void ThreadFunc()
    {
        while (true)
        {
            if (jobs.Count > 0)
            {
                // Race window: another thread can dequeue between the Count
                // check above and this call.
                Job myJob = jobs.Dequeue();
                // do something with the job...
            }
            else
                Thread.Yield();
        }
    }
}
Seems simple enough, and it's only going to be a few cycles between checking for a job and going to fetch it. But in that window another thread can swoop in and grab the waiting job out from under you, and the Dequeue() call will throw because the queue is suddenly empty. You can solve this in a few ways, but the simplest is probably to use a thread-safe version of the queue from System.Collections.Concurrent:
public class WorkerThread
{
    public static readonly ConcurrentQueue<Job> jobs = new ConcurrentQueue<Job>();

    public void ThreadFunc()
    {
        Job myJob;
        while (true)
        {
            // TryDequeue tests for and removes an item in a single atomic step.
            if (jobs.TryDequeue(out myJob))
            {
                // do something with the job...
            }
            else
                Thread.Yield();
        }
    }
}
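Driving that worker is then just a matter of starting a few threads and enqueuing work; here's a rough sketch (the Job type and the worker count are placeholders for whatever your application actually uses):
static void Main()
{
    // Start two background workers running the loop above.
    for (int i = 0; i < 2; i++)
    {
        var worker = new WorkerThread();
        new Thread(worker.ThreadFunc) { IsBackground = true }.Start();
    }

    // Producers can enqueue from any thread without extra locking;
    // ConcurrentQueue does the synchronization internally.
    WorkerThread.jobs.Enqueue(new Job());
    WorkerThread.jobs.Enqueue(new Job());

    Thread.Sleep(1000); // give the workers a moment to drain the queue
}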
In cases where you don't have a thread-safe version you'll have to fall back on either locking or some other mechanism to secure your access to the shared data. A lock-based solution to the above might look something like this:
public class WorkerThread
{
    private static readonly object _jobs_lock = new object();
    private static readonly Queue<Job> _jobs = new Queue<Job>();

    public void ThreadFunc()
    {
        Job myJob;
        while (true)
        {
            if ((myJob = NextJob()) != null)
            {
                // do something with the job...
            }
            else
                Thread.Yield();
        }
    }

    public void AddJob(Job newJob)
    {
        lock (_jobs_lock)
            _jobs.Enqueue(newJob);
    }

    private Job NextJob()
    {
        // The lock covers both the Count check and the Dequeue, so no other
        // thread can take the job between the two.
        lock (_jobs_lock)
        {
            if (_jobs.Count > 0)
                return _jobs.Dequeue();
        }
        return null;
    }
}
Either of those two will ensure that the collection isn't modified between testing if there is a job and actually retrieving the job from the queue. Make sure you release the locks as fast as you can though, because otherwise you're going to have lock contention issues which can be a whole lot tougher to resolve. Never leave a lock in place longer than absolutely necessary to do the work - in this case, test for and retrieve an item from the queue. Do this for all of your shared resources and you won't have any more race conditions on them.
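To make that concrete, here's a sketch of the difference in lock scope (ProcessJob is a hypothetical stand-in for whatever the worker actually does with a job):
// Anti-pattern: the lock is held for the entire (possibly slow) job,
// so every producer calling AddJob blocks until the job is finished.
lock (_jobs_lock)
{
    if (_jobs.Count > 0)
        ProcessJob(_jobs.Dequeue());
}

// Better: hold the lock only long enough to dequeue, then do the slow
// work outside it so other threads can keep adding and taking jobs.
Job myJob = null;
lock (_jobs_lock)
{
    if (_jobs.Count > 0)
        myJob = _jobs.Dequeue();
}
if (myJob != null)
    ProcessJob(myJob);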
Of course there are plenty of other threading issues, including methods that are inherently thread unsafe. Make your threads as self-contained as you can, lock access to shared resources, and you should be able to avoid most of the nasty heisenbugs.