I really dont understand how you can make some data structures lock-free. For example, if you have a linked list then either you surround the operations with mutexes, OR you can end up with a race condition if another thread executes whilst you are busy re-linking new nodes together.
The concept of "lock free" (I appreciate it doesnt mean "No locks" but means threads can progress without waiting for other threads to finish) just doesnt make sense.
Could somebody please show me a simple example using a stack, queue or linked list etc which is implemented as "lock-free" because I cannot understand how you can prevent the race condition without interfering with another threads productivity? Surely these two aims contradict each other?
Lock-less data structures use atomic operations and may impose additional requirements. For example, the data structure might only be safe for one reader and one writer thread, or any other combination. In the case of a simple linked list would use atomic reads and writes to the node pointer, to guarantee that multiple threads can safely read and write to it at the same time.
You may or may not get away with just that. If you need additional guarantees about the content of the data structure and validation, you are probably not able to make this without some form of high level locking. Also, not every data structure allows to be rewritten to be lock free, even when taking into account additional requirements on how the data structure is used. In those case, immutable objects might be a solution, but they have usually come with performance penalties due to copying, which is not always desirable over locking the object and then mutating it.
What I find easy and explainable is that first you can write pseudocode for lock based(mutex) style data structure and then try to see how the variables you held a lock on, can be modified in a lock free way with CAS operations.Though others have given great answers, I would like to add that you get a feel of it only if you implement with by yourself,of course by reading some pseudocode from some research paper it was published in.
Here's a queue I implemented in C++ with validation testing for multi threaded runs :
#include<iostream>
#include<atomic>
#include<thread>
#include<vector>
#define N 1000
using namespace std;
class lf_queue
{
private:
struct node
{ int data;
atomic<node*> next;
node(int d):data(d)
{}
};
atomic<node*> Head;
atomic<node*> Tail;
public:
lf_queue()
{
node *nnode= new node(-1);
nnode->next=NULL;
Head=nnode;
Tail=nnode;
}
void enqueue(int data)
{
node *nnode= new node(data);
nnode->next=NULL;
node *tail,*next_p;
while(true)
{
tail=Tail.load();
next_p=tail->next;
if(tail==Tail.load())
{
if(next_p==NULL)
{
if((tail->next).compare_exchange_weak(next_p,nnode))
break;
}
else
{
Tail.compare_exchange_weak(tail,next_p);
}
}
}
Tail.compare_exchange_weak(tail,nnode);
}
bool dequeue(int &res)
{
while(true)
{
node *head,*tail,*next_p;
head=Head.load();
tail=Tail.load();
next_p=head->next;
if(head==Head.load())
{
if(head==tail)
{
if(next_p==NULL)
return false;
Tail.compare_exchange_weak(tail,next_p);
}
else
{
res=next_p->data;
if(Head.compare_exchange_weak(head,next_p))
break;
}
}
}//end loop
return true;
}
};
void producer(lf_queue &q)
{ //cout<<this_thread::get_id()<<"Inside producer\n";
for(int i=0;i<N;i++)
{
q.enqueue(1);
}
//cout<<this_thread::get_id()<<" "<<"Finished producing\n";
}
void consumer(lf_queue &q,atomic<int>& sum)
{ //cout<<this_thread::get_id()<<" "<<"Inside consumer\n";
for(int i=0;i<N;i++)
{
int res=0;
while(!q.dequeue(res));
sum+=res;
}
//cout<<this_thread::get_id()<<" "<<"Finished consuming\n";
}
int main()
{
lf_queue Q;
atomic<int> sum;
sum.store(0);
vector<thread> thread_pool;
for(int i=0;i<10;i++)
{ if(i%2==0)
{ thread t(consumer,ref(Q),ref(sum));
thread_pool.push_back(move(t));
}
else
{
thread t(producer,ref(Q));
thread_pool.push_back(move(t));
}
}
for(int i=0;i<thread_pool.size();i++)
thread_pool[i].join();
cout<<"Final sum "<<sum.load()<<"\n";
return 0;
}
I tried implementing lock free linked list using Harris's paper, but ran into complications, you see with C++11 style, you can only perform CAS on atomic<> types, and also these atomic<node*> can't be casted into long for the purpose of bit stealing which Harris's implementation uses to logically mark deleted nodes.However there are code implementations in C available on internet that use low level cas_ptr operations which gives more flexibility for casting to/from between addresses and long.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With