I am trying to understand the quicksort algorithm from this web site. Paul's implementation is as fast as std::sort (quicksort over large ranges, insertion sort over smaller ranges).
I compared Paul's implementation with mine; mine is 3 times slower than his. By profiling our code, I found that the main difference is in Partition.
The following is an excerpt from Paul's code:
void Partition(int data[], int low, int high) {
    int pivot = data[low];
    int fromLow = low;
    int fromHigh = high;
    int ptr = low;            // ptr always points at the current "hole"

    goto QStart;
    while (true) {
        // Scan from the high end for an element not greater than the pivot.
        do {
            fromHigh--;
            if (fromLow >= fromHigh)
                goto QEnd;
QStart:;
        } while (data[fromHigh] > pivot);
        data[ptr] = data[fromHigh];
        ptr = fromHigh;

        // Scan from the low end for an element not less than the pivot.
        do {
            fromLow++;
            if (fromLow >= fromHigh) {
                ptr = fromHigh;
                goto QEnd;
            }
        } while (data[fromLow] < pivot);
        data[ptr] = data[fromLow];
        ptr = fromLow;
    }
QEnd:;
    data[ptr] = pivot;
}
And the following is mine:
void MyPartition(int data[], int low, int high) {
    int pivot = data[low];
    int fromLow = low;
    int fromHigh = high;
    int ptr = low;

    while (fromLow != fromHigh) {
        if (data[fromHigh] >= pivot)
            fromHigh--;
        else {
            data[ptr] = data[fromHigh];
            ptr = fromHigh;
            while (data[fromLow] <= pivot && fromLow != fromHigh)
                fromLow++;
            data[ptr] = data[fromLow];
            ptr = fromLow;
        }
    }
    data[ptr] = pivot;
}
These two functions implement the same algorithm, and I believe they have the same big-O complexity.
Does anyone know why Paul's implementation is faster than mine?
UPDATE:
int PartitionData2(int *data, int low, int high) {
    int pivot = data[low];
    int fromLow = low;
    int fromHigh = high;
    int ptr = low;

    while (fromLow < fromHigh) {
        if (data[fromHigh] > pivot)           // '>=' ==> '>'
            fromHigh--;
        else {
            data[ptr] = data[fromHigh];
            ptr = fromHigh;
            while (data[++fromLow] < pivot && // '<=' ==> '<'
                   fromLow != fromHigh)
                ;
            data[ptr] = data[fromLow];
            ptr = fromLow;
            fromHigh--;                       // antti.huima's idea
        }
    }
    data[ptr] = pivot;
    return ptr;
}
I just updated my code following antti.huima's idea, and that makes my code as fast as Paul's.
It confuses me, though, because the change looks like it swaps values that are equal to the pivot.
UPDATE2: Now I understand why we need to move elements that are equal to the pivot: if we don't, the two new partitions become uneven; one can end up much larger than the other, which eventually leads to the O(n^2) case.
See this PDF
You have some redundant checks in your code that Paul's code doesn't have.
For example, on the line

    while(data[fromLow] <= pivot && fromLow != fromHigh)

the first check is redundant on the first iteration: whenever you start this loop, you have just swapped in a value less than the pivot from fromHigh, so data[fromLow] cannot be greater than the pivot at that point. Because for a randomly ordered array this loop only runs for a couple of iterations (it terminates with 50% probability for a random pivot), in practice you are doing about 25% extra comparisons compared to Paul's code, which does not compare against the pivot before the limit check but increments fromLow first.
You have the same performance bug in the first loop, where you decrement fromHigh, even though it's syntactically different. Paul's code doesn't have it... and that's why he needs the goto to QStart :)