I am reading Robert Sedgewick's book Algorithms 4th edition, and he has the following exercise question: What is the expected number of subarrays of size 0, 1 and 2 when quicksort is used to sort an array of N items with distinct keys?
Then he says that if you're mathematically inclined, do the math, if not, run experiments, I have run the experiments and it seems like the arrays of size 0 and 1 have precisely the same number of occurrences and the arrays of size 2 are only half as occurrent.
The version of quicksort in question is the one with 2 way partitioning.
I understand that we get subarrays of size 0 when the partitioning item is the smallest/biggest one in the subarray, so the consequent 2 calls for the sort will be
sort(a, lo, j-1); // here if j-1 < lo, we have an array of size 0
sort(a, j+1, hi); // here if j+1 > hi, we have an array of size 0
The arrays of size 1 happen when the partitioning item is 2nd to first smallest/biggest item, and of size 2 when it's 3rd to first smallest/biggest item.
So, how exactly do I derive a mathematical formula?
Here is the code in C#
class QuickSort
{
    private static int zero = 0, one = 0, two = 0;
    private static int Partition<T>(T[] a, int lo, int hi) where T : IComparable<T>
    {
        T v = a[lo];
        int i = lo, j = hi + 1;
        while(true)
        {
            while(Alg.Less(a[++i], v)) if(i == hi) break;
            while(Alg.Less(v, a[--j])) if(j == lo) break;
            if(i >= j) break;
            Alg.Swap(ref a[i], ref a[j]);
        }
        Alg.Swap(ref a[lo], ref a[j]);
        return j;
    }
    private static void Sort<T>(T[] a, int lo, int hi) where T : IComparable<T>
    {
        if(hi < lo) zero++;
        if(hi == lo) one++;
        if(hi - lo == 1) two++;
        if(hi <= lo) return;
        int j = Partition(a, lo, hi);
        Sort(a, lo, j - 1);
        Sort(a, j + 1, hi);
    }
    public static void Sort<T>(T[] a) where T : IComparable<T>
    {
        Alg.Shuffle(a);
        int N = a.Length;
        Sort(a, 0, N - 1);
        Console.WriteLine("zero = {0}, one = {1}, two = {2}", zero, one, two);
    }
}
There's a proof that says that on average quicksort uses 2NlnN ~ 1.39NlgN compares to sort an array of length N with distinct keys.
I guess we can think of 1.39NlgN as we do N comparisons ~lgN times, so on average we divide our array in half, hence at some point we will be left with pairs to compare, and since there are only pairs to compare, for example : <1,2>,<3,4>,<5,6>,etc..., we will get subarrays of size 0 and 1 after partitioning them, that only proves that sizes of 0 and 1, are more frequent, but I still don't understand why sizes of 2 are almost exactly half as frequent.
The total number of subarrays in an array of size N is N * (N + 1) / 2. The count of subarrays with an odd product is equal to the total number of continuous odd elements present in the array. Therefore, count of subarrays with even product = (Total number of subarrays – Subarrays with the odd product).
The largest item is moved every time by 2 positions, therefore the number of exchanges is floor(N/2).
Quicksort uses ~2 N ln N compares (and one-sixth that many exchanges) on the average to sort an array of length N with distinct keys.
Leaving aside the mathematics, one thing is totally clear: when quicksort is called with a two-element partition, it will then issue two recursive calls, one of which has a zero-element partition and the other of which has a one-element partition.
So there will certainly be one one-element partition and one zero-element partition counted for every two-element partition counted.
In addition, one- and zero-element partitions can occur spontaneously, without a two-element parent, when the selected partition element is either the largest/smallest or second-largest/smallest element in the current partition. Roughly speaking, these should be about as likely as each other, and also as likely as that a two-element partition shows up.
So the observed results are not unexpected.
QuickSort will recursively partition the array into two smaller array at position k. k can be from 1 to n. Each k has the same probability of occurrence. Let C0(n) be the average number of appearances of 0-sized subsets, and C1(n), C2(n) be the same for 1-sized and 2-sized subsets.
Apart from initial conditions, each satisfies:
C(n) = 1/n sum(C(k-1) + C(n-k) for k=1..n)
The two parts of the sum are the same but summed in the opposite order, so:
C(n) = 2/n sum(C(k-1) for k=1..n)
or
n*C(n) = 2*sum(C(k-1) for k=1..n)
Assuming neither n nor n-1 are part of the initial conditions, we can simplify by subtracting (n-1)C(n-1) from both sides:
n*C(n) - (n-1)C(n-1) = 2*C(n-1)
or
C(n) = (n+1)/n * C(n-1)
Deriving results from the recurrence relation
We now have a recurrence relation C(n) which applies equally to C0, C1 and C2.
For C0, we have initial conditions C0(0)=1, C0(1)=0. We compute C0(2) to get 1, and then we can apply the simplified recurrence relation C0(n) = (n+1)/n * C0(n-1) for n>2 to get the general result C0(n)=(n+1)/3.
For C1, we have initial conditions C1(0)=0, C1(1)=1. As before, we compute C1(2) to get 1, and apply the same procedure as for C0 to get the general result C1(n)=(n+1)/3.
For C2, we have initial conditions C2(0)=C2(1)=0, and C2(2)=1. This time we compute C2(3) = 1/3 * 2 * (C2(0) + C2(1) + C2(2)) = 2/3. Then applying the simplified recurrence relation to infer the general result C2(n)=(n+1)/4 * C2(3) = (n+1)/4 * 2/3 = (n+1)/6.
Conclusion
We've shown the average number of appearances of 0-sized and 1-sized subarrays when quicksorting an array of size n is in both cases (n+1)/3. For 2-sized subarrays we've shown it's (n+1)/6.
This confirms your original observation that 2-sized subsets appear exactly half as often as 0 and 1-sized subsets, and gives an exact formula for the means.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With