
Quick and Merge sort for multiple CPUs

Both merge sort and quick sort can work in parallel. Each time we split a problem into two sub-problems, we can run those sub-problems in parallel. However, this looks sub-optimal.

Suppose we have 4 CPUs. On the 1st iteration we split the problem into only 2 sub-problems, so two CPUs are idle. On the 2nd iteration all CPUs are busy, but by the 3rd iteration we no longer have enough CPUs. So we should adapt the algorithms to the case when CPUs << log(N).

Does it make sense? How would you adapt the sorting algorithms to these cases?
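
To make this concrete, here is a rough sketch (in Python with plain threads; all names are mine, not from any library) of the naive scheme I mean, where every split spawns a new unit of work:

```python
# Naive parallel merge sort: one new thread per split. With 4 CPUs only the
# first two levels of recursion can actually run in parallel; deeper levels
# just create more threads than CPUs. (CPython's GIL also blocks true CPU
# parallelism with threads; this only illustrates the task structure.)
import threading

def merge(left, right):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def naive_parallel_mergesort(data):
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    result = {}
    # Hand one half to a new thread, sort the other half ourselves.
    t = threading.Thread(
        target=lambda: result.update(left=naive_parallel_mergesort(data[:mid])))
    t.start()
    right = naive_parallel_mergesort(data[mid:])
    t.join()
    return merge(result["left"], right)
```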


Michael


1 Answer

First off, the best parallel implementation will depend highly on the environment. Some factors to consider:

  • Shared Memory (a 4-core computer) vs. Not Shared (4 single-core computers)
  • Size of data to sort
  • Speed of comparing two elements
  • Speed of swapping/moving two elements
  • Memory available
  • Is each computer/core identical, or are there differences in speed, network latency between parts, cache effects, etc.?
  • Fault tolerance: what happens if one computer/core breaks down in the middle of the operation?

etc.


Now moving back to the theoretical:

Suppose I have 1024 cards, and 7 other people to help me sort them.

Merge Sort

I quickly split the stack into 8 sections of somewhat equal size. It won't be perfectly equal, since I am going fast. Actually, since my friends can start sorting their part as soon as they get their section, I should give my first friend a stack bigger than the rest, with the stacks getting smaller towards the end.

Each person sorts their part however they like sequentially. (radix sort, quick sort, merge sort, etc.)
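
In code, those two steps might look like the following sketch (my own, assuming Python's concurrent.futures; the built-in sorted stands in for whatever sequential sort each friend prefers):

```python
from concurrent.futures import ProcessPoolExecutor

def sort_in_chunks(data, workers=8):
    """Cut the deck into one section per helper; each sorts it sequentially."""
    n = len(data)
    bounds = [n * i // workers for i in range(workers + 1)]
    chunks = [data[bounds[i]:bounds[i + 1]] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # One sorted run comes back per helper, ready to be merged.
        return list(pool.map(sorted, chunks))

if __name__ == "__main__":
    runs = sort_in_chunks(list(range(1024, 0, -1)))
    print(len(runs), [run[0] for run in runs])
```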

Now for the hard part ... merging.

In real life I would probably have the first two people who are ready form a pair and start merging their decks together. Perhaps they could work together, one person merging from the front and the other from the back. Perhaps they could both work from the front while calling their numbers out.

Soon enough other people will be done with their individual sorting, and can start merging. I would have them form pairs as they find convenient and keep going until all the cards are merged.
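
In code, that pairing-up phase is a reduction over the sorted runs. A sketch under the same assumptions as above (the runs come from the previous snippet):

```python
from concurrent.futures import ProcessPoolExecutor

def merge(left, right):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def merge_runs(runs, workers=4):
    """Pair up runs and merge each pair in parallel until one run remains."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        while len(runs) > 1:
            lefts = runs[0::2][:len(runs) // 2]
            rights = runs[1::2]
            merged = list(pool.map(merge, lefts, rights))
            if len(runs) % 2:        # the odd run out waits for the next round
                merged.append(runs[-1])
            runs = merged
    return runs[0]
```

In real code you would probably merge into pre-allocated buffers rather than building new lists, but the pairing structure is the same.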

Quick Sort

The real trick here is to try to parallelize the partitioning, since the rest is pretty easy to do.

I will start by breaking the stack into 8 parts, and hand one part out to each friend. While doing this, I will choose one of the cards that looks like it might end up towards the middle of the sorted deck. I call out that number.

Each of my friends will partition their smaller stack into three piles: less than, equal to, and greater than the called-out number. If one friend is faster than the others, he/she can steal some cards from a neighboring friend.
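
A sketch of that partition step (same Python assumptions as before; all names are mine):

```python
from concurrent.futures import ProcessPoolExecutor

def three_way_partition(chunk, pivot):
    """One friend's job: split a chunk into <, ==, and > piles around the pivot."""
    less = [x for x in chunk if x < pivot]
    equal = [x for x in chunk if x == pivot]
    greater = [x for x in chunk if x > pivot]
    return less, equal, greater

def parallel_partition(chunks, pivot, workers=8):
    """Every helper partitions their own chunk; the piles are combined after."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(three_way_partition, chunks,
                              [pivot] * len(chunks)))
    less, equal, greater = [], [], []
    for l, e, g in parts:
        less += l; equal += e; greater += g
    return less, equal, greater
```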

When they are finished with that, I collect all the less-thans into one pile and give that to friends 0 through 3, set aside the equal-tos, and give the greater-thans to friends 4 through 7.

Friends 0 through 3 will divide their stack into four somewhat equal parts, choose a card to partition around, and repeat the process amongst themselves.

This repeats until each friend has their own stack.

(Note that if the partitioning card wasn't chosen well, rather than dividing up the work 50-50, maybe I would assign only 2 friends to work on the less-thans and let the other 6 work on the greater-thans.)

At the end, I just collect all of the stacks in the right order, along with the partition cards.
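
Gluing the description together, still as a sketch: the driver below reuses parallel_partition from the previous snippet, splits the helpers between the two sides in proportion to the pile sizes (as in the note above), and falls back to a sequential sort once a friend is on their own.

```python
def parallel_quicksort(data, workers=8):
    if workers <= 1 or len(data) <= 1:
        return sorted(data)            # one friend left: sort sequentially
    pivot = data[len(data) // 2]       # a card that looks middle-ish
    bounds = [len(data) * i // workers for i in range(workers + 1)]
    chunks = [data[bounds[i]:bounds[i + 1]] for i in range(workers)]
    less, equal, greater = parallel_partition(chunks, pivot, workers)
    # Hand out helpers proportionally rather than 50-50, in case the
    # pivot was chosen badly.
    total = max(1, len(less) + len(greater))
    w_less = min(workers - 1, max(1, round(workers * len(less) / total)))
    return (parallel_quicksort(less, w_less) + equal +
            parallel_quicksort(greater, workers - w_less))
```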

Conclusion

While it is true that some approaches are faster on a computer than in real life, I think the preceding is a good start. Different computers, cores, or threads will perform their work at different speeds, unless you are implementing the sort in hardware. (If you are, you might want to look into "Sorting Networks" and/or "Optimal Sorting Networks".)

If you are sorting numbers, you will need a large dataset to benefit from parallelizing.

However, if you are sorting images by comparing, say, the summed Manhattan distance between corresponding red, green, and blue pixel values, you will find it less difficult to get a speed-up of just under k times with k CPUs.

Lastly, you will want to time the sequential version(s) and compare as you go along, since cache effects, memory usage, network costs, etc., might just make a difference.
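
For example, a bare-bones comparison against the sequential baseline (using the hypothetical parallel_quicksort sketched above) could be:

```python
import random
import time

if __name__ == "__main__":
    data = [random.random() for _ in range(1_000_000)]

    t0 = time.perf_counter()
    expected = sorted(data)
    t1 = time.perf_counter()
    got = parallel_quicksort(data, workers=8)   # from the sketch above
    t2 = time.perf_counter()

    assert got == expected                      # same result, different speed
    print(f"sequential: {t1 - t0:.2f}s   parallel: {t2 - t1:.2f}s")
```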


Xantix


