The only reason I see for using merge sort over quick sort is if the list was already (or mostly) sorted.
Merge sort requires more space as it creates an extra array for storing, and no matter what it will compare every item.
Quick sort on the other hand does not require extra space, and doesn't swap or compare more than necessary.
It would seem unintuitive to say that because of large or small data sets one is better than the other.
For example, quoting Geeksforgeeks article on that:
Merge sort can work well on any type of data sets irrespective of its size (either large or small). whereas The quick sort cannot work well with large datasets.
And next it says:
Merge sort is not in place because it requires additional memory space to store the auxiliary arrays. whereas The quick sort is in place as it doesn’t require any additional storage.
I understand that space complexity and time complexity are separate things. But it still is an extra step, and of course the fact that writing everything on a new array with large data sets it would take more time.
As for the pivoting problem, the bigger the data set, the lower the chance of picking the lowest or highest item (unless, again, it's an almost sorted list).
So why is it considered that merge sort is a better way of sorting large data sets instead of quick sort?
Why is Merge sort better for large arrays and Quick sort for small ones? It would seem unintuitive to say that because of large or small data sets one is better than the other.
Assuming the dataset fits in memory (not paged out), the issue is not the size of the dataset, but a worst case pattern for a particular implementation of quick sort that result in O(n2) time complexity. Quick sort can use median of medians to guarantee worst case time complexity is O(n log(n)), but that ends up making it significantly slower than merge sort. An alternative is to switch to heap sort if the level of recursion becomes too deep, known as intro sort, and is used in some libraries.
https://en.wikipedia.org/wiki/Median_of_medians
https://en.wikipedia.org/wiki/Introsort
Merge sort requires more space as it creates an extra array for storing, and no matter what it will compare every item.
There are variations of merge sort that don't require any extra storage for data, but they tend to be about 50+% slower than standard merge sort.
Quick sort on the other hand does not require extra space, and doesn't swap or compare more than necessary.
Every element of a sub-array will be compared to the pivot element. As the number of equal elements increases, Lomuto partition scheme gets worse, while Hoare partition scheme gets better. With a lot of equal elements, Hoare partition scheme needlessly swaps equal elements, but the check to avoid the swaps generally costs more time than just swapping.
sorting an array of pointers to objects
Merge sort does more moves but fewer compares than quick sort. If sorting an array of pointers to objects, only the pointers are being moved, but comparing objects requires deference of the pointers and what is needed to compare objects. In this case and other cases where compare takes more time than moves, merge sort is faster.
large datasets that don't fit in memory
For datasets too large to fit in memory, a memory base sort is used to sort "chunks" of the dataset that will fit into memory then written to external storage. Then the "chunks" on external storage are merged using a k-way merge to produce a sorted dataset.
https://en.wikipedia.org/wiki/External_sorting
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With