
Most suitable sorting algorithm

I have to sort a large array of doubles of size 100000.

The point is that I do not want to sort the whole array but only find the largest 20000 elements in descending order.

Currently I am using selection sort. Any way to improve the performance?

mihsathe asked Mar 02 '26 20:03


2 Answers

100,000 is not a very large array on most modern devices. Are you sure you can't just sort all of them using a standard library sorting function?
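For reference, the full-sort baseline takes only a couple of lines in Python. This is a minimal sketch: the random data and the helper name `top_k_by_full_sort` are assumptions for illustration, not part of the question.

```python
import random


def top_k_by_full_sort(values, k):
    """Return the k largest values in descending order by sorting everything."""
    # sorted() is O(n log n); slicing keeps the first k items of the
    # descending result.
    return sorted(values, reverse=True)[:k]


# Illustrative data only: 100,000 random doubles, keep the largest 20,000.
random.seed(0)
data = [random.random() for _ in range(100_000)]
largest = top_k_by_full_sort(data, 20_000)
```

If this is already fast enough for your use case, there may be no need for anything more elaborate.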

You can avoid a full sort by using a variation of heapsort. Normally in a heapsort you build a heap of the entire data set (100,000 elements in your case). Instead, only allow the heap to grow to 20,000 elements, and keep the smallest element at the top of the heap (a min-heap). Once the heap is full (20,000 elements), you compare each subsequent element of the data set to the top of the heap. If the next data set element is smaller than or equal to the top of the heap, just skip it. If it's larger than the top of the heap, pop the top of the heap and insert the element from the data set.

Once you've gone through the entire data set, you have a heap of the 20,000 largest elements of the data set. Popping them one-by-one yields them in ascending order, so fill the output array from the back to get them in descending order.

This algorithm runs in O(N log K) time, where N is the size of the data set (100,000 in your example) and K is the number of elements you want to keep (20,000 in your example).
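Here is a minimal Python sketch of a bounded heap for the largest-K case the question asks about. It keeps a min-heap of at most K elements, so the top of the heap is always the smallest value kept so far. The function name `largest_k_desc` is made up for illustration.

```python
import heapq


def largest_k_desc(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest value currently kept
    for x in values:
        if len(heap) < k:
            heapq.heappush(heap, x)     # still filling the heap
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the smallest, then push x
        # otherwise x cannot be among the k largest, so it is skipped
    # Popping a min-heap yields ascending order, so fill the result
    # from the back to end up with descending order.
    result = [0.0] * len(heap)
    for i in range(len(heap) - 1, -1, -1):
        result[i] = heapq.heappop(heap)
    return result
```

In practice the standard library's `heapq.nlargest(k, values)` should give the same result with less code; the hand-written version is only here to make the steps visible.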

rob mayoff answered Mar 04 '26 09:03


I'd suggest starting with bucket sort and then using one of the simpler algorithms to sort each bucket. If any bucket is still too big, you can either apply bucket sort to it again or use an O(n log n) method (such as mergesort or quicksort). Otherwise, selection sort (or better, insertion sort) will do just fine.
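A rough Python sketch of that idea, under some assumptions that are mine and not from the question: the values are finite doubles, `num_buckets` is at least 1, and equal-width buckets between the minimum and maximum are a reasonable fit for the data. The name `bucket_sort_desc` and the optional `limit` parameter are hypothetical, and each bucket is sorted with the built-in `list.sort` for brevity where a simple insertion sort would also work.

```python
def bucket_sort_desc(values, num_buckets, limit=None):
    """Sort values in descending order with bucket sort.

    If limit is given, stop once that many of the largest values are collected.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are equal, so there is nothing to distribute.
        out = list(values)
        return out if limit is None else out[:limit]
    span = hi - lo
    buckets = [[] for _ in range(num_buckets)]
    for x in values:
        # Map x to a bucket index; clamp so that x == hi lands in the last bucket.
        idx = min(int((x - lo) / span * num_buckets), num_buckets - 1)
        buckets[idx].append(x)
    out = []
    # Walk the buckets from the highest value range down to the lowest.
    for bucket in reversed(buckets):
        bucket.sort(reverse=True)  # assumption: built-in sort instead of insertion sort
        out.extend(bucket)
        if limit is not None and len(out) >= limit:
            return out[:limit]
    return out
```

For the case in the question, something like `bucket_sort_desc(data, 1000, limit=20000)` would only need to sort the buckets that hold the largest values, though how well that works depends on how evenly the data is spread.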

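Just for comparison: selection sort and insertion sort are O(n*n); quicksort is O(n log n) on average; mergesort is O(n log n); bucket sort with k buckets averages O(n + n*n/k + k) when the values are spread fairly evenly, which comes down to roughly O(n) once k is proportional to n. So pick enough buckets that each one stays small, and you'll get better performance than the alternatives.

Note: quicksort's worst-case scenario is O(n*n), but in practice it is much faster.

Update: the figure given for bucket sort is its average performance, not the worst case (which is O(n*n) when most values land in a single bucket), so the same note above applies.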

mgibsonbr answered Mar 04 '26 09:03


