Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Increasing performance in custom sorting of a string array

I am trying to find an efficient way to sort an array of strings based on a numeric value within each string element of the array. I am currently using the Array.Sort(array, customComparer) static method (quick sort), with my custom comparer class (sorting in descending order) being:

class StringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        string s1 = a;
        string s2 = b;

        Match matchA = Regex.Match(s1, @"\d+$");
        Match matchB = Regex.Match(s2, @"\d+$");

        long numberA = long.Parse(matchA.Value);
        long numberB = long.Parse(matchB.Value);

        if (numberB - numberA < 0)
        {
            return -1;
        }
        else 
        {
            return 1;
        }
    }
}

This works very well, but sometimes it takes too much time to sort, with an array of 100 000 strings taking more than a minute on a 2.4Ghz processor. I wonder if there is a more efficient way to accomplish the same. For example, implementing a different sorting algorithm or taking another approach like using a dictionary and sorting on the value (the value being the numeric part of the string). Any suggestions? Thanks in advance!

like image 626
cake Avatar asked Jan 27 '26 12:01

cake


2 Answers

You're parsing the value for each comparison. I would suggest you parse once, to get a string/long pair, sort that, and then extract the string part afterwards.

Note that your existing code has a bug: it will never return 0, for two strings comparing as equal.

Here's an alternative approach using LINQ (which isn't in-place sorting, but is simple.)

var sorted = unsorted.OrderBy(x => long.Parse(Regex.Match(x, @"\d+$").Value));
                     .ToList();

(OrderBy projects once to get the keys, then compares keys.)

like image 127
Jon Skeet Avatar answered Jan 29 '26 01:01

Jon Skeet


You are now performing the Regexes O(n log n) times.

Consider looping once over all strings, extracting the numerical value and adding it to a SortedDictionary<long, string>

This requires only O(n) executions of the Reg expression. The rest of the sorting should be comparable.

like image 31
Henk Holterman Avatar answered Jan 29 '26 02:01

Henk Holterman



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!