I need to sort a list using python 3. There might be strings, integers, floats, or tuples, etc.
I am currently trying to make correct use of the sort function using the key parameter like this:
data.sort(key=gen_key)
...
def gen_key(self, value):
if is_number(value):
return str(value)
if isinstance(value, str):
return value
return '___' + type(value).__name__
But the problem is that numbers will now be sorted naturally. While I want to order the numbers and floats still like numbers and floats instead of treating them as strings.
The behavior is caused by the return str(value) part. But i cannot return a different type than a string, as that will raise an exception, as of python 3 strings wont be sorted with numbers like they did in python 2. The exception is the following
unordarable types: int() < str()
Any suggestions?
The trick is to make your key function return a tuple with a guaranteed comparable type in the first index, and the disparate types in subsequent indices.
While not 100% identical to what Python 2 does, for your specific case of "numbers to the front, everything else compared by typename" you can do this with a reasonably efficient key function:
>>> from numbers import Number
>>> seq = ['Z', 3, 'Y', 1, 'X', 2.5, False, (1, 2), [2, 3], None]
>>> sorted(seq, key=lambda x: (x is not None, '' if isinstance(x, Number) else type(x).__name__, x))
[None, False, 1, 2.5, 3, [2, 3], 'X', 'Y', 'Z', (1, 2)]
The key function here makes the first element of the key a simple bool, forcing None to sort before everything else (Py2 did the same thing), then sorting all numeric types first by using the empty string for the second part of the key, where everything else uses their type name (also like Py2). Once you've gotten past those first two indices, what's left is the same type, and should compare just fine.
The main flaw here is that comparable non-numeric types like set and frozenset won't compare to one another, they'll be sorted by typename only (a custom key class using exceptions could handle this).
It also won't handle the recursive case; if the sequence contains [2, 3] and ['a', 'b'], it will have a TypeError comparing 2 with 'a', but nothing short of a ridiculously involved key class could handle that.
If that's not an issue, this is cheap to run and relatively simple.
Unlike solutions involving custom classes with __lt__ defined to perform the comparison, this approach has the advantage of generating built-in keys, which compare efficiently with minimal execution of Python-level code during the sort.
Timings:
# Multiply out the sequence so log n factor in n log n work counts for something
>>> seq = ['Z', 3, 'Y', 1, 'X', 2.5, False, (1, 2), [2, 3], None] * 100
# Verify equivalence
>>> sorted(seq, key=Py2Key) == sorted(seq, key=lambda x: (x is not None, '' if isinstance(x, Number) else type(x).__name__, x))
True
# Timings in seconds for the fastest time (of 3 trials) to run the sort 1000 times:
>>> import timeit
# Py2Key class
>>> min(timeit.repeat('sorted(seq, key=Py2Key)', 'from __main__ import seq, Py2Key', number=1000))
5.251885865057375
>>> min(timeit.repeat('sorted(seq, key=lambda x: (x is not None, "" if isinstance(x, Number) else type(x).__name__, x))', 'from __main__ import seq, Number', number=1000))
1.9556877178131344
Basically, avoiding the overhead of dynamic Python-level __lt__ is reducing runtime by just over 60%. It doesn't seem to be an algorithmic improvement (a seq 100 times longer has the same runtime ratio), just a reduction in fixed overhead, but it's a non-trivial reduction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With