I want to sort a pandas DataFrame by multiple columns, where for some of the columns ("col2" and "col3") I want to use a custom comparison function that takes two elements.
Example:
>>> df = pd.DataFrame({"col1": [1,2,3], "col2": [[2], [], [1]], "col3": [[1,0,1], [2,2,2], [3]]})
>>> df
   col1 col2       col3
0     1  [2]  [1, 0, 1]
1     2   []  [2, 2, 2]
2     3  [1]        [3]
def compare_fn(l1, l2):  # list 1 and list 2
    if len(l1) < len(l2):
        return -1  # l1 is of smaller value than l2
    if len(l1) > len(l2):
        return 1  # l2 is of smaller value than l1
    else:
        for i in range(len(l1)):
            if l1[i] < l2[i]:
                return -1
            elif l1[i] > l2[i]:
                return 1
    return 0  # l1 and l2 have same value
Now I would like to sort by all 3 columns, where for col2 and col3 two elements are compared with my custom function (for col1 it's a straightforward sort).
I tried:
df.sort_values(["col1", "col2", "col3"], key=[None, compare_fn, compare_fn])
which raises a "'list' object is not callable" error. I also tried:
from functools import cmp_to_key
df.sort_values(["col1", "col2", "col3"], key=[None, cmp_to_key(compare_fn), cmp_to_key(compare_fn)])
which raises the same "'list' object is not callable" error.
I even tried disregarding the first column altogether and passing a single function as the key:
df[["col2", "col3"]].sort_values(["col2", "col3"], key=cmp_to_key(compare_fn))
raises TypeError: object of type 'functools.KeyWrapper' has no len()
and
df[["col2", "col3"]].sort_values(["col2", "col3"], key=compare_fn)
raises TypeError: compare_fn() missing 1 required positional argument: 'l2'
So I know that at least one of my problems is not knowing how to use a two-element compare function for sorting a pandas DataFrame column.
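Your key function needs to take the whole Series (the entire column) as input and return one sortable value per element, rather than compare two elements at a time.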
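As a rough illustration of that point, here is a minimal sketch that records what pandas passes to the key. The names `length_key` and `seen` are made up for this example, and it assumes a pandas version whose `sort_values` accepts `key` (1.1 or later). It sorts by list length only, which is a simpler ordering than the one asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [[2], [], [1]]})

seen = []  # hypothetical helper: records the type of object handed to the key

def length_key(column):
    # `column` is the whole column, not a single element or a pair of elements.
    seen.append(type(column).__name__)
    # Return one sortable value per row; here, just the length of each list.
    return column.map(len)

result = df.sort_values("col2", key=length_key)
print(seen)
print(result)
```

The key is applied to the column as a whole, which is why a two-argument comparison function, or a `cmp_to_key` wrapper passed directly, does not fit this parameter.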
Rewrite your function like this:
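def compare_fn(l):  # l is the whole column (a Series of lists)
    # shorter lists sort first; equal-length lists compare element by element
    return [(len(x), tuple(x)) for x in l]

# sort by col1 first, then stable-sort by col2 and col3 so ties keep the col1 order
(df.sort_values('col1')
   .sort_values(['col2', 'col3'],
                key=compare_fn, kind='mergesort')
)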
Output:
   col1 col2       col3
1     2   []  [2, 2, 2]
2     3  [1]        [3]
0     1  [2]  [1, 0, 1]
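To see why the `(len(x), tuple(x))` key can stand in for the original two-argument function, the sketch below compares both orderings in plain Python, without pandas. The helper name `list_key` and the sample lists are invented for this check; `compare_fn` is the function from the question.

```python
from functools import cmp_to_key

def compare_fn(l1, l2):
    # Original comparison from the question: length first, then element by element.
    if len(l1) < len(l2):
        return -1
    if len(l1) > len(l2):
        return 1
    for a, b in zip(l1, l2):
        if a < b:
            return -1
        elif a > b:
            return 1
    return 0

def list_key(x):
    # Tuples compare lexicographically, so this sorts by length, then by contents.
    return (len(x), tuple(x))

# Illustrative sample data, not taken from any real dataset.
samples = [[2], [], [1], [1, 0, 1], [2, 2, 2], [3], [1, 0, 0]]

by_cmp = sorted(samples, key=cmp_to_key(compare_fn))
by_key = sorted(samples, key=list_key)
print(by_cmp == by_key)
```

Both calls give the same order on these samples, which suggests the per-element key is a drop-in replacement for the comparison function here.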
Update: we can also rewrite the function so that it works for the non-list column too, which lets all three columns be sorted in a single call:
def compare_fn(l):  # l is the whole column (a Series)
    return ([(len(x), tuple(x)) for x in l]
            if isinstance(l.iloc[0], list)  # case list
            else l                          # case integer
            )

df.sort_values(['col1', 'col2', 'col3'], key=compare_fn)