
Key function when using sort_index()

I cannot understand how the key function works when sorting the index of a Series. For example, I have this Series:

(0, 4)     k
(12, 16)   a
(24, 28)   b
(4, 8)     f
(8, 12)    g

and I want the index labels to be in the following order:

(0, 4)
(4, 8)
(8, 12)
(12, 16)
(24, 28)

These are not tuples, just strings. When I sort them as a list, I write a key function that returns the first number of each element, and the elements are sorted by that number. But the documentation for sort_index() says that the key function should receive a Series and return a Series. So how does it work here? Sorting a list and then replacing the index with it does not help, because the values become separated from their original index labels.

This is how I did it with a list. Here is the DataFrame. Earlier I created wage_bin to assign an interval to each wage.

  person  col2 col3  wage wage_bin
0      a     5    g     4    (0,4]
1      b     3    e    14  (12,16]
2      c     4    e    25  (24,28]
3      d     8    p     9   (8,12]
4      a     1    s     5    (4,8]
5      d     6    x    12   (8,12]

g, as I understand it, is a Series:

g = df.groupby('wage_bin').size()
wage_bin
(0,4]      1
(12,16]    1
(24,28]    1
(4,8]      1
(8,12]     2
dtype: int64

Here I made a list from the index of the Series g and sorted it, using partition to take the number between '(' and ',':

k = list(g.index)
k.sort(key=lambda x: int(x.partition('(')[2].partition(',')[0]))
print(k)
['(0,4]', '(4,8]', '(8,12]', '(12,16]', '(24,28]']

So I understand how key works in the list case: x is an element of the list. But I could not get anything sensible when I tried to use a key function with sort_index(). I do not understand what to do with x in the function when x is a Series.

Duck asked Feb 04 '26 21:02



1 Answer

You can:

  1. temporarily create a new column by applying a regex (str.extract()) to the index. If your labels end in a bracket instead of a parenthesis, as in (0,4], change the final \) in the pattern to \]; if they have no space after the comma, use \s* instead of \s+
  2. sort by this temporary column
  3. and drop the unnecessary column

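import pandas as pd
# Rebuild the example: string labels as the index, letters as values
df = pd.DataFrame({'A': {0: '(0, 4)', 1: '(12, 16)', 2: '(24, 28)', 3: '(4, 8)', 4: '(8, 12)'},
 'B': {0: 'k', 1: 'a', 2: 'b', 3: 'f', 4: 'g'}}).set_index('A')
# Extract the number after the comma as a temporary integer sort key.
# expand=False returns an Index, so the assignment is positional.
df['C'] = df.index.str.extract(r',\s+(\d+)\)', expand=False).astype(int)
# Sort by the temporary column, then drop it
df = df.sort_values('C').drop('C', axis=1)
df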
Out[1]: 
          B
A          
(0, 4)    k
(4, 8)    f
(8, 12)   g
(12, 16)  a
(24, 28)  b
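
The temporary-column approach above sidesteps the key parameter. As a complementary sketch (not part of the original answer), the key argument of sort_index() can also be used directly, assuming pandas 1.1 or later. The point that differs from list.sort() is that the key function is called once with the whole Index, not once per label, and should return an Index of the same length. The Series below is rebuilt by hand from the question's groupby output, and the name lower_bound is purely illustrative.

```python
import pandas as pd

# Rebuild the question's Series of counts with string interval labels.
g = pd.Series(
    [1, 1, 1, 1, 2],
    index=pd.Index(
        ["(0,4]", "(12,16]", "(24,28]", "(4,8]", "(8,12]"], name="wage_bin"
    ),
)


def lower_bound(index):
    # `index` is the whole Index, not a single label. Return an Index of
    # the same length holding the integer between "(" and "," in each label.
    return index.str.extract(r"\((\d+)\s*,", expand=False).astype(int)


g_sorted = g.sort_index(key=lower_bound)

# The per-element key from the list version can be reused through Index.map.
g_mapped = g.sort_index(
    key=lambda idx: idx.map(lambda x: int(x.partition("(")[2].partition(",")[0]))
)

print(list(g_sorted.index))
# ['(0,4]', '(4,8]', '(8,12]', '(12,16]', '(24,28]']
```

Because the sort is applied to the Series itself, each count stays attached to its label, which is what was lost when the index was sorted separately as a list.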
David Erickson answered Feb 07 '26 11:02
