
Pandas sort column with numerical string

I have a DataFrame below:

col1

Numb10
Numb11
Numb12
Numb7
Numb8

How can I sort it in numeric order, like this:

col1

Numb7
Numb8
Numb10
Numb11
Numb12

I tried the following, but got the error TypeError: cannot convert the series to <class 'int'>.

df.sort_values(by = "col1", key = (lambda x: int(x[4:])))

Update: col1 may contain one missing value.

Peter Chen asked Jan 24 '26 07:01


2 Answers

The key argument of sort_values takes the whole Series as its parameter, not an individual element. From the docs:

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.
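To make the quoted behaviour concrete, here is a minimal sketch, assuming the five-row frame from the question. The helper name inspect_key is made up for illustration; it only records what sort_values hands to the key function, and the try block reproduces the kind of TypeError the question ran into.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": ["Numb10", "Numb11", "Numb12", "Numb7", "Numb8"]})

seen_types = []

def inspect_key(s):
    # Hypothetical helper: record the type of the argument, return it unchanged.
    seen_types.append(type(s).__name__)
    return s

df.sort_values(by="col1", key=inspect_key)
print(seen_types)  # every entry is "Series", never "str"

# Calling int() on a multi-element Series is what raises the TypeError.
error_message = None
try:
    int(df["col1"].str[4:])
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Because the key sees the whole column at once, the conversion has to be written with vectorized Series operations rather than per-element Python calls.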

In your case, you can use .str for slicing and astype for type conversion:

df.sort_values(by='col1', key=lambda s: s.str[4:].astype(int))
     col1
3   Numb7
4   Numb8
0  Numb10
1  Numb11
2  Numb12
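If the text prefix is not always exactly four characters long, slicing with .str[4:] would no longer isolate the number. The following is only a sketch under that assumption, which goes beyond the question: the mixed prefixes in the sample data are invented, and the regular expression assumes every value ends in a run of digits.

```python
import pandas as pd

# Hypothetical data: prefixes of different lengths, each ending in digits.
df = pd.DataFrame({"col1": ["Numb10", "Item11", "No12", "Numb7", "N8"]})

def trailing_number(s):
    # Pull the trailing digits out of every string, then convert to int.
    # expand=False makes str.extract return a Series for a single group.
    return s.str.extract(r"(\d+)$", expand=False).astype(int)

result = df.sort_values(by="col1", key=trailing_number)
print(result)
```

The key still takes a Series and returns a Series of the same length, so it fits the contract described in the docs; only the way the number is extracted changes.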
Psidom answered Jan 25 '26 21:01


The values in x[4:] might not all be integers. You can verify this with:

# convert to numeric values; anything that cannot be parsed becomes NaN
# (which makes the result float rather than int)
extracted_nums = pd.to_numeric(df['col1'].str[4:], errors='coerce')

# check for invalid values
# prints True if at least one value could not be parsed as a number
print(extracted_nums.isna().any())

# sort by values
df.loc[extracted_nums.sort_values().index]
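The same coercion can also be used directly as the sort key. Below is a minimal sketch, assuming a frame like the one in the question but with one missing entry. The sample data and the helper name numeric_suffix are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the question's column with one value missing.
df = pd.DataFrame({"col1": ["Numb10", "Numb11", None, "Numb7", "Numb8"]})

def numeric_suffix(s):
    # Missing or non-numeric suffixes become NaN instead of raising.
    return pd.to_numeric(s.str[4:], errors="coerce")

# na_position controls where rows with a NaN key end up.
result = df.sort_values(by="col1", key=numeric_suffix, na_position="last")
print(result)
```

With errors="coerce" the key never raises on bad input, so it is still worth checking isna() separately if unparseable values should be treated as an error rather than sorted to the end.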
Quang Hoang answered Jan 25 '26 20:01


