Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort a subset of columns of a pandas dataframe alphabetically by column name

I'm having trouble finding the solution to a fairly simple problem.

I would like to alphabetically arrange certain columns of a pandas dataframe that has over 100 columns (i.e. so many that I don't want to list them manually).

Example df:

import pandas as pd

subject = [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,4,4,4,4,4,4]
timepoint = [1,2,3,4,5,6,1,2,3,4,5,6,1,2,4,1,2,3,4,5,6]
c = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
d = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
a = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
b = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]

df = pd.DataFrame({'subject':subject,
                   'timepoint':timepoint,
                   'c':c,
                   'd':d,
                   'a':a,
                   'b':b})

df.head()

   subject  timepoint  c  d  a  b
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6

How could I rearrange the column names to generate a df.head() that looks like this:

   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6

i.e. keep the first two columns where they are and then alphabetically arrange the remaining column names.

Thanks in advance.

like image 947
Dr Wampa Avatar asked Sep 17 '25 22:09

Dr Wampa


2 Answers

You can split your your dataframe based on column names, using normal indexing operator [], sort alphabetically the other columns using sort_index(axis=1), and concat back together:

>>> pd.concat([df[['subject','timepoint']],
           df[df.columns.difference(['subject', 'timepoint'])]\
               .sort_index(axis=1)],ignore_index=False,axis=1)

    subject  timepoint  a  b  c  d
0         1          1  2  2  2  2
1         1          2  3  3  3  3
2         1          3  4  4  4  4
3         1          4  5  5  5  5
4         1          5  6  6  6  6
5         1          6  7  7  7  7
6         2          1  3  3  3  3
7         2          2  4  4  4  4
8         2          3  1  1  1  1
9         2          4  2  2  2  2
10        2          5  3  3  3  3
11        2          6  4  4  4  4
12        3          1  5  5  5  5
13        3          2  4  4  4  4
14        3          4  5  5  5  5
15        4          1  8  8  8  8
16        4          2  4  4  4  4
17        4          3  5  5  5  5
18        4          4  6  6  6  6
19        4          5  2  2  2  2
20        4          6  3  3  3  3
like image 186
sophocles Avatar answered Sep 19 '25 12:09

sophocles


Specify the first two columns you want to keep (or determine them from the data), then sort all of the other columns. Use .loc with the correct list to then "sort" the DataFrame.

import numpy as np

first_cols = ['subject', 'timepoint']
#first_cols = df.columns[0:2].tolist()  # OR determine first two

other_cols = np.sort(df.columns.difference(first_cols)).tolist()

df = df.loc[:, first_cols+other_cols]

print(df.head())
   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6
like image 41
ALollz Avatar answered Sep 19 '25 11:09

ALollz