Sort a subset of columns of a pandas dataframe alphabetically by column name

Question

I'm having trouble finding the solution to a fairly simple problem.

I would like to alphabetically arrange certain columns of a pandas dataframe that has over 100 columns (i.e. so many that I don't want to list them manually).

Example df:

import pandas as pd

subject = [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,4,4,4,4,4,4]
timepoint = [1,2,3,4,5,6,1,2,3,4,5,6,1,2,4,1,2,3,4,5,6]
c = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
d = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
a = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
b = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]

df = pd.DataFrame({'subject':subject,
                   'timepoint':timepoint,
                   'c':c,
                   'd':d,
                   'a':a,
                   'b':b})

df.head()

   subject  timepoint  c  d  a  b
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6

How could I rearrange the column names to generate a df.head() that looks like this:

   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6

i.e. keep the first two columns where they are and then alphabetically arrange the remaining column names.

Thanks in advance.

sophocles · Accepted Answer

You can split your your dataframe based on column names, using normal indexing operator [], sort alphabetically the other columns using sort_index(axis=1), and concat back together:

>>> pd.concat([df[['subject','timepoint']],
           df[df.columns.difference(['subject', 'timepoint'])]\
               .sort_index(axis=1)],ignore_index=False,axis=1)

    subject  timepoint  a  b  c  d
0         1          1  2  2  2  2
1         1          2  3  3  3  3
2         1          3  4  4  4  4
3         1          4  5  5  5  5
4         1          5  6  6  6  6
5         1          6  7  7  7  7
6         2          1  3  3  3  3
7         2          2  4  4  4  4
8         2          3  1  1  1  1
9         2          4  2  2  2  2
10        2          5  3  3  3  3
11        2          6  4  4  4  4
12        3          1  5  5  5  5
13        3          2  4  4  4  4
14        3          4  5  5  5  5
15        4          1  8  8  8  8
16        4          2  4  4  4  4
17        4          3  5  5  5  5
18        4          4  6  6  6  6
19        4          5  2  2  2  2
20        4          6  3  3  3  3

ALollz · Answer

Specify the first two columns you want to keep (or determine them from the data), then sort all of the other columns. Use .loc with the correct list to then "sort" the DataFrame.

import numpy as np

first_cols = ['subject', 'timepoint']
#first_cols = df.columns[0:2].tolist()  # OR determine first two

other_cols = np.sort(df.columns.difference(first_cols)).tolist()

df = df.loc[:, first_cols+other_cols]

print(df.head())
   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6

Sort a subset of columns of a pandas dataframe alphabetically by column name

Tags:

python

pandas

dataframe

Dr Wampa

2 Answers

sophocles

ALollz

Recent Activity

Donate For Us

Sort a subset of columns of a pandas dataframe alphabetically by column name

Tags:

python

pandas

dataframe

Dr Wampa

2 Answers

sophocles

ALollz

Related questions

Recent Activity

Donate For Us