
Sort columns based on the presence of a suffix in the column name

Tags:

python

pandas

I have a dataframe like:

[Image: sample of the DataFrame]

The above df is just a very small sample; I actually have around 8K+ columns. I want to sort my DataFrame so that all the columns ending with "_t1" come at the end.

I can definitely filter out a subset of the dataframe with code like:

t1 = data[data.columns[data.columns.str.endswith("_t1")]]

and then merge the pieces back together. Is there a simpler way to sort a pandas DataFrame's columns based on a regex pattern in the column names?
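The filter-then-recombine approach described in the question can be sketched as follows. This is a minimal sketch assuming a toy frame with hypothetical column names; it recombines with `pd.concat(..., axis=1)` rather than a merge, because column-wise concatenation aligns on the shared index and needs no join key.

```python
import pandas as pd

# Small stand-in for the real 8K-column frame (hypothetical column names and values)
data = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "a_t1", "b", "b_t1"])

# Boolean mask over the column labels: True where the name ends with "_t1"
is_t1 = data.columns.str.endswith("_t1")

# Split into the two groups, then glue them back side by side on the shared index
rest = data.loc[:, ~is_t1]
t1 = data.loc[:, is_t1]
result = pd.concat([rest, t1], axis=1)

print(list(result.columns))  # ['a', 'b', 'a_t1', 'b_t1']
```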

Sushmita asked Nov 30 '25 08:11


2 Answers

You can create a boolean mask:

m = data.columns.str.endswith("_t1")

Or build the mask with a regex:

m = data.columns.str.contains("_t1$")

Then join the non-matching and matching column labels, in that order:

cols = data.columns[~m].append(data.columns[m])

Or:

cols = data.columns[~m].tolist() + data.columns[m].tolist()

Finally, reorder the columns by selecting with the new list:

df = data[cols]
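Putting the steps above together, a self-contained sketch (the sample frame and its column names are hypothetical; only the labels matter):

```python
import pandas as pd

# Hypothetical sample frame; only the column names matter here
data = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["x", "x_t1", "y", "y_t1", "z"])

# Boolean mask: True for labels ending in "_t1"
m = data.columns.str.endswith("_t1")

# Non-matching labels first, matching labels last;
# boolean indexing keeps the original order within each group
cols = data.columns[~m].append(data.columns[m])

df = data[cols]
print(list(df.columns))  # ['x', 'y', 'z', 'x_t1', 'y_t1']
```

The regex mask `data.columns.str.contains("_t1$")` selects the same columns here, so either form of `m` can be used.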
jezrael answered Dec 01 '25 21:12


Another option is to use np.lexsort to compute a new column order and then reorder by positional index, separating the columns that end with "_t1" from those that don't.

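import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['abc', 'abc_t1', 'abcd', 'abcd_t1', 'xyz', 'xyz_t1'])
df

# Empty DataFrame
# Columns: [abc, abc_t1, abcd, abcd_t1, xyz, xyz_t1]
# Index: []

# np.lexsort is stable, so columns keep their relative order within each group
df.iloc[:, np.lexsort((df.columns.str.endswith('_t1'), ))]
# Alternatively, request a stable sort explicitly (argsort's default quicksort is not stable)
df.iloc[:, np.argsort(df.columns.str.endswith('_t1'), kind='stable')]

# Empty DataFrame
# Columns: [abc, abcd, xyz, abc_t1, abcd_t1, xyz_t1]
# Index: []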

If you need to handle more complicated suffixes, note that .str.endswith matches literal strings only; use .str.contains with an appropriate regex anchored with $ instead.
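Because `str.endswith` does not interpret regular expressions, a regex suffix needs `str.contains`, which treats the pattern as a regex by default. The sketch below assumes hypothetical numbered suffixes such as `_t1`, `_t2` and `_t10`, matched by the illustrative pattern `r"_t\d+$"`, and uses a stable argsort so columns keep their original relative order within each group.

```python
import numpy as np
import pandas as pd

# Hypothetical columns with several numbered suffixes
df = pd.DataFrame(columns=["abc", "abc_t1", "abc_t2", "xyz", "xyz_t10", "xyz_t2"])

# str.contains uses regex matching by default; "$" anchors the pattern to the end
is_suffixed = df.columns.str.contains(r"_t\d+$")

# False sorts before True; kind="stable" preserves order within each group
order = np.argsort(is_suffixed, kind="stable")
out = df.iloc[:, order]

print(list(out.columns))  # ['abc', 'xyz', 'abc_t1', 'abc_t2', 'xyz_t10', 'xyz_t2']
```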

cs95 answered Dec 01 '25 20:12