Sort columns based the presence of a suffix in the column name

Question

I have a dataframe like:

enter image description here

The above df is just a very small sample. Actually I have around 8K + columns , I want to sort my dataframe such that all the columns ending with "_t1" comes in the end. enter image description here

I can definitely filter out a subset of the dataframe with code like:

data = data [data.columns[data .columns.str.endswith("_t1")]]

and then merging. Can there be any other simpler way of sorting a panda dataframe based on regex pattern of column names?

jezrael · Accepted Answer

You can create masks:

m = data.columns.str.endswith("_t1")

Or mask by regex:

m = data.columns.str.contains("_t1$")

Then join together:

cols = data.columns[~m].append(data.columns[m])

Or:

cols = data.columns[~m].tolist() + data.columns[m].tolist()

And change order of columns by subset:

df = data[cols]

cs95 · Answer

Another option is to use np.lexsort to sort columns then reorder by positional index; to separate columns that end with "_t1" to those that don't.

df = pd.DataFrame(columns=['abc', 'abc_t1', 'abcd', 'abcd_t1', 'xyz', 'xyz_t1'])
df

# Empty DataFrame
# Columns: [abc, abc_t1, abcd, abcd_t1, xyz, xyz_t1]
# Index: []

df.iloc[:, np.lexsort((df.columns.str.endswith('_t1'), ))]
# Alternatively,
df.iloc[:, np.argsort(df.columns.str.endswith('_t1'))]

# Empty DataFrame
# Columns: [abc, abcd, xyz, abc_t1, abcd_t1, xyz_t1]
# Index: []

If you need to handle more complicated regex suffixes then you can always extend the input to .endswith with an appropriate regex.

Sort columns based the presence of a suffix in the column name

Tags:

python

pandas

Sushmita

2 Answers

jezrael

cs95

Recent Activity

Donate For Us

Sort columns based the presence of a suffix in the column name

Tags:

python

pandas

Sushmita

2 Answers

jezrael

cs95

Related questions

Recent Activity

Donate For Us