Change order of columns in pandas dataframes in a loop

Question

I have a number of pandas.DataFrame objects and want to reorder the columns of all of them in a for loop, but it's not working. What I have is:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
df2 = pd.DataFrame(np.random.rand(5, 5))

dfs = [ df1, df2 ]

Now, changing the name of the columns works:

for df in dfs:
    df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

df1.head()

prints (columns with letters instead of numbers):

          a         b         c         d         e
0  0.276383  0.655995  0.512101  0.793673  0.165763
1  0.841603  0.831268  0.776274  0.670846  0.847065
2  0.626632  0.448145  0.184613  0.763160  0.337947
3  0.502062  0.881765  0.154048  0.908834  0.669257
4  0.254717  0.538606  0.677790  0.088452  0.014447

However, changing the order of the columns is not working in the same way. The following loop:

for df in dfs:
    df = df[ [ 'e', 'd', 'c', 'b', 'a' ] ]

leaves the dataframes unchanged.

If I do it for each dataframe, outside the for loop, it works, though:

df1 = df1[ [ 'e', 'd', 'c', 'b', 'a' ] ]
df1.head()

prints the following:

          e         d         c         b         a
0  0.165763  0.793673  0.512101  0.655995  0.276383
1  0.847065  0.670846  0.776274  0.831268  0.841603
2  0.337947  0.763160  0.184613  0.448145  0.626632
3  0.669257  0.908834  0.154048  0.881765  0.502062
4  0.014447  0.088452  0.677790  0.538606  0.254717

Why can't I loop over the dataframes to change the column order?

How can I loop over the dataframes in the list to change the column order?

Working with Python 3.5.3, pandas 0.23.3

pmarcol · Accepted Answer

I spent a while on this; it was a nice puzzle.
It works this way because your first loop modifies the existing objects, whereas your second loop creates new objects and only rebinds the loop variable df to them. The list dfs and the names df1 and df2 still refer to the original, unchanged dataframes. If you want the changes to be visible in df1 and df2 after the second loop, you can only use methods that operate on the original dataframe in place and do not require reassignment.
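The difference between the two loops can be checked with identity tests. This is only a minimal sketch: the name reordered is introduced here for illustration and is not part of the question's code.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
dfs = [df1]

for df in dfs:
    # Attribute assignment mutates the object that both dfs[0] and df1 refer to.
    df.columns = ['a', 'b', 'c', 'd', 'e']

print(dfs[0] is df1)       # the list still holds the very same object
print(list(df1.columns))   # the rename is visible through df1

for df in dfs:
    # Indexing with a list of labels builds a brand-new DataFrame ...
    reordered = df[['e', 'd', 'c', 'b', 'a']]
    # ... and this assignment only rebinds the local name df to it.
    df = reordered

print(df is reordered)     # the loop name now points at the new object
print(dfs[0] is df1)       # the list entry was never replaced
print(list(df1.columns))   # so df1 keeps its original column order
```

The same thing happens with any Python object: assigning to the loop variable never changes what the list contains.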
I'm not convinced that my way is the optimal one, but this is one way to reorder the columns in place:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
df2 = pd.DataFrame(np.random.rand(5, 5))

dfs = [ df1, df2 ]

for df in dfs:
    df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

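for df in dfs:
    # Append a copy of each column at the end, in the desired order.
    for c in ['e', 'd', 'c', 'b', 'a']:
        df.insert(df.shape[1], c + '_new', df[c])
    # Delete the original columns in place
    # (df.drop without inplace=True would return a new object).
    for c in ['a', 'b', 'c', 'd', 'e']:
        del df[c]
    # Strip the temporary '_new' suffix.
    df.columns = ['e', 'd', 'c', 'b', 'a']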

Then calling df1 prints:

           e           d           c           b           a
0   0.550885    0.879557    0.202626    0.218867    0.266057
1   0.344012    0.767083    0.139642    0.685141    0.559385
2   0.271689    0.247322    0.749676    0.903162    0.680389
3   0.643675    0.317681    0.217223    0.776192    0.665542
4   0.480441    0.981850    0.558303    0.780569    0.484447
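Two shorter variants are sketched below; they are alternatives to the loop above, not something taken from the question. The first assumes that popping a column and assigning it back appends it at the end, so doing this in the target order reorders the dataframe in place. The second gives up on in-place modification, builds new dataframes in a list comprehension, and rebinds the names explicitly. The names order and snapshot are illustrative only.

```python
import numpy as np
import pandas as pd

order = ['e', 'd', 'c', 'b', 'a']

df1 = pd.DataFrame(np.random.rand(5, 5), columns=list('abcde'))
df2 = pd.DataFrame(np.random.rand(5, 5), columns=list('abcde'))
dfs = [df1, df2]
snapshot = df1.copy()  # kept only to check that the values survive

# Variant 1: in place. pop() removes the column from df and returns it;
# assigning it back under the same label appends it as the last column.
for df in dfs:
    for c in order:
        df[c] = df.pop(c)

inplace_columns = list(df1.columns)
inplace_same_object = dfs[0] is df1

# Variant 2: not in place. Build new dataframes, then rebind the list
# and the individual names to them (here: back to alphabetical order).
dfs = [df[sorted(order)] for df in dfs]
df1, df2 = dfs

print(inplace_columns)
print(list(df1.columns))
```

Variant 2 is usually the easier one to read, as long as nothing else still holds a reference to the old objects.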

Tags:

python

pandas

dataframe

Luis

1 Answers

pmarcol
