
Change order of columns in pandas dataframes in a loop

I have a number of pandas.DataFrame objects and want to reorder the columns of all of them in a for loop, but it's not working. What I have is:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
df2 = pd.DataFrame(np.random.rand(5, 5))

dfs = [ df1, df2 ]

Now, changing the names of the columns works:

for df in dfs:
    df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

df1.head()

prints (columns with letters instead of numbers):

          a         b         c         d         e
0  0.276383  0.655995  0.512101  0.793673  0.165763
1  0.841603  0.831268  0.776274  0.670846  0.847065
2  0.626632  0.448145  0.184613  0.763160  0.337947
3  0.502062  0.881765  0.154048  0.908834  0.669257
4  0.254717  0.538606  0.677790  0.088452  0.014447

However, changing the order of the columns is not working in the same way. The following loop:

for df in dfs:
    df = df[ [ 'e', 'd', 'c', 'b', 'a' ] ]

leaves the dataframes unchanged.

If I do it for each dataframe, outside the for loop, it works, though:

df1 = df1[ [ 'e', 'd', 'c', 'b', 'a' ] ]
df1.head()

prints the following:

          e         d         c         b         a
0  0.165763  0.793673  0.512101  0.655995  0.276383
1  0.847065  0.670846  0.776274  0.831268  0.841603
2  0.337947  0.763160  0.184613  0.448145  0.626632
3  0.669257  0.908834  0.154048  0.881765  0.502062
4  0.014447  0.088452  0.677790  0.538606  0.254717

Why can't I loop over the dataframes to change the column order?

How can I loop over the dataframes in the list to change the column order?


Working with Python 3.5.3, pandas 0.23.3

asked Nov 24 '25 13:11 by Luis


1 Answer

I've spent a while on this; it was actually a nice puzzle.
It works this way because in your first loop you modify the existing objects (assigning to df.columns changes the DataFrame the list already holds), whereas in the second loop df[ [ 'e', 'd', 'c', 'b', 'a' ] ] creates a new DataFrame and df = ... only rebinds the loop variable df to that new object. The list dfs, as well as df1 and df2, still refer to the original, unchanged dataframes, and the new objects are simply discarded. If you want the changes to be visible in df1 and df2 after the second loop, you can only use methods that operate on the original dataframe in place and do not require reassignment.
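To make the difference concrete, here is a minimal sketch that checks object identity with is after both loops. The setup mirrors the question; the name order is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
df2 = pd.DataFrame(np.random.rand(5, 5))
dfs = [df1, df2]

# Assigning to an attribute mutates the object the list already holds
for df in dfs:
    df.columns = ['a', 'b', 'c', 'd', 'e']

order = ['e', 'd', 'c', 'b', 'a']  # illustrative name for the target order

for df in dfs:
    # df[order] builds a NEW DataFrame; 'df = ...' only rebinds the loop variable
    df = df[order]

# The list still holds the original objects, with the original column order
print(dfs[0] is df1)        # True
print(list(df1.columns))    # ['a', 'b', 'c', 'd', 'e']
```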
I'm not sure my approach is the optimal one, but this is what I mean:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
df2 = pd.DataFrame(np.random.rand(5, 5))

dfs = [ df1, df2 ]

for df in dfs:
    df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

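for df in dfs:
    # append a copy of each column, in the new order, to the same object
    for c in ['e', 'd', 'c', 'b', 'a']:
        df.insert(df.shape[1], c + '_new', df[c])
    # df.drop(['e', 'd', 'c', 'b', 'a'], axis=1) would return a new object,
    # so delete the original columns in place instead
    for c in [ 'a', 'b', 'c', 'd', 'e' ]:
        del df[c]
    # rename the remaining '_new' columns back to plain letters
    df.columns = ['e', 'd', 'c', 'b', 'a']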

Then calling df1 prints:

           e           d           c           b           a
0   0.550885    0.879557    0.202626    0.218867    0.266057
1   0.344012    0.767083    0.139642    0.685141    0.559385
2   0.271689    0.247322    0.749676    0.903162    0.680389
3   0.643675    0.317681    0.217223    0.776192    0.665542
4   0.480441    0.981850    0.558303    0.780569    0.484447
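A shorter in-place alternative, sketched here as a suggestion rather than the approach above, uses DataFrame.pop: it removes a column from the frame itself and returns it as a Series, and assigning that Series back under the same name appends it as the last column. Repeating this in the target order reorders df1 and df2 without creating new objects. The names order and original are illustrative; original exists only to check that the values moved together with their labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5), columns=['a', 'b', 'c', 'd', 'e'])
df2 = pd.DataFrame(np.random.rand(5, 5), columns=['a', 'b', 'c', 'd', 'e'])
dfs = [df1, df2]

order = ['e', 'd', 'c', 'b', 'a']  # illustrative name for the target order
original = df1.copy()              # kept only to verify the result

for df in dfs:
    for col in order:
        # pop() drops the column from this same DataFrame and returns it;
        # assigning it back appends it at the end, so after the inner loop
        # the columns appear in exactly the order given by 'order'
        df[col] = df.pop(col)

print(list(df1.columns))  # ['e', 'd', 'c', 'b', 'a']
```

If df1 and df2 do not need to remain the same objects, a simpler route is to build new frames and rebind the names, e.g. df1, df2 = [df[order] for df in dfs], rebuilding dfs the same way if you still need the list.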
answered Nov 26 '25 04:11 by pmarcol


