
When using pd.concat() the resulting dataframe column names appear in parentheses and with an added comma

I'm fairly new to Python programming, and I can't figure out why this is happening. I'm using the "Online Shoppers Purchasing Intention Dataset" from the UCI Machine Learning Repository. The data has both numerical and categorical features, so I split it into two separate dataframes (one for categorical data and one for numerical data) in order to dummify the categorical variables and standardize the numerical ones. The two dataframes I created are 'StdNumFeat' for the standardized numerical features and 'DumData' for the dummified categorical variables.

This is an excerpt of StdNumFeat.head()

   Administrative   Administrative_Duration Informational   Informational_Duration  ProductRelated
0   -0.696993              -0.457191          -0.396478            -0.244931           -0.691003
1   -0.696993              -0.457191          -0.396478            -0.244931           -0.668518
2   -0.696993              -0.457191          -0.396478            -0.244931           -0.691003
3   -0.696993              -0.457191          -0.396478            -0.244931           -0.668518
4   -0.696993              -0.457191          -0.396478            -0.244931           -0.488636

And this is an excerpt of DumData.head()

    Weekend Month_Aug   Month_Dec   Month_Feb   Month_Jul   Month_June  Month_Mar
0    False      0          0            1           0           0           0
1    False      0          0            1           0           0           0
2    False      0          0            1           0           0           0
3    False      0          0            1           0           0           0
4    False      0          0            1           0           0           0

When I concatenate the two dataframes using the following code:

data = pd.concat([StdNumFeat, DumData], axis=1)

The resulting data frame looks like this:

   (Administrative,)    (Administrative_Duration,)  (Informational,)    (Informational_Duration,)
0      -0.696993               -0.457191               -0.396478               -0.244931
1      -0.696993               -0.457191               -0.396478               -0.244931
2      -0.696993               -0.457191               -0.396478               -0.244931
3      -0.696993               -0.457191               -0.396478               -0.244931
4      -0.696993               -0.457191               -0.396478               -0.244931

Does anyone know why the resulting column names are wrapped in parentheses and followed by a comma? What does that mean? Note: I'm using Jupyter Notebooks in Anaconda. Thanks.

ThisNewGuy asked Jan 19 '26 05:01


1 Answer

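The problem is that the columns of StdNumFeat are a one-level MultiIndex, so each column label is a one-element tuple such as ('Administrative',). The most likely cause is that the column names were set with a nested list: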

StdNumFeat.columns = [['Administrative', 'Administrative_Duration', 'Informational',
                      'Informational_Duration', 'ProductRelated']]
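
To check whether this is what happened, you can inspect the column index directly. The following is a minimal sketch, not the asker's actual code: the frame `num` and its two columns are hypothetical stand-ins, with values taken from the excerpt above.

```python
import pandas as pd

# Hypothetical stand-in for StdNumFeat with two of its columns.
num = pd.DataFrame([[-0.696993, -0.396478],
                    [-0.696993, -0.396478]])

# Assigning a nested list (a list containing one list) creates a
# MultiIndex with a single level instead of a plain Index.
num.columns = [["Administrative", "Informational"]]

print(type(num.columns).__name__)  # class name of the column index
print(num.columns.nlevels)         # number of levels in the index
print(list(num.columns))           # each label is a one-element tuple
```

If the class name printed is MultiIndex and the labels are one-element tuples, the nested list is the culprit.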

The correct way is to assign a flat list:

StdNumFeat.columns = ['Administrative', 'Administrative_Duration', 'Informational',
                     'Informational_Duration', 'ProductRelated']
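
If the frame already exists and you would rather not retype the names, flattening the one-level MultiIndex with get_level_values(0) should also work. This is a sketch under assumptions: `std_num` and `dum` are hypothetical small stand-ins for StdNumFeat and DumData, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for StdNumFeat and DumData.
std_num = pd.DataFrame([[-0.696993, -0.457191],
                        [-0.696993, -0.457191]])
std_num.columns = [["Administrative", "Administrative_Duration"]]  # reproduces the issue
dum = pd.DataFrame({"Weekend": [False, False], "Month_Feb": [1, 1]})

# Flatten a one-level MultiIndex back into a plain Index of strings.
if isinstance(std_num.columns, pd.MultiIndex) and std_num.columns.nlevels == 1:
    std_num.columns = std_num.columns.get_level_values(0)

# With flat column names on both sides, concat keeps the plain labels.
data = pd.concat([std_num, dum], axis=1)
print(list(data.columns))
```

The guard on nlevels is there so that a genuine multi-level column index is not flattened by accident.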

jezrael answered Jan 20 '26 20:01