
Correlation matrix does not show all columns python

I am trying to solve the "House Prices" challenge from Kaggle and I'm stuck on my correlation matrix because it simply doesn't show all columns I want. Initially, it was obviously because of the large number of columns, so I did this:

import matplotlib.pyplot as plt
import seaborn as sns

# df_data is the full House Prices DataFrame (loaded elsewhere)
df = df_data[['SalePrice', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities']].copy()

corrmax = df.corr()

f, ax = plt.subplots(figsize=(16, 12))
sns.heatmap(corrmax, annot=True)

And then, the result is a heatmap with only SalePrice, MSSubClass, LotFrontage and LotArea for some reason. Can anyone please help me?

Violeta Gouvêa de Carvalho asked Nov 15 '25 13:11


1 Answer

If you analyze the House Prices dataset, you will find about 21-23 categorical variables, such as 'MSZoning' and 'Alley'. The corr() matrix only shows the relationships between numerical (non-categorical) variables, so the categorical columns you selected are left out of the result:

corrmax = df.corr()
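To check which of your columns will be left out, you can inspect the dtypes before calling corr(). The snippet below is only a minimal sketch: the small DataFrame is a made-up stand-in for a few House Prices columns, and the names numeric_cols and dropped_cols are just illustrative. Selecting the numeric columns explicitly also avoids relying on corr() to skip the text columns, which, as far as I know, newer pandas releases (2.0 and later) no longer do silently and instead raise an error.

```python
import pandas as pd

# Hypothetical stand-in for a few House Prices columns (values are made up).
df = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000],
    "LotArea": [8450, 9600, 11250, 9550],
    "MSZoning": ["RL", "RL", "RM", "RL"],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
})

# Columns with a numeric dtype; only these can take part in corr().
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else (object/string columns) is what goes missing from the heatmap.
dropped_cols = df.columns.difference(numeric_cols).tolist()

# Compute the correlation matrix on the numeric columns only.
corrmax = df[numeric_cols].corr()

print("numeric:", numeric_cols)
print("left out:", dropped_cols)
print(corrmax)
```

On the real data, the same select_dtypes call should list SalePrice, MSSubClass, LotFrontage and LotArea as numeric, which matches the heatmap you got.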

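If you want to measure the relationship between categorical and non-categorical variables, you need to encode the categorical columns as numbers first and then use a rank-based measure such as the Spearman correlation matrix. Keep in mind that Spearman is only meaningful when the categories have a natural order.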
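As a rough illustration of the Spearman suggestion, here is a minimal sketch that encodes one categorical column as ordered integer codes and then calls corr(method="spearman"). The data are invented, and the ordering in shape_order is purely an assumption for the example; you would have to decide for each column whether such an order is defensible.

```python
import pandas as pd

# Hypothetical sample (values are made up, not taken from the Kaggle data).
df = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
    "LotShape": ["Reg", "Reg", "IR1", "IR1", "IR2", "Reg"],
})

# Assumed ordering from regular to more irregular lots (an illustration only).
shape_order = ["Reg", "IR1", "IR2"]

# Replace the text labels with integer codes that respect the assumed order.
encoded = df.copy()
encoded["LotShape"] = pd.Categorical(
    df["LotShape"], categories=shape_order, ordered=True
).codes

# Rank-based correlation on the now fully numeric frame.
corr_spearman = encoded.corr(method="spearman")
print(corr_spearman)
```

For purely nominal columns such as 'MSZoning', where no order exists, integer codes are arbitrary and the resulting coefficient should not be over-interpreted.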

You will find some help from the links below...

An overview of correlation measures between categorical and continuous variables

Correlation between a nominal (IV) and a continuous (DV) variable

Sohaib Aslam answered Nov 18 '25 05:11


