I'm trying to produce two seaborn kernel density plots (kdeplot) side by side.
Three features (Community School?, Economic Need Index, School Income Estimate) are used here. The only categorical feature, 'Community School?', is encoded by color (green and blue, one per level). 'Economic Need Index' and 'School Income Estimate' are the variables for the first and second kdeplot, respectively.
The image created using the code shown below is the best result I could get, but it has problems.
1) The y-axis scale of the second plot looks wrong (I expected integer-like values, as in the first plot). Correction: a kdeplot is normalized (the area under the curve integrates to 1), so the y-axis is correct given its x values.
2) An extra, empty axis(?) is produced below the two plots.
3) I want to add a title for each subplot.
I found that kdeplot doesn't support hue, so I tried to make it work with FacetGrid. I'm not sure this is the right way to do it, and I would appreciate a better method.
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.subplots_adjust(wspace=.8)
fg = sns.FacetGrid(df, hue='Community School?', size=3)
fg.map(sns.kdeplot, 'Economic Need Index', shade=True, ax=ax1, label='Economic Need Index')
fg.map(sns.kdeplot, 'School Income Estimate', shade=True, ax=ax2, label='School Income Estimate')
plt.show()

My dataset looks like this:
Community School? / Economic Need Index / School Income Estimate
0 Yes 0.919 31141.72
1 No 0.641 56462.88
2 No 0.744 44342.61
3 No 0.860 31454.00
4 No 0.730 46435.59
5 No 0.858 39415.45
6 No 0.499 43706.73
7 No 0.833 28820.67
8 No 0.849 34889.24
9 No 0.861 35545.10
10 No 0.559 40809.90
11 Yes 0.917 27881.59
12 Yes 0.832 NaN
13 No 0.791 NaN
14 No 0.362 63760.00
15 No 0.771 NaN
16 No 0.451 62519.57
17 No 0.430 57504.48
18 No 0.448 56787.20
19 No 0.764 NaN
20 No 0.610 NaN
21 No 0.257 76833.96
22 No 0.597 NaN
23 No 0.769 32817.79
24 No 0.858 26114.78
25 No 0.176 103399.19
26 No 0.101 144270.13
27 No 0.293 98455.77
28 No 0.430 88011.14
29 No 0.153 102421.46
... ... ... ...
And a full dataset can be found here.
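On point 1 of the list above, the normalization can be checked numerically. The following is a minimal pure-Python sketch, not seaborn's own implementation: the helper names `gaussian_kde` and `area_and_peak` are hypothetical, the bandwidth uses Scott's rule of thumb as an assumption, and the values are the first ten rows of the sample shown above. Both densities should integrate to roughly 1, while the peak height of the income density is far below 1 simply because its x range is tens of thousands of units wide.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `samples` at x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def area_and_peak(samples, grid_points=2000):
    """Integrate the KDE with the trapezoid rule; return (area, peak height)."""
    # Scott's rule of thumb for 1-D data (an assumption for this sketch)
    h = statistics.stdev(samples) * len(samples) ** (-1 / 5)
    lo, hi = min(samples) - 5 * h, max(samples) + 5 * h
    step = (hi - lo) / (grid_points - 1)
    kde = gaussian_kde(samples, h)
    ys = [kde(lo + i * step) for i in range(grid_points)]
    area = step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return area, max(ys)


# First ten rows of the sample data from the question
need_index = [0.919, 0.641, 0.744, 0.860, 0.730, 0.858, 0.499, 0.833, 0.849, 0.861]
income = [31141.72, 56462.88, 44342.61, 31454.00, 46435.59,
          39415.45, 43706.73, 28820.67, 34889.24, 35545.10]

for name, values in [("Economic Need Index", need_index),
                     ("School Income Estimate", income)]:
    area, peak = area_and_peak(values)
    print(f"{name}: area = {area:.4f}, peak density = {peak:.3g}")
```

The small y values on the income plot are therefore a consequence of the x scale, not a plotting error.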
Consider melting your dataframe so that it has one value column and one indicator column distinguishing Economic Need Index from School Income Estimate. Then plot without matplotlib's subplots() call, using only seaborn's FacetGrid with adjustments to its default plot attributes:
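import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Reshape to long format: one row per (school, indicator) pair
long_df = pd.melt(df, id_vars='Community School?', var_name='Indicator', value_name='value')
print(long_df.head())
#   Community School?            Indicator  value
# 0               Yes  Economic Need Index  0.919
# 1                No  Economic Need Index  0.641
# 2                No  Economic Need Index  0.744
# 3                No  Economic Need Index  0.860
# 4                No  Economic Need Index  0.730

# One facet per indicator, with independent axes because the two scales differ greatly.
# On seaborn < 0.9 use size=4 instead of height=4; on seaborn < 0.11 use shade=True instead of fill=True.
fg = sns.FacetGrid(long_df, col='Indicator', hue='Community School?',
                   sharex=False, sharey=False, height=4)
fg.map(sns.kdeplot, 'value', fill=True, label='Data')\
  .add_legend()\
  .set_titles("{col_name}")\
  .set_axis_labels('')
plt.show()
plt.clf()
plt.close('all')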

You are getting the additional figure because FacetGrid creates its own figure when called, separate from the one created by plt.subplots(). See the answer to this question for further details. A simpler approach that works is given here. I have added two optional lines that replace the NaNs with the mean for each type of school.
# Optional: replace NaNs with the mean income for each type of school
s = df.groupby(['Community School?'])['School Income Estimate'].transform('mean')
df['School Income Estimate'] = df['School Income Estimate'].fillna(s)

# On seaborn < 0.11 use shade=True instead of fill=True
plt.subplots(1, 2)
plt.subplot(1, 2, 1)
sns.kdeplot(df.loc[df['Community School?'] == 'No', 'Economic Need Index'], fill=True, label='No')
sns.kdeplot(df.loc[df['Community School?'] == 'Yes', 'Economic Need Index'], color='red', fill=True, label='Yes')
plt.title('KDE of Economic Need Index')
plt.legend()
plt.subplot(1, 2, 2)
sns.kdeplot(df.loc[df['Community School?'] == 'No', 'School Income Estimate'], fill=True, label='No')
sns.kdeplot(df.loc[df['Community School?'] == 'Yes', 'School Income Estimate'], color='red', fill=True, label='Yes')
plt.title('KDE of School Income Estimate')
plt.legend()
plt.show()
