I've created a dummy dataframe similar to the one I'm using. It consists of Fare prices, Cabin type, and Survived (1 = alive, 0 = dead).
The first plot creates one graph per Cabin type via factorplot. The x-axis is the Fare price and the y-axis is the number of occurrences at that Fare price.
I then created another series via a groupby on [Cabin, Fare] and took the mean of Survived to get the survival rate at each Cabin and Fare price.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
))
# Note: seaborn 0.9 renamed factorplot to catplot and the size argument to height
g = sns.factorplot(x='Fare', col='Cabin', kind='count', data=df,
                   col_wrap=3, size=3, aspect=1.3, palette='muted')
plt.show()
# Survival rate for each (Cabin, Fare) pair
x = df.groupby(['Cabin', 'Fare']).Survived.mean()
What I would like to do is plot a line plot on top of the count graphs above (so the x-axis stays the same, and each graph still represents one Cabin type), but with the y-axis showing the survival mean calculated in the groupby series x in the code above. When printed, that series looks like this, with the survival mean in the third column:
Cabin  Fare
A      10      0.000000
       20      1.000000
       30      0.000000
B      20      1.000000
       40      0.000000
C      30      1.000000
       40      0.500000
D      10      1.000000
       20      0.000000
       30      0.666667
The y-axis for the line plot should be on the right side, and the range I would like is [0, .20, .40, .60, .80, 1.0, 1.2]
I looked through the seaborn docs for a while, but I couldn't figure out how to properly do this.
My desired output looks something like this image. I'm sorry my writing looks horrible, I don't know how to use paint well. So the ticks and numbers are on the right side of each graph. The line plot will be connected via dots at each x,y point. So for Cabin A, the first x,y point is (10,0) with 0 corresponding to the right y-axis. The second point is (20,1) and so on.
Data operations:
Compute frequency counts:
df_counts = pd.crosstab(df['Fare'], df['Cabin'])
Compute the mean of Survived for each group and unstack the result to get a DataFrame. The NaNs are left as they are rather than replaced by zeros, so that the line plot shows a break at those points; otherwise the line would be continuous, which wouldn't make much sense here.
df_means = df.groupby(['Cabin','Fare']).Survived.mean().unstack().T
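To see what these two tables look like before plotting, here is a small sketch that rebuilds them from the question's dummy data and checks that they line up. The names same_layout and missing_matches are only illustrative helpers, not part of the original answer.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1],
))

# Rows are Fare values, columns are Cabin types
df_counts = pd.crosstab(df['Fare'], df['Cabin'])
df_means = df.groupby(['Cabin', 'Fare']).Survived.mean().unstack().T

print(df_counts)
print(df_means)

# Both tables share the same Fare index and Cabin columns
same_layout = (df_counts.index.equals(df_means.index)
               and df_counts.columns.equals(df_means.columns))

# A zero count in df_counts corresponds to a NaN in df_means
missing_matches = bool(((df_counts == 0) == df_means.isna()).all().all())
print(same_layout, missing_matches)
```

Because both tables have the same layout, each column can be drawn on its own subplot, bars from df_counts and the line from df_means.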
Prepare the x-axis labels as strings:
df_counts.index = df_counts.index.astype(str)
df_means.index = df_means.index.astype(str)
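The reason for the string conversion, as far as I understand pandas plotting, is that bars are drawn at positions 0, 1, 2, ... while a line with a numeric index is drawn at the index values themselves (10, 20, ...), so the two would not line up. A minimal sketch of that behaviour, using cabin A's survival rates and the non-interactive Agg backend (an assumption made so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Cabin A survival rates from the question, indexed by Fare
rates = pd.Series([0.0, 1.0, 0.0], index=[10, 20, 30])

fig, (ax_num, ax_str) = plt.subplots(1, 2)

# Numeric index: points are placed at x = 10, 20, 30
rates.plot(ax=ax_num, marker='o')

# String index: points are placed at x = 0, 1, 2, like bar positions
rates_str = rates.copy()
rates_str.index = rates_str.index.astype(str)
rates_str.plot(ax=ax_str, marker='o')

x_numeric = list(ax_num.lines[0].get_xdata())
x_string = list(ax_str.lines[0].get_xdata())
print(x_numeric, x_string)
```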
Plotting:
fig, ax = plt.subplots(1, 4, figsize=(10,4))
df_counts.plot.bar(ax=ax, ylim=(0,5), cmap=plt.cm.Spectral, subplots=True,
                   legend=None, rot=0)
# Use secondary y-axis(right side)
df_means.plot(ax=ax, secondary_y=True, marker='o', color='r', subplots=True,
              legend=None, xlim=(0,4))
# Adjust spacing between subplots
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()
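If you want explicit control over the right-hand axis, for example the tick range [0, .20, ..., 1.2] asked for in the question, one alternative is to loop over the cabins and call twinx() on each subplot yourself. This is only a sketch using plain matplotlib, not the approach above; the names positions, labels and right_axes are illustrative, and the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1],
))

df_counts = pd.crosstab(df['Fare'], df['Cabin'])
df_means = df.groupby(['Cabin', 'Fare']).Survived.mean().unstack().T

positions = np.arange(len(df_counts.index))  # one slot per Fare value
labels = df_counts.index.astype(str)

fig, axes = plt.subplots(1, df_counts.shape[1], figsize=(12, 3))
right_axes = []
for ax, cabin in zip(axes, df_counts.columns):
    # Left axis: counts per Fare as bars
    ax.bar(positions, df_counts[cabin].to_numpy(), width=0.6)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_ylim(0, 5)
    ax.set_title(cabin)

    # Right axis: survival rate; NaN values leave gaps in the line
    ax2 = ax.twinx()
    ax2.plot(positions, df_means[cabin].to_numpy(), marker='o', color='r')
    ax2.set_yticks(np.arange(0, 1.3, 0.2))
    ax2.set_ylim(0, 1.2)
    right_axes.append(ax2)

fig.tight_layout()
# plt.show()  # uncomment when using an interactive backend
```

Note that where a cabin has no passengers at a fare (NaN), neighbouring points are not joined, so a cabin like B shows only isolated markers.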