Plotting a line graph on a count plot with a separate y-axis on the right side

I've created a dummy dataframe that is similar to the one I'm using. It consists of Fare prices, Cabin type, and survival (Survived: 1 = alive, 0 = dead).

The first plot uses factorplot (renamed catplot in seaborn 0.9) to draw one count plot per Cabin type. The x-axis is the Fare price and the y-axis is the number of passengers at that Fare price.

Next, I created another Series by grouping on [Cabin, Fare] and taking the mean of Survived, which gives the survival rate for each Cabin and Fare price.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


df = pd.DataFrame(dict(
        Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
        Cabin=list('AAABCDBDCDDDC'),
        Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    ))

# One count plot per Cabin: x = Fare, y = number of passengers at that Fare.
# factorplot was renamed catplot in seaborn 0.9, and size= became height=.
g = sns.catplot(x='Fare', col='Cabin', kind='count', data=df,
                col_wrap=3, height=3, aspect=1.3, palette='muted')

plt.show()

[Image: the resulting count plots, one per Cabin]

x = df.groupby(['Cabin', 'Fare']).Survived.mean()

What I would like to do is overlay a line plot on each count plot above, keeping the same x-axis and one graph per Cabin type, but with the survival mean from the groupby Series x as its y-values (the third column of the output below):

Cabin  Fare
A      10      0.000000
       20      1.000000
       30      0.000000
B      20      1.000000
       40      0.000000
C      30      1.000000
       40      0.500000
D      10      1.000000
       20      0.000000
       30      0.666667
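Each panel only needs its own cabin's slice of x. A minimal sketch on the question's dummy data: `x.loc['A']` selects one cabin from the two-level (Cabin, Fare) index, and `reindex` (an extra step, not something the question does) aligns that slice to every Fare in the data, so fares with no passengers in that cabin show up as NaN instead of disappearing:

```python
import pandas as pd

# Same dummy data as in the question
df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1],
))

# Survival rate per (Cabin, Fare) pair: a Series with a two-level MultiIndex
x = df.groupby(['Cabin', 'Fare']).Survived.mean()

# Selecting one value of the outer level gives that cabin's curve, indexed by Fare
cabin_a = x.loc['A']

# Align to every Fare in the data so all panels share the same x values;
# fares without Cabin A passengers become NaN rather than 0
all_fares = sorted(df['Fare'].unique())
cabin_a_full = cabin_a.reindex(all_fares)
print(cabin_a_full)
```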

The y-axis for the line plot should be on the right side, with ticks at [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2].
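The usual matplotlib building block for a second y-axis on the right is `Axes.twinx()`, which creates an Axes that shares the x-axis and draws its own y-axis on the right. A single-panel sketch, using Cabin D's numbers from the dummy data (counts per Fare and the survival rates from the table above) and plotting at positions 0 to 3 with the fares as tick labels; the colors and labels are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Cabin D from the dummy data: passenger count and survival rate per Fare
fares = ["10", "20", "30", "40"]
counts = [1, 1, 3, 0]
survival = [1.0, 0.0, 2 / 3, np.nan]  # NaN: no Cabin D passenger paid 40

positions = np.arange(len(fares))  # one slot per Fare, shared by bars and line

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(positions, counts, color="lightsteelblue")
ax.set_xticks(positions)
ax.set_xticklabels(fares)
ax.set_xlabel("Fare")
ax.set_ylabel("count")

# twinx(): new Axes sharing this x-axis, with its y-axis drawn on the right
ax2 = ax.twinx()
ax2.plot(positions, survival, marker="o", color="r")
ax2.set_ylim(0, 1.2)
ax2.set_yticks([0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2])
ax2.set_ylabel("survival rate")

fig.tight_layout()
```

With an interactive backend, `plt.show()` at the end displays the figure; repeating the same bar/twinx/plot steps inside a loop over one Axes per cabin gives the multi-panel layout.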

I looked through the seaborn docs for a while, but I couldn't figure out how to properly do this.

My desired output looks something like the image below (sorry for the messy drawing; I'm not good with Paint). The survival-rate ticks and numbers are on the right side of each graph, and the line plot connects a dot at each (x, y) point. For Cabin A, the first point is (10, 0), with 0 read off the right y-axis; the second point is (20, 1), and so on.

[Image: hand-drawn sketch of the desired output]

Moondra asked Sep 19 '25 18:09


1 Answer

Data operations:

Compute frequency counts:

df_counts = pd.crosstab(df['Fare'], df['Cabin'])

[Image: the df_counts table]
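For reference, a sketch that builds the same counts table on the dummy data, with a `groupby(...).size().unstack(fill_value=0)` cross-check added here as an alternative (it is not part of the answer). Unlike the survival means, a missing (Fare, Cabin) pair genuinely means a count of 0, so filling with 0 is appropriate for counts:

```python
import pandas as pd

# Same dummy data as in the question
df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1],
))

# Rows: Fare, columns: Cabin, cells: number of passengers (0 where none)
df_counts = pd.crosstab(df['Fare'], df['Cabin'])

# Same table via groupby; fill_value=0 turns missing combinations into 0
via_groupby = df.groupby(['Fare', 'Cabin']).size().unstack(fill_value=0)

print(df_counts)
```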

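Compute the survival mean for each (Cabin, Fare) group and unstack it back into a DataFrame, transposed so that Fare is the index and each Cabin is a column (the same layout as df_counts). The NaNs are left as they are rather than replaced by zeros, so the line plot shows a break wherever a Cabin has no passengers at a given Fare; with zeros the line would be continuous, which wouldn't make much sense here.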

df_means = df.groupby(['Cabin','Fare']).Survived.mean().unstack().T

[Image: the df_means table]
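A short sketch showing where the NaNs end up on the dummy data; `fillna(0)` appears only to illustrate what zero-filling would change, not as a recommendation:

```python
import numpy as np
import pandas as pd

# Same dummy data as in the question
df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1],
))

# unstack() moves Fare into the columns; .T puts Fare back on the rows,
# so each Cabin becomes a column, matching the layout of df_counts
df_means = df.groupby(['Cabin', 'Fare']).Survived.mean().unstack().T

# NaN marks a (Fare, Cabin) pair with no passengers: a missing rate, not a 0% rate
print(df_means.isna())

# Zero-filling would invent data points, e.g. 0% survival for Cabin B at Fare 10
filled = df_means.fillna(0)
```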

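Convert the Fare index of both frames to strings. A bar plot places its bars at positions 0, 1, 2, ... regardless of the index values, whereas a line plot with a numeric index would put its points at x = 10, 20, 30, 40; with string labels, both plots use the same positions: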

df_counts.index = df_counts.index.astype(str)
df_means.index = df_means.index.astype(str)
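A quick check of that positioning behavior, as a sketch on Cabin D's numbers; the Agg backend is used only so the plotted coordinates can be inspected without opening a window:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; only the plotted coordinates are inspected
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Cabin D: counts and survival rates indexed by the (numeric) Fare
counts = pd.Series([1, 1, 3, 0], index=[10, 20, 30, 40])
rates = pd.Series([1.0, 0.0, 2 / 3, np.nan], index=[10, 20, 30, 40])

rates_str = rates.copy()
rates_str.index = rates_str.index.astype(str)

fig, (ax_bar, ax_num, ax_str) = plt.subplots(1, 3)
counts.plot.bar(ax=ax_bar)
rates.plot(ax=ax_num, marker='o')      # numeric index: x = 10, 20, 30, 40
rates_str.plot(ax=ax_str, marker='o')  # string index: x = 0, 1, 2, 3

bar_centers = [p.get_x() + p.get_width() / 2 for p in ax_bar.patches]
num_x = list(ax_num.get_lines()[0].get_xdata())
str_x = list(ax_str.get_lines()[0].get_xdata())
print(bar_centers, num_x, str_x)
```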

Plotting:

fig, ax = plt.subplots(1, 4, figsize=(10, 4))
# One bar chart of counts per Cabin (left y-axis)
df_counts.plot.bar(ax=ax, ylim=(0, 5), cmap=plt.cm.Spectral, subplots=True,
                   legend=False, rot=0)
# Use secondary y-axis (right side) for the survival means
df_means.plot(ax=ax, secondary_y=True, marker='o', color='r', subplots=True,
              legend=False, xlim=(0, 4))
# Adjust spacing between subplots
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()

[Image: the resulting bar charts with the survival-rate lines overlaid]
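The code above doesn't set the right-hand tick values the question asked for ([0, 0.2, ..., 1.2]). One way to add them, sketched under the assumption that pandas keeps each secondary Axes it creates on the corresponding primary Axes as `right_ax` (the attribute the pandas visualization guide uses with `secondary_y`):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same dummy data as in the question
df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1],
))

# Data preparation from the answer
df_counts = pd.crosstab(df['Fare'], df['Cabin'])
df_means = df.groupby(['Cabin', 'Fare']).Survived.mean().unstack().T
df_counts.index = df_counts.index.astype(str)
df_means.index = df_means.index.astype(str)

fig, ax = plt.subplots(1, 4, figsize=(10, 4))
primaries = list(ax)  # the original left-hand Axes, one per Cabin

df_counts.plot.bar(ax=ax, ylim=(0, 5), cmap=plt.cm.Spectral, subplots=True,
                   legend=False, rot=0)
df_means.plot(ax=ax, secondary_y=True, marker='o', color='r', subplots=True,
              legend=False, xlim=(0, 4))

# Assumption: each primary Axes exposes its right-hand twin as `right_ax`;
# set the requested limits and tick values on those twins.
right_ticks = [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
for left in primaries:
    left.right_ax.set_ylim(0, 1.2)
    left.right_ax.set_yticks(right_ticks)

plt.subplots_adjust(wspace=0.5, hspace=0.5)
```

Setting the ticks after both `plot` calls means nothing pandas does during plotting can overwrite them.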

Nickil Maveli answered Sep 21 '25 07:09