I have a dataframe where each of the rows has a certain weight which needs to be accounted for in the mean computations. I love seaborn factorplots and their bootstrapped 95% confidence intervals but haven't been able to get seaborn to accept a new weighted mean estimator.
Here is an example of what I would like to do.
import numpy as np
import seaborn as sns

tips_all = sns.load_dataset("tips")
tips_all["weight"] = 10 * np.random.rand(len(tips_all))
sns.factorplot(x="size", y="total_bill",
               data=tips_all, kind="point")
# here I would like to have a mean estimator that computes a weighted mean
# the bootstrapped confidence intervals should also use this weighted mean estimator
# something like (tips_all["weight"] * tips_all["total_bill"]).sum() / tips_all["weight"].sum()
# but on bootstrapped samples (for the confidence interval)
From @mwaskom: https://github.com/mwaskom/seaborn/issues/722
It's not really supported, but I think it is possible to hack together a solution. This seems to work?
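import numpy as np
import seaborn as sns

tips = sns.load_dataset("tips")
tips["weight"] = 10 * np.random.rand(len(tips))
# Pack each value with its weight; list() is needed on Python 3, where zip is lazy
tips["tip_and_weight"] = list(zip(tips.tip, tips.weight))

def weighted_mean(x, **kws):
    # x is a sequence of (value, weight) pairs; unpack into two arrays
    val, weight = map(np.asarray, zip(*x))
    return (val * weight).sum() / weight.sum()

# factorplot was renamed to catplot in seaborn 0.9
g = sns.factorplot(x="size", y="tip_and_weight", data=tips,
                   estimator=weighted_mean, orient="v")
g.set_axis_labels("size", "tip")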
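For anyone who wants to check the numbers outside seaborn, here is a minimal sketch of the same idea using only NumPy: a weighted mean plus a percentile bootstrap confidence interval in which each value is resampled together with its weight. The function names (weighted_mean, bootstrap_weighted_mean_ci), the defaults, and the synthetic data are illustrative assumptions, not part of seaborn, and the percentile method used here may differ in detail from what seaborn does internally.

```python
import numpy as np

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (values * weights).sum() / weights.sum()

def bootstrap_weighted_mean_ci(values, weights, n_boot=1000, ci=95, seed=None):
    """Return (estimate, low, high) using a percentile bootstrap.

    Hypothetical helper: rows are resampled with replacement, and each
    value keeps its own weight in every resample.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    # Each row of idx holds the row positions of one bootstrap resample
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_vals = values[idx]
    boot_wts = weights[idx]
    # Weighted mean of every resample at once
    boot_means = (boot_vals * boot_wts).sum(axis=1) / boot_wts.sum(axis=1)
    tail = (100 - ci) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return weighted_mean(values, weights), low, high

# Synthetic stand-in data (not the tips dataset)
rng = np.random.default_rng(0)
bills = rng.normal(20, 5, size=200)
wts = 10 * rng.random(200)

est, low, high = bootstrap_weighted_mean_ci(bills, wts, seed=1)
print(f"weighted mean = {est:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

To mimic the point plot, you could call the helper once per group (for example per value of "size") and draw the results with matplotlib's errorbar. Resampling whole rows keeps value and weight paired, which is the same trick the tuple column plays in the hack above.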