
Weighted GLM: R Vs Python

In R, we use the code below for a weighted GLM:

glm(formula, weights = weights)

R Documentation: an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector

In Python, using statsmodels.formula.api:

smf.glm(formula, data, freq_weights=freq_weights)

Python Documentation: 1d array of frequency weights. The default is None. If None is selected or a blank value, then the algorithm will replace with an array of 1’s with length equal to the endog.

Is "weights" in R the same as "freq_weights" in Python? (I am getting different beta estimates in Python and R. They are close but slightly different.)

Ussu20 asked Jan 28 '26 07:01


1 Answer

As far as I remember, the weights in R's glm are var_weights, not freq_weights.

statsmodels GLM has both. In some cases both kinds of weights produce the same results, but not for all family and link combinations, and standard errors can differ in general.

This notebook illustrates some of the differences: https://www.statsmodels.org/stable/examples/notebooks/generated/glm_weights.html

var_weights are often used when the outcome variable represents an average of several observations and the variance depends on the number of observations that have been used in the average.

freq_weights are mainly a shortcut if we have several identical observations. For example, if we only have categorical explanatory variables, then freq_weights can be used for the counts of unique observations.

Josef answered Jan 29 '26 21:01


