In R, we use the code below for a weighted GLM:
glm(formula, weights = w)
R Documentation: an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector
In Python, using statsmodels.formula.api:
smf.glm(formula, data, freq_weights=w)
Python Documentation: 1d array of frequency weights. The default is None. If None is selected or a blank value, then the algorithm will replace with an array of 1’s with length equal to the endog.
Is "weights" in R the same as "freq_weights" in Python? (I am getting different beta estimates in Python and R. They are close but slightly different.)
As far as I remember, the weights in R's glm are var_weights, not freq_weights.
statsmodels GLM has both. In some cases both kinds of weights produce the same results, but not for all family and link combinations, and standard errors can differ in general.
This notebook illustrates some of the differences: https://www.statsmodels.org/stable/examples/notebooks/generated/glm_weights.html
var_weights are often used when the outcome variable represents an average of several observations and the variance depends on the number of observations that have been used in the average.
freq_weights are mainly a shortcut if we have several identical observations. For example, if we only have categorical explanatory variables, then freq_weights can be used for the counts of unique observations.