I am trying to model some data. I'm having better luck with Excel than with R, but the Excel solution won't scale, so I need to figure out how to do this in R.
Excel will fit a trendline to the data, and a power curve yields a reasonable y = 0.6462x^-0.542.
When I put the same data into R and try to model it with the continuous power law in the poweRlaw package, I get something like y = 0.14901x^-3.03671. The intercept is far too small and the alpha far too big.
# 14 days of % of users retained
y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004, 0.2352,
      0.20087, 0.17969, 0.1848, 0.17311, 0.17092, 0.15777, 0.14901)
y.pl = conpl$new(y)
y.pl_est = estimate_xmin(y.pl)
y.pl_est
# $KS
# 0.1068587
#
# $xmin
# 0.14901
#
# $pars
# 3.03673
#
# $ntail
# 14

Is there a way to use lm or glm to fit a power curve that gives a reasonable intercept and alpha?
It seems Excel may be fitting a linear model with normal errors on the log scale: I match the Excel results to as many decimal places as you share when I take logs of x and y before modeling.
Using @eipi10's nicely shared data frame:
dat = transform(dat, logx = log(x), logy = log(y))
mod = lm(logy ~ logx, data = dat)
## intercept
exp(coef(mod)[1])
# (Intercept)
# 0.6461621
## power
coef(mod)[2]
# logx
# -0.5424412
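If you also want fitted values back on the original retention scale, not just the coefficients, you can exponentiate the predictions from the log-scale model. A minimal self-contained sketch (rebuilding the same data; the object names are my own):

```r
# Same 14-day retention data as above
dat <- data.frame(
  x = 1:14,
  y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004, 0.2352,
        0.20087, 0.17969, 0.1848, 0.17311, 0.17092, 0.15777, 0.14901)
)
mod <- lm(log(y) ~ log(x), data = dat)

# predict() returns fitted values on the log scale; exp() maps them back
fit <- exp(predict(mod))
round(fit[1:3], 4)
```

Note that exp(predict(mod)) gives a median-type prediction under the log-normal error assumption; if you need the conditional mean instead, the usual adjustment is a bias correction such as multiplying by exp(summary(mod)$sigma^2 / 2).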
This works, of course, because if
y = a * x^b
then
log(y) = log(a) + b * log(x)
so the fitted coefficients of the linear model are log(a) and b from the power model.
The difference is in the assumed error distribution. The other answer, using nls, minimizes squared error on the original scale, which is the MLE if you assume normally distributed errors in y. This method (and so, apparently, Excel's as well) assumes the errors are normal on the log scale, which amounts to log-normal errors on the untransformed scale; that may well be more appropriate here. (Though from the graph in eipi10's answer we can see that the differences in fitted values are very small.)
I haven't used the poweRlaw package, but R's base nls (non-linear least squares) function gives results similar to what you got with Excel. If there were a difference, after checking my code for errors, my first thought would be "so much the worse for Excel" :).
# Data
dat = data.frame(
  x = 1:14,
  y = c(0.61431, 0.42585, 0.35427, 0.33893, 0.28853, 0.26004, 0.2352,
        0.20087, 0.17969, 0.1848, 0.17311, 0.17092, 0.15777, 0.14901)
)
# Model
m1 = nls(y ~ a * x^b, start = list(a = 1, b = 1), data = dat)
summary(m1)
Formula: y ~ a * x^b

Parameters:
   Estimate Std. Error t value Pr(>|t|)
a   0.62104    0.01307   47.51 4.94e-15 ***
b  -0.51460    0.01525  -33.74 2.92e-13 ***
# Plot nls model
curve(coef(m1)[1]*x^coef(m1)[2], from=1, to=14)
# Add curve for Excel model in red
curve(0.6462*x^(-0.542), from=1, to=14, col="red", lty=2, add=TRUE)
# Add data points
points(dat$x, dat$y)
