
Is there a way to display standard errors with ml_linear_regression in sparklyr?

When running a linear regression using sparklyr, such as:

cached_cars %>%
  ml_linear_regression(mpg ~ .) %>%
  summary()

The results do not include standard errors:

Deviance Residuals:
     Min       1Q   Median       3Q      Max 
-3.47339 -1.37936 -0.06554  1.05105  4.39057 

Coefficients:
(Intercept) cyl_cyl_8.0 cyl_cyl_4.0        disp          hp        drat
16.15953652  3.29774653  1.66030673  0.01391241 -0.04612835  0.02635025
         wt        qsec          vs          am        gear        carb
-3.80624757  0.64695710  1.74738689  2.61726546  0.76402917  0.50935118

R-Squared: 0.8816
Root Mean Squared Error: 2.041
My questions are:

  1. Is there a way to display standard errors when running this regression?
  2. Is there a way to cluster standard errors in sparklyr?
  3. I have also been trying to run a linear model with multiple group fixed effects in sparklyr. In plain R (outside Spark), I have done so with felm from the lfe package. Does anyone have experience doing this in sparklyr?

Solutions using SparkR are also highly appreciated.

Asked by aquev on Nov 19 '25 at 10:11


1 Answer

I received a useful answer to my first question at community.rstudio.com.

The answer from yitaoli is the following:

library(sparklyr)

spark_version <- "2.4.4" # The Spark version this example was run with;
# everything that follows should also work with other Spark versions

sc <- spark_connect(master = "local", version = spark_version)

cached_cars <- copy_to(sc, mtcars)
model <- cached_cars %>%
  ml_linear_regression(mpg ~ .)

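# Call summary() on the underlying Spark model object, then read the
# coefficient standard errors from the returned summary object
coeff_std_errs <- invoke(model$model$.jobj, "summary") %>%
  invoke("coefficientStandardErrors")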

print(coeff_std_errs)
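
The vector printed above is unnamed. According to the Spark documentation for coefficientStandardErrors, the values are only available with the "normal" solver, and when an intercept is fitted its standard error is the last element, whereas sparklyr's model$coefficients lists (Intercept) first. Below is a minimal sketch of a helper that lines the two up. The function name label_std_errors and the column names are my own (hypothetical), and it assumes the model was fitted with an intercept.

```r
# Hypothetical helper (not part of sparklyr): pair each coefficient with its
# standard error. Assumes fit_intercept = TRUE, so that
#   - `coefficients` is a named vector with "(Intercept)" FIRST
#     (as in model$coefficients), and
#   - `std_errs` has the intercept's standard error LAST
#     (as returned by Spark's coefficientStandardErrors).
label_std_errors <- function(coefficients, std_errs) {
  std_errs <- unlist(std_errs)
  stopifnot(length(coefficients) == length(std_errs))
  n <- length(std_errs)
  # Move the intercept's standard error from the end to the front
  reordered <- c(std_errs[n], std_errs[-n])
  data.frame(
    term      = names(coefficients),
    estimate  = unname(coefficients),
    std_error = reordered,
    t_value   = unname(coefficients) / reordered,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
```

With the objects from the code above, this would be called as label_std_errors(model$coefficients, coeff_std_errs). It only relabels what Spark already computed, so it does not address clustered standard errors or fixed effects.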
Answered by aquev on Nov 21 '25 at 00:11


