I train a simple CrossValidatorModel using logistic regression and spark-ml pipelines. I can predict on new data, but I'd like to go beyond the black box and analyze the coefficients.
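import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression().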
  setFitIntercept(true).
  setMaxIter(maxIter).
  setElasticNetParam(alpha).
  setStandardization(true).
  setFamily("binomial").
  setWeightCol("weight").
  setFeaturesCol("features").
  setLabelCol("response")
val assembler = new VectorAssembler().
  setInputCols(Array("feat1", "feat2")).
  setOutputCol("features")
val modelPipeline = new Pipeline().
  setStages(Array(assembler,lr))
val evaluator = new BinaryClassificationEvaluator().
  setLabelCol("response")
Then I define a grid of parameters and train over it to get the best model with respect to AUC:
val paramGrid = new ParamGridBuilder().
  addGrid(lr.regParam, lambdas).
  build()
val pipeline = new CrossValidator().
  setEstimator(modelPipeline).
  setEvaluator(evaluator).
  setEstimatorParamMaps(paramGrid).
  setNumFolds(nfolds)
val cvModel = pipeline.fit(train)
How do I get coefficients (the betas) of the best logistic regression model?
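Extract the best model:
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel

// bestModel is typed as Model[_], so match on the concrete type
val bestModel = cvModel.bestModel match {
  case pm: PipelineModel => Some(pm)
  case _ => None
}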
Find the logistic regression model among the pipeline stages:
val lrm = bestModel
  .map(_.stages.collect { case lrm: LogisticRegressionModel => lrm })
  .flatMap(_.headOption)
Extract the intercept and coefficients:
lrm.map(m => (m.intercept, m.coefficients))
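Quick and dirty equivalent (throws a ClassCastException if the best model is not a PipelineModel or its last stage is not a LogisticRegressionModel):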
val lrm: LogisticRegressionModel = cvModel
  .bestModel.asInstanceOf[PipelineModel]
  .stages
  .last.asInstanceOf[LogisticRegressionModel]
(lrm.intercept, lrm.coefficients)
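To analyze the betas it usually helps to know which feature each one belongs to. The coefficient vector follows the order of the assembled features, so a minimal sketch could pair the two by position. The helper name pairWithNames is hypothetical (it is not part of Spark), and it assumes every VectorAssembler input column is a single numeric column; if an input column is itself a vector, the lengths differ and the helper returns None.

```scala
// Hypothetical helper, not part of Spark: pairs each feature name with the
// coefficient at the same position. Returns None when the lengths differ,
// which happens when an assembler input column expands to several features.
def pairWithNames(names: Array[String], betas: Array[Double]): Option[Seq[(String, Double)]] =
  if (names.length == betas.length) Some(names.zip(betas).toSeq)
  else None

// Illustrative values only, not output from a real model
val example = pairWithNames(Array("feat1", "feat2"), Array(0.5, -1.0))
example.foreach(_.foreach { case (name, beta) => println(s"$name -> $beta") })
```

With the objects from the question, and lrm being the LogisticRegressionModel from the quick version, the call would be pairWithNames(assembler.getInputCols, lrm.coefficients.toArray). According to the Spark documentation for the standardization parameter, the coefficients are always returned on the original feature scale, so they can be read against the raw feat1 and feat2 values even with setStandardization(true).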