 

How to pass multiple Columns as features in a Logistic Regression Classifier in Spark? [duplicate]

I am trying to run logistic regression on a simple data set to understand the syntax of PySpark. My data has 11 columns, where the first 10 columns are features and the last (11th) column is the label. I want to pass those 10 columns as the features and the 11th column as the label, but I only know how to pass a single column as a feature using featuresCol="col_header_name". I read the data from a CSV file with pandas and converted it to a Spark DataFrame. Here is the code:

from pyspark.ml.classification import LogisticRegression
from pyspark.sql import SQLContext
from pyspark import SparkContext
import pandas as pd

data = pd.read_csv('abc.csv')            # read the data with pandas
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
spDF = sql.createDataFrame(data)         # convert the pandas DataFrame to a Spark DataFrame

tri = LogisticRegression(maxIter=10, regParam=0.01,
                         featuresCol="single_column", labelCol="label")
lr_model = tri.fit(spDF)

If I use featuresCol=[list_of_header_names] I get errors. I have used scikit-learn, which has a really simple syntax, something like:

reg = LogisticRegression()
reg = reg.fit(Dataframe_of_features, Label_array)
asked Oct 15 '25 by A-ar
1 Answer

You need to combine all the feature columns into a single vector column using VectorAssembler.

from pyspark.ml.feature import VectorAssembler

# list_of_header_names should be a Python list of the 10 feature column names
assembler = VectorAssembler(inputCols=list_of_header_names, outputCol="features")
spDF = assembler.transform(spDF)
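If the label really is the last of the 11 columns, one way to build that list (a minimal sketch, assuming the label column is literally named "label") is:

# hypothetical: every column except the label becomes a feature input
list_of_header_names = [c for c in spDF.columns if c != "label"]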

You can then pass that assembled features column as the input to the logistic regression.

tri=LogisticRegression(maxIter=10,
                       regParam=0.01,
                       featuresCol="features",
                       labelCol="label")
lr_model = tri.fit(spDF)
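
If you prefer, the two steps can also be chained with a Pipeline so the assembler and the model are fitted together. This is a minimal sketch that assumes the label column is named "label" and every other column is a feature:

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# hypothetical column setup: everything except "label" is a feature
feature_cols = [c for c in spDF.columns if c != "label"]

assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.01,
                        featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[assembler, lr])
pipeline_model = pipeline.fit(spDF)           # fits the assembler and the model in one call
predictions = pipeline_model.transform(spDF)  # adds "prediction" and "probability" columns

Using a Pipeline also means the same assembler is applied automatically when you score new data.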
answered Oct 17 '25 by pratiklodha


