I am trying to combine all of my feature columns into a single vector column. So:
from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(
    inputCols=feature_list,
    outputCol='features')
where feature_list is a Python list containing all of the feature column names.
Then:
trainingData = assembler.transform(df)
But when I ran this, it failed with an error.
What is the correct way to use VectorAssembler?
Many thanks
Without the stack trace or a sample of df, it's hard to pinpoint your issue.
That said, here is how VectorAssembler is used in the documentation:
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler

# Toy DataFrame: two numeric columns, one vector column, an id and a label
dataset = spark.createDataFrame(
    [(0, 18, 1.0, Vectors.dense([0.0, 10.0, 0.5]), 1.0)],
    ["id", "hour", "mobile", "userFeatures", "clicked"])
dataset.show()
# +---+----+------+--------------+-------+
# | id|hour|mobile|  userFeatures|clicked|
# +---+----+------+--------------+-------+
# |  0|  18|   1.0|[0.0,10.0,0.5]|    1.0|
# +---+----+------+--------------+-------+
# Merge "hour", "mobile" and the "userFeatures" vector into one "features" vector
assembler = VectorAssembler(
    inputCols=["hour", "mobile", "userFeatures"],
    outputCol="features")
output = assembler.transform(dataset)
print("Assembled columns 'hour', 'mobile', 'userFeatures' to vector column 'features'")
output.select("features", "clicked").show(truncate=False)
# +-----------------------+-------+
# |features               |clicked|
# +-----------------------+-------+
# |[18.0,1.0,0.0,10.0,0.5]|1.0    |
# +-----------------------+-------+
Example source: the Spark ML documentation for VectorAssembler.
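To adapt this to your case, here is a rough sketch under a few assumptions (since df was not shown): every column of df except a label column, which I'm calling 'label' here, is numeric, and you are on Spark 2.4+ so the handleInvalid option is available. Nulls or non-numeric (string) columns in inputCols are often what makes VectorAssembler raise an error.

from pyspark.ml.feature import VectorAssembler

# Assumption: every column except the (assumed) 'label' column is a numeric feature
feature_list = [c for c in df.columns if c != 'label']

assembler = VectorAssembler(
    inputCols=feature_list,
    outputCol='features',
    handleInvalid='skip')  # drop rows containing nulls instead of raising (Spark 2.4+)

trainingData = assembler.transform(df)
trainingData.select('features', 'label').show(truncate=False)

If the error persists, check df.printSchema(): any string column in feature_list has to be converted to numbers first (for example with StringIndexer) before it can be assembled.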