
Unable to use StructField with PySpark

I'm running the PySpark shell and I'm unable to create a DataFrame. I've run:

import pyspark
from pyspark.sql.types import StructField
from pyspark.sql.types import StructType

All of these imports ran without errors.

Then I tried running these commands:

schemaString = "name age"
fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]

But I keep getting the error: `name 'StructField' is not defined`

I'm following the Spark 1.3.0 SQL programming guide: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html

Oddly, if I drop the list comprehension and create a single field directly, it works:

fields = [StructField('field1', StringType(), True)]
simplycoding asked Oct 21 '25 03:10



1 Answer

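The following code works. Importing everything from `pyspark.sql.types` brings both `StructField` and `StringType` into scope; see the PySpark documentation for StructField and StringType. Also note that the 1.3 documentation is quite old.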

# The wildcard import brings in StructField, StructType and StringType
from pyspark.sql.types import *
schemaString = "name age"

fields = [StructField(field_name, StringType(), True)
          for field_name in schemaString.split()]
Rockie Yang answered Oct 23 '25 15:10
