I'm running the PySpark shell and am unable to create a DataFrame. I've run:
import pyspark
from pyspark.sql.types import StructField
from pyspark.sql.types import StructType
all without any errors returned.
Then I tried running these commands:
schemaString = "name age"
fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
And I keep getting the error: name 'StructField' is not defined
Basically, I'm following the Spark documentation here: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html
Weird, if I remove the for loop and do this, it works:
fields = [StructField('field1', StringType(), True)]
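The following code works for me. Note that the wildcard import also brings in StringType, which the import lines in the question never import. See the documentation for StructField and StringType. Also, Spark 1.3 is pretty old.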
from pyspark.sql.types import *
schemaString = "name age"
fields = [StructField(field_name, StringType(), True)
          for field_name in schemaString.split()]