When using PySpark with the following code:
import numpy as np
from pyspark.sql.types import *

samples = np.array([0.1, 0.2])
dfSchema = StructType([StructField("x", FloatType(), True)])
spark.createDataFrame(samples, dfSchema)
I get:
TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>
Any idea?
NumPy types, including numpy.float64, are not a valid external representation for Spark SQL types. Furthermore, the schema you use doesn't reflect the shape of the data: a StructType with a single field expects row-like records (tuples, lists, or Rows), not bare scalars. You should use standard Python types and the corresponding DataType directly:
spark.createDataFrame(samples.tolist(), FloatType()).toDF("x")
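Alternatively, if you want to keep your original StructType schema, you can wrap each value in a one-element tuple of built-in floats so each record matches the one-field struct. A minimal sketch, assuming the samples, dfSchema, and spark session from the question:

# Convert each numpy.float64 to a built-in float and wrap it in a
# tuple so every record matches the single-field StructType.
rows = [(float(x),) for x in samples]
spark.createDataFrame(rows, dfSchema).show()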