PySpark: TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>

When using PySpark with the following code:

import numpy as np
from pyspark.sql.types import *

samples = np.array([0.1, 0.2])
dfSchema = StructType([StructField("x", FloatType(), True)])
spark.createDataFrame(samples, dfSchema)

I get:

TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>

Any idea?

asked by Romeo Kienzler

1 Answer

NumPy types, including numpy.float64, are not a valid external representation for Spark SQL types. Furthermore, the schema you use doesn't reflect the shape of the data: a StructType describes rows with named fields, while your array provides bare scalars.

You should use standard Python types and the corresponding DataType directly:

# tolist() converts numpy.float64 values to plain Python floats
spark.createDataFrame(samples.tolist(), FloatType()).toDF("x")
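
If you want to keep the original StructType schema instead, an equivalent approach (a minimal sketch, assuming an existing SparkSession named spark) is to convert each NumPy scalar to a plain Python float and wrap it in a one-field row:

import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, FloatType

spark = SparkSession.builder.getOrCreate()

samples = np.array([0.1, 0.2])

# Each row must match the schema: one field named "x" holding a plain Python
# float, so convert the numpy.float64 scalars explicitly before building rows.
rows = [(float(v),) for v in samples]

dfSchema = StructType([StructField("x", FloatType(), True)])
df = spark.createDataFrame(rows, dfSchema)
df.show()

Either way, the point is the same: hand Spark plain Python values whose shape matches the schema, not NumPy scalars.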
answered by zero323

