
Writing a DataFrame to a text file in PySpark

I am trying to save a dataframe 'df2' to a text file using the code below:

code: df2.write.format('text').mode('overwrite').save('/tmp/hive/save_text')

Error:

org.apache.spark.sql.AnalysisException: Text data source does not support int data type.;

Py4JJavaError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327                 "An error occurred while calling {0}{1}{2}.\n".
--> 328                 format(target_id, ".", name), value)
    329             else:

Py4JJavaError: An error occurred while calling o1239.save. : org.apache.spark.sql.AnalysisException: Text data source does not support int data type.;

**Ask: Please suggest how to write data from a dataframe into a text file.**

datageek asked Dec 14 '25 15:12


1 Answer

Note that to use write.format('text'), your dataframe must contain exactly one column, and that column must be of string type; otherwise Spark throws an AnalysisException like the one in the question. You therefore need to convert all columns into a single string column.

Alternatively, you can use write.format('csv'), or you can convert the dataframe to an RDD and save it as a text file.

For example, say your dataframe contains two columns, id (an int) and name (a string), and you want each line of the output file to read id,name. You can write:

df2.rdd.map(lambda x : str(x[0]) + "," + x[1]).saveAsTextFile('/tmp/hive/save_text')

Karthik answered Dec 19 '25 07:12


