I am trying to load a CSV file and use its second line as the header. How can I achieve this? Please let me know. Thanks.
file_location = "/mnt/test/raw/data.csv"
file_type = "csv"
infer_schema = "true"
delimiter = ","
data = spark.read.format(file_type) \
.option("inferSchema", infer_schema) \
.option("header", "false") \
.option("sep", delimiter) \
.load(file_location)
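First read the file as an RDD of lines, drop the first line, and then pass the RDD to spark.read.csv():
# Read the raw file as an RDD of text lines (note: textFile, not TextFile)
data = sc.textFile('/mnt/test/raw/data.csv')
# Grab the first line, which should be discarded
firstRow = data.first()
# Keep every line that is not identical to the first line
data = data.filter(lambda row: row != firstRow)
# The old second line is now the first, so it is used as the header
df = spark.read.csv(data, header=True)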
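One caveat with the filter above: it compares each row to the first line, so any later row that happens to be identical to the first line is dropped as well. If that matters for your data, a possible alternative is to skip the first line by position instead. The sketch below assumes a single input file whose first line lands in partition 0 of the RDD; `skip_first_line` and `read_csv_with_second_line_header` are hypothetical helper names, not part of PySpark.

```python
from itertools import islice


def skip_first_line(partition_index, lines):
    """Drop the first element of partition 0; pass other partitions through."""
    if partition_index == 0:
        return islice(lines, 1, None)
    return lines


def read_csv_with_second_line_header(spark, path, sep=","):
    """Hypothetical helper: load a CSV whose header is on the second line.

    Assumes `spark` is an existing SparkSession and that the first line of
    the file is the first element of partition 0.
    """
    lines = spark.sparkContext.textFile(path)
    # Remove only the physical first line, regardless of its content
    without_first = lines.mapPartitionsWithIndex(skip_first_line)
    # spark.read.csv accepts an RDD of strings as well as a path
    return spark.read.csv(without_first, header=True, inferSchema=True, sep=sep)
```

With the session from the question, this would be called as `df = read_csv_with_second_line_header(spark, "/mnt/test/raw/data.csv")`.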
For a reference on DataFrame functions, use the link below. It covers all of the DataFrame operations you are likely to need; for a specific version of Spark, replace "latest" in the URL with the version you want:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html