I have searched online, but the solutions I found didn't resolve my issue. I am trying to read Parquet files stored in a hierarchical directory structure, and I am getting the following error:
'Unable to infer schema for Parquet. It must be specified manually.;'
My directory structure looks like: dbfs:/mnt/sales/region/country/2020/08/04
There will be multiple sub-directories for months under the year folder and subsequent sub-directories under month for days.
I want to read them at the sales level, which should give me the data for all regions. I've tried both of the snippets below, but neither of them worked. Please help me with this.
spark.read.parquet("dbfs:/mnt/sales/*")
or
spark.read.parquet("dbfs:/mnt/sales/")
Can you try this option?
df = spark.read.option("recursiveFileLookup","true").parquet("/path/to/root/")
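If recursiveFileLookup is not available (it was added in Spark 3.0), another approach that may work is a glob with one wildcard per directory level, so that the path resolves to the folders that actually contain the Parquet files. The sketch below assumes the layout from the question (region/country/year/month/day, five levels below the sales root) and an existing SparkSession named `spark`, such as the one Databricks notebooks provide. The helper names `leaf_glob` and `read_sales` are hypothetical and only for illustration.

```python
def leaf_glob(root: str, depth: int) -> str:
    """Build a glob with one '*' per directory level below root."""
    return "/".join([root.rstrip("/")] + ["*"] * depth)


def read_sales(spark, root: str = "dbfs:/mnt/sales/", depth: int = 5):
    """Read every Parquet file found `depth` levels below root.

    `spark` is assumed to be an existing SparkSession.
    depth=5 assumes region/country/year/month/day under the root.
    """
    return spark.read.parquet(leaf_glob(root, depth))


# The glob that would be passed to spark.read.parquet for this layout
path = leaf_glob("dbfs:/mnt/sales/", 5)
print(path)
```

In a notebook this would be called as `df = read_sales(spark)`. Note that the folders here are not in key=value form, so neither approach will give you region, country, or date columns through partition discovery; if you need them, you would have to derive them from the file path yourself.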