how to use pyspark to read orc file

There are two common compressed columnar file formats for Spark. One is Parquet, which is very easy to read:

from pyspark.sql import HiveContext

hiveCtx = HiveContext(sc)  # sc is an existing SparkContext
df = hiveCtx.parquetFile(parquetFile)  # deprecated since Spark 1.4; prefer hiveCtx.read.parquet(parquetFile)

But for ORC files, I cannot find a good example that shows me how to read them with PySpark.

Howardyan asked Oct 17 '25 13:10

1 Answer

Well, there are two ways:

Spark 2.x:

orc_df = spark.read.orc('python/test_support/sql/orc_partitioned')

Spark 1.6:

df = hiveContext.read.orc('python/test_support/sql/orc_partitioned')
Thiago Baldim answered Oct 19 '25 09:10