There are two common compressed file formats for Spark. One is Parquet, which is very easy to read:
from pyspark.sql import HiveContext

# Spark 1.x API; parquetFile() is deprecated in later versions
hiveCtx = HiveContext(sc)
parquet_df = hiveCtx.parquetFile(parquetFile)
But for ORC files, I could not find a good example showing how to read them with PySpark.
Well, there are two ways, depending on your Spark version:
Spark 2.x:
orc_df = spark.read.orc('python/test_support/sql/orc_partitioned')
Spark 1.6:
df = hiveContext.read.orc('python/test_support/sql/orc_partitioned')