I created an external partitioned table in Hive.
In the logs I can see `numInputRows`, which means the streaming query is running and processing data. But when I connect to Hive using Beeline and query the table, `SELECT *` or `COUNT(*)` always returns empty.
def hiveOrcSetWriter[T](event_stream: Dataset[T])(implicit spark: SparkSession): DataStreamWriter[T] = {
  import spark.implicits._

  val hiveOrcSetWriter: DataStreamWriter[T] = event_stream
    .writeStream
    .partitionBy("year", "month", "day")             // partition columns of the external table
    .format("orc")
    .outputMode("append")
    .option("compression", "zlib")
    .option("path", _table_loc)                      // location of the external Hive table
    .option("checkpointLocation", _table_checkpoint) // streaming checkpoint directory
  hiveOrcSetWriter
}
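To confirm the stream is actually writing files, independently of the Hive metastore, the sink path can be read back directly. A minimal sketch, reusing the `_table_loc` value passed to the writer above; a non-zero count means the data is on disk even though Hive queries return nothing:

// Read the ORC files straight from the sink path, bypassing the metastore.
// Partition discovery picks up the year/month/day directories automatically.
val written = spark.read.orc(_table_loc)
println(s"rows on disk: ${written.count()}")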
What could be the issue? I'm unable to figure it out.
MSCK REPAIR TABLE tablename

This goes and checks the table's location and adds partitions to the metastore if new ones exist. Your streaming job writes ORC files directly to the path, but Hive only sees partitions that are registered in the metastore, so until they are added the table looks empty. Add this step to your Spark process so the new partitions become queryable from Hive; see the sketch below.
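A minimal sketch of wiring this in, assuming a hypothetical table name `mydb.tablename` (the function name `hiveOrcSetWriterWithRepair` is also made up here) and reusing `_table_loc` and `_table_checkpoint` from the question. `foreachBatch` replaces the direct ORC sink so that `MSCK REPAIR TABLE` runs after each micro-batch's files are committed:

import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.DataStreamWriter

def hiveOrcSetWriterWithRepair[T](event_stream: Dataset[T])(implicit spark: SparkSession): DataStreamWriter[T] =
  event_stream
    .writeStream
    .outputMode("append")
    .option("checkpointLocation", _table_checkpoint)
    .foreachBatch { (batch: Dataset[T], _: Long) =>
      // Write this micro-batch with the ordinary batch writer.
      batch.write
        .partitionBy("year", "month", "day")
        .format("orc")
        .option("compression", "zlib")
        .mode("append")
        .save(_table_loc)
      // Register any newly created partition directories with the metastore.
      spark.sql("MSCK REPAIR TABLE mydb.tablename") // hypothetical table name
    }

Note that `MSCK REPAIR TABLE` rescans the whole table location; for tables with many partitions it is cheaper to add only the partitions you know are new with `ALTER TABLE tablename ADD IF NOT EXISTS PARTITION (...)`.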