How to execute hql files with multiple SQL queries per single file?

I have an .hql file that contains a lot of Hive queries, and I would like to execute the whole file using Spark SQL.

This is what I have tried.

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) 

Usually to execute individual queries we do it this way:

sqlContext.sql("SELECT * from table")

However, when the .hql file contains hundreds of queries, I do it like this:

import scala.io.Source 

val filename = "/path/to/file/filename.hql"
for (line <- Source.fromFile(filename).getLines) {
    sqlContext.sql(line)
}

However, I get an error saying:

NoViableAltException

This is the top of the file.

DROP TABLE dat_app_12.12_app_htf;

CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string,
  ftg string,
  product_type string,
  prod_number string,
  prod_ident_number string,
  prod_family string,
  frst_reg_date date, gth_name string,
  address string,
  tel string,
  maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file_location';

When the queries are multi-line queries like the one above, it doesn't work. However, when I reformat the queries so that each one sits on a single line, it works:

CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string, ftg string, product_type string, prod_number string, prod_ident_number string, prod_family string, frst_reg_date date, gth_name string, address string, tel string, maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'file_location';

But I have thousands of lines like this. What is the proper way to do it?

Can anyone help me solve this?

asked Jan 30 '26 by Natheer Alabsi
2 Answers

tl;dr I don't think it's possible.

Spark SQL uses AstBuilder as its ANTLR-based SQL parser, and it accepts a single SQL statement at a time (see SqlBase.g4 for the full coverage of all supported SQL queries).

With that said, the only way to do it is to parse the multi-query input file yourself before handing each statement to Spark SQL's sqlContext.sql (or spark.sql as of Spark 2.0).

You could perhaps rely on empty lines as separators, but that depends on how the input files are structured (and they could just as easily use semicolons instead).
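As a sketch of that idea, here is a hypothetical helper (my own naming) that treats one or more blank lines as statement separators, assuming no blank line ever occurs inside a single statement:

```scala
// Hypothetical splitter: one or more blank lines separate statements.
// Assumes no blank lines appear inside a single statement.
def splitOnBlankLines(sql: String): Seq[String] =
  sql.split("\n\\s*\n")   // a newline, optional whitespace, another newline
     .map(_.trim)
     .filter(_.nonEmpty)
     .toSeq
```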


In your particular case, I've noticed that the end markers are actually semicolons:

// one query that ends with semicolon
DROP TABLE dat_app_12.12_app_htf;

// another query that also ends with semicolon
CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string,
  ftg string,
  product_type string,
  prod_number string,
  prod_ident_number string,
  prod_family string,
  frst_reg_date date, gth_name string,
  address string,
  tel string,
  maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file_location';

If that's consistent, you could parse the file line by line (as you already do with the for expression) and accumulate lines until a ; is found. Spark SQL handles multi-line SQL queries just fine, so that should give you your solution.
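A minimal sketch of that approach (the splitStatements helper is hypothetical, and assumes semicolons never appear inside string literals or comments):

```scala
import scala.io.Source

// Hypothetical helper: split a script on semicolons into individual
// statements. Caveat: this breaks if a ';' appears inside a quoted
// string or a comment, which a robust splitter would have to handle.
def splitStatements(script: String): Seq[String] =
  script.split(";")
        .map(_.trim)
        .filter(_.nonEmpty)
        .toSeq

// Usage against the file from the question:
// val script = Source.fromFile("/path/to/file/filename.hql").mkString
// splitStatements(script).foreach(stmt => sqlContext.sql(stmt))
```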


I had a similar use case in a project and simply gave up trying to parse all the possible ways people write SQL before I could hand it over to Spark.

answered Feb 02 '26 by Jacek Laskowski

Hey, did you try this command?

spark-sql --master yarn-client --conf spark.ui.port=port -i /hadoop/sbscript/hql_for_dml.hql
answered Feb 02 '26 by subodh

