How to execute hql files with multiple SQL queries per single file?

I have an .hql file that contains a lot of Hive queries, and I would like to execute the whole file using Spark SQL.

This is what I have tried.

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) 

Usually to execute individual queries we do it this way:

sqlContext.sql("SELECT * from table")

However, when the .hql file contains hundreds of queries, I do it like this:

import scala.io.Source 

val filename = "/path/to/file/filename.hql"
for (line <- Source.fromFile(filename).getLines) {
    sqlContext.sql(line)
}

However, I get an error saying:

NoViableAltException

This is the top of the file.

DROP TABLE dat_app_12.12_app_htf;

CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string,
  ftg string,
  product_type string,
  prod_number string,
  prod_ident_number string,
  prod_family string,
  frst_reg_date date, gth_name string,
  address string,
  tel string,
  maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file_location';

When the queries are multi-line queries like the one above, it doesn't work. However, when I reformat the queries so that each one sits on a single line, it works:

CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string, ftg string, product_type string, prod_number string, prod_ident_number string, prod_family string, frst_reg_date date, gth_name string, address string, tel string, maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'file_location';

But I have thousands of lines like this. What is the proper way to do it?

Can anyone help me solve this?

asked Jan 30 '26 by Natheer Alabsi
2 Answers

tl;dr I don't think it's possible.

Spark SQL uses AstBuilder as its ANTLR-based SQL parser, and it accepts a single SQL statement at a time (see SqlBase.g4 for the full coverage of all supported SQL queries).

With that said, the only way to do it is to parse the multi-query input file yourself before handing each statement to Spark SQL's sqlContext.sql (or spark.sql as of Spark 2.0).

You could perhaps rely on empty lines as separators, but that depends on how the input files are structured (and they could just as easily use semicolons instead).
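As a sketch of that idea, here is a hypothetical helper (my own naming) that treats one or more blank lines as statement separators, assuming no blank line ever occurs inside a single statement:

```scala
// Hypothetical splitter: one or more blank lines separate statements.
// Assumes no blank lines appear inside a single statement.
def splitOnBlankLines(sql: String): Seq[String] =
  sql.split("\n\\s*\n")   // a newline, optional whitespace, another newline
     .map(_.trim)
     .filter(_.nonEmpty)
     .toSeq
```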


In your particular case, I've noticed that the end markers are actually semicolons:

// one query that ends with semicolon
DROP TABLE dat_app_12.12_app_htf;

// another query that also ends with semicolon
CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string,
  ftg string,
  product_type string,
  prod_number string,
  prod_ident_number string,
  prod_family string,
  frst_reg_date date, gth_name string,
  address string,
  tel string,
  maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file_location';

If that's consistent, you could parse the file line by line (as you already do with the for expression) and accumulate lines until a ; is found. Spark SQL handles multi-line SQL queries just fine, so that should give you your solution.
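A minimal sketch of that approach (the splitStatements helper is hypothetical, and assumes semicolons never appear inside string literals or comments):

```scala
import scala.io.Source

// Hypothetical helper: split a script on semicolons into individual
// statements. Caveat: this breaks if a ';' appears inside a quoted
// string or a comment, which a robust splitter would have to handle.
def splitStatements(script: String): Seq[String] =
  script.split(";")
        .map(_.trim)
        .filter(_.nonEmpty)
        .toSeq

// Usage against the file from the question:
// val script = Source.fromFile("/path/to/file/filename.hql").mkString
// splitStatements(script).foreach(stmt => sqlContext.sql(stmt))
```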


I had a similar use case in a project and simply gave up trying to parse all the possible ways people write SQL before I could hand it over to Spark.

answered Feb 02 '26 by Jacek Laskowski

Hey, did you try this command?

spark-sql --master yarn-client --conf spark.ui.port=port -i /hadoop/sbscript/hql_for_dml.hql
answered Feb 02 '26 by subodh

