As I read the Spark/Hive SQL documentation, it appears that INSERT INTO a table with a column list is not supported in Spark 2.4 and earlier versions.
I have a source table and a destination table with a different number of columns and different column names, between which I need to copy data.
Does this mean I have to code this in PySpark, since Spark SQL will not be able to do it?
Example:
input_table( cola, colb, colc, cold, cole)
output_table(fieldx, fieldy, fieldz)
In SQL (assuming an RDBMS such as MS-SQL, PostgreSQL, etc.) I would do the following:
insert into output_table(fieldx, fieldy, fieldz) select cola, colb, colc from input_table
Spark SQL does not allow this; it does not accept a column list in the INSERT statement.
Question: how can I do this task with a minimum of code and maximum performance, in either PySpark or (ideally) Spark SQL? (I am using Spark 2.4.)
thank you
Specify every column of output_table in the select, in the table's column order, and select null for any column that won't be copied from input_table. (This is what would happen if only a subset of the columns, not all of them, were inserted with a column list, if that were allowed.)
insert into output_table
select cola, colb, colc, null as other1  -- null for each output column with no source in input_table (other1 here is illustrative)
from input_table
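If you'd rather avoid the SQL statement, the same copy can be done with the PySpark DataFrame API. Below is a minimal sketch, assuming both tables already exist in the Hive metastore; the table and column names are the ones from the question, and insertInto matches columns by position.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Read only the needed columns and rename them to match output_table's schema.
src = (spark.table("input_table")
            .select(F.col("cola").alias("fieldx"),
                    F.col("colb").alias("fieldy"),
                    F.col("colc").alias("fieldz")))

# insertInto resolves columns by position, so keep the select order identical
# to output_table's column order; pass overwrite=True to replace existing rows.
src.write.insertInto("output_table")

If output_table had extra columns with no source in input_table, you could add them in the select with F.lit(None).cast(...) in the right positions, mirroring the null trick shown in the SQL above.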