
Cannot have map type columns in DataFrame which calls set operations

: org.apache.spark.sql.AnalysisException: Cannot have map type columns in DataFrame which calls set operations(intersect, except, etc.), but the type of column map_col is map

I have a Hive table with a column of type MAP<Float, Float>. I get the above error when I try to insert into this table from a Spark context. The insert works fine without the 'distinct'.

create table test_insert2(`test_col` string, `map_col` MAP<INT,INT>) 
location 's3://mybucket/test_insert2';

insert into test_insert2 
select distinct 'a' as test_col, map(0,0) as map_col
jay.cs asked Sep 07 '25 06:09



1 Answer

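Convert the DataFrame to an RDD with .rdd, then call .distinct on it. The RDD-level distinct compares whole Row objects, so it does not go through the Spark SQL check that rejects map columns in set operations.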

Example:

// Two identical rows; .rdd.distinct collapses them into one.
spark.sql("""select 'a' as test_col, map(0,0) as map_col
              union all
             select 'a' as test_col, map(0,0) as map_col""").rdd.distinct.collect

Result:

Array[org.apache.spark.sql.Row] = Array([a,Map(0 -> 0)])
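
The example above stops at collect, while the question is about inserting into a table. A minimal sketch of carrying the same idea through to an insert could look like the following. The helper names distinctViaRdd and insertDistinct, the app name, and the local master setting are assumptions made for illustration; they are not part of the original question or answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: de-duplicate at the RDD level, then rebuild a
// DataFrame with the original schema so it can be written like any other.
def distinctViaRdd(spark: SparkSession, df: DataFrame): DataFrame =
  spark.createDataFrame(df.rdd.distinct(), df.schema)

// Hypothetical helper: append the de-duplicated rows to an existing table.
// insertInto matches columns by position, not by name.
def insertDistinct(spark: SparkSession, df: DataFrame, table: String): Unit =
  distinctViaRdd(spark, df).write.mode("append").insertInto(table)

// Assumption: a local session for trying this out. Inside spark-shell,
// getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder()
  .appName("distinct-map-sketch")
  .master("local[*]")
  .getOrCreate()

// Same two identical rows as in the answer.
val df = spark.sql(
  """select 'a' as test_col, map(0,0) as map_col
    |union all
    |select 'a' as test_col, map(0,0) as map_col""".stripMargin)

val deduped = distinctViaRdd(spark, df)
deduped.show(false)
```

With a Hive-enabled session and the table from the question already created, insertDistinct(spark, df, "test_insert2") should append the single remaining row. Going through the RDD gives up Catalyst optimizations for that step, so this is probably best kept to the part of the job that actually needs distinct on a map column.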
notNull answered Sep 09 '25 03:09
