Cannot have map type columns in DataFrame which calls set operations

Question

: org.apache.spark.sql.AnalysisException: Cannot have map type columns in DataFrame which calls set operations(intersect, except, etc.), but the type of column map_col is map

I have a Hive table with a column of type MAP<Float, Float>. I get the above error when I try to insert into this table from Spark. The insert works fine without the 'distinct'.

create table test_insert2(`test_col` string, `map_col` MAP<INT,INT>) 
location 's3://mybucket/test_insert2';

insert into test_insert2 
select distinct 'a' as test_col, map(0,0) as map_col

notNull · Accepted Answer

Try converting the DataFrame to an RDD with .rdd, then apply .distinct to it.

Example:

spark.sql("""select 'a' as test_col, map(0, 0) as map_col
             union all
             select 'a' as test_col, map(0, 0) as map_col""").rdd.distinct.collect

Result:

Array[org.apache.spark.sql.Row] = Array([a,Map(0 -> 0)])
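
To get from the de-duplicated RDD back to something you can insert, one option is to rebuild a DataFrame from it using the original schema. The following is only a minimal sketch, not a tested fix for the exact setup in the question: the object name `MapDistinctSketch`, the helper `distinctWithMaps`, and the local master setting are made up for illustration, and the final insert is left commented out because it assumes the `test_insert2` table exists and that the session was built with Hive support.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MapDistinctSketch {

  // Hypothetical helper: de-duplicate rows at the RDD level (where map
  // columns are allowed), then rebuild a DataFrame with the same schema.
  def distinctWithMaps(spark: SparkSession, df: DataFrame): DataFrame =
    spark.createDataFrame(df.rdd.distinct(), df.schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-distinct-sketch")
      .master("local[*]") // assumption: local run, adjust for your cluster
      .getOrCreate()

    // Same two duplicate rows as in the answer above.
    val df = spark.sql(
      """select 'a' as test_col, map(0, 0) as map_col
        |union all
        |select 'a' as test_col, map(0, 0) as map_col""".stripMargin)

    val deduped = distinctWithMaps(spark, df)
    deduped.show()

    // Assumption: the Hive table from the question already exists and the
    // session has Hive support enabled. insertInto matches columns by position.
    // deduped.write.mode("append").insertInto("test_insert2")

    spark.stop()
  }
}
```

Note that this moves the de-duplication out of the SQL statement, so the `insert into ... select distinct` from the question becomes a DataFrame write instead.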

Tags: apache-spark-sql, pyspark, hive, amazon-emr

jay.cs
