Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive aggregation function that produces a map

Tags:

sql

hadoop

hive

I have the following hive table

ID, class,value

1, A, 0.3

1, B, 0.4

1, C, 0.5

2, B, 0.1

2, C, 0.2

I want to get

ID, class:value

1, [A:0.3, B:0.4, C:0.5]

2, [B:0.1, C:0.2]


I know that there is a collect_set() UDAF that produces a list of class or list of value, is there anyway to get a list of key:value pairs?

NOTE: I guess I can use two collect_set() one for class column and one for value column but I am not sure if the lists will be in the same order.

like image 339
viper Avatar asked Jun 09 '26 03:06

viper


1 Answers

I've used the UnionUDAF from the Brickhouse library to do something similar. You create a map from each pair, and then union them all together during the aggregation.

Add JAR brickhouse.jar;
create temporary function BH_union as 'brickhouse.udf.collect.UnionUDAF';

SELECT S.ID, BH_union(S.v_map) 
FROM (SELECT ID, map(class, value) as v_map from mytable) S
GROUP by S.ID
like image 76
libjack Avatar answered Jun 10 '26 19:06

libjack