I have a table like the following:
User:String Alias:String
JohnDoe     John
JohnDoe     JDoe
Roger       Roger
And I would like to group all the aliases of a user into an array, in a new table that would look like this:
User:String Alias:array<String>
JohnDoe     [John, JDoe]
Roger       [Roger]
I can't figure out how to do that with HiveQL. Do I have to write a UDF for that?
Thanks!
Data aggregation is the process of gathering and expressing data in a summary to get more information about particular groups based on specific conditions. HQL offers several built-in aggregate functions, such as max(...), min(...), and avg(...).
The Hive split function splits a given string into an array of values. The syntax is split(str, pat), where str is the string to be split and pat is the delimiter, which Hive interprets as a regular expression.
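As a minimal sketch of split, the query below uses a literal string rather than a real table, and assumes a Hive version that accepts SELECT without a FROM clause (0.13 or later):

```sql
-- Split a comma-separated literal into an array<string>.
-- The second argument is a regular expression; a plain comma needs no escaping.
SELECT split('John,JDoe', ',') AS aliases;
-- The result is a single row holding a two-element array: John, JDoe
```

Because the delimiter is a regular expression, characters such as `.` or `|` would need escaping.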
Explode function syntax: select explode(<MAP>) from <table_name>; It returns n rows, where n is the size of the array or map. Each element of the array or map becomes its own row.
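A short sketch of explode follows. The table name user_alias_arrays and its column aliases are hypothetical, chosen only to mirror the grouped table from the question; the first query needs no table at all, assuming a Hive version that allows SELECT without FROM.

```sql
-- Turn a two-element array into two rows: John, then JDoe.
SELECT explode(array('John', 'JDoe')) AS alias;

-- To keep other columns next to the exploded values, use LATERAL VIEW.
-- user_alias_arrays and aliases are hypothetical names for illustration.
SELECT t.`user`, a.alias
FROM user_alias_arrays t
LATERAL VIEW explode(t.aliases) a AS alias;
```

The second query is effectively the inverse of grouping aliases into an array: it produces one row per (user, alias) pair again.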
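count(*), count(expr), count(DISTINCT expr): count(*) returns the total number of retrieved rows, count(expr) returns the number of rows for which expr is not NULL, and count(DISTINCT expr) returns the number of distinct non-NULL values. sum(col), sum(DISTINCT col): returns the sum of the elements in the group or the sum of the distinct values of the column in the group. avg(col), avg(DISTINCT col): returns the average of the elements in the group or the average of the distinct values of the column in the group.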
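Applied to the table from the question, a count per user might look like the sketch below. The table name user_aliases is an assumption, and `user` is backtick-quoted in case the Hive version treats it as a reserved keyword.

```sql
-- Count all rows and distinct aliases for each user.
-- user_aliases is a hypothetical name for the table in the question.
SELECT
    `user`,
    count(*)              AS alias_rows,
    count(DISTINCT alias) AS distinct_aliases
FROM user_aliases
GROUP BY `user`;
-- With the sample data: JohnDoe has 2 rows and 2 distinct aliases, Roger has 1 and 1.
```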
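Check out the built-in aggregate function collect_set. It returns an array of the distinct values in each group, so no UDF is needed. If duplicate aliases must be kept, collect_list does the same without removing duplicates.
-- "table" is a placeholder here; substitute your actual table name.
-- If your Hive version treats User as a reserved keyword, quote it with backticks.
select 
    User, 
    collect_set(Alias) as Alias
from table
group by User;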
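Since the question asks for the result in a new table, one possible approach is a CREATE TABLE ... AS SELECT around the same query. This is a sketch under assumptions: user_aliases (source) and user_alias_arrays (result) are hypothetical table names, and `user` is backtick-quoted in case it is a reserved keyword in your Hive version.

```sql
-- Materialize the grouped aliases into a new table.
-- user_aliases and user_alias_arrays are hypothetical names.
CREATE TABLE user_alias_arrays AS
SELECT
    `user`,
    collect_set(alias) AS aliases   -- column type becomes array<string>
FROM user_aliases
GROUP BY `user`;
```

The order of elements inside each array is not guaranteed, so JohnDoe may come out as [John, JDoe] or [JDoe, John].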