Percentile calculation in Pig Latin

I'm trying to calculate percentiles using Pig. I need to group the data by an attribute and calculate percentiles for each tuple in the group based on sales.

As far as I can tell, Pig has no built-in function for this. Has anyone faced a similar problem and can help?
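The phrase "percentiles for each tuple" can also be read as a percentile rank per row. The sketch below is plain Python and covers only that reading. It is neither Pig nor a DataFu API. The hypothetical helper `percentile_ranks` groups rows by their first field. For each row, it reports the percentage of values in the same group that are less than or equal to that row's value. That rank definition is an assumption, and other conventions exist.

```python
from bisect import bisect_right
from collections import defaultdict


def percentile_ranks(rows):
    """Hypothetical helper: return (item, value, rank) for each row, where
    rank is the percentage of values in the same item group that are <= value."""
    # Group values by the grouping attribute (similar to GROUP data BY item).
    groups = defaultdict(list)
    for item, value in rows:
        groups[item].append(value)
    # Sort each group once so ranks can be found with binary search.
    for values in groups.values():
        values.sort()
    result = []
    for item, value in rows:
        values = groups[item]
        # bisect_right returns how many sorted values are <= value.
        at_or_below = bisect_right(values, value)
        result.append((item, value, 100.0 * at_or_below / len(values)))
    return result


rows = [("item1", 234), ("item1", 324), ("item1", 769),
        ("item2", 23), ("item2", 23), ("item2", 45)]
for row in percentile_ranks(rows):
    print(row)
```

With this sample data, item1 gets ranks of about 33.33, 66.67 and 100.0. Both item2 rows with value 23 get about 66.67, and the row with 45 gets 100.0.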

Sudheer Someshwara asked Dec 09 '25 05:12


1 Answer

As JaiPrakash mentioned, you can use the StreamingQuantile UDF from the Apache DataFu library. I already had an example ready, so I'll copy it here.

Input

item1,234
item1,324
item1,769
item2,23
item2,23
item2,45

Pig script

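register datafu-1.2.0.jar;
-- Estimate the 0th, 50th and 100th percentiles (min, median, max)
define Quantile datafu.pig.stats.StreamingQuantile('0.0','0.5','1.0');
data = load 'data' using PigStorage(',') as (item:chararray, value:int);
-- Group by item and compute the quantiles of each group's values
quantiles = FOREACH (GROUP data by item) GENERATE group, Quantile(data.value);
dump quantiles;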

Output

(item1,(234.0,324.0,769.0))
(item2,(23.0,23.0,45.0))
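The plain-Python sketch below shows where these numbers come from by computing exact per-group quantiles for the same input. It is only an illustration. The function name `quantiles_by_group` is hypothetical, and linear interpolation between ranks is an assumed quantile definition. StreamingQuantile computes approximate quantiles, so on large groups its results can differ slightly from an exact computation like this one.

```python
import math
from collections import defaultdict


def quantiles_by_group(rows, probs=(0.0, 0.5, 1.0)):
    """Hypothetical helper: exact quantiles per group, using linear
    interpolation between the two nearest ranks of the sorted values."""
    # Collect the values of each group (similar to GROUP data BY item).
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    result = {}
    for key in sorted(groups):
        values = sorted(groups[key])
        n = len(values)
        qs = []
        for p in probs:
            pos = p * (n - 1)  # fractional index into the sorted values
            lo, hi = math.floor(pos), math.ceil(pos)
            frac = pos - lo
            qs.append(float(values[lo] + (values[hi] - values[lo]) * frac))
        result[key] = tuple(qs)
    return result


rows = [("item1", 234), ("item1", 324), ("item1", 769),
        ("item2", 23), ("item2", 23), ("item2", 45)]
for key, qs in quantiles_by_group(rows).items():
    print((key, qs))
```

It prints `('item1', (234.0, 324.0, 769.0))` and `('item2', (23.0, 23.0, 45.0))`, which matches the Pig output for this small input.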
Jerome Serrano answered Dec 12 '25 01:12