
Pig equivalent of SQL GREATEST / LEAST?

Tags:

apache-pig

I'm trying to find the Pig equivalent of the SQL functions GREATEST and LEAST. These functions are the scalar equivalent of the aggregate SQL functions MAX and MIN, respectively.

Essentially, I want to be able to say something like this:

x = LOAD 'file:///a/b/c.csv' USING PigStorage() AS (a: int, b: int, c: int);
y = FOREACH x GENERATE a AS a: int, b AS b: int, c AS c: int, GREATEST(a, b, c) AS g: int;

I know I could use bags and MAX to get this done, but I'm translating from another language into Pig and that implementation would be difficult to integrate.

Is there an "inline" approach I could use here? Some builtin function I'm overlooking, or maybe a UDF in Piggybank or DataFu, for example, would be ideal! If there's a completely "inline" version that uses bags and I'm just not thinking of it, that's fine too!

Thank you!

sigpwned asked Jan 17 '26 15:01


1 Answer

It turns out there is an "inline" bag-based approach that works: build a bag from the fields with TOBAG and apply MAX to it.

x = LOAD 'file:///a/b/c.csv' USING PigStorage() AS (a: int, b: int, c: int);
y = FOREACH x GENERATE a AS a: int, b AS b: int, c AS c: int, MAX(TOBAG(a, b, c)) AS g: int;
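
For the LEAST half of the question, the same pattern should carry over with MIN in place of MAX. The sketch below reuses the load statement and column names from the question; the aliases l, g2, and z are made up here for illustration, and the nested bincond line is an alternative I'm adding that avoids bags entirely, not something from the original answer.

```pig
x = LOAD 'file:///a/b/c.csv' USING PigStorage() AS (a: int, b: int, c: int);

-- GREATEST and LEAST via single-row bags built with TOBAG.
y = FOREACH x GENERATE
        a, b, c,
        MAX(TOBAG(a, b, c)) AS g: int,
        MIN(TOBAG(a, b, c)) AS l: int;

-- Bag-free alternative for GREATEST using nested bincond operators.
-- (g2 and z are hypothetical names used only for this sketch.)
z = FOREACH x GENERATE
        a, b, c,
        (a > b ? (a > c ? a : c) : (b > c ? b : c)) AS g2: int;
```

One caveat worth checking against the SQL dialect being translated: as far as I know, Pig's MAX and MIN skip null values, while a bincond whose comparison involves a null should itself yield null. The two forms can therefore disagree on rows that contain nulls, so pick the one that matches how GREATEST and LEAST treat NULL in the source language.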
sigpwned answered Jan 21 '26 07:01