 

How can I find the maximum temperature in a given dataset using Pig?

Tags: apache-pig

Suppose I have the following dataset:

Year Temp

1974 48
1974 48
1991 56
1983 89
1993 91
1938 41
1938 56
1941 93
1983 87

I want my final answer to be 93 (the temperature recorded for the year 1941). I am able to find the maximum temperature for each year (for example, 1941: 93), but I am unable to find the single overall maximum. Any suggestions are appreciated.

Thanks,

Gautham Honnavara asked Dec 02 '25 08:12



1 Answer

You can solve this problem in two ways.

Option 1: Using GROUP ALL + MAX

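-- Load the data; PigStorage() splits fields on tabs by default
A = LOAD 'input' USING PigStorage() AS (Year:int,Temp:int);
-- Put every row into a single group so MAX sees the whole dataset
B = GROUP A ALL;
C = FOREACH B GENERATE MAX(A.Temp);
DUMP C;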

Output:

(93)

Option 2: Using ORDER and LIMIT

A = LOAD 'input' USING PigStorage() AS (Year:int,Temp:int);
-- Sort by temperature, highest first, then keep only the top row
B = ORDER A BY Temp DESC;
C = LIMIT B 1;
D = FOREACH C GENERATE Temp;
DUMP D;

Output:

(93)
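
If the year that the maximum belongs to is also needed, the ORDER and LIMIT approach can keep both columns. This is a minimal sketch, assuming the same tab-delimited 'input' file and the same (Year, Temp) schema as the scripts above:

```pig
-- Assumes 'input' is tab-delimited with the columns Year and Temp
A = LOAD 'input' USING PigStorage() AS (Year:int, Temp:int);
-- Sort by temperature, highest first
B = ORDER A BY Temp DESC;
-- Keep only the first (hottest) row
C = LIMIT B 1;
-- Project both columns instead of Temp alone
D = FOREACH C GENERATE Year, Temp;
DUMP D;
```

Output:

(1941,93)

If several rows share the maximum temperature, LIMIT 1 returns only one of them.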
Sivasakthi Jayaraman answered Dec 05 '25 04:12
