 

Passing custom parameters to a Pig UDF function in Java

Tags:

apache-pig

This is how I want to process my data from Pig:

A = Load 'data' ...

B = FOREACH A GENERATE my.udfs.extract(*);
or

B = FOREACH A GENERATE my.udfs.extract('flag');

So basically, extract either takes no arguments or takes a single argument, 'flag'.

On the UDF side:

@Override
    public DataBag exec(Tuple input) throws IOException {
           //if flag == true
              //do this
           //else
              // do that
     }

Now, how do I implement this in Pig?

frazman asked Mar 19 '26 17:03


1 Answer

The preferred way is to use DEFINE.

"Use DEFINE to specify a UDF function when:
...
The constructor for the function takes string parameters. If you need to use different constructor parameters for different calls to the function you will need to create multiple defines – one for each parameter set"

For example, given the following UDF:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Extract extends EvalFunc<String> {

    private boolean flag;

    public Extract(String flag) {
        //Note that a boolean param cannot be passed from script/grunt
        //therefore pass it as a string
        this.flag = Boolean.valueOf(flag);
    }

    public Extract() {
    }

    public String exec(Tuple input) throws IOException {

        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            if (flag) {
                ...
            }
            else {
                ...
            }
        }
        catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}

Then

define ex_arg my.udfs.Extract('true');
define ex my.udfs.Extract();
...
B = foreach A generate ex_arg(*); -- calls Extract with flag set to true
C = foreach A generate ex(*); -- calls Extract with flag left at its default (false)
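
To make the string-to-boolean step concrete, here is a minimal plain-Java sketch of the constructor pattern. It deliberately uses no Pig classes: the class name ConstructorFlagSketch, the List<Object> stand-in for Tuple, and the "flagged:"/"plain:" return values are all assumptions made for illustration, since the real processing is elided above.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the constructor-argument pattern; no Pig classes are used.
public class ConstructorFlagSketch {

    private final boolean flag;

    // Mirrors Extract(String): DEFINE hands the constructor string parameters.
    public ConstructorFlagSketch(String flag) {
        // Boolean.valueOf yields true only for "true" (case-insensitive);
        // any other string, or null, yields false.
        this.flag = Boolean.valueOf(flag);
    }

    // Mirrors Extract(): the flag stays false.
    public ConstructorFlagSketch() {
        this.flag = false;
    }

    // Hypothetical stand-in for exec(Tuple): a List<Object> plays the role of the tuple.
    public String exec(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        // Placeholder branches; the original answer elides the real work.
        return (flag ? "flagged:" : "plain:") + input.get(0);
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("a", "b");
        System.out.println(new ConstructorFlagSketch("true").exec(row));
        System.out.println(new ConstructorFlagSketch("yes").exec(row));
        System.out.println(new ConstructorFlagSketch().exec(row));
    }
}
```

Note that a value such as 'yes' or '1' silently becomes false, so the script should pass exactly 'true' (in any letter case) when the flag is meant to be set.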


Another option (a hack?):

In this case the UDF is instantiated with its no-arg constructor, and you pass the flag you want to evaluate as an argument to its exec method. Since exec receives a single tuple, you first need to check whether the first field is the boolean flag.

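import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;

// Named Extract2 to match the calls in the script that follows
public class Extract2 extends EvalFunc<String> {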

    public String exec(Tuple input) throws IOException {

        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            boolean flag = false;
            if (input.getType(0) == DataType.BOOLEAN) {
                flag = (Boolean) input.get(0);
            }
            //process rest of the fields in the tuple
            if (flag) {
               ...
            }
            else {
               ...
            }
        }
        catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}

Then

...
B = foreach A generate Extract2(true, *); -- use flag
C = foreach A generate Extract2(*); -- no flag passed, so it defaults to false

I'd rather stick with the first solution, as this one smells.
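
A rough plain-Java sketch of the second option shows where the smell comes from. As before, no Pig classes are involved: LeadingFlagSketch, the List<Object> stand-in for Tuple, and the returned strings are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the "flag as first field" pattern; no Pig classes are used.
public class LeadingFlagSketch {

    // Hypothetical stand-in for exec(Tuple): a List<Object> plays the role of the tuple.
    public static String exec(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        boolean flag = false;
        int firstDataField = 0;
        // Treat the first field as the flag only if it really is a Boolean.
        if (input.get(0) instanceof Boolean) {
            flag = (Boolean) input.get(0);
            firstDataField = 1; // skip the flag when processing the data fields
        }
        List<Object> data = input.subList(firstDataField, input.size());
        // Placeholder result; the original answer elides the real work.
        return (flag ? "flagged:" : "plain:") + data;
    }

    public static void main(String[] args) {
        System.out.println(exec(Arrays.<Object>asList(true, "a", "b")));
        System.out.println(exec(Arrays.<Object>asList("a", "b")));
        // A record whose first data field happens to be a boolean is misread as a flag.
        System.out.println(exec(Arrays.<Object>asList(false, "a")));
    }
}
```

The last call illustrates the weakness: the function cannot tell a leading flag from a record whose first column is itself a boolean, which the constructor-based approach avoids.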

Lorand Bendig answered Mar 24 '26 21:03