I am relatively new to Pig scripting. Is there a way to pass parameters to Java UDFs in Pig?
Here is the scenario: I have a log file which has several columns (each representing a primary key in another table). My task is to get the count of distinct primary key values in a selected column. I have written a Pig script which gets the distinct primary keys and counts them. However, I am now supposed to write a new UDF for each column. Is there a better way to do this? If I could pass a row number as a parameter to the UDF, I would not need to write multiple UDFs.
Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in six languages: Java, Jython, Python, JavaScript, Ruby and Groovy. The most extensive support is provided for Java functions.
Eval functions are the ones used in FOREACH ... GENERATE statements. An eval function accepts a Pig value as input and returns a Pig result.
The way to do it is by using DEFINE and the constructor of the UDF: the arguments given in DEFINE are passed to the constructor as strings. So here is an example of a custom "splitter":
REGISTER com.sample.MyUDFs.jar;
DEFINE CommaSplitter com.sample.MySplitter(',');
B = FOREACH A GENERATE f1, CommaSplitter(f2);
Hopefully that conveys the idea.
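On the Java side, the constructor approach could look roughly like the sketch below. It is written without Pig on the classpath so it can be run on its own; the class name MySplitterSketch and the method split are hypothetical stand-ins. In a real UDF the class would extend org.apache.pig.EvalFunc and exec(Tuple) would do the work that split does here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Standalone sketch of a UDF configured through its constructor.
// Assumption: a real version would extend org.apache.pig.EvalFunc.
class MySplitterSketch {
    private final Pattern delimiter;

    // DEFINE arguments arrive as strings, so the delimiter is a String here.
    // Pattern.quote makes characters such as '|' or '.' split literally.
    MySplitterSketch(String delimiter) {
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
    }

    // Splits one field value; a limit of -1 keeps trailing empty fields.
    List<String> split(String value) {
        if (value == null) {
            return null;
        }
        return Arrays.asList(delimiter.split(value, -1));
    }

    public static void main(String[] args) {
        // Mirrors: DEFINE CommaSplitter com.sample.MySplitter(',');
        MySplitterSketch commaSplitter = new MySplitterSketch(",");
        System.out.println(commaSplitter.split("u1,p7,s3")); // [u1, p7, s3]
    }
}
```

Because the value is fixed when the instance is created, one class can back several DEFINE aliases, each with a different argument.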
To pass parameters you do the following in your Pig script:
UDF(document, '$param1', '$param2', '$param3')
edit: Not sure if those params need to be wrapped in ' ' or not
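while in your UDF you do, for example:
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;

public class UDF extends EvalFunc<Boolean> {
    public Boolean exec(Tuple input) throws IOException {
        // Expect the document plus three parameters (four fields in total).
        if (input == null || input.size() < 4)
            return false;
        FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
        // Fields 1 to 3 carry the parameters from the script (file paths here).
        String var1 = input.get(1).toString();
        InputStream var1In = fs.open(new Path(var1));
        String var2 = input.get(2).toString();
        InputStream var2In = fs.open(new Path(var2));
        String var3 = input.get(3).toString();
        InputStream var3In = fs.open(new Path(var3));
        // doyourthing is a placeholder for your own logic; close the streams when done.
        return doyourthing(input.get(0).toString());
    }
}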
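Applied to the column-selection question, the call-time approach might be sketched as follows. This is only an illustration under stated assumptions: List<Object> stands in for Pig's Tuple so the code runs without Pig, the class name CallTimeColumnPicker is hypothetical, and the input line is assumed to be comma-delimited.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch: parameters passed in the call arrive as extra fields
// of the input, after the data field. List<Object> stands in for Tuple.
class CallTimeColumnPicker {
    // Field 0: the delimited line. Field 1: the column index from the call.
    static String exec(List<Object> input) {
        if (input == null || input.size() < 2
                || input.get(0) == null || input.get(1) == null) {
            return null;
        }
        String line = input.get(0).toString();
        int column = Integer.parseInt(input.get(1).toString());
        // Assumption: comma-delimited input; -1 keeps trailing empty fields.
        String[] fields = line.split(",", -1);
        return (column >= 0 && column < fields.length) ? fields[column] : null;
    }

    public static void main(String[] args) {
        // Corresponds to a call such as UDF(line, '1') in the script.
        System.out.println(exec(Arrays.<Object>asList("u1,p7,s3", "1"))); // p7
    }
}
```

Unlike a constructor argument, the index is read and parsed for every row, so this form suits values that can vary per call, while DEFINE suits values fixed for the whole script.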