How to apply map function in Spark DataFrame using Java?

I am trying to use the map function on a DataFrame in Spark using Java. I am following the documentation, which says:

map(scala.Function1 f, scala.reflect.ClassTag evidence$4) Returns a new RDD by applying a function to all rows of this DataFrame.

When using Function1 in map, I need to implement all of its methods. I have seen some questions related to this, but the solutions provided convert the DataFrame into an RDD. How can I use the map function on a DataFrame without converting it into an RDD, and what is the second parameter of map, i.e. scala.reflect.ClassTag<R> evidence$4?

I am using Java 7 and Spark 1.6.

asked Dec 07 '25 by talin


1 Answer

I know your question is about Java 7 and Spark 1.6, but in Spark 2 (and, obviously, Java 8) you can implement the map function as a class, so you do not have to work with Java lambdas at all.

The call would look like:

// df is a Dataset<Row> (a DataFrame); map produces a typed Dataset<String>
Dataset<String> dfMap = df.map(
    new CountyFipsExtractorUsingMap(),
    Encoders.STRING());
dfMap.show(5);
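Note that Encoders.STRING() here plays the role that the scala.reflect.ClassTag evidence parameter played in the Spark 1.6 signature you quoted: it tells Spark how to represent and serialize the result type, so you never have to construct a ClassTag yourself in the Dataset API.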

The class would look like:

  import org.apache.spark.api.java.function.MapFunction;
  import org.apache.spark.sql.Row;

  /**
   * Returns a substring of the values in the id2 column.
   * 
   * @author jgp
   */
  // static, so Spark does not try to serialize the enclosing instance
  private static final class CountyFipsExtractorUsingMap
      implements MapFunction<Row, String> {
    private static final long serialVersionUID = 26547L;

    @Override
    public String call(Row r) throws Exception {
      // Drop the first two characters of the id2 value
      String s = r.getAs("id2").toString().substring(2);
      return s;
    }
  }

You can find more details in this example on GitHub.
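For completeness on the original Java 7 / Spark 1.6 question: there, DataFrame.map takes an explicit scala.reflect.ClassTag as its second argument (the evidence$4 parameter), which is normally filled in implicitly by the Scala compiler; from Java you can build one from a Class via ClassTag$.MODULE$.apply. Here is a minimal sketch, assuming a DataFrame df with an id2 column (note the 1.6 API returns an RDD either way, which cannot be avoided on that version):

import java.io.Serializable;

import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;

import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

// Java 7: extend AbstractFunction1 instead of implementing
// scala.Function1 directly, so only apply() has to be written.
class CountyFipsExtractor extends AbstractFunction1<Row, String>
    implements Serializable {
  private static final long serialVersionUID = 1L;

  @Override
  public String apply(Row r) {
    return r.getAs("id2").toString().substring(2);
  }
}

// The ClassTag tells Spark the element type of the resulting RDD;
// this is the evidence$4 argument from the documentation.
ClassTag<String> stringTag = ClassTag$.MODULE$.apply(String.class);
RDD<String> mapped = df.map(new CountyFipsExtractor(), stringTag);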

answered Dec 08 '25 by jgp


