I am trying to use the map function on a DataFrame in Spark using Java. I am following the documentation, which says:

map(scala.Function1 f, scala.reflect.ClassTag evidence$4) Returns a new RDD by applying a function to all rows of this DataFrame.
When using Function1 in map, I would need to implement all of its methods. I have seen some questions related to this, but the solutions provided convert the DataFrame into an RDD.
How can I use the map function on a DataFrame without converting it into an RDD? Also, what is the second parameter of map, i.e. scala.reflect.ClassTag<R> evidence$4?
I am using Java 7 and Spark 1.6.
I know your question is about Java 7 and Spark 1.6, but in Spark 2 (and obviously Java 8), you can have the map function as part of a class, so you do not need to manipulate Java lambdas. The Encoder you pass as the second argument plays the same role the ClassTag evidence parameter played in Spark 1.6: it tells Spark the result type and how to serialize it.
The call would look like:
Dataset<String> dfMap = df.map(
    new CountyFipsExtractorUsingMap(),
    Encoders.STRING());
dfMap.show(5);
The class would look like this (it needs imports for org.apache.spark.api.java.function.MapFunction and org.apache.spark.sql.Row):
/**
 * Returns a substring of the values in the id2 column.
 *
 * @author jgp
 */
private final class CountyFipsExtractorUsingMap
    implements MapFunction<Row, String> {
  private static final long serialVersionUID = 26547L;

  @Override
  public String call(Row r) throws Exception {
    // Drops the first two characters of the id2 value.
    String s = r.getAs("id2").toString().substring(2);
    return s;
  }
}
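If you prefer not to write a dedicated class, the same transformation can be written inline as a Java 8 lambda (a minimal sketch using the same df and id2 column as above; the cast to MapFunction<Row, String> is needed to disambiguate between the Scala and Java overloads of map):

// Same transformation as CountyFipsExtractorUsingMap, written as a lambda.
Dataset<String> dfLambda = df.map(
    (MapFunction<Row, String>) r -> r.getAs("id2").toString().substring(2),
    Encoders.STRING());
dfLambda.show(5);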
You can find more details in this example on GitHub.
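And to answer the original Java 7 / Spark 1.6 part directly: evidence$4 is just a ClassTag describing the result type, which you can construct yourself from Java, and instead of implementing every method of scala.Function1 you can extend scala.runtime.AbstractFunction1, which leaves only apply() to write. Here is a minimal, untested sketch (the class and method names are mine, not from any library; note that DataFrame.map in 1.6 still returns an RDD, there is no way around that before the Spark 2 Dataset API):

import java.io.Serializable;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

public class FipsMapper16 {

  // AbstractFunction1 supplies Function1's helper methods (andThen, compose...),
  // so only apply() needs implementing. Serializable is required because Spark
  // ships the function to the executors.
  static class ExtractId2Suffix extends AbstractFunction1<Row, String>
      implements Serializable {
    @Override
    public String apply(Row r) {
      return r.getAs("id2").toString().substring(2);
    }
  }

  static RDD<String> extract(DataFrame df) {
    // This is the "evidence$4" argument: a ClassTag built from the Class object.
    ClassTag<String> stringTag = ClassTag$.MODULE$.apply(String.class);
    return df.map(new ExtractId2Suffix(), stringTag);
  }
}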