I am trying to use the map function on a DataFrame in Spark using Java. I am following the documentation, which says:

map(scala.Function1 f, scala.reflect.ClassTag evidence$4) Returns a new RDD by applying a function to all rows of this DataFrame.
When using Function1 in map, I would need to implement all of its methods. I have seen some questions related to this, but the solutions provided convert the DataFrame into an RDD.
How can I use the map function on a DataFrame without converting it into an RDD? Also, what is the second parameter of map, i.e. scala.reflect.ClassTag<R> evidence$4?
I am using Java 7 and Spark 1.6.
I know your question is about Java 7 and Spark 1.6, but in Spark 2 (and obviously Java 8), you can have the map function as part of a class, so you do not need to manipulate Java lambdas. The Encoder you pass as the second argument plays the same role the ClassTag evidence parameter played in Spark 1.6: it tells Spark the result type and how to serialize it.
The call would look like:
Dataset<String> dfMap = df.map(
    new CountyFipsExtractorUsingMap(),
    Encoders.STRING());
dfMap.show(5);
The class would look like this (it needs imports for org.apache.spark.api.java.function.MapFunction and org.apache.spark.sql.Row):
/**
 * Returns a substring of the values in the id2 column.
 *
 * @author jgp
 */
private final class CountyFipsExtractorUsingMap
    implements MapFunction<Row, String> {
  private static final long serialVersionUID = 26547L;

  @Override
  public String call(Row r) throws Exception {
    // Drops the first two characters of the id2 value.
    String s = r.getAs("id2").toString().substring(2);
    return s;
  }
}
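If you prefer not to write a dedicated class, the same transformation can be written inline as a Java 8 lambda (a minimal sketch using the same df and id2 column as above; the cast to MapFunction<Row, String> is needed to disambiguate between the Scala and Java overloads of map):

// Same transformation as CountyFipsExtractorUsingMap, written as a lambda.
Dataset<String> dfLambda = df.map(
    (MapFunction<Row, String>) r -> r.getAs("id2").toString().substring(2),
    Encoders.STRING());
dfLambda.show(5);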
You can find more details in this example on GitHub.
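And to answer the original Java 7 / Spark 1.6 part directly: evidence$4 is just a ClassTag describing the result type, which you can construct yourself from Java, and instead of implementing every method of scala.Function1 you can extend scala.runtime.AbstractFunction1, which leaves only apply() to write. Here is a minimal, untested sketch (the class and method names are mine, not from any library; note that DataFrame.map in 1.6 still returns an RDD, there is no way around that before the Spark 2 Dataset API):

import java.io.Serializable;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

public class FipsMapper16 {

  // AbstractFunction1 supplies Function1's helper methods (andThen, compose...),
  // so only apply() needs implementing. Serializable is required because Spark
  // ships the function to the executors.
  static class ExtractId2Suffix extends AbstractFunction1<Row, String>
      implements Serializable {
    @Override
    public String apply(Row r) {
      return r.getAs("id2").toString().substring(2);
    }
  }

  static RDD<String> extract(DataFrame df) {
    // This is the "evidence$4" argument: a ClassTag built from the Class object.
    ClassTag<String> stringTag = ClassTag$.MODULE$.apply(String.class);
    return df.map(new ExtractId2Suffix(), stringTag);
  }
}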