
Get a random element from an RDD

How can I efficiently select an element at random from an RDD of strings?

tourist asked Jan 29 '26 05:01



1 Answer

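You can use takeSample, which returns an array of randomly sampled elements. The first argument chooses sampling with or without replacement, and the second is the number of elements to take. For example: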

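// In spark-shell, sc is the ready-made SparkContext.
val data = sc.parallelize(Range(1,100)) // the integers 1 to 99
// data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[9] at parallelize at <console>:27

// withReplacement = false, num = 1: the result is an Array holding one element
data.takeSample(false,1)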
// res9: Array[Int] = Array(38)

data.takeSample(false,1)
// res10: Array[Int] = Array(72)

data.takeSample(false,1)
// res11: Array[Int] = Array(93)

If you want to fetch the same "random" element on every call, fix the seed:

data.takeSample(false, 1, seed = 10L)
// res14: Array[Int] = Array(62)

data.takeSample(false, 1, seed = 10L)
// res15: Array[Int] = Array(62)
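
Since the question asks about an RDD of strings and a single element rather than an array, the calls above can be wrapped as in the following minimal sketch. The names RandomElement and randomElement, the local master, the app name, and the sample words are assumptions made for illustration; only takeSample, parallelize, and emptyRDD are Spark's own API. Note that takeSample counts the RDD before sampling, so each call makes a pass over the data; caching the RDD may help if it is expensive to recompute.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RandomElement {
  // Hypothetical helper (not part of Spark): returns one random element,
  // or None when the RDD is empty, since takeSample then returns an empty array.
  def randomElement(rdd: RDD[String], seed: Option[Long] = None): Option[String] = {
    val sample = seed match {
      case Some(s) => rdd.takeSample(withReplacement = false, num = 1, seed = s)
      case None    => rdd.takeSample(withReplacement = false, num = 1)
    }
    sample.headOption
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for this sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("random-element")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("alpha", "beta", "gamma", "delta"))
      println(randomElement(words))                   // may differ between runs
      println(randomElement(words, seed = Some(10L))) // repeatable for the same RDD and partitioning
      println(randomElement(sc.emptyRDD[String]))     // None
    } finally {
      sc.stop()
    }
  }
}
```

Returning an Option keeps the empty-RDD case explicit instead of failing on an out-of-bounds array access.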
eliasah answered Jan 30 '26 19:01
