Assume I have key-value pairs in Spark, such as the following.
[ (Key1, Value1), (Key1, Value2), (Key1, Value3), (Key2, Value4), (Key2, Value5) ]
Now I want to reduce this to something like the following.
[ (Key1, [Value1, Value2, Value3]), (Key2, [Value4, Value5]) ]
That is, from key-value pairs to a key-to-list-of-values mapping.
How can I do that using the map and reduce functions in Python or Scala?
In plain Python, collections.defaultdict is one solution: https://docs.python.org/3/library/collections.html#collections.defaultdict

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> pairs = [('Key1', 'Value1'), ('Key1', 'Value2'), ('Key1', 'Value3'),
...          ('Key2', 'Value4'), ('Key2', 'Value5')]
>>> for key, value in pairs:
...     d[key].append(value)
...
>>> print(dict(d))
{'Key1': ['Value1', 'Value2', 'Value3'], 'Key2': ['Value4', 'Value5']}
In Scala, groupBy on a plain collection does the same grouping:

val data = Seq(("Key1", "Value1"), ("Key1", "Value2"), ("Key1", "Value3"), ("Key2", "Value4"), ("Key2", "Value5"))

data
  .groupBy(_._1)
  .mapValues(_.map(_._2))

res0: scala.collection.immutable.Map[String,Seq[String]] =
  Map(
    Key2 -> List(Value4, Value5),
    Key1 -> List(Value1, Value2, Value3))