I am processing XML using scala, and I am converting the XML into my own data structures. Currently, I am using plain Map instances to hold (sub-)elements, however, the order of elements from the XML gets lost this way, and I cannot reproduce the original XML.
Therefore, I want to use LinkedHashMap instances instead of Map, however I am using groupBy on the list of nodes, which creates a Map:
For example:
def parse(n:Node): Unit =
{
val leaves:Map[String, Seq[XmlItem]] =
n.child
.filter(node => { ... })
.groupBy(_.label)
.map((tuple:Tuple2[String, Seq[Node]]) =>
{
val items = tuple._2.map(node =>
{
val attributes = ...
if (node.text.nonEmpty)
XmlItem(Some(node.text), attributes)
else
XmlItem(None, attributes)
})
(tuple._1, items)
})
...
}
In this example, I want leaves to be of type LinkedHashMap to retain the order of n.child. How can I achieve this?
Note: I am grouping by label/tagname because elements can occur multiple times, and for each label/tagname, I keep a list of elements in my data structures.
Solution
As answered by @jwvh I am using foldLeft as a substitution for groupBy. Also, I decided to go with LinkedHashMap instead of ListMap.
def parse(n:Node): Unit =
{
val leaves:mutable.LinkedHashMap[String, Seq[XmlItem]] =
n.child
.filter(node => { ... })
.foldLeft(mutable.LinkedHashMap.empty[String, Seq[Node]])((m, sn) =>
{
m.update(sn.label, m.getOrElse(sn.label, Seq.empty[Node]) ++ Seq(sn))
m
})
.map((tuple:Tuple2[String, Seq[Node]]) =>
{
val items = tuple._2.map(node =>
{
val attributes = ...
if (node.text.nonEmpty)
XmlItem(Some(node.text), attributes)
else
XmlItem(None, attributes)
})
(tuple._1, items)
})
To get the rough equivalent to .groupBy() in a ListMap you could fold over your collection. The problem is that ListMap preserves the order of elements as they were appended, not as they were encountered.
import collection.immutable.ListMap
List('a','b','a','c').foldLeft(ListMap.empty[Char,Seq[Char]]){
case (lm,c) => lm.updated(c, c +: lm.getOrElse(c, Seq()))
}
//res0: ListMap[Char,Seq[Char]] = ListMap(b -> Seq(b), a -> Seq(a, a), c -> Seq(c))
To fix this you can foldRight instead of foldLeft. The result is the original order of elements as encountered (scanning left to right) but in reverse.
List('a','b','a','c').foldRight(ListMap.empty[Char,Seq[Char]]){
case (c,lm) => lm.updated(c, c +: lm.getOrElse(c, Seq()))
}
//res1: ListMap[Char,Seq[Char]] = ListMap(c -> Seq(c), b -> Seq(b), a -> Seq(a, a))
This isn't necessarily a bad thing since a ListMap is more efficient with last and init ops, O(1), than it is with head and tail ops, O(n).
To process the ListMap in the original left-to-right order you could .toList and .reverse it.
List('a','b','a','c').foldRight(ListMap.empty[Char,Seq[Char]]){
case (c,lm) => lm.updated(c, c +: lm.getOrElse(c, Seq()))
}.toList.reverse
//res2: List[(Char, Seq[Char])] = List((a,Seq(a, a)), (b,Seq(b)), (c,Seq(c)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With