I'm trying to understand a workflow with Clojure maps that is simple in other languages.
It basically comes down to this: how can I chain these operations?
group-by :key on a vector of maps
select-keys on remaining maps without the previous key
group-by again (0..n times) and select-keys
count unique key instances at the end.
See also my previous question: Aggregate and Count in Maps
Example:
Given a vector of maps
(def DATA [{:a "X", :b "M", :c "K", :d 10}
{:a "Y", :b "M", :c "K", :d 20}
{:a "Y", :b "M", :c "F", :d 30}
{:a "Y", :b "P", :c "G", :d 40}])
performing group-by
(defn get-tree-level-1 [] (group-by :a DATA))
yields a map grouped by the value of that particular key.
{"X" [{:a "X", :b "M", :c "K", :d 10}],
"Y" [{:a "Y", :b "M", :c "K", :d 20}
{:a "Y", :b "M", :c "F", :d 30}
{:a "Y", :b "P", :c "G", :d 40}]}
So far, so good. But what if I want to build a tree-like structure out of the data? That means keeping some of the remaining keys and ignoring others: for example, select :b and :c and ignore :d, which would yield at the next level:
(def DATA2 [{ :X [{:b "M", :c "K"}],
:Y [{:b "M", :c "K"}
{:b "M", :c "F"}
{:b "P", :c "G"}]}])
And finally, counting all instances of the remaining keys (e.g. count all unique values of the :b key under the Y-root):
(def DATA3 [{ :X [{:M 1}],
:Y [{:M 2}
{:P 1}]}])
I tried doing a select-keys after the group-by, but the result was empty after the first step:
(defn get-proc-sums []
(into {}
(map
(fn [ [k vs] ]
[k (select-keys vs [:b :c])])
(group-by :a DATA))))
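A likely reason the attempt above comes back empty: each vs is a vector of maps, and select-keys looks up :b and :c on the vector itself rather than on the maps inside it, so it finds nothing and returns {}. A minimal sketch of one way around that, mapping select-keys over every map in each group (group-and-select is a hypothetical helper name, not something from the question):

```clojure
(def data [{:a "X", :b "M", :c "K", :d 10}
           {:a "Y", :b "M", :c "K", :d 20}
           {:a "Y", :b "M", :c "F", :d 30}
           {:a "Y", :b "P", :c "G", :d 40}])

;; select-keys applied to a vector of maps finds no :b or :c entry
;; on the vector itself, so the result is an empty map.
(select-keys (get (group-by :a data) "Y") [:b :c])
;; => {}

;; Hypothetical helper: group by k, then keep only the keys ks
;; in every map of every group.
(defn group-and-select [k ks xs]
  (into {}
        (map (fn [[g vs]]
               [g (mapv #(select-keys % ks) vs)])
             (group-by k xs))))

(group-and-select :a [:b :c] data)
```

The groups are keyed by the original string values ("X", "Y") rather than by keywords, since group-by uses whatever :a returns as the key.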
Repeated application of group-by is the wrong tool: it doesn't compose with itself very well. Rather, go over your input maps and transform each of them, once, into a format that's useful to you (using for or map), and then reduce over that to build your tree structure. Here is a simple implementation:
(defn hierarchy [keyseq xs]
(reduce (fn [m [ks x]]
(update-in m ks conj x))
{}
(for [x xs]
[(map x keyseq) (apply dissoc x keyseq)])))
user> (hierarchy [:a :b :c] '[{:a "X", :b "M", :c "K", :d 10}
{:a "Y", :b "M", :c "K", :d 20}
{:a "Y", :b "M", :c "F", :d 30}
{:a "Y", :b "P", :c "G", :d 40}])
{"Y" {"P" {"G" ({:d 40})},
"M" {"F" ({:d 30}),
"K" ({:d 20})}},
"X" {"M" {"K" ({:d 10})}}}
This gives you the hierarchical format that you want, with a list of all maps containing only the "leftover" keys. From this, you can count them, deduplicate them, remove the :d key, or whatever else you want, either by writing another function that processes this map, or by adjusting what happens in the reduce function or the for comprehension above.
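As an illustration of adjusting the reduce step, here is a sketch that counts occurrences at each leaf instead of collecting the leftover maps. The name count-by is hypothetical, and the approach assumes a non-empty keyseq; it uses fnil so that a missing count starts at 0:

```clojure
(def data [{:a "X", :b "M", :c "K", :d 10}
           {:a "Y", :b "M", :c "K", :d 20}
           {:a "Y", :b "M", :c "F", :d 30}
           {:a "Y", :b "P", :c "G", :d 40}])

;; Hypothetical variant of hierarchy: walk the same key path,
;; but increment a counter at the leaf instead of conj-ing the map.
(defn count-by [keyseq xs]
  (reduce (fn [m x]
            (update-in m (map x keyseq) (fnil inc 0)))
          {}
          xs))

(count-by [:a :b] data)
;; => {"X" {"M" 1}, "Y" {"M" 2, "P" 1}}
```

Passing [:a :b :c] instead would count one level deeper, so the same function covers the "0..n times" case from the question.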