I start from a JSON Lines file similar to this:
{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "bar", "age": 1}
{ "kw": "bar", "age": 1}
Please note that each line is valid JSON, but the file as a whole is not.
The output I'm seeking is a list of keywords sorted by number of occurrences, like this:
[
{"kw": "foo", "count": 3},
{"kw": "bar", "count": 2}
]
I'm able to group and count the keywords using the --slurp option:
jq --slurp '. | group_by(.kw) | .[] | {kw: .[0].kw, count: . | length }'
Output:
{"kw":"bar","count":2}
{"kw":"foo","count":3}
But this is not sorted, and it is not a valid JSON array.
A rather crude solution I've found is to pass the data through jq twice :)
jq --slurp --compact-output '. | group_by(.kw) | .[] | {kw: .[0].kw, count: . | length }' sample.json \
| jq --slurp --compact-output '. | sort_by(.count)'
But I'm pretty sure someone smarter than me can find a more elegant solution.
"This is not sorted"
That is not quite correct: group_by(.foo) sorts its input by .foo internally, much as sort_by(.foo) would, so the groups come out in the sorted order of that field. See jq Manual - group_by(path_expression).
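A quick way to check the ordering is to run group_by on a small inline array whose input order differs from the sorted order. This is only a minimal sketch: the three-element input and the `grouped` variable name are made up for illustration.

```shell
# The input lists "foo" before "bar"; group_by(.kw) still emits the "bar"
# group first, because groups come out ordered by the grouping key.
grouped=$(jq --null-input --compact-output \
  '[{"kw":"foo"},{"kw":"bar"},{"kw":"foo"}] | group_by(.kw)')
echo "$grouped"
# prints [[{"kw":"bar"}],[{"kw":"foo"},{"kw":"foo"}]]
```

So the output in the question is sorted, just by .kw rather than by count.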
"This is not valid JSON array"
Just enclose the operation within [..] to collect the results into an array. The leading . is also optional, so just do
jq --slurp --compact-output '[ group_by(.kw)[] | {kw: .[0].kw, count: length } ]'
If you mean sorting by .count, you can do an ascending sort and then reverse it:
jq --slurp --compact-output '[ group_by(.kw)[] | {kw: .[0].kw, count: length }] | sort_by(.count) | reverse'
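As a possible variation, since .count is numeric, sorting on its negation should give the descending order in one step, without reverse. The sketch below assumes the sample data from the question, inlined in a shell variable (`input` and `sorted` are illustrative names) instead of being read from sample.json.

```shell
# Sample input mirroring the question's JSON Lines data.
input='{"kw":"foo","age":1}
{"kw":"foo","age":1}
{"kw":"foo","age":1}
{"kw":"bar","age":1}
{"kw":"bar","age":1}'

# Negating the numeric count makes sort_by order from highest to lowest.
sorted=$(printf '%s\n' "$input" | jq --slurp --compact-output \
  '[ group_by(.kw)[] | {kw: .[0].kw, count: length } ] | sort_by(-.count)')
echo "$sorted"
# prints [{"kw":"foo","count":3},{"kw":"bar","count":2}]
```

This only works for numeric sort keys; for strings or mixed values, sort_by followed by reverse remains the general approach.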