Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB: groupby subdocument and count + add total count

What I want to achieve

Suppose having the following subdocument:

{
    "id":1,
    "url":"mysite.com",
    "views": 
     [
       {"ip":"1.1.1.1","date":"01-01-2015"},
       {"ip":"2.2.2.2","date":"01-01-2015"},
       {"ip":"1.1.1.1","date":"01-01-2015"},
       {"ip":"1.1.1.1","date":"01-01-2015"}
     ]
}

I want to count:

  1. how many IPs there are based on the "ip" value
  2. and also count the total of subdocuments in "views"

if possible in the same query, to achieve the following result:

[  
  {  
    "_id":"2.2.2.2",
    "count":1
  },
  {  
    "_id":"1.1.1.1",
    "count":3
  },
  {  
    "_id":"total",
    "count":4
  }
]

What I have achieved so far

Using the MongoDB Aggregation Framework I have managed to achieve point 1. the following way:

db.collection.aggregate([
    {
        "$unwind": "$views"
    },
    {
        "$group": {
            "_id": "$views.ip",
            "count": {
                "$sum": 1
            }
        }
    }
])

which returns:

[  
  {  
    "_id":"2.2.2.2",
    "count":1
  },
  {  
    "_id":"1.1.1.1",
    "count":3
  }
]

I wish to return that extra doc inside the array, that would be:

{  
  "_id":"total",
  "count":4
}

to achieve what I exposed above, but I am stuck there and haven't been able to do so.

like image 372
charliebrownie Avatar asked Feb 02 '26 09:02

charliebrownie


1 Answers

Not possible within the same aggregation pipeline as in principle the pipeline processes documents as they pass through it i.e. the pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents. In the above scenario, adding another $group step to get the grouped IP counts + total count would produce a different result to what you are after i.e.

db.collection.aggregate([
    {
        "$unwind": "$views"
    },
    {
        "$group": {
            "_id": "$views.ip",
            "count": {
                "$sum": 1
            }
        }
    },
    {
        "$group": {
            "_id": null,
            "total": {
                "$sum": "$count"
            }
        }
    }
])

You will only get the total count since the $group consumes all input documents (documents with the grouped IP counts) and outputs one document per each distinct group. That extra group step will group all the documents from the previous stream.

However, you could get the total count but as an extra field within each grouped document in your final result. The following example which uses an initial $project pipeline stage to get the total count via the $size operator accomplishes this:

db.collection.aggregate([
    {
        "$project": {
            "views": 1,
            "views_size": { "$size": "$views" }
        }
    }
    {
        "$unwind": "$views"
    },
    {
        "$group": {
            "_id": "$views.ip",
            "count": {
                "$sum": 1
            },
            "total": { "$first": "$views_size" }
        }
    }
])

Sample Output

[  
  {  
    "_id": "2.2.2.2",
    "count": 1,
    "total": 4
  },
  {  
    "_id": "1.1.1.1",
    "count": 3,
    "total": 4
  }
]
like image 129
chridam Avatar answered Feb 04 '26 01:02

chridam



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!