I'm working with a cosmos container with ~1 million documents. All documents have unique ids. There is another property in each document that is supposed to be unique. However, there are an unknown number of documents that contain duplicates of this property. Here's an example of the document structure:
{
"id":"unique-id",
"property":"value-that-is-supposed-to-be-unique"
}
I'm trying to get all documents for which "property" is not unique but struggling to come up with a query to do so. Best I've been able to com up with is the following:
SELECT c.property, count(1) FROM c GROUP BY c.property
which returns all "property" values and a count for each, but this is largely inadequate. I'm looking for the best way to get all ids of documents that contain duplicated "property" values.
You can use the below query to find the duplicates.
select d.property,d.tot_count from
(SELECT c.property,count(1) as tot_count FROM c
group by c.property) as d
where d.tot_count>1
I reproduced this with sample data.
Input
[
{
"id": "1",
"property": "p1"
},
{
"id": "2",
"property": "p1"
},
{
"id": "3",
"property": "p2"
}
]
Output
[
{
"property": "p1",
"tot_count": 2
}
]

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With