Finding duplicates in cosmos db based on property value

Question

I'm working with a cosmos container with ~1 million documents. All documents have unique ids. There is another property in each document that is supposed to be unique. However, there are an unknown number of documents that contain duplicates of this property. Here's an example of the document structure:

{
     "id":"unique-id",    
     "property":"value-that-is-supposed-to-be-unique"
}

I'm trying to get all documents for which "property" is not unique but struggling to come up with a query to do so. Best I've been able to com up with is the following:

SELECT c.property, count(1) FROM c GROUP BY c.property

which returns all "property" values and a count for each, but this is largely inadequate. I'm looking for the best way to get all ids of documents that contain duplicated "property" values.

Aswin · Accepted Answer

You can use the below query to find the duplicates.

select d.property,d.tot_count from
(SELECT c.property,count(1) as tot_count FROM c
group  by c.property) as d
where d.tot_count>1

I reproduced this with sample data.

Input

[
    {
        "id": "1",
        "property": "p1"
    },
    {
        "id": "2",
        "property": "p1"
    },
    {
        "id": "3",
        "property": "p2"
    }
]

Output

[
    {
        "property": "p1",
        "tot_count": 2
    }
]

enter image description here

Finding duplicates in cosmos db based on property value

Tags:

azure

azure-cosmosdb

azure-cosmosdb-sqlapi

ben

1 Answers

Aswin

Recent Activity

Donate For Us

Finding duplicates in cosmos db based on property value

Tags:

azure

azure-cosmosdb

azure-cosmosdb-sqlapi

ben

1 Answers

Aswin

Related questions

Recent Activity

Donate For Us