ARRAY_CONTAINS vs JOIN in azure-cosmosDB

Question

The JSON documents that we plan to ingest into DocumentDB look as follows:

[
{"id":"id1","LastName": "user1", "GroupMembership":["g1","g2"]},
{"id":"id2","LastName": "user2", "GroupMembership":["g1","g4","g5"]},
{"id":"id3","LastName": "user3", "GroupMembership":["g3","g4","g2"]},
…
]

We want to answer queries such as "count all users who are members of group "g1" or "g2"". The number of users is very large (a few million). What is the best way to implement this query so that it uses the index and avoids scans? Should I use ARRAY_CONTAINS or JOIN? Does ARRAY_CONTAINS use the index internally, or does it perform a scan?

Option 1)

SELECT VALUE COUNT(1) FROM Users WHERE ARRAY_CONTAINS(Users.GroupMembership, "g1") OR ARRAY_CONTAINS(Users.GroupMembership, "g2")

Option 2)

SELECT VALUE COUNT(1) FROM Users JOIN Membership IN Users.GroupMembership WHERE Membership = "g1" OR Membership = "g2"

Samer Boshra · Accepted Answer

Both queries should utilize the index the same way, but ARRAY_CONTAINS is likely to give a better execution time than JOIN. You can profile both queries using Query Metrics, as described in this article: https://learn.microsoft.com/en-us/azure/cosmos-db/documentdb-sql-query-metrics#query-execution-metrics

Andriy Ivaneyko · Answer

Both should provide the same index utilization. However, with JOIN you can get duplicate results per document, whereas with ARRAY_CONTAINS you won't. I think that difference is significant. For more on the duplication issue, see the replies to the SO questions "Getting duplicate records in select query for the Azure DocumentDB" and "Cosmos db joins give duplicate results".
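
To make the duplication point concrete, here is a small in-memory sketch in Python. It does not call Cosmos DB. It only mimics how each query shape produces rows, using the three sample documents from the question. The function names count_array_contains and count_join are hypothetical and chosen for illustration, and the model of JOIN (one row per document and array element pair) is a simplifying assumption about the query semantics, not the engine's actual implementation.

```python
# In-memory sketch of how the two query shapes count rows.
# Assumption: this only models the row semantics; it is not the Cosmos DB engine.
users = [
    {"id": "id1", "LastName": "user1", "GroupMembership": ["g1", "g2"]},
    {"id": "id2", "LastName": "user2", "GroupMembership": ["g1", "g4", "g5"]},
    {"id": "id3", "LastName": "user3", "GroupMembership": ["g3", "g4", "g2"]},
]
wanted = {"g1", "g2"}


def count_array_contains(docs, groups):
    """Option 1: the filter is evaluated once per document."""
    return sum(
        1 for d in docs if any(g in d["GroupMembership"] for g in groups)
    )


def count_join(docs, groups):
    """Option 2: JOIN yields one row per (document, array element) pair."""
    return sum(
        1 for d in docs for m in d["GroupMembership"] if m in groups
    )


print(count_array_contains(users, wanted))  # 3
print(count_join(users, wanted))            # 4
```

User id1 belongs to both "g1" and "g2", so the JOIN form counts that user twice. On this sample the two options already disagree (3 versus 4), which suggests Option 2 answers "how many matching memberships" rather than "how many users".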

Tags: azure, azure-cosmosdb

durga prasad
