Find ID of parent where all children exactly match

Question

The Scenario

Let's suppose we have a set of database tables that represent four key concepts:

Entity Types (e.g. account, client, etc.)
Entities (e.g. instances of the above Entity Types)
Cohorts (a named group)
Cohort Members (the Entities that form up the membership of a Cohort)

The rules around Cohorts are:

A Cohort always has at least one Cohort Member.
A Cohorts Members must be unique to that Cohort (i.e. Entity 5 cannot be a member of Cohort 3 twice, though it could be a member of Cohort 3 and Cohort 4)
No two Cohorts will ever be entirely equal in membership, though one Cohort may legitimately be a subset of another Cohort.

The rules around Entities are:

No two Entities may have the same value pair (business_key, entity_type_id)
Two entities with a different entity_type_id may share a business_key

Because pictures tell a thousand lines of code, here is the ERD:

ERD of Entities and Cohorts

The Question

I want a SQL query that, when provided a collection of (business_key, entity_type_id) pairs, will search for a Cohort that matches exactly, returning one row with just the cohort_id if that Cohort exists, and zero rows otherwise.

i.e. - if the set of Entities matchesentity_ids 1 and 2, it will only return a cohort_id where the cohort_members are exactly 1 and 2, not just 1, not just 2, not a cohort with entity_ids 1 2 and 3. If no cohort exists that satisfies this, then zero rows are returned.

The Test Cases

To help people addressing the question, I have created a fiddle of the tables along with some data that defines various Entity Types, Entities, and Cohorts. There is also a table with test data for matching, named test_cohort. It contains 6 test cohorts which test various scenarios. The first 5 tests should exactly match just one cohort. The 6th test is a bogus one to test the zero-row clause. When using the test table, the associated INSERT statement should just have one line uncommented (see fiddle, it's set up like that initially):

http://sqlfiddle.com/#!18/2d022

My attempt in SQL is the following, though it fails tests #2 and #4 (which can be found in the fiddle):

SELECT actual_cohort_member.cohort_id
FROM test_cohort
INNER JOIN entity
    ON entity.business_key = test_cohort.business_key
    AND entity.entity_type_id = test_cohort.entity_type_id
INNER JOIN cohort_member AS existing_potential_member
    ON existing_potential_member.entity_id = entity.entity_id
INNER JOIN cohort
    ON cohort.cohort_id = existing_potential_member.cohort_id
RIGHT OUTER JOIN cohort_member AS actual_cohort_member
    ON actual_cohort_member.cohort_id = cohort.cohort_id
    AND actual_cohort_member.cohort_id = existing_potential_member.cohort_id
    AND actual_cohort_member.entity_id = existing_potential_member.entity_id
GROUP BY actual_cohort_member.cohort_id
HAVING
    SUM(CASE WHEN
        actual_cohort_member.cohort_id = existing_potential_member.cohort_id AND
        actual_cohort_member.entity_id = existing_potential_member.entity_id THEN 1 ELSE 0
    END) = COUNT(*)
;

John Woo · Accepted Answer

This scenario can be achieve by adding compound condition in the WHERE clause since you're comparing to a pair value. Then you have to count the result based from the conditions set in the WHERE clause as well as the total rows by of the cohort_id.

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
WHERE   (e.entity_type_id = 1 AND e.business_key = 'acc1')      -- condition here
         OR (e.entity_type_id = 1 AND e.business_key = 'acc2')
GROUP   BY c.cohort_id
HAVING  COUNT(*) = 2                                            -- number must be the same to the total number of condition
        AND (SELECT COUNT(*) 
             FROM cohort_member cm2 
             WHERE cm2.cohort_id = c.cohort_id) = 2             -- number must be the same to the total number of condition

Test Case #1
Test Case #2
Test Case #3
Test Case #4
Test Case #5
Test Case #6

As you can see in the test cases above, the value in the filter depends on the number of conditions in the WHERE clause. It would be advisable to create a dynamic query on this.

UPDATE

If the table test_cohort contains only one scenario, then this will suffice your requirement, however, if test_cohort contains list of scenarios then you might want to look in the other answer since this solution does not alter any table schema.

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
        INNER JOIN test_cohort tc
            ON tc.business_key = e.business_key
                AND tc.entity_type_id = e.entity_type_id
GROUP   BY c.cohort_id
HAVING  COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
        AND (SELECT COUNT(*) 
             FROM cohort_member cm2 
             WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)

Test Case #1
Test Case #2
Test Case #3
Test Case #4
Test Case #5
Test Case #6

Find ID of parent where all children exactly match

Tags:

sql

sql-server

set

t-sql

The Scenario

The Question

The Test Cases

e_i_pi

1 Answers

John Woo

Recent Activity

Donate For Us

Find ID of parent where all children exactly match

Tags:

sql

sql-server

set

t-sql

The Scenario

The Question

The Test Cases

e_i_pi

1 Answers

John Woo

Related questions

Recent Activity

Donate For Us