Let's suppose we have a set of database tables that represent four key concepts:
The rules around Cohorts are:
The rules around Entities are:
(business_key, entity_type_id)entity_type_id may share a business_keyBecause pictures tell a thousand lines of code, here is the ERD:

I want a SQL query that, when provided a collection of (business_key, entity_type_id) pairs, will search for a Cohort that matches exactly, returning one row with just the cohort_id if that Cohort exists, and zero rows otherwise.
i.e. - if the set of Entities matchesentity_ids 1 and 2, it will only return a cohort_id where the cohort_members are exactly 1 and 2, not just 1, not just 2, not a cohort with entity_ids 1 2 and 3. If no cohort exists that satisfies this, then zero rows are returned.
To help people addressing the question, I have created a fiddle of the tables along with some data that defines various Entity Types, Entities, and Cohorts. There is also a table with test data for matching, named test_cohort. It contains 6 test cohorts which test various scenarios. The first 5 tests should exactly match just one cohort. The 6th test is a bogus one to test the zero-row clause. When using the test table, the associated INSERT statement should just have one line uncommented (see fiddle, it's set up like that initially):
http://sqlfiddle.com/#!18/2d022
My attempt in SQL is the following, though it fails tests #2 and #4 (which can be found in the fiddle):
SELECT actual_cohort_member.cohort_id
FROM test_cohort
INNER JOIN entity
ON entity.business_key = test_cohort.business_key
AND entity.entity_type_id = test_cohort.entity_type_id
INNER JOIN cohort_member AS existing_potential_member
ON existing_potential_member.entity_id = entity.entity_id
INNER JOIN cohort
ON cohort.cohort_id = existing_potential_member.cohort_id
RIGHT OUTER JOIN cohort_member AS actual_cohort_member
ON actual_cohort_member.cohort_id = cohort.cohort_id
AND actual_cohort_member.cohort_id = existing_potential_member.cohort_id
AND actual_cohort_member.entity_id = existing_potential_member.entity_id
GROUP BY actual_cohort_member.cohort_id
HAVING
SUM(CASE WHEN
actual_cohort_member.cohort_id = existing_potential_member.cohort_id AND
actual_cohort_member.entity_id = existing_potential_member.entity_id THEN 1 ELSE 0
END) = COUNT(*)
;
This scenario can be achieve by adding compound condition in the WHERE clause since you're comparing to a pair value. Then you have to count the result based from the conditions set in the WHERE clause as well as the total rows by of the cohort_id.
SELECT c.cohort_id
FROM cohort c
INNER JOIN cohort_member cm
ON c.cohort_id = cm.cohort_id
INNER JOIN entity e
ON cm.entity_id = e.entity_id
WHERE (e.entity_type_id = 1 AND e.business_key = 'acc1') -- condition here
OR (e.entity_type_id = 1 AND e.business_key = 'acc2')
GROUP BY c.cohort_id
HAVING COUNT(*) = 2 -- number must be the same to the total number of condition
AND (SELECT COUNT(*)
FROM cohort_member cm2
WHERE cm2.cohort_id = c.cohort_id) = 2 -- number must be the same to the total number of condition
As you can see in the test cases above, the value in the filter depends on the number of conditions in the WHERE clause. It would be advisable to create a dynamic query on this.
UPDATE
If the table test_cohort contains only one scenario, then this will suffice your requirement, however, if test_cohort contains list of scenarios then you might want to look in the other answer since this solution does not alter any table schema.
SELECT c.cohort_id
FROM cohort c
INNER JOIN cohort_member cm
ON c.cohort_id = cm.cohort_id
INNER JOIN entity e
ON cm.entity_id = e.entity_id
INNER JOIN test_cohort tc
ON tc.business_key = e.business_key
AND tc.entity_type_id = e.entity_type_id
GROUP BY c.cohort_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
AND (SELECT COUNT(*)
FROM cohort_member cm2
WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With