My question is how to identify duplicate (repeating) 'groups' of data within an SQL table. I am using SQL Server 2005 at the moment so prefer solutions based on that or ansi-sql.
Here is a sample table and expected result (below) to base this question on:
declare @data table (id nvarchar(10), fund nvarchar(1), xtype nvarchar(1))
insert into @data select 'Switch_1', 'A', 'S'
insert into @data select 'Switch_1', 'X', 'B'
insert into @data select 'Switch_1', 'Y', 'B'
insert into @data select 'Switch_1', 'Z', 'B'
insert into @data select 'Switch_2', 'A', 'S'
insert into @data select 'Switch_2', 'X', 'B'
insert into @data select 'Switch_2', 'Y', 'B'
insert into @data select 'Switch_2', 'Z', 'B'
insert into @data select 'Switch_3', 'C', 'S'
insert into @data select 'Switch_3', 'D', 'B'
insert into @data select 'Switch_4', 'C', 'S'
insert into @data select 'Switch_4', 'F', 'B'
(new data)
insert into @data select 'Switch_5', 'A', 'S'
insert into @data select 'Switch_5', 'X', 'B'
insert into @data select 'Switch_5', 'Y', 'B'
insert into @data select 'Switch_5', 'Z', 'B'
-- id fund xtype match
-- ---------- ---- ----- ---------
-- Switch_1 A S Match_1
-- Switch_1 X B Match_1
-- Switch_1 Y B Match_1
-- Switch_1 Z B Match_1
-- Switch_2 A S Match_1
-- Switch_2 X B Match_1
-- Switch_2 Y B Match_1
-- Switch_2 Z B Match_1
-- Switch_3 C S
-- Switch_3 D B
-- Switch_4 C S
-- Switch_4 F B
(new results)
-- Switch_5 A S Match_1
-- Switch_5 X B Match_1
-- Switch_5 Y B Match_1
-- Switch_5 Z B Match_1
I only want matches on an ALL or NOTHING basis (i.e. All records in the group match all records in another group - not a part match). Any match id can be used (I have used Match_1 above but can be numeric etc.)
Thanks for any help here.
(EDIT: I guess I should add that there could be any number of rows per group, not just the 2 or 4 shown in the sample above - and I'm also trying to avoid cursors)
(EDIT 2: I seem to have an issue if there are more than one matches found. The output from the SQL supplied is returning duplicate records for Switch_1 when there are more than one matches found. I have updated the sample data accordingly. Not sure if Lieven is still following this - I'm also looking at the solution and will post here if found.)
q: Combine all funds and xtypes of one id into one string using an XML PATH constructionr: Select a ROW_NUMBER and the respective id's for matching groups@data and r;WITH q AS (
SELECT DISTINCT d.id
, DuplicateData = STUFF((SELECT ', ' + fund + xtype FROM @data WHERE id = d.id FOR XML PATH('')), 1, 2, '')
FROM @data d
)
, r AS (
SELECT id1 = q1.id
, id2 = q2.id
, rn = ROW_NUMBER() OVER (ORDER BY q1.ID)
FROM q q1
INNER JOIN q q2 ON q1.DuplicateData = q2.DuplicateData AND q1.id < q2.id
)
SELECT id
, fund
, xtype
, match = 'Match_' + CAST(r.rn AS VARCHAR(32))
FROM @data d
LEFT OUTER JOIN r ON d.id IN (r.id1, r.id2)
id fund xtype match
---------- ---- ----- --------------------------------------
Switch_1 A S Match_1
Switch_1 X B Match_1
Switch_1 Y B Match_1
Switch_1 Z B Match_1
Switch_2 A S Match_1
Switch_2 X B Match_1
Switch_2 Y B Match_1
Switch_2 Z B Match_1
Switch_3 C S NULL
Switch_3 D B NULL
Switch_4 C S NULL
Switch_4 F B NULL
Here it is another query for it:
create table #temp1 (
id varchar(10),
fund nvarchar(1),
xtype nvarchar(1)
)
insert into #temp1 select 'Switch_1', 'A', 'S'
insert into #temp1 select 'Switch_1', 'X', 'B'
insert into #temp1 select 'Switch_1', 'Y', 'B'
insert into #temp1 select 'Switch_1', 'Z', 'B'
insert into #temp1 select 'Switch_2', 'A', 'S'
insert into #temp1 select 'Switch_2', 'X', 'B'
insert into #temp1 select 'Switch_2', 'Y', 'B'
insert into #temp1 select 'Switch_2', 'Z', 'B'
insert into #temp1 select 'Switch_3', 'C', 'S'
insert into #temp1 select 'Switch_3', 'D', 'B'
insert into #temp1 select 'Switch_4', 'C', 'S'
insert into #temp1 select 'Switch_4', 'F', 'B'
select t1.*, case when t2.equal = t3.total then 'True' else 'False' end as 'Match' from #temp1 t1
left outer join (select m.id, count(m2.id) as 'equal' from #temp1 m
inner join #temp1 m2 on m.Id <> m2.Id and m.fund = m2.fund and m.xtype = m2.xtype
group by m.id) t2 on t1.id = t2.id
inner join (select m3.id, count(m3.fund) as 'total' from #temp1 m3 group by m3.id) t3 on t3.id = t1.id
drop table #temp1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With