Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identifying duplicate GROUPS of data in SQL

My question is how to identify duplicate (repeating) 'groups' of data within an SQL table. I am using SQL Server 2005 at the moment so prefer solutions based on that or ansi-sql.

Here is a sample table and expected result (below) to base this question on:

declare @data table (id nvarchar(10), fund nvarchar(1), xtype nvarchar(1))

insert into @data select 'Switch_1', 'A', 'S'
insert into @data select 'Switch_1', 'X', 'B'
insert into @data select 'Switch_1', 'Y', 'B'
insert into @data select 'Switch_1', 'Z', 'B'
insert into @data select 'Switch_2', 'A', 'S'
insert into @data select 'Switch_2', 'X', 'B'
insert into @data select 'Switch_2', 'Y', 'B'
insert into @data select 'Switch_2', 'Z', 'B'
insert into @data select 'Switch_3', 'C', 'S'
insert into @data select 'Switch_3', 'D', 'B'
insert into @data select 'Switch_4', 'C', 'S'
insert into @data select 'Switch_4', 'F', 'B'

(new data)

insert into @data select 'Switch_5', 'A', 'S'
insert into @data select 'Switch_5', 'X', 'B'
insert into @data select 'Switch_5', 'Y', 'B'
insert into @data select 'Switch_5', 'Z', 'B'

--   id         fund xtype match
--   ---------- ---- ----- ---------
--   Switch_1   A    S     Match_1
--   Switch_1   X    B     Match_1
--   Switch_1   Y    B     Match_1
--   Switch_1   Z    B     Match_1
--   Switch_2   A    S     Match_1
--   Switch_2   X    B     Match_1
--   Switch_2   Y    B     Match_1
--   Switch_2   Z    B     Match_1
--   Switch_3   C    S     
--   Switch_3   D    B     
--   Switch_4   C    S     
--   Switch_4   F    B  

(new results)

--   Switch_5   A    S     Match_1
--   Switch_5   X    B     Match_1
--   Switch_5   Y    B     Match_1
--   Switch_5   Z    B     Match_1

I only want matches on an ALL or NOTHING basis (i.e. All records in the group match all records in another group - not a part match). Any match id can be used (I have used Match_1 above but can be numeric etc.)

Thanks for any help here.

(EDIT: I guess I should add that there could be any number of rows per group, not just the 2 or 4 shown in the sample above - and I'm also trying to avoid cursors)

(EDIT 2: I seem to have an issue if there are more than one matches found. The output from the SQL supplied is returning duplicate records for Switch_1 when there are more than one matches found. I have updated the sample data accordingly. Not sure if Lieven is still following this - I'm also looking at the solution and will post here if found.)

like image 631
Sean Avatar asked Dec 11 '25 19:12

Sean


2 Answers

The flow of execution is as follows

  • q: Combine all funds and xtypes of one id into one string using an XML PATH construction
  • r: Select a ROW_NUMBER and the respective id's for matching groups
  • Select the results by LEFT JOINING @data and r

SQL Statement

;WITH q AS (
  SELECT  DISTINCT d.id
          , DuplicateData = STUFF((SELECT ', ' + fund  + xtype FROM @data WHERE id = d.id FOR XML PATH('')), 1, 2, '')
  FROM    @data d
) 
, r AS (
  SELECT  id1 = q1.id
          , id2 = q2.id
          , rn = ROW_NUMBER() OVER (ORDER BY q1.ID)
  FROM    q q1
          INNER JOIN q q2 ON q1.DuplicateData = q2.DuplicateData AND q1.id < q2.id
)
SELECT  id
        , fund
        , xtype
        , match = 'Match_' + CAST(r.rn AS VARCHAR(32))
FROM    @data d
        LEFT OUTER JOIN r ON d.id IN (r.id1, r.id2)

Results

id         fund xtype match
---------- ---- ----- --------------------------------------
Switch_1   A    S     Match_1
Switch_1   X    B     Match_1
Switch_1   Y    B     Match_1
Switch_1   Z    B     Match_1
Switch_2   A    S     Match_1
Switch_2   X    B     Match_1
Switch_2   Y    B     Match_1
Switch_2   Z    B     Match_1
Switch_3   C    S     NULL
Switch_3   D    B     NULL
Switch_4   C    S     NULL
Switch_4   F    B     NULL
like image 120
Lieven Keersmaekers Avatar answered Dec 13 '25 08:12

Lieven Keersmaekers


Here it is another query for it:

create table #temp1 (
id varchar(10),
fund nvarchar(1),
xtype nvarchar(1)
)

insert into #temp1 select 'Switch_1', 'A', 'S'
insert into #temp1 select 'Switch_1', 'X', 'B'
insert into #temp1 select 'Switch_1', 'Y', 'B'
insert into #temp1 select 'Switch_1', 'Z', 'B'
insert into #temp1 select 'Switch_2', 'A', 'S'
insert into #temp1 select 'Switch_2', 'X', 'B'
insert into #temp1 select 'Switch_2', 'Y', 'B'
insert into #temp1 select 'Switch_2', 'Z', 'B'
insert into #temp1 select 'Switch_3', 'C', 'S'
insert into #temp1 select 'Switch_3', 'D', 'B'
insert into #temp1 select 'Switch_4', 'C', 'S'
insert into #temp1 select 'Switch_4', 'F', 'B'

select t1.*, case when t2.equal = t3.total then 'True' else 'False' end as 'Match' from #temp1 t1
left outer join (select m.id, count(m2.id) as 'equal' from #temp1 m
inner join #temp1 m2 on m.Id <> m2.Id and m.fund = m2.fund and m.xtype = m2.xtype
group by m.id) t2 on t1.id = t2.id
inner join (select m3.id, count(m3.fund) as 'total' from #temp1 m3 group by m3.id) t3 on t3.id = t1.id


drop table #temp1
like image 40
aF. Avatar answered Dec 13 '25 09:12

aF.



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!