There are millions of records in my table, and I need to calculate the number of duplicate rows present in it in Redshift. I can do this with the query below:
select sum(cnt)
from (
    select <primary_key>, count(*) - 1 as cnt
    from table_name
    group by <primary_key>
    having count(*) > 1
) as dup_counts
Thanks.
You can try the following query:
SELECT Column_name, COUNT(*) Count_Duplicate
FROM Table_name
GROUP BY Column_name
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
If the only criterion for duplication is a repeated primary key, then
SELECT count(1) - count(distinct <primary_key>) FROM your_table
would work, unless you have declared that column as a primary key in Redshift. Redshift does not enforce the constraint, but if a column is marked as a primary key, count(distinct <primary_key>) will return the same value as count(1) even if the column contains duplicate values.
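To check that the subtraction gives the same number as the grouped query from the question, here is a small sketch on a throwaway table. The table name dup_demo and its columns are made up for illustration, and id is deliberately not declared as a primary key, so the caveat above does not come into play.

```sql
-- Hypothetical demo table; the name and columns are illustrative only.
CREATE TEMP TABLE dup_demo (id INT, payload VARCHAR(10));

INSERT INTO dup_demo VALUES
    (1, 'a'), (1, 'b'), (1, 'c'),   -- key 1 appears 3 times: 2 extra rows
    (2, 'd'),                       -- key 2 appears once: no extra rows
    (3, 'e'), (3, 'f');             -- key 3 appears twice: 1 extra row

-- Approach from the question: sum the extra rows per repeated key.
SELECT SUM(cnt) AS duplicate_rows
FROM (
    SELECT id, COUNT(*) - 1 AS cnt
    FROM dup_demo
    GROUP BY id
    HAVING COUNT(*) > 1
) AS dup_counts;

-- Subtraction approach: total rows minus distinct keys.
SELECT COUNT(*) - COUNT(DISTINCT id) AS duplicate_rows
FROM dup_demo;
```

Both queries return 3 for this data (6 rows, 3 distinct keys). Note that COUNT(DISTINCT ...) ignores NULLs, so the two approaches can disagree if the key column contains NULL values.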