Coder Perfect

Finding duplicate rows in SQL Server

Problem

There are many duplicate rows in my SQL Server database of corporations. I’d like to use a select statement to get all of them and the number of duplicates, as well as the ids for each organization.

A statement like:

SELECT     orgName, COUNT(*) AS dupes  
FROM         organizations  
GROUP BY orgName  
HAVING      (COUNT(*) > 1)

Will give you something similar to

orgName        | dupes  
ABC Corp       | 7  
Foo Federation | 5  
Widget Company | 2 

But I’d like to get their IDs as well. Is there a way to accomplish this? Perhaps in the manner of a

orgName        | dupeCount | id  
ABC Corp       | 1         | 34  
ABC Corp       | 2         | 5  
...  
Widget Company | 1         | 10  
Widget Company | 2         | 2  

The reason for this is that there is a separate table of users who are linked to these organizations, which I’d like to merge (therefore remove dupes so the users link to the same organization instead of dupe orgs). But I’d want to do it manually so I don’t mess anything up, but I’d still need a statement that returns all of the dupe orgs’ IDs so I can go through the list of users.

Asked by xtine

Solution #1

select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

Answered by D’Arcy Rittich

Solution #2

You may use max(id) to discover duplicates and eliminate them with the following query.

SELECT orgName, COUNT(*), Max(ID) AS dupes 
FROM organizations 
GROUP BY orgName 
HAVING (COUNT(*) > 1)

However, you’ll need to perform this query several times.

Answered by Aykut Akıncı

Solution #3

You can do it this way:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName

You can use the following command to return only the records that can be deleted (leaving one of each):

SELECT
    id, orgName
FROM (
     SELECT 
         orgName, id,
         ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
     FROM organizations
) AS d
WHERE intRow != 1

The ROW NUMBER() function is not available in SQL Server 2000. Instead, you can use the following:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id

Answered by Paul

Solution #4

You can try it; it is the finest option for you.

 WITH CTE AS
    (
    SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations 
    )
    select * from CTE where RN>1
    go

Answered by code save

Solution #5

The solution marked as correct didn’t work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql

SELECT n1.* 
FROM myTable n1
INNER JOIN myTable n2 
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id

Answered by ecairol

Post is based on https://stackoverflow.com/questions/2112618/finding-duplicate-rows-in-sql-server