Problem
There are many duplicate rows in my SQL Server database of corporations. I’d like to use a select statement to get all of them and the number of duplicates, as well as the ids for each organization.
A statement like:
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
Will give you something similar to
orgName | dupes
ABC Corp | 7
Foo Federation | 5
Widget Company | 2
But I’d like to get their IDs as well. Is there a way to accomplish this? Perhaps in the manner of a
orgName | dupeCount | id
ABC Corp | 1 | 34
ABC Corp | 2 | 5
...
Widget Company | 1 | 10
Widget Company | 2 | 2
The reason for this is that there is a separate table of users who are linked to these organizations, which I’d like to merge (therefore remove dupes so the users link to the same organization instead of dupe orgs). But I’d want to do it manually so I don’t mess anything up, but I’d still need a statement that returns all of the dupe orgs’ IDs so I can go through the list of users.
Asked by xtine
Solution #1
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
Answered by D’Arcy Rittich
Solution #2
You may use max(id) to discover duplicates and eliminate them with the following query.
SELECT orgName, COUNT(*), Max(ID) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
However, you’ll need to perform this query several times.
Answered by Aykut Akıncı
Solution #3
You can do it this way:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
You can use the following command to return only the records that can be deleted (leaving one of each):
SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1
The ROW NUMBER() function is not available in SQL Server 2000. Instead, you can use the following:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
Answered by Paul
Solution #4
You can try it; it is the finest option for you.
WITH CTE AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations
)
select * from CTE where RN>1
go
Answered by code save
Solution #5
The solution marked as correct didn’t work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql
SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
Answered by ecairol
Post is based on https://stackoverflow.com/questions/2112618/finding-duplicate-rows-in-sql-server