Problem
Consider the EmployeeName field in the Employee table. The goal is to delete duplicate records based on the EmployeeName field.
EmployeeName
------------
Anand
Anand
Anil
Dipak
Anil
Dipak
Dipak
Anil
I want to eliminate the records that are repeated using a single query.
How can this be accomplished in SQL Server using TSQL?
Asked by usr021986
Solution #1
This can be accomplished using window functions. Within each EmployeeName, the duplicates are ordered by empId, and all but the first are deleted.
delete x from (
select *, rn=row_number() over (partition by EmployeeName order by empId)
from Employee
) x
where rn > 1;
To see what would be erased, run it as a select:
select *
from (
select *, rn=row_number() over (partition by EmployeeName order by empId)
from Employee
) x
where rn > 1;
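To try the delete safely, something like the following sketch may help. The table definition is an assumption, since the question does not show one: empId is taken to be an identity primary key, and the varchar length is arbitrary. The delete runs inside a transaction so the result can be inspected and then rolled back.

```sql
-- Hypothetical schema: the question does not show the real table definition.
CREATE TABLE Employee (
    empId        int IDENTITY(1,1) PRIMARY KEY,  -- assumed surrogate key
    EmployeeName varchar(50) NOT NULL            -- assumed type and length
);

-- Sample data from the question.
INSERT INTO Employee (EmployeeName)
VALUES ('Anand'), ('Anand'), ('Anil'), ('Dipak'),
       ('Anil'), ('Dipak'), ('Dipak'), ('Anil');

BEGIN TRANSACTION;

-- Same delete as above: keep the first row per name, remove the rest.
DELETE x FROM (
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM Employee
) x
WHERE rn > 1;

-- Inspect what is left: one row per distinct EmployeeName.
SELECT empId, EmployeeName FROM Employee ORDER BY EmployeeName;

-- Undo the delete; use COMMIT TRANSACTION instead once the result looks right.
ROLLBACK TRANSACTION;
```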
Answered by John Gibb
Solution #2
Assuming your Employee table includes a unique field (ID in the example below), you can do something like this:
delete from Employee
where ID not in
(
select min(ID)
from Employee
group by EmployeeName
);
For each EmployeeName, the row with the lowest ID is kept.
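As a cautious first step, the same condition can be run as a select to list the rows the delete would remove. This is only a sketch and assumes, as above, that ID is a unique, non-null column on Employee.

```sql
-- Rows that the DELETE would remove: every row whose ID is not
-- the lowest ID for its EmployeeName.
SELECT *
FROM Employee
WHERE ID NOT IN
(
    SELECT MIN(ID)
    FROM Employee
    GROUP BY EmployeeName
);
```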
In response to McGyver’s comment: as of SQL 2012, MIN can be used with uniqueidentifier (GUID) columns.
For 2008 R2 and earlier, it cannot, so you’ll need to cast the GUID to a type that MIN supports, such as binary(16):
delete from GuidEmployees
where CAST(ID AS binary(16)) not in
(
select min(CAST(ID AS binary(16)))
from GuidEmployees
group by EmployeeName
);
SqlFiddle for various types in SQL 2008
SqlFiddle for various types in SQL 2012
Answered by StuartLC
Solution #3
You might want to try something like this:
delete T1
from MyTable T1, MyTable T2
where T1.dupField = T2.dupField
and T1.uniqueField > T2.uniqueField
(This requires that you have a unique integer-based field.)
Personally, I think you would be better served by preventing duplicate records from being entered into the database than by trying to fix them after the fact.
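One common way to prevent duplicates at the source is a unique constraint. A minimal sketch, reusing the placeholder names MyTable and dupField from the query above; the constraint name UQ_MyTable_dupField is made up for illustration:

```sql
-- Run this only after the existing duplicates have been removed;
-- the ALTER TABLE fails if duplicate values are still present.
-- UQ_MyTable_dupField is a hypothetical constraint name.
ALTER TABLE MyTable
ADD CONSTRAINT UQ_MyTable_dupField UNIQUE (dupField);
```

Once the constraint is in place, an insert or update that would create a duplicate dupField value is rejected with an error instead of being stored.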
Answered by Ben Cawley
Solution #4
DELETE
FROM MyTable
WHERE ID NOT IN (
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3);
Or, using a CTE with ROW_NUMBER:
WITH TempUsers (FirstName, LastName, duplicateRecordCount)
AS
(
SELECT FirstName, LastName,
ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName) AS duplicateRecordCount
FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1
Answered by Kumar Manish
Solution #5
WITH CTE AS
(
SELECT EmployeeName,
ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EmployeeName) AS R
FROM employee_table
)
DELETE CTE WHERE R > 1;
The magic of common table expressions.
Answered by Mostafa Elmoghazi
Post is based on https://stackoverflow.com/questions/3317433/delete-duplicate-records-in-sql-server