Consider the EmployeeName field in the Employee table. The goal is to eliminate duplicate records based on the EmployeeName field.
EmployeeName
------------
Anand
Anand
Anil
Dipak
Anil
Dipak
Dipak
Anil
I want to eliminate the records that are repeated using a single query.
How can this be accomplished in SQL Server using TSQL?
Asked by usr021986
This can be accomplished using window functions. Within each EmployeeName, the duplicates are ordered by empId and all but the first are deleted.
delete x from ( select *, rn=row_number() over (partition by EmployeeName order by empId) from Employee ) x where rn > 1;
To see what would be deleted, run it as a select:
select * from ( select *, rn=row_number() over (partition by EmployeeName order by empId) from Employee ) x where rn > 1;
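As a minimal worked sketch of the delete above, the question's sample names can be loaded into a scratch table. The temp table #Employee and its explicit empId values are assumptions made for illustration; the question does not show an empId column.

```sql
-- Hypothetical scratch table; empId is assumed to be a unique key.
CREATE TABLE #Employee (
    empId        int         NOT NULL PRIMARY KEY,
    EmployeeName varchar(50) NOT NULL
);

-- Sample names from the question, numbered in the order they appear.
INSERT INTO #Employee (empId, EmployeeName)
VALUES (1, 'Anand'), (2, 'Anand'), (3, 'Anil'), (4, 'Dipak'),
       (5, 'Anil'),  (6, 'Dipak'), (7, 'Dipak'), (8, 'Anil');

-- Number the rows within each name by empId and delete all but the first.
DELETE x
FROM (
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM #Employee
) x
WHERE rn > 1;

-- Remaining rows: (1, Anand), (3, Anil), (4, Dipak).
SELECT empId, EmployeeName FROM #Employee ORDER BY empId;

DROP TABLE #Employee;
```

The multi-row VALUES list needs SQL Server 2008 or later; on older versions, one INSERT per row would do the same job.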
Answered by John Gibb
Assuming your Employee table includes a unique field (ID in the example below), you can do something like this:
delete from Employee where ID not in ( select min(ID) from Employee group by EmployeeName );
For each EmployeeName, the row with the lowest ID is kept.
In response to McGyver's comment: as of SQL Server 2012, MIN can be used with uniqueidentifier columns.
In SQL Server 2008 R2 and earlier, MIN does not work with GUIDs, so you'll need to cast the GUID to a type that MIN supports, such as binary(16):
delete from GuidEmployees where CAST(ID AS binary(16)) not in ( select min(CAST(ID AS binary(16))) from GuidEmployees group by EmployeeName );
SqlFiddle for various types in SQL 2008
SqlFiddle for various types in SQL 2012
Answered by StuartLC
You might want to try something like this:
delete T1 from MyTable T1, MyTable T2 where T1.dupField = T2.dupField and T1.uniqueField > T2.uniqueField
(This requires that you have a unique, orderable field, such as an integer ID.)
Personally, I believe you would be better served by preventing duplicate records from being entered into the database in the first place, rather than trying to fix them after the fact.
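One way to prevent duplicates up front is a unique index on the column. This is only a sketch: the index name UX_Employee_EmployeeName is hypothetical, the Employee table and EmployeeName column are taken from the question, and it assumes the column has an indexable type and that existing duplicates have already been removed.

```sql
-- Hypothetical index name. Creation fails if duplicate EmployeeName
-- values are still present, so run one of the deletes first.
CREATE UNIQUE INDEX UX_Employee_EmployeeName
    ON Employee (EmployeeName);
```

Once the index exists, an INSERT or UPDATE that would produce a repeated EmployeeName is rejected with an error instead of silently creating a duplicate.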
Answered by Ben Cawley
DELETE FROM MyTable WHERE ID NOT IN ( SELECT MAX(ID) FROM MyTable GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
WITH TempUsers (FirstName, LastName, duplicateRecordCount) AS ( SELECT FirstName, LastName, ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName) AS duplicateRecordCount FROM dbo.Users ) DELETE FROM TempUsers WHERE duplicateRecordCount > 1
Answered by Kumar Manish
WITH CTE AS ( SELECT EmployeeName, ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EmployeeName) AS R FROM employee_table ) DELETE CTE WHERE R > 1;
The magic of common table expressions.
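Note that ordering by the same column used for partitioning leaves the choice of which row survives arbitrary, since every row in a partition ties. A hedged variant, assuming employee_table also has a unique ID column (not shown in the answer), orders by that column and runs inside a transaction so the result can be checked before it becomes permanent:

```sql
BEGIN TRANSACTION;

-- ID is an assumed unique column; ordering by it keeps the lowest ID per name.
WITH CTE AS (
    SELECT EmployeeName,
           ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY ID) AS R
    FROM employee_table
)
DELETE FROM CTE
WHERE R > 1;

-- Sanity check: this should return no rows if the cleanup worked.
SELECT EmployeeName, COUNT(*) AS cnt
FROM employee_table
GROUP BY EmployeeName
HAVING COUNT(*) > 1;

-- Undo the delete for this dry run; use COMMIT TRANSACTION to keep it.
ROLLBACK TRANSACTION;
```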
Answered by Mostafa Elmoghazi
Post is based on https://stackoverflow.com/questions/3317433/delete-duplicate-records-in-sql-server