Pages

17 August 2013

SQL-Deleting Duplicate Records from a Table

Q How to delete duplicate records from a table ?
      This is one of the famous question asked by  an interviewer. Lets us see how to to delete duplicate records from a table. Table StudentInfo has duplicate records as shown below.


SudentInfo Table
Step 1: First we will create a temporary table say temp and copy all original table (StudentInfo Table) data into this very temporary table.


SELECT * INTO #Temp FROM StudentInfo

Now check data in temporary table # Temp

SELECT * FROM #Temp


Temporary Table

We see that all data from original table is copied into temporary table .

Step 2: Now we will delete all the duplicates record from the temporary table. For this, we will first group all records by ID and take the count of each group. Then we will delete those ID whose count is greater than 1.

DELETE FROM #Temp
WHERE ID IN (SELECT ID FROM StudentInfo GROUP BY ID HAVING COUNT(ID) > 1)






 All 6 duplicates records are deleted from temporary table. 2 non duplicate record is left in the temp table.

SELECT * FROM #Temp



Step 3: Now we will insert distinct records of duplicate records into temp table from original table as shown below. In this way , now temp table holds all distinct records.

INSERT INTO #Temp SELECT DISTINCT Name,ID 
FROM StudentInfo GROUP BY Name,ID 
HAVING COUNT(ID) >1




Step 4: Empty the Original table(StudentInfo).
DELETE FROM StudentInfo



Step 5: Now we will insert all the distinct records present in the temp table to the original table StudentInfo.

INSERT INTO StudentInfo SELECT * FROM #Temp



Step 6: Since there is no use of temp , we can drop that table.

DROP TABLE #Temp
Step 7: Now , the original table contains only the distinct records. All duplicates has been removed. we can see this as mentioned below.

SELECT * FROM StudentInfo




There is other popular way to delete duplicate records from the table using ROWID :


DELETE FROM  <TableName> WHERE ROWID NOT IN
 ( SELECT MIN(ROWID) FROM <TableName> GROUP BY  <ColumnName1> <ColumnName2>...)



7 comments: