SQL ,Data Warehouse and Testing Concepts: SQL-Deleting Duplicate Records from a Table

17 August 2013

SQL-Deleting Duplicate Records from a Table

Q How to delete duplicate records from a table ?
This is one of the famous question asked by an interviewer. Lets us see how to to delete duplicate records from a table. Table StudentInfo has duplicate records as shown below.

SudentInfo Table

Step 1: First we will create a temporary table say temp and copy all original table (StudentInfo Table) data into this very temporary table.

SELECT * INTO #Temp FROM StudentInfo

Now check data in temporary table # Temp

SELECT * FROM #Temp

Temporary Table

We see that all data from original table is copied into temporary table .

Step 2: Now we will delete all the duplicates record from the temporary table. For this, we will first group all records by ID and take the count of each group. Then we will delete those ID whose count is greater than 1.

DELETE FROM #Temp
WHERE ID IN (SELECT ID FROM StudentInfo GROUP BY ID HAVING COUNT(ID) > 1)

All 6 duplicates records are deleted from temporary table. 2 non duplicate record is left in the temp table.

SELECT * FROM #Temp

Step 3: Now we will insert distinct records of duplicate records into temp table from original table as shown below. In this way , now temp table holds all distinct records.

INSERT INTO #Temp SELECT DISTINCT Name,ID
FROM StudentInfo GROUP BY Name,ID
HAVING COUNT(ID) >1

Step 4: Empty the Original table(StudentInfo).
DELETE FROM StudentInfo

Step 5: Now we will insert all the distinct records present in the temp table to the original table StudentInfo.

INSERT INTO StudentInfo SELECT * FROM #Temp

Step 6: Since there is no use of temp , we can drop that table.

DROP TABLE #Temp
Step 7: Now , the original table contains only the distinct records. All duplicates has been removed. we can see this as mentioned below.

SELECT * FROM StudentInfo

There is other popular way to delete duplicate records from the table using ROWID :

DELETE FROM <TableName> WHERE ROWID NOT IN
( SELECT MIN(ROWID) FROM <TableName> GROUP BY <ColumnName1> <ColumnName2>...)

7 comments:

Unknown18 August 2013 at 08:38
nice one :)
ReplyDelete
Replies
Unknown18 August 2013 at 21:30
Gud one... :)
ReplyDelete
Replies
Unknown28 August 2013 at 01:48
Awesome piece of work :)
ReplyDelete
Replies
Sunil28 August 2013 at 09:08
Thank u Saurabh :)
ReplyDelete
Replies
Unknown24 October 2020 at 00:13
why can't we do it using dense_rank and partition
ReplyDelete
Replies

Add comment

Pages

17 August 2013

SQL-Deleting Duplicate Records from a Table

7 comments: