How to Delete Duplicate Rows in SQL?

In this section, we learn different ways to delete duplicate rows in MySQL and Oracle. If the SQL table contains duplicate rows, then we have to remove the duplicate rows.

Preparing sample data

The script creates the table named contacts.

DROP TABLE IF EXISTS contacts;  

CREATE TABLE contacts (  

id INT PRIMARY KEY AUTO_INCREMENT,  

first_name VARCHAR(30) NOT NULL,  

last_name VARCHAR(25) NOT NULL,   

    email VARCHAR(210) NOT NULL,  

    age VARCHAR(22) NOT NULL  

); 

    In the above table, we have inserted the following data.

    INSERT INTO contacts (first_name,last_name,email,age)   
    
    VALUES ('Kavin','Peterson','[email protected]','21'),  
    
           ('Nick','Jonas','[email protected]','18'),  
    
           ('Peter','Heaven','[email protected]','23'),  
    
           ('Michal','Jackson','[email protected]','22'),  
    
           ('Sean','Bean','[email protected]','23'),  
    
           ('Tom ','Baker','[email protected]','20'),  
    
           ('Ben','Barnes','[email protected]','17'),  
    
           ('Mischa ','Barton','[email protected]','18'),  
    
           ('Sean','Bean','[email protected]','16'),  
    
           ('Eliza','Bennett','[email protected]','25'),  
    
           ('Michal','Krane','[email protected]','25'),  
    
           ('Peter','Heaven','[email protected]','20'),  
    
           ('Brian','Blessed','[email protected]','20');  
    
           ('Kavin','Peterson','[email protected]','30'),  

      We execute the script to recreate test data after executing a DELETE statement.

      The query returns data from the contacts table:

      SELECT * FROM contacts  
      
      ORDER BY email;  
        idfirst_namelast_nameEmailage
        7BenBarnes[email protected]21
        13BrianBlessed[email protected]18
        10ElizaBennett[email protected]23
        1KavinPeterson[email protected]22
        14KavinPeterson[email protected]23
        8MischaBarton[email protected]20
        11MichalKrane[email protected]17
        4MichalJackson[email protected]18
        2NickJonas[email protected]16
        3PeterHeaven[email protected]25
        12PeterHeaven[email protected]25
        5SeanBean[email protected]20
        9SeanBean[email protected]20
        6TomBaker[email protected]30

        The following SQL query returns the duplicate emails from the contact table:

        SELECT  
        
            email, COUNT(email)  
        
        FROM  
        
            contacts  
        
        GROUP BY  
        
            email  
        
        HAVING  
        
        COUNT (email) > 1; 
          emailCOUNT(email)
          [email protected]2
          [email protected]2
          [email protected]2

          We have three rows with duplicate emails.

          (A) Delete duplicate rows with the DELETE JOIN statement

          DELETE t1 FROM contacts t1  
          
          INNERJOIN contacts t2   
          
          WHERE  
          
              t1.id < t2.id AND  
          
              t1.email = t2.email; 

            Output:

            Query OK, three rows affected (0.10 sec)  

            Three rows had been deleted. We execute the query, given below to finds the duplicate emails from the table.

            SELECT  
            
                email,   
            
            COUNT (email)  
            
            FROM  
            
                contacts  
            
            GROUP BY  
            
                email  
            
            HAVING  
            
            COUNT (email) > 1;

            The query returns the empty set. To verify the data from the contacts table, execute the following SQL query:

            SELECT * FROM contacts;  
            idfirst_namelast_nameEmailage
            7BenBarnes[email protected]21
            13BrianBlessed[email protected]18
            10ElizaBennett[email protected]23
            1KavinPeterson[email protected]22
            8MischaBarton[email protected]20
            11MichaKrane[email protected]17
            4MichalJackson[email protected]18
            2NickJonas[email protected]16
            3PeterHeaven[email protected]25
            5SeanBean[email protected]20
            6TomBaker[email protected]30

            The rows id’s 9, 12, and 14 have been deleted. We use the below statement to delete the duplicate rows:

            Execute the script for creating the contact.

            DELETE c1 FROM contacts c1  
            
            INNERJ OIN contacts c2   
            
            WHERE  
            
                c1.id > c2.id AND  
            
                c1.email = c2.email;
            idfirst_namelast_nameemailage
            1BenBarnes[email protected]21
            2KavinPeterson[email protected]22
            3BrianBlessed[email protected]18
            4NickJonas[email protected]16
            5MichalKrane[email protected]17
            6ElizaBennett[email protected]23
            7MichalJackson[email protected]18
            8SeanBean[email protected]20
            9MischaBarton[email protected]20
            10PeterHeaven[email protected]25
            11TomBaker[email protected]30

            (B) Delete duplicate rows using an intermediate table

            To delete a duplicate row by using the intermediate table, follow the steps given below:

            Step 1. Create a new table structure, same as the real table:

            CREATE TABLE source_copy LIKE source;  

            Step 2. Insert the distinct rows from the original schedule of the database:

            INSERT INTO source_copy  
            
            SELECT * FROM source  
            
            GROUP BY col; 

              Step 3. Drop the original table and rename the immediate table to the original one.

              DROP TABLE source;  
              
              ALTER TABLE source_copy RENAME TO source;  

                For example, the following statements delete the rows with duplicate emails from the contacts table:

                -- step 1  
                
                CREATE TABLE contacts_temp  
                
                LIKE contacts;  
                
                  
                
                -- step 2  
                
                INSERT INTO contacts_temp  
                
                SELECT * FROM contacts   
                
                GROUP BY email;  
                
                  
                
                -- step 3  
                
                DROP TABLE contacts;  
                
                  
                
                ALTER TABLE contacts_temp  
                
                RENAME TO contacts; 

                  (C) Delete duplicate rows using the ROW_NUMBER() Function

                  Note: The ROW_NUMBER() function has been supported since MySQL version 8.02, so we should check our MySQL version before using the function.

                  The following statement uses the ROW_NUMBER () to assign a sequential integer to every row. If the email is duplicate, the row will higher than one.

                  SELECT id, email, ROW_NUMBER()   
                  
                  OVER (PARTITION BY email   
                  
                  ORDER BY email  
                  
                      ) AS row_num  
                  
                  FROM contacts; 

                    The following SQL query returns id list of the duplicate rows:

                    SELECT id  
                    
                    FROM (SELECT id,  
                    
                    ROW_NUMBER() OVER (  
                    
                    PARTITION BY email ORDER BY email) AS row_num  
                    
                    FROM  
                    
                    contacts  
                    
                    ) t  
                    
                    WHERE  
                    
                    row_num> 1; 

                      Output:

                      id
                      9
                      12
                      14

                      Delete Duplicate Records in Oracle

                      When we found the duplicate records in the table, we had to delete the unwanted copies to keep our data clean and unique. If a table has duplicate rows, we can delete it by using the DELETE statement.

                      In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.

                      Consider the table given below:

                      VEGETABLE_IDVEGETABLE_NAMECOLOR
                      01PotatoBrown
                      02PotatoBrown
                      03OnionRed
                      04OnionRed
                      05OnionRed
                      06PumpkinGreen
                      07PumpkinYellow

                      -- create the vegetable table  

                      CREATE TABLE vegetables (  

                      VEGETABLE_ID NUMBER generated BY DEFAULT AS ID ENTITY,  

                      VEGETABLE_NAME VARCHAR2(100),  

                      color VARCHAR2(20),  

                              PRIMARY KEY (VEGETABLE_ID)  

                      ); 

                      -- insert sample rows  

                      INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');  

                      INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');  

                      INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');  

                      INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');  

                      INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');  

                      INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Green');  

                      INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Yellow'); 

                      -- query data from the vegetable table  
                      
                      SELECT * FROM vegetables;  

                        Suppose, we want to keep the row with the highest VEGETABLE_ID and delete all other copies.

                        SELECT  
                        
                        MAX (VEGETABLE_ID)  
                        
                        FROM  
                        
                        vegetables  
                        
                        GROUP BY  
                        
                        VEGETABLE_NAME,  
                        
                        color  
                        
                        ORDER BY  
                        
                        MAX(VEGETABLE_ID);  
                          MAX(VEGETABLE_ID)
                          2
                          5
                          6
                          7

                          We use the DELETE statement to delete the rows whose values in the VEGETABLE_ID COLUMN are not the highest.

                          DELETE FROM  
                          
                          vegetables  
                          
                          WHERE  
                          
                          VEGETABLE_IDNOTIN  
                          
                            (  
                          
                          SELECT  
                          
                          MAX(VEGETABLE_ID)  
                          
                          FROM  
                          
                          vegetables  
                          
                          GROUP BY  
                          
                          VEGETABLE_NAME,  
                          
                          color  
                          
                          );  

                            Three rows have been deleted.

                            SELECT *FROM vegetables;  
                            VEGETABLE_IDVEGETABLE_NAMECOLOR
                            02PotatoBrown
                            05OnionRed
                            06PumpkinGreen
                            07Yellow

                            If we want to keep the row with the lowest id, use the MIN() function instead of the MAX() function.

                            DELETE FROM  
                            
                            vegetables  
                            
                            WHERE  
                            
                            VEGETABLE_IDNOTIN  
                            
                              (  
                            
                            SELECT  
                            
                            MIN(VEGETABLE_ID)  
                            
                            FROM  
                            
                            vegetables  
                            
                            GROUP BY  
                            
                            VEGETABLE_NAME,  
                            
                            color  
                            
                              ); 

                              The above method works if we have a column that is not part of the group for evaluating duplicate. If all values in the columns have copies, then we cannot use the VEGETABLE_ID column.

                              Let’s drop and create the vegetable table with a new structure.

                              DROP TABLE vegetables;  

                              CREATE TABLE vegetables (  

                              VEGETABLE_ID NUMBER,  

                              VEGETABLE_NAME VARCHAR2(100),  

                              Color VARCHAR2(20)  

                              );  

                              INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1,'Potato','Brown');  
                              
                              INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1, 'Potato','Brown');  
                              
                              INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');  
                              
                              INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');  
                              
                              INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(2,'Onion','Red');  
                              
                              INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(3,'Pumpkin','Green');  
                              
                              INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES('4,Pumpkin','Yellow');  
                              
                                
                              
                              SELECT * FROM vegetables; 
                                VEGETABLE_IDVEGETABLE_NAMECOLOR
                                01PotatoBrown
                                01PotatoBrown
                                02OnionRed
                                02OnionRed
                                02OnionRed
                                03PumpkinGreen
                                04PumpkinYellow

                                In the vegetable table, the values in all columns VEGETABLE_ID, VEGETABLE_NAME, and color have been copied.

                                We can use the rowid, a locator that specifies where Oracle stores the row. Because the rowid is unique so that we can use it to remove the duplicates rows.

                                DELETE  
                                
                                FROM  
                                
                                Vegetables  
                                
                                WHERE  
                                
                                rowed NOT IN  
                                
                                (  
                                
                                SELECT  
                                
                                MIN(rowid)  
                                
                                FROM  
                                
                                vegetables  
                                
                                GROUP BY  
                                
                                VEGETABLE_ID,  
                                
                                VEGETABLE_NAME,  
                                
                                color  
                                
                                  );

                                The query verifies the deletion operation:

                                SELECT * FROM vegetables;  
                                VEGETABLE_IDVEGETABLE_NAMECOLOR
                                01PotatoBrown
                                02OnionRed
                                03PumpkinGreen
                                04PumpkinYellow

                                Comments

                                Leave a Reply

                                Your email address will not be published. Required fields are marked *