Operations involved in handling duplicate rows include the following:
- Preventing duplicates from being created in the frist place.
- Counting the number of duplicates to determine whether they are present and to what extent.
- Identifying duplicated values(or the rows containing them) so you can see where they occur.
- Eliminating duplicates to ensure that each row is unique.
Several tools are at your disposal for dealing with duplicate rows.choose them according to the objective that you want to achieve:
- When you create a table, include a primary key or unique index to prevent duplicates from being added to the table.MySQL uses the index as a constraint to enforce the requirement that each row in the table contains a unique key in the indexed column or columns.
- In conjunction with a unique index,the INSERT IGNORE and REPLACE statements enable you to handle insertion of duplicate rows gracefully without generating errors.
- To determine whether a table contains duplicates ,use group by to categorize rows into groups,and count() to see how many rows are in each group.
- SELECT DISTINCT removes duplicate row from a result set.
1.How to preventing duplicates from occurring in a table?
Problem
- prevent? a table from ever containing duplicates.
Solution:
- Use a PRIMARY KEY or a UNIQUE index
permits duplicate rows:
mysql> CREATE TABLE person ? ? -> ( ? ? -> ? last_name char(20), ? ? -> ? first_name char(20), ? ? -> ? address ?char(40) ? ? -> ); Query OK, 0 rows affected (0.01 sec)
prevent multiple rows, add a primary key to its fefinition. and then a primary key prohibits NULL values:
mysql> CREATE TABLE person ? ? -> ( ? ? -> ? last_name char(20) not null, ? ? -> ? first_name char(20) not null, ? ? -> ? address char(40), ? ? -> ? PRIMARY KEY (last_name,first_name) ? ? -> ); Query OK, 0 rows affected (0.01 sec)
mysql>?
Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table.The two types of indexes are similar,but a UNIQUE index can be created on columns that permit NULL values?
mysql> CREATE TABLE person ( ? ? ? ? -> ? last_name char(20) NOT NULL, ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? -> ? first_name char(20) NOT NULL, ? ? -> ? address ? char(40), ? ? -> ? UNIQUE ?(last_name,first_name) ? ? -> ); Query OK, 0 rows affected (0.02 sec)
mysql>?
Each person must be assigned some sort of unique identifier,which becomes the value that distingishes one row from another.In MySQL,it's common to accomplish this by using an AUTO_INCREMENT column:
mysql>? mysql> CREATE TABLE person ? ? -> ( ? ? -> ? id ? ? ? ? INT UNSIGNED NOT NULL AUTO_INCREMENT, ? ? -> ? last_name ?CHAR(20), ? ? -> ? first_name CHAR(20), ? ? -> ? address ? ?CHAR(40), ? ? -> ? PRIMARY KEY (id) ? ? -> ); Query OK, 0 rows affected (0.01 sec)
mysql>?
2.How to dealing with duplicates when loading rows into a table.
Problem:
You have created a table with a unique index to prevent duplicate values in the indexed column or columns. But this results in an error if you attempt to insert a duplicate row, and you want to avoid having to deal with such errors.
Solution:
- One approach is to just ignore the error.
- Another is to use an INSERT IGNORE,REPLACE or INSERT ... ON DUPLICATE KEY UPDATE statement
mysql> Drop table if exists person; Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE person ? ? -> ( ? ? -> ?last_name CHAR(20) NOT NULL, ? ? -> ?first_name CHAR(20) NOT NULL, ? ? -> ?address CHAR(40), ? ? -> ?PRIMARY KEY (last_name, first_name) ? ? -> ); Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO person (last_name, first_name) ? ? -> values('X1','Y1'); Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO person (last_name, first_name)? ? ? -> values('X1','Y1'); ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ERROR 1062 (23000): Duplicate entry 'X1-Y1' for key 'PRIMARY' mysql>?
MySQL Provides three single-query solutions to the problem of handling duplicate row.Choose from among them depending on the duplicate-handling behavior you want:
- To keep the original row when a duplicate occurs,use INSERT IGNORE than INSERT.
mysql> INSERT IGNORE INTO person (last_name, first_name)? ? ? -> VALUES('X2','Y2'); Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person (last_name, first_name)? ? ? -> VALUES('X2','Y2'); ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql>
- To replace the original row with the new one when a duplicate occurs,use REPLACE rather than INSERT.
mysql>? mysql> REPLACE INTO person (last_name,first_name) ? ? -> VALUES ('X3','Y3'); Query OK, 1 row affected (0.00 sec)
mysql> REPLACE INTO person (last_name,first_name)? ? ? -> VALUES ('X3','Y3'); ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? Query OK, 1 row affected (0.00 sec)
- To modify columns of an existing row when a duplicate occurs,use INSERT ... ON DUPLICATE KEY UPDATE.
mysql> CREATE TABLE poll_vote ? ? -> ( ? ? -> ?poll_id INT UNSIGNED NOT NULL AUTO_INCREMENT, ? ? -> ?candidate_id INT UNSIGNED, ? ? -> ?vote_count INT UNSIGNED, ? ? -> ?PRIMARY KEY (poll_id, candidate_id) ? ? -> ); Query OK, 0 rows affected (0.01 sec)
mysql> select * from poll_vote; Empty set (0.00 sec)
mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,3,1) ? ? -> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1; Query OK, 1 row affected (0.00 sec)
mysql> select * from poll_vote; +---------+--------------+------------+ | poll_id | candidate_id | vote_count | +---------+--------------+------------+ | ? ? ?14 | ? ? ? ? ? ?3 | ? ? ? ? ?1 | +---------+--------------+------------+ 1 row in set (0.00 sec)
mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,3,1) ? ? -> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1; Query OK, 2 rows affected (0.00 sec)
mysql> select * from poll_vote; +---------+--------------+------------+ | poll_id | candidate_id | vote_count | +---------+--------------+------------+ | ? ? ?14 | ? ? ? ? ? ?3 | ? ? ? ? ?2 | +---------+--------------+------------+ 1 row in set (0.00 sec)
mysql>?
3.How to counting and Identifying duplicates
Problem
You want to determine whether a table contains duplicates,and to what extent they occur.Or you want to see the rows that contain the duplicated values.
Solution
Use a counting summary that displays duplicated values.To see the rows in which the duplicated values occur.join the summary to the original table to display the matching rows.
mysql> CREATE TABLE catalog_list ( ? ? -> last_name CHAR(20) NOT NULL, ? ? -> first_name CHAR(20) NOT NULL, ? ? -> street CHAR(40), ? ? ? ? ? ? -> PRIMARY KEY (last_name, first_name) ? ? -> ); Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE catalog_list ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, street CHAR(40)); ERROR 1050 (42S01): Table 'catalog_list' already exists mysql> Drop table if exists catalog_list; Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE catalog_list ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, street CHAR(40)); Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Isaacson','Jim','515 Fordam St., Apt. 917') ? ? -> ; Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave.'); Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave.'); Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('McTavish','Taylor','432 River Run'); Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Pinter','Marlene','9 Sunset Trail'); Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Pinter','Marlene','9 Sunset Trail'); Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('BAXTER','WALLACE','57 3rd Ave.'); Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Brown','Bartholowmew','432 River Run'); Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave., Apt 102'); Query OK, 1 row affected (0.01 sec)
mysql> select count(*) as rows from catalog_list; +------+ | rows | +------+ | ? ?9 | +------+ 1 row in set (0.00 sec)
mysql> select count(*) - count(distinct last_name,first_name)? ? ? -> as 'duplicate names' ? ? -> from catalog_list; +-----------------+ | duplicate names | +-----------------+ | ? ? ? ? ? ? ? 4 | +-----------------+ 1 row in set (0.00 sec)
mysql> select count(distinct last_name,first_name)/ count(*) ? ? -> as 'unique', ? ? -> 1 - (count(distinct last_name,first_name) / count(*)) ? ? -> as 'nonunique' ? ? -> from catalog_list; +--------+-----------+ | unique | nonunique | +--------+-----------+ | 0.5556 | ? ?0.4444 | +--------+-----------+ 1 row in set (0.00 sec)
mysql> select count(*),last_name,first_name ? ? -> from catalog_list ? ? -> group by last_name,first_name ? ? -> having count(*) > 1; +----------+-----------+------------+ | count(*) | last_name | first_name | +----------+-----------+------------+ | ? ? ? ?4 | Baxter ? ?| Wallace ? ?| | ? ? ? ?2 | Pinter ? ?| Marlene ? ?| +----------+-----------+------------+ 2 rows in set (0.00 sec)
mysql>?
3.How to eliminating duplicates from a Table.
Problem:?
You want to remove duplicate rows from a table, leaving only unique rows.
Solution:
Select the unique rows from the table into a second table, then use it to replace the original one.Or use DELETE ...LIMIT n to remove all but one instance of a specific set of duplicates rows.
mysql> select * from catalog_list order by last_name,first_name; +-----------+--------------+--------------------------+ | last_name | first_name ? | street ? ? ? ? ? ? ? ? ? | +-----------+--------------+--------------------------+ | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | BAXTER ? ?| WALLACE ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave., Apt 102 ? ? | | Brown ? ? | Bartholowmew | 432 River Run ? ? ? ? ? ?| | Isaacson ?| Jim ? ? ? ? ?| 515 Fordam St., Apt. 917 | | McTavish ?| Taylor ? ? ? | 432 River Run ? ? ? ? ? ?| | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | +-----------+--------------+--------------------------+ 9 rows in set (0.00 sec) mysql>? mysql> CREATE TABLE tmp LIKE catalog_list; Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO tmp SELECT DISTINCT * FROM catalog_list; Query OK, 6 rows affected (0.01 sec) Records: 6 ?Duplicates: 0 ?Warnings: 0
mysql> select * from tmp order by last_name,first_name; +-----------+--------------+--------------------------+ | last_name | first_name ? | street ? ? ? ? ? ? ? ? ? | +-----------+--------------+--------------------------+ | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave., Apt 102 ? ? | | Brown ? ? | Bartholowmew | 432 River Run ? ? ? ? ? ?| | Isaacson ?| Jim ? ? ? ? ?| 515 Fordam St., Apt. 917 | | McTavish ?| Taylor ? ? ? | 432 River Run ? ? ? ? ? ?| | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | +-----------+--------------+--------------------------+ 6 rows in set (0.00 sec)
mysql> CREATE TABLE catalog_list_original LIKE catalog_list; ? ? ? ? ? ? ? ? ? ? Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO catalog_list_original SELECT * FROM catalog_list; ? ? ? ? ? ?? Query OK, 9 rows affected (0.00 sec) Records: 9 ?Duplicates: 0 ?Warnings: 0
mysql> drop table catalog_list; Query OK, 0 rows affected (0.01 sec)
mysql> rename table tmp to catalog_list; Query OK, 0 rows affected (0.00 sec)
mysql>?
Removing duplicates using table replacement
mysql> drop table catalog_list; Query OK, 0 rows affected (0.01 sec)
mysql> select * from catalog_list; ERROR 1146 (42S02): Table 'cookbook.catalog_list' doesn't exist mysql> create table catalog_list like catalog_list_original; Query OK, 0 rows affected (0.01 sec)
mysql> select * from catalog_list; Empty set (0.00 sec) mysql> insert into catalog_list select * from catalog_list_original; ? ? Query OK, 9 rows affected (0.01 sec) Records: 9 ?Duplicates: 0 ?Warnings: 0
mysql> CREATE TABLE tmp like catalog_list; Query OK, 0 rows affected (0.02 sec)
mysql> ALTER TABLE tmp ADD PRIMARY KEY (last_name,first_name); Query OK, 0 rows affected (0.03 sec) Records: 0 ?Duplicates: 0 ?Warnings: 0
mysql> INSERT IGNORE INTO tmp SELECT * FROM catalog_list; Query OK, 5 rows affected, 4 warnings (0.00 sec) Records: 9 ?Duplicates: 4 ?Warnings: 4
mysql> select * from tmp order by last_name,first_name; +-----------+--------------+--------------------------+ | last_name | first_name ? | street ? ? ? ? ? ? ? ? ? | +-----------+--------------+--------------------------+ | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Brown ? ? | Bartholowmew | 432 River Run ? ? ? ? ? ?| | Isaacson ?| Jim ? ? ? ? ?| 515 Fordam St., Apt. 917 | | McTavish ?| Taylor ? ? ? | 432 River Run ? ? ? ? ? ?| | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | +-----------+--------------+--------------------------+ 5 rows in set (0.00 sec)
mysql> drop table catalog_list; Query OK, 0 rows affected (0.01 sec) mysql> RENAME TABLE tmp to catalog_list; Query OK, 0 rows affected (0.01 sec)
Removing duplicates of a particular row
Use Limit to restrict the effect of a delete statement to a subset of the rows that it otherwise would delete.
mysql> select * from catalog_list; +-----------+--------------+--------------------------+ | last_name | first_name ? | street ? ? ? ? ? ? ? ? ? | +-----------+--------------+--------------------------+ | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Brown ? ? | Bartholowmew | 432 River Run ? ? ? ? ? ?| | Isaacson ?| Jim ? ? ? ? ?| 515 Fordam St., Apt. 917 | | McTavish ?| Taylor ? ? ? | 432 River Run ? ? ? ? ? ?| | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | +-----------+--------------+--------------------------+ 5 rows in set (0.00 sec)
mysql> drop table catalog_list; Query OK, 0 rows affected (0.02 sec)
mysql> create table catalog_list like catalog_list_original; Query OK, 0 rows affected (0.01 sec)
mysql> select * from catalog_list; Empty set (0.00 sec)
mysql> insert into catalog_list select * from catalog_list_original; Query OK, 9 rows affected (0.01 sec) Records: 9 ?Duplicates: 0 ?Warnings: 0
mysql> select * from catalog_list; +-----------+--------------+--------------------------+ | last_name | first_name ? | street ? ? ? ? ? ? ? ? ? | +-----------+--------------+--------------------------+ | Isaacson ?| Jim ? ? ? ? ?| 515 Fordam St., Apt. 917 | | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | McTavish ?| Taylor ? ? ? | 432 River Run ? ? ? ? ? ?| | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | | BAXTER ? ?| WALLACE ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Brown ? ? | Bartholowmew | 432 River Run ? ? ? ? ? ?| | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave., Apt 102 ? ? | +-----------+--------------+--------------------------+ 9 rows in set (0.00 sec)
mysql> select count(*),last_name,first_name ? ? -> from catalog_list ? ? -> group by last_name,first_name ? ? -> having count(*) > 1; +----------+-----------+------------+ | count(*) | last_name | first_name | +----------+-----------+------------+ | ? ? ? ?4 | Baxter ? ?| Wallace ? ?| | ? ? ? ?2 | Pinter ? ?| Marlene ? ?| +----------+-----------+------------+ 2 rows in set (0.00 sec)
mysql> DELETE FROM catalog_list where last_name = 'Baxter' ? ? -> AND first_name = 'Wallace' limit 2; Query OK, 2 rows affected (0.00 sec)
mysql> DELETE FROM catalog_list where last_name = 'Pinter' ? ? -> AND first_name = 'Marlene' limit 1; Query OK, 1 row affected (0.00 sec)
mysql> select * from catalog_list; +-----------+--------------+--------------------------+ | last_name | first_name ? | street ? ? ? ? ? ? ? ? ? | +-----------+--------------+--------------------------+ | Isaacson ?| Jim ? ? ? ? ?| 515 Fordam St., Apt. 917 | | McTavish ?| Taylor ? ? ? | 432 River Run ? ? ? ? ? ?| | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | | BAXTER ? ?| WALLACE ? ? ?| 57 3rd Ave. ? ? ? ? ? ? ?| | Brown ? ? | Bartholowmew | 432 River Run ? ? ? ? ? ?| | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave., Apt 102 ? ? | +-----------+--------------+--------------------------+ 6 rows in set (0.00 sec)
mysql> DELETE FROM catalog_list where last_name = 'Baxter' AND first_name = 'Wallace' limit 1; Query OK, 1 row affected (0.01 sec)
mysql> select * from catalog_list; +-----------+--------------+--------------------------+ | last_name | first_name ? | street ? ? ? ? ? ? ? ? ? | +-----------+--------------+--------------------------+ | Isaacson ?| Jim ? ? ? ? ?| 515 Fordam St., Apt. 917 | | McTavish ?| Taylor ? ? ? | 432 River Run ? ? ? ? ? ?| | Pinter ? ?| Marlene ? ? ?| 9 Sunset Trail ? ? ? ? ? | | Brown ? ? | Bartholowmew | 432 River Run ? ? ? ? ? ?| | Baxter ? ?| Wallace ? ? ?| 57 3rd Ave., Apt 102 ? ? | +-----------+--------------+--------------------------+ 5 rows in set (0.00 sec)
mysql>?
|