Page 1 of 1

SQL Command for finding duplicates in 4 different tables

Posted: Tue Apr 05, 2011 5:24 pm
by olga
Hello,

I need help with the following issue:

I want to remove duplicates emails from 4 different tables ( All 4 tables have the same columns: category, compnay's name, email address and website).What is the SQL command for this action, is there a way of deleting the duplicates automatically?

I was using this for checking the duplicates in 2 tables:
SELECT "Email" FROM "Table1", "Table2" WHERE "Table1"."Email" = "Table2"."Email"

it does notwork for 4 tables...Please help me!
Thank you very much,$

Olga

Re: SQL Command for finding duplicates in 4 different tables

Posted: Mon Apr 11, 2011 12:14 pm
by rudolfo
The SQL language has an operation UNION that does exactly this. Builds a collection of records from multiple tables but ignores any 2nd or 3rd appearance of the same record.

Code: Select all

SELECT category, compnay_name, email_address, website FROM table_1
UNION
SELECT category, compnay_name, email_address, website FROM table_2
UNION
SELECT category, compnay_name, email_address, website FROM table_3
UNION
SELECT category, compnay_name, email_address, website FROM table_4
Leave the individual tables alone and access the data through a query that uses this kind of statement.

If you are really eager to use as less storage as possible, you can find duplicates with the following query:

Code: Select all

SELECT category, compnay_name, email_address, website, count(*)
  FROM (SELECT category, compnay_name, email_address, website FROM table_1
        UNION ALL
        SELECT category, compnay_name, email_address, website FROM table_2
        UNION ALL
        SELECT category, compnay_name, email_address, website FROM table_3
        UNION ALL
        SELECT category, compnay_name, email_address, website FROM table_4)
 GROUP BY category, compnay_name, email_address, website
 HAVING count(*) > 1
First you build a collection of all the records in all 4 tables (the ALL with the UNION means not to filter any duplicates) then you see which combination of "category, compnay_name, email_address, website" appears more then only one time in this overall collection.
This gives you an overview but unless you process this with only 3 tables or only 2 tables you won't be able to say, if the duplicate barney@exmaple.org is in table_1 and table_2 or if it is in table_2 and table_4.

Instead of cleaning you would rather like to build a new table with the combined/merged data of the 4 existing ones.

Code: Select all

CREATE TABLE good_table AS
SELECT category, compnay_name, email_address, website FROM table_1
UNION
 ...
UNION
SELECT category, compnay_name, email_address, website FROM table_4
Of course if you have several other fields in your tables as well, the data in that tables would be lost this way.