Can we remove duplicate records?

NewComer

Active member
Joined
Jan 2, 2010
Messages
34
Programming Experience
Beginner
I have some duplicate records in my MS Access table & if I use DELETE command, it will delete all.

- Can we have a simple command to delete one record (but remain one left)?

*) I try to use command "ALTER IGNORE TABLE mytable ADD UNIQUE INDEX(Col1,Col2)", but VB doesn't accept it!

Note: If I set Primary keys, then it can cause Error Message display in run-time ... I don't want that to be happen

Thank for any help :confused:
 
You can execute a query to get the minimum ID value for each set of values for the other columns, e.g.
VB.NET:
SELECT MIN(ID), FirstName, LastName
FROM Person
GROUP BY FirstName, LastName
You can then extend that to get just the IDs from that:
VB.NET:
SELECT ID FROM
(
    SELECT MIN(ID), FirstName, LastName
    FROM Person
    GROUP BY FirstName, LastName
) a
You can then extend that to delete every record where the ID is not in that list:
VB.NET:
DELETE FROM Person
WHERE ID NOT IN
(
    SELECT ID FROM
    (
        SELECT MIN(ID), FirstName, LastName
        FROM Person
        GROUP BY FirstName, LastName
    ) a
)
I'm not 100% sure that that exact syntax will work in Access but the principle will be the same.
 
I think the syntax isn't good for VB and MS Access table

Hi jmcilhinney,

I did try your suggestion but the VB debug shows error of the first command, I think the syntax isn't good for VB and MS Access table!

I also search for similar syntax for VB & MS Access table ... but found nothing close!

Thank and do you have any suggestion?
 
Hi jmcilhinney,

I found my problem from your suggestion, my table doesn't have ID column! I did insert it and the first syntax went through! However, the second one shows error again! By any chance, you can explain why we have the a) at the end of the syntax?

Thank for help!
 
Actually, now that you mention it, that 'a' probably shouldn't be there. That would be applicable when you wanted to join with the result set of that subquery, not when you want to use it in an IN clause. Sorry, I typed that straight into the forum so it's not been tested against an actual database.
 
YYou can then extend that to delete every record where the ID is not in that list:
VB.NET:
DELETE FROM Person
WHERE ID NOT IN
(
    SELECT ID FROM
    (
        SELECT MIN(ID), FirstName, LastName
        FROM Person
        GROUP BY FirstName, LastName
    ) a
)
I'm not 100% sure that that exact syntax will work in Access but the principle will be the same.

I typically advocate NOT using IN for this. Poorly implemented IN operations can cause a massive performance hit because they literally scan the input list in order for every row. A 1000 row query could end up doing a million comparisons. Never use IN on lists longer than you'd be prepared to type by hand. You'd be better off inserting the grouped results into another table, deleting the contents of the first table, and then moving the results back and deleting the temporary table

Actually, you'd be better off setting a primary key or unique index (dpending if you will allow nulls) in the first place ; it's more efficient not to insert duplciate data in the first place than take it out later, and if you have a criteria that identifies a duplicate row (such as same name and age) this is a candidate for a unique key

If there's an error in the program when duplicate data is attempted to be inserted that's actually the end result you want, so you can show the user a message "username already exists" or something similar
 
1. Yes you are right, I notify the execution lasting very long & might cause time-out. However, with the auto-increment ID ... now I have the different ID for each record & I found the way to eliminate the duplicate record, the hard way!

2. I try to avoid set the Primary Keys, because I am not sure where the error duplicate message happen? If it when I try to INSERT new record & in my TRY & CATCH ... then I can ignore the error (it does exist before)

Thank for the explanation
 
I set Primary keys then I can catch errors without duplicate records

I set Primary keys with Try & Catch errors, then try a force aome duplicate errors ... I did catch all of them & my table doesn't allow duplicate any more!

Cheers :D
 
However, with the auto-increment ID ... now I have the different ID for each record & I found the way to eliminate the duplicate record, the hard way
Well either:
Your data is its own key (i.e. it is a rule of your app that no two people shall have the same name and age)
Your data needs a separate key (i.e. any of your data may be duplicated)

If the first case, make the key on the data. If the second, make the key on the auto ID. DONT make an autoincrement ID then start a long process to remove duplicates! Stop duplicates in the first place!

2. I try to avoid set the Primary Keys, because I am not sure where the error duplicate message happen? If it when I try to INSERT new record & in my TRY & CATCH ... then I can ignore the error (it does exist before)

You shouldnt be updating primary keys anyway, so it can only happen during INSERT, but still .. what's the problem You do the insert, it throws an error, you CATCH the error in your code, and deal with it.. You should only be running the INSERT from one location in your code anyway
 
Thanks

As I mentioned before your suggestions that I did set 2 Primary keys & remove the ID autoincrement ... every thing working fine now!

Thank anyways
 
Back
Top