Duplicate entries in SQlite database

EMTools · June 4, 2008, 3:20pm

Hi there,

I was wondering what the best way would be to quickly remove/delete duplicate entries from a SQlite table.

also

How would I find such entries to begin with ?

eg .

If I have a table with rows and columns

[b] NAME | PLACE | FRIEND | PET1 | OTHER ITEM

1 SOME NAME | SOME PLACE | SOME FRIEND | DOGS NAME | CATS NAME

2 SOME NAME | SOME PLACE | SOME FRIEND | DOGS NAME | CATS NAME

3 SOME NAME | SOME PLACE | SOME FRIEND | DOGS NAME | CATS NAME

4 ANOTHER NAME | ANOTHER PLACE | ANOTHER FRIEND | BIRDS NAME | CARS NAME

5 ANOTHER NAME | ANOTHER PLACE | ANOTHER FRIEND | BIRDS NAME | CARS NAME[/b]

The delete should work even if you don’t know what the value of name is, so simply for anything that is duplicate.
The distinct delete should delete rows 2, 3, 5 and just keep 1 and 4,

Also the table has several columns and must thus evaluate the contents of each column.

So the entire row as an entity should not have a duplicate row anywhere else in the table.

Is there a single SQL statement that can achieve this using do shell script ?

or

should I check for such entries before inserting them into the database? This would slow down importing stuff to the database considerably right?

Thanks in advance

EM

EMTools · June 4, 2008, 7:58pm

Any ideas?

James_Nierodzik · June 4, 2008, 8:06pm

I may have an idea here, be back after some testing.
.
Okay I’m back. This works on my end, but since I’m brand spanking new to SQL(ite) no blaming me if your mileage varies

-- SQL Vars
set sqlPath to "/usr/bin/sqlite3"
set dbPath to (path to desktop as Unicode text) & "DB Project:test.db"
set tableName to "tab1"

-- Remove duplicates
set _tempName to tableName & "old"
set sqlCommand to "\"ALTER TABLE " & tableName & " RENAME TO " & _tempName & ";CREATE TABLE " & tableName & " AS SELECT DISTINCT * FROM " & _tempName & ";DROP TABLE " & _tempName & ";\""
set shellCommand to "echo " & sqlCommand & " | " & sqlPath & space & (quoted form of POSIX path of dbPath)
do shell script shellCommand

From the research I read online there isn’t very many ways to actually do this so hopefully this works for you.

Most information was gathered from here and then adopted for SQLite - http://www.sqlteam.com/article/deleting-duplicate-records

barefeet · June 14, 2008, 7:25am

I’ll assume that your table is called “Contacts”

If you already have the data in the table and you just want to show the distinct rows (ie no repeated duplicates), then you can do this in SQLite:

select distinct * from Contacts;

to show just the two unique rows.

If you want to instead actually permanently remove the duplicates, you can use SQLite’s hidden “rowid” column to find all duplicate rows that aren’t the first row in a matching select:

delete from Contacts where rowid in
(
select rowid from Contacts as Duplicates where rowid >
(
select min(rowid) from Contacts as First
where First.Name = Duplicates.Name
and First.Place = Duplicates.Place
and First.Friend = Duplicates.Friend
and First.Pet1 = Duplicates.Pet1
and First.“Other Item” = Duplicates.“Other Item”
)
)
;

Just remove the encapsulating “delete” statement above. You can add the “*” the the select statement to show all the other columns (as well as the hidden rowid column) in the output:

select rowid, * from Contacts as Duplicates where rowid >
(
select min(rowid) from Contacts as First
where First.Name = Duplicates.Name
and First.Place = Duplicates.Place
and First.Friend = Duplicates.Friend
and First.Pet1 = Duplicates.Pet1
and First.“Other Item” = Duplicates.“Other Item”
)
;

Yes, I think it would be simpler, faster and best to check for duplicates when importing. It would take less time than checking after they’re all imported.

You can simply define a unique index:

create unique index “Contacts unique” on Contacts ( Name, Place, Friend, Pet1, “Other Item” );

and when you insert your data from the import, use the “insert or ignore” command, which will then ignore duplicate entries.

For many reasons, I suggest against creating and renaming tables as shown in another suggestion. You should confine ongoing data manipulation to the structure of existing tables.

For completeness, here is the SQL I used to create your sample table and data:

create table Contacts
(
Name text
, Place text
, Friend text
, Pet1 text
, “Other Item” text
)
;
insert into Contacts values ( ‘some name’, ‘some place’, ‘some friend’, ‘dogs name’, ‘cats name’ );
insert into Contacts values ( ‘some name’, ‘some place’, ‘some friend’, ‘dogs name’, ‘cats name’ );
insert into Contacts values ( ‘some name’, ‘some place’, ‘some friend’, ‘dogs name’, ‘cats name’ );

insert into Contacts values ( ‘another name’, ‘some place’, ‘another friend’, ‘birds name’, ‘cars name’ );
insert into Contacts values ( ‘another name’, ‘some place’, ‘another friend’, ‘birds name’, ‘cars name’ );

Tom
BareFeet

SQLite tools for Mac OS X compared:
http://www.tandb.com.au/sqlite/compare/

Duplicate entries in SQlite database

Tom BareFeet

Tom
BareFeet