Determining whether all list items are unique

Perfect_Record · November 9, 2011, 10:13pm

I have a list of up to 100 strings, and would like to determine whether every string in the list is unique.
I could use a brute force method of starting with the first item in the list and comparing it to every other in a loop, then move on to the second, etc.

Is there a faster/smarter method?

Hans-Gerd_Classen · November 10, 2011, 9:07am

Hi,

have a look at the satimage.osax and its “sortlist → remove duplicates”

Have a nice day

Nigel_Garvey · November 10, 2011, 10:28am

AppleScript’s ‘contains’ or ‘is in’ operators can do the “comparing with every other” for you:

set listOfStrings to paragraphs 1 thru 100 of (read (choose file))
set end of listOfStrings to some item of listOfStrings -- Deliberate duplicate for testing.

set restOfStrings to listOfStrings

repeat with i from 1 to (count listOfStrings) - 1
	set str to item i of listOfStrings
	set restOfStrings to rest of restOfStrings
	if (restOfStrings contains str) then display dialog "The string \"" & str & "\" is repeated."
end repeat

Or the other way round:

set listOfStrings to paragraphs 1 thru 100 of (read (choose file))
set end of listOfStrings to some item of listOfStrings -- Deliberate duplicate for testing.

set previousStrings to {}
set beginning of previousStrings to item 1 of listOfStrings

repeat with i from 2 to (count listOfStrings)
	set str to item i of listOfStrings
	if (previousStrings contains str) then display dialog "The string \"" & str & "\" is repeated."
	set end of previousStrings to str
end repeat

DJ_Bazzie_Wazzie · November 10, 2011, 12:02pm

For sortlist and remove duplicates you can use the shell command as well and don’t need the satimage.

set theList to {"a", "hello", "goodbye", "a", "another string", "fake text", "fake", "goodbye"}

set AppleScript's text item delimiters to character id 10
set theString to theList as string
set AppleScript's text item delimiters to ""

set sortedList to every paragraph of (do shell script "/bin/echo -n " & quoted form of theString & "| sort -u")

set listIsUnique to (length of theList) = (length of sortedList)

Hans-Gerd_Classen · November 10, 2011, 2:05pm

Hi,

I remembered hearing about some limitations of the sort command: http://macscripter.net/viewtopic.php?id=34264 (last post by Shane Stanley)

Have a nice day

Marc_Anthony · November 10, 2011, 3:25pm

Hey, guys. I didn’t actually test Nigel’s first method, but I’ve previously researched the rest command; it can actually be slower than a standard repeat loop, although it might not be noticeable on such a small list.

DJ_Bazzie_Wazzie · November 10, 2011, 4:56pm

Shane is right about that. The most advanced operating system has the worst locale support of all the *nix variants. Still, we can use another collation to get close to our expectations. For me the following code works well enough with sort:

set theList to {"a", "hello", "goodbye", "a", "another string", "änother string", "ä", "fake text", "fake", "goodbye"}

set AppleScript's text item delimiters to character id 10
set theString to theList as string
set AppleScript's text item delimiters to ""

set sortedList to (do shell script "/bin/echo -n " & quoted form of theString & " | LC_ALL=nl_NL.ISO8859-1 sort -u")

I used the Dutch (nl_NL) locale here, which eventually ends up at the standard Latin la_LN.ISO8859-1 collation, to avoid the default la_LN.US-ASCII collation.

Hans-Gerd_Classen · November 10, 2011, 5:22pm

Hi DJ,

works fine

Have to say that an old-fashioned AS handler will do the job (without sorting, which wasn’t part of the question …) pretty fast too …

set theList to {"a", "hello", "goodbye", "a", "another string", "änother string", "ä", "fake text", "fake", "goodbye"}

set theList to my uniqueArray(theList)

on uniqueArray(ain)
	set aout to {}
	repeat with i from 1 to count of ain
		set theItem to item i of ain
		if aout does not contain theItem then set end of aout to theItem
	end repeat
	return aout
end uniqueArray

Nigel_Garvey · November 10, 2011, 5:52pm

While the object of this thread is to “determine whether every string in the list is unique”, it’s the removal of duplicates rather than the actual sort order which is relevant to the sort-removing-duplicates methods here.

Another thing is that the Unix sort method won’t work correctly with strings containing more than one paragraph.
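
A minimal sketch of that failure, reusing the sort -u approach from earlier in the thread. The list contents are made up for illustration. All three items are distinct, yet sort works line by line, so the counts no longer match:

```applescript
-- Hypothetical test data: three distinct strings, one of which spans two paragraphs.
set theList to {"a" & linefeed & "b", "a", "b"}

-- Join the items with linefeeds, as in the sort -u examples above.
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set theString to theList as text
set AppleScript's text item delimiters to savedDelimiters

-- sort sees four lines (a, b, a, b) and keeps only two of them.
set sortedList to every paragraph of (do shell script "/bin/echo -n " & quoted form of theString & " | sort -u")

-- Evaluates to false, although no item of theList is repeated.
set listIsUnique to (count theList) = (count sortedList)
```

So a line-count comparison is only trustworthy when every string is known to be a single paragraph.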

DJ_Bazzie_Wazzie · November 10, 2011, 7:21pm

Indeed, Nigel, we did get a little carried away, and multi-paragraph strings are by default a problem with sort. The workaround takes so much effort that the whole process won’t be faster than a simple AppleScript loop.

Perfect_Record · November 10, 2011, 7:48pm

Actually I merely want to alert the script user that there are duplicate strings in the list I’m checking. All the entries in the list need to remain intact.

This is part of a larger script that creates a report detailing the contents of a compact disc master prior to the master being delivered to the factory.

Every track of a CD has a unique identifier code embedded in the data stream. It’s very easy at the mastering stage to duplicate an ID code. The ID codes are 12-character strings in this format: USPR37300012.

The full script extracts track titles, composers, artists and ID codes for proof reading before the CD makes it out into the world. Also a little error checking thrown in for good measure.

Thanks for all the good suggestions!

DJ_Bazzie_Wazzie · November 10, 2011, 11:43pm

That’s not what Nigel mentioned.

Create a list with unique items and compare its length with the original (like in my first post). If the lengths of the two lists are not equal, it means that there are duplicate items in the list.
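
Since the goal is only to warn the user, one variation on this idea is to note the repeated items while the unique list is being built, so the dialog can name them. A minimal sketch in plain AppleScript: the handler name duplicatedItems and the ID codes are made up for illustration, and the original list is never modified.

```applescript
-- Hypothetical ID codes; the last one repeats the first.
set idCodes to {"USPR37300012", "USPR37300013", "USPR37300014", "USPR37300012"}

set repeatedCodes to my duplicatedItems(idCodes)
if repeatedCodes is not {} then
	-- Join the repeated codes into one line of text for the dialog.
	set AppleScript's text item delimiters to ", "
	set codeText to repeatedCodes as text
	set AppleScript's text item delimiters to ""
	display dialog "Duplicate ID codes: " & codeText
end if

-- Returns each item that occurs more than once, listed once.
-- theList itself is left untouched.
on duplicatedItems(theList)
	set seenItems to {}
	set repeatedItems to {}
	repeat with i from 1 to (count theList)
		set theItem to item i of theList
		if seenItems contains theItem then
			if repeatedItems does not contain theItem then set end of repeatedItems to theItem
		else
			set end of seenItems to theItem
		end if
	end repeat
	return repeatedItems
end duplicatedItems
```

An empty result means every item is unique. Note that AppleScript ignores case when comparing strings by default, so if codes differing only in case should count as different, the comparison would have to go inside a considering case block.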