Lists comparison

Hi all,

I would like your expertise on the efficiency of some code.

So I have two big lists. List A consists of 60,000 items, and list B has 500,000 items.

At the end of the comparison, I would like either a message that everything from list A is included in list B, or a returned list of only the items from list A that do not exist in list B.

What do you suggest is the best and most time-efficient way of checking whether the items in list A are present in list B?

I haven’t tested this with lists that size, but in theory it should be quite fast. If list A contains duplicates and list B has to contain at least the same number of each value, you’ll need a different method.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on compareLists(listA, listB)
	set setA to current application's class "NSMutableOrderedSet"'s orderedSetWithArray:(listA)
	set setB to current application's class "NSOrderedSet"'s orderedSetWithArray:(listB)
	tell setA to minusOrderedSet:(setB)
	if (setA's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return setA's array() as list
	end if
end compareLists
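
For the duplicates case mentioned above, one possible "different method" is to compare occurrence counts with NSCountedSet. This is only a rough sketch, not something timed against lists of this size; the handler name compareListsCounted and the sample lists are made up for illustration.

```applescript
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data: list A has two "a"s, list B only one.
set listA to {"a", "a", "b", "c"}
set listB to {"a", "c", "c"}
compareListsCounted(listA, listB) --> one "a" and one "b" (order not guaranteed)

-- Hypothetical handler: returns one entry for every occurrence that
-- list A has in excess of list B.
on compareListsCounted(listA, listB)
	set countedA to current application's class "NSCountedSet"'s alloc()'s initWithArray:(listA)
	set countedB to current application's class "NSCountedSet"'s alloc()'s initWithArray:(listB)
	set distinctValues to countedA's allObjects() -- order is not defined
	set shortfalls to current application's class "NSMutableArray"'s new()
	repeat with i from 0 to ((distinctValues's |count|()) - 1)
		set thisValue to (distinctValues's objectAtIndex:(i))
		set countInA to (countedA's countForObject:(thisValue)) as integer
		set countInB to (countedB's countForObject:(thisValue)) as integer
		if (countInA > countInB) then
			repeat (countInA - countInB) times
				(shortfalls's addObject:(thisValue))
			end repeat
		end if
	end repeat
	if (shortfalls's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return shortfalls as list
	end if
end compareListsCounted
```

Because the loop runs in AppleScript over every distinct value in list A, this is likely to be slower than the set subtraction above.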

I tested with a list A of 5,120 items and a list B of 354,394 items.

The script from @Nigel Garvey ran 105 times faster than mine, so I deleted my post.

And with such a large list, how long did it take to execute?

My timing results with Nigel’s script:

listA count - 60,000
listB count - 500,000
uniqueList count - 13,856
timing result - 971 milliseconds

use framework "Foundation"
use framework "GameplayKit"
use scripting additions

-- untimed code
set {listA, listB} to getLists()
on getLists()
	set theWord to characters of "abcdefghi"
	set theWord to current application's NSArray's arrayWithArray:theWord
	set listA to current application's NSMutableArray's new()
	repeat 60000 times
		set theShuffledWord to (theWord's shuffledArray())
		set theShuffledWord to theShuffledWord's componentsJoinedByString:""
		listA's addObject:theShuffledWord
	end repeat
	set listB to current application's NSMutableArray's new()
	repeat 500000 times
		set theShuffledWord to (theWord's shuffledArray())
		set theShuffledWord to theShuffledWord's componentsJoinedByString:""
		listB's addObject:theShuffledWord
	end repeat
	return {listA as list, listB as list}
end getLists

-- start time
set startTime to current application's CACurrentMediaTime()

-- timed code
set uniqueList to compareLists(listA, listB)
on compareLists(listA, listB)
	set setA to current application's class "NSMutableOrderedSet"'s orderedSetWithArray:(listA)
	set setB to current application's class "NSOrderedSet"'s orderedSetWithArray:(listB)
	tell setA to minusOrderedSet:(setB)
	if (setA's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return setA's array() as list --> 13,856 items
	end if
end compareLists

-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
	numberFormatter's setFormat:"0.000"
	set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
	(numberFormatter's setFormat:"0")
	set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if

-- result
elapsedTime --> 971 milliseconds

I found this to be super fast.

property listA : POSIX path of ((path to desktop as text) & "List_A.txt") -- 61034 Items
property listB : POSIX path of ((path to desktop as text) & "List_B.txt") -- 591034 Items

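-- Keep only diff's "< " lines (items found only in list A) and strip just that prefix,
-- so spaces and "<" characters inside the items themselves are left alone.
set itemsOnlyInListA to (do shell script "diff " & quoted form of listA & " " & ¬
	quoted form of listB & " | grep '^<' | sed 's/^< //'")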

if itemsOnlyInListA = "" then
	activate
	display dialog "Everything in List A is in List B" buttons ¬
		{"Cancel", "OK"} default button "OK"
else
	return paragraphs of itemsOnlyInListA
end if

TIME: 0.348s
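
One caveat with the diff pipeline: diff compares the two files line by line in sequence, so an item that exists in both files but at a different relative position can still be reported as missing. If the files aren't sorted the same way, something like the following awk variant might be safer. It is a sketch only and I haven't timed it; it assumes one item per line, LF line endings, and that List_B.txt is not empty. The variable names pathA and pathB are made up here.

```applescript
-- Assumed file locations, same as in the script above.
set pathA to POSIX path of ((path to desktop as text) & "List_A.txt")
set pathB to POSIX path of ((path to desktop as text) & "List_B.txt")

-- awk reads List_B first and remembers every line, then prints each line
-- of List_A that was never seen in List_B (in List_A's original order).
set itemsOnlyInListA to (do shell script "awk 'NR == FNR { seen[$0]; next } !($0 in seen)' " & ¬
	quoted form of pathB & " " & quoted form of pathA)

if itemsOnlyInListA = "" then
	return "Everything in List A is in List B"
else
	return paragraphs of itemsOnlyInListA
end if
```

Unlike diff, this doesn't care about the order of the lines in either file, and awk exits normally even when nothing is printed, so do shell script shouldn't throw an error in the "everything matches" case.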

Here’s an example showing how hopelessly outdated this approach is. The following results in an error:


property listA : {"One", "Two", file, text, 123, "15"}
property listB : {"One", "Two", file, current application, 123, "15", integer}

set itemsOnlyInListA to (do shell script "diff " & quoted form of listA & " " & ¬
	quoted form of listB & " |grep '<' |tr -d '< '")

if itemsOnlyInListA = "" then
	activate
	display dialog "Everything in List A is in List B" buttons ¬
		{"Cancel", "OK"} default button "OK"
else
	return paragraphs of itemsOnlyInListA
end if

My script in the deleted post was able to work with any lists, but it ran very slowly. The script from @Nigel Garvey is very fast and works with any lists as well, so it is clearly preferable:


use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

property listA : {"One", "Two", file, text, 123, "15", AppleScript's text item delimiters}
property listB : {"One", "Two", file, current application, 123, "15", integer}

compareLists(listA, listB) of me

on compareLists(listA, listB)
	set setA to current application's class "NSMutableOrderedSet"'s orderedSetWithArray:(listA)
	set setB to current application's class "NSOrderedSet"'s orderedSetWithArray:(listB)
	tell setA to minusOrderedSet:(setB)
	if (setA's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return setA's array() as list
	end if
end compareLists

The use of NSOrderedSet in my script is just so that any values not present in list B are returned in the order in which they first appear in list A — in case that’s helpful. If the order really doesn’t matter, slightly faster results can be obtained using the equivalent NSSet methods:

on compareLists(listA, listB)
	set setA to current application's class "NSMutableSet"'s setWithArray:(listA)
	set setB to current application's class "NSSet"'s setWithArray:(listB)
	tell setA to minusSet:(setB)
	if (setA's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return setA's allObjects() as list
	end if
end compareLists

FWIW, using NSSet rather than NSOrderedSet in my test script reduced the timing result from 971 to 730 milliseconds.

Also FWIW, the NSSet script removes duplicates from the returned list, which in almost every instance is a desirable result. If that’s not the case, then NSMutableArray’s removeObjectsInArray can be used and is about as fast.

use framework "Foundation"
use scripting additions

set listA to {"a", "a", "b", "b", "c", "c"}
set listB to {"c", "d"}
set uniqueList to compareLists(listA, listB) --> {"a", "a", "b", "b"}

on compareLists(listA, listB)
	set arrayA to current application's NSMutableArray's arrayWithArray:listA
	set arrayB to current application's NSArray's arrayWithArray:listB
	arrayA's removeObjectsInArray:arrayB
	if (arrayA's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return arrayA as list
	end if
end compareLists

Interesting stuff.

Thank you all so much for your replies.

I am so impressed with your skills!

I checked it and indeed it is super fast. Well done!!!

@KniazidisR The original post specifically stated that list A has 60,000 items and list B has 500,000 items.

Lists of that size can’t be stored as a list in a property value in an AppleScript. They are just way too big. The list items can only be processed from an external file containing them.

Therefore, my script provides the quickest solution while using much less code.

listA count - 30,000
listB count - 600,000
uniqueList count - 20,000
timing result - .266 seconds

property listA : POSIX path of ((path to desktop as text) & "List_A.txt") -- 30000 Items
property listB : POSIX path of ((path to desktop as text) & "List_B.txt") -- 600000 Items

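-- Keep only diff's "< " lines (items found only in list A) and strip just that prefix,
-- so spaces and "<" characters inside the items themselves are left alone.
set itemsOnlyInListA to (do shell script "diff " & quoted form of listA & " " & ¬
	quoted form of listB & " | grep '^<' | sed 's/^< //'")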

if itemsOnlyInListA = "" then
	activate
	display dialog "Everything in List A is in List B" buttons ¬
		{"Cancel", "OK"} default button "OK"
else
	return paragraphs of itemsOnlyInListA
end if

I’m not sure I agree with you.

I’ve tested the list capabilities of AppleScript and was able to generate a list of 600,000 items with no issues.
Yes, it took a while to generate the list, but it worked.

I was assuming your lists were already created and saved in 2 separate text files. I did not realize your script was creating the lists and then processing them that way.

Point taken.

But epaminos never said how the list gets populated.
In order to test I had to create a routine to generate a list that big.

Another idea that’s occurred to me recently is to filter an NSMutableArray, using an NSSet for the comparison object. This is faster than comparing two arrays, because searching for an item in a set is faster than searching for it in an array. (I don’t know why. It’s the way sets are designed.) It also seems to be very slightly faster than the minusSet: idea above. Like the NSOrderedSet version, it returns any items that are totally missing from listB in the order in which they occur in listA, only this time there’s an instance for every occurrence in listA instead of just one. The method still doesn’t detect if listB simply contains fewer instances of a value than listA.

on compareLists(listA, listB)
	set arrayA to current application's class "NSMutableArray"'s arrayWithArray:(listA)
	set setB to current application's class "NSSet"'s setWithArray:(listB)
	set filter to current application's class "NSPredicate"'s predicateWithFormat_("!(self IN %@)", setB)
	tell arrayA to filterUsingPredicate:(filter)
	if (arrayA's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return arrayA as list
	end if
end compareLists
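
On the point about set lookups being faster: as far as I know, NSSet finds objects by their hash values, so a membership test doesn't have to walk through the whole collection, whereas an array's containsObject: checks the elements one after another. A rough way to see the difference is sketched below. The test data and variable names are invented for illustration, and the actual timings will vary by machine, so none are quoted.

```applescript
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Hypothetical test data: 20,000 distinct strings, held as both an array and a set.
set testArray to current application's class "NSMutableArray"'s new()
repeat with i from 1 to 20000
	(testArray's addObject:("item" & i))
end repeat
set testSet to current application's class "NSSet"'s setWithArray:(testArray)
set lastItem to "item20000" -- worst case for the array: the final element

-- Time 1,000 lookups in the array.
set startTime to current application's class "NSDate"'s timeIntervalSinceReferenceDate()
repeat 1000 times
	(testArray's containsObject:(lastItem))
end repeat
set arraySeconds to (current application's class "NSDate"'s timeIntervalSinceReferenceDate()) - startTime

-- Time the same 1,000 lookups in the set.
set startTime to current application's class "NSDate"'s timeIntervalSinceReferenceDate()
repeat 1000 times
	(testSet's containsObject:(lastItem))
end repeat
set setSeconds to (current application's class "NSDate"'s timeIntervalSinceReferenceDate()) - startTime

return {arraySeconds:arraySeconds, setSeconds:setSeconds}
```

Both loops carry the same AppleScript-to-Cocoa call overhead, so any gap between the two figures should come from the lookups themselves.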