I would like your expertise on the efficiency of some code.
So I have two big lists. List A consists of 60,000 items, and list B has 500,000 items.
At the end of the comparison, I would like either a message saying that everything in list A is included in list B, or just the list of the items from list A that do not exist in list B.
What do you suggest is the best and most time-efficient way of checking whether the items in list A are present in list B?
I haven’t tested this with lists that size, but in theory it should be quite fast. If list A contains duplicates and list B has to contain at least the same number of each value, you’ll need a different method.
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions
on compareLists(listA, listB)
set setA to current application's class "NSMutableOrderedSet"'s orderedSetWithArray:(listA)
set setB to current application's class "NSOrderedSet"'s orderedSetWithArray:(listB)
tell setA to minusOrderedSet:(setB)
if (setA's |count|() = 0) then
return "All list A's items present in list B"
else
return setA's array() as list
end if
end compareLists
listA count - 60,000
listB count - 500,000
uniqueList count - 13,856
timing result - 971 milliseconds
use framework "Foundation"
use framework "GameplayKit"
use scripting additions
-- untimed code
set {listA, listB} to getLists()
on getLists()
set theWord to characters of "abcdefghi"
set theWord to current application's NSArray's arrayWithArray:theWord
set listA to current application's NSMutableArray's new()
repeat 60000 times
set theShuffledWord to (theWord's shuffledArray())
set theShuffledWord to theShuffledWord's componentsJoinedByString:""
listA's addObject:theShuffledWord
end repeat
set listB to current application's NSMutableArray's new()
repeat 500000 times
set theShuffledWord to (theWord's shuffledArray())
set theShuffledWord to theShuffledWord's componentsJoinedByString:""
listB's addObject:theShuffledWord
end repeat
return {listA as list, listB as list}
end getLists
-- start time
set startTime to current application's CACurrentMediaTime()
-- timed code
set uniqueList to compareLists(listA, listB)
on compareLists(listA, listB)
set setA to current application's class "NSMutableOrderedSet"'s orderedSetWithArray:(listA)
set setB to current application's class "NSOrderedSet"'s orderedSetWithArray:(listB)
tell setA to minusOrderedSet:(setB)
if (setA's |count|() = 0) then
return "All list A's items present in list B"
else
return setA's array() as list --> 13,856 items
end if
end compareLists
-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
numberFormatter's setFormat:"0.000"
set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
(numberFormatter's setFormat:"0")
set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if
-- result
elapsedTime --> 971 milliseconds
property listA : POSIX path of ((path to desktop as text) & "List_A.txt") -- 61034 Items
property listB : POSIX path of ((path to desktop as text) & "List_B.txt") -- 591034 Items
set itemsOnlyInListA to (do shell script "diff " & quoted form of listA & " " & ¬
quoted form of listB & " |grep '<' |tr -d '< '")
if itemsOnlyInListA = "" then
activate
display dialog "Everything in List A is in List B" buttons ¬
{"Cancel", "OK"} default button "OK"
else
return paragraphs of itemsOnlyInListA
end if
Here’s an example showing how hopelessly outdated this approach is. The following results in an error:
property listA : {"One", "Two", file, text, 123, "15"}
property listB : {"One", "Two", file, current application, 123, "15", integer}
set itemsOnlyInListA to (do shell script "diff " & quoted form of listA & " " & ¬
quoted form of listB & " |grep '<' |tr -d '< '")
if itemsOnlyInListA = "" then
activate
display dialog "Everything in List A is in List B" buttons ¬
{"Cancel", "OK"} default button "OK"
else
return paragraphs of itemsOnlyInListA
end if
My script in the deleted post was able to work with any lists, but it was very slow. The script from @Nigel Garvey is very fast and works with any lists as well, so it is clearly preferable:
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions
property listA : {"One", "Two", file, text, 123, "15", AppleScript's text item delimiters}
property listB : {"One", "Two", file, current application, 123, "15", integer}
compareLists(listA, listB) of me
on compareLists(listA, listB)
set setA to current application's class "NSMutableOrderedSet"'s orderedSetWithArray:(listA)
set setB to current application's class "NSOrderedSet"'s orderedSetWithArray:(listB)
tell setA to minusOrderedSet:(setB)
if (setA's |count|() = 0) then
return "All list A's items present in list B"
else
return setA's array() as list
end if
end compareLists
The use of NSOrderedSet in my script is just so that any values not present in list B are returned in the order in which they first appear in list A — in case that’s helpful. If the order really doesn’t matter, slightly faster results can be obtained using the equivalent NSSet methods:
on compareLists(listA, listB)
set setA to current application's class "NSMutableSet"'s setWithArray:(listA)
set setB to current application's class "NSSet"'s setWithArray:(listB)
tell setA to minusSet:(setB)
if (setA's |count|() = 0) then
return "All list A's items present in list B"
else
return setA's allObjects() as list
end if
end compareLists
FWIW, using NSSet rather than NSOrderedSet in my test script reduced the timing result from 971 to 730 milliseconds.
Also FWIW, the NSSet script removes duplicates from the returned list, which in almost every instance is a desirable result. If that’s not the case, then NSMutableArray’s removeObjectsInArray: method can be used and is about as fast.
use framework "Foundation"
use scripting additions
set listA to {"a", "a", "b", "b", "c", "c"}
set listB to {"c", "d"}
set uniqueList to compareLists(listA, listB) --> {"a", "a", "b", "b"}
on compareLists(listA, listB)
set arrayA to current application's NSMutableArray's arrayWithArray:listA
set arrayB to current application's NSArray's arrayWithArray:listB
arrayA's removeObjectsInArray:arrayB
if (arrayA's |count|() = 0) then
return "All list A's items present in list B"
else
return arrayA as list
end if
end compareLists
@KniazidisR … The original post specifically stated that list A has 60,000 items and list B has 500,000 items.
Lists of that size can’t be stored as a property value in an AppleScript; they are just way too big. The list items can only be processed from an external file containing them.
Therefore, my approach is the quickest while using much less code:
listA count - 30,000
listB count - 600,000
uniqueList count - 20,000
timing result - 0.266 seconds
property listA : POSIX path of ((path to desktop as text) & "List_A.txt") -- 30000 Items
property listB : POSIX path of ((path to desktop as text) & "List_B.txt") -- 600000 Items
set itemsOnlyInListA to (do shell script "diff " & quoted form of listA & " " & ¬
quoted form of listB & " |grep '<' |tr -d '< '")
if itemsOnlyInListA = "" then
activate
display dialog "Everything in List A is in List B" buttons ¬
{"Cancel", "OK"} default button "OK"
else
return paragraphs of itemsOnlyInListA
end if
I’ve tested the list capabilities of AppleScript and was able to generate a list of 600,000 items with no issues.
Yes, it took a while to generate the list, but it worked.
I was assuming your lists were already created and saved in 2 separate text files. I did not realize your script was creating the lists and then processing them that way.
Another idea that’s occurred to me recently is to filter an NSMutableArray, using an NSSet for the comparison object. This is faster than comparing two arrays, because searching for an item in a set is faster than searching for it in an array. (Sets are hash-based, so a membership test takes roughly constant time on average, whereas an array has to be searched item by item.) It also seems to be very slightly faster than the minusSet: idea above. Like the NSOrderedSet version, it returns any items that are totally missing from listB in the order in which they occur in listA, only this time there’s an instance for every occurrence in listA instead of just one. The method still doesn’t detect if listB simply contains fewer instances of a value than listA.
on compareLists(listA, listB)
set arrayA to current application's class "NSMutableArray"'s arrayWithArray:(listA)
set setB to current application's class "NSSet"'s setWithArray:(listB)
set filter to current application's class "NSPredicate"'s predicateWithFormat_("!(self IN %@)", setB)
tell arrayA to filterUsingPredicate:(filter)
if (arrayA's |count|() = 0) then
return "All list A's items present in list B"
else
return arrayA as list
end if
end compareLists
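For the case mentioned above where list B has to contain at least as many instances of each value as list A, here is a minimal, untimed sketch using NSCountedSet. The handler name compareListsCountingDuplicates is made up for illustration and isn't from any of the scripts in this thread. It walks list A in an AppleScript repeat loop, so it will probably be noticeably slower on 60,000 items than the set and predicate versions.

```applescript
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

set listA to {"a", "a", "b", "c"}
set listB to {"a", "c", "c"}
compareListsCountingDuplicates(listA, listB) --> {"a", "b"}

-- Hypothetical handler name, for illustration only.
on compareListsCountingDuplicates(listA, listB)
	-- Tally how many times each value occurs in list B.
	set countedB to current application's class "NSCountedSet"'s alloc()'s initWithArray:(listB)
	set arrayA to current application's class "NSArray"'s arrayWithArray:(listA)
	set missingItems to current application's class "NSMutableArray"'s new()
	repeat with thisItem in arrayA
		if ((countedB's countForObject:(thisItem)) > 0) then
			-- Use up one of list B's instances of this value.
			countedB's removeObject:(thisItem)
		else
			-- List B never had this value or has run out of instances.
			missingItems's addObject:(thisItem)
		end if
	end repeat
	if (missingItems's |count|() = 0) then
		return "All list A's items present in list B"
	else
		return missingItems as list
	end if
end compareListsCountingDuplicates
```

Each instance in list A is matched against one instance in list B, so the result keeps list A's order and includes every occurrence that list B can't account for.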