Compare List with 4 Lists

I have a large and changing set of project design data in various formats and sizes, sorted into different folders, with the same names and different extensions (ai, eps, jpg, png, zip).

What is the fastest way to compare the basenames in list_Project with the basenames only (Eco 201, Eco 202), without extensions, in the other 4 sample lists? Sample list data and script below (1500 - 2000 items).

Currently I am doing it this way:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

tell application "System Events"
	set list_Project to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 205", "Eps 206"}
	set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eps 205"}
	set listEps to {"Eco 201", "Eco 202", "Eco 204", "Eps 205", "Eps 206"}
	set listPng to {"Eco 201", "Eco 202", "Eco 203", "Eps 206"}
	set listZip to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 206"}
		
	-- Compare each item of list_Project to items of listJpg
	set missingJpg to {}
	repeat with i from 1 to count list_Project
		
		set Check1 to (item i of list_Project)
		
		if Check1 is not in listJpg then set missingJpg to (missingJpg & Check1)
	end repeat
	# result > missingJpg is now {"Eco 203", "Eps 206"}
	
	# looping repeating with other lists
	
end tell

([applescript] and [/applescript] posting tags added by NG.)

Hi PK3587. Welcome to MacScripter.

For a start, you don’t need to tell System Events to do anything here (or in this part of a larger script). Lists of text are AppleScript objects in their own right.

It’s quite common to speed up iterations through long lists by referring to the list variable as belonging to the script or (as in the script below) to a script object in a handler. It’s an AppleScript quirk that’s never been satisfactorily explained. 🙂

on findMissingItems(referenceList, testList)
	script o
		property refLst : referenceList
		property output : {}
	end script
	
	repeat with i from 1 to (count referenceList)
		set thisItem to item i of o's refLst
		if (thisItem is not in testList) then set end of o's output to thisItem
	end repeat
	
	return o's output
end findMissingItems

set list_Project to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 205", "Eps 206"}

set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eps 205"}
set missingFromListJpg to findMissingItems(list_Project, listJpg)

set listEps to {"Eco 201", "Eco 202", "Eco 204", "Eps 205", "Eps 206"}
set missingFromListEps to findMissingItems(list_Project, listEps)

set listPng to {"Eco 201", "Eco 202", "Eco 203", "Eps 206"}
set missingFromListPng to findMissingItems(list_Project, listPng)

set listZip to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 206"}
set missingFromListZip to findMissingItems(list_Project, listZip)

-- Demo result:
return {missingFromListJpg, missingFromListEps, missingFromListPng, missingFromListZip}
--> {{"Eco 203", "Eps 206"}, {"Eco 203"}, {"Eco 204", "Eps 205"}, {"Eps 205"}}

If you’re expecting most of the reference items to be missing from the four lists, it might turn out to be slightly faster to “cross out” the ones that are there from a copy of the reference list and return what’s left:

on findMissingItems(referenceList, testList)
	script o
		property refLst : missing value
	end script
	
	copy referenceList to o's refLst
	repeat with i from 1 to (count referenceList)
		set thisItem to item i of o's refLst
		if (thisItem is in testList) then set item i of o's refLst to missing value
	end repeat
	
	-- Return a list containing only the texts which haven't been replaced with missing values.
	return o's refLst's text
end findMissingItems

You don’t need to use a script object. All lists are sent by reference to subroutines.


on findMissingItems(referenceList, testList)
	local output
	set output to {}
	repeat with i from 1 to (count referenceList)
		set thisItem to item i of referenceList
		if (thisItem is not in testList) then set end of output to thisItem
	end repeat
	
	return output
end findMissingItems

set list_Project to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 205", "Eps 206"}

set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eps 205"}
set missingFromListJpg to findMissingItems(list_Project, listJpg)

set listEps to {"Eco 201", "Eco 202", "Eco 204", "Eps 205", "Eps 206"}
set missingFromListEps to findMissingItems(list_Project, listEps)

set listPng to {"Eco 201", "Eco 202", "Eco 203", "Eps 206"}
set missingFromListPng to findMissingItems(list_Project, listPng)

set listZip to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 206"}
set missingFromListZip to findMissingItems(list_Project, listZip)

-- Demo result:
return {missingFromListJpg, missingFromListEps, missingFromListPng, missingFromListZip}
--> {{"Eco 203", "Eps 206"}, {"Eco 203"}, {"Eco 204", "Eps 205"}, {"Eps 205"}}

For lists containing thousands of items I would use ASObjC.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Handler from Shane Stanley in Everyday AppleScriptObjC
on removeItemsInListInOrder:list2 fromList:list1
	set set2 to current application's NSSet's setWithArray:list2
	set set1 to current application's NSMutableOrderedSet's orderedSetWithArray:list1
	set1's minusSet:set2
	return (set1's array()) as list
end removeItemsInListInOrder:fromList:

set list_Project to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 205", "Eps 206"}

set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eps 205"}
set missingFromListJpg to its removeItemsInListInOrder:listJpg fromList:list_Project

set listEps to {"Eco 201", "Eco 202", "Eco 204", "Eps 205", "Eps 206"}
set missingFromListEps to its removeItemsInListInOrder:listEps fromList:list_Project

set listPng to {"Eco 201", "Eco 202", "Eco 203", "Eps 206"}
set missingFromListPng to its removeItemsInListInOrder:listPng fromList:list_Project

set listZip to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 206"}
set missingFromListZip to its removeItemsInListInOrder:listZip fromList:list_Project

{missingFromListJpg, missingFromListEps, missingFromListPng, missingFromListZip}
--> {{"Eco 203", "Eps 206"}, {"Eco 203"}, {"Eco 204", "Eps 205"}, {"Eps 205"}}

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Tuesday 14 July 2020 16:50:07

Thanks,

Nigel Garvey, robertfern, Yvan Koenig for your excellent stuff. Everything works well with the sample data. I will change my complete script and test with full data.

Thanks for your help, I appreciate it!

I decided to put Nigel’s statement to the test. I modified his first and second scripts to create a list with 2000 items using the code contained below. I also changed list items that started with Eps to Eco.

set list_Project to {}
repeat with i from 1 to 2000
	set end of list_Project to "Eco " & i
end repeat

I then tested both scripts and the relative results–not including the time it took to create the list_Project list–were as Nigel predicted:

Nigel’s First Script: 31 milliseconds

Nigel’s Second Script: 24 milliseconds

By way of comparison, my basic AppleScript with no speed enhancements took 559 milliseconds to run. The same script with the reference-to operator applied to the list_Project and output list variables took 65 milliseconds to run.

use framework "Foundation"

set listProject to {}
repeat with i from 1 to 2000
	set end of listProject to "Eco " & i
end repeat

set startTime to current application's class "NSDate"'s new()

set output to {}
set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eco 205"}
set missingJpg to findMissing(a reference to listProject, listJpg, a reference to output)

set output to {}
set listEps to {"Eco 201", "Eco 202", "Eco 204", "Eco 205", "Eco 206"}
set missingEps to findMissing(a reference to listProject, listEps, a reference to output)

set output to {}
set listPng to {"Eco 201", "Eco 202", "Eco 203", "Eco 206"}
set missingPng to findMissing(a reference to listProject, listPng, a reference to output)

set output to {}
set listZip to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eco 206"}
set missingZip to findMissing(a reference to listProject, listZip, a reference to output)

on findMissing(firstList, secondList, output)
	repeat with anItem in firstList
		set anItem to contents of anItem
		if anItem is not in secondList then
			set end of output to anItem
		end if
	end repeat
	return output as list
end findMissing

set endTime to -(startTime's timeIntervalSinceNow()) --> elapsed time in seconds; about 65 milliseconds on my computer

Hi robertfern.

You’re missing my point that for speed (which is what PK3587 wanted), iterating through a long list is best done with the list variable specified as belonging to a script or script object. That is, with object specifiers something like item i of someScript’s listVariable rather than just item i of listVariable. You can’t do this with local variables (including handler parameter variables), but they can still be used and are OK if speed isn’t important or the lists are short.

Some clarification regarding the sample data: it was kept simple to avoid confusion. The actual file names are long and descriptive.

File names are a short form of Project Name + Part Name + Sequence, for example “Eco Theme + Panel Front + 001”. So the actual list looks like this:

{"Eco Theme Panel Front 001", "Paint House Top 001", "Lighting Led Garden 005", "Signage Black 18x48", "colour codes update 03 May"}

There is no pattern in the file names, so we must stick to comparing full names without extensions. Pattern matching won’t work.

Speed is the crucial part here: it’s a shared folder in which data files are added, updated, and deleted by 3 users.
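
Since the comparison depends on names without extensions, the extension has to be stripped before the lists are built. The following is only a minimal sketch, assuming that each file name ends in a single extension after its last dot. The handler name baseNameOf is hypothetical and does not come from any script in this thread.

```applescript
-- Hypothetical helper: return a file name without its final extension.
on baseNameOf(fileName)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set nameParts to text items of fileName
	if (count nameParts) > 1 then
		-- Rejoin everything before the last dot (the coercion uses the "." delimiter).
		set baseName to (items 1 thru -2 of nameParts) as text
	else
		-- No dot at all: the name is returned unchanged.
		set baseName to fileName
	end if
	set AppleScript's text item delimiters to savedDelimiters
	return baseName
end baseNameOf

baseNameOf("Eco Theme Panel Front 001.jpg")
--> "Eco Theme Panel Front 001"
```

A name that contains a dot but has no real extension would lose its last segment, so the handler would need adjusting for that kind of data.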

Nigel.

From my own personal testing of lists passed into subroutines, the “item i of someScript’s listVariable” works as though it is “a reference to”, since the lists are passed by reference by AppleScript anyway. So you already get the speed improvement.

Now as to the output list, since it is probably not very big the difference is negligible. (depending on size)

Robert

It’s pointers that are passed, not references in the AppleScript sense.

But putting things to the test, it seems that we’re both right. Using a specifier set against a script object is definitely and consistently faster, but the difference is negligible with 2000 strings, which is quite a surprise to me!

use framework "Foundation" -- For timing with NSDate.

on findMissingItemsRF(referenceList, testList)
	local output
	set output to {}
	repeat with i from 1 to (count referenceList)
		set thisItem to item i of referenceList
		if (thisItem is not in testList) then set end of output to thisItem
	end repeat
	
	return output
end findMissingItemsRF

on findMissingItemsNG(referenceList, testList)
	script o
		property refLst : referenceList
		property output : {}
	end script
	
	repeat with i from 1 to (count referenceList)
		set thisItem to item i of o's refLst
		if (thisItem is not in testList) then set end of o's output to thisItem
	end repeat
	
	return o's output
end findMissingItemsNG

set list_Project to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 205", "Eps 206"}
set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eps 205"}
repeat with i from 207 to 2200
	set end of my list_Project to "Eco" & i
	set end of my listJpg to "Eco" & i
end repeat

set startTime to current application's class "NSDate"'s new()
set missingFromListJpg to findMissingItemsRF(list_Project, listJpg)
set RFTime to -(startTime's timeIntervalSinceNow())

set startTime to current application's class "NSDate"'s new()
set missingFromListJpg to findMissingItemsNG(list_Project, listJpg)
set NGTime to -(startTime's timeIntervalSinceNow())

return {RFTime, NGTime}
--> {0.847696065903, 0.754427909851}

Robert’s handler can be made practically the same speed as mine by actually passing a reference:

set missingFromListJpg to findMissingItemsRF(a reference to list_Project, listJpg)

Hi, all. In conjunction with “my”, a variation on Robert’s code is about three times faster when fed ~2000 items.

use framework "Foundation" -- For timing with NSDate (only for testing)

global list_Project
set list_Project to {}--value empty for testing

#generate test values
repeat with counter from 201 to 2201
	set end of list_Project to "Eco " & counter
end repeat

set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eco 205"}
set listEps to {"Eco 201", "Eco 202", "Eco 204", "Eco 205", "Eco 206"}
set listPng to {"Eco 201", "Eco 202", "Eco 203", "Eco 206"}
set listZip to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eco 206"}

set startTime to current application's class "NSDate"'s new() --start timing (only for testing)

repeat with listVal in {listJpg, listEps, listPng, listZip}
	findMissingItems(listVal)
end repeat

-(startTime's timeIntervalSinceNow()) --return timing result (only for testing)
--listJpg --{listJpg, listEps, listPng, listZip} --enable to see outcome(s)



on findMissingItems(testList)
	set output to {}
	repeat with counter from 1 to (count list_Project)
		if my list_Project's item counter is not in testList then set output's end to my list_Project's item counter
	end repeat
	output
end findMissingItems

–edited to pare down explicit subroutine calls and for clarity

In such a case it would be quicker to make the main set just once:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on removeItemsInListInOrder:list2 fromSet:aSet
	set set2 to current application's NSSet's setWithArray:list2
	set set1 to aSet's mutableCopy()
	set1's minusSet:set2
	return (set1's array()) as list
end removeItemsInListInOrder:fromSet:

set list_Project to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 205", "Eps 206"}
set fullSet to current application's NSOrderedSet's orderedSetWithArray:list_Project

set listJpg to {"Eco 201", "Eco 202", "Eco 204", "Eps 205"}
set missingFromListJpg to its removeItemsInListInOrder:listJpg fromSet:fullSet

set listEps to {"Eco 201", "Eco 202", "Eco 204", "Eps 205", "Eps 206"}
set missingFromListEps to its removeItemsInListInOrder:listEps fromSet:fullSet

set listPng to {"Eco 201", "Eco 202", "Eco 203", "Eps 206"}
set missingFromListPng to its removeItemsInListInOrder:listPng fromSet:fullSet

set listZip to {"Eco 201", "Eco 202", "Eco 203", "Eco 204", "Eps 206"}
set missingFromListZip to its removeItemsInListInOrder:listZip fromSet:fullSet

{missingFromListJpg, missingFromListEps, missingFromListPng, missingFromListZip}
--> {{"Eco 203", "Eps 206"}, {"Eco 203"}, {"Eco 204", "Eps 205"}, {"Eps 205"}}

Thank you guys! Brilliant ideas.

I just want to restate/clarify (w.r.t. post #8) that my filenames have NO common pattern/sequence.
Some of the files don’t have any numbers, they only have alphabetic characters.

So any ideas that work for my filenames without any common sequence/pattern are highly appreciated.

Perhaps I don’t understand, but Nigel’s first script does not rely on any common sequence/pattern, is very fast, and works with the data in your post 8. For example:

on findMissingItems(referenceList, testList)
	script o
		property refLst : referenceList
		property output : {}
	end script
	
	repeat with i from 1 to (count referenceList)
		set thisItem to item i of o's refLst
		if (thisItem is not in testList) then set end of o's output to thisItem
	end repeat
	
	return o's output
end findMissingItems

set list_Project to {"Eco Theme Panel Front 001", "Paint House Top 001", "Lighting Led Garden 005", "Signage Black 18x48", "colour codes update 03 May"}

set listJpg to {"Eco Theme Panel Front 001", "colour codes update 03 May"}
set missingFromListJpg to findMissingItems(list_Project, listJpg)
--> {"Paint House Top 001", "Lighting Led Garden 005", "Signage Black 18x48"}

Hi peavine.

Thanks for your kind support.

The scripts of Nigel Garvey, robertfern, and Yvan Koenig work very fast with my sample data of 150 files. I will test and try to learn from every script posted by the great members of this forum (Nigel Garvey, robertfern, Yvan Koenig, Marc Anthony, peavine, Shane Stanley).

I am now completely rewriting my old, large scripts for real use. It will take some time for me to complete this.

I just found one important error in the script of @Nigel Garvey, which is otherwise very fast, so I prefer this solution.

Code line


if (thisItem is not in testList) then set end of o's output to thisItem

should be


if ({thisItem} is not in testList) then set end of o's output to thisItem

– braces added

I tested following lists:


set list_Project to {{pdfName:"A.pdf", DOIofPDF:1}, {pdfName:"B.pdf", DOIofPDF:2}, {pdfName:"C.pdf", DOIofPDF:3}}
set listJpg to {{pdfName:"A.pdf", DOIofPDF:1}, {pdfName:"B.pdf", DOIofPDF:3}, {pdfName:"D.pdf", DOIofPDF:4}}

set missingFromListJpg to findMissingItems(list_Project, listJpg)

on findMissingItems(referenceList, testList)
	script o
		property refLst : referenceList
		property output : {}
	end script
	repeat with i from 1 to (count referenceList)
		set thisItem to item i of o's refLst
		if ({thisItem} is not in testList) then set end of o's output to thisItem
	end repeat
	return o's output
end findMissingItems

Hi KniazidisR.

Thanks for pointing that out. It wasn’t actually an error in the replies to PK3587 as it was known the test lists contained only strings and that automatic coercions to list could be relied upon. But for general use with is in, contains, and their negatives, or where the list contents are known to be lists or records, you’re absolutely right. The search items should be explicitly wrapped in braces.

FWIW, the following is from the AppleScript Language Guide and, if I understand correctly, explains Nigel’s reference to automatic coercion:

https://developer.apple.com/library/archive/documentation/AppleScript/Conceptual/AppleScriptLangGuide/reference/ASLR_operators.html#//apple_ref/doc/uid/TP40000983-CH5g-125019

Hi peavine.

The operators in question refer to cross-sections of lists rather than to individual items, e.g.:

{"this", "is", 2, "cool"} contains {2, "cool"} -- {2, "cool"} is part of {"this", "is", 2, "cool"}.
--> true

So both parameters have to be lists. If the “section” parameter isn’t a list, it’s automatically coerced to one:

{"this", "is", 2, "cool"} contains 2
-- Action performed:
{"this", "is", 2, "cool"} contains {2} -- 2 automatically coerced to {2}.

If you’re checking to see if a list contains a list or a record, you have to present the search item in a list shell yourself. If the item’s a list, this is to make it clear that you’re looking for a section of the main list containing just the list in which you’re interested, not a section that’s the same as it:

{"this", "is", {2}, "cool"} contains {{2}} -- {{2}} is a section of {"this", "is", {2}, "cool"}

If the item of interest is a record, an automatic coercion to list would produce a list of the record’s values instead of a list containing the record:

{"this", "is", {a:2, b:2}, "cool"} contains {a:2, b:2} --> false
-- Action performed:
{"this", "is", {a:2, b:2}, "cool"} contains {2, 2} -- {a:2, b:2} as list is {2, 2}.
-- So an explicit list shell's needed:
{"this", "is", {a:2, b:2}, "cool"} contains {{a:2, b:2}} --> true

It should be clear from the above that other strange results are possible when explicit list braces aren’t used: 🙂

{"this", "is", 2, 2, "cool"} contains {a:2, b:2} --> true
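
One way to avoid these surprises is to put the explicit list shell in a single place. This is only a sketch: the handler name listContainsItem is hypothetical, and it simply applies the brace-wrapping described above to every search item.

```applescript
-- Hypothetical helper: membership test that always wraps the search item in a list.
on listContainsItem(theList, theItem)
	return (theList contains {theItem})
end listContainsItem

listContainsItem({"this", "is", {a:2, b:2}, "cool"}, {a:2, b:2}) --> true
listContainsItem({"this", "is", 2, 2, "cool"}, {a:2, b:2}) --> false
listContainsItem({"this", "is", {2}, "cool"}, {2}) --> true
```

With the wrapping done inside the handler, a record or list passed as the search item is always treated as one item rather than being coerced to its contents.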

Nigel. Thanks for the explanation, which makes complete sense.