Eliminate repeat loop with predicate

Nigel. I tested your original script in both Script Editor and Script Debugger and received the same file count under various test scenarios. I also looked at the actual files returned and saw no discrepancy.

On a different issue, an NSPredicate cheat-sheet that I use contains the following:

Common mistakes. Using [NSString stringWithFormat:] to build predicates is prone to have non-escaped diacritics or artifacts like an apostrophe. Use [NSPredicate predicateWithFormat:] instead.

I tried to eliminate the use of stringWithFormat in your first script but couldn’t get that to work (possibly for the reason cited). Do you know how to do this? Thanks!

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 (2 + 4) skips package contents and hidden files

set formatString to (current application's NSArray's arrayWithArray:skipFolders)'s componentsJoinedByString:"|"
--in the next code line the regex pattern has to be placed in single quotes 
--but that appears to stop %@ from expanding to formatString's values
set thePredicate to current application's NSPredicate's predicateWithFormat_("!(path MATCHES '(%@).*')", formatString)
set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)

Hi peavine.

I think the difficulty you’re having here is that with stringWithFormat:, the substitution values are simply spliced into the format, whereas with predicateWithFormat:, they’re quoted into it if they’re strings. That’s why %@ must not sit inside your own quotes in the predicate format. My second, hopefully improved script above makes use of this. It only uses stringWithFormat: to complete the regex pattern that has to be matched, and that only after escapedPatternForString: has ensured that the characters in the paths are treated as literals.
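
For anyone wanting to see the quoting difference in isolation, here is a minimal sketch. The sample path and pattern are made up for illustration, and SELF is used instead of path so that plain strings can be tested directly. It only compares how predicateWithFormat: treats %@ inside and outside single quotes.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical sample values, used only for illustration.
set samplePath to "/Users/Robert/Music/song.m4a"
set thePattern to "(?:/Users/Robert/Music).*"

-- %@ inside single quotes is read as the literal characters "%@", so thePattern is never substituted.
set quotedPredicate to current application's NSPredicate's predicateWithFormat_("SELF MATCHES '%@'", thePattern)
-- %@ outside quotes is replaced by thePattern, which the predicate then treats as a quoted string value.
set unquotedPredicate to current application's NSPredicate's predicateWithFormat_("SELF MATCHES %@", thePattern)

set quotedResult to (quotedPredicate's evaluateWithObject:samplePath) as boolean --> false
set unquotedResult to (unquotedPredicate's evaluateWithObject:samplePath) as boolean --> true
```

So the whole regex pattern has to be assembled first and then handed over as the single %@ argument.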

Hope this makes sense!

The following code runs in 20s on my M2 Mac mini Pro in Script Debugger. The containsObject result is false here for skipFolders and their contents.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set fm to current application's NSFileManager's defaultManager()
set skipFolders to current application's NSMutableSet's alloc()'s init()
set fcontents to current application's NSMutableSet's alloc()'s init()

set theFolder to current application's NSURL's fileURLWithPath:"/Users/Robert"
skipFolders's addObjectsFromArray:{"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}

set folderContents to (fm's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects()

fcontents's addObjectsFromArray:folderContents
fcontents's minusSet:skipFolders
-- log (fcontents's allObjects()'s containsObject:"/Users/Robert/Movies/some_movie.mp4") as boolean
log (fcontents's allObjects()'s containsObject:"/Users/Robert/Music/iTunes/") as boolean

VikingOSX. Thanks for the script suggestion.

I don’t think your script works as expected, and the following demonstrates this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set fm to current application's NSFileManager's defaultManager()
set skipFolders to current application's NSMutableSet's alloc()'s init()
set fcontents to current application's NSMutableSet's alloc()'s init()
set theFolder to current application's NSURL's fileURLWithPath:"/Users/Robert"
skipFolders's addObjectsFromArray:{"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set folderContents to (fm's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects()
fcontents's addObjectsFromArray:folderContents

log fcontents's |count|() --> 1171
fcontents's minusSet:skipFolders
log fcontents's |count|() --> 1171

There appear to be two issues. First, the skipFolders set contains paths (strings) and the fcontents set contains URLs, so the two sets have no members in common. Second, removing the three folders in the skipFolders set from the fcontents set would only remove these three folders, not the files and folders contained within these three folders.
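
Both issues can be reproduced without touching the disk. The following is a minimal sketch using three made-up paths (they are assumptions for illustration, not items from anyone's actual home folder):

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical paths for illustration; nothing needs to exist on disk.
set samplePaths to {"/Users/Robert/Music", "/Users/Robert/Music/song.m4a", "/Users/Robert/Documents/notes.txt"}

-- Build a set of file URLs, as the enumerator would return.
set fcontents to current application's NSMutableSet's alloc()'s init()
repeat with aPath in samplePaths
	(fcontents's addObject:(current application's NSURL's fileURLWithPath:(contents of aPath)))
end repeat

-- Issue 1: a set of path strings has no members in common with a set of URLs.
set skipStrings to current application's NSSet's setWithArray:{"/Users/Robert/Music"}
fcontents's minusSet:skipStrings
set countAfterStrings to (fcontents's |count|()) as integer --> 3

-- Issue 2: even with a URL, only the exact member is removed, not its descendants.
set skipURLs to current application's NSSet's setWithObject:(current application's NSURL's fileURLWithPath:"/Users/Robert/Music")
fcontents's minusSet:skipURLs
set countAfterURLs to (fcontents's |count|()) as integer --> 2
```

The song inside the Music folder survives both subtractions, which is why set subtraction alone can't replace the prefix test.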

It’s frustrating that testing whether a given string BEGINSWITH a particular substring (prefix) is so straightforward, yet the equivalent test that asks whether a particular substring prefixes the given string isn’t—as far as I’m aware—among the available string comparison operations.

Constructing a predicate that compares the string value for path against each individual element within a collection is entirely possible, but there are syntactic restrictions that limit collections to being the first (left-hand side) operand in a binary operation. So, even though it feels like we should be able to do this:

NOT (path BEGINSWITH ANY %@)

we can’t. And, while there really should be a string comparison operator to test for prefixes like this:

NOT (ANY %@ PREFIXES path)

there isn’t. The closest we can get is this:

NOT (ANY %@ IN path)

which can also be written like this:

NONE %@ IN path

Since macOS file paths aren’t case-sensitive, it probably makes sense to perform the string comparisons on a similar basis:

NONE %@ IN[c] path

This will, indeed, filter out all the paths in skipFolders and their respective subpaths. However, in the (albeit unlikely) situation where the target directory to be enumerated contains a directory subtree whose relative folder path matches any of the absolute folder paths in skipFolders, this entire subtree will also be filtered out of the final result.

The likelihood of this is small, but not zero. Nonetheless, if you’re happy with this limitation, then you can replace the repeat loop from your original script with this:

	set thePredicate to current application's NSPredicate's predicateWithFormat_("NONE %@ IN[c] path", skipFolders)
	set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)

An Alternative Method: SUBQUERY()

Another approach to do away with the AppleScript repeat loop is to bury it within the predicate itself, which is effectively what the SUBQUERY() operation does. It’s probably slightly less performant than the above method, but it might still be more performant than the AppleScript repeat loop in the average case, and it produces identical results.

If you’re not familiar with the SUBQUERY() expression, you can read about it here. But, it should be pretty easy to intuit how it works by seeing it used in a predicate format string to achieve what we want:

SUBQUERY(%@, $fp, path BEGINSWITH[c] $fp).@count == 0
  • %@: The collection being iterated over by SUBQUERY() passed as an argument to predicateWithFormat_(), namely skipFolders.

  • $fp: This is the variable identifier used to refer to the individual element of the collection within the expression. It serves an equivalent function to that of aFolder in the AppleScript repeat loop from your original script.

  • path BEGINSWITH[c] $fp: This is the condition expression that will test the string value of path for the elements in the array to be filtered against each individual element from the collection passed to SUBQUERY() via %@.

The important thing to note is that SUBQUERY() is, itself, performing a filtering operation on the supplied array, i.e. skipFolders. It, thus, returns a collection, which will be those elements of skipFolders for which the condition expression is true. Since we only want those elements in the folderContents array that fail the test for each and every element of skipFolders, we need the collection returned by the SUBQUERY() expression to contain zero items.

As before, the repeat loop from your original script can be replaced with this:

	set thePredicate to current application's NSPredicate's predicateWithFormat_("SUBQUERY(%@, $fp, path BEGINSWITH[c] $fp).@count == 0", skipFolders)
	set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)
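
To try the SUBQUERY() predicate on its own, here is a minimal sketch that runs it against a handful of made-up URLs rather than a real enumeration. The paths and the keptPaths variable are assumptions for illustration only.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical paths for illustration; nothing needs to exist on disk.
set skipFolders to {"/Users/Robert/Music"}
set samplePaths to {"/Users/Robert/Documents/notes.txt", "/Users/Robert/Music", "/Users/Robert/Music/song.m4a", "/Users/Robert/Music Log.txt"}

-- Stand-in for the array of URLs returned by the enumerator.
set folderContents to current application's NSMutableArray's new()
repeat with aPath in samplePaths
	(folderContents's addObject:(current application's NSURL's fileURLWithPath:(contents of aPath)))
end repeat

-- Keep only the URLs whose path begins with none of the skipFolders entries.
set thePredicate to current application's NSPredicate's predicateWithFormat_("SUBQUERY(%@, $fp, path BEGINSWITH[c] $fp).@count == 0", skipFolders)
set keptPaths to ((folderContents's filteredArrayUsingPredicate:thePredicate)'s valueForKey:"path") as list
--> {"/Users/Robert/Documents/notes.txt"}
```

Note that "/Users/Robert/Music Log.txt" is dropped too, because BEGINSWITH is a plain string-prefix test and knows nothing about path components.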

I’ll let you evaluate timings for this.

CJK. Thanks for the script suggestions and explanations. I will have to spend some time to understand the subquery approach, but I like learning new stuff.

I decided it might be helpful to retest all of the script suggestions for execution speed and returned results. I quantified the latter by doing a count of the output. I edited Jonas’ script to remove the array-to-list coercion, which was expensive.

Script      Milliseconds   Item Count
Peavine     19             1161
Jonas       18             1161
Nigel 1     13             1161
Nigel 2     12             1161
CJK 1       13             1161
CJK 2       23             1161

Thanks for the feedback. Although my heart was in the right place, my coding diligence was an issue. I beat on this for a few hours today only to see the elegance of CJK’s predicate suggestion that ran the solution in 20.3 seconds here. I have done very little with advanced predicate syntax.

Does your 19 ms result pertain to your original script (i.e. using the AppleScript repeat loop)?

Yes it does.

Just out of curiosity, I increased the number of paths in the skipFolders list from 3 to 11 and retested. My script (with the repeat loop) took 31 milliseconds and Nigel’s second script took 13 milliseconds. As a practical matter, all of these timing results are perfectly fine, but I still find exercises like this useful because I learn a lot.

How did the others fare? I’m expecting SUBQUERY() to take a hit in line with the AppleScript repeat loop, as these can both basically be considered to have 𝒪(𝑛²) time complexity compared to what is roughly 𝒪(𝑛) complexity of all the other methods. Your previous table of results didn’t contain any surprises.

It might be possible to shorten the execution time of @ionah’s script by replacing his repeat loop (which merely serves to combine multiple predicates) with a direct construction of the predicate format string (similar to the one you used in your second post) by way of text item delimiters. I imagine this will be speedier than creating and acting upon multiple ObjC object instances, and possibly faster even than invoking the equivalent NSString method for joining array elements. That said, it will shave, at best, only a couple of milliseconds off (if any at all), and I’m personally hesitant to ascribe too much significance to these millisecond differences yielded by informal benchmarking.

On a minor note, the regular expression from @Nigel_Garvey’s post, namely (?:%@).*, would potentially over-filter subtrees of any directory whose name is prefixed by the name of any of the skipFolders housed within the same container, e.g. "/Users/Robert/Pictures_Proving_Earth_Is_Flat". I think (?:%@)(?:/.*)? should match correctly provided the paths specified in skipFolders do not include a terminating slash (catering for the possibility of a trailing slash is a bit more involved).
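
The difference between the two patterns can be checked on plain strings. This is a minimal sketch with made-up paths, using SELF rather than path so that no URLs are needed. The names loosePattern and strictPattern are just labels for this example.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical paths for illustration only.
set testPaths to current application's NSArray's arrayWithArray:{"/Users/Robert/Pictures", "/Users/Robert/Pictures/cat.jpg", "/Users/Robert/Pictures_Proving_Earth_Is_Flat/map.jpg"}

-- The original pattern: anything at all may follow the folder path.
set loosePattern to "(?:/Users/Robert/Pictures).*"
-- The revised pattern: the folder path must end there or be followed by a slash.
set strictPattern to "(?:/Users/Robert/Pictures)(?:/.*)?"

set loosePredicate to current application's NSPredicate's predicateWithFormat_("NOT (SELF MATCHES[c] %@)", loosePattern)
set strictPredicate to current application's NSPredicate's predicateWithFormat_("NOT (SELF MATCHES[c] %@)", strictPattern)

set keptLoose to (testPaths's filteredArrayUsingPredicate:loosePredicate) as list
--> {}
set keptStrict to (testPaths's filteredArrayUsingPredicate:strictPredicate) as list
--> {"/Users/Robert/Pictures_Proving_Earth_Is_Flat/map.jpg"}
```

Because MATCHES has to match the whole string, the optional (?:/.*)? group is what lets the folder itself and its contents match while a sibling with a longer name does not.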

Ah yes! Thanks! I’d probably use your regex solution and insert a line further up to remove any trailing slashes from the skipFolders paths. I’ve escaped the slashes in the regex patterns below simply for consistency with what escapedPatternForString: does. They probably don’t actually need to be escaped.

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies/", "/Users/Robert/Music", "/Users/Robert/Pictures/"}

set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 (2 + 4) skips package contents and hidden files

set thePattern to (current application's NSArray's arrayWithArray:(skipFolders))'s componentsJoinedByString:(linefeed)
-- Remove any trailing slashes.
set thePattern to thePattern's stringByReplacingOccurrencesOfString:("(?m)\\/++$") withString:("") options:(current application's NSRegularExpressionSearch) range:({0, thePattern's |length|()})
set thePattern to current application's NSRegularExpression's escapedPatternForString:(thePattern)
set thePattern to thePattern's stringByReplacingOccurrencesOfString:(linefeed) withString:("|")
-- Use CJK's revised regex pattern.
set thePattern to current application's NSString's stringWithFormat_("(?:%@)(?:\\/.*)?", thePattern)
set thePredicate to current application's NSPredicate's predicateWithFormat_("!(path MATCHES[c] %@)", thePattern)
set folderContents to folderContents's filteredArrayUsingPredicate:(thePredicate)

I reran the timing results with 11 paths in the skipFolders list. I first created one folder and one file that were not returned by any of the scripts except Nigel’s third script (e.g. /Users/Robert/Music Log.txt).

Script      Milliseconds   Item Count
Peavine 1   32             7
Jonas       30             7
Nigel 1     14             7
Nigel 2     14             7
CJK 1       15             7
CJK 2       54             7
Nigel 3     14             9

I ran my own timing tests this morning, timing just the filter code in each script. The initial URL fetch is common to all the scripts and takes up the bulk of their running time, so any vagaries while it’s in progress need to be eliminated from the filter method comparison results.

Actual timings, as always, depend on the model and age of the computer, the system it’s running, how much processor time is diverted to background tasks during the tests, the user’s set-up, etc. In the current case, the number and distribution of files and folders in the user’s home directory could be factors. But although my home folder clearly contains far more stuff than peavine’s (junk accumulated over 27 years of Mac ownership), my results are broadly in line with his in regard to what’s faster than what. Only the original three folders were “skipped” in my tests. The scripts were run with all of them open at the same time in both Script Editor and Script Debugger and with a Numbers document open to note the results. I include the results below in case anyone’s interested — and because I wanted to experiment with peavine’s table posting technique. :wink: The most interesting results, as I mentioned somewhere above, are actually the different numbers of items reportedly returned depending on whether the scripts are run in Script Editor or Script Debugger! I’ll have to look into this later on.

              Script Editor             Script Debugger
Script        Seconds   Result Count    Seconds   Result Count
INITIAL GET   4.849     174260          4.276     230133

FILTER
Peavine 1     2.360     171599          3.255     227473
Jonas         2.304     171599          3.243     227473
Nigel 1       1.259     171599          1.755     227473
Nigel 2       1.256     171599          1.756     227473
CJK 1         1.452     171599          2.044     227473
CJK 2         3.473     171599          4.627     227473
Nigel 3       1.351     171599          1.916     227473

@Nigel_Garvey
Can you tell us what your configuration is?
The timing results you shared seem very fast to me.
On my 2013 Mac Pro under Monterey, your second script takes 28 seconds for 383696 total files and 375463 filtered.

Can you run this script and add it to your timing report?

use framework "Foundation"
use scripting additions

tell application "Finder" to set toKeep to (folders of (path to current user folder) whose name is not in {"Movies", "Music", "Pictures", "Library"}) as alias list

set filteredArray to current application's NSMutableArray's new()
set fileManager to current application's NSFileManager's defaultManager()
repeat with aKept in toKeep
	set entireContent to (fileManager's enumeratorAtURL:aKept includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects()
	set filteredArray to (filteredArray's arrayByAddingObjectsFromArray:entireContent)
end repeat

filteredArray's |count|()

:wink:

Hi @ionah. I did the tests on my desktop machine, which is an iMac 18,3 with a 3.4 GHz Quad-Core Intel Core i5 processor running macOS Ventura 13.6.7. I didn’t bother copying the files over to my M3 MacBook to test there. Its screen’s tiny. :wink:

As I explained above, the comparative timings are only for the scripts’ filter set-up-and-execution parts, which will obviously be less than when the fetching of the disk data and creation of the original URL array are included.

On my iMac, your script takes 0.96 seconds in Script Editor and 1.047 seconds in Script Debugger, reporting 57190 items in both cases. Only getting the folder contents you actually want is clearly more efficient than getting your entire home folder contents and filtering out what you don’t. But in the spirit of peavine’s original enquiry and of not assuming that the “skipped” folders will be siblings in the same root folder, I didn’t pursue that path myself. :wink:

Nigel–a point of clarification.

My original script got the entire contents of my home folder and skipped three folders (and their contents). In that context, I would not expect your above statement to be the case. In retesting and for testing purposes only, I skipped all but a few folders, and in that context I would expect your above statement to be the case. Do I misunderstand what you are saying?

I misread this part…
In this context, my results are close to yours.
Many thanks!

Hi @peavine. I’m not quite clear myself what you’re asking. :smile: But what I was thinking is that your script and those derived from it don’t skip folders, they get the entire contents of the root folder and then filter the unwanted folders and their contents from the result. ionah’s most recent script does skip folders, hence its speed. But it’s only good where the folders are immediate subfolders of the root.

If anyone who’s not an expert is passing by, the script can accept a list of folders coming from any place. It needs a small adaptation:

use framework "Foundation"
use scripting additions

set toKeep to {"/Applications", "/Users/Shared"}

set filteredArray to current application's NSMutableArray's new()
set fileManager to current application's NSFileManager's defaultManager()
repeat with aKept in toKeep
	set anURL to (current application's NSURL's fileURLWithPath:aKept)
	set entireContent to (fileManager's enumeratorAtURL:anURL includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects()
	set filteredArray to (filteredArray's arrayByAddingObjectsFromArray:entireContent)
end repeat

filteredArray's |count|()

Nigel. You were responding to Jonas’ script, which I hadn’t read when I posted, and I now understand your comment. Sorry about that. And, BTW, in actual testing my thought was wrong anyways. :frowning: