Eliminate repeat loop with predicate

The following does what I want and only takes 19 milliseconds. Just for learning purposes, I wondered if the predicate could be written to work without the repeat loop. Thanks!

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents
repeat with aFolder in skipFolders
	set thePredicate to current application's NSPredicate's predicateWithFormat_("not (path BEGINSWITH %@)", aFolder)
	set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)
end repeat

I guess the following is an alternative, but it doesn’t allow you to easily add or remove from the skipFolders list, and it takes as long as the previous version.

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set skipFolders to current application's NSArray's arrayWithArray:skipFolders
set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents
set thePredicate to current application's NSPredicate's predicateWithFormat_("not (path BEGINSWITH %@ or path BEGINSWITH %@ or path BEGINSWITH %@)", skipFolders's objectAtIndex:0, skipFolders's objectAtIndex:1, skipFolders's objectAtIndex:2)
set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)

For this you can use NSCompoundPredicate:

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}

set predArray to current application's NSMutableArray's new()
repeat with aFolder in skipFolders
	(predArray's addObject:(current application's NSPredicate's predicateWithFormat_("not (path BEGINSWITH %@)", aFolder)))
end repeat
set thePredicate to (current application's NSCompoundPredicate's andPredicateWithSubpredicates:predArray)

set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents
set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate) as list

Hi both.

If it’s just an exercise, there’s this, I suppose:

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents

set formatString to (current application's NSArray's arrayWithArray:skipFolders)'s componentsJoinedByString:"|"
set formatString to current application's NSString's stringWithFormat_("!(path MATCHES '(%@).*')", formatString)
set thePredicate to current application's NSPredicate's predicateWithFormat:formatString
set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)

Thanks Jonas and Nigel. Both of your solutions work great. Nigel’s solution was fastest, adding only 5 milliseconds to the time it takes to get folderContents before any filtering. Also, adding four additional folders to the skipFolders list didn’t increase the overall timing result of Nigel’s script at all.

Ideally, my solution should be proofed against any possible regex symbols or single-quotes in the skipFolder paths:

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}

set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents

set thePattern to (current application's NSArray's arrayWithArray:(skipFolders))'s componentsJoinedByString:(linefeed)
set thePattern to current application's NSRegularExpression's escapedPatternForString:(thePattern)
set thePattern to thePattern's stringByReplacingOccurrencesOfString:(linefeed) withString:("|")
set thePattern to current application's NSString's stringWithFormat_("(?:%@).*", thePattern)
set thePredicate to current application's NSPredicate's predicateWithFormat_("!(path MATCHES %@)", thePattern)
set folderContents to folderContents's filteredArrayUsingPredicate:(thePredicate)

When I add a |count|() instruction to the above scripts, they all give the same results, which is encouraging. But interestingly, the counts in Script Debugger are somewhat different from those in Script Editor! :open_mouth:


@Nigel_Garvey
Very elegant, as always!
Particularly the string format pattern: "(?:%@).*".
May I ask why you used a non-capturing subexpression?
I found that if the pattern is "%@.*", the script returns more results, and I don’t see why these results should be omitted.

Hi ionah.

In the stringWithFormat: line, “%@” represents the value of thePattern, which at that point is a series of path beginnings delimited by “|” — the regex equivalent of “OR”. The beginnings have to be parenthesised together in the regex so that the “.*” matches the rest of each path for each of them. If they weren’t so grouped, the “.*” would only apply to the last one and only the root folders would be matched in the other cases. This is why there are more results without the grouping. The root folders’ paths are matched and excluded, but not the paths to their contents.
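
As a minimal sketch of the grouping point above, the two patterns can be tried against a single string. The test path and the two-folder alternation are hypothetical values chosen for illustration, and "SELF MATCHES" is used here only so that plain strings can be tested instead of URLs.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical test path: a file inside the first skip folder.
set testPath to "/Users/Robert/Movies/clip.mp4"
set alternatives to "/Users/Robert/Movies|/Users/Robert/Music"

-- Grouped: ".*" follows whichever alternative matched.
set groupedPattern to "(?:" & alternatives & ").*"
-- Ungrouped: ".*" belongs only to the last alternative.
set ungroupedPattern to alternatives & ".*"

set groupedPredicate to current application's NSPredicate's predicateWithFormat_("SELF MATCHES %@", groupedPattern)
set ungroupedPredicate to current application's NSPredicate's predicateWithFormat_("SELF MATCHES %@", ungroupedPattern)

set groupedResult to (groupedPredicate's evaluateWithObject:testPath) as boolean
set ungroupedResult to (ungroupedPredicate's evaluateWithObject:testPath) as boolean
return {groupedResult, ungroupedResult} --> {true, false}
```

MATCHES has to match the whole string, so without the group the Movies alternative only matches the bare folder path and the file inside it slips through the filter.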

Making the group a non-capture one is just an obsessive drive for efficiency on my part. I doubt it makes much difference in practice. :wink:


Nigel. I tested your original script in both Script Editor and Script Debugger and received the same file count under various test scenarios. I also looked at the actual files returned and saw no discrepancy.

On a different issue, an NSPredicate cheat-sheet that I use contains the following:

Common mistakes. Using [NSString stringWithFormat:] to build predicates is prone to have non-escaped diacritics or artifacts like an apostrophe. Use [NSPredicate predicateWithFormat:] instead.

I tried to eliminate the use of stringWithFormat in your first script but couldn’t get that to work (possibly for the reason cited). Do you know how to do this? Thanks!

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents

set formatString to (current application's NSArray's arrayWithArray:skipFolders)'s componentsJoinedByString:"|"
--in the next code line the regex pattern has to be placed in single quotes 
--but that appears to stop %@ from expanding to formatString's values
set thePredicate to current application's NSPredicate's predicateWithFormat_("!(path MATCHES '(%@).*')", formatString)
set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)

Hi peavine.

I think the difficulty you’re having here is that with stringWithFormat:, the substitution values are simply spliced into the format, whereas with predicateWithFormat:, they’re quoted into it if they’re strings. My second, hopefully improved script above makes use of this. It only uses stringWithFormat: to complete the regex pattern that has to be matched, and that only after escapedPatternForString: has ensured that the characters in the paths are treated as literals.
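
A rough way to see the difference is to ask each predicate for its format string. This is only a sketch: the pattern is a hypothetical one, and the variable names are made up for the example.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical pattern used only for illustration.
set thePattern to "(?:/Users/Robert/Movies).*"

-- Unquoted %@: the argument is substituted as a quoted string constant.
set unquotedPredicate to current application's NSPredicate's predicateWithFormat_("path MATCHES %@", thePattern)

-- Quoted '%@': the placeholder is read as literal text, so nothing is substituted.
set quotedPredicate to current application's NSPredicate's predicateWithFormat_("path MATCHES '%@'", thePattern)

-- Compare the two format strings that NSPredicate reports back.
return {(unquotedPredicate's predicateFormat()) as text, (quotedPredicate's predicateFormat()) as text}
```

The first item should show the pattern inside quotes, while the second should still show the bare placeholder, which is why the quoted version never sees the paths.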

Hope this makes sense!


The following code runs in 20s on my M2 Mac mini Pro in Script Debugger. The containsObject result is false here for skipFolders and their contents.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set fm to current application's NSFileManager's defaultManager()
set skipFolders to current application's NSMutableSet's alloc()'s init()
set fcontents to current application's NSMutableSet's alloc()'s init()

set theFolder to current application's NSURL's fileURLWithPath:"/Users/Robert"
skipFolders's addObjectsFromArray:{"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}

set folderContents to (fm's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects()

fcontents's addObjectsFromArray:folderContents
fcontents's minusSet:skipFolders
-- log (fcontents's allObjects()'s containsObject:"/Users/Robert/Movies/some_movie.mp4") as boolean
log (fcontents's allObjects()'s containsObject:"/Users/Robert/Music/iTunes/") as boolean

VikingOSX. Thanks for the script suggestion.

I don’t think your script works as expected, and the following demonstrates this:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set fm to current application's NSFileManager's defaultManager()
set skipFolders to current application's NSMutableSet's alloc()'s init()
set fcontents to current application's NSMutableSet's alloc()'s init()
set theFolder to current application's NSURL's fileURLWithPath:"/Users/Robert"
skipFolders's addObjectsFromArray:{"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set folderContents to (fm's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects()
fcontents's addObjectsFromArray:folderContents

log fcontents's |count|() --> 1171
fcontents's minusSet:skipFolders
log fcontents's |count|() --> 1171

There appear to be two issues. First, the skipFolders set contains paths and the fcontents set contains URLs. Second, removing the three folders in the skipFolders set from the fcontents set would only remove these three folders, not the files and folders contained within these three folders.
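
The second issue can be shown in isolation with a small sketch. It uses plain path strings on both sides (hypothetical paths, so the type mismatch is out of the picture) and shows that minusSet: only removes exact members.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical contents: a skip folder, one item inside it, and an unrelated file.
set fcontents to current application's NSMutableSet's setWithArray:{"/Users/Robert/Music", "/Users/Robert/Music/iTunes", "/Users/Robert/Notes.txt"}
set skipFolders to current application's NSSet's setWithArray:{"/Users/Robert/Music"}

-- Set subtraction compares whole objects for equality; it knows nothing about path prefixes.
fcontents's minusSet:skipFolders
return fcontents's |count|() --> 2, because "/Users/Robert/Music/iTunes" survives
```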

It’s frustrating that testing whether a given string BEGINSWITH a particular substring (prefix) is so straightforward, yet the equivalent test that asks whether a particular substring prefixes the given string isn’t—as far as I’m aware—among the available string comparison operations.

Constructing a predicate that compares the string value for path against each individual element within a collection is entirely possible, but there are syntactic restrictions that limit collections to being the first (left-hand side) operand in a binary operation. So, even though it feels like we should be able to do this:

NOT (path BEGINSWITH ANY %@)

we can’t. And, while there really should be a string comparison operator to test for prefixes like this:

NOT (ANY %@ PREFIXES path)

there isn’t. The closest we can get is this:

NOT (ANY %@ IN path)

which can also be written like this:

NONE %@ IN path

Since macOS file paths aren’t case-sensitive, it probably makes sense to perform the string comparisons on a similar basis:

NONE %@ IN[c] path

This will, indeed, filter out all the paths in skipFolders and their respective subpaths. However, in the (albeit unlikely) situation where the target directory to be enumerated contains a directory subtree whose relative folder path matches any of the absolute folder paths in skipFolders, this entire subtree will also be filtered out of the final result.
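
To make that edge case concrete, here is a small sketch that contrasts a substring test with a prefix test using NSString methods directly. The nested backup path is hypothetical, and containsString: and hasPrefix: stand in for the predicate operators purely for illustration.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical path: a backup subtree that repeats an absolute skip path further down.
set nestedPath to current application's NSString's stringWithString:"/Users/Robert/Backup/Users/Robert/Music/song.mp3"
set skipFolder to "/Users/Robert/Music"

-- Substring test: true, so a containment-based filter would drop this item.
set isContained to (nestedPath's containsString:skipFolder) as boolean
-- Prefix test: false, so a BEGINSWITH-based filter would keep it.
set isPrefixed to (nestedPath's hasPrefix:skipFolder) as boolean

return {isContained, isPrefixed} --> {true, false}
```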

The likelihood of this is small, but not zero. Nonetheless, if you’re happy with this limitation, then you can replace the repeat loop from your original script with this:

	set thePredicate to current application's NSPredicate's predicateWithFormat_("NONE %@ IN[c] path", skipFolders)
	set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)

An Alternative Method: SUBQUERY()

Another approach to do away with the AppleScript repeat loop is to bury it within the predicate itself, which is effectively what the SUBQUERY() operation does. It’s probably slightly less performant than the above method, but it might still be more performant than the AppleScript repeat loop in the average case and produces identical results.

If you’re not familiar with the SUBQUERY() expression, you can read about it here. But, it should be pretty easy to intuit how it works by seeing it used in a predicate format string to achieve what we want:

SUBQUERY(%@, $fp, path BEGINSWITH[c] $fp).@count == 0
  • %@: The collection being iterated over by SUBQUERY() passed as an argument to predicateWithFormat_(), namely skipFolders.

  • $fp: This is the variable identifier used to refer to the individual element of the collection within the expression. It serves an equivalent function to that of aFolder in the AppleScript repeat loop from your original script.

  • path BEGINSWITH[c] $fp: This is the condition expression that will test the string value of path for the elements in the array to be filtered against each individual element from the collection passed to SUBQUERY() via %@.

The important thing to note is that SUBQUERY() is, itself, performing a filtering operation on the supplied array, i.e. skipFolders. It, thus, returns a collection, which will be those elements of skipFolders for which the condition expression is true. Since we only want those elements in the folderContents array that fail the test for each and every element of skipFolders, we need the collection returned by the SUBQUERY() expression to contain zero items.
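
For anyone who finds it easier to read as a loop, the logic described above can be sketched in plain AppleScript. This is only an illustration of the idea, not how NSPredicate implements it; the handler name and the two sample paths are made up for the example.

```applescript
-- Returns the skip folders that prefix thePath, like the collection SUBQUERY() produces.
on matchingSkipFolders(thePath, skipFolders)
	set hits to {}
	repeat with fp in skipFolders
		-- "starts with" ignores case by default, similar to BEGINSWITH[c].
		if thePath starts with (contents of fp) then set end of hits to (contents of fp)
	end repeat
	return hits
end matchingSkipFolders

set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}
set kept to {}
repeat with aPath in {"/Users/Robert/Music/song.mp3", "/Users/Robert/Documents/notes.txt"}
	set hits to matchingSkipFolders(contents of aPath, skipFolders)
	-- Keep the path only when no skip folder prefixes it (the ".@count == 0" part).
	if (count of hits) is 0 then set end of kept to (contents of aPath)
end repeat
return kept --> {"/Users/Robert/Documents/notes.txt"}
```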

As before, the repeat loop from your original script can be replaced with this:

	set thePredicate to current application's NSPredicate's predicateWithFormat_("SUBQUERY(%@, $fp, path BEGINSWITH[c] $fp).@count == 0", skipFolders)
	set folderContents to (folderContents's filteredArrayUsingPredicate:thePredicate)

I’ll let you evaluate timings for this.


CJK. Thanks for the script suggestions and explanations. I will have to spend some time to understand the subquery approach, but I like learning new stuff.

I decided it might be helpful to retest all of the script suggestions for execution speed and returned results. I quantified the latter by doing a count of the output. I edited Jonas’ script to remove the array-to-list coercion, which was expensive.

Script     Milliseconds   Item Count
Peavine    19             1161
Jonas      18             1161
Nigel 1    13             1161
Nigel 2    12             1161
CJK 1      13             1161
CJK 2      23             1161

Thanks for the feedback. Although my heart was in the right place, my coding diligence was an issue. I beat on this for a few hours today only to see the elegance of CJK’s predicate suggestion that ran the solution in 20.3 seconds here. I have done very little with advanced predicate syntax.

Does your 19ms result pertain to your original script (i.e. using the AppleScript repeat loop)?

Yes it does.

Just out of curiosity, I increased the number of paths in the skipFolders list from 3 to 11 and retested. My script (with the repeat loop) took 31 milliseconds and Nigel’s second script took 13 milliseconds. As a practical matter, all of these timing results are perfectly fine, but I still find exercises like this useful because I learn a lot.

How did the others fare? I’m expecting SUBQUERY() to take a hit in line with the AppleScript repeat loop, as these can both basically be considered to have 𝒪(𝑛²) time complexity compared to what is roughly 𝒪(𝑛) complexity of all the other methods. Your previous table of results didn’t contain any surprises.

It might be possible to shorten the execution time of @ionah’s script by replacing his repeat loop (which merely serves to combine multiple predicates) with a direct construction of the predicate format string (similar to the one you used in your second post) by way of text item delimiters. I imagine this will be speedier than creating and acting upon multiple ObjC object instances, and possibly faster even than invoking the equivalent NSString method for joining array elements. That said, it will shave, at best, only a couple of milliseconds off (if any at all), and I’m personally hesitant to ascribe too much significance to these millisecond differences yielded by informal benchmarking.
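
A minimal sketch of the text item delimiters idea follows. It assumes the paths contain no double quotes, backslashes or percent signs, since they are placed directly into the format string rather than passed as arguments; the variable names are only for illustration.

```applescript
use framework "Foundation"
use scripting additions

set skipFolders to {"/Users/Robert/Movies", "/Users/Robert/Music", "/Users/Robert/Pictures"}

-- Join the paths with the text that has to sit between them in the format string.
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "\" OR path BEGINSWITH \""
set joinedPaths to skipFolders as text
set AppleScript's text item delimiters to savedDelimiters

-- Wrap the joined text to complete a single negated OR predicate.
set formatString to "NOT (path BEGINSWITH \"" & joinedPaths & "\")"
set thePredicate to current application's NSPredicate's predicateWithFormat:formatString

-- Inspect the parsed predicate before using it with filteredArrayUsingPredicate:.
return (thePredicate's predicateFormat()) as text
```

Whether this is actually faster than building an array of subpredicates would need to be timed.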

On a minor note, the regular expression from @Nigel_Garvey’s post, namely (?:%@).*, would potentially over-filter subtrees of any directory whose name is prefixed by the name of any of the skipFolders housed within the same container, e.g. "/Users/Robert/Pictures_Proving_Earth_Is_Flat". I think (?:%@)(?:/.*)? should match correctly provided the paths specified in skipFolders do not include a terminating slash (catering for the possibility of a trailing slash is a bit more involved).
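
The difference between the two patterns can be checked with a short sketch. The sibling file path is hypothetical, and "SELF MATCHES" is used so that plain strings can be tested rather than URLs.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical sibling folder whose name merely starts with "Pictures".
set siblingPath to "/Users/Robert/Pictures_Proving_Earth_Is_Flat/proof.jpg"
set insidePath to "/Users/Robert/Pictures/holiday.jpg"

set originalPattern to "(?:/Users/Robert/Pictures).*"
set revisedPattern to "(?:/Users/Robert/Pictures)(?:/.*)?"

set originalPredicate to current application's NSPredicate's predicateWithFormat_("SELF MATCHES %@", originalPattern)
set revisedPredicate to current application's NSPredicate's predicateWithFormat_("SELF MATCHES %@", revisedPattern)

-- The original pattern also matches the sibling; the revised one needs "/" or the end of the string next.
set originalHitsSibling to (originalPredicate's evaluateWithObject:siblingPath) as boolean
set revisedHitsSibling to (revisedPredicate's evaluateWithObject:siblingPath) as boolean
set revisedHitsInside to (revisedPredicate's evaluateWithObject:insidePath) as boolean

return {originalHitsSibling, revisedHitsSibling, revisedHitsInside} --> {true, false, true}
```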


Ah yes! Thanks! I’d probably use your regex solution and insert a line further up to remove any trailing slashes from the skipFolders paths. I’ve escaped the slashes in the regex patterns below simply for consistency with what escapedPatternForString: does. They probably don’t actually need to be escaped.

use framework "Foundation"
use scripting additions

set theFolder to "/Users/Robert"
set skipFolders to {"/Users/Robert/Movies/", "/Users/Robert/Music", "/Users/Robert/Pictures/"}

set fileManager to current application's NSFileManager's defaultManager()
set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents

set thePattern to (current application's NSArray's arrayWithArray:(skipFolders))'s componentsJoinedByString:(linefeed)
-- Remove any trailing slashes.
set thePattern to thePattern's stringByReplacingOccurrencesOfString:("(?m)\\/++$") withString:("") options:(current application's NSRegularExpressionSearch) range:({0, thePattern's |length|()})
set thePattern to current application's NSRegularExpression's escapedPatternForString:(thePattern)
set thePattern to thePattern's stringByReplacingOccurrencesOfString:(linefeed) withString:("|")
-- Use CJK's revised regex pattern.
set thePattern to current application's NSString's stringWithFormat_("(?:%@)(?:\\/.*)?", thePattern)
set thePredicate to current application's NSPredicate's predicateWithFormat_("!(path MATCHES[c] %@)", thePattern)
set folderContents to folderContents's filteredArrayUsingPredicate:(thePredicate)

I reran the timing results with 11 paths in the skipFolders list. I first created one folder and one file that were not returned by any of the scripts except Nigel’s third script (e.g. /Users/Robert/Music Log.txt).

Script      Milliseconds   Item Count
Peavine 1   32             7
Jonas       30             7
Nigel 1     14             7
Nigel 2     14             7
CJK 1       15             7
CJK 2       54             7
Nigel 3     14             9