What is the quickest way to apply regex to an entire array?

I’m trying to build a custom array of Music/iTunes sort names as an NSArray, generated by my own NSRegularExpression, which I will later compare against the current sort names (theSortName). The goal is to get a short list of theID and then update only those records, subject to further conditions.
In the example below, the array myTest holds the desired outcome, but building it takes a considerable amount of time with 15,000 records.

I’m experimenting with theRegexMutableArray to see if this will speed things up.
Should I continue down the NSMutableArray path to potentially save time?
Does this save much time compared with building/adding to the end of a new array?
Is there a better way somehow with valueForKey?
Can I somehow pass the full array to sortString(someText)?

use framework "Foundation"

tell application "Music"
	set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track in library playlist 1)
	set theName to current application's NSArray's arrayWithArray:(get name of every file track in library playlist 1)
	set theSortName to current application's NSArray's arrayWithArray:(get sort name of every file track of library playlist 1)
	set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

set myTest to {}
repeat with i from 1 to count of theID
	set end of myTest to sortString(theName's objectAtIndex:(i - 1))
end repeat

repeat with i from 1 to count of theID
	(theRegexMutableArray's replaceObjectAtIndex:(i - 1) withObject:(sortString(theRegexMutableArray's objectAtIndex:(i - 1))))   --edited, working but still slow
end repeat

on sortString(someText)
	set theNSString to current application's NSString's stringWithString:someText
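	-- NSRegularExpressionCaseInsensitive is the NSRegularExpression option; NSCaseInsensitiveSearch belongs to NSString comparisons.
	set theOptions to (current application's NSRegularExpressionDotMatchesLineSeparators as integer) + (current application's NSRegularExpressionAnchorsMatchLines as integer) + (current application's NSRegularExpressionCaseInsensitive as integer)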
	set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:"^(A |The )(.*$)" options:theOptions |error|:(missing value)
	set theNSString to theRegEx's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:"$2"
	return theNSString
end sortString

I’m new to Objective-C and still trying to understand how to translate method signatures from the documentation and how to query arrays from a database perspective. I appreciate the ITLibrary framework suggestions in recent posts, thanks! So once I’ve applied regex to an entire array in an optimized manner, I’d like some help understanding the following a little better…

I’ve created myReferenceArray, with just 3 key/value pairs for now, which should help me cross-reference theRegexMutableArray with the current state of theSortName using NSPredicate. I’m sure there is a better way. Am I creating myReferenceArray correctly? What else could be improved here?

set myReferenceArray to {}
repeat with i from 1 to count of theID
	set end of myReferenceArray to {theID:theID's objectAtIndex:(i - 1), theSortName:theSortName's objectAtIndex:(i - 1), theRegexMutableArray:theRegexMutableArray's objectAtIndex:(i - 1)}
end repeat
set myReferenceArray to current application's NSArray's arrayWithArray:myReferenceArray
set thePred to current application's NSPredicate's predicateWithFormat:"theRegexMutableArray CONTAINS theSortName" -- edited, working
set theFilteredResult to (myReferenceArray's filteredArrayUsingPredicate:thePred)

Thank you.

I suspect there won’t be any drastic difference either way – but it’s the sort of thing you should test. Making assumptions about timings in AppleScript is something that’s bitten many of us.
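A rough way to run that kind of test, as a sketch only: time each variant with NSDate. The handler name secondsSince and the 15,000-item dummy loops are my own stand-ins, not code from this thread, and real timings will vary with the machine and the data.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical helper: seconds elapsed since startDate (an NSDate).
on secondsSince(startDate)
	-- timeIntervalSinceNow() is negative for a date in the past, so negate it.
	return -((startDate's timeIntervalSinceNow()) as real)
end secondsSince

-- Variant 1: append to the end of an AppleScript list.
set startDate to current application's NSDate's |date|()
set myTest to {}
repeat with i from 1 to 15000
	set end of myTest to i
end repeat
log ("List append: " & secondsSince(startDate) & " seconds")

-- Variant 2: addObject: on an NSMutableArray.
set startDate to current application's NSDate's |date|()
set myArray to current application's NSMutableArray's array()
repeat with i from 1 to 15000
	(myArray's addObject:i)
end repeat
log ("addObject: " & secondsSince(startDate) & " seconds")
```

Run each variant a few times before drawing conclusions; a single run can be misleading.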

You can only use valueForKey for properties, or methods without parameters.
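To make that concrete, here is a small sketch with made-up sample names. lowercaseString and length take no parameters, so valueForKey: can apply them to every string in the array in one call. A regex replacement needs parameters, so it can't be reached this way.

```applescript
use framework "Foundation"

-- Sample data for illustration only.
set theNames to current application's NSArray's arrayWithArray:{"The Funk", "A Day"}

-- Parameterless methods can be applied to every item through valueForKey:.
set lowerNames to (theNames's valueForKey:"lowercaseString") as list
set nameLengths to (theNames's valueForKey:"length") as list
-- lowerNames is {"the funk", "a day"} and nameLengths is {8, 5}
```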

There’s no real shortcut. You might find using a mutable array and addObject: is quicker for a large number of items, but it tends to be slower for smaller lists.

Thanks Shane.

Regarding myReferenceArray: your suggestion worked a charm.
I’m now initializing a mutable array and using addObject:. Although I don’t believe I can avoid the repeat loop in this case (please correct me if I’m wrong), I’m getting the required results with a 400% improvement compared to adding to the end of myReferenceArray {} as a list. This is great!

set myReferenceArray to current application's NSMutableArray's alloc's init()
repeat with i from 1 to count of theID
	set myData to {theID:theID's objectAtIndex:(i - 1), theSortName:theSortName's objectAtIndex:(i - 1), theRegexMutableArray:theRegexMutableArray's objectAtIndex:(i - 1)}
	(myReferenceArray's addObject:myData)
end repeat
set thePred to current application's NSPredicate's predicateWithFormat:"theRegexMutableArray CONTAINS theSortName "
set theFilteredResult to (myReferenceArray's filteredArrayUsingPredicate:thePred)

theFilteredResult:
(NSArray) {
{
theID:“6BE13A45E8239EA7”,
theSortName:“Funk (Original Mix)”,
theRegexMutableArray:“The Funk (Original Mix)”
},
{
theID:“59D7A378DD1FBC02”,
theSortName:“Way You Do (Original Mix)”,
theRegexMutableArray:“The Way You Do (Original Mix)”
},
{
theID:“6DB4031F1723BCAD”,
theSortName:“Stars (Hatiras Remix)”,
theRegexMutableArray:“The Stars (Hatiras Remix)”
},
{
theID:“DBD2C1E62F95EB90”,
theSortName:“Stars (Original Mix)”,
theRegexMutableArray:“The Stars (Original Mix)”
},
{
theID:“39C0D29216063FE5”,
theSortName:“Whispers - And The Beat Goes On (Purple Disco Machine Edit)”,
theRegexMutableArray:“The Whispers - And The Beat Goes On (Purple Disco Machine Edit)”
}
}

Hi.

Have you tried editing the names in vanilla AppleScript first, before setting the NSArrays?

use framework "Foundation"

tell application "Music"
	set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track in library playlist 1)
	set theName to (get name of every file track in library playlist 1) -- Not an NSArray here.
	set theSortName to current application's NSArray's arrayWithArray:(get sort name of every file track of library playlist 1)
	-- set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

set myTest to getSortNames(theName)
set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:myTest
set theName to current application's NSArray's arrayWithArray:theName

on getSortNames(theNames)
	script o
		property sortNames : theNames's items
	end script
	
	repeat with i from 1 to (count o's sortNames)
		set thisName to item i of o's sortNames
		if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's sortNames to text from word 2 to end of thisName
	end repeat
	
	return o's sortNames
end getSortNames

Thanks, Nigel.

This has been most enlightening, as your suggestion in vanilla AppleScript achieves results in under 3 seconds! So I guess one cannot assume ASOC will be faster; sometimes keeping it vanilla will be more efficient.
But how is this so? Every time I build or repeat over a list in vanilla AppleScript, it usually slows things down.

I thought I would put this to the test with NSArrays just for my own learning, and adopted your approach of only updating the values required rather than running regex on the entire array. The conditional filtering and prefix removal improved runtime significantly, but got nowhere near the suggested vanilla AppleScript.

This is what I have come up with so far, which is 675% slower than your suggestion, so my question is why? I’d like to understand if there is a better way I should be approaching ASOC, which enables it to run as efficiently as your vanilla AppleScript.

tell application "Music"
	set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track in library playlist 1)
	--set theName to (get name of every file track in library playlist 1) -- Not an NSArray here.
	--set theSortName to current application's NSArray's arrayWithArray:(get sort name of every file track of library playlist 1)
	set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

repeat with i from 1 to count of theID
	set thisName to (theRegexMutableArray's objectAtIndex:(i - 1))
	if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then
		set thisName to (thisName's componentsSeparatedByString:" ")
		(thisName's removeObjectAtIndex:0)
		set thisName to (thisName's componentsJoinedByString:" ")
		(theRegexMutableArray's replaceObjectAtIndex:(i - 1) withObject:thisName)
	end if
end repeat

Many thanks.

Hi MrCee.

While ObjectiveC can be a lot faster than AppleScript for some purposes, a lot of AppleScriptObjC involves switching back and forth between AppleScript and ObjectiveC by means of the AppleScriptObjC bridge. This often happens automatically, as for example in the instruction:

if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then

Every time either of the hasPrefix: methods is executed, the AppleScript string supplied to it is coerced to an NSString first and the ObjectiveC boolean returned as the test result is coerced to the AppleScript equivalent afterwards. Perversely, testing just now, I’m finding that this is actually faster than using variables explicitly set to the NSStrings before the repeat! But generally, although AppleScript’s designed to let you switch around easily between languages and agents, it works faster the fewer individual instructions it sends to things outside itself.
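One way to push that idea to its limit, offered as an untimed sketch rather than anything tested against a real library: join all the names into a single string, run one regex replacement in multiline mode, and split the result again. That cuts the work down to a handful of bridge calls for the whole list. The handler name stripLeadingArticles is my own, and the approach assumes no track name contains a linefeed.

```applescript
use framework "Foundation"

-- Hypothetical handler: strips a leading "A " or "The " from every name using one regex call.
-- Assumes no name contains a linefeed, since linefeed is used as the separator.
on stripLeadingArticles(theNames)
	set nameArray to current application's NSArray's arrayWithArray:theNames
	set joined to (nameArray's componentsJoinedByString:linefeed)
	-- (?m) makes ^ match at the start of every line, not just the start of the string.
	set stripped to (joined's stringByReplacingOccurrencesOfString:"(?m)^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, joined's |length|()})
	return (stripped's componentsSeparatedByString:linefeed) as list
end stripLeadingArticles

set sortNames to stripLeadingArticles({"The Funk", "A Day", "Another One"})
-- sortNames is {"Funk", "Day", "Another One"}
```

Note that an empty input list comes back as a list holding one empty string, so guard for that case if it matters.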

Another point with regard to your test code is that the NSString class has its own regex-capable search-and-replace method which is faster and more convenient than what you’ve been trying so far. It’s still not as fast as the vanilla, but in the current context, is a better ASObjC solution:

tell application "Music"
	--

	set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

repeat with thisName in theRegexMutableArray
	if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then
		set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, thisName's |length|()})
	end if
end repeat

If you know that a sizeable number of the strings do begin with either "A " or "The ", you may find it faster to leave out the ‘if’ statement and let the strings be replaced anyway, even if the replacements are no different from the originals.

Another alternative you may like to try is this:

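repeat with thisName in theRegexMutableArray
	if (thisName's hasPrefix:"A ") then
		-- NSAnchoredSearch anchors the match to the start of the string, so a later "A " in the name is left alone.
		set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"A " withString:"" options:(current application's NSAnchoredSearch) range:{0, thisName's |length|()})
	else if (thisName's hasPrefix:"The ") then
		-- "The " includes the trailing space so the result has no leading space.
		set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"The " withString:"" options:(current application's NSAnchoredSearch) range:{0, thisName's |length|()})
	end if
end repeat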

The tests take about the same amount of time, but the replacement method does a simpler search and is therefore slightly faster.

I’ve already seen 3 cases where vanilla AppleScript is faster than optimally written ASObjC code for the same task. In my experience, cases where a hybrid of the two is fastest are even more common. Like here.

In addition to the reasons given by Nigel Garvey, vanilla AppleScript boasts very fast tools such as text item delimiters, getting the size of a file structure’s item, working with script objects by reference, incrementing an array by adding an item’s content instead of adding an item, working the Finder very quickly with an array of AppleScript aliases, and more. This is just the tip of the iceberg.

Thanks Guys.

Fredrik71 – I needed to see that post. It’s definitely answered some of the questions I had.

Nigel, your help with this exercise has been invaluable. Now I’m understanding Script Objects or ”script o” better than before. Also, the NSString class regex-capable search-and-replace method was something I overlooked; this has definitely improved the speed compared to my previous method. Now I can apply regex across 15,000 NSArray items in approx. 12 seconds.

Isn’t that interesting. I’m glad I’ve experienced this for myself before building multiple repeat loops. So essentially, whenever possible, dropping the repeat counter and the objectAtIndex: lookup on each pass, and using ‘repeat with thisName in theRegexMutableArray’ as shown in your example, improves repeat loop runtime by 30%.

So pulling it all together, what I’ve learned since my original post…

  1. My goal was to apply regex to an entire array, and I could do this in 2 ways. Vanilla AppleScript is certainly faster when simply removing the first word. In the future, I will need the power of NSRegularExpression to add further regex conditions involving tokens and backreferences, which vanilla AppleScript may not handle. So I’ve optimized regex in ASOC and will build on this later. This is what I have, which takes approx. 12 seconds for 15,000 items:
--Nigel's ASOC suggestion: 12 seconds for 15,000 items
repeat with thisName in theRegexMutableArray
    if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then
        set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, thisName's |length|()})
    end if
end repeat
  2. I set out to create an NSArray with multiple key/value pairs which I could query/filter easily. This takes time to build. I’ve now optimized the creation of myReferenceArray in ASOC as an NSMutableArray using addObject: in a repeat loop, which is much faster than adding to the end of a list {}. Building this NSArray takes 25 seconds for 15,000 ‘rows’.
--Shane's ASOC suggestion: 25 seconds for 15,000 items
set myReferenceArray to current application's NSMutableArray's alloc's init()
repeat with i from 1 to count of theID
    set myData to {theID:theID's objectAtIndex:(i - 1), theSortName:theSortName's objectAtIndex:(i - 1), theRegexMutableArray:theRegexMutableArray's objectAtIndex:(i - 1)}
    (myReferenceArray's addObject:myData)
end repeat

Is there a quicker way to build an NSArray and apply the regex changes? After seeing the speed vanilla Applescript can achieve, I thought I’d combine the best of both worlds using script object properties and this is what I’ve come up with, while comparing 3 regex replace methods….

use framework "Foundation"

tell application "Music"
    set theID to (get persistent ID of every file track in library playlist 1)
    set theName to (get name of every file track in library playlist 1)
    set theSortName to (get sort name of every file track of library playlist 1)
end tell

on makeNSArrayQuickly(theIDs, theNames, theSortNames)
    script o
        property myIDs : theIDs's items
        property myNames : theNames's items
        property mySortNames : theSortNames's items
        property myRegexSortNames : theNames's items --to be updated
        property myResult : {}
    end script
    
    repeat with i from 1 to (count o's myIDs)
        set thisName to item i of o's myRegexSortNames
        --if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's myRegexSortNames to text from word 2 to end of thisName --2.96 SECONDS
        --if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's myRegexSortNames to sortString1(thisName) --4.35 SECONDS
        if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's myRegexSortNames to sortString2(thisName) --3.64 SECONDS
        set end of o's myResult to {theID:item i of o's myIDs, theName:item i of o's myNames, theSortName:item i of o's mySortNames, myRegexSortName:item i of o's myRegexSortNames}
    end repeat
    
    set myResult to current application's NSMutableArray's arrayWithArray:(o's myResult)
    return myResult
end makeNSArrayQuickly

set myReferenceArray to (makeNSArrayQuickly(theID, theName, theSortName))
set thePred to current application's NSPredicate's predicateWithFormat:"myRegexSortName CONTAINS theSortName "
set theFilteredResult to (myReferenceArray's filteredArrayUsingPredicate:thePred)

on sortString1(someText) --4.35 SECONDS
    set theNSString to current application's NSString's stringWithString:someText
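    -- NSRegularExpressionCaseInsensitive is the NSRegularExpression option; NSCaseInsensitiveSearch belongs to NSString comparisons.
    set theOptions to (current application's NSRegularExpressionDotMatchesLineSeparators as integer) + (current application's NSRegularExpressionAnchorsMatchLines as integer) + (current application's NSRegularExpressionCaseInsensitive as integer)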
    set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:"^(A |The )(.*$)" options:theOptions |error|:(missing value)
    set theNSString to theRegEx's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:"$2"
    return theNSString
end sortString1

on sortString2(someText) --3.64 SECONDS
    set theNSString to current application's NSString's stringWithString:someText
    set theNSString to (theNSString's stringByReplacingOccurrencesOfString:"^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, theNSString's |length|()})
    return theNSString
end sortString2 

I’m pleased to say that with all of the suggestions above, 15,000 key/value pairs with regex applied to the entire array can be completed in under 5 seconds!

theFilteredResult:

(NSArray) {
{
myRegexSortName:“Funk (Original Mix)”,
theSortName:“Funk (Original Mix)”,
theID:“6BE13A45E8239EA7”,
theName:“The Funk (Original Mix)”
},
{
myRegexSortName:“Way You Do (Original Mix)”,
theSortName:“Way You Do (Original Mix)”,
theID:“59D7A378DD1FBC02”,
theName:“The Way You Do (Original Mix)”
},
{
myRegexSortName:“Stars (Hatiras Remix)”,
theSortName:“Stars (Hatiras Remix)”,
theID:“6DB4031F1723BCAD”,
theName:“The Stars (Hatiras Remix)”
},
{
myRegexSortName:“Stars (Original Mix)”,
theSortName:“Stars (Original Mix)”,
theID:“DBD2C1E62F95EB90”,
theName:“The Stars (Original Mix)”
},
{
myRegexSortName:“Whispers - And The Beat Goes On (Purple Disco Machine Edit)”,
theSortName:“Whispers - And The Beat Goes On (Purple Disco Machine Edit)”,
theID:“39C0D29216063FE5”,
theName:“The Whispers - And The Beat Goes On (Purple Disco Machine Edit)”
}
}

I think the next thing I’ll need to learn is NSDictionary and shared keys to potentially search and query as per the above in a more efficient way. If anyone has any further suggestions or tweaks, let me know. I would appreciate it.
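As a possible starting point for that, here is a minimal sketch of a dictionary keyed by persistent ID. It uses two of the IDs shown above as sample data; the variable names are my own. dictionaryWithObjects:forKeys: builds the whole dictionary in one call, with no repeat loop, and assumes both lists are the same length and the IDs are unique.

```applescript
use framework "Foundation"

-- Sample data for illustration; in practice these would come from Music.
set theIDs to {"6BE13A45E8239EA7", "59D7A378DD1FBC02"}
set theSortNames to {"Funk (Original Mix)", "Way You Do (Original Mix)"}

-- Pair each sort name with its persistent ID in a single bridge call.
set sortNameByID to current application's NSDictionary's dictionaryWithObjects:theSortNames forKeys:theIDs

-- Look up one track's sort name directly by ID.
set foundName to (sortNameByID's objectForKey:"59D7A378DD1FBC02") as text
-- foundName is "Way You Do (Original Mix)"
```

objectForKey: returns missing value for an unknown ID, so check for that before coercing to text.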

Cheers!