Thursday, February 25, 2021

#1 2020-11-23 05:01:39 am

MrCee
Member
Registered: 2016-07-01
Posts: 12

What is the quickest way to apply regex to an entire array?

I’m attempting to build a custom NSArray of Music/iTunes sort names using my own NSRegularExpression, which I will later compare against the current sort names (theSortName). The goal is to get a short list of theID values and then update only the records that meet further conditions.
In the example below, the array myTest holds the desired outcome, but building it takes a considerable amount of time with 15,000 records.

I'm experimenting with theRegexMutableArray to see if it speeds things up.
Should I continue down the NSMutableArray path to potentially save time?
Does this save much time compared with building a new array by adding to its end?
Is there a better way, perhaps with valueForKey?
Can I somehow pass the full array to sortString(someText)?

Applescript:

use framework "Foundation"

tell application "Music"
   set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track in library playlist 1)
   set theName to current application's NSArray's arrayWithArray:(get name of every file track in library playlist 1)
   set theSortName to current application's NSArray's arrayWithArray:(get sort name of every file track of library playlist 1)
   set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

set myTest to {}
repeat with i from 1 to count of theID
   set end of myTest to sortString(theName's objectAtIndex:(i - 1))
end repeat

repeat with i from 1 to count of theID
   (theRegexMutableArray's replaceObjectAtIndex:(i - 1) withObject:(sortString(theRegexMutableArray's objectAtIndex:(i - 1)))) --edited, working but still slow
end repeat

on sortString(someText)
   set theNSString to current application's NSString's stringWithString:someText
   set theOptions to (current application's NSRegularExpressionDotMatchesLineSeparators as integer) + (current application's NSRegularExpressionAnchorsMatchLines as integer) + (current application's NSRegularExpressionCaseInsensitive as integer)
   set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:"^(A |The )(.*$)" options:theOptions |error|:(missing value)
   set theNSString to theRegEx's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:"$2"
   return theNSString
end sortString
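One refinement often suggested for handlers like sortString, though not benchmarked in this thread, is to compile the NSRegularExpression once and reuse it for every name instead of rebuilding it on each call. The handler name stripLeadingArticles and the sample names are hypothetical. Note that NSRegularExpressionCaseInsensitive also makes the pattern match "a " and "the ".

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper: builds the regex once, then applies it to every name in the list
on stripLeadingArticles(theNames)
    set theOptions to current application's NSRegularExpressionCaseInsensitive
    set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:"^(?:A |The )" options:theOptions |error|:(missing value)
    set theResult to current application's NSMutableArray's arrayWithCapacity:(count of theNames)
    repeat with aName in theNames
        set theNSString to current application's NSString's stringWithString:(contents of aName)
        -- An empty template deletes the matched article and its trailing space
        set theStripped to theRegEx's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:""
        (theResult's addObject:theStripped)
    end repeat
    -- Convert the NSMutableArray of NSStrings back to an AppleScript list of text
    return theResult as list
end stripLeadingArticles

set sortNames to stripLeadingArticles({"The Funk (Original Mix)", "A Day", "Another Day"})
```

"Another Day" is left alone because the pattern requires a space straight after the "A".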

I’m new to Objective-C and still trying to understand how to translate method signatures from the documentation, and how to query arrays from a database perspective. I appreciate the ITLibrary framework suggestions in recent posts, thanks! Once I've applied the regex to an entire array efficiently, I’d like some help understanding the following a little better…

I've created myReferenceArray, with just three key/value pairs for now, which should help me cross-reference theRegexMutableArray with the current state, theSortName, using NSPredicate. I’m sure there is a better way. Am I creating myReferenceArray correctly? What else could be improved here?

Applescript:

set myReferenceArray to {}
repeat with i from 1 to count of theID
   set end of myReferenceArray to {theID:theID's objectAtIndex:(i - 1), theSortName:theSortName's objectAtIndex:(i - 1), theRegexMutableArray:theRegexMutableArray's objectAtIndex:(i - 1)}
end repeat
set myReferenceArray to current application's NSArray's arrayWithArray:myReferenceArray
set thePred to current application's NSPredicate's predicateWithFormat:"theRegexMutableArray CONTAINS theSortName" -- edited, working
set theFilteredResult to (myReferenceArray's filteredArrayUsingPredicate:thePred)

Thank you.

Last edited by MrCee (2020-11-24 02:16:46 am)


#2 2020-11-24 04:28:25 am

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 6565

Re: What is the quickest way to apply regex to an entire array?

MrCee wrote:

Should I continue down the NSMutable array path to potentially save time?
Does this save much time rather than building/adding to the end of a new array?



I suspect there won't be any drastic difference either way -- but it's the sort of thing you should test. Making assumptions about timings in AppleScript is something that's bitten many of us.
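A minimal timing harness for that kind of test might look like the following. It compares appending to an AppleScript list with addObject: on an NSMutableArray, using NSDate's timeIntervalSinceReferenceDate as the clock. The 5,000 generated names are hypothetical stand-ins for a real library, and no timings are claimed here: results depend on the machine, the OS version and where the script is run.

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical test data standing in for the Music library's track names
set testNames to {}
repeat with i from 1 to 5000
    set end of testNames to "The Track " & i
end repeat

-- Approach 1: append to an AppleScript list
set t0 to current application's NSDate's timeIntervalSinceReferenceDate()
set listResult to {}
repeat with aName in testNames
    set end of listResult to contents of aName
end repeat
set listSeconds to (current application's NSDate's timeIntervalSinceReferenceDate()) - t0

-- Approach 2: addObject: on an NSMutableArray
set t0 to current application's NSDate's timeIntervalSinceReferenceDate()
set arrayResult to current application's NSMutableArray's new()
repeat with aName in testNames
    (arrayResult's addObject:(contents of aName))
end repeat
set arraySeconds to (current application's NSDate's timeIntervalSinceReferenceDate()) - t0

-- Elapsed seconds for each approach, shown in Script Editor's log
log {listSeconds, arraySeconds}
```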

Is there a better way somehow with valueForKey?



You can only use valueForKey for properties, or methods without parameters.
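For example, valueForKey: on an NSArray of strings sends a parameterless NSString message such as uppercaseString or length to every element in a single call and collects the results. The sample names are hypothetical.

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

set theNames to current application's NSArray's arrayWithArray:{"The Funk", "A Day", "Stars"}

-- valueForKey: calls the same parameterless method on every element and returns a new array
set upperNames to (theNames's valueForKey:"uppercaseString") as list
set nameLengths to (theNames's valueForKey:"length") as list
```

A regex replacement needs arguments (the pattern and the template), so it can't be expressed through valueForKey:.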

Am I creating the myReferenceArray correctly? What else could be improved on here?



There's no real shortcut. You might find using a mutable array and addObject: is quicker for a large number of items, but it tends to be slower for smaller lists.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com


#3 2020-11-25 02:23:54 am

MrCee
Member
Registered: 2016-07-01
Posts: 12

Re: What is the quickest way to apply regex to an entire array?

Thanks Shane.

Regarding myReferenceArray: your suggestion worked like a charm.
I'm now initializing a mutable array and using addObject:. Although I don't believe I can avoid the repeat loop in this case (please correct me if I'm wrong), I'm getting the required results with a 400% improvement compared with adding to the end of myReferenceArray as an AppleScript list {}. This is great!

Applescript:

set myReferenceArray to current application's NSMutableArray's alloc's init()
repeat with i from 1 to count of theID
   set myData to {theID:theID's objectAtIndex:(i - 1), theSortName:theSortName's objectAtIndex:(i - 1), theRegexMutableArray:theRegexMutableArray's objectAtIndex:(i - 1)}
   (myReferenceArray's addObject:myData)
end repeat
set thePred to current application's NSPredicate's predicateWithFormat:"theRegexMutableArray CONTAINS theSortName "
set theFilteredResult to (myReferenceArray's filteredArrayUsingPredicate:thePred)

theFilteredResult:
(NSArray) {
    {
        theID:"6BE13A45E8239EA7",
        theSortName:"Funk (Original Mix)",
        theRegexMutableArray:"The Funk (Original Mix)"
    },
    {
        theID:"59D7A378DD1FBC02",
        theSortName:"Way You Do (Original Mix)",
        theRegexMutableArray:"The Way You Do (Original Mix)"
    },
    {
        theID:"6DB4031F1723BCAD",
        theSortName:"Stars (Hatiras Remix)",
        theRegexMutableArray:"The Stars (Hatiras Remix)"
    },
    {
        theID:"DBD2C1E62F95EB90",
        theSortName:"Stars (Original Mix)",
        theRegexMutableArray:"The Stars (Original Mix)"
    },
    {
        theID:"39C0D29216063FE5",
        theSortName:"Whispers - And The Beat Goes On (Purple Disco Machine Edit)",
        theRegexMutableArray:"The Whispers - And The Beat Goes On (Purple Disco Machine Edit)"
    }
}
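Since the original aim was a short list of theID values, valueForKey: can pull one key out of every dictionary in the filtered array in a single call. This sketch rebuilds a tiny myReferenceArray from two sample rows; the second row and its ID are hypothetical.

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Sample rows shaped like myReferenceArray; the second row is made up
set myRows to {{theID:"6BE13A45E8239EA7", theSortName:"Funk (Original Mix)", theRegexMutableArray:"The Funk (Original Mix)"}, {theID:"0000000000000001", theSortName:"Zebra", theRegexMutableArray:"Stars"}}
set myReferenceArray to current application's NSArray's arrayWithArray:myRows

set thePred to current application's NSPredicate's predicateWithFormat:"theRegexMutableArray CONTAINS theSortName"
set theFilteredResult to myReferenceArray's filteredArrayUsingPredicate:thePred

-- On an array of dictionaries, valueForKey: returns that key's value from each dictionary
set shortListOfIDs to (theFilteredResult's valueForKey:"theID") as list
```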


#4 2020-11-25 04:17:06 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 5356

Re: What is the quickest way to apply regex to an entire array?

MrCee wrote:

In the example below, I have the desired outcome in the array myTest which takes a considerable amount of time with 15,000 records.


Hi.

Have you tried editing the names in vanilla AppleScript first, before setting the NSArrays?

Applescript:

use framework "Foundation"

tell application "Music"
   set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track in library playlist 1)
   set theName to (get name of every file track in library playlist 1) -- Not an NSArray here.
   set theSortName to current application's NSArray's arrayWithArray:(get sort name of every file track of library playlist 1)
   -- set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

set myTest to getSortNames(theName)
set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:myTest
set theName to current application's NSArray's arrayWithArray:theName

on getSortNames(theNames)
   script o
       property sortNames : theNames's items
   end script
   
   repeat with i from 1 to (count o's sortNames)
       set thisName to item i of o's sortNames
       if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's sortNames to text from word 2 to end of thisName
   end repeat
   
   return o's sortNames
end getSortNames
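Because text from word 2 to end begins at the second word, any punctuation between the article and that word is dropped too: for a hypothetical title "The (Untitled)" it returns "Untitled)". This hypothetical variant keeps Nigel's script-object technique but trims a fixed number of characters instead. Like the original, it relies on begins with, which ignores case by default, so "the ..." is trimmed as well.

Applescript:

```applescript
-- Hypothetical variant of getSortNames: trims by character count instead of by words
on getSortNamesByOffset(theNames)
    script o
        property sortNames : theNames's items
    end script
    
    repeat with i from 1 to (count o's sortNames)
        set thisName to item i of o's sortNames
        -- The length checks avoid an error on a name that is only "A " or "The "
        if (thisName begins with "A ") and ((count thisName) > 2) then
            set item i of o's sortNames to text 3 thru -1 of thisName
        else if (thisName begins with "The ") and ((count thisName) > 4) then
            set item i of o's sortNames to text 5 thru -1 of thisName
        end if
    end repeat
    
    return o's sortNames
end getSortNamesByOffset

set sortNames to getSortNamesByOffset({"The (Untitled)", "A Day", "Another Day", "Stars"})
```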


NG


#5 2020-11-26 08:11:19 pm

MrCee
Member
Registered: 2016-07-01
Posts: 12

Re: What is the quickest way to apply regex to an entire array?

Thanks, Nigel.

This has been most enlightening: your vanilla AppleScript suggestion achieves results in under 3 seconds! So one cannot assume ASOC will be faster; sometimes keeping it vanilla is more efficient.
But how is this so? Every time I build or repeat over a list in vanilla AppleScript, it usually slows things down.

I thought I would put this to the test with NSArrays for my own learning, and adopted your approach of only updating the values that need it rather than running the regex on the entire array. The conditional filtering and prefix removal improved the runtime significantly, but nowhere near as much as your vanilla AppleScript.

What I have come up with so far is 675% slower than your suggestion, so my question is: why? Is there a better way to approach ASOC so that it runs as efficiently as your vanilla AppleScript?

Applescript:

use framework "Foundation"

tell application "Music"
   set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track in library playlist 1)
   --set theName to (get name of every file track in library playlist 1) -- Not an NSArray here.
   --set theSortName to current application's NSArray's arrayWithArray:(get sort name of every file track of library playlist 1)
   set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

repeat with i from 1 to count of theID
   set thisName to (theRegexMutableArray's objectAtIndex:(i - 1))
   if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then
      -- componentsSeparatedByString: returns an immutable NSArray, so take a mutable copy before removing the first word
      set theWords to (thisName's componentsSeparatedByString:" ")'s mutableCopy()
      (theWords's removeObjectAtIndex:0)
      set thisName to (theWords's componentsJoinedByString:" ")
      (theRegexMutableArray's replaceObjectAtIndex:(i - 1) withObject:thisName)
   end if
end repeat
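Splitting into words, removing one and rejoining builds and discards a whole array for every matching name. A lighter alternative, sketched here but not timed in the thread, is substringFromIndex:, which simply returns everything after the prefix. The handler name stripArticlesInPlace and the sample names are hypothetical; note that hasPrefix: is case-sensitive, unlike AppleScript's begins with.

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper: removes a leading "A " or "The " from each NSString in a mutable array
on stripArticlesInPlace(theArray)
    repeat with i from 0 to ((theArray's |count|()) - 1)
        set thisName to (theArray's objectAtIndex:i)
        if (thisName's hasPrefix:"A ") as boolean then
            -- "A " is 2 characters long, so keep everything from index 2 onwards
            (theArray's replaceObjectAtIndex:i withObject:(thisName's substringFromIndex:2))
        else if (thisName's hasPrefix:"The ") as boolean then
            (theArray's replaceObjectAtIndex:i withObject:(thisName's substringFromIndex:4))
        end if
    end repeat
    return theArray
end stripArticlesInPlace

set theArray to current application's NSMutableArray's arrayWithArray:{"The Funk", "A Day", "Stars"}
set theResult to (stripArticlesInPlace(theArray)) as list
```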

Many thanks.


#6 2020-11-26 10:42:53 pm

Fredrik71
Member
Registered: 2019-10-23
Posts: 625

Re: What is the quickest way to apply regex to an entire array?

Read this post: https://macscripter.net/viewtopic.php?id=36833

Search Google for "interapplication communication IAC benchmark"; the results gave me a little more understanding of what Apple Events are all about and how they differ from other technologies.

Here is a simple example showing how much slower Apple's Script Editor is than other environments. I have recently discovered many strange things about Apple's Script Editor.

The same example runs 5 times faster in Automator.

Applescript:

on run
   repeat with i from 1 to 10
       set a to do shell script "echo " & i
   end repeat
   display dialog a
end run

Last edited by Fredrik71 (2020-11-27 12:24:37 am)


The purpose of studying someone else's art is not to add, it's to make less more.


#7 2020-11-27 03:33:31 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5356

Re: What is the quickest way to apply regex to an entire array?

MrCee wrote:

This is what I have come up with so far, which is 675% slower than your suggestion, so my question is why?


Hi MrCee.

While Objective-C can be a lot faster than AppleScript for some purposes, a lot of AppleScriptObjC involves switching back and forth between AppleScript and Objective-C by means of the AppleScriptObjC bridge. This often happens automatically, as for example in the instruction:

Applescript:

if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then

Every time either of the hasPrefix: methods is executed, the AppleScript string supplied to it is coerced to an NSString first, and the Objective-C boolean returned as the test result is coerced to the AppleScript equivalent afterwards. Perversely, testing just now, I'm finding that this is actually faster than using variables explicitly set to the NSStrings before the repeat! But generally, although AppleScript is designed to let you switch easily between languages and agents, it works faster the fewer individual instructions it sends to things outside itself.
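Taking that principle to its extreme, one option (not benchmarked in this thread) is to cross into Objective-C only a handful of times in total: join all the names into one string, run a single anchored regex replacement over it, and split the result back into a list. The handler name is hypothetical, and the sketch assumes no track name contains a linefeed, because linefeed is used as the separator.

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper: one join, one regex replacement and one split, instead of calls per item.
-- Assumes no name contains a linefeed, since linefeed separates the names.
on stripArticlesInOneCall(theNames)
    if theNames is {} then return {}
    set theArray to current application's NSArray's arrayWithArray:theNames
    set joinedNames to theArray's componentsJoinedByString:linefeed
    -- (?m) makes ^ match at the start of every line, i.e. at the start of every name
    set joinedNames to joinedNames's stringByReplacingOccurrencesOfString:"(?m)^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, joinedNames's |length|()}
    return (joinedNames's componentsSeparatedByString:linefeed) as list
end stripArticlesInOneCall

set sortNames to stripArticlesInOneCall({"The Funk (Original Mix)", "A Day", "Stars"})
```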

Another point with regard to your test code is that the NSString class has its own regex-capable search-and-replace method which is faster and more convenient than what you've been trying so far. It's still not as fast as the vanilla, but in the current context, is a better ASObjC solution:

Applescript:

tell application "Music"
   --

   set theRegexMutableArray to current application's NSMutableArray's arrayWithArray:(get name of every file track in library playlist 1)
end tell

repeat with thisName in theRegexMutableArray
   if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then
       set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, thisName's |length|()})
   end if
end repeat

If you know that a sizeable number of the strings do begin with either "A " or "The ", you may find it faster to leave out the 'if' statement and let the strings be replaced anyway, even if the replacements are no different from the originals.

Another alternative you may like to try is this:

Applescript:

repeat with thisName in theRegexMutableArray
   if (thisName's hasPrefix:"A ") then
      -- Limit the replacement to the prefix's range so later "A " or "The " occurrences in the name are untouched
      set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"A " withString:"" options:0 range:{0, 2})
   else if (thisName's hasPrefix:"The ") then
      set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"The " withString:"" options:0 range:{0, 4})
   end if
end repeat

The tests take about the same amount of time, but the replacement method does a simpler search and is therefore slightly faster.


NG


#8 2020-11-29 03:48:53 am

KniazidisR
Member
From: Greece
Registered: 2019-03-03
Posts: 1632

Re: What is the quickest way to apply regex to an entire array?

MrCee wrote:

This has been most enlightening as your suggestion in vanilla Applescript achieves results in under 3 seconds! So I guess, one cannot assume ASOC will be faster, sometimes keeping it vanilla will be more efficient.
But how is this so? Every time I build or repeat over a list in vanilla Applescript, it usually slows things down.


I've already seen three cases where vanilla AppleScript is faster than optimally written ASObjC code for the same task. Cases where a hybrid of the two is fastest are even more common in my experience, as here.

In addition to the reasons given by Nigel Garvey, vanilla AppleScript boasts very fast tools such as text item delimiters, getting the size of a file or folder, working with lists by reference through script objects, building a list by adding each item's contents rather than the item reference itself, very fast Finder operations on lists of AppleScript aliases, and more. This is just the tip of the iceberg.
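As one illustration of text item delimiters, this hypothetical handler drops the first space-delimited word, which is the vanilla counterpart of splitting on spaces, removing the first component and rejoining. It does not check whether that first word is an article, so the caller still needs the begins with test, and it restores the previous delimiters so the rest of a script is unaffected.

Applescript:

```applescript
-- Hypothetical helper: removes the first space-delimited word from a piece of text
on dropFirstWord(theText)
    set savedTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to space
    set theWords to text items of theText
    if (count theWords) > 1 then
        -- Coercing the remaining items to text rejoins them with the current delimiter (a space)
        set theResult to (items 2 thru -1 of theWords) as text
    else
        set theResult to theText
    end if
    set AppleScript's text item delimiters to savedTIDs
    return theResult
end dropFirstWord

set trimmedName to dropFirstWord("The Whispers - And The Beat Goes On")
```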

Last edited by KniazidisR (2020-11-29 04:00:22 am)


Model: MacBook Pro
OS X: Catalina 10.15.4
Web Browser: Safari 13.1
Ram: 4 GB


#9 2020-11-29 04:30:57 am

MrCee
Member
Registered: 2016-07-01
Posts: 12

Re: What is the quickest way to apply regex to an entire array?

Thanks, guys.

Fredrik71 – I needed to see that post. It’s definitely answered some of the questions I had.

Nigel, your help with this exercise has been invaluable. Now I understand script objects, or "script o", better than before. Also, the NSString class's regex-capable search-and-replace method was something I overlooked; it has definitely improved the speed compared to my previous method. Now I can apply the regex across 15,000 NSArray items in approx. 12 seconds.

I'm finding that this is actually faster than using variables explicitly set to the NSStrings before the repeat!



Isn’t that interesting! I’m glad I’ve experienced this for myself before building multiple repeat loops. So, whenever possible, dropping the repeat counter and the per-iteration objectAtIndex: variable, and instead using ‘repeat with thisName in theRegexMutableArray’ as shown in your example, improves the repeat loop's runtime by about 30% in my tests.

So pulling it all together, what I’ve learned since my original post…

1)    My goal was to apply regex to an entire array, and I could do this in two ways. Vanilla AppleScript is certainly faster for removing the first word and keeping things simple. In the future, I will need the power of NSRegularExpression to add further regex conditions involving tokens and backreferences, which vanilla AppleScript potentially won't handle. So I’ve optimized the regex in ASOC and will build on it later. This is what I have, which takes approx. 12 seconds for 15,000 items…

Applescript:

--Nigel's ASOC suggestion: 12 seconds for 15,000 items
repeat with thisName in theRegexMutableArray
   if (thisName's hasPrefix:"A ") or (thisName's hasPrefix:"The ") then
      set thisName's contents to (thisName's stringByReplacingOccurrencesOfString:"^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, thisName's |length|()})
   end if
end repeat

2)    I set out to create an NSArray with multiple key/value pairs that I could query and filter easily. This takes time to build. I’ve now optimized the creation of myReferenceArray in ASOC by using an NSMutableArray with addObject: inside the repeat loop, which is much faster than adding to the end of a list {}. Building this NSArray takes 25 seconds for 15,000 ‘rows’.

Applescript:

--Shane's ASOC suggestion: 25 seconds for 15,000 items
set myReferenceArray to current application's NSMutableArray's alloc's init()
repeat with i from 1 to count of theID
   set myData to {theID:theID's objectAtIndex:(i - 1), theSortName:theSortName's objectAtIndex:(i - 1), theRegexMutableArray:theRegexMutableArray's objectAtIndex:(i - 1)}
   (myReferenceArray's addObject:myData)
end repeat

Is there a quicker way to build an NSArray and apply the regex changes? After seeing the speed vanilla AppleScript can achieve, I thought I’d combine the best of both worlds using script object properties, and this is what I’ve come up with, comparing three regex replace methods…

Applescript:

use framework "Foundation"

tell application "Music"
   set theID to (get persistent ID of every file track in library playlist 1)
   set theName to (get name of every file track in library playlist 1)
   set theSortName to (get sort name of every file track of library playlist 1)
end tell

on makeNSArrayQuickly(theIDs, theNames, theSortNames)
   script o
      property myIDs : theIDs's items
      property myNames : theNames's items
      property mySortNames : theSortNames's items
      property myRegexSortNames : theNames's items --to be updated
      property myResult : {}
   end script
   
   repeat with i from 1 to (count o's myIDs)
      set thisName to item i of o's myRegexSortNames
      --if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's myRegexSortNames to text from word 2 to end of thisName --2.96 SECONDS
      --if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's myRegexSortNames to sortString1(thisName) --4.35 SECONDS
      if ((thisName begins with "A ")) or (thisName begins with "The ") then set item i of o's myRegexSortNames to sortString2(thisName) --3.64 SECONDS
      set end of o's myResult to {theID:item i of o's myIDs, theName:item i of o's myNames, theSortName:item i of o's mySortNames, myRegexSortName:item i of o's myRegexSortNames}
   end repeat
   
   set myResult to current application's NSMutableArray's arrayWithArray:(o's myResult)
   return myResult
end makeNSArrayQuickly

set myReferenceArray to (makeNSArrayQuickly(theID, theName, theSortName))
set thePred to current application's NSPredicate's predicateWithFormat:"myRegexSortName CONTAINS theSortName "
set theFilteredResult to (myReferenceArray's filteredArrayUsingPredicate:thePred)

on sortString1(someText) --4.35 SECONDS
   set theNSString to current application's NSString's stringWithString:someText
   set theOptions to (current application's NSRegularExpressionDotMatchesLineSeparators as integer) + (current application's NSRegularExpressionAnchorsMatchLines as integer) + (current application's NSRegularExpressionCaseInsensitive as integer)
   set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:"^(A |The )(.*$)" options:theOptions |error|:(missing value)
   set theNSString to theRegEx's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:"$2"
   return theNSString
end sortString1

on sortString2(someText) --3.64 SECONDS
   set theNSString to current application's NSString's stringWithString:someText
   set theNSString to (theNSString's stringByReplacingOccurrencesOfString:"^(?:A |The )" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, theNSString's |length|()})
   return theNSString
end sortString2
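If the short list you need is the tracks whose stored sort name does not yet match the computed one (the records that would actually need updating), a != comparison between the two key paths selects those instead of CONTAINS. The rows below are hypothetical, shaped like the records built in makeNSArrayQuickly; Music reports an unset sort name as empty text, which the second row imitates.

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical rows; the second ID is made up
set myRows to {{theID:"6BE13A45E8239EA7", theSortName:"Funk (Original Mix)", myRegexSortName:"Funk (Original Mix)"}, {theID:"0000000000000002", theSortName:"", myRegexSortName:"Day"}}
set myReferenceArray to current application's NSArray's arrayWithArray:myRows

-- != compares the two values within each dictionary; only rows that differ are kept
set thePred to current application's NSPredicate's predicateWithFormat:"myRegexSortName != theSortName"
set needsUpdate to myReferenceArray's filteredArrayUsingPredicate:thePred
set idsToUpdate to (needsUpdate's valueForKey:"theID") as list
```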

I’m pleased to say that with all of the suggestions above, 15,000 key/value pairs with regex applied to the entire array can be completed in under 5 seconds!

theFilteredResult:

(NSArray) {
    {
        myRegexSortName:"Funk (Original Mix)",
        theSortName:"Funk (Original Mix)",
        theID:"6BE13A45E8239EA7",
        theName:"The Funk (Original Mix)"
    },
    {
        myRegexSortName:"Way You Do (Original Mix)",
        theSortName:"Way You Do (Original Mix)",
        theID:"59D7A378DD1FBC02",
        theName:"The Way You Do (Original Mix)"
    },
    {
        myRegexSortName:"Stars (Hatiras Remix)",
        theSortName:"Stars (Hatiras Remix)",
        theID:"6DB4031F1723BCAD",
        theName:"The Stars (Hatiras Remix)"
    },
    {
        myRegexSortName:"Stars (Original Mix)",
        theSortName:"Stars (Original Mix)",
        theID:"DBD2C1E62F95EB90",
        theName:"The Stars (Original Mix)"
    },
    {
        myRegexSortName:"Whispers - And The Beat Goes On (Purple Disco Machine Edit)",
        theSortName:"Whispers - And The Beat Goes On (Purple Disco Machine Edit)",
        theID:"39C0D29216063FE5",
        theName:"The Whispers - And The Beat Goes On (Purple Disco Machine Edit)"
    }
}

I think the next thing I'll need to learn is NSDictionary and shared keys to potentially search and query as per the above in a more efficient way.  If anyone has any further suggestions or tweaks, let me know. I would appreciate it.
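As a starting point for that, a plain NSDictionary keyed by persistent ID can be built from the two parallel lists Music returns in a single dictionaryWithObjects:forKeys: call, after which any track's sort name can be looked up directly without filtering the whole array. This sketch does not use shared key sets, and the two short lists stand in for real library data; both lists must be the same length.

Applescript:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Sample parallel lists, as Music returns for persistent ID and sort name
set theIDs to {"6BE13A45E8239EA7", "59D7A378DD1FBC02"}
set theSortNames to {"Funk (Original Mix)", "Way You Do (Original Mix)"}

-- One call builds an ID -> sort name dictionary from the two lists
set sortNameByID to current application's NSDictionary's dictionaryWithObjects:theSortNames forKeys:theIDs
set oneSortName to (sortNameByID's objectForKey:"59D7A378DD1FBC02") as text
```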


Cheers!

Last edited by MrCee (2020-11-29 05:42:04 am)
