Tuesday, September 27, 2022

#1 2022-08-10 06:50:53 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Regular Expression Capture Groups

I'm working to learn the ASObjC implementation of Regular Expression capture groups. I'm using a rewrite of a script from Shane's ASObjC book, and the goal is to get substrings in parentheses that match a pattern.

Scripts one and two work as expected, but I can't get script three--which has two capture groups--to work. I'm not sure if my RegEx pattern is faulty or if I'm doing something else wrong. I tested the script in Shane's book (page 82) with 3 capture groups, and it works fine. Thanks for any help.

Applescript:

use framework "Foundation"

-- script one
# set theString to "(Joe) and (Jack) and (John) and (30) and (40) and (50)"
# set thePattern to "\\((\\D.*?)\\)"
# set n to 1 --> {"Joe", "Jack", "John"}

-- script two
# set theString to "(Joe) and (Jack) and (John) and (30) and (40) and (50)"
# set thePattern to "\\((\\d.*?)\\)"
# set n to 1 --> {"30", "40", "50"}

-- script three
set theString to "(Joe) and (Jack) and (John) and (30) and (40) and (50)"
set thePattern to "\\((\\D.*?)\\)|\\((\\d.*?)\\)"
set n to 2 --> unable to set argument 2... 'utxt'("length"), 0 ] }> could not be coerced to type {_NSRange=QQ}.

set theString to current application's NSString's stringWithString:theString
set theOptions to 24 -- DotMatchesLineSeparators and AnchorsMatchLines
set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:theOptions |error|:(missing value)
set regExResults to theRegEx's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
log (count of items of regExResults) --> 6
set theMatches to {}
repeat with i from 1 to count of items of regExResults
   set aMatch to (item i of regExResults)
   if (aMatch's numberOfRanges()) as integer < (n + 1) then -- N/Ap because 6 ranges and n + 1 is 3
       set end of theMatches to missing value
   else
       set theRange to (aMatch's rangeAtIndex:n) --> {location:9.22337203685478E+18, |length|:0}
       set end of theMatches to (theString's substringWithRange:theRange) as string
   end if
end repeat
return theMatches

Last edited by peavine (2022-08-10 07:06:27 am)


2018 Mac mini - macOS Monterey - Script Debugger 8


#2 2022-08-10 07:13:14 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 6793

Re: Regular Expression Capture Groups

The problem is that at some point the range's location is NSNotFound, which is too big an integer for AppleScript even to represent accurately as a real. When you pass it back, the location therefore has a different value.

The solution is to test for this:

Applescript:

       set theRange to (aMatch's rangeAtIndex:n)
       if theRange's |length| > 0 then
           set end of theMatches to (theString's substringWithRange:theRange) as string
       end if
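
As a rough sketch of what that test is guarding against, the following loops over every capture group of every match and records which group actually took part. It reuses the two-group pattern from post 1 with a shortened test string; the variable names theReport and groupIndex are made up for this illustration.

```applescript
use framework "Foundation"
use scripting additions

set theString to current application's NSString's stringWithString:"(Joe) and (30)"
set thePattern to "\\((\\D.*?)\\)|\\((\\d.*?)\\)" -- two capture groups, one on each side of "|"
set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regExResults to theRegEx's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}

set theReport to {} -- hypothetical name: a list of {group number, captured text} pairs
repeat with aMatch in regExResults
	-- range 0 is the whole match, so the capture groups run from 1 to numberOfRanges - 1
	repeat with groupIndex from 1 to ((aMatch's numberOfRanges()) as integer) - 1
		set theRange to (aMatch's rangeAtIndex:groupIndex)
		-- a group that took no part in this match comes back with a length of 0, so skip it
		if theRange's |length| > 0 then
			set end of theReport to {groupIndex, (theString's substringWithRange:theRange) as text}
		end if
	end repeat
end repeat
return theReport --> {{1, "Joe"}, {2, "30"}}
```

Note that a length test also skips a group that matched an empty string, which is harmless here because both groups require at least one character.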


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com


#3 2022-08-10 07:32:40 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

Thanks Shane. I made the changes you suggested and the script works great.

Just for learning purposes, the solution raises the question in my mind why a range's location is NSNotFound. There are 6 regExResults, and all of them are valid substrings. I worked my way through the script in Script Debugger's debug mode but couldn't learn anything.



#4 2022-08-10 09:35:14 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5563

Re: Regular Expression Capture Groups

Hi peavine.

Your regex pattern "\\((\\D.*?)\\)|\\((\\d.*?)\\)" contains two capture groups, one either side of the OR indicator "|". They're regarded as capture groups 1 and 2 even though they're simply alternatives. There are entries for both of them in the match result, but only one of them actually matches the subtext found. So either aMatch's rangeAtIndex:1 is the range of the subtext and its rangeAtIndex:2 indicates a non-match, or vice versa. You could reduce the number of capture groups to one in this particular case by having the OR within a group, eg. "\\((\\D.*?|\\d.*?)\\)". This way, the rangeAtIndex:1 is always it.
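
A minimal sketch of that single-group version, assuming the same kind of test string as in post 1 (shortened here). Because the alternation sits inside the one pair of capturing parentheses, every match has a valid rangeAtIndex:1 and no length test is needed.

```applescript
use framework "Foundation"
use scripting additions

set theString to current application's NSString's stringWithString:"(Joe) and (Jack) and (30)"
set thePattern to "\\((\\D.*?|\\d.*?)\\)" -- one capture group holding both alternatives
set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regExResults to theRegEx's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}

set theMatches to {}
repeat with aMatch in regExResults
	-- group 1 takes part in every match, whichever alternative matched
	set end of theMatches to (theString's substringWithRange:(aMatch's rangeAtIndex:1)) as text
end repeat
return theMatches --> {"Joe", "Jack", "30"}
```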


NG


#5 2022-08-10 09:53:50 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

Thanks Nigel for the explanation--it took a little thought but I understand things now.

Just as an aside, the above script can be simplified if only one capture group is present. The following returns everything within parentheses and returns a blank list if no parentheses are found. Also, the returned list includes an empty string if blank parentheses are encountered, although these can be filtered out in the repeat loop if desired. This script is easily modified to return text contained in other characters--one example being quoted text.

Applescript:

-- requires macOS El Capitan or newer
use framework "Foundation"
use scripting additions

set theString to "(Jack) and (Joe) and (30)" --> {"Jack", "Joe", "30"}
# set theString to "(Jack) and () and (Joe) and (30)" --> {"Jack", "", "Joe", "30"}
# set theString to "" --> {}

set textInParentheses to getTextInParentheses(theString)

on getTextInParentheses(theString)
   set theString to current application's NSString's stringWithString:theString
   set thePattern to "\\((.*?)\\)"
   set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
   set regExResults to theRegEx's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
   set theMatches to current application's NSMutableArray's new()
   repeat with anItem in regExResults
       set theRange to (anItem's rangeAtIndex:1)
       (theMatches's addObject:(theString's substringWithRange:theRange))
   end repeat
   return theMatches as list
end getTextInParentheses

Last edited by peavine (2022-08-12 09:55:13 am)



#6 2022-08-12 07:06:49 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

I was working to learn look-behind and look-ahead assertions and realized that they can be used to perform the same task as the script in post 5 above.

Applescript:

-- requires macOS El Capitan or newer
use framework "Foundation"
use scripting additions

set theString to "(Jack) and (Joe) and (30)" --> {"Jack", "Joe", "30"}

set textInParentheses to getTextInParentheses(theString)

on getTextInParentheses(theString)
   set theString to current application's NSString's stringWithString:theString
   set thePattern to "(?<=\\().*?(?=\\))"
   set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
   set regExResults to theRegEx's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
   set theRanges to (regExResults's valueForKey:"range")
   set theMatches to current application's NSMutableArray's new()
   repeat with aRange in theRanges
       (theMatches's addObject:(theString's substringWithRange:aRange))
   end repeat
   return theMatches as list
end getTextInParentheses

One major limitation appears to be that the characters that bracket the desired text have to be different. Thus, at least in my testing, the script cannot be used to find text in quotes. The changed lines in the above script and the result are:

Applescript:

set theString to "\"Jack\" and \"Joe\" and \"30\"" --> {"Jack", " and ", "Joe", " and ", "30"}
set thePattern to "(?<=\\\").*?(?=\\\")"

Last edited by peavine (2022-08-12 09:56:04 am)



#7 2022-08-12 10:17:13 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5563

Re: Regular Expression Capture Groups

Hi peavine.

Look-behinds and look-aheads can seem a bit odd at first. They don't count towards a regex search's progress through the source text. Text matched by a look-behind may already have been matched or passed over before the match it precedes is reached. Similarly, where a look-ahead is matched, the search resumes from the matched look-ahead text, not from after it. So where look-behind and look-ahead matches are identical and the characters between them essentially wildcards, the results are as you describe. You sometimes have to be very inventive to get round such possibilities. I think a capture group's the way to go in this case.
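
For the quoted-text case from post 6, a capture-group version might look like the sketch below. The difference is that both quote marks are part of the match itself, so the search resumes after the closing quote and the text between two quoted items is never picked up. Only the pattern and test string differ from the script in post 5.

```applescript
use framework "Foundation"
use scripting additions

set theString to current application's NSString's stringWithString:"\"Jack\" and \"Joe\" and \"30\""
set thePattern to "\"(.*?)\"" -- the quotes are consumed by the match; only group 1 is kept
set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regExResults to theRegEx's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}

set theMatches to {}
repeat with aMatch in regExResults
	set end of theMatches to (theString's substringWithRange:(aMatch's rangeAtIndex:1)) as text
end repeat
return theMatches --> {"Jack", "Joe", "30"}
```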


NG


#8 2022-08-12 03:39:21 pm

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

Thanks Nigel. I'll stick with capture groups.



#9 2022-08-13 08:25:09 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

My script in post 5 works fine with one capture group. My script in post 1 is intended to work with 2 or more capture groups but is broken. I've included below a revised script which incorporates Shane's fix and includes a few miscellaneous edits, which are just a matter of personal preference.

Applescript:

use framework "Foundation"
use scripting additions

set theString to "(Joe) and (30) and (A1) and (Jack) and (40) and (B1)"

set textInParentheses to getTextInParentheses(theString, 1)
-- The second parameter is the capture group, which in this instance would normally be set to 1 (all letters), 2 (all digits), or 3 (a combination of letters and digits). The script will throw an error if a particular capture group is not found (e.g. 4), and error correction needs to be added for this.

on getTextInParentheses(theString, captureGroup)
   set theString to current application's NSString's stringWithString:theString
   set thePattern to "(?i)\\(([a-z]*?)\\)|\\(([0-9]*?)\\)|\\(([a-z0-9]*?)\\)"
   set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
   set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
   set theMatches to current application's NSMutableArray's new()
   repeat with aMatch in regexResults
       set theRange to (aMatch's rangeAtIndex:captureGroup)
       if theRange's |length| > 0 then (theMatches's addObject:(theString's substringWithRange:theRange))
   end repeat
   return theMatches as list
end getTextInParentheses
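
One possible way to add the error correction mentioned in the comment above is to ask the compiled expression how many capture groups it has before calling rangeAtIndex:. This is only a sketch: the handler name getCaptureGroupText, the extra pattern parameter, and the error wording are invented for the example, while numberOfCaptureGroups is a standard NSRegularExpression property.

```applescript
use framework "Foundation"
use scripting additions

set theString to "(Joe) and (30) and (A1)"
set thePattern to "(?i)\\(([a-z]*?)\\)|\\(([0-9]*?)\\)|\\(([a-z0-9]*?)\\)"
try
	-- group 4 does not exist in this pattern, so the handler raises an error
	set theResult to getCaptureGroupText(theString, thePattern, 4)
on error errorMessage
	set theResult to errorMessage
end try
return theResult

-- hypothetical handler: like getTextInParentheses, but validates the group number first
on getCaptureGroupText(theString, thePattern, captureGroup)
	set theString to current application's NSString's stringWithString:theString
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	if theRegex is missing value then error "The pattern could not be compiled."
	set groupCount to (theRegex's numberOfCaptureGroups()) as integer
	if captureGroup < 1 or captureGroup > groupCount then
		error "Capture group " & captureGroup & " is out of range; the pattern defines " & groupCount & "."
	end if
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theMatches to current application's NSMutableArray's new()
	repeat with aMatch in regexResults
		set theRange to (aMatch's rangeAtIndex:captureGroup)
		if theRange's |length| > 0 then (theMatches's addObject:(theString's substringWithRange:theRange))
	end repeat
	return theMatches as list
end getCaptureGroupText
```

Checking the count up front keeps the out-of-range index from ever reaching rangeAtIndex:, so the caller gets a readable message instead of a framework exception.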

Last edited by peavine (2022-08-14 06:42:32 am)



#10 2022-08-16 09:12:11 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

I've gained a basic understanding of capture groups but one issue remained unclear. There are two types of capture group back references, and their formats are \\n (written here with the doubled backslash that an AppleScript string requires) and $n. In the NSRegularExpression documentation, the \\n back reference is discussed under "Regular Expression Metacharacters" and $n is discussed under "Template Matching Format".

Like much having to do with regular expressions, an example helps. The following looks for consecutive duplicate words using \\1 and replaces them with one instance of the duplicate words using $1. The search is case insensitive and does not match across paragraph returns, although both of these behaviors are easily changed.

Applescript:

use framework "Foundation"
use scripting additions

set theString to "This is is a test test.
This This is another Another test."


set cleanedString to removeDuplicateWords(theString)

on removeDuplicateWords(theString)
   set thePattern to "(?i)\\b(\\w+)\\h+\\1\\b" -- \\1 is a back reference to (\\w+)
   set theString to current application's NSMutableString's stringWithString:theString
   set replaceCount to (theString's replaceOccurrencesOfString:thePattern withString:"$1" options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()}) -- $1 is a back reference to (\\w+)
   return theString as text
end removeDuplicateWords

Last edited by peavine (2022-08-20 10:09:15 am)



#11 2022-08-18 09:58:16 pm

technomorph
Member
Registered: 2017-12-14
Posts: 279

Re: Regular Expression Capture Groups

Check out the RegExKit app for Mac.
It’s amazing for testing your regexes.
It has tips and hints.
It shows you capture groups, etc.
I use it all the time.


#12 2022-08-19 07:06:41 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

Thanks technomorph. I downloaded RegexKit from GitHub at:

https://github.com/forhappy/RegexKit

It's a single app file of about 37 MB with an attractive interface and lots of helpful Regular Expression information. I had been using the Atom editor to test Regular Expressions, but I think RegexKit will be much better.

Last edited by peavine (2022-08-19 08:00:38 am)



#13 2022-08-22 08:52:46 pm

technomorph
Member
Registered: 2017-12-14
Posts: 279

Re: Regular Expression Capture Groups

It will also supply you with “code” for your expression and replacement.
I use the PHP code for Objective-C, as it escapes everything properly.


#14 2022-08-22 09:36:15 pm

technomorph
Member
Registered: 2017-12-14
Posts: 279

Re: Regular Expression Capture Groups

Here's a script I use to test them in AppleScript.
At the end is a bunch of commented-out "tests" or examples you might find useful.


Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property NSRegularExpression : a reference to current application's NSRegularExpression
property NSRegularExpressionCaseInsensitive : a reference to 1
property NSRegularExpressionUseUnicodeWordBoundaries : a reference to 64 -- 1 << 6 in NSRegularExpressionOptions
property NSRegularExpressionAnchorsMatchLines : a reference to 16
property NSRegularExpressionSearch : a reference to 1024
property NSString : a reference to current application's NSString

property myTestName : ""

property mySourceA : ""
property mySourceB : ""
property myPattern1 : ""
property myPattern2 : ""
property myReplace : ""

property myTestA1 : ""
property myTestA2 : ""

property myTestB1 : ""
property myTestB2 : ""
property myTestExpect1 : ""
property myTestExpect2 : ""

property logRegEx : true
property logResults : true
property logDebug : false



-- RUN TEMPLATE

-- \\b(WAV|24 bit|96|19\\.2)\\b
-- NEED FLAC MISSING BAD LOW REPLACE NOT LIVE

set aWordsPattern1 to my createPatternForMatchAnyWords:"WAV 24%bit 96 19.2"
set aWordsPattern2 to my createPatternForMatchAnyWords:"NEED%FLAC MISSING BAD LOW REPLACE NOT LIVE"

my testRegWithName:"TRACK QUALITY SCANNING TAGS FOR CONTAINS" pattern1:aWordsPattern1 pattern2:aWordsPattern2 source1:"FLAC 24 bit - 19.2 kHz" source2:"missing" replaceWith:"MATCHED" expecting1:"" expecting2:""


-- MAIN SCRIPT OBJECT FUNCTIONS
on testRegWithName:aName pattern1:patternNo1 pattern2:patternNo2 ¬
   source1:sourceA source2:sourceB replaceWith:aReplace ¬
   expecting1:expectNo1 expecting2:expectNo2
   my resetValues()
   set myTestName to aName
   if not patternNo1 is "" then set myPattern1 to patternNo1
   if not patternNo2 is "" then set myPattern2 to patternNo2
   if not sourceA is "" then set mySourceA to sourceA
   if not sourceB is "" then set mySourceB to sourceB
   if not aReplace is "" then set myReplace to aReplace
   if not expectNo1 is "" then set myTestExpect1 to expectNo1
   if not expectNo2 is "" then set myTestExpect2 to expectNo2
   
   my runTestA()
   my runTestB()
   if logResults then my logTestResults()
end testRegWithName:pattern1:pattern2:source1:source2:replaceWith:expecting1:expecting2:

on resetValues()
   set myTestName to ""
   set myPattern1 to "NONE"
   set myPattern2 to "NONE"
   set mySourceA to "NONE"
   set mySourceB to "NONE"
   set myReplace to ""
   
   set myTestA1 to "NONE"
   set myTestA2 to "NONE"
   
   set myTestB1 to "NONE"
   set myTestB2 to "NONE"
   
   set myTestExpect1 to "NONE"
   set myTestExpect2 to "NONE"
end resetValues

on runTestA()
   if mySourceA is "NONE" then
       return
   end if
   if not myPattern1 is "NONE" then
       set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   end if
   if not myPattern2 is "NONE" then
       set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
   end if
end runTestA

on runTestB()
   if mySourceB is "NONE" then
       return
   end if
   if not myPattern1 is "NONE" then
       set myTestB1 to my findInString:mySourceB withPattern:myPattern1 replaceWith:myReplace
   end if
   if not myPattern2 is "NONE" then
       set myTestB2 to my findInString:mySourceB withPattern:myPattern2 replaceWith:myReplace
   end if
end runTestB

on logTestResults()
   log ("------------------------------------------- TEST RESULTS LOG")
   log {"----------------myTestName is", myTestName}
   
   log {"myPattern1 is", myPattern1}
   log {"myPattern2 is", myPattern2}
   log {"myReplace is", myReplace}
   
   log {"--------------mySourceA is", mySourceA}
   
   log {"myTestA1 is", myTestA1}
   log {"myTestA2 is", myTestA2}
   if not myTestExpect1 is "NONE" then
       log {"myTestExpect1 is", myTestExpect1}
   end if
   
   log {"--------------mySourceB is", mySourceB}
   log {"myTestB1 is", myTestB1}
   log {"myTestB2 is", myTestB2}
   if not myTestExpect2 is "NONE" then
       log {"myTestExpect2 is", myTestExpect2}
   end if
end logTestResults

-- MAIN FUNCTIONS


on findInString:aString withPattern:aRegExString replaceWith:aReplace
   set aRegEx to my createRegularExpressionWithPattern:aRegExString
   if logDebug then
       log {"aRegEx is:", aRegEx}
   end if
   return (my findInString:aString withRegEx:aRegEx replaceWith:aReplace)
end findInString:withPattern:replaceWith:

on findInString:aString withRegEx:aRegEx replaceWith:aReplace
   if logDebug then log ("findInString:withRegEx:replaceWith: START")
   set aSource to NSString's stringWithString:aString
   set aRepString to NSString's stringWithString:aReplace
   set aLength to aSource's |length|()
   set aRange to (current application's NSMakeRange(0, aLength))
   set aCleanString to (aRegEx's stringByReplacingMatchesInString:aSource options:0 range:aRange withTemplate:aRepString)
   
   return aCleanString
end findInString:withRegEx:replaceWith:

on createRegularExpressionWithPattern:aRegExString
   if (class of aRegExString) is equal to (NSRegularExpression's class) then
       log ("it already was a RegEx")
       return aRegExString
   end if
   set aPattern to NSString's stringWithString:aRegExString
   set regOptions to NSRegularExpressionCaseInsensitive + NSRegularExpressionUseUnicodeWordBoundaries
   set {aRegEx, aError} to (NSRegularExpression's regularExpressionWithPattern:aPattern options:regOptions |error|:(reference))
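   -- a failed compile returns missing value for the expression, so test that rather than the error
   if (aRegEx is missing value) then
       log {"regEx failed to create aError is:", aError}
       log {"aError debugDescrip is:", aError's debugDescription()}
       return missing value
   end if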
   return aRegEx
end createRegularExpressionWithPattern:



on createPatternForMatchAnyWords:aLine
   set aString to NSString's stringWithString:aLine
   set aArray to aString's componentsSeparatedByString:" "
   set aPattern to NSString's stringWithString:"\\b("
   if (logRegEx) then
       log {"createPatternForMatchAnyWords aArray is:", aArray}
   end if
   
   set aTotal to (aArray's |count|())
   repeat with i from 1 to aTotal
       set aWord to aArray's item i
       set aWord to (aWord's stringByReplacingOccurrencesOfString:"%" withString:" ")
       set aWordPattern to (NSRegularExpression's escapedPatternForString:aWord)
       if (i ≠ aTotal) then
           set aWordPattern to (aWordPattern's stringByAppendingString:"|")
       end if
       if (logRegEx) then
           log {"aWord is:", aWord}
           log {"aWordPattern is:", aWordPattern}
       end if
       set aPattern to (aPattern's stringByAppendingString:aWordPattern)
   end repeat
   set aPattern to aPattern's stringByAppendingString:")\\b"
   if (logRegEx) then
       log {"final pattern is:", aPattern}
   end if
   return aPattern
end createPatternForMatchAnyWords:


on createPatternForMatchAllWords:aLine
   set aString to NSString's stringWithString:aLine
   set aArray to aString's componentsSeparatedByString:" "
   set aPattern to NSString's stringWithString:"^"
   if (logRegEx) then
       log {"createPatternForMatchAllWords aArray is:", aArray}
   end if
   
   repeat with i from 1 to (aArray's |count|())
       set aWord to aArray's item i
       if ((aWord's |length|()) > 1) then
           set aWordPattern to (my createPatternForMatchWord:aWord)
       else
           set aWordPattern to (my createPatternForMatchLetter:aWord)
       end if
       if (logRegEx) then
           log {"aWordPattern is:", aWordPattern}
       end if
       set aPattern to (aPattern's stringByAppendingString:aWordPattern)
   end repeat
   set aPattern to aPattern's stringByAppendingString:".*$"
   if (logRegEx) then
       log {"final pattern is:", aPattern}
   end if
   return aPattern
end createPatternForMatchAllWords:

-- (?=.*\\bYou\\b)
on createPatternForMatchWord:aWord
   set aWordPattern to NSString's stringWithString:"(?=.*\\b"
   set aWordPattern to (aWordPattern's stringByAppendingString:aWord)
   set aWordPattern to (aWordPattern's stringByAppendingString:".?\\b)")
   return aWordPattern
end createPatternForMatchWord:

on createPatternForMatchLetter:aWord
   set aWordPattern to NSString's stringWithString:"(?=.*\\b"
   set aWordPattern to (aWordPattern's stringByAppendingString:aWord)
   set aWordPattern to (aWordPattern's stringByAppendingString:".{0,2}\\b)")
   return aWordPattern
end createPatternForMatchLetter:




(*
   -- /(^.*?\\.){1}
   
   
   my testRegWithName:"REMOVE FROM START TO FIRST PERIOD / ALT ALSO REMOVE TRACK." pattern1:"(^.*?\\.){1}" pattern2:"((?:^.*?\\.){1}(?:track\\.?)?)" source1:"@unionOfArrays.trackGenres" source2:"self.track.bitRate" replaceWith:"" expecting1:"" expecting2:""
*)



(*
   -- ^(?=.*\\bYou\\b)(?=.*\\bKnow\\b)(?=.*\\bLove\\b)(?=.*\\bYou\\b).*$
   --
   
   set aWordsPattern1 to my createPatternForMatchAllWords:"You Know I Love You"
   set aWordsPattern2 to my createPatternForMatchAllWords:"You Know I Fuck You"
   
   
   my testRegWithName:"MATCH ALL WORDS IN LINE TITLE TEST 01" pattern1:aWordsPattern1 pattern2:aWordsPattern2 source1:"If You Love Me (Let Me Know)" source2:"I Didn't Know I Loved You" replaceWith:"MATCHED" expecting1:"" expecting2:""
*)




(*
   my testRegWithName:"SINGLE ARTIST MATCH 3 MORE ADDS NO THE" pattern1:"(?>((^the\\s)|(\\s?(\\,|\\&|\\+)\\s?)|(\\s(and|vs)\\.?\\s)|^))(Eagles)(?>($|(\\,?\\s?)))" pattern2:"(?>((\\s?(\\,|\\&|\\+)\\s?)|(\\s(and|vs)\\.?\\s)|^))(Eagles)(?>($|(\\,?\\s?)))" source1:"The Eagles CCR, Eagles Rolling Stones The Eagles of Death and Eagles II" source2:"CCR, Eagles, Rolling Stones & eagles plus Eagles of Death vs Eagles" replaceWith:"$1MATCHED$8" expecting1:"" expecting2:""
   -- ((^.*)
(?:(^the\s)|(\,\s?)|^)((Red)*.*(Hot)*.*(Chili)*.*(Peppers)*)(?:$|\,)(.*+))
*)


(*
   
   my testRegWithName:"SINGLE ARTIST MATCH" pattern1:"(?:(^the\\s)|(\\,\\s?)|^)(Eagles)(?:$|\\,)" pattern2:"((^.*)
(?:(^the\\s)|(\\,\\s?)|^)(Eagles)(?:$|\\,)(.*+))" source1:"The Eagles of DeathMetal" source2:"CCR, Eagles, Rolling Stones" replaceWith:"MATCHED" expecting1:"" expecting2:""
*)

(*
   
   -- REMOVE DIGITS AND DASH FROM BEGINNING
   set myTestName to "REMOVE DIGITS AND DASH FROM BEGINNING"
   set myPattern1 to "/^(\\s*[0-9]+\\s*-?\\s*)"
   set myPattern2 to "/^(\\s*[0-9]+\\s*-?\\s*)/m"
   set mySourceA to "001 Come Together"
   set mySourceB to "123123123 - Believe"
   set myReplace to ""
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
   
   set myTestB1 to my findInString:mySourceB withPattern:myPattern1 replaceWith:myReplace
   set myTestB2 to my findInString:mySourceB withPattern:myPattern2 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
       
       log {"mySourceB is", mySourceB}
       log {"myTestB1 is", myTestB1}
       log {"myTestB2 is", myTestB2}
   end if
   
   
   
   -- REMOVE THE FROM BEGINNING
   set myTestName to "REMOVE THE FROM BEGINNING"
   set myPattern1 to "/^the\\W/mi"
   set myPattern2 to ""
   set mySourceA to "The Beatles"
   set mySourceB to "Adam and the Ants"
   set myReplace to ""
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceB withPattern:myPattern1 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   ((?:Red)?\b(?:Hot)?\b(?:Chili)?\b(?:Peppers)?\b)
   
   -- MATCH WHOLE WORD - EG WORK, hello
   set myTestName to "MATCH WHOLE WORD - EG WORK, hello"
   set myPattern1 to "\\b(\\w*work\\w*)\\b"
   set myPattern2 to "\\b(\\w*hello\\w*)\\b"
   set mySourceA to "hello 'worked? hello working all works and \"worked with \""
   set mySourceB to ""
   set myReplace to "XXXXX"
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   
   -- REMOVE BRACKETS AND BETWEEN
   set myTestName to "REMOVE BRACKETS AND BETWEEN"
   set myPattern1 to "(\\s*?\\(.+?\\)\\s*)+"
   set myPattern2 to "(\\W?(\\(|\\[|\\{).+?(\\)|\\]|\\})\\W?)+" -- also remove { and [
   set mySourceA to "Blah (blah1) (blah 2) me to (plus) check{all the men) and the [alll them]"
   set mySourceB to ""
   set myReplace to " "
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   
   -- REMOVE DASH TO END
   set myTestName to "REMOVE DASH TO END"
   set myPattern1 to "( -\\s?(.*))"
   set myPattern2 to "(\\s+-\\s?(.*))" -- also remove { and [
   set mySourceA to "Blah (blah1) -me to (plus) check{all the men) and the [alll them]"
   set mySourceB to ""
   set myReplace to ""
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   
   -- CAPTURE YEAR 19xx or 20xx or 21xx
   set myTestName to "CAPTURE YEAR 19xx or 20xx or 21xx"
   set myPattern1 to "/\\s?(?:\\(|\\[|\\{)?([1-2][0|1|9][0-9]{2})(?:\\)|\\]|\\})?\\s?/i"
   set myPattern2 to "\\s?(?:\\(|\\[|\\{)?([1-2][0|1|9][0-9]{2})(?:\\)|\\]|\\})?\\s?\\-?\\s?(.*)\\s\\[.*(HD).*\\]"
   set mySourceA to "1971 - Paul Simon [24bit 96kHz 2010 HDtracks FLAC]"
   set mySourceB to "1923 Rolling Stones"
   set myReplace to "$2 $3 ($1)"
   set myTestExpect1 to "Paul Simon HD (1971)"
   set expectResults2 to "Rolling Stones (1923)"
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceB withPattern:myPattern1 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   -- CAPTURE YEAR V2
   set myTestName to "CAPTURE YEAR V2"
   set myPattern1 to "/\\s?(?:\\(|\\[|\\{)?([1-2][0|1|9][0-9]{2})(?:\\)|\\]|\\})?\\W+/i"
   set myPattern2 to ""
   set mySourceA to "2009 - The Rolling Stones - Great Album"
   set mySourceB to ""
   set myReplace to "$`$' ($1)"
   set expectResults1 to "The Rolling Stones - Great Album (2009)"
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to ""
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   
   -- CAPTURE YEAR V3 More Complete With Capture Groups
   set myTestName to "CAPTURE YEAR V3 More Complete With Capture Groups"
   set myPattern1 to "/(.*)\\s?(?:\\(|\\[|\\{)?([1-2][0|1|9][0-9]{2})(?:\\)|\\]|\\})?\\W+(.*)/i"
   set myPattern2 to ""
   set mySourceA to "2009 - The Rolling Stones - Great Album"
   set mySourceB to ""
   set myReplace to "$1$3 ($2)"
   set expectResults1 to "The Rolling Stones - Great Album (2009)"
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to ""
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   
   -- REMOVE DASH TO END
   set myTestName to "REMOVE DASH TO END"
   set myPattern1 to "( -\\s?(.*))"
   set myPattern2 to "(\\s+-\\s?(.*))" -- also remove { and [
   set mySourceA to "Blah (blah1) -me to (plus) check{all the men) and the [alll them]"
   set mySourceB to ""
   set myReplace to ""
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
   
   -- MUSIC SPECIFIC
   
   -- GENRE REFORMAT
   set myTestName to "GENRE REFORMAT"
   --set myPattern1 to "(\\s?,\\s?)|(?:\\b)(\\s?[/]\\s?)(?:\\b)" -- this works
   set myPattern1 to "(^|\\W+)?(,|\\/)(\\W+|$)?" -- this works and also does not replace dashes
   set myPattern2 to "(^|\\W+)?(,|\\/|\\s)(\\W+|$)?" -- this works and also replaces spaces with dash
   set mySourceA to "futureSoul/RnB Disco Boogie, Funk"
   set mySourceB to ""
   set myReplace to " - "
   set expectResults1 to "futureSoul - RnB - Disco - Boogie - Funk"
   set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
   set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
   
   if logResults then
       log ("-------------------------------------------NEW TEST START")
       log {"----------------myTestName is", myTestName}
       
       log {"myPattern1 is", myPattern1}
       log {"myPattern2 is", myPattern2}
       log {"mySourceA is", mySourceA}
       log {"mySourceB is", mySourceB}
       log {"myReplace is", myReplace}
       log {"myTestA1 is", myTestA1}
       log {"myTestA2 is", myTestA2}
   end if
*)

-- MUSIC SPECIFIC

(*
   -- VS REPLACE
   set myTestName to "VS REPLACE"
   set myPattern1 to "(^|\\W+)(vs|versus)(\\W+|$)"
   set myPattern2 to "/(^|\\W+)(vs|versus)(\\W+|$)/mi"
   set mySourceA to "this is kerry vs. the world and her versus everything and dj vs me and vse "
   set mySourceB to ""
   set myReplace to " & "
   set myTestExpect to "this is kerry & the world and her & everything and dj & me and vse "
*)


-- myTestA1 is "this is kerry & the world and her & everything and dj & me and vse "
-- myTestA2 is (NSString) "this is kerry vs. the world and her versus everything and dj vs me and vse "
(*
   
   -- SINGLE ARTIST MATCH
   -- (?:(^the\\s)|(\\,\\s?)|^)(Eagles)(?:$|\\,)
   set myTestName to "SINGLE ARTIST MATCH"
   set myPattern1 to "(?:(^the\\s)|(\\,\\s?)|^)(Eagles)(?:$|\\,)"
   set myPattern2 to "/(^|\\W+)(vs|versus)(\\W+|$)/mi"
   set mySourceA to "The Eagles of"
   set mySourceB to "CCR, Eagles"
   set myReplace to "XXXXXXXXXXXX"
*)
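
A note on the VS REPLACE results logged above: myTestA2 comes back unchanged, which is consistent with NSRegularExpression treating the surrounding "/" and the trailing "mi" as literal characters rather than as delimiters and flags. Here is a minimal sketch, assuming the flags are passed through the options parameter instead. The handler name replaceIgnoringCase is hypothetical and is not the findInString:withPattern:replaceWith: handler used in the script.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper for illustration only: replaces every match of thePattern
-- in theString, ignoring case and letting ^ and $ match at line breaks.
on replaceIgnoringCase(theString, thePattern, theTemplate)
	set sourceString to current application's NSString's stringWithString:theString
	-- 1 = NSRegularExpressionCaseInsensitive, 16 = NSRegularExpressionAnchorsMatchLines
	set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:(1 + 16) |error|:(missing value)
	if theRegEx is missing value then error "The pattern could not be compiled."
	set theRange to {location:0, |length|:sourceString's |length|()}
	return (theRegEx's stringByReplacingMatchesInString:sourceString options:0 range:theRange withTemplate:theTemplate) as text
end replaceIgnoringCase

set mySourceA to "this is kerry vs. the world and her versus everything and dj vs me and vse "
-- Same pattern as myPattern1, with no / delimiters and no trailing flags
replaceIgnoringCase(mySourceA, "(^|\\W+)(vs|versus)(\\W+|$)", " & ")
```

With these options the call should give the same string that was logged for myTestA1. If you would rather keep the flags inside the pattern text, the inline form "(?mi)" at the start of the pattern is another way to do it.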



#15 2022-08-22 09:40:44 pm

technomorph
Member
Registered: 2017-12-14
Posts: 279

Re: Regular Expression Capture Groups

Here are a few sites that I've found helpful:

https://www.regexpal.com
I used this one before I found RegExKit. You might find some useful examples there.


This site is awesome for explaining both simple and more advanced topics.
It's also helpful for making your RegExs more efficient:

https://www.regular-expressions.info/tutorial.html


#16 2022-08-23 07:09:42 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

Thanks technomorph.

BTW, have you found a way to save the regular expression and test string when working with RegexKit? I couldn't find a way to do this, and it's not really important, but I thought I'd ask.


2018 Mac mini - macOS Monterey - Script Debugger 8


#17 2022-08-23 08:25:41 pm

technomorph
Member
Registered: 2017-12-14
Posts: 279

Re: Regular Expression Capture Groups

No, I don’t think you can save them (or I haven’t figured out how). That's why I copy and paste them and save them as I did in my script.

I definitely find I’m adjusting them often, because something doesn’t get captured or something gets captured that I don’t want.


#18 2022-08-24 06:50:13 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 1458

Re: Regular Expression Capture Groups

Thanks technomorph. When you click on "RegEx Workspace" in the upper-left corner of RegexKit, a dialog states the following, which made me wonder if there might be a way to save the data. I don't think there is, though, and RegexKit is a marvelous app regardless.

Your most recent changes have not been saved. If you leave before saving, your changes will be lost.


2018 Mac mini - macOS Monterey - Script Debugger 8
