The ASObjC implementation of regular expressions is quite good, and I’ve included some examples below. The methods and syntax used with ASObjC can be a bit arcane, but the following handlers don’t require a knowledge of that. Simply select a handler, set the regex pattern, and run the handler with a string as a parameter. A few preliminary notes:
- ASObjC follows the ICU specification for regular expressions.
- Two backslashes need to be used just about anywhere one backslash would normally be used in a regex pattern. For example, use \\d for a digit and \\. for a literal dot.
- Option 1024 is the enumeration value for NSRegularExpressionSearch.
- These handlers are case sensitive, but that can be changed by inserting (?i) at the beginning of the regex pattern.
- Many of the handlers in this thread are based on scripts in Shane’s ASObjC book.
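As a quick illustration of the backslash and (?i) notes above, here is a minimal sketch. The handler name, sample string, and replacement text are made up for this example and are not from Shane's book.

```applescript
use framework "Foundation"
use scripting additions

--Replace "v" or "V" followed by digit-dot-digit with "vX.Y"
set theString to "File v1.2 and V3.4"
set newString to maskVersions(theString) -->"File vX.Y and vX.Y"

on maskVersions(theString)
	set theString to current application's NSString's stringWithString:theString
	--(?i) makes the match case insensitive; \\d is a digit and \\. is a literal dot
	set thePattern to "(?i)v\\d\\.\\d"
	return (theString's stringByReplacingOccurrencesOfString:thePattern withString:"vX.Y" options:1024 range:{0, theString's |length|()}) as text
end maskVersions
```

Without (?i) only the lowercase "v1.2" would be replaced.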
All of the handlers require that the following header be placed at the beginning of the script.
use framework "Foundation"
use scripting additions
Handler 1 does a simple search and replace.
--Search and Replace
--Replace all instances of three consecutive digits with "xxx"
set theString to "aaa 111 bbb 22 ccc 333 ddd"
set newString to getNewString(theString) -->"aaa xxx bbb 22 ccc xxx ddd"
on getNewString(theString)
set theString to current application's NSString's stringWithString:theString
set thePattern to "\\d{3}"
return (theString's stringByReplacingOccurrencesOfString:thePattern withString:"xxx" options:1024 range:{0, theString's |length|()}) as text
end getNewString
Handler 2 does a search and replace with a capture group, which is the portion of the pattern within parentheses. The handler returns the substring matched by the pattern within the capture group and does not return the substring matched by the pattern outside the capture group. There can be multiple capture groups, and they are identified in the replacement string as $1, $2, and so on.
--Search and Replace with Capture Group
--Return first instance of characters preceded and followed by \"
set theString to "This is \"quoted text\" in a string"
set theSubstring to getSubstring(theString) -->"quoted text"
on getSubstring(theString)
set theString to current application's NSString's stringWithString:theString
set thePattern to ".*?\\\"(.+?)\\\".*"
return (theString's stringByReplacingOccurrencesOfString:thePattern withString:"$1" options:1024 range:{0, theString's |length|()}) as text
end getSubstring
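To show more than one capture group at work, here is a small sketch along the same lines as handler 2. The handler name and sample string are hypothetical and chosen just for this example.

```applescript
use framework "Foundation"
use scripting additions

--Search and Replace with two Capture Groups
--Swap "Last, First" to "First Last"
set theString to "Smith, John"
set swappedString to swapNames(theString) -->"John Smith"

on swapNames(theString)
	set theString to current application's NSString's stringWithString:theString
	--Group 1 is the word before the comma, group 2 is the word after it
	set thePattern to "(\\w+), (\\w+)"
	return (theString's stringByReplacingOccurrencesOfString:thePattern withString:"$2 $1" options:1024 range:{0, theString's |length|()}) as text
end swapNames
```

The replacement string "$2 $1" puts the second group first, then a space, then the first group.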
Handler 3 returns every substring that matches the regex pattern.
--Return all matches
--Return all instances of 3 consecutive digits
set theString to "aaa 111 bbb 22 ccc 333 ddd"
set matchingSubstrings to getMatchingSubstrings(theString) --> {"111", "333"}
on getMatchingSubstrings(theString)
set theString to current application's NSString's stringWithString:theString
set thePattern to "\\d{3}"
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set theRanges to (regexResults's valueForKey:"range") --an optimization
set theMatches to current application's NSMutableArray's new()
repeat with aRange in theRanges
(theMatches's addObject:(theString's substringWithRange:aRange))
end repeat
return theMatches as list
end getMatchingSubstrings
Handler 4 does the same job as handler 3 but only returns the first matching substring. It uses NSString's rangeOfString:options: method rather than NSRegularExpression.
--Return first match
--Return first instance of 3 consecutive digits
set theString to "aaa 111 bbb 22 ccc 333 ddd"
set matchingSubstring to getMatchingSubstring(theString) -->"111"
on getMatchingSubstring(theString)
set theString to current application's NSString's stringWithString:theString
set thePattern to "\\d{3}"
set theRange to theString's rangeOfString:thePattern options:1024
return (theString's substringWithRange:theRange) as text
end getMatchingSubstring
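One thing to watch with handler 4: as far as I know, if the pattern is not found, rangeOfString:options: returns a range with a length of 0, and passing that range to substringWithRange: throws an error. The sketch below is one possible way to guard against that. The handler name and the choice to return missing value are my own assumptions, so adjust to suit.

```applescript
use framework "Foundation"
use scripting additions

--Return first match, or missing value if there is no match
set theString to "aaa bbb 22 ccc"
set matchingSubstring to getFirstMatchOrMissing(theString) -->missing value

on getFirstMatchOrMissing(theString)
	set theString to current application's NSString's stringWithString:theString
	set thePattern to "\\d{3}"
	set theRange to theString's rangeOfString:thePattern options:1024
	--A failed search returns a range with a length of 0
	if |length| of theRange is 0 then return missing value
	return (theString's substringWithRange:theRange) as text
end getFirstMatchOrMissing
```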
Handler 5 is the same as handler 3, but it uses a capture group.
--Return all matches with capture group
--Return all matches in parentheses
set theString to "aaa (111) bbb 22 ccc (333) ddd"
set theSubstrings to getSubstrings(theString) -->{"111", "333"}
on getSubstrings(theString)
set theString to current application's NSString's stringWithString:theString
set thePattern to "\\((.+?)\\)"
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set theMatches to current application's NSMutableArray's new()
repeat with aMatch in regexResults
set theRange to (aMatch's rangeAtIndex:1) --capture group 1
(theMatches's addObject:(theString's substringWithRange:theRange))
end repeat
return theMatches as list
end getSubstrings
Handler 6 returns a count of the matches. This handler is extremely fast, taking only a few milliseconds to count over 1,000 matches in the text of a 159-page PDF book.
--Return the number of matches found
--Return the number of instances of 3 consecutive digits
set theString to "aaa 111 bbb 22 ccc 333 ddd"
set matchCount to getMatchCount(theString) -->2
on getMatchCount(theString)
set theString to current application's NSString's stringWithString:theString
set thePattern to "\\d{3}"
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
return theRegex's numberOfMatchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
end getMatchCount