I’ve been attempting to do something that probably can’t be done, but I thought I would ask just in case. The following script returns an error that the class is not key value coding compliant. Is there any way to get this to work without the repeat loop? Thanks.
use framework "Foundation"
use scripting additions
set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)
on getMatches(theString, thePattern)
set theString to current application's NSString's stringWithString:theString
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set theRanges to (regexResults's valueForKey:"range")
set theLocations to (theRanges's valueForKey:"location") -- returns error
# set theLocations to {} -- this and following returns the desired result
# repeat with aMatch in regexResults
# set end of theLocations to (aMatch's range()'s location())
# end repeat
# return theLocations --> {10, 27}
end getMatches
Thanks Fredrik71 for looking at my post. I want the script to return a list of the location of every instance of the search text (which is test). The string will vary in length and may have hundreds of matches of the search text.
My question was a purely technical one that arose in the context of the thread you mention: could key value coding be used to get the locations from an array of NSRanges? Stefan answered my question and explained why this was not possible.
set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)
on getMatches(str, pattern)
set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
set res to (do shell script shellcmd)
set off to offset of ":" in res
return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches
on qt(str)
return """ & str & """
end qt
Hallenstal. Thanks for responding to my thread. I tested your script but it did not seem to return the expected results, which were {10, 27}. I am not knowledgeable about grep, so perhaps I’m doing something wrong. BTW, your script threw an error as written and I had to escape the quote in the qt handler.
set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern) --> {10, 12}
on getMatches(str, pattern)
set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
set res to (do shell script shellcmd)
set off to offset of ":" in res
return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches
on qt(str)
return "\"" & str & "\""
end qt
-- Tested on Monterey 12.6.3
use framework "Foundation"
use scripting additions
set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)
on getMatches(theString, thePattern)
set theString to current application's NSString's stringWithString:theString
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set theRanges to (regexResults's valueForKey:"range") as list
set theArray to current application's NSArray's arrayWithArray:theRanges
set theLocations to (theArray's valueForKey:"location")
return theLocations as list --> {10, 27}
end getMatches
@Hallenstal The shell command is working. It’s just the parsing method that is wrong.
set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}
on getLocations(str, pattern)
set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations
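One caveat when comparing the grep results with the NSRegularExpression ones: grep -b reports byte offsets, while NSRange locations count UTF-16 code units, so the two only agree for plain ASCII text. A minimal shell sketch of the difference, assuming a grep whose -b option works together with -o (GNU grep does); the accented sample string is made up for illustration:

```shell
# Assumption: grep supports -b together with -o (true for GNU grep).
# -a forces text mode so non-ASCII bytes are never treated as binary data.
# "\303\251" is the UTF-8 encoding of "é" (2 bytes, but 1 character).
ascii_offset=$(printf 'e test\n' | grep -abo 'test' | cut -d: -f1)
utf8_offset=$(printf '\303\251 test\n' | grep -abo 'test' | cut -d: -f1)

echo "ASCII text:    $ascii_offset"   # byte offset equals character offset
echo "Accented text: $utf8_offset"    # one higher: "é" occupies two bytes
```

For strings that may contain accented characters or emoji, the ASObjC handler is the safer choice if the offsets are later used as character positions.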
@ionah. Many thanks for your suggestions, both of which work great.
It was mentioned above that my original request arose from another thread (see link below), which had to do with timing results in finding 1288 instances of a substring in a string. I reran these tests with @ionah’s two suggestions and the results were as follows (all times are milliseconds):
@ionah. I don’t use Script Geek in this particular instance because of the difficulty in getting a very large string without including the time it takes to get that string in the total timing results. My grep timing script:
use framework "Foundation"
use scripting additions
-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
set theString to theString & theString
end repeat
-- start time
set startTime to current application's CACurrentMediaTime()
-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getLocations(theString, thePattern)
on getLocations(str, pattern)
set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations
-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
numberFormatter's setFormat:"0.000"
set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
(numberFormatter's setFormat:"0")
set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if
-- result
elapsedTime --> 76 milliseconds
# count theOffsets --> 12288
# theOffsets
My ASObjC timing script:
use framework "Foundation"
use scripting additions
-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
set theString to theString & theString
end repeat
-- start time
set startTime to current application's CACurrentMediaTime()
-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getMatches(theString, thePattern)
on getMatches(theString, thePattern)
set theString to current application's NSString's stringWithString:theString
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set theRanges to (regexResults's valueForKey:"range") as list
set theArray to current application's NSArray's arrayWithArray:theRanges
set theLocations to (theArray's valueForKey:"location")
return theLocations as list
end getMatches
-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
numberFormatter's setFormat:"0.000"
set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
(numberFormatter's setFormat:"0")
set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if
-- result
elapsedTime --> 160 milliseconds
# count theOffsets --> 12288
# theOffsets
To avoid this, you can build a large string in any text editor and copy it as a global variable in your script. This way the string will be loaded at compile time and will not be included in the time calculation.
Modified version.
Note that with the -E option grep only supports ERE (extended regular expressions); to ignore case, use the -i option.
If you want to ignore case in the first letter only, use a bracket expression, e.g. "[Tt]est".
See, for example, the Wikipedia article on POSIX ERE.
BR regexp.applescript (824 Bytes)
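The two ways of ignoring case mentioned above can be compared directly in the shell. This is only a sketch, assuming a grep where -b and -o work together (GNU grep does); the variable names are made up for illustration:

```shell
text='This is a test line with a Test word other test'

# -i ignores case for the whole pattern: matches "test" and "Test".
all_cases=$(printf '%s\n' "$text" | grep -Eibo 'test' | cut -d: -f1 | paste -sd, -)

# A bracket expression relaxes only the first letter.
first_letter=$(printf '%s\n' "$text" | grep -Ebo '[Tt]est' | cut -d: -f1 | paste -sd, -)

# With neither, the capitalized "Test" at offset 27 is skipped.
exact_case=$(printf '%s\n' "$text" | grep -Ebo 'test' | cut -d: -f1 | paste -sd, -)

echo "$all_cases"      # 10,27,43
echo "$first_letter"   # 10,27,43
echo "$exact_case"     # 10,43
```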
You can probably get even higher speed with a single pipeline, meaning only one call to the shell, as below:
set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}
on getLocations(str, pattern)
set res to do shell script "echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern) & "| grep -Eo " & quoted form of ("\\d+")
return paragraphs of res
end getLocations
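One thing to watch with the digit filter in the second stage: it keeps every run of digits in the "offset:match" lines, so a match that itself contains digits adds extra entries. Note also that \d is not part of POSIX ERE, so [0-9]+ is the more portable spelling. A small sketch of the difference, assuming a grep where -b and -o work together; the sample text and variable names are invented for the example:

```shell
text='a test1 and test2'

# First stage output is "offset:match", e.g. "2:test1".
# Filtering for digits also picks up the digits inside the match itself.
with_digit_filter=$(printf '%s\n' "$text" | grep -Ebo 'test[0-9]' | grep -Eo '[0-9]+' | paste -sd, -)

# Taking only the field before the first colon keeps just the offsets.
with_cut=$(printf '%s\n' "$text" | grep -Ebo 'test[0-9]' | cut -d: -f1 | paste -sd, -)

echo "$with_digit_filter"   # 2,1,12,2
echo "$with_cut"            # 2,12
```

For a search text without digits, such as "test", both variants give the same list.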
Apparently, the grep that comes on a Mac (or at least v2.5.1) has a bug such that -b always returns 0. I should note that I haven’t seen anything official to that effect but after doing some searches for grep byte offset, I found several comments making this allegation.
On a whim, I used MacPorts to install GNU grep (ggrep v3.8 which dates back to 2019) and it is returning offsets of 10, 27, 43 on the longer test string. I’m running Sierra so perhaps there are other versions available but at least the --byte-offset option now works.
This is a minor variation on @ionah’s script. While it’s possible to make ggrep the default grep and put it on the path, I’m holding off on that so I had to include its path in the shell command. Additionally, the (?i) option is a PCRE feature so instead of using -E, it requires the -P option. The last leg of the command limits the response to digits and I used text delimiters to remove the returns. I don’t have any tools to test its speed with but it gives the impression of being fairly quick.
set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)
set AppleScript's text item delimiters to return
text items of (matchingData)
on getMatches(str, pattern)
set shellcmd to "echo " & qt(str) & " | /opt/local/bin/ggrep -Pbo " & qt(pattern) & " | /opt/local/bin/ggrep -o '[[:digit:]]*'"
set res to (do shell script shellcmd)
return res
end getMatches
on qt(str)
return "'" & str & "'"
end qt
--> {"10", "27", "43"}
This is the shell command that is being run:
echo 'This is a test line with a Test word other test' | /opt/local/bin/ggrep -Pbo '(?i)test' | /opt/local/bin/ggrep -o '[[:digit:]]*'
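Since -P is not available in every grep, a script meant to run on more than one machine could probe for it and fall back to -E with -i. The helper below is a hypothetical sketch, not part of the script above: the name offsets_ci is made up, it calls plain grep from the PATH rather than /opt/local/bin/ggrep, and it assumes the pattern is simple enough to mean the same thing in PCRE and ERE.

```shell
# Hypothetical helper: print the byte offset of every case-insensitive match.
# $1 = text, $2 = pattern written without an inline (?i) flag.
offsets_ci() {
    if printf 'x\n' | grep -qP '(?i)X' 2>/dev/null; then
        # This grep understands PCRE, so the inline flag can be used.
        printf '%s\n' "$1" | grep -Pbo "(?i)$2" | cut -d: -f1
    else
        # Portable fallback: ERE plus the -i option.
        printf '%s\n' "$1" | grep -Eibo "$2" | cut -d: -f1
    fi
}

offsets_ci 'This is a test line with a Test word other test' 'test'
```

Either branch still relies on -b and -o working together in the installed grep.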
import Foundation
let pattern = /(?i)test/
let testString = "This is a test line with a Test word"
let matches = testString.matches(of: pattern)
let result = matches.map{NSRange($0.range, in: testString).location}
print(result) // [10, 27]
and if you want the substrings
let result = matches.map{String(testString[$0.range])}
I’m running grep (BSD grep, GNU compatible) 2.6.0-FreeBSD, which ships with macOS Ventura 13.2.1, and it works. Even stranger, (?i) works, although only ERE should be supported. Anyway, you can use ERE for most patterns except lookahead.
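Given that the behaviour of -b seems to differ between grep versions, a quick self-test before relying on it may be worthwhile. This is only a sketch: grep_bo_works is a hypothetical helper name, and the check covers nothing beyond the one sample string it uses.

```shell
# Hypothetical self-test: succeeds only if this grep reports the offset of
# the match itself when -b and -o are combined ("test" starts at byte 4).
grep_bo_works() {
    [ "$(printf 'abc test\n' | grep -bo 'test' 2>/dev/null | cut -d: -f1)" = "4" ]
}

# Report which grep is first on the PATH (silent if --version is unsupported).
grep --version 2>/dev/null | head -n 1

if grep_bo_works; then
    echo "grep -bo reports match offsets"
else
    echo "grep -bo is unreliable here; consider another grep or the ASObjC handler"
fi
```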