Key value coding compliance error

Unfortunately, it's not possible. NSRange is a C struct, not an object, and only classes that inherit from NSObject can be key-value coding compliant.

If NSRange were KVC compliant, you could even write valueForKeyPath:"range.location" (a dotted path needs valueForKeyPath:, not valueForKey:).
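Here is a way to picture what KVC would do if NSRange were an object. valueForKey: on an array applies the key to every element, and valueForKeyPath: walks a dotted path one component at a time. The sketch below imitates both in plain JavaScript, with objects standing in for match results. `valueForKey` and `valueForKeyPath` here are hypothetical helpers, not Cocoa APIs, and they only model the simple case (no collection operators such as @count).

```javascript
// Hypothetical helpers that imitate KVC on plain JS data (NOT Cocoa).
// An array answers a key by asking each element, like NSArray's valueForKey:.
function valueForKey(target, key) {
  return Array.isArray(target)
    ? target.map((element) => valueForKey(element, key))
    : target[key];
}

// A dotted path is resolved one key at a time: "range", then "location".
function valueForKeyPath(target, keyPath) {
  return keyPath.split(".").reduce((value, key) => valueForKey(value, key), target);
}

// Stand-ins for match results whose ranges are plain records.
const matches = [
  { range: { location: 10, length: 4 } },
  { range: { location: 27, length: 4 } },
];

console.log(valueForKeyPath(matches, "range.location")); // [ 10, 27 ]

// Treating the dotted string as one single key finds nothing. In Cocoa the same
// mistake raises an "undefined key" exception; this sketch just yields undefined.
console.log(valueForKey(matches, "range.location")); // [ undefined, undefined ]
```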


Thanks Stefan for the explanation.

Thanks Fredrik71 for looking at my post. I want the script to return a list of the locations of every instance of the search text ("test"). The string will vary in length and may contain hundreds of matches.
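For comparison, here is the same task as a short JavaScript sketch. The thread later runs similar JavaScript in Script Editor. The sketch assumes the search text is treated as a regular expression and that matching is case-insensitive. `findLocations` is a hypothetical name. Like NSRange locations, the indices are zero-based and counted in UTF-16 code units.

```javascript
// Collect the zero-based start index of every case-insensitive match.
function findLocations(text, pattern) {
  // matchAll requires the g flag; the i flag plays the role of (?i).
  const regex = new RegExp(pattern, "gi");
  return Array.from(text.matchAll(regex), (match) => match.index);
}

console.log(findLocations("This is a test line with a Test word", "test")); // [ 10, 27 ]
```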

Hey peavine,

It sounds like he wants the routine I made in a previous post.

No need to use ASObjC as it’s actually slower

Robert. I'm not sure who you are referring to.

My question was a purely technical one, which arose in the context of the thread you mention: whether key-value coding could be used to get the locations from an array of NSRanges. Stefan answered my question and explained why this was not possible.

NSRegularExpression has a method to enumerate its matches (enumerateMatchesInString:options:range:usingBlock:), and each match is an NSTextCheckingResult.

If a capture group does not participate in a match, its range (rangeAtIndex:) is {NSNotFound, 0} rather than nil.

NSTextCheckingResult is a subclass of NSObject; NSRange definitely is not (it is a plain C struct).
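JavaScript has the same notion of a capture group that did not take part in a match. Cocoa reports such a group as {NSNotFound, 0}. JavaScript's match indices report it as undefined; this uses the d flag, available in recent engines such as Node 16 and later. A minimal illustration:

```javascript
// The d flag adds match.indices: a [start, end] pair for each group.
const match = /(cat)|(dog)/d.exec("hotdog");

console.log(match.index);      // 3, where "dog" starts
console.log(match.indices[1]); // undefined: group 1 (cat) did not participate
console.log(match.indices[2]); // [ 3, 6 ]: group 2 (dog) matched
```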

Hi, it seems easier with the shell:

set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)

on getMatches(str, pattern)
	set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
	set res to (do shell script shellcmd)
	set off to offset of ":" in res
	return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches

on qt(str)
	return "\"" & str & "\""
end qt

Hallenstal. Thanks for responding to my thread. I tested your script, but it did not return the expected result, which was {10, 27}. I'm not knowledgeable about grep, so perhaps I'm doing something wrong. BTW, your script threw an error as written, and I had to escape the quote in the qt handler.

set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern) --> {10, 12}

on getMatches(str, pattern)
	set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
	set res to (do shell script shellcmd)
	set off to offset of ":" in res
	return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches

on qt(str)
	return "\"" & str & "\""
end qt

@peavine

Try this:

-- Tested on Monterey 12.6.3
use framework "Foundation"
use scripting additions

set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)

on getMatches(theString, thePattern)
	set theString to current application's NSString's stringWithString:theString
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
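	-- "as list" turns the NSValue-wrapped NSRanges into AppleScript records {location:x, |length|:y}
	set theRanges to (regexResults's valueForKey:"range") as list
	-- back in Cocoa, those records become NSDictionaries, which do respond to valueForKey:
	set theArray to current application's NSArray's arrayWithArray:theRanges
	set theLocations to (theArray's valueForKey:"location")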
	return theLocations as list --> {10, 27}
end getMatches

@Hallenstal The shell command works; it's just the parsing that is wrong.

set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}

on getLocations(str, pattern)
	set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
	return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations
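The parsing step is where the first version went wrong. It read only the text before the first colon, then counted the characters of everything after it, which is where {10, 12} came from. With -b and -o, grep prints each match on its own line as offset:match, so the fix is to take the number before the colon on every line. Here is the same idea sketched in JavaScript, with a hard-coded sample of that output standing in for a real grep run. `parseByteOffsets` is a hypothetical name.

```javascript
// grep -bo prints one line per match: "<byte offset>:<matched text>".
// Keep only the number before the first colon on each line.
function parseByteOffsets(grepOutput) {
  return grepOutput
    .split(/\r?\n|\r/) // do shell script may hand back \r line endings
    .filter((line) => line.includes(":"))
    .map((line) => Number(line.slice(0, line.indexOf(":"))));
}

const sample = "10:test\n27:Test\n43:test";
console.log(parseByteOffsets(sample)); // [ 10, 27, 43 ]
```

Unlike the AppleScript version, this returns numbers rather than text.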

@ionah. Many thanks for your suggestions, both of which work great.

It was mentioned above that my original request arose from another thread (see link below), which had to do with timing the search for 12,288 instances of a substring in a string. I reran those tests with @ionah's two suggestions, and the results were as follows (all times in milliseconds):

ionah’s Grep script - 76
robertfern’s AppleScript script - 116
ionah’s ASObjC script - 160
peavine’s ASObjC script - 262

Although not of much (or any) significance, there are a few differences in the results returned by the script suggestions:

  • The substring locations returned by the grep script and by ionah's ASObjC script are zero-based.
  • The grep script returns numbers as text; the other suggestions return integers.
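There is one more difference worth knowing about. grep -b reports byte offsets, NSRange locations and JavaScript string indices count UTF-16 code units, and AppleScript's offset is one-based. For plain ASCII text the first two agree. With accented characters they drift apart, assuming the shell hands grep UTF-8 text. A small JavaScript sketch of the three numbers for the same match:

```javascript
const text = "na\u00EFve caf\u00E9 test"; // "naïve café test"

// Zero-based UTF-16 index, the same unit NSRange uses.
const utf16Index = text.indexOf("test");
// grep -b counts bytes; in UTF-8, ï and é take two bytes each.
const byteOffset = new TextEncoder().encode(text.slice(0, utf16Index)).length;
// AppleScript's offset is one-based (character and UTF-16 counts agree for this text).
const appleScriptOffset = utf16Index + 1;

console.log(utf16Index, byteOffset, appleScriptOffset); // 11 13 12
```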

The thread mentioned above can be found here

@peavine

I don’t know what method you’re using to get those results.
Here are the ones I get with Script Geek:

MacPro6,1, macOS Version 12.6.3 (21G419), 100 iterations

                   First Run   Total Time   Average
AppleScriptObjC    0.408       0.078        0.001
Shell grep         0.016       1.263        0.013

Ratio (excluding first run): 1:16.27

@ionah. I don't use Script Geek in this particular instance because it's difficult to build a very large string without including the time it takes to build it in the timing results. My grep timing script:

use framework "Foundation"
use scripting additions

-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
	set theString to theString & theString
end repeat

-- start time
set startTime to current application's CACurrentMediaTime()

-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getLocations(theString, thePattern)
on getLocations(str, pattern)
	set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
	return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations

-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
	numberFormatter's setFormat:"0.000"
	set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
	(numberFormatter's setFormat:"0")
	set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if

-- result
elapsedTime --> 76 milliseconds
# count theOffsets --> 12288
# theOffsets

My ASObjC timing script:

use framework "Foundation"
use scripting additions

-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
	set theString to theString & theString
end repeat

-- start time
set startTime to current application's CACurrentMediaTime()

-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getMatches(theString, thePattern)
on getMatches(theString, thePattern)
	set theString to current application's NSString's stringWithString:theString
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range") as list
	set theArray to current application's NSArray's arrayWithArray:theRanges
	set theLocations to (theArray's valueForKey:"location")
	return theLocations as list
end getMatches

-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
	numberFormatter's setFormat:"0.000"
	set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
	(numberFormatter's setFormat:"0")
	set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if

-- result
elapsedTime --> 160 milliseconds
# count theOffsets --> 12288
# theOffsets

To avoid this, you can build a large string in any text editor and paste it into your script as a literal, for example as the initial value of a property. The string is then stored when the script is compiled, so building it is not included in the timing.

As you can see, using a loop or not does not make a significant difference.
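Here is the same principle as a JavaScript sketch: the large string is built before the clock starts, so only the search is timed. It assumes `performance.now()` is available and falls back to `Date.now()` otherwise. The 4096 repetitions mirror the 12 doublings in the AppleScript timing scripts.

```javascript
// Build the test string before timing starts, so only the search is measured.
const base = "My Rob is a cool Robert! His name is Robert... ";
const bigString = base.repeat(4096); // same size as doubling the base 12 times

const clock = () =>
  typeof performance !== "undefined" ? performance.now() : Date.now();

const start = clock();
const offsets = Array.from(bigString.matchAll(/rob/gi), (m) => m.index);
const elapsedMs = clock() - start;

console.log(offsets.length); // 12288
console.log(`${elapsedMs.toFixed(1)} ms`);
```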


Hi
grep -E supports POSIX ERE (extended regular expressions), so you need to adapt the pattern accordingly.

BR

Modified version attached. Note that grep supports ERE only with the -E option; to ignore case, use the -i option.
If you want to ignore case only in the first letter, use a bracket expression, e.g. "[Tt]est".
See, for example, the Wikipedia article on POSIX ERE.
BR
regexp.applescript (824 Bytes)
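The difference between -i and a bracket expression can be sketched with JavaScript regular expressions. JavaScript uses an i flag instead of a leading (?i). The g flag is only there so matchAll can list every match.

```javascript
const sample = "test Test TEST tEst";
// Zero-based start index of every match of a global regex.
const starts = (re) => Array.from(sample.matchAll(re), (m) => m.index);

// Like grep -i: every letter is case-insensitive.
console.log(starts(/test/gi));   // [ 0, 5, 10, 15 ]
// Like "[Tt]est": only the first letter may vary.
console.log(starts(/[Tt]est/g)); // [ 0, 5 ]
```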

You can probably get even better speed with a single pipeline, which means only one call to the shell:

set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}

on getLocations(str, pattern)
	set res to do shell script "echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern) & "| grep -Eo " & quoted form of ("\\d+")
	return paragraphs of res
end getLocations

BR

Apparently, the grep that comes with macOS, or at least v2.5.1, has a bug such that -b always returns 0. I should note that I haven't seen anything official to that effect, but after searching for "grep byte offset", I found several comments making this claim.

On a whim, I used MacPorts to install GNU grep (ggrep v3.8, released in 2022), and it returns offsets of 10, 27, 43 for the longer test string. I'm running Sierra, so perhaps other versions are available, but at least the --byte-offset option now works.

This is a minor variation on @ionah's script. While it's possible to make ggrep the default grep and put it on the path, I'm holding off on that, so I had to include its full path in the shell command. Additionally, (?i) is a PCRE feature, so instead of -E it requires the -P option. The last stage of the pipeline keeps only the digits, and I used text item delimiters to split the returns into a list. I don't have any tools to test its speed, but it seems fairly quick.

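set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)

-- split the return-separated offsets into a list, then restore the delimiters
set savedTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
set theOffsets to text items of matchingData --> {"10", "27", "43"}
set AppleScript's text item delimiters to savedTIDs
theOffsets

on getMatches(str, pattern)
	-- ggrep as installed by MacPorts; -P enables PCRE so (?i) is understood
	set shellcmd to "echo " & qt(str) & " | /opt/local/bin/ggrep -Pbo " & qt(pattern) & " | /opt/local/bin/ggrep -o '[[:digit:]]*'"
	set res to (do shell script shellcmd)
	return res
end getMatches

on qt(str)
	-- quoted form of also copes with strings that contain single quotes
	return quoted form of str
end qt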

This is the shell command that is being run:

echo 'This is a test line with a Test word other test' | /opt/local/bin/ggrep -Pbo '(?i)test' | /opt/local/bin/ggrep -o '[[:digit:]]*'

No, this is Swift:

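import Foundation // for NSRange(_:in:)

// Regex literals and matches(of:) need Swift 5.7 and macOS 13 or later.
let pattern = /(?i)test/
let testString = "This is a test line with a Test word"
let matches = testString.matches(of: pattern)
// Convert each Range<String.Index> to an NSRange to get a UTF-16 location
let result = matches.map { NSRange($0.range, in: testString).location }
print(result) // [10, 27]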

and if you want the substrings

let result = matches.map{String(testString[$0.range])}



I'm running grep (BSD grep, GNU compatible) 2.6.0-FreeBSD, which ships with macOS Ventura 13.2.1, and it works. Stranger still, (?i) works, even though only ERE should be supported. Anyway, you can use ERE for most patterns, except lookarounds such as lookahead.

BR

For the heck of it, I wrote the same thing (I hope) in JavaScript:

(() => {
	const str = "My Rob is a cool Robert! His name is Robert... ".repeat(13);

	/* start time */
	const startTime = new Date().getTime();
	for (let i = 0; i < 100000; i++) {
		const regEx = /Rob/ig;
		const matches = str.matchAll(regEx);
		const locations = [...matches].map(m => m.index);
	}
	const elapsedTime = new Date().getTime() - startTime;
	console.log(elapsedTime);
})()

The result is 3205 ms for 100,000 iterations, so about 0.03 ms per iteration when run in Script Editor. On the command line (osascript ...), it runs in a little less time: 2800 ms, i.e. 0.028 ms per iteration. I don't know what units the Script Geek timings are in.
I tried the code with a considerably longer string, containing 4096 matches. Then it took about 34 ms per iteration (in Script Editor). So the run time appears to grow faster than linearly, i.e. worse than O(n).
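A single pair of measurements mixes fixed overhead with per-match cost. One way to check the growth rate is to time several sizes, doubling each time, and see whether the time per copy of the base text stays roughly constant. Below is a rough JavaScript sketch of that check. `timeSearch` is a hypothetical helper, and the sizes and iteration count are arbitrary.

```javascript
// Time matchAll on a string of `copies` repetitions of the base text.
// Timings are noisy, so treat a single run as a rough indication only.
function timeSearch(copies, iterations) {
  const text = "My Rob is a cool Robert! His name is Robert... ".repeat(copies);
  const start = Date.now();
  let count = 0;
  for (let i = 0; i < iterations; i++) {
    count = Array.from(text.matchAll(/rob/gi), (m) => m.index).length;
  }
  const msPerIteration = (Date.now() - start) / iterations;
  return { copies, count, msPerIteration };
}

for (const copies of [512, 1024, 2048, 4096]) {
  const { count, msPerIteration } = timeSearch(copies, 20);
  // Roughly constant ms per copy suggests linear (O(n)) growth.
  console.log(copies, count, (msPerIteration / copies).toFixed(5));
}
```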