I made a handler to find multiple instances of a substring in a string.
The last parameter ‘itemCount’ is optional. Without it, the handler returns a list of the indexes of every match; if you pass an integer, it returns the index of that instance instead. Either way, it returns 0 when nothing is found.
offsetMulti of "Rob" out of "My Rob is a cool Robert! His name is Robert..." given itemCount:0
-- {4, 18, 38}
to offsetMulti of findText out of textString given itemCount:ic : 0
local indexList, tid, c, tc
set tid to text item delimiters
set c to length of findText
set text item delimiters to findText
considering case
set textString to text items of textString
end considering
if (count textString) = 1 then
set text item delimiters to tid -- restore the delimiters before the early return
return 0
end if
set tc to 1
set indexList to {}
repeat with i from 1 to (count textString) - 1
set tc to tc + (length of item i of textString)
set end of indexList to tc
if i = ic then exit repeat
set tc to tc + c
end repeat
set text item delimiters to tid
if ic = 0 then
return indexList
else if ic < (count textString) then
return item ic of indexList
end if
return 0
end offsetMulti
my offsetsForSubstring:"Rob" inText:"My Rob is a cool Robert! His name is Robert..."
on offsetsForSubstring:subString inText:theText
set rn to (reverse of characters of subString) as text
script go
on |λ|(temp)
set i to offset of rn in temp
if i = 0 then return {}
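-- map the offset in the reversed text back to a 1-based offset in the original text
-- (the original "- 1" was only correct for 3-character substrings)
return |λ|(text (1 + i) thru -1 of temp) & ((length of temp) - i - (length of rn) + 2)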
end |λ|
end script
return go's |λ|((reverse of characters of theText) as text)
end offsetsForSubstring:inText:
NOTE: as far as I can see, the solution from @robertfern works faster.
use framework "Foundation"
use scripting additions
set theString to "My Rob is a cool robert! His name is Robert..."
set thePattern to "(?i)rob" -- remove (?i) to make case sensitive
set theOffsets to getOffsets(theString, thePattern) --> {4, 18, 38}
on getOffsets(theString, thePattern)
set theString to current application's NSString's stringWithString:theString
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set matchOffsets to {}
repeat with aMatch in regexResults
set end of matchOffsets to ((aMatch's range()'s location()) + 1)
end repeat
return matchOffsets
end getOffsets
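One caveat with the regex approach above: thePattern is interpreted as a regular expression, so a search string containing metacharacters such as "." is not matched literally. Here is a minimal sketch of one way around that, assuming you want plain substring semantics. The handler name getLiteralOffsets is hypothetical and not part of the post above; it relies on NSRegularExpression's escapedPatternForString: to escape the search text first.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical wrapper: treats findText as literal text, not as a regex pattern.
on getLiteralOffsets(theString, findText)
	set theString to current application's NSString's stringWithString:theString
	-- escapedPatternForString: backslash-escapes regex metacharacters such as "." and "("
	set thePattern to current application's NSRegularExpression's escapedPatternForString:findText
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set matchOffsets to {}
	repeat with aMatch in regexResults
		-- NSRange arrives as a record; its location is 0-based, so add 1
		set end of matchOffsets to (location of (aMatch's range())) + 1
	end repeat
	return matchOffsets
end getLiteralOffsets

getLiteralOffsets("Robert... and Rob.", "...") --> {7}
```

Without the escaping step, the pattern "..." would match any three characters and report an offset for every three-character run in the string.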
Also, I’m not a big fan of recursion: there is too much stack-manipulation overhead.
I always set mine up to be iterative.
Here is an iterative version of the recursive one from KniazidisR
(it’s 12% faster, and won’t hit a recursion stack limit)
offsetMulti of "Rob" out of "My Rob is a cool Robert! His name is Robert..." given itemCount:0
-- {4, 18, 38}
to offsetMulti of findText out of textString given itemCount:ic : 0
local indexList, c, n, tc
set c to (length of findText)
set indexList to {}
set tc to 0
repeat (item (((ic = 0) as integer) + 1) of {ic, 500000}) times -- ic passes if ic ≠ 0; otherwise up to 500000 (an arbitrary cap on matches)
considering case
set n to offset of findText in textString
end considering
if n = 0 then exit repeat
set tc to tc + n
set end of indexList to tc
set tc to tc + c - 1
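if (n + c) > (length of textString) then exit repeat -- the match ended the string; nothing left to search
set textString to text (n + c) thru -1 of textString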
end repeat
if (count indexList) > 0 then
if ic = 0 then
return indexList
else if ic = (count indexList) then
return item ic of indexList
end if
end if
return 0
end offsetMulti
I ran some timing tests. The test string contained 4096 instances of the original string, and the results were:
THE SCRIPT IN - TIMING RESULTS
Post 1 - 3.967 seconds
Post 2 - returned stack overflow
Post 3 - 262 milliseconds
Post 4 - 7.350 seconds
I also tested the Post 1 and 3 scripts with a test string that contained 32 instances of the original string and the results were 2 and 6 milliseconds, respectively.
The following is the timing script with my suggestion:
use framework "Foundation"
use scripting additions
-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
set theString to theString & theString
end repeat
-- start time
set startTime to current application's CACurrentMediaTime()
-- timed code
set thePattern to "Rob"
set theOffsets to getOffsets(theString, thePattern)
on getOffsets(theString, thePattern)
set theString to current application's NSString's stringWithString:theString
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set matchOffsets to {}
repeat with aMatch in regexResults
set end of matchOffsets to ((aMatch's range()'s location()) + 1)
end repeat
return matchOffsets
end getOffsets
-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
numberFormatter's setFormat:"0.000"
set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
(numberFormatter's setFormat:"0")
set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if
-- result
elapsedTime --> 262 milliseconds
# count theOffsets --> 12288
That’s a very large string. I can speed mine up drastically by using script objects. I’ll do it when I get a minute.
Can I get a copy of the test string you used?
Here it is…
to offsetMulti of findText out of textString given itemCount:ic : 0 -- way Faster
local tid, c, tc
script L
property indexList : {}
property foundStrings : missing value
end script
set tid to text item delimiters
set c to length of findText
set text item delimiters to findText
considering case
set L's foundStrings to text items of textString
end considering
if (count L's foundStrings) = 1 then
set text item delimiters to tid -- restore the delimiters before the early return
return 0
end if
set tc to 1
repeat with i from 1 to (count L's foundStrings) - 1
set tc to tc + (length of item i of L's foundStrings)
set end of L's indexList to tc
if i = ic then exit repeat
set tc to tc + c
end repeat
set text item delimiters to tid
if ic = 0 then
return L's indexList
else if ic < (count L's foundStrings) then
return item ic of L's indexList
end if
return 0
end offsetMulti
EDIT - modified one line to shorten an if statement
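For anyone wondering where the speed-up in the handler above comes from: the lists are held as properties of a script object and accessed as "item i of L's foundStrings" instead of through a plain local variable. That form is widely reported to make item access on long lists much faster in AppleScript. Here is a minimal sketch of the idiom on its own; the handler name totalLength and the script name S are hypothetical and chosen only for illustration.

```applescript
-- Hypothetical example: add up the lengths of many text items,
-- reading the list through a script object property.
on totalLength(theItems)
	script S
		property itemList : theItems -- initialised when the handler runs
	end script
	set total to 0
	repeat with i from 1 to (count S's itemList)
		-- access goes through the script object, not through a local list variable
		set total to total + (length of item i of S's itemList)
	end repeat
	return total
end totalLength

totalLength({"My ", " is a cool ", "ert!"}) --> 18
```

With only a few items the difference is not noticeable; it shows up on lists with thousands of items, like the 4096-instance test string.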
Robert. The test string is created by the script under the comment “untimed code” (see my script in post 5 above). The result with your new script was 116 milliseconds.