RegEx Search and Incremented Replacement

I’ve been working on something without success. The following script replaces an asterisk at the front of a line with a dash and works as expected. However, I want the replacement to be consecutive line numbers instead. This is easily done with a repeat loop, but I wondered if that might be avoided. A Google search turned up some suggestions, but I didn’t understand them. This is not for a particular project, just to learn. Thanks.

use framework "Foundation"

set theString to "a line
 * aa
 * bb
 * cc
a line"

set theCharacter to "*"
set thePattern to "(?m)^(\\h*)\\" & theCharacter
set theString to current application's NSString's stringWithString:theString
set theString to (theString's stringByReplacingOccurrencesOfString:thePattern withString:("$1" & "-") options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()})
return theString as text

The desired text is:

a line
 1 aa
 2 bb
 3 cc
a line

The repeat loop is necessary to be able to increment the index counter.

This is a solution using NSRegularExpression with the NSRegularExpressionAnchorsMatchLines option, which makes ^ match at the start of every line.

The substrings must be replaced from last to first: each replacement changes the length of the string, so the ranges of any matches after it would no longer be valid.

use framework "Foundation"

set theString to "a line
* aa
* bb
* cc
a line"

set theString to current application's NSMutableString's stringWithString:theString
-- AnchorsMatchLines makes ^ match at the start of each line, not just the start of the string.
set regex to current application's NSRegularExpression's regularExpressionWithPattern:"^\\*\\s" options:(current application's NSRegularExpressionAnchorsMatchLines) |error|:(missing value)
set matches to regex's matchesInString:theString options:0 range:{0, theString's |length|()}
set theCounter to count matches
-- Work backwards so that the ranges of earlier matches stay valid as the string grows.
repeat with i from theCounter to 1 by -1
	set theRange to (matches's objectAtIndex:(i - 1))'s range()
	(theString's replaceCharactersInRange:theRange withString:((theCounter as text) & " - "))
	set theCounter to theCounter - 1
end repeat
return theString as text
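
To illustrate why the backwards order matters, here is a minimal sketch of the forward alternative. It is not part of the script above, and the variable theOffset is only an illustrative name: it accumulates how much the string has grown so far, and every later range has to be shifted by that amount. The sketch assumes the range record returned by range() exposes its fields as location and |length|.

```applescript
use framework "Foundation"

set theString to current application's NSMutableString's stringWithString:("* aa" & linefeed & "* bb" & linefeed & "* cc")
set regex to current application's NSRegularExpression's regularExpressionWithPattern:"^\\*\\s" options:(current application's NSRegularExpressionAnchorsMatchLines) |error|:(missing value)
set matches to regex's matchesInString:theString options:0 range:{0, theString's |length|()}

set theOffset to 0 -- hypothetical helper: total growth of the string so far
repeat with i from 1 to (matches's |count|())
	set theRange to (matches's objectAtIndex:(i - 1))'s range()
	set theReplacement to (i as text) & " - "
	-- Shift the original location by the growth caused by earlier replacements.
	set shiftedRange to {(theRange's location) + theOffset, theRange's |length|}
	(theString's replaceCharactersInRange:shiftedRange withString:theReplacement)
	set theOffset to theOffset + (count theReplacement) - (theRange's |length|)
end repeat
return theString as text
```

Going from last to first makes this bookkeeping unnecessary, because a replacement never moves anything that lies before it.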

Here’s a minor reworking of Stefan’s solution which I think gives the results peavine wanted. Basically the regex is different and the extraction and use of the ranges is slightly optimised:

use framework "Foundation"

set theString to "a line
 * aa
 * bb
 * cc
a line"

set theString to current application's NSMutableString's stringWithString:theString
-- Look-behinds don't allow infinite repeats, but do allow indefinite ones within a range. 10 should be enough.
set regex to current application's NSRegularExpression's ¬
	regularExpressionWithPattern:"(?m)(?<=^\\h{1,10})\\*" options:(0) |error|:(missing value)
set matches to regex's matchesInString:theString options:0 range:{0, theString's |length|()}
set ranges to matches's valueForKey:"range"
repeat with i from (count ranges) to 1 by -1
	(theString's replaceCharactersInRange:(ranges's item i) withString:(i as text))
end repeat
return theString as text
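
As a possible alternative to the bounded look-behind, a capture group can isolate the asterisk, and NSTextCheckingResult's rangeAtIndex: then returns the range of just that group. This is only a sketch under that assumption, not part of the script above, and starRange is an illustrative name. Note that this pattern also accepts an asterisk with no leading whitespace, which the {1,10} look-behind does not.

```applescript
use framework "Foundation"

set theString to "a line
 * aa
 * bb
 * cc
a line"

set theString to current application's NSMutableString's stringWithString:theString
-- Group 1 captures only the asterisk; any amount of leading horizontal whitespace is allowed.
set regex to current application's NSRegularExpression's ¬
	regularExpressionWithPattern:"(?m)^\\h*(\\*)" options:(0) |error|:(missing value)
set matches to regex's matchesInString:theString options:0 range:{0, theString's |length|()}
-- Replace from last to first so the earlier ranges remain valid.
repeat with i from (matches's |count|()) to 1 by -1
	set starRange to ((matches's objectAtIndex:(i - 1))'s rangeAtIndex:1)
	(theString's replaceCharactersInRange:starRange withString:(i as text))
end repeat
return theString as text
```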

Here’s a script that includes some handlers that I find I use a lot.
Mainly:

  • create a RegEx from a pattern
  • check whether a RegEx has any matches in a string
  • replace a RegEx’s matches in a string using a template
    (the continual need to create a range… arrrrgh)

Workflow:

  • set replaceIndex to 1
  • create aRegEx
  • split the testString into aArray
  • enumerate aArray with aLine
  • set aNewLine to aLine
  • check whether aRegEx matches aLine
    if YES, create the replaceString from replaceIndex and a space,
    set aNewLine to the result of aRegEx’s stringByReplacingMatchesInString:options:range:withTemplate:
    and increment replaceIndex
  • add aNewLine to aFinalArray
  • create aFinalString by joining aFinalArray with linefeed

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"

property NSArray : a reference to current application's NSArray
property NSMutableArray : a reference to current application's NSMutableArray
property NSString : a reference to current application's NSString

property NSRegularExpression : a reference to current application's NSRegularExpression
-- Option constants stored as plain integers (values from the Foundation headers)
property NSRegularExpressionCaseInsensitive : 1
property NSRegularExpressionUseUnicodeWordBoundaries : 64
property NSRegularExpressionAnchorsMatchLines : 16
property NSRegularExpressionSearch : 1024

--logging
property logRXMatches : false
property logDebugMode : false

--test properties
property aTestPattern : "^\\*\\s"
property aRegEx : missing value
property aReplaceIndex : 1
property aDelimiter : linefeed
property aTestString : ""
property aTestArray : {}
property aFinalString : ""

set aTestString to "a line
* aa
* bb
* cc
a line"

set aResult to (my incrementalReplaceInString:aTestString withPattern:aTestPattern startIndex:aReplaceIndex) as text

(*
	a line
	1 aa
	2 bb
	3 cc
	a line
*)

on incrementalReplaceInString:aString withPattern:aPattern startIndex:aIndex
	set aFinalArray to NSMutableArray's array()
	set aReplaceIndex to aIndex
	set aRegEx to (my createRegularExpressionWithPattern:aPattern)
	set aTestArray to (my splitString:aString usingDelimiter:(my aDelimiter))
	set aCount to aTestArray's |count|()
	repeat with aIndex from 0 to (aCount - 1)
		set aLine to (aTestArray's objectAtIndex:aIndex)
		set aNewLine to aLine
		if (my containsMatchesForRegEx:aRegEx inString:aLine) then
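			-- Build the replacement text in AppleScript; this avoids passing variadic arguments across the bridge
			set aRepString to NSString's stringWithString:((aReplaceIndex as text) & " ")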
			if (my logDebugMode) then
				log {"aReplaceIndex is:", aReplaceIndex}
				log {"aRepString is:", aRepString}
			end if
			set aNewLine to (my replaceMatchesForRegEx:aRegEx inString:aLine withPattern:aRepString)
			set aReplaceIndex to aReplaceIndex + 1
		end if
		(aFinalArray's addObject:aNewLine)
	end repeat
	
	set aFinalString to (aFinalArray's componentsJoinedByString:(my aDelimiter))
	if (my logDebugMode) then
		log {"aTestArray is:", aTestArray}
		log {"aFinalArray is:", aFinalArray}
		log {"aFinalString is:", aFinalString}
	end if
	return aFinalString
end incrementalReplaceInString:withPattern:startIndex:



-- CREATE A REGULAR EXPRESSION

on createRegularExpressionWithPattern:aRegExString
	if (class of aRegExString) is equal to (NSRegularExpression's class) then
		if (my logDebugMode) then log ("it already was a RegEx")
		return aRegExString
	end if
	set aPattern to NSString's stringWithString:aRegExString
	set regOptions to NSRegularExpressionCaseInsensitive + NSRegularExpressionUseUnicodeWordBoundaries
	set {aRegEx, aError} to (NSRegularExpression's regularExpressionWithPattern:aPattern options:regOptions |error|:(reference))
	if (aError ≠ missing value) then
		log {"regEx failed to create aError is:", aError}
		log {"aError debugDescrip is:", aError's debugDescription()}
		-- AppleScript has no "break"; return missing value so callers can detect the failure
		return missing value
	end if
	return aRegEx
end createRegularExpressionWithPattern:

-- REGEX MATCHING FUNCTIONS
-- CONTAINS MATCHES?
on containsMatchesForRegEx:aRegEx inString:aString
	if (aRegEx = missing value) then
		if (my logDebugMode) then log ("aRegEx was nil")
		return false
	end if
	set aCount to (my countOfMatchesForRegEx:aRegEx inString:aString)
	return (aCount > 0)
end containsMatchesForRegEx:inString:

-- COUNT OF MATCHES
on countOfMatchesForRegEx:aRegEx inString:aString
	set matches to (my matchesForRegEx:aRegEx inString:aString)
	return matches's |count|()
end countOfMatchesForRegEx:inString:

-- REGEX MATCHES
on matchesForRegEx:aRegEx inString:aString
	set aSource to NSString's stringWithString:aString
	set aLength to aSource's |length|()
	set aRange to (current application's NSMakeRange(0, aLength))
	set matches to (aRegEx's matchesInString:aSource options:0 range:aRange)
	if (my logRXMatches) then
		log {"matches for Pattern Logs ========"}
		log {"aString is:", aString}
		log {"aSource is:", aSource}
		log {"aLength is:", aLength}
		log {"aRange is:", aRange}
		log {"matches is:", matches}
		if (my logDebugMode) then my debugLogRegEx:aRegEx
	end if
	return matches
end matchesForRegEx:inString:

-- REGEX REPLACE MATCHES IN STRING WITH TEMPLATE

on replaceMatchesForRegEx:aRegEx inString:aString withPattern:aPattern
	set aLength to aString's |length|()
	set aRange to (current application's NSMakeRange(0, aLength))
	set aNewString to (aRegEx's stringByReplacingMatchesInString:aString options:0 range:aRange withTemplate:aPattern)
	return aNewString
end replaceMatchesForRegEx:inString:withPattern:


-- UTILITY SPLIT STRING
on splitString:aString usingDelimiter:aDelimiter
	set aSource to (NSString's stringWithString:aString)
	set aSplitter to (NSString's stringWithString:aDelimiter)
	set aArray to (aSource's componentsSeparatedByString:aSplitter)
	return aArray
end splitString:usingDelimiter:

-- DEBUGGING
on debugLogRegEx:aRegEx
	if (aRegEx = missing value) then
		log {"aRegEx is empty"}
		return
	end if
	set groupCount to aRegEx's numberOfCaptureGroups()
	set aPattern to aRegEx's pattern()
	set aCleanPattern to (NSRegularExpression's escapedPatternForString:aPattern)
	log {"aRegEx is", aRegEx}
	log {"aRegEx aPattern is", aPattern}
	log {"aRegEx aCleanPattern is", aCleanPattern}
	log {"aRegEx groupCount is", groupCount}
	log {"aRegEx options is", aRegEx's options()}
end debugLogRegEx:

Thanks Stefan, Nigel, and technomorph for the script suggestions. They all work great, and I appreciate your time.

Just for learning purposes, I wrote the following script to reset the counter when intervening non-list paragraphs are encountered. I ran timing tests with a string that contained 1,729 paragraphs with a 5-item asterisked list every 50 paragraphs. The timing results for Nigel’s and my scripts were 16 and 28 milliseconds, respectively.

use framework "Foundation"
use scripting additions

set theString to "some text
	* line one
	* line two
some text
	* line one
	* line two
some text"

set numberedString to getNumberedString(theString)

on getNumberedString(theString)
	set theString to current application's NSMutableString's stringWithString:theString
	set thePattern to "(?m)^(\\h*)\\*(.*)$"
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:(thePattern) options:(0) |error|:(missing value)
	set theMatches to theRegex's matchesInString:theString options:0 range:{0, theString's |length|()}
	set {theCounter, priorRangeEnd} to {1, -1} -- -1 so that a list item at the very start of the string is numbered 1
	repeat with aMatch in theMatches
		set aRange to aMatch's range()
		if priorRangeEnd = (aRange's location()) then
			set theCounter to theCounter + 1
		else
			set theCounter to 1
		end if
		(theString's replaceOccurrencesOfString:(thePattern) withString:("$1" & theCounter & "$2") options:(1024) range:aRange)
		set priorRangeEnd to (aRange's location()) + (aRange's |length|()) + 1
	end repeat
	return theString as text
end getNumberedString

I was curious how a basic AppleScript would fare in my testing. The timing result was 233 milliseconds but improved to 11 milliseconds if enhanced with a script object (see below). The testing procedure was the same as that described above.

set theString to "some text
	* line one * one
	* line two * two
some text
	* line one
	* line two
some text"

set numberedString to getNumberedString(theString)

on getNumberedString(theString)
	script o
		property theParagraphs : (paragraphs of theString)
		property numberedParagraphs : {}
	end script
	
	set TID to text item delimiters
	set text item delimiters to "*"
	set theCounter to 1
	
	ignoring white space
		repeat with aParagraph in o's theParagraphs
			set aParagraph to contents of aParagraph
			if aParagraph begins with "*" then
				set end of o's numberedParagraphs to (text item 1 of aParagraph & (theCounter as text) & text items 2 thru -1 of aParagraph)
				set theCounter to theCounter + 1
			else
				set end of o's numberedParagraphs to aParagraph
				set theCounter to 1
			end if
		end repeat
	end ignoring
	
	set text item delimiters to linefeed
	set o's numberedParagraphs to (o's numberedParagraphs as text)
	set text item delimiters to TID
	return o's numberedParagraphs
end getNumberedString

I’ve included below the script I used to run the timing tests. I didn’t post it before because I didn’t want to clutter the forum. To avoid doing that, I’ll remove the script after a few days.

– script deleted by peavine –