Regex Negative Lookbehind

I want a regex for use in the Script Debugger Find bar that matches lines 1 and 3 in the following:

set s to "mystring" --good afternoon
set s to "good afternoon" --mystring
set s to "another mystring" --good afternoon
set s to "good afternoon" --mystring two
-- a mystring a --mystring
a -- a mystring a --mystring

The following regex matches lines 1, 3, and 6, and I think I need to use a negative lookbehind to omit line 6, but I can’t get that to work.

^[^-].*mystring.*--.*$
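Here is a minimal sketch in Java showing why line 6 gets through. Java's java.util.regex is only a stand-in. It is not Script Debugger's engine, but it supports the constructs used in this thread. The class and method names are hypothetical. The program tests each sample line on its own and reports the numbers of the lines that match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: tests one regex against each sample line separately.
final class FirstAttempt {
    static final String[] SAMPLE = {
        "set s to \"mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring",
        "set s to \"another mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring two",
        "-- a mystring a --mystring",
        "a -- a mystring a --mystring",
    };

    // Returns the 1-based numbers of the sample lines in which the regex finds a match.
    static List<Integer> matchingLines(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < SAMPLE.length; i++) {
            if (pattern.matcher(SAMPLE[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // [^-] only rules out a line whose first character is a hyphen (line 5).
        // Line 6 starts with "a", so it gets through even though its first
        // "mystring" follows a "--".
        System.out.println(matchingLines("^[^-].*mystring.*--.*$")); // [1, 3, 6]
    }
}
```

The `[^-]` test looks only at the first character of the line, so it has no way to notice a "--" that appears later, before "mystring".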

The following regex uses a negative lookbehind but it doesn’t work.

^.*(?<!--)mystring.*--.*$
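This attempt fails because `(?<!--)` only inspects the two characters immediately before "mystring". As soon as anything separates "--" from "mystring", even a single space, the assertion passes. Dropping the leading `[^-]` also lets line 5 back in. Here is a Java sketch, with java.util.regex standing in for Script Debugger's engine and hypothetical class and method names, showing this on the sample lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class for the single "(?<!--)" attempt.
final class AdjacentLookbehind {
    static final String[] SAMPLE = {
        "set s to \"mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring",
        "set s to \"another mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring two",
        "-- a mystring a --mystring",
        "a -- a mystring a --mystring",
    };

    // Returns the 1-based numbers of the sample lines in which the regex finds a match.
    static List<Integer> matchingLines(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < SAMPLE.length; i++) {
            if (pattern.matcher(SAMPLE[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // (?<!--) only rejects "mystring" when "--" is directly in front of it.
        Pattern bare = Pattern.compile("(?<!--)mystring");
        System.out.println(bare.matcher("--mystring").find());  // false
        System.out.println(bare.matcher("-- mystring").find()); // true

        // Lines 5 and 6 have " a " between "--" and their first "mystring",
        // so the assertion passes for both.
        System.out.println(matchingLines("^.*(?<!--)mystring.*--.*$")); // [1, 3, 5, 6]
    }
}
```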

BTW, the goal is to match mystring if it is before “--” but not if it is after “--”. Thanks for any suggestions.

Hi @peavine.

This seems to work, although it could probably be improved:

(?m)^.*(?<!--.{0,10})mystring[^-]*--.*$

It matches zero or more characters, followed by “mystring” (provided it isn’t preceded by “--” plus 0 to 10 other characters), followed in turn by zero or more non-hyphen characters, “--”, and zero or more characters up to the end of the line. (The “(?m)” may not be necessary in Script Debugger.)
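The pattern can be tried on the whole sample at once in a Java sketch. Here java.util.regex stands in for Script Debugger's engine, and the class and method names are hypothetical. With `(?m)`, `^` and `$` match at each line boundary of the joined text. Only lines 1 and 3 are reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies Nigel's pattern to the whole sample at once.
final class NigelPattern {
    // The six sample lines joined into one multi-line string.
    static final String SAMPLE_TEXT = String.join("\n",
        "set s to \"mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring",
        "set s to \"another mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring two",
        "-- a mystring a --mystring",
        "a -- a mystring a --mystring");

    static final Pattern PATTERN =
        Pattern.compile("(?m)^.*(?<!--.{0,10})mystring[^-]*--.*$");

    // Collects the text of every match in the input.
    static List<String> findMatches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Without (?m), ^ and $ would only match at the very start and end of SAMPLE_TEXT.
        findMatches(SAMPLE_TEXT).forEach(System.out::println);
    }
}
```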

In other words, the lookbehind examines text that the opening “.*” has already matched. Because of the way the regex engine works (it needs to know the maximum length a lookbehind can match), the “*” and “+” operators aren’t permitted inside a lookbehind. A variable count is allowed, though, if a maximum is specified. One has to judge how many characters will be enough. I’ve used 0 to 10 here (“{0,10}”) for the number of possible characters between “--” and “mystring”.
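Here is a Java sketch that makes that judgment concrete. java.util.regex, like ICU, accepts a bounded quantifier such as {0,10} inside a look-behind, and it stands in for Script Debugger's engine here. The test line is hypothetical: its "mystring" is inside a comment, 31 characters after the "--". With a limit of 10, the look-behind can't reach back to the "--", so the line is wrongly matched. With a limit of 100, the line is correctly rejected.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: varies the look-behind's upper bound.
final class LookbehindWindow {
    // Hypothetical line: "mystring" is inside the comment, 31 characters after "--".
    static final String LINE = "set s to \"x\" -- a long trailing comment about mystring --note";

    // Nigel's pattern with the look-behind's upper bound as a parameter.
    static boolean matches(int maxGap) {
        String regex = "(?m)^.*(?<!--.{0," + maxGap + "})mystring[^-]*--.*$";
        return Pattern.compile(regex).matcher(LINE).find();
    }

    public static void main(String[] args) {
        System.out.println(matches(10));  // true: the "--" is out of reach
        System.out.println(matches(100)); // false: the "--" is within reach
    }
}
```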


Thanks Nigel. That works great.

It turns out I didn’t need to match “--” after mystring, so I modified Nigel’s solution as follows. I formatted it as a script because otherwise the forum software modifies it.

(?m)^.*(?<!--.{0,100})mystring.*$
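The effect of dropping the trailing `[^-]*--.*` can be seen in a Java sketch. java.util.regex stands in for Script Debugger's engine, and the class and method names are hypothetical. The seventh line, `set s to "mystring"`, is added only for illustration. Both patterns match lines 1 and 3, but only the simplified one also matches a line that has no comment at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class comparing Nigel's pattern with the simplified one.
final class SimplifiedPattern {
    static final String NIGEL = "(?m)^.*(?<!--.{0,10})mystring[^-]*--.*$";
    static final String SIMPLIFIED = "(?m)^.*(?<!--.{0,100})mystring.*$";

    // The six sample lines plus an illustrative seventh line with no comment.
    static final String[] LINES = {
        "set s to \"mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring",
        "set s to \"another mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring two",
        "-- a mystring a --mystring",
        "a -- a mystring a --mystring",
        "set s to \"mystring\"",
    };

    // Returns the 1-based numbers of the lines in which the regex finds a match.
    static List<Integer> matchingLines(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < LINES.length; i++) {
            if (pattern.matcher(LINES[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matchingLines(NIGEL));      // [1, 3]
        System.out.println(matchingLines(SIMPLIFIED)); // [1, 3, 7]
    }
}
```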

The ICU documentation for regular expressions contains the following explanation of a Negative Look-Behind assertion. I was especially interested in the last sentence, which confirms what Nigel explains in his post.

Negative Look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
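Both parts of that description can be checked with a short Java sketch. It uses java.util.regex rather than ICU itself, although Java likewise accepts a bounded look-behind. The class and method names are hypothetical. Because the assertion does not alter the input position, the matched text is just "mystring", and it starts where "mystring" starts. Because the look-behind text must end just before that position, an occurrence that follows "--" within 100 characters is never accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the look-behind on its own, without ^.* or .*$.
final class ZeroWidthLookbehind {
    static final Pattern PATTERN = Pattern.compile("(?<!--.{0,100})mystring");

    // Returns the start index of the first accepted "mystring", or -1 if none.
    static int firstAcceptedStart(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("set s to \"mystring\" --good afternoon");
        if (m.find()) {
            // The look-behind consumed nothing: the match is only "mystring".
            System.out.println(m.group() + " at index " + m.start()); // mystring at index 10
        }
        // Both occurrences follow "--" within 100 characters, so neither is accepted.
        System.out.println(firstAcceptedStart("a -- a mystring a --mystring")); // -1
    }
}
```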

Just to see if table formatting was helpful, I thought I’d use a markdown table to break down the regex pattern into its component parts. I wanted to force the “Regex Pattern” header onto one line, but nothing (including a non-breaking space) accomplished that.

| Regex Pattern | Explanation |
| --- | --- |
| `(?m)` | a flag option under which “^” and “$” match the start and end of each line rather than the start and end of the input string |
| `^` | the beginning of a line |
| `.*` | zero or more characters |
| `(?<!--.{0,100})` | negative look-behind assertion (see below) |
| `mystring` | a literal string |
| `.*` | zero or more characters |
| `$` | the end of a line |
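The breakdown in the table can also be written into the pattern itself. ICU and Java both support an `x` flag that ignores whitespace in a pattern and treats everything from `#` to the end of the line as a comment. The Java sketch below adds that flag, puts one component per line, and checks that the commented and compact forms match the same sample lines. The class and method names are hypothetical. It is an assumption, not something confirmed in this thread, that the Script Debugger Find bar would accept a multi-line pattern like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: the table's breakdown written into the pattern.
final class CommentedPattern {
    // The compact form from the post.
    static final String COMPACT = "(?m)^.*(?<!--.{0,100})mystring.*$";

    // The same pattern with the x flag added: whitespace is ignored and
    // everything from # to the end of a line is a comment.
    static final String COMMENTED =
        "(?mx)             # m: ^ and $ work per line; x: allow comments\n"
      + "^                 # the beginning of a line\n"
      + ".*                # zero or more characters\n"
      + "(?<!--.{0,100})   # not preceded by -- and up to 100 characters\n"
      + "mystring          # a literal string\n"
      + ".*                # zero or more characters\n"
      + "$                 # the end of a line\n";

    static final String[] SAMPLE = {
        "set s to \"mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring",
        "set s to \"another mystring\" --good afternoon",
        "set s to \"good afternoon\" --mystring two",
        "-- a mystring a --mystring",
        "a -- a mystring a --mystring",
    };

    // Returns the 1-based numbers of the sample lines in which the regex finds a match.
    static List<Integer> matchingLines(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < SAMPLE.length; i++) {
            if (pattern.matcher(SAMPLE[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matchingLines(COMMENTED)); // [1, 3]
        System.out.println(matchingLines(COMPACT));   // [1, 3]
    }
}
```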

The negative look-behind assertion rejects a “mystring” that is preceded by “--” with no more than 100 characters in between. If the “--” is farther back than that, the assertion doesn’t see it and the line is matched. The limit of 100 can be changed to whatever is needed.
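The limit is exact, which a Java sketch can confirm. java.util.regex stands in for Script Debugger's engine, and the class name and the run of "x" filler characters are illustrative. With exactly 100 characters between "--" and "mystring", the look-behind still reaches the "--" and the line is rejected. With 101, the "--" is out of reach and the line matches.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: probes the {0,100} boundary.
final class LookbehindLimit {
    static final Pattern PATTERN = Pattern.compile("(?m)^.*(?<!--.{0,100})mystring.*$");

    // Builds "--", then `gap` filler characters, then "mystring", and tests it.
    static boolean matchesWithGap(int gap) {
        String line = "--" + "x".repeat(gap) + "mystring";
        return PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWithGap(100)); // false: "--" is within reach
        System.out.println(matchesWithGap(101)); // true: "--" is out of reach
    }
}
```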