How to extract phrase between double square brackets in a string

Dear forum members, I have a question that is beyond my basic AppleScript skills:

Suppose I have the string “John is [[a boy]] and Mary is [[a girl]], they are [[brother]] and [[sister]]” and I want to extract all the phrases between the [[ ]] into a list: {“a boy”, “a girl”, “brother”, “sister”}. There may be duplicates, but I know how to deal with them.

How can I achieve this with a shell script or some other method? I would prefer not to use a third-party scripting addition if possible. I am dealing with about 300-500 words per string, and each string will likely have fewer than ten [[ ]] pairs.

Thank you very much in advance.

The easiest way is to use text item delimiters and then take all the even-numbered items of the result:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

set mystring to "John is [[a boy]] and Mary is [[a girl]], they are [[brother]] and [[sister]]"
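set text item delimiters to {"[[", "]]"}
set textlist to text items of mystring
set text item delimiters to "" -- reset to the default once the text is split
set mylist to {}
-- the even-numbered items are the phrases that sat between "[[" and "]]"
repeat with i from 2 to count of textlist by 2
	set end of mylist to item i of textlist
end repeat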
get mylist
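
To see why the even items are the right ones, here is a minimal sketch that wraps the same idea in a handler. The handler name phrasesBetweenBrackets is hypothetical (it is not from the posts above), and it saves and restores the delimiters so the caller's settings are left alone.

```applescript
-- Hypothetical helper (name is an assumption, not from the thread):
-- returns the phrases found between "[[" and "]]" in theText.
on phrasesBetweenBrackets(theText)
	-- Save the current delimiters so the caller's settings are not disturbed.
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"[[", "]]"}
	set pieces to text items of theText
	set AppleScript's text item delimiters to savedTIDs
	-- pieces alternates between text outside and inside the brackets, e.g.
	-- {"John is ", "a boy", " and Mary is ", "a girl", ""}
	set found to {}
	repeat with i from 2 to (count pieces) by 2
		set end of found to item i of pieces
	end repeat
	return found
end phrasesBetweenBrackets

phrasesBetweenBrackets("John is [[a boy]] and Mary is [[a girl]]")
--> {"a boy", "a girl"}
```

This relies on the brackets being balanced. With an unclosed "[[", the text after it still lands on an even position and is returned as if it were a phrase.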

It works!
I see that the trick is stepping by 2, and that using text item delimiters is really fast. One more question: should the text item delimiters method be (much) faster than a shell script if I have to run this on 500 strings, since the script doesn’t need to talk to another process (the shell)?

Thank you very much and have a nice evening.

It depends on how many strings there are, on their length, and on how many substrings you expect to find.
AppleScriptObjC would be the best choice for really large strings.

There are a few examples in that sub-forum.
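
For comparison, a shell-based version could look roughly like the sketch below. It is only an illustration: the handler name phrasesViaShell is made up, it assumes the stock macOS grep and sed, and it only finds phrases that sit on a single line. Each do shell script call launches a new shell process, so for 500 short strings this is likely to be slower than text item delimiters.

```applescript
-- Hypothetical helper (name is an assumption, not from the thread):
-- same extraction, delegated to grep and sed through do shell script.
on phrasesViaShell(theText)
	-- grep -oE prints each "[[...]]" match on its own line.
	set grepPart to " | grep -oE '\\[\\[[^]]+]]'"
	-- sed strips the leading "[[" and the trailing "]]" from every line.
	set sedPart to " | sed -e 's/^\\[\\[//' -e 's/]]$//'"
	set cmd to "printf '%s' " & quoted form of theText & grepPart & sedPart
	set shellOutput to do shell script cmd
	-- One phrase per line; an empty result gives an empty list.
	return paragraphs of shellOutput
end phrasesViaShell

phrasesViaShell("John is [[a boy]] and Mary is [[a girl]]")
--> {"a boy", "a girl"}
```

The quoted form of theText is what keeps quotes and other special characters in the input from being interpreted by the shell.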

Thanks again for the advice.

I ran some tests with Script Geek using robertfern’s approach. The list contained 500 items, each of which was the OP’s original string. The first-run time from a cold start was 47 milliseconds and, if modified to use reference-to operators, was 17 milliseconds. The time attributable to the read-file command alone was 9 milliseconds. A very good result.

set myString to paragraphs of (read file "Macintosh HD:Users:Robert:Working:New Text File.txt")

set newList to {}

set text item delimiters to {"[[", "]]"}

repeat with anItem in myString
	set myList to {}
	set textlist to text items of anItem
	repeat with i from 2 to count textlist by 2
		set end of myList to item i of textlist
	end repeat
	set end of newList to myList
end repeat

set text item delimiters to ""

Thanks for the info.

I don’t see an “a reference to” statement in your script. Are you referring to the list of paragraphs?

set myString to a reference to paragraphs of (read file "Macintosh HD:Users:Robert:Working:New Text File.txt")

Or do you mean using properties to define the two lists (myList and newList) inside a script object?

I will test both approaches.

Cheers

This is an AppleScriptObjC solution using a regular expression:


use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

set theString to "John is [[a boy]] and Mary is [[a girl]], they are [[brother]] and [[sister]]"

set cocoaString to current application's NSString's stringWithString:theString
set pattern to "\\[{2}([^]]+)]{2}"
set regex to current application's NSRegularExpression's regularExpressionWithPattern:pattern options:0 |error|:(missing value)
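-- use the NSString's own length (UTF-16 units); AppleScript's count can differ, e.g. when the text contains emoji
set matches to regex's matchesInString:cocoaString options:0 range:{location:0, |length|:cocoaString's |length|()}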
set theResult to {}
repeat with aMatch in matches
	set end of theResult to (cocoaString's substringWithRange:(aMatch's rangeAtIndex:1)) as text
end repeat
theResult

@StefanK

It would be better with these three instructions at the beginning:

use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

and if you add

 theResult

at the very end. :stuck_out_tongue:

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Sunday 21 June 2020 22:28:59

Ngan. The script in my earlier post does not use a-reference-to operators; I’ve included below the version that does. I renamed a few of the variables just to keep things clear.

set oldList to paragraphs of (read file "Macintosh HD:Users:Robert:Working:New Text File.txt")

set newList to {}
set newListRef to a reference to newList

set text item delimiters to {"[[", "]]"}

repeat with anItem in (a reference to oldList)
	set tempList to {}
	set textItems to text items of anItem
	repeat with i from 2 to (count textItems) by 2
		set end of tempList to item i of textItems
	end repeat
	set end of newListRef to tempList
end repeat

set text item delimiters to ""

I retested to confirm that the run time was 17 milliseconds and that the script returned the desired results.

Here are 3.5 versions to compare performance.
All of them are applied to a data set matching what was described in the original message.

Using a script object is an alternative way to benefit from references.

Old-fashioned AppleScript

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

script o
	property myStrings : {}
	property fullResults : {}
	property partialResults : {}
end script

-- Build a long string
set mystring to ""
repeat with i from 1 to 30
	set mystring to mystring & "John is [[a boy" & i & "]] and Mary is [[a girl" & i & "]], they are [[brother" & i & "]] and [[sister" & i & "]] "
end repeat
-- replicate it 500 times
set myList to {}
repeat 500 times
	set end of myList to mystring
end repeat
set mySource to my recolle(myList, linefeed)

-- Now the data to scan is ready
tell me to say "Go"
set startDate to current application's NSDate's |date|()
set o's fullResults to {}
set o's myStrings to paragraphs of mySource
set text item delimiters to {"[[", "]]"}

repeat with anItem in o's myStrings
	set o's partialResults to {}
	set textlist to text items of anItem
	repeat with i from 2 to count textlist by 2
		set end of o's partialResults to item i of textlist
	end repeat
	set end of o's fullResults to o's partialResults
end repeat

set text item delimiters to {""}
log o's fullResults
"That took " & (-(startDate's timeIntervalSinceNow()) as real) & " seconds."
--> "That took 0,874303936958 seconds."

#=====

on recolle(l, d)
	local oTIDs, t
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle

#=====

My attempts to use regex

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions


script o
	property myStrings : {}
	property fullResults : {}
	property partialResults : {}
end script

-- Build a long string
set mystring to ""
repeat with i from 1 to 30
	set mystring to mystring & "John is [[a boy" & i & "]] and Mary is [[a girl" & i & "]], they are [[brother" & i & "]] and [[sister" & i & "]] "
end repeat
-- replicate it 500 times
set myList to {}
repeat 500 times
	set end of myList to mystring
end repeat
set mySource to my recolle(myList, linefeed)

-- Now the data to scan is ready
tell me to say "Go"
set startDate to current application's NSDate's |date|()
set o's fullResults to {}

set o's myStrings to paragraphs of mySource

repeat with anItem in o's myStrings
	set aList to (its findPattern:"\\[\\[.+?\\]\\]" inString:(anItem as string))
	my decoupe(my supprime(my recolle(aList, linefeed), {"[[", "]]"}), linefeed)
	set end of o's fullResults to result
end repeat

log o's fullResults
"That took " & (-(startDate's timeIntervalSinceNow()) as real) & " seconds."
-- "That took 10,530691981316 seconds." if I disable the call to my decoupe.
-- "That took 10,77609705925 seconds." if I enable the call to my decoupe.

#=====

on findPattern:thePattern inString:theString
	
	set theNSString to current application's NSString's stringWithString:theString
	set theOptions to ((current application's NSRegularExpressionDotMatchesLineSeparators) as integer) + ((current application's NSRegularExpressionAnchorsMatchLines) as integer)
	set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:theOptions |error|:(missing value)
	set theFinds to theRegEx's matchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()}
	set theResult to {} -- we will add to this
	repeat with i from 1 to count of theFinds
		set theRange to (item i of theFinds)'s range()
		set end of theResult to (theNSString's substringWithRange:theRange) as string
	end repeat
	return theResult
end findPattern:inString:

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

on recolle(l, d)
	local oTIDs, t
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle

#=====
(*
replaces every occurrence of d1 with d2 in the text t
*)
on remplace(t, d1, d2)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d1}
	set l to text items of t
	set AppleScript's text item delimiters to d2
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end remplace

#=====
(*
removes every occurrence of d in the text t
*)
on supprime(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to ""
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end supprime

#=====

StefanK’s version, enhanced

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions


script o
	property myStrings : {}
	property fullResults : {}
	property partialResults : {}
end script

-- Build a long string
set mystring to ""
repeat with i from 1 to 30
	set mystring to mystring & "John is [[a boy" & i & "]] and Mary is [[a girl" & i & "]], they are [[brother" & i & "]] and [[sister" & i & "]] "
end repeat
-- replicate it 500 times
set myList to {}
repeat 500 times
	set end of myList to mystring
end repeat
set mySource to my recolle(myList, linefeed)

-- Now the data to scan is ready
tell me to say "Go"
set startDate to current application's NSDate's |date|()
set o's fullResults to {}

set o's myStrings to paragraphs of mySource
-- two constants defined only once
set pattern to "\\[{2}([^]]+)]{2}"
set regex to (current application's NSRegularExpression's regularExpressionWithPattern:pattern options:0 |error|:(missing value))
repeat with theString in o's myStrings
	set cocoaString to (current application's NSString's stringWithString:theString)
	set matches to (regex's matchesInString:theString options:0 range:{location:0, |length|:(count theString)})
	set o's partialResults to {}
	repeat with aMatch in matches
		set end of o's partialResults to (cocoaString's substringWithRange:(aMatch's rangeAtIndex:1)) as text
	end repeat
	set end of o's fullResults to o's partialResults
end repeat

log o's fullResults
"That took " & (-(startDate's timeIntervalSinceNow()) as real) & " seconds."
--> "That took 10,742474913597 seconds."

#=====

on recolle(l, d)
	local oTIDs, t
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle

#=====

The regex versions take more than 10 times as long as the old-fashioned AppleScript version.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Sunday 21 June 2020 22:55:37

Thanks again to all of you for your suggestions. It’s not the answers themselves but the techniques in them that are most valuable to me.

When I first read about script objects in the O’Reilly book and Sanderson’s book twelve months ago, I had no idea what they were about or why they were useful. Ten months ago, I started using SD to develop complex scripts for personal use that are rather intensive in their manipulation of lists and lists of reference objects. From there I became aware of how crucial script objects, properties, a reference to, and plists are to elevating the performance and functionality of AppleScript. I started reading through the many posts in this forum, and I have to say that it is a treasure for those of us who are not looking for a quick answer but want to advance our skills. This forum explains and illustrates how nuances in coding choices can turn a lacklustre script into a rather high-performing one. Thanks again to the founder of Late Night Software, who carries on maintaining this forum.