Forums are full of scripting problems that involve removing the extension from a name, extracting text preceded by a date from a photo title, finding and replacing words in a text document, creating web-friendly names by inserting underscores (“_”) in place of spaces, grabbing extensions from file names, or removing “/” characters and replacing them with colons in a path. The answers to these queries almost always involve AppleScript’s Text Item Delimiters. This scriptutorial, if I might call it that, tries to clarify TIDs by explaining how they work and giving some hopefully useful examples.
It’s useful to start the discussion with what we know about written text. In plain English, a delimiter is a character or string of characters used to separate or mark the ends of items of data. We use this idea unconsciously all the time. We put parentheses around words in a sentence to indicate that they are an “aside”. In word processing, we think of non-printing characters like space, tab, return, and linefeed as separating words. In anglophone texts, leading capital letters and ending periods are usually our markers for sentences, and return or linefeed characters separate blocks of text containing words and sentences into paragraphs. In HTML text, we use “tags” - special sets of delimiters containing information for the rendering engine of a browser. In forums, we use “bbCode” to do the same thing. The Script Editor automatically treats a double dash, “--”, followed by text up to the end of the line, or “(* … *)” enclosing several lines of text, as “not script”. Those symbols are the Script Editor’s delimiters for a comment in the midst of its otherwise plain text code.
Think of text item delimiters as having two main functions:
- Breaking a string of text into parts called “text items” that are separated by the delimiter chosen. The delimiter does not show in the list of text items.
- Inserting text between each of the items of a list when coercing it to a string of text. The text inserted will be the value of the current delimiter.
For many of the more common operations, these two functions are often used alternately, and we’ll be exploring examples of how to use that idea here.
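As a minimal sketch of the two functions used one after the other (the sample string and the variable names savedDelims, parts and joined are invented for illustration):

```applescript
-- Save the current delimiters so they can be restored afterwards.
set savedDelims to AppleScript's text item delimiters

-- Function 1: break a string into text items at each comma.
set AppleScript's text item delimiters to ","
set parts to text items of "red,green,blue"

-- Function 2: coerce the list back to text, inserting " / " between the items.
set AppleScript's text item delimiters to " / "
set joined to parts as text

-- Put the delimiters back the way we found them.
set AppleScript's text item delimiters to savedDelims
{parts, joined} --> {{"red", "green", "blue"}, "red / green / blue"}
```

The commas do not appear in the list, and the new delimiter appears only between items, never at the ends.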
AppleScript’s Words and Paragraphs use delimiters: AppleScript includes some built-in special delimiters with names: it is capable of discerning words delimited by some but not all non-printing characters, and paragraphs delimited by line feeds or carriage returns. Here are some examples.
words of "Hi, I'm Peggy-Sue" --> {"Hi", "I'm", "Peggy-Sue"}
words of "Now*is the.time&for all_good (folks) to learn-AppleScript"
--> {"Now", "*", "is", "the.time", "&", "for", "all", "_", "good", "folks", "to", "learn-AppleScript"}
words of "this string has a CR
in it" --> {"this", "string", "has", "a", "CR", "in", "it"} - but the return itself doesn't show.
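words of "this string has" & return & "in it"
-- {"this", "string", "has", "
-- ", "in it"} - this time the return does show: "words of" applies only to the first string, and & then appends return and "in it" to the resulting list as items.
words of "this string has a" & tab & "in it" --> {"this", "string", "has", "a", " ", "in it"} - likewise, the tab and "in it" are appended to the list as items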
words of "set-piece is hyphenated" --> {"set-piece", "is", "hyphenated"} - the hyphen isn't separated
words of "funny*word with asterisk" --> {"funny", "*", "word", "with", "asterisk"} - the asterisk is separate
set t to "This, he said, is a sentence."
words of t --> {"This", "he", "said", "is", "a", "sentence"}
-- Note that the commas didn't come through. The commas in the result are separating the words in the list.
paragraphs of "this string has a return
in it" --> {"this string has a return", "in it"}
AppleScript’s text item delimiters: AppleScript includes a global property called “text item delimiters” that can be set in a script. The default value is {“”}, i.e., scripts in which the text item delimiters are not specifically changed don’t alter text in any way. Note that while the default value of AppleScript’s text item delimiters is actually {“”}, in our AppleScripts, “” will do because AppleScript will usually treat single-item lists as the item the list contains. The reason for the list default is to allow for future expansion with multiple delimiters, but this has been so for years and multiple delimiters have yet to be implemented. Perhaps they never will be. See this article on “AppleScript Properties” for the definitive word.
In the discussion that follows, I have carefully referred to “AppleScript’s text item delimiters” in the examples. As a matter of interest, for AppleScripts not involving a tell block, “text item delimiters” by itself will work in most scripts without the “AppleScript’s” preface. Caution is required, however, when a tell block is involved because applications like TextEdit also use the AppleScript key words “text item delimiters” in their dictionaries, and instant confusion will result about whose text item delimiters are meant. Beginners should stick to the full version. Experienced scripters will know when to use the shortened form.
It is easy to see that there is no pre-set delimiter in AppleScript if you take a list of words and coerce them into a text string as in the example below. (Note that for the remainder of this article, I will omit the braces when setting AppleScript’s text item delimiters.)
set myWords to words of "The time has come the walrus said"
--> {"The", "time", "has", "come", "the", "walrus", "said"}
-- Now coerce that list back to a string:
myWords as string --> "Thetimehascomethewalrussaid"
-- That is squished together because no text "" is put between the items.
-- but if we set the AppleScript's text item delimiters to space...
set AppleScript's text item delimiters to space
set MW to myWords as string
set AppleScript's text item delimiters to "" -- ALWAYS SET THEM BACK
MW --> "The time has come the walrus said"
It is important to pay attention to the message: “ALWAYS SET THEM BACK”. AppleScript remembers its delimiters setting. Even if you open a new second script in the Script Editor, the delimiters you set in the first will apply in the second. The “always set them back afterwards” philosophy with delimiters will avoid serious problems later. Some people use an “always set them explicitly before use” approach, which works well within self-contained scripts. It is the view of many expert scripters, however, that it’s safest to script courteously but defensively. Reset the delimiters yourself before finishing or handing over to another script, but don’t assume that other scripts are doing the same for you. Noting what they are and setting them back when they are no longer needed is the best policy.
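One way to make “set them back” hold even when something goes wrong is to restore the delimiters inside an error handler as well. The following is only a sketch under that assumption; the handler name joinWith and its parameters are hypothetical, not part of any library:

```applescript
-- joinWith is a hypothetical helper: it joins a list with a delimiter
-- and restores the previous delimiters whether or not the coercion succeeds.
on joinWith(theList, theDelim)
	set savedDelims to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to theDelim
		set joined to theList as text
	on error errMsg number errNum
		-- Restore before passing the error along to the caller.
		set AppleScript's text item delimiters to savedDelims
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedDelims
	return joined
end joinWith

joinWith({"2024", "01", "15"}, "-") --> "2024-01-15"
```

Without the error branch, a failed coercion would leave the delimiters set to theDelim for whatever runs next.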
As a second example, the Script Editor uses ASCII character 10 (a line feed) between paragraphs so delimiting paragraphs by looking for line feeds is the same as simply asking for the paragraphs.
set txt to "line 1
line 2
line 3
line 4"
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 10 -- (a line feed)
set newTxt to text items of txt -- not text of, text items of
set AppleScript's text item delimiters to tid -- whatever they were before - ALWAYS SET THEM BACK!
newTxt --> {"line 1", "line 2 ", "line 3", "line 4"}
-- The script term "words" segregates text items using spaces, returns, and line feeds
words of txt --> {"line", "1", "line", "2", "line", "3", "line", "4"}
-- and the script term "paragraphs" produces the same result as the delimiters did above
paragraphs of txt --> {"line 1", "line 2 ", "line 3", "line 4"}
and if we look at txt as string with a line feed as the delimiter we get:
(* line 1
line 2
line 3
line 4 *)
Searching and Replacing: One of the most frequent uses of AppleScript’s text item delimiters is to search a string of text and replace one or more words in it with an alternative. Most text processors have a built-in tool to do this. Here’s a tool in AppleScript for finding and replacing text in a variable. It is in the form of a handler called “switchText”:
set ourText to "To be or not to be, that is the question."
set findThis to "be"
set replaceItWith to "script"
set newText to switchText of ourText from findThis to replaceItWith -- our call to the handler
--> "To script or not to script, that is the question."
-- but we'll continue:
set nextText to switchText of newText from " is the question" to " in doubt"
--> "To script or not to script, that in doubt."
-- and then again:
set lastText to switchText of nextText from "that" to "never"
--> "To script or not to script, never in doubt."
to switchText of theText from SearchString to ReplaceString
set OldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to SearchString
set newText to text items of theText
set AppleScript's text item delimiters to ReplaceString
set newText to newText as text
set AppleScript's text item delimiters to OldDelims
return newText
end switchText
Now what has happened here? Here’s a really nice explanation of what the handler is doing written by Kai Edwards (who I hope will forgive me for the editorial changes I’ve made to it):
set newText to switchText of "What, Purple Shoes?" from "Purple" to "Green"
to switchText of currentText from SearchString to ReplaceString -- the handler
set storedDelimiters to AppleScript's text item delimiters
-- this simply stores the current value of AppleScript's text item delimiters
-- so they can be restored later (thus helping to avoid potential problems elsewhere).
-- Remember, we always set them back to what they were.
set AppleScript's text item delimiters to SearchString
-- AppleScript's text item delimiters are now set to "Purple"
set currentText to currentText's text items -- note we have changed currentText's value
-- create a list of text items from the original text, separated at the points where the
-- current text item delimiter ("Purple") appeared.
--> {"What, ", " Shoes?"} - Note that the spaces and punctuation are retained.
set AppleScript's text item delimiters to ReplaceString
-- AppleScript's text item delimiters are now set to "Green"
set currentText to currentText as Unicode text
-- coerce the list {"What, ", " Shoes?"} to Unicode text. This operation will also
-- insert the current value of AppleScript's text item delimiters ("Green")
-- between each of the listed items
--> "What, Green Shoes?"
set AppleScript's text item delimiters to storedDelimiters
-- restore the value of AppleScript's text item delimiters
-- to whatever they were on entering the subroutine. Remember that a call to this
-- might have been made from within a section of script that had the TIDs set to
-- something else. Hand the result back with the TIDs as they were.
currentText
-- return the now modified text (and restored TIDs) -- "What, Green Shoes?"
end switchText -- the end of the handler.
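The same split-and-join pattern covers two of the chores mentioned at the start: web-friendly names and paths with colons in place of slashes. This is a sketch only; the handler name replaceText and the sample file names are invented for illustration, and it uses positional parameters rather than the labeled ones above:

```applescript
-- replaceText is a hypothetical name for the same split-then-join technique.
on replaceText(theText, searchString, replaceString)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchString
	set theItems to text items of theText -- split at every searchString
	set AppleScript's text item delimiters to replaceString
	set newText to theItems as text -- join with replaceString in between
	set AppleScript's text item delimiters to savedDelims
	return newText
end replaceText

replaceText("My Summer Photos.jpg", space, "_") --> "My_Summer_Photos.jpg"
replaceText("Users/adam/Documents/notes.txt", "/", ":") --> "Users:adam:Documents:notes.txt"
```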
Finding Base Name of File (without the extension): Another example often seen is removing an extension, i.e., finding the “base name” of a file. TIDs will do it, although they are assuredly not the only way. We could have reversed the characters of the file name and searched for the first period in a repeat loop, for example.
set jobNum to "123.456.pdf"
getBaseName from jobNum --> "123.456"
-- the following looks after the possibility that the base name includes a "."
-- e.g. This.fileName.ext. If underscores are wanted instead of spaces
-- in a name, uncomment the two commented lines in the handler.
to getBaseName from t -- (Kai Edwards)
set d to AppleScript's text item delimiters
set AppleScript's text item delimiters to "." -- separated at periods
if (count t's text items) > 1 then set t to t's text 1 thru text item -2
-- Aha, there is more than one, but the text is split at the second
--> "123.456"
-- This is actually the result we want, and we could stop but the next few
-- instructions deal with getting the text back to its initial Unicode
-- text or ASCII text form.
set t to t's text items -- splits t into a list again at the periods
tell t to set t to beginning & ({""} & rest) -- puts it back together to
-- preserve its "type". Basically what it does is to leave the result as
-- ASCII text if it was ASCII text or as Unicode text if it was Unicode text.
set AppleScript's text item delimiters to d -- always set them back again!
return t
end getBaseName
Note that the purpose of restoring the text to its original form is that it avoids potential problems later. This dichotomy of ASCII text versus Unicode text in AppleScript is a subject for another article sometime. It’s a constant source of confusion. Readers might refer to this article in “Joel on Software” for some hints.
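The companion task, grabbing the extension rather than the base name, can be sketched the same way. The handler name getExtension is hypothetical, and this version makes no attempt to preserve the text’s original class as the handler above does:

```applescript
-- getExtension is a hypothetical helper: it returns whatever follows the
-- last period, or "" when the name contains no period at all.
on getExtension(fileName)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set theItems to text items of fileName
	set AppleScript's text item delimiters to savedDelims
	if (count theItems) > 1 then
		return item -1 of theItems
	else
		return ""
	end if
end getExtension

getExtension("123.456.pdf") --> "pdf"
getExtension("README") --> ""
```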
Stripping Extra Spaces From Text: This example, by Nigel Garvey, shows how to remove an arbitrary number of spaces in front of and following some text to be retained. This often results from reading text that has been set in columns by separating the words in the rows by enough spaces to line up the columns in a monospaced font like Courier or Monaco. As Mr. Garvey said: this is by no means the only way to do this, but it happens to be a convenient way to cater for the possibility that the input string might be all spaces or zero-length.
set someText to "   -test bin  " -- three leading and two trailing spaces; as Unicode text
set ASTID to AppleScript's text item delimiters -- remember the old value
set AppleScript's text item delimiters to space -- the character we want to remove
set TIs to someText's text items -- get the list of items, {"", "", "", "-test", "bin", "", ""} -- more later on why the empty characters appear.
set a to 1
set b to (count TIs) --> 7 in this case
repeat while (a < b) and ((count item a of TIs) is 0) -- count the characters in the item
set a to a + 1
end repeat
--> a is now 4 for this example, i.e., "-test" is the 4th text item in TIs.
-- Stripping trailing spaces as well.
repeat while (b > a) and ((count item b of TIs) is 0) -- start at the end of TIs and go backwards
set b to b - 1
end repeat
--> b is now 5 for this example, i.e., "bin" is the 5th text item in TIs.
set strippedText to text from text item a to text item b of someText
set AppleScript's text item delimiters to ASTID -- SET THEM BACK!
strippedText --> "-test bin" with the internal space left intact
Why didn’t we lose the space between “-test” and “bin”? Because we counted in from the ends of the list of text items to get a and b and those counts never reached any spaces in the middle of the text items. Notice in this example that someText’s text items turned out to be {“”, “”, “”, “-test”, “bin”, “”, “”}. Why do the “empty” text items appear here but not in other examples? An “empty” item occurs whenever the delimiter occurs at the beginning or end of the given text, or whenever two delimiters are adjacent. Compare these:
set t to "Able was I ere I saw Elba"
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "Able"
set ti to text items of t
ti --> {"", " was I ere I saw Elba"}
-- OR
set AppleScript's text item delimiters to "Elba"
set tii to text items of t
tii --> {"Able was I ere I saw ", ""}
-- BUT
set AppleScript's text item delimiters to "ere"
set tiii to text items of t
tiii --> {"Able was I ", " I saw Elba"}
set AppleScript's text item delimiters to tid
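Empty items also turn up between two delimiters that sit side by side, which is where most of the empty items in the space-stripping example came from. A minimal sketch, with an invented sample string:

```applescript
set savedDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","
-- Two adjacent commas in the middle, and one comma at the very end.
set fields to text items of "a,,b,"
set AppleScript's text item delimiters to savedDelims
fields --> {"a", "", "b", ""}
```

There is always one more text item than there are delimiters in the text, so nothing is ever silently dropped.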
Just as an aside, the opposite of splitting the text at the middle word is finding the middle word using AppleScript’s “middle element reference”:
set t to "Able was I ere I saw Elba"
middle word of t --> "ere"
-- so we could have split this text like this:
set tid to AppleScript's text item delimiters -- save the current value so it can be restored below
set AppleScript's text item delimiters to middle word of t
-- Note that "middle" will return the left one of a pair if there is an even number of elements in the object of the command.
set tm to text items of t
set AppleScript's text item delimiters to tid
tm --> {"Able was I ", " I saw Elba"}
{item 1 of tm, (reverse of characters of item 2 of tm) as string} --> {"Able was I ", "ablE was I "} - A Palindrome because the middle word is too!
--> Not very useful, but fun. We could have done this about the middle character as well to prove that the phrase was a palindrome (punctuation and capitalization excepted, the same read in either direction).
A word of warning about case: AppleScript’s text item delimiters are case sensitive for plain ASCII text, but they are case insensitive for Unicode text. Further, as of this writing, AppleScript cannot deal with text item delimiters containing Unicode characters that do not map to Western Mac OS Roman characters. This can be problematic when reading text into a script that contains such characters even when they may be quite readable in TextEdit, for example. Exploring what to do about such characters is a complex topic for another day.
The solution when using mappable Unicode text where case is important is to use “considering case” in the script. The following examples illustrate:
set t to "Twas brillig and the slithy toves"
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "twas"
set ti to t's text items
set AppleScript's text item delimiters to tid
ti --> {"Twas brillig and the slithy toves"} -- no match.
-- Try for Unicode text, however:
set t to t as Unicode text
set AppleScript's text item delimiters to "twas"
set tiUCT to t's text items
set AppleScript's text item delimiters to tid
tiUCT --> {"", " brillig and the slithy toves"}
To fix that “miss” for Unicode text, we insert “considering case”
set t to "Twas brillig and the slithy toves" as Unicode text
considering case
set AppleScript's text item delimiters to "twas"
set tiU to t's text items
set AppleScript's text item delimiters to tid
end considering
tiU --> {"Twas brillig and the slithy toves"}
-- Now, "twas" is not equal to "Twas"
As our final examples, here are four scripts for dealing with multiple delimiters. The first pair, below, includes a handler for finding the text between two delimiters.
set t to "My father has spanked me, and my mother has spanked me; all my aunts and uncles have spanked me for my 'satiable curtiosity; and still I want to know what the Crocodile has for dinner!"
extractBetween(t, "my ", ";") --> "'satiable curtiosity"
---- The handler ----
to extractBetween(SearchText, startText, endText)
set tid to AppleScript's text item delimiters -- save them for later.
set AppleScript's text item delimiters to startText -- find the first one.
set endItems to text of text item -1 of SearchText -- everything after the last occurrence of startText.
set AppleScript's text item delimiters to endText -- find the end one.
set beginningToEnd to text of text item 1 of endItems -- get the first part.
set AppleScript's text item delimiters to tid -- back to original values.
return beginningToEnd -- pass back the piece.
end extractBetween
A more useful use of the same handler is to form the bbCode for a link in a forum from a webloc (anything but a Safari webloc created by dragging the favicon to the desktop - drag the text only if you use Safari). To illustrate this on this web page without confusing your browser and the software that produces the page, I must change the brackets used from the usual left and right chevrons “<->” for XML, and left and right brackets “[-]” for the bbCode. To actually use this script (which will run as is), you will have to change them back. I have made the chevrons into ^, and the brackets into |.
-- read (pathToYourWeblocHere) - text shown below as "p"
set p to "^?xml version=\"1.0\" encoding=\"UTF-8\"?^
^!DOCTYPE plist PUBLIC \"-//Apple Computer//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\"^
^plist version=\"1.0\"^
^dict^
^key^URL^/key^
^string^http://bbs.applescript.net/^/string^
^/dict^
^/plist^
"
-- I want the url from that webloc inserted in bbCode, say, with "AppleScript Forums" as the link text.
set link to "AppleScript Forums" -- the text for my link
set ex to extractBetween(p, "^string^", "^/string^") -- extract the URL
--> "http://bbs.applescript.net/"
set tURL to "|url=" & ex & "|" & link & "|/url|" -- form it into bbCode
--> "|url=http://bbs.applescript.net/|AppleScript Forums|/url|"
-- this, pasted into a forum, would look like a link but point to the url, after the symbols are changed back.
to extractBetween(SearchText, startText, endText)
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to startText
set endItems to text of text item -1 of SearchText
set AppleScript's text item delimiters to endText
set beginningToEnd to text of text item 1 of endItems
set AppleScript's text item delimiters to tid
return beginningToEnd
end extractBetween
As the final set of scripts we explore a handler for dealing with lists of TIDs instead of just a beginning and an ending one. We will use it for only two, however, just to illustrate. Although AppleScript’s text item delimiters are literally a list of one or more strings as was mentioned above, the versions of AppleScript current at the time of writing process only the first item of a list and ignore any others, present or not. If AppleScript’s text item delimiters were set to {“tid-1”, “tid-2”}, only “tid-1” would be processed. The following is an example by Jon Nathan of a script for processing multiple delimiters.
The handler below depends on the technique of inserting a string not likely to be encountered in any real text, like “?|?” for example, as a marker in place of each of the delimiters in the given list. To accomplish this, the handler runs through a repeat loop, replacing each delimiter in turn with the marker. Having marked them all, it splits the text at the markers to produce the list of text items, and then restores the original TIDs.
set the_string to "This is a string: with_multiple delimiters."
set the_delims to {":", "_"}
my multi_atid_split(the_string, the_delims)
--> {"This is a string", " with", "multiple delimiters."}
on multi_atid_split(the_string, the_delims)
-- store the originals and set up the marker.
set {OLD_delim, _marker_} to {AppleScript's text item delimiters, "?|?"}
-- process each of the delimiters in the_delims replacing each with the _marker_
repeat with this_delim in the_delims
my atid(this_delim) -- see the handler that follows
set the_string to text items of the_string
my atid(_marker_)
set the_string to text items of the_string as string
end repeat
-- At this point our text looks like this:
-- "This is a string?|? with?|?multiple delimiters."
my atid(_marker_) -- now get the markers out
set the_string to text items of the_string
my atid(OLD_delim) -- restore the original delimiters
return the_string
end multi_atid_split
-- This 3-line handler saves a lot of typing.
on atid(the_delim)
set AppleScript's text item delimiters to the_delim
end atid
This can be useful too when removing extraneous characters from text. Suppose we had phone numbers in a number of formats and for our database, we wanted a simple string of numbers including the area code but without parentheses, dashes or spaces in it. The following script does this for a sampling of four phone numbers (in the North American style) with and without area codes. It would be easy to modify it to accept a single phone number and clean out the characters we didn’t want.
-- Set the phone numbers to be tested
set phNums to {"456-4321", "876-789-1212", "(898) 321-2121", "505 1234"}
set localAC to "911" -- local area code to use when none is given.
-- Test all the numbers
set cleanedPN to {} -- a place to put the "cleaned" phone numbers
set removals to {"(", ")", "-", space} -- the characters to remove
repeat with k from 1 to count phNums
tell item k of phNums -- to permit using "it" for short
tell my multiTiD(it, removals) as string -- this does the job
if (count of it) > 7 then -- it's got an area code (different "it")
set end of cleanedPN to it -- add to our list
else -- it hasn't got an area code
set end of cleanedPN to localAC & it -- add with AC to our list
end if
end tell
end tell
end repeat
cleanedPN --> {"9114564321", "8767891212", "8983212121", "9115051234"}
--- handlers ---
on multiTiD(tString, delims)
set {saveOTID, _marker_} to {atid("otid"), "?|?"}
repeat with aDelim in delims
my atid(aDelim)
set tString to text items of tString
my atid(_marker_)
set tString to text items of tString as string
end repeat
my atid(_marker_)
set tString to text items of tString
my atid(saveOTID)
return tString
end multiTiD
on atid(delim)
if delim = "otid" then
return AppleScript's text item delimiters
else
set AppleScript's text item delimiters to delim
end if
end atid
Edit: An interested reader (JimT) wrote in to say that the approach above could be problematic if the marker inserted contained any characters to be included in the search. He proposed a workaround as follows:
set theString to "This is a string: with_multiple delimiters."
set theDelims to {":", "_"} -- remove the colon and underscore
set tTextItems to my multiDelimSplit(theString, theDelims)
--> {"This is a string", " with", "multiple delimiters."}
on multiDelimSplit(theString, theDelims)
set oldDelim to AppleScript's text item delimiters
set theList to {theString}
repeat with aDelim in theDelims
set AppleScript's text item delimiters to aDelim
set newList to {}
repeat with anItem in theList
set newList to newList & text items of anItem
end repeat
set theList to newList
end repeat
set AppleScript's text item delimiters to oldDelim
return theList
end multiDelimSplit
-- note that order of delims can matter if they have a common character
multiDelimSplit("This | is a | string.", {" | ", " "})
--> {"This", "is", "a", "string."}
multiDelimSplit("This | is a | string.", {" ", " | "})
--> {"This", "|", "is", "a", "|", "string."}
-- in the second case, the spaces used by the second delim
-- were removed by the first delim
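For comparison, here is a sketch of what the built-in route looks like on a later system. It rests on an assumption that is not part of the article above: that the script runs under AppleScript 2.1 (Mac OS X 10.6) or later, where, as far as I know, a list of delimiters is honored when splitting text. On the older versions this article describes, only the first delimiter in the list would be used.

```applescript
-- Assumption: AppleScript 2.1 or later, where every delimiter in the list
-- is used when splitting. Older versions use only the first one.
set savedDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to {":", "_"}
set theItems to text items of "This is a string: with_multiple delimiters."
set AppleScript's text item delimiters to savedDelims
theItems
```

On such a version I would expect the same three items that multiDelimSplit returns for this string. The handlers above remain the portable choice when the AppleScript version is unknown.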
For even more good examples, search the forums for “AppleScript’s text item delimiters”. There are hundreds of examples and one of them might be just what you need.
Another important example was brought to my attention on June 10, 2010; namely this handler by Yvan Koenig for extracting every instance of text that occurs between bounding delimiters:
-- Extract every instance of text between bounding delimiters (Yvan Koenig)
set t to "My father has spanked me, and my mother has spanked me; all my aunts and uncles have spanked me for my 'satiable curtiosity; and still I want to know what the Crocodile has for dinner!"
set extract to extractBetween(t, "my ", ";") --> {"mother has spanked me", "'satiable curtiosity"}
---- The handler ----
to extractBetween(SearchText, startText, endText)
set tid to AppleScript's text item delimiters -- save them for later.
set AppleScript's text item delimiters to startText -- find the first one.
set liste to text items of SearchText
set AppleScript's text item delimiters to endText -- find the end one.
set extracts to {}
repeat with subText in liste
if subText contains endText then
copy text item 1 of subText to end of extracts
end if
end repeat
set AppleScript's text item delimiters to tid -- back to original values.
return extracts
end extractBetween
Rather neat.
Adam Bell