Removing leading and trailing space characters from a string, plus a couple of text-related questions

Hi,
Unfortunately, I have spent the past sixteen years writing most of my code in xTalk in the LiveCode IDE. A feature of xTalk is that almost every variable is a string and the engine handles type conversions. This has coloured my thinking, and I am struggling to get to grips with how AppleScript handles strings.

I wanted to replicate a “simple” xTalk command in AppleScript:

put "     Now is the time for all good men to come to the aid of the party     " into mystring
  
   repeat until the first char of mystring is not space
      delete the first char of mystring
   end repeat
   
   repeat until the last char of mystring is not space
      delete the last char of mystring
   end repeat

Here is my AppleScript version:

set mystring to "     Now is the time for all good men to come to the aid of the party     "

set mystring to TrimLeading(mystring, " ")
set mystring to TrimTrailing(mystring, " ")

display dialog "-->" & mystring & "<--"


on TrimLeading(pString, pTgt)
	set AppleScript's text item delimiters to ""
	set tFirstChar to item 1 of pString as text
	repeat while tFirstChar = pTgt -- compare with the requested character, not a hard-coded space
		set pString to text 2 thru end of pString
		set tFirstChar to item 1 of pString as text
	end repeat
	return pString
end TrimLeading

on TrimTrailing(pString, pTgt)
	set AppleScript's text item delimiters to ""
	set tLastChar to item -1 of pString as text
	repeat while tLastChar = pTgt -- compare with the requested character, not a hard-coded space
		set pString to text 1 thru -2 of pString
		set tLastChar to item -1 of pString as text
	end repeat
	return pString
end TrimTrailing

While the AppleScript version allows any character to be specified in pTgt, which makes it more powerful than my xTalk example, I have a feeling that I am missing an obvious way of trimming characters in AppleScript.

Also, what is the difference between coercing with “as text” and with “as string”? Is text just text, whereas a string is a list {} of characters?

Hi Simon.

You don’t need to set AppleScript’s text item delimiters in your handlers since nothing in the code is affected by them.

Here's slightly different code that you may like:

set mystring to "     Now is the time for all good men to come to the aid of the party     "

set mystring to TrimLeading(mystring, " ")
set mystring to TrimTrailing(mystring, " ")

display dialog "-->" & mystring & "<--"


on TrimLeading(pString, pTgt)
	repeat while (pString begins with pTgt)
		set pString to text 2 thru end of pString
	end repeat
	return pString
end TrimLeading

on TrimTrailing(pString, pTgt)
	repeat while (pString ends with pTgt)
		set pString to text 1 thru -2 of pString
	end repeat
	return pString
end TrimTrailing

Both your handlers and mine will error if the string contains only the characters you want to trim. There’s also an ASObjC method, which probably isn’t much more efficient, but it does cope with strings containing only the characters to trim without the need for extra code:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

set mystring to "     Now is the time for all good men to come to the aid of the party     "

set mystring to current application's NSString's stringWithString:(mystring) -- Get an Objective-C version of the text.
set charactersToTrim to current application's NSCharacterSet's whitespaceCharacterSet()
set mystring to mystring's stringByTrimmingCharactersInSet:(charactersToTrim)
set mystring to mystring as text -- AppleScript version of the edited string.

display dialog "-->" & mystring & "<--"
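For anyone who would rather stay in plain AppleScript, here is a minimal sketch of one way to avoid the error on strings made up only of the character being trimmed. It assumes a single-character target, and the handler names trimLeadingSafe and trimTrailingSafe are hypothetical, adapted from the TrimLeading/TrimTrailing handlers in this thread.

```applescript
-- Hypothetical variants of TrimLeading/TrimTrailing that return ""
-- instead of erroring when nothing but pTgt is left in the string.
-- Assumes pTgt is a single character.
on trimLeadingSafe(pString, pTgt)
	repeat while (pString begins with pTgt)
		if (count pString) is 1 then return "" -- only the target character was left
		set pString to text 2 thru end of pString
	end repeat
	return pString
end trimLeadingSafe

on trimTrailingSafe(pString, pTgt)
	repeat while (pString ends with pTgt)
		if (count pString) is 1 then return "" -- only the target character was left
		set pString to text 1 thru -2 of pString
	end repeat
	return pString
end trimTrailingSafe

-- Usage: an all-space string now comes back empty rather than erroring.
trimTrailingSafe(trimLeadingSafe("     ", " "), " ") --> ""
```

The check on count happens before slicing, so "text 2 thru end" is never asked for on a one-character string, which is the case that raises the error.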

They’re essentially the same. The official expression’s “as text”, but “as string” does exactly the same thing in most cases. Historically, “as string” was the original expression, when AppleScript still used MacRoman encoding for text. There was a transitional period when it began to be possible to get UTF16BE Unicode text results instead and at that time you’d have to use “as string” for MacRoman results and “as Unicode Text” for the Unicode versions. With the introduction of AppleScript 2.0 (many years ago now!), AppleScript text became UTF16BE anyway and the official expression for coercing to it became “as text”. However, for compatibility with earlier scripts, “as string” and “as Unicode text” remained valid, although now returning the same results as “as text”. It’s very occasionally necessary to use “as string” rather than either of the other two, such as when using the File Read/Write commands to read a file containing MacRoman-encoded text, or if using an ancient hack that someone posted here the other week:

(current date) as «class isot» as string

While on the subject of historical interest, your xTalk put … into syntax is how what is now AppleScript’s copy command used to be expressed back in the really distant past. It still compiles now, although it's converted to the copy form. The equivalent of set was " Now is the time for all good men to come to the aid of the party " returning mystring, which still compiles in that form and still works! :slightly_smiling_face:


“as string”, “as text” and “as Unicode text” are the same.

set aVar to "abcd" as text
set bVar to "abcd" as string
set cVar to "abcd" as Unicode text

set aCmp to (aVar = bVar)
set bCmp to (aVar = cVar)
set cCmp to (bVar = cVar)

return {aCmp, bCmp, cCmp}
-->{true, true, true}
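Equality is one check; asking for the class of each result is another. A quick sketch, assuming AppleScript 2.0 or later, where "string" and "Unicode text" are kept only as synonyms for text:

```applescript
-- All three coercions should report the same class in AppleScript 2.0+.
set textClass to class of ("abcd" as text)
set stringClass to class of ("abcd" as string)
set unicodeClass to class of ("abcd" as Unicode text)
{textClass, stringClass, unicodeClass} --> {text, text, text}
```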

Thanks for all your code. Nigel’s is both clearer and more concise than my example.

Peavine’s regex solution is interesting and I have made a note of it as I may have some long text lists to process.

Drifting off topic: how do you store useful code/handlers for use in other projects?

This sort of follows your original xTalk approach but the collection of characters oscillates between string and list a few times to use the rest property of lists. Not sure how that affects performance — probably not too much with short strings and a lot with long strings.

Basically, if the first character is a space, it converts the string to a list and drops item 1. When done, it reverses the character order and repeats the process to get rid of the trailing spaces.

set mys to "     Now is the time for all good men to come to the aid of the party     "

-- trim from the beginning of the beginning
set mys to shedSpace(mys)

-- reverse character order
set mys to reverse of characters of mys as text

-- trim from the beginning of the end
set mys to shedSpace(mys)

-- set character order right
set mys to reverse of characters of mys as text


-- trim spaces from beginning of string
on shedSpace(mystring)
	repeat until the first character of mystring is not space
		set mystring to the (rest of characters of mystring) as text
	end repeat
	return mystring -- return the trimmed text explicitly, even if the loop never ran
end shedSpace

On the matter of string or text, there is one other thing to consider. Applications can define them in their own way, which may not be identical to AppleScript's. For example, it can be an issue with the application DEVONthink.

That is a novel way of removing the characters. I tried searching for the definition of “rest of” but failed to find it.

Your code also shows, if one cares to look and as you say, how the string/text switches between a string and a list of characters. I note what you write about how applications may differ in how they interpret text and string in commands.

But ensure the text item delimiters are set to "" or to {""} first! :grin:
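To see why the delimiters matter for a line like (rest of characters of mystring) as text: coercing a list back to text joins its items with the current text item delimiters. A small demonstration; the variable names are just for illustration.

```applescript
-- Coercing a list to text joins its items with the current text item delimiters.
set savedTIDs to AppleScript's text item delimiters

set AppleScript's text item delimiters to {""}
set joinedPlain to (rest of characters of " abc") as text --> "abc"

set AppleScript's text item delimiters to "-"
set joinedDashed to (rest of characters of " abc") as text --> "a-b-c"

set AppleScript's text item delimiters to savedTIDs -- put back whatever was there before
{joinedPlain, joinedDashed}
```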


You can see it in the chapter on classes linked below. And on the matter of switching between string and list, Nigel’s comment is worth noting. The script is brief but beware if there is any chance of delimiters not being the default.

The property is actually just rest. The ‘of’ here is the typical one, in that it ties the property to the specified list of elements.
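A few quick examples of rest on its own, to show what it returns before any coercion back to text:

```applescript
-- rest is a property of lists: every item except the first.
set a to rest of {1, 2, 3} --> {2, 3}
set b to rest of {1} --> {} (a one-item list leaves an empty list)
set c to rest of characters of "abc" --> {"b", "c"}
{a, b, c}
```

Because rest always yields a list, the text item delimiters come into play as soon as the result is coerced back with as text.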

It is indeed novel, but isn’t there a way to remove all the space characters on one side in a single step, rather than using a repeat loop that copies the shrinking string once for every space character removed? While elegant, it just feels inefficient. Could you explain?

But no ObjC solution pls.

@Mockman, I’m an AS newbie so be gentle…

Here is my trimWhitespace routine…

on trimWhitespace(aString)
	local s, e, i, ls, ws
	-- space, option-space (no-break space), tab, linefeed, return
	set ws to space & (character id 160) & tab & linefeed & return
	set ls to length of aString
	set s to 0 -- stays 0 if no non-whitespace character is found
	repeat with i from 1 to ls
		if character i of aString is not in ws then
			set s to i
			exit repeat
		end if
	end repeat
	if s is 0 then return "" -- empty or all-whitespace input
	repeat with e from ls to s by -1
		if character e of aString is not in ws then exit repeat
	end repeat
	return text s thru e of aString
end trimWhitespace

The first two characters of ws are a space and an option-space (a non-breaking space).

Alas, there is not. That’s as gentle as I can be.

Others are much more knowledgeable than I, but I can’t help thinking that at some point Apple decided not to invest any more time in developing the core language. Instead, it offered access to more advanced capabilities through ASObjC (or JavaScript, sort of).

So, since regex wasn’t really a thing back in the day, it was never added to the language, but if you do need regex, you can access it through ASObjC or the shell, where you can use grep, sed, awk, Python or whatever.
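For the shell route, here is a minimal sketch using sed. The handler name shellTrim is hypothetical, and it assumes single-line input, since do shell script converts line endings in its output.

```applescript
-- Hypothetical handler: trims leading and trailing whitespace with sed.
-- quoted form of keeps quotes, $ and backticks in the text away from the shell;
-- printf '%s' passes the text through without interpreting backslashes.
-- Runs of spaces inside the text are left alone.
on shellTrim(theText)
	return do shell script "printf '%s' " & quoted form of theText & " | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//'"
end shellTrim

shellTrim("     Now is the time     ") --> "Now is the time"
```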

So the solution seems to be to use one of those external toolsets or roll your own. This affects tasks like sorting as well, and to my mind… date strings.

Any routine that removes an as-yet-unknown number of spaces from the ends of a piece of text has to iterate through the characters of the text being processed. There is no other way. So if it is done in AppleScript, the workings are on display, whereas the workings of a non-AppleScript solution will most likely be hidden from inspection.

The question is whether a non-AppleScript solution offers an advantage over the AppleScript one.

Peavine offers an alternative that uses a regular expression (via ASObjC):

use framework "Foundation"
use scripting additions

on getTrimmedText(theString)
	-- (?m) lets ^ and $ match at every line break, so horizontal whitespace (\h) is trimmed from each line
	set thePattern to "(?m)^\\h+|\\h+$"
	set theString to current application's NSString's stringWithString:theString
	set theString to (theString's stringByReplacingOccurrencesOfString:thePattern withString:"" options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()})
	return theString as text
end getTrimmedText

and Nigel offers an ASObjC version:

set mystring to current application's NSString's stringWithString:(mystring) -- Get an Objective-C version of the text.
set charactersToTrim to current application's NSCharacterSet's whitespaceCharacterSet()
set mystring to mystring's stringByTrimmingCharactersInSet:(charactersToTrim)
set mystring to mystring as text -- AppleScript version of the edited string.

display dialog "-->" & mystring & "<--"

I have read that AppleScript is efficient at accessing items in a list {}, so the question boils down to this: does using a routine written in a different language give an overall speed advantage over an all-AppleScript version, and is any difference worthwhile? For example, I don’t know grep or ASObjC, so I know that if I used either, at some point in the future I would be trying to remember exactly what was doing what.

If I have time tomorrow, I’ll try to get some timings for all of the handlers posted above.

S


Guess I’ll be sticking with scripts filled with arcane UNIX…

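on trim(theText)
	-- quoted form of stops quotes, $ and backticks in theText from breaking the shell command;
	-- xargs itself still treats quotes specially and collapses runs of spaces
	return (do shell script "printf '%s' " & quoted form of theText & " | xargs")
end trim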

set theStr to "       test the arcane UNIX approach       "
trim(theStr)

How do you guys time routines anyway? I’d like to learn how to compare functions for speed.

if you’re feeling brave…

set theText to "     Now is the time for all good men to come to the aid of the party     "
set AppleScript's text item delimiters to " "
(words of theText) as text -->"Now is the time for all good men to come to the aid of the party"

That’s more in line with what I was thinking. What side effects would occur if I were brave?

This would remove all punctuation, collapse runs of spaces, etc. throughout your strings, not just at the ends. Unless you know for certain that the data will never be affected by these changes, or you actually want those effects, this code is not safe.
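A small demonstration of those side effects on a short, made-up string; note that punctuation and the extra spaces in the middle disappear, not just the padding at the ends.

```applescript
-- The words-based one-liner rebuilds the text from its words only.
set savedTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to " "
set theResult to (words of "  Hello,   world!  ") as text --> "Hello world"
set AppleScript's text item delimiters to savedTIDs -- restore the previous delimiters
theResult
```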

An alternative would be:

set theText to "     Now is the time for all good men to come to the aid of the party     "
text from word 1 to word -1 of theText

This only loses spaces and punctuation before the first word and after the last.
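One caveat: if the text contains no words at all (empty, only spaces or only punctuation), word 1 doesn't exist and the one-liner errors. A minimal guarded sketch; the handler name trimToWords is hypothetical.

```applescript
-- Hypothetical wrapper around: text from word 1 to word -1 of theText
on trimToWords(theText)
	-- No words means no "word 1", so return "" instead of erroring.
	if (count words of theText) is 0 then return ""
	return text from word 1 to word -1 of theText
end trimToWords

trimToWords("   Hi there.   ") --> "Hi there" (the trailing period goes too)
trimToWords("     ") --> ""
```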


My approach suffers from the same problem, eliminating all extra spaces between words… :frowning:

Guess I’ve been lucky that multiple spaces in text strings were always undesired.

Most forum members use the free Script Geek app.
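If you'd like a rough in-script measurement as well, here is a sketch of a timing harness. It borrows ASObjC only for a sub-second clock (NSDate's timeIntervalSinceReferenceDate), because current date only resolves to whole seconds. The iteration count is an arbitrary choice, and the handler being timed is just Nigel's TrimLeading; swap in whichever handler you want to compare.

```applescript
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Handler under test; replace it with any of the trimming handlers in this thread.
on TrimLeading(pString, pTgt)
	repeat while (pString begins with pTgt)
		set pString to text 2 thru end of pString
	end repeat
	return pString
end TrimLeading

set testString to "     Now is the time for all good men to come to the aid of the party     "
set iterations to 1000 -- repeat many times so the total time is measurable

set startTime to current application's NSDate's timeIntervalSinceReferenceDate()
repeat iterations times
	TrimLeading(testString, " ")
end repeat
set elapsed to (current application's NSDate's timeIntervalSinceReferenceDate()) - startTime

"Average: " & (elapsed / iterations * 1000) & " ms per call"
```

Timing each candidate with the same string and iteration count keeps the comparison fair; very short runs are dominated by noise.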

Nice! Terse.

Assuming that any leading punctuation should go, we can retain trailing punctuation by getting the last non-space character and using text item delimiters (TIDs).

set theText to "                 TIL “Now is the time for all good men to come to the aid of their party.” is a typing drill written by a teacher named Charles E. Weller. “Now is the time for all good men to come to the aid of their country.” is now used instead, because it exactly fills out a 70-space line. --u/tampared.     --Michael Scott;>?}                    "

Trim(theText) -->"TIL “Now is the time for all good men to come to the aid of their party.” is a typing drill written by a teacher named Charles E. Weller. “Now is the time for all good men to come to the aid of their country.” is now used instead, because it exactly fills out a 70-space line. --u/tampared.     --Michael Scott;>?}"

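on Trim(theText)
	set savedTIDs to AppleScript's text item delimiters
	-- Drop everything before the first word, then split on spaces.
	set {theText, AppleScript's text item delimiters} to {text from word 1 to end of theText, " "}
	-- Rejoined with "", the pieces contain no spaces, so their last character is the last non-space one.
	set {theTempText, AppleScript's text item delimiters} to {text items of theText, ""}
	set AppleScript's text item delimiters to character -1 of (theTempText as text)
	-- Split on that character, drop the trailing piece after its last occurrence, and rejoin.
	set theText to ((text items 1 thru -2 of theText & "") as text)
	set AppleScript's text item delimiters to savedTIDs -- don't leave the TIDs changed
	return theText
end Trim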