Trim [Remove Spaces]

trim:

-- theseCharacters : A list of characters to trim
-- someText        : The text to be trimmed
--
on trim(theseCharacters, someText)
	-- Lazy default (AppleScript doesn't support default values)
	if theseCharacters is true then set theseCharacters to ¬
		{" ", tab, ASCII character 10, return, ASCII character 0}
	
	repeat until first character of someText is not in theseCharacters
		set someText to text 2 thru -1 of someText
	end repeat
	
	repeat until last character of someText is not in theseCharacters
		set someText to text 1 thru -2 of someText
	end repeat
	
	return someText
end trim

-- Example
trim(true, "  Hello, World!	")

A slightly simpler version that only removes spaces:

on trim(someText)
	repeat until someText does not start with " "
		set someText to text 2 thru -1 of someText
	end repeat
	
	repeat until someText does not end with " "
		set someText to text 1 thru -2 of someText
	end repeat
	
	return someText
end trim
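
One caveat with the version above: if the text consists only of spaces, the first loop eventually asks for text 2 thru -1 of a one-character string and raises an error. A minimal guarded sketch follows; the handler name trimSpaces is hypothetical and only used here to avoid clashing with the handlers above.

```applescript
on trimSpaces(someText)
	-- Strip leading spaces; bail out before indexing past a lone space
	repeat while someText starts with " "
		if (count of someText) is 1 then return ""
		set someText to text 2 thru -1 of someText
	end repeat
	
	-- At this point the text is empty or starts with a non-space character,
	-- so stripping trailing spaces can never empty it
	repeat while someText ends with " "
		set someText to text 1 thru -2 of someText
	end repeat
	
	return someText
end trimSpaces

set trimmedHello to trimSpaces("  Hello, World!  ")
set trimmedBlank to trimSpaces("   ")
set trimmedEmpty to trimSpaces("")
```

The empty string and the all-spaces string both come back as "" instead of raising an error.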

@Bruce Phillips: Thanks for sharing your code for a trim function.

I have optimized it somewhat to hopefully:

  1. Improve performance for large strings by updating the source string only after calculating the area to be trimmed.
  2. Use more descriptive variable names.
  3. Add an option to trim the left side, the right side, or both sides of the string.

But the core code is the same.


--

set strTest to "\tsome text here  and here\t  "
my trimThis(strTest, true, "full")

on trimThis(pstrSourceText, pstrCharToTrim, pstrTrimDirection)
	
	-- pstrCharToTrim 	 : A list of characters to trim, or true to use default
	-- pstrSourceText    : The text to be trimmed
	-- pstrTrimDirection : Direction of Trim ("right","left", "full")
	
	set strTrimedText to pstrSourceText
	
	---	USE DEFAULT IF true IS PASSED ---
	-- Lazy default (AppleScript doesn't support default values)
	
	if pstrCharToTrim is true then
		set pstrCharToTrim to {" ", tab, ASCII character 10, return, ASCII character 0}
	end if
	
	--- TRIM LEFT SIDE OF STRING ---
	
	if (pstrTrimDirection = "full") or (pstrTrimDirection = "left") then
		set iLoc to 1
		repeat until character iLoc of strTrimedText is not in pstrCharToTrim
			set iLoc to iLoc + 1
		end repeat
		
		set strTrimedText to text iLoc thru -1 of strTrimedText
	end if
	
	--- TRIM RIGHT SIDE OF STRING ---
	
	
	if (pstrTrimDirection = "full") or (pstrTrimDirection = "right") then
		set iLoc to count of strTrimedText
		repeat until character iLoc of strTrimedText is not in pstrCharToTrim
			set iLoc to iLoc - 1
		end repeat
		
		set strTrimedText to text 1 thru iLoc of strTrimedText
		
	end if
	
	return strTrimedText
	
end trimThis


Thanks for the update, Michael, but when bringing old posts back to life you should also consider how AppleScript has changed through the years. Improvements over Michael's script are:

  • Since AppleScript 2.0 the script should use (character) id instead of ASCII character command.
  • AppleScript 2.0 is Unicode, by default all Unicode spaces are supported now, not only limited to MacRoman whitespaces
  • BugFix: The handler will return “” if the string contains only whitespaces
  • BugFix: The handler will return “” if the string is empty
  • Optimization: The string will only be trimmed once, not twice (text n thru n …)
  • Use the justification constants left and right to indicate the direction. With missing value (or any other value) a full trim is applied.
  • pstrCharToTrim now works with a more obvious value than a boolean. Use missing value (read: undefined) to use the default whitespace characters. The defaults are also used when the given object is not a list, instead of returning an error.

set strTest to "	Hello World!  "
trimThis(strTest, missing value, left) -- result:  "Hello World!  "
trimThis(strTest, missing value, right) -- result:  "	Hello World!"
trimThis(strTest, {tab, return, linefeed}, missing value) -- result:  "Hello World!  "

on trimThis(pstrSourceText, pstrCharToTrim, pstrTrimDirection)
	-- pstrCharToTrim 	 : A list of characters to trim, or missing value to use default
	-- pstrSourceText    : The text to be trimmed
	-- pstrTrimDirection : Direction of Trim left, right or any value for full
	
	set strTrimedText to pstrSourceText
	
	-- If undefined, use default whitespaces
	if pstrCharToTrim is missing value or class of pstrCharToTrim is not list then
		-- trim tab, newline, return and all the unicode characters from the 'separator space' category
		-- http://www.fileformat.info/info/unicode/category/Zs/list.htm
		set pstrCharToTrim to {tab, linefeed, return, space, character id 160, character id 5760, character id 8192, character id 8193, character id 8194, character id 8195, character id 8196, character id 8197, character id 8198, character id 8199, character id 8200, character id 8201, character id 8202, character id 8239, character id 8287, character id 12288}
	end if
	
	set lLoc to 1
	set rLoc to count of strTrimedText
	
	--- From left to right, get location of first non-whitespace character
	if pstrTrimDirection is not right then
		repeat until lLoc = (rLoc + 1) or character lLoc of strTrimedText is not in pstrCharToTrim
			set lLoc to lLoc + 1
		end repeat
	end if
	
	-- From right to left, get location of first non-whitespace character
	if pstrTrimDirection is not left then
		repeat until rLoc = 0 or character rLoc of strTrimedText is not in pstrCharToTrim
			set rLoc to rLoc - 1
		end repeat
	end if
	
	if lLoc > rLoc then -- equal positions mean exactly one character remains
		return ""
	else
		return text lLoc thru rLoc of strTrimedText
	end if
end trimThis

And why not :wink:

use AppleScript version "2.4"
use framework "Foundation"

set strTest to "	Hello World! "
trimThis(strTest, missing value, left) -- result: "Hello World! "
trimThis(strTest, missing value, right) -- result: "	Hello World!"
trimThis(strTest, tab & return & linefeed, missing value) -- result: "Hello World! "

on trimThis(pstrSourceText, pstrCharToTrim, pstrTrimDirection)
	-- pstrCharToTrim     : A string of characters to trim, or missing value (or true) to use default
	-- pstrSourceText : The text to be trimmed
	-- pstrTrimDirection : Direction of Trim left, right or any value for full
	if pstrCharToTrim = missing value or pstrCharToTrim = true then
		set setToTrim to current application's NSCharacterSet's whitespaceAndNewlineCharacterSet()
	else
		set setToTrim to current application's NSCharacterSet's characterSetWithCharactersInString:pstrCharToTrim
	end if

	set anNSString to current application's NSString's stringWithString:pstrSourceText
	if pstrTrimDirection = left then
		set theRange to anNSString's rangeOfCharacterFromSet:(setToTrim's invertedSet())
		if |length| of theRange = 0 then return ""
		set anNSString to anNSString's substringFromIndex:(theRange's location)
	else if pstrTrimDirection = right then
		set theRange to anNSString's rangeOfCharacterFromSet:(setToTrim's invertedSet()) options:(current application's NSBackwardsSearch)
		if |length| of theRange = 0 then return ""
		set anNSString to anNSString's substringToIndex:(theRange's location)
	else
		set anNSString to anNSString's stringByTrimmingCharactersInSet:setToTrim
	end if
	return anNSString as text
end trimThis

Considerably slower on the sample string, but gets more competitive with longer strings.

@DJ: Thanks for the backhanded compliment. :frowning:

When providing what you think is an improved version of someone else’s script, it is not necessary to be critical if nothing wrong was done. Whether or not your version is better will be left up to the reader.

I see nothing wrong with using the ASCII character command. It works and is more obvious than the character ID approach.
I also think it is more obvious to use a value of “full” rather than missing value for parameter pstrTrimDirection. There’s no reason not to allow for both.

Your “optimization” of trimming once instead of twice is a very minor optimization. But it is good.

Thanks for sharing your version. It is good the thread is alive again. :cool:

That was not my intention, and I didn’t mean to.

Me neither, but I prefer to follow Apple’s guidelines and the AppleScript Language Guide, which clearly says that the ASCII number and ASCII character commands should no longer be used as of AppleScript 2.0.

With that analogy Bruce’s version can still be the best, depending on the mind that’s reading it :wink:

I still consider removing the possible errors and making the script AppleScript 2.0 compatible an improvement that is more than a preference. The improvement is not in performance but in reliability and support. Replacing “right”, “left” and “full” was meant to make it cleaner (a personal preference, to make the if statement inside the routine faster and cleaner), and using missing value rather than true follows the general gentlemen’s agreement among programmers. Using true is technically not wrong, but it’s uncommon and may lead to confusion.

OK, I did not realize the ASCII commands had been deprecated.

From the Apple Guidelines:

Maybe even better is to use the constants:

As you’ve found, it’s deprecated. And yes, it does work – but only for true ASCII characters – that is, for numbers 0 to 127. Above that, it’s unreliable.
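
For reference, a minimal sketch of the id-based forms that replace the deprecated commands in AppleScript 2.0 and later. The variable names are only illustrative.

```applescript
-- Replacements for the deprecated ASCII commands (AppleScript 2.0+)
set lf to character id 10 -- the same character as the linefeed constant
set nbsp to character id 160 -- no-break space, outside the ASCII range
set greeting to string id {72, 105} -- builds text from a list of code points
set codePoint to id of "A" -- replaces: ASCII number "A"

set summary to {lf = linefeed, greeting, codePoint}
-- result: {true, "Hi", 65}
```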

But there’s another reason to prefer the id approach: it’s much faster. The difference is insignificant in this example because you’re only calling it once, but it’s worth keeping in mind. Using ASCII character involves sending an Apple event, where using id does not. For example:

set theStart to current date
repeat 3000 times
	repeat with i from 0 to 127
		ASCII character i
	end repeat
end repeat
return (current date) - theStart

And this:

set theStart to current date
repeat 3000 times
	repeat with i from 0 to 127
		string id i
	end repeat
end repeat
return (current date) - theStart

Using a constant like linefeed is quicker still, but only by the narrowest of margins.

You’re beating a dead horse. :wink:

I noted it was deprecated, and even quoted the Apple Guidelines.
I was already in agreement to not use the ASCII commands.

I had occasion to use Shane’s script from post 4; it’s 6 years old but still does the job. Anyway, I happened to notice an issue when using the script to trim whitespace characters from the right side of a string: the last character of the string is truncated. Thus, if the returned result should be " Hello World!" it is instead " Hello World".

I don’t fully understand the code in that immediate section of the script but changing the first line below to the second line below seems to fix the issue.

set anNSString to anNSString's substringToIndex:(theRange's location)
set anNSString to anNSString's substringToIndex:((theRange's location) + 1)

Hi peavine.

theRange is set to the range of the last character in the string which isn’t in the set of characters to be trimmed. The code’s then meant to return the text from the beginning of the string up to and including that character. But in fact substringToIndex: “Returns a new string containing the characters of the receiver up to, but not including, the one at a given index.” So to include that last character, the given index has to be the character’s location plus its length:

set anNSString to anNSString's substringToIndex:((theRange's location) + (theRange's |length|))
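
The off-by-one can be seen in isolation with a short sketch. The sample text and variable names here are assumptions for illustration, not part of Shane's script.

```applescript
use AppleScript version "2.4"
use framework "Foundation"

set theSource to current application's NSString's stringWithString:"Hello World!   "
-- Characters to keep: everything that is not white space or a newline
set keepSet to current application's NSCharacterSet's whitespaceAndNewlineCharacterSet()'s invertedSet()

-- Search backwards for the last character to keep (the "!")
set theRange to theSource's rangeOfCharacterFromSet:keepSet options:(current application's NSBackwardsSearch)

-- location is the zero-based index of "!" (11 here); substringToIndex: excludes that index
set tooShort to (theSource's substringToIndex:(theRange's location)) as text
-- Adding the range's length (1) moves the cut just past the "!"
set correct to (theSource's substringToIndex:((theRange's location) + (theRange's |length|))) as text
```

Here tooShort is "Hello World" and correct is "Hello World!".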

Thanks Nigel. That works great.

Shane’s script includes the ability to remove characters from the beginning, the end, or both the beginning and end of a string. I seldom need to do all of these in one script and, for economy of code, decided to divide Shane’s script into three handlers, which I’ve included below. I’ve also included a fourth handler which works on a string with multiple paragraphs.

The operation of these scripts requires no explanation, except to note that the handlers’ second parameter can either be specific characters or missing value, and, in the latter case, the character set whitespaceAndNewlineCharacterSet is used. So, when calling one of the handlers, all of the following are valid:

trimLeadingCharacters(theString, "-" & " ")
trimTrailingCharacters(theString, space & tab)
trimLeadingAndTrailingCharacters(theString, missing value)

I ran some timing tests with these handlers on single-paragraph strings, and they generally took about 3 milliseconds to run (this assumes that the Foundation framework is already in memory). I tested the fourth handler with a string that contained 301 paragraphs, and the timing result was 265 milliseconds.

TRIM LEADING CHARACTERS

on trimLeadingCharacters(theString, theCharacters)
	if theCharacters = missing value then
		set theCharacters to current application's NSCharacterSet's whitespaceAndNewlineCharacterSet()
	else
		set theCharacters to current application's NSCharacterSet's characterSetWithCharactersInString:theCharacters
	end if
	
	set theString to current application's NSString's stringWithString:theString
	set theRange to theString's rangeOfCharacterFromSet:(theCharacters's invertedSet())
	if |length| of theRange = 0 then return ""
	return (theString's substringFromIndex:(theRange's location)) as text
end trimLeadingCharacters

TRIM TRAILING CHARACTERS

on trimTrailingCharacters(theString, theCharacters)
	if theCharacters = missing value then
		set theCharacters to current application's NSCharacterSet's whitespaceAndNewlineCharacterSet()
	else
		set theCharacters to current application's NSCharacterSet's characterSetWithCharactersInString:theCharacters
	end if
	
	set theString to current application's NSString's stringWithString:theString
	set theRange to theString's rangeOfCharacterFromSet:(theCharacters's invertedSet()) options:(current application's NSBackwardsSearch)
	if |length| of theRange = 0 then return ""
	return (theString's substringToIndex:((theRange's location) + (theRange's |length|))) as text
end trimTrailingCharacters

TRIM LEADING AND TRAILING CHARACTERS

on trimLeadingAndTrailingCharacters(theString, theCharacters)
	if theCharacters = missing value then
		set theCharacters to current application's NSCharacterSet's whitespaceAndNewlineCharacterSet()
	else
		set theCharacters to current application's NSCharacterSet's characterSetWithCharactersInString:theCharacters
	end if

	set theString to current application's NSString's stringWithString:theString
	set theString to theString's stringByTrimmingCharactersInSet:theCharacters
	return theString as text
end trimLeadingAndTrailingCharacters

TRIM LEADING AND TRAILING CHARACTERS - STRING HAS MULTIPLE PARAGRAPHS

on trimLeadingAndTrailingCharacters(theString, theCharacters)
	if theCharacters = missing value then
		set theCharacters to current application's NSCharacterSet's whitespaceCharacterSet()
	else
		set theCharacters to current application's NSCharacterSet's characterSetWithCharactersInString:theCharacters
	end if
	
	set theStrings to (current application's NSString's stringWithString:theString)'s componentsSeparatedByString:linefeed
	set cleanedStrings to current application's NSMutableArray's new()
	repeat with aString in theStrings
		set aString to (aString's stringByTrimmingCharactersInSet:theCharacters)
		(cleanedStrings's addObject:aString)
	end repeat
	return (cleanedStrings's componentsJoinedByString:linefeed) as text
end trimLeadingAndTrailingCharacters

Just for the sake of thoroughness, I decided to write a script that worked the same as script 4 above except with basic AppleScript. I ran timing tests with a string that contained 3001 paragraphs with spaces and tabs at the beginning and end of each paragraph. The results were:

Script 4 above - 2.540 seconds
The script below without a script object - 0.312 seconds
The script below - 0.047 seconds

set theCleanedText to trimText(theText, {space, tab})

on trimText(theText, theCharacters)
	script o
		property untrimmedParagraphs : (paragraphs of theText)
		property trimmedParagraphs : {}
	end script
	
	repeat with aParagraph in o's untrimmedParagraphs
		set aParagraph to contents of aParagraph
		try
			repeat while text 1 of aParagraph is in theCharacters
				set aParagraph to text 2 thru -1 of aParagraph
			end repeat
			repeat while text -1 of aParagraph is in theCharacters
				set aParagraph to text 1 thru -2 of aParagraph
			end repeat
		on error
			set aParagraph to ""
		end try
		set the end of o's trimmedParagraphs to aParagraph
	end repeat
	
	set {TID, text item delimiters} to {text item delimiters, linefeed}
	set o's trimmedParagraphs to (o's trimmedParagraphs as text)
	set text item delimiters to TID
	return o's trimmedParagraphs
end trimText

KniazidisR. Thanks for looking at my scripts and for your suggestion.

Our scripts appear to perform somewhat different tasks. My script removes the specified characters (whitespace as currently written) from the beginning and end of every paragraph in the text. Thus, in my timing tests, the script trims 3001 paragraphs. Your script removes the specified characters from the beginning and end of the entire text rather than the individual paragraphs. If that is correct, it doesn’t appear the timing-test results of our scripts should be compared. Am I confused as to what your script does?

No, you’re right, I got it mixed up. This means that my script is useless, because when trimming per 1024 paragraphs, it is 50 times slower than yours. Therefore, I am deleting my post. And only your version goes to my collection of scripts.

Also using RegEx you can find those pesky double/triple spaces and remove them with this.
Then use NSCharacterSet white…
(much of this has been split out into separate handlers that I use for other purposes)

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property NSUTF32LittleEndianStringEncoding : a reference to 2.617245952E+9
property NSMutableCharacterSet : a reference to current application's NSMutableCharacterSet
property NSCharacterSet : a reference to current application's NSCharacterSet

property NSNotFound : a reference to 9.22337203685477E+18 + 5807

property NSRegularExpression : a reference to current application's NSRegularExpression
property NSRegularExpressionCaseInsensitive : a reference to 1
property NSRegularExpressionUseUnicodeWordBoundaries : a reference to 40
property NSRegularExpressionAnchorsMatchLines : a reference to 16
property NSRegularExpressionSearch : a reference to 1024
property NSString : a reference to current application's NSString

on cleanAllWhiteSpaceInString:aString
	set aPattern to (NSString's stringWithString:"(\\s+){2}")
	set aDirtyString to (my findInString:aString withPattern:aPattern replaceWith:" ")
	set aCleanString to (my cleanWhiteSpaces:aDirtyString)
	return aCleanString
end cleanAllWhiteSpaceInString:

on cleanWhiteSpaces:aString
	set aCharSet to NSCharacterSet's whitespaceCharacterSet()
	return (my cleanString:aString withCharacterSet:aCharSet)
end cleanWhiteSpaces:

on cleanString:aString withCharacterSet:aCharSet
	set aDirtyString to NSString's stringWithString:aString
	set aCleanString to (aDirtyString's stringByTrimmingCharactersInSet:aCharSet)
	return aCleanString
end cleanString:withCharacterSet:

on findInString:aString withPattern:aRegExString replaceWith:aReplace
	set aRegEx to my createRegularExpressionWithPattern:aRegExString
	return (my findInString:aString withRegEx:aRegEx replaceWith:aReplace)
end findInString:withPattern:replaceWith:

on findInString:aString withRegEx:aRegEx replaceWith:aReplace
	set aSource to NSString's stringWithString:aString
	set aRepString to NSString's stringWithString:aReplace
	set aLength to aSource's |length|()
	set aRange to (current application's NSMakeRange(0, aLength))
	set matches to (aRegEx's matchesInString:aSource options:0 range:aRange)
	set aCount to matches's |count|()
	if (aCount > 0) then
		set aCleanString to (aRegEx's stringByReplacingMatchesInString:aSource options:0 range:aRange withTemplate:aRepString)
	else
		set aCleanString to aSource
	end if
	return aCleanString
end findInString:withRegEx:replaceWith:

on createRegularExpressionWithPattern:aRegExString
	if (class of aRegExString) is equal to (NSRegularExpression's class) then
		--log ("it already was a RegEx")
		return aRegExString
	end if
	set aPattern to NSString's stringWithString:aRegExString
	set regOptions to NSRegularExpressionCaseInsensitive + NSRegularExpressionUseUnicodeWordBoundaries
	set {aRegEx, aError} to (NSRegularExpression's regularExpressionWithPattern:aPattern options:regOptions |error|:(reference))
	if (aError ≠ missing value) then
		log {"regEx failed to create aError is:", aError}
		log {"aError debugDescrip is:", aError's debugDescription()}
		return missing value -- signal the failure to the caller
	end if
	return aRegEx
end createRegularExpressionWithPattern:

I thought I would test to see if an ASObjC implementation of regular expressions might be faster when removing spaces and tabs from the beginning and end of paragraphs. My test string contained 4097 paragraphs, each of which was preceded by 3 spaces and followed by 3 tabs. I also retested my scripts from post 13 and 14, and the results were:

script from post 13 (ASObjC) - 340 milliseconds
script from post 14 (basic AppleScript) - 90 milliseconds
script included below - 7 milliseconds

use framework "Foundation"
use scripting additions

set theString to " 	 line one 	 
 	 line two 	 "

set trimmedText to getTrimmedText(theString)

on getTrimmedText(theString)
	set thePattern to "(?m)^\\h+|\\h+$" -- trim all horizontal whitespace
	-- set thePattern to "(?m)^[]+|[]+$" -- trim the characters in square brackets
	set theString to current application's NSString's stringWithString:theString
	set theString to (theString's stringByReplacingOccurrencesOfString:thePattern withString:"" options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()})
	return theString as text
end getTrimmedText

Thanks, stealing this!

If you’re interested, ICU regex has a metacharacter matching any horizontal white space character: \h. So you could save yourself a small amount of typing with:

set thePattern to "(?m)^\\h+|\\h+$"

Not to be confused with \s, which also matches vertical “white spaces” such as line endings and form feeds.
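
A small sketch of that difference, assuming a sample text with an empty line between two padded lines. The variable names are illustrative only.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Two padded lines separated by an empty line
set theSource to current application's NSString's stringWithString:("  one  " & linefeed & linefeed & "  two  ")
set regexSearch to current application's NSRegularExpressionSearch

-- \h only matches horizontal white space, so the empty line survives
set trimmedWithH to (theSource's stringByReplacingOccurrencesOfString:"(?m)^\\h+|\\h+$" withString:"" options:regexSearch range:{0, theSource's |length|()}) as text

-- \s can also match the linefeeds, so line endings may be removed as well
set trimmedWithS to (theSource's stringByReplacingOccurrencesOfString:"(?m)^\\s+|\\s+$" withString:"" options:regexSearch range:{0, theSource's |length|()}) as text

set keepsEmptyLine to (trimmedWithH is ("one" & linefeed & linefeed & "two"))
set resultsDiffer to (trimmedWithS is not trimmedWithH)
```

With \h the paragraph structure is preserved; with \s the two results differ because line endings get consumed along with the padding.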