Remove double or triple space in a text string

Hi

I can’t figure out how to delete the extra spaces in a string. I can delete all of them, but I need to keep one space between words.

I have a series of names. Some are correct, e.g. “John Smith”; some contain double or triple spaces, e.g. “John  Smith” or “John   Smith”; and some have a space at the beginning or end, e.g. " John Smith ".

How can I delete the unwanted spaces to leave “John Smith”?

Thanks

Hello

When I paste your message into a text file as a sample document, I get some curious characters.
What you describe as space characters are in fact pairs of characters whose codes are $C2 and $A0.
As far as I know, $A0 is NO-BREAK SPACE, but I’m unable to guess the meaning of the character $C2, which is supposed to be Â.

What does your document really contain?

Yvan KOENIG (VALLAURIS, France) lundi 18 juin 2012 12:09:27
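
For what it's worth, the byte pair $C2 $A0 is how UTF-8 encodes a single NO-BREAK SPACE (U+00A0), so the pasted text may simply contain no-break spaces rather than two separate characters. Whether the original names really contain them is exactly the open question above. If they do, they would need converting to ordinary spaces before any squeezing. A minimal sketch in plain shell, where the helper name `nbsp_to_space` and the sample input are made up for illustration:

```shell
# Hypothetical helper: replace every UTF-8 no-break space with a plain space.
nbsp_to_space() {
    # Octal 302 240 = hex C2 A0, the UTF-8 encoding of U+00A0.
    nbsp=$(printf '\302\240')
    printf '%s\n' "$1" | sed "s/$nbsp/ /g"
}

# Illustrative input: "John", two no-break spaces, "Smith".
nbsp_to_space "$(printf 'John\302\240\302\240Smith')"
```

The result still has two (now ordinary) spaces between the words, so it would then go through one of the space-squeezing scripts in this thread.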

Try:


set theName to "John  Smith"
set theName to do shell script "echo " & quoted form of theName & " | sed 's/  *[ˆ ]/ /'"

This version also removes trailing and leading spaces:


set theName to "  John   Smith  "
set theName to do shell script "echo " & quoted form of theName & " | sed -e 's/^ *//' -e 's/ *$//' -e 's/  */ /'"

Totally Brilliant thanks all

Or this version works no matter how many words there may be in the name:


set theName to "  John      Aardvark  Lazenby     Bicycle  Smith  "
set theName to do shell script "echo " & quoted form of theName & " | sed -e 's/  */ /g' -e 's/^ *//' -e 's/ *$//'"
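
The difference between this version and the previous one is the `g` flag: without it, sed replaces only the first run of spaces on the line. The sketch below runs the same sed expressions directly in a POSIX shell so they can be checked in Terminal without AppleScript. The function names are illustrative only, not part of the scripts above.

```shell
# Hypothetical helper: squeeze every run of spaces, then trim both ends.
squeeze_spaces() {
    printf '%s\n' "$1" | sed -e 's/  */ /g' -e 's/^ *//' -e 's/ *$//'
}

# Same expressions without the g flag: only the first run is squeezed.
squeeze_first_run_only() {
    printf '%s\n' "$1" | sed -e 's/  */ /' -e 's/^ *//' -e 's/ *$//'
}

squeeze_spaces "  John      Aardvark  Lazenby     Bicycle  Smith  "
squeeze_first_run_only "John  Aardvark   Smith"
```

The first call prints the name with single spaces throughout; the second leaves the three spaces before "Smith" untouched, which is why the two-word version breaks down on longer names.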

Hello!

This is another solution I wrote for my own amusement. At least it is different, and vanilla! :slight_smile:
It also ought to be faster, not that it matters that much.


set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to {" "}
set theName to "  John      Aardvark  Lazenby     Bicycle  Smith  "
set aList to theName's text items
repeat with anItm in aList
	if contents of anItm is "" then set contents of anItm to missing value
end repeat
set theName to aList's text as text
set AppleScript's text item delimiters to oldDelims
log theName
--> John Aardvark Lazenby Bicycle Smith

OK, here is the handler that found its way into my library. It tackles spaces and tabs.


set theName to "  John      Aardvark  Lazenby   	  Bicycle  Smith  "

set newname to stripwhSpace from theName
log newname -- log the cleaned-up result, not the original string
--> John Aardvark Lazenby Bicycle Smith

to stripwhSpace from astring
	script o
		property aList : missing value
	end script

	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {" ", "	"} -- space, tab 
	
	set o's aList to astring's text items
	
	repeat with i from 1 to (get count o's aList)
		if item i of o's aList is "" then set item i of o's aList to missing value
	end repeat
	set astring to o's aList's text as text
	set AppleScript's text item delimiters to oldDelims
	return astring
end stripwhSpace

It’s funny, McUsr, that when I dug into my own library I found almost exactly the same code. For a plain script, my will be enough and you won’t need an extra script object (my refers to a script object as well).

set theText to " hello  world! how    are you?"
set AppleScript's text item delimiters to space
set textItems to every text item of theText
repeat with x from 1 to count my textItems
	if item x of my textItems = "" then set item x of my textItems to missing value
end repeat
set theText to text of my textItems as string
set AppleScript's text item delimiters to ""
return theText

Hello!

I didn’t snag it, not this time :smiley:

I don’t see adding that script object as leading to too much overhead, though my is shorter to type than o’s.

Just view the usage of a script object as something regarding personal taste! :slight_smile:

I have changed the name of the handler to trimwh, by the way.

It’s been a while, but McUsr is the first poster doing what I think the original poster requested. That is, it takes:

and returns

I will provide my own straightforward solution. No repeat loops, no script objects:


set theText to "    John    Aardvark     Lazenby     	   Bicycle    Smith    "
set AppleScript's text item delimiters to space
set resultText to (words of theText) as text

--> "John Aardvark Lazenby Bicycle Smith"

The scripts in posts #4 and #6 were doing what was requested two months before McUsr’s solutions were posted. Today I might shorten the second one to:


set theName to "  John      Aardvark  Lazenby     Bicycle  Smith  "
set theName to do shell script "echo " & quoted form of theName & " | sed -E 's/[[:space:]]+/ /g ; s/^[[:space:]]|[[:space:]]$//g'"

adayzdone’s script in post #3 has been affected by a change in MacScripter’s text encoding since it was posted, so it’s hard to tell now whether it worked back then or not. As you can see, weedinner was delighted with the replies he received.

None of the scripts use script objects, so I don’t know why KniazidisR said that. His script breaks the text into AppleScript ‘words’ before rejoining them with single spaces, so it would produce the wrong result with a name like John-Aardvark Lazenby Bicycle-Smith.

This also works.

set theText to quoted form of " John Aardvark Lazenby     Bicycle Smith "

set theResult to do shell script "echo " & theText & " | tr -s ' ' | sed -E 's/^ | $//g'"

Neat. :cool: xargs’s man page even says “If utility is omitted, echo(1) is used.”, so it’s specified behaviour, not a hack. :slight_smile:
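
For reference, the xargs behaviour can be tried on its own in a POSIX shell. With no utility named, xargs splits its input on blanks and hands the pieces to echo, which joins them with single spaces. The function name in this sketch is illustrative only:

```shell
# Hypothetical helper: let xargs (running its default echo) rebuild the
# line from its blank-separated words.
squeeze_with_xargs() {
    printf '%s\n' "$1" | xargs
}

squeeze_with_xargs "  John      Aardvark  Lazenby     Bicycle  Smith  "

# Caveat: xargs also interprets quotes and backslashes in its input, so a
# name containing an apostrophe is not passed through unchanged.
```

Because of that quote handling, this approach is probably best kept for text known to be free of apostrophes, double quotes and backslashes.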

Most (or all) of the scripts in this thread will do the job with a short string, but I wondered how they would fare with a large block of text. I tested Nigel’s and Fredrik71’s scripts from this thread and an ASObjC script from Shane’s book. The test string contained 50,000 randomly chosen items, some of which were double or triple spaces, so its total length was somewhat more than 50,000 characters. I ran each test several times to account for random differences in the test string.

ASObjC - 0.010 second
sed - 0.019 second
xargs - 0.019 second

The test script follows. Before running this script, the handlers not being tested need to be disabled, and a check needs to be made to ensure that the double and triple spaces have not been converted to single spaces by the forum.

use framework "Foundation"
use scripting additions

-- untimed code
set theCharacterList to {"a", "b", "c", "d", " ", "  ", "   "} -- the last two items are 2 and 3 spaces
set theString to {"   "} -- this is 3 spaces
repeat with i from 1 to 50000
	set the end of theString to some item in theCharacterList
end repeat
set the end of theString to "   " -- this is 3 spaces
set theString to theString as text

-- start time
set startTime to current application's CFAbsoluteTimeGetCurrent()

-- timed code
set theNewString to cleanUpText(theString)

on cleanUpText(someText)
	set theString to current application's NSString's stringWithString:someText
	set stringLength to theString's |length|()
	set theString to theString's stringByReplacingOccurrencesOfString:" +" withString:" " options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:stringLength}
	set theWhiteSet to current application's NSCharacterSet's whitespaceCharacterSet()
	set theString to theString's stringByTrimmingCharactersInSet:theWhiteSet
	set theString to theString as text
end cleanUpText

on cleanUpText(theString)
	set theString to (do shell script "echo " & quoted form of theString & " | sed -E 's/[[:space:]]+/ /g ; s/^[[:space:]]|[[:space:]]$//g'")
end cleanUpText

on cleanUpText(theString)
	set theString to (do shell script "echo " & quoted form of theString & " | xargs")
end cleanUpText

-- elapsed time
set elapsedTime to (current application's CFAbsoluteTimeGetCurrent()) - startTime
set nf to current application's NSNumberFormatter's new()
nf's setFormat:("0.000")
set elapsedTime to ((nf's stringFromNumber:elapsedTime) as text) & " seconds"

-- result
elapsedTime

I agree. xargs is a very useful and powerful tool. But for this example, you don’t need to call xargs at all. The same thing can be accomplished directly from the “find” command like this…

do shell script "find " & quoted form of thePath & " -type f -name '" & theName & "' -exec  cat {} \\;"

Instead of the do shell script command in this…

set thePath to POSIX path of (path to desktop)
my findEveryFileWithName:"index.md" inPath:thePath
on findEveryFileWithName:theName inPath:thePath
	do shell script "find " & quoted form of thePath & " -type f -name '" & theName & "' -print0 | xargs -0 cat"
end findEveryFileWithName:inPath: