Using AppleScript's Text Item Delimiters

Schmye_Bubbula · August 7, 2013, 5:39pm

Somebody explain conceptually what’s going on here; what’s the difference?


set AppleScript's text item delimiters to ASCII character (0)

repeat with x in "aabbccbb"
	exit repeat
end repeat

x --> item 1 of "aabbccbb"

set AppleScript's text item delimiters to x

AppleScript's text item delimiters --> "a"

text items of "aabbccbb" as text --> "bbccbb"

set AppleScript's text item delimiters to ASCII character (0)

But now do it like this:


set AppleScript's text item delimiters to ASCII character (0)

(*
repeat with x in "aabbccbb"
	exit repeat
end repeat

x --> item 1 of "aabbccbb"
*)

set AppleScript's text item delimiters to item 1 of "aabbccbb"

AppleScript's text item delimiters --> "a"

text items of "aabbccbb" as text --> "aabbccbb" (Huh?)

set AppleScript's text item delimiters to ASCII character (0)

I don’t get it. The meat of the algorithm seems the same to me either way, namely:

1st way: set AppleScript’s text item delimiters to x
vs.
2nd way: set AppleScript’s text item delimiters to item 1 of “aabbccbb”

Why doesn’t…
text items of “aabbccbb” as text
…the 2nd way strip out the first two characters of “aabbccbb”?
AppleScript’s text item delimiters read the same both ways (“a”).
What am I missing?

Adam_Bell · August 7, 2013, 6:25pm

text items returns a list, in this case three items: {“”, “”, “bbccbb”} where the first empty quote is the first “a”, the second empty quote is the second “a”, and the third part is the rest. By converting text items to text you get the whole list.

set AppleScript's text item delimiters to item 1 of "aabbccbb"

set td to AppleScript's text item delimiters --> "a"

set y to text items of "aabbccbb" --> {"", "", "bbccbb"}

I don’t know why the conversion reveals the “a”s. Seems to me the answer should be “ bbccbb”.

Schmye_Bubbula · August 7, 2013, 6:44pm

Why not “bbccbb” instead of “ bbccbb”? Aren’t the first two text items null characters?
And “item 1 of ‘aabbccbb’” is the same thing as “x” when setting AppleScript’s text item delimiters, no? Why the different result?
(My head is getting ready to explode!)

McUsrII · August 7, 2013, 8:47pm

Hello.

What happens first is that you get the same list Adam Bell got, namely {“”, “”, “bbccbb”}. Now, you must look at the empty strings as placeholders for text item delimiters.

You haven’t changed your text item delimiters, so when the list is turned back into text, AppleScript inserts the text item delimiters there for you, and you end up with “aabbccbb” again.

This is totally normal behaviour. If you don’t change the text delimiter, you’ll end up with what you had originally, the moment you convert the list of text items back to text.


set AppleScript's text item delimiters to item 1 of "aabbccbb"

log AppleScript's text item delimiters --> "a"

set d to text items of "aabbccbb" 
# {"","","bbccbb"}
set AppleScript's text item delimiters to ""
# we remove the slots for the text item delimiters from the list
set d to d as text
log d
--> "bbccbb"
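
The same split-then-rejoin pattern generalizes to find-and-replace: split on the search text, then switch the delimiters to the replacement text before coercing back. A minimal sketch, assuming a hypothetical handler name replaceText (not something from this thread), which also saves and restores whatever delimiters were in force:

```applescript
-- Hypothetical helper: replaces every instance of searchText in sourceText
on replaceText(sourceText, searchText, replacementText)
	set savedDelimiters to AppleScript's text item delimiters
	-- Split on the search text; the delimiter instances are dropped
	set AppleScript's text item delimiters to searchText
	set parts to text items of sourceText
	-- Rejoin, inserting the replacement between the text items
	set AppleScript's text item delimiters to replacementText
	set resultText to parts as text
	set AppleScript's text item delimiters to savedDelimiters
	return resultText
end replaceText

replaceText("aabbccbb", "a", "") --> "bbccbb"
```

Passing the same text for searchText and replacementText gives back the original, which is the round trip described above.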

Schmye_Bubbula · August 7, 2013, 11:40pm

Eureka! Got it! Thanks, guys, and especially McUsrII, who finally made me see it: I needed to split up the “to text items of” and the “as text,” and restore the standard null text item delimiters in between. So in terms of my 2nd scriptlet in post #7 above, to make it work, it should be modified to…


set AppleScript's text item delimiters to ASCII character (0)

(*
repeat with x in "aabbccdd"
	exit repeat
end repeat

x --> item 1 of "aabbccdd"
*)

set AppleScript's text item delimiters to item 1 of "aabbccdd"

AppleScript's text item delimiters --> "a"

set strippedText to text items of "aabbccdd"
-- Needed to split-up the "to text items of" and the "as text"...

set AppleScript's text item delimiters to ASCII character (0)
-- ... and put this in-between.

set strippedText to strippedText as text --> "bbccdd"
-- (This is the "as text" split away after the null text item delimiters are restored.)

AppleScript: 2.0.1
Browser: Firefox 4.0.1
Operating System: Mac OS X (10.5)

Adam_Bell · August 7, 2013, 11:54pm

Thanks, McUsrII – that was the missing link; that he didn’t change the TID before converting to text. I knew better but had a senior moment; just turned 76 a few days ago. :rolleyes:

McUsrII · August 8, 2013, 12:16am

You don’t have to be 76 to have a senior moment, I guess I am a living proof of that.

I didn’t figure it out at first either, not until it suddenly dawned on me.

Belated congratulations on your birthday, Adam.

Schmye_Bubbula · August 8, 2013, 12:37am

I still don’t know why my 1st scriptlet worked in post #7. I didn’t split the operations with restoration of the null TID in between in that one, did I? Or did the repeat slip it in somehow?

What really threw me was that the TIDs were “a” in both scriptlets in my post #7. (And apparently it’s still an anomaly, judging from Adam’s post #8.)

McUsrII · August 8, 2013, 2:35am

Hello.

Yes, I’d say that behaviour is an anomaly! I can’t explain it, but I can guess that since the variable is declared in a loop, then it just isn’t “visible” enough to work properly. It works well enough to get things removed but not to be inserted.

Well, that is a clever hack, if you just want to remove stuff! Now, if you change the line

set AppleScript's text item delimiters to x

into

set AppleScript's text item delimiters to contents of x

Then you get the default behaviour.

If you change the x into a normal variable in your first script in your post #7, then it works as it should too, (returning aabbccbb), so I guess it is that very locally scoped variable that does the trick, (x is really only meant to be used in the loop as a loop variable). If you declare x as local before the assignment in the repeat loop, then the “trick” also breaks, returning (aabbccbb), so I think the scoping is the culprit.

I don’t think the trick saves you much anyway: three added lines versus one line setting the delimiters and one coercing to text, though there might be a slight speed gain hiding in there.

Schmye_Bubbula · August 16, 2013, 2:12am

I don’t understand why text item delimiters appear only whenever they occur at the beginning or end of given text, and not in the middle. In other words:


set text item delimiters to ""
set x to "abbc"
set text item delimiters to "b"
set x to x's text items --> {"a", "", "c"}
set text item delimiters to ""

But:


set text item delimiters to ""
set x to "abc"
set text item delimiters to "b"
set x to x's text items --> {"a", "c"}
set text item delimiters to ""

Why weren’t the latter’s text items {“a”, “”, “c”}, and the former’s {“a”, “”, “”, “c”}? My natural expectation (wrong, it turns out) was that the delimiters are there in the original text string so they should also appear in the list of text items. The “Able was I ere I saw Elba” example in the first post’s tutorial says that’s how it works, but I don’t get why it works that way.

Nigel_Garvey · August 16, 2013, 7:00am

You have to think of ‘text items’ and ‘text item delimiters’ as alternating in the text, with ‘text items’ always being on the outside. Each instance of a delimiter comes between two text items.

In the text “abc”, the single instance of the delimiter “b” comes between “a” and “c”, so those are the text items.

In “abbc”, the two instances of the delimiter are adjacent. But there’s notionally a zero-length (or “empty”) text item between them and this is returned as the zero-length string “”.

Similarly, if an instance of the delimiter occurs at the beginning or end of the text, there’s notionally an empty text item on its outer side.

This sounds unnecessarily esoteric when you’re talking about extracting the text items, but when the list of text items is coerced back to text, the delimiter is simply inserted between the items and the result contains the right number of delimiter instances in the right places.
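
One practical consequence of this alternation is that a text always yields exactly one more text item than it has delimiter instances. A minimal sketch that uses this to count occurrences, assuming a hypothetical handler name countOccurrences (not from this thread) and a non-empty source text:

```applescript
-- Hypothetical helper: counts instances of delimiterText in sourceText
on countOccurrences(sourceText, delimiterText)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delimiterText
	-- n delimiter instances always separate n + 1 text items
	set occurrenceCount to (count of text items of sourceText) - 1
	set AppleScript's text item delimiters to savedDelimiters
	return occurrenceCount
end countOccurrences

countOccurrences("abc", "b") --> 1, from {"a", "c"}
countOccurrences("abbc", "b") --> 2, from {"a", "", "c"}
```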

Schmye_Bubbula · August 17, 2013, 12:34pm

Thanks, Nigel. I guess that’s the crux: The text items in the list pertaining to the delimiters (i.e., each “”) aren’t the delimiters themselves, as I was wrongly construing, but rather the “empty” text items between the delimiters. Hard to wrap my head around, and I will have to ponder why it’s thusly as I work more with delimiters before it sinks-in. I was wanting it to be otherwise (each “” representing the delimiters themselves, not the absence of text items between them) because I was trying to use that as a way of parsing strings with far fewer passes than going through every single character. There’s probably still a way of doing that if I can just get the pattern of the way it really works down pat, so I’ll just plod along until I get it. Seat-of-the-pants AppleScripting is hard.

Nigel_Garvey · August 17, 2013, 2:55pm

You’ve “got” it now with regard to each “” in the list representing an “empty” text item.

There is of course no significance or reality to “empty” text items. They’re simply a convenient device to indicate the insertion points for any adjacent or text-end delimiter instances in the text for which you have text items. These situations would be difficult to indicate otherwise.

{“”, “a”, “”, “c”, “”} is five text items, so four delimiter instances have been removed from between them or four can be inserted:
“” & “b” & “a” & “b” & “” & “b” & “c” & “b” & “”
or “babbcb”
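
The same bookkeeping can be checked in both directions with a short sketch (the variable names are illustrative only): splitting “babbcb” on “b” gives those five text items, and coercing the list back with the delimiter still set restores the original text.

```applescript
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "b"

-- Leading, adjacent, and trailing delimiters each leave an empty text item
set parts to text items of "babbcb" --> {"", "a", "", "c", ""}
set partCount to count of parts --> 5

-- With "b" still in force, the four delimiter instances are put back
set rejoined to parts as text --> "babbcb"

set AppleScript's text item delimiters to savedDelimiters
rejoined
```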

Schmye_Bubbula · August 23, 2013, 7:16pm

(If my problem here turns-out not to be specifically pertinent to text item delimiters, I’ll ask a mod to move this post out of the thread.)

This script aspires to remove redundant characters from a string.
It does the following:
• For each character in turn…
• Concatenate a dupe character to the first occurrence so that, after then setting text item delimiters to that character, any single instance not at the string endpoints (i.e., within the string) will explicitly show up as an empty string in the resulting text items list.
• Look for the first appearance of an empty string in the text items list and replace it with the character in question. (Not necessary to restore it in the same order to remove redundancies; just for shits & giggles.)
• After setting text item delimiters back to the regular {“”}, convert the text items list back to a string, freshly stripped of that character’s redundancies, and go on to the next character.

There may be a better way (and if there is, I’d love to hear it, though strictly speaking it would be off-topic), but what I really want to know is why in the first pass of the y-loop, {“a”, “”, “”, “bbccdd”} becomes an empty string (“”) when its variable is set to “as text” under the auspices of the default {“”} text item delimiters.


stripRedundantCharacters("aabbccdd")

to stripRedundantCharacters(inputText)
	repeat with x in inputText
		set getUniqueCharacters to text 1 through (the offset of x in inputText) in inputText & ¬
			text (the offset of x in inputText) through -1 in inputText
		set text item delimiters to x
		set getUniqueCharacters to getUniqueCharacters's text items
		
		repeat with y from 1 to the count of getUniqueCharacters's text items
			if text item y in getUniqueCharacters = "" then
				set text item y in getUniqueCharacters to x
				set text item delimiters to ""
				set getUniqueCharacters to getUniqueCharacters as text
				exit repeat
			end if
		end repeat
		
	end repeat
	return getUniqueCharacters
end stripRedundantCharacters

For your convenience, here it is again with the addition of debug code and comments:


stripRedundantCharacters("aabbccdd")

to stripRedundantCharacters(inputText)
	repeat with x in inputText
		log x --> "a" (in 1st pass; similarly with the following recorded Results)
		set getUniqueCharacters to text 1 through (the offset of x in inputText) in inputText & ¬
			text (the offset of x in inputText) through -1 in inputText ¬
			# Doubles-up the first character which will next become text item delimiters so that any single instances of it (within the string's interior) will make sure to get a null character ("") in the string's text items list.
		set text item delimiters to x
		set getUniqueCharacters to getUniqueCharacters's text items
		log getUniqueCharacters --> {"", "", "", "bbccdd"}
		log the (count of getUniqueCharacters's text items) --> 4
		
		repeat with y from 1 to the count of getUniqueCharacters's text items
			log y --> 1
			log text item y in getUniqueCharacters --> ""
			if text item y in getUniqueCharacters = "" then
				set text item y in getUniqueCharacters to x ¬
					# Puts-back the first occurrence of the character, in place.
				log text item y in getUniqueCharacters --> "a"
				log getUniqueCharacters --> {"a", "", "", "bbccdd"}
				set text item delimiters to ""
				set getUniqueCharacters to getUniqueCharacters as text
				log getUniqueCharacters --> "" (Huh? Why not "abbccdd" in the 1st pass?)
				log "Next x"
				exit repeat
			end if
		end repeat
		
	end repeat
	return getUniqueCharacters
end stripRedundantCharacters

Actually, I should have posed my question more simply in the general form (apologies!):


set z to "aabbccdd"

repeat with x in z
	set text item delimiters to x
	log text item delimiters --> "a"
	set z to z's text items
	log z --> {"", "", "bbccdd"}
	
	repeat with y from 1 to the count of z's text items
		set text item y in z to x
		log z --> {"a", "", "bbccdd"}
		set text item delimiters to ""
		set z to z as text
		log z --> "" (Huh? Why not "abbccdd" in 1st pass?)
		
	end repeat
end repeat


Nigel_Garvey · August 23, 2013, 8:15pm

Hi.

The first item in the list isn’t actually the text “a” (although it’s logged as that) but the reference item 1 of “aabbccdd”, since that’s the value of x in the kind of repeat: repeat with x in ... Until recently, coercing a list containing a reference to text only included the items before the reference (if any). That seems to be what’s happening in your case. On my Mountain Lion system (AppleScript 2.2.4), the coercion attempt causes an error: “Can’t make {item 1 of "aabbccdd", "", "", "bbccdd"} into type text.”

Instead of the reference in the list, you should use its contents or it as text:

set text item y in getUniqueCharacters to contents of x

-- Or:
set text item y in getUniqueCharacters to x as text

However, this doesn’t actually make the script work. The output’s exactly the same as the input. I’ll look at it again shortly.
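
To see the reference issue in isolation, apart from the larger handler, here is a minimal sketch (variable names are illustrative only). It takes the contents of the loop variable before storing it in the list, so the list holds plain text and the coercion behaves the same whether a given AppleScript version would truncate or raise an error on a stored reference.

```applescript
set sourceText to "aabbccdd"
set savedDelimiters to AppleScript's text item delimiters

repeat with charRef in sourceText
	-- charRef is the reference (item 1 of "aabbccdd"), not the text "a"
	set firstChar to contents of charRef -- plain text "a"
	exit repeat
end repeat

set AppleScript's text item delimiters to firstChar
set parts to text items of sourceText --> {"", "", "bbccdd"}
set item 1 of parts to firstChar -- stores text, not a reference

set AppleScript's text item delimiters to ""
set joined to parts as text --> "abbccdd"

set AppleScript's text item delimiters to savedDelimiters
joined
```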

Schmye_Bubbula · August 23, 2013, 8:27pm

Nigel, are you sure the error isn’t happening during pass 2 in the y-loop, as a reaction to the unexpected result during pass 1? (I get that same error during pass 2 as well… Well, I ~think~ it’s the same error!)

Nigel_Garvey · August 23, 2013, 8:43pm

Here’s a working version, if I’ve correctly understood the aim:


stripRedundantCharacters("aabbccdd")

to stripRedundantCharacters(inputText)
	set getUniqueCharacters to inputText
	
	set astid to AppleScript's text item delimiters
	repeat with i from 1 to (count inputText)
		set x to character i of inputText
		set AppleScript's text item delimiters to x
		set getUniqueCharacters to getUniqueCharacters's text items
		set AppleScript's text item delimiters to ""
		-- Concatenating the 'rest' list to text automatically coerces it to text using the current "" delimiter.
		set getUniqueCharacters to beginning of getUniqueCharacters & x & rest of getUniqueCharacters
	end repeat
	set AppleScript's text item delimiters to astid
	
	return getUniqueCharacters
end stripRedundantCharacters

Edit: And here’s a modification which cycles through the output string instead of the input. Duplicate characters are always to the right of the index, so their removal further reduces the number of iterations still to be done.


stripRedundantCharacters("aabbccdd")

to stripRedundantCharacters(inputText)
	set getUniqueCharacters to inputText
	
	set astid to AppleScript's text item delimiters
	set i to 1
	repeat until (i > (count getUniqueCharacters))
		set x to character i of getUniqueCharacters
		set AppleScript's text item delimiters to x
		set getUniqueCharacters to getUniqueCharacters's text items
		set AppleScript's text item delimiters to ""
		-- Concatenating the 'rest' list to text automatically coerces it to text using the current "" delimiter.
		set getUniqueCharacters to beginning of getUniqueCharacters & x & rest of getUniqueCharacters
		set i to i + 1
	end repeat
	set AppleScript's text item delimiters to astid
	
	return getUniqueCharacters
end stripRedundantCharacters

Schmye_Bubbula · August 23, 2013, 9:14pm

Bingo!
And it now has sunk-in that your diagnosis in Post #21 is the crux, i.e.,…


set z to "aabbccdd"

repeat with x in z
	set text item delimiters to x
	set z to text items of z
	set text item 1 of z to x --> {[item 1 of "aabbccdd"], "", "bbccdd"}
	log z --> Booby trap!: {"a", "", "bbccdd"}
	set text item delimiters to ""
	set z to z as text
	log z --> ""
	exit repeat
end repeat

(Since this is a text item delimiters booby trap waiting to snare the unwary, I guess these posts do belong in this tutorial thread.)

A big thanks and tip o’ the hat, Nigel!


Nigel_Garvey · August 24, 2013, 9:29am

Well. The problem turned out to be not directly connected with text item delimiters; but if the poster (yourself) thought it might be happening because of something he hadn’t understood in Adam’s article, it would be reasonable to ask here. Otherwise “AppleScript | Mac OS X” is the place.

Ignacio · August 24, 2015, 8:10pm

In AppleScript 2.4, some of the examples in the first script return something very different.
It seems that, even though I have the same TID “”, AS recognizes “-” and “*” as a TID, and doesn’t consider “_” a word.

(If I ask for “length of AppleScript’s text item delimiters” the answer is: 1)

words of "Hi, I'm Peggy-Sue" --> {"Hi", "I'm", "Peggy-Sue"}
-- {"Hi", "I'm", "Peggy", "Sue"}

words of "Now*is the.time&for all_good (folks) to learn-AppleScript"
--> {"Now", "*", "is", "the.time", "&", "for", "all", "_", "good", "folks", "to", "learn-AppleScript"}
-- {"Now", "is", "the.time", "for", "all_good", "folks", "to", "learn", "AppleScript"}

words of "set-piece is hyphenated" --> {"set-piece", "is", "hyphenated"} - the hyphen isn't separated
-- {"set", "piece", "is", "hyphenated"} (now it's not)

words of "funny*word with asterisk" --> {"funny", "*", "word", "with", "asterisk"} - the asterisk is separate
-- {"funny", "word", "with", "asterisk"}