How to write an AppleScript to match strings in 2 TextEdit documents?

Just for the hell of it, to have tried it that is, I tried to embellish Nigel's eminent way of pruning a long list by a short list with ignoring white space :smiley: It worked! So that was all there was to it, to have it work correctly with leading and trailing blanks.

This must be the fastest way ever of pruning a list, at least in AppleScript. I think it is hard to beat unless you write something that compiles into binary code and can execute independently.

I had lived in the belief that text item delimiters were something you couldn’t influence with the different considering and ignoring clauses. I was wrong.

What is character id 0 ?

It is nul, the character with code point 0; in UTF-8 it is a single byte with all bits zero. (Correct me if I am wrong!) You often use it to preserve white space when you are working with xargs in combination with find or mdfind, via the -print0 and -0 options.

I played with character id 0 as a text item delimiter earlier, and it turns out that you then get it all in one piece, as opposed to “”, which delivers just characters. Also a nice trick! :slight_smile:
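A minimal sketch of that observation, assuming a sample text that contains no nul of its own (the variable names are only for illustration):

```applescript
-- Save the current delimiters so they can be restored afterwards.
set astid to AppleScript's text item delimiters
set sample to "one two" & linefeed & "three"

-- With nul as the only delimiter there is nothing in sample to split on,
-- so the whole text should come back as a single text item.
set AppleScript's text item delimiters to character id 0
set pieces to text items of sample
set AppleScript's text item delimiters to astid

count of pieces
```

Per the description above, the count should be 1, with the spaces and the linefeed still inside that single item.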

I took some liberties and made a handler out of Nigel's excellent work!


set longText to "c@gmail.com
b@gmail.com
d@gmail.com
a@gmail.com
h@gmail.com"
set shortText to "d@gmail.com
  c@gmail.com 
g@gmail.com"

set longText to pruneFast for longText against shortText
longText
-- pruneFast: removes the items that are in shortText from longText.
-- One item per line is assumed.
to pruneFast for longText against shortText
	-- © NG: http://macscripter.net/post.php?tid=39866
	local astid, newLong
	ignoring white space
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to return & linefeed
		set longText to linefeed & longText's paragraphs & return
		set AppleScript's text item delimiters to return & character id 0 & linefeed
		set shortText to linefeed & shortText's paragraphs & return
		set AppleScript's text item delimiters to character id 0
		set AppleScript's text item delimiters to shortText's text items
		set newLong to longText's text items
		set AppleScript's text item delimiters to ""
		set newLong to newLong as text
		set AppleScript's text item delimiters to return -- or linefeed if you prefer.
		set newLong to text from word 1 to word -1 of (newLong's paragraphs as text)
		set AppleScript's text item delimiters to astid
	end ignoring
	return newLong
end pruneFast

I am familiar with combining “-print0 | xargs -0”. However, I did not know that was character id 0. I guess it is not equivalent to " " ?

set xxx to "a" & character id 0 & "b"

I would love to get a bit more background on it…

This is an example of where I learned something from a thread that went off on a tangent. :wink:

I guess someone will come around and enlighten you! :slight_smile:

My not totally uneducated guess for your example is that, since character id 0 really can’t be printed, it gets translated to " ".
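One way to check rather than guess is to ask AppleScript for the code points of the text. A small sketch, assuming AppleScript 2.0 or later, where text has an id property:

```applescript
-- Build the text from the question: "a", a nul, then "b".
set xxx to "a" & character id 0 & "b"

-- id of a multi-character text returns a list of code points.
-- A nul is code point 0; a real space would show up as 32.
set codePoints to id of xxx
codePoints -- should give {97, 0, 98}
```

So whatever the result pane shows between the a and the b, the character stored in the text is still code point 0, not a space.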

When we are dealing with character id 0, we are really dealing with text at the level below it, the binary level. Everything is just 0s and 1s when it comes down to it, and a zero byte is used to terminate a string in the C languages, at least for the strings that live on the “internal” level. The other kind of strings, higher up, are called Pascal strings and carry a length with them.

What Nigel did was to use this convention: he massaged a nul (character id 0) into the paragraphs. (I am not totally sure how this works, but I think they get embellished into the paragraphs.) When he then sets the text item delimiter to just character id 0, he sees to it that each whole paragraph (line) is returned as one text item when he pulls out shortText’s text items, having changed perspective from paragraphs to text items.

I hope this made some sense, it is very ingenious! :slight_smile:

I wanted to put “sentinel” markers round the individual entries in the short text and separate the results into a list without using a repeat. Character id 0 was simply a convenient delimiter unlikely to be mistaken for anything else in the text! :slight_smile:

Since the script uses simultaneous multiple delimiters, there have to be two sentinel characters between adjacent entries in the long text to avoid any possibility of overlapping text items. Instead of using two linefeeds or two returns, I chose to use a linefeed in front of each entry and a return after it because that creates the combination <return & linefeed> between entries, which AppleScript recognises as a paragraph separator in its own right. This saves a delimiter-change when singling the line-endings at the end. Obviously the items in the short list also have linefeeds in front of them and returns at the end.
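The sentinel idea described above can be sketched step by step. This is only an illustration, with made-up addresses and variable names, and without the ignoring white space block of the handler; it assumes a system where simultaneous multiple delimiters work:

```applescript
set astid to AppleScript's text item delimiters
set longEntries to {"a@example.com", "b@example.com", "c@example.com"}
set shortEntries to {"b@example.com"}

-- Wrap every long entry as <linefeed>entry<return>.
-- Adjacent entries end up separated by <return & linefeed>.
set AppleScript's text item delimiters to return & linefeed
set longText to linefeed & (longEntries as text) & return

-- Wrap the short entries the same way, with a nul between them
-- so that they can be split apart again without a repeat loop.
set AppleScript's text item delimiters to return & character id 0 & linefeed
set shortText to linefeed & (shortEntries as text) & return
set AppleScript's text item delimiters to character id 0
set sentinelDelimiters to text items of shortText

-- Use all the wrapped short entries as delimiters at once, then
-- glue the surviving pieces of the long text back together.
set AppleScript's text item delimiters to sentinelDelimiters
set pieces to text items of longText
set AppleScript's text item delimiters to ""
set prunedText to pieces as text

set AppleScript's text item delimiters to astid
prunedText -- the remaining entries, still wrapped in their linefeed/return sentinels
```

Because each delimiter includes its leading linefeed and trailing return, an entry in the short list can only match a whole line of the long text, not part of a longer address.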

I didn’t make clear in my post that the method only works in Snow Leopard or later, where simultaneous multiple delimiters work.

That used to be the case with the pre-Leopard ‘string’ class. Both delimiters and ‘offset’ were case-sensitive and uninfluenced by ‘considering’ or ‘ignoring’ statements. They were more flexible with ‘Unicode text’ (today’s ‘text’), although ‘offset’ doesn’t ignore white space.