Separating tags from Quark's XTG

Lorand · July 22, 2007, 8:32am

Hi everybody,

I’m working on a script that makes some changes to text files saved from QuarkXPress in XPress Tags format. Because these files will be reimported into Quark, I don’t want to mess around with the tags. The idea I came up with is to separate the content of each file into two lists: one for the tags and one for the bare text. Later on, only the text list is processed, then the two lists are coerced back into a string and the file is saved. The script works, but the separation routine slows it down way too much. Here’s the part of the script that does the separation:


-- thetext is the content of the file

		set tag_list to {}
		set non_tag_list to {}
		set str to ""
		set chr to ""
		set tagcheck to true
		set skip_it to false
		set append_it to false
		set lt to length of thetext
		repeat with k from 1 to lt
			if tagcheck = true then
				set chr to (item k of thetext) as string
				set str to str & chr
				set skip_it to true
				if (chr = ">") or (chr = ":") or (k = lt) then
					set tagcheck to false
					if append_it = true then
						set where_tag to (count of tag_list)
						set str1 to item where_tag of tag_list
						set item where_tag of tag_list to str1 & str
						set append_it to false
					else
						set end of tag_list to str
					end if
					set str to ""
				end if
			end if
			if (tagcheck = false) and (skip_it = false) then
				set chr to (item k of thetext) as string
				set str to str & chr
				if (chr = "<") or (chr = "@") or (k = lt) then
					set tagcheck to true
					if length of str > 2 then
						set end of non_tag_list to str
						set str to ""
					else
						set append_it to true
					end if
				end if
			end if
			set skip_it to false
		end repeat

As you can see in this script, I don’t need to process portions of text shorter than 3 characters, so those go into tag_list instead of non_tag_list.
Note that tags begin with “<” and end with “>”, but there are also paragraph style tags, which begin with “@” and sometimes end with “:”, sometimes not. Here’s an example of an xtg file header:

What do you think, is there a way to make it more efficient?
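
One direction that might be worth trying, as a rough sketch rather than a tested fix: in AppleScript, reading and appending list items through a script object property is commonly reported to be much faster than going through plain variables, and getting the characters once avoids coercing item k of thetext on every pass. The handler name split_xtg and the script object o are made up for illustration, and this simplified version only recognises "<...>" tags. The "@style:" paragraph tags and the 3-character rule from the routine above are left out.

```applescript
-- Sketch only: split_xtg and o are hypothetical names, not part of the original script.
-- Assumption: only "<...>" tags are handled; "@style:" tags are not.
on split_xtg(thetext)
	script o
		property chars : {}
		property tag_list : {}
		property non_tag_list : {}
	end script
	-- Get the characters once instead of coercing one item per loop pass
	set o's chars to characters of thetext
	set str to ""
	set in_tag to false
	repeat with k from 1 to (count o's chars)
		set chr to item k of o's chars
		if in_tag then
			set str to str & chr
			if chr is ">" then
				-- Tag is complete: store it and go back to collecting text
				set end of o's tag_list to str
				set str to ""
				set in_tag to false
			end if
		else if chr is "<" then
			-- A tag starts: store the text collected so far, if any
			if str is not "" then set end of o's non_tag_list to str
			set str to chr
			set in_tag to true
		else
			set str to str & chr
		end if
	end repeat
	-- Flush whatever is left after the last delimiter
	if str is not "" then
		if in_tag then
			set end of o's tag_list to str
		else
			set end of o's non_tag_list to str
		end if
	end if
	return {o's tag_list, o's non_tag_list}
end split_xtg

set {tags, texts} to split_xtg("<B>Hello<$> world")
-- tags is {"<B>", "<$>"} and texts is {"Hello", " world"}
```

Note that this keeps the two lists separate just as the routine above does, so putting the file back together still needs some record of the order in which tags and text alternate.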