Hi everybody,
I’m working on a script that makes some changes to text files saved from QuarkXPress in XPress Tags format. Because these files will be reimported into Quark, I don’t want to mess around with the tags. The idea I came up with is to separate the content of each file into two lists: one for the tags and one for the bare text. Later on, only the text list is processed, then the two lists are coerced back into a string and the file is saved. The script works, but the separation routine slows it down way too much. Here’s the part of the script that does the separation:
-- thetext is the content of the file
set tag_list to {}
set non_tag_list to {}
set str to ""
set chr to ""
set tagcheck to true
set skip_it to false
set append_it to false
set lt to length of thetext
repeat with k from 1 to lt
	if tagcheck = true then
		set chr to (item k of thetext) as string
		set str to str & chr
		set skip_it to true
		if (chr = ">") or (chr = ":") or (k = lt) then
			set tagcheck to false
			if append_it = true then
				set where_tag to (count of tag_list)
				set str1 to item where_tag of tag_list
				set item where_tag of tag_list to str1 & str
				set append_it to false
			else
				set end of tag_list to str
			end if
			set str to ""
		end if
	end if
	if (tagcheck = false) and (skip_it = false) then
		set chr to (item k of thetext) as string
		set str to str & chr
		if (chr = "<") or (chr = "@") or (k = lt) then
			set tagcheck to true
			if length of str > 2 then
				set end of non_tag_list to str
				set str to ""
			else
				set append_it to true
			end if
		end if
	end if
	set skip_it to false
end repeat
As you can see in this script, I don’t need text runs shorter than 3 characters, so those are appended to tag_list instead of non_tag_list.
Note that tags begin with “<” and end with “>”, but there are also paragraph style tags, which begin with “@” and sometimes end with “:”, sometimes not. Here’s an example of an xtg file header:
What do you think, is there a way to make it more efficient?
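For what it's worth, here is one direction I have been wondering about, as a rough sketch rather than a finished solution. It assumes (I have not measured this) that the slow parts are indexing into the string on every pass and growing str one character at a time. The sketch splits the text into a character list once, keeps the lists as properties of a script object so they are accessed by reference, and cuts whole chunks out of thetext by position instead of concatenating. The names splitTags, s, char_list, in_tag, start_pos and chunk are made up for this illustration, and the sample string is invented, not taken from a real xtg file.

```applescript
set {tag_list, non_tag_list} to splitTags("<v1.70><e0>@Normal:Hello world<B>bold text<$>!")

on splitTags(thetext)
	-- Lists held as script-object properties are accessed by reference,
	-- which is usually much faster than working on plain local lists.
	script s
		property char_list : characters of thetext
		property tag_list : {}
		property non_tag_list : {}
	end script
	
	set lt to count of s's char_list
	set in_tag to true -- same starting assumption as the original loop
	set append_it to false
	set start_pos to 1 -- first character of the chunk being collected
	
	repeat with k from 1 to lt
		set chr to item k of s's char_list
		if in_tag then
			if (chr is ">") or (chr is ":") or (k = lt) then
				-- Cut the whole chunk out at once instead of growing a string.
				set chunk to text start_pos thru k of thetext
				if append_it then
					set item -1 of s's tag_list to (item -1 of s's tag_list) & chunk
					set append_it to false
				else
					set end of s's tag_list to chunk
				end if
				set start_pos to k + 1
				set in_tag to false
			end if
		else
			if (chr is "<") or (chr is "@") or (k = lt) then
				-- As in the original loop, the delimiter that ends a text run
				-- stays with that run, and runs of 1 or 2 characters are
				-- deferred so they get merged into the tag list.
				if (k - start_pos + 1) > 2 then
					set end of s's non_tag_list to text start_pos thru k of thetext
					set start_pos to k + 1
				else
					set append_it to true
				end if
				set in_tag to true
			end if
		end if
	end repeat
	
	-- Assumption: a short run left over at the very end is appended to the
	-- last tag rather than dropped (the original loop discards it).
	if start_pos <= lt then set item -1 of s's tag_list to (item -1 of s's tag_list) & (text start_pos thru lt of thetext)
	
	return {s's tag_list, s's non_tag_list}
end splitTags
```

The tag and text boundaries are decided exactly as in my loop above; only the bookkeeping changes. Whether this is actually faster on real files is something I would still need to time.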