The general problem is to “chunk” a TextEdit document into new documents containing identified portions of the original. The original RTF document was exported as one long file by an app that inserts markers into it. I can find those markers with a “contains” filter and so determine the range of paragraphs that makes up each “chunk”: those between consecutive markers. I’m not using text item delimiters because they don’t apply to rich text.
The specific problem: given a formatted line of rich text with some words bold, italic, or colored, and with font changes, how do I create a new TextEdit document that preserves that formatting for the chunk? This script shows what’s there:
tell application "TextEdit"
open (choose file of type "public.rtf" without invisibles) -- grab the doc.
set AR to {attribute runs, properties of attribute runs} of text of document 1
end tell
It returns a list of two lists. The first holds the text of each attribute run; the second holds the formatting properties of the corresponding run, item for item.
Now how do I get that into a new doc? If it were possible to select the paragraphs I wanted, I could GUI-script copying them, but it’s not.
EDIT: I see that hhas’ TextCommands can deal with formatting. I’ll pursue that unless someone knows a more straightforward method that doesn’t require the special treatment. This isn’t for me.
The only thing I’m having a problem with is the colour.
I notice the colour is always the same in the properties, regardless of the actual word’s colour.
EDIT: OK, that was me not spotting a typo, so colour works as well.
So now I need to work out placement, i.e. middle, left, right.
tell application "TextEdit"
set theFile to open (choose file of type "public.rtf" without invisibles) -- grab the doc.
set dname to name of document 1
set a to paragraphs of text of document dname
repeat with i from 1 to number of items in a
set this_itemP to item i of a
if this_itemP contains "yourword" then
-- Put the matching paragraph's plain text into a new document.
set newDoc to (make new document at beginning of documents)
set the text of newDoc to this_itemP
-- Copy the formatting across word by word. j counts words within paragraph i,
-- so the attributes come from the matching paragraph, not from the start of the document.
repeat with j from 1 to number of words in this_itemP
set ARfont to font of word j of paragraph i of text of document dname
set ARsize to size of word j of paragraph i of text of document dname
set ARcolor to color of word j of paragraph i of text of document dname
set font of word j of text of newDoc to ARfont
set size of word j of text of newDoc to ARsize
set color of word j of text of newDoc to ARcolor
end repeat
end if
end repeat
end tell
Everything is possible, Adam, but sometimes it’s quite tricky.
I would read the raw RTF text with AppleScript’s read command, extract the header, which contains all the color and font definitions, make your selections, and write the files back to disk, each with the header and its selected text.
Maybe you have Hanaan Rosenthal’s AppleScript guide; one chapter is a tutorial on RTF syntax.
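As a rough illustration of that idea, here is a minimal sketch rather than a tested solution. It assumes that the header (font and color tables) ends just before the first \pard control word, which is typical of TextEdit's output but not guaranteed for other exporters. makeChunkRTF is a hypothetical helper name, and chunkBody stands for a piece of RTF body text you have already selected.

```applescript
-- Hypothetical helper: glue the RTF header onto a selected piece of RTF body text.
-- Assumption: everything before the first "\pard" is header (font and color tables).
on makeChunkRTF(rtfText, chunkBody)
	set headerEnd to (offset of "\\pard" in rtfText) - 1
	if headerEnd < 1 then error "No \\pard found, so the header could not be located."
	set rtfHeader to text 1 thru headerEnd of rtfText
	-- The header leaves the outermost "{" open, so close it after the body.
	return rtfHeader & chunkBody & "}"
end makeChunkRTF

-- A tiny made-up RTF string standing in for the result of AppleScript's read command.
set sampleRTF to "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}{\\colortbl;\\red255\\green0\\blue0;}\\pard\\f0\\fs24 First \\cf1 Second}"
set newRTF to makeChunkRTF(sampleRTF, "\\f0\\fs24\\cf1 Second")
```

newRTF could then be written to a new .rtf file with open for access and write. Note that this sketch does not carry over formatting that was switched on before the chunk started; the body text has to restate it.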
tell application "TextEdit"
set theFile to open (choose file of type "public.rtf" without invisibles) -- grab the doc.
set dname to name of document 1
set a to paragraphs of text of document dname
-- Find the chunk's first and last paragraphs. The last match for each marker wins.
repeat with i from 1 to number of items in a
set this_itemP to item i of a
if this_itemP contains "Textedit" then
set this_start to i
else if this_itemP contains "open" then
set this_end to i
end if
end repeat
set chunk to (paragraphs this_start thru this_end of text of document dname) as string
-- Count the words before the chunk so that word j of the chunk can be matched to the
-- same word in the original. This assumes AppleScript and TextEdit count words alike.
set wordOffset to 0
if this_start > 1 then
set preText to (paragraphs 1 thru (this_start - 1) of text of document dname) as string
set wordOffset to number of words in preText
end if
set newDoc to (make new document at beginning of documents)
set the text of newDoc to chunk
repeat with j from 1 to number of words in chunk
set k to wordOffset + j
set ARfont to font of word k of text of document dname
set ARsize to size of word k of text of document dname
set ARcolor to color of word k of text of document dname
set font of word j of text of newDoc to ARfont
set size of word j of text of newDoc to ARsize
set color of word j of text of newDoc to ARcolor
end repeat
end tell
That was a very good idea.
Although my other script seems to work well, it could be very slow.
This is very close…
It handles all the fonts, colours, and sizes, but does not seem to keep the paragraphs in order.
set thetext to do shell script "cat /Users/username/Untitled.rtf" as string
(*look for header *)
set a to paragraphs of thetext
set this_Header_start to 1
repeat with i from 1 to number of items in a
set this_itemP to item i of a
if this_itemP contains "pardirnatural" then
set this_Header_end to i
exit repeat
end if
end repeat
(*look for text from and to *)
repeat with i from 1 to number of items in a
set this_itemP to item i of a
if this_itemP contains "tell" then
set this_start to i
exit repeat
end if
end repeat
repeat with i from 1 to number of items in a
set this_itemP to item i of a
if this_itemP contains "the doc" then
set this_end to i
exit repeat
end if
end repeat
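(* put it together and write it out *)
-- Note: header and chunk are lists of paragraphs. Concatenating them with a string joins
-- the items using the current text item delimiters, so by default no line breaks are kept.
set header to (paragraphs this_Header_start thru this_Header_end of thetext)
set chunk to (paragraphs this_start thru this_end of thetext) --as string
do shell script "echo " & "\"" & header & "\"" & return & " > text.rtf"
do shell script "echo " & "\"" & chunk & "\"}" & " >> text.rtf"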
EDIT: In the header search, change the test from contains "pardirnatural" to contains "pard".
Would it work for you to delete the text that’s not in a chunk of interest, save the document under a different name, and then reopen the original to get the next chunk? It’s possible on my Jaguar machine, but I suspect the scripting may be better in Tiger.
You can select the text and use the TextEdit service “New Window Containing Selection” from the Services menu (mine has a hot key, but I can’t remember whether that’s standard).
You can use this in most apps but, more importantly, you can use it in TextEdit on the RTF doc.
tell application "TextEdit" to set text of document 1 to paragraphs i thru j of document 2
However, Cocoa Scripting’s standard Text Suite implementation blows chunks (as I’m sure you already know), so in practice all you get is an error.
As a workaround, in theory you could copy the text into a new document, then delete the portions you don’t want:
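tell application "TextEdit"
set text of document 1 to text of document 2
-- Trim the copy (document 1), keeping paragraphs i thru j.
-- Delete from the end first so that the earlier paragraph indexes stay valid.
delete paragraphs (j + 1) thru -1 of document 1
delete paragraphs 1 thru (i - 1) of document 1
end tell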
But, once again, Cocoa Scripting’s standard Text Suite implementation blows chunks (feel free to file bugs on that POS), and in practice appears to have O(n*n) efficiency when deleting text, so quickly grinds to a halt as document size increases.
Note that TextCommands doesn’t do RTF. Its ‘format’ command is for getting string representations of AppleScript values.
Regarding Hanaan Rosenthal’s book: note that while the first edition did cover this, the second doesn’t.
If you want to edit the RTF data manually, you can find the RTF specs online easily enough. How practical this is will depend on the clarity of the data and the complexity of the changes.
Other options would be to use a non-Cocoa Scripting-based rich text editor, e.g. Word or Tex-Edit Pro may be suitable, or use another language that provides RTF libraries or RTF-aware rich text classes, e.g. you could knock together a simple command line tool using NSAttributedString with the AppKit additions.
Thank you, hhas; it’s good to know that my ineptness at transferring easily delineated paragraphs from an export of an unscriptable database app (which insists on dumping the whole thing as one document) into a set of new, separate documents is not entirely my own fault. I’ve discovered that attribute runs and properties of attribute runs contain all the data required for reformatting a new document, but that one must then alter the format of the new text word by word, because making a new attribute run doesn’t seem to be possible (at least I’ve never discovered the language for doing it).
It’ll be nice when AppleScript supports RTF better than it does now, as more and more apps seem to be using it as their native text display.
A variation on the RTF-editing approach is to strip the visible text from the bits outside the current chunk, but to leave the RTF formatting tags in place. That way, if the chunk starts in the middle of an attribute run, the relevant tags will be in force at the point where the chunk starts in the edited document. If the RTF text is loaded into TextEdit and resaved from there, any superfluous tags will be removed automatically.
The script below assumes you have the original document open in TextEdit’s front window and already know the paragraph ranges of the three chunks. The chunk files are saved to the same folder as the original. With my 44KB test document, most of the running time is taken up with the opening and resaving of the three chunk files at the very end of the getChunks() handler. Tested in Jaguar but not (yet) in Tiger.
-- Supervise the extraction of three chunks from an RTF document into new TextEdit files and documents.
-- docPath is TextEdit's POSIX path to the file of its front document.
-- rangeLists is a list of three two-integer lists, the integers representing paragraph numbers.
on getChunks(docPath, rangeLists)
set origPath to docPath as POSIX file as Unicode text
set rtf to (read file origPath)
set newFiles to {}
set astid to AppleScript's text item delimiters
set rtfLF to "\\" & (ASCII character 10) -- RTF line feed.
if (rtf does not contain rtfLF) then set rtfLF to "\\" & return
set AppleScript's text item delimiters to rtfLF
set paragraphCount to (count rtf's text items)
repeat with chunk from 1 to 3
set {i, j} to item chunk of rangeLists
set AppleScript's text item delimiters to rtfLF
set parts to {text from text item i to text item j of rtf}
if (i > 1) then set beginning of parts to stripTextFromRTF(text 1 thru text item (i - 1) of rtf)
if (j < paragraphCount) then set end of parts to stripTextFromRTF(text from text item (j + 1) to -1 of rtf)
set end of parts to "}"
set AppleScript's text item delimiters to ""
set newRTF to parts as string
set newPath to origPath & " Chunk " & chunk & ".rtf"
set fRef to (open for access file newPath with write permission)
try
set eof fRef to 0
write newRTF to fRef
end try
close access fRef
set end of newFiles to alias newPath
end repeat
set AppleScript's text item delimiters to astid
tell application "TextEdit"
activate
open newFiles
set modified of documents 1 thru 3 to true
save (documents 1 thru 3)
end tell
end getChunks
-- Strip the text from an RTF chunk, leaving the formatting in place.
-- TextEdit will remove any redundant formatting when it opens and resaves the document.
on stripTextFromRTF(rtf)
set skippables to "'uU{}" & (ASCII character 10) & return
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "\\"
script o
property TIs : rtf's text items
end script
considering case
-- Text item 1 is either "", text, or a brace for RTF code.
if (beginning of o's TIs is not "{") then set item 1 of o's TIs to ""
set zapNext to false
repeat with i from 2 to (count o's TIs)
set thisTI to item i of o's TIs
if ((count thisTI) is 0) then -- Textual backslash ("\\"). The following text item will be text too.
set item i of o's TIs to missing value
set zapNext to true
else if (zapNext) then -- Text following a textual backslash.
set item i of o's TIs to missing value
set zapNext to false
else if (character 1 of thisTI is in skippables) then -- Exotic character, textual brace, or line end.
set item i of o's TIs to missing value
else if (thisTI contains " ") and (thisTI does not start with "fcharset") then -- Probably an attribute tag.
set item i of o's TIs to word 1 of thisTI & " "
end if
end repeat
end considering
set rtf to o's TIs's strings as string
set AppleScript's text item delimiters to astid
return rtf
end stripTextFromRTF
-- Assuming you've already worked out that the three "chunks" are paragraphs 1 to 17, 18 to 48, and 49 to 111.
tell application "TextEdit" to set docPath to path of front document
getChunks(docPath, {{1, 17}, {18, 48}, {49, 111}})
I do know the paragraph ranges and can identify the name to be used for each chunk from within the chunk. Tomorrow, I’ll make a fresh start on that so I can try your method on my document.
I’ve discovered that the raw Unicode of the document I’m working with, an export from another program called BookEnds (which is not scriptable), starts off with every font on my machine and doesn’t contain any newline characters. The paragraphs of the text are delimited by \p symbols. With a few mods, however, I might get this running using your method.
Thanks, Nigel. It turns out that most of my difficulties are caused by the “vagaries” of my file, which is an export from BookEnds. If I prepare my own files in TextEdit, even the very crude approach of grabbing the file’s attributes one word at a time and then transferring them to another document works perfectly (not fast, but accurate). In my real case, I’m actually creating the second document as an entry in Journler, but this approach works there too, since its language for dealing with words is the same. I think the BookEnds file will have to have all its paragraph symbols changed back to proper newlines. I’m now convinced that it has to be fixed first.
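Changing the paragraph symbols back to newlines can be sketched with text item delimiters, since this works on the raw RTF text rather than on rich text inside TextEdit. The handler below is only an illustration: replaceText is a hypothetical helper, and the search string \par followed by a space is an assumption about how the export marks paragraph ends, so check the raw file for the marker it actually uses.

```applescript
-- Hypothetical helper: replace every occurrence of searchStr in theText with replaceStr.
on replaceText(theText, searchStr, replaceStr)
	set astid to AppleScript's text item delimiters
	considering case -- RTF control words are case-sensitive
		set AppleScript's text item delimiters to searchStr
		set theItems to text items of theText
		set AppleScript's text item delimiters to replaceStr
		set newText to theItems as string
	end considering
	set AppleScript's text item delimiters to astid
	return newText
end replaceText

-- A backslash followed by a line feed, as in the script earlier in the thread.
set rtfLF to "\\" & (ASCII character 10)
-- Made-up sample. The trailing space in "\par " keeps "\pard" from matching (an assumption).
set sample to "First\\par Second\\pard\\f0 Third"
set fixed to replaceText(sample, "\\par ", rtfLF)
```

Matching on the trailing space is a deliberate but fragile choice: a \par that is followed directly by another control word would be missed and would need its own pass.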
tell application "TextEdit"
set F to {}
set S to {}
set C to {}
tell text of document 1
set tText to it
repeat with k from 1 to count words of it
set F's end to font of word k
set S's end to size of word k
set C's end to color of word k
end repeat
end tell
make new document
set text of document 1 to tText
tell text of document 1
repeat with k from 1 to count words
set font of word k to item k of F
set size of word k to item k of S
set color of word k to item k of C
end repeat
end tell
end tell