Script to split a Word Doc into smaller files according to outline?

Hi

I’m aware that I’ve just asked a similar question about splitting RTF/txt files. This time I have a slightly different scenario: a long Word document with a three-level outline followed by body text.

I am wondering if it would be possible to write a script to split this into folders and files according to the outline as follows:

Level 1, Level 2 and Level 3 become a folder hierarchy, with each folder named after the text of its heading. Then each section of body text under Level 3 (separated by bullet points) becomes an RTF or txt file in the folder for Level 3.

I would also want to give some thought to the naming of the RTF files, perhaps by adding another level (Level 4) to hold the file name, or by specifying a delimiter in each bullet point of the body text under Level 3 that separates the name of the file from its content.
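The delimiter idea can be pictured as a small handler that cuts a bullet's text at the first occurrence of a marker. This is only a sketch in plain AppleScript: the handler name `splitNameAndBody` and the " :: " delimiter are made up for illustration, and it assumes the file name comes before the delimiter.

```applescript
-- Hypothetical helper: split "Name :: content" into {name, content}.
-- The " :: " delimiter and the handler name are assumptions, not part of any application's dictionary.
on splitNameAndBody(bulletText, delim)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set theParts to text items of bulletText
	if (count of theParts) < 2 then
		-- No delimiter found: return an empty name and the whole text as content
		set AppleScript's text item delimiters to savedDelims
		return {"", bulletText}
	end if
	set fileName to item 1 of theParts
	-- Rejoin the remainder so any later delimiters stay in the content
	set fileBody to (items 2 thru -1 of theParts) as text
	set AppleScript's text item delimiters to savedDelims
	return {fileName, fileBody}
end splitNameAndBody

splitNameAndBody("Transference :: The patient repeats old patterns.", " :: ")
-- returns {"Transference", "The patient repeats old patterns."}
```

Only the first delimiter counts as the split point, so the content itself may contain the marker again without being cut short.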

Again, I wonder if such a script is possible?

Thanks

Nick

Just to say that I am also interested in a method of achieving the same result using TextEdit, as I’d rather not use Microsoft Word if possible.

Of course TextEdit doesn’t have an outline view, so I would have to use some other method of knowing where to split the document.

I imagine that the simplest method would be with styles. So I have one style for Parts of the Book, one for Chapters and one for Sections. These would then become folders in Finder, or ideally groups in DEVONthink (ideally I would bypass Finder and create groups and RTFs directly in DEVONthink).
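For the Finder variant, the Part/Chapter/Section folders could be created in one step with the shell's `mkdir -p`. The sketch below is an assumption-laden illustration rather than a finished solution: the handler names `makeFolderPath` and `replaceText` and the example heading names are hypothetical, and it does not touch DEVONthink at all.

```applescript
-- Hypothetical helper: create nested folders for the given outline levels
-- under a base folder and return the resulting POSIX path.
on makeFolderPath(basePath, levelNames)
	set fullPath to basePath
	repeat with levelName in levelNames
		-- A "/" inside a heading would be read as a path separator, so swap it out
		set fullPath to fullPath & "/" & my replaceText(levelName as text, "/", "-")
	end repeat
	-- mkdir -p creates every missing folder and does nothing if they already exist
	do shell script "mkdir -p " & quoted form of fullPath
	return fullPath
end makeFolderPath

-- Replace every occurrence of findText in theText using text item delimiters
on replaceText(theText, findText, replaceWith)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set theParts to text items of theText
	set AppleScript's text item delimiters to replaceWith
	set newText to theParts as text
	set AppleScript's text item delimiters to savedDelims
	return newText
end replaceText

-- Example with made-up heading names, created under the temporary items folder
set baseFolder to POSIX path of (path to temporary items)
makeFolderPath(baseFolder & "book-notes-demo", {"Part One", "Chapter 1", "Section A/B"})
```

Using `quoted form of` keeps spaces and quotes in heading text from breaking the shell command.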

Each paragraph under a heading in the Sections style would then become an RTF in the group/folder for that section, with some text appended to the end of the paragraph (separated from the rest by a delimiter) supplying the file name of the RTF.

I hope that’s clear. I would be interested in any ideas about how best to achieve this.

Thanks,

Nick.

Just to say that I have managed to put together a script that pretty much does what I want it to do, using just DEVONthink, and various text delimiters:

[code]tell application "DEVONthink Pro"
set theseItems to the selection
repeat with thisItem in theseItems
	set bookName to (name of thisItem as string)
	set authorName to texts 1 thru ((offset of "-" in bookName) - 2) of bookName
	set titleName to texts ((offset of "-" in bookName) + 2) thru -1 of bookName
	set authorLoc to create location "Sync/psychotherapy/book notes/" & authorName in database "nick"
	set titleLoc to create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName in database "nick"

	set theText to text of window 1
	set sectionNo to 1
	set thisPart to ""
	set thisChapter to ""
	activate
	repeat with j from 1 to (count paragraphs in theText)
		set theParagraph to paragraph j of theText
		set paraText to text of theParagraph
		set thisTag to ""
		if theParagraph begins with "*** " then
			set thisPart to texts 5 thru -1 of paraText
			create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisPart in database "nick"
		else if theParagraph begins with "** " then
			set sectionNo to 1
			set thisChapter to texts 4 thru -1 of paraText
			create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisPart & "/" & thisChapter in database "nick"
		else if theParagraph begins with "* " then
			if thisPart is "" and thisChapter is "" then
				set thisSection to sectionNo & ". " & texts 3 thru -1 of paraText
				set x to create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisSection in database "nick"
				set sectionNo to (sectionNo + 1)
			else
				set thisSection to sectionNo & ". " & texts 3 thru -1 of paraText
				set x to create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisPart & "/" & thisChapter & "/" & thisSection in database "nick"
				set sectionNo to (sectionNo + 1)
			end if
		else if theParagraph is not "" then
			if paraText contains "~" then
				set thisName to texts ((offset of "~" in paraText) + 1) thru ((offset of "*" in paraText) - 2) of paraText
			else
				set thisName to texts 1 thru ((offset of ":" in paraText) - 1) of paraText
			end if
			if paraText starts with "0" then
				set thisText to texts ((offset of ":" in paraText) + 2) thru ((offset of "*" in paraText) - 2) of paraText & return & return & texts (offset of "{" in bookName) thru ((offset of "}" in bookName) - 1) of bookName & "@" & texts ((offset of "/" in paraText) + 1) thru ((offset of ":" in paraText) - 1) of paraText & "}"
			else if texts 1 thru 4 of paraText contains "/" then
				set thisText to texts ((offset of ":" in paraText) + 2) thru ((offset of "*" in paraText) - 2) of paraText & return & return & texts (offset of "{" in bookName) thru ((offset of "}" in bookName) - 1) of bookName & "@" & texts 1 thru ((offset of "/" in paraText) - 1) of paraText & "}"
			else
				set thisText to texts ((offset of ":" in paraText) + 2) thru ((offset of "*" in paraText) - 2) of paraText & return & return & texts (offset of "{" in bookName) thru ((offset of "}" in bookName) - 1) of bookName & "@" & texts 1 thru ((offset of ":" in paraText) - 1) of paraText & "}"
			end if
			if paraText contains "*" then
				set thisTag to texts ((offset of "*" in paraText) + 1) thru -1 of paraText
			end if
			set thisPage to texts 1 thru ((offset of ":" in paraText) - 1) of paraText
			create record with {name:thisName, type:rtf, plain text:thisText, comment:thisPage, tags:thisTag} in x
		end if
		
	end repeat
	set y to create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName in database "nick"
	move record thisItem to y
	set name of thisItem to "Full Text"
end repeat

end tell[/code]
It processes a book’s worth of notes in just a couple of seconds, which is pleasing, and it adds tags to the new records as well. It creates a group hierarchy in DT that looks like this:

http://emberapp.com/nickharambee/images/devonthink-pro-2/sizes/m.png

If anyone would like to do something similar and wants to know more about the delimiters/format I am using then I’d be happy to let you know.

There are a few things that I haven’t worked out how to do yet though and I wonder if someone could help me out:

1. How to specify whether groups created are excluded from tagging or not.
2. How to refer to a single record. I will only be processing one file/record at a time, but I couldn’t work out how to refer to just one record, so I have wrapped the script in a “set theseItems to the selection / repeat with thisItem in theseItems” loop, as this is a construct I am familiar with. Similarly, to get the text of the current document I have used “set theText to text of window 1”, when perhaps there is a way of referring to the text of the document rather than the window.
3. I am wondering if it is possible to determine the type of paragraph by font attributes rather than characters, e.g. “if font of theParagraph is bold then”. When I tried this I got an error message stating something like “can’t get font of the paragraph”.
4. The script sometimes returns an error when it tries to rename the original file after it has been moved (see end of script): error "DEVONthink Pro got an error: Can’t set content id 100608 of database id 1 to \"Full Text\"." number -10006 from content id 100608 of database id 1

In time I want to adapt the script so that it can work out from the notes how many levels there are in the hierarchy for a particular book, so that the one script will work well with any book structure.
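One possible starting point, given the "*** ", "** " and "* " markers the script already relies on, is to scan the notes for the longest run of leading asterisks. This is a rough sketch in plain AppleScript; the handler names `markerDepth` and `hierarchyDepth` are hypothetical, and it assumes a heading is always a run of asterisks followed by a space at the very start of the paragraph.

```applescript
-- Hypothetical helper: count the leading asterisks of a heading paragraph such as "** Chapter".
-- Returns 0 for paragraphs that are not headings (no asterisks, or no space after them).
on markerDepth(paraText)
	set depth to 0
	repeat with i from 1 to (count of characters of paraText)
		if character i of paraText is "*" then
			set depth to depth + 1
		else
			if character i of paraText is " " then return depth
			return 0
		end if
	end repeat
	return 0
end markerDepth

-- Longest heading marker found anywhere in the notes,
-- which under the assumption above equals the number of levels in use
on hierarchyDepth(notesText)
	set maxDepth to 0
	repeat with para in paragraphs of notesText
		set d to markerDepth(para as text)
		if d > maxDepth then set maxDepth to d
	end repeat
	return maxDepth
end hierarchyDepth

hierarchyDepth("*** Part One" & linefeed & "** Chapter 1" & linefeed & "* Section" & linefeed & "12: some note")
-- returns 3
```

If these handlers are called from inside a `tell application` block, the calls need a `my` prefix (for example `my hierarchyDepth(theText)`).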

Nick