Batch Convert - Word Docs to Text Files

Hi,

I wonder if anyone can help me, please. I regularly have to convert a large number of Word documents (.doc) to text files (.txt) by hand - opening them from a specified folder and saving them as text files in another. Is it possible to do this using an AppleScript?

Any help or thoughts would be hugely appreciated. It would save me a lot of time!

Cheers,

MARTIN

I’m not really a Microsoft Office guy (I’m more of an iWork user), but this should do it:

set sourceFolder to choose folder with prompt "Where are the Word files?"
set destFolder to choose folder with prompt "Where do you want to save the text?"

set allFiles to list folder sourceFolder without invisibles

repeat with i in allFiles
	set myfile to (sourceFolder as text) & i
	set destfile to (destFolder as text) & i & ".txt"
	
	if validateWordDocument(myfile) then
		tell application "Microsoft Word"
			open myfile
			set myDocument to active document -- the document that was just opened
			save as myDocument file name destfile file format format Unicode text
			close active document saving no -- after "save as", the active document is the saved text file
		end tell
	end if
end repeat

on validateWordDocument(filename)
	try
		set filename to filename as alias
		local exts
		set exts to {"doc", "docx"}
		set fext to name extension of (info for filename)
		log fext
		return fext is in exts
	on error
		return false
	end try
end validateWordDocument

Hope it works,
ief2

Hello.

Here is my solution, which doesn’t use MS Word at all.
I had already written it when I saw ief2’s post, so I’m submitting it as well.
The initial starting paths for choosing folders can be adjusted. I assumed you will get your Word documents from within your Documents folder and want your converted files in a folder on the Desktop.

What happens here:

You choose a folder with doc and/or docx files to convert to text files.

You choose a folder in which the text files will be stored.
The script converts them one by one and stores them with a txt extension in the folder you chose.

-- Converts doc and docx files to text files using utf-8 encoding which is all right for western text.
set {tids, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "."}
set docDir to choose folder with prompt "Choose a folder with some Word files" default location (path to documents folder)
set txtDir to choose folder with prompt "Choose a folder where the converted doc files will be stored.\nYou can make a new folder from here." default location (path to desktop folder)

tell application "Finder"
	set docFiles to (every document file of folder docDir whose (name extension is in {"doc","docx"} )) as alias list
	set txtDir to (POSIX path of txtDir)
	repeat with aDoc in docFiles
		set thisDoc to contents of aDoc
		set the item_path to the quoted form of the POSIX path of thisDoc
		set the item_name to (text items 1 thru -2 of (get name of thisDoc)) as text -- drop only the last extension, so names containing dots survive
		set out_file to quoted form of (txtDir & item_name & ".txt")
		try
			set res to do shell script "textutil -convert txt  -output " & out_file & " " & item_path
		on error e number n
			display dialog "An error occurred during conversion of : " & item_path & ": " & n & e
		end try
	end repeat
end tell

set AppleScript's text item delimiters to tids
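
Splitting a file name on "." needs a little care when the name itself contains dots. Here is a minimal sketch of a standalone handler that drops only the final extension; the handler name stripExtension is hypothetical and not part of the script above:

```applescript
-- Return fileName without its final extension.
-- "my.report.doc" -> "my.report"; names without a dot are returned unchanged.
on stripExtension(fileName)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set nameParts to text items of fileName
	if (count of nameParts) > 1 then
		-- Rejoin everything except the last part, using "." as the separator.
		set baseName to (items 1 thru -2 of nameParts) as text
	else
		set baseName to fileName
	end if
	set AppleScript's text item delimiters to savedDelims
	-- A name such as ".hidden" would otherwise collapse to an empty string.
	if baseName is "" then set baseName to fileName
	return baseName
end stripExtension
```

It restores the text item delimiters before returning, so it can be called from inside a loop without disturbing the rest of a script.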


Best Regards

Mcusr

Hi both,

Thank you so much for your help and your incredibly speedy responses. It’s interesting to see your different solutions.

A huge thanks. Over the course of a year, this really will save me a lot of time!

Cheers,

MARTIN

Hi,

Easiest version: it puts the converted files into the source folder.
textutil changes the name extension automatically.



set sourceFolder to choose folder
tell application "Finder" to set theFiles to files of sourceFolder whose name extension is in {"doc", "docx"} or creator type is "MSWD"
repeat with oneFile in theFiles
	do shell script "textutil -convert txt " & quoted form of POSIX path of (oneFile as text)
end repeat

It’s a beauty, Stefan!

I just added this comment to say that there was a bug in my solution, which is now fixed.
It was a misspelled variable name.

I still get error number −10004 in the Script Editor’s console, and I can’t understand why, because
the files are converted correctly. (The −10004 reads like some sort of privilege violation.)
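
One possible explanation, offered as an assumption rather than a diagnosis: do shell script is a scripting addition command, and when it is called inside a tell application "Finder" block it is first sent to the Finder, which can log −10004 before AppleScript runs the command anyway. A minimal sketch that keeps only the file listing inside the Finder block; the handler names textutilCommand and convertFolder are hypothetical:

```applescript
-- Build the shell command for one file (hypothetical helper).
-- outDir is a POSIX folder path ending in "/"; the shell strips the last extension.
on textutilCommand(inPath, outDir)
	return "f=" & quoted form of inPath & "; b=$(basename \"$f\"); textutil -convert txt -output " & quoted form of outDir & "\"${b%.*}.txt\" \"$f\""
end textutilCommand

on convertFolder(docDir, outDir)
	-- Only the file listing is addressed to the Finder.
	tell application "Finder"
		set docFiles to (every document file of folder docDir whose name extension is in {"doc", "docx"}) as alias list
	end tell
	-- The shell calls run outside the tell block.
	repeat with aDoc in docFiles
		try
			do shell script textutilCommand(POSIX path of (contents of aDoc), outDir)
		on error e number n
			display dialog "An error occurred during conversion: " & n & " " & e
		end try
	end repeat
end convertFolder

-- Example call:
-- convertFolder(choose folder, POSIX path of (choose folder))
```

If the −10004 entry disappears from the console with this layout, the tell block was the cause; if it does not, the assumption above is wrong.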

Best Regards

McUsr

Thanks again for all of your help and contributions. I’m very happy!