Trouble reading an HTML file as «class utf8»

I have been racking my brain trying to find a solution to this for days. I have written an AppleScript that loops through a Word mail merge, saves each record as an HTML file, and then reads the HTML file as UTF-8 so it can be used as an Outlook email body.

Where I am getting stuck is reading the created HTML file as UTF-8. I keep getting the following error:

error “Microsoft Word got an error: End of file error.” number -39 from file “Macintosh HD:Users:Tiana:Desktop:BodyFiles:1.html”

I know that if I can read the HTML file correctly as UTF-8 it will work (I’ve tested the script with a pre-prepared HTML file, bypassing the mail merge). Here is my script:

tell application "System Events"
	tell the user domain
		set thertfname to ((the path of the desktop folder) as text) & "BodyFiles:" & theRecord & ".rtf"
		set thehtmlname to ((the path of the desktop folder) as text) & "BodyFiles:" & theRecord & ".html"
	end tell
end tell

-- now do the merge for that one record, save the result and close
set the first record of theDataSource to theRecord
set the last record of theDataSource to theRecord
execute data merge theDataMerge
save as active document file name thertfname file format format rtf
close active document

set b to thertfname
do shell script "textutil -convert html " & b
set theFilePath to thehtmlname

if characters -5 thru -1 of theFilePath as string is ".html" then
	set theFileHandle to (open for access file theFilePath)
	set someData to (read theFileHandle as «class utf8»)
	close access theFileHandle
end if

Cross posted here: http://stackoverflow.com/questions/41864176/applescript-read-html-file-as-utf-8

First, you don’t need System Events to build the paths – just use:

set thertfname to ((path to desktop as text) & "BodyFiles:" & theRecord & ".rtf")
set thehtmlname to ((path to desktop as text) & "BodyFiles:" & theRecord & ".html")

Second, shell scripts require POSIX paths:

do shell script ("textutil -convert html " & quoted form of POSIX path of b)
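
A minimal sketch of that conversion on its own, assuming a hypothetical file 1.rtf in a BodyFiles folder on the desktop. POSIX path of turns the colon-delimited HFS path into a slash-delimited one, and quoted form of protects spaces and other shell-special characters. By default textutil writes the converted file next to the source file, with the .html extension.

```applescript
-- Hypothetical input file; change the name to match your own.
set rtfHFSPath to (path to desktop as text) & "BodyFiles:1.rtf"

-- Convert to a slash-delimited path and quote it for the shell.
set rtfShellPath to quoted form of POSIX path of rtfHFSPath

-- textutil ships with macOS; this should create 1.html beside 1.rtf.
do shell script "textutil -convert html " & rtfShellPath
```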

Hi Shane! Thanks for the tips, I’ve incorporated them into my script. I’m rusty at best with AppleScript, and even with the changes you posted I’m still having trouble on this line:

set someData to (read theFileHandle as «class utf8»)

I’ve never had a problem creating the HTML file, but I can’t get past the reading step.

What error are you getting?

error “Microsoft Word got an error: End of file error.” number -39 from file “Macintosh HD:Users:Tiana:Desktop:BodyFiles:1.html”

You shouldn’t be doing this inside a Word tell block. And you should check that the file isn’t empty.
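
One way to check for an empty file before reading, as a sketch run outside any application tell block. The path is hypothetical. Error -39 is what read reports when there is nothing left to read, which happens with an empty file and, as far as I know, can also happen when an earlier failed run left the file open with its read position at the end. Reading from 1 guards against the second case.

```applescript
-- Hypothetical path; adjust to your own file.
set theFilePath to (path to desktop as text) & "BodyFiles:1.html"

set theFileHandle to open for access file theFilePath
try
	-- get eof returns the file length in bytes.
	set fileSize to (get eof theFileHandle)
	if fileSize is 0 then
		-- Nothing to read; reading now would raise error -39.
		set someData to ""
	else
		-- "from 1" resets the read position to the start of the file.
		set someData to read theFileHandle from 1 as «class utf8»
	end if
	close access theFileHandle
on error errMsg number errNum
	-- Always close the file so the next run starts clean.
	close access theFileHandle
	error errMsg number errNum
end try
```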

I’ve tried to take it out, but I think it won’t let me because it’s in a repeat block? Either way, I took the part of the script that reads the HTML file (which isn’t empty) and tried it separately:

set thehtmlname to ((path to desktop as text) & "BodyFiles:" & "1.html")

set theFilePath to the POSIX path of thehtmlname

if characters -5 thru -1 of theFilePath as string is ".html" then
	set theFileHandle to (open for access file theFilePath)
	set someData to (read theFileHandle as «class utf8»)
	close access theFileHandle
	set the clipboard to someData
	set theContent to someData
end if

However, I still get errors. When I use the POSIX path of the file I get:

error “Network file permission error.” number -5000 from file “/Users/Tiana/Desktop/BodyFiles/1.html” to «class fsrf»

When I use the HFS path as originally set, I get:

error “End of file error.” number -39 from file “Macintosh HD:Users:Tiana:Desktop:BodyFiles:1.html”

Am I setting up the script incorrectly?

Either rewrite the block, or put the file reading stuff in its own “tell current application” block.

You need an HFS path there, not a POSIX path.

Not entirely. The error is thrown when you use a file specifier followed by a POSIX path. You can use POSIX paths, but you need to pass the path as a string, not as a file specifier.


-- pass the path as a string containing a POSIX path
set theFileHandle to (open for access theFilePath)
-- or pass the path as a POSIX file specifier
set theFileHandle to (open for access POSIX file theFilePath)

My point probably got lost in translation. I meant there, as in the OP’s statement as it was constructed. And given that he’s having trouble doing it outside a Word tell block, a bare POSIX path might still be problematic.
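
To make the two path styles concrete, here is a small sketch, run outside any application tell block and using a hypothetical file name. It reads the same file once through an HFS path and once through a POSIX path. The read command accepts a file specifier directly, so open for access is not strictly needed for a one-shot read.

```applescript
-- Hypothetical file; adjust to your setup.
set hfsPath to (path to desktop as text) & "BodyFiles:1.html"
set posixPath to POSIX path of hfsPath

-- HFS path (colons): use the "file" specifier.
set dataFromHFS to read file hfsPath as «class utf8»

-- POSIX path (slashes): build a "POSIX file" specifier first.
set posixSpecifier to POSIX file posixPath
set dataFromPOSIX to read posixSpecifier as «class utf8»
```

Mixing the two, for example file followed by a slash-delimited path, is the combination that produced error -5000 above.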

I appreciate the help from both of you! However, no matter what I try, it keeps halting at that reading step. I’ll just post my code below in case there’s an error elsewhere that’s preventing it:



display dialog ¬
	"Select OK if you have already prepared a Mail Merge Word Document linked to a data source." with title ¬
	"Personalised Email Pre-requisite" with icon caution
set thertf to (choose file with prompt "Select the Pre Prepared E-mail Body in Mail Merge Word Template Format.")



set subjectDialog to display dialog ¬
	"Enter the subject of the email to send:" default answer "Email Subject"
set theSubject to text returned of subjectDialog

tell application "Microsoft Word"
	open thertf
end tell
tell application "Microsoft Word"
	activate
	
	set theMMMD to the active document
	set theDataMerge to the data merge of theMMMD
	
	set bContinue to false
	set theState to the state of theDataMerge

	if bContinue then
		
		set the destination of theDataMerge to send to new document
		
		
		
		set theDataSource to the data source of theDataMerge
		set theRecord to 1
		set the active record of theDataSource to theRecord
		
		--return the active record of theDataSource
		repeat until the active record of theDataSource is not equal to theRecord
			
			tell application "System Events"
				tell the user domain
					set thertfname to ((the path of the desktop folder) as text) & "BodyFiles:" & theRecord & ".rtf"
					set thehtmlname to ((the path of the desktop folder) as text) & "BodyFiles:" & theRecord & ".html"
				end tell
			end tell
			-- now do the merge for that one record, save the result and close
			set the first record of theDataSource to theRecord
			set the last record of theDataSource to theRecord
			execute data merge theDataMerge
			save as active document file name thertfname file format format rtf
			close active document
			
			set b to thertfname
			do shell script "textutil -convert html " & b
			set theFilePath to thehtmlname
			
			if characters -5 thru -1 of theFilePath as string is ".html" then
				set theFileHandle to (open for access POSIX file theFilePath)
				set someData to (read theFileHandle as «class utf8»)
				close access theFileHandle
			end if
			set the clipboard to someData
			
			set messageText to someData
			tell application "Microsoft Outlook"
				
				set theContent to messageText
	
				set the messageBody to theContent
				set newMessage to make new outgoing message ¬
					with properties {subject:theSubject, content:theContent}

				tell newMessage

					make new to recipient with properties {email address:{name:theName, address:theEmail}}
				end tell

				send newMessage
				
			end tell
			
			set theRecord to theRecord + 1
			set the active record of theDataSource to theRecord
		end repeat

	end if
	
end tell

You still have the do shell script and open for access commands inside a Word tell block. You should get them out of there, as Shane suggested earlier. You can wrap a tell current application block around them or move them to a handler.
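
The handler option could look roughly like this. It is a sketch only: the handler name convertAndRead and the two paths are made up for illustration, and the Word part is reduced to a placeholder comment. Because the handler body is not inside the Word tell block, do shell script and read should be handled by the script itself rather than being sent to Word.

```applescript
-- Hypothetical helper: converts the RTF to HTML, then returns the HTML text.
on convertAndRead(rtfHFSPath, htmlHFSPath)
	do shell script "textutil -convert html " & quoted form of POSIX path of rtfHFSPath
	return read file htmlHFSPath as «class utf8»
end convertAndRead

-- Hypothetical paths for illustration.
set thertfname to (path to desktop as text) & "BodyFiles:1.rtf"
set thehtmlname to (path to desktop as text) & "BodyFiles:1.html"

tell application "Microsoft Word"
	-- ... merge, save as RTF, and close the document here ...
	-- "my" directs the call to the script's own handler, not to Word.
	set someData to my convertAndRead(thertfname, thehtmlname)
end tell
```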

It seems that the end if closing the if characters -5 thru -1 of theFilePath as string is “.html” then test needs to be moved to just before the end repeat.

If the if test returns false, the variable someData will not be defined.
If that happens on the first pass through the loop, you will get an error.
If it happens later, someData will still contain what was extracted from the preceding file, which is probably not what you want.
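
A small sketch of one way to guard against that, using a hypothetical path and run outside any application tell block. someData gets a known value before the test, so a failed test can be detected instead of silently reusing old data. The ends with operator is used here in place of characters -5 thru -1 ... as string, which depends on the current text item delimiters.

```applescript
-- Hypothetical path for illustration.
set theFilePath to (path to desktop as text) & "BodyFiles:1.html"

-- Give someData a known value before the test.
set someData to missing value

-- "ends with" avoids coercing a list of characters back to a string.
if theFilePath ends with ".html" then
	set someData to read file theFilePath as «class utf8»
end if

-- Stop with a clear message rather than sending stale or missing content.
if someData is missing value then
	error "No HTML was read from " & theFilePath
end if
```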

Below is a modified version that takes care of the above and moves do shell script and open for access out of the tell application "Microsoft Word" block.



display dialog ¬
	"Select OK if you have already prepared a Mail Merge Word Document linked to a data source." with title ¬
	"Personalised Email Pre-requisite" with icon caution
set thertf to (choose file with prompt "Select the Pre Prepared E-mail Body in Mail Merge Word Template Format.")



set subjectDialog to display dialog ¬
	"Enter the subject of the email to send:" default answer "Email Subject"
set theSubject to text returned of subjectDialog
set pathToDesktop to path to desktop as text
tell application "Microsoft Word"
	open thertf
	--end tell
	--tell application "Microsoft Word"
	activate
	
	set theMMMD to the active document
	set theDataMerge to the data merge of theMMMD
	
	set bContinue to false
	set theState to the state of theDataMerge
	
	if bContinue then
		
		set the destination of theDataMerge to send to new document
		
		
		
		set theDataSource to the data source of theDataMerge
		set theRecord to 1
		set the active record of theDataSource to theRecord
		
		--return the active record of theDataSource
		repeat until the active record of theDataSource is not equal to theRecord
			
			set thertfname to pathToDesktop & "BodyFiles:" & theRecord & ".rtf"
			set thehtmlname to pathToDesktop & "BodyFiles:" & theRecord & ".html"
			
			-- now do the merge for that one record, save the result and close
			set the first record of theDataSource to theRecord
			set the last record of theDataSource to theRecord
			execute data merge theDataMerge
			save as active document file name thertfname file format format rtf
			close active document
			
			set b to thertfname
			tell current application
				do shell script "textutil -convert html " & quoted form of POSIX path of b
				set theFilePath to thehtmlname
				
				if characters -5 thru -1 of theFilePath as string is ".html" then
					set theFileHandle to (open for access POSIX file theFilePath)
					set someData to (read theFileHandle as «class utf8»)
					close access theFileHandle
					--end if
					set the clipboard to someData # what need for that ?
					
					set messageText to someData
				
					tell application "Microsoft Outlook"
						
						set theContent to messageText
						
						set the messageBody to theContent # what need for that ?
						set newMessage to make new outgoing message ¬
							with properties {subject:theSubject, content:theContent}
						
						tell newMessage
							
							make new to recipient with properties {email address:{name:theName, address:theEmail}}
						end tell
						
						send newMessage
						
					end tell # "Microsoft Outlook"
			
					set theRecord to theRecord + 1
					set the active record of theDataSource to theRecord

				end if # characters -5 thru -1
			end tell # current application
		end repeat
		
	end if
	
end tell

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mardi 31 janvier 2017 11:28:47

AMAZING. Thank you so much!! I just had to make one little change to this line " set theFileHandle to (open for access POSIX file theFilePath)" - changed to " set theFileHandle to (open for access theFilePath)"

Here is the working code for reference:

display dialog ¬
	"Select OK if you have already prepared a Mail Merge Word Document linked to a data source." with title ¬
	"Personalised Email Pre-requisite" with icon caution
set thertf to (choose file with prompt "Select the Pre Prepared E-mail Body in Mail Merge Word Template Format.")



set subjectDialog to display dialog ¬
	"Enter the subject of the email to send:" default answer "Email Subject"
set theSubject to text returned of subjectDialog
set pathToDesktop to path to desktop as text
tell application "Microsoft Word"
	open thertf
	--end tell
	--tell application "Microsoft Word"
	activate
	
	set theMMMD to the active document
	set theDataMerge to the data merge of theMMMD
	
	set bContinue to false
	set theState to the state of theDataMerge
	
	
	set the destination of theDataMerge to send to new document
	
	
	
	set theDataSource to the data source of theDataMerge
	set theRecord to 1
	set the active record of theDataSource to theRecord
	
	--return the active record of theDataSource
	repeat until the active record of theDataSource is not equal to theRecord
		
		set thertfname to pathToDesktop & "BodyFiles:" & theRecord & ".rtf"
		set thehtmlname to pathToDesktop & "BodyFiles:" & theRecord & ".html"
		set theName to the data merge data field value of data merge data field "First_Name" of theDataSource
		set theEmail to the data merge data field value of data merge data field "Recipient_Email" of theDataSource
		-- now do the merge for that one record, save the result and close
		set the first record of theDataSource to theRecord
		set the last record of theDataSource to theRecord
		execute data merge theDataMerge
		save as active document file name thertfname file format format rtf
		close active document
		
		set b to thertfname
		tell current application
			do shell script "textutil -convert html " & quoted form of POSIX path of b
			set theFilePath to thehtmlname
			
			if characters -5 thru -1 of theFilePath as string is ".html" then
				set theFileHandle to (open for access theFilePath)
				set someData to (read theFileHandle as «class utf8»)
				close access theFileHandle
				--end if
				set the clipboard to someData # what need for that ?
				
				set messageText to someData
				
				tell application "Microsoft Outlook"
					
					set theContent to messageText
					
					set the messageBody to theContent # what need for that ?
					set newMessage to make new outgoing message ¬
						with properties {subject:theSubject, content:theContent}
					
					tell newMessage
						
						make new to recipient with properties {email address:{name:theName, address:theEmail}}
					end tell
					
					send newMessage
					
				end tell # "Microsoft Outlook"
				
				set theRecord to theRecord + 1
				set the «class 2692» of theDataSource to theRecord
				
			end if # characters -5 thru -1
		end tell # current application
	end repeat
	
	
end tell

Thanks for the feedback.

Just some questions.

What’s the need for the instruction : set bContinue to false ?

What’s the need for the test upon the extension “.html” as the script builds the pathname with it : set thehtmlname to pathToDesktop & “BodyFiles:” & theRecord & “.html” ?

As I own neither Outlook nor Word, I didn’t notice that, after I moved the end tell # current application line almost to the end of the script, one Word instruction got mangled.
set the active record of theDataSource to theRecord
becomes:
set the «class 2692» of theDataSource to theRecord

I assume that this form doesn’t prevent the instruction from working, but it may be better to get rid of it.

To do that, we must replace the single tell current application block with two of them.

tell current application
	do shell script "textutil -convert html " & quoted form of POSIX path of b
end tell # current application
tell current application
	set theFileHandle to (open for access theFilePath)
	set someData to (read theFileHandle as «class utf8»)
	close access theFileHandle
end tell # current application

Of course, if you apply these changes, don’t forget to remove the end tell # current application instruction near the end of the script.

And if you decide to remove the if characters -5 thru -1 of theFilePath as string is “.html” then test, which seems to be useless, a single tell current application block is enough.

tell current application
	do shell script "textutil -convert html " & quoted form of POSIX path of b
	set theFilePath to thehtmlname
			
	set theFileHandle to (open for access theFilePath)
	set someData to (read theFileHandle as «class utf8»)
	close access theFileHandle
end tell # current application

You could do some additional cleanup if you look carefully at your variable names.
Why two names for the HTML path (thehtmlname, theFilePath)?
Why two names for the RTF path (thertfname, b)?

Why four names for the extracted text (someData, messageText, theContent, messageBody)?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mercredi 1 février 2017 16:18:00