UTF-8 Encoding Error with XML File

I’m having a problem with UTF-8 encoding. If I use the below script to make 8 xml files, the first 2 will successfully import into InDesign. The last 6 will give me an error that says

Any ideas why it will correctly encode the first 2 files, but refuse to cooperate after that would be greatly appreciated.
(I’ll need to make between 8 to 24 xml files for each ad)

Writing to XML:


if xml_list is not ("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" & linefeed & linefeed & "<Root>" & linefeed & linefeed & "</Root>") as string then
				
				--Copying the xml file to the chosen folder
				tell application "Finder"
					set xml_copy_file to POSIX file "/Volumes/NAS/Advertising Department/16_SCRIPTS/z_Future Scripts/Print Ad Layout/z_Testing Files/Article1.xml" as alias
					set xml_orig_name to name of xml_copy_file
					duplicate (xml_copy_file as alias) to xml_folder with replacing
					
					
					--Getting the promo number from the Excel file
					set tid to AppleScript's text item delimiters
					set AppleScript's text item delimiters to "_"
					set promo_number to first text item of excel_filename
					set AppleScript's text item delimiters to tid
					
					
					--Renaming the copied xml file to reflect the current page
					set xml_orig_file to xml_folder & xml_orig_name as string as alias
					set xml_file_name to promo_number & "_XML_Page " & (test_page_number as integer) & ".xml" as string
					set name of xml_orig_file to xml_file_name
					set xml_page_file to xml_folder & xml_file_name as string as alias
				end tell
				
				--Opening the xml file
				set myFile to open for access xml_page_file with write permission
				write xml_list to myFile
				close access myFile
			end if

Full Script:

--Global variables
global xml_list
global page_number
global number_list


---------------------------------------------------------------------------------------------------
--PROMPTING THE USER TO SELECT THE FOLDER IN WHICH TO SAVE THE WORD FILES
---------------------------------------------------------------------------------------------------

with timeout of 3000 seconds
	
	--Prompting the user to choose a folder in which to save the XML files
	set xml_folder to choose folder with prompt "Select the folder in which to save the XML files"
	
	--Prompting the user to selet the promotion's Excel file
	tell application "Finder"
		set my_file to choose file with prompt "Choose the Promo's Excel File"
		set excel_filename to name of my_file
	end tell
	
	set excel_report to POSIX path of my_file
	
	
	---------------------------------------------------------------------------------------------------
	--SPECIFYING WHICH EXCEL COLUMNS TO WORK WITH
	---------------------------------------------------------------------------------------------------
	
	set blain_number_column to "A"
	set image_name_column to "B"
	set image_path_column to "D"
	set blain_to_pic_number_column to "F"
	set image_to_pic_name_column to "G"
	set image_to_pic_path_column to "H"
	set page_number_column to "J"
	set vendor_column to "K"
	set description_column to "L"
	set LL_column to "M"
	set notes_column to "N"
	set reg_price_column to "O"
	set sale_price_column to "P"
	set possible_page_number_list to {"1.0", "2.0", "3.0", "4.0", "5.0", "6.0", "7.0", "8.0", "9.0", "10.0", "11.0", "12.0", "13.0", "14.0", "15.0", "16.0", "17.0", "18.0", "19.0", "20.0", "21.0", "22.0", "23.0", "24.0"}
	set number_list to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}
	
	
	---------------------------------------------------------------------------------------------------
	--CREATING EMPTY LISTS TO BE USED LATER IN DETERMINING IF 
	--A FILE IS OPEN OR IF THERE IS A PROBLEM
	---------------------------------------------------------------------------------------------------
	
	set problem_list to {}
	set number_list to {}
	
	
	---------------------------------------------------------------------------------------------------
	--ADDING XML INFO TO A LIST
	---------------------------------------------------------------------------------------------------
	
	tell application "Microsoft Excel"
		activate
		open excel_report
		
		repeat with i from 1 to count of possible_page_number_list
			set test_page_number to item i of possible_page_number_list
			my add_to_page_list(test_page_number, blain_number_column, image_path_column, image_to_pic_path_column, vendor_column, description_column, reg_price_column, sale_price_column, notes_column, page_number_column, LL_column) -->Calling the add_to_page_list function 
			log xml_list
			
			
			---------------------------------------------------------------------------------------------------
			--ADDING ALL NECESSARY INFO FROM THE LIST TO EACH XML FILE
			---------------------------------------------------------------------------------------------------
			
			--Checking if the list for that page # is blank and only creating the XML file if it contains information
			if xml_list is not ("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" & linefeed & linefeed & "<Root>" & linefeed & linefeed & "</Root>") as string then
				
				--Copying the xml file to the chosen folder
				tell application "Finder"
					set xml_copy_file to POSIX file "/Volumes/NAS/Advertising Department/16_SCRIPTS/z_Future Scripts/Print Ad Layout/z_Testing Files/Article1.xml" as alias
					set xml_orig_name to name of xml_copy_file
					duplicate (xml_copy_file as alias) to xml_folder with replacing
					
					
					--Getting the promo number from the Excel file
					set tid to AppleScript's text item delimiters
					set AppleScript's text item delimiters to "_"
					set promo_number to first text item of excel_filename
					set AppleScript's text item delimiters to tid
					
					
					--Renaming the copied xml file to reflect the current page
					set xml_orig_file to xml_folder & xml_orig_name as string as alias
					set xml_file_name to promo_number & "_XML_Page " & (test_page_number as integer) & ".xml" as string
					set name of xml_orig_file to xml_file_name
					set xml_page_file to xml_folder & xml_file_name as string as alias
				end tell
				
				--Opening the xml file
				set myFile to open for access xml_page_file with write permission
				write xml_list to myFile
				close access myFile
			end if
			
		end repeat
	end tell
	
	
	---------------------------------------------------------------------------------------------------
	--DISPLAYING A FINISHED DIALOG
	---------------------------------------------------------------------------------------------------
	
	tell application "SystemUIServer" to display dialog "XML Files for Promo# " & promo_number & " are Created" with icon note giving up after 5400
	
end timeout

-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
--FUNCTIONS
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
--FUNCTION TO ADD XML INFO FOR EACH PAGE TO A LIST
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------

on add_to_page_list(the_number, column1, column2, column3, column4, column5, column6, column7, column8, column9, column10)
	
	
	--Adding standard xml code to the top of the xml file
	set xml_list to "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" & linefeed & linefeed & "<Root>" & linefeed as string
	
	
	--Determining how many rows the Excel file has
	tell application "Microsoft Excel"
		tell active sheet
			set image_count to 1
			set the_start_row to 2
			repeat with i from 1 to count of rows of used range
				set current_row to i
				
				--Determining the last used row number
				set row_count to count of rows of used range
				
				--Determining the last column
				select none
				set row_to_test to range "1:1"
				set cell_count to count cells of row_to_test
				set last_cell to get address of (get end cell cell_count of row_to_test direction toward the left)
				set first_cell to get address of (get end cell cell_count of row_to_test direction toward the right)
				if item 1 of last_cell is "$" and item 3 of last_cell is "$" then
					set last_column to item 2 of last_cell
				else if item 1 of last_cell is "$" and item 3 of last_cell is not "$" and item 4 of last_cell is "$" then
					set last_column to items 2 thru 3 of last_cell
				else if item 1 of last_cell is "$" and item 3 of last_cell is not "$" and item 4 of last_cell is not "$" and item 5 of last_cell is "$" then
					set last_column to items 2 thru 4 of last_cell
				else if item 1 of last_cell is "$" and item 3 of last_cell is not "$" and item 4 of last_cell is not "$" and item 5 of last_cell is not "$" and item 6 of last_cell is "$" then
					set last_column to items 2 thru 5 of last_cell
				end if
				
				
				--Setting the cell address for each info field
				set blain_number_cell to column1 & current_row as string
				set image_path_cell to column2 & current_row as string
				set image_to_pic_path_cell to column3 & current_row as string
				set vendor_cell to column4 & current_row as string
				set description_cell to column5 & current_row as string
				set reg_price_cell to column6 & current_row as string
				set sale_price_cell to column7 & current_row as string
				set notes_cell to column8 & current_row as string
				set page_cell to column9 & current_row as string
				set previous_page_cell to column9 & (current_row - 1) as string
				set current_LL_cell to column10 & current_row as string
				set test_LL_cell to column10 & (current_row + 1) as string
				set current_LL to value of cell current_LL_cell
				
				
				
				--Setting the cell address for the next info field in case it is a Lower Level (LL)
				set LL_row_list to {}
				set test_LL to value of cell test_LL_cell
				if test_LL contains "LL" then
					repeat
						set next_row to (current_row + 1)
						set LL_row_list to LL_row_list & next_row
						set test_LL_cell to "M" & next_row as string
						set next_test_LL_cell to "M" & (next_row + 1)
						if the value of cell next_test_LL_cell does not contain "LL" then
							exit repeat
						else
							set current_row to (current_row + 1)
						end if
					end repeat
				end if
				
				
				---------------------------------------------------------
				
				
				--Getting the values from every cell address
				if current_row is greater than or equal to the_start_row then
					set page_number to value of cell page_cell
					if (page_number as string) does not contain "0.0" then
						if current_LL does not contain "LL" then
							
							set the_vendor to value of cell vendor_cell as string
							set the_description to value of cell description_cell as string
							set reg_price to value of cell reg_price_cell as string
							set sale_price to value of cell sale_price_cell as string
							set the_notes to value of cell notes_cell as string
							set orig_blain_number to value of cell blain_number_cell as string
							set blain_number to my number_to_string(orig_blain_number)
							
							
							---------------------------------------------------------
							
							
							--REMOVING ILLEGAL XML CHARACTERS IN THE DESCRIPTION & FIXING THE IMAGE PATH						
							--Changing illegal # characters in the description to ones xml can understand
							(*if the_description contains "#" then
							set search_character to "#"
							set replacement_character to "#"
							repeat 5 times
								set the_description to my replace_characters(the_description, search_character, replacement_character) -->See replace_characters function at the bottom
								if the_description does not contain "# " then exit repeat
							end repeat
						end if*)
							
							
							--Changing illegal & characters in the description to ones xml can understand
							if the_description contains "&" then
								set search_character to "&"
								set replacement_character to "&"
								repeat 5 times
									set the_description to my replace_characters(the_description, search_character, replacement_character) -->See replace_characters function at the bottom
									if the_description does not contain "& " then exit repeat
								end repeat
							end if
							
							
							--Changing illegal & characters in the vendor to ones xml can understand
							if the_vendor contains "&" then
								set search_character to "&"
								set replacement_character to "&"
								repeat 5 times
									set the_vendor to my replace_characters(the_vendor, search_character, replacement_character) -->See replace_characters function at the bottom
									if the_vendor does not contain "& " then exit repeat
								end repeat
							end if
							
							--Changing illegal & characters in the vendor to ones xml can understand
							if the_notes contains "&" then
								set search_character to "&"
								set replacement_character to "&"
								repeat 5 times
									set the_notes to my replace_characters(the_notes, search_character, replacement_character) -->See replace_characters function at the bottom
									if the_notes does not contain "& " then exit repeat
								end repeat
							end if
							
							
							--Changing illegal & characters in the vendor to ones xml can understand
							if reg_price contains "&" then
								set search_character to "&"
								set replacement_character to "&"
								repeat 5 times
									set reg_price to my replace_characters(reg_price, search_character, replacement_character) -->See replace_characters function at the bottom
									if reg_price does not contain "& " then exit repeat
								end repeat
							end if
							
							--Changing illegal & characters in the vendor to ones xml can understand
							if sale_price contains "&" then
								set search_character to "&"
								set replacement_character to "&"
								repeat 5 times
									set sale_price to my replace_characters(sale_price, search_character, replacement_character) -->See replace_characters function at the bottom
									if sale_price does not contain "& " then exit repeat
								end repeat
							end if
							
							
							
							--Changing image path so it doesn't have double // after the 1_4c_IMAGES folder (the // breaks the xml image import) 
							(*if the_notes contains "DESIGNER NOTE: " then
								set tid to AppleScript's text item delimiters
								set AppleScript's text item delimiters to "DESIGNER NOTE: "
								set my_notes to last text item of the_notes
								set AppleScript's text item delimiters to tid
							else if the_notes is "" then
								set my_notes to ""
							else
								set my_notes to the_notes
							end if*)
							set my_notes to the_notes
							
							
							---------------------------------------------------------
							
							
							--ADDING ALL NECESSARY INFO TO THE CURRENT PAGE LIST						
							--Getting the large sale price and the cents
							set tid to AppleScript's text item delimiters
							set AppleScript's text item delimiters to "."
							set large_sale_price to first text item of sale_price
							set sale_price_cents to last text item of sale_price
							set AppleScript's text item delimiters to tid
							
							
							--Setting the kerning for 1 digit if the small price is a % and 2 digit if it is not
							if (sale_price_cents as string) contains "%" then
								set kerning_tag1 to "<kerning1digit>"
								set kerning_tag2 to "</kerning1digit>"
								set sale_word to "OFF"
								set sale_tag1 to "<off>"
								set sale_tag2 to "</off>"
							else if (sale_price_cents as string) contains "$" and (sale_price_cents as string) contains "off" then
								set kerning_tag1 to "<kerning1digit>"
								set kerning_tag2 to "</kerning1digit>"
								set sale_word to "OFF"
								set sale_tag1 to "<off2>"
								set sale_tag2 to "</off2>"
							else
								set kerning_tag1 to "<kerning2digit>"
								set kerning_tag2 to "</kerning2digit>"
								set sale_word to "SALE"
								set sale_tag1 to "<sale_wording>"
								set sale_tag2 to "</sale_wording>"
							end if
							
							
							--Adding a second 0 to all prices that should end in .00
							if sale_price_cents as string is "0" then
								set sale_price_cents to "00"
							end if
							
							
							--Adding all the text info to the list
							if (page_number as string) is (the_number as string) then
								
								set xml_list to xml_list & "<body><body_price><large_price>" & large_sale_price & "</large_price><small_price>" & sale_price_cents & "</small_price>" & kerning_tag1 & " " & kerning_tag2 & sale_tag1 & sale_word & sale_tag2 & linefeed & "<heading>" & the_vendor & " " & the_description & "." & "</heading><blain_number>" & " " & blain_number & "</blain_number><reg_price> Reg. " & reg_price & "</reg_price>" & linefeed as string
								
								if the_notes is not "" then set xml_list to xml_list & "<disclaimer> " & my_notes & "</disclaimer>" as string
								
								if LL_row_list is not {} then
									set para_count to "3"
									set LL_row_reverse_list to LL_row_list
									--set LL_row_reverse_list to reverse of LL_row_list
									repeat with current_LL_row in LL_row_reverse_list
										
										
										--Setting each cell
										set next_blain_number_cell to column1 & current_LL_row as string
										set next_vendor_cell to column4 & current_LL_row as string
										set next_description_infocell to column5 & current_LL_row as string
										set next_reg_price_cell to column6 & current_LL_row as string
										set next_sale_price_cell to column7 & current_LL_row as string
										
										--Getting the values for the LL cells
										tell application "Microsoft Excel"
											activate
											set next_orig_blain_number to value of cell next_blain_number_cell
											set next_blain_number to my number_to_string(next_orig_blain_number)
											set next_description_orig to value of cell next_description_infocell
											set next_reg_price to "Reg. " & value of cell next_reg_price_cell & "."
											set next_sale_price_value to value of cell next_sale_price_cell
											set next_sale_price to "SALE " & value of cell next_sale_price_cell
											
											if next_sale_price_value contains "OFF" then
												set next_sale_price_characters to every character of next_sale_price_value
												set next_sale_price_number to {}
												repeat with next_sale_price_character in next_sale_price_characters
													if next_sale_price_character is in number_list then
														set next_sale_price_number to next_sale_price_number & next_sale_price_character
														
													end if
												end repeat
											end if
											
											--Simplifying the description
											if (the_description as string) is in next_description_orig then
												set tid to AppleScript's text item delimiters
												set AppleScript's text item delimiters to the_description
												set next_description to first text item of next_description_orig as string
												set AppleScript's text item delimiters to tid
											else
												set next_description to next_description_orig
											end if
										end tell
										
										
										set xml_list to xml_list & "<ll_description>" & next_description & "</ll_description>" & "<ll_blain_number>" & next_blain_number & "</ll_blain_number>" & "<ll_reg_price>" & next_reg_price & "</ll_reg_price>" & "<ll_sale_price>" & next_sale_price & "</ll_sale_price>" & linefeed as string
										
										
									end repeat
								end if
								
								--Adding ending tags to body text
								set xml_list to xml_list & "</body_price></body>" as string
							end if
						end if
					end if
				end if
			end repeat
			
			
			--Adding all the images to the end of the list
			set image_count to 1
			set the_start_row to 2
			repeat with i from 1 to count of rows of used range
				set current_row to i
				if current_row is greater than or equal to the_start_row then
					set page_cell to column9 & current_row as string
					set image_path_cell to column2 & current_row as string
					set image_to_pic_path_cell to column3 & current_row as string
					
					
					--Changing image path so it doesn't have double // after the 1_4c_IMAGES folder (the // breaks the xml image import) 
					set image_path to value of cell image_path_cell
					(*display dialog image_path as string
					if (image_path as string) contains "1_4c IMAGES//" then
						display dialog "image path qualifies"
						set search_character to "1_4c IMAGES//"
						set replacement_character to "1_4c IMAGES/"
						set image_path to my replace_characters(image_path, search_character, replacement_character) -->See replace_characters function at the bottom
					end if
					
					
					display dialog image_path as string*)
					
					--Getting the values from every cell address
					if current_row is greater than or equal to the_start_row then
						set page_number to value of cell page_cell
						if (page_number as string) does not contain "0.0" then
							if current_LL does not contain "LL" then
								
								
								--Adding all the text info to the list
								if (page_number as string) is (the_number as string) then
									
									
									--Adding the image info to the list
									if image_path is not "" then set xml_list to xml_list & "<image" & image_count & " href=\"file://" & image_path & "\"></image" & image_count & ">" & linefeed as string
									set image_count to (image_count + 1)
									set image_to_pic_path_orig to value of cell image_to_pic_path_cell as string
									if (image_to_pic_path_orig as string) is not "N/A" and (image_to_pic_path_orig as string) is not equal to image_path then
										if image_to_pic_path_orig contains ", " then
											set tid to AppleScript's text item delimiters
											set AppleScript's text item delimiters to ", "
											set image_to_pic_paths to every text item of image_to_pic_path_orig
											set AppleScript's text item delimiters to tid
										else
											set image_to_pic_paths to image_to_pic_path_orig
										end if
										
										set image_count to "1"
										if image_path is not "" then
											set the_item to "2"
										else
											set the_item to "1"
										end if
										repeat with image_to_pic_path in image_to_pic_paths
											set xml_list to xml_list & "<image" & image_count & " href=\"file://" & image_path & "\"></image" & image_count & ">" as string
										end repeat
									else
										set xml_list to xml_list as string
									end if
									
									
								else
									set xml_list to xml_list
								end if
								
								
								
							end if
						end if
					end if
				end if
			end repeat
			
			set xml_list to xml_list & linefeed & "</Root>"
			log xml_list
		end tell
	end tell
end add_to_page_list


-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
--FUNCTION TO CHANGE EXCEL SCIENTIFIC NOTATION TO REAL NUMBERS
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------

on number_to_string(this_number)
	set this_number to this_number as string
	if this_number contains "E+" then
		set x to the offset of "." in this_number
		set y to the offset of "+" in this_number
		set z to the offset of "E" in this_number
		set the decimal_adjust to characters (y - (length of this_number)) thru ¬
			-1 of this_number as string as number
		if x is not 0 then
			set the first_part to characters 1 thru (x - 1) of this_number as string
		else
			set the first_part to ""
		end if
		set the second_part to characters (x + 1) thru (z - 1) of this_number as string
		set the converted_number to the first_part
		repeat with i from 1 to the decimal_adjust
			try
				set the converted_number to ¬
					the converted_number & character i of the second_part
			on error
				set the converted_number to the converted_number & "0"
			end try
		end repeat
		return the converted_number
	else
		return this_number
	end if
end number_to_string



-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
--FUNCTION TO REPLACE CHARACTERS
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------

on replace_characters(the_phrase, search_string, replacement_string)
	--Changing illegal xml characters to ones xml can understand
	
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to search_string
	--set item_count to count of text items of the_phrase
	set text_items to text items of the_phrase
	set AppleScript's text item delimiters to replacement_string
	set the_phrase to text_items as Unicode text
	set AppleScript's text item delimiters to tid
	log the_phrase
	
	return the_phrase
end replace_characters

Browser: Safari 602.4.8
Operating System: Mac OS X (10.10)

AppleScript: AppleScript 2.2.1
Browser: Safari 537.78.2
Operating System: Mac OS X (10.7)

Hi.

I don’t have time to unpick the entire script, but it seems you’re duplicating a file from one folder to another, renaming the duplicate, and writing your XML text to it.

One thing to note is that your code isn’t actually writing the text to the file as UTF-8, which is presumably the immediate cause of the problem. Should be:

write xml_list to myFile as «class utf8»

Another point, whose relevance here I don’t know, is that whatever you’re writing overwrites the file’s existing contents. If it’s longer than the existing contents, it’ll replace them altogether. If it’s shorter, some of the original contents will still be there at the end. If you don’t mind the contents being overwritten, there’s no need for the duplication. The ‘open for access’ command will simply create the file if it doesn’t exist ” although you won’t be able to create an alias to it before then. If you want to append your text to the end of existing content, you’ll have to add ‘starting at eof’ to the ‘write’ command.

Ideally, you should protect the code with a ‘try’ statement while the file’s open for access, so that the access will be closed again if there’s an error:

	-- .
	set xml_page_file to xml_folder & xml_file_name as string -- as alias
end tell

set myFile to open for access file xml_page_file with write permission
try
	write xml_list to myFile as «class utf8»
	-- Or: write xml_list to myFile as «class utf8» starting at eof
on error errMsg
	display dialog errMsg buttons {"OK} default button 1
end try
close access myFile

Yes, you are right. It’s duplicating a file from one folder to another, renaming the duplicate and writing the XML text to it. I don’t have time to test it right now, but this looks like it will fix the problem. Thanks Nigel :slight_smile: