Text Encoding for Excel csv file

Nigel_Garvey · February 12, 2014, 11:55am

http://www.unicode.org/faq/utf_bom.html

I don’t have Excel, which is apparently where the problem lies.

I do have Numbers 2.3. I’ve saved five CSV files containing the data for the range “A2:C4” in your image in post #16. The formats are UTF-8 without BOM, UTF-8 with BOM, UTF-16BE (i.e. File Read/Write’s ‘as Unicode text’) without BOM, UTF-16BE with BOM, and UTF-16LE (‘as «class ut16»’ on an Intel machine) with BOM. The BOM’s unavoidable in the last instance, and in any case BOM-less UTF-16, according to the blurb at the above link, should be assumed to be big-endian. In fact, though, only the BOM-less UTF-16 is misinterpreted when the files are opened in Numbers. (It’s apparently not recognised as Unicode text.) The others are all rendered perfectly.

Copying the files over to my G5 (where of course the version written ‘as Unicode text’ with a BOM is identical to one written locally ‘as «class ut16»’), the results are exactly the same. The data are all rendered perfectly except for the UTF-16BE without BOM.

So your problem’s not intrinsic to CSV itself, but lies with the application interpreting its somewhat loose rules.
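
If you want to check which BOM, if any, a particular file actually starts with, something along these lines might help. It’s only a sketch: the handler name bomOfFile is made up for the purpose, and it assumes the ‘head’ and ‘xxd’ command-line tools are available to ‘do shell script’, as they are on a standard Mac OS X installation. It only looks at the first three bytes, so it doesn’t distinguish a UTF-32LE BOM from a UTF-16LE one.

```applescript
-- Hypothetical helper: report the BOM at the start of a file, given its POSIX path.
on bomOfFile(posixPath)
	-- Dump the first three bytes of the file as lower-case hex digits, eg. "efbbbf".
	set hex to do shell script "head -c 3 " & quoted form of posixPath & " | xxd -p"
	if (hex begins with "efbbbf") then return "UTF-8 BOM"
	if (hex begins with "feff") then return "UTF-16BE BOM"
	if (hex begins with "fffe") then return "UTF-16LE BOM"
	return "no BOM"
end bomOfFile

-- Example call. The user chooses the file to inspect.
bomOfFile(POSIX path of (choose file))
```

A file with no BOM tells you nothing about its encoding by this method, which is exactly the situation where the opening application has to guess.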

To cover all the bases, here’s the script I used to prepare and write the data. It’s hard-wired to use a comma delimiter:

on list2csv(listOfLists)
	copy listOfLists to listOfLists
	
	set csvQuotedQuote to quote & quote
	set csvRecordDelimiter to return & linefeed
	
	set astid to AppleScript's text item delimiters
	repeat with i from 1 to (count listOfLists)
		set recordList to item i of listOfLists
		repeat with j from 1 to (count recordList)
			set fieldValue to (item j of recordList) as text
			if (fieldValue contains quote) then
				set AppleScript's text item delimiters to quote
				set fieldValue to fieldValue's text items
				set AppleScript's text item delimiters to csvQuotedQuote
				set fieldValue to fieldValue as text
			end if
			if ((fieldValue contains quote) or (fieldValue contains ",") or ((count fieldValue's paragraphs) > 1) or (fieldValue begins with space) or (fieldValue ends with space)) then set item j of recordList to quote & fieldValue & quote
		end repeat
		-- Join this record's fields with commas once all of them have been processed.
		set AppleScript's text item delimiters to ","
		set item i of listOfLists to recordList as text
	end repeat
	set AppleScript's text item delimiters to csvRecordDelimiter
	set csv to listOfLists as text
	set AppleScript's text item delimiters to astid
	
	return csv
end list2csv

set myData to {{"one", "two", "three"}, {"Régulier a", "b", "c"}, {" Régulier d", "e,h", "f"}}

set csv to list2csv(myData)

-- Edit the following variously to write the data in different UTFs.
set fRef to (open for access file ((path to desktop as text) & "Test UTF-16BE.csv") with write permission)
try
	set eof fRef to 0
	-- BOM values, if used, at beginnings of files:
	-- When writing 'as «class utf8»', use «data rdatEFBBBF»
	-- When writing 'as Unicode text' (UTF-16BE, ie. big-endian), use «data rdatFEFF»
	-- When writing 'as «class ut16»' (processor-native UTF-16, ie. little-endian UTF-16LE on an Intel machine), a matching BOM is prepended automatically by the system TO EACH SECTION OF TEXT WRITTEN.
	write «data rdatFEFF» to fRef
	write csv to fRef as Unicode text
on error msg
	display dialog msg buttons {"OK"}
end try
close access fRef
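
For comparison, here’s a sketch of how the write section might look for the UTF-8 with BOM version, following the BOM notes in the comments above. The sample text and the file name are just assumptions for illustration; in practice csv would be the result of list2csv().

```applescript
-- Sketch only: the sample text and the file name are assumptions for illustration.
set csv to "one,two,three" & return & linefeed & "Régulier a,b,c"

set fRef to (open for access file ((path to desktop as text) & "Test UTF-8 with BOM.csv") with write permission)
try
	set eof fRef to 0
	-- The three-byte UTF-8 BOM has to be written explicitly, as raw data.
	write «data rdatEFBBBF» to fRef
	write csv to fRef as «class utf8»
on error msg
	display dialog msg buttons {"OK"}
end try
close access fRef
```

Leaving out the «data rdatEFBBBF» line gives the UTF-8 without BOM version.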