How do I add binary 0xFEFF to a string? (And save it to a file)

lagr · May 3, 2023, 9:51pm

I have a string that I need to prepend with what is displayed as FEFF if I do a hexdump of that “character”.

How do I insert non-printable characters into a string? And does doing so affect how the string is saved to a file?

I have searched but not found anything applicable.

Nigel_Garvey · May 3, 2023, 10:55pm

It’s a 2-byte code called a “Byte Order Mark” (BOM), which is the character U+FEFF. If used, it goes at the beginning of a UTF-16 text file to show 1) that the text is encoded as UTF-16 and 2) the order of the two bytes in each 16-bit code unit. If the file starts with the bytes FE FF, the code units are “big-endian”: their high-order bytes come before their low-order bytes. The other possibility with UTF-16 is the byte sequence FF FE, which indicates “little-endian” text, where the low-order bytes come before the high-order bytes.

Different types of processor will assume one endianness or the other, depending on their architecture. The File Read/Write commands in the StandardAdditions were written back in the days when Macs ran on PowerPC processors, which were architecturally big-endian. So write … as Unicode text would write big-endian UTF-16 text to a file. When Apple switched to Intel processors, which are little-endian, the read and write commands continued to default to big-endian text simply for compatibility with older files. However, text applications on the new machines, such as TextEdit, were now little-endian when saving or writing as UTF-16, so if you wanted to use the StandardAdditions write command to save UTF-16 text for later reading by an application, it was necessary to write a big-endian BOM to the file first so that the application would know how to interpret the text. The new M-series (Apple silicon) processors are little-endian too, but I don’t know whether the Read/Write commands still default to big-endian with as Unicode text on them.

The wise thing to do is to write your text to the file as UTF-8 instead. UTF-8 encodes text as a sequence of single bytes, so there is no byte order to declare, and practically every application nowadays assumes this is how text in a file is represented anyway. A UTF-8 BOM (the bytes EF BB BF) exists, but is not usually needed.

write myText to fileRef as «class utf8»
-- And to read it back:
set myText to (read fileRef from 1 as «class utf8»)
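For a complete round trip, here is a minimal sketch built around those two lines. The handler names writeUTF8 and readUTF8 and the file name utf8-demo.txt are hypothetical. The try block is there so the file is closed even if the write fails; otherwise it can stay open until the application running the script quits, and later attempts to open it with write permission may fail.

```applescript
-- Sketch only: writeUTF8, readUTF8 and utf8-demo.txt are hypothetical names.
on writeUTF8(theText, posixPath)
	set fileRef to open for access (POSIX file posixPath) with write permission
	try
		set eof fileRef to 0 -- replace any existing contents
		write theText to fileRef as «class utf8»
		close access fileRef
	on error errMsg number errNum
		close access fileRef -- don't leave the file open after a failure
		error errMsg number errNum
	end try
end writeUTF8

on readUTF8(posixPath)
	return (read (POSIX file posixPath) as «class utf8»)
end readUTF8

-- "café" is built with character id 233 so the script file itself stays ASCII
set sampleText to "Price" & tab & "caf" & (character id 233)
set samplePath to POSIX path of (path to temporary items folder) & "utf8-demo.txt"
writeUTF8(sampleText, samplePath)
set roundTrip to readUTF8(samplePath)
```

Because the text is written with set eof … to 0 first, running the script twice leaves a single copy of the text in the file rather than two.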

If you’re still curious how to write a big-endian UTF-16 BOM to a file, it would be:

write «data rdatFEFF» to fileRef
write myText to fileRef as Unicode text
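Putting those two lines into a complete script, with a hex dump to check the result, might look like the sketch below. The handler name, the demo file name and the use of xxd through do shell script are my own additions. If the BOM is written as intended and as Unicode text still writes big-endian UTF-16 without a BOM of its own, the dump for the text "AB" should be feff00410042. Running it on an M-series Mac is also a quick way to see what the commands actually do there.

```applescript
-- Sketch only: writeUTF16BEWithBOM and utf16-demo.txt are hypothetical names.
on writeUTF16BEWithBOM(theText, posixPath)
	set fileRef to open for access (POSIX file posixPath) with write permission
	try
		set eof fileRef to 0 -- start from an empty file
		write «data rdatFEFF» to fileRef -- the two BOM bytes FE FF
		write theText to fileRef as Unicode text -- continues at the file mark, after the BOM
		close access fileRef
	on error errMsg number errNum
		close access fileRef
		error errMsg number errNum
	end try
end writeUTF16BEWithBOM

set demoPath to POSIX path of (path to temporary items folder) & "utf16-demo.txt"
writeUTF16BEWithBOM("AB", demoPath)
-- Plain hex dump of the whole file: BOM first, then one 2-byte code unit per character
set hexBytes to do shell script "xxd -p " & quoted form of demoPath
```

Seeing 00 before 41 and 42 in the dump confirms the big-endian order; little-endian text would show 4100 and 4200 instead.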

lagr · May 4, 2023, 11:23am

I know about BOM, LE/BE etc. My initial problem was that Numbers erroneously opened an AppleScript-generated TSV file using Windows Latin encoding when it should have used UTF-16.

Anyway, I changed the following line

	write tsv as Unicode text to fileDescriptor starting at eof

per your suggestions, to

	write tsv as «class utf8» text to fileDescriptor starting at eof

and now I get an error message:

(*errMsg: Parameter error.*)

and the resulting file is empty.

Why is that?

Nigel_Garvey · May 4, 2023, 12:32pm

It’s caused by the superfluous text keyword after «class utf8». The line should be:

write tsv as «class utf8» to fileDescriptor starting at eof

«class utf8» is the token for UTF-8 Unicode text in this context. For some reason, Apple has never given it a keyword.
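In context, the corrected line might sit in something like the following sketch. The sample tsv contents and the export.tsv file name are hypothetical stand-ins for your own. Because the write starts at eof, each run appends another copy of the rows rather than replacing the file.

```applescript
-- Sketch only: the tsv contents and export.tsv are hypothetical stand-ins.
set tsv to "Item" & tab & "Price" & linefeed & "Caf" & (character id 233) & tab & "3.50" & linefeed
set outPath to POSIX path of (path to temporary items folder) & "export.tsv"

set fileDescriptor to open for access (POSIX file outPath) with write permission
try
	-- «class utf8» stands alone: following it with "text" gave the Parameter error
	write tsv as «class utf8» to fileDescriptor starting at eof
	close access fileDescriptor
on error errMsg number errNum
	close access fileDescriptor -- close the file even when the write fails
	error errMsg number errNum
end try
```

If you want each run to produce a fresh file instead, add set eof fileDescriptor to 0 before the write.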