Change file encoding of TextEdit by Applescript

I didn’t find uconv on my machine.
The job can be done with iconv.

set theFile to POSIX path of (choose file)

set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

set enc to "UTF-16BE"
set cmd to "xxd -p -r <<< xfeff "
do shell script cmd & " > " & quoted form of newFileName # write the BOM : FE FF in the new file

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName # write the UTF16-BE encoded text after the BOM
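
For comparison, the same two steps (write the FE FF BOM, then the UTF-16 BE text) can be sketched outside the shell. This is only an illustrative Python sketch, not part of the original script. The function name and the sample file names are made up for the example, and it assumes the source file really is UTF-8.

```python
import codecs
import os
import tempfile


def utf8_to_utf16be_with_bom(src_path, dst_path):
    """Re-encode a UTF-8 text file as UTF-16 BE, preceded by the FE FF BOM."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8", newline="") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(codecs.BOM_UTF16_BE)       # the two bytes FE FF
        dst.write(text.encode("utf-16-be"))  # the text itself; this codec adds no BOM


# Small self-check on a temporary file (hypothetical file names).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample.txt")
dst = os.path.join(workdir, "sample_iconv.txt")
with open(src, "w", encoding="utf-8", newline="") as f:
    f.write("été\n")

utf8_to_utf16be_with_bom(src, dst)

with open(dst, "rb") as f:
    result = f.read()
print(result.hex())
```

The whole file is read into memory here, which is fine for ordinary text files but is a simplification compared with iconv’s streaming.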

I tried to play with ASObjC but I’m puzzled.

In Xcode Help I read :
NSUTF16BigEndianStringEncoding
NSUTF16StringEncoding encoding with explicit endianness specified.
I understood this to mean that using this encoding would give me a file with the big-endian BOM at the beginning.
Alas, I was wrong.

The code I used (most of which was borrowed from Shane STANLEY) is:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)
	if theExtension's |length|() > 0 then
		set newPath to newPath's stringByAppendingPathExtension:theExtension
	end if
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
   set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
   set newPath to my modifyPath:thePath adding:"-new"
   set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
   return theResult as boolean
end decodeFile:

set theSource to (choose file)
my decodeFile:(POSIX path of theSource)

Is there something wrong in it, or am I misunderstanding what NSUTF16BigEndianStringEncoding is supposed to do?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mardi 14 mars 2017 15:56:27

Hi Yvan.

I think “with explicit endianness specified” is just an explanation that the enum NSUTF16BigEndianStringEncoding is used to specify explicitly that the text is to be saved with UTF-16 big-endian encoding, not with the endianness native to the machine. It’s an explicit instruction to writeToFile rather than an instruction to include an explicit BOM in the file. Maybe Shane will confirm this when he gets up.
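
A quick way to see the distinction is with another runtime. The Python sketch below is only an analogy (Python’s codecs are not Foundation’s, and the variable names are mine): the codec with explicit endianness emits no BOM, while the generic UTF-16 codec prepends one.

```python
import codecs

text = "A"

# Explicit endianness: just the code units, no BOM.
explicit_be = text.encode("utf-16-be")

# Generic UTF-16: a BOM in the machine's native byte order, then the text.
generic = text.encode("utf-16")

print(explicit_be.hex())
print(generic.hex())
print(explicit_be.startswith(codecs.BOM_UTF16_BE))
```

If a BOM is wanted together with an explicit byte order, it has to be added as a separate step.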

According to the Xcode documentation, this enum was only introduced with MacOS 10.12, but it works on my 10.11 system.

The iconv command-line utility works with stdin and stdout, so you can pipe text straight from one encoding to another without creating additional temporary files.

Great to see other solutions, even with third-party command-line utilities. I’m still curious what went wrong with the code above in post #12.

Thanks Nigel.

I tried to use an awful scheme to insert the BOM.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)
	if theExtension's |length|() > 0 then
		set newPath to newPath's stringByAppendingPathExtension:theExtension
	end if
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	set theString to (current application's NSString's stringWithString:" ")
	set moreString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	set theString to theString's stringByAppendingString:moreString
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
	return {newPath, theResult as boolean}
	
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
set newPath to newPath as «class furl»
set openFile to open for access newPath with write permission
write «data rdatFEFF» to openFile starting at 0
close access openFile

TextWrangler and BBEdit open the resulting file flawlessly but, alas, TextEdit crashes.
If I open it with TextWrangler and then save it under another name, the newly saved file opens flawlessly in TextEdit.
Puzzling, isn’t it?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mardi 14 mars 2017 17:51:41

That’s right.

Actually, I just noticed this in Wikipedia, FWIW:

The documentation is wrong (it says the same thing about NSASCIIStringEncoding :mad:). I believe it was introduced in 10.4.

Mmm… :confused: How about this?

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Get the BOM value as a two-character string. (The single character id (254 * 256 + 255) gets lost in the conversion to NSString.)
	set theUTF16BEBOM to current application's NSString's stringWithString:(string id {254, 255})
	-- Convert it to two bytes of data.
	set theData to (theUTF16BEBOM's dataUsingEncoding:(current application's NSISOLatin1StringEncoding))'s mutableCopy()
	-- Read the contents of the ISO Latin 1 text file.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	-- Convert that to data too, but encoded as UTF-16 big-endian, and append it to the BOM data.
	tell theData to appendData:(theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding))
	-- Write the lot to a new file.
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theData's writeToFile:newPath atomically:true
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
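
The byte-level idea in the script above can be checked with a short sketch. This is Python rather than ASObjC, used purely for illustration, with hypothetical variable names: characters 254 and 255 encoded as ISO Latin 1 become the two bytes FE FF, and the UTF-16 BE text is appended to them.

```python
# Two characters with ids 254 and 255; in ISO Latin 1 each becomes one byte.
bom = (chr(254) + chr(255)).encode("latin-1")

# A mutable byte buffer, playing the role of the mutable data object.
data = bytearray(bom)

# Bytes as they might sit in an ISO Latin 1 file ("é" is the single byte E9).
latin1_file_bytes = b"\xe9t\xe9"
text = latin1_file_bytes.decode("latin-1")

# Re-encode as UTF-16 big-endian and append after the BOM.
data += text.encode("utf-16-be")

print(bytes(data).hex())
```

Because the BOM and the text are both turned into bytes before they are joined, neither encoder ever sees the other’s material.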

Nice :slight_smile:

An alternative that might work would be to create the BOM as a zero width no-break space. Unfortunately this only works in 10.11 and above:

set theString to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
set theString to theString's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true

You could then append the contents of the file to that string, and save. I don’t have a suitable sample to test it.
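
As a rough cross-check of the idea, in Python rather than ASObjC and only as an illustration: looking the character up by its Unicode name gives U+FEFF, and encoding it as UTF-16 big-endian gives the bytes FE FF. The variable names are mine.

```python
import unicodedata

# Resolve the character from its Unicode name.
bom_char = unicodedata.lookup("ZERO WIDTH NO-BREAK SPACE")

# Prepend it to some text and encode the lot as UTF-16 big-endian.
payload = (bom_char + "hi").encode("utf-16-be")

print(hex(ord(bom_char)))
print(payload.hex())
```

This says nothing about how NSString’s own writeToFile handles that character; it only confirms which bytes the character should produce.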

Thank you. :slight_smile: And for your previous reply.

That works for me if I convert both to data before appending them, as in my version above. But if I append them as NSStrings and save the result, the resulting file crashes anything that tries to open or read it. (Well, TextEdit and a ‘read (choose file) as Unicode text’ script, anyway.) Sounds similar to what Yvan was getting with his version. I’ll have another look in the morning (GMT).

Thank you Nigel and Shane.
I simply didn’t know how to define the string containing the BOM.

About the problem I described:
could it be that the system records, in the file’s metadata, that the last write operation applied to the file was a write of raw data?
When the last action writes text data, as in Nigel’s latest proposal or the shell version, the metadata would record that, and so TextEdit is satisfied.

I must add that when I compare the hexadecimal contents of the different attempts, they are identical.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mercredi 15 mars 2017 10:13:37

That’s right. For this kind of thing I prefer to read the CoreFoundation frameworks rather than the Cocoa frameworks. The CoreFoundation team wrote it and their documentation seems more accurate. At least it says that kCFStringEncodingUTF16BE was introduced in Mac OS 10.4, while kCFStringEncodingUnicode has been available since the first release of Mac OS X.

The move to the x86 architecture, around 10.4, changed the native endianness, which caused problems with UTF-16 encoded files at the time. Although the PowerPC could run in either endianness mode, it ran big-endian on Macintosh systems, so UTF-16 was big-endian by default. Intel processors are natively little-endian, and a lot of software had trouble reading UTF-16 files written on PPC machines. To read such files, which followed the PPC’s native endianness, on an x86 machine, you could use the key kCFStringEncodingUTF16BE.
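
To illustrate why the byte order matters, here is a small Python sketch (an illustration only; it is not how CoreFoundation does it, and the variable names are mine): the same text read with the wrong endianness decodes to unrelated characters, whereas a leading BOM lets a generic UTF-16 decoder pick the right order.

```python
import codecs

text = "Hi"

be = text.encode("utf-16-be")  # bytes 00 48 00 69
le = text.encode("utf-16-le")  # bytes 48 00 69 00

# Reading big-endian bytes as little-endian swaps every pair of bytes.
misread = be.decode("utf-16-le")

# With a BOM in front, the generic decoder works out the byte order itself.
with_bom = codecs.BOM_UTF16_BE + be
recovered = with_bom.decode("utf-16")

print(be.hex(), le.hex())
print([hex(ord(c)) for c in misread])
print(recovered)
```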

Yes, it’s possible something is being written as an extended attribute, and TextEdit might well look at that.

Yes, although kCFStringEncodingUTF16BE is not the same value as NSUTF16BigEndianStringEncoding, and it’s at least theoretically possible that the encoding was supported earlier in CoreFoundation – the transform constants are an example of that.

The problem, I suspect, is that mistakes are being made because some enums and constants are being renamed to a naming scheme that fits better with Swift.

Hi Yvan.

I’ve just been fooling around with your script in various ways.

  1. The crashing only occurs if the data has been written to the file, not if it hasn’t or if something else of the same length has been written instead.
  2. The crashing appears to be a system problem, rather than just TextEdit. Merely selecting the file in a ‘choose file’ dialog crashes the host application before the “OK” button can be clicked.
  3. A workaround seems to be to write the BOM as a short integer instead of as data. Fortunately, UTF-16 BOMs can be represented in this way.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath --as text
end modifyPath:adding:

on decodeFile:thePath
	set theString to (current application's NSString's stringWithString:" ")
	set moreString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	set theString to theString's stringByAppendingString:moreString
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
	set newPath to newPath as text
	write -257 as short to (get POSIX file newPath) -- -257 is 0xFEFF as a signed 16-bit integer. Will open, start at 1, and close anyway, since the file already exists.
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
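
For reference, the signed 16-bit integer whose big-endian bytes are FE FF can be worked out with a short Python sketch. It is an illustration only, and it assumes that the StandardAdditions write command stores a short in big-endian order, which is not verified here.

```python
import struct

# Interpret the two BOM bytes as a big-endian signed 16-bit integer.
bom_as_short = struct.unpack(">h", b"\xfe\xff")[0]

# The same value by arithmetic: 0xFEFF is above 0x7FFF, so subtract 0x10000.
by_arithmetic = 0xFEFF - 0x10000

# Packing the integer again gives back the BOM bytes.
round_trip = struct.pack(">h", bom_as_short)

print(bom_as_short, by_arithmetic, round_trip.hex())
```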

Hi Shane.

I suspect that including the zero-width no-break space in the string that gets written to the file with NSUTF16BigEndianStringEncoding does something undesirable to that character. The encoding needs to be sorted out before the write, which is the idea behind my NSData approach. Your idea works well when plugged into that.

Yes, that makes sense.

Thanks Nigel.

It’s good to know that we may write the BOM as a number, but I will stay with your version, which does the whole job with ASObjC.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mercredi 15 mars 2017 16:07:39

With Shane’s string transform suggestion, you can either create separate data blocks from the BOM character and the string from the original file and then join the blocks, or append the string from the file to the BOM character and create a data block from the result. This version does the latter:

use AppleScript version "2.5" -- Mac OS 10.11 (El Capitan) or later.
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Create a character with the same Unicode value as a UTF-16 BE BOM.
	set theUTF16BEBOM to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
	set theUTF16BEBOM to theUTF16BEBOM's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true
	-- Read the contents of the ISO Latin 1 text file and append it to the BOM.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	set theString to theUTF16BEBOM's stringByAppendingString:theString
	-- Convert the result to data, encoded as UTF-16 big-endian.
	set theData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	-- Write the data to a new file.
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theData's writeToFile:newPath atomically:true
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)

And since it’s proved an interesting area for exploration, here’s a version which writes the BOM and the text to the new file separately:

use AppleScript version "2.5" -- Mac OS 10.11 (El Capitan) or later.
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Create a character with the same Unicode value as a UTF-16 BE BOM.
	set theUTF16BEBOM to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
	set theUTF16BEBOM to theUTF16BEBOM's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true
	-- Convert it to data.
	set BOMData to theUTF16BEBOM's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	-- Read the contents of the ISO Latin 1 text file.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	-- Convert that to data too, encoded as UTF-16 big-endian.
	set stringData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	-- Create a new file.
	set theResult to false
	set newPath to my modifyPath:thePath adding:"-new"
	tell current application's NSFileManager's defaultManager() to createFileAtPath:newPath |contents|:(missing value) attributes:(missing value)
	-- Open it for access with write permission, write the two blocks of data to it, and close it again.
	set fileAccess to current application's NSFileHandle's fileHandleForWritingAtPath:newPath
	try
		tell fileAccess to writeData:BOMData
		tell fileAccess to writeData:stringData
		set theResult to true
	end try
	tell fileAccess to closeFile()
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)

Thanks Nigel

I’m scratching my head.
Isn’t there a function that lets ASObjC append data to what has already been written?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mercredi 15 mars 2017 19:24:51

Thanks, Shane. I’ve now sorted that out.

I’m not sure if you’re asking if one exists or if that’s what I’ve used (in the second script in post #37).

As far as I can see, NSString’s and NSData’s writeToFile methods either create files containing just the given material or completely replace the contents of existing files. They don’t have methods for editing files in-place.

The NSFileHandle class seems to be the equivalent of the file system object created by open for access in the StandardAdditions, but with a few differences in the way it’s scripted. The significant differences here are:

  1. Files which don’t already exist have to be explicitly created first. I’ve used NSFileManager for this. (Files which do already exist are effectively emptied if created again with NSFileManager.)
  2. NSFileHandle only writes NSData objects. (According to the documentation, anyway. I haven’t put it to the test.)

Otherwise, as with write in the StandardAdditions, each successive write starts where the previous one ended unless something is done to change the file handle’s file pointer.
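
The difference between replacing a file’s contents and writing through a handle can be sketched as follows. This is Python, used only as an analogy for writeToFile versus NSFileHandle; the file name is hypothetical and Python’s file modes are not the Foundation API.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo-new.txt")

# Whole-file writes: each one replaces whatever the file held before.
with open(path, "wb") as f:
    f.write(b"first attempt")
with open(path, "wb") as f:
    f.write(b"\xfe\xff")  # the file now holds only the BOM

# A handle on the existing file: move to the end, then write in sequence.
with open(path, "r+b") as handle:
    handle.seek(0, os.SEEK_END)
    handle.write(b"\x00A")  # starts where the existing content ends
    handle.write(b"\x00B")  # starts where the previous write ended

with open(path, "rb") as f:
    contents = f.read()
print(contents.hex())
```

Moving the file pointer before a write, as with the seek call here, is what changes where the next block lands.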