Change file encoding of TextEdit by AppleScript

set theFile to POSIX path of (choose file)
set encoding to choose from list {"UTF-16 Big Endian", "UTF-16 Big Endian + Bom", "UTF-16 Little Endian", "UTF-16 Little Endian + Bom"}

if encoding is false then
	return -- nothing selected, or Cancel pressed
else
	set encoding to encoding as string
end if

set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

if encoding contains "Big Endian" then
	set enc to "UTF-16BE"
	set cmd to ""
	if encoding contains "+ Bom" then set cmd to "xxd -p -r <<< xfeff "
	do shell script cmd & " > " & quoted form of newFileName
else
	set enc to "UTF-16LE"
	set cmd to ""
	if encoding contains "+ Bom" then set cmd to "xxd -p -r <<< xfffe"
	do shell script cmd & "> " & quoted form of newFileName
end if

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName
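The `str=…;echo ${str%.*}` trick used to build newFileName relies on POSIX parameter expansion: `${str%.*}` removes the shortest suffix matching `.*`, which means everything from the last dot in the whole path. Below is a minimal shell sketch of that step. The helper names `strip_ext` and `naive_strip_ext` are hypothetical. The extra check is an assumption added for one edge case: a file with no extension inside a folder whose name contains a dot. In that case the plain form would cut into the folder name.

```shell
#!/bin/sh
# Hypothetical helper: drop the extension from a POSIX path,
# the same job as `str=...; echo ${str%.*}` in the scripts above.
strip_ext() {
  str=$1
  name=${str##*/}                       # last path component only
  case $name in
    ?*.*) printf '%s\n' "${str%.*}" ;;  # name has an extension: cut from the last dot
    *)    printf '%s\n' "$str" ;;       # no extension (or a dot file): keep as-is
  esac
}

# Naive form, as used in the scripts, for comparison.
naive_strip_ext() {
  str=$1
  printf '%s\n' "${str%.*}"
}

strip_ext "/Users/me/notes.txt"           # /Users/me/notes
naive_strip_ext "/Users/me/my.dir/notes"  # /Users/me/my  (folder name truncated)
strip_ext "/Users/me/my.dir/notes"        # /Users/me/my.dir/notes
```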

Hey, that’s a great script, DJ Bazzie Wazzie. Thanks a lot!

When I choose “UTF-16 Big Endian + Bom” it does just what I’m looking for.
Let’s see if I can make a script with no encoding options and no choose file dialog either: I want a specific, option-free script that FileMaker can run so that I can import a text file that was originally in UTF-8.

You’re welcome!

Great! And sad at the same time. It’s the only one of these encodings that neither iconv nor the Cocoa text system can write to a file by itself.

I’ve changed my first post (cleaned up the mess I made), and that example code should help you. I also noticed that echo -e -n doesn’t work quite as well as it does in Terminal. I haven’t figured out exactly what goes wrong there, but xxd does its job very well, both in Terminal and in do shell script.
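One likely reason `echo -e -n` behaves differently is that do shell script runs its commands with /bin/sh, not your interactive Terminal shell. How `echo` treats `-e`, `-n` and backslash escapes differs between shells. POSIX `printf` with octal escapes avoids that ambiguity. Below is a minimal sketch assuming a POSIX-compatible `sh`. The function name `write_bom` is hypothetical, and `od` is used only to display the bytes.

```shell
#!/bin/sh
# Hypothetical helper: write a UTF-16 byte order mark with printf octal escapes.
# \376\377 = 0xFE 0xFF (big-endian BOM), \377\376 = 0xFF 0xFE (little-endian BOM).
# Unlike `echo -e -n`, printf behaves the same in /bin/sh and in Terminal.
write_bom() {
  case $1 in
    BE) printf '\376\377' ;;
    LE) printf '\377\376' ;;
    *)  printf 'usage: write_bom BE|LE\n' >&2; return 1 ;;
  esac
}

# Show the bytes as hex; prints "feff".
write_bom BE | od -An -tx1 | tr -d ' \n'; printf '\n'
```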

Sorry, I can’t use your short example script; I don’t know how to change it…
I did begin to shorten your longer script though:

set theFile to POSIX path of (choose file)
set encoding to "UTF-16 Big Endian + Bom" as string
set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

set enc to "UTF-16BE"
set cmd to "xxd -p -r <<< xfeff "
do shell script cmd & " > " & quoted form of newFileName

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName

I don’t really know what you mean by “iconv and cocoa text system can’t write to a file.” I have no problem using the above script; it does what it is supposed to do.

Sorry for the confusion up here… The script works, but I’m giving iconv a head start because it can’t write the BOM on its own. With the Cocoa text system, including TextEdit, it’s impossible to save the file properly.

You mean something like this?

set theFile to POSIX path of (choose file)
set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

do shell script "xxd -p -r <<< xfeff > " & quoted form of newFileName
do shell script "iconv -f UTF-8 -t UTF-16BE " & quoted form of theFile & " >> " & quoted form of newFileName
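The two do shell script calls can also be combined into a single shell command. A `{ …; }` group sends both the BOM and iconv's output through one redirection, so the separate `>>` append step is not needed. This is a sketch that assumes the input is valid UTF-8. The function name `to_utf16be_bom` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a UTF-8 file to UTF-16 big-endian with a BOM.
# Usage: to_utf16be_bom input.txt output.txt
# The exit status is iconv's, so invalid UTF-8 input is reported as a failure.
to_utf16be_bom() {
  src=$1
  dst=$2
  {
    printf '\376\377'                   # BOM: FE FF
    iconv -f UTF-8 -t UTF-16BE "$src"   # text, big-endian, no BOM of its own
  } > "$dst"
}
```

You can pass the same one-liner to do shell script from AppleScript. Inside an AppleScript string literal, the backslashes in the octal escapes must be doubled.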

Yes, that’s short and nice, I think.
Next for me is to get rid of the choose file step and point directly to a file, but that much I should be able to do myself :)

IMHO the people behind FileMaker should receive a copy of this thread.

The fine print on do shell script seems to be missing too. It obviously interprets things, and it would have been nice if the documentation specified exactly how input to and output from the do shell script command are treated/translated, because there isn’t much we can do about it. I mean, stty settings don’t work when you don’t have a terminal…

I read this topic looking for a solution, but it didn’t work for me: if the source file is UTF-8 without a BOM, the encoded file comes out with errors.
In my case I need to import a .csv file into Excel, but some characters are imported as ISO-8859-1, so the solution is to encode the file as UTF-16LE with a BOM.
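Excel needs UTF-16LE with a BOM here. A UTF-8 BOM (bytes EF BB BF) is simply the character U+FEFF, and converting that character to UTF-16LE gives FF FE, which is the little-endian BOM. So you can prepend a UTF-8 BOM and convert in a single pipe, without writing an intermediate file. This is a sketch that assumes the iconv in use passes a leading U+FEFF through instead of stripping it, which is worth checking with `od` on your own system. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: UTF-8 file -> UTF-16LE with BOM on stdout, for Excel.
# A UTF-8-encoded U+FEFF (EF BB BF) is prepended; iconv turns that
# character into FF FE, the little-endian BOM.
utf8_to_utf16le_bom() {
  {
    printf '\357\273\277'   # U+FEFF encoded as UTF-8
    cat "$1"
  } | iconv -f UTF-8 -t UTF-16LE
}

# Example: utf8_to_utf16le_bom data.csv > data-utf16.csv
```

If the source file already starts with a UTF-8 BOM, this adds a second one. Check the first bytes before you use it.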

I tried adding a BOM to the UTF-8 file first and then encoding it to UTF-16 with a BOM, and it works, but that takes two steps and produces two encoded files, which I don’t like.
Then I found a solution that works for me, so I’d like to share my experience:

In Terminal I found a similar command called “uconv”, but it isn’t available directly in the script’s shell (command not found error), so I have to give its full path:

on run {input, parameters}
	
	set theFile to POSIX path of (item 1 of input) --source file (Automator passes input as a list)
	set endFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_b.csv" --temp file
	
	do shell script "/opt/local/bin/uconv -s -f UTF-8 -t UTF-16LE --add-signature < " & quoted form of theFile & " > " & quoted form of endFileName --uconv silent from utf-8 to utf-16 little endian with bom from source file to temp file
	do shell script "mv " & quoted form of endFileName & space & quoted form of theFile --replace source file by temp file

	return input
end run

This code takes the file from the input, encodes it from UTF-8 to UTF-16 little-endian with a BOM (that’s what --add-signature is for), and replaces the source file with the new one.
Run man uconv to read about it; uconv --list lists the different encodings.
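BullyBu's action writes to a temporary file and then moves it over the source. The sketch below applies the same pattern with iconv instead of uconv. Because of the `if` test, the original file is only replaced when the conversion succeeded. The helper name and the `.tmp.$$` suffix are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a UTF-8 file in place to UTF-16LE with a BOM,
# an iconv counterpart of `uconv -f UTF-8 -t UTF-16LE --add-signature`.
convert_in_place_utf16le() {
  f=$1
  tmp="$f.tmp.$$"                       # temp file next to the source
  if { printf '\377\376'; iconv -f UTF-8 -t UTF-16LE "$f"; } > "$tmp"; then
    mv "$tmp" "$f"                      # replace the source only on success
  else
    rm -f "$tmp"                        # leave the original untouched
    return 1
  fi
}
```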

Hi BullyBu. Welcome to MacScripter and thanks for posting your own solution to this topic.

There’s no “/opt” folder on my machine. Were it and its contents installed on yours by some third-party software?

I didn’t find uconv on my machine.
The job can be done with iconv.

set theFile to POSIX path of (choose file)

set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

set enc to "UTF-16BE"
set cmd to "xxd -p -r <<< xfeff "
do shell script cmd & " > " & quoted form of newFileName # write the BOM (FE FF) to the new file

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName # append the UTF-16BE encoded text after the BOM

I tried to play with ASObjC but I’m puzzled.

In Xcode Help I read:
NSUTF16BigEndianStringEncoding: NSUTF16StringEncoding encoding with explicit endianness specified.
My understanding was that using this encoding would give me a file with the big-endian BOM at the beginning.
Alas, I was wrong.

The code I used (most of it borrowed from Shane STANLEY) is:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)
	if theExtension's |length|() > 0 then
		set newPath to newPath's stringByAppendingPathExtension:theExtension
	end if
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
	return theResult as boolean
end decodeFile:

set theSource to (choose file)
my decodeFile:(POSIX path of theSource)

Is there something wrong with it, or am I misunderstanding what NSUTF16BigEndianStringEncoding is supposed to do?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Tuesday 14 March 2017 15:56:27

Hi Yvan.

I think “with explicit endianness specified” is just an explanation that the enum NSUTF16BigEndianStringEncoding is used to specify explicitly that the text is to be saved with UTF-16 big-endian encoding, not with the endianness native to the machine. It’s an explicit instruction to writeToFile rather than an instruction to include an explicit BOM in the file. Maybe Shane will confirm this when he gets up.
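iconv makes the same distinction in its encoding names, so you can see it directly. The explicit-endianness names UTF-16BE and UTF-16LE write no BOM, while the generic name UTF-16 writes one. Which byte order the generic name then uses varies between iconv implementations, so check that on your own machine. The sketch below relies only on whether a BOM is present. The function name `hex_of` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: encode the letter "A" with iconv and print the bytes as hex.
hex_of() {
  printf 'A' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
  printf '\n'
}

hex_of UTF-16BE   # 0041      (explicit endianness, no BOM)
hex_of UTF-16LE   # 4100      (explicit endianness, no BOM)
hex_of UTF-16     # feff0041 or fffe4100, depending on the iconv implementation
```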

According to the Xcode documentation, this enum was only introduced with macOS 10.12, but it works on my 10.11 system.

The iconv command-line utility works with stdin and stdout, which means you can pipe text directly from one encoding to another without creating additional temporary files.
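Yvan's ASObjC code reads its input as ISO Latin 1. For such input, iconv can read the file on stdin and write UTF-16BE with a BOM to stdout in one pass, with no temporary file. This is a minimal sketch. The function name `latin1_to_utf16be_bom` is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: ISO-8859-1 text on stdin -> UTF-16BE with BOM on stdout.
latin1_to_utf16be_bom() {
  printf '\376\377'                 # BOM first (printf does not read stdin)...
  iconv -f ISO-8859-1 -t UTF-16BE   # ...then the converted text, read from stdin
}

# Usage: latin1_to_utf16be_bom < old.txt > new.txt
# "café" in Latin-1 (é = 0xE9) becomes fe ff 00 63 00 61 00 66 00 e9
printf 'caf\351' | latin1_to_utf16be_bom | od -An -tx1
```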

Great to see other solutions, even with third-party command-line utilities :cool: I’m still just curious what went wrong with the code above in post #12.

Thanks Nigel.

I tried to use an awful scheme to insert the BOM.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)
	if theExtension's |length|() > 0 then
		set newPath to newPath's stringByAppendingPathExtension:theExtension
	end if
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	set theString to (current application's NSString's stringWithString:" ")
	set moreString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	set theString to theString's stringByAppendingString:moreString
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
	return {newPath, theResult as boolean}
	
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
set newPath to newPath as «class furl»
set openFile to open for access newPath with write permission
write «data rdatFEFF» to openFile starting at 0
close access openFile
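Yvan's scheme starts the string with a space, which UTF-16BE encodes as the two bytes 00 20. It then overwrites those two bytes with FE FF. You can sketch the same in-place overwrite in the shell with `dd`, where `conv=notrunc` keeps dd from truncating the rest of the file. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: overwrite the first two bytes of a file with FE FF,
# mirroring `write «data rdatFEFF» to openFile starting at 0` above.
stamp_be_bom() {
  printf '\376\377' | dd of="$1" bs=1 count=2 conv=notrunc 2>/dev/null
}
```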

TextWrangler and BBEdit open the resulting file flawlessly, but alas, TextEdit crashes.
If I open it with TextWrangler and then save it under another name, the newly saved file opens flawlessly in TextEdit.
Puzzling, isn’t it?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Tuesday 14 March 2017 17:51:41

That’s right.

Actually, I just noticed this in Wikipedia, FWIW:

The documentation is wrong (it says the same thing about NSASCIIStringEncoding :mad:). I believe it was introduced in 10.4.

Mmm… :confused: How about this?

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Get the BOM value as a two-character string. (The single character id (254 * 256 + 255) gets lost in the conversion to NSString.)
	set theUTF16BEBOM to current application's NSString's stringWithString:(string id {254, 255})
	-- Convert it to two bytes of data.
	set theData to (theUTF16BEBOM's dataUsingEncoding:(current application's NSISOLatin1StringEncoding))'s mutableCopy()
	-- Read the contents of the ISO Latin 1 text file.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	-- Convert that to data too, but encoded as UTF-16 big-endian, and append it to the BOM data.
	tell theData to appendData:(theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding))
	-- Write the lot to a new file.
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theData's writeToFile:newPath atomically:true
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)

Nice :slight_smile:

An alternative that might work would be to create the BOM as a zero width no-break space. Unfortunately, this only works in 10.11 and above:

set theString to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
set theString to theString's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true

You could then append the contents of the file to that string, and save. I don’t have a suitable sample to test it.

Thank you. :slight_smile: And for your previous reply.

That works for me if I convert both to data before appending them, as in my version above. But if I append them as NSStrings and save the result, the resulting file crashes anything that tries to open or read it. (Well, TextEdit and a ‘read (choose file) as Unicode text’ script, anyway.) Sounds similar to what Yvan was getting with his version. I’ll have another look in the morning (GMT).

Thank you, Nigel and Shane.
I simply didn’t know how to define the string containing the BOM.

About the problem I described:
Could it be that the system records in the file’s metadata that the last write operation applied to the file was a raw «data» write?
When the last action writes text data, as in Nigel’s latest proposal or the shell version, the metadata would record that, and so TextEdit is satisfied.

I must add that when I compare the hexadecimal contents of the different attempts, they are identical.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Wednesday 15 March 2017 10:13:37

That’s right. For these kinds of things I prefer to read the CoreFoundation documentation rather than the Cocoa documentation. The CoreFoundation team wrote it, and their documentation seems more accurate. At least it says that kCFStringEncodingUTF16BE was introduced in Mac OS X 10.4, while kCFStringEncodingUnicode has been there since the first release of Mac OS X.

When the Mac moved to the x86 architecture, the native endianness changed, which caused problems with UTF-16 encoded files back then, around 10.4. Although the PowerPC could run in either endianness mode, it ran big-endian on Macintosh systems, so UTF-16 was big-endian by default. The Intel processors’ native endianness is little-endian, and a lot of software had trouble reading UTF-16 files written on PPC. So, to read on an x86 machine a UTF-16 file written in PPC’s native endianness, you could use the key kCFStringEncodingUTF16BE.
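On the reading side, this comes down to looking at the first two bytes. FE FF or FF FE tells you the byte order, and a file with no BOM has to be read with an endianness you choose explicitly, as with kCFStringEncodingUTF16BE for PPC-era files. The sketch below makes that decision with iconv. The function name is hypothetical. Falling back to big-endian for BOM-less files is an assumption that matches the PPC case described above.

```shell
#!/bin/sh
# Hypothetical helper: print a UTF-16 file as UTF-8, choosing the byte order
# from its BOM and falling back to big-endian (PPC-era default) without one.
utf16_file_to_utf8() {
  bom=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
  case $bom in
    feff) tail -c +3 "$1" | iconv -f UTF-16BE -t UTF-8 ;;  # skip BOM, big-endian
    fffe) tail -c +3 "$1" | iconv -f UTF-16LE -t UTF-8 ;;  # skip BOM, little-endian
    *)    iconv -f UTF-16BE -t UTF-8 < "$1" ;;             # no BOM: assume big-endian
  esac
}
```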

Yes, it’s possible something is being written as an extended attribute, and TextEdit might well look at that.