Change file encoding of TextEdit by Applescript

Hi all

I’d like to change the TextEdit encodings in the TextEdit preferences with a script. E.g. from ‘Automatic’ to ‘UTF-8’. What I have is as follows:

tell application "TextEdit" to activate
tell application "System Events"
	tell process "TextEdit"
		keystroke "," using {command down}
		key code 124
		delay 2
		repeat 6 times
			keystroke tab
		end repeat
		key code 125
		delay 2
		
		-- ???
		-- ???
		
	end tell
end tell

The script opens the window where I can choose the different encodings. But I cannot specify and select the different lines. Any tip or hint?

Thanks
Lazy

Hi,

workaround


tell application "TextEdit"
	set theCache to text of document 1
	close document 1 saving no
	quit
end tell
do shell script "defaults write com.apple.TextEdit PlainTextEncodingForWrite -int 4"
do shell script "defaults write com.apple.TextEdit PlainTextEncoding -int 4"
tell application "TextEdit"
	activate
	set text of document 1 to theCache
end tell

You can retrieve the enumeration values for the encodings by changing the setting manually, quitting TextEdit.app and looking in the preference file.
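The lookup described above can also be done from the shell instead of opening the preference file by hand. This is a minimal sketch, assuming the two keys from the script above (PlainTextEncoding and PlainTextEncodingForWrite) are the ones TextEdit writes. The function name is made up for illustration, and the values printed depend on your own settings.

```shell
# Hypothetical helper: print the two TextEdit encoding keys used in the post above.
show_textedit_encodings() {
    for key in PlainTextEncoding PlainTextEncodingForWrite; do
        if command -v defaults >/dev/null 2>&1; then
            # defaults exits non-zero when the key has never been written
            value=$(defaults read com.apple.TextEdit "$key" 2>/dev/null || echo "(not set)")
        else
            value="(defaults tool not available on this system)"
        fi
        echo "$key = $value"
    done
}

show_textedit_encodings
```

Change the setting in the TextEdit preferences, quit TextEdit, and run it again to see which number belongs to which menu entry.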

Hi Stefan

Thanks for the quick reply.

-int 4 is for UTF-8. What’s the code for ‘Automatic’, ‘UTF-16’, Western Mac OS and Western Windows?

Thanks in advance

Lazy

I would like to have an AppleScript that converts the encoding of a text file from UTF-8 into UTF-16, anyone?
I’m using FileMaker and for some reason it wants UTF-16: my Scandinavian letters ÅÄÖ don’t show up correctly if I import a UTF-8 file into FileMaker.

Hello!

Have a look at iconv in Terminal, that you can call with a do shell script from AppleScript.

Try man iconv to read about it; iconv --list lists the different encodings. In your case I think the incantation should look something like this:

do shell script "iconv -f UTF-8 -t UTF-16 /Users/you/your/path/To/file/you/want/to/encode >/path/to/Encoded/File/you/Want/to/end/up/with"

And please tell the FileMaker guys about your problem! :)

To extend McUsr’s info (I finished a 24-page document about character encoding last week):

UTF-16 is little endian and iconv adds a BOM
UTF-16LE is little endian and iconv won’t add a BOM
UTF-16BE is big endian and iconv won’t add a BOM
There is no support for big-endian UTF-16 with a BOM in iconv.
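To check which of these variants a particular iconv build actually produces, you can dump the bytes it writes for a single character. This is a small sketch; hex_of is a made-up helper name. For plain UTF-16 the BOM and byte order can differ between iconv implementations, so that line is printed without an expectation.

```shell
# Print the bytes iconv writes for the letter "A" (U+0041) as one hex string.
hex_of() {
    # $1 = target encoding name handed to iconv -t
    printf 'A' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
}

echo "UTF-16BE: $(hex_of UTF-16BE)"   # big endian, no BOM: 0041
echo "UTF-16LE: $(hex_of UTF-16LE)"   # little endian, no BOM: 4100
echo "UTF-16:   $(hex_of UTF-16)"     # BOM and byte order depend on the iconv build
```

If the UTF-16 line starts with fffe, that build writes little endian with a BOM; feff means big endian with a BOM.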

If you need big-endian UTF-16 with a BOM, create a file containing bytes 254 and 255 (0xFE 0xFF) and then let iconv append the converted data to that file.

do shell script "xxd -p -r <<< xfeff > UTF-16FileWithBOM.txt"
do shell script "iconv -t UTF-16BE UTF-8File.txt >> UTF-16FileWithBOM.txt"

I don’t use the -f option because the shell is already UTF-8 and iconv recognizes it.

Most applications, including the Cocoa text system, rely on the iconv libraries, so the results of iconv are identical to those of the Cocoa text system and therefore to those of TextEdit. To get different results you need an application with its own character encoding libraries. Character encoding support is much better in Word than in the Cocoa text system. Unfortunately that doesn’t hold for every Microsoft Office application: Excel has very poor Unicode support, even for encodings that appear in its list of supported encodings. The latest version(s) of Word only write UTF-16 with a BOM when a file is saved, but can open all four types of UTF-16 files.

edit: for the curious ones

So you know why TextEdit and iconv don’t support UTF-16BE with BOM.

thanks for the replies!

My AppleScript skills are very limited so I don’t quite know how to make the script.
The script you sent me McUsrII doesn’t quite work, it doesn’t change the encoding, only removes the content of the txt file.
I’m trying your script DJ Bazzie Wazzie as well - can’t make it work…

I don’t know if FileMaker needs UTF-16 with big or little endian - maybe it doesn’t matter?

For UTF-16, endianness describes the order in which the bytes of a multi-byte number are stored. We humans combine digits as well: after the number 9 we write the next number down as 10, most significant digit first. That notation is big endian. For certain CPUs it can be faster to store numbers the other way around, in which case ten would be stored as 01. In bytes, the number 260 is stored as 0x0104 in big endian notation and as 0x0401 in little endian. The best way to remember the difference is that big endian is similar to human notation.

Most of the time when the endianness is wrong you will see all sorts of Vietnamese symbols when opening the file.

For example, Unicode character 100 (U+0064, the letter d) is stored as 0x0064 or 0x6400. When the wrong endianness is used, the number (Unicode character) 100 is interpreted as the number (Unicode character) 25600, which is U+6400, a CJK ideograph.
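The byte swap described above can be reproduced with iconv itself. This sketch writes the letter d as big-endian UTF-16 into a temporary file and then reads the same two bytes back with both byte orders. The variable names are only for illustration.

```shell
tmp=$(mktemp)

# "d" is U+0064, so the big-endian file holds the two bytes 00 64.
printf 'd' | iconv -f UTF-8 -t UTF-16BE > "$tmp"

right=$(iconv -f UTF-16BE -t UTF-8 "$tmp")   # decoded with the correct byte order
wrong=$(iconv -f UTF-16LE -t UTF-8 "$tmp")   # decoded with the wrong byte order: U+6400

# Show the UTF-8 bytes of the misread character so the result is visible in any terminal.
wrong_hex=$(printf '%s' "$wrong" | od -An -tx1 | tr -d ' \n')

echo "correct byte order: $right"
echo "wrong byte order:   $wrong (UTF-8 bytes: $wrong_hex)"

rm -f "$tmp"
```

The bytes e6 90 80 are the UTF-8 form of U+6400, which is why a file read with the wrong endianness fills up with CJK-looking characters.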

DJ Bazzie Wazzie,

would it be possible for you to write a script that converts a .txt file from UTF-8 into UTF-16 (big or/and little endian)?

I would just try to import a txt file with a UTF-16, big or little endian, and see if it works in FileMaker…

:)

I am surprised that bare UTF-16 is little endian. :) And I’d love to read your paper, if you care to share.

@ToBeJazz:

Try DJ Bazzie Wazzie’s shell commands in post 7, then import the result into FileMaker manually and see if it gives the results you need. If you get Japanese, Chinese or Korean characters as a result, then you have the wrong endianness! :)

set theFile to POSIX path of (choose file)
set encoding to choose from list {"UTF-16 Big Endian", "UTF-16 Big Endian + Bom", "UTF-16 Little Endian", "UTF-16 Little Endian + Bom"}

if encoding is false then
	return -- nothing selected, or Cancel pressed
else
	set encoding to encoding as string
end if

set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

if encoding contains "Big Endian" then
	set enc to "UTF-16BE"
	set cmd to ""
	if encoding contains "+ Bom" then set cmd to "xxd -p -r <<< xfeff "
	do shell script cmd & " > " & quoted form of newFileName
else
	set enc to "UTF-16LE"
	set cmd to ""
	if encoding contains "+ Bom" then set cmd to "xxd -p -r <<< xfffe"
	do shell script cmd & "> " & quoted form of newFileName
end if

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName

Hey that’s a great script DJ Bazzie Wazzie - thanks a lot!

When I choose “UTF-16 Big Endian + Bom” it does just what I’m looking for.
Let’s see if I can make a script that has no encoding options and no choose file option as well - I want a specific script without any options to be run from FileMaker so that I can import a text file that was originally in UTF-8.

You’re welcome!

Great! And sad at the same time. It’s the only encoding that iconv and cocoa text system can’t write to a file.

I’ve changed my first post (cleaned up the mess I’d made) and that example code should help you. I also noticed that echo -e -n doesn’t work as well in do shell script as it does in Terminal. I haven’t figured out what exactly goes wrong there, but xxd does its job very well, in Terminal as well as in do shell script.
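As an alternative to both echo -e and xxd, printf with octal escapes is specified by POSIX and behaves the same in any sh. This is a sketch of the same BOM-then-append approach using printf. The temporary files and the sample text are stand-ins for a real input and output file.

```shell
src=$(mktemp)
dst=$(mktemp)

# Sample UTF-8 input: the letters ÅÄÖ plus a newline, written as octal escapes
# so the example does not depend on the encoding of this script itself.
printf '\303\205\303\204\303\226\n' > "$src"

# Write the big-endian BOM (FE FF = octal 376 377), then append the converted text.
printf '\376\377' > "$dst"
iconv -f UTF-8 -t UTF-16BE "$src" >> "$dst"

# Dump the result as hex to confirm the BOM is followed by big-endian code units.
result_hex=$(od -An -tx1 "$dst" | tr -d ' \n')
echo "$result_hex"

rm -f "$src" "$dst"
```

The output should start with feff, followed by 00c5 00c4 00d6 000a for the three letters and the newline.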

Sorry I can’t use your short example script, don’t know how to change it…
I did begin to shorten your longer script though:

set theFile to POSIX path of (choose file)
set encoding to "UTF-16 Big Endian + Bom" as string
set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

set enc to "UTF-16BE"
set cmd to "xxd -p -r <<< xfeff "
do shell script cmd & " > " & quoted form of newFileName

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName

I don’t really know what you mean by “iconv and cocoa text system can’t write to a file.” I have no problem using the above script; it does what it is supposed to do.

Sorry for the confusion here… the script is working, but only because I’m giving iconv a head start: it can’t write the BOM on its own. With the Cocoa text system, including TextEdit, it’s impossible to save the file in a proper way.

You mean something like this?

set theFile to POSIX path of (choose file)
set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

do shell script "xxd -p -r <<< xfeff > " & quoted form of newFileName
do shell script "iconv -f UTF-8 -t UTF-16BE " & space & quoted form of theFile & " >> " & quoted form of newFileName

Yes, that’s short and nice I think.
Next for me is to get rid of the choose file thing and point directly to a file, but at least that I should be able to do myself:)

IMHO The people behind FileMaker should receive a copy of this thread.

And the fine print regarding do shell script seems to be lacking too. It is obviously interpreting stuff, and it would have been nice if they had specified exactly how input to and output from the do shell script command is treated and translated, because there isn’t much we can do about it. I mean, stty settings don’t work when you don’t have a terminal…

I read this topic looking for a solution, but it doesn’t work for me: if the source file is UTF-8 without a BOM, the converted file comes out with errors.
In my case I need to import a .csv file into Excel, but some characters are imported as ISO-8859-1, so the solution is to convert the file to UTF-16LE with a BOM.

I tried adding a BOM to the UTF-8 file first and then converting it to UTF-16 with a BOM. That works, but it takes two steps and two converted files, and I don’t enjoy it.
Then I found a solution that works for me, so I’d like to share my experience:

In Terminal I found a similar command called “uconv”, but it isn’t directly available in the shell (“command not found” error), so I have to call it by its full path:

on run {input, parameters}
	
	set theFile to POSIX path of input --source file
	set endFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_b.csv" --temp file
	
	do shell script "/opt/local/bin/uconv -s -f UTF-8 -t UTF-16LE --add-signature < " & quoted form of theFile & " > " & quoted form of endFileName --uconv silent from utf-8 to utf-16 little endian with bom from source file to temp file
	do shell script "mv " & quoted form of endFileName & space & quoted form of theFile --replace source file by temp file

	return input
end run

This code takes the file from input, converts it from UTF-8 to UTF-16 Little Endian with a BOM (--add-signature does that) and replaces the source file with the new one.
Use man uconv to read about it; uconv --list lists the different encodings.
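For machines without uconv, the BOM-then-append approach from earlier in the thread can be adapted to little endian and to in-place replacement. This is only a sketch: utf8_to_utf16le_bom is a made-up helper name and the sample file is a stand-in for a real .csv. It also assumes the source has no UTF-8 BOM of its own, which iconv would otherwise carry over as an extra character.

```shell
# Hypothetical helper: rewrite a UTF-8 file in place as UTF-16LE with a BOM.
utf8_to_utf16le_bom() {
    src=$1
    tmp=$(mktemp) || return 1
    # Little-endian BOM is FF FE (octal 377 376); the converted text follows it.
    if printf '\377\376' > "$tmp" && iconv -f UTF-8 -t UTF-16LE "$src" >> "$tmp"; then
        mv "$tmp" "$src"    # replace the source only after a successful conversion
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a throwaway sample file.
sample=$(mktemp)
printf 'a;b\n' > "$sample"
utf8_to_utf16le_bom "$sample"
sample_hex=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "$sample_hex"
rm -f "$sample"
```

Because the converted file starts out as a mktemp file, it ends up with that file's restrictive permissions, so adjust them afterwards if other users need to read it.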

Hi BullyBu. Welcome to MacScripter and thanks for posting your own solution to this topic.

There’s no “/opt” folder on my machine. Were it and its contents installed on yours by some third-party software?