Converting files from UTF-16 to UTF-8

In a nutshell, Net Barrier writes its logs (.txt files) in Unicode (UTF-16). I would like to set up a folder action that automatically converts any new files to UTF-8. The reason: I am extracting text with grep, and it does not work on UTF-16. I’ve searched, but I cannot find anything that I can understand. Any suggestions and links would be appreciated.

I won’t claim to understand all of that encoding stuff, but I searched my email archives from the applescript-users list and found some material related to the topic. Here’s my lame attempt to stitch it together. I hope it works, but I won’t stake my next meal on it. ;)

set x to read file "path:to:utf16 file"

try
	set f to open for access file "path:to:new file" with write permission
	write x to f as «class utf8»
	try
		close access f
	end try
on error
	try
		close access f
	end try
end try

If this works then it won’t take much to wrap it in a folder action handler.
– Rob

Thanks Rob,

Your code worked perfectly when specific paths are used. My application of this code requires that I use the most:recent:file:added:to:the:folder:at:a:specific:path :) I managed to find a workaround using OmcEdit, a nifty little command line app. Many thanks for your effort.

It’s entirely possible to make the script more generic and wrap it in a folder action handler. The script was simply a test to see if it produced the correct output. Let us know if you’d like to go further with it. You might also use your command line app in a folder action if you need an automated solution.

– Rob

Rob,
Thanks for the offer. I’ve always appreciated the help I get from the programmers at this BBS, and this is no exception. I’m going to see how my solution works out. If it doesn’t, you’ll definitely ‘hear’ from me. Thanks again.

As far as I know (I’ve been playing a bit with UTF files), Unicode files (UTF-16) usually start with two bytes that flag the byte order: hex FE FF for big-endian (BE) or FF FE for little-endian (LE). After that little “header”, each ordinary character takes two bytes. E.g., in a big-endian file the word “test” is ASCII 0 + t + ASCII 0 + e + ASCII 0 + s + ASCII 0 + t.
So, if you simply “read” the file, you will get that extra info (the header and the null bytes) mixed into your text. To read a well-formed UTF-16 file, you should:

read file "path:to:file" as Unicode text

And AppleScript will return only the actual string (“test”), without the header and the extra null bytes. So:

set theFile to alias "path:to:file"
set fref to (open for access theFile with write permission)
set oldContents to (read fref as Unicode text) --> «class utxt» = UTF-16
set eof of fref to 0
write oldContents to fref as «class utf8»
close access fref

If there is no header in the file (BE or LE), it will be correctly interpreted anyway.
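A quick way to see the difference between the two kinds of read described above is to compare the lengths they return for the same file. This is only a minimal sketch: the variable names are made up for illustration, and it assumes you pick a file that really is UTF-16. The unqualified read would normally come back roughly twice as long (plus the header), because every byte is treated as a character.

```applescript
-- Hypothetical names; pick any UTF-16 text file when prompted.
set theFile to choose file with prompt "Pick a UTF-16 text file:"

-- Unqualified read: the bytes are not decoded as UTF-16,
-- so the header and the null bytes end up in the string.
set rawText to read theFile

-- Qualified read: the bytes are decoded as UTF-16.
set decodedText to read theFile as Unicode text

-- Compare the two lengths in the script's result.
{rawLength:(length of rawText), decodedLength:(length of decodedText)}
```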

Thanks, JJ. I was wondering if there was more to it than a simple, unqualified read/write.

– Rob
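
For anyone who finds this thread later, here is a rough sketch of how JJ's read/write could be wrapped in the folder action handler discussed earlier. It is untested against Net Barrier's logs and rests on a few assumptions: the handler name convertToUTF8 is made up, only items whose names end in ".txt" are touched, every such file is assumed to still be UTF-16 (running it on a file that is already UTF-8 would garble it), and the file is assumed to be completely written by the time the action fires. It converts each file in place, as JJ's script does.

```applescript
-- Folder action: runs when items are added to the folder it is attached to.
on adding folder items to thisFolder after receiving addedItems
	repeat with anItem in addedItems
		set itemPath to (contents of anItem) as text
		-- Assumption: only .txt logs should be converted.
		if itemPath ends with ".txt" then
			convertToUTF8(contents of anItem)
		end if
	end repeat
end adding folder items to

-- Hypothetical helper: rewrites a UTF-16 file in place as UTF-8.
on convertToUTF8(theFile)
	set fref to open for access theFile with write permission
	try
		-- Decode the whole file as UTF-16.
		set oldContents to read fref as Unicode text
		-- Truncate the file, then write the same text back as UTF-8.
		set eof of fref to 0
		write oldContents to fref as «class utf8»
		close access fref
	on error errMsg number errNum
		-- Always release the file before passing the error along.
		close access fref
		error errMsg number errNum
	end try
end convertToUTF8
```

To try it, save the script in a Folder Action Scripts folder and attach it to the log folder. If you would rather keep the UTF-16 originals, write to a new file outside the watched folder instead of setting eof to 0.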