Hi, I have a folder which contains a bunch of “.txt” files. Some of the files are in UTF-8 format and some are in UTF-16 format. I need an AppleScript to find the UTF-8 files and convert them to UTF-16.
I got the script below from this forum, but it saves the result as a new file and adds “_iconv” to the file name. I want to change the format of the original file itself.
set thefile to POSIX path of (choose file)
set newFileName to (do shell script "str=" & quoted form of thefile & ";echo ${str%.*}") & "_iconv.txt"
do shell script "xxd -p -r <<< xfeff > " & quoted form of newFileName
do shell script "iconv -f UTF-8 -t UTF-16BE " & space & quoted form of thefile & " >> " & quoted form of newFileName
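For what it's worth, the same iconv approach can be made to replace the original file by converting into a temporary file and then moving that over the original. This is only a sketch, not a script from the forum: the ".utf16tmp" suffix is an arbitrary name chosen here, and it assumes that iconv exits with an error for input that isn't valid UTF-8 and that the source file has no UTF-8 BOM of its own.

```applescript
set theFile to POSIX path of (choose file)
-- Hypothetical temporary name; any unused path next to the original will do.
set tmpFile to theFile & ".utf16tmp"
try
	-- Write a big-endian BOM (FE FF) first, then append the converted text.
	-- Because of "&&", mv only runs if iconv succeeded.
	do shell script "printf '\\376\\377' > " & quoted form of tmpFile & " && iconv -f UTF-8 -t UTF-16BE " & quoted form of theFile & " >> " & quoted form of tmpFile & " && mv " & quoted form of tmpFile & " " & quoted form of theFile
on error
	-- Conversion failed (for example, the file isn't UTF-8):
	-- remove the temporary file and leave the original alone.
	do shell script "rm -f " & quoted form of tmpFile
end try
```

Going through a temporary file means the original is never half-overwritten if the conversion fails partway.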
UTF-8 is popular because it’s compact and stores 7-bit ASCII characters as single bytes. In everything else, such as counting characters, UTF-16 is superior to UTF-8.
I’m not sure why the OP wants big-endian, but otherwise I’d suggest:
on open fileList
    repeat with aFile in fileList
        try
            set theText to read aFile as «class utf8»
            -- if we got here, it's UTF-8, so write it as UTF-16
            write theText to aFile as Unicode text
        on error
            -- it's not UTF-8, so nothing to do
        end try
    end repeat
end open
use scripting additions
use framework "Foundation"
on open fileList
    repeat with aFile in fileList
        set thePath to POSIX path of aFile
        set theNSString to (current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
        if theNSString is not missing value then -- it was UTF-8
            (theNSString's writeToFile:thePath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value))
        end if
    end repeat
end open
The endianness of a file works best if it matches the processor’s endianness (read: no byte swaps are required for each character). So I suspect the file is meant for a different architecture than his Mac.
Edit: Fun fact, the PPC architecture is one of the few processor architectures that can switch endian modes on the fly. Reading a UTF-16BE or UTF-16LE file on a PPC Mac should make no difference, while on an Intel Mac it does.
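To make the byte order concrete, here is a small sketch that shows the bytes iconv produces for the letter "A" in each variant. It assumes the iconv and xxd command-line tools that ship with macOS; the variable names are made up for the example.

```applescript
-- Encode the single character "A" (U+0041) both ways and show the bytes as hex.
set bigEndian to do shell script "printf A | iconv -f UTF-8 -t UTF-16BE | xxd -p"
set littleEndian to do shell script "printf A | iconv -f UTF-8 -t UTF-16LE | xxd -p"
-- bigEndian is "0041": most significant byte first
-- littleEndian is "4100": least significant byte first
display dialog "UTF-16BE: " & bigEndian & return & "UTF-16LE: " & littleEndian
```

The two results hold the same pair of bytes in opposite order, which is the swap a reader has to undo when the file's byte order doesn't match what it expects.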
There is also the fact that some software only deals with Unicode text, so you have to convert the text to Unicode before importing it. Such software often runs on several platforms, and probably takes this approach in order to keep the versions for the different platforms compatible with each other.
You assume of course that the UTF-16 code will be the same length or longer than the UTF-8, which is probably a fairly safe bet.
The uncredited code the OP posted also writes a big-endian BOM to the file. (Two, in fact!) This version of yours writes a BOM too:
on open fileList
    repeat with aFile in fileList
        set fRef to (open for access aFile with write permission)
        try
            set txt to (read fRef as «class utf8»)
            -- if we got here, it's UTF-8, so write it as UTF-16, big-endian with a BOM.
            set eof fRef to 0
            write «data rdatFEFF» to fRef
            write txt to fRef as Unicode text
        on error
            -- It's not UTF-8, so it's probably UTF-16 already.
        end try
        close access fRef
    end repeat
end open
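Since a failed UTF-8 read only suggests that a file is UTF-16, one could also look at its first two bytes for a BOM as an extra check. A rough sketch, assuming the head and xxd tools that come with macOS; the variable names are invented for the example, and a UTF-16 file saved without a BOM won't be recognised this way.

```applescript
set thePath to POSIX path of (choose file)
-- Dump the first two bytes of the file as lowercase hex, e.g. "feff".
set firstBytes to do shell script "head -c 2 " & quoted form of thePath & " | xxd -p"
if firstBytes is "feff" then
	set verdict to "UTF-16 big-endian BOM"
else if firstBytes is "fffe" then
	set verdict to "UTF-16 little-endian BOM"
else
	set verdict to "no UTF-16 BOM"
end if
display dialog verdict
```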
#===== Handler borrowed from Regulus6633 - http://macscripter.net/viewtopic.php?id=36861
on writeTo(targetFile, theData, dataType, apendData)
    -- targetFile is the path to the file you want to write
    -- theData is the data you want in the file.
    -- dataType is the data type of theData and it can be text, list, record etc.
    -- apendData is true to append theData to the end of the current contents of the file or false to overwrite it
    try
        set targetFile to targetFile as text
        set openFile to open for access file targetFile with write permission
        if not apendData then set eof of openFile to 0
        write theData to openFile starting at eof as dataType
        close access openFile
        return true
    on error errMsg number errNbr
        log "errNbr #" & errNbr & " " & errMsg
        try
            close access file targetFile
        end try
        return false
    end try
end writeTo
#=====
A few days ago, I had to store data in which some Eastern characters may appear.
I’m not sure what you’re saying doesn’t work, but your second script has ‘starting at eof’ missing from the ‘write’ line, so the write’s starting at the beginning of the file.
Less importantly, the direct parameter of ‘set eof’ should be a file or open-file reference, not ‘of’ and a reference.
if not apendData then set eof fRef to 0
write _text to fRef as «class utf8» starting at eof