I didn’t find uconv on my machine.
The job can be done with iconv:
set theFile to POSIX path of (choose file)
set newFileName to (do shell script "str=" & quoted form of theFile & ";echo \"${str%.*}\"") & "_iconv.txt" # quote the expansion so spaces in the path survive
set enc to "UTF-16BE"
set cmd to "xxd -p -r <<< xfeff "
do shell script cmd & " > " & quoted form of newFileName # write the BOM : FE FF in the new file
do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName # write the UTF16-BE encoded text after the BOM
I tried to play with ASObjC but I’m puzzled.
In Xcode Help I read:
NSUTF16BigEndianStringEncoding
NSUTF16StringEncoding encoding with explicit endianness specified.
I understood this to mean that using this encoding would give me a file with the big-endian BOM at the beginning.
Alas I was wrong.
The code I used (most of it borrowed from Shane STANLEY) is:
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
on modifyPath:thePath adding:addString
set pathString to current application's NSString's stringWithString:thePath
set theExtension to pathString's pathExtension()
set thePathNoExt to pathString's stringByDeletingPathExtension()
set newPath to (thePathNoExt's stringByAppendingString:addString)
if theExtension's |length|() > 0 then
set newPath to newPath's stringByAppendingPathExtension:theExtension
end if
return newPath as string
end modifyPath:adding:
on decodeFile:thePath
set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
set newPath to my modifyPath:thePath adding:"-new"
set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
return theResult as boolean
end decodeFile:
set theSource to (choose file)
my decodeFile:(POSIX path of theSource)
Is there something wrong in it, or am I misunderstanding what NSUTF16BigEndianStringEncoding is supposed to do?
Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mardi 14 mars 2017 15:56:27
I think “with explicit endianness specified” is just an explanation that the enum NSUTF16BigEndianStringEncoding is used to specify explicitly that the text is to be saved with UTF-16 big-endian encoding, not with the endianness native to the machine. It’s an explicit instruction to writeToFile rather than an instruction to include an explicit BOM in the file. Maybe Shane will confirm this when he gets up.
According to the Xcode documentation, this enum was only introduced with macOS 10.12, but it works on my 10.11 system.
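To make the endianness-versus-BOM distinction concrete, here is a small shell sketch. It assumes a POSIX shell with iconv and od available (as on macOS) and is not taken from anyone's script in this thread. It encodes the single letter A both ways: the explicit endianness only changes the order of the two bytes, and neither output starts with a BOM.

```shell
# Encode "A" as UTF-16 big-endian and little-endian and dump the bytes.
# Neither explicit-endianness encoding prepends a BOM.
be=$(printf 'A' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n')
le=$(printf 'A' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1 | tr -d ' \n')
echo "UTF-16BE: $be"   # 0041
echo "UTF-16LE: $le"   # 4100
```

If a BOM were being added, the big-endian dump would begin with feff.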
The iconv command-line utility works with stdin and stdout, meaning you can pipe directly from one encoding to another without creating additional temporary files.
Great to see other solutions, even with third-party command-line utilities. I’m still just curious what went wrong with the code above in post #12.
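A minimal sketch of that idea, assuming the source file is ISO Latin 1 (the sample file and the variable names src and dst are made up for illustration): the BOM and the converted text are sent through one redirection, so nothing has to be written first and appended to later.

```shell
# Hypothetical sample input: "é" (byte E9) plus a linefeed, in ISO Latin 1.
src=$(mktemp)
dst=$(mktemp)
printf '\351\n' > "$src"

# Emit the BOM (FE FF), then let iconv stream the converted text after it.
# Both commands share one redirection, so no intermediate file is needed.
{ printf '\376\377'; iconv -f ISO-8859-1 -t UTF-16BE < "$src"; } > "$dst"

result=$(od -An -tx1 "$dst" | tr -d ' \n')
echo "$result"   # feff00e9000a
rm -f "$src" "$dst"
```

The same braces-and-redirection command line could be handed to do shell script, with the paths passed through quoted form of.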
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
on modifyPath:thePath adding:addString
set pathString to current application's NSString's stringWithString:thePath
set theExtension to pathString's pathExtension()
set thePathNoExt to pathString's stringByDeletingPathExtension()
set newPath to (thePathNoExt's stringByAppendingString:addString)
if theExtension's |length|() > 0 then
set newPath to newPath's stringByAppendingPathExtension:theExtension
end if
return newPath as string
end modifyPath:adding:
on decodeFile:thePath
set theString to (current application's NSString's stringWithString:" ")
set moreString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
set theString to theString's stringByAppendingString:moreString
set newPath to my modifyPath:thePath adding:"-new"
set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
return {newPath, theResult as boolean}
end decodeFile:
set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
set newPath to newPath as «class furl»
set openFile to open for access newPath with write permission
write «data rdatFEFF» to openFile starting at 0
close access openFile
TextWrangler and BBEdit open the resulting file flawlessly but alas, TextEdit crashes.
If I open it with TextWrangler and then save it under another name, the newly saved file opens flawlessly in TextEdit.
Puzzling, isn’t it?
Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mardi 14 mars 2017 17:51:41
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
on modifyPath:thePath adding:addString
set pathString to current application's NSString's stringWithString:thePath
set theExtension to pathString's pathExtension()
set thePathNoExt to pathString's stringByDeletingPathExtension()
set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
return newPath as string
end modifyPath:adding:
on decodeFile:thePath
-- Get the BOM value as a two-character string. (The single character id (254 * 256 + 255) gets lost in the conversion to NSString.)
set theUTF16BEBOM to current application's NSString's stringWithString:(string id {254, 255})
-- Convert it to two bytes of data.
set theData to (theUTF16BEBOM's dataUsingEncoding:(current application's NSISOLatin1StringEncoding))'s mutableCopy()
-- Read the contents of the ISO Latin 1 text file.
set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
-- Convert that to data too, but encoded as UTF-16 big-endian, and append it to the BOM data.
tell theData to appendData:(theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding))
-- Write the lot to a new file.
set newPath to my modifyPath:thePath adding:"-new"
set theResult to theData's writeToFile:newPath atomically:true
return {newPath, theResult as boolean}
end decodeFile:
set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
An alternative that might work would be to create the BOM as a zero-width no-break space. Unfortunately this only works in 10.11 and above:
set theString to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
set theString to theString's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true
You could then append the contents of the file to that string, and save. I don’t have a suitable sample to test it.
That works for me if I convert both to data before appending them, as in my version above. But if I append them as NSStrings and save the result, the resulting file crashes anything that tries to open or read it. (Well, TextEdit and a ‘read (choose file) as Unicode text’ script, anyway.) Sounds similar to what Yvan was getting with his version. I’ll have another look in the morning (GMT).
Thank you Nigel and Shane.
I simply didn’t know how to define the string containing the BOM.
About the problem which I described:
Might it be because the system records, in the file’s metadata, that the last write operation applied to the file was a write of «data»?
When the last action writes text data, as in Nigel’s latest proposal or the shell version, the metadata would record that, and so TextEdit is satisfied.
I must add that when I compare the hexadecimal contents of the different attempts, they are identical.
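For anyone wanting to repeat that comparison, a minimal shell sketch follows. The two temporary files are hypothetical stand-ins for the outputs of different attempts; the real paths would be substituted.

```shell
# Two hypothetical output files standing in for different attempts.
a=$(mktemp)
b=$(mktemp)
printf '\376\377\000A' > "$a"
printf '\376\377\000A' > "$b"

# Show the first two bytes of a file; "feff" means a UTF-16 BE BOM is present.
first_two=$(head -c 2 "$a" | od -An -tx1 | tr -d ' \n')
echo "$first_two"

# cmp -s exits with status 0 only when the files are byte-for-byte identical.
if cmp -s "$a" "$b"; then same=yes; else same=no; fi
echo "$same"
rm -f "$a" "$b"
```

Identical bytes only rule out a difference in the file contents; they say nothing about anything stored outside the data itself.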
Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mercredi 15 mars 2017 10:13:37
That’s right. For this kind of thing I prefer to consult the CoreFoundation framework rather than the Cocoa framework. The CoreFoundation team wrote it, and their documentation seems more accurate. At least it says that kCFStringEncodingUTF16BE was introduced in Mac OS X 10.4, while kCFStringEncodingUnicode has been available since the first release of Mac OS X.
With the move to the x86 architecture, around 10.4, the native endianness changed, which caused problems with UTF-16 encoded files at the time. Although the PowerPC could run in either endianness mode, it ran big-endian on Macintosh systems, so UTF-16 was big-endian by default. The Intel processors are natively little-endian, and a lot of software had trouble reading UTF-16 files written on PPC. To read such files, which followed PPC’s native endianness, on an x86 machine you could use kCFStringEncodingUTF16BE.
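A small shell sketch of why the byte order matters, assuming iconv and od are available: the same two bytes decode to entirely different characters depending on which endianness the reader assumes.

```shell
# The bytes 00 41 are "A" in big-endian UTF-16.
as_be=$(printf '\000A' | iconv -f UTF-16BE -t UTF-8)
# Read with the wrong byte order, the same bytes become U+4100, a CJK ideograph.
as_le=$(printf '\000A' | iconv -f UTF-16LE -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "$as_be"   # A
echo "$as_le"   # e48480 (the UTF-8 bytes for U+4100)
```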
Yes, although kCFStringEncodingUTF16BE is not the same value as NSUTF16BigEndianStringEncoding, and it’s at least theoretically possible that the encoding was supported earlier in CoreFoundation – the transform constants are an example of that.
The problem, I suspect, is that mistakes are being made because some enums and constants are being renamed to a naming scheme that fits better with Swift.
I’ve just been fooling around with your script in various ways.
The crashing only occurs if the data has been written to the file, not if it hasn’t or if something else of the same length has been written instead.
The crashing appears to be a system problem, rather than just TextEdit. Merely selecting the file in a ‘choose file’ dialog crashes the host application before the “OK” button can be clicked.
A workaround seems to be to write the BOM as a short integer instead of as data. Fortunately, UTF-16 BOMs can be represented in this way.
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
on modifyPath:thePath adding:addString
set pathString to current application's NSString's stringWithString:thePath
set theExtension to pathString's pathExtension()
set thePathNoExt to pathString's stringByDeletingPathExtension()
set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
return newPath --as text
end modifyPath:adding:
on decodeFile:thePath
set theString to (current application's NSString's stringWithString:" ")
set moreString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
set theString to theString's stringByAppendingString:moreString
set newPath to my modifyPath:thePath adding:"-new"
set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
set newPath to newPath as text
write -257 as short to (get POSIX file newPath) -- -257 is 0xFEFF as a signed 16-bit integer. Will open, start at 1, and close anyway, since the file already exists.
return {newPath, theResult as boolean}
end decodeFile:
set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
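The arithmetic behind writing the BOM as a short integer can be checked in the shell. This sketch assumes that write … as short stores the two bytes most-significant first, which I have not verified here; it only shows which signed 16-bit value corresponds to the bytes FE FF.

```shell
# 0xFEFF does not fit in a signed 16-bit integer as a positive number,
# so subtract 65536 to get its two's-complement value.
bom=$(( 0xFEFF ))
signed=$(( bom - 65536 ))
echo "$signed"   # -257

# Going the other way: the low 16 bits of that signed value, in hex.
hex=$(printf '%04x' $(( signed & 0xFFFF )))
echo "$hex"      # feff
```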
I suspect that including the zero-width no-break space in the string that gets written to the file with NSUTF16BigEndianStringEncoding does something undesirable to that character. The encoding needs to be sorted out before the write, which is the idea behind my NSData approach. Your idea works well when plugged into that.
With Shane’s string transform suggestion, you can either create separate data blocks from the BOM character and the string from the original file and then join the blocks, or append the string from the file to the BOM character and create a data block from the result. This version does the latter:
use AppleScript version "2.5" -- Mac OS 10.11 (El Capitan) or later.
use framework "Foundation"
use scripting additions
on modifyPath:thePath adding:addString
set pathString to current application's NSString's stringWithString:thePath
set theExtension to pathString's pathExtension()
set thePathNoExt to pathString's stringByDeletingPathExtension()
set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
return newPath as string
end modifyPath:adding:
on decodeFile:thePath
-- Create a character with the same Unicode value as a UTF-16 BE BOM.
set theUTF16BEBOM to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
set theUTF16BEBOM to theUTF16BEBOM's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true
-- Read the contents of the ISO Latin 1 text file and append it to the BOM.
set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
set theString to theUTF16BEBOM's stringByAppendingString:theString
-- Convert the result to data, encoded as UTF-16 big-endian.
set theData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
-- Write the data to a new file.
set newPath to my modifyPath:thePath adding:"-new"
set theResult to theData's writeToFile:newPath atomically:true
return {newPath, theResult as boolean}
end decodeFile:
set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
And since it’s proved an interesting area for exploration, here’s a version which writes the BOM and the text to the new file separately:
use AppleScript version "2.5" -- Mac OS 10.11 (El Capitan) or later.
use framework "Foundation"
use scripting additions
on modifyPath:thePath adding:addString
set pathString to current application's NSString's stringWithString:thePath
set theExtension to pathString's pathExtension()
set thePathNoExt to pathString's stringByDeletingPathExtension()
set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
return newPath as string
end modifyPath:adding:
on decodeFile:thePath
-- Create a character with the same Unicode value as a UTF-16 BE BOM.
set theUTF16BEBOM to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
set theUTF16BEBOM to theUTF16BEBOM's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true
-- Convert it to data.
set BOMData to theUTF16BEBOM's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
-- Read the contents of the ISO Latin 1 text file.
set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
-- Convert that to data too, encoded as UTF-16 big-endian.
set stringData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
-- Create a new file.
set theResult to false
set newPath to my modifyPath:thePath adding:"-new"
tell current application's NSFileManager's defaultManager() to createFileAtPath:newPath |contents|:(missing value) attributes:(missing value)
-- Open it for access with write permission, write the two blocks of data to it, and close it again.
set fileAccess to current application's NSFileHandle's fileHandleForWritingAtPath:newPath
try
tell fileAccess to writeData:BOMData
tell fileAccess to writeData:stringData
set theResult to true
end try
tell fileAccess to closeFile()
return {newPath, theResult as boolean}
end decodeFile:
set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
I’m not sure if you’re asking if one exists or if that’s what I’ve used (in the second script in post #37).
As far as I can see, NSString’s and NSData’s writeToFile methods either create files containing just the given material or completely replace the contents of existing files. They don’t have methods for editing files in-place.
The NSFileHandle class seems to be the equivalent of the file system object created by open for access in the StandardAdditions, but with a few differences in the way it’s scripted. The significant differences here are:
Files which don’t already exist have to be explicitly created first. I’ve used NSFileManager for this. (Files which do already exist are effectually emptied if created again with NSFileManager.)
NSFileHandle only writes NSData objects. (According to the documentation, anyway. I haven’t put it to the test.)
Otherwise, as with write in the StandardAdditions, each successive write starts where the previous one ended unless something is done to change the file handle’s file pointer.
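As a loose analogy only (this is the shell, not the NSFileHandle API), the same "successive writes continue where the last one ended" behaviour can be seen with a file descriptor that is opened once, written to twice, and then closed. The descriptor number 3 and the temporary file are arbitrary choices for the sketch.

```shell
out=$(mktemp)

# Open the file once for writing on descriptor 3 (this empties it).
exec 3> "$out"
# Each write continues where the previous one ended.
printf '\376\377' >&3                               # BOM
printf 'A' | iconv -f ISO-8859-1 -t UTF-16BE >&3    # text
# Close the descriptor.
exec 3>&-

result=$(od -An -tx1 "$out" | tr -d ' \n')
echo "$result"   # feff0041
rm -f "$out"
```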