Change file encoding of TextEdit by Applescript

Yes, that makes sense.

Thanks Nigel.

It’s good to know that we may write the BOM as a number, but I will stay with your version, which does the whole job with ASObjC.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Wednesday 15 March 2017 16:07:39

With Shane’s string transform suggestion, you can either create separate data blocks from the BOM character and the string from the original file and then join the blocks, or append the string from the file to the BOM character and create a data block from the result. This version does the latter:

use AppleScript version "2.5" -- Mac OS 10.11 (El Capitan) or later.
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Create a character with the same Unicode value as a UTF-16 BE BOM.
	set theUTF16BEBOM to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
	set theUTF16BEBOM to theUTF16BEBOM's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true
	-- Read the contents of the ISO Latin 1 text file and append it to the BOM.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	set theString to theUTF16BEBOM's stringByAppendingString:theString
	-- Convert the result to data, encoded as UTF-16 big-endian.
	set theData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	-- Write the data to a new file.
	set newPath to my modifyPath:thePath adding:"-new"
	set theResult to theData's writeToFile:newPath atomically:true
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)

And since it’s proved an interesting area for exploration, here’s a version which writes the BOM and the text to the new file separately:

use AppleScript version "2.5" -- Mac OS 10.11 (El Capitan) or later.
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Create a character with the same Unicode value as a UTF-16 BE BOM.
	set theUTF16BEBOM to current application's NSString's stringWithString:"\\N{ZERO WIDTH NO-BREAK SPACE}"
	set theUTF16BEBOM to theUTF16BEBOM's stringByApplyingTransform:(current application's NSStringTransformToUnicodeName) |reverse|:true
	-- Convert it to data.
	set BOMData to theUTF16BEBOM's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	-- Read the contents of the ISO Latin 1 text file.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	-- Convert that to data too, encoded as UTF-16 big-endian.
	set stringData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	-- Create a new file.
	set theResult to false
	set newPath to my modifyPath:thePath adding:"-new"
	tell current application's NSFileManager's defaultManager() to createFileAtPath:newPath |contents|:(missing value) attributes:(missing value)
	-- Open it for access with write permission, write the two blocks of data to it, and close it again.
	set fileAccess to current application's NSFileHandle's fileHandleForWritingAtPath:newPath
	try
		tell fileAccess to writeData:BOMData
		tell fileAccess to writeData:stringData
		set theResult to true
	end try
	tell fileAccess to closeFile()
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)

Thanks Nigel

I’m scratching my head.
Isn’t it a function which allows ASObjC to append data to what has already been written?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Wednesday 15 March 2017 19:24:51

Thanks, Shane. I’ve now sorted that out.

I’m not sure if you’re asking if one exists or if that’s what I’ve used (in the second script in post #37).

As far as I can see, NSString’s and NSData’s writeToFile methods either create files containing just the given material or completely replace the contents of existing files. They don’t have methods for editing files in-place.

The NSFileHandle class seems to be the equivalent of the file system object created by open for access in the StandardAdditions, but with a few differences in the way it’s scripted. The significant differences here are:

  1. Files which don’t already exist have to be explicitly created first. I’ve used NSFileManager for this. (Files which do already exist are effectively emptied if created again with NSFileManager.)
  2. NSFileHandle only writes NSData objects. (According to the documentation, anyway. I haven’t put it to the test.)

Otherwise, as with write in the StandardAdditions, each successive write starts where the previous one ended unless something is done to change the file handle’s file pointer.
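As a rough illustration of the file-pointer behaviour described above, here’s a minimal sketch of an append handler. The handler name appendData:toFile: is made up for the example, and it assumes that the file already exists and that theData is an NSData object. A freshly opened handle starts at the beginning of the file, hence the seek.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper: append an NSData object to the end of an existing file.
on appendData:theData toFile:thePath
	set fileAccess to current application's NSFileHandle's fileHandleForWritingAtPath:thePath
	-- fileHandleForWritingAtPath: returns missing value if the file doesn't exist.
	if fileAccess is missing value then return false
	set theResult to false
	try
		-- Move the file pointer to the end so that the existing contents aren't overwritten.
		tell fileAccess to seekToEndOfFile()
		tell fileAccess to writeData:theData
		set theResult to true
	end try
	tell fileAccess to closeFile()
	return theResult
end appendData:toFile:
```

Any further writeData: calls before closeFile() would carry on from where the previous write ended.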

In Cocoa you can use NSMutableData to append NSData chunks and then the writeToFile… / writeToURL… methods of NSData to write the data directly to disk. The NSFileHandle way is actually not needed in this case.
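For what it’s worth, a minimal sketch of the NSMutableData idea might look like this. It isn’t anyone’s posted script; the handler name and its two path parameters are assumptions made for the example, and the source file is assumed to be ISO Latin 1 as elsewhere in this thread.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical handler: read an ISO Latin 1 file and write it as UTF-16 BE with a BOM.
on convertFile:thePath toPath:newPath
	-- The two BOM bytes (FE FF), obtained from a two-character string encoded as Latin 1.
	set BOMString to current application's NSString's stringWithString:(string id {254, 255})
	set BOMData to BOMString's dataUsingEncoding:(current application's NSISOLatin1StringEncoding)
	-- Start a mutable data object with the BOM.
	set allData to current application's NSMutableData's dataWithData:BOMData
	-- Read the text and give up if the file can't be read.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	if theString is missing value then return false
	-- Append the text, encoded as UTF-16 big-endian, to the BOM.
	allData's appendData:(theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding))
	-- Write everything with a single write.
	return (allData's writeToFile:newPath atomically:true) as boolean
end convertFile:toPath:
```

Everything is assembled in memory here, so this suits files of modest size.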

There are more roads leading to Rome in Cocoa than in AppleScript :wink:

Just an example:
I was asking whether there is a way to write the BOM, then write the text itself.
At present you append the text to the BOM in RAM, then write the entire data in a single instruction.

As you pointed out, this can be done with old-fashioned AppleScript.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Thursday 16 March 2017 10:54:14

Isn’t that what the first script given in message #37 does?

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Thursday 16 March 2017 10:55:49

Basically yes, but it puts the NSData instances together first, rather than writing the data serially to disk via NSFileHandle.

They are not values but masks, and the masks are the same. For instance, the default Unicode mask contains two encodings.

It’s not possible with text files in general so those methods won’t exist. That is where the power of binary files comes in with data blocks but that doesn’t work for NSData or NSString classes. For larger files you could also use streams to reduce the memory footprint.

Appending data to the NSData object could be resource-heavy, which makes a file handle (session write) much more efficient than concatenating the data and writing it to disk as a whole. From an efficiency perspective it’s a far better choice.

No, they’re not the same – one is 0x10000100 and the other is 0x90000100.

The mask is 0x100; the other bit is a CF or NS identification, not part of the encoding mask.

More specifically, NSENCODING_MASK, which has the value 1<<31, is added to NSUTF16BigEndianStringEncoding; note the leading 9. So while their values may not be the same, their masks are the same, just as I said before.
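To make the arithmetic concrete, here’s a small sketch using only the two numbers quoted above. AppleScript has no hexadecimal literals, so the values are built from powers of 16; the variable names are just for the example.

```applescript
-- The two values quoted in the thread, built as decimal numbers.
set withHighBit to 9 * (16 ^ 7) + 256 -- 0x90000100
set withoutHighBit to 1 * (16 ^ 7) + 256 -- 0x10000100

-- The two differ only by the top bit (1<<31)...
set differByTopBit to ((withHighBit - withoutHighBit) = (2 ^ 31))
-- ...and both leave 0x100 (256) once everything above the low 28 bits is stripped.
set sameLowPart to ((withHighBit mod (16 ^ 7)) = (withoutHighBit mod (16 ^ 7)))

return {differByTopBit, sameLowPart}
```

Both tests should come out true, which is consistent with both sides of the argument: the values differ, but only in the top bit.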

Hi Yvan.

The script in post #26 makes an NSMutableData version of the BOM and an NSData version of the text, appends the latter to the former, and writes the result to a new file with a single write.

The first script in post #37 appends the text to a character with the same Unicode value as the BOM, makes an NSData version of the result, and writes that to a new file with a single write.

The second script in post #37 makes NSData versions of both the BOM and the text, creates a new file, opens it for writing, writes the BOM data, then writes the text data, then closes the access.

The post #26 script’s probably the best of the three for the current purpose, since it performs only one write and also works in Mac OS 10.10. The other two are simply explorations of different approaches.
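Whichever of the three is used, the result can be checked by looking at the first two bytes of the new file. This is only a minimal sketch; the handler name is made up for the example.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical check: does the file begin with the UTF-16 BE BOM (bytes FE FF)?
on fileStartsWithUTF16BEBOM:thePath
	set fileData to current application's NSData's dataWithContentsOfFile:thePath
	-- dataWithContentsOfFile: returns missing value if the file can't be read.
	if fileData is missing value then return false
	if (fileData's |length|()) < 2 then return false
	-- Build the expected two bytes in the same way as the scripts in this thread.
	set BOMString to current application's NSString's stringWithString:(string id {254, 255})
	set BOMData to BOMString's dataUsingEncoding:(current application's NSISOLatin1StringEncoding)
	-- Compare them with the first two bytes of the file.
	set firstTwoBytes to fileData's subdataWithRange:{location:0, |length|:2}
	return (firstTwoBytes's isEqualToData:BOMData) as boolean
end fileStartsWithUTF16BEBOM:
```

It could be called with the path returned by decodeFile:, for instance: my fileStartsWithUTF16BEBOM:newPath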

Oops, I didn’t notice that

try
	tell fileAccess to writeData:BOMData
	tell fileAccess to writeData:stringData
	set theResult to true
end try

is ASObjC code, not old-fashioned code.

So now it seems I’ve understood.
If I’ve made no mistake, we may also use this fourth version:

# http://macscripter.net/viewtopic.php?id=28482&p=2
# message #26 alt

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)
	if theExtension's |length|() > 0 then
		set newPath to newPath's stringByAppendingPathExtension:theExtension
	end if
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Get the BOM value as a two-character string. (The single character id (254 * 256 + 255) gets lost in the conversion to NSString.)
	set theUTF16BEBOM to current application's NSString's stringWithString:(string id {254, 255})
	-- Convert it to two bytes of data.
	set BOMData to theUTF16BEBOM's dataUsingEncoding:(current application's NSISOLatin1StringEncoding)
	
	-- Read the contents of the ISO Latin 1 text file.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	-- Convert that to data too, but encoded as UTF-16 big-endian.
	set stringData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	-- Create a new, empty file.
	set theResult to false -- Default result in case the writes below fail.
	set newPath to my modifyPath:thePath adding:"-new"
	
	tell current application's NSFileManager's defaultManager() to createFileAtPath:newPath |contents|:(missing value) attributes:(missing value)
	-- Open it for access with write permission, write the two blocks of data to it, and close it again.
	set fileAccess to current application's NSFileHandle's fileHandleForWritingAtPath:newPath
	try
		tell fileAccess to writeData:BOMData
		tell fileAccess to writeData:stringData
		set theResult to true
	end try
	tell fileAccess to closeFile()
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Thursday 16 March 2017 15:37:24

It’s the 2nd script in message #37 which behaves this way.

I asked about the first one in message #37.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Thursday 16 March 2017 17:10:04

Yes. :slight_smile: Or even:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)'s stringByAppendingPathExtension:theExtension
	return newPath as string
end modifyPath:adding:

on decodeFile:thePath
	-- Get the BOM value as a two-character string. (The single character id (254 * 256 + 255) gets lost in the conversion to NSString.)
	set theUTF16BEBOM to current application's NSString's stringWithString:(string id {254, 255})
	-- Convert it to two bytes of data.
	set BOMData to theUTF16BEBOM's dataUsingEncoding:(current application's NSISOLatin1StringEncoding)
	-- Read the contents of the ISO Latin 1 text file.
	set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
	-- Convert that to data too, but encoded as UTF-16 big-endian.
	set stringData to theString's dataUsingEncoding:(current application's NSUTF16BigEndianStringEncoding)
	
	-- Write the BOM data to a new file using the NSData method. If that works, open the file for access with write permission, move the file pointer to the end of it, write the text data, and close the file again.	
	set theResult to false
	set newPath to my modifyPath:thePath adding:"-new"
	if ((BOMData's writeToFile:newPath atomically:true) as boolean) then
		set fileAccess to current application's NSFileHandle's fileHandleForWritingAtPath:newPath
		try
			tell fileAccess to seekToEndOfFile()
			tell fileAccess to writeData:stringData
			set theResult to true
		end try
		tell fileAccess to closeFile()
	end if
	
	return {newPath, theResult as boolean}
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)

Thanks Nigel

It seems I now have all the information needed to drop old-fashioned code for writing data :slight_smile:

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Thursday 16 March 2017 18:15:17

By the way, this version of the modifyPath handler uses fewer methods and works properly for paths both with and without extensions:

on modifyPath:thePath adding:addString
	set pathString to current application's class "NSString"'s stringWithString:(thePath)
	set insertionPoint to pathString's rangeOfString:("(?=(\\.[^./]*+)?/*+$)") options:(current application's NSRegularExpressionSearch)
	set newPath to pathString's stringByReplacingCharactersInRange:(insertionPoint) withString:(addString)
	-- set newPath to newPath's stringByReplacingOccurrencesOfString:("/*+$") withString:("") options:(current application's NSRegularExpressionSearch) range:({0, newPath's |length|()})
	return newPath as text
end modifyPath:adding:

Edit: Regex modified to allow for the possibility of trailing slashes. (Thanks, Yvan.) The commented-out line will remove any trailing slashes if it’s re-enabled.

It appears (thank you, Nigel) that El Capitan doesn’t handle the case where theExtension is an empty string when it’s asked to append it to a path, so the original handler must be edited as follows:

on modifyPath:thePath adding:addString
	set pathString to current application's NSString's stringWithString:thePath
	set theExtension to pathString's pathExtension()
	set thePathNoExt to pathString's stringByDeletingPathExtension()
	set newPath to (thePathNoExt's stringByAppendingString:addString)
	if theExtension's |length|() > 0 then
		set newPath to newPath's stringByAppendingPathExtension:theExtension
	end if
	return newPath as string
end modifyPath:adding:

To be safe, I will edit my messages containing the old version.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Sunday 19 March 2017 16:39:39