Thursday, March 30, 2017

#1 2009-02-26 03:55:26 pm

Lazybaer
Member
Registered: 2006-03-30
Posts: 43

Change file encoding of TextEdit by Applescript

Hi all

I'd like to change the TextEdit encodings in the TextEdit preferences with a script. E.g. from 'Automatic' to 'UTF-8'. What I have is as follows:

tell application "TextEdit" to activate
tell application "System Events"
    tell process "TextEdit"
        keystroke "," using {command down}
        key code 124
        delay 2
        repeat 6 times
            keystroke tab
        end repeat
        key code 125
        delay 2

        -- ???
        -- ???
       
    end tell
end tell

The script opens the window where I can choose the different encodings. But I cannot specify and select the different lines. Any tip or hint?

Thanks
Lazy


Filed under: System, TextEdit


#2 2009-02-26 04:11:57 pm

StefanK
Member
From: St. Gallen, Switzerland
Registered: 2006-10-21
Posts: 11430
Website

Re: Change file encoding of TextEdit by Applescript

Hi,

Here is a workaround:

Applescript:


tell application "TextEdit"
   set theCache to text of document 1
   close document 1 saving no
   quit
end tell
do shell script "defaults write com.apple.TextEdit PlainTextEncodingForWrite -int 4"
do shell script "defaults write com.apple.TextEdit PlainTextEncoding -int 4"
tell application "TextEdit"
   activate
   set text of document 1 to theCache
end tell

You can retrieve the enumeration values for the encodings by changing the setting manually, quitting TextEdit.app and looking in the preference file.
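A minimal shell sketch of that lookup, for Terminal or do shell script. The defaults domain and the two keys come from the script above. The numeric table is an assumption that TextEdit stores Foundation's NSStringEncoding constants (which is consistent with 4 for UTF-8), and encoding_code is a hypothetical helper name. The value for 'Automatic' is deliberately left out; read it back from the preferences as described.

```shell
#!/bin/sh
# encoding_code: hypothetical helper mapping a label to an NSStringEncoding number.
# Assumption: TextEdit's PlainTextEncoding* keys hold these Foundation constants.
encoding_code() {
    case "$1" in
        UTF-8)     echo 4 ;;   # NSUTF8StringEncoding
        UTF-16)    echo 10 ;;  # NSUnicodeStringEncoding
        WinLatin1) echo 12 ;;  # NSWindowsCP1252StringEncoding
        MacRoman)  echo 30 ;;  # NSMacOSRomanStringEncoding
        *)         return 1 ;; # unknown label
    esac
}

# Read back what TextEdit currently has stored (macOS only; skipped elsewhere).
if command -v defaults >/dev/null 2>&1; then
    for key in PlainTextEncoding PlainTextEncodingForWrite; do
        printf '%s = ' "$key"
        defaults read com.apple.TextEdit "$key" 2>/dev/null || echo "(not set)"
    done
fi

# Print the command that would switch the save encoding to UTF-16.
echo "defaults write com.apple.TextEdit PlainTextEncodingForWrite -int $(encoding_code UTF-16)"
```

The last line only prints the command, so nothing is changed until it is run by hand.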


regards

Stefan

Filed under: TextEdit


#3 2009-02-26 04:29:01 pm

Lazybaer
Member
Registered: 2006-03-30
Posts: 43

Re: Change file encoding of TextEdit by Applescript

Hi Stefan

Thanks for the quick reply.

-int 4 is for UTF-8. What's the code for 'Automatic', 'UTF-16', Western Mac OS and Western Windows?

Thanks in advance

Lazy


#4 2009-02-26 04:31:57 pm

StefanK
Member
From: St. Gallen, Switzerland
Registered: 2006-10-21
Posts: 11430
Website

Re: Change file encoding of TextEdit by Applescript

Lazybaer wrote:

-int 4 is for UTF-8. What's the code for 'Automatic', 'UTF-16', Western Mac OS and Western Windows?

I wrote:

You can retrieve the enumeration values for the encodings by changing the setting manually, quitting TextEdit.app and looking in the preference file.


regards

Stefan


#5 2012-12-06 03:04:24 am

ToBeJazz
Member
Registered: 2010-05-16
Posts: 132

Re: Change file encoding of TextEdit by Applescript

I would like to have an AppleScript that converts the encoding of a text file from UTF-8 to UTF-16, anyone?
I'm using FileMaker and for some reason it wants UTF-16: my Scandinavian letters ÅÄÖ don't show up correctly if I import a UTF-8 file into FileMaker.


#6 2012-12-06 04:00:15 am

McUsrII
Member
Registered: 2012-11-20
Posts: 3046
Website

Re: Change file encoding of TextEdit by Applescript

Hello!

Have a look at iconv in Terminal, which you can call with do shell script from AppleScript.

Try man iconv to read about it; iconv --list lists the available encodings. In your case I think the incantation should look something like this:

Applescript:

do shell script "iconv -f UTF-8 -t UTF-16 /Users/you/your/path/To/file/you/want/to/encode >/path/to/Encoded/File/you/Want/to/end/up/with"
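One pitfall with the redirect: if the output path is the same as the input path, the shell empties the file before iconv gets to read it, so the converted text has to go to a different file first. A minimal sketch of one way around that; convert_in_place is a hypothetical helper name and the temporary-file approach is only one option:

```shell
#!/bin/sh
# convert_in_place: hypothetical helper that converts a UTF-8 file to UTF-16
# by writing to a temporary file next to it and then replacing the original.
convert_in_place() {
    src=$1
    tmp=$(mktemp "${src}.XXXXXX") || return 1
    if iconv -f UTF-8 -t UTF-16 "$src" > "$tmp"; then
        mv "$tmp" "$src"       # replace the original only after a successful run
    else
        rm -f "$tmp"           # leave the original untouched on failure
        return 1
    fi
}
```

It would be called as convert_in_place "/path/to/file.txt", from Terminal or inside a do shell script string.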

And please tell the FileMaker guys about your problem! :)

Last edited by McUsrII (2012-12-06 04:04:17 am)


Filed under: encoding, iconv


#7 2012-12-06 04:48:33 am

DJ Bazzie Wazzie
Member
From: the Netherlands
Registered: 2004-10-20
Posts: 2594

Re: Change file encoding of TextEdit by Applescript

McUsrII wrote:

Applescript:

do shell script "iconv -f UTF-8 -t UTF-16 /Users/you/your/path/To/file/you/want/to/encode >/path/to/Encoded/File/you/Want/to/end/up/with"

To extend McUsr's info (I finished a 24-page document about character encoding last week):

UTF-16 is little endian and iconv adds a BOM
UTF-16LE is little endian and iconv won't add a BOM
UTF-16BE is big endian and iconv won't add a BOM
There is no support for UTF-16 big endian with BOM in iconv.

If you need big-endian UTF-16 with a BOM, create a file containing bytes 254 and 255 (0xFE 0xFF) and then let iconv append the converted data to that file.

Applescript:

do shell script "xxd -p -r <<< xfeff > UTF-16FileWithBOM.txt"
do shell script "iconv -t UTF-16BE UTF-8File.txt >> UTF-16FileWithBOM.txt"

I don't use the -f option because the shell is already UTF-8 and iconv recognizes it.
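To check what actually ended up in the file, the first bytes can be dumped with od. A small sketch for Terminal, assuming throwaway file names in a temporary directory; the BOM is written here with printf octal escapes instead of xxd purely for illustration:

```shell
#!/bin/sh
# Build a big-endian UTF-16 file with a BOM, then inspect its bytes.
tmp=$(mktemp -d)
printf 'A\n' > "$tmp/utf8.txt"                  # sample UTF-8 input
printf '\376\377' > "$tmp/utf16be_bom.txt"      # BOM bytes 0xFE 0xFF in octal
iconv -f UTF-8 -t UTF-16BE "$tmp/utf8.txt" >> "$tmp/utf16be_bom.txt"

# Dump the result as one hex string, without offsets or spaces.
hex=$(od -An -tx1 "$tmp/utf16be_bom.txt" | tr -d ' \n')
echo "$hex"    # feff0041000a: BOM, then A, then the line feed
```

If the first four hex digits are fffe instead, the file starts with the little-endian BOM.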

Most applications, including the Cocoa text system, rely on the iconv libraries, so the results of iconv are identical to those of the Cocoa text system and therefore of TextEdit. To get different results you need an application with its own character encoding libraries. Character encoding support is much better in Word than in the Cocoa text system. Unfortunately that doesn't hold for all Microsoft Office applications: Excel has very poor Unicode support, even for encodings that appear in its list of supported character encodings. The latest version(s) of Word write only UTF-16 with BOM when a file is saved, but can open all four types of UTF-16 files.

Edit: for the curious ones

Unicode wrote:

Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.

So now you know why TextEdit and iconv don't support UTF-16BE with BOM.

Last edited by DJ Bazzie Wazzie (2012-12-06 07:17:48 am)


#8 2012-12-06 05:01:31 am

ToBeJazz
Member
Registered: 2010-05-16
Posts: 132

Re: Change file encoding of TextEdit by Applescript

Thanks for the replies!

My AppleScript skills are very limited so I don't quite know how to make the script.
The script you sent me, McUsrII, doesn't quite work: it doesn't change the encoding, it only removes the content of the txt file.
I'm trying your script as well, DJ Bazzie Wazzie, but can't make it work...


I don't know if FileMaker needs UTF-16 with big or little endian - maybe it doesn't matter?


#9 2012-12-06 05:09:21 am

DJ Bazzie Wazzie
Member
From: the Netherlands
Registered: 2004-10-20
Posts: 2594

Re: Change file encoding of TextEdit by Applescript

ToBeJazz wrote:

I don't know if FileMaker needs UTF-16 with big or little endian - maybe it doesn't matter?

Endianness, for UTF-16, describes the order in which the bytes of a multi-byte number are stored. As humans we combine digits as well: after the number 9 we write the next number down as 10, most significant digit first. This notation corresponds to big endian, while for certain CPUs it can be faster to store numbers the other way around, so that ten would be written as 01. So for bytes, the number 261 is stored as 0x0105 in big-endian notation and as 0x0501 in little-endian notation. The easiest way to remember the difference is that big endian is similar to human notation.

Most of the time, when the endianness is wrong you will see all sorts of East Asian (CJK) symbols when opening the file.

For example, Unicode character 100 (the letter d) is stored as 0x0064 or 0x6400. Read with the wrong endianness, character 100 is interpreted as character 25600 (U+6400), which is 搀.
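The arithmetic can be checked in the shell. A sketch using only printf, iconv and od; the variable names are illustrative:

```shell
#!/bin/sh
# Split 261 into its two bytes.
n=261
hi=$((n / 256))    # most significant byte: 1
lo=$((n % 256))    # least significant byte: 5
printf 'big endian:    %02x %02x\n' "$hi" "$lo"    # 01 05
printf 'little endian: %02x %02x\n' "$lo" "$hi"    # 05 01

# Character 100 (d) is 00 64 in UTF-16BE. Reading those two bytes as
# little endian gives 0x6400, which is 25600 in decimal.
swapped=$((100 * 256))
printf 'swapped: %d = U+%04X\n' "$swapped" "$swapped"

# The same mistake made with iconv: encode as big endian, decode as little endian.
wrong=$(printf 'd' | iconv -f UTF-8 -t UTF-16BE | iconv -f UTF-16LE -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "UTF-8 bytes of the misread character: $wrong"    # e69080, the UTF-8 form of U+6400
```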

Last edited by DJ Bazzie Wazzie (2012-12-06 05:16:36 am)


#10 2012-12-06 05:13:32 am

ToBeJazz
Member
Registered: 2010-05-16
Posts: 132

Re: Change file encoding of TextEdit by Applescript

DJ Bazzie Wazzie,

would it be possible for you to write a script that converts a .txt file from UTF-8 into UTF-16 (big or/and little endian)?

I would just try to import a txt file with a UTF-16, big or little endian, and see if it works in FileMaker...


#11 2012-12-06 05:30:38 am

McUsrII
Member
Registered: 2012-11-20
Posts: 3046
Website

Re: Change file encoding of TextEdit by Applescript

:)

DJ Bazzie Wazzie wrote:

UTF-16 is little endian and iconv adds a BOM

I am surprised that bare UTF-16 is little endian. :) And I'd love to read your paper, if you care to share.

@ToBeJazz:

Try DJ Bazzie Wazzie's shell commands in post 7, then import the result into FileMaker manually and see if it gives the results you need. If you get Japanese, Chinese or Korean as a result, then you have the wrong endianness! :)

Last edited by McUsrII (2012-12-06 05:37:51 am)


Filed under: utf-16


#12 2012-12-06 06:35:17 am

DJ Bazzie Wazzie
Member
From: the Netherlands
Registered: 2004-10-20
Posts: 2594

Re: Change file encoding of TextEdit by Applescript

Applescript:

set theFile to POSIX path of (choose file)
set encoding to choose from list {"UTF-16 Big Endian", "UTF-16 Big Endian + Bom", "UTF-16 Little Endian", "UTF-16 Little Endian + Bom"}

if encoding is false then
   return -- nothing selected or Cancel pressed
else
   set encoding to encoding as string
end if

set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

if encoding contains "Big Endian" then
   set enc to "UTF-16BE"
   set cmd to ""
   if encoding contains "+ Bom" then set cmd to "xxd -p -r <<< xfeff "
   do shell script cmd & " > " & quoted form of newFileName
else
   set enc to "UTF-16LE"
   set cmd to ""
   if encoding contains "+ Bom" then set cmd to "xxd -p -r <<< xfffe"
   do shell script cmd & "> " & quoted form of newFileName
end if

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName


#13 2012-12-06 06:50:30 am

ToBeJazz
Member
Registered: 2010-05-16
Posts: 132

Re: Change file encoding of TextEdit by Applescript

Hey, that's a great script DJ Bazzie Wazzie - thanks a lot!

When I choose "UTF-16 Big Endian + Bom" it does just what I'm looking for.
Let's see if I can make a script that has no encoding options and no choose file option as well - I want a specific script without any options to be run from FileMaker so that I can import a text file that was originally in UTF-8.


#14 2012-12-06 07:22:21 am

DJ Bazzie Wazzie
Member
From: the Netherlands
Registered: 2004-10-20
Posts: 2594

Re: Change file encoding of TextEdit by Applescript

ToBeJazz wrote:

Hey, that's a great script DJ Bazzie Wazzie - thanks a lot!

You're welcome!

When I choose "UTF-16 Big Endian + Bom" it does just what I'm looking for.

Great! And sad at the same time. It's the only encoding that iconv and cocoa text system can't write to a file.


Let's see if I can make a script that has no encoding options and no choose file option as well - I want a specific script without any options to be run from FileMaker so that I can import a text file that was originally in UTF-8.

I've changed my first post (cleaned up the mess I'd made) and that example code should help you. I also noticed that echo -e -n doesn't work as well in do shell script as it does in Terminal. I haven't figured out what exactly goes wrong there, but xxd does its job very well, in Terminal as well as in do shell script.


#15 2012-12-06 07:31:34 am

ToBeJazz
Member
Registered: 2010-05-16
Posts: 132

Re: Change file encoding of TextEdit by Applescript

Sorry I can't use your short example script, don't know how to change it...
I did begin to shorten your longer script though:

Applescript:

set theFile to POSIX path of (choose file)
set encoding to "UTF-16 Big Endian + Bom" as string
set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

set enc to "UTF-16BE"
set cmd to "xxd -p -r <<< xfeff "
do shell script cmd & " > " & quoted form of newFileName

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName

I don't really know what you mean by "iconv and cocoa text system can't write to a file." I have no problem using the above script; it does what it is supposed to do.


#16 2012-12-06 07:39:35 am

DJ Bazzie Wazzie
Member
From: the Netherlands
Registered: 2004-10-20
Posts: 2594

Re: Change file encoding of TextEdit by Applescript

ToBeJazz wrote:

I don't really know what you mean by "iconv and cocoa text system can't write to a file." I have no problem using the above script; it does what it is supposed to do.

Sorry for the confusion here... The script works, but I'm giving iconv a head start because it can't write the BOM on its own. With the Cocoa text system, including TextEdit, it's impossible to save the file properly in this encoding.

You mean something like this?

Applescript:

set theFile to POSIX path of (choose file)
set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

do shell script "xxd -p -r <<< xfeff > " & quoted form of newFileName
do shell script "iconv -f UTF-8 -t UTF-16BE " & space & quoted form of theFile & " >> " & quoted form of newFileName

Last edited by DJ Bazzie Wazzie (2012-12-06 07:40:41 am)


#17 2012-12-06 07:58:08 am

ToBeJazz
Member
Registered: 2010-05-16
Posts: 132

Re: Change file encoding of TextEdit by Applescript

Yes, that's short and nice I think.
Next for me is to get rid of the choose file thing and point directly to a file, but at least that I should be able to do myself:)

Last edited by ToBeJazz (2012-12-06 07:59:08 am)


#18 2012-12-06 08:46:44 am

McUsrII
Member
Registered: 2012-11-20
Posts: 3046
Website

Re: Change file encoding of TextEdit by Applescript

IMHO the people behind FileMaker should receive a copy of this thread.

The fine print regarding do shell script seems to be lacking too. It is obviously interpreting things, and it would have been nice if they had specified exactly how input to and output from the do shell script command are treated/translated, because there isn't much we can do about it. I mean, stty settings don't work when you don't have a terminal.


#19 2017-03-14 07:23:58 am

BullyBu
Member
Registered: 2017-03-14
Posts: 1

Re: Change file encoding of TextEdit by Applescript

I read this topic looking for a solution, but it doesn't work for me: if the source file is UTF-8 without a BOM, the encoded file comes out with errors.
In my case I need to import a .csv file into Excel, but some characters are imported as ISO-8859-1, so the solution is to encode the file as UTF-16LE with BOM.

I tried adding a BOM to the UTF-8 file first and then encoding it to UTF-16 with BOM, and it works, but that takes two steps and two encoded files and I don't enjoy it.
Then I found a solution that works for me, so I'd like to share my experience:

In Terminal I found a similar command called "uconv", but it isn't available directly in do shell script (command not found error), so I have to give its full path:

Applescript:

on run {input, parameters}
   
   set theFile to POSIX path of input --source file
   set endFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_b.csv" --temp file
   
   do shell script "/opt/local/bin/uconv -s -f UTF-8 -t UTF-16LE --add-signature < " & quoted form of theFile & " > " & quoted form of endFileName --uconv silent from utf-8 to utf-16 little endian with bom from source file to temp file
   do shell script "mv " & quoted form of endFileName & space & quoted form of theFile --replace source file by temp file

   return input
end run

This code takes the file from input, encodes it from UTF-8 to UTF-16 Little Endian with BOM (that's what --add-signature is for) and replaces the source file with the new one.
Use man uconv to read about it; uconv --list lists the different encodings.


#20 2017-03-14 07:48:09 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-19
Posts: 4226

Re: Change file encoding of TextEdit by Applescript

Hi BullyBu. Welcome to MacScripter and thanks for posting your own solution to this topic.

There's no "/opt" folder on my machine. Were it and its contents installed on yours by some third-party software?


NG


#21 2017-03-14 09:56:33 am

Yvan Koenig
Member
Registered: 2006-09-14
Posts: 2885

Re: Change file encoding of TextEdit by Applescript

I didn't find uconv on my machine.
The job may be done with iconv.

Applescript:

set theFile to POSIX path of (choose file)

set newFileName to (do shell script "str=" & quoted form of theFile & ";echo ${str%.*}") & "_iconv.txt"

set enc to "UTF-16BE"
set cmd to "xxd -p -r <<< xfeff "
do shell script cmd & " > " & quoted form of newFileName # write the BOM : FE FF in the new file

do shell script "iconv -f UTF-8 -t " & enc & space & quoted form of theFile & " >> " & quoted form of newFileName # write the UTF16-BE encoded text after the BOM

I tried to play with ASObjC but I'm puzzled.

In Xcode Help I read :

NSUTF16BigEndianStringEncoding
NSUTF16StringEncoding encoding with explicit endianness specified.

My understanding was that using this encoding I would get a file with the Big Endian BOM at the beginning.
Alas I was wrong.

The code used (most of which was borrowed from Shane STANLEY) is:

Applescript:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
   set pathString to current application's NSString's stringWithString:thePath
   set theExtension to pathString's pathExtension()
   set thePathNoExt to pathString's stringByDeletingPathExtension()
   set newPath to (thePathNoExt's stringByAppendingString:addString)
   if theExtension's |length|() > 0 then
       set newPath to newPath's stringByAppendingPathExtension:theExtension
   end if
   return newPath as string
end modifyPath:adding:

on decodeFile:thePath
   set theString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
   set newPath to my modifyPath:thePath adding:"-new"
   set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
   return theResult as boolean
end decodeFile:

set theSource to (choose file)
my decodeFile:(POSIX path of theSource)

Is there something wrong in it, or am I misunderstanding what applying NSUTF16BigEndianStringEncoding is supposed to do?


Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mardi 14 mars 2017 15:56:27

Last edited by Yvan Koenig (2017-03-19 11:02:44 am)


#22 2017-03-14 10:40:12 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-19
Posts: 4226

Re: Change file encoding of TextEdit by Applescript

Yvan Koenig wrote:

In Xcode Help I read :

NSUTF16BigEndianStringEncoding
NSUTF16StringEncoding encoding with explicit endianness specified.

My understanding was that using this encoding I would get a file with the Big Endian BOM at the beginning.
Alas I was wrong.

[…]

Is there something wrong in it, or am I misunderstanding what applying NSUTF16BigEndianStringEncoding is supposed to do?

Hi Yvan.

I think "with explicit endianness specified" is just an explanation that the enum NSUTF16BigEndianStringEncoding is used to specify explicitly that the text is to be saved with UTF-16 big-endian encoding, not with the endianness native to the machine. It's an explicit instruction to writeToFile rather than an instruction to include an explicit BOM in the file. Maybe Shane will confirm this when he gets up.

According to the Xcode documentation, this enum was only introduced with macOS 10.12, but it works on my 10.11 system.


NG


#23 2017-03-14 10:55:46 am

DJ Bazzie Wazzie
Member
From: the Netherlands
Registered: 2004-10-20
Posts: 2594

Re: Change file encoding of TextEdit by Applescript

BullyBu wrote:

I tried to add BOM into UTF-8 first and then encode it to UTF-16 with BOM, and it works, but there are two steps, two encoded files and I don't enjoy it.

The iconv command-line utility works with stdin and stdout, meaning you can pipe data directly from one encoding to another without creating additional temporary files.
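Because iconv reads stdin and writes stdout, the BOM and the converted text can go through a single pipeline. A sketch for UTF-16LE with BOM; to_utf16le_bom is a hypothetical helper name and the BOM bytes are written with printf octal escapes:

```shell
#!/bin/sh
# to_utf16le_bom: hypothetical helper. Reads UTF-8 on stdin and writes
# UTF-16LE on stdout, preceded by the little-endian BOM FF FE.
to_utf16le_bom() {
    printf '\377\376'             # BOM first
    iconv -f UTF-8 -t UTF-16LE    # converted text follows on the same stdout
}

# Quick check with the letter Å (UTF-8 bytes C3 85):
# the dump shows ff fe c5 00.
printf '\303\205' | to_utf16le_bom | od -An -tx1
```

For a real file it would be used as to_utf16le_bom < source.csv > converted.csv, with the two paths being different files.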

BullyBu wrote:

Then I found a solution that works for me, so I'd like to share my experience:

Great to see other solutions, even with third-party command-line utilities. I'm still curious what went wrong with the code in post #12 above.


#24 2017-03-14 11:52:29 am

Yvan Koenig
Member
Registered: 2006-09-14
Posts: 2885

Re: Change file encoding of TextEdit by Applescript

Thanks Nigel.

I tried to use an awful scheme to insert the BOM.

Applescript:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on modifyPath:thePath adding:addString
   set pathString to current application's NSString's stringWithString:thePath
   set theExtension to pathString's pathExtension()
   set thePathNoExt to pathString's stringByDeletingPathExtension()
   set newPath to (thePathNoExt's stringByAppendingString:addString)
   if theExtension's |length|() > 0 then
       set newPath to newPath's stringByAppendingPathExtension:theExtension
   end if
   return newPath as string
end modifyPath:adding:

on decodeFile:thePath
   set theString to (current application's NSString's stringWithString:" ")
   set moreString to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSISOLatin1StringEncoding) |error|:(missing value)
   set theString to theString's stringByAppendingString:moreString
   set newPath to my modifyPath:thePath adding:"-new"
   set theResult to theString's writeToFile:newPath atomically:true encoding:(current application's NSUTF16BigEndianStringEncoding) |error|:(missing value)
   return {newPath, theResult as boolean}
   
end decodeFile:

set theSource to (choose file)
set {newPath, bof} to my decodeFile:(POSIX path of theSource)
set newPath to newPath as «class furl»
set openFile to open for access newPath with write permission
write «data rdatFEFF» to openFile starting at 0
close access openFile

TextWrangler and BBEdit open the resulting file flawlessly but, alas, TextEdit crashes.
If I open it with TextWrangler and then save it under another name, the newly saved file opens flawlessly in TextEdit.
Puzzling, isn't it?


Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) mardi 14 mars 2017 17:51:41

Last edited by Yvan Koenig (2017-03-19 11:03:02 am)


#25 2017-03-14 05:33:47 pm

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 4967

Re: Change file encoding of TextEdit by Applescript

Nigel Garvey wrote:

I think "with explicit endianness specified" is just an explanation that the enum NSUTF16BigEndianStringEncoding is used to specify explicitly that the text is to be saved with UTF-16 big-endian encoding, not with the endianness native to the machine. It's an explicit instruction to writeToFile rather than an instruction to include an explicit BOM in the file. Maybe Shane will confirm this when he gets up.

That's right.

Actually, I just noticed this in Wikipedia, FWIW:

The standard also allows the byte order to be stated explicitly by specifying UTF-16BE or UTF-16LE as the encoding type. When the byte order is specified explicitly this way, a BOM is specifically not supposed to be prepended to the text, and a U+FEFF at the beginning should be handled as a zero-width non-breaking space character.
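This difference can be observed with iconv: decoded as plain UTF-16, a leading FE FF is consumed as a BOM, while decoded as UTF-16BE it survives as the character U+FEFF. A sketch, assuming the behavior of common iconv implementations; hex_of is a hypothetical helper name:

```shell
#!/bin/sh
# hex_of: hypothetical helper that prints stdin as one plain hex string.
hex_of() { od -An -tx1 | tr -d ' \n'; }

# Input bytes FE FF 00 41: a BOM followed by the letter A in big-endian UTF-16.
as_utf16=$(printf '\376\377\000\101' | iconv -f UTF-16 -t UTF-8 | hex_of)
as_utf16be=$(printf '\376\377\000\101' | iconv -f UTF-16BE -t UTF-8 | hex_of)

echo "decoded as UTF-16:   $as_utf16"      # 41: the BOM was consumed
echo "decoded as UTF-16BE: $as_utf16be"    # efbbbf41: U+FEFF kept as a character
```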

Nigel Garvey wrote:

According to the Xcode documentation, this enum was only introduced with macOS 10.12, but it works on my 10.11 system.

The documentation is wrong (it says the same thing about NSASCIIStringEncoding). I believe it was introduced in 10.4.

Last edited by Shane Stanley (2017-03-14 05:48:34 pm)


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
