Friday, October 22, 2021

#1 2021-10-06 08:34:57 am

Joy
Member
From: South Tirol
Registered: 2008-07-04
Posts: 673

Read file - respecting encoding

The Standard Additions read command is not ideal, because you never know which encoding to use when reading plain "*.txt" files:

- Chinese text and smileys need "read as Unicode text"
- Western text documents get garbled if you use "read as Unicode text"
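The garbling in the second case can be reproduced outside AppleScript. This is only an illustrative sketch: it uses iconv rather than the read command, and the little-endian byte order is an assumption made for the demo. Two single-byte characters get fused into one 16-bit code unit when the bytes are decoded as UTF-16.

```shell
#!/bin/sh
# The ASCII bytes 0x68 0x69 ("hi") decoded as if they were one UTF-16LE
# code unit: the result is a single unrelated CJK character.
misread=$(printf 'hi' | iconv -f UTF-16LE -t UTF-8)

# The same text actually stored as UTF-16LE, decoded with the matching encoding.
correct=$(printf 'h\000i\000' | iconv -f UTF-16LE -t UTF-8)

printf 'misread: %s\ncorrect: %s\n' "$misread" "$correct"
```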

You can use

Applescript:

do shell script "file --brief " & quoted form of POSIX path of sel

But that analysis isn't sufficient to determine which encoding a txt file uses. If I knew the encoding, I could simply pass it to iconv -f.

I want a reliable way to convert txt documents into UTF-8 encoded files, so I tried:

Applescript:

do shell script "iconv -t ASCII//TRANSLIT " & quoted form of POSIX path of sel

But it throws an error.
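One possible way around the missing -f value is to test the file instead of guessing. This is a rough sketch, not a complete detector. It checks for a UTF-16 byte order mark, then tries a strict UTF-8 decode, and only then falls back to a legacy encoding. The function name to_utf8 and the WINDOWS-1252 fallback are assumptions made for illustration; the right fallback depends on where the files come from. UTF-16 files without a byte order mark are not recognized.

```shell
#!/bin/sh
# Sketch: write a text file to stdout as UTF-8 when its encoding is unknown.
# to_utf8 and the WINDOWS-1252 default are illustrative assumptions.
# Usage: to_utf8 notes.txt > notes-utf8.txt
to_utf8() {
    src=$1
    fallback=${2:-WINDOWS-1252}   # assumed legacy encoding, adjust as needed

    # UTF-16 files usually start with a byte order mark (FF FE or FE FF).
    bom=$(head -c 2 "$src" | od -An -tx1 | tr -d ' \n')
    case $bom in
        fffe|feff)
            iconv -f UTF-16 -t UTF-8 "$src"
            return
            ;;
    esac

    # Strict UTF-8 decode: iconv exits non-zero on an invalid byte sequence.
    if iconv -f UTF-8 -t UTF-8 "$src" >/dev/null 2>&1; then
        cat "$src"                           # already valid UTF-8 (or ASCII)
    else
        iconv -f "$fallback" -t UTF-8 "$src" # treat as the assumed legacy encoding
    fi
}
```

The fallback step cannot fail for a single-byte encoding in any useful way, so a wrong guess there produces wrong characters rather than an error.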


#2 2021-10-06 04:46:53 pm

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 6708

Re: Read file - respecting encoding

Joy wrote:

I want a reliable way to convert txt documents into utf8 encoded files



Standard Additions' "read as Unicode text" reads the file as UTF-16. For UTF-8, use "read as «class utf8»".

To answer your question directly:

Applescript:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

-- Convert the .txt file at posixPath to UTF-8, saved next to the original as "<name>-utf8.txt"
on convertTextFileAt:posixPath
   set pathNSString to current application's NSString's stringWithString:posixPath
   set theExt to pathNSString's pathExtension()
   -- Only files with a "txt" extension are handled
   if theExt as text is "txt" then
       -- Read the raw bytes and let Foundation detect the encoding, with lossy conversion disallowed
       set theNSData to current application's NSData's dataWithContentsOfFile:posixPath
       set theOptionsDict to current application's NSDictionary's dictionaryWithObject:false ¬
           forKey:(current application's NSStringEncodingDetectionAllowLossyKey)
       set {theEncoding, theNSString} to current application's NSString's stringEncodingForData:theNSData ¬
           encodingOptions:theOptionsDict convertedString:(reference) usedLossyConversion:(missing value)
       -- An encoding of 0 means detection failed
       if theEncoding = 0 then
           error "Unknown encoding"
       end if
       -- Write the decoded string out as UTF-8
       set theNewPath to pathNSString's stringByDeletingPathExtension()'s stringByAppendingString:"-utf8.txt"
       theNSString's writeToFile:theNewPath atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
   end if
end convertTextFileAt:


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com
