Hello.
I could have done this much more easily with file (BSD) and Satimage.osax, but I guess (though I'm not sure) that this method will be the fastest in the long run, or at least after the first run, where the file is converted to UTF-8.
Most of the things here are stolen from this very good thread. I have copied StefanK's approach for detecting whether a file is UTF-8 encoded or not word for word. Bruce Phillips provided the mktemp incantation, and the rest of the people contributed to the general enlightenment on the subject.
This is the list reader/writer (preserver), which will swap lists in and out to disk for AS choose from list dialogs.
The list maintainer is yet to come, and will be a separate script object, since I will not force everyone to strictly read and write UTF-8.
What is there to say about the read routine apart from that it is big and ugly? It works!
I chose to nest a primitive handler inside it, to have it all in one place; it is of no use for any other purpose whatsoever.
Since I added the writeUtf8FromList handler, I have changed readUtf8IntoList to take an HFS path as text (hfsTargetPathAsText) as the format for the file name parameter.
Regarding error handling:
Since AppleScript stores its properties when the run handler returns, and this handler will work in a context that depends heavily on properties, no error number -128 or similar is raised in here.
Whether the script is to save any properties or not should be taken care of by return statements from the run handler, implicit or explicit, with or without a value, to induce a save of the properties of the script.
So the handlers return null upon a fatal error and false when they did not succeed; otherwise readUtf8IntoList returns a list, which can be empty if that is permitted. (writeUtf8FromList returns either the list or null.)
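The three-way return convention described above can be sketched like this. This is only a minimal illustration: fakeRead is a hypothetical stand-in for readUtf8IntoList so that the snippet runs on its own, and describeResult is likewise a made-up name, not part of the script below.

```applescript
-- Hypothetical stand-in for utf8List's readUtf8IntoList,
-- used only so that this sketch can run without the real script object.
on fakeRead(outcome)
	if outcome is "fatal" then return null -- serious problem
	if outcome is "failed" then return false -- mild failure, e.g. an empty file that was not accepted
	return {"first line", "second line"} -- normal case: a list
end fakeRead

-- The caller checks for null first, then false, and treats anything else as the list.
on describeResult(theRes)
	if theRes is null then
		return "fatal error: stop here"
	else if theRes is false then
		return "no success: ask for another file"
	else
		return "got a list with " & (count theRes) & " items"
	end if
end describeResult

describeResult(fakeRead("ok"))
```

With the real handlers you would probably return from the run handler in the null branch, so that the script's properties are saved (or not) the way you intend.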
Edit: Scratching my head and mourning over the fact that I had to write another handler for Tiger, since it really uses MacRoman internally, I got the idea of leveraging Satimage.osax a little after all.
This saves an extra conversion of the file to MacRoman before reading it into the list, which would be tiresome and would slow things down even more; the point is to keep the file as UTF-8 for maximum compatibility.
So if you want to use this and run Tiger, you have to download Satimage.osax in a Smile bundle from this page.
Or, if you are never going to use it with Tiger, just rip out the two blocks marked Satimage.osax; you can even strip out the majorNumber parameter with its accompanying code.
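For those who keep the majorNumber parameter, here is one possible way to obtain it. This is a sketch and not part of the original script; it assumes the system info command from Standard Additions is available and that the version string has the form 10.x.y, so it is only meaningful on Mac OS X 10.x systems.

```applescript
-- Assumption: "system version" looks like "10.4.11" or "10.6.8".
set sysVersion to system version of (system info)
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "."
-- The second component is what the handlers call majorNumber (Tiger 4, Leopard 5, and so on).
set majorNumber to (text item 2 of sysVersion) as integer
set AppleScript's text item delimiters to astid
majorNumber
```

The resulting value can then be passed as the last argument to readUtf8IntoList and writeUtf8FromList.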
I will soon provide some basic localization of the buttons.
Thanks for all help!
script utf8List
-- The idea and implementation and any faults are totally mine. © McUsr 2010 and put in the Public Domain.
-- The usual guarantees about nothing whatsoever apply; use it at your own risk.
-- Read the documentation.
-- You are not allowed to post this code elsewhere, but may of course refer to the post at macscripter.net.
-- macscripter.net/viewtopic.php?id=33529
(*
TERMS OF USE.
This applies only to posting code, as long as you don't post it, you are welcome to do
whatever you want to do with it without any further permission.
Except for the following: Selling the code as is, or removing copyright statements and the embedded link in the code (without the http:// part) from the code.
You must also state what you eventually have done with the original source. This obviously doesn't matter if you distribute AppleScript as read only. I do not require you to embed any properties holding copyright notices for the code.
Credit for having contributed to your product would in all cases be nice!
If you use this code as part of script of yours you are of course welcome to post that code with my code in it here at macscripter.net. If you then wish to post your code elsewhere after having uploaded it to MacScripter.net please email me and ask for permission.
The ideal situation is however that you then refer to your code by a link to MacScripter.net
The sole reason for this is that it is so much better for all of us to have a centralized codebase which is updated, than having to roam the net to find any usable snippets, which some of us probably originated in the first place.
I'm picky about this. If I find you to have published parts of my code on any other site without previous permission, I'll do what I can to have the site remove the code, ban you, and sue you under the jurisdiction of AGDER LAGMANNSRETT of Norway. Those are the terms you silently agree to by using this code.
The above paragraphs are also valid if you violate any of my terms.
If you use this or have advantage of this code in a professional setting, where professional setting means that you use this code to earn money by keeping yourself more productive. Or if you as an employee share the resulting script with other coworkers, enhancing the productivity of your company, then a modest donation to MacScripter.net would be appreciated.
*)
on readUtf8IntoList(hfsTargetPathAsText, txtAppTitle, theListToReturn, blnAcceptEmpty, majorNumber)
-- PARAMETERS
-- hfsTargetPathAsText
-- : HFS pathname, as text, of the target file to read.
-- txtAppTitle
-- : A string or text with the title of the main script.
--theListToReturn
-- : the list to return
-- blnAcceptEmpty
-- : whether reading a list from an empty file is acceptable or not.
-- majorNumber
-- : major revision number of Mac OS X: Tiger yields 4, Leopard yields 5 and so on.
-- RETURNS: a list, false or null
-- if it returns false then something just mildly failed.
-- if it returns null, then there are serious problems.
local theFname, tedim, encodingResult, theRes, tempFileName, infFname, endLineCounter, theLimit, pxFilenNameAsText
script theError
property errval : 0
end script
script o
property l : {}
end script
script fileReader -- nested, specialized handler to read a UTF-8 file; kept here to limit its scope of usability.
on readutf8File(hfsTargetPathAsText, refvarStatus, aMajorNumber)
local fp, theContents
try
hfsTargetPathAsText as alias
on error e number n
set contents of refvarStatus to n -- error code for no file found.
return false
end try
try
set fp to open for access alias hfsTargetPathAsText
on error e number n
set contents of refvarStatus to n -- error code for bad access.
return false
end try
try
set theContents to read fp as «class utf8»
on error e number n
try
close access fp
on error e number n
set contents of refvarStatus to n
return false
end try
if not n = -39 then
set contents of refvarStatus to 4000 -- error code for no utf8
else
set contents of refvarStatus to -39
end if
return false
end try
try
close access fp
on error e number n -- n must be captured here, otherwise it is undefined below.
set contents of refvarStatus to n -- error code for close error.
return false
end try
if aMajorNumber < 5 then --Satimage.osax 3.3.1 block BEGINS
try
set theContents to readtext alias hfsTargetPathAsText encoding "MACINTOSH" -- *untested*
on error e number n
set contents of refvarStatus to 6000 -- error for Satimage.osax not installed
return false -- without this the caller never sees error 6000
end try
end if --Satimage.osax 3.3.1 block ENDS
return theContents -- as Unicode text
end readutf8File
end script
try
set tedim to text item delimiters -- we are checking that we are actually getting a file, just in case.
set text item delimiters to ":"
set theFname to text item -1 of (hfsTargetPathAsText as alias as text)
set text item delimiters to tedim
if theFname is "" then -- bundle / app or directory!
error number 5000
end if
-- we know we have something that can be a file
set theRes to fileReader's readutf8File(hfsTargetPathAsText, a reference to theError's errval, majorNumber) -- trying to read an utf8 file.
if theRes is false then -- guess what - it wasn't, or some other mishap just happened.
set pxFilenNameAsText to quoted form of POSIX path of hfsTargetPathAsText
if theError's errval is 4000 then -- it is an encoding error
try -- figuring out which encoding the file was encoded with.
set encodingResult to do shell script "/usr/bin/file " & pxFilenNameAsText
on error e number n partial result p from f to t
error e number n partial result p from f to t
end try
-- extracts the name of the encoding (the third space-delimited word; assumes a path without spaces)
set text item delimiters to " "
set theRes to text item 3 of encodingResult
set text item delimiters to tedim
if theRes is in {"UTF-16", "extended-ASCII"} then
set tempFileName to quoted form of POSIX path of ((path to temporary items as text) & theFname)
try
set tempFileName to do shell script "/usr/bin/mktemp -t readUtf8IntoList"
on error e number n partial result p from f to t
error e number n partial result p from f to t
end try
set infFname to pxFilenNameAsText
if theRes is "extended-ASCII" then
set theRes to "MACROMAN"
end if
try
do shell script "iconv -f " & theRes & " -t UTF-8 " & infFname & " > " & quoted form of tempFileName
do shell script "mv -f " & quoted form of tempFileName & " " & infFname
on error e number n partial result p from f to t
error e number n partial result p from f to t
end try
set theRes to fileReader's readutf8File(hfsTargetPathAsText, a reference to theError's errval, majorNumber)
if theRes is false then
error number theError's errval
end if
else
-- can't do anything about it
error number 3000
end if
else -- something fatal
error number theError's errval
end if
end if
set theListToReturn to every paragraph of theRes
if not theListToReturn is {} then -- shaves off any empty lines at the end of the file.
set endLineCounter to -1
set theLimit to ((count theListToReturn) * (-1))
set o's l to theListToReturn
if last item of o's l is "" then
repeat while item endLineCounter of o's l is ""
set item endLineCounter of o's l to missing value
if endLineCounter > theLimit then
set endLineCounter to endLineCounter - 1
else
exit repeat
end if
end repeat
end if
set theListToReturn to theListToReturn's strings
end if
if blnAcceptEmpty is false then
if (count of theListToReturn) is 0 then return false
end if
return theListToReturn
on error e number n
if n = -39 then -- empty file
if blnAcceptEmpty is false then
tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " is empty!")
return false
else
return {}
end if
else if n = 3000 then
tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " was not encoded with utf8, utf16 or MacRoman encoding.
I can't read in such a file into a list. Check it out in an editor.")
return false
else if n = 4000 then
tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " has some troubles in it please check it in an editor.")
return false
else if n = 5000 then
tell me to display alert (txtAppTitle & ":
" & hfsTargetPathAsText & " is not a file that can be read into a list. Choose a proper file.")
return false
else if n = 6000 then --Satimage.osax 3.3.1 block BEGINS
tell me to display alert (txtAppTitle & ":
You need to install Satimage.osax in order to run this script under Mac OS X Tiger and earlier: Download and install the right version of Smile (3.3.1 Regular Edition)
from: http://www.satimage.fr/software/en/downloads/downloads_old_smile.html
If not: just rip the 2 blocks marked Satimage.osax out of the handler: readUtf8IntoList() and its internal readutf8File() handler.")
return null --Satimage.osax 3.3.1 block ENDS
else -- fatal errors go here!
tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " got the error :
" & e & " number : " & n)
return null
end if
end try
end readUtf8IntoList
on writeUtf8FromList(hfsTargetPathAsText, txtAppTitle, theListToWrite, majorNumber)
-- PARAMETERS
-- hfsTargetPathAsText
-- : Hfs pathname of target file to write as text.
-- txtAppTitle
-- : A string or text with the title of the main script.
--theListToWrite
-- : the list to write
-- majorNumber
-- : major revision number of Mac OS X: Tiger yields 4, Leopard yields 5 and so on.
-- RETURNS: a list or null
-- if it returns null, then there are serious problems.
-- you must use the returned list for further work.
script o
property l : theListToWrite
end script
script theError
property errval : 0
end script
local theResult
script fileWriter
on writeutf8File(hfsTargetPathAsText, theList, refvarStatus)
local fRef, theText, astid
-- insert an ending empty element at the end if not present.
if theList is {} or item -1 of theList is not "" then set end of theList to "" -- for ending linefeed; the {} test avoids an error on an empty list.
set astid to text item delimiters
set text item delimiters to (run script "\"\\n\"") -- linefeed Thanks! to Nigel Garvey
set theText to "" & theList -- internal representation Tiger/Leopard
set text item delimiters to astid
try
set fRef to (open for access file hfsTargetPathAsText with write permission)
on error e number n
set contents of item -1 of theList to missing value -- removes empty item
set contents of refvarStatus to n -- some errorcode
return false
end try
try
set eof fRef to 0
write «data rdatEFBBBF» to fRef -- BOM Thanks! to Nigel Garvey
write theText to fRef as «class utf8»
on error e number n
set contents of item -1 of theList to missing value -- removes empty item
set contents of refvarStatus to n -- some errorcode
try
close access fRef
on error
-- the file could not be closed either; the write error is already recorded.
end try
return false
end try
try
close access fRef
on error e number n
set contents of item -1 of theList to missing value -- removes empty item
set contents of refvarStatus to n -- some errorcode
-- no second close access here: it would only raise the same error again, uncaught.
return false
end try
set text item delimiters to astid
set item -1 of theList to missing value -- removes empty item
return true
end writeutf8File
end script
if majorNumber < 4 then
tell me
activate
display alert (txtAppTitle & ":
Versions of Mac OS X earlier than 10.4.0 are unsupported: your version is 10." & majorNumber & ".x")
end tell
return null
end if
set theResult to fileWriter's writeutf8File(hfsTargetPathAsText, o's l, a reference to theError's errval)
if theResult is false then
-- should have localization here!
tell me
activate
display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " got the error number : " & theError's errval)
end tell
return null
end if
return o's l's strings
end writeUtf8FromList
end script
Best Regards
McUsr