I’m trying to write a fully general handler that writes a list to a file as Unicode text, so that the list can be edited with TextEdit on platforms going back at least to Tiger. I thought I could come up with a common denominator that would work everywhere, but I’m unsure about the result. So if you can either tell me directly or test it on Leopard and Tiger, it would be appreciated. FYI, I’m not done with the function; for now I’m just after getting the text right.
The next question is: how do I write a BOM header so that TextEdit will recognize the file as UTF-8?
After that: am I really on the wrong track? What I want is to be able to store as many different characters as possible while maintaining maximum compatibility with the different versions of Mac OS X. UTF-8 seemed like a good choice, but if anyone has a say on this, you are welcome!
set mylist to {}
set end of mylist to "Éspana"
set end of mylist to ""
set end of mylist to " Å’"
set end of mylist to "" --must remember this!
set thePath to (path to desktop as Unicode text) & "Aardvark.txt"
writeListToUtfFile(thePath, mylist)
on writeListToUtfFile(hfsTargetPathAsText, theList)
local fRef, theText, asti
script o
property l : theList
end script
set AppleScript's text item delimiters to { character id 10 }
set theText to o's l as Unicode text
set AppleScript's text item delimiters to ""
set fRef to (open for access file hfsTargetPathAsText with write permission)
try
set eof fRef to 0
write theText to fRef as «class utf8»
end try
close access fRef
end writeListToUtfFile
The script doesn’t work in Tiger, because character id was only introduced in Leopard.
For Tiger compatibility use ASCII character instead.
There is a constant return for ASCII character 13 which also works in Tiger.
The constant linefeed (ASCII character 10) works only in Leopard and higher.
Do you really need CR+LF? Only Windows uses both, the usual line delimiter of Mac OS is CR, UNIX uses LF.
The script object in the handler is syntactic sugar. There is no benefit at all from the script object,
because the list is accessed only once, when it is coerced to text.
The text is written as UTF-16 (also in Tiger). If it’s intended to write UTF-16 anyway, I recommend writing the BOM at the beginning of the file.
In Tiger and Leopard the braces round text item delimiters have no effect, you can omit them.
It’s recommended to reset text item delimiters to the former state after using them.
The addition AppleScript’s is only necessary in an application tell block.
For best reliability I would add error handling. In your script you get an uncaught error
if something goes wrong during the write operation.
on writeListToUtfFile(hfsTargetPathAsText, theList)
set TID to text item delimiters
set text item delimiters to return
set theText to theList as Unicode text
set text item delimiters to TID
try
set fRef to (open for access file hfsTargetPathAsText with write permission)
set eof fRef to 0
write theText to fRef
close access fRef
on error
try
close access file hfsTargetPathAsText
end try
end try
end writeListToUtfFile
Edit: I see that you have edited some parts of the script. My suggestions are for the original version.
Last time I used Tiger, the only thing that was UTF-16 was the Mac OS X part, which was very annoying: filenames were encoded as UTF-16 when the text was only encoded as UTF-8! Has that changed?
The idea is to write the text in a least common denominator, and I settled on UTF-8 because one would then also be able
to read the text fairly well from the shell.
I also hoped to skip the version testing, but as you read this, I’m happily looting your code here.
I’m also a bit suspicious as to whether the code, as it will be when uncommented, will compile under Tiger at all,
given the character id statements; maybe I’ll go with the deprecated form.
I found the BOM line for UTF-8 in this post by Julio,
before I got the link from Stefan, which deals with the topic in general more thoroughly.
set mylist to {}
set end of mylist to "Éspana"
set end of mylist to ""
set end of mylist to " Å’"
set end of mylist to "" --must remember this!
set thePath to (path to desktop as Unicode text) & "Aardvark.txt"
writeListToUtfFile(thePath, mylist)
on writeListToUtfFile(hfsTargetPathAsText, theList)
local fRef, theText, astid, major
set major to (system attribute "sysv") mod 4096 div 16
if major < 4 then return false -- pre tiger
set astid to text item delimiters
set text item delimiters to {ASCII character 10}
set theText to theList as Unicode text
set text item delimiters to ""
set fRef to (open for access file hfsTargetPathAsText with write permission)
try
set eof fRef to 0
write ((ASCII character 239) & (ASCII character 187) & (ASCII character 191)) to f --> not as «class utf8»
write theText to fRef as «class utf8»
end try
close access fRef
set text item delimiters to astid
return true
end writeListToUtfFile
Have you read my post about error handling, about character id not working in Tiger, and the other things? ASCII character is deprecated, but it still works.
I haven’t got around to error handling yet, as I am still unsure about how I will call the handler.
Is what happens within a handler outside the terminology scope of a tell block?
So am I completely safe just using text item delimiters in a handler, as long as there is no
specific tell block in the handler?
I have posted an update of the handler, with just the code for handling OS versions and UTF-8 text, in the post above.
The error handling is still not in place. It is going to reside within an object for conversion from lists to UTF-8 text and back again, and there are some other details as well.
Edit: The script object was an irrational idea, so I removed it. I’ll post the finished function, with the
rest of it, in Code Exchange as a listPreserver object when it is done, hopefully tonight.
Best regards, and thanks for your help and comments, and not least for the link, which I will study.
One can’t be careful enough with the text item delimiters.
As I stated in the post above under Edit, I will post a listPreserver object in Code Exchange when it is done, and that will most definitely include robust error handling,
also with regard to reading “wrongly” encoded files.
To get rid of the ASCII character versus character id problem, I use this piece of code :
if 5 > (system attribute "sys2") then
set NIL to ASCII character 0
set linefeed to ASCII character 10
set noBreak to ASCII character 202
else
set NIL to character id 0
set noBreak to character id 160
end if
Tiger is unable to use character id but it is able to compile it flawlessly.
I define linefeed in Tiger. There is no need to define it in Leopard and Snow Leopard.
Yvan KOENIG (VALLAURIS, France) Sunday 4 July 2010 12:30:28
I believe the system attributes “sys1”, “sys2”, and “sys3” were only introduced when Tiger went up to version 10.4.10. It’s safer to use “sysv” here.
if (4176 > (system attribute "sysv")) then
-- Pre-Leopard.
else
-- Leopard & later.
end if
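To make the magic number less opaque, here is a minimal sketch of the arithmetic behind these "sysv" tests, worked on the fixed value 4176 rather than on a live system. It assumes the value packs the version into hexadecimal digits (4176 is 0x1050, i.e. 10.5.0); the variable names are only illustrative.

```applescript
-- Decode a "sysv" value by hand, using Leopard's 4176 (0x1050) as the example.
set sysv to 4176
-- Drop the leading "10" (0x1000 = 4096), then take the next hex digit.
set minorVersion to sysv mod 4096 div 16 -- 4176 mod 4096 = 80; 80 div 16 = 5
-- The last hex digit is the update number.
set updateVersion to sysv mod 16 -- 4176 mod 16 = 0
return {minorVersion, updateVersion} --> {5, 0}
```

This is the same expression McUsr's handler uses to obtain its major number, so a result below 4 there means pre-Tiger and a result below 5 means pre-Leopard.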
I have my doubts about the ability of earlier versions of Tiger to compile ‘character id’ into a form usable by Leopard or later, but I don’t actually know.
If McUsr’s script has to work in Tiger and later, another, less efficient but not deprecated way to set the TIDs to a linefeed would be:
set AppleScript's text item delimiters to (run script "\"\\n\"")
The simplest and least controversial way to write a UTF-8 BOM to a file is (as in the thread pointed out by Stefan):
write «data rdatEFBBBF» to fRef -- the 'f' in McUsr's script in post #4 above is an error.
It is all done now. As I remember, I was a little anxious about whether it would compile at all under Tiger when using character id.
It is nice that there are many alternatives for circumventing this particular problem.
And it is very nice to know that Yvan Koenig’s code compiles on Tiger; I had been rather anxious about that.
Maybe it will then compile on earlier versions as well, as long as the lines aren’t executed, like in Yvan’s block?
Here is what I finally came up with, with what I regard as proper (robust) error handling.
I posted it here because it is just one small piece of a bigger whole, and there is no reason to clutter up the name space in Code Exchange with this, as it is to be included there at some point.
It should write a list as UTF-8 in Tiger, Leopard, and Snow Leopard. (“Later” seems to be a dangerous word.)
Thanks to all of you for your feedback. And special thanks to Yvan Koenig for spotting that last dreaded typo!
property theErrorcode : 0
set myList to {}
set end of myList to "Éspana"
set end of myList to ""
set end of myList to " Å’"
set end of myList to "" --must remember this!
set thePath to (path to desktop as Unicode text) & "Aardvark.txt"
set Major to (system attribute "sysv") mod 4096 div 16
writeListToUtfFile(thePath, myList, a reference to theErrorCode, Major)
on writeListToUtfFile(hfsTargetPathAsText, theList, refvarStatus, aMajorNumber)
-- you will have to set theList to theList's strings to remove the missing value.
local fRef, theText, astid
if aMajorNumber < 4 then return false -- pre tiger for safety, will be tested by caller.
-- insert an ending empty element at the end if not present.
if item -1 of theList is not "" then set end of theList to "" -- for ending linefeed.
set astid to text item delimiters
set text item delimiters to (run script "\"\\n\"") -- linefeed. Thanks to Nigel Garvey
set theText to "" & theList -- internal representation Tiger/Leopard
set text item delimiters to astid
try
set fRef to (open for access file hfsTargetPathAsText with write permission)
on error e number n
set item -1 of theList to missing value -- removes empty item
set contents of refvarStatus to n -- some errorcode
return false
end try
try
set eof fRef to 0
write «data rdatEFBBBF» to fRef -- BOM by Nigel Garvey
write theText to fRef as «class utf8»
on error e number n
set item -1 of theList to missing value -- removes empty item
set contents of refvarStatus to n -- some errorcode
try
close access fRef -- best effort; a failure here is ignored rather than retried unguarded
end try
return false
end try
try
close access fRef
on error e number n
set item -1 of theList to missing value -- removes empty item
set contents of refvarStatus to n -- some errorcode
try
close access file hfsTargetPathAsText -- second attempt by file specifier; errors are ignored
end try
return false
end try
set text item delimiters to astid
set item -1 of theList to missing value -- removes empty item
return true
end writeListToUtfFile
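The thread also mentions converting from UTF-8 text back to a list. A minimal sketch of that reading side could look like the following; the handler name readListFromUtfFile is hypothetical and not part of the posted code, and the sketch has no error handling of its own (an unreadable or empty file raises an error in the caller). Whether read drops the leading BOM is not verified here, so the first item may need checking.

```applescript
-- Hypothetical counterpart to writeListToUtfFile; the name is an assumption.
on readListFromUtfFile(hfsSourcePathAsText)
	-- Read the whole file, interpreting its bytes as UTF-8.
	set theText to read file hfsSourcePathAsText as «class utf8»
	-- Split on linefeeds, saving and restoring the delimiters around the split.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to (ASCII character 10) -- deprecated, but works in Tiger
	set theList to text items of theText
	set AppleScript's text item delimiters to astid
	return theList
end readListFromUtfFile
```

Because the writer appends an empty item to get a trailing linefeed, the list returned here would end with an empty string as well.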