Will this open up correctly as Unicode Text in Tiger and Leopard?

Hello.

I’m trying to write a fully general list-to-Unicode-text handler, so that I can edit a list with TextEdit on platforms including at least Tiger. I thought I could come up with a common denominator that would work everywhere, but I’m unsure about the result. So I’d appreciate it if you could either tell me directly or test it on Leopard and Tiger. FYI, I’m not done with the function; I’m just after the correct text.

The next question is: how do I write a BOM header so that TextEdit will recognize the file as UTF-8? :slight_smile:

After that: am I on the wrong track? What I want is to be able to store as many different characters as possible while maintaining maximum compatibility with the different versions of Mac OS X. UTF-8 seemed like a good choice, but if anyone has an opinion on this, you’re welcome to share it!


set mylist to {}

set end of mylist to "Éspana"
set end of mylist to ""
set end of mylist to " Å’"
set end of mylist to "" --must remember this!

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

writeListToUtfFile(thePath, mylist)

on writeListToUtfFile(hfsTargetPathAsText, theList)
	local fRef, theText, asti
	script o
		property l : theList
	end script
	set AppleScript's text item delimiters to {  character id 10 }
	set theText to o's l as Unicode text 
	set AppleScript's text item delimiters to ""
	
	set fRef to (open for access file hfsTargetPathAsText with write permission)
	try
		set eof fRef to 0
		write theText to fRef as «class utf8»
	end try
	close access fRef
end writeListToUtfFile

Best Regards

McUsr

Hi,

the script doesn’t work in Tiger, because character id was introduced in Leopard.
For Tiger compatibility use ASCII character instead.
There is a constant return for ASCII character 13 which also works in Tiger.
The constant linefeed (ASCII character 10) works only in Leopard and higher.

Do you really need CR+LF? Only Windows uses both; the usual line delimiter of Mac OS is CR, and UNIX uses LF.
The script object in the handler is syntactic sugar. There is no benefit at all from the script object,
because the list is accessed only once, when it is coerced to text.

The text is written as UTF-16 (also in Tiger). If you intend to write UTF-16 anyway, I recommend writing the BOM at the beginning of the file.
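A minimal sketch of writing the BOM first for UTF-16. It assumes, as is commonly reported for Standard Additions, that `write ... as Unicode text` produces big-endian UTF-16, so the matching BOM is the two bytes FE FF. `writeUTF16WithBOM` is a hypothetical handler name, not part of the original scripts.

```applescript
-- Hypothetical handler: writes theText as UTF-16 preceded by a big-endian BOM.
-- Assumes "write ... as Unicode text" emits big-endian UTF-16 without a BOM.
on writeUTF16WithBOM(hfsPath, theText)
	set fRef to open for access file hfsPath with write permission
	try
		set eof fRef to 0 -- truncate any existing content
		write «data rdatFEFF» to fRef -- UTF-16 big-endian byte order mark
		write theText to fRef as Unicode text
		close access fRef
	on error errMsg number errNum
		close access fRef -- don't leave the file open
		error errMsg number errNum -- re-raise for the caller
	end try
end writeUTF16WithBOM
```

Usage: `writeUTF16WithBOM((path to desktop as Unicode text) & "Aardvark16.txt", "Éspana")`. Closing the file before re-raising the error keeps a failed write from leaving the file open.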

In Tiger and Leopard the braces around text item delimiters have no effect; you can omit them.

It’s recommended to reset text item delimiters to their former state after using them.

The prefix AppleScript’s is only necessary inside an application tell block.

For best reliability I would add error handling. In your script you get an uncaught error
if something goes wrong during the write operation.


on writeListToUtfFile(hfsTargetPathAsText, theList)
	
	set TID to text item delimiters
	set text item delimiters to return
	set theText to theList as Unicode text
	set text item delimiters to TID
	
	try
		set fRef to (open for access file hfsTargetPathAsText with write permission)
		set eof fRef to 0
		write theText to fRef
		close access fRef
	on error
		try
			close access file hfsTargetPathAsText
		end try
	end try
end writeListToUtfFile

Edit: I see that you have edited some parts of the script. My suggestions refer to the original version.

PS: This thread describes how to write the various BOMs.

Hello and thanks for your input Stefan

The last time I used Tiger, the only thing in UTF-16 was the Mac OS X part; it was very annoying to have filenames encoded in UTF-16 when the text was encoded only as UTF-8! Has that changed?

The idea is to write text in a lowest common denominator format, and I chose UTF-8 because the text can then
also be read fairly well from the shell.

I also hoped to avoid testing for the OS version, but as you can see, I’m happily looting your code here. :slight_smile:

I’m also a bit doubtful whether the code, once uncommented, will compile under Tiger at all,
given the character id statements; maybe I’ll use the deprecated commands.
I found the BOM line for UTF-8 in this post by Julio,
before I got the link from Stefan, which deals with the topic in general more thoroughly.


set mylist to {}

set end of mylist to "Éspana"
set end of mylist to ""
set end of mylist to " Å’"
set end of mylist to "" --must remember this!

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

writeListToUtfFile(thePath, mylist)

on writeListToUtfFile(hfsTargetPathAsText, theList)
	local fRef, theText, astid, major
	set major to (system attribute "sysv") mod 4096 div 16
	if major < 4 then return false -- pre tiger
	set astid to text item delimiters
	set text item delimiters to {ASCII character 10}
	set theText to theList as Unicode text
	set text item delimiters to ""
	
	set fRef to (open for access file hfsTargetPathAsText with write permission)
	try
		set eof fRef to 0
		write ((ASCII character 239) & (ASCII character 187) & (ASCII character 191)) to f --> not as «class utf8» 
		write theText to fRef as «class utf8»
	end try
	close access fRef
	set text item delimiters to astid
	return true
end writeListToUtfFile

Best Regards

McUsr

Have you read my post about error handling, and that character id doesn’t work in Tiger (and the other points)? ASCII character is deprecated, but it still works.

Hello,

I haven’t got around to error handling yet, as I am still unsure how I will call the handler.

Is what happens within a handler outside the terminology scope of a tell block?
That is, am I completely safe using just text item delimiters inside a handler, as long as there is no
tell block in the handler?

I have posted an update of the handler, with just the code for handling OS versions and UTF-8 text, in the post above.
The error handling is still not in place. It is going to reside within an object for converting lists to UTF-8 text and back again, and there are some other details as well.

Edit: The script object was an irrational idea, so I have removed it. I’ll post the finished function, with the
rest of it, in Code Exchange as a listPreserver object when it is done, hopefully tonight.

Best Regards and thanks for your help and comments and not the least the link which I will study.

McUsr

Any (tell) block affects only its scope
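To illustrate: a tell block is lexical, so the body of a handler is outside the terminology scope of any tell block it is called from. The one catch is that a call made from inside a tell block needs `my`; otherwise the command is sent to the target application. `joinLines` and the use of Finder are purely illustrative.

```applescript
-- Hypothetical handler: its body is not inside any tell block, so
-- "text item delimiters" here always means AppleScript's own TIDs.
on joinLines(theList)
	set astid to text item delimiters
	set text item delimiters to return
	set theText to theList as Unicode text
	set text item delimiters to astid -- restore the former state
	return theText
end joinLines

tell application "Finder"
	-- Without "my", joinLines would be sent to Finder and fail.
	set joined to my joinLines({"Éspana", "Aardvark"})
end tell
```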

Thanks :slight_smile:

One can’t be careful enough with the text item delimiters.

As I stated under Edit in the post above, I will post a listPreserver object in Code Exchange when it is done, and it will most definitely include robust error handling.

Also with regard to reading “wrongly” encoded files. :wink:

Best Regards

McUsr

Hello.

This one is done and can be found here.

Best Regards

McUsr

To get rid of the ASCII character versus character id problem, I use this piece of code:


if 5 > (system attribute "sys2") then
	set NIL to ASCII character 0
	set linefeed to ASCII character 10
	set noBreak to ASCII character 202
else
	set NIL to character id 0
	set noBreak to character id 160
end if

Tiger can’t run character id, but it compiles it flawlessly.
I define linefeed only for Tiger; there is no need to define it in Leopard and Snow Leopard, where linefeed is a built-in constant.

Yvan KOENIG (VALLAURIS, France) dimanche 4 juillet 2010 12:30:28

Hi.

I believe the system attributes “sys1”, “sys2”, and “sys3” were only introduced when Tiger went up to version 10.4.10. It’s safer to use “sysv” here.


if (4176 > (system attribute "sysv")) then
	-- Pre-Leopard.
else
	-- Leopard & later.
end if
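For reference, here is the arithmetic behind `(system attribute "sysv") mod 4096 div 16` in McUsr’s handler and the 4176 threshold above. sysv packs the version as hexadecimal digits 0x10MB for 10.M.B (Apple’s Gestalt documentation says minor and bug-fix values above 9 are reported as 9), so `mod 4096` drops the leading 0x1000 and `div 16` drops the bug-fix digit. 4176 is 0x1050, i.e. 10.5.0. `osMinorVersion` is a hypothetical helper name.

```applescript
-- Hypothetical helper: extracts M from a sysv value of the form 0x10MB.
on osMinorVersion(sysv)
	return sysv mod 4096 div 16
end osMinorVersion

osMinorVersion(4169) --> 4 (0x1049, Tiger: 73 div 16)
osMinorVersion(4176) --> 5 (0x1050, Leopard 10.5.0: 80 div 16)
osMinorVersion(4200) --> 6 (0x1068, Snow Leopard 10.6.8: 104 div 16)

set sysv to system attribute "sysv"
if osMinorVersion(sysv) < 5 then
	-- Pre-Leopard.
else
	-- Leopard & later.
end if
```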

I have my doubts about the ability of earlier versions of Tiger to compile ‘character id’ into a form usable by Leopard or later, but I don’t actually know.

If McUsr’s script has to work in Tiger and later, another, less efficient but not deprecated way to set the TIDs to a linefeed would be:


set AppleScript's text item delimiters to (run script "\"\\n\"")
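Combined with saving and restoring the delimiters, as Stefan recommended, this might look like the following sketch. It avoids both `character id` and the Leopard-only `linefeed` constant; the list contents are placeholders.

```applescript
-- run script compiles the literal "\n" at run time, which yields a
-- linefeed on Tiger as well as on Leopard and later.
set lf to run script "\"\\n\""

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to lf
set theText to {"Éspana", "Aardvark"} as Unicode text
set AppleScript's text item delimiters to astid -- restore the former state
```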

The simplest and least controversial way to write a UTF-8 BOM to a file is (as in the thread pointed out by Stefan):

write «data rdatEFBBBF» to fRef -- the 'f' in McUsr's script in post #4 above is an error.
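Because the BOM occupies the first three bytes of the file, a reader that doesn’t want it in the text can start reading at byte 4. A minimal round-trip sketch, assuming the file was written with the BOM as above; `writeUTF8WithBOM` and `readUTF8SkippingBOM` are hypothetical names.

```applescript
-- Hypothetical handler: writes theText as UTF-8 preceded by the 3-byte BOM.
on writeUTF8WithBOM(hfsPath, theText)
	set fRef to open for access file hfsPath with write permission
	try
		set eof fRef to 0 -- truncate any existing content
		write «data rdatEFBBBF» to fRef -- UTF-8 byte order mark
		write theText to fRef as «class utf8»
		close access fRef
	on error errMsg number errNum
		close access fRef -- don't leave the file open
		error errMsg number errNum -- re-raise for the caller
	end try
end writeUTF8WithBOM

-- Hypothetical handler: assumes the file starts with the BOM and reads after it.
on readUTF8SkippingBOM(hfsPath)
	return read file hfsPath from 4 as «class utf8»
end readUTF8SkippingBOM
```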

Thanks.

It is all done now; as I recall, I was a little anxious about whether it would compile at all under Tiger when using character id.

It is nice to know that there are many alternatives for circumventing this particular problem.
And very nice to know that Yvan KOENIG’s code compiles on Tiger. I have been rather anxious about that.

Maybe it will then compile on earlier versions as well, as long as those lines aren’t executed, as in Yvan’s block?

Best Regards

McUsr

Hello.

Here is what I finally came up with, with what I regard as proper (robust) error handling.
I posted it here because it is just one small piece of a bigger whole, and there is no reason to clutter up the name space in Code Exchange with it, as it is to be included there at some point.

It should write a list as UTF-8 in Tiger, Leopard, and Snow Leopard. (Later seems to be a dangerous word :slight_smile: ).

Thanks to all of you for your feedback. And special thanks to Yvan Koenig for spotting that last dreaded typo!


property theErrorCode : 0
set myList to {}

set end of myList to "Éspana"
set end of myList to ""
set end of myList to " Å’"
set end of myList to "" --must remember this!

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"
set Major to (system attribute "sysv") mod 4096 div 16
writeListToUtfFile(thePath, myList, a reference to theErrorCode, Major)

on writeListToUtfFile(hfsTargetPathAsText, theList, refvarStatus, aMajorNumber)
	-- Writes theList, one item per line, as UTF-8 with a BOM and an ending linefeed.
	-- theList itself is not modified. On failure the error number is stored in
	-- refvarStatus and false is returned.
	local fRef, theText, astid, lf
	
	if aMajorNumber < 4 then return false -- pre tiger for safety, will be tested by caller.
	
	set lf to (run script "\"\\n\"") -- linefeed, thanks to Nigel Garvey
	set astid to text item delimiters
	set text item delimiters to lf
	set theText to "" & theList -- internal representation Tiger/Leopard
	set text item delimiters to astid
	-- Add the ending linefeed unless the last item is already an empty string.
	if (count theList) > 0 and item -1 of theList is not "" then set theText to theText & lf
	
	try
		set fRef to (open for access file hfsTargetPathAsText with write permission)
	on error e number n
		set contents of refvarStatus to n -- some errorcode
		return false
	end try
	try
		set eof fRef to 0
		write «data rdatEFBBBF» to fRef -- BOM by Nigel Garvey
		write theText to fRef as «class utf8»
		close access fRef
	on error e number n
		set contents of refvarStatus to n -- some errorcode
		try
			close access fRef -- close the file if it is still open; ignore errors
		end try
		return false
	end try
	return true
end writeListToUtfFile

Best Regards

McUsr