Script object for reading and writing UTF-8 with Unix LF line endings (Tiger compatible)

Hello.

I could have done this much more easily with file (BSD) and Satimage.osax, but I suspect (though I’m not sure) that this method will be the fastest in the long run, that is, after the first read, when the file has been converted to UTF-8.
Most of the things here are stolen from this very good thread. I have plagiarized StefanK’s approach for detecting whether a file is UTF-8 encoded, word for word. Bruce Phillips provided the mktemp incantation, and the rest of the people contributed to the general enlightenment on the subject.
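For readers who want to see the recovery path outside AppleScript, here is a rough shell sketch of what readUtf8IntoList does through do shell script when a file does not read as UTF-8: test the encoding, map the `file` description to an iconv encoding, and convert through a mktemp file. It rests on assumptions: the function names are hypothetical, is_utf8 uses iconv's strict decoding as a stand-in for StefanK's read-as-«class utf8» test, and encoding names such as MACROMAN vary between iconv implementations (`iconv -l` lists what yours accepts).

```shell
#!/bin/sh
# Rough shell sketch of the recovery path in readUtf8IntoList.
# All function names are hypothetical.

# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8
# (stand-in for reading it "as «class utf8»" and catching the error).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# encoding_for DESCRIPTION: map the output of `file -b FILE` to an iconv
# source encoding, for the two cases the handler can convert.
encoding_for() {
    case $1 in
        *UTF-16*) echo UTF-16 ;;
        *extended-ASCII*) echo MACROMAN ;;  # name as used on Mac OS X
        *) return 1 ;;                      # the handler raises error 3000 here
    esac
}

# convert_to_utf8 FILE FROM: re-encode FILE to UTF-8 in place via a
# temporary file; FILE is replaced only if iconv succeeds.
convert_to_utf8() {
    tmp=$(mktemp "${TMPDIR:-/tmp}/readUtf8IntoList.XXXXXX") || return 1
    if iconv -f "$2" -t UTF-8 "$1" > "$tmp"; then
        mv -f "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A caller would try is_utf8 first and only on failure run `enc=$(encoding_for "$(file -b "$f")") && convert_to_utf8 "$f" "$enc"`. Note that mv -f leaves the converted file with the temporary file's restrictive permissions, just as the handler's mv does; writing with `cat "$tmp" > "$1"` instead would keep the original's.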

This is the list reader/writer (preserver), which swaps the lists used by AppleScript choose list dialogs in and out to disk.
The list maintainer is yet to come and will be a separate script object, since I will not force everyone to strictly read and write UTF-8.
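The on-disk layout that writeUtf8FromList produces, a UTF-8 byte order mark (EF BB BF) followed by one list item per line, each ending in a Unix linefeed, can be sketched in shell for preparing or inspecting such files outside AppleScript. write_list and read_list are hypothetical names, and read_list assumes a BOM can only appear at the very start of the file.

```shell
#!/bin/sh
# write_list FILE ITEM...: write a UTF-8 BOM, then each ITEM followed by
# a linefeed (the layout writeUtf8FromList produces).
write_list() {
    out=$1
    shift
    {
        printf '\357\273\277'          # EF BB BF, the UTF-8 BOM
        if [ $# -gt 0 ]; then          # an empty list yields only the BOM
            printf '%s\n' "$@"
        fi
    } > "$out"
}

# read_list FILE: print the items one per line, without the leading BOM.
read_list() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = efbbbf ]; then
        tail -c +4 "$1"
    else
        cat "$1"
    fi
}
```

For example, `write_list "$HOME/choices.txt" first second` writes the bytes EF BB BF followed by `first\nsecond\n`, which is what the handler writes for the list {"first", "second"}.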

What is there to say about the read routine, apart from the fact that it is big and ugly? It works!
I chose to nest a primitive handler inside it, to have it all in one place; it is of no use for any other purpose whatsoever.

Since I added the writeUtf8FromList handler, I have changed readUtf8IntoList to take a full HFS path name as text (hfsFullPathNameAsText) as its file name parameter.

Regarding error handling:

Since AppleScript stores its properties upon returning with a value, and this handler works in a context that depends very much on properties, there is no error number −128 (user canceled) or similar in here.

Whether or not the script saves any properties should be governed by the return statements of the run handler (implicit or explicit, with or without a value), which is what induces a save of the script's properties.

So the handlers return null upon a fatal error and false when they did not succeed; otherwise they return a list, which may be empty if empty lists are accepted (blnAcceptEmpty is true). (writeUtf8FromList returns either the list or null.)

Edit: Scratching my head and mourning the fact that I would have to write another handler for Tiger, since it really works with MacRoman internally, I got the idea of leveraging Satimage.osax a little after all.
This saves an extra conversion of the file to MacRoman before reading it into the list, which would be tiresome and slow things down even more; the point is to keep the file as UTF-8 for maximum compatibility.

So if you want to use this and run Tiger, you have to download Satimage.osax, which comes bundled with Smile, from the Satimage download page (its URL is also given in the handler's error message).
If you are never going to use it with Tiger, just rip out the two blocks marked Satimage.osax; you can even strip out the majorNumber parameter with its accompanying code.
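Both handlers take majorNumber, the second component of the Mac OS X version number (4 for 10.4 Tiger, 5 for 10.5 Leopard). A minimal sketch of deriving it, assuming the version string comes from sw_vers -productVersion, which prints strings such as 10.4.11; os_x_major is a hypothetical helper, and the scheme only makes sense for 10.x releases.

```shell
#!/bin/sh
# os_x_major VERSION: print the second dot-separated field of VERSION,
# e.g. 4 for "10.4.11". Only meaningful for Mac OS X 10.x versions.
os_x_major() {
    printf '%s\n' "$1" | cut -d. -f2
}

# On a Mac: os_x_major "$(sw_vers -productVersion)"
```

From AppleScript the same value could be obtained with `(do shell script "sw_vers -productVersion | cut -d. -f2") as integer`.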

I will soon provide some basic localization of the buttons.

Thanks for all help!


script utf8List
	-- The idea and implementation, and any faults, are totally mine. © McUsr 2010 and put in the Public Domain.
	-- The usual guarantees about nothing whatsoever apply; use it at your own risk.
	-- Read the documentation.
	-- You are not allowed to post this code elsewhere, but may of course refer to the post at macscripter.net.
	-- macscripter.net/viewtopic.php?id=33529
(*
TERMS OF USE. 
This applies only to posting code, as long as you don't post it, you are welcome to do
whatever you want to do with it without any further permission. 
Except for the following: selling the code as is, or removing copyright statements and the embedded link in the code (without the http:// part) from the code. 

You must also state what, if anything, you have done to the original source. This obviously doesn't matter if you distribute the AppleScript as read-only. I do not require you to embed any properties holding a copyright notice for the code.

Credit for having contributed to your product would in all cases be nice!

If you use this code as part of a script of yours, you are of course welcome to post that code with my code in it here at macscripter.net. If you then wish to post your code elsewhere after having uploaded it to MacScripter.net, please email me and ask for permission.

The ideal situation is however that you then refer to your code by a link to MacScripter.net
The sole reason for this is that it is so much better for all of us to have a centralized codebase that is kept up to date than to have to roam the net to find usable snippets, some of which some of us probably originated in the first place.

I'm picky about this. If I find you to have published parts of my code on any other site without previous permission, I'll do what I can to have the site remove the code, ban you, and sue you under the jurisdiction of AGDER LAGMANNSRETT of Norway. Those are the terms you silently agree to by using this code. 

The above paragraphs are also valid if you violate any of my terms.

If you use or benefit from this code in a professional setting, meaning that you use this code to earn money by making yourself more productive, or if you as an employee share the resulting script with coworkers, enhancing the productivity of your company, then a modest donation to MacScripter.net would be appreciated.
*)

	on readUtf8IntoList(hfsTargetPathAsText, txtAppTitle, theListToReturn, blnAcceptEmpty, majorNumber)
		-- PARAMETERS
		-- hfsTargetPathAsText		
		--					: HFS path name, as text, of the file to read.
		-- txtAppTitle
		--					: A string or text with the title of the main script.
		--theListToReturn
		--					: the list to return 
		-- blnAcceptEmpty
		--					: whether reading a list from an empty file is acceptable or not.
		-- majorNumber
		--					: major revision number of Mac Os X: Tiger yields 4 Leopard yields 5 and so on.
		
		-- RETURNS: a list, false or null
		-- if it returns false, then something just mildly failed.
		-- if it returns null, then there are serious problems.
		
		local theFname, tedim, encodingResult, theRes, tempFileName, infFname, endLineCounter, theLimit, pxFilenNameAsText
		script theError
			property errval : 0
		end script
		script o
			property l : {}
		end script
		script fileReader -- nested handler specialized for reading a UTF-8 file; kept here, within the only scope where it is usable.
			on readutf8File(hfsTargetPathAsText, refvarStatus, aMajorNumber)
				local fp, theContents
				try
					hfsTargetPathAsText as alias
				on error e number n
					set contents of refvarStatus to n -- error code for no file found.
					return false
				end try
				try
					set fp to open for access alias hfsTargetPathAsText
				on error e number n
					set contents of refvarStatus to n -- error code for bad access.
					return false
				end try
				try
					set theContents to read fp as «class utf8»
				on error e number n
					try
						close access fp
					on error e number n
						set contents of refvarStatus to n
						return false
					end try
					if not n = -39 then
						set contents of refvarStatus to 4000 -- error code for no utf8
					else
						set contents of refvarStatus to -39
					end if
					return false
				end try
				try
					close access fp
				on error e number n
					set contents of refvarStatus to n -- error code for close error.
					return false
				end try
				if aMajorNumber < 5 then --Satimage.osax 3.3.1 block BEGINS 
					try
						set theContents to readtext alias hfsTargetPathAsText encoding "MACINTOSH" -- *untested*
					on error e number n
						set contents of refvarStatus to 6000 -- error for Satimage.osax not installed
						return false
					end try
				end if --Satimage.osax  3.3.1  block ENDS
				return theContents --  as Unicode text
			end readutf8File
		end script
		
		try
			set tedim to text item delimiters -- we are checking that we are actually getting a file, just in case.
			set text item delimiters to ":"
			set theFname to text item -1 of (hfsTargetPathAsText as alias as text)
			set text item delimiters to tedim
			if theFname is "" then -- bundle / app or directory!
				error number 5000
			end if
			-- we know we have something that can be a file
			
			
			set theRes to fileReader's readutf8File(hfsTargetPathAsText, a reference to theError's errval, majorNumber) -- trying to read an utf8 file.
			
			if theRes is false then -- guess what: it wasn't, or some other mishap just happened.
				set pxFilenNameAsText to quoted form of POSIX path of hfsTargetPathAsText
				if theError's errval is 4000 then -- it is an encoding error
					try -- figuring out which encoding the file was encoded with.
						set encodingResult to do shell script "/usr/bin/file -b " & pxFilenNameAsText -- -b leaves the file name (which may contain spaces) out of the output
					on error e number n partial result p from f to t
						error e number n partial result p from f to t
					end try
					-- extracts the name of the encoding: the second word of e.g. "Little-endian UTF-16 Unicode text"
					set text item delimiters to " "
					set theRes to text item 2 of encodingResult
					set text item delimiters to tedim
					
					
					if theRes is in {"UTF-16", "extended-ASCII"} then
						try -- a unique temporary file; its path is quoted for the shell commands below
							set tempFileName to quoted form of (do shell script "/usr/bin/mktemp -t readUtf8IntoList")
						on error e number n partial result p from f to t
							error e number n partial result p from f to t
						end try
						
						set infFname to pxFilenNameAsText
						if theRes is "extended-ASCII" then
							set theRes to "MACROMAN"
						end if
						try
							do shell script "iconv -f " & theRes & " -t UTF-8 " & infFname & " >" & tempFileName
							do shell script "mv -f " & tempFileName & " " & infFname
						on error e number n partial result p from f to t
							error e number n partial result p from f to t
						end try
						
						set theRes to fileReader's readutf8File(hfsTargetPathAsText, a reference to theError's errval, majorNumber)
						if theRes is false then
							error number theError's errval
						end if
					else
						-- can't do anything about it
						error number 3000
					end if
				else -- something fatal
					error number theError's errval
				end if
			end if
			set theListToReturn to every paragraph of theRes
			if not theListToReturn is {} then -- shaves off any empty lines at the end of the file.
				set endLineCounter to -1
				set theLimit to ((count theListToReturn) * (-1))
				set o's l to theListToReturn
				if last item of o's l is "" then
					repeat while item endLineCounter of o's l is ""
						set item endLineCounter of o's l to missing value
						if endLineCounter > theLimit then
							set endLineCounter to endLineCounter - 1
						else
							exit repeat
						end if
					end repeat
				end if
				set theListToReturn to theListToReturn's strings
			end if
			if blnAcceptEmpty is false then
				if (count of theListToReturn) is 0 then return false
			end if
			
			return theListToReturn
		on error e number n
			if n = -39 then -- empty file
				if blnAcceptEmpty is false then
					tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " is empty!")
					return false
				else
					return {}
				end if
			else if n = 3000 then
				tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " was not encoded with UTF-8, UTF-16 or MacRoman encoding.
I can't read such a file into a list. Check it out in an editor.")
				return false
			else if n = 4000 then
				tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " has some problems in it; please check it in an editor.")
				return false
			else if n = 5000 then
				tell me to display alert (txtAppTitle & ":
" & hfsTargetPathAsText & " is not a file that can be read into a list. Choose a proper file.")
				return false
			else if n = 6000 then --Satimage.osax 3.3.1 block BEGINS 
				tell me to display alert (txtAppTitle & ":
You need to install Satimage.osax in order to run this script under Mac OS X Tiger and earlier: download and install the right version of Smile (3.3.1 Regular Edition)
from: http://www.satimage.fr/software/en/downloads/downloads_old_smile.html
If not, just rip the 2 blocks marked Satimage.osax out of the handler readUtf8IntoList() and its internal readutf8File() handler.")
				return null --Satimage.osax 3.3.1 block ENDS 
			else -- fatal errors go here!
				tell me to display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " got the error:
" & e & " (error number " & n & ")")
				return null
			end if
		end try
	end readUtf8IntoList
	
	on writeUtf8FromList(hfsTargetPathAsText, txtAppTitle, theListToWrite, majorNumber)
		-- PARAMETERS
		-- hfsTargetPathAsText		
		--					: Hfs pathname of target file to write as text.
		-- txtAppTitle
		--					: A string or text with the title of the main script.
		--theListToWrite
		--					: the list to write 
		
		-- majorNumber
		--					: major revision number of Mac Os X: Tiger yields 4 Leopard yields 5 and so on.
		
		-- RETURNS: a list or null
		-- if it returns null, then there are serious problems.
		-- you must use the returned list for further work.
		script o
			property l : theListToWrite
		end script
		script theError
			property errval : 0
		end script
		local theResult
		script fileWriter
			
			on writeutf8File(hfsTargetPathAsText, theList, refvarStatus)
				local fRef, theText, astid
				-- insert an ending empty element at the end if not present.			
				if theList is {} or item -1 of theList is not "" then set end of theList to "" -- for ending linefeed; the {} test avoids asking for item -1 of an empty list.
				set astid to text item delimiters
				set text item delimiters to (run script "\"\\n\"") -- linefeed Thanks! to Nigel Garvey
				set theText to "" & theList -- internal representation Tiger/Leopard
				set text item delimiters to astid
				
				try
					set fRef to (open for access file hfsTargetPathAsText with write permission)
				on error e number n
					set contents of item -1 of theList to missing value -- removes empty item
					set contents of refvarStatus to n -- some errorcode
					return false
				end try
				try
					set eof fRef to 0
					
					write «data rdatEFBBBF» to fRef -- BOM Thanks! to Nigel Garvey
					write theText to fRef as «class utf8»
				on error e number n
					set contents of item -1 of theList to missing value -- removes empty item
					set contents of refvarStatus to n -- some errorcode
					try
						close access fRef
					end try -- ignore a failing close here; the write error is the one reported
					return false
				end try
				try
					close access fRef
				on error e number n
					set contents of item -1 of theList to missing value -- removes empty item
					set contents of refvarStatus to n -- some errorcode
					return false
				end try
				set text item delimiters to astid
				set item -1 of theList to missing value -- removes empty item
				return true
			end writeutf8File
		end script
		if majorNumber < 4 then
			tell me
				activate
				display alert (txtAppTitle & ":
Versions of Mac OS X earlier than 10.4.0 are unsupported: your version is 10." & majorNumber & ".x")
			end tell
			return null
		end if
		set theResult to fileWriter's writeutf8File(hfsTargetPathAsText, o's l, a reference to theError's errval)
		if theResult is false then
			-- should have localization here!
			tell me
				activate
				display alert (txtAppTitle & ":
The file : " & hfsTargetPathAsText & " got the error number : " & theError's errval)
			end tell
			return null
		end if
		return o's l's strings
	end writeUtf8FromList
end script


Best Regards

McUsr

Hello.

I have updated the handler readUtf8IntoList to “shave off” any trailing empty lines in a way that is both better and faster.
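The shaving rule itself is simple: empty items at the end of the list are dropped, empty items in the middle are kept. As a rough illustration only (not the AppleScript code in the handler), here is the same rule as an awk filter, assuming LF line endings as written by writeUtf8FromList; strip_trailing_blank_lines is a hypothetical name.

```shell
#!/bin/sh
# strip_trailing_blank_lines [FILE...]: copy input to stdout, omitting
# the empty lines at the end; empty lines elsewhere are kept.
strip_trailing_blank_lines() {
    awk '
        { line[NR] = $0 }                 # remember every line
        END {
            n = NR
            while (n > 0 && line[n] == "") n--   # back up over trailing empties
            for (i = 1; i <= n; i++) print line[i]
        }
    ' "$@"
}
```

Lines containing only spaces are not treated as empty, just as the handler compares each item with "".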

Best Regards

McUsr

Hello.

I have fixed some errors concerning error handling and empty files in readUtf8IntoList.

Best Regards

McUsr