utf-8 and open location

If anyone comes to this thread looking for how to trigger an AppleScript from Safari, here is a link that explains how:

http://www.macosxautomation.com/applescript/linktrigger/index.html

(Edited out an incorrect solution that I thought I had found.)

Also:

http://www.occupantunknown.com/missinglink/missinglink.html

Peter B.


If you type mailto://@.com into Safari’s address bar, Mail opens and it works correctly. So the problem is something in AppleScript and how I am handling the variable.

It doesn’t work with my default mail client, which suggests it’s up to the receiving application to know how to handle whatever’s in URLs passed to it.

If you’re only expecting a few exotic characters, you can write your own look-up table into the script and use AppleScript’s text item delimiters to replace instances of, say, “%EF%A3%BF” with the Apple logo character (U+F8FF), or “%20” with " ".

Otherwise, the Satimage OSAX has a command which unencodes URLs:

unescapeURL "awayToOpen://%EF%A3%BF"
--> "awayToOpen://"

There are no doubt other ways, including writing your own interpreter…
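For illustration only, here is the look-up table idea as a rough Python sketch rather than AppleScript with text item delimiters. The table contents and the name decode_with_table are made up for this example; it only handles the escapes you list, so it is no substitute for a real decoder.

```python
# Hypothetical look-up table: each percent-escape maps to the text it stands for.
LOOKUP = {
    "%EF%A3%BF": "\uf8ff",  # U+F8FF, shown as the Apple logo on a Mac
    "%20": " ",
}


def decode_with_table(url: str, table: dict = LOOKUP) -> str:
    """Replace every escape listed in the table; anything else is left alone."""
    for escaped, plain in table.items():
        url = url.replace(escaped, plain)
    return url


if __name__ == "__main__":
    print(decode_with_table("My%20Hard%20Drive"))  # My Hard Drive
```

An escape that is missing from the table simply passes through unchanged, which is why this only suits the "few exotic characters" case.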

Thank you, this pointed me in the right direction. Someone posted a way to do this with a Ruby shell command:

on decode_text(encodedstring)
	do shell script "echo " & quoted form of ¬
		encodedstring & " | /usr/bin/ruby -r cgi -e \"print CGI.unescape(STDIN.read).gsub('+',' ')\""
end decode_text

http://www.macscripter.net/viewtopic.php?id=34479

Scott,

This is a URL-encoding problem. The encoding in question is the one described in RFC 3986. The last paragraph of section 2.5 says that text from the Universal Character Set should first be encoded as UTF-8 octets, and that every octet which is not an unreserved character should then be percent-escaped. This is because URLs are still US-ASCII encoded today according to RFC 3986. So the data that comes into your script has to be decoded (as Mail did for you automatically).

rawURLEncode("") --result:"%EF%A3%BF"
rawURLDecode("%EF%A3%BF") --result:""

on rawURLEncode(str)
	return do shell script "/bin/echo -n " & quoted form of str & " | php -r ' echo rawurlencode(fgets(STDIN)); '"
end rawURLEncode

on rawURLDecode(str)
	return do shell script "/bin/echo -n " & quoted form of str & " | php -r ' echo rawurldecode(fgets(STDIN)); '"
end rawURLDecode
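For anyone who wants to check the RFC 3986 rule outside AppleScript, here is a small Python sketch of the same encode/decode pair. It is only an illustration: the names raw_url_encode and raw_url_decode are mine, chosen to mirror the handlers above, and it assumes Python 3's standard urllib.parse module.

```python
from urllib.parse import quote, unquote

APPLE_LOGO = "\uf8ff"  # private-use code point U+F8FF, shown as the Apple logo on a Mac


def raw_url_encode(text: str) -> str:
    """Encode text as UTF-8 octets, then percent-escape all non-unreserved octets."""
    # safe="" means no reserved character (not even "/") is left unescaped.
    return quote(text, safe="", encoding="utf-8")


def raw_url_decode(text: str) -> str:
    """Turn %XX escapes back into octets and decode those octets as UTF-8."""
    return unquote(text, encoding="utf-8")


if __name__ == "__main__":
    print(raw_url_encode(APPLE_LOGO))                 # %EF%A3%BF
    print(raw_url_decode("%EF%A3%BF") == APPLE_LOGO)  # True
```

The single character becomes three escapes because its UTF-8 form is three octets (EF, A3, BF), which is exactly what the RFC paragraph describes.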

Thanks, DJ Bazzie Wazzie. That gives me a better understanding of what is going on. Is there an advantage to using the PHP version over the Ruby one that I posted, other than that you have an example of both encode and decode?

The actual reason for my post (as you can see from the timestamps of our posts) is that you posted while I was still writing mine.

Yes, I see that now. Thanks again. It was extremely helpful, and I think I’ll use PHP anyway, as I then have a working example of both encode and decode.

Oh well. I have a vanilla version for this. :smiley:

These handlers come from the AppleScript Guidebook, which is long gone. I have upgraded the handlers to use id instead of ASCII number.

The routines not marked “not from the guidebook” are copyright Apple Inc. I sense I may have done something wrong when I started using character id instead of ASCII number, but ASCII number is deprecated.

The rationale for the vanilla solution is a thought that has gradually matured in me. I believe these handlers drain less battery than having osaxen in memory 24/7. If somebody knows for sure that I am wrong about this, please do enlighten me! :slight_smile:

Edit

I have learned that this code really doesn’t work well with UTF-8 when the characters are outside the usual 7-bit ASCII range. Use it at your own risk!


script URLLib
	property allowed_URL_chars : (characters of "$-_.+!*'(),1234567890abcdefghijklmnopqrstuvwxyz")
	property hex_list : (characters of "0123456789ABCDEF")
	
	--	return encode_URL_string("Encode me! <steve@mac.com>")
	
	on isAvalidHtmlFileUrl(theUrl)
		-- not from the guidebook
		local ok, astid
		log "here"
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to ":"
		if not text item 1 of theUrl is "file" then
			set AppleScript's text item delimiters to astid
			return false
		end if
		set AppleScript's text item delimiters to "."
		if not text item -1 of theUrl is "html" then
			set AppleScript's text item delimiters to astid
			return false
		end if
		
		set AppleScript's text item delimiters to astid
		return true
	end isAvalidHtmlFileUrl
	
	
	on decodefurl(anUrlFromABrowser)
		-- not from the guidebook
		-- 27/08/12 Tested!
		-- converts escaped chars back to normal
		-- removes "file://" and "localhost"
		-- "localhost", if present, is at the very beginning
		local tmpUrl
		
		set tmpUrl to my decode_text(anUrlFromABrowser)
		
		set tmpUrl to my privateHandlers's str_replace({substringToReplace:"file://", replacementString:"", OriginalString:tmpUrl})
		
		if (offset of "localhost" in tmpUrl) is 1 then set tmpUrl to text 10 thru -1 of tmpUrl
		
		return tmpUrl
	end decodefurl
	
	
	on encode_URL_string(this_item)
		set character_list to (characters of this_item)
		repeat with i from 1 to number of items in character_list
			set this_char to item i of character_list
			if this_char is not in allowed_URL_chars then set item i of character_list to my encode_URL_char(this_char)
		end repeat
		return character_list as string
	end encode_URL_string
	
	on encode_URL_char(this_char)
		--	set ASCII_num to (ASCII number this_char)
		set ASCII_num to (id of this_char)
		return ("%" & (item ((ASCII_num div 16) + 1) of hex_list) & (item ((ASCII_num mod 16) + 1) of hex_list)) as string
	end encode_URL_char
	
	(* decode_char
AppleScript Guidebook: Essential Text Decoding Routines

The following sub-routines can be used to decode previously encoded text.

decode_chars("%24")
--> "$"
*)
	
	-- this sub-routine is used to decode a three-character hex string
	on decode_chars(these_chars)
		copy these_chars to {identifying_char, multiplier_char, remainder_char}
		-- use a local name here, so the script's hex_list property is not overwritten
		set the hex_digits to "123456789ABCDEF"
		if the multiplier_char is in "ABCDEF" then
			set the multiplier_amt to the offset of the multiplier_char in the hex_digits
		else
			set the multiplier_amt to the multiplier_char as integer
		end if
		if the remainder_char is in "ABCDEF" then
			set the remainder_amt to the offset of the remainder_char in the hex_digits
		else
			set the remainder_amt to the remainder_char as integer
		end if
		set the ASCII_num to (multiplier_amt * 16) + remainder_amt
		return (character id ASCII_num)
		-- 	return (ASCII character ASCII_num)
	end decode_chars
	
	(* decode_text

AppleScript Guidebook: Essential Text Decoding Routines

decode_text("My%20Hard%20Drive")
--> "My Hard Drive"

*)
	
	-- this sub-routine is used to decode text strings
	on decode_text(this_text)
		set flag_A to false
		set flag_B to false
		set temp_char to ""
		set the character_list to {}
		repeat with this_char in this_text
			set this_char to the contents of this_char
			if this_char is "%" then
				set flag_A to true
			else if flag_A is true then
				set the temp_char to this_char
				set flag_A to false
				set flag_B to true
			else if flag_B is true then
				set the end of the character_list to my decode_chars(("%" & temp_char & this_char) as string)
				set the temp_char to ""
				set flag_A to false
				set flag_B to false
			else
				set the end of the character_list to this_char
			end if
		end repeat
		return the character_list as string
	end decode_text
	
	(* 
Text Encoding Sub-Routine

This sub-routine is used in conjunction with the encoding characters sub-routine to encode spaces and high-level ASCII characters (those above 127) in passed text strings. There are two parameters which control which characters to exempt from encoding.

The first parameter, encode_URL_A, is a true or false value which indicates to the sub-routine whether to also encode most of the special characters reserved for use by URLs.

In the following example the encode_URL_A value is false, thereby exempting the asterisk ( * ) character, which has a special meaning in URLs, from the encoding process. Only spaces and high-level ASCII characters, like the copyright symbol, are encoded.

encode_text("*smith-wilson© report_23.txt", false, false)
--> "*smith-wilson%A9%20report_23.txt"

In the following example the encode_URL_A parameter is true and the asterisk character is included in the encoding process.

encode_text("*smith-wilson© report_23.txt", true, true)
--> "%2Asmith%2Dwilson%A9%20report%5F23%2Etxt"

In the following example the encode_URL_B is false, thereby exempting periods (.), colons (:), underscores (_), and hyphens (-) from encoding.

encode_text("*annual smith-wilson_report.txt", true, false)
--> "%2Aannual%20smith-wilson_report.txt"

AppleScript Guidebook: Essential Text Encoding  Routines



-- this sub-routine is used to encode text

*)
	
	
	on encode_text(this_text, encode_URL_A, encode_URL_B)
		set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
		set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
		set the URL_B_chars to ".-_:"
		set the acceptable_characters to the standard_characters
		
		if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
		if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
		set the encoded_text to ""
		repeat with this_char in this_text
			if this_char is in the acceptable_characters then
				set the encoded_text to ¬
					(the encoded_text & this_char)
			else
				set the encoded_text to ¬
					(the encoded_text & encode_char(this_char)) as string
			end if
		end repeat
		return the encoded_text
	end encode_text
	
	(*
Character Encoding Sub-Routine
This sub-routine will encode a passed character. It is called by the other sub-routine examples and must be included in your script.

encode_char("$")
--> returns: "%24"

AppleScript Guidebook: Essential Text Encoding  Routines

*)
	
	-- this sub-routine is used to encode a character
	on encode_char(this_char)
		set the ASCII_num to (the ASCII number this_char)
		set the hex_list to ¬
			{"0", "1", "2", "3", "4", "5", "6", "7", "8", ¬
				"9", "A", "B", "C", "D", "E", "F"}
		set x to item ((ASCII_num div 16) + 1) of the hex_list
		set y to item ((ASCII_num mod 16) + 1) of the hex_list
		return ("%" & x & y) as string
	end encode_char
	
	-- Encoding URL
	-- This sub-routine is used in conjunction with the encoding
	-- characters sub-routine and the encoding text sub-routine to
	-- encode the text in a URL while retaining its slash-delineated
	-- format.
	-- To use, pass a URL path along with true or false values for
	-- the encode_URL_A and encode_URL_B parameters described in the
	-- encoding text sub-routine above.
	-- encode_URL("file:///My Disk/My Folder/My File.htm", true, false)
	-- --> "file:///My%20Disk/My%20Folder/My%20File.htm"
	-- AppleScript Guidebook: Essential Text Encoding Routines
	
	
	--	set a to encode_URL("file:///My Disk/My Folder/My File.htm", true, false)
	on encode_URL(this_URL, encode_URL_A, encode_URL_B)
		set this_URL to this_URL as text
		set AppleScript's text item delimiters to "/"
		set the path_segments to every text item of this_URL
		repeat with i from 1 to the count of the path_segments
			set this_segment to item i of the path_segments
			set item i of the path_segments to my encode_text(this_segment, encode_URL_A, encode_URL_B)
		end repeat
		set this_URL to the path_segments as string
		set AppleScript's text item delimiters to ""
		return this_URL
	end encode_URL
	
	
	(* Filepath to URL
This sub-routine is used in conjunction with the encoding characters sub-routine and the encoding text sub-routine to convert a filepath to an encoded URL format string.

To use, pass a file reference along with true or false values for the encode_URL_A and encode_URL_B parameters described in the encoding text sub-routine above.

filepath_to_URL(alias "My Disk:My Folder:My File", true, false)
--> My%20Disk/My%20Folder/My%20File

AppleScript Guidebook: Essential Text Encoding  Routines
*)
	
	
	on filepath_to_URL(this_file, encode_URL_A, encode_URL_B)
		set this_file to this_file as text
		set AppleScript's text item delimiters to ":"
		set the path_segments to every text item of this_file
		repeat with i from 1 to the count of the path_segments
			set this_segment to item i of the path_segments
			set item i of the path_segments to my encode_text(this_segment, encode_URL_A, encode_URL_B)
		end repeat
		set AppleScript's text item delimiters to "/"
		set this_file to the path_segments as string
		set AppleScript's text item delimiters to ""
		return this_file
	end filepath_to_URL
	
	-->	set b to getIP from "https://127.0.0.1/path/to/file/"
	-->"127.0.0.1"
	-- not from the guidebook
	to getIP from anUrl
		local a, b
		set a to offset of "//" in anUrl
		set b to offset of "/" in (text (a + 2) thru -1 of anUrl)
		set ipAddr to text (a + 2) thru (a + b) of anUrl
		return ipAddr
	end getIP
	
	
	
	script privateHandlers
		-- not from the guidebook
		on str_replace(R) -- Returns modified string
			-- R {substringToReplace: _ent, replacementString: _rent,OriginalString: _str}
			local _tids
			set _tids to AppleScript's text item delimiters
			set AppleScript's text item delimiters to R's substringToReplace
			set _res to text items of R's OriginalString as list
			set AppleScript's text item delimiters to R's replacementString
			set _res to items of _res as string
			set AppleScript's text item delimiters to _tids
			return _res
		end str_replace
		
	end script
	
end script

That’s a lot of code :mad: but which RFC does it follow?

You’ll have to read the code to figure it out.

I have tested them, and they encode and decode all right, at least for the usual characters. I have not tried them with accented characters, which I believe are still not totally accepted in URLs.

It’s code for pre-AppleScript 2.0 systems, not for today’s version, because it has no support for UTF-8.

It doesn’t work all right, and every possible character or symbol is accepted in URLs today (Mac OS X’s own URLs are UTF-8 encoded). All the encoding does is this: if a URL contains bytes with values higher than 127 (that is, outside 7-bit ASCII), each of those bytes is written out using characters whose byte values are below 128. An é in UTF-8 should become %C3%A9, and an é in ISO-8859-1 should become %E9. But %E9 is for old Mac versions.

So please add a note to your code saying that it is deprecated/old code and won’t work on today’s systems.
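To make the difference between the two encodings concrete, here is a tiny Python check. It is just a sketch that assumes Python 3's urllib.parse.quote; the variable names are made up for the example.

```python
from urllib.parse import quote

E_ACUTE = "\u00e9"  # the character é

# The same character, percent-encoded from two different byte encodings.
utf8_form = quote(E_ACUTE, encoding="utf-8")      # two octets: C3 and A9
latin1_form = quote(E_ACUTE, encoding="latin-1")  # one octet: E9

if __name__ == "__main__":
    print(utf8_form)    # %C3%A9
    print(latin1_form)  # %E9
```

A receiver that expects UTF-8 will not decode %E9 back to é, which is why the encoding used on both ends has to match.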

Hello!

Could you please share the algorithm, or the recipe, and I’ll upgrade the code!

Edit: Given time constraints, I’ll go for the Ruby solution at the moment!

By the way: I read the w3.org standard last weekend, and there wasn’t a word about higher characters being accepted in there, so I acted in good faith! :slight_smile:

In the meantime, I’ll give my code a note! (or Apple’s really)

Thanks :smiley:

Well, it worked in the ASCII number and ASCII character era because the number returned by the ASCII number command represented the byte value. Since AppleScript 2.0 we have to use string id, but those numbers don’t represent byte values. The numbers returned by string id are Unicode code points, which is quite useful in some cases; in this particular case it works against us.

The only way to get the byte values without installing additional software is to use command-line utilities like xxd. But once you’re opening a shell anyway, I would prefer Perl, PHP or Ruby, for example, to encode and decode URLs. The main reason I choose PHP is that I write most of my server software in PHP.

The algorithm itself is quite easy: handle the string as an array of bytes, and if a byte value is higher than 127, encode it. Because vanilla AS has no way to get the byte values of a string, I’m not able to give you a vanilla version, I’m sorry.
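The byte-array algorithm described above is short enough to sketch. This is a Python illustration, not vanilla AppleScript, and escape_high_bytes is a name I made up. It implements only the rule as stated (escape bytes above 127); a complete encoder would also escape spaces and the reserved ASCII characters.

```python
def escape_high_bytes(text: str) -> str:
    """Percent-escape every UTF-8 byte above 127; pass ASCII bytes through as-is."""
    out = []
    for byte in text.encode("utf-8"):  # the string as an array of byte values
        if byte > 127:
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


if __name__ == "__main__":
    e_acute = "\u00e9"                    # é
    print(ord(e_acute))                   # 233, the Unicode code point
    print(list(e_acute.encode("utf-8")))  # [195, 169], the actual byte values
    print(escape_high_bytes("caf" + e_acute))  # caf%C3%A9
```

The first two prints show the mismatch that trips up the id-based handlers: the code point is 233, but the bytes that need escaping are 195 and 169.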

Oh well.

It is important to have stuff that works, isn’t it?

I think I’ll go for Perl, as that is what I have some knowledge of, since I have to write the encoding routine as well.

I liked the A and B flags in the routine above though, as they made it easy to mimic the behaviour of OS X with regard to which characters get encoded and which don’t. But maybe I have misunderstood something.

It is funny though that the HTML standard from w3.org says that Unicode characters in URLs are not accepted. That is: anything higher than ASCII character number 127.

Does any RFC these days say that higher characters are accepted?

I can’t quote chapter and verse, but the idea of scripting additions sitting there and chewing up energy, nice mental picture though it might be, strikes me as ludicrous. If you want reasons why scripting additions are a bad idea, there are legitimate ones – you don’t need this sort of thing.

FWIW, this is a non-vanilla, non-osax method:

tell application id "au.com.myriad-com.ASObjC-Runner" -- ASObjC Runner.app
	modify string "" so it is escaped URI
	modify string "%EF%A3%BF" so it is unescaped URI
end tell

Or if you want vanilla in a Cocoa-AppleScript app:

tell current application's NSString to set x to stringWithString_("") -- make NSString
x's stringByAddingPercentEscapesUsingEncoding_(current application's NSUTF8StringEncoding)
--> 
tell current application's NSString to set x to stringWithString_("%EF%A3%BF") -- make NSString
x's stringByReplacingPercentEscapesUsingEncoding_(current application's NSUTF8StringEncoding)
--> %EF%A3%BF

You’re referring again to the Unicode number and not the byte value. W3.org meant that at the user level in that documentation, not at the technical level. At the technical level a URL is just an array of bytes, and an é is just two byte values. They are encoded not because they are Unicode but because both byte values are higher than 127. They also mention that Unicode characters are not allowed because (UTF-8) Unicode characters outside ASCII always use byte values higher than 127.

In another topic I asked you to run a piece of code, which you wouldn’t try. You should, and then try opening a folder with a special character in its name. With encoding that doesn’t support Unicode characters it won’t even open; when you use my example, which does support Unicode characters, the file will open without problems.

Actually I may have worded it wrong. What I meant (without going and finding it in the standard) is that extended characters are not permitted in URLs. That is what I got out of the w3.org standard for URLs.

Now, we both know that when you have folders or files on your disk that you want to use in hyperlinks, this scheme is totally impractical, and kind of fascistic!

Still, I do wonder in which RFC extended ASCII characters in URLs are acknowledged.

Opening with the open command was never any problem. The problem is really that Safari has declared the file URL scheme, so it handles it internally. That is how I have understood it, so no matter the characters, it won’t open the folder. I have tried with nice characters, characters below 127, and nothing really happened.

Now, to really put it to a test, I’ll encode a file with your handler, and I’ll make an anchor tag out of it, and see if that opens, I know the result up front, but just to reach a conclusion! :slight_smile:

As for your code, if you go back and read, you’ll see that I indeed tried it.

Opening a url without any encoding necessary, well that actually worked, so I guess the encoding I have used, has messed it up!

Not only those, Shane, but the SIMBLs too. ASObjC Runner is a nice exception, in that it quits after a minute of idleness.

Whether those osaxen do something or not, there are still memory pages to be administered, and that eats cycles and battery. That is my opinion, and I think I will stick with it!

Thanks for your code, consider it snagged, that is really the easy way out of it!

Hello!

Your routine performed equally badly on the folder I had problems with, Bazzie Wazzie. Looking at the properties of that folder, it was shared! When folders have the normal permissions, I guess everything will work all right.

The folder I had problems with had the same permissions for group and everyone. I think that is the obstacle, as I see Safari do something, but not enough!

I’ll use something other than the old routines from the guidebook in the future!

Thanks for your help, both of you!