Unicode characters in AppleScriptObjC

I’m trying to do some work with Unicode characters in an AppleScript-Cocoa application and have been running into some issues with various characters.

In its most basic form, the app takes text that the user has typed into a text field and writes that text to a property list file. The user can then recall the text from the property list file; I am using defaults read to get the key value written previously. When writing to the property list file, some characters are converted automatically: an ampersand gets converted to &amp;, but other characters are converted to a Unicode character code, such as ¡ being converted to \U00a1. Fixing things like the &amp; is not a problem, but I’m not understanding how to fix the Unicode characters.

When the user recalls the text from the property list file, I’d like the text to be displayed as it was originally typed in… as in I want them to see “¡” and not “\U00a1” when the text is recalled. I thought I had found a solution using «data utxt», but it’s not returning the characters I am expecting.

Since I need to account for pretty much any Unicode character, I need to use a variable to return the proper Unicode character from the character code. Something like the following works:

set currentCharacter to «data utxt00A1» as Unicode text
display dialog currentCharacter

but this does not:

set characterCode to "00A1"
-- this doesn't fail, but it also does not return the correct character:
set currentCharacter to (run script "«data utxt" & characterCode & "»") as Unicode text
display dialog currentCharacter

What am I not understanding here? Thanks in advance!

How are you writing the file? Why are you using defaults read?

I am using something like the following to write the text to the property list:

 

set aTextString to (textField's stringValue()) as text
-- header and footer are defined elsewhere and include the proper formatting for a plist file
set aTextString to aTextStringHeader & aTextString & aTextStringFooter
set someText to current application's NSString's stringWithFormat_("%@", aTextString)   
set someTextData to someText's dataUsingEncoding_(current application's NSUTF8StringEncoding)        
set filePath to "/Users/Username/Desktop/aPropertyList.plist"
set fileManager to current application's NSFileManager's defaultManager()
set theResult to fileManager's createFileAtPath_contents_attributes_(filePath, someTextData, missing value)

I am using defaults read because… well, because I haven’t attempted using NSUserDefaults with a specified domain (other than standardUserDefaults) yet. In other cases, where I am reading/writing to my app’s plist, I am using NSUserDefaults. Just doing a quick defaults read made sense to me.

I’ve looked through the documentation on NSUserDefaults but didn’t understand if specifying a domain path other than standardUserDefaults was possible, and if so, was possible without mucking things up. I then looked into CFPreferences, but was confused by that too. The plist that the user recalls data from could be located anywhere the user chose to save the plist to… and so doing "defaults read " & pathToPlist & “ ” was the only thing that made sense to me.

Would changing stringWithFormat to stringWithUTF8String make a difference?

NSUserDefaults is for dealing with defaults, not .plist files. It happens that the data gets saved as a property list, but that’s an implementation detail, and it’s not actually done by NSUserDefaults.

You shouldn’t be trying to write property list files as text files like that, either. You normally want to save either a list/array or record/dictionary as a property list, and you do that directly with their writeToFile:atomically: and arrayWithContentsOfFile:/dictionaryWithContentsOfFile: methods. It’s actually a lot simpler than what you’re trying, unless you’re going to be adding to it or trying to read a bit at a time.

But it’s not entirely clear whether you want to store values as part of the app’s preferences, or in a separate property list file.

Shameless plug: this is all covered in my book.

Hello.

An alternative to using Shane’s book and property lists could be to write an AppleScript object or list to a file. If memory serves, you should be able to serialize an object, or a list/structure of objects, with AppleScript’s read and write commands. At the very least you can use the store script and load script commands from AppleScript’s native Scripting Additions (used within a tell me block to make them work, almost regardless of context).

I don’t see anything inherently wrong in using user defaults, by the way. And if you want to create property list files by yourself, then you should read up on the NSCoding protocol and NSKeyedArchiver/NSKeyedUnarchiver, together with NSData, and maybe read the Property List Programming Guide as well.

If you are to go down that road, then Shane’s book will save you hours. :slight_smile:

Edit

Another alternative for writing out property list files is of course the Property List Suite of System Events.

Thanks for the suggestions! I understand that creating/writing to the plist in this way is not the correct way to do it, as in, there are other tools which are designed for this task. May I ask though, are there specific reasons why one should not use fileManager’s createFileAtPath_contents_attributes_(filePath, someTextData, missing value) to create/write to the plist? Would the issue I am having with Unicode characters be an example of a repercussion of doing so?

It’s probable that my understanding of the defaults system is lacking, and thus I am incorrectly assuming that something like do shell script "defaults read " is related/similar to NSUserDefaults’s standardUserDefaults()'s objectForKey_().

I apologize for being so vague about what I am doing with the plists. I’m working with launch agents, and the type of information which is written to the launch agent can vary. Some launch agents the user creates might have a ProgramArguments key which is just a path to an executable, while others might be a multi-line shell script. Some launch agents might have a WatchPaths key while others don’t, and so on… I needed a way to combine the choices the user has made in the UI of my app, create the appropriate keys from those choices, and write them to the launch agent plist. If the user later goes back to edit the launch agent, my app has to figure out what kind of launch agent the user created and do a bunch of stuff (selecting items at index in drop-down buttons, hiding/showing different UI elements, and setting string values of text fields) based on reading the information in the launch agent.

I had been waiting for v5 of your book, Shane, but I just went ahead and purchased it :slight_smile: With all the help you’ve provided, I feel like I owe you. Thanks for your help. I’m sure your book will be a great help as well.

Out of curiosity, does anyone know why the following doesn’t work:

set characterCode to "00A1"
-- this doesn't fail, but it also does not return the correct character:
set currentCharacter to (run script "«data utxt" & characterCode & "»") as Unicode text
display dialog currentCharacter

The issue is that you shouldn’t really be treating property list files as text files.

I suspect what you are seeing is because createFileAtPath_contents_attributes_ requires raw data, not text. You could convert the text to data, and your Unicode problems would disappear. But that’s not really what you want to do…


They are related. But neither of them actually creates a property list file – they communicate with a separate process, and that process does the reading and writing of preference files. So if Apple decided to use a different format, or to save the files elsewhere as they did with sandboxed apps, NSUserDefaults and defaults would work exactly the same. The point is that they are for reading/storing preferences; how they get stored on disk is an implementation detail. And it means they’re no use for writing any other property list file.

OK. So if you read an existing launch agent using NSDictionary’s dictionaryWithContentsOfFile_, you’ll see it is effectively a record/dictionary. So what you want to do is build a suitable record/dictionary based on the user’s choices, and then use writeToFile_atomically_ to write it to a property list file. So the actual reading/writing part is very simple.
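As a rough sketch of that round trip (not a definitive implementation): the path, label and key values below are made up for illustration, and the script assumes OS X 10.10 or later so that use framework works in an ordinary script. In an Xcode-based app you would drop the use statements and call the same methods from your app delegate.

```applescript
use AppleScript version "2.4" -- assumption: OS X 10.10 or later
use framework "Foundation"
use scripting additions

-- Hypothetical file name and label, for illustration only
set plistPath to (POSIX path of (path to desktop)) & "com.example.sketch.plist"

-- Pipes around the labels preserve the capitalisation of the keys
set agentRecord to {|Label|:"com.example.sketch", |ProgramArguments|:{"/bin/echo", "¡Hola!"}, |RunAtLoad|:true}

-- Bridge the record to an NSDictionary and write it out as a property list
set agentDict to current application's NSDictionary's dictionaryWithDictionary:agentRecord
set didWrite to (agentDict's writeToFile:plistPath atomically:true) as boolean

-- Read the file back into a dictionary and pull one value out again
set readBack to current application's NSDictionary's dictionaryWithContentsOfFile:plistPath
set theArguments to (readBack's objectForKey:"ProgramArguments") as list
set secondArgument to item 2 of theArguments
```

Because the text never passes through defaults read, there are no \U escapes to undo when the values are read back.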

Thank you! Version 5 has stalled a couple of times, but it is still on the way… Meanwhile it will be a free upgrade to anyone who bought version 4 after it was announced.

I’m not sure, but using utxt like that is a bit of a hack. Much better to convert the hex to decimal and use “character id n”.
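A minimal sketch of that idea in plain AppleScript follows. The handler name hexToInteger is made up for this example, and it assumes AppleScript 2.0 or later, where offset honours ignoring case.

```applescript
-- Convert a hex string such as "00A1" to an integer without leaving AppleScript
on hexToInteger(hexText)
	set hexDigits to "0123456789ABCDEF"
	set total to 0
	ignoring case
		repeat with aCharacter in (characters of hexText)
			-- offset returns 0 when the character is not found, so digitValue becomes -1
			set digitValue to (offset of (contents of aCharacter) in hexDigits) - 1
			if digitValue < 0 then error "Not a hexadecimal digit: " & aCharacter
			set total to total * 16 + digitValue
		end repeat
	end ignoring
	return total
end hexToInteger

set characterCode to "00A1"
set currentCharacter to character id (hexToInteger(characterCode))
display dialog currentCharacter
```

Since the code is held in an ordinary variable, this avoids building a «data utxt» literal through run script.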

For anyone interested, I wrote this bit below to convert instances of unicode character codes back into the character the code represents. Of course, as soon as I figure out how to do things the right way, I probably won’t need this anymore. The only issue I’ve had with this script below is that it doesn’t consider case when looking for “\u” which could be an issue with a string like:

“\U00bf this is an example of what happens when you don’t consider case. Do you \UNDERSTAND?”
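One possible way to narrow that down, sketched under the assumption of AppleScript 2.0 or later (where text item delimiters honour considering and ignoring), is to split inside a considering case block. The sample string and variable names here are made up for illustration. Note that this only protects a lower-case "\u"; an upper-case "\U" followed by something that is not hex would still be split and needs a separate check.

```applescript
set sampleString to "\\U00bf Do you \\understand?"
set savedDelimiters to AppleScript's text item delimiters

considering case
	-- only an upper-case \U counts as a delimiter inside this block
	set AppleScript's text item delimiters to "\\U"
	set caseSensitiveParts to text items of sampleString
end considering

-- restore whatever delimiters were in place before
set AppleScript's text item delimiters to savedDelimiters
```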


set aString to "\\U0153\\U2211\\U00b4\\U00ae\\U2020hello\\U00a5\\U00a8\\U02c6\\U00f8\\U03c0\\U201c\\U2018\\U00e5\\U00df\\U2202world\\U0192\\U00a9\\U02d9\\U2206\\U02da\\U00ac\\U2026\\U00e6\\U03a9\\U2248\\U00e7\\U221a\\U222b\\U02dc\\U00b5\\U2264\\U2265\\U00f7"

-- split on the escape prefix (text item delimiters ignore case by default)
set AppleScript's text item delimiters to "\\u"

set delimiterCount to count text items of aString

set modifiedString to text item 1 of aString

set x to 2

repeat (delimiterCount - 1) times -- text item 1 was already handled above
	
	try
		set AppleScript's text item delimiters to "\\u"
		set currentStringPart to text item x of aString
		set AppleScript's text item delimiters to ""
		set currentStringCharacterCount to (count characters of currentStringPart)
		
		set characterCode to (characters 1 through 4 of currentStringPart) as text
		
		try
			set remainingString to (characters 5 through currentStringCharacterCount of currentStringPart) as text
		on error
			set remainingString to ""
		end try
		
		set nDec to (do shell script "perl -e 'printf(hex(\"" & characterCode & "\"))'") as number
		
		if nDec is not 0 then
		
			set currentCharacter to character id nDec
			set modifiedString to modifiedString & currentCharacter & remainingString
			
		else
		
			set modifiedString to modifiedString & "\\u" & currentStringPart
			
		end if
		
		set x to x + 1
		
	end try
	
end repeat

return modifiedString

Converting hex strings to decimals is easily done using a class called NSScanner. For example:

	set scanner to current application's NSScanner's scannerWithString:"00bf"
	set {theResult, theInt} to scanner's scanHexInt:(reference)
	character id theInt

Results in:

0000.085 [30] set scanner to current application's NSScanner's scannerWithString:"00bf"
--> <NSConcreteScanner: 0x60000047bc80>
0000.086 [31] set {theResult, theInt} to scanner's scanHexInt:(reference)
--> {1, 191}
0000.087 [32] character id theInt
--> ¿

In fact, NSScanner is also just the ticket for your job. Use something like this:

on makeTextOf:theString
	set newString to ""
	set scanner to current application's NSScanner's scannerWithString:theString -- make scanner from string
	repeat while (scanner's isAtEnd() as integer = 0) -- give up  if at end of scanner
		set theResult to scanner's scanString:"\\U" intoString:(missing value) -- look for "\\U"
		if theResult as integer = 1 then -- found \\U, so start looking for hex value
			set {theResult, theInt} to scanner's scanHexInt:(reference) -- get integer value of next hex
			set newString to newString & character id theInt -- make character and add it to string
		else -- no \\U found, so must be normal text
			set {theResult, someText} to scanner's scanUpToString:"\\U" intoString:(reference) -- get text up to next \\U
			set newString to newString & (someText as text) -- add it to string
		end if
	end repeat
	return newString
end makeTextOf:

theLib's makeTextOf:"\\U0153\\U2211\\U00b4\\U00ae\\U2020hello\\U00a5\\U00a8\\U02c6\\U00f8\\U03c0\\U201c\\U2018\\U00e5\\U00df\\U2202world\\U0192\\U00a9\\U02d9\\U2206\\U02da\\U00ac\\U2026\\U00e6\\U03a9\\U2248\\U00e7\\U221a\\U222b\\U02dc\\U00b5\\U2264\\U2265\\U00f7"

(You will probably have to translate from the Mavericks-style interleaved syntax.)

Simpler, and about 50 times faster.

This is why you write the books and I ask the questions :wink: