Hello all,
For educational reasons I will show and explain how hash tables works in it’s simplest form. There are other, and in some scenarios faster, solutions to skin this cat but that’s not the point of this topic. I would like to keep this topic plain AppleScript and has to work with hash function and the code needs to be easy to understand, any input is welcome that will meet this requirements.
For those who aren’t familiar with hash tables, a small explanation. A hash table will create an hash from it’s key, mostly a string, where the encrypted value meets an index in an array. When you have a list of key values pairs you replace the part of iterating through the list to lookup the key with a small hash function, resulting in a good performing key value pair list.
script hashTable
on newInstance()
script hashTableInstance
property parent : hashTable
property size : missing value
property hashList : missing value
property keyList : missing value
property valueList : missing value
end script
end newInstance
on init()
set my size to 0
set my hashList to {}
repeat 1000 times
set end of my hashList to {}
end repeat
set my keyList to {}
set my valueList to {}
return me
end init
on initWithKeysAndValues(_keys, _values)
set my keyList to _keys
set my valueList to _values
set my size to count my keyList
set my hashList to {}
repeat 1000 times
set end of my hashList to {}
end repeat
repeat with x from 1 to my size
set end of item hashFunction(item x of my keyList) of my hashList to x
end repeat
return me
end initWithKeyAndValues
on setValueforKey(_key, _value)
set x to hashFunction(_key)
repeat with node in item x of my hashList
if id of item node of my keyList = id of _key then
set item node of my valueList to _value
return true
end if
end repeat
set my size to (my size) + 1
set end of my valueList to _value
set end of my keyList to _key
set end of item x of my hashList to my size
end setValueforKey
on valueForKey(_key)
set x to hashFunction(_key)
repeat with node in item x of my hashList
if id of item node of my keyList = id of _key then
return item node of my valueList
end if
end repeat
return missing value
end valueForKey
on keyExists(_key)
considering case
return my keyList contains _key
end considering
end keyExists
on keys()
return my keyList
end keys
on setKeys(_keys)
set _keys to every text of _keys
if (count _keys) = (count my keyList) then
set my keyList to _keys
--need to re-hash
set my hashList to {}
repeat 1000 times
set end of my hashList to {}
end repeat
repeat with x from 1 to my size
set end of item hashFunction(item x of my keyList) of my hashList to x
end repeat
end if
end setKeys
on valueExists(_value)
return my valueList contains _value
end valueExists
on values()
return my valueList
end values
on setValues(_values)
if (count _values) = (count my valueList) then set my valueList to _values
end setValues
on count
return my size
end count
on hashFunction(_key)
set _hash to count _key
--Credits to Shane Stanly here, supports single character keys now.
repeat with char in (id of _key as list)
set _hash to _hash + char
end repeat
return _hash mod 1000 + 1
end hashFunction
end script
The methods:
-
newInstance() - class method
create an instance of hashtable -
init() --instance method
Initialize the instance, all values and size will be set to 0 -
initWithKeysAndValues(_keys, _values) --instance method
Initialize the instance, and all given values are set -
setValueforKey(_key, _value) - instance method
Insert a key and value into the list; if a key already exists then it will be updated with the value
Keys are case sensitive -
valueForKey(_key) - instance method
Return the associated value by key; if a given key doesn’t exists it returns missing value
Keys are case sensitive -
keyExists(_key) - instance method
Return a boolean whether a key already exists or not; note this method is case sensitive -
keys() instance method
Return a list with all the keys -
setKeys(_keys) - instance method
Set new keys for all the values. The keys must be a all of type string and length should match the number of current keys
Problem with big lists is that changing the keys that the whole table needs to be re-hashed -
valueExists(_value) - instance method
Return a boolean whether a value is already in the list or not. -
values() - instance method
Return a list with all the values -
setValues(_values) - instance method
set the values for all keys. Can be usefull when you have a fixed list with a fixed key table -
count - instance method
Returns the number of items in the list -
hashFunction(_key) - class method
The function that returns the associated item in the hash table
Note: Difference between instance method and class method is that class method should be called from the hashtable object immediatly and instance method should be called from the object returned by the newInstance class method.
USAGE:
First we can create an empty list by using:
set dict to hashTable's newInstance()'s init()
Now we’ve created a instance that is initialized and contains no data. To create a list with values you can use:
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
now to lookup an value you can use:
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's valueForKey("Second")
Which returns “Green” but keep in mind that the key’s are case sensitive. So when I lookup “second” it would return missing value instead of Green. To avoid this behavior you should capitalize or uncapitalize the key string matches and hashes function to make this case insensitive.
To add an value we should use:
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setValueforKey("Last", {1, 2, 3, 5, 6, 7, 8})
dict's valueForKey("Last")
As you can see we can insert other object that string only. You can add any kind of basic AppleScript objects you want like string, numbers, integer, reals, records, lists and data.
To update an item we use the same code as inserting one. When an key exists the old associated value will be overwritten, this makes it also impossible to have two of the same keys. This is a feature not a bug :D.
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setValueforKey("Second", "Yellow")
dict's valueForKey("Second")
when you don’t want to overwrite you can use keyExists(_key) method to check if a key exists before you want to set/add the key value pair
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
if not dict's keyExists("Second") then
dict's setValueforKey("Second", "Yellow")
end if
dict's valueForKey("Second")
You can also set immediatly the values without rehashing which can be usefull for things like profiles (fixed key and value list)
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setValues({"Cyan", "Magenta", "Yellow"})
dict's valueForKey("Second")
The same can be applied for keys but remember that the list of keys can only contain strings and it needs to rehash the key table. So as long as dictionary contains less than 3000 items there is no problem/lag but when it’s 10,000 it needs some time to hash.
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setKeys({"Color 1", "Color 2", "Color 3"})
dict's valueForKey("Color 3")
At last, and maybe weird is that count command. Some commands can be overwritten with script objects so you can use the count command for the instance:
set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
count of dict
count dict --is also correct
number of dict --can also be used
So I hope this is enough information and having fun with it!
Already some FAQ answers
Why an hash table size of 1000?
The main reason for that is AppleScript seems to be comfortable with arrays containing up to 1000 items. While the lookup for the hash wouldn’t have any problem with a list containing 100,000 items, creating them is so slow you wouldn’t use this script object. So creating a bigger hash table is only overhead and initialization would take longer.
Shouldn’t it be smaller then?
No because the hash function will output an index by it’s key value, the larger the hash table the less collisions we have (multiple indexes in the node) and the faster the item will be returned.
Aren’t there other/better hash tables?
Definitely! There are plenty different solutions but I wanted it to keep simple as possible so the code speaks for itself. When someone knows the trick he can create more complex hash function resulting in a better hash code. However, since 10.10 ASObjC is supported in AppleScript so I would use an NSDictionary instead. Again this is only for educational purpose.
What’s the main advantage of hash tables?
Apart from being an associative array the main advantage of hash tables is that no matter how large a list gets, it should only affect the lookup performance for a tiny bit (read: collisions). When you look up data from a list containing 100 items, a repeat loop will be fast. But when you need to loop through an list of 100,000 items for a couple of times it would be a lot more data handling. So when the list contains 100 or 100,000 the performance difference should be very small.
Does the ‘performance’ advantage also applies to this hash table?
Keep in mind that AppleScript and it’s foundation is all written in C, it’s compared to AppleScript a very high performance programming languages. That implies that self written implementations in plain AppleScript is slower than AppleScript’s own implementations written in C (not using an AppleEvent). So compared to lists and records this is slow, but when you lookup values by iterating in plain AppleScript this performance advantage does apply also in this simple hash table.
So why should I use an hash table when I can’t use it in a way as in other programming languages?
Well like in my first line, this post is for educational purposes. Another thing is that even creating a list containing 10,000 items are slow, you can still lookup an item in a blink of an eye. There are many cases records or other alternatives are better but this hash table can lookup keys for existence and their keys doesn’t need to be set at compile time which is required for records.
Why is it case sensitive?
Some programming languages does support associative arrays with case insensitive keys but I prefer case sensitive and using lower case key names. It’s something personal and also in this context quite easier otherwise I had to convert the keys to lower or uppercase first in the hash function so that the outcome of “KEY” and “key” are the same.
Edit 1: Shane Stanley suggested to change the loop to support single character keys
Edit 2: Nigel suggested to change the local variable name ‘hash’ because it collide with satimage users. Therefor I changed the local variable ‘hash’ into ‘_hash’. I also changed dictionary into dict
Edit 3: Shane pointed me to a translation error I made. The method initWithKeysAndValues is now spelled correctly
Edit 4: Updated some information because some of it doesn’t apply or AppleScript provides better alternatives by itself now.