A simple Alternative to a small hash table

Hello.

This is something I came up with in order to assign a number to names, like an index if you like for lookup purposes, so I can get back the name, by the index number I retrieved when I stored the name into the table. -I also needed the id number to be static, and independent of any deleted/inserted items, which is whay I use a table by design, and not a regular AppleScript list, where the item numbers will vary with insertions and deletions.

It is a replacement for a hash-table, and is is calibrated to hold less than 100 items, although it will search for a free spot (deleted item), should it be full, before it errors. It has a much better performance than a hash table of that size, the downside is that the keys aren’t encrypted in anyway, but that is outside of my requirements.

I use this for translating names into indicies, since some algorithms are better at working with indicies than names, (like matrices of various kinds, adjacency matrices in my case).

Why not use a hash table?
Because, I calculated the probabilities for a collision on such a small table, and I found the probability for a collison to be almost 100%, with a filling of 50 elements, for a table size of 100.

The formulae for the probability of a hash collision is 1 - e^(-k(k-1)/2N) where k is the number of elements, and N is the size of the hash key. For 15 elements, this yields a probability of a hash collison of 0.56 ~ 56%, for 50 elements, the probability is almost 100%.

The hash key I used was really much larger than 7bits, but if you are setting 100 elements as the max number of elements, then 100 is the 7 bit number we have to relate to, since we have to use the hash key modulus 101, to get elements between 1 and 100, which is what the hash keys will be “concentrated” down to (a span of values over 7 bits, since we need 7-bits to represent the decimal number 100 as a binary number).

Reference:
Hash Collision Probabilities

on makeidTable()
	script idTable
		-- 100 empty strings
		property table : {"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""}
		property elmPtr : 1
		on assignID(aName)
			if elmPtr < (length of my table) then
				set item elmPtr of my table to aName
				set assignedId to elmPtr
				set elmPtr to elmPtr + 1
				return assignedId
			else
				set emptyItem to false
				repeat with i from 1 to (length of my table)
					if item i of my table is "" then
						set emptyItem to i
						exit repeat
					end if
				end repeat
				if emptyItem is not false then
					set item emptyItem of my table to aName
					return emptyItem
				else
					error "assignID: the table was full, holding " & my elmCount & " elements."
				end if
			end if
		end assignID
		on remvName for anId
			set item anId of my table to ""
		end remvName
		
		on aName for anId
			return item anId of my table
		end aName
	end script
end makeidTable

set ids to makeidTable()
set idlist to {}
set end of idlist to ids's assignID("London")
set end of idlist to ids's assignID("Oslo")
set end of idlist to ids's assignID("Sidney")
set end of idlist to ids's assignID("Zurich")
set end of idlist to ids's assignID("Geneva")
set end of idlist to ids's assignID("Paris")
set end of idlist to ids's assignID("Rome")
set end of idlist to ids's assignID("Torino")
set end of idlist to ids's assignID("Vienna")

set aCity to aName of ids for 4
--> "Zurich"

Edit
2015-04-25 22:45 CET: I have edited the documentation so that it should be a bit clearer, what it is, the purpose, and why the design.

I like the code but I think it differs too much from an hash table to consider it as an alternative.

p.s. Since you mentioned collision issues, here is how I solved it in my hash-like-table code exchange.

Hello.

Thanks. :slight_smile:

I made it as an alternative to using a hash-table, to cover my needs for a datastructure to let me easily relate a string value to an integer, with the lowest possible overhead.

It is a constant time lookup table, with a very low probability of collision detection/handling. So it is an alternative, in the sense that one of the purposes of a hash table, is to provide constant time lookup of values. There are no generation of encrypted keys here, which may also be viewed as a purpose of having a hash table, but that I didn’t need, I just needed a way to relate a key made up of a string into an integer, with the smallest possible probability of a collision.

It is also much simpler than a hash table, in that there really isn’t any way to regenerete the hash value for a key. Here you’ll have to keep the “id” you get for the value, also since inserting the same value twice, will generate two different “ids”, so the idea here is from whence you have your “id”, you’ll use it, until you need the name of the object, which you retrieve via the “id”.

Hello.

I added a layer of datavalidation, to my original values, and then a hash table didn’t seem to be such a bad idea after all, because I wanted to ensure that a key existed in my table. The above is more of a one way street, as for now I need to verify that a name is in my table, before I start to using any keys. (The former approach, just assumed a user was happy with getting a key returned for a name.)

Now it is a small, yet fully fledged hash table with collision detection.

I did receive 19 collisions, during entry of the 58 city names, in the driver (math in post #1). I think it still will perform good enough for my usage.

on newHashTable()
	-- http://macscripter.net/viewtopic.php?pid=180425#p180425
	-- New version, improved comments.
	-- Removed bug, computes hashkey correctly  when key consists of a single character.
	script hashTable
		-- 100 empty strings
		property table : {"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""}
		-- table should be made to be an even number, so that table size + 1 is a prime
		property tableSize : (count table)
		
		on makeInitalHashKey for aValue
			tell aValue
				set clist to its characters
				set keysize to its length
			end tell
			
			set h to id of item 1 of clist
			if keysize > 1 then
				repeat with i from 2 to keysize
					set h to (h * 32 + (id of item i of clist)) mod ((my tableSize) + 1)
					-- horners method
				end repeat
			else
				set h to h mod ((my tableSize) + 1)
			end if
			
			if h = 0 then set h to 1
			return h
			
		end makeInitalHashKey
		
		on genKey for aValue
			-- Inserts a value into the hash-table, and returns
			-- the hash-key that can be used to retrieve it.
			
			-- You gotta test this, or test the output
			-- to ensure you got just unique values
			
			-- This is based on Sedgewick 2nd ed.  p. 233
			set h to makeInitalHashKey for aValue
			
			if item h of my table is "" then
				set item h of my table to aValue
				return h
			else if item h of my table is aValue then
				return h
			else
				-- **			log "Collided name = " & aValue & " h = " & h
				-- collision detection/handling 
				set foundSlot to false
				set h to h + 1
				repeat ((my tableSize) - 1) times
					set h to h mod ((my tableSize) + 1)
					if h is 0 then set h to 1
					if item h of my table is "" then
						set foundSlot to true
						set item h of my table to aValue
						exit repeat
					else
						set h to h + 1
					end if
				end repeat
				if not foundSlot then
					error "genKey: Table is full"
				else
					-- **					log h
					return h
				end if
			end if
		end genKey
		
		on keyExists for aValue
			return aValue is in my table
		end keyExists
		
		on aKey for aValue
			-- returns an already generated key for a value.
			-- key-exists should be run first
			set h to makeInitalHashKey for aValue
			if item h of my table is aValue then
				return h
			else
				-- collision detection/handling 
				set foundSlot to false
				set h to h + 1
				repeat ((my tableSize) - 1) times
					set h to h mod ((my tableSize) + 1)
					if h is 0 then set h to 1
					if item h of my table is aValue then
						set foundSlot to true
						exit repeat
					else
						set h to h + 1
					end if
				end repeat
				if not foundSlot then
					error "aKey: Table doesn't contain that key."
				else
					return h
				end if
			end if
		end aKey
		
		on rmValue for aKey
			-- This handler assumes that the hashkey exists!
			set item aKey of my table to ""
		end rmValue
		
		on aValue for aKey
			-- This handler assumes that the hashkey exists!
			return item aKey of my table
		end aValue
	end script
end newHashTable

on driver()
	-- be sure to remove the comments in front of the log statements!
	
	
	set htbl to makeHashTable()
	set klist to {}
	
	
	
	set end of klist to genKey of htbl for "Trondheim"
	set end of klist to genKey of htbl for "London"
	set end of klist to genKey of htbl for "Oslo"
	set end of klist to genKey of htbl for "Sidney"
	set end of klist to genKey of htbl for "Zurich"
	set end of klist to genKey of htbl for "Geneva"
	set end of klist to genKey of htbl for "Paris"
	set end of klist to genKey of htbl for "Rome"
	set end of klist to genKey of htbl for "Torino"
	set end of klist to genKey of htbl for "Vienna"
	set end of klist to genKey of htbl for "Haag"
	set end of klist to genKey of htbl for "Bern"
	set end of klist to genKey of htbl for "Amsterdam"
	set end of klist to genKey of htbl for "Copenhagen"
	set end of klist to genKey of htbl for "Bristol"
	set end of klist to genKey of htbl for "Las Vegas"
	set end of klist to genKey of htbl for "Brighton"
	set end of klist to genKey of htbl for "Arendal"
	set end of klist to genKey of htbl for "Divonne les Bains"
	set end of klist to genKey of htbl for "Aarhus"
	set end of klist to genKey of htbl for "Munich"
	set end of klist to genKey of htbl for "Strasbourg"
	set end of klist to genKey of htbl for "New Jersey"
	set end of klist to genKey of htbl for "Wellington"
	set end of klist to genKey of htbl for "Canberra"
	set end of klist to genKey of htbl for "Stockholm"
	set end of klist to genKey of htbl for "Gothenburg"
	set end of klist to genKey of htbl for "Houston"
	set end of klist to genKey of htbl for "Galveston"
	set end of klist to genKey of htbl for "Miami"
	set end of klist to genKey of htbl for "Vera Cruz"
	set end of klist to genKey of htbl for "Sacramento"
	set end of klist to genKey of htbl for "Los Angeles"
	set end of klist to genKey of htbl for "San Fransisco"
	set end of klist to genKey of htbl for "Tempe"
	set end of klist to genKey of htbl for "Bogota"
	set end of klist to genKey of htbl for "Buenos Aires"
	set end of klist to genKey of htbl for "Brugge"
	set end of klist to genKey of htbl for "Florence"
	set end of klist to genKey of htbl for "Vienna"
	set end of klist to genKey of htbl for "Rotterdam"
	set end of klist to genKey of htbl for "Leiden"
	set end of klist to genKey of htbl for "Glasgow"
	set end of klist to genKey of htbl for "Aberdeen"
	set end of klist to genKey of htbl for "Liverpool"
	set end of klist to genKey of htbl for "Portsmouth"
	set end of klist to genKey of htbl for "Cardiff"
	set end of klist to genKey of htbl for "Karlstad"
	set end of klist to genKey of htbl for "Hjørring"
	set end of klist to genKey of htbl for "Dresden"
	set end of klist to genKey of htbl for "Hamburg"
	set end of klist to genKey of htbl for "Kiel"
	set end of klist to genKey of htbl for "Frankfurt"
	set end of klist to genKey of htbl for "Toulouse"
	set end of klist to genKey of htbl for "Madrid"
	set end of klist to genKey of htbl for "Barcelona"
	set end of klist to genKey of htbl for "Tampa"
	set end of klist to genKey of htbl for "Seattle"
	set end of klist to genKey of htbl for "Carson"
	-- 19 collisions during generation of 58 keys
	
	set aName to aValue of htbl for 32
	return klist
end driver

Hello.

Just for the hell of it, I checked how many equal random numbers between 1 and 100 I got, when I generated 58 random numbers at a time, I got on the average 13 collisions. This means that my hash function doesn’t perform that bad at all, even though 19 collisions may sound much, it is totally in line with the math, described in the post #1.

Hello.

I removed a bug, that took effect in the makeHashTable script in post #4, when the key just consisted of one character. I also refactored out a makeInitalHashKey for aValue handler, and changed the script name inside the newHashTable to be more descriptive: “hashTable”. :slight_smile:

I should also mention, just in case, that the keys are case sensitive: Toy and ToY generates two different keys, if you don’t take any precations. That is the user of the script’s responsibility, to handle.

Hello.

For those interested in creating a hashtable large enough to avoid any collisions, before the number of stored keys reaches some treshold, here is the formula:

It turns out that by calculus, a function is generated for the number of elements a hashtable can contain before the probability of a collision reaches 1/2.

The formula is:

m is the number of keys that can be stored in the hashtable. -The hashtable size.
N that is returned, yields the number of keys that can be inserted into the hashtable of size m, before a probability of 1/2, for a collision, is reached. :slight_smile: