Friday, November 24, 2017

#1 2015-04-25 03:08:01 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

A simple Alternative to a small hash table

Hello.

This is something I came up with in order to assign a number to names, like an index if you like for lookup purposes, so I can get back the name, by the index number I retrieved when I stored the name into the table.  -I also needed the id number to be static, and independent of any deleted/inserted items, which is whay I use a table by design, and not a regular AppleScript list, where the item numbers will vary with insertions and deletions.

It is  a replacement for a hash-table, and is is calibrated to hold less than 100 items, although it will search for a free spot (deleted item), should it be full, before it errors. It has a much better performance than a hash table of that size, the downside is that the keys aren't encrypted in anyway, but that is outside of my requirements.

I use this for translating names into indicies, since some algorithms are better at working with indicies than names, (like matrices of various kinds, adjacency matrices in my case).

Why not use a hash table?
Because, I calculated the probabilities for a collision on such a small table, and I found the probability for a collison to be almost 100%, with a filling of 50 elements, for a table size of 100.

The formulae for the probability of a hash collision is 1 - e^(-k(k-1)/2N) where k is the number of elements, and N is the size of the hash key. For 15 elements, this yields a probability of a hash collison of 0.56 ~  56%, for 50 elements, the probability is almost 100%.

The hash key I used was really much larger than 7bits, but if you are setting 100 elements as the max number of elements, then 100 is the  7 bit number we have to relate to, since we have to use the hash key modulus 101, to get elements between 1 and 100, which is what the hash keys will be "concentrated" down to (a span of values over 7 bits, since we need 7-bits to represent the decimal number 100 as a  binary number).

Reference:
Hash Collision Probabilities

Applescript:

on makeidTable()
   script idTable
       -- 100 empty strings
       property table : {"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""}
       property elmPtr : 1
       on assignID(aName)
           if elmPtr < (length of my table) then
               set item elmPtr of my table to aName
               set assignedId to elmPtr
               set elmPtr to elmPtr + 1
               return assignedId
           else
               set emptyItem to false
               repeat with i from 1 to (length of my table)
                   if item i of my table is "" then
                       set emptyItem to i
                       exit repeat
                   end if
               end repeat
               if emptyItem is not false then
                   set item emptyItem of my table to aName
                   return emptyItem
               else
                   error "assignID: the table was full, holding " & my elmCount & " elements."
               end if
           end if
       end assignID
       on remvName for anId
           set item anId of my table to ""
       end remvName
       
       on aName for anId
           return item anId of my table
       end aName
   end script
end makeidTable

set ids to makeidTable()
set idlist to {}
set end of idlist to ids's assignID("London")
set end of idlist to ids's assignID("Oslo")
set end of idlist to ids's assignID("Sidney")
set end of idlist to ids's assignID("Zurich")
set end of idlist to ids's assignID("Geneva")
set end of idlist to ids's assignID("Paris")
set end of idlist to ids's assignID("Rome")
set end of idlist to ids's assignID("Torino")
set end of idlist to ids's assignID("Vienna")

set aCity to aName of ids for 4
--> "Zurich"

Edit
2015-04-25 22:45 CET: I have edited the documentation so that it should be a bit clearer, what it is, the purpose, and why the design.

Last edited by McUsrII (2015-04-25 06:37:41 pm)

Offline

 

#2 2015-04-25 08:21:08 pm

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2724
Website

Re: A simple Alternative to a small hash table

I like the code but I think it differs too much from an hash table to consider it as an alternative.

p.s. Since you mentioned collision issues, here is how I solved it in my hash-like-table code exchange.

Offline

 

#3 2015-04-26 03:15:21 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A simple Alternative to a small hash table

Hello.

Thanks. smile

I made it as an alternative to using a hash-table, to cover my needs for a datastructure to let me easily relate a string value to an integer, with the lowest possible overhead.

It is a constant time lookup table, with a very low probability of collision detection/handling. So it is an alternative, in the sense that one of the purposes of a hash table, is to provide constant time lookup of values. There are no generation of encrypted keys here, which may also be viewed as a purpose of  having a hash table, but that I didn't need, I just needed a way to relate a key made up of a string into an integer, with the smallest possible probability of a collision.

It is also much simpler than a hash table, in that there really isn't any way to regenerete the hash value for a key. Here you'll have to keep the "id" you get for the value, also since inserting the same value twice, will generate two different "ids", so the idea here is from whence you have your "id", you'll use it, until you need the name of the object, which you retrieve via the "id".

Last edited by McUsrII (2015-04-26 03:28:36 am)

Offline

 

#4 2015-04-26 05:44:40 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A simple Alternative to a small hash table

Hello.

I added a layer of datavalidation, to my original values, and then a hash table didn't seem to be such a bad idea after all, because I wanted to ensure that a key existed in my table. The above is more of a one way street, as for now I need to verify that a name is in my table, before I start to using any keys. (The former approach, just assumed a user was happy with getting a key returned for a name.)

Now it is a small, yet fully fledged hash table with collision detection.

I did receive 19 collisions, during entry of the  58 city names, in the driver (math in post #1). I think it still will perform good enough for my usage.

Applescript:

on newHashTable()
   -- [url]http://macscripter.net/viewtopic.php?pid=180425#p180425[/url]
   -- New version, improved comments.
   -- Removed bug, computes hashkey correctly when key consists of a single character.
   script hashTable
       -- 100 empty strings
       property table : {"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""}
       -- table should be made to be an even number, so that table size + 1 is a prime
       property tableSize : (count table)
       
       on makeInitalHashKey for aValue
           tell aValue
               set clist to its characters
               set keysize to its length
           end tell
           
           set h to id of item 1 of clist
           if keysize > 1 then
               repeat with i from 2 to keysize
                   set h to (h * 32 + (id of item i of clist)) mod ((my tableSize) + 1)
                   -- horners method
               end repeat
           else
               set h to h mod ((my tableSize) + 1)
           end if
           
           if h = 0 then set h to 1
           return h
           
       end makeInitalHashKey
       
       on genKey for aValue
           -- Inserts a value into the hash-table, and returns
           -- the hash-key that can be used to retrieve it.
           
           -- You gotta test this, or test the output
           -- to ensure you got just unique values
           
           -- This is based on Sedgewick 2nd ed. p. 233
           set h to makeInitalHashKey for aValue
           
           if item h of my table is "" then
               set item h of my table to aValue
               return h
           else if item h of my table is aValue then
               return h
           else
               -- **            log "Collided name = " & aValue & " h = " & h
               -- collision detection/handling
               set foundSlot to false
               set h to h + 1
               repeat ((my tableSize) - 1) times
                   set h to h mod ((my tableSize) + 1)
                   if h is 0 then set h to 1
                   if item h of my table is "" then
                       set foundSlot to true
                       set item h of my table to aValue
                       exit repeat
                   else
                       set h to h + 1
                   end if
               end repeat
               if not foundSlot then
                   error "genKey: Table is full"
               else
                   -- **                    log h
                   return h
               end if
           end if
       end genKey
       
       on keyExists for aValue
           return aValue is in my table
       end keyExists
       
       on aKey for aValue
           -- returns an already generated key for a value.
           -- key-exists should be run first
           set h to makeInitalHashKey for aValue
           if item h of my table is aValue then
               return h
           else
               -- collision detection/handling
               set foundSlot to false
               set h to h + 1
               repeat ((my tableSize) - 1) times
                   set h to h mod ((my tableSize) + 1)
                   if h is 0 then set h to 1
                   if item h of my table is aValue then
                       set foundSlot to true
                       exit repeat
                   else
                       set h to h + 1
                   end if
               end repeat
               if not foundSlot then
                   error "aKey: Table doesn't contain that key."
               else
                   return h
               end if
           end if
       end aKey
       
       on rmValue for aKey
           -- This handler assumes that the hashkey exists!
           set item aKey of my table to ""
       end rmValue
       
       on aValue for aKey
           -- This handler assumes that the hashkey exists!
           return item aKey of my table
       end aValue
   end script
end newHashTable

on driver()
   -- be sure to remove the comments in front of the log statements!
   
   
   set htbl to makeHashTable()
   set klist to {}
   
   
   
   set end of klist to genKey of htbl for "Trondheim"
   set end of klist to genKey of htbl for "London"
   set end of klist to genKey of htbl for "Oslo"
   set end of klist to genKey of htbl for "Sidney"
   set end of klist to genKey of htbl for "Zurich"
   set end of klist to genKey of htbl for "Geneva"
   set end of klist to genKey of htbl for "Paris"
   set end of klist to genKey of htbl for "Rome"
   set end of klist to genKey of htbl for "Torino"
   set end of klist to genKey of htbl for "Vienna"
   set end of klist to genKey of htbl for "Haag"
   set end of klist to genKey of htbl for "Bern"
   set end of klist to genKey of htbl for "Amsterdam"
   set end of klist to genKey of htbl for "Copenhagen"
   set end of klist to genKey of htbl for "Bristol"
   set end of klist to genKey of htbl for "Las Vegas"
   set end of klist to genKey of htbl for "Brighton"
   set end of klist to genKey of htbl for "Arendal"
   set end of klist to genKey of htbl for "Divonne les Bains"
   set end of klist to genKey of htbl for "Aarhus"
   set end of klist to genKey of htbl for "Munich"
   set end of klist to genKey of htbl for "Strasbourg"
   set end of klist to genKey of htbl for "New Jersey"
   set end of klist to genKey of htbl for "Wellington"
   set end of klist to genKey of htbl for "Canberra"
   set end of klist to genKey of htbl for "Stockholm"
   set end of klist to genKey of htbl for "Gothenburg"
   set end of klist to genKey of htbl for "Houston"
   set end of klist to genKey of htbl for "Galveston"
   set end of klist to genKey of htbl for "Miami"
   set end of klist to genKey of htbl for "Vera Cruz"
   set end of klist to genKey of htbl for "Sacramento"
   set end of klist to genKey of htbl for "Los Angeles"
   set end of klist to genKey of htbl for "San Fransisco"
   set end of klist to genKey of htbl for "Tempe"
   set end of klist to genKey of htbl for "Bogota"
   set end of klist to genKey of htbl for "Buenos Aires"
   set end of klist to genKey of htbl for "Brugge"
   set end of klist to genKey of htbl for "Florence"
   set end of klist to genKey of htbl for "Vienna"
   set end of klist to genKey of htbl for "Rotterdam"
   set end of klist to genKey of htbl for "Leiden"
   set end of klist to genKey of htbl for "Glasgow"
   set end of klist to genKey of htbl for "Aberdeen"
   set end of klist to genKey of htbl for "Liverpool"
   set end of klist to genKey of htbl for "Portsmouth"
   set end of klist to genKey of htbl for "Cardiff"
   set end of klist to genKey of htbl for "Karlstad"
   set end of klist to genKey of htbl for "Hjørring"
   set end of klist to genKey of htbl for "Dresden"
   set end of klist to genKey of htbl for "Hamburg"
   set end of klist to genKey of htbl for "Kiel"
   set end of klist to genKey of htbl for "Frankfurt"
   set end of klist to genKey of htbl for "Toulouse"
   set end of klist to genKey of htbl for "Madrid"
   set end of klist to genKey of htbl for "Barcelona"
   set end of klist to genKey of htbl for "Tampa"
   set end of klist to genKey of htbl for "Seattle"
   set end of klist to genKey of htbl for "Carson"
   -- 19 collisions during generation of 58 keys
   
   set aName to aValue of htbl for 32
   return klist
end driver

Last edited by McUsrII (2015-04-28 03:41:00 pm)

Offline

 

#5 2015-04-27 07:16:01 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A simple Alternative to a small hash table

Hello.

Just for the hell of it, I checked how many equal random numbers between 1 and 100 I got, when I generated 58 random numbers at a time, I got on the average 13 collisions. This means that my hash function doesn't perform that bad at all, even though 19 collisions may sound much, it is totally in line with the math, described in the post #1.

Offline

 

#6 2015-04-28 03:43:20 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A simple Alternative to a small hash table

Hello.

I removed a bug, that took effect in the makeHashTable script in post #4, when the key just consisted of one character. I also refactored out a makeInitalHashKey for aValue handler, and changed the script name inside the newHashTable to be more descriptive: "hashTable". smile

I should also mention, just in case, that the keys are case sensitive: Toy and ToY generates two different keys, if you don't take any precations. That is the user of the script's  responsibility, to handle.

Last edited by McUsrII (2015-04-28 04:10:57 pm)

Offline

 

#7 2015-06-12 01:51:23 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A simple Alternative to a small hash table

Hello.

For those interested in creating a hashtable large enough to avoid any collisions, before the number of stored keys reaches some treshold, here is the formula:

It turns out that by calculus, a function is generated for the number of elements a hashtable can contain before the probability of a collision reaches 1/2.

The formula is:

N=1.117*m^0.5


m is the number of keys that can be stored in the hashtable. -The hashtable size.
N that is returned, yields the number of keys that can be inserted into the hashtable of size m, before a probability of 1/2, for a collision, is reached. smile

Last edited by McUsrII (2015-06-12 01:53:15 pm)


Filed under: hashtable

Offline

 

Board footer

Powered by FluxBB

RSS (new topics) RSS (active topics)