Heap sort

Heap sorts are cleverer than they are fast, which is perhaps why we don’t hear much about them in AppleScript circles. That’s not to say they’re slow: it’s just that, given implementations of equal competence, quicksorts and merge sorts are generally faster so we might as well use them anyway. But heap sorts are interesting if you like to know how things work. :slight_smile:

The “heap” of the title is a notional arrangement of the items being sorted: as a layered heap, with one item at the top, then (if it’s a binary heap) two at the second level, four at the next, eight at the next, and so on. The bottom layer doesn’t have to be complete if there aren’t enough items to fill it.

The idea is that, once the heap’s established, it becomes a set of merging, vertical queues inching towards the top. Each slot in each level is the merge point for two queues rising from below, so there are (up to) two items in the level below waiting to move up into it when the present incumbent moves on. In this discussion, an item moves up when the slot ahead of it becomes free and the item has a higher value than its rival or is the first of equals. (In computerese, and in the Wikipedia article mentioned below, a slot that you think of as a merge point is a “root” and the two lower slots associated with it are its “children”.)

The “sift” process that initialises the heap (and later progresses the queues) puts the items into order in the queues, but the order horizontally might be quite chaotic. The priority system at the merge points ensures that the highest value in the sort reaches the top first, followed by the second (or co-) highest, then the third, and so on. These values are removed as they reach the top and are stored neatly somewhere else. This leaves the way clear for the higher of the two values in the second level to move up to the top, which in turn leaves the way clear for the higher of the two values below the higher value to take its place in the second level, and so on, down to the end of whichever way the succession goes.

In what this Wikipedia article calls “the simple variant”, the heap layers are laid end-to-end in the sort list so that the “top” is at the left, the “bottom” is towards the right, and items rise to the top through a process called “sifting down”! The heap’s a binary one and items come off the top in reverse order as described above. As each item’s removed, it’s stored in the slot at the far right of the current sort range, which is where it needs to be when the sort’s finished. The range is then reduced by one and the process repeats with the remaining items until the heap is exhausted. The list thus becomes ordered from the right as the heap diminishes towards the left.

But the end of the current sort range is also the current end of the heap, and it’s only occasionally that the queue containing the end item is advanced, so something needs to be done with that item when the value taken from the top is plonked down in its place. Rather ingeniously, it’s transferred to the queue that does move. It follows the vacancy down the queue as successive items move up and, if it has a higher value than both of the contenders for a particular slot, pushes in at that point. Otherwise it ends up at the bottom of that queue.

Here’s an AppleScript version of “simple variant” that sorts a specified range of a list:

(* "Simple variant" heap sort. *)

on heapSort(theList, l, r) -- Sort items l thru r of theList.
	script o
		-- The distance in the list from a root to its first child is proportional to the root's distance from the beginning of the sort range. The formula is l + (root - l) * c + 1, where c is the number of children per root. Rearranging this to precalculate as much of it as possible, we get (root * c) - (l * (c - 1) - 1). c is 2 in a binary heap, so (l * (c - 1) - 1) = l - 1. We'll use whatever this is in siftDown().
		property precalc : l - 1
		property lst : theList
		
		on hsrt(l, r)
			-- Set up the heap. (Comment in Wikipedia pseudocode: "Starting at the index of the
			-- last parent node, sift down so that all nodes below that index are in heap order.")
			repeat with i from (l + r) div 2 to l by -1
				siftDown(i, r, item i of my lst)
			end repeat
			
			-- The highest value in the sort is now at the top of the heap. (Left of sort range).
			-- Move it to the right end of the range after rescuing the last element of the heap from that slot.
			set endVal to item r of my lst
			set item r of my lst to item l of my lst
			
			-- Successively reduce the sort range, sift down (mixing in the rescued last element),
			-- rescue the next last element, and position the next highest value, until done.
			repeat with r from (r - 1) to (l + 1) by -1
				siftDown(l, r, endVal)
				
				set endVal to item r of my lst
				set item r of my lst to item l of my lst
			end repeat
			
			-- Put the last rescued end value at the beginning of the range.
			set item l of my lst to endVal
		end hsrt
		
		-- "Sift down" a queue. Find a place in it for the passed value.
		on siftDown(root, r, forQueue) -- Start-of-queue index, end-of-heap index, value to place in queue.
			set child to root * 2 - precalc -- Locate the first (left) child of the root.
			repeat until (child > r)
				set childVal to item child of my lst
				-- If the child has a sibling with a higher value, use that instead.
				if (child < r) then
					set child2 to child + 1
					set child2Val to item child2 of my lst
					if (child2Val > childVal) then
						set child to child2
						set childVal to child2Val
					end if
				end if
				
				-- If the value to place in the queue is less than the higher child value, advance
				-- the child and prepare to check its children. Otherwise the value will go here.
				if (forQueue < childVal) then
					set item root of my lst to childVal
					set root to child
					set child to root * 2 - precalc
				else
					set child to r + 1 -- exit repeat
				end if
			end repeat
			
			-- Insert the value to be introduced at the 'root' index reached.
			set item root of my lst to forQueue
		end siftDown
	end script
	
	o's hsrt(l, r)
end heapSort

set aList to {}
repeat 100 times
	set end of my aList to random number 4000
end repeat

heapSort(aList, 1, (count aList))
aList

The Wikipedia article also mentions the possibility of a ternary heap sort, where each root has three children instead of two. It takes longer to compare three items than two, but ternary heaps allow more and shorter queues, so items don’t need to be moved so often. The overall effect is a slight speed gain. The article claims this is in the order of 12%, but I’m only getting around 3-or-4%. I suspect that the 12% figure is based on the performance of the article’s inefficient “simple variant” pseudocode, which inexplicably swaps the rescued item with each of the rising queue items instead of just holding it in a variable while they move up. A ternary version performing fewer moves would be saving the same number of unnecessary swap processes, so the speed improvement would be more dramatic than with more efficient code.

(* Ternary heap sort. *)

on heapSort(theList, l, r) -- Sort items l thru r of theList.
	script o
		-- The distance in the list from a root to its first child is proportional to the root's distance from the beginning of the sort range. The formula is l + (root - l) * c + 1, where c is the number of children per root. Rearranging this to precalculate as much of it as possible, we get (root * c) - (l * (c - 1) - 1).  c is 3 in a ternary heap, so (l * (c - 1) - 1) = l * 2 - 1. We'll use whatever this is in siftDown().
		property precalc : l * 2 - 1
		property lst : theList
		
		on hsrt(l, r)
			-- Set up the heap. (Comment in Wikipedia pseudocode: "Starting at the index of the
			-- last parent node, sift down so that all nodes below that index are in heap order.")
			repeat with i from l + (r - l) div 3 to l by -1
				siftDown(i, r, item i of my lst)
			end repeat
			
			-- The highest value in the sort is now at the top of the heap. (Left of sort range).
			-- Move it to the right end of the range after rescuing the last element of the heap from that slot.
			set endVal to item r of my lst
			set item r of my lst to item l of my lst
			
			-- Successively reduce the sort range, sift down (mixing in the rescued last element),
			-- rescue the next last element, and position the next highest value, until done.
			repeat with r from (r - 1) to (l + 1) by -1
				siftDown(l, r, endVal)
				
				set endVal to item r of my lst
				set item r of my lst to item l of my lst
			end repeat
			
			-- Put the last rescued end value at the beginning of the range.
			set item l of my lst to endVal
		end hsrt
		
		-- "Sift down" a queue. Find a place in it for the passed value.
		on siftDown(root, r, forQueue) -- Start-of-queue index, end-of-heap index, value to place in queue.
			set child to root * 3 - precalc -- Locate the first (left) child of the root.
			repeat until (child > r)
				set childVal to item child of my lst
				-- If the child has sibling(s), use the first of whichever has the highest value.
				if (child < r) then
					set child2 to child + 1
					set child2Val to item child2 of my lst
					if (child2Val > childVal) then
						set child to child2
						set childVal to child2Val
					end if
					if (child2 < r) then
						set child3 to child2 + 1
						set child3Val to item child3 of my lst
						if (child3Val > childVal) then
							set child to child3
							set childVal to child3Val
						end if
					end if
				end if
				
				-- If the value to place in the queue is less than the highest child value, advance
				-- the child and prepare to check its children. Otherwise the value will go here.
				if (forQueue < childVal) then
					set item root of my lst to childVal
					set root to child
					set child to root * 3 - precalc
				else
					set child to r + 1 -- exit repeat
				end if
			end repeat
			
			-- Insert the value to be introduced at the 'root' index reached.
			set item root of my lst to forQueue
		end siftDown
	end script
	
	o's hsrt(l, r)
end heapSort

set aList to {}
repeat 100 times
	set end of my aList to random number 4000
end repeat

heapSort(aList, 1, (count aList))
aList

Edit: Revamped comments in the code.
More recently, corrected the ‘from’ parameter in the first ‘repeat’ statement of the ternary script

13th September 2010: I’ve uploaded new versions of these sorts, as part of a collection, to ScriptBuilders.