Finding What Sort Of Scripter You Are
In my previous columns in this series, I showed you a technique to simplify your script creation (pseudocode) and how to use Applescript to create three of the basic data structures that are used in computer science (stacks, queues, and priority queues). The next usual topic in most computer science courses is sorting. Sorting a list of items is one of the most common actions in any programming language, and it is often given sort shrift by most programming language creators. While some languages provide a built-in sort function, many (including Applescript) do not. But when you are showing the user of your script a long “choose from list” dialog it is much easier for the user to select something if the list is sorted.
A Bubbly Personality
Perhaps the oldest and easiest to understand sorting method is the bubble sort. Simply put, the bubble sort compares the first two items in the (unsorted) list. If the first item is greater than the second, the items are swapped. Then the second item (which was the first if there was a swap last time) and the third item are compared and swapped if necessary. The process continues until you have worked through all the times, finally placing the largest item in the last position.
However, we’re not nearly done! Now, we perform the same operation again, sorting the item next-to-last in the list, and then again for the item before next-to-last; you can see that with each pass, 1 item finds its proper place, but if the list has n items, you must loop through the list (n-1)+(n-2)+(n-3)…down to…1 times! So to sort 10 items, you have to do 9+8+7+6+5+4+3+2+1=45 comparisons. In AppleScript, the bubble sort looks like this:
on bubbleSort(theList)
-- defining an internal script makes for faster run times!
script bs
property alist : theList
end script
set theCount to length of bs's alist
if theCount < 2 then return bs's alist
repeat with indexA from theCount to 1 by -1
repeat with indexB from 1 to indexA - 1
if item indexB of bs's alist > item (indexB + 1) of bs's alist then
set temp to item indexB of bs's alist
set item indexB of bs's alist to item (indexB + 1) of bs's alist
set item (indexB + 1) of bs's alist to temp
end if
end repeat
end repeat
return bs's alist
end bubbleSort
While the bubble sort is okay for small lists (less than 20 items), it begins to bog down and get very slow for large numbers of items. The reason is that to sort the entire list (no matter how large it is) you have to do roughly (n*(n+1))/2 comparisons. To computer scientists and mathematicians, the important part of that equation is n*n or n squared.
Now, for 10 items we only did 45 comparisons, but 45 is “in the neighborhood” of 10 squared (or 100). For 100 items, we would do 99+98+97+…+1 comparisons, which works out to 5000 comparisons, which is “in the neighborhood” of 100 squared (or 10,000).
When comparing sorting algorithms or other processes, computer scientists look for the “order” of the operation, also called “the Big O” (for “order”). By this they mean, is the amount of processing required related to the number of inputs by simply n itself, n squared, or some other factor? In the case of the bubble sort, the operation is “big O of n squared.” We can do better. Although the bubble sort is good for learning to sort, it is basically useless for any real sorting. There is a variation of the bubble sort where you keep track of whether you’ve had to swap items during a loop through the data, and if not, you’re done, but even this doesn’t help the bubble sort much.
Any time you have a repeat loop inside another repeat loop, the total number of times the program will loop is the number of times through the outer loop MULTIPLIED BY the number of times through the inner loop. This is why bubble sort is Big O n squared. Keep this in mind any time you are writing nested repeats!
Insert Tab A into Slot B
I don’t know about you, but when I actually sort something by hand, I start a sorted stack (initially empty) and pull items from the unsorted bunch and insert the new item in its proper place in the sorted stack. If I’m short on workspace, I might even just arrange the items in order in the same stack of items. These are both examples of an insertion sort. For worst case (totally random sort), the bubble sort (n squared over 2) and the insertion sort (n squared over 4) are of about the same order. But in the situation where the data is basically in some order, then the insertion sort performs fewer swaps and is of the order (n+d) where d is the number of swaps performed. This is considerably better than n squared! Add to that the feature that you can add items spontaneously and they will be inserted in the proper place, and you have a much stronger sort than the bubble sort, which must operate on a fixed number of items.
Notice also that even though this sort has nested repeat loops, the inner loop only looks until it finds the spot for the sorted item, not through the entire list of items like bubble sort!
on insertSort(theList)
--Insertion sort using the same list
--(as opposed to using a second list)
script bs
property alist : theList
end script
set theLength to length of bs's alist
if theLength > 1 then
set insertHere to 2
repeat while insertHere ≤ theLength
if last item of bs's alist ≤ first item of bs's alist then
set bs's alist to {last item of bs's alist} & items 1 thru (theLength - 1) of bs's alist
else
set lookfor to 1
repeat while lookfor < insertHere and item lookfor of bs's alist ≤ last item of bs's alist
set lookfor to lookfor + 1
end repeat
set bs's alist to items 1 thru (lookfor - 1) of bs's alist & last item of bs's alist & items (lookfor) thru (theLength - 1) of bs's alist
end if
set insertHere to insertHere + 1
end repeat
end if
return bs's alist
end insertSort
The Urge to Merge
The merge sort is based on a totally different idea. You first compare item 1 and item 2 and put them in the correct order, forming a sublist of 2 items. You do the same for items 3 and 4, making another sublist, and so on until you reach the end, creating two-item sublists that are in order within. Then you begin comparing the first item of two adjacent sublists. So you compare item 1 to item 3 (the first item of list A to first item of list B) and if they need to swap, do so. If they don’t, then compare item 3 to item 2, swapping if necessary. When you are done with item 3, then do item 4 and now you’ve merged the two small two-item lists into a larger, four-item list. You continue this logic for all items, making larger sublists until you have two sublists containing half of the total items each, and then you merge those two lists.
The Applescript implementation of the merge sort requires us to create a handler inside the main sort handler, since this is a recursive algorithm. While the merge() handler has a repeat loop, it does NOT have any nested repeats. That alone should tell us that this sort is faster! In fact, the merge sort is of the order n*log(n), which is considerable better than n squared. Better yet, that n*log(n) performance holds true for both the average AND the worst cases!
on mergeSort(theList)
--the public routine, to be called from your script
script bs
property alist : theList
on merge(leftSide, rightSide)
--private routine called by mergeSort.
--do not call from your script!
set newList to {}
set theLeft to leftSide
set theRight to rightSide
set leftLength to length of theLeft
set rightLength to length of theRight
repeat while leftLength > 0 and rightLength > 0
if first item of theLeft ≤ first item of theRight then
set newList to newList & first item of theLeft
set theLeft to (rest of theLeft)
else
set newList to newList & first item of theRight
set theRight to rest of theRight
end if
set leftLength to length of theLeft
set rightLength to length of theRight
end repeat
if leftLength > 0 then set newList to newList & theLeft
if rightLength > 0 then set newList to newList & theRight
return newList
end merge
end script
set midList to 0
set leftList to {}
set rightList to {}
set listLength to length of bs's alist
if listLength ≤ 1 then
return bs's alist
else
set midList to listLength div 2
repeat with pointer from 1 to midList
set leftList to leftList & item pointer of bs's alist
end repeat
repeat with pointer from (midList + 1) to listLength
set rightList to rightList & item pointer of bs's alist
end repeat
set leftList to mergeSort(leftList)
set rightList to mergeSort(rightList)
end if
return bs's merge(leftList, rightList)
end mergeSort
The Quick and the Dead
In an amazing display of common sense, the fastest sort that is widely used is called quicksort. The quicksort algorithm is a bit more complex than the other sorts, but the results are nothing short of amazing. Using quicksort, large lists of items that are unsortable in an amount of time that a user is willing to wait are sortable in a fraction of the time.
The idea of the quicksort is to choose a midpoint item and all the items larger than that item (called the “pivot”) go to the right of it and the smaller items go to the left. These two sublists are then sorted the same way: choose a pivot, sort the items smaller to one side, larger to the other. It sounds simple enough, but there have been all kinds of methods for choosing the pivot item. The simplest example of quicksort just chooses the item in the middle of the list. The version I’ve listed below uses an internal handler to optimize the selection of the pivot value:
on quickSort(theList)
--public routine, called from your script
script bs
property alist : theList
on Qsort(leftIndex, rightIndex)
--private routine called by quickSort.
--do not call from your script!
if rightIndex > leftIndex then
set pivot to ((rightIndex - leftIndex) div 2) + leftIndex
set newPivot to Qpartition(leftIndex, rightIndex, pivot)
set theList to Qsort(leftIndex, newPivot - 1)
set theList to Qsort(newPivot + 1, rightIndex)
end if
end Qsort
on Qpartition(leftIndex, rightIndex, pivot)
--private routine called by quickSort.
--do not call from your script!
set pivotValue to item pivot of bs's alist
set temp to item pivot of bs's alist
set item pivot of bs's alist to item rightIndex of bs's alist
set item rightIndex of bs's alist to temp
set tempIndex to leftIndex
repeat with pointer from leftIndex to (rightIndex - 1)
if item pointer of bs's alist ≤ pivotValue then
set temp to item pointer of bs's alist
set item pointer of bs's alist to item tempIndex of bs's alist
set item tempIndex of bs's alist to temp
set tempIndex to tempIndex + 1
end if
end repeat
set temp to item rightIndex of bs's alist
set item rightIndex of bs's alist to item tempIndex of bs's alist
set item tempIndex of bs's alist to temp
return tempIndex
end Qpartition
end script
if length of bs's alist > 1 then bs's Qsort(1, length of bs's alist)
return bs's alist
end quickSort
Some versions of quicksort just grab the first item for the pivot, but if the list is already somewhat sorted, then the benefits of quicksort evaporate and you’re stuck with a sort that is Big O n squared again! To get around this fault in quicksort, some prefer to choose the middle item in the list or the middle value of the first, last and middle items. This version starts with the middle value and adjusts it when creating the partitions to find an optimal value. By doing this, this sort is almost always on the order of n*log(n) even in the worst possible case.
Out of Sorts
Sorting is something all of us use (or want to use) at one time or another. I hope this has helped show the pitfalls of sorting so you can select a sort that is appropriate for your use. I originally started working on sorting because I had an iTunes script that I wanted to display a list of my albums and the initial list was an unsorted mess. In researching sorts, I found that some are so slow that the are completely unusable. Others were only slow because I’d coded them in such a way that Applescript was dragging its feet, and recoding them speeded them up greatly. The quicksort above is VERY usable, even on an old G3 iMac (like mine) with an iTunes library of about 1000 songs. The “choose from list” dialog pops up very fast!
There is also a great thread in the MacScripter bbs’ Code Exchange forum on sorting, and I’d like to thank everyone who participated in that discussion, as it not only helped me to write these sort routines, they also helped me improve these routines for speed! All the sort routines in this article, plus the versions of quicksort optimized by Adam Bell and by Nigel Garvey that are in the Code Exchange thread, are in my Nite Flite library’s sortLib on ScriptBuilders.net as well.
More Information
For more information about sorting:
Wikipedia - Sorting Algorithms
Wikipedia - Bubble Sort
Wikipedia - Insertion Sort
Wikipedia - Merge Sort
Wikipedia - Quicksort