Friday, November 24, 2017
  • Index
  •  » unScripted
  •  » Improve your Applescripts: Sorting - The Big "O"

#1 2007-09-03 06:00:49 am

Kevin Bradley
Administrator
From:: Independence, MO
Registered: 2006-03-13
Posts: 548
Website

Improve your Applescripts: Sorting - The Big "O"

Finding What Sort Of Scripter You Are

In my previous columns in this series, I showed you a technique to simplify your script creation (pseudocode) and how to use Applescript to create three of the basic data structures that are used in computer science (stacks, queues, and priority queues). The next usual topic in most computer science courses is sorting. Sorting a list of items is one of the most common actions in any programming language, and it is often given sort shrift by most programming language creators. While some languages provide a built-in sort function, many (including Applescript) do not. But when you are showing the user of your script a long "choose from list" dialog it is much easier for the user to select something if the list is sorted.

A Bubbly Personality

Perhaps the oldest and easiest to understand sorting method is the bubble sort. Simply put, the bubble sort compares the first two items in the (unsorted) list. If the first item is greater than the second, the items are swapped. Then the second item (which was the first if there was a swap last time) and the third item are compared and swapped if necessary. The process continues until you have worked through all the times, finally placing the largest item in the last position.

However, we're not nearly done! Now, we perform the same operation again, sorting the item next-to-last in the list, and then again for the item before next-to-last; you can see that with each pass, 1 item finds its proper place, but if the list has n items, you must loop through the list (n-1)+(n-2)+(n-3)...down to..1 times! So to sort 10 items, you have to do 9+8+7+6+5+4+3+2+1=45 comparisons. In AppleScript, the bubble sort looks like this:

Applescript:


on bubbleSort(theList)
   -- defining an internal script makes for faster run times!
   script bs
       property alist : theList
   end script
   set theCount to length of bs's alist
   if theCount < 2 then return bs's alist
   repeat with indexA from theCount to 1 by -1
       repeat with indexB from 1 to indexA - 1
           if item indexB of bs's alist > item (indexB + 1) of bs's alist then
               set temp to item indexB of bs's alist
               set item indexB of bs's alist to item (indexB + 1) of bs's alist
               set item (indexB + 1) of bs's alist to temp
           end if
       end repeat
   end repeat
   return bs's alist
end bubbleSort

While the bubble sort is okay for small lists (less than 20 items), it begins to bog down and get very slow for large numbers of items. The reason is that to sort the entire list (no matter how large it is) you have to do roughly (n*(n+1))/2 comparisons. To computer scientists and mathematicians, the important part of that equation is n*n or n squared.

Now, for 10 items we only did 45 comparisons, but 45 is "in the neighborhood" of 10 squared (or 100). For 100 items, we would do 99+98+97+...+1 comparisons, which works out to 5000 comparisons, which is "in the neighborhood" of 100 squared (or 10,000).

When comparing sorting algorithms or other processes, computer scientists look for the "order" of the operation, also called "the Big O" (for "order"). By this they mean, is the amount of processing required related to the number of inputs by simply n itself, n squared, or some other factor? In the case of the bubble sort, the operation is "big O of n squared." We can do better. Although the bubble sort is good for learning to sort, it is basically useless for any real sorting. There is a variation of the bubble sort where you keep track of whether you've had to swap items during a loop through the data, and if not, you're done, but even this doesn't help the bubble sort much.

Any time you have a repeat loop inside another repeat loop, the total number of times the program will loop is the number of times through the outer loop MULTIPLIED BY the number of times through the inner loop. This is why bubble sort is Big O n squared. Keep this in mind any time you are writing nested repeats!

Insert Tab A into Slot B

I don't know about you, but when I actually sort something by hand, I start a sorted stack (initially empty) and pull items from the unsorted bunch and insert the new item in its proper place in the sorted stack. If I'm short on workspace, I might even just arrange the items in order in the same stack of items. These are both examples of an insertion sort. For worst case (totally random sort), the bubble sort (n squared over 2) and the insertion sort (n squared over 4) are of about the same order. But in the situation where the data is basically in some order, then the insertion sort performs fewer swaps and is of the order (n+d) where d is the number of swaps performed. This is considerably better than n squared! Add to that the feature that you can add items spontaneously and they will be inserted in the proper place, and you have a much stronger sort than the bubble sort, which must operate on a fixed number of items.

Notice also that even though this sort has nested repeat loops, the inner loop only looks until it finds the spot for the sorted item, not through the entire list of items like bubble sort!

Applescript:


on insertSort(theList)
   --Insertion sort using the same list
   --(as opposed to using a second list)
   script bs
       property alist : theList
   end script
   
   set theLength to length of bs's alist
   if theLength > 1 then
       set insertHere to 2
       repeat while insertHere ≤ theLength
           if last item of bs's alist ≤ first item of bs's alist then
               set bs's alist to {last item of bs's alist} & items 1 thru (theLength - 1) of bs's alist
           else
               set lookfor to 1
               repeat while lookfor < insertHere and item lookfor of bs's alist ≤ last item of bs's alist
                   set lookfor to lookfor + 1
               end repeat
               set bs's alist to items 1 thru (lookfor - 1) of bs's alist & last item of bs's alist & items (lookfor) thru (theLength - 1) of bs's alist
           end if
           set insertHere to insertHere + 1
       end repeat
   end if
   return bs's alist
end insertSort

The Urge to Merge

The merge sort is based on a totally different idea. You first compare item 1 and item 2 and put them in the correct order, forming a sublist of 2 items. You do the same for items 3 and 4, making another sublist, and so on until you reach the end, creating two-item sublists that are in order within. Then you begin comparing the first item of two adjacent sublists. So you compare item 1 to item 3 (the first item of list A to first item of list B) and if they need to swap, do so. If they don't, then compare item 3 to item 2, swapping if necessary. When you are done with item 3, then do item 4 and now you've merged the two small two-item lists into a larger, four-item list. You continue this logic for all items, making larger sublists until you have two sublists containing half of the total items each, and then you merge those two lists.

The Applescript implementation of the merge sort requires us to create a handler inside the main sort handler, since this is a recursive algorithm. While the merge() handler has a repeat loop, it does NOT have any nested repeats. That alone should tell us that this sort is faster! In fact, the merge sort is of the order n*log(n), which is considerable better than n squared. Better yet, that n*log(n) performance holds true for both the average AND the worst cases!

Applescript:


on mergeSort(theList)
   --the public routine, to be called from your script
   script bs
       property alist : theList
       
       on merge(leftSide, rightSide)
           --private routine called by mergeSort.
           --do not call from your script!
           set newList to {}
           set theLeft to leftSide
           set theRight to rightSide
           set leftLength to length of theLeft
           set rightLength to length of theRight
           repeat while leftLength > 0 and rightLength > 0
               if first item of theLeft ≤ first item of theRight then
                   set newList to newList & first item of theLeft
                   set theLeft to (rest of theLeft)
               else
                   set newList to newList & first item of theRight
                   set theRight to rest of theRight
               end if
               set leftLength to length of theLeft
               set rightLength to length of theRight
           end repeat
           if leftLength > 0 then set newList to newList & theLeft
           if rightLength > 0 then set newList to newList & theRight
           return newList
       end merge
   end script
   
   set midList to 0
   set leftList to {}
   set rightList to {}
   set listLength to length of bs's alist
   if listLength ≤ 1 then
       return bs's alist
   else
       set midList to listLength div 2
       repeat with pointer from 1 to midList
           set leftList to leftList & item pointer of bs's alist
       end repeat
       repeat with pointer from (midList + 1) to listLength
           set rightList to rightList & item pointer of bs's alist
       end repeat
       set leftList to mergeSort(leftList)
       set rightList to mergeSort(rightList)
   end if
   return bs's merge(leftList, rightList)
end mergeSort

The Quick and the Dead

In an amazing display of common sense, the fastest sort that is widely used is called quicksort. The quicksort algorithm is a bit more complex than the other sorts, but the results are nothing short of amazing. Using quicksort, large lists of items that are unsortable in an amount of time that a user is willing to wait are sortable in a fraction of the time.

The idea of the quicksort is to choose a midpoint item and all the items larger than that item (called the "pivot") go to the right of it and the smaller items go to the left. These two sublists are then sorted the same way: choose a pivot, sort the items smaller to one side, larger to the other. It sounds simple enough, but there have been all kinds of methods for choosing the pivot item. The simplest example of quicksort just chooses the item in the middle of the list. The version I've listed below uses an internal handler to optimize the selection of the pivot value:

Applescript:


on quickSort(theList)
   --public routine, called from your script
   script bs
       property alist : theList
       
       on Qsort(leftIndex, rightIndex)
           --private routine called by quickSort.
           --do not call from your script!
           if rightIndex > leftIndex then
               set pivot to ((rightIndex - leftIndex) div 2) + leftIndex
               set newPivot to Qpartition(leftIndex, rightIndex, pivot)
               set theList to Qsort(leftIndex, newPivot - 1)
               set theList to Qsort(newPivot + 1, rightIndex)
           end if
           
       end Qsort
       
       on Qpartition(leftIndex, rightIndex, pivot)
           --private routine called by quickSort.
           --do not call from your script!
           set pivotValue to item pivot of bs's alist
           set temp to item pivot of bs's alist
           set item pivot of bs's alist to item rightIndex of bs's alist
           set item rightIndex of bs's alist to temp
           set tempIndex to leftIndex
           repeat with pointer from leftIndex to (rightIndex - 1)
               if item pointer of bs's alist ≤ pivotValue then
                   set temp to item pointer of bs's alist
                   set item pointer of bs's alist to item tempIndex of bs's alist
                   set item tempIndex of bs's alist to temp
                   set tempIndex to tempIndex + 1
               end if
           end repeat
           set temp to item rightIndex of bs's alist
           set item rightIndex of bs's alist to item tempIndex of bs's alist
           set item tempIndex of bs's alist to temp
           
           return tempIndex
       end Qpartition
       
   end script
   
   if length of bs's alist > 1 then bs's Qsort(1, length of bs's alist)
   return bs's alist
end quickSort

Some versions of quicksort just grab the first item for the pivot, but if the list is already somewhat sorted, then the benefits of quicksort evaporate and you're stuck with a sort that is Big O n squared again! To get around this fault in quicksort, some prefer to choose the middle item in the list or the middle value of the first, last and middle items. This version starts with the middle value and adjusts it when creating the partitions to find an optimal value. By doing this, this sort is almost always on the order of n*log(n) even in the worst possible case.

Out of Sorts

Sorting is something all of us use (or want to use) at one time or another. I hope this has helped show the pitfalls of sorting so you can select a sort that is appropriate for your use. I originally started working on sorting because I had an iTunes script that I wanted to display a list of my albums and the initial list was an unsorted mess. In researching sorts, I found that some are so slow that the are completely unusable. Others were only slow because I'd coded them in such a way that Applescript was dragging its feet, and recoding them speeded them up greatly. The quicksort above is VERY usable, even on an old G3 iMac (like mine) with an iTunes library of about 1000 songs. The "choose from list" dialog pops up very fast!

There is also a great thread in the MacScripter bbs' Code Exchange forum on sorting, and I'd like to thank everyone who participated in that discussion, as it not only helped me to write these sort routines, they also helped me improve these routines for speed! All the sort routines in this article, plus the versions of quicksort optimized by Adam Bell and by Nigel Garvey that are in the Code Exchange thread, are in my Nite Flite library's sortLib on ScriptBuilders.net as well.

More Information

For more information about sorting:

Wikipedia - Sorting Algorithms
Wikipedia - Bubble Sort
Wikipedia - Insertion Sort
Wikipedia - Merge Sort
Wikipedia - Quicksort


Nitewing '98
--
I distrust morning people, largely because I suspect them of getting together early one day while the rest of us were asleep and setting up the rules of civilization.


Filed under: Sorting, sort, Bradley

Offline

 
  • Index
  •  » unScripted
  •  » Improve your Applescripts: Sorting - The Big "O"

Board footer

Powered by FluxBB

RSS (new topics) RSS (active topics)