Saturday, November 18, 2017

#1 2007-02-26 09:02:54 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

A merge sort

Having enjoyed Kevin Bradley's ScriptWire article on sorting methods last month, I though I'd have a go at writing a merge sort, and finally got around to it this weekend. Like the qsort handler that Arthur Knapp and I wrote a few years ago, the effort below sorts "in place" – that is, it sorts the actual list it's fed. It doesn't return a sorted copy. It can also sort just part of a list, if required. It's not quite as fast as qsort, but it's not bad.

Merge sort is a recursive, divide-and-conquer algorithm that sorts each half of a list and then "merges" the two halves into a sorted whole. Before that, the halves of the halves are sorted, and so on. At the deepest level of recursion, the "halves" are already ordered internally by virtue of having less than two items!

The customary implementation seems to be to create a "left list" and a "right list" at each level of recursion and to chop these up into smaller lists while building yet another, "merged" list, which is then passed back upstream for use at the previous level. My effort reduces the amount of list waste by only creating one extra list at each recursion level. This list is a copy of the whole of the current section after the two halves have been sorted. Two pointers are used to index the two halves, which are merged directly back to the part of original list from which the copy was taken.

Applescript:

(* Merge sort
Merge sort algorithm: John von Neumann, 1945.
AppleScript implementation: Nigel Garvey, 2007/2015.

Parameters: (list, range index 1, range index 2)
*)


on mergeSort(theList, l, r) -- Sort items l thru r of theList.
   script o
       property lst : theList
       property auxList : missing value
       
       on msrt(l, r)
           set leftR to (l + r) div 2 -- Left half end index.
           set rx to leftR + 1 -- Right half start/tracking index.
           
           -- Sort the left half by recursion if it has more than two items or by swapping as necessary if there are two. If one, do nothing.
           if (leftR - l > 1) then
               msrt(l, leftR)
           else if (leftR > l) then
               set lv to item l of my lst
               set rv to item leftR of my lst
               if (rv < lv) then
                   set item l of my lst to rv
                   set item leftR of my lst to lv
               end if
           end if
           -- Sort the right half similarly.
           if (r - rx > 1) then
               msrt(rx, r)
           else if (r > rx) then
               set lv to item rx of my lst
               set rv to item r of my lst
               if (rv < lv) then
                   set item rx of my lst to rv
                   set item r of my lst to lv
               end if
           end if
           
           -- Extract a separate list of the left half's items and set tracking and end indices for it.
           set auxList to items l thru leftR of my lst
           set lx to 1
           set leftR to rx - l
           
           -- Iterate through the range, merging the two halves by comparing the lowest unassigned value from auxList with that from the right half of the range and assigning the lower of the two (or the left if they're equal) to the current slot.
           
           -- Begin with the first (lowest) value from each half.
           set lv to beginning of my auxList
           set rv to item rx of my lst
           repeat with dx from l to r
               if (rv < lv) then
                   -- The right value's less than the left. Assign it to this slot.
                   set item dx of my lst to rv
                   -- If no more right-half values, assign the remaining left values to the remaining slots and exit this repeat and recursion level.
                   if (rx is r) then
                       repeat with dx from (dx + 1) to r
                           set item dx of my lst to item lx of my auxList
                           set lx to lx + 1
                       end repeat
                       exit repeat
                   end if
                   -- Otherwise get the next right-half value.
                   set rx to rx + 1
                   set rv to item rx of my lst
               else
                   -- The left value's less than or equal to the right.
                   set item dx of my lst to lv
                   -- If no more left-half values, simply exit this repeat and recursion level as the remaining right values are in place anyway.
                   if (lx is leftR) then exit repeat
                   -- Otherwise get the next left-half value
                   set lx to lx + 1
                   set lv to item lx of my auxList
               end if
           end repeat
       end msrt
   end script
   
   -- Process the input parameters.
   set listLen to (count theList)
   if (listLen > 1) then
       -- Negative and/or transposed range indices.
       if (l < 0) then set l to listLen + l + 1
       if (r < 0) then set r to listLen + r + 1
       if (l > r) then set {l, r} to {r, l}
       
       -- Do the sort.
       o's msrt(l, r)
   end if
   
   return -- nothing
end mergeSort

property sort : mergeSort

-- (* Demo:
set l to {}
repeat 1000 times
   set end of my l to (random number 1000)
end repeat

sort(l, 1, -1)
l
-- *)

13th September 2010: I've uploaded a new version of this script, as part of a collection, to ScriptBuilders.
29th November 2013: Since ScriptBuilders is now defunct and unlikely to return, I've replaced the code originally posted here with the latest version.
14th June 2015: The amount of waste has been further reduced (after eight years!) by only extracting the left halves of the merge ranges to the separate lists. Comments and variable names have been changed and the former mutually recursive sortHalf() handler has been replaced with a couple of bits of in-line code.

Last edited by Nigel Garvey (2015-06-14 01:40:42 pm)


NG

Offline

 

#2 2010-07-23 08:53:40 am

McUsr
Member
From:: Southern Norway
Registered: 2010-04-07
Posts: 1776

Re: A merge sort

Hello.

I'd just thought that I should mention that a merge sort is a stable sort. This means that if you sort a list on on criterion, and the same list on another criterion, it will keep the previous order of the former criterion.

Example: If you sort files by names first, and the sorts them by date afterwards, the files with an equal date will still be sorted by name.

This is the major property of a merge sort algorithm.


Mercurial vcs is a joy to use for scripting.


Filed under: merge sort

Offline

 

#3 2010-07-23 09:05:24 am

Adam Bell
Administrator
From:: Nova Scotia, Canada
Registered: 2005-10-04
Posts: 4660

Re: A merge sort

Thanks for posting that, McUsr. It means that Nigel's merge sort has advantages other than speed over the faster qsort. When you think about how the merge sort works by dividing down, you can see that it would preserve any presorted order as it sorted on a new property. Great stuff.


iMac running OS X 10.13.1

Offline

 

#4 2010-07-23 09:25:23 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

Or to put it another way, if items in the list have equal values, they'll remain in the same order with respect to each other after they're merge sorted. This isn't necessarily the case with other sorts — and it doesn't usually matter, as equal values look the same whatever their order with respect to each other. But there are times, as you say, when it's desirable to keep their relative order.

I must admit that, for my own use, I use CustomQsort to sort and subsort spreadsheet entries. But I must try a merge sort version one day to see if it's any faster.  smile


NG

Offline

 

#5 2010-07-23 09:11:03 pm

McUsr
Member
From:: Southern Norway
Registered: 2010-04-07
Posts: 1776

Re: A merge sort

Hello.

Robert Sedgewick states in my old "Algorithms" book (second edition) that Merge Sort has  a worst case performance of N Log N, a Quick Sort has a worst case of N ^ 2. They have both and average of N log N. The advantage of the standard Quick Sort is the low requirement for stack space, while Merge Sort requires space proportional to N, which is rather expensive. (I haven't figured out how expensive that is as I haven't figured out the proportions. And I will not do that before I use it.  I heard from an authority that RDBMS uses Merge Sort.

Tools like that and database events, should make it feasible to make good looking reports with AppleScript.

If you remember the Spotlight Search window of Tiger, smile that's the best example of an unstable sort I can think of. smile

I look forward to try your Custom Sort handler. big_smile  As well as the Merge Sort when time permits me to delve into database events again.

Last edited by McUsr (2010-07-23 09:17:27 pm)


Mercurial vcs is a joy to use for scripting.


Filed under: Sorting

Offline

 

#6 2010-07-25 07:47:59 pm

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

OK. To merge sort "on" anything, you need a customisable merge sort. Here's the customisable version of the above, which I don't think I've posted before:

13th September 2010: I've uploaded a new version of this script with improved customisation and different names for the "slave move" handlers, as part of a collection, to ScriptBuilders.
27th November 2013: Since ScriptBuilders is now defunct, I've replaced the code originally posted here with the latest version. There are differences in both the code and the form of the customisation parameter, so I've updated the entire post.
14th June 2015: The amount of waste has been further reduced (after eight years!) by only extracting the left halves of the merge ranges to separate lists. Comments and variable names have been changed and the former mutually recursive sortHalf() handler has been replaced with a couple of bits of in-line code. A reposition() hander has been added to the slave script object to cover for the fact that the right half of each merge now comes directly from the original list instead of from an auxiliary.


Applescript:

(* Merge sort — customisable version
Merge sort algorithm: John von Neumann, 1945.
AppleScript implementation: Nigel Garvey, 2007/2015.

Parameters: (list, range index 1, range index 2, record with optional 'comparer' and 'slave' properties). The 'comparer' value is a script object containing an isGreater(a, b) handler which determines if object a is "greater" than object b. The 'slave' value is a script object containing an extract(a, b) handler, which is called when the sort has extracted a copy of range a thru b of the list being sorted; a reposition(a, b) handler, called when item b in the main list has been moved to position b; a merge(a, b) handler, called when item a in an auxiliary list has been moved to position b in the main list; and a swap(a, b) handler, called when items a and b in the main list have been swapped.
Where the 'comparer' or 'slave' properties are omitted, or the customisation parameter isn't a record, the sort has default handlers which respectively compare items directly and do nothing.
*)

on CustomMergeSort(theList, l, r, customiser) -- Sort items l thru r of theList.
   script o
       property comparer : me
       property slave : me
       
       property lst : theList
       property auxList : missing value
       
       on msrt(l, r)
           set leftR to (l + r) div 2 -- Left half end index.
           set rx to leftR + 1 -- Right half start/tracking index.
           
           -- Sort the left half by recursion if it has more than two items or by swapping as necessary if there are two. If one, do nothing.
           if (leftR - l > 1) then
               msrt(l, leftR)
           else if (leftR > l) then
               set lv to item l of my lst
               set rv to item leftR of my lst
               if (comparer's isGreater(lv, rv)) then
                   set item l of my lst to rv
                   set item leftR of my lst to lv
                   slave's swap(l, leftR)
               end if
           end if
           -- Sort the right half similarly.
           if (r - rx > 1) then
               msrt(rx, r)
           else if (r > rx) then
               set lv to item rx of my lst
               set rv to item r of my lst
               if (comparer's isGreater(lv, rv)) then
                   set item rx of my lst to rv
                   set item r of my lst to lv
                   slave's swap(rx, r)
               end if
           end if
           
           -- Extract a separate list of the left half's items and set tracking and end indices for it.
           set auxList to items l thru leftR of my lst
           slave's extract(l, leftR)
           set lx to 1
           set leftR to rx - l
           
           -- Iterate through the range, merging the two halves by comparing the lowest unassigned value from auxList with that from the right half of the range and assigning the lower of the two (or the left if they're equal) to the current slot.
           
           -- Begin with the first (lowest) value from each half.
           set lv to beginning of my auxList
           set rv to item rx of my lst
           repeat with dx from l to r
               if (comparer's isGreater(lv, rv)) then
                   -- The right value's less than the left. Assign it to this slot.
                   set item dx of my lst to rv
                   slave's reposition(rx, dx)
                   -- If no more right-half values, assign the remaining left values to the remaining slots and exit this repeat and recursion level.
                   if (rx is r) then
                       repeat with dx from (dx + 1) to r
                           set item dx of my lst to item lx of my auxList
                           slave's merge(lx, dx)
                           set lx to lx + 1
                       end repeat
                       exit repeat
                   end if
                   -- Otherwise get the next right-half value.
                   set rx to rx + 1
                   set rv to item rx of my lst
               else
                   -- The left value's less than or equal to the right.
                   set item dx of my lst to lv
                   slave's merge(lx, dx)
                   -- If no more left-half values, simply exit this repeat and recursion level as the remaining right values are in place anyway.
                   if (lx is leftR) then exit repeat
                   -- Otherwise get the next left-half value
                   set lx to lx + 1
                   set lv to item lx of my auxList
               end if
           end repeat
       end msrt
       
       -- Default comparison and slave handlers for an ordinary sort.
       on isGreater(a, b)
           (a > b)
       end isGreater
       
       on extract(a, b)
       end extract
       
       on reposition(a, b)
       end reposition
       
       on merge(a, b)
       end merge
       
       on swap(a, b)
       end swap
   end script
   
   -- Process the input parameters.
   set listLen to (count theList)
   if (listLen > 1) then
       -- Negative and/or transposed range indices.
       if (l < 0) then set l to listLen + l + 1
       if (r < 0) then set r to listLen + r + 1
       if (l > r) then set {l, r} to {r, l}
       
       -- Supplied or default customisation scripts.
       if (customiser's class is record) then set {comparer:o's comparer, slave:o's slave} to (customiser & {comparer:o, slave:o})
       
       -- Do the sort.
       o's msrt(l, r)
   end if
   
   return -- nothing
end CustomMergeSort

property sort : CustomMergeSort

-- (* Contrived demo:
-- Reverse sort a list of random integers, sorting a list of matching strings in parallel with it.

script backwards
   -- Returns the opposite of the truth, for a reversed sort.
   on isGreater(a, b)
       (a < b) -- (a ≤ b) to reverse the stability too!
   end isGreater
end script

script parallel
   -- Duplicates moves from a merge sort in its own list.
   property lst : missing value
   property auxList : missing value
   
   on extract(a, b)
       set auxList to items a thru b of my lst
   end extract
   
   on reposition(a, b)
       set item b of my lst to item a of my lst
   end reposition
   
   on merge(a, b)
       set item b of my lst to item a of my auxList
   end merge
   
   on swap(a, b)
       tell item a of my lst
           set item a of my lst to item b of my lst
           set item b of my lst to it
       end tell
   end swap
end script

set l to {}
set l2 to {}
repeat 1000 times
   set end of my l to (random number 1000)
   set end of my l2 to result as text
end repeat
set parallel's lst to l2

sort(l, 1, -1, {comparer:backwards, slave:parallel})
l2
-- *)

Like CustomQsort() (a combined QuickSort/insertion sort by Arthur Knapp and myself), CustomMergeSort() takes a fourth, customisation parameter. Originally, this was a script object containing both comparison and "slave" handlers (which I'll explain in a moment). But the new version above is more flexible. Its customisation parameter is a record with 'comparer' and 'slave' properties which separately specify script objects for the comparison and slave actions. The two kinds of action can thus be in the same or different script objects, allowing a more mix-and-match approach. Furthermore, both script objects are now optional, the sort having internal defaults to use when they're omitted.

Comparisons: Instead of comparing the list items itself, CustomMergeSort() passes each pair to an isGreater() handler in the 'comparer' script object. This handler is expected to return a boolean indicating whether or not the value of the first item is "greater" than that of the second by the criteria the handler's designed to employ. For instance, if the items are lists, the handler may be written to see if item x of list a is greater than item x of list b. Or if list a is longer than list b. Or if list a contains a certain kind of value and list b doesn't. etc. If the items are records, the values of certain properties may need to be compared, eg.:

Applescript:

script compareRecordsByAardvark
   on isGreater(a, b)
       return (a's aardvark > b's aardvark)
   end isGreater
end script

Slave actions: Every time the sort moves items in the list, it sends information about the moves to relevant handlers in the 'slave' script object. The slave handler(s) will normally either reproduce the moves in another list — thus sorting it in parallel with the first — or simply be blank and do nothing at all. Unfortunately, you have to know how the sort moves things around to be able to write functional slave handlers yourself. The merge sorts here have three kinds of action and CustomMergeSort() expects to be able to call extract(), merge(), and swap() handlers in the 'slave' script object. To sort another list in parallel with the first, a "slave" script object might look like this:

Applescript:

script parallel
   property lst : missing value -- Set this property to the list to be sorted in parallel with the main list.
   property auxList : missing value
   
   on extract(a, b)
       -- The range a thru b of the main list (the "left half" for a merge) has been extracted for merging back into it. Do the same in respect of the slave list.
       set auxList to items a thru b of my lst
   end extract
   
   on reposition(a, b)
       -- Item b of the main list has been set to item a of the main list.
       set item b of my lst to item a of my lst
   end reposition
   
   on merge(a, b)
       -- Item b of the main list has been set to item a of the extracted left half.
       set item b of my lst to item a of my auxList
   end merge
   
   on swap(a, b)
       -- Items a and b of the main list have been swapped over.
       tell item a of my lst
           set item a of my lst to item b of my lst
           set item b of my lst to it
       end tell
   end swap
end script

Here are some example calls using the above script objects:

Applescript:

-- Sort items 1 thru -1 of a list of records by their 'aardvark' properties, simultaneously sorting another list in parallel.
-- The "slave" (parallel) list must have at least as many items as the main list.
set sortInParallel's lst to myOtherList
CustomMergeSort(myListOfRecords, 1, -1, {comparer:compareRecordsByAardvark, slave:parallel})

-- Sort the list of records, but without a parallel sort.
CustomMergeSort(myListOfRecords, 1, -1, {comparer:compareRecordsByAardvark})

-- Sort a list values which don't need custom comparisons, simultaneously sorting another list in parallel.
set sortInParallel's slaveList to myOtherList
CustomMergeSort(myListOfDates, 1, -1, {slave:parallel})

-- Do a non-custom sort! ie. compare values directly and perform no slave action.
CustomMergeSort(myListOfDates, 1, -1, {})

McUsr pointed out above that Merge Sort is a "stable" sort:

Example: If you sort files by names first, and the sorts them by date afterwards, the files with an equal date will still be sorted by name.


For my own use, I sometimes need to sort the rows of a spreadsheet on column F, subsorting on columns B, C, and E in turn. (In the script, the "rows" are a list of lists whose items are the cell values at the intersections with the columns.)

I use CustomMergeSort() for this. But instead of sorting on one column at a time and exploiting merge sort's stability to keep the rest in relative order, I sort on all four columns at once using an isGreater() handler which compares items 6, 2, 3, and 5 of the rows in that order each time it's called. It only needs to work through the column sequence until the two rows are found to have different values in one of the columns, and the entire process is accomplished with one sort instead of four. It's analogous to the way strings are compared when they're sorted.

Here's a demo of sorting on columns. You have to supply your own list of lists and your own path to the CustomMergeSort script.

Applescript:

-- Sort rows from a spreadsheet range, sorting and subsorting on four of its columns.
-- This code only demos the use of the custom sort.
-- It uses CustomMergeSort, but the sortOnCols script object works with any of my custom sorts.

-- This handler controls the sort.
on sortOnColumns(theRows, sortColumns)
   script sortOnCols
       property rows : theRows
       property sortCols : sortColumns
       property sortColumnCount : (count sortColumns)
       
       -- Compare the values at the given sort columns in row list a with those in row list b.
       on isGreater(a, b)
           script p
               property row_a : a
               property row_b : b
           end script
           
           -- Repeat with each of the sort columns, in the given order.
           repeat with i from 1 to sortColumnCount
               set col to item i of o's sortCols
               -- Get the cell values from the intersection of this sort column with these two rows.
               set val_a to p's row_a's item col
               set val_b to p's row_b's item col
               if (val_a = val_b) then
                   -- Keep going if the cell values are equal.
               else
                   -- Otherwise return whether or not cell value a is greater cell value b.
                   return (val_a > val_b)
               end if
           end repeat
           
           return false -- a's sort-column values are all the same as b's, so a's not greater than b.
       end isGreater
   end script
   
   -- Assuming one's CustomMergeSort() script is kept in the following location, load it and do the sort.
   set sorter to (load script file ((path to scripts folder from user domain as Unicode text) & "Libraries:Sorts:Custom Merge Sort.scpt"))
   
   sorter's CustomMergeSort(theRows, 1, -1, {comparer:sortOnCols})
end sortOnColumns

set theRows to aListOfListsExtractedFromNumbers
set sortColumns to {6, 2, 3, 5} -- Columns F, B, C, & E.

sortOnColumns(theRows, sortColumns)

Last edited by Nigel Garvey (2015-06-14 02:25:40 pm)


NG

Offline

 

#7 2010-07-26 02:27:23 am

McUsr
Member
From:: Southern Norway
Registered: 2010-04-07
Posts: 1776

Re: A merge sort

Nice Mr.G!.

I shall really try this smile I have to create a database first. But that is underway in another context. You sure have your hands with sort algorithms also. smile

It is really nice to see practical implementations over the text book's theoretical versions. -Without having to make them myself and knowing that every thing is in order. big_smile

I think merge sort is really applicable when the the data is suspected to be a retrieved in a worst case order for instance from a make shift database under a web server or such, where all the incoming data is in fairly random order.

What concerns me with merge sort is the alleged usage of stack space. I'm not sure wether this applies to your implementation or not, since you save some list handling. Which Apple Script is a little bit cheap on. I'll come back to you on this in not too long time I hope.

Last edited by McUsr (2010-07-26 02:36:49 am)


Mercurial vcs is a joy to use for scripting.


Filed under: merge sort

Offline

 

#8 2010-07-27 08:10:05 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

I wrote:

I don't know why CustomMergeSort()'s faster the CustomQsort() with the single-sort method. I'll have to look into that some other time.


It seems that, while qSort() and CustomQsort() have to do considerably less moving around than mergeSort() and CustomMergeSort() to get a list in order, they actually perform quite a few more comparisons of the items. (The actual numbers depend, of course, on the length of the list and what's in it.) The greater amount of work involved in custom comparisons therefore affects CustomQsort() more than it does CustomMergeSort() and is enough to swing the speed advantage in the latter's favour.

Later on, I'll bung parallel sorting into the mix and see how things stand in that respect….


NG

Offline

 

#9 2010-07-27 08:49:10 am

McUsr
Member
From:: Southern Norway
Registered: 2010-04-07
Posts: 1776

Re: A merge sort

That will be interesting to see.

Nigel Garvey wrote:

It seems that, while qSort() and CustomQsort() have to do considerably less moving around than mergeSort() and CustomMergeSort() to get a list in order, they actually perform quite a few more comparisons of the items.


But won't customMergeSort still move a lot more data around? (This is right from the text book, I haven't actually looked at how you have implemented your version compared to the text book version of Merge Sort.)
If it is, it will require far more stack space if that is ever to be an issue.

Parallel sorting is something I'm looking forward to see. smile

I will soon work on some tree structures, which enables me to reach the parent node from a child. Hopefully I get a way with a doubly linked list. I have implemented such a structure fairly well in java, and will now convert it, which should be really easy. I believe I have implemented a balanced tree as well, but that may just be a wish. I have to have a look in my tree. smile

I want to present Script libraries in the dot language, and render it with  graphwiz (But the renderings will not be more complex than the object diagrams in ScriptDebugger, they will be simpler as a matter of fact.

Last edited by McUsr (2010-07-27 08:53:01 am)


Mercurial vcs is a joy to use for scripting.


Filed under: Algorithms

Offline

 

#10 2013-10-30 07:12:26 am

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2724
Website

Re: A merge sort

Nigel Garvey wrote:

It's not quite as fast as qsort, but it's not bad.


Maybe a late response but for the record: qsort is slower than mergesort. The name is confusing and by most considered therefore the fastest sort. The perfect (fastest) sort in qsort is equally fast as mergesort, the worse case sort in qsort is half as fast as the perfect sort meaning it's about 39% slower mergesort. Because mergesort is stable, unlike qsort, the sort speed is consistent. Because mavericks has the new scirpt library, I was looking in my "old" library of handlers and found this:

Applescript:


mergeSort({9, 4, 2, 7, 5, 8, 6, 1, 3})
mergeSort({"mouse", "dog", "cat", "bird", "snake", "fish"})

on mergeSort(lst)
   set sortedList to {}
   if (count lst) < 2 then return lst
   set m to (count lst) div 2
   set l to mergeSort(items 1 thru m of lst)
   set r to mergeSort(items (m + 1) thru -1 of lst)
   repeat until ((count l) = 0 and (count r) = 0)
       if (count l) ≠ 0 and ((count r) = 0 or (item 1 of l) < (item 1 of r)) then
           set end of sortedList to item 1 of l
           set l to rest of l
       else
           set end of sortedList to item 1 of r
           set r to rest of r
       end if
   end repeat
   return sortedList
end mergeSort

In general mergesort performs better for linked lists instead of linear arrays. The problem for linear arrays is that mergesort needs to allocate sub arrays or as McUsr wrote "move a lot more data around". But for linked lists, a list designed to insert and remove data quickly no matter how big the list is, mergesort comes really to it's right because there is no movement of data, only pointer values. AppleScript uses linked lists but unfortunately they're presented and accessible only as linear arrays. This means that in my example code I'm forced to create new sub lists and return a new sorted list. Still it's more than 2 times faster than a qsort.

Last edited by DJ Bazzie Wazzie (2013-10-30 07:15:10 am)

Offline

 

#11 2013-10-30 02:53:05 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A merge sort

Hello.

I can't get this out of my mind, so I have to reply! smile

First of all, I can't argue that mergesort will run out of stack space, because the limit of AppleScript lists are 2^14 = ca. 16384 elements, which will be a boundary long before we run out of stack frames.

But you could improve the quicksort if you calculate the median, so that who's best is really a question of the data. This is a big discussion in between users of the median of medians algorithm at the moment by the way.

(I believe Nigel calculates the median for quicksort in the thread with Arthur Knapp.)

But… twice the speed and no work, makes for an easy choice. Personally I'd choose it for the stable properties.

Thanks for sharing, together with Nigel's CustomSort my dose of sorts is almost complete. smile

Last edited by McUsrII (2013-10-30 08:33:01 pm)

Offline

 

#12 2013-10-30 06:10:46 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5175

Re: A merge sort

McUsrII wrote:

But… twice the speed and no work, makes for an easy choice.


I agree:

Applescript:

on sortAList:aList
   set anArray to current application's NSArray's arrayWithArray:aList
   return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

wink

(Actually, the speed difference is considerably greater than that.)

It's a pity that stable sorting isn't directly accessible from AppleScriptObjC.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/

Offline

 

#13 2013-10-30 08:24:14 pm

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

DJ Bazzie Wazzie wrote:
Nigel Garvey wrote:

It's not quite as fast as qsort, but it's not bad.


Maybe a late response but for the record: qsort is slower than mergesort.


Hi DJ.

The qsort to which I was referring was the one I’d previously mentioned as having been written by Arthur Knapp and myself, which is a hybrid Quicksort/insertion sort with a single sweep of the insertion sort replacing the lower recursion levels of the Quicksort. While an insertion sort isn’t particularly efficient in itself, it’s absolutely brilliant when there’s not much sorting left to do — such as after an incomplete Quicksort — because then it doesn’t do much! The lower levels of a Quicksort, on the other hand, have to keep “dividing and conquering” ever smaller and more numerous subranges with very little actual sorting taking place by then. The hybrid is thus generally faster than a straight Quicksort and is faster with integers than my merge sort. Other methods which have been suggested for improving Quicksort involve using the best pivot value at each level without taking too much time to find out what that is, and/or by grouping all the values which are equal to the pivot so that they can all be eliminated together from the rest of the sort.

While it’s possible to talk theoretically about the comparative speeds of sorts using “O”s and "worse-case scenarii", that’s not enough in a high-level language like AppleScript. The way algorithms are implemented in the language of choice also has a tremendous effect on what is actually fast, as evidenced by the fact that my merge sort in post #1 sorts a list of 10,000 random integers in around 0.85 seconds (on my machine) whereas yours takes around 41.8 seconds. (The qsort just mentioned takes 0.66 seconds, a Shell sort implementation of mine takes 1.28 seconds, and my ternary heap sort takes 1.11 seconds.) There are also factors such as how long it takes to compare particular types of items in the language (strings usually take longer than integers, for example) and how long it takes to move them, which could influence whether an algorithm with fewer moves is faster or one with fewer comparisons. Then there's the number of items to be sorted, the number of different values in relation to the number of items, how much sorting needs to be done, and so on. You can’t just say “Merge sort is the fastest sort. Here is one.”

However, at 0.55 seconds for 10,000 random integers, the fastest vanilla sort in my collection is my as yet unpublished non-recursive ternary merge sort, just beating its recursive ternary sibling by a few hundredths of a second — usually.  smile

OT: I note that having run these tests, the AppleScript Editor in Mavericks is claiming not to be able to revert my scripts to their original saved state. It’s bad enough having to revert them in the first place, but this is the pits. However, the files’ modification dates haven’t changed and the scripts seem to open in their original state.  hmm

Last edited by Nigel Garvey (2013-10-30 08:29:00 pm)


NG

Offline

 

#14 2013-10-30 08:44:47 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5175

Re: A merge sort

Nigel Garvey wrote:

However, at 0.55 seconds for 10,000 random integers, the fastest vanilla sort in my collection is my as yet unpublished non-recursive ternary merge sort, just beating its recursive ternary sibling by a few hundredths of a second — usually.


Now that you're on Mavericks, perhaps you can run the ASObjC version for comparison.  I don't know how our Macs compare, but I just ran it with a simple list created by:

Applescript:

set list1 to {}
repeat 10000 times
   set end of list1 to random number from 1 to 10000
end repeat

And I'm getting consistent times around 0.034 seconds.

(And most of that time is spent converting from AS list to array and back. it looks like the actual sorting takes in the order of 0.007 seconds.)


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/

Offline

 

#15 2013-10-31 02:42:26 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

Shane Stanley wrote:

Now that you're on Mavericks, perhaps you can run the ASObjC version for comparison.


Sure. How do I do that?


NG

Offline

 

#16 2013-10-31 02:53:57 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5175

Re: A merge sort

Make a new document in ASE and add this:

Applescript:

on sortAList:aList
   set anArray to current application's NSArray's arrayWithArray:aList
   return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

Save it to ~/Library/Script Libraries/ (you'll have to make that folder) as a .scptd script bundle.

Once it's saved, the Bundle Contents button in the toolbar will be enabled. click, and in the drawer that appears, click on the AppleScript/Objective-C Library checkbox. Save again.

Then open new window and type:

Applescript:

use theLib : script "<name of first file>"
set list1 to {}
repeat 10000 times
set end of list1 to random number from 1 to 10000
end repeat
theLib's sortAList:list1

Add you timer of choice and run.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/

Offline

 

#17 2013-10-31 03:22:42 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

'random number' doesn't compile while the 'use' command's there.


NG

Offline

 

#18 2013-10-31 03:44:32 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5175

Re: A merge sort

Once you use a use(!), you need to include "use scripting additions" if you want to use them (which in reality means you do it always).

(sorry, I wrote the last email while being called to dinner...)

Last edited by Shane Stanley (2013-10-31 03:48:02 am)


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/

Offline

 

#19 2013-10-31 04:16:38 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

OK.

1.194 seconds on the first run, 0.081-ish subsequently.


NG

Offline

 

#20 2013-10-31 05:26:01 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5175

Re: A merge sort

There's probably a bit of loading done for the first run, but settles down to be pretty consistent. I believe it uses a form of quicksort -- or it did. It's hard to be sure because NSArray is a cluster of classes, so the code used can vary depending on things like the number of items. It's definitely unfair competition sad

If you modify the script like this:

Applescript:

on sortAList:aList
set anArray to current application's NSArray's arrayWithArray:aList
anArray's sortedArrayUsingSelector:"compare:"
anArray's sortedArrayUsingSelector:"compare:"
... <say ten extra sorts>
return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

you can calculate the extra time taken, and work out how long the actual sorting takes, and how much of the time is spent going from AppleScript to Cocoa and back.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/

Offline

 

#21 2013-10-31 05:44:52 am

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2724
Website

Re: A merge sort

@Nigel: I replied to your statement about qsort and mergesort and I replied to that in a more general way and not to your custom mergesort and qsort. Then I added that I had an true mergesort handler and compared it with a true qsort handler and unlike normal 39% average slower qsort, in AS the qsort is much slower. Again, no harm against your custom sorts but since they're called qsort and mergesort like normal sort mechanisms I think it was useful to reply to the real qsort and mergesort mechanisms because it was lacking in the only topic on MacScripter about mergesort.

WikiPedia wrote:

Conceptually, a merge sort works as follows:

    - Divide the unsorted list into n sublists, each containing 1 element (a list of 1 element is considered sorted).
    - Repeatedly merge sublists to produce new sorted sublists until there is only 1 sublist remaining. This will be the sorted list.


So I created a sorted list as described in the concept above, a true mergesort IMO.

p.s. Sorting a list of 10,000 unique integers took only 2 seconds on my machine, qsort took around 10 seconds.

Offline

 

#22 2013-10-31 06:22:12 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

In July 2010, McUsr wrote:

But won't customMergeSort still move a lot more data around? (This is right from the text book, I haven't actually looked at how you have implemented your version compared to the text book version of Merge Sort.)


I see I didn’t reply to this at the time.

It’s not so much the data which are moved, but the pointers in the lists. And there are different approaches to that. The text book merge sort creates three new arrays at each recursion: two containing the items from each half of the array passed down from above, which are themselves then passed down to lower levels, and a third which is cobbled together from two received back from the lower levels and then passed up to the level above. It’s very expensive in terms of array creation.

In my merge sorts, only the indices of ranges within the original list are passed down. Instead of receiving back sorted lists, each recursion receives a situation whereby those ranges in the original list have been sorted. It then creates one new list covering both ranges and merges the items from each half of that back into the original. There are thus far fewer temporary lists generated (though still a lot!) and the result’s an in-place sort of the original list. There’s the additional bonus in the merge stages that if all the left items are used up, there’s no need to bother with the remaining right items as they’re already in place in the range from which they were taken.


NG

Offline

 

#23 2013-10-31 06:44:17 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4420

Re: A merge sort

DJ Bazzie Wazzie wrote:

… but since they're called qsort and mergesort like normal sort mechanisms I think it was useful to reply to the real qsort and mergesort mechanisms …


Hi DJ.

qsort is perhaps a little misleadingly named, since it's a hybrid. On the other hand, it isn't actually called "Quicksort".  wink

The merge sort, however, is algorithmically a true merge sort. It's just an in-place implementation.


NG

Offline

 

#24 2013-10-31 12:37:12 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A merge sort

Nigel Garvey wrote:
DJ Bazzie Wazzie wrote:

… but since they're called qsort and mergesort like normal sort mechanisms I think it was useful to reply to the real qsort and mergesort mechanisms …


Hi DJ.

qsort is perhaps a little misleadingly named, since it's a hybrid. On the other hand, it isn't actually called "Quicksort".  wink

The merge sort, however, is algorithmically a true merge sort. It's just an in-place implementation.


Algorithms and Datastructures N. Wirth wrote:

Its performance is so spectacular that its inventor, C.A.R. Hoare, called it Quicksort [2.5 and 2.6].


I can assure you it is quicksort Wirth was talking about.

By the way, thanks for your belated reply. Well, since it is is christmas today, I'll tell you how I calculated that mergesort never will run out of stackspace:

I presume AppleScript crashes after 100 recursions, and that mergeseort splits its lists in two.

Then we have the inequality log2X>100, solving that by raising everything up to e, we are left with x*2 = e^100.
and  (e^100)/2 is a number that surpasses 16340 quite a few times. smile

Well, I'll go back to the AS testing framework, while I am riding the waves…

And by the way, I think your CustomSort to be the most technical excellent handler I have seen in AppleScript, and most useful too, when you need it! wink

Last edited by McUsrII (2013-10-31 12:43:15 pm)

Offline

 

#25 2013-11-24 02:44:59 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: A merge sort

Hello.

Nigel's script in post #6 here is really pivotal!

Thing is, that a stable sort handler, can solve a slew of problems, concerning structuring data, from say a graphics program, or any thing else for that matter, that is not made with structure in mind.

As such, a stable sort algorithm, and one that works well, is totally indispensable, the moment you need one. There is really no substitute for it.

Edit

You can of course write as complex a comparator as you want, to "emulate" the stable sort. But what you can't do, is preparating the data (fields) inside the comparator, not in every case anyway. But that is something that you easily can do between runs of the merge sort. So it is a really "low-treshold" tool for turning possible nightmares into a breeze.

(I'd rather call it an instrument, than a tool.)

Last edited by McUsrII (2013-11-24 02:53:07 pm)

Offline

 

Board footer

Powered by FluxBB

RSS (new topics) RSS (active topics)