A merge sort

Hello.

I can’t get this out of my mind, so I have to reply! :slight_smile:

First of all, I can’t argue that mergesort will run out of stack space, because the limit on AppleScript lists is 2^14 = 16,384 elements, which will be a boundary long before we run out of stack frames.

But you could improve the quicksort by calculating the median, so that which is best really becomes a question of the data. This is a big discussion among users of the median-of-medians algorithm at the moment, by the way.

(I believe Nigel calculates the median for quicksort in the thread with Arthur Knapp.)

But twice the speed for no work makes for an easy choice. Personally, I’d choose it for its stability.

Thanks for sharing, together with Nigel’s CustomSort my dose of sorts is almost complete. :slight_smile:

I agree:

on sortAList:aList
	set anArray to current application's NSArray's arrayWithArray:aList
	return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

:wink:

(Actually, the speed difference is considerably greater than that.)

It’s a pity that stable sorting isn’t directly accessible from AppleScriptObjC.

Hi DJ.

The qsort to which I was referring was the one I’d previously mentioned as having been written by Arthur Knapp and myself, which is a hybrid Quicksort/insertion sort with a single sweep of the insertion sort replacing the lower recursion levels of the Quicksort. While an insertion sort isn’t particularly efficient in itself, it’s absolutely brilliant when there’s not much sorting left to do, such as after an incomplete Quicksort, because then it doesn’t do much! The lower levels of a Quicksort, on the other hand, have to keep “dividing and conquering” ever smaller and more numerous subranges with very little actual sorting taking place by then. The hybrid is thus generally faster than a straight Quicksort and is faster with integers than my merge sort. Other methods which have been suggested for improving Quicksort involve using the best pivot value at each level without taking too much time to find out what that is, and/or grouping all the values which are equal to the pivot so that they can all be eliminated together from the rest of the sort.
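To make the hybrid concrete, here is a sketch in Python (used purely for illustration; this is the general shape of a Quicksort/insertion-sort hybrid, not the actual qsort handler, and the threshold value is an arbitrary assumption):

```python
THRESHOLD = 10  # assumed cut-off; subranges smaller than this are left unsorted

def partial_quicksort(a, lo, hi):
    """Quicksort that abandons subranges once they shrink below THRESHOLD."""
    while hi - lo + 1 > THRESHOLD:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
        # recurse into the smaller side, loop on the larger, to bound stack depth
        if j - lo < hi - i:
            partial_quicksort(a, lo, j)
            lo = i
        else:
            partial_quicksort(a, i, hi)
            hi = j

def hybrid_sort(a):
    """Incomplete quicksort, then one insertion-sort sweep to finish."""
    if a:
        partial_quicksort(a, 0, len(a) - 1)
    for k in range(1, len(a)):   # every item is already near its final slot,
        v, j = a[k], k - 1       # so this sweep does very little shifting
        while j >= 0 and a[j] > v:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v
```

The point of the single final sweep is that insertion sort is close to linear on nearly sorted input, which is exactly what the truncated Quicksort leaves behind.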

While it’s possible to talk theoretically about the comparative speeds of sorts using "O"s and “worst-case scenarios”, that’s not enough in a high-level language like AppleScript. The way algorithms are implemented in the language of choice also has a tremendous effect on what is actually fast, as evidenced by the fact that my merge sort in post #1 sorts a list of 10,000 random integers in around 0.85 seconds (on my machine) whereas yours takes around 41.8 seconds. (The qsort just mentioned takes 0.66 seconds, a Shell sort implementation of mine takes 1.28 seconds, and my ternary heap sort takes 1.11 seconds.) There are also factors such as how long it takes to compare particular types of items in the language (strings usually take longer than integers, for example) and how long it takes to move them, which could influence whether an algorithm with fewer moves is faster or one with fewer comparisons. Then there’s the number of items to be sorted, the number of different values in relation to the number of items, how much sorting needs to be done, and so on. You can’t just say “Merge sort is the fastest sort. Here is one.”

However, at 0.55 seconds for 10,000 random integers, the fastest vanilla sort in my collection is my as yet unpublished non-recursive ternary merge sort, just beating its recursive ternary sibling by a few hundredths of a second, usually. :slight_smile:

OT: I note that having run these tests, the AppleScript Editor in Mavericks is claiming not to be able to revert my scripts to their original saved state. It’s bad enough having to revert them in the first place, but this is the pits. However, the files’ modification dates haven’t changed and the scripts seem to open in their original state. :confused:

Now that you’re on Mavericks, perhaps you can run the ASObjC version for comparison. I don’t know how our Macs compare, but I just ran it with a simple list created by:

set list1 to {}
repeat 10000 times
	set end of list1 to random number from 1 to 10000
end repeat

And I’m getting consistent times around 0.034 seconds.

(And most of that time is spent converting from AS list to array and back. It looks like the actual sorting takes on the order of 0.007 seconds.)

Sure. How do I do that?

Make a new document in ASE and add this:

on sortAList:aList
	set anArray to current application's NSArray's arrayWithArray:aList
	return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

Save it to ~/Library/Script Libraries/ (you’ll have to make that folder) as a .scptd script bundle.

Once it’s saved, the Bundle Contents button in the toolbar will be enabled. Click it, and in the drawer that appears, click the AppleScript/Objective-C Library checkbox. Save again.

Then open a new window and type:

use theLib : script "<name of first file>" 
set list1 to {}
repeat 10000 times
   set end of list1 to random number from 1 to 10000
end repeat
theLib's sortAList:list1

Add your timer of choice and run.

‘random number’ doesn’t compile while the ‘use’ command’s there.

Once you use a use(!), you need to include “use scripting additions” if you want to use them (which in practice means you always do).

(sorry, I wrote the last email while being called to dinner…)

OK.

1.194 seconds on the first run, 0.081-ish subsequently.

There’s probably a bit of loading done for the first run, but settles down to be pretty consistent. I believe it uses a form of quicksort – or it did. It’s hard to be sure because NSArray is a cluster of classes, so the code used can vary depending on things like the number of items. It’s definitely unfair competition :frowning:

If you modify the script like this:

on sortAList:aList
   set anArray to current application's NSArray's arrayWithArray:aList
   anArray's sortedArrayUsingSelector:"compare:"
   anArray's sortedArrayUsingSelector:"compare:"
   ... <say ten extra sorts>
   return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

you can calculate the extra time taken, and work out how long the actual sorting takes, and how much of the time is spent going from AppleScript to Cocoa and back.
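The differencing idea generalizes beyond ASObjC. Here is a stand-in sketch in Python using sorted() (an assumption of this illustration; the real measurement would time the ASObjC handler): the marginal cost of one sort falls out by subtracting the two timings, and whatever remains of the first timing is the fixed overhead.

```python
import random
import time

def timed(f):
    """Wall-clock one call of f."""
    t0 = time.perf_counter()
    f()
    return time.perf_counter() - t0

data = [random.randint(1, 10000) for _ in range(10000)]

EXTRA = 10
# one "single sort" run vs. one with EXTRA additional sorts of the same data
t_one = timed(lambda: sorted(data))
t_many = timed(lambda: [sorted(data) for _ in range(1 + EXTRA)])

per_sort = (t_many - t_one) / EXTRA   # marginal cost of one sort
overhead = t_one - per_sort           # everything that isn't sorting
```

In the ASObjC case the overhead term is dominated by the AppleScript-list-to-NSArray conversion and back.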

@Nigel: I replied to your statement about qsort and mergesort in a more general way, not to your custom mergesort and qsort. Then I added that I had a true mergesort handler and compared it with a true qsort handler: whereas qsort is normally only about 39% slower on average, in AS the qsort is much slower. Again, no harm meant towards your custom sorts, but since they’re called qsort and mergesort like the standard sort mechanisms, I thought it was useful to reply about the real qsort and mergesort mechanisms, which was lacking in the only topic on MacScripter about mergesort.

So I created a sorted list as described in the concept above, a true mergesort IMO.

p.s. Sorting a list of 10,000 unique integers took only 2 seconds on my machine, qsort took around 10 seconds.

I see I didn’t reply to this at the time.

It’s not so much the data which are moved, but the pointers in the lists. And there are different approaches to that. The text book merge sort creates three new arrays at each recursion: two containing the items from each half of the array passed down from above, which are themselves then passed down to lower levels, and a third which is cobbled together from two received back from the lower levels and then passed up to the level above. It’s very expensive in terms of array creation.

In my merge sorts, only the indices of ranges within the original list are passed down. Instead of receiving back sorted lists, each recursion receives a situation whereby those ranges in the original list have been sorted. It then creates one new list covering both ranges and merges the items from each half of that back into the original. There are thus far fewer temporary lists generated (though still a lot!) and the result’s an in-place sort of the original list. There’s the additional bonus in the merge stages that if all the left items are used up, there’s no need to bother with the remaining right items as they’re already in place in the range from which they were taken.
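The index-range scheme described above can be sketched in Python (illustration only; this shows the shape of the approach, not the actual AppleScript handlers). The recursion passes only indices into the original list, each merge builds a single temporary list covering both halves, and the merge stops as soon as the left items are used up:

```python
def merge_sort_inplace(a):
    """In-place merge sort: recursion passes index ranges within a,
    and each merge builds one temporary list covering both halves."""

    def merge(lo, mid, hi):
        temp = a[lo:hi + 1]          # the single new list for this merge
        left_end = mid - lo + 1      # temp[:left_end] is the left half
        i, j, k = 0, left_end, lo
        while i < left_end:
            if j > hi - lo or temp[i] <= temp[j]:  # <= keeps the sort stable
                a[k] = temp[i]
                i += 1
            else:
                a[k] = temp[j]
                j += 1
            k += 1
        # once the left items are used up, the remaining right items are
        # already sitting in a[k..hi], so there is nothing left to do

    def msort(lo, hi):
        if lo < hi:
            mid = (lo + hi) // 2
            msort(lo, mid)
            msort(mid + 1, hi)
            merge(lo, mid, hi)

    msort(0, len(a) - 1)
```

The early exit works because the unmerged right-half items were copied from exactly the positions they still occupy in the original list.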

Hi DJ.

qsort is perhaps a little misleadingly named, since it’s a hybrid. On the other hand, it isn’t actually called “Quicksort”. :wink:

The merge sort, however, is algorithmically a true merge sort. It’s just an in-place implementation.

I can assure you it is quicksort Wirth was talking about.

By the way, thanks for your belated reply. Well, since it is Christmas today, I’ll tell you how I calculated that mergesort will never run out of stack space:

I presume AppleScript crashes after 100 recursions, and that mergesort splits its lists in two.

Then the recursion depth exceeds 100 only when log2(x) > 100, which solved for x gives x > 2^100,
and 2^100 surpasses 16,384 by an astronomical margin. :slight_smile:

Well, I’ll go back to the AS testing framework, while I am riding the waves.

And by the way, I think your CustomSort is the most technically excellent handler I have seen in AppleScript, and the most useful too, when you need it! :wink:

Hello.

Nigel’s script in post #6 here is really pivotal!

The thing is that a stable sort handler can solve a slew of problems concerning structuring data from, say, a graphics program, or anything else for that matter that is not made with structure in mind.

As such, a stable sort algorithm, and one that works well, is totally indispensable the moment you need one. There is really no substitute for it.

Edit

You can of course write as complex a comparator as you want to “emulate” the stable sort. But what you can’t do is prepare the data (fields) inside the comparator, not in every case anyway. That is something you can easily do between runs of the merge sort, though. So it is a really “low-threshold” tool for turning possible nightmares into a breeze.

(I’d rather call it an instrument, than a tool.)
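The multi-pass trick that stability enables can be sketched in Python, whose built-in sorted is itself a stable merge-sort variant (the record data here is invented for illustration): sort on the least significant key first, then on the primary key, and stability preserves the earlier ordering within ties.

```python
# hypothetical records: (name, department, salary)
people = [
    ("ada", "eng", 120), ("bob", "ops", 90),
    ("cyd", "eng", 110), ("dee", "ops", 90),
]

# pass 1: sort on the secondary key
by_name = sorted(people, key=lambda r: r[0])
# (between passes you could preprocess or rewrite fields here)

# pass 2: sort on the primary key; stability keeps the name order within ties
by_dept_then_name = sorted(by_name, key=lambda r: r[1])
```

An unstable sort would scramble the names within each department, which is why no comparator cleverness quite substitutes for stability here.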

In view of the demise of ScriptBuilders, I’ve updated the scripts in posts #1 and #6 and rewritten the blurb in the latter.

I’ve updated the scripts in the light of a tip I read on Wikipedia a couple of days ago that it’s only necessary to extract separate lists of the left halves for the merges, not lists containing both halves. Quite a revelation eight years after I wrote the code!
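The tip can be illustrated in Python (a sketch of the general technique, not the updated scripts themselves): during a merge, only the left half needs copying out, because an unread right-half item is never overwritten before it is consumed.

```python
def merge_left_only(a, lo, mid, hi):
    """Merge sorted runs a[lo..mid] and a[mid+1..hi] in place,
    copying out only the left half."""
    left = a[lo:mid + 1]         # the only temporary list needed
    i, j, k = 0, mid + 1, lo
    while i < len(left):
        if j > hi or left[i] <= a[j]:
            a[k] = left[i]       # safe: a[k] has already been consumed,
            i += 1               # either from the left copy or from a[j]
        else:
            a[k] = a[j]
            j += 1
        k += 1
    # any leftover right-half items are already in their final positions
```

The write index k can never overtake the right-half read index j, which is what makes the single half-copy sufficient.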

That is nice. :slight_smile:

Still not been around to test it. The reason I reply here is that I have an idea for testing the average case of algorithms.

This can of course be done the hard way, by computing the complexity of the algorithm and then working with the probability of the average number of inversions in a data set.

My idea is to bluntly create a data set with an average number of inversions, an inversion being a pair of elements that is out of order with respect to the sorting order, and then time the implementation of the algorithm with this dataset.

The average number of inversions is n(n − 1)/4.
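The n(n − 1)/4 average over random permutations is easy to check empirically. A small Python sketch (the permutation size and trial count are arbitrary choices):

```python
import random
from itertools import combinations

def inversions(seq):
    """Count pairs appearing out of order: positions i < j with seq[i] > seq[j]."""
    return sum(1 for x, y in combinations(seq, 2) if x > y)

# empirical average over random permutations of n distinct elements
random.seed(1)
n, trials = 20, 2000
avg = sum(inversions(random.sample(range(n), n)) for _ in range(trials)) / trials
expected = n * (n - 1) / 4   # 95.0 for n = 20
```

A generator for test data could then shuffle until it hits roughly this inversion count, giving a reproducible "average case" input to time against.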

The whole proof of this can be found on page 431 in Rosen: “Discrete Mathematics with Applications” 6th Edition"

There is of course the math there for doing the average case computations, taking the computational complexity into account as well, if you need to compare average cases with varying algorithms and number of inputs. :slight_smile:

Edit
An easier way to figure out the average number of unsorted elements, without coming up with a formal proof: given a 50% probability per element that it is either sorted or unsorted, and by the linearity of expectation (expected values can be summed), the math should hold: the number of elements that are unsorted, with a 50% chance of “unsortedness” per element, should on average be half the number of elements.

So, in a set with N elements, there should be an average of N/2 unsorted elements, which seems quite intuitive, and which I remember having seen in writing before, now that I have computed it. :slight_smile:

Good idea, but merge sort will not be affected by how the data is shuffled, unlike other sorting algorithms.

Actually.

Not totally sure how they got n log n as a result, but if you add up the smaller terms, then you have to take the swap operations into account, which will vary with the number of operations.