A merge sort

Nigel_Garvey · October 31, 2013, 8:22am

‘random number’ doesn’t compile while the ‘use’ command’s there.

Shane_Stanley · October 31, 2013, 8:44am

Once you use a use(!), you need to include “use scripting additions” if you want to use them (which in reality means you do it always).

(sorry, I wrote the last email while being called to dinner…)

Nigel_Garvey · October 31, 2013, 9:16am

OK.

1.194 seconds on the first run, 0.081-ish subsequently.

Shane_Stanley · October 31, 2013, 10:26am

There’s probably a bit of loading done for the first run, but it settles down to be pretty consistent. I believe it uses a form of quicksort – or it did. It’s hard to be sure because NSArray is a cluster of classes, so the code used can vary depending on things like the number of items. It’s definitely unfair competition.

If you modify the script like this:

on sortAList:aList
   set anArray to current application's NSArray's arrayWithArray:aList
   anArray's sortedArrayUsingSelector:"compare:"
   anArray's sortedArrayUsingSelector:"compare:"
   -- ... (repeat the line above for, say, ten extra sorts)
   return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

you can calculate the extra time taken, and work out how long the actual sorting takes, and how much of the time is spent going from AppleScript to Cocoa and back.
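The arithmetic behind that suggestion can be sketched as follows. This is a Python illustration, not AppleScript; the function name `split_timings` and the example timings are hypothetical, and it assumes every sort takes about the same time and that the bridging overhead is paid once per call.

```python
def split_timings(t_one_sort, t_many_sorts, extra_sorts):
    """Separate the time per sort from the fixed AppleScript/Cocoa overhead.

    t_one_sort   -- seconds for the handler containing a single sort
    t_many_sorts -- seconds for the handler with `extra_sorts` additional sorts
    """
    # The extra sorts account for the whole difference between the two runs.
    per_sort = (t_many_sorts - t_one_sort) / extra_sorts
    # What is left of the single-sort run is conversion and bridging overhead.
    overhead = t_one_sort - per_sort
    return per_sort, overhead


# Hypothetical timings for illustration only, not measurements from the thread.
per_sort, overhead = split_timings(0.081, 0.131, 10)
print(f"per sort: {per_sort:.4f} s, overhead: {overhead:.4f} s")
```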

DJ_Bazzie_Wazzie · October 31, 2013, 10:44am

@Nigel: I replied to your statement about qsort and mergesort in a more general way, not to your custom mergesort and qsort. Then I added that I had a true mergesort handler and compared it with a true qsort handler and, unlike the normal case where qsort is on average 39% slower, in AS the qsort is much slower. Again, no harm meant against your custom sorts, but since they’re called qsort and mergesort like the normal sort mechanisms, I think it was useful to reply about the real qsort and mergesort mechanisms, because that was lacking in the only topic on MacScripter about mergesort.

Wikipedia:

Conceptually, a merge sort works as follows:

- Divide the unsorted list into n sublists, each containing 1 element (a list of 1 element is considered sorted).
- Repeatedly merge sublists to produce new sorted sublists until there is only 1 sublist remaining. This will be the sorted list.

So I created a sort as described in the concept above, a true mergesort IMO.
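A minimal Python sketch of the concept quoted above, for readers who want to see the two steps spelled out. It is not DJ's AppleScript handler; the names `merge` and `merge_sort_sublists` are made up for illustration.

```python
def merge(left, right):
    """Merge two sorted lists into a new sorted list (stable)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right only when strictly smaller, so equal items keep their order.
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort_sublists(items):
    """Split into one-item sublists, then merge neighbours until one list remains."""
    sublists = [[item] for item in items]
    while len(sublists) > 1:
        merged_pass = []
        for k in range(0, len(sublists), 2):
            if k + 1 < len(sublists):
                merged_pass.append(merge(sublists[k], sublists[k + 1]))
            else:
                merged_pass.append(sublists[k])  # odd one out carries over unchanged
        sublists = merged_pass
    return sublists[0] if sublists else []


print(merge_sort_sublists([5, 2, 9, 1, 5, 6]))
```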

p.s. Sorting a list of 10,000 unique integers took only 2 seconds on my machine, qsort took around 10 seconds.

Nigel_Garvey · October 31, 2013, 11:22am

I see I didn’t reply to this at the time.

It’s not so much the data which are moved, but the pointers in the lists. And there are different approaches to that. The text book merge sort creates three new arrays at each recursion: two containing the items from each half of the array passed down from above, which are themselves then passed down to lower levels, and a third which is cobbled together from two received back from the lower levels and then passed up to the level above. It’s very expensive in terms of array creation.
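To make the cost concrete, here is a rough Python sketch of the textbook scheme as described in that paragraph: two half-lists passed down and one merged list passed up at every recursion. The optional `counter` dictionary is a hypothetical addition used only to tally the lists created; for n items it comes to 3 × (n − 1).

```python
def textbook_merge_sort(items, counter=None):
    """Return a sorted list, creating three new lists at each recursion."""
    if len(items) <= 1:
        return items                      # nothing to split or merge
    middle = len(items) // 2
    left_half = items[:middle]            # new list 1, passed down
    right_half = items[middle:]           # new list 2, passed down
    sorted_left = textbook_merge_sort(left_half, counter)
    sorted_right = textbook_merge_sort(right_half, counter)
    merged = []                           # new list 3, passed up
    if counter is not None:
        counter["lists"] += 3
    i = j = 0
    while i < len(sorted_left) and j < len(sorted_right):
        if sorted_right[j] < sorted_left[i]:
            merged.append(sorted_right[j])
            j += 1
        else:
            merged.append(sorted_left[i])
            i += 1
    merged.extend(sorted_left[i:])
    merged.extend(sorted_right[j:])
    return merged


tally = {"lists": 0}
print(textbook_merge_sort([8, 3, 5, 1, 7, 2, 6, 4], tally), tally["lists"])
```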

In my merge sorts, only the indices of ranges within the original list are passed down. Instead of receiving back sorted lists, each recursion receives a situation whereby those ranges in the original list have been sorted. It then creates one new list covering both ranges and merges the items from each half of that back into the original. There are thus far fewer temporary lists generated (though still a lot!) and the result’s an in-place sort of the original list. There’s the additional bonus in the merge stages that if all the left items are used up, there’s no need to bother with the remaining right items as they’re already in place in the range from which they were taken.
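The range-based approach in that paragraph might look roughly like this in Python. It is a sketch of the idea only, not a translation of Nigel's handler, and the name `merge_sort_in_place` and its parameters are assumptions.

```python
def merge_sort_in_place(items, lo=0, hi=None):
    """Sort items[lo..hi] in place; only index ranges are passed down."""
    if hi is None:
        hi = len(items) - 1
    if hi <= lo:
        return
    middle = (lo + hi) // 2
    merge_sort_in_place(items, lo, middle)
    merge_sort_in_place(items, middle + 1, hi)

    # One temporary list covering both (now sorted) halves of the range.
    scratch = items[lo:hi + 1]
    left_end = middle - lo + 1            # scratch[:left_end] is the left half
    i, j, k = 0, left_end, lo
    while i < left_end and j < len(scratch):
        if scratch[j] < scratch[i]:
            items[k] = scratch[j]
            j += 1
        else:
            items[k] = scratch[i]
            i += 1
        k += 1
    # Right items used up first: copy the remaining left items back.
    while i < left_end:
        items[k] = scratch[i]
        i += 1
        k += 1
    # Left items used up first: the remaining right items are already in place.


numbers = [9, 4, 7, 1, 8, 2, 6]
merge_sort_in_place(numbers)
print(numbers)
```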

Nigel_Garvey · October 31, 2013, 11:44am

Hi DJ.

qsort is perhaps a little misleadingly named, since it’s a hybrid. On the other hand, it isn’t actually called “Quicksort”.

The merge sort, however, is algorithmically a true merge sort. It’s just an in-place implementation.

McUsrII · October 31, 2013, 5:37pm

I can assure you it is quicksort Wirth was talking about.

By the way, thanks for your belated reply. Well, since it is Christmas today, I’ll tell you how I calculated that mergesort will never run out of stack space:

I presume AppleScript crashes after 100 recursions, and that mergesort splits its lists in two.

Then we have the inequality log2(x) > 100. Solving that by raising 2 to the power of both sides, we are left with x > 2^100,
and 2^100 is a number that surpasses 16340 quite a few times.
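As a quick check on the depth argument, a sort that halves its range at every level needs about ceil(log2(n)) levels for n items, so a depth limit of 100 (the poster's presumption, not a documented figure) would only be reached by a list of more than 2^100 items. A small Python sketch, with the hypothetical helper name `halving_depth`:

```python
def halving_depth(n):
    """Levels of halving needed to reduce n items to single items: ceil(log2(n))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (n - 1).bit_length() if n > 1 else 0


print(halving_depth(16340))     # depth for the list length mentioned in the post
print(halving_depth(2 ** 100))  # list length at which the depth reaches 100
```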

Well, I’ll go back to the AS testing framework, while I am riding the waves.

And by the way, I think your CustomSort is the most technically excellent handler I have seen in AppleScript, and the most useful too, when you need it!

McUsrII · November 24, 2013, 7:44pm

Hello.

Nigel’s script in post #6 here is really pivotal!

Thing is, a stable sort handler can solve a slew of problems concerning structuring data from, say, a graphics program, or anything else for that matter, that is not made with structure in mind.

As such, a stable sort algorithm, and one that works well, is totally indispensable, the moment you need one. There is really no substitute for it.

Edit

You can of course write as complex a comparator as you want to “emulate” the stable sort. But what you can’t do is prepare the data (fields) inside the comparator, not in every case anyway. That, though, is something you can easily do between runs of the merge sort. So it is a really “low-threshold” tool for turning possible nightmares into a breeze.
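The "between runs" idea relies on stability: sort by the least important field first, then by the more important one, and ties keep the order from the earlier pass. A small Python illustration (Python's built-in `sorted` is stable); the record fields `name` and `layer` are invented for the example.

```python
# Hypothetical records, e.g. objects exported from a graphics program.
records = [
    {"name": "circle", "layer": 2},
    {"name": "arrow", "layer": 1},
    {"name": "box", "layer": 2},
    {"name": "arrow", "layer": 2},
    {"name": "box", "layer": 1},
]

# First run: order by the secondary key.
by_name = sorted(records, key=lambda record: record["name"])
# Second run: order by the primary key; equal layers keep their name order.
by_layer_then_name = sorted(by_name, key=lambda record: record["layer"])

for record in by_layer_then_name:
    print(record["layer"], record["name"])
```

Any data preparation (cleaning a field, deriving a key) can happen between the two runs without complicating a comparator.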

(I’d rather call it an instrument, than a tool.)

Nigel_Garvey · November 29, 2013, 9:45pm

In view of the demise of ScriptBuilders, I’ve updated the scripts in posts #1 and #6 and rewritten the blurb in the latter.

Nigel_Garvey · June 14, 2015, 7:51pm

I’ve updated the scripts in the light of a tip I read on Wikipedia a couple of days ago that it’s only necessary to extract separate lists of the left halves for the merges, not lists containing both halves. Quite a revelation eight years after I wrote the code!
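A Python sketch of that tip, assuming the two halves of the range are already sorted: only the left half is copied out, and the merge writes back into the original list. The names `merge_left_copy` and `merge_sort_left_copy` are hypothetical and this is not the updated AppleScript itself.

```python
def merge_left_copy(items, lo, middle, hi):
    """Merge sorted items[lo..middle] and items[middle+1..hi] in place."""
    left = items[lo:middle + 1]           # the only temporary list
    i, j, k = 0, middle + 1, lo
    while i < len(left) and j <= hi:
        # While left items remain, k stays behind j, so no unread right item is overwritten.
        if items[j] < left[i]:
            items[k] = items[j]
            j += 1
        else:
            items[k] = left[i]
            i += 1
        k += 1
    # Unused left items fill the remaining slots; unused right items are already in place.
    items[k:k + len(left) - i] = left[i:]


def merge_sort_left_copy(items, lo=0, hi=None):
    """Recursive driver: sort items[lo..hi] in place using merge_left_copy."""
    if hi is None:
        hi = len(items) - 1
    if hi <= lo:
        return
    middle = (lo + hi) // 2
    merge_sort_left_copy(items, lo, middle)
    merge_sort_left_copy(items, middle + 1, hi)
    merge_left_copy(items, lo, middle, hi)


values = [6, 1, 8, 3, 9, 2, 7]
merge_sort_left_copy(values)
print(values)
```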

McUsrII · June 20, 2015, 4:17pm

That is nice.

I still haven’t got around to testing it. The reason I reply here is that I have an idea for testing the average cases of algorithms.

This can of course be done the hard way, by computing the complexity of the algorithm, and then working with the probability of the average number of inversions in a data set.

My idea is to bluntly create a data set with an average number of inversions, that is, misplaced elements with respect to the sorting order. Then time the implementation of the algorithm with this data set.
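One way such a data set could be built, sketched in Python under the usual definition of an inversion as a pair of elements that is out of order: choose, position by position, how many smaller values should follow each element until the target count `k` is used up. The function names are hypothetical, and the result is a very regular permutation (a reversed stretch followed by a sorted one), so it has the requested inversion count but is not otherwise a typical shuffle.

```python
def permutation_with_inversions(n, k):
    """Return a permutation of range(n) containing exactly k inversions."""
    max_inversions = n * (n - 1) // 2
    if not 0 <= k <= max_inversions:
        raise ValueError("k must be between 0 and n(n-1)/2")
    remaining = list(range(n))            # unused values, in ascending order
    result = []
    for position in range(n):
        # Taking the c-th smallest unused value leaves c smaller values after it.
        c = min(k, n - 1 - position)
        result.append(remaining.pop(c))
        k -= c
    return result


def count_inversions(seq):
    """Count pairs i < j with seq[i] > seq[j] (quadratic; for checking only)."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


sample = permutation_with_inversions(9, 18)
print(sample, count_inversions(sample))
```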

The average number of inversions is n(n-1)/4.

The whole proof of this can be found on page 431 in Rosen: “Discrete Mathematics and Its Applications”, 6th Edition.
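For small lists the standard result, an average of n(n − 1)/4 inversions over all orderings of n distinct items, can be checked by brute force. A Python sketch with hypothetical function names:

```python
from itertools import permutations


def count_inversions(seq):
    """Count pairs i < j with seq[i] > seq[j]."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def average_inversions(n):
    """Average inversion count over every permutation of n distinct items."""
    all_orders = list(permutations(range(n)))
    return sum(count_inversions(order) for order in all_orders) / len(all_orders)


for n in range(1, 7):
    print(n, average_inversions(n), n * (n - 1) / 4)
```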

There is of course the math there for doing the average case computations, taking the computational complexity into account as well, if you need to compare average cases with varying algorithms and number of inputs.

Edit
An easier way to estimate the average number of unsorted elements, without coming up with a formal proof: given a 50% probability that each element is either sorted or unsorted, and by the linearity of expectation (expected values can be summed), the math should hold. The sum of the elements that are unsorted, when we have a 50% chance of “unsortedness” per element, should on average be half the number of elements.

So, in a set with N elements, there should be an average of N/2 unsorted elements, which seems quite intuitive, and which I remember having seen in writing earlier, now that I have computed it.

DJ_Bazzie_Wazzie · June 21, 2015, 10:13am

Good idea, but merge sort is not affected by how the data is shuffled the way other sorting algorithms are.

McUsrII · June 21, 2015, 3:57pm

Actually.

Not totally sure how they got n log n as a result, but if you add up the smaller terms, then you have to take the swap operations into account, which will vary with the number of operations.

DJ_Bazzie_Wazzie · June 22, 2015, 12:05am

Nigel’s recursive merge sort (or iterative merge sort) requires a buffer where items from two lists are merged into one list. Inside the deepest recursive (or first iterative) call there are two lists, each containing only 1 item, which are merged into one list containing two items. This may look like swapping but it’s not. In the next iteration or parent handler it becomes much more different from swapping. The drawback of merge sort is that it requires additional temporary buffers, which is the real bottleneck in performance. There is no difference if an item is “swapped” or not: it’s copied to the buffer before or after the item in the opposite list is copied to the buffer.

McUsrII · June 22, 2015, 4:00am

Hello.

I did look at a text book implementation of a merge sort, which did swap elements if one element was larger than the next.

I should probably remove all those posts from this thread and post them as a general subject on timing. It seemed a good idea at the time, since Nigel would probably be interested in the result if he hadn’t seen that way of overcoming the obstacle of computing the average case of a sorting algorithm.

Nigel_Garvey · June 22, 2015, 9:37am

Hi DJ.

In my recursive merge sorts, each recursion extracts a shallow copy of a range in the main list and merges the two ends of the copy back to the original range. (The revision I made last week only extracts a copy of the left end of the range, merging from that and the right end of the original back to the original.) Where the range is only two items long, any rearrangement is done by swapping, not by recursion, extraction, and merging. The method does still produce a lot of temporary lists over the course of the sort, but only a third as many as the code published with the source from which I learned the algorithm. One nice thing about merging a copy back to the original is that if the left items are exhausted first, the remaining right items must already be in the right places and needn’t be copied back too.

My iterative merge sorts only use one additional list per sort, merging back and forth between this and the original list on alternate passes. The exception is the first pass, which handles the shortest ranges (two items each in the binary sorts or three in the ternary versions). These are dealt with by swapping in the original list. When merging back and forth between two lists, it’s necessary to complete the merges even after the left items are exhausted.
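The back-and-forth scheme can be sketched in Python as below. This covers only the binary variant and is an illustration of the description, not Nigel's code; the name `iterative_merge_sort` is an assumption.

```python
def iterative_merge_sort(items):
    """Sort items in place with one auxiliary list, alternating merge direction."""
    n = len(items)
    # First pass: put each pair in order by swapping in the original list.
    for k in range(0, n - 1, 2):
        if items[k + 1] < items[k]:
            items[k], items[k + 1] = items[k + 1], items[k]

    source, target = items, [None] * n    # the single additional list
    width = 2                             # length of the sorted runs so far
    while width < n:
        for lo in range(0, n, 2 * width):
            middle = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j, k = lo, middle, lo
            while i < middle and j < hi:
                if source[j] < source[i]:
                    target[k] = source[j]
                    j += 1
                else:
                    target[k] = source[i]
                    i += 1
                k += 1
            # The merge must be completed: the target holds stale items here,
            # so whichever run has items left is copied across in full.
            target[k:hi] = source[i:middle] if i < middle else source[j:hi]
        source, target = target, source   # next pass merges the other way
        width *= 2
    if source is not items:
        items[:] = source                 # result ended up in the auxiliary list


values = [7, 2, 9, 4, 1, 8, 3, 6, 5]
iterative_merge_sort(values)
print(values)
```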

Shane_Stanley · August 14, 2015, 1:20am

The animations on this page might be of interest to lurkers:

www.sorting-algorithms.com

Nigel_Garvey · August 21, 2015, 10:48pm

Thanks, Shane. Those are fun to watch.

m9hen · October 30, 2018, 10:11am

Shane your resource looks amazing. Thanks for sharing. I’ve also found this merge sort article helpful.