A merge sort

@Nigel: I replied to your statement about qsort and mergesort in a more general way, not to your custom mergesort and qsort specifically. Then I added that I had a true mergesort handler and compared it with a true qsort handler; unlike the normal case, where qsort averages 39% slower, in AS the qsort is much slower. Again, no harm meant towards your custom sorts, but since they’re called qsort and mergesort, like the standard sort mechanisms, I thought it was useful to address the real qsort and mergesort mechanisms, since that was lacking in the only topic on MacScripter about mergesort.

So I created a sort as described in the concept above, a true mergesort IMO.

p.s. Sorting a list of 10,000 unique integers took only 2 seconds on my machine, qsort took around 10 seconds.

I see I didn’t reply to this at the time.

It’s not so much the data which are moved, but the pointers in the lists. And there are different approaches to that. The text book merge sort creates three new arrays at each recursion: two containing the items from each half of the array passed down from above, which are themselves then passed down to lower levels, and a third which is cobbled together from two received back from the lower levels and then passed up to the level above. It’s very expensive in terms of array creation.
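
For anyone reading along, here’s a minimal sketch of that textbook approach, just to show where all the list creation happens. (A hypothetical handler, not the code from posts #1 or #6.)

```applescript
-- Textbook merge sort sketch: each recursion builds two new lists
-- for the halves and a third for the merged result.
on mergeSort(theList)
	set len to (count theList)
	if (len < 2) then return theList
	set mid to len div 2
	-- Two new lists at this level ...
	set leftHalf to mergeSort(items 1 thru mid of theList)
	set rightHalf to mergeSort(items (mid + 1) thru len of theList)
	-- ... and a third cobbled together from the two received back.
	set merged to {}
	set l to 1
	set r to 1
	repeat while ((l ≤ (count leftHalf)) and (r ≤ (count rightHalf)))
		if (item r of rightHalf < item l of leftHalf) then
			set end of merged to item r of rightHalf
			set r to r + 1
		else
			set end of merged to item l of leftHalf
			set l to l + 1
		end if
	end repeat
	if (l ≤ (count leftHalf)) then set merged to merged & items l thru -1 of leftHalf
	if (r ≤ (count rightHalf)) then set merged to merged & items r thru -1 of rightHalf
	return merged
end mergeSort
```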

In my merge sorts, only the indices of ranges within the original list are passed down. Instead of receiving back sorted lists, each recursion receives a situation whereby those ranges in the original list have been sorted. It then creates one new list covering both ranges and merges the items from each half of that back into the original. There are thus far fewer temporary lists generated (though still a lot!) and the result’s an in-place sort of the original list. There’s the additional bonus in the merge stages that if all the left items are used up, there’s no need to bother with the remaining right items as they’re already in place in the range from which they were taken.
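
A rough sketch of that range-based idea might look like this (assumed names and details, not the actual handlers from this thread). AppleScript lists are passed by reference, so setting items in `theList` sorts the caller’s list in place:

```applescript
-- Range-based merge sort sketch: only indices are passed down, and
-- each merge makes one temporary list covering both sorted ranges.
on sortRange(theList, lo, hi)
	if (hi - lo < 1) then return
	set mid to (lo + hi) div 2
	sortRange(theList, lo, mid)
	sortRange(theList, mid + 1, hi)
	-- One new list covering both (now sorted) halves of the range.
	set tempList to items lo thru hi of theList
	set leftLen to mid - lo + 1
	set l to 1
	set r to leftLen + 1
	set i to lo
	-- Merge back into the original. Once the left items run out,
	-- the remaining right items are already in place, so stop.
	repeat while (l ≤ leftLen)
		if ((r > (count tempList)) or (item l of tempList ≤ item r of tempList)) then
			set item i of theList to item l of tempList
			set l to l + 1
		else
			set item i of theList to item r of tempList
			set r to r + 1
		end if
		set i to i + 1
	end repeat
end sortRange
```

A call would be `sortRange(myList, 1, (count myList))`.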

Hi DJ.

qsort is perhaps a little misleadingly named, since it’s a hybrid. On the other hand, it isn’t actually called “Quicksort”. :wink:

The merge sort, however, is algorithmically a true merge sort. It’s just an in-place implementation.

I can assure you it is quicksort Wirth was talking about.

By the way, thanks for your belated reply. Well, since it is Christmas today, I’ll tell you how I calculated that mergesort will never run out of stack space:

I presume AppleScript crashes after 100 recursions, and that mergesort splits its lists in two.

Then we have the inequality log₂(x) > 100. Solving it by raising 2 to the power of each side, we are left with x > 2^100,
and 2^100 is a number that surpasses 16,340 quite a few times. :slight_smile:
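
Or spelled out as a worked inequality (assuming that 100-level recursion limit):

```latex
\lceil \log_2 x \rceil > 100 \;\Longrightarrow\; x > 2^{100} \approx 1.27 \times 10^{30},
\qquad \text{while} \qquad \lceil \log_2 16340 \rceil = 14 \ll 100
```

so a list would need more than 2^100 items before the recursion depth became a problem.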

Well, I’ll go back to the AS testing framework, while I am riding the waves.

And by the way, I think your CustomSort is the most technically excellent handler I have seen in AppleScript, and the most useful too, when you need it! :wink:

Hello.

Nigel’s script in post #6 here is really pivotal!

The thing is that a stable sort handler can solve a slew of problems concerning the structuring of data from, say, a graphics program, or anything else for that matter, that wasn’t made with structure in mind.

As such, a stable sort algorithm, and one that works well, is totally indispensable the moment you need one. There is really no substitute for it.

Edit

You can of course write as complex a comparator as you want to “emulate” the stable sort. But what you can’t do, not in every case anyway, is prepare the data (fields) inside the comparator. That is something you can easily do between runs of the merge sort, so it is a really “low-threshold” tool for turning possible nightmares into a breeze.

(I’d rather call it an instrument than a tool.)

In view of the demise of ScriptBuilders, I’ve updated the scripts in posts #1 and #6 and rewritten the blurb in the latter.

I’ve updated the scripts in the light of a tip I read on Wikipedia a couple of days ago that it’s only necessary to extract separate lists of the left halves for the merges, not lists containing both halves. Quite a revelation eight years after I wrote the code!
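
For lurkers, the merge step after that revision might look roughly like this (a hypothetical sketch with assumed names, not the actual updated code). The write index can never catch up with the unread right items, so no copy of the right half is needed:

```applescript
-- Merge sketch per the Wikipedia tip: copy only the LEFT half, then
-- merge that copy and the right half of the original back into the
-- original range lo thru hi (mid being the end of the left half).
on mergeBack(theList, lo, mid, hi)
	set tempLeft to items lo thru mid of theList
	set l to 1
	set r to mid + 1
	set i to lo
	repeat while (l ≤ (count tempLeft))
		if ((r > hi) or (item l of tempLeft ≤ item r of theList)) then
			set item i of theList to item l of tempLeft
			set l to l + 1
		else
			set item i of theList to item r of theList
			set r to r + 1
		end if
		set i to i + 1
	end repeat
	-- Once the left copy is exhausted, items r thru hi are already
	-- in place in the original.
end mergeBack
```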

That is nice. :slight_smile:

I still haven’t gotten around to testing it. The reason I’m replying here is that I have an idea for testing the average case of algorithms.

This can of course be done the hard way, by computing the complexity of the algorithm and then working with the probability of the average number of inversions in a data set.

My idea is to bluntly create a data set with an average number of inversions, that is, elements misplaced with respect to the sorting order, and then to time the implementation of the algorithm with this data set.

The average number of inversions is **n(n − 1)/4**.

The whole proof of this can be found on page 431 of Rosen: “Discrete Mathematics and Its Applications”, 6th Edition.

The math for doing the average-case computations is of course there too, taking the computational complexity into account, if you need to compare average cases across different algorithms and numbers of inputs. :slight_smile:

Edit
An easier way to figure out the average number of unsorted elements, without coming up with a formal proof: given the 50% probability per element that it is either sorted or unsorted, and by the linearity of expectation (that you can sum up expected values), the math should hold: the number of elements that are unsorted, when we have a 50% chance of “unsortedness” per element, should on average be half the number of elements.

So, in a set with N elements, there should be an average of N/2 unsorted elements, which seems quite intuitive, and which I remember having seen in writing earlier, now that I have computed it. :slight_smile:
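
A quick sketch of how such a data set could be built (hypothetical handler name; each adjacent pair gets a 50% chance of being swapped, so about N/2 elements end up misplaced on average):

```applescript
-- Build a sorted list 1..n, then swap each adjacent pair with 50%
-- probability; the expected number of misplaced elements is n/2.
on averageCaseData(n)
	set theData to {}
	repeat with i from 1 to n
		set end of theData to i
	end repeat
	repeat with i from 1 to (n - 1) by 2
		if ((random number from 0 to 1) is 1) then
			set tmp to item i of theData
			set item i of theData to item (i + 1) of theData
			set item (i + 1) of theData to tmp
		end if
	end repeat
	return theData
end averageCaseData
```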

Good idea, but merge sort will not be affected by how the data is shuffled, unlike other sorting algorithms.

Actually.

I’m not totally sure how they got n log n as a result, but if you add up the smaller terms, then you have to take the swap operations into account, which will vary with the number of operations.

Nigel’s recursive merge sort (or iterative merge sort) requires a buffer where items from two lists are merged into one list. Inside the deepest recursive (or first iterative) call there are two lists, each containing only one item, which are merged into one list containing two items; this may look like swapping, but it’s not. In the next iteration, or the parent handler, it becomes much more different from swapping. The drawback of merge sort is that it requires additional temporary buffers, which is the real bottleneck in performance. There is no difference whether an item is “swapped” or not; it’s copied to the buffer either before or after the item in the opposite list is copied to the buffer.

Hello.

I did look at a text book implementation of a merge sort, which did swap elements if one element was larger than the next. :slight_smile:

I probably should remove all those posts from this thread and post the subject generally, in relation to timing. It seemed like a good idea at the time, since Nigel would probably be interested in the result, if he hadn’t already seen that way of overcoming the obstacle to computing the average case of a sorting algorithm.

Hi DJ.

In my recursive merge sorts, each recursion extracts a shallow copy of a range in the main list and merges the two ends of the copy back to the original range. (The revision I made last week only extracts a copy of the left end of the range, merging from that and the right end of the original back to the original.) Where the range is only two items long, any rearrangement is done by swapping, not by recursion, extraction, and merging. The method does still produce a lot of temporary lists over the course of the sort, but only a third as many as the code published with the source from which I learned the algorithm. One nice thing about merging a copy back to the original is that if the left items are exhausted first, the remaining right items must already be in the right places and needn’t be copied back too.

My iterative merge sorts only use one additional list per sort, merging back and forth between this and the original list on alternate passes. The exception is the first pass, which handles the shortest ranges (two items each in the binary sorts or three in the ternary versions). These are dealt with by swapping in the original list. When merging back and forth between two lists, it’s necessary to complete the merges even after the left items are exhausted.
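
For illustration, a loose sketch of such a bottom-up scheme (assumed names and details; the actual scripts differ, for instance in how they ensure the result ends up back in the original list):

```applescript
-- Iterative (bottom-up) merge sort sketch: pairs are sorted by
-- swapping in place, then merges ping-pong between the original
-- and one additional list, doubling the range width each pass.
on iterativeMergeSort(theList)
	set n to (count theList)
	if (n < 2) then return theList
	-- First pass: sort the two-item ranges by swapping in place.
	repeat with i from 1 to (n - 1) by 2
		if (item (i + 1) of theList < item i of theList) then
			set tmp to item i of theList
			set item i of theList to item (i + 1) of theList
			set item (i + 1) of theList to tmp
		end if
	end repeat
	set src to theList
	set dst to items of theList -- the one additional list
	set rangeWidth to 2
	repeat while (rangeWidth < n)
		set i to 1
		repeat while (i ≤ n)
			set mid to i + rangeWidth - 1
			if (mid > n) then set mid to n
			set hi to i + rangeWidth + rangeWidth - 1
			if (hi > n) then set hi to n
			-- Merge src[i..mid] and src[mid+1..hi] into dst[i..hi].
			set l to i
			set r to mid + 1
			repeat with k from i to hi
				if ((r > hi) or ((l ≤ mid) and (item l of src ≤ item r of src))) then
					set item k of dst to item l of src
					set l to l + 1
				else
					set item k of dst to item r of src
					set r to r + 1
				end if
			end repeat
			set i to hi + 1
		end repeat
		-- Swap roles for the next pass.
		set tmp to src
		set src to dst
		set dst to tmp
		set rangeWidth to rangeWidth + rangeWidth
	end repeat
	return src -- whichever list holds the sorted result
end iterativeMergeSort
```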

The animations on this page might be of interest to lurkers:

www.sorting-algorithms.com


Thanks, Shane. Those are fun to watch. :slight_smile:

Shane, your resource looks amazing. Thanks for sharing. I’ve also found this merge sort article helpful.