I recently came across Vladimir Yaroslavskiy’s dual-pivot Quicksort algorithm — which he claims is even faster than a Quicksort optimised in the ways suggested by Robert Sedgewick and others — and have spent the past month or so trying to get it to be faster in AppleScript instead of slower!
As the name suggests, a dual-pivot Quicksort uses two pivot values for partitioning instead of one. The Java code in Yaroslavskiy’s 2009 write-up of his idea includes analogues of three of the optimisations recommended for single-pivot sorts (which makes its claimed superiority over single-pivot sorts with similar optimisations somewhat less impressive. ) These optimisations — two-medians-of-five pivot selection, five-way partitioning, and insertion sorting of short partitions — are all inefficiently coded when translated directly into AppleScript (the insertion sort eyewateringly so), but it may be that the code was simplified to clarify the method. Once these inefficiencies are fixed, the result’s a very fast vanilla sort.
Yaroslavskiy’s pivot selection involves pre-sorting five evenly-spaced items in the current range, using a sequence of nine compare-and-swap actions. The second and fourth sorted items are then swapped with the items at the ends of the range and used as pivot values — with the minor optimisation that they’re not actually put back into the list at that stage. My version of the idea includes the two end items among the five pivot candidates, uses the insertion sort algorithm, sorts the values between variables instead of in the list, and returns the unsuccessful candidates directly to the appropriate slots afterwards. This takes just eight list accesses, four to ten comparisons, and zero to fourteen variable reassignments.
The partitioning scheme in Yaroslavskiy’s code initially divides each range into values less than the lesser pivot, an instance of the lesser pivot, values between or equal to the pivots, an instance of the greater pivot, and values greater than the greater pivot. The central partition’s then partitioned further to move any other pivot instances it contains to the appropriate ends of it. The idea is that once all instances of the two pivot values are in place, they needn’t be touched again for the rest of the sort. In AppleScript, it’s faster to do the whole thing in one pass instead of two, which involves more comparisons and moves, but actually takes less time. However, it’s still not fast enough to bring sort times down to those obtained with my implementation of the equivalent single-pivot sort. Full five-way partitioning only pays for itself in AppleScript where a pathologically huge proportion of the values is duplicates. It’s better to stick with Yaroslavskiy’s initial separation and leave any pivot instances in the central partition to be sorted out incidentally later. This does makes the dual-pivot sort faster than my single-pivot implementation in the vast majority of cases!
A less critical, but philosophically more challenging decision involves a related optimisation whereby the middle partition is ignored if the pivots are equal. If they are equal, the partition only contains more instances of that one value and is therefore fully sorted. The problem is that the pivots rarely are equal, so the time spent comparing them is usually wasted. But when the number of duplicates in the list is such that there are fewer than about fifteen different values per hundred items, the sort begins to suffer a speed penalty if it doesn’t have this optimisation. (Quicksorts are very inefficient at sorting values that are all the same.) Since the penalty with the optimisation is much less than the worst-case penalty without it, I’ve kept it in.
The “straight” version of the dual-pivot Quicksort here vies with a couple of my other sorts as my fastest vanilla sort implementation. The customisable version is easily my fastest customisable sort. While testing them, I also tried iterative versions, which are remarkably identical in speed. For some reason, the iterative version of the customisable sort is consistently a few milliseconds faster than its recursive sibling with lists containing more than around ten thousand items. For this reason, and to make the code more interesting (hopefully!), the iterative version of the customisable sort is the one given in the following post. Meanwhile, here’s the recursive “straight” version. The script includes a demo handler at the end:
(* Quicksort — dual-pivot version
Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Insertion sort algorithm: unknown author.
AppleScript implementations: Arthur J. Knapp and Nigel Garvey, 2003. Minor modifications by Nigel Garvey 2007, 2010, and 2018.
Dual-pivot Quicksort algorithm: Vladimir Yaroslavskiy, 2008/2009.
AppleScript implementation: Nigel Garvey 2018.
Parameters: (list, range index 1, range index 2)
The range index parameters are integers specifying the range to be sorted in the list. They can be positive or negative and don't need to be in a particular order. The values 1 and -1 can be used to indicate the entire list.
Result returned: None. The passed list is sorted in place.
*)
on dualPivotQuicksort(theList, rangeIndex1, rangeIndex2)
script o
property cutoff : 16 -- Use an insertion sort or swap with ranges of this length or less.
property lst : theList
on qsrt(rangeLeft, rangeRight) -- The dual-pivot Quicksort handler.
(* TWO-MEDIANS-OF-5 PIVOT SELECTION *)
-- Select the second lowest and second highest of five values from the current range to use as pivot values.
set slot3 to (rangeLeft + rangeRight) div 2 -- Roughly the middle of the range
set slot2 to (rangeLeft + slot3) div 2 -- Roughly halfway between the middle and the beginning.
set slot4 to (slot3 + rangeRight) div 2 -- Roughly halfway between the middle and the end.
set v1 to my lst's item rangeLeft
set pivot1 to my lst's item slot2 -- 'pivot1' rather than 'v2', in anticipation of the eventual contents.
set v3 to my lst's item slot3
set pivot2 to my lst's item slot4 -- 'pivot2' rather than 'v4', ditto.
set v5 to my lst's item rangeRight
-- Sort the five values between the variables using an insertion sort algorithm.
if (v1 > pivot1) then
tell pivot1
set pivot1 to v1
set v1 to it
end tell
end if
if (pivot1 > v3) then
tell v3
set v3 to pivot1
if (v1 > it) then
set pivot1 to v1
set v1 to it
else
set pivot1 to it
end if
end tell
end if
if (v3 > pivot2) then
tell pivot2
set pivot2 to v3
if (pivot1 > it) then
set v3 to pivot1
if (v1 > it) then
set pivot1 to v1
set v1 to it
else
set pivot1 to it
end if
else
set v3 to it
end if
end tell
end if
if (pivot2 > v5) then
tell v5
set v5 to pivot2
if (v3 > it) then
set pivot2 to v3
if (pivot1 > it) then
set v3 to pivot1
if (v1 > it) then
set pivot1 to v1
set v1 to it
else
set pivot1 to it
end if
else
set v3 to it
end if
else
set pivot2 to it
end if
end tell
end if
-- Put the non-pivot values back into the list at the appropriate points. The pivot values nominally go to the ends of the range, but don't physically need to be there.
set my lst's item slot2 to v1
set my lst's item slot3 to v3
set my lst's item slot4 to v5
(* PARTIAL FIVE-WAY PARTITIONING *)
-- Group the items which aren't the two pivot instances into values less than pivot1, values between or equal to the pivots, and values greater than pivot2. Then "swap" the pivot instances into place between the middle and outer groups.
set leftPointer to rangeLeft + 1
set rightPointer to rangeRight - 1
set k to leftPointer
repeat until (k > rightPointer)
set leftValue to my lst's item k
if (pivot1 > leftValue) then
-- The current value's less than pivot1. Swap it with the leftmost value in the middle partition and cede that slot to the less-than-pivot1 partition. The item may be swapped with itself the first few times, but it's best not to worry about it.
set my lst's item k to my lst's item leftPointer
set my lst's item leftPointer to leftValue
set leftPointer to leftPointer + 1
else if (leftValue > pivot2) then
-- The value's greater than pivot2. Cede any values greater than pivot2 at the end of the middle partition to the greater-than-pivot2 partition until a value less than or equal to pivot2 shows up or the end pointer meets the current index.
set rightValue to my lst's item rightPointer
repeat while ((rightValue > pivot2) and (rightPointer > k))
set rightPointer to rightPointer - 1
set rightValue to my lst's item rightPointer
end repeat
-- Swap the current value with the one just found. If this is less than pivot1, make it a three-way swap with the item at the beginning of the partition. Extend the outer partition(s) accordingly.
set my lst's item rightPointer to leftValue
if (pivot1 > rightValue) then
set my lst's item k to my lst's item leftPointer
set my lst's item leftPointer to rightValue
set leftPointer to leftPointer + 1
else
set my lst's item k to rightValue
end if
set rightPointer to rightPointer - 1
end if
set k to k + 1
end repeat
-- "Swap" the pivot instances into place between the middle and outer partitions.
set item rangeLeft of my lst to item (leftPointer - 1) of my lst
set item (leftPointer - 1) of my lst to pivot1
set item rangeRight of my lst to item (rightPointer + 1) of my lst
set item (rightPointer + 1) of my lst to pivot2
-- At this point:
-- The left partition (rangeLeft thru (leftPointer - 2)) contains values less than pivot1 or is empty.
-- Item (leftPointer - 1) is an instance of pivot1.
-- The middle partition (leftPointer thru rightPointer) contains at least one value, which is greater than or equal to pivot1 and less than or equal to pivot2.
-- Item (rightPointer + 1) is an instance of pivot2.
-- The right partition ((rightPointer + 2) thru rangeRight) contains values greater than pivot2 or is empty.
-- The two known pivot instances are in the right places in the list and can be left alone for the rest of the sort.
(* RECURSION OR OTHERWISE *)
-- Deal with partitions containing more than one item by recursion, insertion sorting, or simple swapping as appropriate.
set leftPartitionLength to (leftPointer - 1) - rangeLeft
if (leftPartitionLength > cutoff) then
qsrt(rangeLeft, leftPointer - 2)
else if (leftPartitionLength > 2) then
isrt(rangeLeft, leftPointer - 2)
else if (leftPartitionLength is 2) then
set leftValue to my lst's item rangeLeft
set rightValue to my lst's item (leftPointer - 2)
if (leftValue > rightValue) then
set my lst's item rangeLeft to rightValue
set my lst's item (leftPointer - 2) to leftValue
end if
end if
if (pivot2 > pivot1) then -- (If the pivots are equal, the middle partition only contains instances of that value and doesn't need any further action.)
set middlePartitionLength to rightPointer - leftPointer + 1
if (middlePartitionLength > cutoff) then
qsrt(leftPointer, rightPointer)
else if (middlePartitionLength > 2) then
isrt(leftPointer, rightPointer)
else if (middlePartitionLength is 2) then
set leftValue to my lst's item leftPointer
set rightValue to my lst's item rightPointer
if (leftValue > rightValue) then
set my lst's item leftPointer to rightValue
set my lst's item rightPointer to leftValue
end if
end if
end if
set rightPartitionLength to rangeRight - (rightPointer + 1)
if (rightPartitionLength > cutoff) then
qsrt(rightPointer + 2, rangeRight)
else if (rightPartitionLength > 2) then
isrt(rightPointer + 2, rangeRight)
else if (rightPartitionLength is 2) then
set leftValue to my lst's item (rightPointer + 2)
set rightValue to my lst's item rangeRight
if (leftValue > rightValue) then
set my lst's item (rightPointer + 2) to rightValue
set my lst's item rangeRight to leftValue
end if
end if
end qsrt
on isrt(rangeLeft, rangeRight) -- The insertion sort handler.
-- Presort the first two items to set up a minor optimisation whereby the most recent instance of the highest value so far doesn't go back into the list until it's superseded or the end of the sort is reached.
set highestValueSoFar to my lst's item rangeLeft
set currentValue to my lst's item (rangeLeft + 1)
if (highestValueSoFar > currentValue) then
set my lst's item rangeLeft to currentValue
else
set highestValueSoFar to currentValue
end if
-- Work through the rest of the range, rotating values back into the sorted group as necessary.
repeat with k from (rangeLeft + 2) to rangeRight
set currentValue to my lst's item k
if (highestValueSoFar > currentValue) then
-- The current value's less than the highest so far and must be inserted into the sorted group. Shift previously sorted values greater than it up a slot (except for the highest so far, which is in a variable) until the appropriate insertion location's found.
repeat with insertionIndex from (k - 2) to rangeLeft by -1
tell my lst's item insertionIndex
if (it > currentValue) then
-- Greater value. Move it up.
set my lst's item (insertionIndex + 1) to it
else
-- Lesser or equal value. Set the just vacated slot above it as the insertion location and stop looking.
set insertionIndex to insertionIndex + 1
exit repeat
end if
end tell
end repeat
-- Insert the current value at the determined location.
set my lst's item insertionIndex to currentValue
else
-- The current value's greater than or equal to the highest so far and simply inherits that role.
set my lst's item (k - 1) to highestValueSoFar
set highestValueSoFar to currentValue
end if
end repeat
-- At the end, ensure that the highest value goes back into the list.
set my lst's item rangeRight to highestValueSoFar
end isrt
end script
-- Process the input parameters.
set listLen to (count theList)
if (listLen > 1) then
-- Negative and/or transposed range indices.
if (rangeIndex1 < 0) then set rangeIndex1 to listLen + rangeIndex1 + 1
if (rangeIndex2 < 0) then set rangeIndex2 to listLen + rangeIndex2 + 1
if (rangeIndex1 > rangeIndex2) then set {rangeIndex1, rangeIndex2} to {rangeIndex2, rangeIndex1}
-- Do the sort.
if (rangeIndex2 - rangeIndex1 + 1 > o's cutoff) then
tell o to qsrt(rangeIndex1, rangeIndex2)
else
tell o to isrt(rangeIndex1, rangeIndex2)
end if
end if
return -- nothing.
end dualPivotQuicksort
property sort : dualPivotQuicksort
(*
on demo()
script o
property lst : {}
end script
repeat 1000 times
set end of o's lst to (random number 1000)
end repeat
-- Sort the entire list (items 1 thru -1).
sort(o's lst, 1, -1)
return o's lst
end demo
demo()
*)
12th August 2018: Demo at end commented out for clarity.