You are not logged in.
I recently came across Vladimir Yaroslavskiy's dual-pivot Quicksort algorithm — which he claims is even faster than a Quicksort optimised in the ways suggested by Robert Sedgewick and others — and have spent the past month or so trying to get it to be faster in AppleScript instead of slower!
As the name suggests, a dual-pivot Quicksort uses two pivot values for partitioning instead of one. The Java code in Yaroslavskiy's 2009 write-up of his idea includes analogues of three of the optimisations recommended for single-pivot sorts (which makes its claimed superiority over single-pivot sorts with similar optimisations somewhat less impressive. ) These optimisations — two-medians-of-five pivot selection, five-way partitioning, and insertion sorting of short partitions — are all inefficiently coded when translated directly into AppleScript (the insertion sort eyewateringly so), but it may be that the code was simplified to clarify the method. Once these inefficiencies are fixed, the result's a very fast vanilla sort.
Yaroslavskiy's pivot selection involves pre-sorting five evenly-spaced items in the current range, using a sequence of nine compare-and-swap actions. The second and fourth sorted items are then swapped with the items at the ends of the range and used as pivot values — with the minor optimisation that they're not actually put back into the list at that stage. My version of the idea includes the two end items among the five pivot candidates, uses the insertion sort algorithm, sorts the values between variables instead of in the list, and returns the unsuccessful candidates directly to the appropriate slots afterwards. This takes just eight list accesses, four to ten comparisons, and zero to fourteen variable reassignments.
The partitioning scheme in Yaroslavskiy's code initially divides each range into values less than the lesser pivot, an instance of the lesser pivot, values between or equal to the pivots, an instance of the greater pivot, and values greater than the greater pivot. The central partition's then partitioned further to move any other pivot instances it contains to the appropriate ends of it. The idea is that once all instances of the two pivot values are in place, they needn't be touched again for the rest of the sort. In AppleScript, it's faster to do the whole thing in one pass instead of two, which involves more comparisons and moves, but actually takes less time. However, it's still not fast enough to bring sort times down to those obtained with my implementation of the equivalent single-pivot sort. Full five-way partitioning only pays for itself in AppleScript where a pathologically huge proportion of the values is duplicates. It's better to stick with Yaroslavskiy's initial separation and leave any pivot instances in the central partition to be sorted out incidentally later. This does makes the dual-pivot sort faster than my single-pivot implementation in the vast majority of cases!
A less critical, but philosophically more challenging decision involves a related optimisation whereby the middle partition is ignored if the pivots are equal. If they are equal, the partition only contains more instances of that one value and is therefore fully sorted. The problem is that the pivots rarely are equal, so the time spent comparing them is usually wasted. But when the number of duplicates in the list is such that there are fewer than about fifteen different values per hundred items, the sort begins to suffer a speed penalty if it doesn't have this optimisation. (Quicksorts are very inefficient at sorting values that are all the same.) Since the penalty with the optimisation is much less than the worst-case penalty without it, I've kept it in.
The "straight" version of the dual-pivot Quicksort here vies with a couple of my other sorts as my fastest vanilla sort implementation. The customisable version is easily my fastest customisable sort. While testing them, I also tried iterative versions, which are remarkably identical in speed. For some reason, the iterative version of the customisable sort is consistently a few milliseconds faster than its recursive sibling with lists containing more than around ten thousand items. For this reason, and to make the code more interesting (hopefully!), the iterative version of the customisable sort is the one given in the following post. Meanwhile, here's the recursive "straight" version. The script includes a demo handler at the end:
Applescript:
(* Quicksort — dual-pivot version
Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Insertion sort algorithm: unknown author.
AppleScript implementations: Arthur J. Knapp and Nigel Garvey, 2003. Minor modifications by Nigel Garvey 2007, 2010, and 2018.
Dual-pivot Quicksort algorithm: Vladimir Yaroslavskiy, 2008/2009.
AppleScript implementation: Nigel Garvey 2018.
Parameters: (list, range index 1, range index 2)
The range index parameters are integers specifying the range to be sorted in the list. They can be positive or negative and don't need to be in a particular order. The values 1 and -1 can be used to indicate the entire list.
Result returned: None. The passed list is sorted in place.
*)
on dualPivotQuicksort(theList, rangeIndex1, rangeIndex2)
script o
property cutoff : 16 -- Use an insertion sort or swap with ranges of this length or less.
property lst : theList
on qsrt(rangeLeft, rangeRight) -- The dual-pivot Quicksort handler.
(* TWO-MEDIANS-OF-5 PIVOT SELECTION *)
-- Select the second lowest and second highest of five values from the current range to use as pivot values.
set slot3 to (rangeLeft + rangeRight) div 2 -- Roughly the middle of the range
set slot2 to (rangeLeft + slot3) div 2 -- Roughly halfway between the middle and the beginning.
set slot4 to (slot3 + rangeRight) div 2 -- Roughly halfway between the middle and the end.
set v1 to my lst's item rangeLeft
set pivot1 to my lst's item slot2 -- 'pivot1' rather than 'v2', in anticipation of the eventual contents.
set v3 to my lst's item slot3
set pivot2 to my lst's item slot4 -- 'pivot2' rather than 'v4', ditto.
set v5 to my lst's item rangeRight
-- Sort the five values between the variables using an insertion sort algorithm.
if (v1 > pivot1) then
tell pivot1
set pivot1 to v1
set v1 to it
end tell
end if
if (pivot1 > v3) then
tell v3
set v3 to pivot1
if (v1 > it) then
set pivot1 to v1
set v1 to it
else
set pivot1 to it
end if
end tell
end if
if (v3 > pivot2) then
tell pivot2
set pivot2 to v3
if (pivot1 > it) then
set v3 to pivot1
if (v1 > it) then
set pivot1 to v1
set v1 to it
else
set pivot1 to it
end if
else
set v3 to it
end if
end tell
end if
if (pivot2 > v5) then
tell v5
set v5 to pivot2
if (v3 > it) then
set pivot2 to v3
if (pivot1 > it) then
set v3 to pivot1
if (v1 > it) then
set pivot1 to v1
set v1 to it
else
set pivot1 to it
end if
else
set v3 to it
end if
else
set pivot2 to it
end if
end tell
end if
-- Put the non-pivot values back into the list at the appropriate points. The pivot values nominally go to the ends of the range, but don't physically need to be there.
set my lst's item slot2 to v1
set my lst's item slot3 to v3
set my lst's item slot4 to v5
(* PARTIAL FIVE-WAY PARTITIONING *)
-- Group the items which aren't the two pivot instances into values less than pivot1, values between or equal to the pivots, and values greater than pivot2. Then "swap" the pivot instances into place between the middle and outer groups.
set leftPointer to rangeLeft + 1
set rightPointer to rangeRight - 1
set k to leftPointer
repeat until (k > rightPointer)
set leftValue to my lst's item k
if (pivot1 > leftValue) then
-- The current value's less than pivot1. Swap it with the leftmost value in the middle partition and cede that slot to the less-than-pivot1 partition. The item may be swapped with itself the first few times, but it's best not to worry about it.
set my lst's item k to my lst's item leftPointer
set my lst's item leftPointer to leftValue
set leftPointer to leftPointer + 1
else if (leftValue > pivot2) then
-- The value's greater than pivot2. Cede any values greater than pivot2 at the end of the middle partition to the greater-than-pivot2 partition until a value less than or equal to pivot2 shows up or the end pointer meets the current index.
set rightValue to my lst's item rightPointer
repeat while ((rightValue > pivot2) and (rightPointer > k))
set rightPointer to rightPointer - 1
set rightValue to my lst's item rightPointer
end repeat
-- Swap the current value with the one just found. If this is less than pivot1, make it a three-way swap with the item at the beginning of the partition. Extend the outer partition(s) accordingly.
set my lst's item rightPointer to leftValue
if (pivot1 > rightValue) then
set my lst's item k to my lst's item leftPointer
set my lst's item leftPointer to rightValue
set leftPointer to leftPointer + 1
else
set my lst's item k to rightValue
end if
set rightPointer to rightPointer - 1
end if
set k to k + 1
end repeat
-- "Swap" the pivot instances into place between the middle and outer partitions.
set item rangeLeft of my lst to item (leftPointer - 1) of my lst
set item (leftPointer - 1) of my lst to pivot1
set item rangeRight of my lst to item (rightPointer + 1) of my lst
set item (rightPointer + 1) of my lst to pivot2
-- At this point:
-- The left partition (rangeLeft thru (leftPointer - 2)) contains values less than pivot1 or is empty.
-- Item (leftPointer - 1) is an instance of pivot1.
-- The middle partition (leftPointer thru rightPointer) contains at least one value, which is greater than or equal to pivot1 and less than or equal to pivot2.
-- Item (rightPointer + 1) is an instance of pivot2.
-- The right partition ((rightPointer + 2) thru rangeRight) contains values greater than pivot2 or is empty.
-- The two known pivot instances are in the right places in the list and can be left alone for the rest of the sort.
(* RECURSION OR OTHERWISE *)
-- Deal with partitions containing more than one item by recursion, insertion sorting, or simple swapping as appropriate.
set leftPartitionLength to (leftPointer - 1) - rangeLeft
if (leftPartitionLength > cutoff) then
qsrt(rangeLeft, leftPointer - 2)
else if (leftPartitionLength > 2) then
isrt(rangeLeft, leftPointer - 2)
else if (leftPartitionLength is 2) then
set leftValue to my lst's item rangeLeft
set rightValue to my lst's item (leftPointer - 2)
if (leftValue > rightValue) then
set my lst's item rangeLeft to rightValue
set my lst's item (leftPointer - 2) to leftValue
end if
end if
if (pivot2 > pivot1) then -- (If the pivots are equal, the middle partition only contains instances of that value and doesn't need any further action.)
set middlePartitionLength to rightPointer - leftPointer + 1
if (middlePartitionLength > cutoff) then
qsrt(leftPointer, rightPointer)
else if (middlePartitionLength > 2) then
isrt(leftPointer, rightPointer)
else if (middlePartitionLength is 2) then
set leftValue to my lst's item leftPointer
set rightValue to my lst's item rightPointer
if (leftValue > rightValue) then
set my lst's item leftPointer to rightValue
set my lst's item rightPointer to leftValue
end if
end if
end if
set rightPartitionLength to rangeRight - (rightPointer + 1)
if (rightPartitionLength > cutoff) then
qsrt(rightPointer + 2, rangeRight)
else if (rightPartitionLength > 2) then
isrt(rightPointer + 2, rangeRight)
else if (rightPartitionLength is 2) then
set leftValue to my lst's item (rightPointer + 2)
set rightValue to my lst's item rangeRight
if (leftValue > rightValue) then
set my lst's item (rightPointer + 2) to rightValue
set my lst's item rangeRight to leftValue
end if
end if
end qsrt
on isrt(rangeLeft, rangeRight) -- The insertion sort handler.
-- Presort the first two items to set up a minor optimisation whereby the most recent instance of the highest value so far doesn't go back into the list until it's superseded or the end of the sort is reached.
set highestValueSoFar to my lst's item rangeLeft
set currentValue to my lst's item (rangeLeft + 1)
if (highestValueSoFar > currentValue) then
set my lst's item rangeLeft to currentValue
else
set highestValueSoFar to currentValue
end if
-- Work through the rest of the range, rotating values back into the sorted group as necessary.
repeat with k from (rangeLeft + 2) to rangeRight
set currentValue to my lst's item k
if (highestValueSoFar > currentValue) then
-- The current value's less than the highest so far and must be inserted into the sorted group. Shift previously sorted values greater than it up a slot (except for the highest so far, which is in a variable) until the appropriate insertion location's found.
repeat with insertionIndex from (k - 2) to rangeLeft by -1
tell my lst's item insertionIndex
if (it > currentValue) then
-- Greater value. Move it up.
set my lst's item (insertionIndex + 1) to it
else
-- Lesser or equal value. Set the just vacated slot above it as the insertion location and stop looking.
set insertionIndex to insertionIndex + 1
exit repeat
end if
end tell
end repeat
-- Insert the current value at the determined location.
set my lst's item insertionIndex to currentValue
else
-- The current value's greater than or equal to the highest so far and simply inherits that role.
set my lst's item (k - 1) to highestValueSoFar
set highestValueSoFar to currentValue
end if
end repeat
-- At the end, ensure that the highest value goes back into the list.
set my lst's item rangeRight to highestValueSoFar
end isrt
end script
-- Process the input parameters.
set listLen to (count theList)
if (listLen > 1) then
-- Negative and/or transposed range indices.
if (rangeIndex1 < 0) then set rangeIndex1 to listLen + rangeIndex1 + 1
if (rangeIndex2 < 0) then set rangeIndex2 to listLen + rangeIndex2 + 1
if (rangeIndex1 > rangeIndex2) then set {rangeIndex1, rangeIndex2} to {rangeIndex2, rangeIndex1}
-- Do the sort.
if (rangeIndex2 - rangeIndex1 + 1 > o's cutoff) then
tell o to qsrt(rangeIndex1, rangeIndex2)
else
tell o to isrt(rangeIndex1, rangeIndex2)
end if
end if
return -- nothing.
end dualPivotQuicksort
property sort : dualPivotQuicksort
(*
on demo()
script o
property lst : {}
end script
repeat 1000 times
set end of o's lst to (random number 1000)
end repeat
-- Sort the entire list (items 1 thru -1).
sort(o's lst, 1, -1)
return o's lst
end demo
demo()
*)
12th August 2018: Demo at end commented out for clarity.
Last edited by Nigel Garvey (2018-08-19 08:30:57 am)
NG
Offline
Following on from the previous post, here's the iterative customisable version. The script includes a couple of demo handlers at the end. Don't worry about its length. It is quite fast.
Applescript:
(* Quicksort — customisable iterative dual-pivot version
Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Insertion sort algorithm: unknown author.
AppleScript implementations: Arthur J. Knapp and Nigel Garvey, 2003. Minor modifications by Nigel Garvey 2007, 2010, and 2018.
Dual-pivot Quicksort algorithm: Vladimir Yaroslavskiy, 2008/2009.
AppleScript implementation: Nigel Garvey 2018.
Parameters: (the list, range index 1, range index 2, customisation object)
The range index parameters are integers specifying the range to be sorted in the list. They can be positive or negative and don't need to be in a particular order. The values 1 and -1 can be used to indicate the entire list.
The customisation object is a record with optional 'comparer' and/or 'slave' properties.
If given, the 'comparer' value must be a script object containing an isGreater(item1, item2) handler which compares two items passed to it from the list and returns a boolean indicating whether or not the first is "greater than" the second according to its criteria.
If given, the 'slave' value can be a list of lists in which the moves of the main sort are to be echoed. These lists must of course be long enough to allow the same absolute range indices as for the main list. Alternatively, the value can be a script object containing its own slave handlers and any necessary preset properties. This is for compatibility with scripts written to use earlier versions of my customisable sorts and might conceivably be used for other purposes, such as transposing the sort range in the slave list(s) or counting item movements. The same script object can be used for both the 'comparer' and 'slave' values if it contains both sets of handlers.
Where the 'comparer' and/or 'slave' properties are omitted, or the customisation parameter isn't a record or a script object, the defaults are direct comparisons and no 'slave' actions. Because of the customisation hooks, a "straight" sort with this code isn't as fast as with the non-customisable version. But subjectively, this isn't noticeable.
Result returned: None. The passed lists are sorted in place.
*)
on customIterativeDualPivotQuicksort(theList, rangeIndex1, rangeIndex2, customiser)
script o
property cutoff : 16 -- Use an insertion sort or swap with ranges of this length or less.
property comparer : me
property slave : me
property slaveSorting : true
property lst : theList
property stack : {} -- For the range indices of partitions waiting to be processed.
property slaveLists : {}
property slaveListCount : missing value
property singleSlaveList : missing value
on qsrt(rangeLeft, rangeRight)
set stackLength to 0
set stackIndex to 1
repeat -- Iteration repeat.
(* TWO-MEDIANS-OF-5 PIVOT SELECTION *)
-- Select the second lowest and second highest of five values from the current range to use as pivot values.
set slot3 to (rangeLeft + rangeRight) div 2 -- Roughly the middle of the range
set slot2 to (rangeLeft + slot3) div 2 -- Roughly halfway between the middle and the beginning.
set slot4 to (slot3 + rangeRight) div 2 -- Roughly halfway between the middle and the end.
set v1 to my lst's item rangeLeft
set pivot1 to my lst's item slot2 -- 'pivot1' rather than 'v2', in anticipation of the eventual contents.
set v3 to my lst's item slot3
set pivot2 to my lst's item slot4 -- 'pivot2' rather than 'v4', ditto.
set v5 to my lst's item rangeRight
-- Sort the five values between the variables using an insertion sort algorithm. In any slave sort, the sorting takes place in the slave list(s).
if (comparer's isGreater(v1, pivot1)) then
tell pivot1
set pivot1 to v1
set v1 to it
end tell
if (slaveSorting) then tell slave to swap(slot2, rangeLeft)
end if
if (comparer's isGreater(pivot1, v3)) then
tell v3
set v3 to pivot1
if (comparer's isGreater(v1, it)) then
set pivot1 to v1
set v1 to it
if (slaveSorting) then tell slave to swap3(slot3, slot2, rangeLeft)
else
set pivot1 to it
if (slaveSorting) then tell slave to swap(slot3, slot2)
end if
end tell
end if
if (comparer's isGreater(v3, pivot2)) then
tell pivot2
set pivot2 to v3
if (comparer's isGreater(pivot1, it)) then
set v3 to pivot1
if (comparer's isGreater(v1, it)) then
set pivot1 to v1
set v1 to it
if (slaveSorting) then
tell slave
swap3(slot4, slot3, slot2)
swap(slot2, rangeLeft)
end tell
end if
else
set pivot1 to it
if (slaveSorting) then tell slave to swap3(slot4, slot3, slot2)
end if
else
set v3 to it
if (slaveSorting) then tell slave to swap(slot4, slot3)
end if
end tell
end if
if (comparer's isGreater(pivot2, v5)) then
tell v5
set v5 to pivot2
if (comparer's isGreater(v3, it)) then
set pivot2 to v3
if (comparer's isGreater(pivot1, it)) then
set v3 to pivot1
if (comparer's isGreater(v1, it)) then
set pivot1 to v1
set v1 to it
if (slaveSorting) then
tell slave
swap3(rangeRight, slot4, slot3)
swap3(slot3, slot2, rangeLeft)
end tell
end if
else
set pivot1 to it
if (slaveSorting) then
tell slave
swap3(rangeRight, slot4, slot3)
swap(slot3, slot2)
end tell
end if
end if
else
set v3 to it
if (slaveSorting) then tell slave to swap3(rangeRight, slot4, slot3)
end if
else
set pivot2 to it
if (slaveSorting) then tell slave to swap(rangeRight, slot4)
end if
end tell
end if
-- Put the non-pivot values back into the list at the appropriate points. The pivot values nominally go to the ends of the range, but don't physically need to be there (except in any slave lists).
set my lst's item slot2 to v1
set my lst's item slot3 to v3
set my lst's item slot4 to v5
if (slaveSorting) then
tell slave
swap(rangeLeft, slot2)
swap(slot4, rangeRight)
end tell
end if
(* PARTIAL FIVE-WAY PARTITIONING *)
-- Group the items which aren't the two pivot instances into values less than pivot1, values between or equal to the pivots, and values greater than pivot2. Then "swap" the pivot instances into place between the middle and outer groups.
set leftPointer to rangeLeft + 1
set rightPointer to rangeRight - 1
set k to leftPointer
repeat until (k > rightPointer)
set leftValue to my lst's item k
if (comparer's isGreater(pivot1, leftValue)) then
-- The current value's less than pivot1. Swap it with the leftmost value in the middle partition and cede that slot to the less-than-pivot1 partition. The item may be swapped with itself the first few times, but it's best not to worry about it.
set my lst's item k to my lst's item leftPointer
set my lst's item leftPointer to leftValue
if (slaveSorting) then tell slave to swap(leftPointer, k)
set leftPointer to leftPointer + 1
else if (comparer's isGreater(leftValue, pivot2)) then
-- The value's greater than pivot2. Cede any values greater than pivot2 at the end of the middle partition to the greater-than-pivot2 partition until a value less than or equal to pivot2 shows up or the end pointer meets the current index.
set rightValue to my lst's item rightPointer
repeat while ((comparer's isGreater(rightValue, pivot2)) and (rightPointer > k))
set rightPointer to rightPointer - 1
set rightValue to my lst's item rightPointer
end repeat
-- Swap the current value with the one just found. If this is less than pivot1, make it a three-way swap with the item at the beginning of the partition. Extend the outer partition(s) accordingly.
set my lst's item rightPointer to leftValue
if (comparer's isGreater(pivot1, rightValue)) then
set my lst's item k to my lst's item leftPointer
set my lst's item leftPointer to rightValue
if (slaveSorting) then tell slave to swap3(k, leftPointer, rightPointer)
set leftPointer to leftPointer + 1
else
set my lst's item k to rightValue
if (slaveSorting) then tell slave to swap(k, rightPointer)
end if
set rightPointer to rightPointer - 1
end if
set k to k + 1
end repeat
-- "Swap" the pivot instances into place between the middle and outer partitions. (Real swaps in any slave list.)
set item rangeLeft of my lst to item (leftPointer - 1) of my lst
set item (leftPointer - 1) of my lst to pivot1
if (slaveSorting) then tell slave to swap(rangeLeft, leftPointer - 1)
set item rangeRight of my lst to item (rightPointer + 1) of my lst
set item (rightPointer + 1) of my lst to pivot2
if (slaveSorting) then tell slave to swap(rightPointer + 1, rangeRight)
-- At this point:
-- The left partition (rangeLeft thru (leftPointer - 2)) contains values less than pivot1 or is empty.
-- Item (leftPointer - 1) is an instance of pivot1.
-- The middle partition (leftPointer thru rightPointer) contains at least one value, which is greater than or equal to pivot1 and less than or equal to pivot2.
-- Item (rightPointer + 1) is an instance of pivot2.
-- The right partition ((rightPointer + 2) thru rangeRight) contains values greater than pivot2 or is empty.
-- The two known pivot instances are in the right places in the list and can be left alone for the rest of the sort.
(* ITERATION CONTROL *)
-- Deal with non-empty partitions in subsequent repeats or by insertion sorting or simple swapping as appropriate.
set leftPartitionLength to (leftPointer - 1) - rangeLeft
if (comparer's isGreater(pivot2, pivot1)) then
set middlePartitionLength to rightPointer - leftPointer + 1
else -- The pivots are equal, so the middle partition only contains instances of that value and doesn't need any further action.
set middlePartitionLength to 0
end if
set rightPartitionLength to rangeRight - (rightPointer + 1)
-- If the right partition's to be handled in a repeat after either or both of the other partitions, store its range indices for then. Otherwise, if it's to be insertion sorted or its items swapped, do that now.
if (rightPartitionLength > cutoff) then
if ((leftPartitionLength > cutoff) or (middlePartitionLength > cutoff)) then
if (stackIndex > stackLength) then
set end of my stack to rightPointer + 2
set end of my stack to rangeRight
set stackLength to stackLength + 2
else
set my stack's item stackIndex to rightPointer + 2
set my stack's item (stackIndex + 1) to rangeRight
end if
set stackIndex to stackIndex + 2
end if
else if (rightPartitionLength > 2) then
isrt(rightPointer + 2, rangeRight)
else if (rightPartitionLength is 2) then
set leftValue to my lst's item (rightPointer + 2)
set rightValue to my lst's item rangeRight
if (comparer's isGreater(leftValue, rightValue)) then
set my lst's item (rightPointer + 2) to rightValue
set my lst's item rangeRight to leftValue
if (slaveSorting) then tell slave to swap(rightPointer + 2, rangeRight)
end if
end if
-- Similarly with the middle partition, except that it won't be handled after the right.
if (middlePartitionLength > cutoff) then
if (leftPartitionLength > cutoff) then
if (stackIndex > stackLength) then
set end of my stack to leftPointer
set end of my stack to rightPointer
set stackLength to stackLength + 2
else
set my stack's item stackIndex to leftPointer
set my stack's item (stackIndex + 1) to rightPointer
end if
set stackIndex to stackIndex + 2
end if
else if (middlePartitionLength > 2) then
isrt(leftPointer, rightPointer)
else if (middlePartitionLength is 2) then
set leftValue to my lst's item leftPointer
set rightValue to my lst's item rightPointer
if (comparer's isGreater(leftValue, rightValue)) then
set my lst's item leftPointer to rightValue
set my lst's item rightPointer to leftValue
if (slaveSorting) then tell slave to swap(rightPointer + 2, rangeRight)
end if
end if
-- If the left partition's to be handled in a repeat, set the repeat's right range index to the partition's. Otherwise, use an insertion sort or swap (or do nothing) and set the repeat's range indices to those of one of the other partitions. If neither qualifies, retrieve a pair of indices from the stack and use those. If there are none left, finish.
if (leftPartitionLength > cutoff) then
set rangeRight to leftPointer - 2
else
if (leftPartitionLength > 2) then
isrt(rangeLeft, leftPointer - 2)
else if (leftPartitionLength is 2) then
set leftValue to my lst's item rangeLeft
set rightValue to my lst's item (leftPointer - 2)
if (comparer's isGreater(leftValue, rightValue)) then
set my lst's item rangeLeft to rightValue
set my lst's item (leftPointer - 2) to leftValue
if (slaveSorting) then tell slave to swap(rightPointer + 2, rangeRight)
end if
end if
if (middlePartitionLength > cutoff) then
set rangeLeft to leftPointer
set rangeRight to rightPointer
else if (rightPartitionLength > cutoff) then
set rangeLeft to rightPointer + 2
else
set stackIndex to stackIndex - 2
if (stackIndex < 1) then exit repeat -- The stacked partition indices have been used up. There's nothing more to do.
set rangeLeft to my stack's item stackIndex
set rangeRight to my stack's item (stackIndex + 1)
end if
end if
end repeat -- End of iteration repeat.
end qsrt
on isrt(rangeLeft, rangeRight) -- The insertion sort handler.
-- Presort the first two items to set up a minor optimisation whereby the most recent instance of the highest value so far doesn't go back into the list until it's superseded or the end of the sort is reached.
set highestValueSoFar to my lst's item rangeLeft
set currentValue to my lst's item (rangeLeft + 1)
if (comparer's isGreater(highestValueSoFar, currentValue)) then
set my lst's item rangeLeft to currentValue
if (slaveSorting) then tell slave to swap(rangeLeft, rangeLeft + 1)
else
set highestValueSoFar to currentValue
end if
-- Work through the rest of the range, rotating values back into the sorted group as necessary.
repeat with k from (rangeLeft + 2) to rangeRight
set currentValue to my lst's item k
if (comparer's isGreater(highestValueSoFar, currentValue)) then
-- The current value's less than the highest so far and must be inserted into the sorted group. Shift previously sorted values greater than it up a slot (except for the highest so far, which is in a variable) until the appropriate insertion location's found.
repeat with insertionIndex from (k - 2) to rangeLeft by -1
tell my lst's item insertionIndex
if (comparer's isGreater(it, currentValue)) then
-- Greater value. Move it up.
set my lst's item (insertionIndex + 1) to it
else
-- Lesser or equal value. Set the just vacated slot above it as the insertion location and stop looking.
set insertionIndex to insertionIndex + 1
exit repeat
end if
end tell
end repeat
-- Insert the current value at the determined location and reproduce the entire rotation in any slave sort.
set my lst's item insertionIndex to currentValue
if (slaveSorting) then tell slave to rotate(insertionIndex, k)
else
-- The current value's greater than or equal to the highest so far and simply inherits that role.
set my lst's item (k - 1) to highestValueSoFar
set highestValueSoFar to currentValue
end if
end repeat
-- At the end, ensure that the main list's highest value goes back into it.
set my lst's item rangeRight to highestValueSoFar
end isrt
-- Default comparison and slave handlers.
on isGreater(a, b)
(a > b)
end isGreater
on swap(a, b)
tell my singleSlaveList's item a
set my singleSlaveList's item a to my singleSlaveList's item b
set my singleSlaveList's item b to it
end tell
end swap
on swap3(a, b, c)
tell my singleSlaveList's item a
set my singleSlaveList's item a to my singleSlaveList's item b
set my singleSlaveList's item b to my singleSlaveList's item c
set my singleSlaveList's item c to it
end tell
end swap3
on rotate(a, b)
tell my singleSlaveList's item b
repeat with rotationIndex from (b - 1) to a by -1
set my singleSlaveList's item (rotationIndex + 1) to my singleSlaveList's item rotationIndex
end repeat
set my singleSlaveList's item a to it
end tell
end rotate
-- Alternative slave handlers for when there are multiple slave lists.
on swapMultiple(a, b)
repeat with i from 1 to slaveListCount
tell my slaveLists's item i's item a
set my slaveLists's item i's item a to my slaveLists's item i's item b
set my slaveLists's item i's item b to it
end tell
end repeat
end swapMultiple
on swap3Multiple(a, b, c)
repeat with i from 1 to slaveListCount
tell my slaveLists's item i's item a
set my slaveLists's item i's item a to my slaveLists's item i's item b
set my slaveLists's item i's item b to my slaveLists's item i's item c
set my slaveLists's item i's item c to it
end tell
end repeat
end swap3Multiple
on rotateMultiple(a, b)
repeat with i from 1 to slaveListCount
tell my slaveLists's item i's item b
repeat with rotationIndex from (b - 1) to a by -1
set my slaveLists's item i's item (rotationIndex + 1) to my slaveLists's item i's item rotationIndex
end repeat
set my slaveLists's item i's item a to it
end tell
end repeat
end rotateMultiple
end script
-- Process the input parameters.
set listLen to (count theList)
if (listLen > 1) then
-- Negative and/or transposed range indices.
if (rangeIndex1 < 0) then set rangeIndex1 to listLen + rangeIndex1 + 1
if (rangeIndex2 < 0) then set rangeIndex2 to listLen + rangeIndex2 + 1
if (rangeIndex1 > rangeIndex2) then set {rangeIndex1, rangeIndex2} to {rangeIndex2, rangeIndex1}
if (rangeIndex2 > rangeIndex1) then
-- The customisation parameter for the earliest version of Custom Quicksort was a script object containing both comparison and slave handlers. Accept that, if presented; otherwise expect a record containing comparer and/or slave properties, whose values are script objects containing the comparison and slave handlers respectively. If either property is omitted from the record, or if the parameter isn't a script object or a record, use the appropriate default handers in the script object above. NEW: THE SLAVE PARAMETER CAN NOW BE JUST A LIST OF SLAVE LISTS, IN WHICH CASE THE SCRIPT ITSELF WILL APPLY THE APPROPRIATE SLAVE ACTIONS TO EACH ONE.
if (customiser's class is script) then
try -- Use the customising script's isGreater handler, if it has one.
customiser's isGreater
set o's comparer to customiser
end try
try -- Use the customising script's slave handler and properties, if it has them.
customiser's swap
set o's slave to customiser
on error
set o's slaveSorting to false
end try
else if ((customiser is {}) or (customiser's class is record)) then
-- Use the passed or default comparer script. Get the passed or default slave parameter.
set {comparer:o's comparer, slave:slaveParam} to customiser & {comparer:o, slave:o}
if (slaveParam's class is script) then
-- Passed or default slave script. Use it.
set o's slave to slaveParam
set o's slaveSorting to (slaveParam is not o)
else if (slaveParam's class is list) then
-- Passed list of slave lists. Set the default 'slave' script object's slaveLists property to it.
set o's slaveLists to slaveParam
set o's slaveSorting to false
end if
-- Configure the default 'slave' script object to use the best handlers for the number of slave lists. This makes no difference with a non-default slave.
set o's slaveListCount to (count o's slaveLists)
if (o's slaveListCount > 0) then
set o's slaveSorting to true
if (o's slaveListCount is 1) then
set o's singleSlaveList to beginning of o's slaveLists
else
set o's swap to o's swapMultiple
set o's swap3 to o's swap3Multiple
set o's rotate to o's rotateMultiple
end if
end if
end if
-- Do the sort.
if (rangeIndex2 - rangeIndex1 + 1 > o's cutoff) then
tell o to qsrt(rangeIndex1, rangeIndex2)
else
tell o to isrt(rangeIndex1, rangeIndex2)
end if
end if
end if
return -- nothing.
end customIterativeDualPivotQuicksort
property sort : customIterativeDualPivotQuicksort
(*
on demo1() -- Sort copies of a list in various ways.
-- Custom comparer. Returns the opposite of the truth for a reverse sort.
script descending
on isGreater(a, b)
return (a < b)
end isGreater
end script
-- Set up five recognisably similar lists for demo purposes.
set {list1, list2, list3, list4, list5} to {{}, {}, {}, {}, {}}
repeat 25 times
set n to (random number 1000)
set end of list1 to n
set end of list2 to n
set end of list3 to n
set end of list4 to n as text
set end of list5 to n as real
end repeat
-- Sort list1 normally.
sort(list1, 1, -1, {})
-- Reverse sort items 1 thru 20 of list2, leaving items 21 thru 25 unsorted.
sort(list2, 1, 20, {comparer:descending})
-- Reverse sort list3, copying the moves in list4 and list5.
sort(list3, 1, -1, {comparer:descending, slave:{list4, list5}})
return {ascending:list1, |1thru20descending21thru25unsorted|:list2, descendingWithSlaveSorts:{list3, list4, list5}}
end demo1
on demo2() -- Sort a list of records in descending order by 'age' with ascending subsorts by 'surname' and 'christianName'.
-- Custom comparer.
script descendingByAgeAscendingByName
on isGreater(a, b)
if (a's age = b's age) then
if (a's surname = b's surname) then
return (a's christianName > b's christianName)
else
return (a's surname > b's surname)
end if
else
return (a's age < b's age)
end if
end isGreater
end script
-- Set up a sample list of records.
set surnames to {"Smith", "Jones", "Aardvark", "AppleScript", "Walker", "Diplodocus", "Klein", "Zozzle", "Lazenby"}
set christianNames to {"John", "Bert", "James", "Fred", "Ella", "Mabel", "Wayne", "Peter", "Mabel", "Avarice", "Verisimilitude", "Dinsdale", "Isambard", "Henrietta", "Stephen", "Philip", "Ringo", "Sid", "Genghis", "Hayley", "Caligula"}
set lst to {}
repeat until ((count lst) is 25)
set r to {age:(random number from 21 to 35), surname:(some item of surnames), christianName:(some item of christianNames)}
if (lst does not contain {r}) then set end of lst to r
end repeat
-- Do the sort.
sort(lst, 1, -1, {comparer:descendingByAgeAscendingByName})
return lst
end demo2
demo1()
-- demo2()
*)
8th April 2018: Replaced the dummy slave handlers, which were used when other lists weren't being sorted in parallel with the main one, with conditional calls to the non-dummies. This makes sorts with parallel sorting very slightly less fast, but speeds up sorts without it (which are probably more usual) by a greater amount.
12th August 2018: Demos at end commented out for clarity.
29th October 2019: Check for r in lst in the second demo corrected following jean.o.matic's comment below.
Last edited by Nigel Garvey (2019-10-29 01:54:44 pm)
NG
Offline
Wow!! It really is very fast. Thanks, Nigel.
Mac mini running 10.14.6, 2011 27" iMac as display.
Offline
Hi Nigel: I saw this article on timsort and thought you might be interested. I searched but didn't find anything on macscripter.net about it.
https://hackernoon.com/timsort-the-fast … b28417f399
Offline
Hi kerflooey.
Thanks for the link! I've downloaded a couple of descriptions of timsort and will no doubt become engrossed in it when I get a moment or three.
NG
Offline
Hi Nigel: I saw this article on timsort and thought you might be interested. I searched but didn't find anything on macscripter.net about it.
Have been trying to speed up the natural merge sort, known as timsort in Python, but with no avail for unsorted data. Natural merge sort is of course the fastest when you have two ordered lists, even in AppleScript but I didn't bother to post it here (iirc, I think I've mentioned natural merge sort somewhere).
It is true that natural merger sort (timsort) is insanely fast in programming languages like C. It makes use of the fact that unsorted data doesn't exists, only worse case sorting. It works most efficiently using linked lists, a type of list AppleScript doesn't support (anymore).
Offline
Well. I've done an initial literal translation of the Timsort code in the hackernoon article into AppleScript, implementing a bug fix suggested by someone on GitHub, where it had also been posted. As one would expect, the result's a disaster speedwise, given all the repeated list accesses and concatentations. It's also nowhere near a full implementation of Timsort, since all it does is to grab runs of items which are already in order, sort each run (!) with a very inefficient insertion sort, and then merge the runs. If I've correctly understood Tim Peters's own description of the sort, it should also grab and reverse any runs of items which are exactly in reversed order, extend and presort runs to get optimal lengths for merging (hence the need for the insertion sort), and implement fiendishly complex optimisations of the merges themselves.
I may be gone some time….
Last edited by Nigel Garvey (2018-08-18 08:18:47 am)
NG
Offline
Thanks Nigel for all these sorting algorithms. Your knowledge of them and how you adapt them in nice Applescript code is amazing. I always search MacScripter for your scripts when I need a sorting algorithm for a special task. I also learned how to make a more versatile handler by passing the comparaison object as a parameter.
Just a remark about the second demo in the iterative version where you wrote this line:
if (lst does not contain r) then set end of lst to r
to test the inclusion of a record in a list. This never worked for me (always false).
This simple test produced a list of 100 items with only 27 unique one:
Applescript:
set surnames to {"Smith", "Jones", "Taylor"}
set christianNames to {"John", "Bert", "Williams"}
set lst to {}
repeat until ((count lst) is 100)
set r to {age:(random number from 21 to 23), surname:(some item of surnames), christianName:(some item of christianNames)}
if (lst does not contain r) then set end of lst to r
end repeat
return {lst, lst's length}
Model: iMac 2017
AppleScript: 25.1.7
Browser: Safari 605.1.15
Operating System: macOS 10.14
Last edited by jean.o.matic (2019-10-29 12:14:54 pm)
______________________
Jean.O.matiC
Montréal, Québec, Canada
Offline
Just a remark about the second demo in the iterative version where you wrote this line:
if (lst does not contain r) then set end of lst to r
to test the inclusion of a record in a list. This never worked for me (always false).
Thanks, jean.o.matic!
That's a mistake on my part. I was forgetting that records and lists have to be explicity presented in lists when using 'contains' or 'is in' to see if other lists contain them:
if (lst does not contain {r}) then set end of lst to r.
'contains' and 'is in' are commands which compare lists. If the item to be checked isn't a list, it's coerced to one for the comparison, which is OK with most values, but not with records and lists!
Applescript:
set lst to {{age:35, surname:"Bloggs", christianName:"Bill"}, {99, "Smith", "Rumplestiltskin"}, 44, "Aardvark", "Aaron"}
lst contains "Aaron" --> true. ("Aaron" coerced to {"Aaron"}.)
lst contains {"Aaron"} --> true. ({"Aaron"} explicit.)
lst contains {age:35, surname:"Bloggs", christianName:"Bill"} --> false. (Record coerced to three-item list.)
lst contains {{age:35, surname:"Bloggs", christianName:"Bill"}} --> true
lst contains {99, "Smith", "Rumplestiltskin"} --> false
lst contains {{99, "Smith", "Rumplestiltskin"}} --> true
lst contains {44, "Aardvark", "Aaron"} --> true
I'll edit my post above.
NG
Offline