QuickSort Algorithm

Hi, Kevin.

That “optimized pivot” idea looks very interesting. Thanks for posting it. I’m going to have a happy couple of days trying to analyse it to see if Arthur’s and my effort can benefit from it! :slight_smile:

Meanwhile, I’ve taken the liberty of applying a couple of optimisations to your Qpartition() handler. Basically, they use existing variable values to reduce the number of times the list is accessed. There’s also an exploitation of an AppleScript quirk whereby “simple” comparisons are faster than “compound” ones. It’s faster to do this .

if a > b then
else
	-- Do something.
end if

. than this:

if a ≤ b then
	-- Do something.
end if

Similarly with ≥ and not. It’s faster to test their antitheses than to test them. The difference is minimal, but every little helps in a large sort!

on Qpartition(leftIndex, rightIndex, pivot)
	--private routine called by quickSort.  
	--do not call from your script!
	set pivotValue to item pivot of bs's alist
	--set temp to item pivot of bs's alist -- We already have this as pivotValue.
	set item pivot of bs's alist to item rightIndex of bs's alist
	--set item rightIndex of bs's alist to temp -- No need to put it back in the list.
	set tempIndex to leftIndex
	repeat with pointer from leftIndex to (rightIndex - 1)
		set temp to item pointer of bs's alist -- temp set here to save getting the item twice.
		if temp > pivotValue then --≤ pivotValue then
		else
			--set temp to item pointer of bs's alist -- Moved up.
			set item pointer of bs's alist to item tempIndex of bs's alist
			set item tempIndex of bs's alist to temp
			set tempIndex to tempIndex + 1
		end if
	end repeat
	--set temp to item rightIndex of bs's alist -- Already in pivotValue.
	set item rightIndex of bs's alist to item tempIndex of bs's alist
	set item tempIndex of bs's alist to pivotValue --temp
	
	return tempIndex
end Qpartition

Funny you should mention. As I was writing this (and the other sorts - bubble, insertion, and merge) I found myself wondering if there was a difference, but I didn’t code it up to see if there was.

Thanks Nigel! :cool:

Hey everybody
as I’m not the perfect scripter as you guys, I’m having a hard time to transform the qsort routine into a qsort the would match my needs.
I’ve been using Kevins’s pivot version at the moment - and was also unsuccesfully trying to implement Nigel’s superfast version (I’m working in a tell application routine and I couldn’t call Nigel’s routine through that)

What I actually want to achieve is removing the duplicates, while sorting the list.

As I didn’t get that done, for the moment, I’m rearranging the sorted list like that:


				set srtLst to my quickSort(Lst)
		set lgthsrtLst to srtLst's length
		set Lst to {}
		repeat with x1 from 1 to lgthsrtLst
			set TmpItm to item x1 of srtLst
			if x1 < (lgthsrtLst) then
				if TmpItm = item (x1 + 1) of srtLst then
				else
					set Lst to Lst & TmpItm
				end if
			end if
			if x1 = (lgthsrtLst) then set Lst to Lst & TmpItm
		end repeat

It would probably be much faster, if I’d erase the duplicates directly in the qsort handler.
could you give me a hint on how to get that done?

Thank you so very much!
And most important thank you so much for posting your wisdom in general!!

ekel75

PS. do you know which is faster, using a variable for 3 to 4 instances in a script

			set lgthCalcNum to CalcNum's length
			if (lgthCalcNum) > 6 then set CalcNum to (text 1 thru ((lgthCalcNum) - 6) of CalcNum) & "." & text ((lgthCalcNum) - 5) thru (lgthCalcNum) of CalcNum

or not!? ( like this )

		if (CalcNum's length) > 6 then set CalcNum to (text 1 thru ((CalcNum's length) - 6) of CalcNum) & "." & text ((CalcNum's length) - 5) thru (CalcNum's length) of CalcNum

Model: Desktop G4 800MHz
AppleScript: AppleScript D1-1.9.3
Browser: Firefox 1.5.0.7
Operating System: Mac OS X (10.4)

Hi, ekel75.

I don’t think that combining a sort with removing the duplicates would actually be faster. Sorts do a large number of item comparisons “ often a very large number. The comparisons check if one item is greater or less than another. It would greatly increase the workload if each stage of the sort also included a check to see if the items were equal, and making a new list every time a duplicate was found would increase it even more. There may be an effective way to reduce the slow-down, but I can’t see it.

Your process of sorting the list first and then building another without the duplicates is probably the best way. I’d code it something like this:

-- A script object for speed of access to the lists.
script o
	property sortedList : missing value
	property prunedList : missing value
end script

-- Do the sort.
set o's sortedList to quickSort(myList)

-- Since the list has been sorted, equal values will be together, not randomly distributed.

-- Initialise a check value to the first item in the sortedList
set checkValue to item 1 of o's sortedList
-- Start off the pruned list with that value.
set o's prunedList to {checkValue}
repeat with i from 2 to (count o's sortedList)
	set thisValue to item i of o's sortedList
	if (thisValue is checkValue) then
		-- If this value's a duplicate, skip it.
	else
		-- Otherwise, we've reached (a possible run of) a new value.
		-- Append this value to the pruned list and make it the new check value.
		set end of o's prunedList to thisValue
		set checkValue to thisValue
	end if
end repeat
set myList to o's prunedList

Another way, if the items in the list are all of the same class and you know when you write the script what that will be, is this:

-- A script object for speed of access to the list.
script o
	property sortedList : missing value
end script

-- Do the sort.
set o's sortedList to quickSort(myList)

-- Since the list has been sorted, equal values will be together, not randomly distributed.

-- Initialise a check value to the first item in the sortedList
set checkValue to item 1 of o's sortedList
repeat with i from 2 to (count o's sortedList)
	set thisValue to item i of o's sortedList
	if (thisValue is checkValue) then
		-- If this value's a duplicate, replace it with 'missing value'
		set item i of o's sortedList to missing value
	else
		-- Otherwise, we've reached (a possible run of) a new value.
		-- Make it the new check value.
		set checkValue to thisValue
	end if
end repeat

-- Assuming that this is a list of integers, return the integers (not the 'missing values').
set myList to o's sortedList's integers

I’ve just seen the addendum to your post. The version with the variable should be slightly faster, but the difference will be miniscule with just those two lines. Basically, retrieving a value from a variable is faster than calling a function or using a reference to work it out again.

Dear Nigel,

thank you very much for your quick response and all your effort!
I like your routine very much. My approach has the same basic principle but your’s just got that neatly programmed twist to it.
Simply pro! :wink: Thank you very much for making me understand Applescript + programming more and more :slight_smile:

So now I’m trying to implement your second version, since it’s all the same class.
I tried it with an empty script and worked like a charm.

Trying to implement it doesn’t work too well though :frowning:

I’m in a < tell application > routine, and some of the stuff from Applescript has to be handled differently from within < tell application, >
which I haven’t fully figured out yet. (Which is why I’m not using your quicksort :frowning: ) Calling subroutines works fine though,
so I tried to put your script into a new handler and calling it up like that:

tell application "Adobe InDesign CS2"
	activate
tell document 1
...
		set Lst to my pruneList(Lst)
...
end tell
end tell

--
--Subroutines

on quickSort(theLst)
...
end quickSort

on pruneList(thePruneLst)
	script o
		property sortedList : missing value
	end script
	set o's sortedList to my quickSort(thePruneLst)
	-- Initialise a check value to the first item in the sortedList
	set checkValue to item 1 of o's sortedList
	repeat with i from 2 to (count o's sortedList)
		set thisValue to item i of o's sortedList
		if (thisValue is checkValue) then
			-- If this value's a duplicate, replace it with 'missing value'
			set item i of o's sortedList to missing value
		else
			-- Otherwise, we've reached (a possible run of) a new value.
			-- Make it the new check value.
			set checkValue to thisValue
		end if
	end repeat
	return o's sortedList's text
end pruneList

But somehow I’m not getting it right. Am I not allowed to call the quickSort handle from within a second handler ?
Are some values not transported right!?!? :confused:

thank you very much for your help!

ekel75


– FOUND IT !?!

I guess I found it myself, though I still don’t get why it worked before!

the list “Lst” is not actual text but a reference to text within an Indesign Document…
it looks like this
{text from character 1 to character 7 of cell id 0 of table id 853847 of cell id 26 of table id 853696 of cell id 2 of table id 853446 of text frame id 852504 of page id 852502 of spread id 852497 of document “testdoc.indd” of application “Adobe InDesign CS2”, …}
so I included this:

		set Lst to (search with find attributes {applied character style:CharStyle})
		repeat with x from 1 to (count of Lst)
			set item x of Lst to item x of Lst as text
		end repeat
		set Lst to my pruneList(Lst)

what do you think? you probably have some slick coded idea up your sleeve right as you’re reading this :wink:

greetz e/

Hi, ekel75.

I’m afraid I don’t have InDesign and am not familiar with scripting it. But it looks as though what you’re doing is the way to do it: get the references, coerce to text, sort, prune. (The pruning method relies on the fact that the texts are sorted.) If Lst contains a large number of items, a script object might speed up the coercion loop a little too:

script o
	property Lst : missing value
end script

set o's Lst to (search with find attributes {applied character style:CharStyle})
repeat with x from 1 to (count o's Lst)
	set item x of o's Lst to item x of o's Lst as text
end repeat

set theList to my quickSort(o's Lst)
set theList to my pruneList(theList)

Hey Nigel,

thank you very much for your reply! Just found it after posting my last question… :slight_smile: (** edit: deleted my last post**)
I changed the script in that way… every bit of a second counts with large lists!

  • your quicksort is a perfect example (still can’t call it from within the main script :()

my prune script is now calling the quicksort so I only call prunelist from the mainscript.
(cause I can’t prune without sorting first anyway)

… do you have a hint for my last question?! :smiley: (** edit: deleted my last post**)

helplessy yours,
ekel75

** edit
*
@Nigel:
I just noticed that your CustomQSort works as a handler, with which I’m not having problems calling from my main script!
So hoooooray! All my sort problems are gone again! Thank you sooo much!
Happy me! :smiley:
*
** end of edit
*

This algorithm and its implementation is very nice !
I am using it often, but unfortunately I am not clever enough (okay, maybe too lazy …) to understand its structure.

It should be very easy to use this algorithm as a searching routine as well. As I understood the QuickSort algortihm, it is more less nothing other like a searching routine, or it should be quite easy to modify it for this use.

Is there anybody able and so kind to tell me how to use this QuickSort Algorith as a searching routine, i.e. searching for an item, giving out the (one and only, or first) index containig it ?

Best Regards

Jens

The quicksort algorithm really isn’t just a search engine, it is making comparisons.

You don’t say what you want to search either… is it an ordered list of the same datatype?

I wonder if Jens is thinking of a binary search, which is distantly related to QuickSort in that it’s a “divide and conquer” process. In AppleScript, you use ‘is in’ to determine which half of the list the item’s in, then which half of that half, and so on. It can be faster than checking each item from the beginning of the list, but not by very much. And (if I remember correctly) it depends on the length of the list and where the item actually is in it. An item near the beginning might be found more quickly by a linear search. And of course a binary search can use a lot of memory generating the half lists for checking.

on binarySearch(theList, value)
	script o
		property lst : theList
	end script
	
	set valueAsList to {value}
	set l to 1
	set r to (count theList)
	
	if (valueAsList is in theList) then
		repeat until (value is item l of o's lst)
			set l to l + 1
			set m to (l + r) div 2
			if (valueAsList is in items l thru m of o's lst) then
				set r to m - 1
			else
				set l to m + 1
			end if
		end repeat
		return l
	else
		return 0
	end if
end binarySearch

Any speed advantage comes from the fact that ‘is in’ loops through the list(s) and compares the items by means of its underlying C code, which is faster than the equivalent instructions at the AppleScript level. It has to be sufficiently faster here that, even when it’s done several times, and with all the other AppleScript commands involved, it’s still faster than looping through the list and comparing the items with AppleScript.