Effective way of finding the median in a list of numbers

McUsrII · June 27, 2015, 2:21pm

Hello Nigel.

Shales are so much more practical than a lawn, but, a lawn has its charm.

It’s not an old wives’ tale, although N. Wirth just uses it to illustrate how quicksort operates. He uses (low + high) div 2 like you do. However, Professor Erik D. Demaine of MIT highly praised the random partitioning of quicksort, then remarked that he hadn’t shown it, but that the students were supposed to have seen it in the book: Cormen, “Introduction to Algorithms”. I have read through the handout for the “quick sort” lecture, and it wasn’t there either. The philosophy behind it is described by Wirth, however: things fall into place faster when they are swapped over a longer distance.

I really shouldn’t speculate too much on the advantages. With random partitioning, the elements are put into the right partition faster, I guess, due to the longer swapping distance. The call tree will be more unbalanced because of the differing sizes of the partitions, but maybe some intermediate operations are saved, for all I know.

There is of course also the fact that, however small the random number generator is, it takes something to outperform (low + high) div 2.

Have a nice evening.

Edit

I changed the partitioning algorithm to one that swaps over longer distances (Wirth’s), and now the time has decreased by yet another 50% (for 1000 elements).

It may also be that they had a “tutorial version” of quicksort, which partitions a dataset around a random pivot, as an introduction to the subject. I won’t know until I have the “Cormen book” in front of me.

McUsrII · June 27, 2015, 7:31pm

Hello.

Niklaus Wirth also writes that a random pivot, one that deviates by just 1 or so from the middle, is used for breaking a “worst case” pattern of input, where the pivot value will cause one of the partitions to be of length 1, again and again, because the median repeatedly hits the largest value in the current partition.

Other than that, reading past the quicksort theme, I found the original Find algorithm by C. A. R. Hoare, which has the partitioning inlined, without any recursion or randomly generated pivot. I’ll be back with that one later, or at least with the timing result. (It is almost like the last partition implementation I posted.)

Nigel_Garvey · June 28, 2015, 12:23am

Ah. That does seem to work slightly quicker than a pivot chosen randomly from the entire range, not least of all because it requires fewer instructions.

-- Random from entire range.
tell (r - l + 1) to set pivot to o's extract's item ((x0 / 1.0E+9) * it mod it div 1 + l)

-- Randomly either the middle item or the one after it.
set pivot to o's extract's item ((l + r) div 2 + x0 div 5.0E+8 mod 2)

However, on my machine, it’s still averaging marginally slower than simply going for the middle item.

Median-of-3 still produces the fastest sorts.
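
For anyone following along, median-of-3 pivot choice can be sketched roughly as below. This is only an illustration, not the code used in the timings above; the handler name medianOf3 and its parameters are made up for the example, and it takes the first, middle and last items of the range as the three candidates.

```applescript
-- Hypothetical helper (not from the thread's scripts):
-- returns the median of the first, middle and last values in items l thru r of theList.
on medianOf3(theList, l, r)
	set x to item l of theList
	set y to item ((l + r) div 2) of theList
	set z to item r of theList
	-- Three compare-and-swap steps leave x ≤ y ≤ z.
	if (x > y) then set {x, y} to {y, x}
	if (y > z) then set {y, z} to {z, y}
	if (x > y) then set {x, y} to {y, x}
	return y
end medianOf3

medianOf3({9, 4, 7, 1, 5}, 1, 5) --> 7 (the median of 9, 7 and 5)
```

The returned value would then serve as the pivot value for partitioning the range l thru r.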

McUsrII · June 28, 2015, 7:30am

Hello Nigel.

I have no doubt that the median of 3 is the best approach. It is an interesting thought in itself, though, to skew the median slightly, to avoid always hitting the largest element in a partition if the input is organized that way.

Here is my take on that. This approach doesn’t consider using insertion sort, or other means, when the partitions get small, but it doesn’t create any partitions of a single element if that is avoidable. I use mod 3, since the median comes from a division by two, and that makes a “pendulum” that points to the item left of the median, the original median, or the item right of the median, to avoid hitting the exact same element. The right-hand case occurs quite seldom, but the result should vary between the element to the left of the median and the original median for each halving of the partition.

(*
	Scheme for skewing a median with -1, 0 or 1 , if applicable.
	
	Idea: a mod 3 will never repeat itself, when the medians are divisible by two.
	
	So the median mod 3 will vary with values 0 - 2. the 0 result will be more seldom
	so most of the time, the median will be skewed 1 to the left, or kept at its 
	original position, but it should really vary with each call .

	We don't bother to skew however, if that creates a partition of the first element.

*)
to switch(low, high)
	set aMedian to (low + high) div 2
	return aMedian - 1 * ((aMedian - low > 2) as integer) * ((aMedian) mod 3 - 1)
end switch

Nigel_Garvey · June 28, 2015, 1:42pm

Hi McUsrII.

I think you’re confusing a few ideas here:

Median. The value sought in a median value search. At each recursion, it’s known to be one of the values in the current range, but not which. It has no relevance in Quicksort per se, apart from being the optimum pivot in the first round of partitioning.
Median position. My term for the position the median value would occupy if the list were sorted. This is a known entity. It’s the middle position (or one of the two middle positions) in the entire range, but not necessarily so in the subranges handled by the recursions.
Pivot value. The value in the current subrange around which partitioning is based.
Position from which the current pivot value is taken. Depends on the pivot-choice method used.
Positions where instances of the current pivot end up after partitioning. Depends on the pivot value relative to the others in the current subrange.

It very often happens during partitioning that the left and right indices cross and stop two positions apart. This is because they’ve met on an instance of the pivot value and have caused it to be “swapped” with itself. They’ve now stopped either side of the pivot instance and it’s in its final position in the sort.

This is what’s exploited in a median value search. When the left and right indices stop either side of the median position, whatever’s in that position must be an instance of the pivot equal to the median value. And because of the way recursion’s applied up till that point (ie. to the subranges containing the median position, even when down to only one item), the situation must eventually occur.

So unlike in a full Quicksort, where the optimum pivot value is the median value of the current subrange, the optimum pivot in a median value search is the median of the entire range. We don’t necessarily want to avoid extreme pivot values in the subranges because one of them’s likely to be the median value anyway. Obviously it’s still a worst-case scenario if you consistently pick the wrong one, but hitting the right one greatly speeds up homing in on the final result. Perhaps a good system would be to use the higher of two (or three) values when subsorting a left partition or the lower of two (or three) when subsorting a right.
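
A minimal sketch of that last suggestion, using two candidate values. It is untested against the median scripts in this thread; the handler name biasedPivot and the inLeftPartition flag are invented for the illustration.

```applescript
-- Hypothetical helper, not part of any script in this thread.
-- inLeftPartition: true when the subrange l thru r is a left partition
-- (so the median lies near its high end), false when it's a right partition.
on biasedPivot(theList, l, r, inLeftPartition)
	set x to item l of theList
	set y to item r of theList
	if (inLeftPartition) then
		-- Left partition: prefer the higher of the two candidates.
		if (x > y) then return x
		return y
	else
		-- Right partition: prefer the lower of the two candidates.
		if (x < y) then return x
		return y
	end if
end biasedPivot

biasedPivot({3, 8, 5}, 1, 3, true) --> 5
```

Whether this actually beats simply taking the middle item would have to be timed.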

McUsrII · June 28, 2015, 5:43pm

Hello Nigel.

My idea was solely to avoid picking the one that leads to the “wrong” result over and over, due to some freak pattern in the input that coincides with the halving. The code as such was just meant to illustrate what I meant, as I gather that something like that is sometimes used as “insurance” against the worst-case scenario. I really meant the median position, and not the median, as you describe it. The two terms are often used interchangeably, though. Maybe I should look over my use of pivot and pivot value; I seem to use pivot for both of those as well. I do recognize the importance of communicating such subjects in an unambiguous way, and will take greater care in the future.

It’s been a busy day, I’ll post the next version of the median handler: find/quickselect by the afternoon.

By the way, now that a large part of the world that doesn’t write from left to right takes part in programming, wouldn’t it be better to name the start and end of a range consistently with “start” and “end” instead of “left” and “right”? This is no critique of you, of course. I came to think of it the other day, and just felt that the general convention of using “left” and “right” to denote the start and end of a range is a bit outdated by now.

Nigel_Garvey · June 28, 2015, 7:19pm

Hah! Having been under the impression that the only dialect of AppleScript since Mac OS 9.0 has been “International English”, I switched my system over to Arabic just now to see what things would look like in Script Editor. I was greeted with the ridiculous spectacle of script texts being in English and written from left to right, but aligned hard against the right margins without indentation. Lists were indeed ordered from right to left instead of vice versa, but only in the Result pane. Typing and running ‘set l to {1, 2, 3, 4, 5}’ returned ‘{5, 4, 3, 2, 1}’.

But in any case, since I’m unlikely at my age to write about scripts in anything other than a western European language (specifically British English), I think I’ll stick with “left” and “right” where convenient to avoid the linguistic acrobatics that would be required otherwise.

McUsrII · June 28, 2015, 7:32pm

Hello Nigel.

I didn’t mean you, nor just AppleScript. I meant that the whole landscape for using languages has changed since left and right were introduced, probably sometime in the 1950’s. And I wasn’t after how things are written, but the “mental model” of it. An Arabic or Chinese person, or people from other regions, may start from the right side, and therefore think of the left as the end of it. Hopefully code is written in English, since that is the Latin of our day. But sticking with a western language doesn’t mean that we shouldn’t adopt global mental models we all can share.

Left and right are pretty established by now, and I think it is at least impractical to change them at this point. But new algorithms and APIs should probably consider such issues, along with the fact that we now support writing and reading from right to left, in order to provide as much “wholeness” as possible, making the environment we live and breathe in as friendly as possible on a global scale.

I mentioned it, in general, because it touched on the theme of naming things unambiguously. I actually pondered writing a blog post about it, but dismissed it as being too “nerdy”.

McUsrII · June 28, 2015, 10:00pm

Hello.

Here is the last and best version of the “find k’th least element”, that can be used for finding the median of a list, probably in order to pick the root element when building a binary tree in order to keep it balanced.

Hoare’s original version beats the previous version by almost 50%, again.


property scriptTitle : "Median Hoare Version"
set maxitems to 1000

script pseudo
	-- pure multiplicative random generator.
	-- It's faster most of the time than Standard Additions' random number (10000 runs).
	-- It has been proved that the sequence doesn't repeat itself before 2^31 - 2 calls have been made.
	-- yet: No warranties about anything. 
	-- Rosen "Discrete Mathematics" p. 207
	property x0 : 3
	property rmod : 2 ^ 31 - 1
	property multiplier : 7 ^ 5
	
	on rand()
		set x0 to (multiplier * x0) mod rmod
		return (x0 / 1.0E+9)
	end rand
	on init()
		rand()
	end init
end script

pseudo's init()
set l to {}
repeat maxitems times
	set end of l to rand(1, maxitems)
end repeat
set max to 0
repeat with i from 1 to maxitems
	if item i of l > max then set max to item i of l
end repeat
log "max: " & max

set min to (maxitems + 1)
repeat with i from 1 to maxitems
	if item i of l < min then set min to item i of l
end repeat
log "min: " & min

set ll to length of l
set k to ll div 2 + 1 * (((ll mod 2) > 0) as integer)

set t0 to (current date)
if 1 = 1 then
	repeat 1000 times
		set md to find(l, k)
	end repeat
else
	set md to find(l, k)
end if
set t1 to ((current date) - t0) / 1000
log "time T : " & t1
log "median val: " & md
set t0 to (current date)
set m to countingsort(l, max)
set t1 to ((current date) - t0) / 1000
log "time T : " & t1

log "low: " & item (k - 1) of m
log "median: " & item k of m
log "high: " & item (k + 1) of m


(*
	This version uses at most 2n comparisons in the average case ( O(n) )
	and O(n^2) comparisons in the worst case. The probability of a worst case is
	greater than ( 1 / n! ) and *much* less than 1/n.
*)

on find(|L|, k)
	-- returns the k'th least element of a list. 
	-- By C.A.R Hoare in an article. Implemented in Pascal by N. Wirth
	-- Adapted to AppleScript by McUsr.
	script o
		property A : |L|
	end script
	
	set low to 1
	set high to length of |L|
	
	repeat while high > low
		set x to item k of o's A
		set i to low
		set j to high
		repeat while i ≤ j
			repeat while x > item i of o's A
				set i to i + 1
			end repeat
			repeat while item j of o's A > x
				set j to j - 1
			end repeat
			if i ≤ j then
				set w to item i of o's A
				set item i of o's A to item j of o's A
				set item j of o's A to w
				set i to i + 1
				set j to j - 1
			end if
		end repeat
		if j < k then set low to i
		if k < i then set high to j
	end repeat
	return item k of o's A
end find

on rand(low, high)
	-- returns  a random integer, within low and high inclusive
	-- Made by McUsr
	set k to high - low + 1
	if 0 = 1 then
		return ((random number) * k mod k div 1 + low)
	else
		return ((pseudo's rand()) * k mod k div 1 + low)
	end if
end rand

on countingsort(|L|, k)
	--    countingsort: origin unknown. found it in an algorithms lecture from MIT
	--    Implemented in AppleScript by McUsr 2015/6/24
	script o
		property l : |L|
		property C : missing value
		property Cp : missing value
		property B : missing value
	end script
	set ll to length of o's l
	copy |L| to o's C
	repeat with i from 1 to ll
		set item i of o's C to 0
	end repeat
	
	repeat with i from 1 to ll
		set item (item i of o's l) of o's C to (item (item i of o's l) of o's C) + 1
	end repeat
	copy o's C to o's Cp
	
	repeat with i from 2 to k
		set item i of o's Cp to (item (i - 1) of o's Cp) + (item i of o's C)
	end repeat
	copy o's C to o's B
	
	repeat with j from ll to 1 by -1
		tell item j of o's l
			set item (item it of o's Cp) of o's B to it
			set item it of o's Cp to (item it of o's Cp) - 1
		end tell
	end repeat
	return o's B
end countingsort

Edit
Removed some debugging info.
Incorporated the fast random number generator, just for the hell of it.

Nigel_Garvey · June 29, 2015, 10:01pm

Here’s a different approach by way of experiment. It counts how many values are less than and greater than each value in the range. When neither count is more than half the length of the range, the current value is either the median value or one of the two which have to be averaged to get it. The huge cost of comparing every value with all the others is mitigated by only doing it with values lying between contracting limits which have proved too high or too low earlier in the process.

The script here actually takes two or three times as long as my other one, but the difference only becomes noticeable with about three thousand or so values.

Edit: Previous bug fix replaced with a more satisfactory compromise. Also some reworking of comments and a couple of variable name changes.
Further Edit: It turns out that a simpler fix for the bug was simply to leave out the aspect of the “limits” idea which led to it! :rolleyes: The preliminary sample of lowest and highest values is no longer carried out. There’s no discernable effect on performance.

(* Find the median of the values (presumed to be integers) in a range of a list.
By Nigel Garvey 2015.
With odd numbers of items, the median value's the one which would be in the middle if the range were sorted. With even numbers of items, it's the average of the two middle values.

Parameters: (list, range index 1, range index 2)
*)

on medianValue(theList, l, r)
	script o
		property lst : theList
	end script
	
	-- Process the range parameters.
	set listLen to (count theList)
	if (l < 0) then set l to listLen + l + 1
	if (r < 0) then set r to listLen + r + 1
	if (l > r) then set {l, r} to {r, l}
	if ((l < 1) or (r > listLen)) then error "Duff range parameters!" -- Compose as required.
	
	set rangeLen to r - l + 1
	set halfRangeLen to rangeLen div 2
	set oddLen to (rangeLen mod 2 is 1)
	-- Limits to be set and tightened as the search discovers what's too low or too high.
	set tooLow to missing value
	set tooHigh to missing value
	-- Result variables.
	set result1 to missing value
	set result2 to missing value
	
	-- The search takes each value which isn't known to be too low or too high and checks how many of the other values are higher than it and how many are lower. When neither count is more than half the length of the range, the current value is either the median or one of the two values which must be averaged to get it.
	repeat with i from l to r
		set iVal to item i of o's lst
		if (((tooLow is missing value) or (iVal > tooLow)) and ((tooHigh is missing value) or (iVal < tooHigh))) then
			-- This value's within the current limits. Compare it with all the other values in the range.
			set lowerCount to 0
			set higherCount to 0
			repeat with j from l to r
				if (j is i) then
					-- Don't compare it with itself!
				else
					set jVal to item j of o's lst
					-- Increment the lower or higher count (or do nothing) accordingly.
					-- (It's slightly faster to complete the counts and check them afterwards than it is to check after each increment in the hope of finishing early.)
					if (jVal < iVal) then
						set lowerCount to lowerCount + 1
					else if (jVal > iVal) then
						set higherCount to higherCount + 1
					end if
				end if
			end repeat
			
			-- Now act on the count results.
			if (lowerCount > halfRangeLen) then
				-- Over half the other values are lower than this one, so it's too high.
				set tooHigh to iVal
			else if (higherCount > halfRangeLen) then
				-- Over half the other values are higher, so it's too low.
				set tooLow to iVal
			else if (result1 is missing value) then
				-- This value's the first hit. Log it and, if the range is an odd length, end the search.
				set result1 to iVal
				if (oddLen) then exit repeat
			else if (iVal is result1) then
				-- This value's the same as the first hit. Ignore it.
			else
				-- This value's the second hit. Log it and end the search.
				set result2 to iVal
				exit repeat
			end if
		end if
	end repeat
	
	-- Return the first hit if there's only one; otherwise the average of the two, as an integer if possible.
	if (result2 is missing value) then return result1
	set |median| to (result1 + result2) / 2
	tell |median| as integer to if (it is |median|) then set |median| to it
	return |median|
end medianValue

--(* Demo:
set l to {}
repeat with i from 1 to 6
	set end of my l to (random number 1000)
end repeat

log l
-- Find the median of values 1 thru -1 of l.
set m to medianValue(l, 1, -1)

McUsrII · June 30, 2015, 8:38pm

Hello Nigel.

Your last version looks interesting too. I think that there is something proportional to log n that makes the difference kick in noticeably at 3000 items.

Well, I didn’t get around to it today, but I’m going to present two O(n) versions, spread over some days, so that each works properly when I post it. The first is a cheat, based on CountingSort above. The other is the median of medians algorithm, which is the real deal, and which also has a worst case of O(n). The problem with that solution, as I see it, is that you have to calibrate the algorithm to the number of items, as I think you need at least 25 elements for the median of medians to work. Well, N. Wirth says that if you have 10 items or fewer, then you should sort and pick the median from the sorted elements. I am also a bit eager to use the median for building a complete binary tree, but I haven’t figured out yet whether that is possible or a feasible way to do it. (But then I really need the list to be sorted anyway.)
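
The “sort and pick” approach for short lists could look something like this. It is only a sketch, assuming a plain insertion sort on a copy of the list and taking the lower middle item for even counts; the handler name smallMedian is made up for the example.

```applescript
-- Hypothetical helper: insertion-sort a copy of a short list and return its
-- middle value (the lower of the two middle values when the count is even).
on smallMedian(theList)
	copy theList to sorted -- Work on a copy so the caller's list is untouched.
	set n to (count sorted)
	if (n is 0) then return missing value
	repeat with i from 2 to n
		set v to item i of sorted
		set j to i - 1
		-- Shift larger values one place to the right until v's slot is found.
		repeat while (j > 0)
			if ((item j of sorted) ≤ v) then exit repeat
			set item (j + 1) of sorted to item j of sorted
			set j to j - 1
		end repeat
		set item (j + 1) of sorted to v
	end repeat
	return item ((n + 1) div 2) of sorted
end smallMedian

smallMedian({7, 2, 9, 4, 1}) --> 4
```

The index (n + 1) div 2 is the same position as the k computed in the Hoare script above.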

McUsrII · July 1, 2015, 7:31am

Hello.

I came by this little random number generator, which promises to generate random numbers that are perceived as random by humans (not totally uniformly distributed). That may be a suitable technique sometimes, when you need random numbers for something. This random number generator has been good enough for Doom, so it should be suitable for small AppleScript games. You can read all about it in the links in the code.

script perlinsRand
	-- https://github.com/id-Software/DOOM/blob/master/linuxdoom-1.10/m_random.c
	-- https://news.ycombinator.com/item?id=9809998
	property randInts : {0, 8, 109, 220, 222, 241, 149, 107, 75, 248, 254, 140, 16, 66, ¬
		74, 21, 211, 47, 80, 242, 154, 27, 205, 128, 161, 89, 77, 36, ¬
		95, 110, 85, 48, 212, 140, 211, 249, 22, 79, 200, 50, 28, 188, ¬
		52, 140, 202, 120, 68, 145, 62, 70, 184, 190, 91, 197, 152, 224, ¬
		149, 104, 25, 178, 252, 182, 202, 182, 141, 197, 4, 81, 181, 242, ¬
		145, 42, 39, 227, 156, 198, 225, 193, 219, 93, 122, 175, 249, 0, ¬
		175, 143, 70, 239, 46, 246, 163, 53, 163, 109, 168, 135, 2, 235, ¬
		25, 92, 20, 145, 138, 77, 69, 166, 78, 176, 173, 212, 166, 113, ¬
		94, 161, 41, 50, 239, 49, 111, 164, 70, 60, 2, 37, 171, 75, ¬
		136, 156, 11, 56, 42, 146, 138, 229, 73, 146, 77, 61, 98, 196, ¬
		135, 106, 63, 197, 195, 86, 96, 203, 113, 101, 170, 247, 181, 113, ¬
		80, 250, 108, 7, 255, 237, 129, 226, 79, 107, 112, 166, 103, 241, ¬
		24, 223, 239, 120, 198, 58, 60, 82, 128, 3, 184, 66, 143, 224, ¬
		145, 224, 81, 206, 163, 45, 63, 90, 168, 114, 59, 33, 159, 95, ¬
		28, 139, 123, 98, 125, 196, 15, 70, 194, 253, 54, 14, 109, 226, ¬
		71, 17, 161, 93, 186, 87, 244, 138, 20, 52, 123, 251, 26, 36, ¬
		17, 46, 52, 231, 232, 76, 31, 221, 84, 37, 216, 165, 212, 106, ¬
		197, 242, 98, 43, 39, 175, 254, 145, 190, 84, 118, 222, 187, 136, ¬
		120, 163, 236, 249}
	property randEntries : length of randInts
	property curRand : 0
	on clearRnd()
		set curRand to 0
	end clearRnd
	on next()
		if curRand < randEntries then
			set curRand to curRand + 1
		else
			set curRand to 1
		end if
		return item curRand of randInts
	end next
end script


perlinsRand's next()

Nigel_Garvey · July 1, 2015, 9:01am

Thanks, McUsrII. I must admit that, despite having implemented many sorts in AppleScript over the years, I’ve never got the hang of expressions like “log n” and “O(n)”. I think it’s because they didn’t convey (to me) why things behaved the way they did, especially in a high-level language like AppleScript, where as much depends on individual command implementations and data characteristics as on the algorithms themselves. But I think it’s about time I looked into “n” notation more thoroughly!

DJ_Bazzie_Wazzie · July 1, 2015, 9:47am

True, something that AppleScript keeps reminding me of every day. It’s never the obvious solution; in some way or other, AppleScript completely misses its targets.

McUsrII · July 1, 2015, 12:31pm

Hello Nigel.

I never really saw Big O, or Big Theta or Omega, as anything but a proportion describing how the processing time increases as the dataset grows. It is a very coarse measure in itself, just giving the order of magnitude, and it really isn’t anything to worry that much about. It is really just one of the trade-off factors, with readability, maintainability, sheer size of the code and so on being others. Correctness is of course not a trade-off. I seldom work on datasets so big in AppleScript that I have to worry about it, since it doesn’t matter unless the size of the datasets varies wildly.

In this particular case, it may slow down because the handlers have a complexity of n log n, that is, the processing time increases faster than the number of elements (faster than linearly). Then the second handler may be slower in one part of the algorithm, which maybe has a log N complexity by itself (the inner loop), and that accounts for the slow increase in time. I guess this because the time doesn’t increase noticeably before 3000 items. Your last algorithm is just a fraction slower than your previous one, and that fraction adds up over time. I actually don’t think it is even a log N increase that is involved, but something that adds up more slowly, maybe log 1/N, as just a wild guess, not having really calculated anything, not even timed it. Just timing both for, say, 1000, 2000, and 3000 items, and comparing the differences, should give a good clue as to how they develop in proportion as the number of items increases.
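
Such a timing comparison could be sketched as follows. The handler medianHandler is a placeholder I made up so that the sketch runs on its own; the real median handler would be called in its place. Note that (current date) only has a resolution of one second, so the repeat count may need raising to get usable figures.

```applescript
-- Hypothetical placeholder for whichever median handler is being measured.
on medianHandler(theList)
	-- Stand-in body only: it just returns the middle item of the unsorted list.
	return item ((count theList) div 2 + 1) of theList
end medianHandler

repeat with nRef in {1000, 2000, 3000}
	set n to contents of nRef
	-- Build a list of n random integers between 1 and n.
	set testList to {}
	repeat n times
		set end of testList to (random number from 1 to n)
	end repeat
	-- Time 100 calls and log the elapsed whole seconds.
	set t0 to (current date)
	repeat 100 times
		medianHandler(testList)
	end repeat
	log "n = " & n & ", seconds per 100 runs: " & ((current date) - t0)
end repeat
```

If the logged times roughly double when n doubles, the growth is close to linear; if they roughly quadruple, it is closer to quadratic.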

@DJ Bazzie Wazzie. That is also a good point. But big O notation, and complexity analysis, still give a little hint about how things may progress, or how they should have progressed, had the number of items in the list not started to degrade performance by itself, for instance because the number of items in a list exceeded some threshold. At least we are more informed when we look it up than when we don’t. Big O notation provides some guidance, nothing more, but some guidance is better than none. The more info, the more educated the guess.

Complexity analysis is a huge subject. I personally think it is OK to be familiar with it, but not to delve too deep into it, because before long you are reading up on finite fields and the law of total probability (besides starting to remember arithmetic and geometric progressions, and so on). But well, it has its charm as well, I guess.

Edit

They say that the longer you write about something, the less you really know about it. At least I know that MIT has an open courseware class on the net in Discrete Mathematics, which is a precursor to their open courseware class “Introduction to Algorithms”. You really need to know the Discrete Mathematics to get something out of that Algorithms course. There are lots of video lectures and readings online, and even notes, at least from the recitations. You can pick and choose whichever lectures you want to watch. The lecturers are really aces, but I tend to shrink a little in my chair when they enthusiastically talk about “very cool mathematics”.

McUsrII · July 1, 2015, 6:54pm

Hello.

Here is the “cheat” version for finding the median. The reason for it is that I actually work with a lot of lists these days where the max element is bounded by the length of the list. That justifies this very special version, which performs in linear time, O(N). It also sorts the list, which is a necessity when finding the median with this version.

set thel to {4, 1, 3, 4, 3, 5, 1}
set soughtK to 4 -- median for 7 items is ceil (7 / 2 ) = 4 
countingsortAndMedian(thel, 5, soughtK)
on countingsortAndMedian(|L|, maxval, k)
	--    countingsort: origin unknown. found it in an algorithms lecture from MIT
	--    Implemented in AppleScript by McUsr 2015/6/24
	script o
		property L : |L|
		property C : missing value
		property Cp : missing value
		property B : missing value
	end script
	set ll to length of o's L
	copy |L| to o's C
	repeat with i from 1 to ll
		set item i of o's C to 0
	end repeat
	
	repeat with i from 1 to ll
		set item (item i of o's L) of o's C to (item (item i of o's L) of o's C) + 1
	end repeat
	copy o's C to o's Cp
	
	repeat with i from 2 to maxval
		set item i of o's Cp to (item (i - 1) of o's Cp) + (item i of o's C)
	end repeat
	copy o's C to o's B
	
	repeat with j from ll to 1 by -1
		tell item j of o's L
			set item (item it of o's Cp) of o's B to it
			set item it of o's Cp to (item it of o's Cp) - 1
		end tell
	end repeat
	return {o's B, item k of o's B}
end countingsortAndMedian

I should also be able to build a complete binary tree while looping over length/2 elements of the sorted array, after having inserted the median as its root and populated it with the elements on each side of the median as its children, so that I can end up with a complete binary tree, should I wish to. (I see no use for this at the moment, but it may come in handy.) The binary tree would, however, be a “read-only” binary tree, in that it would be quite costly to maintain compared to the cost of rebuilding it, which should be almost as fast with a small number of items.
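
As a rough sketch of that idea, here is a recursive variant rather than the loop described above. It takes the middle item of each subrange as the subtree’s root, which gives a balanced tree, though not necessarily a complete one. The handler name buildTree and the record labels nodeValue, leftChild and rightChild are made up for the illustration.

```applescript
-- Hypothetical sketch: build a balanced tree of records from items l thru r of a sorted list.
-- Each node is a record; an empty subtree is represented by missing value.
on buildTree(sortedList, l, r)
	if (l > r) then return missing value
	set m to (l + r) div 2 -- The middle position becomes this subtree's root.
	return {nodeValue:item m of sortedList, leftChild:buildTree(sortedList, l, m - 1), rightChild:buildTree(sortedList, m + 1, r)}
end buildTree

-- The sorted form of the list used in the countingsort example above.
set tree to buildTree({1, 1, 3, 3, 4, 4, 5}, 1, 7)
nodeValue of tree --> 3, the median
```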

technomorph · July 15, 2018, 1:52am

There’s also a vanilla way:

use framework "Foundation"

on medianOfList:theList
	set anNSExpression to current application's NSExpression's expressionForConstantValue:theList
	set newNSExpression to current application's NSExpression's expressionForFunction:"median:" arguments:{anNSExpression}
	return (newNSExpression's expressionValueWithObject:(missing value) context:(missing value)) as real
end medianOfList:

Just don’t ask me to explain it…

is there a way to use this to find Maximum and Minimum values in a list?

thanks

Nigel_Garvey · July 15, 2018, 8:36am

Hi.

Substituting “max:” or “min:” for “median:” seems to work:

use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"
use scripting additions

on minAndMaxFromList:theList
	set anNSExpression to current application's NSExpression's expressionForConstantValue:theList
	
	set newNSExpression to current application's NSExpression's expressionForFunction:"min:" arguments:{anNSExpression}
	set minVal to (newNSExpression's expressionValueWithObject:(missing value) context:(missing value))
	
	set newNSExpression to current application's NSExpression's expressionForFunction:"max:" arguments:{anNSExpression}
	set maxVal to (newNSExpression's expressionValueWithObject:(missing value) context:(missing value))
	
	return (current application's NSDictionary's dictionaryWithObjects:{minVal, maxVal} forKeys:{"min", "max"}) as record
end minAndMaxFromList:

set theList to {}
repeat 10 times
	set end of theList to (random number 20)
end repeat

{theList, my minAndMaxFromList:theList}

But this is slightly simpler. It doesn’t work for medians:

use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"
use scripting additions

on minAndMaxFromList:theList
	set anNSArray to current application's NSArray's arrayWithArray:theList
	
	set minVal to (anNSArray's valueForKeyPath:"@min.self")
	set maxval to (anNSArray's valueForKeyPath:"@max.self")
	
	return (current application's NSDictionary's dictionaryWithObjects:{minVal, maxval} forKeys:{"min", "max"}) as record
end minAndMaxFromList:

set theList to {}
repeat 10 times
	set end of theList to (random number 20)
end repeat

{theList, my minAndMaxFromList:theList}
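
For medians, one possible ASObjC route is to sort with NSArray’s sortedArrayUsingSelector: and pick the middle item or items from the result. This is just a sketch, assuming the list contains only numbers; the handler name medianFromList: is made up for the example.

```applescript
use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"
use scripting additions

-- Hypothetical handler: sort a copy of the list with Foundation, then take the middle value(s).
on medianFromList:theList
	set anNSArray to current application's NSArray's arrayWithArray:theList
	set sortedList to (anNSArray's sortedArrayUsingSelector:"compare:") as list
	set n to (count sortedList)
	if (n is 0) then return missing value
	-- Odd count: the middle item. Even count: the average of the two middle items.
	if (n mod 2 is 1) then return item (n div 2 + 1) of sortedList
	return ((item (n div 2) of sortedList) + (item (n div 2 + 1) of sortedList)) / 2
end medianFromList:

my medianFromList:{4, 1, 3, 4, 3, 5, 1} --> 3
```

Being a full sort, this does more work than a selection algorithm needs to, but the sorting itself is done by Foundation.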

Marc_Anthony · July 16, 2018, 3:30am

I wrote this script in April 2015 to help me with some homework in a statistics class. Perhaps it’s a relevant option.

set valueList to {59, 63, 59, 59, 60, 59, 63}
set valueList to sort(valueList)
set counted to count valueList

--Mean (average)
set mean to {}
set endvalue to 0
repeat with aVal in valueList
	set endvalue to endvalue + aVal
end repeat
set mean to endvalue / (counted)

--Median (middle value)
set median to {}
if counted mod 2 = 1 then
	set median's end to valueList's middle item --odd
else
	set median's end to ((valueList's middle item) + (valueList's reverse's middle item)) / 2 --even
end if

--Mode (most frequent occurrence(s))
set isMode to {}
set traversalPast to {}
repeat with anItem from 1 to count valueList
	set counter to 0
	repeat with another in valueList
		if valueList's item anItem = another's contents then set counter to counter + 1
	end repeat
	if counter > 1 and traversalPast does not contain valueList's item anItem then set isMode's end to valueList's item anItem & ":" & counter & "x, "
	set traversalPast's end to valueList's item anItem
end repeat
if isMode is {} then set isMode to "None"

--Midrange (average of the lowest and highest sorted values)
set midrange to ((valueList's item -1) + (valueList's item 1)) / 2

"Min: " & valueList's item 1 & " | Max: " & valueList's item -1 & return & ¬
	"Mean: " & mean & return & ¬
	"Median: " & median & return & ¬
	"Mode: " & isMode & return & ¬
	"Midrange: " & midrange & return & ¬
	"RANGE: " & ((valueList's item -1) - (valueList's item 1))

on sort(thelist)
	set AppleScript's text item delimiters to linefeed
	set new_string to (do shell script "echo " & (thelist as text)'s quoted form & " | sort -g")'s paragraphs
end sort

Nigel_Garvey · July 16, 2018, 12:06pm

Hi Marc.

It seems to do the job, despite the implicit coercion of isMode to text at the end being influenced by the TIDs set in the sort handler and the sorted list containing text! I wasn’t familiar with “mode” in the statistical sense before. From what I’ve been reading about it this morning, it should be either the most frequently occurring value in the data or the co-most frequently occurring values. The interpretation in your script isn’t quite the same.

Once the list is sorted, of course, it’s only necessary to have one repeat, and there are a few other optimisations which can be made.

set valueList to {}
repeat 10 times
	set end of valueList to random number 15
end repeat

if (valueList is {}) then
	set {minValue, maxValue, |mean|, |median|, isMode, midrange, range} to {"None", "None", "None", "None", "None", "None", "None"}
else
	set sortedList to sort(valueList) -- The result's a list of numeric texts, but it doesn't matter here.
	log sortedList
	set listLength to (count sortedList)
	
	-- Initialise various variables to the first item in the sorted list.
	set minValue to sortedList's beginning -- Minimum value.
	set thisValue to minValue -- Preset in case there's only one value and the repeat below isn't executed.
	set sumOfValues to minValue -- Sum for the calculation of the average.
	set valueBeingCounted to minValue -- The value whose occurrences are currently being counted.
	set counter to 1 -- The number of times it's occurred so far.
	set currentMode to {valueBeingCounted} -- The value(s) with the highest occurrence count so far.
	set highestOccurrenceCount to 1 -- The number of times they've occurred.
	-- Work through the rest of the sorted list, updating the sum of the values and the mode data.
	repeat with i from 2 to listLength
		set thisValue to item i of sortedList
		set sumOfValues to sumOfValues + thisValue
		if (thisValue is valueBeingCounted) then
			set counter to counter + 1
		else
			if (counter > highestOccurrenceCount) then
				set currentMode to {valueBeingCounted}
				set highestOccurrenceCount to counter
			else if (counter = highestOccurrenceCount) then
				set end of currentMode to valueBeingCounted
			end if
			set valueBeingCounted to thisValue
			set counter to 1
		end if
	end repeat
	-- If necessary, update the mode from the count in progress at the end of the repeat
	if (counter > highestOccurrenceCount) then
		set currentMode to {valueBeingCounted}
		set highestOccurrenceCount to counter
	else if (counter = highestOccurrenceCount) then
		set end of currentMode to valueBeingCounted
	end if
	
	-- The maximum value is the last that was fetched from the list.
	set maxValue to thisValue
	--Mean (average)
	set |mean| to sumOfValues / listLength
	--Median (middle value or average of two middle values)
	set m to (1 + listLength) div 2
	set |median| to item m of sortedList
	if (listLength - m = m) then set |median| to (|median| + (item (m + 1) of sortedList)) / 2 -- even number of items.
	--Mode (most frequently occurring value(s))
	if (highestOccurrenceCount = 1) then set currentMode to minValue
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ", "
	set isMode to "{" & currentMode & "}:" & highestOccurrenceCount & "x"
	set AppleScript's text item delimiters to astid
	--Midrange (average of 1st and last sorted values)
	set midrange to (maxValue + minValue) / 2
	--Range (difference between 1st and last sorted values)
	set range to maxValue - minValue
end if

"Min: " & minValue & " | Max: " & maxValue & return & ¬
	"Mean: " & |mean| & return & ¬
	"Median: " & |median| & return & ¬
	"Mode: " & isMode & return & ¬
	"Midrange: " & midrange & return & ¬
	"RANGE: " & range


on sort(thelist)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set sortedList to (do shell script "echo " & (thelist as text)'s quoted form & " | sort -g")'s paragraphs
	set AppleScript's text item delimiters to astid
	return sortedList -- List of text.
end sort

Shane’s ASObjC code quoted in post #32 can be adapted to return the mode of a list by simply replacing “median:” with “mode:” and changing the coercion at the end to ‘as list’. In this case, the “mode” returned is just a list of the most frequently occurring value(s), with no indication of how often they occur. If there’s only one instance of each value in a list, the result is a list containing just the lowest value. The script above also returns the lowest value instead of “None”, but I’ve no opinion about which is better.

use framework "Foundation"
use scripting additions

on modeOfList:theList
	set anNSExpression to current application's NSExpression's expressionForConstantValue:theList
	set newNSExpression to current application's NSExpression's expressionForFunction:"mode:" arguments:{anNSExpression}
	return (newNSExpression's expressionValueWithObject:(missing value) context:(missing value)) as list
end modeOfList:

set theList to {}
repeat 10 times
	set end of theList to (random number 15)
end repeat

{theList, my modeOfList:theList}
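
If the occurrence count is wanted as well, one possibility is to count each returned mode value in the original list afterwards. The following is only a sketch in plain AppleScript: sampleList and sampleModes are made-up stand-ins for a data list and the mode list returned for it, and countOccurrences is a hypothetical helper, not part of Shane's code.

```applescript
-- Sketch only: made-up stand-ins for a data list and its returned mode list.
set sampleList to {4, 9, 4, 2, 9, 7}
set sampleModes to {4, 9}

-- Pair each mode value with the number of times it occurs in the data.
set modeCounts to {}
repeat with aMode in sampleModes
	set end of modeCounts to {aMode's contents, countOccurrences(aMode's contents, sampleList)}
end repeat
modeCounts --> {{4, 2}, {9, 2}}

-- Hypothetical helper: returns how many items in theList equal aValue.
on countOccurrences(aValue, theList)
	set n to 0
	repeat with anItem in theList
		if (anItem's contents = aValue) then set n to n + 1
	end repeat
	return n
end countOccurrences
```

This makes one extra pass through the list per mode value, which should be negligible for short lists.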