Position in list

If i have list like this:

set mylist to {"first", "second", "third"}

I want to know what is position of “second” in list and it returns 2.

This should be happend fast, so hopely there is no need for loops.

No chance without a loop.

If you have more than ca. 20 list elements, a binary search is the fastest way

I’d just found this on the apple site:-

set this_list to {"Sal", "Sue", "Bob", "Carl"}
list_position("Sal", this_list)
--> 4

on list_position(this_item, this_list)
	repeat with i from 1 to the count of this_list
		if item i of this_list is this_item then return i
	end repeat
	return 0
end list_position

Regards,

Nick

that’s from AppleScript Guidebook: Essential Sub-Routines http://www.apple.com/applescript/guidebook/sbrt/pgs/sbrt.13.htm

Yeah, it is.
I think the first name in the list was a bit of a giveaway.

it took me 4evah 2 find that link;-)

I have two long lists of text items which I will have to search repeatedly and find the positions of specific phrases. Could anyone tell me how would I do a binary search to accomplish this?

Also, I see people here saying that the “list_position” routine cited on Apple’s site, with its repeat loop, is the best way to go. Is that true with long lists? My lists are ordered and it seems like extra CPU to look at every item in them.

Thanks,

Gary

This is a binary search routine by Nigel Garvey that doesn’t require that the list be ordered but uses “is in” instead. It returns the index of the item in the list or 0 if the item is not found.

set L to {"alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta", "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi", "rho", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega"}
set I to binarySearch(L, V)

on binarySearch(theList, theValue)
	script o
		property lst : theList
	end script
	
	set theValueAsList to {theValue}
	set L to 1
	set r to (count o's lst)
	
	if (theValueAsList is in theList) then
		repeat until (theValue is item L of o's lst)
			set L to L + 1
			set m to (L + r) div 2
			if (theValueAsList is in items L thru m of o's lst) then
				set r to m - 1
			else
				set L to m + 1
			end if
		end repeat
		return L
	else
		return 0
	end if
end binarySearch

Another alternative is a script by StefanK for binary searching in this thread which requires an ordered list.

set value to "g"
set theList to {"a", "b", "c", "d", "e", "f", "g", "h"}
display dialog ((BinarySearch(theList, value)) as string)

on BinarySearch(theList, value)
	set {low, high} to {1, count theList}
	if {value} is not in theList then return false
	copy high to lasth
	set high to low + ((high - low) div 2)
	repeat until high < low
		if {value} is in items low thru high of theList then
			if high = low then exit repeat
			copy high to lasth
			set high to low + ((high - low) div 2)
		else
			set low to high + 1
			set high to lasth
		end if
	end repeat
	return low
end BinarySearch

If your lists are ordered and very long, using a script object will speed StefanK’s script up. Note that it takes much longer to build the list than to find the index of the value.

set theList to {}
repeat with k from 1 to 10000
	set theList's end to "ABC" & k & "D"
end repeat
set value to "ABC5036D"
display dialog "List built, ready for search" giving up after 2

display dialog ((BinarySearch(theList, value)) as string)

on BinarySearch(theList, value)
	script B
		property L : theList
	end script
	set {low, high} to {1, count B's L}
	if {value} is not in B's L then return false
	copy high to lasth
	set high to low + ((high - low) div 2)
	repeat until high < low
		if {value} is in items low thru high of B's L then
			if high = low then exit repeat
			copy high to lasth
			set high to low + ((high - low) div 2)
		else
			set low to high + 1
			set high to lasth
		end if
	end repeat
	return low
end BinarySearch

Has anyone tried using do shell script?

on getListIndex(needle, haystack)
	-- Change haystack into text
	set ASTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ASCII character 10
	set haystack to "" & haystack
	
	-- Escape special characters in needle
	repeat with thisItem in {"\\", "*", "[", "]"}
		set AppleScript's text item delimiters to thisItem
		set needle to text items of needle
		
		set AppleScript's text item delimiters to "\\" & thisItem
		set needle to "" & needle
	end repeat
	set AppleScript's text item delimiters to ASTID
	
	-- It would be safer to write to a file instead of using `echo`
	do shell script "echo " & quoted form of haystack & " | /usr/bin/grep --line-number --line-regexp --max-count=1 " & quoted form of needle & " | /usr/bin/cut -d : -f 1"
	return result as integer
end getListIndex

getListIndex("b", {"a", "b", "c"})
--> 2

-- getListIndex("anteater", paragraphs of (do shell script "/usr/bin/look a"))
--> 9271

Interestingly, Bruce, using Nigel Garvey’s “lotsa” for 500 runs reveals that the shell script version is 1.6 times faster than the version of Stephan’s with the script object in it for a list of 10000 entries, searching for an index at 5036.

Warning: this takes nearly 3 minutes to run

set Ratios to lotsa(500) -- the number of runs

on lotsa(many)
	-- Any preliminary or common values here.
	set theList to {}
	repeat with k from 1 to 10000
		set theList's end to "ABC" & k & "D"
	end repeat
	set value to "ABC5036D"
	-- Dummy loop to absorb a small observed
	-- time handicap in the first repeat.
	repeat many times
	end repeat
	
	-- Test 1.
	set t to GetMilliSec
	repeat many times
		-- First test code or handler call here.
		my getListIndex(value, theList)
	end repeat
	set t1 to ((GetMilliSec) - t) / 1000
	
	-- Test 2.
	set t to GetMilliSec
	repeat many times
		-- Second test code or handler call here.
		my BinarySearch(theList, value)
	end repeat
	set t2 to ((GetMilliSec) - t) / 1000
	
	-- Timings.
	return {t1, t2, t1 / t2, t2 / t1}
end lotsa

on getListIndex(needle, haystack)
	-- Change haystack into text
	set ASTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ASCII character 10
	set haystack to "" & haystack
	
	-- Escape special characters in needle
	repeat with thisItem in {"\\", "*", "[", "]"}
		set AppleScript's text item delimiters to thisItem
		set needle to text items of needle
		
		set AppleScript's text item delimiters to "\\" & thisItem
		set needle to "" & needle
	end repeat
	set AppleScript's text item delimiters to ASTID
	
	-- It would be safer to write to a file instead of using `echo`
	do shell script "echo " & quoted form of haystack & " | /usr/bin/grep --line-number --line-regexp --max-count=1 " & quoted form of needle & " | /usr/bin/cut -d : -f 1"
	return result as integer
end getListIndex

on BinarySearch(theList, value)
	script B
		property L : theList
	end script
	set {low, high} to {1, count B's L}
	if {value} is not in B's L then return false
	copy high to lasth
	set high to low + ((high - low) div 2)
	repeat until high < low
		if {value} is in items low thru high of B's L then
			if high = low then exit repeat
			copy high to lasth
			set high to low + ((high - low) div 2)
		else
			set low to high + 1
			set high to lasth
		end if
	end repeat
	return low
end BinarySearch

Actually, Stefan’s handler works with unordered lists too. :slight_smile:

I can’t try that one at the moment as I’m not in the same place as my G5. In Jaguar on my PowerBook, it always returns 0 (because the shell script result’s always “”). One thing that’s immediately obvious is that it’ll only work with text or with items that can be unambiguously coerced to text. Something similar can be done with vanilla:

on getListIndex(needle, haystack)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to return
	set h to (return & haystack & return)
	set AppleScript's text item delimiters to return & needle & return
	set o to (count paragraphs of text item 1 of h)
	set AppleScript's text item delimiters to astid
	if (o > (count haystack)) then set o to 0
	
	return o
end getListIndex

getListIndex("b", {"a", "b", "c"})
--> 2

set z to paragraphs of (do shell script "/usr/bin/look a")
getListIndex("anteater", z)
--> 9271

By my casual observation, that’s slightly faster.