Get record position in list of records, or quickly search for record

I’ve read through many posts on records and quickly searching big lists or finding the position/index of an item in a long list. However, I can’t figure out a good way to do what I need to do (without just using a repeat loop to step through the whole list of records).

I have a list of records such as this:


{num:8808, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}

The num property can be used to match a file (e.g. DSCF8808.jpg), and when I’m processing that file, I need to access the record. So I need a quick way of locating the record so I can then access the other properties. In an ideal world, I would be able to do the following:


set vPhotoDetails to vPhotoDetails & {{num:8808, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}} -- record wrapped in a list so it is appended as one item

set vFileIndex to hGetFileIndex("DSCF8808.jpg") --> vFileIndex = 8808

set vPhotoRecord to (item of vPhotoDetails whose num = vFileIndex) --> vPhotoRecord = {num:8808, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}

Now, I know the last line is not valid AppleScript, which is my whole problem! Is there an elegant way to do this in a handler without resorting to:


on hGetRecordIndex(vRecords, vFileIndex)
	set vRecordNum to 1
	repeat with vRecord in vRecords
		if (num of vRecord) = (vFileIndex as integer) then return vRecordNum
		set vRecordNum to vRecordNum + 1
	end repeat
	
	return null
end hGetRecordIndex

My concern is that this handler will be quite slow if vRecords is a large list. Any advice much appreciated!

There’s no shortcut in AppleScript. However, AppleScriptObjC provides a solution:

use AppleScript version "2.5" -- macOS 10.11 or later
use framework "Foundation"
use scripting additions

set theArray to current application's NSArray's arrayWithArray:vPhotoDetails
set thePred to current application's NSPredicate's predicateWithFormat:"num = %@" argumentArray:{8808}
set theResult to (theArray's filteredArrayUsingPredicate:thePred)'s firstObject() as record -- assuming only one match
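If the number might not be in the list, the snippet above would fail at the coercion to record, since firstObject() has nothing to return. A minimal sketch of a handler that guards against that, assuming the same record layout; the handler name findRecordByNum and the choice to return missing value are my own assumptions, not part of the original answer:

```applescript
use AppleScript version "2.5" -- macOS 10.11 or later
use framework "Foundation"
use scripting additions

-- Hypothetical wrapper around the predicate search shown above.
-- Returns the first record whose num equals theNum, or missing value if none matches.
on findRecordByNum(theRecords, theNum)
	set theArray to current application's NSArray's arrayWithArray:theRecords
	set thePred to current application's NSPredicate's predicateWithFormat:"num = %@" argumentArray:{theNum}
	set theMatches to theArray's filteredArrayUsingPredicate:thePred
	-- An empty result cannot be coerced to a record, so check the count first
	if (theMatches's |count|()) = 0 then return missing value
	return (theMatches's firstObject()) as record
end findRecordByNum
```

The caller can then test the result, e.g. set vPhotoRecord to findRecordByNum(vPhotoDetails, 8808), followed by a check for missing value before reading the other properties.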

Hi. Assuming you have a list that’s consecutively ordered (by the num property), you can just call it by index. If the list is nonconsecutive, it may be more efficient either to repopulate it so that it becomes consecutive or to loop with a reference; which is better will depend on the size of the gaps in the series.

set thing to {{num:1, something:"something", whatever:"something else"}, {num:2, something:"something2", whatever:"something else2"}, {num:8809, something:"something8809", whatever:"something else8809"}, {num:8810, something:"something8810", whatever:"something else8810"}}

#1) Call by index
thing's item 2 --or item 8809, if there are that many items

#2) Loop until criteria
set {counter, findThis} to {1, 8809}
repeat until my thing's item counter's num is findThis
	set counter to counter + 1
end repeat
thing's item counter's num

A repeat loop made faster with a reference-to operator or script object seems a reasonably quick way to accomplish what the OP wants. To quantify this, I created a list containing 1000 items with the match record being item 901. I then ran the loop suggested by the OP both with and without a reference-to operator. The results were 9 milliseconds with the reference-to operator and 463 milliseconds without.


-- untimed code
set oneRecord to {num:1, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}

set vRecords to {}

repeat 1000 times
	copy oneRecord to end of vRecords
	set num of oneRecord to ((num of oneRecord) + 1)
end repeat

-- timed code
set vFileIndex to 901

getRecordIndex(a reference to vRecords, vFileIndex)

on getRecordIndex(vRecords, vFileIndex)
	
	set vRecordNum to 1
	repeat with vRecord in vRecords
		if (num of vRecord) = vFileIndex then return vRecordNum
		set vRecordNum to vRecordNum + 1
	end repeat
	
end getRecordIndex

I thought I’d also write a script utilizing a script object. It’s a few milliseconds faster than the script using a reference-to operator.


-- untimed code
set oneRecord to {num:1, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}

set vRecords to {}

repeat 1000 times
	copy oneRecord to end of vRecords
	set num of oneRecord to ((num of oneRecord) + 1)
end repeat

-- timed code
set vFileIndex to 901

getRecordIndex(vRecords, vFileIndex)

on getRecordIndex(vRecords, vFileIndex)
	
	script o
		property vRecordsRef : vRecords
	end script
	
	set vRecordNum to 1
	repeat with vRecord in o's vRecordsRef
		if (num of vRecord) = vFileIndex then return vRecordNum
		set vRecordNum to vRecordNum + 1
	end repeat
	
end getRecordIndex


Thanks all for the replies, this is all really helpful!

I’ve never dipped into ASObjC because of my lack of knowledge of Objective-C, so this is very helpful. I would never have been able to come up with this on my own! I’ve not had a chance to work on my script since my original post, but I’ll probably try using this method first. Am I correct that the use AppleScript version line is to ensure a minimum of version 2.5 (i.e. it will work with version 2.7 which I’m currently using on Mac OS 10.14 Mojave)?

This is really helpful, as the loop is a lot quicker than I thought it would be. The reference-to operator is a good idea which I hadn’t thought of. Thanks for doing the timings, I haven’t figured out how to do that, so it’s useful to have some data on this. I’m still considering using your solution of putting the records in a script object and looping around. Though it might be slower than the ASObjC solution Shane provided, it looks like it will be plenty fast enough for my purposes (I can’t see my records numbering more than 10k-20k) and would be a bit more readable for me in the future (given my lack of Objective-C knowledge).

One related question I have is why is it faster to put the reference to the records in a script object. When I was looking up fast find routines in posts on this forum, I saw the use of script objects (for instance, this post by Nigel Garvey). I’ve not used script objects before, and I don’t understand why putting the reference to the records in a script object would speed up the execution. What is the relationship of a script object to the rest of the script?

This is actually how the current version of my script works (which is now almost 9 years old), but it was written for data exports from a slightly different database, where the data was grouped in ‘rolls’ of only a few dozen records with consecutive numbering, and the numbering of the records always started around zero (generally -2, -1, 0, 1, or 2). So it was easy to initialise an offset based on the first record and then just call the record by index from the filename. I was initially hoping I could adapt the code for my new data exports. However, with my current setup, it’s one ungrouped database with all records, gaps between numbers, and the first record (and file) could be 8088. So repopulating the list with empty records from 1 to 8087 would be a lot less efficient than just looping through the existing records looking for the right num.

Thanks for all the input, great to have a couple of options that will work!

jolinwarren. I’m glad you’ve received some helpful suggestions. FWIW, I have included below the script I used for the timing tests–in this case it tests the script-object script.


use framework "Foundation"
use scripting additions

set decimalPlaces to 3

-- untimed code
set oneRecord to {num:1, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}

set vRecords to {}

repeat 1000 times
	copy oneRecord to end of vRecords
	set num of oneRecord to ((num of oneRecord) + 1)
end repeat

-- start time
set startTime to current application's CFAbsoluteTimeGetCurrent()

-- timed code
set vFileIndex to 901

set theIndexNumber to getRecordIndex(vRecords, vFileIndex)

on getRecordIndex(vRecords, vFileIndex)
	
	script o
		property vRecordsRef : vRecords
	end script
	
	set vRecordNum to 1
	repeat with vRecord in o's vRecordsRef
		if (num of vRecord) = vFileIndex then return vRecordNum
		set vRecordNum to vRecordNum + 1
	end repeat
	
end getRecordIndex

-- elapsed time
set elapsedTime to (current application's CFAbsoluteTimeGetCurrent()) - startTime
set nf to current application's NSNumberFormatter's new()
nf's setFormat:("0." & (text 1 thru decimalPlaces of "00000"))
set elapsedTime to ((nf's stringFromNumber:elapsedTime) as text) & " seconds"

-- result
elapsedTime --> 6 milliseconds on first run and 2 milliseconds on rerun
-- count vRecords --> 1000
-- theIndexNumber --> 901

Generally speaking, script objects, a reference to, and my are just hocus pocus that play on an AppleScript quirk. Using a script object seems to be a stylistic choice, but, in many scenarios, it ultimately won’t be the fastest method—it tends to verbosity. My solution is at least twice as fast as the script object method; even when there are 10K list objects, it executes in significantly less than a second.

use framework "Foundation"
use scripting additions

set decimalPlaces to 3

-- untimed code
set oneRecord to {num:1, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}

set vRecords to {}

repeat 1000 times
	copy oneRecord to end of my vRecords --*edit
	set num of oneRecord to ((num of oneRecord) + 1)
end repeat
-- start time
set startTime to current application's CFAbsoluteTimeGetCurrent()

-- timed code

#2) Loop until criteria
set {counter, findThis} to {1, 901}
repeat until my vRecords's item counter's num is findThis
	set counter to counter + 1
end repeat
set finality to vRecords's item counter

-- elapsed time
set elapsedTime to (current application's CFAbsoluteTimeGetCurrent()) - startTime
set nf to current application's NSNumberFormatter's new()
nf's setFormat:("0." & (text 1 thru decimalPlaces of "00000"))
set elapsedTime to ((nf's stringFromNumber:elapsedTime) as text) & " seconds"

-- result
elapsedTime --> about 1 millisecond 

*Edited for clarity, test list generation improvement, and to update timing outcome.

Hi.

  1. On my machine, peavine’s timing script is consistently “twice as fast” as Marc’s: i.e. 0.001 seconds as opposed to 0.002.
  2. Marc’s script loops through my vRecords, which makes it essentially the script object method anyway. The script object in this case is the main script, not one in a handler.
  3. peavine’s script uses a repeat with … in … loop over the script object’s list variable instead of an indexed repeat, which is interesting. Its timing appears to be essentially identical to Marc’s, even with 10,000 items, although the initial set-up takes forever because the script copies the records to end of vRecords instead of to end of my vRecords.
  4. When using a script object in a handler, I’ve always obtained the list’s length by applying count to the handler’s list parameter variable rather than to the script object property because this always used to be slightly faster. (My theory was that count is applied directly to the list rather than to its items or properties.) But experimenting with it again this morning, I’m finding that counting the parameter variable instead of the script object property nearly doubles the time taken for 10,000 items, raising it from 0.034 seconds to about 0.060! I’ll be changing my ways from now on and counting lists as script properties! :wink:

Thanks Nigel for the post.

I’m just learning the ins-and-outs of speed-enhancement techniques for large lists and wanted to decide on one before placing it in my notebook. So, I reran the tests with 10000 records and with the matching record set at 9901. The results–which only include enough code to identify the approach employed–were:


-- no speed enhancement - 299 seconds on first run
getRecordIndex(vRecords, vFileIndex)
on getRecordIndex(vRecords, vFileIndex)
	set vRecordNum to 1
	repeat with vRecord in vRecords
		if (num of vRecord) = vFileIndex then return vRecordNum
		set vRecordNum to vRecordNum + 1
	end repeat
end getRecordIndex

-- modify above with a-reference-to operator - 0.51 seconds on first run
getRecordIndex(a reference to vRecords, vFileIndex)

-- script object one - 0.44 seconds on first run
repeat with vRecord in o's vRecordsRef

-- script object two - 0.71 seconds on first run
repeat with i from 1 to (count vRecords)

-- script object three - 0.42 seconds on first run
-- this appears to confirm Nigel's point 4
repeat with i from 1 to (count o's vRecordsRef)

I also ran Marc Anthony’s script as written, changing only the number of records to 10000 and the matching record to 9901. The timing result on first run was 0.42 seconds.
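For anyone who wants the “script object three” variant as a complete handler rather than a fragment, here is a minimal sketch. It reuses the names from the scripts above (getRecordIndex, vRecords, vFileIndex, vRecordsRef); returning missing value when nothing matches is my own assumption and was not part of the timed tests.

```applescript
-- Sketch of the indexed-repeat version that counts the script object property
on getRecordIndex(vRecords, vFileIndex)
	script o
		property vRecordsRef : vRecords
	end script
	
	-- Both the count and each item access go through the script object property
	repeat with i from 1 to (count o's vRecordsRef)
		if (num of (item i of o's vRecordsRef)) = vFileIndex then return i
	end repeat
	
	-- Assumption: signal "not found" with missing value
	return missing value
end getRecordIndex

-- Small usage example with a two-item list
set vRecords to {{num:8808, caption:"first"}, {num:8810, caption:"second"}}
getRecordIndex(vRecords, 8810) --> 2
```

Because the handler returns the position rather than the record, the caller still needs item N of the list afterwards, and should check for missing value first.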

I think Nigel answers this question in the following:

https://www.macscripter.net/viewtopic.php?pid=60390

BTW, there does seem to be a significant speed advantage to the use of what Nigel refers to above as the script-object method as compared with the simple use of a reference-to operator. I don’t know the reason for this but it’s probably worth bearing in mind when working with very large lists.

Thanks for all this follow-up. I think I’m starting to get my head around the different methods and their implications, so the code examples, timings, and explanations from everyone are hugely helpful! I hadn’t previously seen/digested Nigel’s explanation on why a script object can speed things up, but that makes sense, and the whole thing is a lot clearer to me.

peavine, thank you for the comprehensive timing results. It looks like I should go with either the “script object three” or Marc’s method. They are both plenty fast enough for my purposes. If I ever end up with a database that has significantly more than 10k rows, I will revisit this thread and consider the ASObjC solution. peavine, thanks also for the timing code, that will be useful for future development.

As a further follow-up for those who are interested, I am now using a modified version of Marc’s approach #2. The script isn’t finished yet, but I think this part of it won’t change further.

The one issue with Marc’s code was that if none of the records contains the number being searched for, AppleScript (quite rightly) throws an error once the loop runs past the end of the list. After rewriting the loop, I also noticed that if I passed it a number that wasn’t in any of the records, the timing was significantly faster (thanks again peavine for the timing code, it’s so useful!). This led me to realise that comparing num in each record to the number I was searching for was less of a performance hit than:

set vPhotoRecord to vPhotoDetails's item vRecNum

Using my newly gained understanding from all the helpful people in this thread, I used my to refer to vPhotoDetails in that line as well, and the timing on first run with 10000 records and searching for 9991 is now 0.0245 seconds!


use framework "Foundation"
use scripting additions

set decimalPlaces to 4

-- untimed code
set oneRecord to {num:1, datetime:"2020-07-24T14:32:00", caption:"Middle of nowhere", lat:"57.59631", lon:"-13.68732", notes:"Photographer: Jane Smith", lens:"Fujinon Super EBC 23mm"}

set vPhotoDetails to {}

repeat 10000 times
	copy oneRecord to end of my vPhotoDetails
	set num of oneRecord to ((num of oneRecord) + 1)
end repeat

-- start timer
set startTime to current application's CFAbsoluteTimeGetCurrent()

-- timed code

set {vRecNum, vFileIndex} to {1, 9991}
set vPhotoRecord to false

repeat with vRecNum from 1 to (count my vPhotoDetails)
	if my vPhotoDetails's item vRecNum's num is vFileIndex then
		set vPhotoRecord to my vPhotoDetails's item vRecNum
		exit repeat
	end if
end repeat

-- elapsed time
set elapsedTime to (current application's CFAbsoluteTimeGetCurrent()) - startTime
set nf to current application's NSNumberFormatter's new()
nf's setFormat:("0." & (text 1 thru decimalPlaces of "00000"))
set elapsedTime to ((nf's stringFromNumber:elapsedTime) as text) & " seconds"

-- result
elapsedTime --> 0.0245 seconds


Even with 100,000 records and searching for 99,991, the timing for the loop is only 0.1657 seconds, so this scales well. I doubt I will ever get to 100,000 records in any case, but that’s probably around when I would rewrite this section in ASObjC as suggested by Shane.

Thanks again to everyone. I’m now going through other existing code to optimise some of the loops. :smiley:

jolinwarren. Sounds like you’ve made great progress.

Just as a point of credit, the timing script I use is based on one written by Nigel and the elapsed-time code was written by Shane. I did some fine-tuning, though.

Out of curiosity, I tested the ASObjC snippet with 10K objects, and it’s quite the laggard method at ~0.385 seconds, which is a little surprising.