Find a record within a large list based on a subset of keys/values

Dear all,

I have a list of more than 5000 records of calendar events. Each record has 10 or more key/value pairs.
I would like to find (very) similar events within this large list: events that have the same “Start date” and “Summary” but a different “UID”.
I could iterate (with a loop) over each event in the list and compare it, in an inner loop, with every other event.
Something like (in a very, very simplified way):

set duplicatedEvents to {}
repeat with anEvent in these_events
	set theStartingDate to anEvent's start date
	set theSummary to anEvent's summary
	set theUID to anEvent's uid
	
	-- Compare this event with every event in the list.
	repeat with searchEvent in these_events
		if searchEvent's uid ≠ theUID and searchEvent's start date = theStartingDate and searchEvent's summary = theSummary then
			beep -- we have it 
			set end of duplicatedEvents to {{theUID, theSummary, theStartingDate}, {searchEvent's uid, searchEvent's summary, searchEvent's start date}}
		end if
	end repeat
end repeat

But this is very inefficient.
I wonder if there is a way to interrogate the list with something like:

repeat with anEvent in these_events -- (get events of calendar adHocCalendar)
	set theStartingDate to anEvent's start date
	set theSummary to anEvent's summary
	set theUID to anEvent's uid
	set end of duplicatedEvents to (every event whose uid ≠ theUID and start date = theStartingDate and summary = theSummary)
end repeat

Thanks in advance
Luciano

Hi Luciano.

You can’t use a ‘whose’ filter if these_events really is a list. But you can improve the efficiency of the search logic in two ways: don’t look for duplicates among the items before ‘anEvent’, and don’t examine items that have already been shown to be duplicates of a previous ‘anEvent’. Something like the following. I’ve assumed that your list is a list of event properties obtained from Calendar.

tell application "Calendar"
	set these_events to properties of events of calendar "Blah" -- Your calendar here
end tell
findDuplicatedEvents(these_events)

on findDuplicatedEvents(these_events)
	script o
		property event_list : these_events's items
		property duplicatedEvents : {}
	end script
	set eventCount to (count these_events)
	
	using terms from application "Calendar"
		repeat with i from 1 to eventCount
			set anEvent to item i of o's event_list
			-- Events already found to be (and thus to have) duplicates will have been replaced with 'missing value' below.
			if (anEvent is not missing value) then
				set thisDuplicateGroup to {}
				set theStartingDate to anEvent's start date
				set theSummary to anEvent's summary
				set theUID to anEvent's uid
				
				-- Don't search for duplicates among the previous 'anEvent's.
				repeat with j from (i + 1) to eventCount
					set searchEvent to item j of o's event_list
					if ((searchEvent is not missing value) and (searchEvent's start date = theStartingDate) and (searchEvent's summary = theSummary)) then
						beep -- we have it 
						set end of thisDuplicateGroup to {searchEvent's uid, searchEvent's summary, searchEvent's start date}
						-- Eliminate this duplicate from the rest of the search.
						set item j of o's event_list to missing value
					end if
				end repeat
				if (thisDuplicateGroup is not {}) then
					set beginning of thisDuplicateGroup to {theUID, theSummary, theStartingDate}
					set end of o's duplicatedEvents to thisDuplicateGroup
				end if
			end if
		end repeat
	end using terms from
	
	return o's duplicatedEvents
end findDuplicatedEvents
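
For comparison, a ‘whose’ filter does work when it is applied to Calendar's own event objects rather than to an AppleScript list. The following is only a minimal sketch under a few assumptions: the calendar name "Blah" is the same placeholder used above, and the first event of that calendar is picked purely for illustration. Each such filter is a separate request to Calendar, so I would not expect it to be faster than the handler above when repeated for thousands of events.

```applescript
tell application "Calendar"
	set theCalendar to calendar "Blah" -- placeholder name: use your own calendar
	set theEvent to first event of theCalendar -- illustrative choice of reference event
	set theStartingDate to start date of theEvent
	set theSummary to summary of theEvent
	set theUID to uid of theEvent
	
	-- The filter is evaluated by Calendar itself, not by AppleScript,
	-- which is why 'whose' is allowed here but not on a plain list.
	set likelyDuplicates to (every event of theCalendar whose start date = theStartingDate and summary = theSummary and uid ≠ theUID)
end tell
```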

Beautiful !!! Thanks a lot Nigel !!!

Here’s a method I use, with two different examples that use the same “arrayOfDupesOnlyValues”.

Basically you need to get an arrayOfValues for the keys.
Create a countedSet from this arrayOfValues.
Filter it to create an arrayOfDupesOnlyValues, keeping only the values where the countedSet’s countForObject > 1.

Then I’ve shown two functions that use this data to filter the original list:

  1. using some SMSForder techniques to get an indexSet
  2. building a compoundPredicate to filter the original list

Sorry, I didn’t have much time to build a test list.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
use script "BridgePlus" version "1.3.2"
load framework

-- classes, constants, and enums used
property NSCompoundPredicate : a reference to current application's NSCompoundPredicate
property NSCountedSet : a reference to current application's NSCountedSet
property NSDictionary : a reference to current application's NSDictionary
property NSPredicate : a reference to current application's NSPredicate
property SMSForder : a reference to current application's SMSForder
property NSArray : a reference to current application's NSArray
property NSString : a reference to current application's NSString
property NSMutableArray : a reference to current application's NSMutableArray

set listREC1 to {{title:"Track01", artist:"Artist02", album:"Greatest Hits"}, {title:"Track02", artist:"Artist02", album:"Greatest Hits"}, {title:"Track03", artist:"Artist03", album:"Best Ever"}, {title:"Track04", artist:"Artist04"}, {title:"Track05", artist:"Artist05", album:"Greatest Hits"}, {title:"Track06", artist:"Artist07", album:"Greatest Hits"}, {title:"Track08", artist:"Artist03", album:"Best Ever"}, {title:"Track99", artist:"Artist99"}}

set keys to {"artist", "album"}

property aTest1 : {}
property aTest2 : {}

set aTest1 to (my groupItemsInList:listREC1 usingGroupKeys:keys) as list
-->  {{{title:"Track03", album:"Best Ever", artist:"Artist03"}, {title:"Track08", album:"Best Ever", artist:"Artist03"}}, {{title:"Track01", album:"Greatest Hits", artist:"Artist02"}, {title:"Track02", album:"Greatest Hits", artist:"Artist02"}}}
set aTest2 to (my groupItemsWithPredicateInList:listREC1 usingGroupKeys:keys) as list
-->  {{{title:"Track03", album:"Best Ever", artist:"Artist03"}, {title:"Track08", album:"Best Ever", artist:"Artist03"}}, {{title:"Track01", album:"Greatest Hits", artist:"Artist02"}, {title:"Track02", album:"Greatest Hits", artist:"Artist02"}}}

-- FIND GROUP ITEMS USING SMSForder's indexesOfItems
on groupItemsInList:originalList usingGroupKeys:dupeGroupKEYS
	set groupedArray to NSMutableArray's array()
	set sourceArray to NSArray's arrayWithArray:originalList
	set valuesOfArray to SMSForder's subarraysFrom:sourceArray usingKeys:dupeGroupKEYS outKeys:(missing value) |error|:(missing value)
	set dupeValues to (my onlyDupeCountedItemsFromArray:valuesOfArray)
	repeat with i from 1 to count of dupeValues
		set currentValues to (dupeValues's objectAtIndex:(i - 1)) as list
		set matchedIndexs to (SMSForder's indexesOfItems:{currentValues} inArray:valuesOfArray inverting:false) as list
		set matchedIndexSet to (SMSForder's indexSetWithArray:matchedIndexs)
		set matchedObjects to (sourceArray's objectsAtIndexes:matchedIndexSet)
		(groupedArray's insertObject:matchedObjects atIndex:(groupedArray's |count|()))
	end repeat
	return groupedArray
end groupItemsInList:usingGroupKeys:

-- FIND GROUP ITEMS USING FILTER PREDICATE
on groupItemsWithPredicateInList:originalList usingGroupKeys:dupeGroupKEYS
	set groupedArray to NSMutableArray's array()
	set sourceArray to NSArray's arrayWithArray:originalList
	set dupeValues to (my onlyDupeCountedValuesFromArray:sourceArray forKeys:dupeGroupKEYS)
	repeat with i from 1 to count of dupeValues
		set currentValues to (dupeValues's objectAtIndex:(i - 1))
		set aPred to (my createCompoundPredicateForKeys:dupeGroupKEYS matchingValues:currentValues)
		set matchedItems to (sourceArray's filteredArrayUsingPredicate:aPred)
		set aCount to matchedItems's |count|()
		if (aCount > 1) then
			(groupedArray's insertObject:matchedItems atIndex:(groupedArray's |count|()))
		end if
	end repeat
	return groupedArray
end groupItemsWithPredicateInList:usingGroupKeys:

-- UTILITIES
on createCompoundPredicateForKeys:groupKeys matchingValues:matchValues
	set allPreds to NSMutableArray's array()
	set keysArray to NSArray's arrayWithArray:groupKeys
	repeat with i from 1 to count of groupKeys
		set currentKey to (keysArray's objectAtIndex:(i - 1))
		set matchValue to (matchValues's objectAtIndex:(i - 1))
		set aPred to NSPredicate's predicateWithFormat_("%K == %@", currentKey, matchValue)
		(allPreds's addObject:aPred)
	end repeat
	set aCompoundPred to (NSCompoundPredicate's andPredicateWithSubpredicates:allPreds)
	return aCompoundPred
end createCompoundPredicateForKeys:matchingValues:

on onlyDupeCountedValuesFromArray:aArray forKeys:groupKeys
	set sourceArray to NSArray's arrayWithArray:aArray
	set valuesOfArray to SMSForder's subarraysFrom:sourceArray usingKeys:groupKeys outKeys:(missing value) |error|:(missing value)
	set dupeValues to (my onlyDupeCountedItemsFromArray:valuesOfArray)
	return dupeValues
end onlyDupeCountedValuesFromArray:forKeys:

on onlyDupeCountedItemsFromArray:aArray
	set dupeItems to NSMutableArray's array()
	set aCountedSet to NSCountedSet's setWithArray:aArray
	set uniqueItems to aCountedSet's allObjects()
	repeat with i from 1 to count of uniqueItems
		set aItem to (uniqueItems's objectAtIndex:(i - 1))
		set aCount to (aCountedSet's countForObject:aItem)
		if (aCount > 1) then
			(dupeItems's addObject:aItem)
		end if
	end repeat
	return dupeItems
end onlyDupeCountedItemsFromArray:
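
The counted-set step on its own can be tried without BridgePlus. This is a minimal sketch, assuming each event has first been reduced to a single text key; the "A|1"-style strings below are hypothetical stand-ins for a start date combined with a summary.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical composite keys, one per event (e.g. start date & "|" & summary).
set keyValues to {"A|1", "B|2", "A|1", "C|3", "B|2"}

-- NSCountedSet records how many times each distinct value was added.
set countedSet to current application's NSCountedSet's setWithArray:keyValues

-- Keep only the values that occur more than once.
set dupeValues to {}
repeat with aValueRef in ((countedSet's allObjects()) as list)
	set thisValue to contents of aValueRef
	if (countedSet's countForObject:thisValue) > 1 then set end of dupeValues to thisValue
end repeat
-- dupeValues now holds "A|1" and "B|2" (the order is not guaranteed).
```

Counting this way takes one pass over the keys, so the cost of finding which keys are duplicated does not grow with the square of the list length as it does with nested loops.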

Thanks a lot!
I will test it !