Script Efficiency / Run Time...

A brief update on today’s event to keep this thread moving forward.

  1. As noted above, I greatly appreciate the input of hhas, Nigel and Mark and commit to taking a look at their code and coming back with my comments / findings tomorrow.

  2. In terms of today’s activities they have been somewhat limited because of work commitments but I did manage to resolve / tackle one issue that anyone working with OS X Contacts (“Contacts”) and Microsoft Outlook (“Outlook”) may be interested in.

In testing the portion of the code I had written I noticed that Outlook and Contacts had different record counts, which should not be the case given that i) all my contacts are stored on Exchange Server [i.e. I store no contacts anywhere else] and ii) Contacts and Outlook both act as Exchange Server clients [i.e. the record counts should match].

The Contacts record count breaks down as follows:

• Total records 1,041
• Less Phantom Single records ( 12) *
• Less Phantom Out records ( 9) **
• Less Phantom Multi records ( 9) ***
• Less Apple records ****
• Total 1,009

* Records which i) contain only an e-mail address for an Exchange Server contact and ii) do not match the Exchange Server contact record because the Exchange Server contact record is populated. I HAVE NO EXPLANATION AS TO HOW OR WHY THESE RECORDS OCCUR.

** Records which contain an e-mail address for a non Exchange Server contact. I HAVE NO EXPLANATION AS TO HOW OR WHY THESE RECORDS OCCUR.

*** Records which i) contain one of multiple e-mail addresses for an Exchange Server contact and ii) do not match the Exchange server contact record because the Exchange Server contact record is populated. I HAVE NO EXPLANATION AS TO HOW OR WHY THESE RECORDS OCCUR.

**** Records added by OS X [i.e. Apple corporate and user].

The Outlook record count breaks down as follows:

• Total records 1,020
• Less Multiple Category records ( 10)
• Less Contact Group records
• Total 1,009

With the above noted I can safely conclude that both Contacts and Outlook contain the same number of contacts, though I must admit that reconciling this discrepancy was neither a fast nor a trivial exercise.

  3. What are your comments / thoughts about deleting Contacts’ Phantom records, noting that I am leaning toward i) backing up the Exchange Server contacts, ii) deleting the Phantom contacts and iii) testing that their deletion had no impact on the Exchange Server listing, given that none of these records have matching Exchange Server records?

I would be interested in your thoughts on this.

Appreciate the above code…though I understand how most of it works I do need to take a closer look…with that, I do hope to have some time tomorrow to work on the code and will – to the extent I understand all of the code – insert it into my script!

List duplicates…I have received different approaches from others but should you have another that is even more elegant please let me know, I am here to learn!

Thanks for all your input.

Love the quote!

Noted, I will look into this over the next few weeks as I hope to have some added time once my vacation starts.

Will be ordering those books as starting points / references…thanks for pointing me in the right direction!

Shane:

Appreciate the code though I am going to have to go through a few pages of your book – see my immediately preceding post – before I understand how / why the code works noting that I am committed to understanding what / why code works because I want to learn!!

FWIW, message #12 above is an example of that very thing.

Shane:

I understand that, and I see the elegance / efficiency in the code that you wrote. I would love to be at a point where I could say that I understand what you coded and why it works, but I am not there YET…that said, keep the code coming as I am BOTH very curious and determined to pick this stuff up…thanks again!

Here’s an explanation. A set is like a list, except any item can only appear in it once. An NSCountedSet is a special kind of set that also keeps track of how many times an object has been added to it, and if an item has been added more than once, trying to remove it just decrements the object’s count.

So we start off making a counted set from our list, using the set method setWithArray:

set anNSCountedSet to current application's NSCountedSet's setWithArray:myPeopleList

An ordinary set doesn’t keep count – it just stores each item just once. So we make a simple set from the counted set:

set anNSSet to current application's NSSet's setWithSet:anNSCountedSet

Sets have a method for subtracting one set from another. So if we subtract the simple set from the counted set, any items with a count of 1 are removed, and any with a higher count just have their count decremented:

anNSCountedSet's minusSet:anNSSet

So now the counted set has only values that were in the list more than once, and we get that as an AppleScript list by asking for an array (allObjects()) and then coercing it to a list:

set dupeEntries to anNSCountedSet's allObjects() as list
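
For anyone who wants to try the four steps above in one go, here is a minimal, self-contained sketch. The sample names are made up, and the `use` statements assume OS X 10.10 or later (on 10.9 the AppleScriptObjC calls would have to live in a script library instead).

```applescript
use AppleScript version "2.4" -- assumption: OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data standing in for myPeopleList
set myPeopleList to {"Ann Lee", "Bob Ray", "Ann Lee", "Cy Dunn", "Bob Ray", "Ann Lee"}

-- Counted set: remembers how many times each name was added
set anNSCountedSet to current application's NSCountedSet's setWithArray:myPeopleList

-- Plain set: each name exactly once
set anNSSet to current application's NSSet's setWithSet:anNSCountedSet

-- Subtracting removes one occurrence of every name,
-- so names that appeared only once disappear entirely
anNSCountedSet's minusSet:anNSSet

-- What is left are the names that appeared more than once
set dupeEntries to anNSCountedSet's allObjects() as list
```

With this sample data dupeEntries should contain "Ann Lee" and "Bob Ray" once each; because sets are unordered, the order of the two is not guaranteed.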

Appreciate the explanation which is crystal clear…now I need to learn how to use the language so I can code the above on my own…there are however a few possible pitfalls, see my next post…

As promised, here is my script, inclusive of a lot of Excel-related code to produce a nice-looking spreadsheet listing the duplicate contact records…that said, please note that additional commentary appears after the script which I think is interesting…


tell application "Contacts"
	-- Set variables
	set firstNames to {} -- List of all first names from all Contacts
	set lastNames to {} -- List of all last names from all Contacts
	set orgNames to {} -- List of all organization names from all Contacts	
	set phoneNumbers to {} -- List of all work telephone numbers from all Contacts
	
	set myList to {} -- List of all first names, last names, organization names, work telephone numbers and "first name last name" combinations from all Contacts
	set myListNames to {} -- List of all "first name last name" combinations from all Contacts
	set myListNamesUnique to {} -- List of myListNames WITH DUPLICATE ENTRIES REMOVED
	set myListUnique to {} -- List of myList WITH DUPLICATE ENTRIES REMOVED
	set myListNamesDuplicate to {} -- List of myListNames DUPLICATES
	set myListDuplicate to {} -- List of myList DUPLICATES ENTRIES
	set myListPhantom to {} -- List of myList PHANTOM ENTRIES 
	
	
	-- Set / extract the Contact first names, last names, organization names and work telephone numbers
	set {firstNames, lastNames, orgNames, phoneNumbers} to {first name, last name, organization, value of every phone whose label is "work"} of every person
	
	repeat with i from 1 to count lastNames
		if item i of firstNames is equal to missing value then set item i of firstNames to " "
		if item i of lastNames is equal to missing value then set item i of lastNames to " "
		if item i of orgNames is equal to missing value then set item i of orgNames to " "
		if item i of phoneNumbers is equal to missing value then set item i of phoneNumbers to " "
	end repeat
		
	repeat with i from 1 to (count lastNames)
		set end of myList to {i, item i of firstNames, item i of lastNames, item i of orgNames, item i of phoneNumbers, item i of firstNames & " " & item i of lastNames}
		set end of myListNames to (item i of firstNames & " " & item i of lastNames)
	end repeat
	
	
	-- Set / create the list of duplicate entries and unique entries noting that the test is based on matching "first name last name" as an "extended test" [i.e. a test that also includes other contact properties] would not produce the correct results [i.e. two contact records that are identical in every sense except the first record has the organization as ACME while the second record has the organization as ACME Inc. would not be identified as being duplicates].
	repeat with i from 1 to (count myListNames)
		
		if (((count of item 2 of item i of myList) < 2) and ((count of item 3 of item i of myList) < 2) and ((count of item 4 of item i of myList) < 2)) then
			set end of myListPhantom to item i of myList -- Ignore / kickout the phantom contacts
			
		else
			
			if ((item i of myListNames is not in myListNamesUnique) or (((count of item 3 of item i of myList) < 2) and ((count of item 4 of item i of myList) > 1))) then -- The "or" portion of the test is to include retailers, restaurants, etc. where i) there is no first name or last name but ii) there is an organization name.  Without this added test these contact records would be considered duplicates because their "first name last name" combination [i.e. " "] would be in myListNamesUnique
				set end of myListNamesUnique to item i of myListNames
				set end of myListUnique to item i of myList
			else
				set end of myListNamesDuplicate to item i of myListNames
				set end of myListDuplicate to item i of myList
			end if
			
		end if
	end repeat			
end tell


tell application "Microsoft Excel"
	
	open
	
	
	-- Get current date in YYYYMMDD format (month and day are zero-padded to two digits)
	set currentDate to current date
	set currentDateYear to (year of currentDate) as string
	set currentDateMonth to text -2 thru -1 of ("0" & ((month of currentDate) as integer))
	set currentDateDay to text -2 thru -1 of ("0" & (day of currentDate))
	set currentDateYYYYMMDD to currentDateYear & currentDateMonth & currentDateDay
	
	
	-- Make and name a new workbook
	make new workbook
	set theBook to the active workbook
	set theSheet to the active sheet of theBook
		
	
	-- Set the magnification / zoom percentage
	set zoom of the active window to 100
	
	
	-- Set the column widths in the new workbook for readability
	set column width of the first column of theSheet to 3
	set column width of the second column of theSheet to 5
	set column width of the third column of theSheet to 20
	set column width of the fourth column of theSheet to 70
	set column width of the fifth column of theSheet to 3
	
	
	-- Set and format the column heading and titles
	set myRangeCells to range ("B2:D2") of theSheet
	set weight of (get border of myRangeCells which border edge top) to border weight thick
	set borderWeightLog to get weight of (get border of myRangeCells which border edge top)
	
	set insertedTextTop to "Contact Records Which Appear Two or More Times" as string
	set insertedTextBottom to "Selected Contact Source:  Work's Exchange Server"
	set myRangeTop to range ("B3:B3") of theSheet
	set myRangeBottom to range ("B4:B4") of theSheet
	set value of myRangeTop to insertedTextTop
	set value of myRangeBottom to insertedTextBottom
	
	set myRangeCells to range ("B7:D7") of theSheet
	set weight of (get border of myRangeCells which border edge top) to border weight thin
	set borderWeightLog to get weight of (get border of myRangeCells which border edge top)
	
	set myRangeCells to range ("B3:B3")
	set font size of font object of myRangeCells to 16
	set font style of font object of myRangeCells to "Bold"
	set fontStyleLog to (get font style of font object of myRangeCells)
	
	set myRangeCells to range ("B4:B4")
	set font style of font object of myRangeCells to "Bold"
	-- set fontStyleLog to (get font style of font object of myRangeCells)
	-- log fontStyleLog
	
	set InsertedTextBottom1 to "Count"
	set InsertedtextBottom2 to "Name"
	set InsertedTextBottom3 to "Organization"
	set myRangeBottom1 to range ("B9:B9") of theSheet
	set myRangeBottom2 to range ("C9:C9") of theSheet
	set myRangeBottom3 to range ("D9:D9") of theSheet
	set value of myRangeTop to insertedTextTop
	set value of myRangeBottom1 to InsertedTextBottom1
	set value of myRangeBottom2 to InsertedtextBottom2
	set value of myRangeBottom3 to InsertedTextBottom3
	
	set myRangeCells to range ("B9:D9") of theSheet
	set horizontal alignment of myRangeCells to horizontal align center
	
	set myRangeCells to range ("B11:C100000") of theSheet
	set horizontal alignment of myRangeCells to horizontal align center
	
	set myRangeCells to range ("B9:D9")
	-- set font size of font object of myRangeCells to 16
	set font style of font object of myRangeCells to "Bold"
	
	set myRangeCells to range ("B9:D9") of theSheet
	set weight of (get border of myRangeCells which border edge bottom) to border weight thin
	set borderWeightLog to get weight of (get border of myRangeCells which border edge bottom)
	
	
	-- Import myListDuplicates
	repeat with i from 1 to count of myListDuplicate
		set value of cell ("B" & (10 + i) as string) to i
		set value of cell ("C" & (10 + i) as string) to item 6 of item i of myListDuplicate
		
		if (count of item 4 of item i of myListDuplicate) < 2 then set item 4 of item i of myListDuplicate to "Organization field is blank" -- Test the length of the organization name, not the text itself
		set value of cell ("D" & (10 + i) as string) to item 4 of item i of myListDuplicate
	end repeat
	
	sort range ("C11:D" & (10 + (count of myListDuplicate) as string)) of worksheet theSheet key1 (range "C11" of worksheet theSheet) key2 (range "D11" of worksheet theSheet)
	
	-- Set the name of the spreadsheet to be saved ("path to desktop" avoids hard-coding the startup disk and user names)
	set fileNameSaved to (path to desktop as text) & currentDateYYYYMMDD & "_contact records which appear two or more times.xls"
	
	
	-- Save the workbook / spreadsheet
	tell theBook
		save workbook as theBook filename fileNameSaved overwrite yes
	end tell
	
	
	-- Test, and depending on the test result, close the spreadsheet
	display dialog "Do you want to close the Excel spreadsheet which lists those contact records which appear two or more times?" buttons {"Yes", "No"} default button 2 giving up after 5
	set spreadsheetClose to button returned of result
	if ((spreadsheetClose = "Yes") is true) then quit
	
end tell

The interesting commentary:

  1. Appreciate the help from everyone, particularly for pointing me in a direction that did not require sorting the data, noting that the script runs fairly quickly on my mid-2012 i7 8GB MBA [I wish it had more RAM]!

  2. In testing the script I noted that there were a few problems that need to be sorted:

a) Comparing records based on multiple entries such as a list consisting of {first name, last name, organization, phone number} does not work…consider the case where the same person is entered twice but the first record has his organization as ACME while the second record has his organization as ACME Inc.

b) Comparing records based on name alone did not work because records which had no name entries but did have organization entries [i.e. restaurants, stores, etc.] were defaulting to the duplicates list because there were multiple occurrences with no names!

c) Due to 2a and 2b the coding got a little more complicated than I had hoped but at least it is “done” in that i) it functions and ii) it needs to be optimized in terms of code and performance [but, at least it is a start]!

Would appreciate any and all feedback / suggestions you may have!

Thanks!

Also, sets are unordered collections whereas lists are ordered collections. This is what gives them their different performance characteristics.

Sets are significantly more efficient at performing containment tests than [unsorted] lists because they don’t have to worry about preserving the order in which items were originally given. That allows them to rearrange that content internally to enable highly efficient searching. Worst efficiency is O(log n), though some implementations approach constant time, O(1), getting close enough to make no real difference in practice (aka ‘amortized constant time’).

Best-case efficiency with a list is O(log n), and only if that list is guaranteed to be sorted (in which case you can use a binary, aka “divide and conquer”, search); otherwise it’s linear time, O(n), since you have to iterate the entire list testing each item in turn. (And that’s assuming list access is constant time, which AppleScript’s aren’t unless the previously mentioned kludges are used.)


p.s. O(log n), if you’re unclear, means that every time the collection doubles in size, the time it takes to search increases by +1. For example, if it takes 1ms to search a 1000-item collection, then a 2000-item collection takes 2ms, 4000 items takes 3ms, 8000 items takes 4ms, … 256,000 items takes 9ms, 512,000 items takes 10ms, and so on, quickly becoming the least of your worries. :slight_smile:
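
To make the sorted-list case concrete, here is a rough sketch of a binary search in plain AppleScript. The handler name binarySearch and the sample list are made up for illustration, and the script-object wrapper is one form of the list-access kludge mentioned earlier; the handler assumes the list is already sorted in ascending order.

```applescript
-- Returns the index of target in sortedList, or 0 if it is not present.
-- Assumes sortedList is already sorted in ascending order.
on binarySearch(sortedList, target)
	-- Wrapping the list in a script object keeps item access fast
	script listWrapper
		property theList : sortedList
	end script
	
	set lowIndex to 1
	set highIndex to count of sortedList
	
	repeat while lowIndex ≤ highIndex
		-- Look at the middle of the remaining range
		set midIndex to (lowIndex + highIndex) div 2
		set candidate to item midIndex of listWrapper's theList
		
		if candidate = target then
			return midIndex
		else if candidate < target then
			set lowIndex to midIndex + 1 -- discard the lower half
		else
			set highIndex to midIndex - 1 -- discard the upper half
		end if
	end repeat
	
	return 0
end binarySearch

binarySearch({2, 3, 5, 7, 11, 13, 17}, 11) -- returns 5
```

Each pass through the loop halves the range still to be searched, which is where the "+1 per doubling" behaviour comes from.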

Appreciate the explanation and actually get it as I am a math guy…thanks!

Computers are dumber than humans. You may be aware that “ACME” and “ACME Inc” are the same entity, but to the computer they’re just a couple of character strings.

In this case, if possible, I would suggest writing a script that goes through Contacts and normalizes all your company names. (That may not be so practical if that information’s coming from an external source, in which case, modifying existing entries here might cause knock-on problems elsewhere.) Various manual/automatic/hybrid strategies you could use, depending on how rigorous/safe/laborious you want it to be. e.g. It is entirely possible for two different organizations to have very similar names, in which case how do you safely determine if “ACME” and “ACME Inc” are indeed the same? One option might be to compare their addresses; though, of course, addresses might differ as well if they use more than one office, or even if there’s just a typo. Frequently the best strategy is to fix the really obvious inconsistencies automatically, then write out the remaining ones for the user to review and fix manually where appropriate.
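
As a rough illustration of the "fix the really obvious inconsistencies automatically" step, here is a sketch that strips a trailing legal suffix from a company name before comparing. The handler name normalizeCompanyName and the suffix list are hypothetical, and it only computes a comparison key; it does not touch Contacts or the Exchange data.

```applescript
-- Hypothetical list of suffixes to strip; extend it to suit your own data.
-- Longer variants come first so that " Inc." is tried before " Inc".
property companySuffixes : {" Incorporated", " Inc.", " Inc", " Ltd.", " Ltd", " LLC"}

on normalizeCompanyName(rawName)
	set cleanName to rawName
	repeat with aSuffix in companySuffixes
		set suffixText to contents of aSuffix
		-- "ends with" ignores case by default in AppleScript
		if (cleanName ends with suffixText) and ((count of cleanName) > (count of suffixText)) then
			-- Keep everything before the suffix
			set cleanName to text 1 thru -((count of suffixText) + 1) of cleanName
			exit repeat
		end if
	end repeat
	return cleanName
end normalizeCompanyName

normalizeCompanyName("ACME Inc.") -- returns "ACME"
```

Anything this does not catch (typos, genuinely different companies with similar names) would still need to be written out for manual review.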

Of course, avoiding these kinds of data inconsistencies is why we invented technology like relational databases, where entities like ACME only ever have a single definition to which other (likewise unique) entities such as people can link. Unfortunately, competent organization is one of those areas where humans are frequently dumber than even computers. :stuck_out_tongue:

You also need to balance the benefit of making all these fixes against the cost of doing it. You haven’t really said why you’re doing it. For instance, if it’s only because your employer’s Exchange server offends your personal OCD, that isn’t justifiable grounds for spending time on a personal indulgence instead of doing the work they’re paying you to do. Alternatively, if the company-wide address book really is such a disaster as to significantly impact employee efficiency then IT/admin should be fixing it at source, in which case affected employees should make the problem known to their managers, who should kick it up the chain until something gets done about it (and if they choose not to, then - meh - it’s still not your problem to fix, and probably the least of many such problems at that).

Once you get the Automation bug, it’s like owning the world’s biggest nuclear-powered hammer: not only does every problem look even more like a nail, but now you can knock down entire walls without even thinking about it. :slight_smile:

Ah well, in that case explaining set theory to a mathematician ⊆ teaching your grandmother to suck eggs.

Although not terribly surprising as I recall running into such issues when I first took programming 30 years ago – no, this is not a typo – and programmed for a few years thereafter in APL, Fortran and Pascal there are actually TWO CHALLENGES in trying to pick up AppleScript namely learning the syntax AND working through logic challenges / issues that were never anticipated such as that identified above.

I am not too concerned about those types of inconsistencies as I think they are few and far between but they do exist and even a handful can create havoc when trying to identify multiple records for a given contact.

I do not – at least as of now – need to create such a “consistency script” but who knows what tomorrow may bring :slight_smile:

Though I am not a computer engineer I think there is a basic flaw in much of the software that we use that creates such issues…why not have an application – OS X Contacts or Microsoft Outlook as simple examples – that actually intervenes when entering similar names, organizations, telephone numbers, etc.?

Though Microsoft Outlook does this somewhat for contact names it could be made much more robust.

Why am I doing this…I am – believe it or not at age 50 – trying to teach myself to code in AppleScript after having done no coding for 25+ years and think that the best way to do this is to try to develop scripts that i) are not overly difficult and ii) have a practical / real purpose…I am specifically coding this one because a friend of mine has a contact directory that is a mess [i.e. I alone appear in it 4 times]!

With that, should you have a better way for me to learn this please let me know noting a course is not an option as I have very long work days!

Agree, this is addictive as my list of scripts that I want to write is growing exponentially! Too bad I am not better at coding but hopefully I am improving day-by-day!

***

Thanks…

:smiley:

I do unfortunately have one last question…

When I run the script as an application I receive the below dialog box asking for permission to access my contacts

https://www.dropbox.com/s/wx73cq6za1jhxib/Screen%20Shot%202014-12-19%20at%209.21.06%20AM.png?dl=0

While I am sure there is a way to avoid this dialog box from within the script I do not know how to do so and would appreciate assistance in adding / inserting this code.

Thanks in advance for the help.

Joel

You can’t avoid that dialog initially. But it keeps reappearing because your script modifies itself. You can stop that either by codesigning it, which requires a digital signature, or using something like “chmod a-w” at the command line to lock the main.scpt within your application bundle.

That is too bad although I don’t understand how the script modifies itself given that I saved it as an application rather than a script.

Thanks for the response which is – as always – greatly appreciated because at least I know there is no way to avoid the dialog.

Joel

PS. If I wanted to try the chmod a-w approach how would I go about it?

An applet is just an application shell containing a script. When you run it, on quitting it saves the modified script back to disk. That’s how properties are persistent.

set appPath to POSIX path of (choose file)
set scriptPath to appPath & "Contents/Resources/Scripts/main.scpt"
do shell script "chmod a-w " & quoted form of scriptPath

But once you’ve done it, you can’t edit the script again without reversing the process.
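
For completeness, reversing the process might look like the sketch below. It assumes the file was originally writable only by its owner (the usual case), so it restores just the owner's write permission rather than everyone's.

```applescript
-- Choose the same application bundle that was locked earlier
set appPath to POSIX path of (choose file)
set scriptPath to appPath & "Contents/Resources/Scripts/main.scpt"

-- Give the owner write access again so the script can be edited and re-saved
do shell script "chmod u+w " & quoted form of scriptPath
```

After editing, running the original "chmod a-w" lines again locks the script once more.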

Understood…


Ahhh, the last sentence is key…I think I will leave things as they are as clicking okay is fine given that I won’t be running the script more than once a month.

Really appreciate all the help!

Joel

PS. I am off to my next script!