Guys,
I have many group and committee memberships and related rules in Entourage to move new emails to their respective Entourage folders. However, due to many reasons I receive multiple copies of emails (announcements, etc.) that eventually take up space on the HD and may slow down Entourage. I have written a script based on one I found somewhere on the web (that I regret not recalling the original creator to attribute them, but this has been over a few years) to archive each folder on a “recently changed folder per half hour” basis so that, in the event of a crash or corruption of the Entourage database, I have a fairly up to date back up. Then daily, when I may be away from the machine, a full archive is done. To speed up the archival process and once again save on HD space, I wanted to eliminate duplicates. That’s where this script comes in. (I will share the archival scripts once I have cleaned them up a little and commented the code more). The code for the delete duplicates script was initially based on code I picked up somewhere on the web, possibly from the Mactopia Entourage site, a couple of years back. My regrets that I cannot attribute the originator for their work here.
This script runs through approximately 22,000 emails over 80+ folders in about 25 minutes (~15 emails per second…). Not as fast as I would like, but a lot better than my first implementation that took 2 hours. The script does depend on the “intersection of” code based on the Santimage osax. I tried to get around that using a short routine with “is contained by” but the results were slower.
The premise of the script is that it starts at the top folders (INBOX being #1) and runs through the folders in order of Entourage standard folders, then user added folders in alpha order. The script skips the deleted, drafts, outbox, and junk mail folders as these are in constant flux and duplicates in them may cause a problem later, as you’ll see. The first instance of an email (based on MIME Message-ID and then at least three matches of sender, date, subject, and Entourage ID) is left alone while any other iteration encountered is deleted. The matches must be the MIME ID plus at least three of the other criteria as one of the elements may be changed for some reason. I have tested this theory over hundreds of thousands of emails so far and it seems to be bullet-proof. Feel free to show me where there is an issue and I’ll adjust. The MIME ID match is really a must as it is unlikely to be repeated, but may be, so the rest of the elements being checked are to ensure the MIME ID was not a duplicate.
Since the script starts at the upper folders that come with Entourage any copies that have an instance in one of these folders is retained while later discovered copies are discarded. So keeping your filing away of emails current helps ensure you do not have to refile an email you once filed.
I first wrote the script using database elements and database events. Was far, far too slow. I converted it to using lists and it “flew” in comparison to database records (a 5 to 8 times speed up). If a message arrives at Entourage while the program is running there may be an issue of showing a negative number for duplicate files deleted. I will fix this by setting Entourage to work offline, however this ‘bug’ does not bother me so I am going to work on it as a low priority.
There are debug and record keeping hooks in the script. Statements on progress are output to a text file (great to view with TextWrangler [free] as the real time updating of the file while the script runs is reflected in the TextWrangler view of the file. There is the option to speak progress that aggravates my wife when she hears it so I leave it off. There is also the ability to output log information for troubleshooting.
The file must be saved as a standard script, and not in Script Debugger’s Debug mode as Entourage will not run it properly and give erroneous error messages. Save this in the “Entourage Script Menu Items” folder in the “Microsoft User Data” folder, it will then appear under the Entourage scripts menu item.
That’s all, except for the usual disclaimer that use this at your own peril, I give no warranties of any kind as to wiping out emails that were considered works of art, etc., etc.
Edit 29 Sept: spelled Satimage correctly in comment, stressed that their osax is needed for this to work, and added a link to their site where the osax can be downloaded from (http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html)
--a derivative work by John Egan based on an original script I have lost and wish to attribute
-- the attribution will be determined and added as a later revision comment
-- version 1.0.0
-- Scans all Entourage 2008 folders for duplicate messages
property pDebugModeOn : false --send messages to Event Log in Activity window
property pLogEventsOn : true -- generate a log file
property pSpeakMsg : false
property pBaseFolder : "" --the desktop, or where the log file should be
property pEntourageResultsFolder : "Entourage Maintenance Log Messages"
property pLogFolder : "" -- filled in at run time with the desktop folder path
property pLogFileName : "Duplicate Email Message Removal.log"
property pLineEndingMac : ASCII number return -- carriage return (13)
property pLineEndingUnix : ASCII character 10 -- line feed character
property pLinefeed : pLineEndingUnix -- use this line feed character in logs
--lists used in place of database, used as properties to access anywhere
global pTotalDelCount
global pMsgHeaderSubject
global pMsgHeaderSender
global pMsgHeaderDate
global pMsgHeaderEntID
global pMimeID --keep the IDs in the list for faster search
global pRemovedMessages
global pCountDone
global ptimeStart
-------------------------------------------------------
set pRemovedMessages to 0
set pCountDone to 0
set ptimeStart to current date
-------------------------------------------------------
-- set up log file on desktop
set pBaseFolder to (the path to the desktop) as text
set pLogFolder to pBaseFolder & pEntourageResultsFolder & ":"
if pLogEventsOn is true then
my LogEntry(return & return & "Starting up...", 0)
my LogEntry("Using Lists instead of a database, timing results", 0)
-- create the Log folder on disk if not created before
if not my FileExists(pLogFolder) then
my LogEntry("Creating folder: \"" & pLogFolder & "\".", 0)
ignoring application responses
tell application "Finder"
make new folder at (pBaseFolder as alias) with properties {name:pEntourageResultsFolder}
end tell
end ignoring
end if
end if
---==============================================
--main entry and exit to the program
set pTotalDelCount to 0 --init to zero
set pMsgHeaderSubject to {}
set pMsgHeaderSender to {}
set pMsgHeaderDate to {}
set pMsgHeaderEntID to {}
set pMimeID to {}
set folderCount to ProcessFolder(application "Microsoft Entourage") --{msgCount,
display dialog "Deleted " & pTotalDelCount & " messages from " & folderCount & " folders" buttons {"Ok"} giving up after 20
---==============================================
on ProcessFolder(theFolder) --recursive starting at top Entourage folders
local theFolder, folderItem
set folderList to {}
set RunningTotal to 0 -- set deleted message count to 0
set NameList to {"Deleted Items", "Drafts", "Outbox", "Junk E-mail"} -- skips useless to check folders
tell application "Microsoft Entourage"
set folderList to the folders of theFolder
end tell
set folderCount to count the items of folderList
if folderCount ≠0 then
repeat with folderItem in folderList
set theContents to the folderItem
set theName to name of folderItem as text
if theName is not in NameList then
--log the folder and its messages
if pLogEventsOn is true then
if the name of theFolder ≠"Microsoft Entourage" then
my LogEntry("*************** Processing folder " & theName & ", a sub-folder of " & the name of theFolder, 1)
else
my LogEntry("********** Processing folder " & theName & ", a top folder for your Email", 1)
end if
tell application "Microsoft Entourage"
my LogEntry("******* Processing " & (count messages of folderItem) & " messages", 0)
end tell
end if
-------------------------
set DelCnt to my delDupes(folderItem)
set RunningTotal to RunningTotal + DelCnt
if pLogEventsOn is true then
my LogEntry("******* Messages deleted from this folder: " & DelCnt, 2)
end if
--process if this folder has sub-folders
set FldrCnt to my ProcessFolder(folderItem)
set folderCount to folderCount + FldrCnt
end if
end repeat
end if
set pTotalDelCount to pTotalDelCount + RunningTotal
return folderCount
end ProcessFolder
---==============================================
on delDupes(theFolder)
set DelCnt to 0
set eStr to ""
set eNum to 0
set rList to ""
set badObj to ""
set expectedType to ""
set theMailList to {}
set theResult to 0
set pCountDone to pCountDone + 1
tell application "Microsoft Entourage"
set FolderMsgCnt to the count of messages of theFolder
set theMailList to every message of theFolder -- speeds up processing
end tell
if FolderMsgCnt ≠0 then
if pSpeakMsg is true then
set SpokenMsg to (the name of theFolder & " is folder " & pCountDone) as string
say SpokenMsg
say "Messages to check. " & FolderMsgCnt
end if
script theRefScript
property theMailListRef : theMailList
end script
repeat with FolderMsgPointer from FolderMsgCnt to 1 by -1 --get all of the messages in the folder
try
-- get message header info
set msgContents to the item FolderMsgPointer of theRefScript's theMailListRef
tell application "Microsoft Entourage"
set msgSubject to (get subject of msgContents) --as text
set msgSenderAddress to (get sender's address of msgContents) --as text
set msgSentDate to (get time sent of msgContents) -- as text
set msgID to (get ID of msgContents) --as text
set theMsgContents to msgContents's (headers)
end tell
set uniqueID to my getmessageID(theMsgContents)
----------------------
-- sort list and delete any duplicates here
if uniqueID is in pMimeID then --if uniqueID in List then there is a possible match from email to database
set msgDeleted to my getRecMatch(uniqueID, msgID, msgSentDate, msgSubject, msgSenderAddress, FolderMsgPointer, theFolder)
else
set msgDeleted to false
end if
--add next item to the existing list if it was not deleted
if msgDeleted is false then
if pMimeID is {} then
set pMimeID to {uniqueID}
set pMsgHeaderSubject to {msgSubject}
set pMsgHeaderSender to {msgSenderAddress}
set pMsgHeaderDate to {msgSentDate}
set pMsgHeaderEntID to {msgID}
else
set end of pMimeID to uniqueID
set end of pMsgHeaderSubject to msgSubject
set end of pMsgHeaderSender to msgSenderAddress
set end of pMsgHeaderDate to msgSentDate
set end of pMsgHeaderEntID to msgID
end if
end if
on error eStr number eNum partial result rList from badObj to expectedType
if pLogEventsOn is true then
my LogEntry("Trouble in the delDupes handler............................................", 0)
my LogEntry("Folder message number: " & FolderMsgPointer, 0)
my LogEntry("Message Title: " & msgSubject, 0)
my LogEntry("Sender address: " & msgSenderAddress, 0)
my LogEntry("Message Date: " & msgSentDate, 0)
my LogEntry(" Message ID: " & uniqueID, 0)
end if
if pDebugModeOn is true then
log (current date)
log "Trouble reported, see log file. Reference folder: " & name of the theFolder & " and its message number: " & FolderMsgPointer
log eStr & return & eNum & return & rList
end if
end try
end repeat
tell application "Microsoft Entourage"
set removedCount to FolderMsgCnt - (count messages of theFolder)
end tell
set theResult to removedCount
if pSpeakMsg is true then
if removedCount is 0 then
say "Finished. No duplicates detected"
else
say "Finished. " & removedCount & " duplicates removed"
end if
end if
end if
return theResult
end delDupes
---==============================================
on LogEntry(someText, timingAct)
if not pLogEventsOn then return
--if log folder does not exist then create it
if not my FileExists(pLogFolder) then
tell application "Finder"
make new folder at (pBaseFolder as alias) with properties {name:pEntourageResultsFolder}
end tell
my LogEntry("Creating folder: \"" & pLogFolder & "\".", 0)
end if
--create text entry to log file, append to end of file to maintain long record
set logFile to (pLogFolder as text) & pLogFileName
set logRef to 0
ignoring application responses
try
set logRef to open for access file logFile with write permission
end try
if logRef ≠0 then
write FormatDateTime(current date) & ": " & someText & pLinefeed starting at eof to logRef
if timingAct is 1 then
set ptimeStart to current date
end if
if timingAct is 2 then
write "Time to complete: " & tab & ((current date) - ptimeStart) & pLinefeed starting at eof to logRef
write "Email entries running total (pMimeID): " & tab & (count items in pMimeID) & pLinefeed starting at eof to logRef
write "Email entries deleted: " & tab & pTotalDelCount & pLinefeed starting at eof to logRef
end if
close access logRef
end if
end ignoring
end LogEntry
---==============================================
on FormatDateTime(theDate)
set theDate to theDate as date
set dd to text -2 thru -1 of ("0" & theDate's day)
copy theDate to tempDate
set the month of tempDate to January
set mm to text -2 thru -1 of ¬
("0" & 1 + (theDate - tempDate + 1314864) div 2629728)
set yy to text -1 thru -4 of ((year of theDate) as text)
set hh to time string of theDate
return (yy & "/" & mm & "/" & dd & " " & hh as text)
end FormatDateTime
---==============================================
on FileExists(someFile)
try
set anAlias to (someFile as alias)
return true
on error
return false
end try
end FileExists
---==============================================
--searches lists for a match, runs through the number of matches made
-- returns the list item if there is any match of uniqueID, subject, date, and any other field
--deletes the record if the number of header items matches is greater than 50% along with the Mime Message ID being the same
on getRecMatch(uniqueID, msgID, msgSentDate, msgSubject, msgSenderAddress, FolderMsgPointer, theFolder)
set DelCnt to 0
set eStr to ""
set eNum to 0
set rList to ""
set badObj to ""
set expectedType to ""
set recid to {}
set recDate to {}
set recSubj to {}
set recAddr to {}
set allrecMatches to {}
try
set recKey to my FindAll(pMimeID, uniqueID) -- id of them all, but only use the last one, as that is what counts
set LastRecKey to (last item of recKey) as list
if pDebugModeOn is true then
log "UniqueID match to existing record(s): " & recKey
log "Will ony use: " & LastRecKey & ", as it is the one that counts."
end if
-----
set recid to my FindAll(pMsgHeaderEntID, msgID) -- id of them all, REQUIRES Satimage's osax for 'intersection of' (http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html)
if recid is not false then
-- this code is slower (a hundredth of a second) versus 'intersection of' on test runs, thanks to McUsr's millisec routines used to check it. I left it in case others have ideas...
(*set recKey2recID to {}
if (count of LastRecKey) < (count of recid) then
repeat with theItem in LastRecKey
if theItem is in recid then set end of recKey2recID to theItem
end repeat
else
repeat with theItem in recid
if theItem is in LastRecKey then set end of recKey2recID to theItem
end repeat
end if*)
set recKey2recID to intersection of LastRecKey and recid with removing duplicates -- what items from the 2 lists match
end if
if pDebugModeOn is true then
log "Entourage ID match to existing record(s): " & recid
if recKey2recID ≠{} then log "Match found to RecKey item identity"
end if
-----
set recDate to my FindAll(pMsgHeaderDate, msgSentDate) -- id of them all
if recDate is not false then
set recKey2recDate to intersection of LastRecKey and recDate with removing duplicates -- what items from the 2 lists match
end if
if pDebugModeOn is true then
log "Email date & time match to existing record(s): " & recDate
if recKey2recDate ≠{} then log "Match found to RecKey item identity"
end if
-----
set recSubj to my FindAll(pMsgHeaderSubject, msgSubject) -- id of them all
if recSubj is not false then
set recKey2recSubj to intersection of LastRecKey and recSubj with removing duplicates -- what items from the 2 lists match
end if
if pDebugModeOn is true then
log "Email subject match to existing record(s): " & recSubj
if recKey2recSubj ≠{} then log "Match found to RecKey item identity"
end if
-----
set recAddr to my FindAll(pMsgHeaderSender, msgSenderAddress) -- id of them all
if recAddr is not false then
set recKey2recAddr to intersection of LastRecKey and recAddr with removing duplicates -- what items from the 2 lists match
end if
if pDebugModeOn is true then
log "Email sender match to existing record(s): " & recAddr
if recKey2recAddr ≠{} then log "Match found to RecKey item identity"
end if
-----
set allrecMatches to recKey2recID & recKey2recDate & recKey2recSubj & recKey2recAddr
set recKeyMatchesCnt to count items of allrecMatches
if recKeyMatchesCnt < 3 then
set msgDeleted to false
return msgDeleted
end if
if recKeyMatchesCnt > 2 then
tell application "Microsoft Entourage"
delete message FolderMsgPointer of theFolder
end tell
if pLogEventsOn is true then
my LogEntry("Deleting a duplicate email...", 0)
my LogEntry("This folder's message # " & FolderMsgPointer & " deleted...", 0)
end if
if pDebugModeOn is true then
log "Folder message # " & FolderMsgPointer & " has been deleted"
log "from folder: " & name of theFolder
log (current date)
log return
end if
set msgDeleted to true
end if
on error eStr number eNum partial result rList from badObj to expectedType
--means there is no match and a problem was found
if pDebugModeOn is true then
log "Error in searching the lists. The reported recKey is: " & recKey
set recid to {}
log " Trouble... "
log uniqueID
log "Message is number: " & FolderMsgPointer & " of folder: " & name of theFolder
log (current date)
end if
if pLogEventsOn is true then
my LogEntry(eStr, 0) --log eStr
my LogEntry(eNum, 0) --log eNum
my LogEntry(rList, 0) --log rList
my LogEntry(recKey2recID, 0)
my LogEntry(recKey2recDate, 0)
my LogEntry(recKey2recSubj, 0)
my LogEntry(recKey2recAddr, 0)
end if
end try
set msgDeleted to false
return msgDeleted
end getRecMatch
---==============================================
on getmessageID(msgHeaders)
local msgHeadersStart, msgHeadersEnd
-- get the headers
script theRefScript
property msgHeadersRef : msgHeaders -- the trick to have a reference for a list, this gets fastest results
end script
if theRefScript's msgHeadersRef contains "Message-ID:" then
set saveDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to return -- split full header text into paragraphs
set headerList to the text items of theRefScript's msgHeadersRef
set AppleScript's text item delimiters to saveDelims
set theRefScript's msgHeadersRef to the paragraphs of theRefScript's msgHeadersRef
set msgHeadersStart to items 1 thru ((length of theRefScript's msgHeadersRef) div 2) of theRefScript's msgHeadersRef
set msgHeadersEnd to items ((length of theRefScript's msgHeadersRef) div 2) thru -1 of theRefScript's msgHeadersRef
if (msgHeadersStart contains "Message-ID:") or ((msgHeadersStart as text) contains "Message-ID: <") then
if item -1 of msgHeadersStart does not start with "Message-ID:" then
set theRefScript's msgHeadersRef to msgHeadersStart
end if
else
if (msgHeadersEnd contains "Message-ID:") or ((msgHeadersEnd as text) contains "Message-ID: <") then
set theRefScript's msgHeadersRef to msgHeadersEnd
end if
end if
-- find the Message-ID header
set WatchIt to false --this tracks to see if Message ID is in a line by itself and the actual ID is on the next line
repeat with nextHeader in theRefScript's msgHeadersRef
if WatchIt is false then
if nextHeader starts with "Message-ID:" then
if length of nextHeader > length of "Message ID:" then
set nextHeader to (characters 12 thru -1 of nextHeader) as string --eliminates Message-ID:--msgIDHeader
exit repeat
else
set WatchIt to true
end if
end if
else
exit repeat
end if
end repeat
if pDebugModeOn is true then
log nextHeader -- splitID
end if
-- remove whitespace characters from the front of the string
set done to false
set whiteChars to (ASCII character 10) & space & tab & return
repeat until done
if whiteChars contains the first character of nextHeader then
set nextHeader to (characters 2 through -1 of nextHeader) as text
else
set done to true
end if
end repeat
-- remove surrounding < > chars if neeeded
if the first character of nextHeader is "<" then set nextHeader to (characters 2 through -1 of nextHeader) as text
if the last character of nextHeader is ">" then set nextHeader to (characters 1 through -2 of nextHeader) as text
else
-- this message has no Message-ID: header - this is abnormal
if pLogEventsOn is true then
my LogEntry("Message-ID headers are missing, means header is screwed up...", 0)
my LogEntry("This one will be stored as a bogus one", 0)
my LogEntry(nextHeader, 0)
end if
set nextHeader to "Message-ID header missing!"
end if
return nextHeader --as text)
end getmessageID
---==============================================
-- a routine to find where, if anywhere, in a list, a value is
on FindAll(theList, theValue)
try
if theList does not contain theValue then return {}
set theResult to {}
script theRefScript
property ListRef : theList -- the trick to have a reference for a list, this gets fastest results
end script
repeat with i from 1 to count of theRefScript's ListRef -- number of items in theList
if theRefScript's ListRef's item i is theValue then set theResult's end to i
end repeat
return theResult
on error eMsg number eNum
if pDebugModeOn is true then
log "Can't findAll: " & eMsg & " error number: " & eNum
end if
return {}
end try
end FindAll
Model: MacBook (early 2009)
Browser: Safari 533.16
Operating System: Mac OS X (10.6)