On my gallant search for my holy grail I have found myself here. I have spent nearly an entire day looking for what I need and can’t seem to find the answer.
I use Mail.app on my iMac running Mountain Lion and need to extract all email addresses in all mailboxes. I am trying to tidy up my database and network so that I can group clients into very specific, targeted fields.
So far, I have a script which extracts the email addresses in the ‘to’, ‘cc’ and ‘bcc’ fields, but I need to add some scripting to this AppleScript which looks through the rest of the email and extracts all other email addresses too. I have various bits of script which will do this, but the results are put into the ‘results’ section of the editor.
I need to have the two scripts together and also for all the data to be dropped into a .txt file.
Below is what I currently have.
-- Merge Two Scripts Here
tell application "Mail"
    set selectionMessage to selection -- just select the first message in the folder
    set thisMessage to item 1 of selectionMessage
    set theseMessages to (every message in (mailbox of thisMessage))
    set listOfEmails to {}
    -- End of Original set and beginning of new set
    repeat with eachMessage in theseMessages
        try
            set theFrom to (extract address from sender of eachMessage)
            if listOfEmails does not contain theFrom then
                copy theFrom to the end of listOfEmails
            end if
            -- To field Extract
            if (address of to recipient) of eachMessage is not {} then
                repeat with i from 1 to count of to recipient of eachMessage
                    set theTo to (address of to recipient i) of eachMessage as string
                    if listOfEmails does not contain theTo then
                        copy theTo to the end of listOfEmails
                    end if
                end repeat
            end if
            -- BCC Extract
            if (address of bcc recipient) of eachMessage is not {} then
                repeat with i from 1 to count of bcc recipient of eachMessage
                    set thebcc to (address of bcc recipient i) of eachMessage as string
                    if listOfEmails does not contain thebcc then
                        copy thebcc to the end of listOfEmails
                    end if
                end repeat
            end if
            -- CC Extract
            if (address of cc recipient) of eachMessage is not {} then
                repeat with i from 1 to count of cc recipient of eachMessage
                    set theCC to (address of cc recipient i) of eachMessage as string
                    if listOfEmails does not contain theCC then
                        copy theCC to the end of listOfEmails
                    end if
                end repeat
            end if
            -- Body Extract
            if (address of bcc recipient) of eachMessage is not {} then
                repeat with i from 1 to count of bcc recipient of eachMessage
                    set thebcc to (address of bcc recipient i) of eachMessage as string
                    if listOfEmails does not contain thebcc then
                        copy thebcc to the end of listOfEmails
                    end if
                end repeat
            end if
        end try
    end repeat
end tell
tell application "Finder" to set ptd to path to documents folder as string
set theFile to ptd & "extracted.txt"
set theFileID to open for access theFile with write permission
set SortedListOfEmails to simple_sort(listOfEmails)
repeat with i from 1 to count of SortedListOfEmails
    write item i of SortedListOfEmails & return to theFileID as «class utf8»
end repeat
close access theFileID
on simple_sort(my_list)
    set the index_list to {}
    set the sorted_list to {}
    repeat (the number of items in my_list) times
        set the low_item to ""
        repeat with i from 1 to (number of items in my_list)
            if i is not in the index_list then
                set this_item to item i of my_list as text
                if the low_item is "" then
                    set the low_item to this_item
                    set the low_item_index to i
                else if this_item comes before the low_item then
                    set the low_item to this_item
                    set the low_item_index to i
                end if
            end if
        end repeat
        set the end of sorted_list to the low_item
        set the end of the index_list to the low_item_index
    end repeat
    return the sorted_list
end simple_sort
Firstly, a couple of points about your existing script. ‘(address of to recipient)’ should be ‘(address of to recipients)’, i.e. ‘recipients’ in the plural. Similarly with the ‘cc recipients’ and ‘bcc recipients’. These references get you the information you want anyway, as a list of addresses, so there’s no point in then laboriously counting the message’s recipients and extracting the e-mail address from each in turn.
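To illustrate the point, here is a minimal sketch, assuming at least one message is currently selected in Mail. The plural references hand back all the addresses as lists in one go:

```applescript
tell application "Mail"
    set thisMessage to item 1 of (get selection) -- assumes a message is selected
    -- The plural form returns a list of address strings, one per 'to' recipient.
    -- The list is simply empty if the message has no 'to' recipients.
    set toAddresses to address of to recipients of thisMessage
    -- The same pattern works for the 'cc' and 'bcc' recipients.
    set ccAddresses to address of cc recipients of thisMessage
    set bccAddresses to address of bcc recipients of thisMessage
end tell
return {toAddresses, ccAddresses, bccAddresses}
```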
I’ve knocked together a shell script to extract every e-mail address from a body text. It’s pretty crude, but should work in most cases. However, if the text of the message is very long, it’ll be necessary to use another method to get it into the shell script.
tell application "Mail"
    set selectionMessage to selection -- just select the first message in the folder
    set thisMessage to item 1 of selectionMessage
    set theseMessages to (every message in (mailbox of thisMessage))
    set listOfEmails to {}
    -- End of Original set and beginning of new set
    repeat with eachMessage in theseMessages
        set theFrom to (extract address from sender of eachMessage)
        if listOfEmails does not contain theFrom then
            set the end of listOfEmails to theFrom
        end if
        -- To field Extract
        set toAddresses to address of to recipients of eachMessage
        repeat with i from 1 to (count toAddresses) -- The repeat won't happen if toAddresses is empty
            set theTo to item i of toAddresses
            if listOfEmails does not contain theTo then
                set the end of listOfEmails to theTo
            end if
        end repeat
        -- BCC Extract
        set bccAddresses to address of bcc recipients of eachMessage
        repeat with i from 1 to (count bccAddresses)
            set thebcc to item i of bccAddresses
            if listOfEmails does not contain thebcc then
                set the end of listOfEmails to thebcc
            end if
        end repeat
        -- CC Extract
        set ccAddresses to address of cc recipients of eachMessage
        repeat with i from 1 to (count ccAddresses)
            set theCC to item i of ccAddresses
            if listOfEmails does not contain theCC then
                set the end of listOfEmails to theCC
            end if
        end repeat
        -- Body Extract
        set bodyText to content of eachMessage
        try -- 'do shell script' errors if grep finds no matches, so skip such messages
            set bodyAddresses to paragraphs of (do shell script "<<<" & quoted form of bodyText & " grep -Eo '[[:alnum:]][^[:space:]<>@\":;]+@[^ <>\"]+[][:alpha:]]'")
            repeat with i from 1 to (count bodyAddresses)
                set thisAddress to item i of bodyAddresses
                if (listOfEmails does not contain thisAddress) then
                    set end of listOfEmails to thisAddress
                end if
            end repeat
        end try
    end repeat
end tell
set SortedListOfEmails to simple_sort(listOfEmails)
set ptd to path to documents folder as string
set theFile to ptd & "extracted.txt"
set theFileID to open for access theFile with write permission
try
    set eof theFileID to 0 -- empty the file first so nothing is left over from a previous run
    repeat with i from 1 to count of SortedListOfEmails
        write item i of SortedListOfEmails & return to theFileID as «class utf8»
    end repeat
end try
close access theFileID -- always reached, even if a write fails
on simple_sort(my_list)
    set the index_list to {}
    set the sorted_list to {}
    repeat (the number of items in my_list) times
        set the low_item to ""
        repeat with i from 1 to (number of items in my_list)
            if i is not in the index_list then
                set this_item to item i of my_list as text
                if the low_item is "" then
                    set the low_item to this_item
                    set the low_item_index to i
                else if this_item comes before the low_item then
                    set the low_item to this_item
                    set the low_item_index to i
                end if
            end if
        end repeat
        set the end of sorted_list to the low_item
        set the end of the index_list to the low_item_index
    end repeat
    return the sorted_list
end simple_sort
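For very long body texts, one possible "other method" is to write the text to a temporary file and point grep at the file, rather than passing the text in the command itself. This is only a sketch: the handler name addressesInText and the file name "mailbody.txt" are my own inventions, and the regex is the same crude one used in the script.

```applescript
-- Hypothetical helper: returns a list of the e-mail-like strings found in bodyText.
on addressesInText(bodyText)
    -- Write the text to a file in the temporary items folder.
    set tempPath to (path to temporary items folder as text) & "mailbody.txt"
    set fileRef to open for access file tempPath with write permission
    try
        set eof fileRef to 0 -- discard anything left from an earlier call
        write bodyText to fileRef as «class utf8»
        close access fileRef
    on error errMsg number errNum
        close access fileRef
        error errMsg number errNum -- pass the original error on
    end try
    -- Let grep read the file. It exits with an error status when nothing matches.
    set thePattern to "[[:alnum:]][^[:space:]<>@\":;]+@[^ <>\"]+[][:alpha:]]"
    try
        return paragraphs of (do shell script "grep -Eo " & quoted form of thePattern & " " & quoted form of POSIX path of tempPath)
    on error
        return {} -- no addresses found
    end try
end addressesInText
```

Inside the tell application "Mail" block, the handler would need to be called as "my addressesInText(bodyText)", and the try block around the body extraction would then no longer be needed for the no-match case.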
If you have a very full mailbox and need a little extra speed, you could extract all the data from Mail in one go and sort through it by vanilla means:
tell application "Mail"
    set selectionMessage to selection -- just select the first message in the folder
    set thisMessage to item 1 of selectionMessage
    set {allSenders, allTos, allBCCs, allCCs, allBodyTexts} to {sender, address of to recipients, address of bcc recipients, address of cc recipients, content} of every message in mailbox of thisMessage
end tell
set listOfEmails to {}
-- End of Original set and beginning of new set
repeat with i from 1 to (count allSenders)
    tell application "Mail" to set theFrom to (extract address from item i of allSenders)
    if listOfEmails does not contain theFrom then
        set end of listOfEmails to theFrom
    end if
    -- To field Extract
    set toAddresses to item i of allTos
    repeat with j from 1 to (count toAddresses) -- The repeat won't happen if toAddresses is empty
        set theTo to item j of toAddresses
        if listOfEmails does not contain theTo then
            set the end of listOfEmails to theTo
        end if
    end repeat
    -- BCC Extract
    set bccAddresses to item i of allBCCs
    repeat with j from 1 to (count bccAddresses)
        set thebcc to item j of bccAddresses
        if listOfEmails does not contain thebcc then
            set the end of listOfEmails to thebcc
        end if
    end repeat
    -- CC Extract
    set ccAddresses to item i of allCCs
    repeat with j from 1 to (count ccAddresses)
        set theCC to item j of ccAddresses
        if listOfEmails does not contain theCC then
            set the end of listOfEmails to theCC
        end if
    end repeat
    -- Body Extract
    set bodyText to item i of allBodyTexts
    try
        set bodyAddresses to paragraphs of (do shell script "<<<" & quoted form of bodyText & " grep -Eo '[[:alnum:]][^[:space:]<>@\":;]+@[^ <>\"]+[][:alpha:]]'")
        repeat with j from 1 to (count bodyAddresses)
            set thisAddress to item j of bodyAddresses
            if (listOfEmails does not contain thisAddress) then
                set end of listOfEmails to thisAddress
            end if
        end repeat
    end try
end repeat
-- Rest of the script as per.
Awesome script. It is what I need. Now how do I make the following adjustments to it:
I want to save it as a csv file in the following format:
From:
To:
Cc:
Bcc:
Date & Time:
Subject:
If I have a common email domain, for instance @xyz.gov.in, is there a way to search for just those addresses and save them all in the above format?
So the script you now want extracts not only the e-mail addresses of the messages’ senders and recipients (but probably not any addresses in the body texts), but also the times sent/received, the subjects, and optionally the senders’ and recipients’ display names, where these exist. It’s not clear if these data will be required just for selected messages or for all messages in the same mailbox as the selection, as in the original script.
It doesn’t weed out duplicate e-mail addresses, sort the remainder lexically, and store them one per line with returns for line endings, as the original script did. Instead it collates all the extracted data for each message in CSV format, apparently with six records per message, each record having two fields: a header and a value. CSV records are of course separated by linefeeds or return-linefeed pairs.
And it should compose all the CSV data only for messages where any of the sender/recipient e-mail addresses has a particular domain? It’s undoubtedly doable. You’d have to decide if this was to be the normal modus operandi and whether the domain was to be fixed/default/askable-for.
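As a rough sketch of the building blocks for that (not a finished script), the handlers below quote a value for CSV, build one two-field record, and test a list of addresses against a domain. The handler names are hypothetical, and I'm assuming linefeed record endings and that lists of addresses are turned into a single text before being passed to csvField.

```applescript
-- Hypothetical helper: true if any address in the list ends with the given domain.
-- AppleScript text comparisons ignore case by default.
on anyAddressHasDomain(addressList, theDomain)
    repeat with thisAddress in addressList
        if (contents of thisAddress) ends with theDomain then return true
    end repeat
    return false
end anyAddressHasDomain

-- Hypothetical helper: wrap a value in double quotes, doubling any quotes inside it.
on csvField(theValue)
    set theText to theValue as text -- coerce before touching the delimiters
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to "\""
    set theParts to text items of theText
    set AppleScript's text item delimiters to "\"\""
    set escapedText to theParts as text
    set AppleScript's text item delimiters to astid
    return "\"" & escapedText & "\""
end csvField

-- Hypothetical helper: one record of two fields (header, value) ending in a linefeed.
on csvRecord(theHeader, theValue)
    return csvField(theHeader) & "," & csvField(theValue) & linefeed
end csvRecord
```

For example, a message would only be processed if anyAddressHasDomain({theFrom} & toAddresses & ccAddresses & bccAddresses, "@xyz.gov.in") is true, and its subject line would be added with something like csvRecord("Subject:", theSubject). Called from inside a tell application "Mail" block, each handler name needs "my" in front of it.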