Extract email addresses from body of email

Hi all,

I’m looking for an AppleScript that extracts email addresses from the body of emails.

I tried this one, but it didn’t return only email addresses from the body; it returned some other text along with them:


set theAddresses to ""
try
    set TID to AppleScript's text item delimiters
    set AppleScript's text item delimiters to {space, character id 10, character id 13}
    tell application "Mail"
        repeat with theMessage in (get selection)
            set theText to text items of (get content of theMessage)
            repeat with thisItem in theText
                if thisItem contains "@" then
                    set theAddresses to theAddresses & (thisItem as rich text) & return
                end if
            end repeat
        end repeat
    end tell
    set the clipboard to theAddresses
    set AppleScript's text item delimiters to TID
on error
    set AppleScript's text item delimiters to TID
end try
the clipboard

If any of you could help me to return only email addresses from the body, that would be really helpful.

Thanks,
Stefan.

Hi Stefan. Welcome to MacScripter.

You don’t say what other text you’re getting, but I imagine you’ll need to add a few more characters to the text item delimiters list in your script:

-- Characters which may be adjacent to e-mail addresses in a text. Add any more you may discover.
set AppleScript's text item delimiters to {space, linefeed, return, tab, quote, character id 160, "<", ">", "?", "!", ":", ";", "(", ")"}

Edit: Removed “.” from the delimiter list suggested above and in the script immediately below! [No red-face emoji available.] The script still treats “uucp@localhost” as an e-mail address, though.
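To illustrate why “.” can’t be in the delimiter list, here is a minimal sketch that runs the same chunk-and-test loop on a literal string instead of a Mail message. The sample text and the variable names (sampleText, withDot, withoutDot) are made up for the illustration, and the delimiter lists are cut down to what the sample needs.

```applescript
-- Hypothetical sample text; the address is invented for illustration.
set sampleText to "Write to <fred@example.com>; thanks!"
set TID to AppleScript's text item delimiters

-- With "." among the delimiters, the address is cut at its dot.
set AppleScript's text item delimiters to {space, "<", ">", ";", "."}
set withDot to {}
repeat with thisItem in (text items of sampleText)
	if thisItem contains "@" then set end of withDot to contents of thisItem
end repeat

-- Without ".", the whole address survives as a single text item.
set AppleScript's text item delimiters to {space, "<", ">", ";"}
set withoutDot to {}
repeat with thisItem in (text items of sampleText)
	if thisItem contains "@" then set end of withoutDot to contents of thisItem
end repeat

set AppleScript's text item delimiters to TID
{withDot, withoutDot}
--> {{"fred@example"}, {"fred@example.com"}}
```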

Breaking up the text into lots of little chunks and going through them all to see if any contain “@” can take quite a while, simply because of the large number of chunks. You could speed up the process considerably by breaking up the text at the "@"s instead (producing fewer chunks) and reconstructing the e-mail addresses from the relevant bits on either side of each break:


set theAddresses to {}
try
	set TID to AppleScript's text item delimiters
	tell application "Mail" to set selectedMessages to (get selection)
	repeat with theMessage in selectedMessages
		tell application "Mail" to set theContent to (get content of theMessage)
		-- Break the text at the "@" characters in it (if any).
		set AppleScript's text item delimiters to "@"
		set theText to text items of theContent
		-- Assuming the "@"s are from e-mail addresses, reconstruct the addresses from the relevant text either side of the breaks.
		set AppleScript's text item delimiters to {space, linefeed, return, tab, quote, character id 160, "<", ">", "?", "!", ":", ";", "(", ")"}
		repeat with i from 2 to (count theText)
			set end of theAddresses to (text item -1 of item (i - 1) of theText) & "@" & (text item 1 of item i of theText)
		end repeat
	end repeat
	
	set AppleScript's text item delimiters to return
	set the clipboard to (theAddresses as text)
	set AppleScript's text item delimiters to TID
on error
	set AppleScript's text item delimiters to TID
end try
the clipboard
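The two passes in the script above can be followed more easily on a literal string. This is only a sketch: the sample text is invented and the second delimiter list is shortened to the characters the sample actually contains.

```applescript
-- Hypothetical sample text standing in for a message body.
set theContent to "Ask <fred@example.com> or jane@example.org (not me)."
set theAddresses to {}
set TID to AppleScript's text item delimiters

-- First pass: break the text at each "@".
set AppleScript's text item delimiters to "@"
set theText to text items of theContent
--> {"Ask <fred", "example.com> or jane", "example.org (not me)."}

-- Second pass: take the last chunk before each break and the first chunk after it.
set AppleScript's text item delimiters to {space, "<", ">", "(", ")"}
repeat with i from 2 to (count theText)
	set end of theAddresses to (text item -1 of item (i - 1) of theText) & "@" & (text item 1 of item i of theText)
end repeat

set AppleScript's text item delimiters to TID
theAddresses
--> {"fred@example.com", "jane@example.org"}
```

Since “.” is no longer a delimiter, an address which ends a sentence would keep the full stop after it with this method.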

If you’re running a recent Mac OS version, it’s also possible to use AppleScriptObjC and a regex to find e-mail addresses, which is likely to be more accurate:


use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"
use scripting additions

set emailRegex to current application's class "NSRegularExpression"'s regularExpressionWithPattern:("\\b(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}\\b") options:(0) |error|:(missing value)
set theAddresses to current application's class "NSMutableArray"'s new()

tell application "Mail" to set selectedMessages to (get selection)
repeat with theMessage in selectedMessages
	tell application "Mail" to set theContent to (get content of theMessage)
	set theContent to (current application's class "NSString"'s stringWithString:(theContent))
	set emailMatches to (emailRegex's matchesInString:(theContent) options:(0) range:({0, theContent's |length|()}))
	repeat with thisMatch in emailMatches
		tell theAddresses to addObject:(theContent's substringWithRange:(thisMatch's range()))
	end repeat
end repeat

set the clipboard to ((theAddresses's componentsJoinedByString:(return)) as text)
the clipboard
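As a quick check of what the regex above accepts, here is a sketch which applies the same pattern to a literal string rather than to Mail content. The sample text is invented; “uucp@localhost” is included because it’s the case the TIDs script gets wrong.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample text; none of these addresses is real.
set sampleText to "Mail fred@example.com, Jane.Doe@mail.example.org or uucp@localhost."
set emailRegex to current application's NSRegularExpression's regularExpressionWithPattern:("\\b(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}\\b") options:(0) |error|:(missing value)
set theString to current application's NSString's stringWithString:(sampleText)
set emailMatches to emailRegex's matchesInString:(theString) options:(0) range:({0, theString's |length|()})

-- Collect the matched substrings as AppleScript text.
set theAddresses to {}
repeat with thisMatch in emailMatches
	set end of theAddresses to (theString's substringWithRange:(thisMatch's range())) as text
end repeat
theAddresses
--> {"fred@example.com", "Jane.Doe@mail.example.org"}
```

“uucp@localhost” is skipped because the pattern requires a dot followed by at least two letters after the “@”.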

An alternative way using ASObjC:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
on findURLsIn:theString
	set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
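	-- Measure the text as an NSString: AppleScript's "length" counts characters, not the UTF-16 units the range needs.
	set theString to current application's NSString's stringWithString:theString
	set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}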
	return (theURLsNSArray's valueForKeyPath:"URL.absoluteString") as list
end findURLsIn:


set theContent to ""
tell application "Mail"
	repeat with theMessage in (get selection)
		set theContent to theContent & linefeed & (get content of theMessage)
	end repeat
end tell
set mailtoAddresses to its findURLsIn:theContent
--> {"mailto:actu@news.batiactu.com", "mailto:batiactu@batiactugroupe.com", "mailto:actu@news.batiactu.com", "mailto:batiactu@batiactugroupe.com"}

# I guess that there is a better way to drop the "mailto:" component

set TID to AppleScript's text item delimiters
set text item delimiters to linefeed
set mailtoAddresses to mailtoAddresses as text
set text item delimiters to {"mailto:"}
set mailtoAddresses to text items 2 thru -1 of mailtoAddresses
set text item delimiters to TID
mailtoAddresses
(*
{"actu@news.batiactu.com
", "batiactu@batiactugroupe.com
", "actu@news.batiactu.com
", "batiactu@batiactugroupe.com"}*)

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) Wednesday 11 October 2017 15:00:15

Thanks, Yvan. I thought there must be an NSDataDetector method to find e-mail addresses, but since I don’t regard e-mail addresses as “links”, I didn’t think to try it!

In fact, though, the method gets not only e-mail addresses (which are returned as “mailto:” links) but “mailto:” links with query extensions and links of other protocol types such as “http://”. The extraneous stuff needs to be weeded out:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on findEmailAddressesIn:theString
	-- Locate all the "links" in the text.
	set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	set theString to current application's NSString's stringWithString:theString
	set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	-- Extract them as link URL strings.
	set URLStrings to (theURLsNSArray's valueForKeyPath:"URL.absoluteString")
	-- Lose any which don't contain "@".
	set emailPredicate to current application's NSPredicate's predicateWithFormat:"self CONTAINS '@'"
	set emailStrings to URLStrings's filteredArrayUsingPredicate:emailPredicate
	-- Join the remainder as a single, return-delimited text.
	set emailAddresses to emailStrings's componentsJoinedByString:(return)
	-- Edit out any instances of "mailto:" or of "?…"
	set emailAddresses to emailAddresses's stringByReplacingOccurrencesOfString:"mailto:|\\?[^\\r]*+" withString:"" options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:emailAddresses's |length|()}
	-- Return as AppleScript text.
	return emailAddresses as text
end findEmailAddressesIn:


set theContent to ""
tell application "Mail"
	repeat with theMessage in (get selection)
		set theContent to theContent & linefeed & (get content of theMessage)
	end repeat
end tell
set emailAddresses to its findEmailAddressesIn:theContent
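The clean-up step in the handler above can be tried on its own. In this sketch the two link strings are hypothetical stand-ins for what the handler has after joining the filtered links with returns; only the replacement line is taken from the script.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical return-delimited link strings, one of them with a query.
set joined to "mailto:fred@aardvark.com?subject=Hello" & return & "mailto:jane@example.org"
set joined to current application's NSString's stringWithString:joined

-- Delete every "mailto:" and every "?" with whatever follows it up to the next return.
set cleaned to joined's stringByReplacingOccurrencesOfString:"mailto:|\\?[^\\r]*+" withString:"" options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:joined's |length|()}
set cleanedList to paragraphs of (cleaned as text)
--> {"fred@aardvark.com", "jane@example.org"}
```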

Here’s another approach:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on findEmailAddressesIn:theString
	-- Locate all the "links" in the text.
	set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	set theString to current application's NSString's stringWithString:theString
	set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	-- Extract email links
	set emailPredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
	set emailURLs to theURLsNSArray's filteredArrayUsingPredicate:emailPredicate
	-- Get just the addresses
	set emailsArray to emailURLs's valueForKeyPath:"URL.resourceSpecifier"
	--eliminate duplicates
	set emailsArray to (current application's NSSet's setWithArray:emailsArray)'s allObjects()
	-- Join the remainder as a single, return-delimited text
	set emailAddresses to emailsArray's componentsJoinedByString:(return)
	-- Return as AppleScript text
	return emailAddresses as text
end findEmailAddressesIn:


set theContent to ""
tell application "Mail"
	repeat with theMessage in (get selection)
		set theContent to theContent & linefeed & (get content of theMessage)
	end repeat
end tell
set emailAddresses to its findEmailAddressesIn:theContent

Here is yet another approach. Apart from the line which gets the text from Mail, it’s a one-liner, and it can function with older MacOS installs, at least as far back as Mavericks. Caveat: it was minimally tested.

tell application "Mail" to set theText to (get selection)'s item 1's content

do shell script "echo " & theText's quoted form & " |  ruby -EBINARY -ne 'puts $_.scan /(\\w+@\\w*[.]\\w*)/' "

I don’t use Mail myself and my test messages happen to be five I still have in it from when John Day sent me them a few years ago precisely for the purpose of extracting e-mail addresses from their contents! I used sed at the time, but I can’t find the script and wouldn’t recommend it now anyway.

Going through the scripts in this thread again this morning:

  1. My initial TIDs script in response to Stefan turns out to be utter rubbish since it uses “.” as a delimiter between what is and what isn’t an e-mail address. Edit: Now corrected above. The script still treats “uucp@localhost” as an e-mail address, though.

  2. My regex script in the same post returns all the right results.

  3. Yvan’s suggestion returns “http:” links too and doesn’t remove query portions from any e-mail links which have them, such as in “mailto:fred@aardvark.com?subject=Hello”.

  4. My modification of that again returns all the right results.

  5. Shane’s suggestion returns only e-mail addresses and eliminates duplicates, but doesn’t remove query portions. I think his check for the URL scheme is more solid than mine for the presence of “@” and anything which looks like a query can be cut before the NSDataDetector link search. If wished, the addresses can be returned in the order found by using an NSOrderedSet instead of an NSSet.

  6. Marc’s script only allows for the presence of one dot in each address — which is better than my TIDs script!
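The one-dot limitation in point 6 can be seen by running both patterns against the same text. This sketch uses NSString’s regex search rather than Ruby, on the assumption that “\w” behaves the same in both for plain ASCII text; the sample address is invented.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical address with a dot before the "@" and several after it.
set sample to current application's NSString's stringWithString:"first.last@mail.example.co.uk"

-- The pattern from the Ruby one-liner: no dots before the "@", at most one after it.
set marcRange to sample's rangeOfString:"\\w+@\\w*[.]\\w*" options:(current application's NSRegularExpressionSearch)
set marcResult to (sample's substringWithRange:marcRange) as text
--> "last@mail.example"

-- The pattern from the regex script earlier in the thread.
set fullRange to sample's rangeOfString:"\\b(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}\\b" options:(current application's NSRegularExpressionSearch)
set fullResult to (sample's substringWithRange:fullRange) as text
--> "first.last@mail.example.co.uk"
```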

My version of Shane’s version of my version of Yvan’s version:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

on findEmailAddressesIn:theString
	set theString to current application's NSString's stringWithString:theString
	-- Cut anything which looks like a link query in the text.
	-- set theString to theString's stringByReplacingOccurrencesOfString:"\\?[^ \"]++" withString:("") options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theString's |length|()}
	-- Or cut any instance of "mailto:", which currently has the same effect on NSDataDetector's identification of e-mail addresses. 
	set theString to theString's stringByReplacingOccurrencesOfString:"mailto:" withString:("")
	-- Locate all the "links" in the text.
	set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	-- Extract email links
	set emailPredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
	set emailURLs to theURLsNSArray's filteredArrayUsingPredicate:emailPredicate
	-- Get just the addresses
	set emailsArray to emailURLs's valueForKeyPath:"URL.resourceSpecifier"
	--eliminate duplicates
	set emailsArray to (current application's NSSet's setWithArray:emailsArray)'s allObjects()
	-- Or: set emailsArray to (current application's NSOrderedSet's orderedSetWithArray:emailsArray)'s array()
	-- Join the remainder as a single, return-delimited text
	set emailAddresses to emailsArray's componentsJoinedByString:(return)
	-- Return as AppleScript text
	return emailAddresses as text
end findEmailAddressesIn:


set theContent to ""
tell application "Mail"
	repeat with theMessage in (get selection)
		set theContent to theContent & linefeed & (get content of theMessage)
	end repeat
end tell
set emailAddresses to its findEmailAddressesIn:theContent
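The difference between the NSSet line and the commented-out NSOrderedSet line in the handler can be shown with a short hypothetical list. The addresses are invented and the variable names are only for this sketch.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical list of found addresses, with one duplicate.
set found to current application's NSArray's arrayWithArray:{"zoe@example.com", "amy@example.com", "zoe@example.com"}

-- NSSet removes the duplicate, but the order of the result isn't guaranteed.
set unordered to ((current application's NSSet's setWithArray:found)'s allObjects()) as list

-- NSOrderedSet removes the duplicate and keeps the order of first appearance.
set ordered to ((current application's NSOrderedSet's orderedSetWithArray:found)'s array()) as list
--> {"zoe@example.com", "amy@example.com"}
```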

Edit: Minor change to the script in the light of Shane’s comment below.

And removing all instances of “mailto:” would also have the same result.

So it does! Any difference in performance would depend on what was in the text and would be minute, but it makes for a simpler line in the script and I don’t imagine the effect would be deliberately changed in the future. I’ve edited my script in post #7 accordingly.

Thank you all very much for your help with this.

I was able to use these answers to get what I needed. 🙂