Pulling All Hyperlinks from Email Text & Storing in a Variable

Is there a way of checking the content of an email and copying all the hyperlinks within the text to a single variable?
I’ve written a script that searches a selected email content and subject for values that I need. However, I am not sure how to approach the hyperlinks.

Hyperlinks in the emails will always start with the https:// prefix, but there may be several of them or none at all.

I want all hyperlinks stored in a single variable so I can drop them into a database using the finished script.

This is my first time using text item delimiters in AppleScript, so this might be a bit messy. The script below already searches for and stores several other values I need to pull from these emails:

set subjectSearch to "-"
set bSearch to "List:"
set qSearch to "Name"
set bkbSearch to "Age"
set qkbSearch to "Birth"
set refSearch to "#"
set theResults to {}

tell application "Mail"
	try
		set theMessages to selection
		if theMessages is {} then
			display alert "No Messages Selected" message "Select the messages you want to collect before running this script."
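			-- Nothing selected: stop here. (Raising error -128 inside this try block would be caught below and shown as a second dialog.)
			return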
		end if
		
		set dateList to {}
		set reportText to ""
		repeat with theMessage in theMessages
			set reportText to reportText & (content of theMessage) as string
		end repeat
		
		set theurls to (content of theMessage) as string
		
		repeat with tMsg in (get selection)
			set end of dateList to the short date string of (get date received of tMsg)
		end repeat
		
		repeat with aMessage in theMessages
			tell aMessage
				set theSubject to subject
				set theContent to content
			end tell
			set refNum to my getFirstWordAfterSearchText(refSearch, theSubject)
			set refB to my getFirstWordAfterSearchText(bSearch, theContent)
			set refQ to my getFirstWordAfterSearchText(qSearch, theContent)
			set refBkb to my getFirstWordAfterSearchText(bkbSearch, theContent)
			set refQkb to my getFirstWordAfterSearchText(qkbSearch, theContent)
		end repeat
	on error theError
		display dialog theError buttons {"OK"} default button 1
		return
	end try
end tell

on getFirstWordAfterSearchText(searchString, theText)
	try
		set {tids, AppleScript's text item delimiters} to {AppleScript's text item delimiters, searchString}
		set textItems to text items of theText
		set AppleScript's text item delimiters to tids
		return (first word of (item 2 of textItems))
	on error theError
		return ""
	end try
end getFirstWordAfterSearchText


return {dateList, refNum, refB, refQ, refBkb, refQkb}



If this isn’t possible - maybe simply copying all text that occurs after a certain word to a variable is simpler? Since the URLs I’m trying to grab always appear at the end of the email - so maybe copy everything that appears after the word “URL’s” ?
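
The fallback idea in the paragraph above (keep everything that follows a marker word) could be sketched with text item delimiters, in the same style as getFirstWordAfterSearchText. This is only an illustration: the handler name textAfterMarker, the sample body and the example.com links are made up, and the marker string has to match the email text exactly, including the kind of apostrophe it uses.

```applescript
-- Hypothetical helper: returns everything after the first occurrence of markerText,
-- or "" when the marker is not present.
on textAfterMarker(markerText, theText)
	set savedDelimiters to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to markerText
		set theItems to text items of theText
		if (count of theItems) < 2 then
			set AppleScript's text item delimiters to savedDelimiters
			return ""
		end if
		-- Rejoin items 2 onward with the marker, in case the marker occurs more than once
		set tailText to (items 2 thru -1 of theItems) as text
		set AppleScript's text item delimiters to savedDelimiters
		return tailText
	on error
		set AppleScript's text item delimiters to savedDelimiters
		return ""
	end try
end textAfterMarker

-- Made-up sample standing in for the content of a message
set sampleBody to "Name Adam" & linefeed & "URL's" & linefeed & "https://example.com/a" & linefeed & "https://example.com/b"
set urlBlock to textAfterMarker("URL's", sampleBody)

-- Keep only the lines that look like https links
set foundLinks to {}
repeat with aLine in paragraphs of urlBlock
	if (aLine as text) begins with "https://" then set end of foundLinks to (aLine as text)
end repeat
return foundLinks
```

Inside a tell application "Mail" block the handler would need to be called as my textAfterMarker(...). This approach only works if the links really do sit after the marker, one per line.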

Hello Adam

You may use the script below as a draft.
You will not be surprised to read that most of the code was borrowed from Shane STANLEY.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

script o
	property theSources : {}
	property theMessages : {}
end script

on findURLsIn:theString
	set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	-- Convert to NSString so that the range length is measured in NSString units
	set theString to current application's NSString's stringWithString:theString
	set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	return (theURLsNSArray's valueForKeyPath:"URL.absoluteString") as list
end findURLsIn:


set linksList to {}
tell application "Mail"
	set o's theMessages to the selection
	repeat with aMessage in o's theMessages
		set end of o's theSources to source of aMessage
	end repeat
end tell
--its findURLsIn:theContent
o's theSources as list

set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)

set theArray to current application's NSMutableArray's array()
repeat with aSource in o's theSources
	set aSource to (current application's NSString's stringWithString:aSource) # Thanks to Nigel GARVEY, who pointed out this omission
	set theURLsNSArray to (theNSDataDetector's matchesInString:aSource options:0 range:{location:0, |length|:aSource's |length|()})
	set anArray to (theURLsNSArray's valueForKeyPath:"URL.absoluteString")
	(theArray's addObject:anArray)
end repeat
# theArray is an array of arrays
set theArray to (theArray's valueForKeyPath:"@unionOfArrays.self")
# drop duplicates and sort the array
set theSet to current application's NSOrderedSet's orderedSetWithArray:theArray
set theArray to (theSet's array())'s sortedArrayUsingSelector:"localizedStandardCompare:"

set thePred to current application's NSPredicate's predicateWithFormat:"self BEGINSWITH 'https:'"
set theArray to theArray's filteredArrayUsingPredicate:thePred
set theNSString to theArray's componentsJoinedByString:linefeed
theNSString as text

Edited according to Nigel GARVEY’s comment

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Tuesday 14 March 2017 13:56:58

As I’m not sure of what you really need, here is an alternate version.
A property lets it return either:
– a list of strings, one per message, each of which is empty or contains the links found in that message, separated by the substring " | "
– a list of lists, each holding the links found in one message.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

property buildListOfStrings : true
# true = return a list of strings, one per message, each built by joining that message's links with the separator " | "
# false = return a list of lists, each holding the links found in one message

script o
	property theMessages : {}
	property theLinks : {}
end script

tell application "Mail"
	set o's theMessages to the selection
	repeat with aMessage in o's theMessages
		set aSource to source of aMessage
		set end of o's theLinks to (my extractLinks:aSource)
	end repeat
end tell

o's theLinks

on extractLinks:aSource
	set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	set aSource to (current application's NSString's stringWithString:aSource) # Thanks to Nigel GARVEY, who pointed out this omission
	set theURLsNSArray to (theNSDataDetector's matchesInString:aSource options:0 range:{location:0, |length|:aSource's |length|()})
	set theArray to (theURLsNSArray's valueForKeyPath:"URL.absoluteString")
	# drop duplicates and sort the array
	set theSet to current application's NSOrderedSet's orderedSetWithArray:theArray
	set theArray to (theSet's array())'s sortedArrayUsingSelector:"localizedStandardCompare:"
	
	set thePred to current application's NSPredicate's predicateWithFormat:"self BEGINSWITH 'https://'"
	set theArray to theArray's filteredArrayUsingPredicate:thePred
	if buildListOfStrings then
		set theNSString to theArray's componentsJoinedByString:" | " # You may define another separator. If you use a comma, you will be unable to tell which link belongs to which message
		return theNSString as text
	else
		return theArray as list
	end if
end extractLinks:

Edited according to Nigel GARVEY’s comment.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Tuesday 14 March 2017 14:25:37

Hi Yvan.

I’m not sure it’s relevant here, but just to point out that ‘length of aSource’ returns the number of characters in the AppleScript text ‘aSource’, whereas the |length| of an NSString is measured in 16-bit (UTF-16) units. These aren’t always the same, so ideally the latter should be used for the range value:

set aSource to current application's NSString's stringWithString:aSource
set theURLsNSArray to (theNSDataDetector's matchesInString:aSource options:0 range:{location:0, |length|:aSource's |length|()})
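
As a small illustration of the difference described above, the sketch below compares the two lengths for a made-up sample string. A character outside the Basic Multilingual Plane, such as an emoji, occupies two 16-bit units in an NSString, so the two counts would be expected to differ whenever the text contains one. The sample text and variable names are assumptions for the example only.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Made-up sample containing one emoji
set sampleText to "See 😀 https://example.com"

-- Length as AppleScript counts it (characters)
set asLength to length of sampleText

-- Length as NSString counts it (16-bit units)
set nsString to current application's NSString's stringWithString:sampleText
set nsLength to nsString's |length|()

-- Compare the two values in the result pane
return {asLength, nsLength}
```

If the AppleScript length were used for the range on such a text, the range could fall short of the end of the NSString, and a link at the very end might be missed or cut.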

What an ass I am.

Thanks Nigel. I knew that but failed to take care of it.

I edited my two posts.

Yvan KOENIG running Sierra 10.12.3 in French (VALLAURIS, France) Tuesday 14 March 2017 15:46:49