I have a small number of lines of text in text documents.
They contain either only email addresses or only order numbers.
What I would like to do is check whether a file contains only email addresses and, if so, sort them alphabetically and remove the duplicates, so that I end up with a list of unique email addresses.
Hi, epaminos.
Using two ready-to-use handlers from Shane Stanley, your task is easy:
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
set theText to "0001 some@gmail.com
12345 kniazidis.rompert@gmail.com
235 someoneOther@gmail.com
403"
-- OR:
-- set textFile to choose file of type "txt"
-- set theText to read textFile as text
-- Extract email addresses
set emailList to paragraphs of (my findEmailAddressesIn:theText)
if emailList is {} then
display dialog "The text doesn't contain any email address"
return
end if
-- Sort the email list alphabetically
set sortedAlphabeticalEmailList to my sortListOfStrings:emailList
-- Remove duplicates:
-- the handler findEmailAddressesIn: already eliminates duplicates,
-- so no additional step is needed
on findEmailAddressesIn:theString
-- Locate all the "links" in the text.
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
set theString to current application's NSString's stringWithString:theString
set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
-- Extract email links
set emailPredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
set emailURLs to theURLsNSArray's filteredArrayUsingPredicate:emailPredicate
-- Get just the addresses
set emailsArray to emailURLs's valueForKeyPath:"URL.resourceSpecifier"
--eliminate duplicates
set emailsArray to (current application's NSSet's setWithArray:emailsArray)'s allObjects()
-- Join the remainder as a single, return-delimited text
set emailAddresses to emailsArray's componentsJoinedByString:(return)
-- Return as AppleScript text
return emailAddresses as text
end findEmailAddressesIn:
on sortListOfStrings:theList
-- convert list to Cocoa array
set theArray to current application's NSArray's arrayWithArray:theList
-- sort the array using a specific function
set theArray to ¬
theArray's sortedArrayUsingSelector:"localizedStandardCompare:"
-- return the sorted array as an AppleScript list
return theArray as list
end sortListOfStrings:
Thank you once more for your instant reply!
Well, that is awesome! My code was much poorer:
on run {input, parameters}
set {aList, aSet} to {input, {}}
ignoring case
repeat with i from 1 to count aList
set anItem to item i of aList
-- keep only the first occurrence of each item
if (anItem is in aSet) then
-- do nothing
else
set end of aSet to anItem
end if
end repeat
end ignoring
set aSet to simple_sort(aSet)
set end of aSet to "
"
return aSet
end run
on simple_sort(my_list)
set the index_list to {}
set the sorted_list to {}
repeat (the number of items in my_list) times
set the low_item to ""
repeat with i from 1 to (number of items in my_list)
if i is not in the index_list then
set this_item to item i of my_list as text
if the low_item is "" then
set the low_item to this_item
set the low_item_index to i
else if this_item comes before the low_item then
set the low_item to this_item
set the low_item_index to i
end if
end if
end repeat
set the end of sorted_list to the low_item
set the end of the index_list to the low_item_index
end repeat
return the sorted_list
end simple_sort
PS: Where I struggled was this: if the input contains only email addresses, sort them alphabetically; but if the input is mixed text, with email addresses and other text as well, extract and sort only the email addresses.
I don’t see any reason to avoid the extraction step. In any case, to determine whether the text contains only emails, you have to extract the emails first.
You could, of course, break theText into text items using the delimiters {space, linefeed} and check one by one whether each item is an email address, but that requires a repeat loop and additional operations. It would only make the script slower, so personally I see no use in that plan. Or perhaps I don't quite understand you and there is some benefit?
So, for fun, I’ll add a check of the file content:
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
set theText to "0001 some@gmail.com
12345 kniazidis.rompert@gmail.com
235 someoneOther@gmail.com
403"
-- OR:
-- set textFile to choose file of type "txt"
-- set theText to read textFile as text
set ATID to AppleScript's text item delimiters
set AppleScript's text item delimiters to {space, return, linefeed}
set theTextItems to text items of theText
set AppleScript's text item delimiters to ATID
-- Extract email addresses
set emailList to paragraphs of (my findEmailAddressesIn:theText)
if emailList is {} then
display dialog "The text doesn't contain any email address"
return
end if
if (count emailList) is (count theTextItems) then
display dialog "The text has only email addresses"
else
display dialog "The text has mixed stuff"
end if
-- Sort the email list alphabetically
set sortedAlphabeticalEmailList to my sortListOfStrings:emailList
-- Remove duplicates:
-- the handler findEmailAddressesIn: already eliminates duplicates,
-- so no additional step is needed
on findEmailAddressesIn:theString
-- Locate all the "links" in the text.
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
set theString to current application's NSString's stringWithString:theString
set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
-- Extract email links
set emailPredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
set emailURLs to theURLsNSArray's filteredArrayUsingPredicate:emailPredicate
-- Get just the addresses
set emailsArray to emailURLs's valueForKeyPath:"URL.resourceSpecifier"
--eliminate duplicates
set emailsArray to (current application's NSSet's setWithArray:emailsArray)'s allObjects()
-- Join the remainder as a single, return-delimited text
set emailAddresses to emailsArray's componentsJoinedByString:(return)
-- Return as AppleScript text
return emailAddresses as text
end findEmailAddressesIn:
on sortListOfStrings:theList
-- convert list to Cocoa array
set theArray to current application's NSArray's arrayWithArray:theList
-- sort the array using a specific function
set theArray to ¬
theArray's sortedArrayUsingSelector:"localizedStandardCompare:"
-- return the sorted array as an AppleScript list
return theArray as list
end sortListOfStrings:
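One caveat with the count comparison above: findEmailAddressesIn: removes duplicates, so a text that holds only email addresses but with repeats (for example, lists pasted from several days) yields fewer emails than text items and is reported as mixed. Consecutive delimiters and a trailing line break also produce empty text items that inflate the count. Below is a minimal sketch of a more tolerant check. It assumes that “only email addresses” means every whitespace-separated word is an address, and the handler name textHasOnlyEmails: is hypothetical:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper (not from the thread): returns true when every
-- whitespace-separated word of theText is an email address.
-- Duplicates are NOT removed before counting, and empty pieces are ignored.
on textHasOnlyEmails:theText
	set theString to current application's NSString's stringWithString:theText
	-- find every mailto: link, duplicates included
	set theDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	set theMatches to theDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set emailPredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
	set emailCount to (theMatches's filteredArrayUsingPredicate:emailPredicate)'s |count|()
	-- split on spaces, tabs and line breaks, then drop the empty pieces
	set theWords to theString's componentsSeparatedByCharactersInSet:(current application's NSCharacterSet's whitespaceAndNewlineCharacterSet())
	set nonEmptyPredicate to current application's NSPredicate's predicateWithFormat:"length > 0"
	set wordCount to (theWords's filteredArrayUsingPredicate:nonEmptyPredicate)'s |count|()
	-- an empty text is not treated as an email list
	return wordCount > 0 and emailCount = wordCount
end textHasOnlyEmails:
```

It could stand in for the `(count emailList) is (count theTextItems)` test, e.g. `if (my textHasOnlyEmails:theText) then`, which also makes the text-item-delimiter step unnecessary.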
To be honest, I used Automator’s “Extract email addresses” action and my aforementioned script.
So although your script works great in Script Editor, and thank you so much for it, I have to figure out why it gives me an error in Automator.
In general, this is something very specific that will help me a lot at my job, which is why I would like to automate it better.
To be more specific:
I have part of a large text (around 300-400 lines) in daily text files (which are created from emails).
I open a specific day’s file, copy the text to the clipboard and run my script; AppleScript then extracts the text of selected paragraphs (those that start with “orders” and their contents, which are the 10 lines underneath).
Then it extracts the email addresses from these selected paragraphs.
Then it removes the duplicates, and I have a list of unique email addresses, one per line and without quotes, which I use to contact the customers about their orders.
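The paragraph-selection step described above can be sketched in plain AppleScript. This is only an illustration under assumptions: the handler name orderBlocksIn: is hypothetical, and a block is taken to be a paragraph starting with “orders” (compared case-insensitively, which is AppleScript's default) plus the 10 paragraphs below it. Overlapping blocks may repeat lines, which is harmless because findEmailAddressesIn: drops duplicates afterwards:

```applescript
use AppleScript version "2.4"
use scripting additions

-- Hypothetical helper: keeps every paragraph that starts with "orders"
-- plus the 10 paragraphs below it, and returns them as one linefeed-delimited text.
on orderBlocksIn:theText
	set theParagraphs to paragraphs of theText
	set paragraphCount to count theParagraphs
	set collected to {}
	repeat with i from 1 to paragraphCount
		if item i of theParagraphs starts with "orders" then
			-- stop at the end of the text if fewer than 10 lines follow
			set lastIndex to i + 10
			if lastIndex > paragraphCount then set lastIndex to paragraphCount
			set collected to collected & (items i thru lastIndex of theParagraphs)
		end if
	end repeat
	-- join the collected paragraphs, restoring the delimiters afterwards
	set ATID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set blockText to collected as text
	set AppleScript's text item delimiters to ATID
	return blockText
end orderBlocksIn:
```

Combined with the handlers from the thread, the whole chain could then read `set emailList to paragraphs of (my findEmailAddressesIn:(my orderBlocksIn:(the clipboard as text)))`.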
But on Mondays I have a list of the email addresses of customers who placed orders during the weekend.
So I have a list of email addresses from Saturday and a different list from Sunday. Unfortunately, many of them are duplicates.
This is why I was dreaming of having one script:
To open Saturday’s file, copy the whole text to the clipboard, run the script, and get the first list.
To open Sunday’s file, copy everything to the clipboard, run the script, and get the second list.
But if I run the same script with input from the clipboard that contains only email addresses (from two or more combined lists), it should unify them.
And this list can cover up to a week or a fortnight, so I hope you understand how much time it can save.
Thank you again for the help, and please feel free to contact me by PM if you would like more clarification that I cannot post here.
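For the weekend case, one option is to merge the daily lists and deduplicate them in a single pass. Below is a minimal sketch with the hypothetical handler mergeEmailLists:. It lowercases every address because NSSet (also used in findEmailAddressesIn:) compares strings case-sensitively and would keep “Some@gmail.com” and “some@gmail.com” as two entries. Strictly speaking, the local part of an address may be case-sensitive, although most mail providers ignore case; drop the lowercasing step if that matters:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper: merges several email lists (e.g. Saturday's and Sunday's),
-- lowercases every address, drops empty lines and duplicates,
-- and returns an alphabetically sorted AppleScript list.
on mergeEmailLists:listOfLists
	-- flatten the lists into one AppleScript list
	set combined to {}
	repeat with aList in listOfLists
		set combined to combined & (contents of aList)
	end repeat
	set theArray to current application's NSArray's arrayWithArray:combined
	-- key-value coding applies lowercaseString to every element
	set loweredArray to theArray's valueForKey:"lowercaseString"
	-- remove empty strings, e.g. from blank lines in the clipboard
	set nonEmptyArray to loweredArray's filteredArrayUsingPredicate:(current application's NSPredicate's predicateWithFormat:"length > 0")
	-- NSSet keeps one copy of each address
	set uniqueArray to (current application's NSSet's setWithArray:nonEmptyArray)'s allObjects()
	set sortedArray to uniqueArray's sortedArrayUsingSelector:"localizedStandardCompare:"
	return sortedArray as list
end mergeEmailLists:
```

When the clipboard already holds the combined lists, one address per line, it could be called as `my mergeEmailLists:{paragraphs of (the clipboard as text)}`.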
Here is my previous script as it should be when used as an Automator service (Quick Action):
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
on run {input, parameters}
set theText to "0001 some@gmail.com
12345 kniazidis.rompert@gmail.com
235 someoneOther@gmail.com
403"
-- OR:
-- set textFile to choose file of type "txt"
-- set theText to read textFile as text
set ATID to AppleScript's text item delimiters
set AppleScript's text item delimiters to {space, return, linefeed}
set theTextItems to text items of theText
set AppleScript's text item delimiters to ATID
-- Extract email addresses
set emailList to paragraphs of (my findEmailAddressesIn:theText)
if emailList is {} then
display dialog "The text doesn't contain any email address"
return
end if
if (count emailList) is (count theTextItems) then
display dialog "The text has only email addresses"
else
display dialog "The text has mixed stuff"
end if
-- Sort the email list alphabetically
set sortedAlphabeticalEmailList to my sortListOfStrings:emailList
-- Remove duplicates:
-- the handler findEmailAddressesIn: already eliminates duplicates,
-- so no additional step is needed
-- Pass the sorted list on to the next Automator action
return sortedAlphabeticalEmailList
end run
on findEmailAddressesIn:theString
-- Locate all the "links" in the text.
set theNSDataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
set theString to current application's NSString's stringWithString:theString
set theURLsNSArray to theNSDataDetector's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
-- Extract email links
set emailPredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
set emailURLs to theURLsNSArray's filteredArrayUsingPredicate:emailPredicate
-- Get just the addresses
set emailsArray to emailURLs's valueForKeyPath:"URL.resourceSpecifier"
--eliminate duplicates
set emailsArray to (current application's NSSet's setWithArray:emailsArray)'s allObjects()
-- Join the remainder as a single, return-delimited text
set emailAddresses to emailsArray's componentsJoinedByString:(return)
-- Return as AppleScript text
return emailAddresses as text
end findEmailAddressesIn:
on sortListOfStrings:theList
-- convert list to Cocoa array
set theArray to current application's NSArray's arrayWithArray:theList
-- sort the array using a specific function
set theArray to ¬
theArray's sortedArrayUsingSelector:"localizedStandardCompare:"
-- return the sorted array as an AppleScript list
return theArray as list
end sortListOfStrings:
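Note that the run handler above still works on the hard-coded sample and ignores `input`. In a Quick Action that receives text, Automator usually hands the selection to the Run AppleScript action as a list in `input`. Here is a small sketch of turning that into a single string. The handler name textFromAutomatorInput: and the clipboard fallback are assumptions, not part of the script above:

```applescript
use AppleScript version "2.4"
use scripting additions

-- Hypothetical helper: joins the items Automator passes in with linefeeds.
-- If nothing was passed in, it falls back to the clipboard (an assumption;
-- "the clipboard as text" raises an error when the clipboard holds no text).
on textFromAutomatorInput:theInput
	set ATID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set theText to theInput as text
	set AppleScript's text item delimiters to ATID
	if theText is "" then set theText to (the clipboard as text)
	return theText
end textFromAutomatorInput:
```

Inside the run handler, `set theText to my textFromAutomatorInput:input` would then replace the hard-coded assignment.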
NOTE: you can get the text from the clipboard. Change
set theText to "0001 some@gmail.com
12345 kniazidis.rompert@gmail.com
235 someoneOther@gmail.com
403"