I’ll admit upfront that I am a newbie to AppleScript, so this may seem basic to some of you. My issue is this: I am in charge of an account which sends me orders for business cards on a daily basis. These orders come to me via Mail. I have attached a sample of one of these emails. I would like a script which would strip out the extra information, such as DB_First_Name:, etc., and leave me with just the basic information (name, tel., etc.) saved as a text file, which could then be flowed into a template set up in InDesign CS3. Right now I can set Mail up to save the email to a folder on my desktop as rich text; at that point I would like to be able to run the script to edit the text file. I had a script which did this in OS 9 using Outlook Express, but since upgrading to OS X and using Mail I have not been able to revise the script to make it work. Any assistance would be much appreciated. Let me know if more information is necessary, and thanks for reading.
Given that you’ve saved as RTF rather than plain text, TextEdit is required to read the file (otherwise we could read it directly into an AppleScript).
What you need is something like this to get you started:
tell application "TextEdit"
set F to open alias "ACB-G5_1:Users:bellac:Desktop:DBStuff.rtf"
set P to paragraphs of document 1
end tell
considering case
repeat with aP in P
if aP begins with "DB_Last_Name:" then
set N to rest of words of contents of aP
else if aP begins with "DB_Phone1:" then
set T to rest of words of contents of aP
end if
end repeat
end considering
set Out to N & return & T as Unicode text
set NF to open for access ((path to desktop as text) & "DBOut.txt") with write permission
try
set eof of NF to 0
write Out to NF as Unicode text
close access NF
on error
close access NF
end try
Thanks for the response! When I try to run this script, I get an error saying the document doesn’t exist. I have modified the script you sent to reflect the path to the folder on my desktop, but it doesn’t see the file. Any ideas?
Is it really an RTF file? Can you open it with TextEdit, for example?
And yes, if you don’t need any text formatting then it’s easier. For example:
set F to read (choose file)
set P to paragraphs of F
considering case
repeat with aP in P
if aP begins with "DB_Last_Name: " then
set N to word -1 of contents of aP
else if aP begins with "DB_Phone1: " then
set T to word -1 of contents of aP
else if aP begins with "Account_Unit: " then
set Acct to word -1 of aP
end if
end repeat
end considering
set Out to N & return & T & return & Acct as Unicode text
set NF to open for access ((path to desktop as text) & "DBOut.txt") with write permission
try
set eof of NF to 0
write Out to NF as Unicode text
close access NF
on error
close access NF
end try
I’ve interpreted the problem as being that you want to lose the labels and the separating white space and just keep the values, or empty lines where there are no values.
This works for me with both plain text and RTF source files:
set sourceFile to (choose file)
set sourceFileName to name of (info for sourceFile)
if (sourceFileName ends with ".txt") then
set theParas to paragraphs of (read sourceFile)
else if (sourceFileName ends with ".rtf") then
tell application "System Events" to set TEWasOpen to (application process "TextEdit" exists)
tell application "TextEdit"
open sourceFile
set theParas to paragraphs of (text of front document as string)
if (TEWasOpen) then
close front document
else
quit
end if
end tell
else
error "The file must have a suitable \".txt\" or \".rtf\" name extension."
end if
set editedParas to {}
set whitespace to space & tab & (ASCII character 202)
repeat with thisPara in theParas
set paraLen to thisPara's length
if (paraLen is 0) then
set end of editedParas to ""
else
set afterWhitespace to (character 1 of thisPara is in whitespace)
repeat with i from 2 to paraLen
if (character i of thisPara is in whitespace) then
set afterWhitespace to true
else if (afterWhitespace) then
set end of editedParas to (text i thru paraLen of thisPara)
exit repeat
end if
end repeat
if ((i is paraLen) and (character i of thisPara is in whitespace)) or (not afterWhitespace) then set end of editedParas to ""
end if
end repeat
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 10
set editedText to editedParas as string
set AppleScript's text item delimiters to astid
set destinationPath to sourceFile as Unicode text
set destinationPath to text 1 thru -5 of destinationPath & " (edited).txt"
set fref to (open for access file destinationPath with write permission)
try
set eof fref to 0
write editedText as string to fref
end try
close access fref
Really appreciate the effort here, folks. The error I get now when I try to run this script is: Can’t get name of “Document 1.rtf”. I believe I have the path named correctly; what am I doing wrong?
Thanks again!
Emery
Here’s the script as I have modified it for my machine:
set sourceFile to "Mac HD:Desktop:Daily Prov:Document 1.rtf"
set sourceFileName to name of "Document 1.rtf"
if (sourceFileName ends with ".txt") then
set theParas to paragraphs of (read sourceFile)
else if (sourceFileName ends with ".rtf") then
tell application "System Events" to set TEWasOpen to (application process "TextEdit" exists)
tell application "TextEdit"
open sourceFile
set theParas to paragraphs of (text of front document as string)
if (TEWasOpen) then
close front document
else
quit
end if
end tell
else
error "The file must have a suitable \".txt\" or \".rtf\" name extension."
end if
set editedParas to {}
set whitespace to space & tab & (ASCII character 202)
repeat with thisPara in theParas
set paraLen to thisPara's length
if (paraLen is 0) then
set end of editedParas to ""
else
set afterWhitespace to (character 1 of thisPara is in whitespace)
repeat with i from 2 to paraLen
if (character i of thisPara is in whitespace) then
set afterWhitespace to true
else if (afterWhitespace) then
set end of editedParas to (text i thru paraLen of thisPara)
exit repeat
end if
end repeat
if ((i is paraLen) and (character i of thisPara is in whitespace)) or (not afterWhitespace) then set end of editedParas to ""
end if
end repeat
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 10
set editedText to editedParas as string
set AppleScript's text item delimiters to astid
set destinationPath to sourceFile as Unicode text
set destinationPath to text 1 thru -5 of destinationPath & " (edited).txt"
set fref to (open for access file destinationPath with write permission)
try
set eof fref to 0
write editedText as string to fref
end try
close access fref
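A likely cause of the “Can’t get name” error, judging only from the two modified lines at the top of that script: sourceFile is now a plain text string rather than the alias that (choose file) returned, and name of "Document 1.rtf" asks a piece of text for a name property it doesn’t have. A minimal sketch of one way to hard-code the file instead, assuming the path in the post really is where the file lives (the path is copied from the post and may need the full Users:...:Desktop: prefix on your machine):

```applescript
-- Assumption: this HFS path is copied from the post; adjust it to the file's real location.
-- The word "alias" turns the text into a file reference, which is what (choose file) returned.
set sourceFile to alias "Mac HD:Desktop:Daily Prov:Document 1.rtf"

-- Ask the file system for the file's name, exactly as the original script did.
set sourceFileName to name of (info for sourceFile)
```

With those two lines in place of the modified ones, the rest of the script should be able to stay as it was, since read, TextEdit’s open, and the “as Unicode text” coercion all accept an alias. If the alias line itself errors, the path is probably wrong; temporarily putting (choose file) back and looking at the result is a quick way to see the correct path.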
Now, for the next challenge: since I receive multiple orders, I can’t save them all as Document 1, because each new file would overwrite the previous one saved as Document 1. Can the script be set up so that the document name is a variable, such as Document X, and each individual file is processed one at a time? Then, since the emails come with a first name/last name designation, maybe the text file could be saved under that name?
I can’t tell you how much I appreciate the help you’ve given so far, thanks again!
Here’s the progress I’ve made. I have found a script which will export the individual emails from a folder in Mail, then save them as separate files with different names in a folder on my desktop. From here, I use the basic script for shortening file names, so the document name is now “PHSbcform1.htm.rtf”. However, since there is more than one document in this folder, the export script has added “ (1)”, “ (2)”, etc., after “htm”. The original script in this thread, which strips the extra info out of the text file, works great for the file named “PHSbcform1.htm.rtf”. What I need now is to alter this script so that it can process files with a variable part in the name. Is this possible? Thanks again for the help, much appreciated.
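One way to handle file names that vary is not to name the files at all, but to loop over whatever is in the folder. The following is only a sketch under some assumptions: processOrder is a made-up handler name, its body is a stand-in for the per-file editing code posted earlier, and the folder is picked by hand rather than hard-coded.

```applescript
-- Hypothetical driver: run the label-stripping code once per saved email.
set sourceFolder to choose folder with prompt "Choose the folder of saved order emails:"

set rtfFiles to {}
tell application "Finder"
	-- Pick up every .rtf file in the folder, whatever " (1)", " (2)" suffix it carries.
	set matches to every file of sourceFolder whose name extension is "rtf"
	repeat with thisFile in matches
		set end of rtfFiles to (thisFile as alias)
	end repeat
end tell

set processedNames to {}
repeat with sourceFile in rtfFiles
	set end of processedNames to processOrder(contents of sourceFile)
end repeat
return processedNames -- list of the file names that were visited

-- processOrder is an illustrative name, not part of any script posted above.
on processOrder(sourceFile)
	set sourceFileName to name of (info for sourceFile)
	-- The per-file editing code (from the name extension check down to
	-- "close access fref") would go here, using sourceFile and sourceFileName.
	return sourceFileName
end processOrder
```

Because the earlier script builds its output name from the source file’s own path, each input would get its own “ (edited).txt” file and nothing should be overwritten.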
The whole procedure to extract the emails seems a bit complicated.
I would read the data directly from the mails; then you could even attach the script to a mail rule,
so that everything works automatically.
Edit:
Here is a different approach that extracts the values and strips off the whitespace with a shell command.
Select one or more mails in Mail.app and run the script. The text files will be created on the desktop, named with the sender and subject of the mail.
property CR : ASCII character 13
tell application "Mail" to set sel to selection
repeat with oneMail in sel
tell application "Mail" to tell oneMail to set {theContent, theSubject, theSender} to {paragraphs of content, subject, extract name from sender}
set theLines to {}
set {TID, text item delimiters} to {text item delimiters, ":"}
repeat with i in theContent
if i contains "Special_Instructions" then exit repeat
try
set str to do shell script "echo " & quoted form of text item 2 of i & " | strings"
if str begins with space then set str to text 2 thru -1 of str
if str contains CR then
set offs to offset of CR in str
set str to text 1 thru (offs - 1) of str & tab & text (offs + 1) thru -1 of str
end if
set end of theLines to str
end try
end repeat
set text item delimiters to ASCII character 10
set editedText to theLines as string
set text item delimiters to TID
set destinationPath to ((path to desktop as Unicode text) & theSender & "_" & theSubject & ".txt")
set fref to (open for access file destinationPath with write permission)
try
set eof fref to 0
write editedText as string to fref
end try
close access fref
end repeat
Besides taking nearly three times as long as the vanilla and deliberately not handling the “Special_Instructions:” line, your shell script method leaves out the “D1:” and “Shipto_State:” results when I try it. (The shell script returns “” for those lines.)
(Tested by replacing the paragraph-editing process in my script with that from yours and matching the variable names. Both versions tested on the same file, derived from Emery’s example in post #1.)
They’re not unreliable, Stefan, but on my machine, for example (see sig), starting a new process for a shell call takes nearly 50 ms. I try to avoid using them inside a loop for that reason, because each cycle bears that overhead. While not a rigorous test, this is what I used to determine that:
set ProcTime to "perl -e 'use Time::HiRes qw(time); print time'"
set rep to 100
repeat 10 times -- get the pumps primed
do shell script ProcTime
end repeat
set proct to 0
set t1 to GetMilliSec
repeat rep times
set strt to do shell script ProcTime
do shell script "echo ''"
set proct to proct + (do shell script ProcTime) - strt
end repeat
set tot to ((GetMilliSec) - t1) / 1000
set shellCost to (tot - proct) / rep
Looking at that on my Jaguar machine this evening, I see that ‘strings’ only returns strings that have four or more printable characters, unless a lower number is specified as an option:
set str to do shell script "echo " & quoted form of text item 2 of i & " | strings -2" -- or: " | strings -1"
If it’s also true in Tiger, that could be why the values “23” and “OR” were omitted when I tried your method this morning! The other “OR” could have survived because it’s bounded in my file by a couple of spaces, but I can’t check that till I get back to my other machine.
In the Jaguar implementation of ‘strings’, spaces count towards strings, but tabs don’t. The use of the command in this context relies on the white space after the labels either not containing spaces (option “-1”) or not containing consecutive spaces (option “-2”).
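For anyone who wants to check the minimum-length behaviour on their own system, here is a small test along the lines described above. It is only a sketch: “OR” and “23” are the values mentioned in this thread, while “Portland” is an invented longer value added for contrast.

```applescript
-- Three sample values, one per line; "Portland" is an invented example value.
set LF to ASCII character 10
set sample to "OR" & LF & "23" & LF & "Portland"

-- Default 'strings': expected to keep only runs of four or more printable characters.
set defaultResult to do shell script "echo " & quoted form of sample & " | strings"

-- With -2, runs of two or more characters should be kept as well.
set shortResult to do shell script "echo " & quoted form of sample & " | strings -2"

return {defaultResult, shortResult}
```

If the behaviour is the same as on Jaguar, the first item should contain only the longer value and the second should contain all three. If both items come back identical, the local ‘strings’ behaves differently and the omissions have some other cause.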