I’m a recreational coder (sad, but true), a “nooby” to this BBS, and I’ve recently run into some difficulty transferring text data from email to a database. I’d appreciate some advice. I suspect the problem isn’t too complex … it’s just beyond me.
I am using Mail.app (2.0.5) and FileMaker Pro (7.0v1) under Mac OS X 10.4.3, and Entourage (9.0) and FileMaker Pro (5.0v3) under Mac OS 9.2.2.
I use AppleScript to transfer data from emails to a FileMaker database for ‘tidy-up’ and then on to a MySQL database for web access. My desktop Mac-side automation has been AppleScript (1.10.3) and my remote Linux-side automation has been mainly PHP. This has all been working fine with Entourage (9.0) under OS 9 and Eudora (6.2.3) under OS X. But when I tried to switch to Mail.app (2.0.5), partly because of the better range of AppleScript commands available in its dictionary, I seem to have lost the line-breaks.
I use the following basic approach:
tell application "Mail"
....
set MessageWholeText to source of last message in mailbox MailboxForArchiving
....
end tell
tell application "FileMaker Pro"
....
set field "MessageWholeText" of last record to MessageWholeText
....
end tell
The text that ends up in the database field has no line-breaks; it’s just a blob of space-separated text. This applies whether I’m extracting the headers, the content, or the source of the email. I’ve tried coercing the text ‘as string’ and ‘as unicode text’, with no apparent effect. However, when I interrupt the script and ‘return’ the variable (e.g. MessageWholeText above) within Script Editor (2.1.1(81)), it looks fine, with all the line-breaks exactly where they appear in the original email.
This did / does not happen in Eudora or Entourage under OS9.2.2 (although I’ve not succeeded in getting Entourage 11.1.0 to run under OSX … but that’s another saga altogether).
My guess, and it’s only a guess, is that whatever ‘character’ Mail.app uses for those line-breaks is fine for Script Editor but doesn’t work the same way in FileMaker … and that Eudora and Entourage use a different method to mark a line-break or new paragraph (or whatever). BUT if I simply copy the text from an open email in Mail.app and paste it into my FileMaker fields, the line-breaks all come through fine.
Oh gurus who know a lot more than me about this sort of thing …
1. Is this the likely cause of my troubles?
2. If not, what is?
3. What can / should I do to fix this? I haven’t tried writing the AppleScript to use the clipboard to transfer the text (maybe I should), and I haven’t tried moving it through another application, perhaps a text editor or word processor, first … that seems too messy.
You may be on the right track, Dougal, although I can’t test the whole thing for you at the moment. However, Mail returns linefeeds (ASCII character 10) in a message’s source, whereas text copied directly from the message comes through with returns (ASCII character 13). So you could try inserting this between your Mail and FileMaker Pro tell blocks (outside the tell blocks themselves):
set text item delimiters to return
tell MessageWholeText's paragraphs to set MessageWholeText to beginning & ({""} & rest)
set text item delimiters to {""}
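For anyone who'd like to see the same conversion outside AppleScript, here's a minimal sketch in Python (purely illustrative; `lf_to_cr` is a hypothetical name, not part of any Mail or FileMaker API). Python's `str.splitlines()` plays roughly the role of AppleScript's `paragraphs`, and `"\r".join(...)` plays the role of coercing the list to text with the text item delimiters set to `return`:

```python
def lf_to_cr(text: str) -> str:
    """Rejoin every line with a carriage return (ASCII 13).

    str.splitlines() breaks on LF, CR and CRLF alike (plus a few rarer
    separators such as form feed) and drops the line-end characters.
    """
    return "\r".join(text.splitlines())


source = "abc\ndefg\nhijkl"  # linefeed-separated, like a Mail message's source
print(repr(lf_to_cr(source)))  # 'abc\rdefg\rhijkl'
```

One caveat with this sketch: `splitlines()` drops a single trailing line break, so `"abc\n"` comes back as `"abc"`.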
It’s gonna take me a little while to decipher that snippet of script and understand what it’s actually doing … but my transfer script is now working smoothly, with the snippet modified to handle all the text variables and inserted between the Mail.app and FileMaker blocks.
Maybe a brief analysis of what actually happens here might help to clarify things a little, Dougal.
A step-by-step version of the code (to which I’ve added some comments and demo tests) might have looked something like this:
to checkLineEnds(text_to_check) -- for demo only
{ASCII_13:text_to_check contains (ASCII character 13), ASCII_10:text_to_check contains (ASCII character 10)}
end checkLineEnds
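set MessageWholeText to "abc" & (ASCII character 10) & "defg" & (ASCII character 10) & "hijkl"
(* linefeed-separated, like a Mail message's source - also helps demonstrate results *)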
set original_check to {original_text:checkLineEnds(MessageWholeText)} -- again, to demonstrate results later
set MessageWholeList to every paragraph of MessageWholeText
-- creates an AppleScript list comprised of all the paragraphs from MessageWholeText
--> {"abc", "defg", "hijkl"}
(* "every paragraph", or "paragraphs", is line-ending-agnostic. It treats Mac-, Windows-, and Unix-style line breaks equivalently *)
set AppleScript's text item delimiters to return
-- if the list is now coerced to text, each item from the list will be rejoined - separated by a return character (ASCII character 13)
if class of MessageWholeText is Unicode text then
set MessageWholeText to MessageWholeList as Unicode text
else
set MessageWholeText to MessageWholeList as text
end if
-- this should preserve any encoding from the original text
--> "abc
-- defg
-- hijkl"
set AppleScript's text item delimiters to {""}
-- restore initial value (or 'default', if you like) of AppleScript's text item delimiters
original_check & {converted_text:checkLineEnds(MessageWholeText)} -- demo only
--> {original_text:{ASCII_13:false, ASCII_10:true}, converted_text:{ASCII_13:true, ASCII_10:false}}
The result demonstrates that all ASCII 10 (linefeed) characters have now been replaced with ASCII 13 (return) characters, which FileMaker Pro evidently prefers.
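The `checkLineEnds` handler above only reports whether each character is present. If you'd like a slightly finer-grained diagnostic, here's a rough Python equivalent (an illustration only; `line_end_counts` is a made-up helper) that also distinguishes Windows-style CRLF pairs from lone returns and lone linefeeds:

```python
def line_end_counts(text: str) -> dict:
    """Count CRLF pairs, lone CRs (ASCII 13) and lone LFs (ASCII 10)."""
    crlf = text.count("\r\n")
    return {
        "CRLF": crlf,
        "CR": text.count("\r") - crlf,  # returns not followed by a linefeed
        "LF": text.count("\n") - crlf,  # linefeeds not preceded by a return
    }


original = "abc\ndefg\nhijkl"
converted = "\r".join(original.splitlines())
print(line_end_counts(original))   # {'CRLF': 0, 'CR': 0, 'LF': 2}
print(line_end_counts(converted))  # {'CRLF': 0, 'CR': 2, 'LF': 0}
```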
Now let’s take another look at the code I actually suggested:
set text item delimiters to return
tell MessageWholeText's paragraphs to set MessageWholeText to beginning & ({""} & rest)
set text item delimiters to {""}
This also sets and restores AppleScript’s text item delimiters in a similar way to the example above - and so the rest of the action is really concentrated in that middle line.
First, the tell statement…
This merely avoids having to store the list of MessageWholeText’s paragraphs in a variable (such as MessageWholeList, as we did above). Within the tell block, we can refer to most of the list’s properties and elements implicitly, without explicitly referring back to the list. Take, for example, this fairly standard syntax:
set pet_list to {"crocodile", "dog", "cat", "mouse", "canary"}
set text item delimiters to "/"
set dialog_message to "Of the " & length of pet_list & " pets in this room, I wonder if the " & pet_list's item 1 & " might worry the others (" & items 2 thru -1 of pet_list & ")?"
set text item delimiters to {""}
display dialog dialog_message
--> "Of the 5 pets in this room, I wonder if the crocodile might worry the others (dog/cat/mouse/canary)?"
Let’s now try that using the ‘tell…’ variant - but this time, we’ll also use the reserved word beginning in place of item 1, and the list’s rest property instead of items 2 thru -1.
set text item delimiters to "/"
tell {"crocodile", "dog", "cat", "mouse", "canary"} to set dialog_message to "Of the " & length & " pets in this room, I wonder if the " & beginning & " might worry the others (" & rest & ")?"
set text item delimiters to {""}
display dialog dialog_message
--> "Of the 5 pets in this room, I wonder if the crocodile might worry the others (dog/cat/mouse/canary)?"
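The `beginning`/`rest` split has a direct counterpart in Python's sequence slicing: `pets[0]` is the first item and `pets[1:]` is everything after it, while `"/".join(...)` stands in for coercing a list to text with the delimiters set to `"/"`. A rough sketch (the `pet_message` name is just for illustration):

```python
from typing import List


def pet_message(pets: List[str]) -> str:
    first, rest = pets[0], pets[1:]  # like 'beginning' and 'rest'
    return (f"Of the {len(pets)} pets in this room, I wonder if the {first} "
            f"might worry the others ({'/'.join(rest)})?")


print(pet_message(["crocodile", "dog", "cat", "mouse", "canary"]))
```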
And now for the set MessageWholeText to beginning & ({""} & rest) part…
Remember how, earlier, we used an if/then/else block to check for Unicode text? And how we then coerced the list of paragraphs to either Unicode text or plain text?
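Well, we can actually achieve all that in a single hit, by taking advantage of AppleScript’s string concatenation rules. Here’s the bottom line: when the item to the left of the & operator is a string (or Unicode text) and the item to the right is a list, AppleScript coerces the list to the left-hand item’s class (separating the listed items with the current text item delimiters) before concatenating - so the result keeps the class of the left-hand item.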
So let’s see what happens when we apply this principle to a simple list:
set pet_status_list to {"crocodile (content)", "dog (missing)", "cat (missing)", "mouse (missing)", "canary (missing)"}
set text item delimiters to return -- this dictates what character or string will separate the listed items after coercion to string/Unicode text
set status_report to pet_status_list's beginning & rest of pet_status_list
set text item delimiters to {""} -- restore TIDs to initial value
status_report
--> "crocodile (content)dog (missing)
-- cat (missing)
-- mouse (missing)
-- canary (missing)"
Fine.
Um… apart, that is, from the first line. Why have the first and second items been joined together - with no separating return character?
Well, what happened was this: AppleScript was presented with a string (or Unicode text) to the left of the operator, and a list to the right of it…
… so it duly coerced the list to a string (or Unicode text)…
"crocodile (content)" & "dog (missing)
cat (missing)
mouse (missing)
canary (missing)"
… and concatenated the result:
"crocodile (content)dog (missing)
cat (missing)
mouse (missing)
canary (missing)"
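The same trap is easy to reproduce in Python, for what it's worth. In this sketch (the names are illustrative), joining only the rest of the list and then gluing the first item on the front leaves no separator between the first two items, just as in the AppleScript result above:

```python
statuses = ["crocodile (content)", "dog (missing)", "cat (missing)",
            "mouse (missing)", "canary (missing)"]

# The separator only goes *between* the items being joined...
joined_rest = "\r".join(statuses[1:])
# ...so nothing separates the first item from the joined remainder.
report = statuses[0] + joined_rest
print(report.split("\r")[0])  # crocodile (content)dog (missing)
```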
So - to insert an additional separator between the string (or Unicode text) and the list, we need to first add an extra, empty item to the list itself:
--> {"", "dog (missing)", "cat (missing)", "mouse (missing)", "canary (missing)"}
To make sure that’s evaluated first, put it into parentheses - and then concatenate the result with the original string/Unicode text:
set text item delimiters to return
tell {"crocodile (content)", "dog (missing)", "cat (missing)", "mouse (missing)", "canary (missing)"} to set status_report to beginning & ({""} & rest)
set text item delimiters to {""} -- restore TIDs to initial value
status_report
--> "crocodile (content)
-- dog (missing)
-- cat (missing)
-- mouse (missing)
-- canary (missing)"
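In Python terms (again, just an illustrative sketch), prepending an empty string to the tail before joining is what supplies the missing separator, the same job `{""} & rest` does above:

```python
statuses = ["crocodile (content)", "dog (missing)", "cat (missing)",
            "mouse (missing)", "canary (missing)"]

head, rest = statuses[0], statuses[1:]
# The leading "" makes the joined tail start with "\r",
# which then separates the head from the second item.
report = head + "\r".join([""] + rest)
print(report.split("\r"))
```

Of course, in Python `"\r".join(statuses)` gives the same result directly; the point of the AppleScript idiom is that the left-hand item dictates the class of the result.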
I hope that all (sort of) makes sense - but please yell if I’ve just added to the confusion…
Very pleased to hear it, Dougal.
Did I say “brief analysis” earlier? Oops - sorry folks! :rolleyes:
The explanation, while perhaps redefining the scope of “brief”, was clear and very helpful. I think I now understand what that snippet of script does, especially the “& ({""} & rest)” part, which had thrown me completely.
I studied the tutorial provided by Kai and am puzzled why he fusses with the first paragraph when substituting ASCII 13s for all the paragraph-ending ASCII 10s:
set MessageWholeText to beginning & ({""} & rest)
Couldn’t he have accomplished the substitutions more simply by doing the following? (Here’s my entire solution):
set MessageWholeText to "abc
defg
hijkl"--text lines with ASCII 10 endings
set thesePars to paragraphs of MessageWholeText --create list of all paragraphs (excluding the ASCII 10 character at the ends)
set text item delimiters to return --in effect create new end-of paragraph character using ASCII 13s
return thesePars as text --return output
set text item delimiters to "" --replace the default delimiter setting
By the way, I found the discussion of ambiguous end-of-paragraph ASCII characters informative. I didn’t know that some of Apple’s text windows use ASCII character 10. I also found Kai’s very succinct test modules very interesting. Finally, I am looking for a way to use regular expressions with AppleScript strings that doesn’t involve downloading new APIs. I would like to use Unix tools like grep; however, grep seems to operate only on text files, not strings.
If you’re already sure that the source text is of class ‘string’, then using a string/text coercion is fine, Antony. (However, I wouldn’t advise the <return thesePars as text> in that position, since it will skip the line that resets AppleScript’s text item delimiters.)
Similarly, if you know that the source text is of class ‘Unicode text’, then the normal coercion would be to Unicode text.
Then again, if you’re unsure what class the source text is likely to be, and you want to preserve any original encoding, you’d normally need to check the incoming class and use the appropriate coercion. To do this, you could use an if/then statement (as I did in an earlier example above). Alternatively, you could use the <beginning & ({""} & rest)> approach I suggested - which, in situations like this, is faster than using an if/then block. (This is touched on in the section above that starts: “And now for the set MessageWholeText to beginning & ({""} & rest) part…”)
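Python's `bytes` and `str` types offer a loose analogue of the string/Unicode text distinction, and they make a fair illustration of the if/then approach: check the incoming type, then join with a separator of the same type so the result keeps the original's class. A minimal sketch, with `normalize_to_cr` as a hypothetical name:

```python
from typing import Union


def normalize_to_cr(text: Union[str, bytes]) -> Union[str, bytes]:
    """Rejoin lines with CR, returning the same type (str or bytes) as the input."""
    # Choose a separator of the incoming type, much like choosing between
    # 'as Unicode text' and 'as text' in the if/then version above.
    separator = "\r" if isinstance(text, str) else b"\r"
    return separator.join(text.splitlines())


print(repr(normalize_to_cr("abc\ndefg")))   # 'abc\rdefg'
print(repr(normalize_to_cr(b"abc\ndefg")))  # b'abc\rdefg'
```

The analogy only goes so far: Python raises a `TypeError` rather than silently coercing when `str` and `bytes` are mixed, so there's no direct equivalent of the concatenation shortcut.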
Not sure I’d agree that attending to such details is really “fussing”, though.
Kai, thanks for the prompt reply and generous expertise. I now have a better idea of what you were trying to do. The “fussing” is really about preserving whichever class of text (Unicode or plain) the source text had. I suppose creating a list of text parts converts all text types to plain text. This is the subtle information I really should know. I have yet to encounter Unicode text, though this is probably because all my work is with domestic (USA) text, so I glossed over your careful explanation regarding Unicode preservation.
Just a couple of additional points, if I may. Actually, creating a list from text (whether listing characters, words, paragraphs or text items) should preserve any encoding present in the original text. For example:
set wordList to words of ("Just a few words." as Unicode text)
set classList to {}
repeat with currentWord in wordList
set classList's end to currentWord's class
end repeat
{wordList:wordList, classList:classList}
--> {wordList:{"Just", "a", "few", "words"}, classList:{Unicode text, Unicode text, Unicode text, Unicode text}}
Where encoding may be lost is during coercion to text - either explicitly:
(* assumes tids are set to {""} *)
set wordList to words of ("Just a few words." as Unicode text)
wordList as string
{result, result's class}
--> {"Justafewwords", string}
… or implicitly, through (for instance) concatenation with plain text:
(* assumes tids are set to {""} *)
set wordList to words of ("Just a few words." as Unicode text)
"some plain text: " & wordList
{result, result's class}
--> {"some plain text: Justafewwords", string}
Incidentally, you’re probably using Unicode text more than you might imagine. Here are just a few examples:
tell application "Finder" to name of (path to applications folder)
{result, result's class}
--> {"Applications", Unicode text}
text returned of (display dialog "Demonstration." default answer "sample text" giving up after 1)
{result, result's class}
--> {"sample text", Unicode text}
Thanks for the additional examples. I see that Unicode text is pretty much the standard output type of AppleScript’s built-in objects (although not the standard output when parsing text, unless specified). I have recently read that this is part of an ongoing evolution of the Apple OS towards a more universal standard.