Collect information from mail body

Hello Everyone,

I am having trouble collecting a bunch of information from an incoming mail!

I intend to use the collected information to create another email that will be a translation of the gathered information…

Here is part of an example of the received mail. If I can figure out where I made a mistake, I can finish my whole mail script…

Please find a new inquiry with file name: ZWMi1 Paper Post-it memo pad Tough I attached to this mail XXXXX

Our Description Post-it memo pad
Material(s): Paper
2mm hard-cover, with post-its in neon-colours like in picture below.
5x colour post-it strips/ please see picture for sizes/ 25 sheets
1x smaller post-it paper note/ about 50x70mm/ 25 sheets/ white
1x big post-it paper note /about 100x70mm/ 100 sheets/ white

Please find the temporary item number: PMP PAP 001

Quantity 1K or 5K

Here is my script, which sadly does not return anything :(



tell application "Mail"
	set msg to selection
	try
		set msgcontent to content of msg
		set msgid to message id of msg
		set {originalSubject, originalProductName, originalMaterial, originalExtraDescription, originalQuantity, originalSizes, originalBodyColor, originalLogo, originalLogoDetails, originalAccessory, originalAccessoryInformation, originalDelivery, originalShipmentTo, originalSafetyTest} to my parseMsg(msgcontent)
		my createEvent(originalSubject, originalProductName, originalMaterial, originalExtraDescription, originalQuantity, originalSizes, originalBodyColor, originalBodyColor, originalLogo, originalLogoDetails, originalAccessory, originalAccessoryInformation, originalDelivery, originalShipmentTo, originalSafetyTest)
	end try
end tell

-- Parse the email content to extract the inquiry details.
on parseMsg(msgcontent)
	set originalSubject to extractBetween(msgcontent, "Please find a new inquiry with file name:  ", "I attached to this mail the RLPS to be filed in the proper area in kiki.")
	set originalProductName to extractBetween(msgcontent, "Our Description  ", "Material(s): ")
	set originalMaterial to extractBetween(msgcontent, "Material(s): ", " ")
	set originalExtraDescription to extractBetween(msgcontent, originalMaterial, "Please find the temporary item number:  ")
	set originalQuantity to extractBetween(msgcontent, "Quantity  ", "Size  ")
	set originalSizes to extractBetween(msgcontent, "Size  ", "Body color  ")
	set originalBodyColor to extractBetween(msgcontent, "Body color  ", "Logo  ")
	set originalLogo to extractBetween(msgcontent, "Logo  ", "Logo position  ")
	set originalLogoDetails to extractBetween(msgcontent, "Logo position  ", "Accessory  ")
	set originalAccessory to extractBetween(msgcontent, "Accessory  ", "Extra information")
	set originalAccessoryInformation to extractBetween(msgcontent, "Extra information", "Approximate delivery time: ")
	set originalDelivery to extractBetween(msgcontent, "Approximate delivery time: ", "Place of delivery: ")
	set originalShipmentTo to extractBetween(msgcontent, "Client Area: ", "Client Sea Port: ")
	set originalSafetyTest to extractBetween(msgcontent, "Safety Test: ", "Type of product: ")
	
	return {originalSubject, originalProductName, originalMaterial, originalExtraDescription, originalQuantity, originalSizes, originalBodyColor, originalLogo, originalLogoDetails, originalAccessory, originalAccessoryInformation, originalDelivery, originalShipmentTo, originalSafetyTest}
end parseMsg


-- Extract the substring from between two strings
to extractBetween(theString, startText, endText)
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startText
	set startComps to text items of theString
	set AppleScript's text item delimiters to endText
	set endComps to text items of second item of startComps
	set AppleScript's text item delimiters to tid
	return trim(first item of endComps)
end extractBetween

-- Trim all whitespace from start and end of a string
on trim(theString)
	set theChars to {" ", tab, character id 10, return, character id 0, character id 8232}
	repeat until first character of theString is not in theChars
		set theString to text 2 thru -1 of theString
	end repeat
	repeat until last character of theString is not in theChars
		set theString to text 1 thru -2 of theString
	end repeat
	return theString
end trim

Thanks for any help.

Did you consult Mail’s dictionary? It describes selection as a list of messages, even when only one message is selected.

So you need to pick an item from that list of messages:

	--replace this:
	set msg to selection
	
	-- with this:
	set msg to item 1 of (get selection)

I’ve not looked at the rest of the script.

Your extractBetween(theString, startText, endText) is returning errors because startComps doesn’t always have 2 text items. Use the version below to log the errors for debugging.

to extractBetween(theString, startText, endText)
	local startComps, endComps
	try
		set tid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to startText
		set startComps to text items of theString
		set AppleScript's text item delimiters to endText
		set endComps to text items of second item of startComps
		set AppleScript's text item delimiters to tid
		return trim(first item of endComps)
	on error errMsg number errNum
		log errMsg & " number " & errNum
	end try
end extractBetween

Also, as alastor933 mentions, use

set msg to first item of (get selection)

Thank you both.

I have corrected the “get selection” issue, thanks. Now I am getting the email content but still cannot get the required text.

I got the following error message and do not understand how to debug it…

(Can’t get item 2 of {“Dear Ella,…”}. number -1728)

In the extractBetween() handler you have to check for the existence of startText.
You can do this by counting the text items.
If theString contains startText, the number of text items is at least 2.
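
To make the counting rule concrete, here is a small sketch. The helper name countParts and the second sample string are made up for illustration; the point is only that a missing startText leaves a single text item, which is why asking for item 2 fails with error -1728.

```applescript
-- Hypothetical helper: how many text items does theString split into
-- when startText is used as the delimiter?
on countParts(theString, startText)
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startText
	set n to count of text items of theString
	set AppleScript's text item delimiters to tid
	return n
end countParts

countParts("Quantity  1K or 5K", "Quantity  ") --> 2 (an empty item before the match, "1K or 5K" after it)
countParts("Dear Ella,", "Quantity  ") --> 1 (no match, so there is no item 2 to ask for)
```

Note that the delimiter has to match exactly, including the number of spaces after the label.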

Hello Stefan,

Thanks for your answer, but then my first approach might be the right one, since the text may consist of only one, five or six words, and sometimes there is no text at all, in which case the script will have to skip that information!

Could you point out a solution?

Sorry, I made a typo; it should read “Might not be the right one…”

This version of the handler returns missing value if the text does not contain startText:


to extractBetween(theString, startText, endText)
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startText
	set startComps to text items of theString
	if count startComps < 2 then
		set AppleScript's text item delimiters to tid
		return missing value
	end if
	set AppleScript's text item delimiters to endText
	set endComps to text items of second item of startComps
	set AppleScript's text item delimiters to tid
	return trim(first item of endComps)
end extractBetween


Thanks Stefan,

Tried the new version, and now I am getting the whole email rather than the bits and pieces I would like???

Any ideas?

Hello Stefan,

To be more specific, here is the result I am getting; I really cannot figure out why it is not breaking up the text as I would like…

This is what appears in my Replies window:

"tell application “Mail”
get selection
→ {message id 183064 of mailbox “ZWM/ZWM - 2014-2015/ZWMi14071101 - 87500-105K Paper Post-it memo pad”, message id 183038 of mailbox “ZWM/ZWM - 2014-2015/ZWMi14071101 - 87500-105K Paper Post-it memo pad”, message id 183036 of mailbox “ZWM/ZWM - 2014-2015/ZWMi14071101 - 87500-105K Paper Post-it memo pad”, message id 183035 of mailbox “ZWM/ZWM - 2014-2015/ZWMi14071101 - 87500-105K Paper Post-it memo pad”}
get content of message id 183064 of mailbox “ZWM/ZWM - 2014-2015/ZWMi14071101 - 87500-105K Paper Post-it memo pad”
→ "Dear Ella,

Please find a new inquiry with file name: ZWMi14071101 - 87500-105K Paper Post-it memo pad I attached to this mail the RLPS to be filed in the proper area.

Our Description Post-it memo pad
Material(s): Paper
2mm hard-cover, with post-its in neon-colours like in picture below.
5x colour post-it strips/ please see picture for sizes/ 25 sheets
1x smaller post-it paper note/ about 50x70mm/ 25 sheets/ white
1x big post-it paper note /about 100x70mm/ 100 sheets/ white

Please find the temporary item number: PMP PAP 001

… I break the message here as it is a bit long…
Here is the end of the replies window
"
get message id of message id 183064 of mailbox “ZWM/ZWM - 2014-2015/ZWMi14071101 - 87500-105K Paper Post-it memo pad”
→ “1F7D0D92-58BA-4CF0-A595-B4866B5F8537@xxxxx.cn
end tell

I recommend removing the try block so that you see the error if there is one.

By removing the try block I lock up my editor and need to force quit it…

Hey Claude,

Vanilla AppleScript is not a very good tool for parsing complex text.

Better to use the Satimage.osax or a Mavericks library to provide AppleScript with direct support for regular expressions. You can of course shell-out to sed, Perl, Python, or Ruby, but it’s more convenient to not have to deal with the shell.

With regular expressions you can more easily allow for common text formatting mistakes and drill down to what you need.

I have included 3 handlers I’ve used for over a decade. Basic-find (fnd), Find-with-Capture (fndUsing), and Boolean-Find (fndBool).

The one in actual use in the script is ‘fndUsing’. This allows finding complex text and returning only the portion of it you need.

If you want to pursue this you can send me a complete sample message off-list to listmeister@thestoneforge.com, and I’ll help you write the regex.


set msgContent to "
Our Description  Post-it memo pad
"
set originalProductName to fndUsing("^Our Description:?[[:blank:]]+(.+[[:blank:]]*$)", "\\1", msgContent, false, true, false, false, true) of me

--> "Post-it memo pad"

---------------------------------------------------------------------------------------
--» FIND HANDLERS
---------------------------------------------------------------------------------------
on fnd(_find, _data, _case, _all, strRslt) # Last 3 are all bool
	try
		find text _find in _data case sensitive _case all occurrences _all string result strRslt with regexp
	on error
		return false
	end try
end fnd
---------------------------------------------------------------------------------------
on fndBool(_find, _data, _case, _all, strRslt)
	try
		find text _find in _data case sensitive _case all occurrences _all string result strRslt with regexp
		true
	on error
		false
	end try
end fndBool
---------------------------------------------------------------------------------------
on fndUsing(_find, _capture, _data, _case, _regex, _word, _all, strRslt)
	try
		set findResult to find text _find in _data using _capture case sensitive _case regexp _regex ¬
			whole word _word all occurrences _all string result strRslt
	on error
		false
	end try
end fndUsing
---------------------------------------------------------------------------------------

There is nothing wrong with using vanilla AppleScript. When you can solve a problem simply by splitting and joining text items, why use a heavy AppleEvent? With considering case and considering diacriticals blocks you can even refine your text processing. There is also nothing wrong with using a shell for text processing; the shell itself is a text processor. But do shell script can’t work without sending an AppleEvent either. The only difference between Stefan’s script and a regular expression is that Stefan’s script is not greedy, while regular expressions are greedy by default. A solution that behaves like a regular expression would be:

set theString to "Lorem ipsum <b>dolor</b> sit amet, consectetur <b>adipiscing</b> elit."
extractBetween(theString, "<b>", "</b>")

on extractBetween(theString, startString, endString)
	set oldTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startString
	if (count of text items of theString) < 2 then
		set AppleScript's text item delimiters to oldTID
		return missing value
	end if
	set theString to text items 2 thru -1 of theString as string
	set AppleScript's text item delimiters to endString
	if (count of text items of theString) < 2 then
		set AppleScript's text item delimiters to oldTID
		return missing value
	end if
	set theString to text items 1 thru -2 of theString as string
	set AppleScript's text item delimiters to oldTID
	return theString --for claudeB return trim(theString)
end extractBetween
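
To see the difference on the sample string, the sketch below puts both behaviours side by side. The handler names extractShortest and extractLongest are made up for this comparison; the first follows the approach of Stefan's handler (stop at the first endString after the first startString), the second follows the handler above (run to the last endString). Neither trims its result.

```applescript
set theString to "Lorem ipsum <b>dolor</b> sit amet, consectetur <b>adipiscing</b> elit."

extractShortest(theString, "<b>", "</b>") --> "dolor"
extractLongest(theString, "<b>", "</b>") --> "dolor</b> sit amet, consectetur <b>adipiscing"

-- Non-greedy: text between the first startString and the next endString.
on extractShortest(theString, startString, endString)
	set oldTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startString
	set startComps to text items of theString
	if (count of startComps) < 2 then
		set AppleScript's text item delimiters to oldTID
		return missing value
	end if
	set AppleScript's text item delimiters to endString
	set endComps to text items of (item 2 of startComps)
	set AppleScript's text item delimiters to oldTID
	return item 1 of endComps
end extractShortest

-- Greedy: text between the first startString and the last endString.
on extractLongest(theString, startString, endString)
	set oldTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startString
	if (count of text items of theString) < 2 then
		set AppleScript's text item delimiters to oldTID
		return missing value
	end if
	-- Rejoin everything after the first startString.
	set theString to (text items 2 thru -1 of theString) as string
	set AppleScript's text item delimiters to endString
	if (count of text items of theString) < 2 then
		set AppleScript's text item delimiters to oldTID
		return missing value
	end if
	-- Rejoin everything before the last endString.
	set theString to (text items 1 thru -2 of theString) as string
	set AppleScript's text item delimiters to oldTID
	return theString
end extractLongest
```

With a short endString such as a single space, the greedy version will swallow everything up to the last space in the text, so the choice between the two matters for fields that are followed by more text.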

hello ccstone,

Thank you for taking the time to answer my problem.

I appreciate your solution, but since I am not so advanced in AppleScript and completely inexperienced in programming, I do not quite understand your code…

I don’t mind sending you by mail a sample of the original mail with highlights of what I want to grab.

Thanks again for your time and patience

Hello DJ Bazzie Wazzie

Thank you also for helping me with my problem.

Your solution is more something I can understand, and I tried it, but I do not quite see how I can get the script to catch all of the strings and return them separately?

With your solution I can catch one string, but I cannot get all of them, as it only returns the last extracted string?

Also, it does not trim the result, and if I place a space at the end of the first text it returns “missing value”?

Would you be kind enough to give me an example of putting the script together for at least 3 strings, so that I can understand how to finish it?

Thanks again for being patient.

Just replace your handler with mine, nothing more than that:

set theString to "Please find a new inquiry with file name:  ZWMi1 Paper Post-it memo pad Tough  I attached to this mail XXXXX

Our Description  Post-it memo pad
Material(s): Paper     
2mm hard-cover, with post-its in neon-colours like in picture below.
5x colour post-it strips/ please see picture for sizes/ 25 sheets
1x smaller post-it paper note/ about 50x70mm/ 25 sheets/ white
1x big post-it paper note /about 100x70mm/ 100 sheets/ white

Please find the temporary item number:  PMP PAP 001

Quantity  1K or 5K"

parseMSG(theString)

on parseMSG(msgcontent)
	set originalSubject to extractBetween(msgcontent, "Please find a new inquiry with file name: ", "I attached to this mail")
	set originalProductName to extractBetween(msgcontent, "Our Description  ", "Material(s): ")
	
	set originalMaterial to extractBetween(msgcontent, "Material(s): ", " ")
	set originalExtraDescription to extractBetween(msgcontent, originalMaterial, "Please find the temporary item number: ")
	set originalQuantity to extractBetween(msgcontent, "Quantity ", "Size ")
	set originalSizes to extractBetween(msgcontent, "Size ", "Body color ")
	set originalBodyColor to extractBetween(msgcontent, "Body color ", "Logo ")
	set originalLogo to extractBetween(msgcontent, "Logo ", "Logo position ")
	set originalLogoDetails to extractBetween(msgcontent, "Logo position ", "Accessory ")
	set originalAccessory to extractBetween(msgcontent, "Accessory ", "Extra information")
	set originalAccessoryInformation to extractBetween(msgcontent, "Extra information", "Approximate delivery time: ")
	set originalDelivery to extractBetween(msgcontent, "Approximate delivery time: ", "Place of delivery: ")
	set originalShipmentTo to extractBetween(msgcontent, "Client Area: ", "Client Sea Port: ")
	set originalSafetyTest to extractBetween(msgcontent, "Safety Test: ", "Type of product: ")
	return {originalSubject, originalProductName, originalMaterial, originalExtraDescription, originalQuantity, originalSizes, originalBodyColor, originalLogo, originalLogoDetails, originalAccessory, originalAccessoryInformation, originalDelivery, originalShipmentTo, originalSafetyTest}
end parseMSG

on extractBetween(theString, startString, endString)
	set oldTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startString
	if (count of text items of theString) < 2 then
		set AppleScript's text item delimiters to oldTID
		return missing value
	end if
	set theString to text items 2 thru -1 of theString as string
	set AppleScript's text item delimiters to endString
	if (count of text items of theString) < 2 then
		set AppleScript's text item delimiters to oldTID
		return missing value
	end if
	set theString to text items 1 thru -2 of theString as string
	set AppleScript's text item delimiters to oldTID
	return trim(theString)
end extractBetween


on trim(theString)
	set theChars to {" ", tab, character id 10, return, character id 0, character id 8232}
	repeat until first character of theString is not in theChars
		set theString to text 2 thru -1 of theString
	end repeat
	repeat until last character of theString is not in theChars
		set theString to text 1 thru -2 of theString
	end repeat
	return theString
end trim

Thank you DJ Bazzie Wazzie,

I have tested your solution and it works fine until I have to split a paragraph in two, as in the following example.
Below is my text!

Our Description Post-it memo pad
Material(s): Paper
2mm hard-cover, with post-its in neon-colours like in picture below.
5x colour post-it strips/ please see picture for sizes/ 25 sheets
1x smaller post-it paper note/ about 50x70mm/ 25 sheets/ white
1x big post-it paper note /about 100x70mm/ 100 sheets/ white

Please find the temporary item number: PMP PAP 001

Quantity 1K or 5K"

For the material I want to collect only “Paper” as a string and nothing after it! Now it collects “Paper” and the rest of the text up to “Quantity 1K or”.

Could you show me how to break up this paragraph? Also, it will sometimes not be there (this information is optional and not always available), so when this paragraph is missing the script should skip this area.
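
One possible way to approach this, offered as a sketch rather than a tested fix for the full script: take the block between the "Material(s):" label and the next fixed label, then split it into lines with paragraphs. The first non-blank line is the material and any remaining lines are the optional description. The handler names splitMaterialBlock and trimBlanks are made up for this example, and it assumes "Please find the temporary item number:" is the label that follows the block; if that label is missing, the block runs to the end of the message.

```applescript
set sample to "Our Description  Post-it memo pad
Material(s): Paper
2mm hard-cover, with post-its in neon-colours like in picture below.
5x colour post-it strips/ please see picture for sizes/ 25 sheets

Please find the temporary item number:  PMP PAP 001"

set {material, extraDescription} to splitMaterialBlock(sample)
-- material is "Paper"; extraDescription holds the two description lines

-- Hypothetical handler: returns {first line after the label, remaining lines}.
-- Either value is missing value when that part is absent.
on splitMaterialBlock(msgcontent)
	set {material, extraDescription} to {missing value, missing value}
	set oldTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "Material(s):"
	set parts to text items of msgcontent
	if (count of parts) > 1 and item 2 of parts is not "" then
		-- Keep only what precedes the next fixed label (an assumption).
		set AppleScript's text item delimiters to "Please find the temporary item number:"
		set block to text item 1 of (item 2 of parts)
		-- Drop blank lines and surrounding spaces or tabs from every line.
		set keptLines to {}
		repeat with aLine in paragraphs of block
			set cleaned to trimBlanks(contents of aLine)
			if cleaned is not "" then set end of keptLines to cleaned
		end repeat
		if (count of keptLines) > 0 then set material to item 1 of keptLines
		if (count of keptLines) > 1 then
			set AppleScript's text item delimiters to linefeed
			set extraDescription to (items 2 thru -1 of keptLines) as text
		end if
	end if
	set AppleScript's text item delimiters to oldTID
	return {material, extraDescription}
end splitMaterialBlock

-- Remove leading and trailing spaces and tabs; safe on empty strings.
on trimBlanks(theString)
	repeat while theString begins with " " or theString begins with tab
		if (count of theString) is 1 then return ""
		set theString to text 2 thru -1 of theString
	end repeat
	repeat while theString ends with " " or theString ends with tab
		set theString to text 1 thru -2 of theString
	end repeat
	return theString
end trimBlanks
```

Because the handler returns missing value for a part that is not there, the calling script can test for missing value and skip that field instead of passing it on as a delimiter.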