Parsing website HTML using AppleScript

Hi smart people!

I’m trying to parse a webpage to import some contact data into a FileMaker database. I can isolate the data I want for a single contact, but the page has multiple contacts. Can anyone suggest a way to repeat the parsing through the end of the webpage? Thanks in advance!


tell application "Safari"
	set pageHTML to source of document 1
end tell


set TID to AppleScript's text item delimiters -- save previous value
------- Find Email Address ----------

set text item delimiters to "Email Address :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"
set emaila to text item 2 of pageHTML
set text item delimiters to " "
set emaila to text item 1 of emaila

------- Find last name ----------

set text item delimiters to "Last Name :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"

set lname to text item 2 of pageHTML
set text item delimiters to " "
set lname to text item 1 of lname

------- Find first name ----------

set text item delimiters to "First Name :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"

set fname to text item 2 of pageHTML
set text item delimiters to " "
set fname to text item 1 of fname

------- Find Phone Number ----------

set text item delimiters to "Primary Phone :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"
set phonen to text item 2 of pageHTML
set text item delimiters to " "
set phonen to text item 1 of phonen

set text item delimiters to TID -- restore the saved delimiters





set emaila to text items 2 thru -1 of pageHTML

Sorry adayzdone, I’m fairly new to AppleScript and don’t follow how that will help. Can you explain? Thanks!

Rather than getting one value, you can get a range of values.

set theList to {}
set emailaList to {1, 2, 3}
set lnameList to {4, 5, 6}
set fnameList to {7, 8, 9}
set phonenList to {10, 11, 12}

repeat with i from 1 to count of emailaList
	set end of theList to {emaila:item i of emailaList, lname:item i of lnameList, fname:item i of fnameList, phonen:item i of phonenList}
end repeat
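return theList

set theList to {}
-- Set text item delimiters to the matching marker before each line below,
-- otherwise all four lists come out identical.
-- (delimiter: the "Email Address :" marker)
set emailaList to text items 2 thru -1 of pageHTML
-- (delimiter: the "Last Name :" marker)
set lnameList to text items 2 thru -1 of pageHTML
-- (delimiter: the "First Name :" marker)
set fnameList to text items 2 thru -1 of pageHTML
-- (delimiter: the "Primary Phone :" marker)
set phonenList to text items 2 thru -1 of pageHTML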

repeat with i from 1 to count of emailaList
	set end of theList to {emaila:item i of emailaList, lname:item i of lnameList, fname:item i of fnameList, phonen:item i of phonenList}
end repeat
return theList
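
To make the difference between one value and a range concrete, here is a minimal sketch on a made-up string. The sample text and the "Email: " marker are assumptions for illustration only, not taken from the real page.

```applescript
-- Made-up sample standing in for pageHTML
set sampleText to "Name: Ann Email: ann@example.com Name: Bob Email: bob@example.com"

set TID to AppleScript's text item delimiters -- save previous value
set AppleScript's text item delimiters to "Email: "

-- One value: only the text between the first and second marker
set firstOnly to text item 2 of sampleText

-- A range: a list with one chunk for every marker found
set everyChunk to text items 2 thru -1 of sampleText

set AppleScript's text item delimiters to TID -- restore
```

Here firstOnly is "ann@example.com Name: Bob " and everyChunk is {"ann@example.com Name: Bob ", "bob@example.com"}. Each chunk still carries whatever follows the value, so it has to be trimmed with a second delimiter afterwards.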

Thanks so much for the quick response. I think I see where you’re heading. The part I am still confused about is how that will let me extract the data I want. Clearly I’m a noob.

When I posted, it removed my " " but what I am trying to do is find the email addresses on the webpage in between “Email Address: .” and " "

When I use set emailaList to text items 2 thru -1 of pageHTML it doesn’t parse correctly for me.

I must be misunderstanding. Thanks for putting up with me :)
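
One way to picture "everything between a start marker and an end marker, for every match" is a small handler like the sketch below. The handler name valuesBetween, the sample HTML and both markers are hypothetical. The real page's markers, including whatever the forum stripped from the post, would need to be substituted.

```applescript
-- Hypothetical helper: returns every value found between startMarker and endMarker
on valuesBetween(sourceText, startMarker, endMarker)
	set TID to AppleScript's text item delimiters -- save previous value
	set foundValues to {}
	try
		-- Split on the start marker; item 1 is the text before the first match
		set AppleScript's text item delimiters to startMarker
		set chunks to text items of sourceText
		if (count of chunks) > 1 then
			-- Keep only the part of each chunk that precedes the end marker
			set AppleScript's text item delimiters to endMarker
			repeat with oneChunk in (items 2 thru -1 of chunks)
				set end of foundValues to text item 1 of (contents of oneChunk)
			end repeat
		end if
	on error errMsg
		set AppleScript's text item delimiters to TID
		error errMsg
	end try
	set AppleScript's text item delimiters to TID -- restore
	return foundValues
end valuesBetween

-- Made-up sample; the markers here are assumptions, not the real page's HTML
set sampleHTML to "<td>Email Address :</td><td>ann@example.com </td><td>Email Address :</td><td>bob@example.com </td>"
set emailaList to valuesBetween(sampleHTML, "Email Address :</td><td>", " ")
```

With this sample, emailaList is {"ann@example.com", "bob@example.com"}. Calling the same handler once per field, each time with its own start marker, should give the separate lists that the repeat loop then combines.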

Are you able to post a link?

Unfortunately it’s an internal webpage. Is there a similar example you know of that I can use as a reference?

You can look at this thread:
http://macscripter.net/viewtopic.php?pid=148308#p148308

Yikes! That’s well beyond my skill level.

Perhaps if I give another example it will help me understand. Let’s say I wanted to import into FileMaker all of the email addresses and phone numbers from this webpage: http://directory.csus.edu/searchResult.jsp?peopleBydeptName=Academic+Advising+Center&act=people&popWin=true

Again, I can get so far as parsing an individual record but I can’t quite figure out how to repeat it for the entire webpage.

Thanks for your patience.


set pageHTML to (do shell script "curl " & quoted form of ("http://directory.csus.edu/searchResult.jsp?peopleBydeptName=Academic+Advising+Center&act=people&popWin=true"))


set TID to AppleScript's text item delimiters -- save previous value
------- Find Email Address ----------

set text item delimiters to "mailto:"
set emaila to text item 2 of pageHTML
set text item delimiters to ">"
set emaila to text item 1 of emaila
------- Find phone number ----------

set text item delimiters to "Academic Advising Center</font></a>&nbsp</td>
      <td class=\"tableData\"><font size=2>"

set phonen to text item 2 of pageHTML

set text item delimiters to "</font>"
set phonen to text item 1 of phonen

set text item delimiters to TID -- restore the saved delimiters

set pageHTML to (do shell script "curl " & quoted form of ("http://directory.csus.edu/searchResult.jsp?peopleBydeptName=Academic+Advising+Center&act=people&popWin=true"))

set TID to AppleScript's text item delimiters -- save previous value
set theList to {}

set text item delimiters to "mailto:"
set xxx to text items 2 thru -1 of pageHTML

set text item delimiters to ">"
repeat with i from 1 to count of xxx
	set end of theList to text item 1 of (item i of xxx)
end repeat

set text item delimiters to TID
return theList

Thanks! That really helped! I updated my original code as you can see below and it returns a good list. Now if only I could figure out how to import the list into multiple records and fields in FileMaker.


--try
tell application "Safari"
	set pageHTML to source of document 1
end tell

set TID to AppleScript's text item delimiters -- save previous value
set theList to {}
------- Find Email Address ----------
considering case
	set text item delimiters to "Email Address :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"
	set emaila to text items 2 thru -1 of pageHTML
end considering

------- Find Phone Number ----------
set text item delimiters to "Primary Phone :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"
set phonen to text items 2 thru -1 of pageHTML
------- Find Last Name ----------
set text item delimiters to "Last Name :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"

set lname to text items 2 thru -1 of pageHTML
------- Find First Name ----------

set text item delimiters to "First Name :
														</TD>
													</TR>
													
													<TR>
														<TD style=\"padding:3px\">
															"

set fname to text items 2 thru -1 of pageHTML
set text item delimiters to " "


repeat with i from 1 to count of emaila
	set end of theList to text item 1 of (item i of emaila)
	set end of theList to text item 1 of (item i of phonen)
	set end of theList to text item 1 of (item i of lname)
	set end of theList to text item 1 of (item i of fname)
end repeat

set text item delimiters to TID
return theList
--end try
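
On the remaining FileMaker question, one low-tech route that avoids scripting FileMaker directly is to turn the flat list into a tab-separated text file and bring it in with FileMaker's import command. The sketch below is only that, a sketch: the sample values, the file name contacts.tsv, the Desktop location and the four-fields-per-contact layout are all assumptions, and it also assumes your FileMaker version accepts linefeed-terminated rows.

```applescript
-- Made-up stand-in for the flat list returned above:
-- email, phone, last name, first name, repeated for each contact
set theList to {"ann@example.com", "555-0100", "Lee", "Ann", "bob@example.com", "555-0101", "Ray", "Bob"}

set fieldsPerContact to 4
if (count of theList) mod fieldsPerContact is not 0 then error "List length is not a multiple of " & fieldsPerContact

set TID to AppleScript's text item delimiters -- save previous value
set outLines to {}

-- Join each group of four values into one tab-separated row
set AppleScript's text item delimiters to tab
repeat with i from 1 to (count of theList) by fieldsPerContact
	set oneContact to items i thru (i + fieldsPerContact - 1) of theList
	set end of outLines to oneContact as text
end repeat

-- Join the rows, one contact per line
set AppleScript's text item delimiters to linefeed
set tsvText to outLines as text
set AppleScript's text item delimiters to TID -- restore

-- Write the text to a file on the Desktop (hypothetical file name)
set outPath to (POSIX path of (path to desktop)) & "contacts.tsv"
set fileRef to open for access (POSIX file outPath) with write permission
try
	set eof of fileRef to 0 -- empty the file if it already exists
	write tsvText to fileRef as «class utf8»
	close access fileRef
on error errMsg
	close access fileRef
	error errMsg
end try
```

Each row then maps to one record and each column to one field when the file is imported. The guard at the top matters because the lists from the page can differ in length if a contact is missing a field, and in that case the columns would silently shift.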