Hi smart people!
I’m trying to parse a webpage to import some contact data into a FileMaker database. I have been successful at isolating the data I want for one contact, but the page has multiple contacts. Can anyone suggest a way to repeat the parsing through to the end of the webpage? Thanks in advance!
tell application "Safari"
set pageHTML to source of document 1
end tell
set TID to AppleScript's text item delimiters -- save previous value
------- Find Email Address ----------
set text item delimiters to "Email Address :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set emaila to text item 2 of pageHTML
set text item delimiters to " "
set emaila to text item 1 of emaila
------- Find last name ----------
set text item delimiters to "Last Name :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set lname to text item 2 of pageHTML
set text item delimiters to " "
set lname to text item 1 of lname
------- Find first name ----------
set text item delimiters to "First Name :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set fname to text item 2 of pageHTML
set text item delimiters to " "
set fname to text item 1 of fname
------- Find Phone Number ----------
set text item delimiters to "Primary Phone :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set phonen to text item 2 of pageHTML
set text item delimiters to " "
set phonen to text item 1 of phonen
set text item delimiters to TID -- restore previous value
set emaila to text items 2 thru -1 of pageHTML
Sorry adayzdone, I’m fairly new to AppleScript and don’t follow how that will help. Can you explain? Thanks!
Rather than getting one value, you can get a range of values.
set theList to {}
set emailaList to {1, 2, 3}
set lnameList to {4, 5, 6}
set fnameList to {7, 8, 9}
set phonenList to {10, 11, 12}
repeat with i from 1 to count of emailaList
set end of theList to {emaila:item i of emailaList, lname:item i of lnameList, fname:item i of fnameList, phonen:item i of phonenList}
end repeat
return theList
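set theList to {}
-- Set text item delimiters to the matching label before each of the four lines below;
-- with a single delimiter in effect, all four lists would come out identical.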
set emailaList to text items 2 thru -1 of pageHTML
set lnameList to text items 2 thru -1 of pageHTML
set fnameList to text items 2 thru -1 of pageHTML
set phonenList to text items 2 thru -1 of pageHTML
repeat with i from 1 to count of emailaList
set end of theList to {emaila:item i of emailaList, lname:item i of lnameList, fname:item i of fnameList, phonen:item i of phonenList}
end repeat
return theList
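To make the idea concrete, here is a minimal sketch on a made-up snippet of HTML (the markup and addresses are invented, not taken from the page in question). It shows that `text items 2 thru -1` returns one chunk per occurrence of the marker, and that a second delimiter then trims each chunk down to the value.

```applescript
-- Made-up HTML; a real page will have different markup around each address
set sampleHTML to "<a href=\"mailto:ann@example.com\">Ann</a> <a href=\"mailto:bob@example.com\">Bob</a>"

set savedTID to AppleScript's text item delimiters -- save previous value
set AppleScript's text item delimiters to "mailto:"
-- Text item 1 is everything before the first marker, so start at item 2
set chunks to text items 2 thru -1 of sampleHTML

-- Each chunk begins with an address; cut it off at the closing quote
set AppleScript's text item delimiters to "\""
set emailList to {}
repeat with aChunk in chunks
	set end of emailList to text item 1 of (contents of aChunk)
end repeat

set AppleScript's text item delimiters to savedTID -- restore previous value
get emailList -- {"ann@example.com", "bob@example.com"}
```

The only things that change for a real page are the two delimiters: the text that comes right before each value, and the text that comes right after it.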
Thanks so much for the quick response. I think I see where you’re heading. The part I am still confused about is how that will let me extract the data I want. Clearly I’m a noob.
When I posted, it removed my " " but what I am trying to do is find the email addresses on the webpage in between “Email Address: .” and " "
When I use set emailaList to text items 2 thru -1 of pageHTML it doesn’t parse correctly for me.
I must be misunderstanding. Thanks for putting up with me.
Are you able to post a link?
Unfortunately it’s an internal webpage. Is there a similar example you know of that I can use as a reference?
Yikes! That’s well beyond my skill level.
Perhaps if I give another example it will help me understand. Let’s say I wanted to import into FileMaker all of the email addresses and phone numbers from this webpage: http://directory.csus.edu/searchResult.jsp?peopleBydeptName=Academic+Advising+Center&act=people&popWin=true
Again, I can get so far as parsing an individual record but I can’t quite figure out how to repeat it for the entire webpage.
Thanks for your patience.
set pageHTML to (do shell script "curl " & quoted form of ("http://directory.csus.edu/searchResult.jsp?peopleBydeptName=Academic+Advising+Center&act=people&popWin=true"))
set TID to AppleScript's text item delimiters -- save previous value
------- Find Email Address ----------
set text item delimiters to "mailto:"
set emaila to text item 2 of pageHTML
set text item delimiters to ">"
set emaila to text item 1 of emaila
------- Find phone number ----------
set text item delimiters to "Academic Advising Center</font></a> </td>
<td class=\"tableData\"><font size=2>"
set phonen to text item 2 of pageHTML
set text item delimiters to "</font>"
set phonen to text item 1 of phonen
set text item delimiters to TID -- restore previous value
set pageHTML to (do shell script "curl " & quoted form of ("http://directory.csus.edu/searchResult.jsp?peopleBydeptName=Academic+Advising+Center&act=people&popWin=true"))
set TID to AppleScript's text item delimiters -- save previous value
set theList to {}
set text item delimiters to "mailto:"
set xxx to text items 2 thru -1 of pageHTML
set text item delimiters to ">"
repeat with i from 1 to count of xxx
set end of theList to text item 1 of (item i of xxx)
end repeat
set text item delimiters to TID
return theList
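The same pattern can be wrapped in a handler so it works for any field, not just email. This is only a sketch: `valuesBetween` is a hypothetical helper name, and the sample HTML is made up so the script runs on its own.

```applescript
-- Hypothetical helper: every value that sits between startMarker and endMarker
on valuesBetween(sourceText, startMarker, endMarker)
	set savedTID to AppleScript's text item delimiters
	set foundValues to {}
	try
		set AppleScript's text item delimiters to startMarker
		set chunks to text items of sourceText
		set AppleScript's text item delimiters to endMarker
		-- Chunk 1 is the text before the first marker, so start at 2
		repeat with i from 2 to (count of chunks)
			set end of foundValues to text item 1 of (item i of chunks)
		end repeat
	on error errMsg number errNum
		-- Put the delimiters back even if something goes wrong
		set AppleScript's text item delimiters to savedTID
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedTID
	return foundValues
end valuesBetween

-- Made-up markup standing in for the real page source
set sampleHTML to "<td>Email: <b>ann@example.com</b></td><td>Phone: <b>555-0100</b></td><td>Email: <b>bob@example.com</b></td><td>Phone: <b>555-0101</b></td>"
set emailList to valuesBetween(sampleHTML, "Email: <b>", "</b>")
set phoneList to valuesBetween(sampleHTML, "Phone: <b>", "</b>")
```

With one call per field you end up with parallel lists (`emailList`, `phoneList`, and so on) that can be walked with a single repeat loop, as long as every contact on the page has every field.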
Thanks! That really helped! I updated my original code as you can see below and it returns a good list. Now if only I could figure out how to import the list into multiple records and fields in FileMaker.
--try
tell application "Safari"
set pageHTML to source of document 1
end tell
set TID to AppleScript's text item delimiters -- save previous value
set theList to {}
------- Find Email Address ----------
considering case
set text item delimiters to "Email Address :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set emaila to text items 2 thru -1 of pageHTML
end considering
------- Find Phone Number ----------
set text item delimiters to "Primary Phone :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set phonen to text items 2 thru -1 of pageHTML
------- Find Last Name ----------
set text item delimiters to "Last Name :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set lname to text items 2 thru -1 of pageHTML
------- Find First Name ----------
set text item delimiters to "First Name :
</TD>
</TR>
<TR>
<TD style=\"padding:3px\">
"
set fname to text items 2 thru -1 of pageHTML
-- Split each chunk on spaces so that text item 1 is just the value
set text item delimiters to " "
repeat with i from 1 to count of emaila
set end of theList to text item 1 of (item i of emaila)
set end of theList to text item 1 of (item i of phonen)
set end of theList to text item 1 of (item i of lname)
set end of theList to text item 1 of (item i of fname)
end repeat
set text item delimiters to TID
return theList
--end try
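On the FileMaker side, one low-tech route (an assumption on my part, not something tested against your database) is to write the list out as a tab-separated text file and bring it in with FileMaker's File > Import Records. The sketch below uses made-up sample data in the same order the script above builds theList (email, phone, last name, first name). The handler name `flatListToTSV` and the file name `contacts.tab` are hypothetical.

```applescript
-- Hypothetical helper: join a flat list into lines of fieldsPerRow tab-separated values
on flatListToTSV(flatList, fieldsPerRow)
	set savedTID to AppleScript's text item delimiters
	set tsvLines to {}
	set AppleScript's text item delimiters to tab
	repeat with i from 1 to (count of flatList) by fieldsPerRow
		set lastIndex to i + fieldsPerRow - 1
		if lastIndex > (count of flatList) then exit repeat -- skip an incomplete last row
		set end of tsvLines to (items i thru lastIndex of flatList) as text
	end repeat
	set AppleScript's text item delimiters to linefeed
	set tsvText to tsvLines as text
	set AppleScript's text item delimiters to savedTID
	return tsvText
end flatListToTSV

-- Made-up sample data; in the real script this would be the theList built above
set theList to {"a@example.com", "555-0100", "Smith", "Ann", "b@example.com", "555-0101", "Jones", "Bob"}
set tsvText to flatListToTSV(theList, 4)

-- Assumed output location: a file named contacts.tab on the Desktop
set outPath to (POSIX path of (path to desktop)) & "contacts.tab"
set fileRef to open for access (POSIX file outPath) with write permission
try
	set eof of fileRef to 0 -- empty the file if it already exists
	write tsvText to fileRef as «class utf8»
	close access fileRef
on error errMsg number errNum
	close access fileRef
	error errMsg number errNum
end try
```

Each line of the file becomes one record and each tab-separated value one field, so the import dialog is where you match the columns to your FileMaker fields. If your version of FileMaker is picky about line endings, swapping `linefeed` for `return` in the handler is the thing to try.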