I’m trying to use AppleScript to automate collecting multiple pieces of data from a web page. The script pulls the links from the web page and then searches the page’s source code for specific text. It retrieves the first set of data without a problem, but I can’t get it to go back into the source code and continue collecting the remaining ~80 pieces of information. I have a second script below that might be able to collect the data, but it isn’t working yet. Any suggestions for getting this working would be great. I apologize in advance for my newbie ignorance…
The first script opens two pages; the second page is the source for the scraping. Getting it to loop is what’s giving me trouble…
set oldDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""
tell application "Safari"
	activate
	open location "http://chem.sis.nlm.nih.gov/chemidplus/ProxyServlet?objectHandle=DBMaint&actionHandle=default&nextPage=jsp/chemidlite/ResultScreen.jsp&TXTSUPERLISTID=000050000"
	delay 3
	open location "http://chem.sis.nlm.nih.gov/chemidplus/ProxyServlet?objectHandle=Search&actionHandle=getAll3DMViewFiles&nextPage=jsp%2Fcommon%2FChemFull.jsp%3FcalledFrom%3Dlite&chemid=000050000&formatType=_3D"
	delay 3
	set FullRecordHTML to the source of the document 1 as text
end tell
set StartPoint1 to ((text offset of "onclick=\"javascript:popUpInfoWin('<H2>Data Source Information</H2><br><b>List Acronyms</b><br>" in FullRecordHTML) + 94)
set EndPoint1 to ((text offset of "<br>');\"><img src=\"images/chemidlite/infosmall.gif\" width=\"12\" height=\"12\" border=\"0\"></a>" in FullRecordHTML) - 1)
set Target1 to (characters StartPoint1 thru EndPoint1) of FullRecordHTML as string
set Startpoint2 to EndPoint1 + 92
set EndPoint2 to ((text offset of "<!-- In case all the names should be broken, uncomment the line below and comment the line above -->" in FullRecordHTML) - 1)
set NewStartPoint to EndPoint2
set Target2 to (characters Startpoint2 thru EndPoint2) of FullRecordHTML as string
set Target2NumberChar to (number of characters in Target2)
-- Stop two characters short of the end so "character (UpCount + 2)" never runs past Target2
repeat with UpCount from 1 to (Target2NumberChar - 2)
	set {test1, test2, test3} to {character UpCount of Target2, (character (UpCount + 1) of Target2), (character (UpCount + 2) of Target2)}
	if test1 is equal to test2 and test2 is not equal to test3 and test3 is not equal to " " then
		set Target2 to (characters (UpCount + 2) thru (number of characters in Target2)) of Target2 as string
		exit repeat
	end if
end repeat
set Target2NumberChar to (number of characters in Target2)
repeat with UpCount from 1 to (Target2NumberChar - 2)
	set {test1, test2, test3} to {character UpCount of Target2, (character (UpCount + 1) of Target2), (character (UpCount + 2) of Target2)}
	if test1 is equal to test2 and test2 is equal to test3 and test3 is equal to " " then
		beep
		set Target2 to (characters 1 thru (UpCount - 4)) of Target2 as string
		exit repeat
	end if
end repeat
set CombinedInfo to (Target1 & " : " & Target2)
display dialog CombinedInfo
set AppleScript's text item delimiters to oldDelimiters
return CombinedInfo
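A likely reason the first script only ever returns one value: `text offset`, like the Standard Additions `offset of` command, presumably reports the position of the first match only, so every run lands on the same record. One way to collect every match in a single pass is to split the whole source on the start marker with AppleScript’s text item delimiters and then cut each piece at the end marker. The handler below is a minimal sketch of that idea, untested against the live ChemIDplus page; `extractAllBetween` is a hypothetical name, not a built-in command.

```applescript
-- Hypothetical helper (not from the original post): returns a list of every piece
-- of text that sits between startMarker and the next endMarker in sourceText.
on extractAllBetween(sourceText, startMarker, endMarker)
	set savedDelimiters to AppleScript's text item delimiters
	set results to {}
	-- Split on the start marker: item 1 is whatever precedes the first match,
	-- and every later item begins right after one occurrence of startMarker.
	set AppleScript's text item delimiters to startMarker
	set chunks to text items of sourceText
	-- Now split each chunk on the end marker and keep only the part before it
	set AppleScript's text item delimiters to endMarker
	repeat with i from 2 to (count chunks)
		set thisChunk to item i of chunks
		if thisChunk contains endMarker then
			set end of results to text item 1 of thisChunk
		end if
	end repeat
	-- Put the delimiters back the way the caller had them
	set AppleScript's text item delimiters to savedDelimiters
	return results
end extractAllBetween
```

Assuming the StartString and MidString values from the second script mark the beginning and end of each acronym block, something like `set acronymList to extractAllBetween(FullRecordHTML, StartString, MidString)` should return one entry per block, which an ordinary `repeat with anItem in acronymList` loop can then process. Splitting with delimiters avoids keeping track of character positions by hand.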
This script splits the source code into a list of paragraphs, but I can’t get it to loop successfully…
set oldDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""
tell application "Safari"
	activate
	open location "http://chem.sis.nlm.nih.gov/chemidplus/ProxyServlet?objectHandle=DBMaint&actionHandle=default&nextPage=jsp/chemidlite/ResultScreen.jsp&TXTSUPERLISTID=000050000"
	delay 3
	open location "http://chem.sis.nlm.nih.gov/chemidplus/ProxyServlet?objectHandle=Search&actionHandle=getAll3DMViewFiles&nextPage=jsp%2Fcommon%2FChemFull.jsp%3FcalledFrom%3Dlite&chemid=000050000&formatType=_3D"
	delay 3
	set FullRecordHTML to the source of the document 1 as string
end tell
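set HTMLparagraphs to every paragraph of FullRecordHTML as list
set StartString to "onclick=\"javascript:popUpInfoWin('<H2>Data Source Information</H2><br><b>List Acronyms</b><br>"
set MidString to "<br>');\"><img src=\"images/chemidlite/infosmall.gif\" width=\"12\" height=\"12\" border=\"0\"></a>"
set EndString to "<!-- In case all the names should be broken, uncomment the line below and comment the line above -->"
-- Loop by index: a "repeat with ... in" variable is a reference to a list item, not a number,
-- so "EachParagraph + 1" cannot be used to reach the next line
repeat with i from 1 to ((count HTMLparagraphs) - 2)
	-- Join this line and the next two ("&" concatenates text; "and" is a boolean operator)
	set EachParagraph to (item i of HTMLparagraphs) & return & (item (i + 1) of HTMLparagraphs) & return & (item (i + 2) of HTMLparagraphs)
	-- Each "contains" test has to be written out in full
	if EachParagraph contains StartString and EachParagraph contains MidString and EachParagraph contains EndString then
		RetreiveInfo(EachParagraph)
	end if
end repeat
set AppleScript's text item delimiters to oldDelimiters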
on RetreiveInfo(EachParagraph)
	beep
	delay 1
	set StartPoint1 to ((text offset of "onclick=\"javascript:popUpInfoWin('<H2>Data Source Information</H2><br><b>List Acronyms</b><br>" in EachParagraph) + 94)
	set EndPoint1 to ((text offset of "<br>');\"><img src=\"images/chemidlite/infosmall.gif\" width=\"12\" height=\"12\" border=\"0\"></a>" in EachParagraph) - 1)
	set Target1 to (characters StartPoint1 thru EndPoint1) of EachParagraph as string
	set Startpoint2 to EndPoint1 + 92
	set EndPoint2 to ((text offset of "<!-- In case all the names should be broken, uncomment the line below and comment the line above -->" in EachParagraph) - 1)
	set NewStartPoint to EndPoint2
	set Target2 to (characters Startpoint2 thru EndPoint2) of EachParagraph as string
	set Target2NumberChar to (number of characters in Target2)
	-- Both loops stop two characters early so "character (UpCount + 2)" stays inside Target2
	repeat with UpCount from 1 to (Target2NumberChar - 2)
		set test1 to character UpCount of Target2
		set test2 to character (UpCount + 1) of Target2
		set test3 to character (UpCount + 2) of Target2
		if test1 is equal to test2 and test2 is not equal to test3 and test3 is not equal to " " then
			set Target2 to (characters (UpCount + 2) thru (number of characters in Target2)) of Target2 as string
			exit repeat
		end if
	end repeat
	set Target2NumberChar to (number of characters in Target2)
	repeat with UpCount from 1 to (Target2NumberChar - 2)
		set test1 to character UpCount of Target2
		set test2 to character (UpCount + 1) of Target2
		set test3 to character (UpCount + 2) of Target2
		if test1 is equal to test2 and test2 is equal to test3 and test3 is equal to " " then
			beep
			set Target2 to (characters 1 thru (UpCount - 4)) of Target2 as string
			exit repeat
		end if
	end repeat
	display dialog Target1
end RetreiveInfo
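Both scripts trim the padding around Target2 by comparing characters three at a time. If the goal is simply to strip leading and trailing spaces, tabs and line breaks, a small helper like the hypothetical `trimWhitespace` below may be easier to reason about. It is a different technique from the original loops, which cut the text at specific positions rather than only removing whitespace, so it is worth checking its output against what you expect from the page. It relies on the `linefeed` constant, which assumes AppleScript 2.0 or later.

```applescript
-- Hypothetical helper (not part of the original scripts): removes leading and
-- trailing spaces, tabs, returns and linefeeds from theText.
on trimWhitespace(theText)
	set whitespaceChars to {space, tab, return, linefeed}
	set startIndex to 1
	set endIndex to (count characters of theText)
	-- Move startIndex forward past any leading whitespace
	repeat while startIndex ≤ endIndex and (character startIndex of theText) is in whitespaceChars
		set startIndex to startIndex + 1
	end repeat
	-- Move endIndex backward past any trailing whitespace
	repeat while endIndex ≥ startIndex and (character endIndex of theText) is in whitespaceChars
		set endIndex to endIndex - 1
	end repeat
	-- Nothing left if the text was empty or entirely whitespace
	if startIndex > endIndex then return ""
	return text startIndex thru endIndex of theText
end trimWhitespace
```

Inside RetreiveInfo this would be used as `set Target2 to trimWhitespace(Target2)` in place of the two character-comparison loops.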
Model: PowerBook G4 (old but still running strong)
AppleScript: 2.0.1
Browser: Safari 525.20.1
Operating System: Mac OS X (10.5)