Total noob here! I’m attempting to scrape live timing from the websites kart tracks use to display race results. At this point, I’d like to grab the data in the table every 10 to 15 seconds and dump it to a CSV file that I will eventually analyze in Excel.
I’ve been able to grab PDFs of the page using FakeApp for Mac, but extracting the data I want from the PDFs is challenging. I’ve poked around using AppleScript, but I don’t know how to access the data I want from the page.
I’ve been able to figure out that the XPath is //div[3]/table/tbody/tr/td
My questions:
What AppleScript commands should I investigate to capture the data?
What terms should I use to search this site for help on this task?
Here is what I have tried thus far, with no results:
tell application "Safari"
	activate
	open location "http://vlcharlotte.clubspeedtiming.com/sp_center/livescore.aspx"
	delay 3
	-- first attempt: XPath lookup via document.evaluate
	set selectedText to (do JavaScript "document.evaluate('//div[3]/table/tbody/tr[1]/td[3]',document,null,9,null).singleNodeValue;" in document 1)
	-- set selectedText to (do JavaScript "document.getElementByID(//div[3]/table/tbody/tr[1]/td[3]).innerHTML" in document 1)
	display dialog selectedText
end tell
tell application "Safari"
	activate
	open location "http://vlcharlotte.clubspeedtiming.com/sp_center/livescore.aspx"
	delay 3
	tell current tab of window 1
		do JavaScript "testg = document.getElementByID('//div[3]/table/tbody/tr[1]/td[3]').innerText;"
		-- do JavaScript "dr1 = document.selectNodes('//div[3]/table/tbody/tr[1]/td[3]').value;"
		-- do JavaScript "dr2 = document.selectNodes('//div[3]/table/tbody/tr[2]/td');"
		-- do JavaScript "dr3 = document.selectNodes('//div[3]/table/tbody/tr[3]/td');"
		do JavaScript "alert(dr1);"
	end tell
end tell
Since AppleScript is new to you, and probably shell scripting (curl) as well, I’d say the easiest way for now is to use Safari and its do JavaScript command to scrape data from the site.
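For example, this is the basic pattern (a minimal sketch, assuming the page is already open in the front window; note that do JavaScript can only hand strings and numbers back to AppleScript, not DOM nodes):

tell application "Safari"
	-- do JavaScript runs inside the page and returns the value
	-- of the last expression as a string or number
	set pageTitle to do JavaScript "document.title;" in document 1
	display dialog pageTitle
end tell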
For using JavaScript in Safari, the easiest way to search is to look for a JavaScript method like getElementById().
The next step would be using curl in a do shell script, which makes the script faceless and independent of a browser. You have more control and the script looks more professional. You can pipe the data from the site through any text processor, which will hand the right data back to your script.
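A rough sketch of that approach (the sed pattern here is only a placeholder; you would adapt it to the actual HTML of the live-timing page):

-- fetch the raw HTML without involving a browser
set pageSource to do shell script "curl -s 'http://vlcharlotte.clubspeedtiming.com/sp_center/livescore.aspx'"
-- pipe it through a text processor; this placeholder pattern pulls the contents of <td> cells
set cellData to do shell script "curl -s 'http://vlcharlotte.clubspeedtiming.com/sp_center/livescore.aspx' | sed -n 's/.*<td[^>]*>\\([^<]*\\)<\\/td>.*/\\1/p'"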
My problem is that the URL doesn’t work on my machine: it loads a blank page, so either the site is using protocols other than HTTP (port 80) or it isn’t published worldwide. Your problem is that you’re feeding an XPath expression to getElementById() (note the lowercase d at the end; JavaScript is case sensitive), which expects an id value, not a path. Also, HTML is not valid XML (XHTML is, by the way), so XPath expressions are only guaranteed to work on a valid XML DOM. getElementById() is the best way to find an element in a document because an id must be unique within the document. However, an element isn’t required to have an id attribute, so a web designer/developer gives an id to the elements he’ll need later in his code to make them easy to find.
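For example (the id lblLeader below is hypothetical; view the page source to find the real ids the developer used):

tell application "Safari"
	-- 'lblLeader' is a made-up id; replace it with one that actually exists in the page
	set leaderName to do JavaScript "document.getElementById('lblLeader').textContent;" in document 1
end tell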
If getElementById() doesn’t work (the element you want has no id attribute), you can find elements in other ways; I prefer getElementsByClassName(), getElementsByName() and getElementsByTagName(), in that order. Class names are the most common (CSS class names) and the name attribute is less used, but both are designer/developer-chosen names that can be unique in some cases, which makes elements easy to find. getElementsByTagName() will definitely get you the element you want, but it returns a collection of elements in which you have to work out which one you’re looking for, as in the sketch below.
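To tie this back to your CSV goal, here’s a rough sketch. My assumptions: the front Safari document is the live-timing page, every td on the page belongs to the results table, and ~/Desktop/livescore.csv is a path I picked, not anything the site dictates.

tell application "Safari"
	open location "http://vlcharlotte.clubspeedtiming.com/sp_center/livescore.aspx"
	delay 3
end tell

repeat 10 times -- ten samples, one every 10 seconds; adjust to taste
	tell application "Safari"
		-- collect every table cell's text and join it into one comma-separated line;
		-- do JavaScript returns the value of the last expression (the joined string)
		set csvLine to do JavaScript "var tds = document.getElementsByTagName('td'), out = []; for (var i = 0; i < tds.length; i++) { out.push(tds[i].textContent); } out.join(',');" in document 1
	end tell
	-- append the line to a CSV file on the Desktop
	do shell script "echo " & quoted form of csvLine & " >> ~/Desktop/livescore.csv"
	delay 10
end repeat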