Perhaps someone has already tried to do what I am trying to do: get the text of HTML elements without involving any browser.
If I knew how the getElementsByTagName('someTagName') and getElementsByClassName('someClassName') functions are implemented in Safari, I could reproduce them without the browser.
My understanding is that these functions 1) find all parts of the HTML that start and end with the tag (or class) name, and 2) strip the tag itself from the beginning and end of each part.
The following simple code, which I have started, only does the 2nd step.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
set theHTML to do shell script "curl 'https://www.google.gr'"
set elementsInnerStrings to my getElementsByTagName(theHTML, "script")
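on getElementsByTagName(theHTML as text, theTag as text)
	-- Split the HTML on the opening tag (only when it has attributes, e.g. "<script ") and on the closing tag
	set ATID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"<" & theTag & " ", "</" & theTag}
	set tagStrings to text items of theHTML
	set AppleScript's text item delimiters to ATID
	if (tagStrings as list) is {} then return {}
	set tempList to {}
	repeat with tagString in tagStrings
		-- Skip the pieces between elements (they start with ">") and the leading "<!doctype ..." piece
		if tagString does not start with ">" and tagString does not start with "<!" then set end of tempList to contents of tagString
	end repeat
	-- Each kept item still begins with the rest of the opening tag (its attributes and the closing ">")
	return tempList
end getElementsByTagName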
on getElementsByClassName(theHTML as text, theClass as text)
-- similar stuff
end getElementsByClassName
The request for help is: how can I do the same efficiently (without repeat loops, for example)?
Look into using NSXMLDocument.
Thanks for the tip, @technomorph. This really works:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set theHTML to do shell script "curl 'https://www.google.com'"
set elementsInnerStrings to my getElementsByTagName(theHTML, "script")
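on getElementsByTagName(theXML, tagName) -- gets tag text contents
	-- NSXMLDocumentTidyHTML lets NSXML repair ordinary HTML into well-formed XML before parsing
	set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
	if theXMLDoc is missing value then error (theError's localizedDescription() as text)
	-- "//tagName" matches every element with that name anywhere in the document
	set {theMatches, theError} to (theXMLDoc's nodesForXPath:("//" & tagName) |error|:(reference))
	if theMatches is missing value then error (theError's localizedDescription() as text)
	return (theMatches's valueForKey:"stringValue") as list
end getElementsByTagName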
Note that you can also create the NSXMLDocument straight from an NSURL.
Here are some other examples of XPaths.
Note that the parts in […] are filters (XPath predicates).
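A rough sketch of the NSURL route, in case it helps: NSXMLDocument has an initWithContentsOfURL:options:error: initializer, so the curl step can be dropped. The handler name getElementsByTagNameFromURL is made up for illustration, and the URL is simply the one used earlier in the thread.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set elementsInnerStrings to my getElementsByTagNameFromURL("https://www.google.com", "script")

-- Hypothetical handler name; loads the page directly instead of going through curl
on getElementsByTagNameFromURL(urlString, tagName)
	set theURL to current application's NSURL's URLWithString:urlString
	-- NSXMLDocumentTidyHTML asks NSXML to repair ordinary HTML into well-formed XML
	set {theXMLDoc, theError} to (current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:(current application's NSXMLDocumentTidyHTML) |error|:(reference))
	if theXMLDoc is missing value then error (theError's localizedDescription() as text)
	set {theMatches, theError} to (theXMLDoc's nodesForXPath:("//" & tagName) |error|:(reference))
	if theMatches is missing value then error (theError's localizedDescription() as text)
	return (theMatches's valueForKey:"stringValue") as list
end getElementsByTagNameFromURL
```

The download then happens inside Foundation rather than in a shell, so a failed request shows up as an NSError rather than as a curl error.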
static NSString* PlaylistNameXPath = @".//title";
// e.g. DJ K-Tel - Session #104 - 20th November 2021
//static NSString* PlaylistNameXPath = @"/html/meta[@property='og:title']";
static NSString* PlaylistTracksXPath = @".//div[@class='info']";
static NSString* PlaylistTracksTitleXPath = @".//div[@class='sub']";
// e.g. Who'll Stop The Rain?
static NSString* PlaylistTracksArtistXPath = @".//div[@class='head']";
// e.g. Creedence Clearwater Revival
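The same kind of [@class='…'] filter could be used from AsObjC to fill in the getElementsByClassName handler left as a stub earlier in the thread. This is only a sketch: the sample markup is made up, and the filter matches the class attribute exactly, so an element with class="info big" would not be found under "info".

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Made-up sample markup, only for illustration
set sampleHTML to "<html><body><div class=\"info\">First</div><div class=\"sub\">Other</div><div class=\"info\">Second</div></body></html>"
set infoStrings to my getElementsByClassName(sampleHTML, "info")

on getElementsByClassName(theHTML, className)
	set {theXMLDoc, theError} to (current application's NSXMLDocument's alloc()'s initWithXMLString:theHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference))
	if theXMLDoc is missing value then error (theError's localizedDescription() as text)
	-- "//*" looks at every element; the [...] filter keeps those whose class attribute equals className
	-- (assumes className contains no single quote)
	set theXPath to "//*[@class='" & className & "']"
	set {theMatches, theError} to (theXMLDoc's nodesForXPath:theXPath |error|:(reference))
	if theMatches is missing value then error (theError's localizedDescription() as text)
	return (theMatches's valueForKey:"stringValue") as list
end getElementsByClassName
```

Replacing "//*" with "//div" would restrict the search to div elements, as in the Objective-C paths above.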
Quite interesting examples. Many applications are noticeably slow at retrieving objects when filters are applied. One notable example is getting a list of Calendar.app events, which is very slow. I'll think about using AsObjC to speed up some application-related scripts.
AppleScript events are "expensive".
You may want to look into the EventKit framework (using AsObjC):
https://developer.apple.com/documentation/eventkit?language=objc
I found the same with iTunes and now use the iTunesLibrary framework (the ITLibrary class).
It is much faster, though it is read-only.
It seems that with EventKit you can create items and modify them.