Relative newbie to AppleScript with some VBA experience. Was wondering if anyone could help me with an issue I’m having with parsing XML.
From what I have found here, I have put together an AppleScript that lets the user select an XML file and then extracts the relevant elements from it, which I will then need to process further. It goes like this:
set theXMLFile to ((choose file) as string)
tell application "System Events"
set theXMLFile to XML file theXMLFile
set planEvent to XML elements of XML element "schedule" of theXMLFile whose name is "Event"
set planDate to {}
set planTime to {}
set titleName to {}
set episodeName to {}
set titleID to {}
set episodeID to {}
set txEvent to {}
repeat with i from 1 to (count planEvent)
set planDate to planDate & value of XML element "Date" of item i of planEvent
set planTime to planTime & value of XML element "Time" of item i of planEvent
set titleName to titleName & value of XML element "Title" of item i of planEvent
set episodeName to episodeName & value of XML element "Episode" of item i of planEvent
set titleID to titleID & value of XML element "umbrellaID" of item i of planEvent
set episodeID to episodeID & value of XML element "specificID" of item i of planEvent
end repeat
end tell
I tested this on a small-ish file with 191 Event elements and it took 15 seconds or so, which is fine. I then tried it with a much longer document, around 1500 Event elements, and it took just over 17 minutes, which is unusable for us.
Am I doing something fundamentally wrong? Should I be trying a different angle? Or are the longer files always going to be too large for the script to handle in a usable time?
You could probably speed it up a bit by changing “set planDate to planDate &” to “set end of planDate”, and similarly for the other lists. But as you’ve seen, System Events is very inefficient for XML of any real size.
It would be helpful if you could post a snippet of the XML, but something like this, using AppleScriptObjC, is probably going to be faster:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event" |error|:(reference)
if planEvents = missing value then error (theError's localizedDescription() as text)
set planDates to {}
-- etc
repeat with aNode in planEvents
set end of planDates to (aNode's elementsForName:"Date")'s firstObject()'s stringValue() as text
-- etc
end repeat
I also wonder why you’re making lists, other than for testing. It’s generally more efficient to extract the values as you actually need them.
I haven’t used System Events’ XML Suite much myself, but a couple of things which are probably affecting your script’s speed are:
Growing the lists by concatenation. Each concatenation creates a new list, so you can imagine the number of ever-longer lists that’s producing with 6 list variables * 1500 event elements! It’s more efficient to set the end of each list to the item you want to add. That way you’re just adding things to the same lists (except when the system reallocates memory in the background) instead of making and discarding thousands of new lists.
Less significantly, you could assign item i of planEvent to a variable and then use the variable instead of ‘item i of planEvent’ every time, so that each item of planEvent is only fetched from the list once instead of six times.
You may also be able to speed things up by using “references” to the variables that hold the lists to which you’re adding values, that is, referring to the variables as belonging to something instead of just naming them (see the script below). AppleScript has a quirk whereby this can speed up accesses to multiple items in very long lists.
I’m not able to test this script, but I’d guess it’s a bit faster than your original. An ASObjC version might possibly be faster still, but I’ll leave that to someone more familiar with XML handling. (Edit: I see Shane’s posted such a script while I’ve been fiddling with this.)
set theXMLFile to ((choose file) as string)
tell application "System Events"
-- Initialise a script object at run time through which these list variables can be referenced.
script
property planDate : {}
property planTime : {}
property titleName : {}
property episodeName : {}
property titleID : {}
property episodeID : {}
property txEvent : {}
end script
set o to result -- Call it, say, 'o'.
set theXMLFile to XML file theXMLFile
set planEvent to XML elements of XML element "schedule" of theXMLFile whose name is "Event"
repeat with i from 1 to (count planEvent)
-- Get this item of planEvent just once.
set thisItem to item i of planEvent
-- set the 'end' of each list, referencing the list variables as belonging to the script object 'o'.
set end of o's planDate to value of XML element "Date" of thisItem
set end of o's planTime to value of XML element "Time" of thisItem
set end of o's titleName to value of XML element "Title" of thisItem
set end of o's episodeName to value of XML element "Episode" of thisItem
set end of o's titleID to value of XML element "umbrellaID" of thisItem
set end of o's episodeID to value of XML element "specificID" of thisItem
end repeat
end tell
I tried Nigel’s suggestion, but, while it worked, it didn’t result in a significant time improvement. Sorry.
I was a little scared of Shane’s as I haven’t worked my way up to ASObjC yet and am a little wary of copying code into my projects that I don’t understand. But, putting my best foot forward, I tried it anyway and got an error on this line:
set {planEvents, theError} to theXMLDoc's nodesForXpath:"//schedule/Event" |error|:(reference)
And yes, I should have included a snippet of the XML itself:
Taking the clues from Shane’s script, this is certainly faster than the System Events efforts!
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set planEvent to theXMLDoc's rootElement()'s elementsForName:"Event"
-- Initialise a script object at run time through which these list variables can be referenced.
script
property planDate : {}
property planTime : {}
property titleName : {}
property episodeName : {}
property titleID : {}
property episodeID : {}
-- property txEvent : {} -- Not used?
end script
set o to result -- Call it, say, 'o'.
repeat with aNode in planEvent
set end of o's planDate to (aNode's elementsForName:"Date")'s firstObject()'s stringValue() as text
set end of o's planTime to (aNode's elementsForName:"Time")'s firstObject()'s stringValue() as text
set end of o's titleName to (aNode's elementsForName:"Title")'s firstObject()'s stringValue() as text
set end of o's episodeName to (aNode's elementsForName:"Episode")'s firstObject()'s stringValue() as text
set end of o's titleID to (aNode's elementsForName:"umbrellaID")'s firstObject()'s stringValue() as text
set end of o's episodeID to (aNode's elementsForName:"specificID")'s firstObject()'s stringValue() as text
end repeat
I tried making a test file with about 1500 elements, as per the OP, and rewrote the script to use multiple XPath queries, thus avoiding all the stuff in the repeat loop. It speeds things up again, by a factor of about 4:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set planDate to ((theXMLDoc's nodesForXPath:"//schedule/Event/Date" |error|:(missing value))'s valueForKey:"stringValue") as list
set planTime to ((theXMLDoc's nodesForXPath:"//schedule/Event/Time" |error|:(missing value))'s valueForKey:"stringValue") as list
set titleName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Title" |error|:(missing value))'s valueForKey:"stringValue") as list
set episodeName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Episode" |error|:(missing value))'s valueForKey:"stringValue") as list
set titleID to ((theXMLDoc's nodesForXPath:"//schedule/Event/umbrellaID" |error|:(missing value))'s valueForKey:"stringValue") as list
set episodeID to ((theXMLDoc's nodesForXPath:"//schedule/Event/specificID" |error|:(missing value))'s valueForKey:"stringValue") as list
By my count, we’re down from about 17 minutes to <0.4 seconds.
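One thing to watch with the separate-queries approach: each XPath query returns only the nodes that actually exist, so if any Event is missing one of its child elements, the lists come out with different lengths and item i of one list no longer belongs to the same Event as item i of another. The sketch below illustrates this with made-up sample data (the XML string and the listsLineUp variable are assumptions for illustration, not from the OP’s file); a simple count comparison is enough to catch the problem.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data: the second Event has no Episode element.
set sampleXML to "<schedule>"
set sampleXML to sampleXML & "<Event><Title>Show A</Title><Episode>Pilot</Episode></Event>"
set sampleXML to sampleXML & "<Event><Title>Show B</Title></Event>"
set sampleXML to sampleXML & "</schedule>"

-- Parse from a string here only so the example is self-contained.
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:sampleXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

set titleName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Title" |error|:(missing value))'s valueForKey:"stringValue") as list
set episodeName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Episode" |error|:(missing value))'s valueForKey:"stringValue") as list

-- If the counts differ, the lists can no longer be matched up by index.
set listsLineUp to ((count titleName) = (count episodeName))
```

If every Event in your files is guaranteed to have all six child elements, this doesn’t arise. If not, looping over the Event nodes (as in the earlier scripts) keeps each Event’s values together at the cost of some speed.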
Thanks so much for that. Yes, it’s almost instant now. Guess I am going to have to invest some time in learning some ASObjC to figure out what does what.
Yes, thanks for this. I myself am now seeing the power of ObjC as I get into the AppleScript world.
I started an AppleScript using Satimage’s LibXML library.
I got everything figured out and accomplished what I wanted to on my smaller XML file, which had maybe 150 lines or so in it. (I’m working on parsing a Traktor DJ software music library.)
But when I went to use it on my full library, where one node had over 50,000 entries in it, it didn’t even get through it.
I’ve taken the last code posted above and modified it to get the info that I need, and it’s really fast!
A few questions, though:
1) When I run the script (in Script Debugger 6.0.6) it completes running.
But then about 5-15 seconds after it completes, Script Debugger crashes.
Any ideas here?
2) For many of my tasks, what I’m looking to do is:
A) first find only the entries that contain a particular attribute value
B) then grab all of the entries and attributes I need from that filtered list into a new list
What’s the best way to do this?
2.1) I can come up with an XPath filter that would get me my sublist,
2.2) but I’m not clear on how to point further XPath requests at it,
as all the code above seems to send its queries to the doc.
//---------------
EDIT: after reviewing more code in this thread, I’m now thinking I should use
something like this:
set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event" |error|:(reference)
And then, rather than using {theXMLDoc, theError} when getting my data, I should use:
{planEvents, theError}
--------------//
Or
2.3) should I just gather all the data that I need into a list/record and then process it from
there?
3.1) Is there any “dictionary” available to help break down the code listed in this thread,
so that I can understand exactly what is happening here?
(I’ve found, and realize I should get, Shane’s book!)
3.3) How can I access any ObjC dictionary in Script Debugger 6?
I realize that I can use the raw syntax, but that doesn’t
really give much info.
4) Which OSAX libraries should I look at that will help
me further with any other ObjC usage in AppleScript?
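On question 2.2, a minimal sketch of one way this can work: nodesForXPath:error: is an NSXMLNode method, so it can be sent to any node returned by an earlier query, not just to the document, and a path starting with "./" is then resolved relative to that node. The sample XML, the Date value in the predicate, and the variable names below are assumptions for illustration only.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data using element names from earlier in the thread.
set sampleXML to "<schedule>"
set sampleXML to sampleXML & "<Event><Date>2016-01-01</Date><Title>Show A</Title></Event>"
set sampleXML to sampleXML & "<Event><Date>2016-01-02</Date><Title>Show B</Title></Event>"
set sampleXML to sampleXML & "</schedule>"

set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:sampleXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- Step 1: one query against the document; the predicate in [...] does the filtering.
set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event[Date='2016-01-02']" |error|:(reference)
if planEvents = missing value then error (theError's localizedDescription() as text)

-- Step 2: further queries go to each matching node, with paths relative to that node.
set theTitles to {}
repeat with anEvent in planEvents
	set titleNodes to (anEvent's nodesForXPath:"./Title" |error|:(missing value))
	if (titleNodes's |count|()) > 0 then set end of theTitles to (titleNodes's firstObject()'s stringValue()) as text
end repeat
```

Note that planEvents is an array of nodes, so the second query is sent to each item in turn rather than to the array itself.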
You should be running 6.0.8, at least. Preferably, version 7.
XPath is by far the quickest.
Apple’s documentation is available on-line, or via Xcode. But it’s written with Objective-C in mind, so it can take a bit of effort to apply it to AppleScript.
There is no dictionary for Objective-C. You have to rely on Apple’s documentation.
I’m afraid scripting additions aren’t going to help, either.
The best resources are existing code (around here and other places), Apple’s documentation, my book or some similar guide to the basics, and a bit of time.
OK, thanks.
Now my next question is: how do I create an array using an XPath query where
I only want to get the elements whose child element has an attribute of a certain value?
See attached XML below:
What I’m trying to get is:
All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”
Just for info for all (you and Shane know this), I tested the following script:
property myArray : missing value
set myString to LoopWithString() -- test changing the handler
on LoopWithString()
set startDate to current date
set myString to ""
repeat with j from 1 to 50000
set myString to myString & j
end repeat
set endDate to current date
set elapsedTime to endDate - startDate
return elapsedTime -- 6 seconds on MacBook Pro 2015
end LoopWithString
on LoopWithArraySlow()
set startDate to current date
set myArray to {}
repeat with j from 1 to 50000
copy j to end of myArray -- no "my" here, which is what makes this version slow
end repeat
set myString to myArray as string
set endDate to current date
set elapsedTime to endDate - startDate
return elapsedTime -- 37 seconds on MacBook Pro 2015
end LoopWithArraySlow
on LoopWithArrayFast()
-- Note the use of word "my" to access the array
set startDate to current date
set myArray to {}
repeat with j from 1 to 50000
copy j to end of my myArray
end repeat
set myString to myArray as string
set endDate to current date
set elapsedTime to endDate - startDate
return elapsedTime -- 1 second on MacBook Pro 2015
end LoopWithArrayFast
These are the results with a loop of 50,000 iterations.
I remember when I discovered the trick about the “magic” word “my” many years ago…
//COLLECTION/ENTRY/.[//@RATING='Search In Playlists']
but wondering if this is “better”
--//COLLECTION/ENTRY/*//[@RATING='Search In Playlists']
Once I got my XPath figured out and working on my small XML,
I ran it on my large XML, which has nearly 50,000 ENTRY items,
and it didn’t complete.
Next I thought I would just load all of my //COLLECTION/ENTRY items into an array.
Then I looped through each item, figured out how to check whether it matched my
criteria, and added it to a new array.
This worked fine again on my small XML,
but I still haven’t been able to get it to work reasonably with the
large XML.
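A possible reason the first expression struggles on the large file: inside a predicate, a path that begins with // is resolved from the document root, not from the ENTRY being tested. So [//@RATING=...] asks “is there a matching RATING anywhere in the document?” for every ENTRY, which both matches too much and may rescan the document each time. A relative path in the predicate tests only the ENTRY’s own INFO children. Below is a minimal sketch with made-up sample data; the NML root element, the TITLE attribute, and the non-matching RATING value are assumptions, since the real file isn’t shown here.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data; the structure and attribute names are assumed.
set sampleXML to "<NML><COLLECTION>"
set sampleXML to sampleXML & "<ENTRY TITLE='Track One'><INFO RATING='Search In Playlists'/></ENTRY>"
set sampleXML to sampleXML & "<ENTRY TITLE='Track Two'><INFO RATING='Something Else'/></ENTRY>"
set sampleXML to sampleXML & "<ENTRY TITLE='Track Three'><INFO RATING='Search In Playlists'/></ENTRY>"
set sampleXML to sampleXML & "</COLLECTION></NML>"

set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:sampleXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- The predicate path INFO/@RATING is relative to each ENTRY being tested.
set entryPath to "//COLLECTION/ENTRY[INFO/@RATING='Search In Playlists']"
set {matchingEntries, theError} to theXMLDoc's nodesForXPath:entryPath |error|:(reference)
if matchingEntries = missing value then error (theError's localizedDescription() as text)
set matchCount to (matchingEntries's |count|()) as integer

-- Extending the same path pulls an attribute straight out of the filtered entries.
set matchingTitles to ((theXMLDoc's nodesForXPath:(entryPath & "/@TITLE") |error|:(missing value))'s valueForKey:"stringValue") as list
```

Whether this is fast enough on 50,000 entries is something only a test on the real file will show, but it avoids the per-item AppleScript loop entirely.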
ASObjC and OSAX libraries are not related. Their purposes are entirely different and neither is the successor of the other.
An OSAX is, as the name implies, a library that adds functionality to the programming language, similar to extensions in PHP or modules in Python. The purpose of an OSAX is to expand the functionality and usability of AppleScript.
ASObjC is a window to another environment and has more in common with a do shell script command than with an OSAX. Its goal is to let you use the OS’s base SDK in AppleScript, but you still have to program everything yourself (in an un-AppleScript-like syntax).
In practice, the difference is that with a well-written OSAX, the entire ASObjC routine written by Shane here could be done with a single AppleScript command using libxml.
To answer your question: for ASObjC usage you don’t need any OSAX at all.