XML parser running slowly

Hi,

Relative newbie to AppleScript with some VBA experience. I was wondering if anyone could help me with an issue I’m having with parsing XML.

From what I have found here, I have put together an AppleScript that lets the user select an XML file and then extracts the relevant elements from it, which I then need to process further. It goes like this:

set theXMLFile to ((choose file) as string)
tell application "System Events"
	set theXMLFile to XML file theXMLFile
	set planEvent to XML elements of XML element "schedule" of theXMLFile whose name is "Event"
	set planDate to {}
	set planTime to {}
	set titleName to {}
	set episodeName to {}
	set titleID to {}
	set episodeID to {}
	set txEvent to {}
	repeat with i from 1 to (count planEvent)
		set planDate to planDate & value of XML element "Date" of item i of planEvent
		set planTime to planTime & value of XML element "Time" of item i of planEvent
		set titleName to titleName & value of XML element "Title" of item i of planEvent
		set episodeName to episodeName & value of XML element "Episode" of item i of planEvent
		set titleID to titleID & value of XML element "umbrellaID" of item i of planEvent
		set episodeID to episodeID & value of XML element "specificID" of item i of planEvent
	end repeat
end tell

I tested this on a small-ish file with 191 Event elements and it took 15 seconds or so, which is fine. I then tried it with a much longer document, around 1500 Event elements, and it took just over 17 minutes, which is unusable for us.

Am I doing something fundamentally wrong? Should I be trying a different angle? Or are the longer files always going to be too large for the script to handle in a usable time?

Many thanks

You could probably speed it up a bit by changing “set planDate to planDate &” to “set end of planDate”, and similarly for the other lists. But as you’ve seen, System Events is very inefficient for XML of any real size.

It would be helpful if you could post a snippet of the XML, but something like this, using AppleScriptObjC, is probably going to be faster:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event" |error|:(reference)
if planEvents = missing value then error (theError's localizedDescription() as text)
set planDates to {}
-- etc
repeat with aNode in planEvents
	set end of planDates to (aNode's elementsForName:"Date")'s firstObject()'s stringValue() as text
	-- etc
end repeat

I also wonder why you’re making lists, other than for testing. It’s generally more efficient to extract the values as you actually need them.

Fixed typo: nodesForXpath should be nodesForXPath

Hi. Welcome to MacScripter.

I haven’t used System Events’ XML Suite much myself, but a couple of things which are probably affecting your script’s speed are:

  1. Growing the lists by concatenation. Each concatenation creates a new list, so you can imagine the number of ever-longer lists that’s producing with 6 list variables * 1500 event elements! It’s more efficient to set the end of each list to the item you want to add. That way you’re just adding things to the same lists (except when the system reallocates memory in the background) instead of making and discarding thousands of new lists.
  2. Less significantly, you could assign item i of planEvent to a variable and then use the variable instead of ‘item i of planEvent’ every time, so that each item of planEvent is only fetched from the list once instead of six times.
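
To illustrate point 1, here is a minimal sketch in plain AppleScript (no XML involved, and the variable names are only illustrative) that contrasts the two ways of growing a list. It is not a benchmark; the difference only becomes noticeable as the lists get long.

```applescript
-- Concatenation: every pass creates a brand-new, one-item-longer list.
set slowList to {}
repeat with i from 1 to 1500
	set slowList to slowList & i
end repeat

-- Appending: every pass adds an item to the end of the same list.
set fastList to {}
repeat with i from 1 to 1500
	set end of fastList to i
end repeat

-- Both loops build the same list; only the amount of work differs.
return (slowList = fastList)
```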

You may also be able to speed things up by using “references” to the variables that hold the lists to which you’re adding values, that is, referring to the variables as belonging to something instead of just naming them (see the script below). AppleScript has a quirk whereby this can speed up accesses to multiple items in very long lists.

I’m not able to test this script, but I’d guess it’s a bit faster than your original. An ASObjC version might possibly be faster still, but I’ll leave that to someone more familiar with XML handling. (Edit: I see Shane’s posted such a script while I’ve been fiddling with this.) :slight_smile:

set theXMLFile to ((choose file) as string)
tell application "System Events"
	-- Initialise a script object at run time through which these list variables can be referenced.
	script
		property planDate : {}
		property planTime : {}
		property titleName : {}
		property episodeName : {}
		property titleID : {}
		property episodeID : {}
		property txEvent : {}
	end script
	set o to result -- Call it, say, 'o'.
	
	set theXMLFile to XML file theXMLFile
	set planEvent to XML elements of XML element "schedule" of theXMLFile whose name is "Event"
	repeat with i from 1 to (count planEvent)
		-- Get this item of planEvent just once.
		set thisItem to item i of planEvent
		-- set the 'end' of each list, referencing the list variables as belonging to the script object 'o'.
		set end of o's planDate to value of XML element "Date" of thisItem
		set end of o's planTime to value of XML element "Time" of thisItem
		set end of o's titleName to value of XML element "Title" of thisItem
		set end of o's episodeName to value of XML element "Episode" of thisItem
		set end of o's titleID to value of XML element "umbrellaID" of thisItem
		set end of o's episodeID to value of XML element "specificID" of thisItem
	end repeat
end tell

Many thanks for the replies.

I tried Nigel’s suggestion, but, while it worked, it didn’t result in a significant time improvement. Sorry :frowning:

I was a little scared of Shane’s as I haven’t worked my way up to ASObjC yet and am a little wary of copying code into my projects that I don’t understand. But, putting my best foot forward, I tried it anyway and got an error on this line:

set {planEvents, theError} to theXMLDoc's nodesForXpath:"//schedule/Event" |error|:(reference)

And yes, I should have included a snippet of the XML itself:

Thanks again.

Taking the clues from Shane’s script, this is certainly faster than the System Events efforts!

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set planEvent to theXMLDoc's rootElement()'s elementsForName:"Event"

-- Initialise a script object at run time through which these list variables can be referenced.
script
	property planDate : {}
	property planTime : {}
	property titleName : {}
	property episodeName : {}
	property titleID : {}
	property episodeID : {}
	-- property txEvent : {} -- Not used?
end script
set o to result -- Call it, say, 'o'.

repeat with aNode in planEvent
	set end of o's planDate to (aNode's elementsForName:"Date")'s firstObject()'s stringValue() as text
	set end of o's planTime to (aNode's elementsForName:"Time")'s firstObject()'s stringValue() as text
	set end of o's titleName to (aNode's elementsForName:"Title")'s firstObject()'s stringValue() as text
	set end of o's episodeName to (aNode's elementsForName:"Episode")'s firstObject()'s stringValue() as text
	set end of o's titleID to (aNode's elementsForName:"umbrellaID")'s firstObject()'s stringValue() as text
	set end of o's episodeID to (aNode's elementsForName:"specificID")'s firstObject()'s stringValue() as text
end repeat

Aaargh – I fixed the spelling here, but forgot to update it there. It’s XPath, not Xpath.

Nigel’s latest version accomplishes the same thing. I was using an XPath query because at that stage I was guessing the actual format of your XML.

I tried making a test file with about 1500 elements, as per the OP, and rewrote the script to use multiple XPath queries, thus avoiding all the stuff in the repeat loop. It speeds things up again, by a factor of about 4:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set planDate to ((theXMLDoc's nodesForXPath:"//schedule/Event/Date" |error|:(missing value))'s valueForKey:"stringValue") as list
set planTime to ((theXMLDoc's nodesForXPath:"//schedule/Event/Time" |error|:(missing value))'s valueForKey:"stringValue") as list
set titleName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Title" |error|:(missing value))'s valueForKey:"stringValue") as list
set episodeName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Episode" |error|:(missing value))'s valueForKey:"stringValue") as list
set titleID to ((theXMLDoc's nodesForXPath:"//schedule/Event/umbrellaID" |error|:(missing value))'s valueForKey:"stringValue") as list
set episodeID to ((theXMLDoc's nodesForXPath:"//schedule/Event/specificID" |error|:(missing value))'s valueForKey:"stringValue") as list

By my count, we’re down from about 17 minutes to <0.4 seconds :cool:

Oh dear. I hope that’s fast enough. :wink:

:cool:

Wow! :smiley:

Thanks so much for that. Yes, it’s almost instant now. Guess I am going to have to invest some time in learning some ASObjC to figure out what does what.

Thanks again.

Yes, thanks for this. I myself am now seeing the power of ObjC as I’m going into the AppleScript world.

I started an AppleScript using Satimage’s LibXML library.
I got everything figured out and accomplished what I wanted to on my smaller XML file that had maybe 150 lines or so in it. (I’m working on parsing a Traktor DJ software music library.)
But when I went to use it on my full library, where one node had over 50,000 entries in it,
well, it didn’t even get through it.

I’ve taken the last code posted above and modified it to get the info that I need and it’s really fast!
Few questions though:

  1. When I run the script (in Script Debugger 6.0.6) it completes running.
    But then about 5-15 secs after it completes running, Script Debugger crashes.
    Any ideas here?

  2. For many of my tasks what I’m looking to do is:
    A) first find only certain entries that contain a particular attribute value
    B) then grab all of the Entries and Attributes I need from that Filtered List into a new list

    What’s the best way to do this?
    2.1) I can come up with an XPath filter that would get me my sublist,
    2.2) but I’m not clear on how to point further XPath requests towards this,
    as all the code above seems to point the requests for values towards the doc.

//---------------
EDIT: After reviewing more code in this thread, I’m now thinking I should use
something like this:

set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event" |error|:(reference)

And then, rather than using the {XMLDoc, theError} when getting my data, I should use:
{planEvents, theError}
--------------//

Or
2.3) should I just gather all the data that I need into a list/record and then process from
       there?

3.1) Is there any “dictionary” available to help break down the code that’s listed in this thread,
so that I can understand what exactly is happening here?
(I’ve found and realize I should get Shane’s book!)

3.2) How can I access any ObjC dictionary in Script Debugger 6?
I realize that I can use the RawSyntax, but that doesn’t
really give much info.

4) What OSAX libraries should I look at using that will help
me further with any other ObjC usage in AppleScript?

Thanks so much

Kerry

You should be running 6.0.8, at least. Preferably, version 7.

XPath is by far the quickest.

Apple’s documentation is available on-line, or via Xcode. But it’s written with Objective-C in mind, so it can take a bit of effort to apply it to AppleScript.

There is no dictionary for Objective-C. You have to rely on Apple’s documentation.

I’m afraid scripting additions aren’t going to help, either.

The best resources are existing code (around here and other places), Apple’s documentation, my book or some similar guide to the basics, and a bit of time.
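
On question 2.2 above: as far as I know, nodesForXPath:error: is available on individual nodes as well as on the document, so a node returned by one query can itself be queried. A minimal sketch, assuming the schedule/Event/Date layout from earlier in the thread (the variable names dateNodes and planDates are only illustrative):

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- First query: run against the whole document.
set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event" |error|:(reference)
if planEvents = missing value then error (theError's localizedDescription() as text)

set planDates to {}
repeat with aNode in planEvents
	-- Second query: "./Date" is evaluated relative to this Event node, not the document.
	set dateNodes to (aNode's nodesForXPath:"./Date" |error|:(missing value))
	-- Skip Events that have no Date child (or where the query failed).
	if dateNodes is not missing value and (dateNodes's |count|()) > 0 then
		set end of planDates to (dateNodes's firstObject()'s stringValue()) as text
	end if
end repeat
```

Looping like this gives up some of the speed of the one-query-per-list approach, so it is probably best kept for cases where the values are needed node by node.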

Thanks Shane. For others finding this amazing thread:
I’ve found this online and it’s helping clear things up a bit:

https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/NSXML_Concepts/NSXML.html#//apple_ref/doc/uid/TP40001263-SW1

OK thanks,
Now my next question is: how do I create an array using an XPath query where
I only want to get the elements whose child element has an attribute of a certain value?

See attached XML below:

what I’m trying to get is:

All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”

I’ve tried everything with no luck.

thanks

Kerry

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<NML VERSION="19">
	<HEAD COMPANY="www.native-instruments.com" PROGRAM="Traktor"/>
	<COLLECTION ENTRIES="20">
		<ENTRY LOCK="0" LOCK_MODIFICATION_TIME="2017-12-29T16:02:30" TITLE="All My Friends (Franz Ferdinand Version)" ARTIST="LCD Soundsystem">
			<LOCATION DIR="/:Users/:kerry/:Music/:iTunes/:iTunes Media/:Music/:LCD Soundsystem/:All My Friends - EP/:" FILE="All My Friends (Franz Ferdinand Version).m4p" VOLUME="Tekno" VOLUMEID="Tekno"/>
			<ALBUM TRACK="2" TITLE="All My Friends - EP"/>
			<MODIFICATION_INFO AUTHOR_TYPE="user"/>
			<INFO BITRATE="128000" GENRE="Alternative" COMMENT="missing 20180401" RATING="Search In Playlists" PLAYTIME="353" IMPORT_DATE="2018/4/29" FLAGS="10" FILESIZE="5582" COLOR="1"/>
			<TEMPO BPM_QUALITY="100.000000"/>
		</ENTRY>
		<ENTRY MODIFIED_DATE="2018/12/30" MODIFIED_TIME="27503" LOCK="0" LOCK_MODIFICATION_TIME="2017-12-30T07:37:56" AUDIO_ID="AMMAARIREhEREhEhIzMyERIjMzMhEjRERjVEM1VDNGmprIdZqZepiHq7q6iXdnqruruImsqqqIl5iLvcvdQBIyMhEiMiMyIhNEVUVUNURDM0NJmpqohZuZe4mGqqrKmWdoq7uruImsqq2HiImLzcrLvLrIiavKq5iJiYrM3KiZm726zLuGd2dlZmZ3Zmd4mqvImHmruph2mZzamoeanOzLlmR3dmdHRoZ1dmZHZ2d1dHdnaqzNy8yqmYiIeXiIl4iomIh4iJeImoiIqJu9ysl4qsqrqImKu62YmIqKzMvLqrvJiaq6qqeIm6qsl4d5eHdmhndnVEQzMiIRAAAAAAAA==" TITLE="Jessie's Girl" ARTIST="Rick Springfield">
			<LOCATION DIR="/:Users/:kerry/:Music/:iTunes/:iTunes Media/:Music/:Compilations/:30 Stars of the 80's/:" FILE="Jessie's Girl.m4a" VOLUME="Tekno" VOLUMEID="Tekno"/>
			<ALBUM OF_TRACKS="30" TRACK="5" TITLE="Billboard Hot 100 Singles 1981"/>
			<MODIFICATION_INFO AUTHOR_TYPE="user"/>
			<INFO BITRATE="2822400" GENRE="Billboard" COMMENT="missing 20180401" RATING="Search In Playlists" COVERARTID="051/TNH244BFYPFWHBIZEDMXD31H1BNC" KEY="D" PLAYTIME="196" PLAYTIME_FLOAT="195.466660" RANKING="102" IMPORT_DATE="2018/4/29" RELEASE_DATE="1981/1/1" FLAGS="14" COLOR="1"/>
			<TEMPO BPM="131.627625" BPM_QUALITY="100.000000"/>
			<LOUDNESS PEAK_DB="0.930028" PERCEIVED_DB="-0.424675" ANALYZED_DB="-0.424675"/>
			<MUSICAL_KEY VALUE="2"/>
			<CUE_V2 NAME="AutoGrid" DISPL_ORDER="0" TYPE="4" START="352.527589" LEN="0.000000" REPEATS="-1" HOTCUE="0"/>
		</ENTRY>
		
	</COLLECTION>
	
</NML>

That’s really an XPath question, not an AppleScript question, so this may not be the best place to ask. You might try something like: https://www.w3schools.com/xml/xpath_intro.asp
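
That said, here is a minimal sketch of one way to express that filter, assuming the NML layout in the sample above (ENTRY elements under COLLECTION, each with an INFO child carrying the RATING attribute). The variable names are only illustrative, and it hasn’t been timed against a very large collection:

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- The predicate is relative to each ENTRY: keep an ENTRY only if one of
-- its own INFO children has the wanted RATING attribute.
set theXPath to "/NML/COLLECTION/ENTRY[INFO/@RATING='Search In Playlists']"
set {matchingEntries, theError} to theXMLDoc's nodesForXPath:theXPath |error|:(reference)
if matchingEntries = missing value then error (theError's localizedDescription() as text)

-- Extend the same path to pull one attribute (here TITLE) from every matching ENTRY.
set {titleNodes, theError} to theXMLDoc's nodesForXPath:(theXPath & "/@TITLE") |error|:(reference)
if titleNodes = missing value then error (theError's localizedDescription() as text)
set entryTitles to (titleNodes's valueForKey:"stringValue") as list
```

The predicate uses a relative path (INFO/@RATING) on purpose. In XPath, a path beginning with // inside a predicate is evaluated from the document root rather than from the current ENTRY, so it tests the whole document again for every entry.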

Hi Nigel,

just for info for all (you and Shane know this) I tested the following script:


property myArray : missing value

set myString to LoopWithString() -- test changing the handler

on LoopWithString()
	set startDate to current date
	set myString to ""
	repeat with j from 1 to 50000
		set myString to myString & j
	end repeat
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 6 seconds on MacBook Pro 2015
end LoopWithString

on LoopWithArraySlow()
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of myArray -- no "my" here, which is what makes this version slow
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 37 seconds on MacBook Pro 2015 
end LoopWithArraySlow

on LoopWithArrayFast()
	-- Note the use of word "my" to access the array
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of my myArray
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 1 second on MacBook Pro 2015
end LoopWithArrayFast

These results are with a loop of 50,000 iterations.
I remember when I discovered the trick about the “magic” word “my” many years ago…

Stefano - Ame

OK, I figured out my XPath properly:

//COLLECTION/ENTRY/.[//@RATING='Search In Playlists']

but I’m wondering if this is “better”:

-- //COLLECTION/ENTRY/*//[@RATING='Search In Playlists']


I got my XPath figured out and working on my small XML,
but when I ran it on my large XML, which has nearly 50,000 ENTRY items,
it didn’t complete.


next I thought I would just load all of my //COLLECTION/ENTRY items into an array.
Then I looped thru each item and figured out how to check if it matched my
criteria. Then added it to a new array.

this worked fine again on my small XML
but still haven’t been able to get it to work reasonably with the
large XML.

any ideas?

What do you mean, exactly?

Were you trying to coerce it to a list?

ASObjC and OSAX libraries are not related. Their purposes are entirely different, and one doesn’t replace the other.

An OSAX is, as the name implies, a library that adds functionality to the programming language, similar to extensions in PHP or modules in Python. The purpose of an OSAX is to expand the functionality and usability of AppleScript.

ASObjC is a window to another environment and has more in common with a do shell script command than with an OSAX. Its goal is to make the OS base SDK usable from AppleScript, but you still have to program everything yourself (in an off-AppleScript syntax).

In practice, the difference is that with a well-written OSAX, the entire ASObjC function written by Shane here could be done with a single AppleScript command using libxml.

To answer your question: for ASObjC usage you don’t need any OSAX at all.

I XPathed all entries into an array,
then looped through each one to find which ones matched my criteria.
6 hrs later I had to force quit and go to bed.

What are your criteria? There’s probably a better way.