XML parser running slowly

Many thanks for the replies.

I tried Nigel’s suggestion, but, while it worked, it didn’t result in a significant time improvement. Sorry :frowning:

I was a little scared of Shane’s as I haven’t worked my way up to ASObjC yet and am a little wary of copying code into my projects that I don’t understand. But, putting my best foot forward, I tried it anyway and got an error on this line:

set {planEvents, theError} to theXMLDoc's nodesForXpath:"//schedule/Event" |error|:(reference)

And yes, I should have included a snippet of the XML itself:

Thanks again.

Taking the clues from Shane’s script, this is certainly faster than the System Events efforts!

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set planEvent to theXMLDoc's rootElement()'s elementsForName:"Event"

-- Initialise a script object at run time through which these list variables can be referenced.
script
	property planDate : {}
	property planTime : {}
	property titleName : {}
	property episodeName : {}
	property titleID : {}
	property episodeID : {}
	-- property txEvent : {} -- Not used?
end script
set o to result -- Call it, say, 'o'.

repeat with aNode in planEvent
	set end of o's planDate to (aNode's elementsForName:"Date")'s firstObject()'s stringValue() as text
	set end of o's planTime to (aNode's elementsForName:"Time")'s firstObject()'s stringValue() as text
	set end of o's titleName to (aNode's elementsForName:"Title")'s firstObject()'s stringValue() as text
	set end of o's episodeName to (aNode's elementsForName:"Episode")'s firstObject()'s stringValue() as text
	set end of o's titleID to (aNode's elementsForName:"umbrellaID")'s firstObject()'s stringValue() as text
	set end of o's episodeID to (aNode's elementsForName:"specificID")'s firstObject()'s stringValue() as text
end repeat

Aaargh – I fixed the spelling here, but forgot to update it there. It’s XPath, not Xpath.
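
For anyone hitting the same error: the method name has to match Cocoa's spelling exactly, with a capital P. Below is a minimal sketch of the corrected call. The inline XML string is made-up sample data standing in for the real file, and the `schedule`/`Event`/`Date` names are only assumed from the scripts in this thread.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data; the real script would load the document from a file URL instead.
set theXML to "<schedule><Event><Date>2018-01-01</Date></Event><Event><Date>2018-01-02</Date></Event></schedule>"
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- Note the spelling: nodesForXPath, not nodesForXpath.
set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event" |error|:(reference)
if planEvents = missing value then error (theError's localizedDescription() as text)

-- 'count' is an AppleScript keyword, so the Cocoa method name needs pipes.
set eventCount to planEvents's |count|() as integer
```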

Nigel’s latest version accomplishes the same thing. I was using an XPath query because at that stage I was guessing the actual format of your XML.

I tried making a test file with about 1500 elements, as per the OP, and rewrote the script to use multiple XPath queries, thus avoiding all the stuff in the repeat loop. It speeds things up again, by a factor of about 4:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theXMLFile to POSIX path of (choose file)
set theURL to current application's |NSURL|'s fileURLWithPath:theXMLFile
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
set planDate to ((theXMLDoc's nodesForXPath:"//schedule/Event/Date" |error|:(missing value))'s valueForKey:"stringValue") as list
set planTime to ((theXMLDoc's nodesForXPath:"//schedule/Event/Time" |error|:(missing value))'s valueForKey:"stringValue") as list
set titleName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Title" |error|:(missing value))'s valueForKey:"stringValue") as list
set episodeName to ((theXMLDoc's nodesForXPath:"//schedule/Event/Episode" |error|:(missing value))'s valueForKey:"stringValue") as list
set titleID to ((theXMLDoc's nodesForXPath:"//schedule/Event/umbrellaID" |error|:(missing value))'s valueForKey:"stringValue") as list
set episodeID to ((theXMLDoc's nodesForXPath:"//schedule/Event/specificID" |error|:(missing value))'s valueForKey:"stringValue") as list

By my count, we’re down from about 17 minutes to <0.4 seconds :cool:

Oh dear. I hope that’s fast enough. :wink:

:cool:

Wow! :smiley:

Thanks so much for that. Yes, it’s almost instant now. Guess I am going to have to invest some time in learning some ASObjC to figure out what does what.

Thanks again.

Yes, thanks for this. I’m now seeing the power of ObjC myself as I get into the AppleScript world.

I started an AppleScript using Satimage’s LibXML library.
I got everything figured out and accomplished what I wanted on my smaller XML file, which had maybe 150 lines or so in it. (I’m working on parsing a Traktor DJ software music library.)
But when I went to use it on my full library, where one node has over 50,000 entries in it,
well, it didn’t even get through it.

I’ve taken the last code posted above and modified it to get the info that I need and it’s really fast!
Few questions though:

  1. When I run the script (in Script Debugger 6.0.6), it completes running.
    But then about 5-15 secs after it completes, Script Debugger crashes.
    Any ideas here?

  2. For many of my tasks, what I’m looking to do is:
    A) first find only certain entries that contain a particular attribute value
    B) then grab all of the entries and attributes I need from that filtered list into a new list

    What’s the best way to do this?
    2.1) I can come up with an XPath filter that would get me my sublist,
    2.2) but I’m not clear on how to point further XPath requests towards it,
    as all the code above seems to direct its queries at the doc.
//---------------
EDIT: after reviewing more code on this post I’m now thinking I should use
Something like this:

set {planEvents, theError} to theXMLDoc's nodesForXPath:"//schedule/Event" |error|:(reference)

And then rather than using the {XMLDoc, theError} when getting my data I should use:
{planEvents, theError}
--------------//

Or
2.3) should I just gather all the data that I need into a list/record and then process from
       There?

3.1) Is there any “dictionary” available to help break down the code that’s listed in this thread,
so that I can understand exactly what is happening here?
(I’ve realized I should get Shane’s book!)

3.3) How can I access any ObjC dictionary in Script Debugger 6?
I realize that I can use the raw syntax, but that doesn’t
really give much info.

4) What OSAX libraries should I look at that will help
me further with any other ObjC usage in AppleScript?

Thanks so much

Kerry

You should be running 6.0.8, at least. Preferably, version 7.

XPath is by far the quickest.
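
As a rough illustration of letting XPath do the filtering: a predicate in square brackets can select the matching elements and pull out the values in a single query. A node returned by a query also accepts `nodesForXPath:error:` itself, with a path starting at `.` to keep it relative to that node. This is only a sketch. The sample XML and the Title value 'News' are invented, and the element names follow the schedule example earlier in the thread.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data modelled on the schedule/Event layout discussed above.
set theXML to "<schedule>
<Event><Date>2018-01-01</Date><Title>News</Title></Event>
<Event><Date>2018-01-02</Date><Title>Sport</Title></Event>
<Event><Date>2018-01-03</Date><Title>News</Title></Event>
</schedule>"
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- Filter and extract in one query: the Date of every Event whose Title is 'News'.
set newsDates to ((theXMLDoc's nodesForXPath:"//schedule/Event[Title='News']/Date" |error|:(missing value))'s valueForKey:"stringValue") as list

-- Alternatively, keep the filtered nodes and query one of them with a relative path.
set newsEvents to theXMLDoc's nodesForXPath:"//schedule/Event[Title='News']" |error|:(missing value)
set firstDate to ((newsEvents's firstObject())'s nodesForXPath:"./Date" |error|:(missing value))'s firstObject()'s stringValue() as text
```

The single-query form is the one to prefer for large files, since it avoids an AppleScript repeat loop altogether.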

Apple’s documentation is available on-line, or via Xcode. But it’s written with Objective-C in mind, so it can take a bit of effort to apply it to AppleScript.

There is no dictionary for Objective-C. You have to rely on Apple’s documentation.

I’m afraid scripting additions aren’t going to help, either.

The best resources are existing code (around here and other places), Apple’s documentation, my book or some similar guide to the basics, and a bit of time.

Thanks Shane. For others finding this amazing thread:
I’ve found this online, and it’s helping clear things up a bit:

https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/NSXML_Concepts/NSXML.html#//apple_ref/doc/uid/TP40001263-SW1

OK, thanks.
Now my next question is: how do I create an array using an XPath query where
I only want to get the elements whose child element has an attribute with a certain value?

See attached XML below:

what I’m trying to get is:

All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”

I’ve tried everything with no luck.

thanks

Kerry

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<NML VERSION="19">
	<HEAD COMPANY="www.native-instruments.com" PROGRAM="Traktor"/>
	<COLLECTION ENTRIES="20">
		<ENTRY LOCK="0" LOCK_MODIFICATION_TIME="2017-12-29T16:02:30" TITLE="All My Friends (Franz Ferdinand Version)" ARTIST="LCD Soundsystem">
			<LOCATION DIR="/:Users/:kerry/:Music/:iTunes/:iTunes Media/:Music/:LCD Soundsystem/:All My Friends - EP/:" FILE="All My Friends (Franz Ferdinand Version).m4p" VOLUME="Tekno" VOLUMEID="Tekno"/>
			<ALBUM TRACK="2" TITLE="All My Friends - EP"/>
			<MODIFICATION_INFO AUTHOR_TYPE="user"/>
			<INFO BITRATE="128000" GENRE="Alternative" COMMENT="missing 20180401" RATING="Search In Playlists" PLAYTIME="353" IMPORT_DATE="2018/4/29" FLAGS="10" FILESIZE="5582" COLOR="1"/>
			<TEMPO BPM_QUALITY="100.000000"/>
		</ENTRY>
		<ENTRY MODIFIED_DATE="2018/12/30" MODIFIED_TIME="27503" LOCK="0" LOCK_MODIFICATION_TIME="2017-12-30T07:37:56" AUDIO_ID="AMMAARIREhEREhEhIzMyERIjMzMhEjRERjVEM1VDNGmprIdZqZepiHq7q6iXdnqruruImsqqqIl5iLvcvdQBIyMhEiMiMyIhNEVUVUNURDM0NJmpqohZuZe4mGqqrKmWdoq7uruImsqq2HiImLzcrLvLrIiavKq5iJiYrM3KiZm726zLuGd2dlZmZ3Zmd4mqvImHmruph2mZzamoeanOzLlmR3dmdHRoZ1dmZHZ2d1dHdnaqzNy8yqmYiIeXiIl4iomIh4iJeImoiIqJu9ysl4qsqrqImKu62YmIqKzMvLqrvJiaq6qqeIm6qsl4d5eHdmhndnVEQzMiIRAAAAAAAA==" TITLE="Jessie's Girl" ARTIST="Rick Springfield">
			<LOCATION DIR="/:Users/:kerry/:Music/:iTunes/:iTunes Media/:Music/:Compilations/:30 Stars of the 80's/:" FILE="Jessie's Girl.m4a" VOLUME="Tekno" VOLUMEID="Tekno"/>
			<ALBUM OF_TRACKS="30" TRACK="5" TITLE="Billboard Hot 100 Singles 1981"/>
			<MODIFICATION_INFO AUTHOR_TYPE="user"/>
			<INFO BITRATE="2822400" GENRE="Billboard" COMMENT="missing 20180401" RATING="Search In Playlists" COVERARTID="051/TNH244BFYPFWHBIZEDMXD31H1BNC" KEY="D" PLAYTIME="196" PLAYTIME_FLOAT="195.466660" RANKING="102" IMPORT_DATE="2018/4/29" RELEASE_DATE="1981/1/1" FLAGS="14" COLOR="1"/>
			<TEMPO BPM="131.627625" BPM_QUALITY="100.000000"/>
			<LOUDNESS PEAK_DB="0.930028" PERCEIVED_DB="-0.424675" ANALYZED_DB="-0.424675"/>
			<MUSICAL_KEY VALUE="2"/>
			<CUE_V2 NAME="AutoGrid" DISPL_ORDER="0" TYPE="4" START="352.527589" LEN="0.000000" REPEATS="-1" HOTCUE="0"/>
		</ENTRY>
		
	</COLLECTION>
	
</NML>

That’s really an Xpath question, not an AppleScript question, so this may not be the best place to ask. You might try something like: https://www.w3schools.com/xml/xpath_intro.asp

Hi Nigel,

just for info for all (you and Shane know this) I tested the following script:


property myArray : missing value

set myString to LoopWithString() -- test changing the handler

on LoopWithString()
	set startDate to current date
	set myString to ""
	repeat with j from 1 to 50000
		set myString to myString & j
	end repeat
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 6 seconds on MacBook Pro 2015
end LoopWithString

on LoopWithArraySlow()
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of my myArray
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 37 seconds on MacBook Pro 2015 
end LoopWithArraySlow

on LoopWithArrayFast()
	-- Note the use of word "my" to access the array
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of my myArray
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 1 second on MacBook Pro 2015
end LoopWithArrayFast

These are the results with a loop of 50,000 iterations.
I remember when I discovered the trick about the “magic” word “my” many years ago…

Stefano - Ame

OK, I figured out my XPath properly:

//COLLECTION/ENTRY/.[//@RATING='Search In Playlists']

but I’m wondering if this is “better”:

--//COLLECTION/ENTRY/*//[@RATING='Search In Playlists']


Once I got my XPath figured out and working on my small XML,
I ran it on my large XML, which has nearly 50,000 ENTRY items,
and it didn’t complete.


next I thought I would just load all of my //COLLECTION/ENTRY items into an array.
Then I looped thru each item and figured out how to check if it matched my
criteria. Then added it to a new array.

this worked fine again on my small XML
but still haven’t been able to get it to work reasonably with the
large XML.

any ideas?

What do you mean, exactly?

Were you trying to coerce it to a list?

ASObjC and OSAX libraries are not related. Their purposes are entirely different, and one doesn’t supersede the other.

An OSAX is, as the name implies, a library that adds functionality to the programming language, similar to extensions in PHP or modules in Python. The purpose of an OSAX is to expand the functionality and usability of AppleScript.

ASObjC is a window to another environment and has more in common with a do shell script command than with an OSAX. Its goal is to make the OS’s base SDK usable from AppleScript, but you still have to program everything yourself (in a syntax that departs from ordinary AppleScript).

In practice, the difference is that with a well-written OSAX, the entire ASObjC function written by Shane here could be done with a single AppleScript command using libxml.

To answer your question: for ASObjC usage you don’t need any OSAX at all.
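
To make that last point concrete, here is a small sketch that calls a Cocoa class with nothing installed beyond the OS: the `use framework` line is all that is needed. The file path is a made-up example value.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"

-- Hypothetical path, used only to have something to work on.
set thePath to current application's NSString's stringWithString:"/tmp/collection.nml"

-- pathExtension is a standard NSString method; the result is coerced back to AppleScript text.
set theExtension to thePath's pathExtension() as text
```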

I XPathed all entries into an array,
then looped through each one to find which ones matched my criteria.
6 hrs later I had to force quit and go to bed.

What are your criteria? There’s probably a better way.

All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”

My original XPath that I finally figured out was this:
//COLLECTION/ENTRY/.[//@RATING='Search In Playlists']

Very few of those ENTRY/INFO elements actually have RATING attributes,
so I figured I would try an XPath that would only find those that actually had a rating, using:
//ENTRY/INFO[@RATING]/..

here is my trimmed code:

set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING]/.." |error|:(specifier))

set theResults2 to {}
repeat with aResults in theResults		
	set trackRATING to (aResults's nodesForXPath:"//INFO/attribute::RATING" |error|:(missing value))'s firstObject()'s stringValue() as text
	
	if trackRATING is "Search In Playlists" then
		set end of theResults2 to aResults
	end if
	aResults's detach()
end repeat

which, after 4-plus hours, still gave no real results.

PS I am running SD6.0.8 on 10.10
I have a system install of 10.11 on a drive. So I’ve yet to try this all on SD7 and 10.11.
I’m guessing this could make a huge difference?

One other thought that i’m having is am I possibly crossing into iTunes item Object subclass
Track with the naming of my trackRATING variable?

thanks

You don’t want to be doing XPath queries in a loop – the point of using XPath is to avoid things like loops as much as possible.

It’s still not clear what you’re after, but if it’s actually:

All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”

something like this should do it:

set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING='Search In Playlists']/.." |error|:(reference))

But you don’t say what you want to do with the ENTRY elements when you’ve found them.
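
If the goal is to read attributes of the matching ENTRY elements, one possible approach (a sketch, not necessarily what the original poster needs) is to put the filter in a predicate and end the path with the attribute itself. Each query then returns attribute nodes whose `stringValue` can be collected without any repeat loop. The trimmed XML below is based on the sample posted above, and the third, unrated entry is invented to show the filter excluding something.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Cut-down sample in the shape of the Traktor file; the "Untagged" entry is hypothetical.
set theXML to "<NML VERSION=\"19\"><COLLECTION ENTRIES=\"3\">
<ENTRY TITLE=\"All My Friends\" ARTIST=\"LCD Soundsystem\"><INFO GENRE=\"Alternative\" RATING=\"Search In Playlists\"/></ENTRY>
<ENTRY TITLE=\"Jessie's Girl\" ARTIST=\"Rick Springfield\"><INFO GENRE=\"Billboard\" RATING=\"Search In Playlists\"/></ENTRY>
<ENTRY TITLE=\"Untagged\" ARTIST=\"Nobody\"><INFO GENRE=\"None\"/></ENTRY>
</COLLECTION></NML>"
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)

-- The predicate keeps only ENTRY elements with a matching INFO child; /@TITLE then selects the attribute.
set theTitles to ((theXMLDoc's nodesForXPath:"//ENTRY[INFO/@RATING='Search In Playlists']/@TITLE" |error|:(missing value))'s valueForKey:"stringValue") as list
set theArtists to ((theXMLDoc's nodesForXPath:"//ENTRY[INFO/@RATING='Search In Playlists']/@ARTIST" |error|:(missing value))'s valueForKey:"stringValue") as list
```

One caveat: the two lists only line up item for item if every matching ENTRY has both attributes, so this pattern suits attributes that are always present.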

Hey Ame,
Thanks for this.
Can you explain to me the advantage of using “my”?
Also, I’m trying to understand and analyze your code, and I’m confused about the last two handlers,
as it seems like the code is exactly the same.
What is it that I’m missing that makes the last one execute so fast?

Thanks
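
As posted, the two array handlers do look identical. The comparison was presumably meant to differ in one word, with the slow version appending to `myArray` and the fast one to `my myArray`. The sketch below shows that assumed difference with hypothetical handler names and a smaller loop count. The usual explanation is that `my` makes AppleScript treat the list as a property of the script and access it by reference, which is the same idea as the script object `o` used earlier in the thread. Timings are left out because they vary by machine.

```applescript
property myArray : missing value

-- Assumed "slow" form: the list is reached through the plain variable name.
on fillWithoutMy(n)
	set myArray to {}
	repeat with j from 1 to n
		copy j to end of myArray
	end repeat
	return (count of myArray)
end fillWithoutMy

-- "Fast" form: 'my myArray' refers to the list as a property of the script.
on fillWithMy(n)
	set my myArray to {}
	repeat with j from 1 to n
		copy j to end of my myArray
	end repeat
	return (count of my myArray)
end fillWithMy

-- Both build the same list; only the way the list is addressed differs.
set countWithoutMy to fillWithoutMy(2000)
set countWithMy to fillWithMy(2000)
```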