XML parser running slowly

You should be running Script Debugger 6.0.8, at least. Preferably, version 7.

XPath is by far the quickest.

Apple’s documentation is available on-line, or via Xcode. But it’s written with Objective-C in mind, so it can take a bit of effort to apply it to AppleScript.

There is no dictionary for Objective-C. You have to rely on Apple’s documentation.

I’m afraid scripting additions aren’t going to help, either.

The best resources are existing code (around here and other places), Apple’s documentation, my book or some similar guide to the basics, and a bit of time.

Thanks Shane… For others finding this amazing thread,
I found this online and it’s helping clear things up a bit:

https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/NSXML_Concepts/NSXML.html#//apple_ref/doc/uid/TP40001263-SW1

OK thanks.
Now my next question: how do I create an array using an XPath query where
I only want the elements whose child element has an attribute with a certain value?

See attached XML below:

what I’m trying to get is:

All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”

I’ve tried everything with no luck.

thanks

Kerry

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<NML VERSION="19">
	<HEAD COMPANY="www.native-instruments.com" PROGRAM="Traktor"/>
	<COLLECTION ENTRIES="20">
		<ENTRY LOCK="0" LOCK_MODIFICATION_TIME="2017-12-29T16:02:30" TITLE="All My Friends (Franz Ferdinand Version)" ARTIST="LCD Soundsystem">
			<LOCATION DIR="/:Users/:kerry/:Music/:iTunes/:iTunes Media/:Music/:LCD Soundsystem/:All My Friends - EP/:" FILE="All My Friends (Franz Ferdinand Version).m4p" VOLUME="Tekno" VOLUMEID="Tekno"/>
			<ALBUM TRACK="2" TITLE="All My Friends - EP"/>
			<MODIFICATION_INFO AUTHOR_TYPE="user"/>
			<INFO BITRATE="128000" GENRE="Alternative" COMMENT="missing 20180401" RATING="Search In Playlists" PLAYTIME="353" IMPORT_DATE="2018/4/29" FLAGS="10" FILESIZE="5582" COLOR="1"/>
			<TEMPO BPM_QUALITY="100.000000"/>
		</ENTRY>
		<ENTRY MODIFIED_DATE="2018/12/30" MODIFIED_TIME="27503" LOCK="0" LOCK_MODIFICATION_TIME="2017-12-30T07:37:56" AUDIO_ID="AMMAARIREhEREhEhIzMyERIjMzMhEjRERjVEM1VDNGmprIdZqZepiHq7q6iXdnqruruImsqqqIl5iLvcvdQBIyMhEiMiMyIhNEVUVUNURDM0NJmpqohZuZe4mGqqrKmWdoq7uruImsqq2HiImLzcrLvLrIiavKq5iJiYrM3KiZm726zLuGd2dlZmZ3Zmd4mqvImHmruph2mZzamoeanOzLlmR3dmdHRoZ1dmZHZ2d1dHdnaqzNy8yqmYiIeXiIl4iomIh4iJeImoiIqJu9ysl4qsqrqImKu62YmIqKzMvLqrvJiaq6qqeIm6qsl4d5eHdmhndnVEQzMiIRAAAAAAAA==" TITLE="Jessie's Girl" ARTIST="Rick Springfield">
			<LOCATION DIR="/:Users/:kerry/:Music/:iTunes/:iTunes Media/:Music/:Compilations/:30 Stars of the 80's/:" FILE="Jessie's Girl.m4a" VOLUME="Tekno" VOLUMEID="Tekno"/>
			<ALBUM OF_TRACKS="30" TRACK="5" TITLE="Billboard Hot 100 Singles 1981"/>
			<MODIFICATION_INFO AUTHOR_TYPE="user"/>
			<INFO BITRATE="2822400" GENRE="Billboard" COMMENT="missing 20180401" RATING="Search In Playlists" COVERARTID="051/TNH244BFYPFWHBIZEDMXD31H1BNC" KEY="D" PLAYTIME="196" PLAYTIME_FLOAT="195.466660" RANKING="102" IMPORT_DATE="2018/4/29" RELEASE_DATE="1981/1/1" FLAGS="14" COLOR="1"/>
			<TEMPO BPM="131.627625" BPM_QUALITY="100.000000"/>
			<LOUDNESS PEAK_DB="0.930028" PERCEIVED_DB="-0.424675" ANALYZED_DB="-0.424675"/>
			<MUSICAL_KEY VALUE="2"/>
			<CUE_V2 NAME="AutoGrid" DISPL_ORDER="0" TYPE="4" START="352.527589" LEN="0.000000" REPEATS="-1" HOTCUE="0"/>
		</ENTRY>
		
	</COLLECTION>
	
</NML>

That’s really an XPath question, not an AppleScript question, so this may not be the best place to ask. You might try something like: https://www.w3schools.com/xml/xpath_intro.asp

Hi Nigel,

just for info for all (you and Shane know this) I tested the following script:


property myArray : missing value

set myString to LoopWithString() -- test changing the handler

on LoopWithString()
	set startDate to current date
	set myString to ""
	repeat with j from 1 to 50000
		set myString to myString & j
	end repeat
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 6 seconds on MacBook Pro 2015
end LoopWithString

on LoopWithArraySlow()
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of myArray -- no "my" here: the list variable is used directly, which is the slow case
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 37 seconds on MacBook Pro 2015 
end LoopWithArraySlow

on LoopWithArrayFast()
	-- Note the use of word "my" to access the array
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of my myArray
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 1 second on MacBook Pro 2015
end LoopWithArrayFast

These results are with a loop of 50,000 iterations.
I remember when I discovered the trick about the “magic” word “my” many years ago…

Stefano - Ame

OK I figured out my XPath properly:

//COLLECTION/ENTRY/.[//@RATING='Search In Playlists']

but I’m wondering if this is “better”:

--//COLLECTION/ENTRY/*//[@RATING='Search In Playlists']


Once I had my XPath figured out and working on my small XML,
I ran it on my large XML, which has nearly 50,000 ENTRY items,
and it didn’t complete.


Next I thought I would just load all of my //COLLECTION/ENTRY items into an array.
Then I looped through each item, figured out how to check whether it matched my
criteria, and added the matches to a new array.

This worked fine again on my small XML,
but I still haven’t been able to get it to run in a reasonable time with the
large XML.

any ideas?

What do you mean, exactly?

Were you trying to coerce it to a list?

ASObjC and OSAX libraries are not related. Their purposes are entirely different and one doesn’t succeed the other.

An OSAX is, as the name implies, a library that adds functionality to the programming language, similar to extensions in PHP or modules in Python. The purpose of an OSAX is to expand the functionality and usability of AppleScript.

ASObjC is a window into another environment and has more in common with a do shell script command than with an OSAX. Its goal is to let you use the OS’s base SDK from AppleScript, but you still have to program everything yourself (in a syntax that isn’t very AppleScript-like).

In practice, the difference is that with a well-written OSAX, the entire ASObjC function written by Shane here could be done with a single AppleScript command using libxml.

To answer your question: for ASObjC usage you don’t need any OSAX at all.

I XPathed all entries into an array,
then looped through each one to find which ones matched my criteria.
Six hours later I had to force quit and go to bed.

What are your criteria? There’s probably a better way.

All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”

My original XPath that I finally figured out was this:
//COLLECTION/ENTRY/.[//@RATING='Search In Playlists']

Of all those ENTRY/INFO elements, very few actually have RATING attributes,
so I figured I would try an XPath that would only find those that actually have a rating, using:
//ENTRY/INFO[@RATING]/..

here is my trimmed code:

set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING]/.." |error|:(specifier))

set theResults2 to {}
repeat with aResults in theResults		
	set trackRATING to (aResults's nodesForXPath:"//INFO/attribute::RATING" |error|:(missing value))'s firstObject()'s stringValue() as text
	
	if trackRATING is "Search In Playlists" then
		set end of theResults2 to aResults
	end if
	aResults's detach()
end repeat

which, after 4-plus hours, still gives no real results.

PS I am running SD6.0.8 on 10.10
I have a system install of 10.11 on a drive. So I’ve yet to try this all on SD7 and 10.11.
I’m guessing this could make a huge difference?

One other thought I’m having: am I possibly clashing with the iTunes item object subclass
Track through the naming of my trackRATING variable?

thanks

You don’t want to be doing XPath queries in a loop – the point of using XPath is to avoid things like loops as much as possible.

It’s still not clear what you’re after, but if it’s actually all ENTRY elements that have INFO children whose RATING attribute is “Search In Playlists”, something like this should do it:

set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING='Search In Playlists']/.." |error|:(reference))

But you don’t say what you want to do with the ENTRY elements when you’ve found them.
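
For anyone who wants to try that query in isolation, here is a minimal sketch. It assumes a trimmed, made-up stand-in for the NML file (the three ENTRY elements, and the names theXML and theTitles, are illustrative only and not from the real collection). It is also worth noting why the looped version is slow: an XPath that begins with // is evaluated from the document root even when it is sent to a single ENTRY node, so each pass of the loop searches the whole document again.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical stand-in for collection.nml: three entries, one without a RATING
set theXML to "<NML VERSION=\"19\"><COLLECTION ENTRIES=\"3\">
<ENTRY TITLE=\"All My Friends\"><INFO GENRE=\"Alternative\" RATING=\"Search In Playlists\"/></ENTRY>
<ENTRY TITLE=\"Jessie's Girl\"><INFO GENRE=\"Billboard\" RATING=\"Search In Playlists\"/></ENTRY>
<ENTRY TITLE=\"No Rating\"><INFO GENRE=\"Rock\"/></ENTRY>
</COLLECTION></NML>"

set {theXMLDoc, theError} to (current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(reference))
if theXMLDoc is missing value then error (theError's localizedDescription() as text)

-- One query does all the filtering; no AppleScript loop is involved
set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING='Search In Playlists']/.." |error|:(reference))
if theResults is missing value then error (theError's localizedDescription() as text)
set matchCount to (theResults's |count|()) as integer

-- A second query pulls the TITLE attribute of every matching ENTRY in one call
set theTitles to ((theXMLDoc's nodesForXPath:"//ENTRY[INFO/@RATING='Search In Playlists']/@TITLE" |error|:(missing value))'s valueForKey:"stringValue") as list
```

The second query puts the test in a predicate on ENTRY instead of stepping up with /.., which is just another way of writing the same filter.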

Hey Ame,
Thanks for this.
Can you explain to me the advantage of using “my”?
Also, I’m trying to understand and analyze your code, and I’m confused about the last two handlers,
as it seems like the code is exactly the same.
What is it that I’m missing that makes the code in the last one execute so fast?

Thanks

Hi technomorph.

The AppleScript Language Guide describes how (but not why!) using a reference to a list variable, instead of just using the variable directly, can speed up access to the items and properties of the list if it’s very large. (https://developer.apple.com/library/content/documentation/AppleScript/Conceptual/AppleScriptLangGuide/reference/ASLR_classes.html#//apple_ref/doc/uid/TP40000983-CH1g-DontLinkElementID_587)

In the ASLG example, the ‘a reference to’ operator is used to set another variable containing a reference to the list variable; but it’s also possible (and slightly faster still) to write the reference directly into the script code by including the owner of the variable in the code. eg.:

item 1000 of bigList -- Using a list variable directly.
item 1000 of my bigList -- Referencing the list variable as something belonging to the current script.
item 1000 of its bigList -- Referencing a list variable in another script.

It’s not possible to reference local variables, only properties, globals, or run-handler variables. But if you need to use the technique inside a handler, you can set up a temporary script object with a property set to the list and use references to that property:

on myHandler(myList)
	script o
		property bigList : myList
	end script
	
	item 1000 of o's bigList
end myHandler

Referencing list variables only speeds up access to the lists’ items and properties — ie. the list variable references must be parts of references to items or properties of those lists. It doesn’t speed up operations on the lists themselves, such as counting, ‘contains’, or concatenation.
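
As a rough sketch of the script-object technique applied to building a large list inside a handler (the handler name buildList and the variable names are made up for illustration, and timings will vary by machine):

```applescript
-- Build a list of n integers inside a handler, always going through o's bigList
on buildList(n)
	script o
		property bigList : {}
	end script
	repeat with j from 1 to n
		set end of o's bigList to j -- referenced access, not a bare local variable
	end repeat
	return o's bigList
end buildList

set theList to buildList(50000)
set theCount to count of theList
set sampleItem to item 1000 of theList
```

The property lives in the temporary script object, so nothing needs to be declared as a global or top-level property just to get the faster access.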

You’re making it hard to follow. What are you planning to do with these attribute names and values?

IAC, you don’t need all that stuff. Try this:

set propNames to theResults's valueForKeyPath:"attributes.name"
set propValues to theResults's valueForKeyPath:"attributes.stringValue"
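
A self-contained sketch of what those two lines return, assuming theResults is the NSArray of ENTRY elements from an earlier query (the two-entry XML string below is a made-up stand-in):

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical two-entry document; the second ENTRY has an extra attribute
set theXML to "<COLLECTION><ENTRY TITLE=\"A\" ARTIST=\"X\"/><ENTRY TITLE=\"B\" ARTIST=\"Y\" AUDIO_ID=\"abc\"/></COLLECTION>"
set theXMLDoc to (current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(missing value))
set theResults to (theXMLDoc's nodesForXPath:"//ENTRY" |error|:(missing value))

-- Each call returns one sublist per ENTRY, with one item per attribute of that ENTRY
set propNames to (theResults's valueForKeyPath:"attributes.name") as list
set propValues to (theResults's valueForKeyPath:"attributes.stringValue") as list
```

Because the sublists line up item for item, the name at a given position in propNames belongs to the value at the same position in propValues.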

Hi, yes, I guess I should make that clear.

I’m working with the DJ software Traktor (and a bit of iTunes).
Its own library system has some flaws that I’m trying to work on myself.
The library system is XML 1.0 based and the files are named with the extension .NML.
The main library file is called collection.NML.
You can export a single playlist, playlist.NML, from the software; it has pretty much
the same structure as the main collection.NML but is a great deal smaller.
You can then reimport that playlist.NML into the software and it will merge into the
main library and update as need be (I still need to do some investigation on exactly what
gets updated and what potentially gets lost, but that’s in the future).

1) Relocating missing ENTRY files:
(A little background: when you change a track’s name, artist, album, etc. in iTunes,
depending on your iTunes management preferences, it renames and relocates
the file based on the Artist Name/LP Name/Track Name. I like this management.
iTunes has a way better library management system than Traktor does, so most
folks use iTunes for their main management and then play in Traktor. Once
you’ve changed a track in iTunes and it relocates the file, Traktor will say it is missing.
Traktor has a relocate function and you can point it to the new file. But if you have
numerous missing files, it can automatically try to find them again, but this is very
slow and can be inaccurate.)

  • want to figure out a way to do better searching with the option to ask the user
    “I found these? Which one should I replace it with”
  • also I’ll be looking into using: iTunes NSPredicate, mdls with mdfind and kMD, your FileTagsLib
  • also some of Doug’s iTunes scripts (or my modified versions)

2) dealing with duplicates: both physical files, and duplicate XML entries

  • finding possible duplicates
  • helping select which ones would be best to keep (based on bit rate, and other quality factors)
  • transferring some of the meta data (attributes) between each ENTRY
  • possibly recalculating (adjusting) some of the attributes based on different lengths that
    may be different between each of the ENTRY’s
  • removing those “old” ENTRY’s
  • updating the library PLAYLISTS that had those old ENTRY’s with the selected ENTRY
  • saving all the new info to the master LIBRARY

I’ve accomplished some of these things by using:

iTunes Side

  • Doug’s Scripts Dupin software
  • my own custom applescripts for iTunes

Traktor Side:

  • manually figuring out the selects and what to copy
  • export the smaller PLAYLIST.NML file
  • using an XML editor to edit and copy/paste ELEMENTS and ATTRIBUTES
  • reimporting that edited PLAYLIST.NML file into TRAKTOR and having it update the main LIBRARY

I’ve really been digging what I’ve been able to create on my own via AppleScript for other
tasks on the iTunes side, and would love to develop AppleScripts to accomplish what I’m looking
to do and share them with the Traktor community.

Stages I’m going to work thru:

  • Parsing XML and creating an array or list of the ENTRY’s I wish to process
  • Present a dialog to ask user to select which ATTRIBUTES would like to collect
  • Gather all of the selected ATTRIBUTES for the ENTRY’s into a list or a record (partially DONE)
  • Show this data in a table
  • then depending on the different tasks I listed above I will need to:
    - analyze, compare, sort, evaluate and create new lists/sublists/record/sub-records
    - present the new data to the user and ask them to manually select some choices (filters)
    - create a new array based on the choices made by the user
    - update the other PLAYLIST node replacing removed ENTRYS with the new ENTRYS
    - update the master XML with the new data
    - oh and of course create a backup of the master XML in case anything goes wrong

All of these are great ways for me to learn more about AppleScript, which I’m loving.
I’m also running into issues, figuring out how to get around them, and then figuring
out how to make it all more efficient and user friendly at the same time.

All of your input has been so amazingly helpful!

thanks again

Yes this did it. It’s what I had originally started with but 10.10.5 didn’t like it.
Definitely needed to make sure SD was in Source view mode.

The XPath now takes 3 seconds to complete!

Next I’ll look into using what you’ve suggested here:

I was trying to use the XPath method you had in your final code above:

A few problems:

  1. I found I did have to put it into a loop. I could not just use:
set trackTITLE to ((theResults's nodesForXPath:"//ENTRY/@TITLE" |error|:(missing value))'s valueForKey:"stringValue") as list
  2. What won’t work for me is the lists this creates; I won’t be able to merge them properly.
    For example, the trackAUDIO_ID list that was created was not the same length (item count) as the other lists, because one of the ENTRY’s did not have that attribute. This will be the case for many of the other attributes I’d be working with.
  3. As you mentioned before, running multiple XPaths inside the loop was slow.

I will use your valueForKeyPath: method and report back!

thanks again

Kerry
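
One possible way around the mismatched list lengths, sketched here as an assumption and not taken from the code in this thread: ask each ENTRY for the attribute by name with NSXMLElement's attributeForName:, which returns missing value when the attribute is absent, and store a placeholder so every list keeps one item per ENTRY. This does use a loop, but there is no XPath query inside it. The XML string and the variable names are made up for illustration.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical two-entry document; only the second ENTRY has an AUDIO_ID
set theXML to "<COLLECTION><ENTRY TITLE=\"A\"/><ENTRY TITLE=\"B\" AUDIO_ID=\"abc\"/></COLLECTION>"
set theXMLDoc to (current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(missing value))
set theEntries to (theXMLDoc's nodesForXPath:"//ENTRY" |error|:(missing value))

set theAudioIDs to {}
repeat with anEntry in (theEntries as list)
	set theAttr to (anEntry's attributeForName:"AUDIO_ID")
	if theAttr is missing value then
		set end of theAudioIDs to missing value -- placeholder keeps the lists aligned
	else
		set end of theAudioIDs to (theAttr's stringValue() as text)
	end if
end repeat
set idCount to count of theAudioIDs
```

With this shape, item n of every attribute list refers to the same ENTRY, so the lists can be merged by position.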

Works great, but it only gives me the attribute names and attribute values
of the ENTRY element itself.

how can I also get the same for the children of the ENTRY
that include:

LOCATION
INFO
BPM
CUES_V2

Edit: one thought I have is that I could set up a variable that adds
the node to the XPath and then run it again,
doing the same for each element?

Or can I XPath my results?
From ADC:
The NSXMLNode class defines an XPath method that can be quite useful when making XPath queries. As the name suggests, you can send an XPath message to any node object to get an XPath string describing that node’s location in a tree.

I suspect you can do it with a different XPath query. You want to avoid loops as much as possible.

Yes, I set up a separate XPath query for each subelement and it works great.
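
For anyone following along, a sketch of what "a separate XPath query for each subelement" might look like. The XML string, the list of child element names and the variable names are assumptions for illustration, not the actual code used:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical one-entry document with three child elements
set theXML to "<COLLECTION><ENTRY TITLE=\"A\">
<LOCATION FILE=\"a.m4a\" VOLUME=\"Tekno\"/>
<INFO GENRE=\"Rock\" RATING=\"Search In Playlists\"/>
<TEMPO BPM=\"120.0\"/>
</ENTRY></COLLECTION>"
set theXMLDoc to (current application's NSXMLDocument's alloc()'s initWithXMLString:theXML options:0 |error|:(missing value))

-- Same filter every time; only the final step (the child element name) changes
set baseXPath to "//ENTRY[INFO/@RATING='Search In Playlists']"
set childNames to {"LOCATION", "INFO", "TEMPO"}
set childAttrNames to {}
set childAttrValues to {}
repeat with aName in childNames
	set theNodes to (theXMLDoc's nodesForXPath:(baseXPath & "/" & (aName as text)) |error|:(missing value))
	set end of childAttrNames to ((theNodes's valueForKeyPath:"attributes.name") as list)
	set end of childAttrValues to ((theNodes's valueForKeyPath:"attributes.stringValue") as list)
end repeat
```

The loop here runs once per child element name, not once per ENTRY, so the number of XPath queries stays small however large the collection is.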

thanks