Parse XML with AppleScript

Here is my XML file:

<?xml version="1.0" encoding="UTF-16"?>
<8086:293e>
<type>Audio</type>
<name>Some Name Here</name>
<displayname>Some Display Name Here</displayname>
<download>http://somesite.com</download>
</8086:293e>

How would I use AppleScript to parse the XML without any third-party add-ons? Basically, I want to save all the data inside the “<8086:293e>” element into separate variables (e.g. type, name, displayname, download). I’ve already read the dictionary for this, but I can’t make anything out of it.

Help is much appreciated
Thanks

Assuming it’s a plain text file like the one you’ve shown (I put mine on my desktop), then:

set P to paragraphs 3 thru 6 of (read alias "ACB-G5_Leopard:Users:bell:Desktop:xmlFile")
set tid to AppleScript's text item delimiters
set tData to {}
repeat with oneP in P
	-- each value sits between the first ">" and the following "<"
	set AppleScript's text item delimiters to ">"
	set aPart to text item 2 of oneP
	set AppleScript's text item delimiters to "<"
	set end of tData to text item 1 of aPart
end repeat
set AppleScript's text item delimiters to tid
tData
--> {"Audio", "Some Name Here", "Some Display Name Here", "http://somesite.com"}
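The same text item delimiter trick can be wrapped in a small handler so you ask for a value by tag name instead of relying on paragraph positions. This is a minimal sketch, assuming each tag appears once and is not nested inside another tag of the same name; the handler name valueOfTag is hypothetical:

```applescript
-- Hypothetical helper: return the text between <tagName> and </tagName>
-- in xmlText, or missing value if the opening tag is not present.
on valueOfTag(xmlText, tagName)
	set savedTIDs to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to "<" & tagName & ">"
		if (count of text items of xmlText) < 2 then error "tag not found"
		set afterOpen to text item 2 of xmlText
		set AppleScript's text item delimiters to "</" & tagName & ">"
		set theValue to text item 1 of afterOpen
		-- always put the delimiters back the way we found them
		set AppleScript's text item delimiters to savedTIDs
		return theValue
	on error
		set AppleScript's text item delimiters to savedTIDs
		return missing value
	end try
end valueOfTag

set xmlText to "<8086:293e>
<type>Audio</type>
<name>Some Name Here</name>
</8086:293e>"
valueOfTag(xmlText, "name") --> "Some Name Here"
```

Because the delimiter includes the angle brackets, asking for "name" does not accidentally match <displayname>.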

OK, but what if I had different elements? Say I wanted to get the values of “<8086:3333>” rather than “<8086:293e>”. Is there a way to specify which element to extract them from?

Check out the XML section of the System Events dictionary. There is some basic XML support built right into System Events.

tell application "System Events"
	tell XML element 1 of contents of XML file XMLfile -- XMLfile holds the path to your file; this targets the main outer tag, <8086:293e> in your case
		set typeText to (value of (XML elements whose name is "type")) as string
		set nameText to (value of (XML elements whose name is "name")) as string
	end tell
end tell

I haven’t tested this, so I hope there aren’t any syntax errors. It should at least give you a good start.

If you want to get more complicated (like writing out an updated XML file after you change the data), you might want to get the XML and XSLT scripting additions from http://www.latenightsw.com/freeware/.

I tried that, but when I ran it I got:

Can’t make «class valL» of every «class xmle» of «class xmle» 1 of contents of «class xmlf» “/xmlfile.xml” of application “System Events” whose name = “type” into type string.

EDIT: OK, so I have made SOME progress. Here is the XML:

<?xml version="1.0" encoding="UTF-16"?>
<test>
<type>Audio</type>
<name>Some Name</name>
<displayname>Some Display Name</displayname>
<download>http://somesite.com</download>
</test>
And the AppleScript:


set XMLfile to "Leopard:Users:pcwiz:Desktop:xmlfile.xml"
tell application "System Events"
    tell XML element "test" of contents of XML file XMLfile
        set typeText to (value of XML element "type")
        set nameText to (value of XML element "name")
    end tell
end tell

That works fine. However, when I rename the element from “test” to “8086:293e” in the XML, and change tell XML element “test” to tell XML element “8086:293e” in the AppleScript, I get this error:

System Events got an error: Can’t get XML element “8086:293e” of contents of XML file “Leopard:Users:pcwiz:Desktop:xmlfile.xml”

Any ideas?

To me, XML or HTML or whatever is just a big long string. As such, I don’t need special XML commands to extract information from it. I just use repeat loops (looking for the characters I’m interested in) to get the information I want. The following will get all of the values and put them in a list for you.

set tt to "<?xml version=\"1.0\" encoding=\"UTF-16\"?>
<8086:293e>
<type>Audio</type>
<name>Some Name Here</name>
<displayname>Some Display Name Here</displayname>
<download>http://somesite.com</download>
</8086:293e>"

-- put the xml into a list
set ttParas to paragraphs of tt

-- we cycle through the list and extract the portion between the tag we're interested in
-- we know that the tags we want are between "<8086:" and "</8086:"
set tagValueList to {}
repeat with i from 1 to count of ttParas
	if (item i of ttParas) starts with "<8086:" then -- first we find the tag we're interested in ("starts with" also copes with blank or short lines)
		repeat with j from (i + 1) to count of ttParas
			if (item j of ttParas) starts with "</8086:" then exit repeat -- we exit the loop when we hit the end tag
			set end of tagValueList to item j of ttParas
		end repeat
		exit repeat
	end if
end repeat
-- return tagValueList

-- next we extract each value from the found tags knowing that the value is between the ">" and "<" characters
set theTagValues to {}
repeat with i from 1 to count of tagValueList
	set thisTag to item i of tagValueList
	set thisTagValue to ""
	repeat with j from 1 to count of thisTag
		if item j of thisTag is ">" then
			repeat with k from (j + 1) to count of thisTag
				if item k of thisTag is "<" then exit repeat
				set thisTagValue to thisTagValue & item k of thisTag
			end repeat
			exit repeat
		end if
	end repeat
	set end of theTagValues to thisTagValue
end repeat
theTagValues

Hi.

It seems that “8086:293e”, which not only begins with a numeral but also contains a colon, is an invalid name for an XML tag, at least as far as System Events is concerned. Is it of your own devising, or has it come from some application? If you’re stuck with it, you’ll have to use regulus’s text-parsing idea.
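Following that text-parsing route also answers the earlier question about choosing between <8086:293e> and <8086:3333>: cut out the block for the tag you want first, then pull each child value from inside it. This is a minimal sketch, assuming several such blocks sit one after another in the file (text parsing does not need a root element) and that every block has the four child tags from the sample; the handler name valuesForDevice and the second sample block are hypothetical:

```applescript
-- Hypothetical helper: find <deviceTag> ... </deviceTag> in xmlText and return
-- {type, name, displayname, download} from inside it, or missing value
-- if the block (or one of its child tags) is not present.
on valuesForDevice(xmlText, deviceTag)
	set savedTIDs to AppleScript's text item delimiters
	try
		-- isolate the block belonging to the requested tag
		set AppleScript's text item delimiters to "<" & deviceTag & ">"
		if (count of text items of xmlText) < 2 then error "tag not found"
		set theBlock to text item 2 of xmlText
		set AppleScript's text item delimiters to "</" & deviceTag & ">"
		set theBlock to text item 1 of theBlock
		-- pull each child value out of that block
		set theValues to {}
		repeat with childTag in {"type", "name", "displayname", "download"}
			set tagName to contents of childTag
			set AppleScript's text item delimiters to "<" & tagName & ">"
			set afterOpen to text item 2 of theBlock
			set AppleScript's text item delimiters to "</" & tagName & ">"
			set end of theValues to text item 1 of afterOpen
		end repeat
		set AppleScript's text item delimiters to savedTIDs
		return theValues
	on error
		set AppleScript's text item delimiters to savedTIDs
		return missing value
	end try
end valuesForDevice

set xmlText to "<8086:293e>
<type>Audio</type>
<name>Some Name Here</name>
<displayname>Some Display Name Here</displayname>
<download>http://somesite.com</download>
</8086:293e>
<8086:3333>
<type>Audio2</type>
<name>Some Name2</name>
<displayname>Some Display Name2</displayname>
<download>http://somesite2.com</download>
</8086:3333>"
valuesForDevice(xmlText, "8086:3333")
--> {"Audio2", "Some Name2", "Some Display Name2", "http://somesite2.com"}
```

The closing tag "</8086:3333>" never matches the opening delimiter "<8086:3333>", so the colon and leading digit that upset System Events are harmless here.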

An attribute value can contain those characters. If you can’t rename the tag so that it doesn’t start with a number and has no colon, something like this will work. I tested it (I’m on Tiger):

<?xml version="1.0"?> Audio Some Name Some Display Name http://somesite.com
set XMLfile to "Macintosh HD:Users:myUser:Desktop:TestXML.xml"
tell application "System Events"
	tell XML element 1 of contents of XML file XMLfile
		set typeText to (value of XML element "type")
		set nameText to (value of XML element "name")
	end tell
end tell

Alright. I have been experimenting with this some more. I think you would be best off structuring your XML file like this:

<?xml version="1.0"?> Audio1 Some Name1 Some Display Name1 http://somesite1.com Audio2 Some Name2 Some Display Name2 http://somesite2.com Audio3 Some Name3 Some Display Name3 http://somesite3.com

Then read all of the values (however many there are) into a list, and search that list for the entry you are looking for.

set theFinalValues to {}
set theFinalSearchValue to missing value -- stays missing value if no entry matches
set XMLfile to "Macintosh HD:Users:myUser:Desktop:TestXML.xml"
tell application "System Events"
	tell XML element "Main" of contents of XML file XMLfile
		repeat with thisElement from 1 to (count of XML elements)
			set dataValue to (value of XML attribute of XML element thisElement) as string
			set typeText to (value of (XML elements whose name is "type") of XML element thisElement) as string
			set nameText to (value of (XML elements whose name is "name") of XML element thisElement) as string
			set displayname to (value of (XML elements whose name is "displayname") of XML element thisElement) as string
			set download to (value of (XML elements whose name is "download") of XML element thisElement) as string
			set theFinalValues to theFinalValues & {{dataValue, typeText, nameText, displayname, download}}
		end repeat
	end tell
end tell

repeat with thisData from 1 to count theFinalValues
	if item 1 of item thisData of theFinalValues is "8086:295e" then
		set typeText to item 2 of item thisData of theFinalValues
		set nameText to item 3 of item thisData of theFinalValues
		set displayname to item 4 of item thisData of theFinalValues
		set download to item 5 of item thisData of theFinalValues
		set theFinalSearchValue to {(item 1 of item thisData of theFinalValues), typeText, nameText, displayname, download}
		exit repeat -- stop at the first match
	end if
end repeat
theFinalSearchValue

result:
{“8086:295e”, “Audio2”, “Some Name2”, “Some Display Name2”, “http://somesite2.com”}

I agree with Matt-Boy. XML requires a single “root” element that encloses everything else in the document. If you use changeable data as the name of that first element, then, to my way of thinking, you don’t really have a root element.

I thought I would add to this thread with a similar issue I’m having. I’m attempting to parse an XML file from the Haloscan commenting system. Here’s a short example of what the XML looks like.

<?xml version="1.0" encoding="iso-8859-1" ?> 2005-06-14T08:57:25-05:00 Some Name some@mail http://wwww.w.come 100.100.0110.111 2005-06-14T08:57:25-05:00 Some Name some@mail http://wwww.w.come 100.100.0110.111 2005-06-14T08:57:25-05:00 Some Name some@mail http://wwww.w.come 100.100.0110.111

Each thread ID has one or more comments under it. I want to search for specific thread IDs and extract the comment info.
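The tag names were lost from the sample above, so the following only sketches the text item delimiter approach under a loud assumption: that each thread is wrapped in something like <thread id="..."> ... </thread> and each comment in <comment> ... </comment>. Those tag and attribute names, the sample data, and the handler name commentsForThread are all hypothetical; substitute the real Haloscan tags:

```applescript
-- Hypothetical helper: return the raw text of every <comment>...</comment>
-- inside the <thread id="threadID"> block, or {} if that thread is absent.
-- The tag and attribute names are assumptions, not Haloscan's real ones.
on commentsForThread(xmlText, threadID)
	set savedTIDs to AppleScript's text item delimiters
	set theComments to {}
	try
		-- cut out the block for the requested thread
		set AppleScript's text item delimiters to "<thread id=\"" & threadID & "\">"
		if (count of text items of xmlText) < 2 then error "thread not found"
		set threadBlock to text item 2 of xmlText
		set AppleScript's text item delimiters to "</thread>"
		set threadBlock to text item 1 of threadBlock
		-- every text item after the first starts with one comment's contents
		set AppleScript's text item delimiters to "<comment>"
		set commentChunks to text items 2 thru -1 of threadBlock
		set AppleScript's text item delimiters to "</comment>"
		repeat with aChunk in commentChunks
			set end of theComments to text item 1 of (contents of aChunk)
		end repeat
	end try -- any failure (no thread, no comments) just leaves theComments as is
	set AppleScript's text item delimiters to savedTIDs
	return theComments
end commentsForThread

set sampleXML to "<threads>
<thread id=\"1234\">
<comment><name>Some Name</name></comment>
<comment><name>Other Name</name></comment>
</thread>
</threads>"
commentsForThread(sampleXML, "1234")
--> {"<name>Some Name</name>", "<name>Other Name</name>"}
```

Each returned chunk can then be fed through the same delimiter trick again to pull out individual fields such as the name or date.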

Just wanted to add my method:

-->save the original delimiters once, before anything changes them
set astid to AppleScript's text item delimiters

set paski to do shell script "curl http://www.onthesnow.com/pennsylvania/snow.rss"

-->get all items
set paskiitems to my parsecode(paski, "<item>", "</item>")

-->build a list of mountain data
set regiondata to {}
repeat with x from 1 to count of every item of paskiitems
	set thismountain to item x of paskiitems
	set mountainname to my cleantags(thismountain, "<title>", "</title>", astid)
	set mountaindescription to my cleantags(thismountain, "<description>", "</description>", astid)
	set mountainlink to my cleantags(thismountain, "<guid isPermaLink=\"true\">", "</guid>", astid)
	set mountainstatus to my cleantags(thismountain, "<ots:open_staus>", "</ots:open_staus>", astid)
	set mountaindepth to my cleantags(thismountain, "<ots:base_depth>", "</ots:base_depth>", astid)
	set mountain48sf to my cleantags(thismountain, "<ots:snowfall_48hr>", "</ots:snowfall_48hr>", astid)
	copy {mountainname, mountaindescription, mountainlink, mountainstatus, mountaindepth, mountain48sf} to end of regiondata
end repeat

return regiondata

on parsecode(code, opentag, closetag)
	set itemlist to {}
	set AppleScript's text item delimiters to opentag
	set taglist to every text item of code
	-->everything before the first opentag is not an item, so start at 2
	set childtaglist to {}
	repeat with x from 2 to count of every item of taglist
		copy item x of taglist to end of childtaglist
	end repeat

	-->keep only the part of each chunk that comes before its closetag
	set AppleScript's text item delimiters to closetag
	repeat with thisitem in childtaglist
		copy text item 1 of thisitem to end of itemlist
	end repeat

	return itemlist
end parsecode

on cleantags(thisitem, opentag, closetag, astid)
	try
		set AppleScript's text item delimiters to opentag
		set rawitem to text item 2 of thisitem
		set AppleScript's text item delimiters to closetag
		set cleanitem to text item 1 of rawitem
		-->reset the delimiters
		set AppleScript's text item delimiters to astid
		return cleanitem
	on error errmsg
		-->reset the delimiters here too, and return empty text so the caller still gets a string
		set AppleScript's text item delimiters to astid
		display dialog "Could not clean the " & opentag & return & return & errmsg
		return ""
	end try
end cleantags