AppleScript & RSS

I was wondering, is it possible to make a basic RSS reader in AppleScript? I looked around on the web, but I couldn’t find anything. All I want to do is have it display a list of the headlines from the feed, each linking to its article. Could anyone give me any help on how to do that? I’m not sure whether it matters, but the RSS version is 1.0. The website is www.controlbooth.com/backend.php. I don’t know any HTML, but I have friends who do, so I can get them to help me on that front, if necessary. Any help would be greatly appreciated.

This seems like it might be pretty easy to do if you employ the shell.

Try this in the Terminal:

curl http://www.controlbooth.com/backend.php

The above command will return your raw RSS data (in XML, I think).

You might then run that command through AS like so:

do shell script "curl http://www.controlbooth.com/backend.php"

The structure is very easy to read. Once you get the result, it should be pretty easy to parse out each item individually for display in some way of your choosing…

But you’ll need a GUI builder like AppleScript Studio or FaceSpan (http://www.facespan.com/facespan/pagespeed/url//features4.0/) to display more than a fraction of that data via AppleScript.



If you only need to read the one feed you mentioned, then it's easy to write something custom to display that feed. But if you're looking to be able to read multiple feeds, it becomes a bit more complex due to the varying structure of XML data in feeds from site to site. (I don't know much about XML save for what I'm seeing when I run curl on a given feed.)

That complexity might lessen later on, as AStudio evolves. I can imagine that with the introduction of Safari's RSS savvy, the WebView object in Xcode might also become smarter as a result and make handling XML display a bit easier. <shrug> But that would be some time down the road, if at all.

Hope this helps.

Thanks, that’s great. What I was hoping to do was to make a status menu to display all of the new stuff. I know how to create a status menu (thanks to jonn8’s application) and I know how to dynamically add items to a menu. My sole problem is getting the menu items to go somewhere. I suppose I could just tell it to create a list by paragraph, and then create a further list from each paragraph to determine the title of the item and the link to point it to. It just seems to me that there must be an easier way to go about doing this. My main problem is that my Developer Tools don’t work for some reason or another, and I can’t download from Apple’s FTP server because I’m behind a firewall. So, I think I’ll have to find the CD that came with my computer. Any help with my division problem would be greatly appreciated.

You mention a couple of separate issues here.

Well, here’s what I might do.
At menu item creation time, I might set the AS name of the menu item to the URL it should point to. Then, in the ‘on choose menu item’ handler, pick up that URL from the name of theObject and use it to construct an AppleScript command like so:

open location "http://some.rss.url?blah"

Then again, I don’t know if AS names in IB have a character limit, so that might not work.
You’d have a lot more options if, instead of a status menu item, you put all your data in, say, a floating utility window with a table. This would give you columns in which to store the headline title, the URL it points to and even a short version of the headline’s description. You could then use the ‘on double click’ handler of the table as the jumping off action for the AS ‘open location’ command. Just another thought.

I’m not sure it’s as simple as a paragraph. Your headline object separation is done with tags, not returns. Tags like <item>. You’ll need to parse using these tags as delimiters or something.

You’ve got me convinced. I think I’ll use a table in a utility window, like you suggested. Now, my main problem would be that I have no idea what you mean by parsing and delimiters.

‘Parse’ and ‘delimiter’ are, as you’ll quickly discover here, terms we use heavily in the AS world.

‘Parse’ simply implies extracting specific bits of text from a larger body of text.

‘Delimiter’ means ‘separator’. With applescript, one can define the delimiter within a body of text in order to easily access key parts.
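
For instance, here is a minimal sketch of both ideas in vanilla AppleScript. The sample string, the "|" separator and the variable names are made up for illustration; they are not taken from the actual feed.

```applescript
-- Hypothetical sample text: three headlines separated by a made-up "|" delimiter.
set sample_text to "First headline|Second headline|Third headline"

-- Remember the current delimiters so they can be restored afterwards.
set old_delims to AppleScript's text item delimiters

-- Define the delimiter, then parse: each chunk between delimiters becomes one text item.
set AppleScript's text item delimiters to "|"
set headline_list to every text item of sample_text

-- Put the delimiters back; other code may rely on the previous setting.
set AppleScript's text item delimiters to old_delims

headline_list
--> {"First headline", "Second headline", "Third headline"}
```

The same approach should work with a tag such as "</item>" as the delimiter, although the pieces you get back will still contain the opening tags.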

:wink:

OK, I understand what that means, but could you explain to me how one goes about it. Could we use the feed I’m planning to use as an example? This is part of it:

So, I could remove the top 14 and bottom 4 lines of the downloaded feed. I guess that would be something to do with the “eof” but I’m not positive. This would leave me with:

Now, I suppose I would “set AppleScript’s text item delimiters to ("</item>" & return & return & "<item>")”, which would separate all of the items in the feed into a list.

Then could I go through each of the items in the list and separate them further, or would AppleScript get confused?

You’re very much on the right track here, close to what I was experimenting with this morning…

The difference being, you won’t use AppleScript’s read file or “eof” commands for excluding those beginning and ending parts of the feed. Just use text item delimiters for that, too.

Here’s some quick, un-optimized code to get you started.
(BEWARE: This code is highly un-optimized and probably a bit clunky. But it works. :wink: I’m certain there are lots of ways to trim down the line count, but I wanted to get this posted quickly…)


-- Begin Script.
set raw_feed_text to (do shell script "curl http://www.controlbooth.com/backend.php")
-- Set up empty list which will ultimately contain just the data you want in a structure
-- that should be easy to insert into your table for display.
-- The value of this variable is your end result.
set feed_record_list to {}

-- Call your first, primary subroutine which breaks all of your feed items out into individuals.
parse_feed(raw_feed_text) of me
set rss_item_list to result

-- Now convert the raw RSS items into a list of AppleScript records for population into your table.
repeat with r in rss_item_list
	extract_record(r) of me
	set end of feed_record_list to result
end repeat







-- Subroutines below.
on parse_feed(some_text)
	
	set feed_item_list to {}
	
	-- Strip out the header text.
	-- Double quotes inside the string literal must be escaped as \" or the script won't compile.
	set feedHead to "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>

<!DOCTYPE rss PUBLIC \"-//Netscape Communications//DTD RSS 0.91//EN\"
 \"http://my.netscape.com/publish/formats/rss-0.91.dtd\">

<rss version=\"0.91\">

<channel>
<title>ControlBooth.com</title>
<link>http://www.controlbooth.com</link>
<description>Where Technical Theater Falls Into Place</description>
<language>en-us</language>

"
	set AppleScript's text item delimiters to feedHead
	set a to every text item in some_text
	set b to (items 2 through -1 of a) as string
	
	-- Strip out the footer text.
	set feedFoot to "

</channel>
</rss>"
	set AppleScript's text item delimiters to feedFoot
	set x to every text item in b
	set y to (text items 1 through -2 of x) as string
	
	-- Strip out carriage returns. These aren't useful to us for this purpose.
	set AppleScript's text item delimiters to return
	set z to (every text item in y)
	set AppleScript's text item delimiters to ""
	set y_noreturns to (every item in z as string)
	
	set AppleScript's text item delimiters to "</item>"
	set cList to every text item in y_noreturns
	set AppleScript's text item delimiters to ""
	
	repeat with c in cList
		try
			set final_item to (text 7 through -1 of c)
			set end of feed_item_list to final_item
		end try
	end repeat
	
	return feed_item_list
end parse_feed

on extract_record(some_feed_item)
	set AppleScript's text item delimiters to "</title>"
	set feedTitle to text 8 through -1 of (text item 1 of some_feed_item)
	set feedLink_long to (text item 2 of some_feed_item)
	set AppleScript's text item delimiters to "</link>"
	set feedLink_short to (text 8 through -1 of text item 1 of feedLink_long)
	set AppleScript's text item delimiters to ""
	set myFeedRecord to {title:feedTitle, link:feedLink_short}
	return myFeedRecord
end extract_record


Sure it is, though to be honest you’d be much better off using something like Python or Perl, which already have good RSS parsing libraries. Writing your own RSS parser from scratch is a non-trivial task, especially as malformed RSS feeds are common and can be a real pain to work with. However, if you just want to bodge something that pulls some text out of one particular feed that you’re certain is absolutely regular in structure, you could probably get away with using a simple regular expression to match and extract the bits you want. e.g. Here’s one using the Satimage osax [1]:

set str to "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>

<!DOCTYPE rss PUBLIC \"-//Netscape Communications//DTD RSS 0.91//EN\"
\"http://my.netscape.com/publish/formats/rss-0.91.dtd\">

<rss version=\"0.91\">

<channel>
<title>ControlBooth.com</title>
<link>http://www.controlbooth.com</link>
<description>Where Technical Theater Falls Into Place</description>
<language>en-us</language>

<item>
<title>ControlBooth.com WebRing</title>
<link>http://www.controlbooth.com/modules.php?name=News&file=article&sid=45</link>
</item>

<item>
<title>Bachelor Dave Is Dead...</title>
<link>http://www.controlbooth.com/modules.php?name=News&file=article&sid=44</link>
</item>

</channel>
</rss>"


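-- Backslashes must be doubled inside an AppleScript string literal, so \\1 and \\2 are the regex backreferences.
set matchedItems to find text "<item>[[:space:]]*<title>([^<]*)</title>[[:space:]]*<link>([^<]*)</link>" in str using "\\1\n\\2" with regexp, all occurrences and string result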
repeat with matchRef in matchedItems
	set matchRef's contents to {text 1 thru paragraph -2 of matchRef, paragraph -1 of matchRef}
end repeat
return matchedItems
--> {{"ControlBooth.com WebRing", "http://www.controlbooth.com/modules.php?name=News&file=article&sid=45"}, {"Bachelor Dave Is Dead...", "http://www.controlbooth.com/modules.php?name=News&file=article&sid=44"}}

Note that there’s at least a dozen different reasons why this code is the very definition of “naive” and will break in all sorts of thrillingly disastrous ways in any kind of real-world use, so if reliability and good behaviour are a concern and/or you plan on doing anything more than what you’ve described then it’s simply not going to cut it and you should go use a proper RSS parser (e.g. here’s one for Python: http://sourceforge.net/projects/feedparser/) to do all the heavy lifting for you. Or get yourself a scriptable RSS reader like NetNewsWire and use AppleScript to control that.

Nitpick:

Parsing means breaking a [structured] string down into its component parts. What TJ’s describing is pattern matching, which is a simpler and much less powerful process. FWIW, there’s a couple basic AppleScript-based parsers, HTMLParser and PListDOM, on AppleMods if you’re interested in seeing some [very simple] examples.

HTH

has

[1] AppleScript’s TIDs are utterly lame and strictly for masochists. Get yourself a decent regex osax/library. If you’ve not used regular expressions before, the Satimage osax has some basic documentation in PDF form, and there’s scads of tutorials and stuff on the web - just Google for something like “regular expression tutorial”.

Thanks! That was incredibly helpful. However, when I run the script, it gives me this error message:

I myself don’t see what’s wrong, but I thought it would be best to ask you for a bit of help, since I don’t want to think I’m fixing it and screw some other part of the script up.

Which script are you trying to use? His or mine?

Sorry, I should have mentioned. I’m trying to use yours.

Hmm. It worked well here.

Ok, try this revised version. It’s a bit leaner. It seems to work quite well here. My favorite feature is that it’s pure vanilla AS. Has makes some good points about better tools for parsing and even other languages that might do a better/easier job.

However, I’m usually most interested in vanilla AS if I can get it, even if it is clunkier than it ‘should be’ by the standards of other languages. In addition, if one wants to use grep, I think you can do that via the shell without necessarily employing a third party solution. (Or am I dreaming that?)

-- Begin Script.
set raw_feed_text to (do shell script "curl http://www.controlbooth.com/backend.php")

-- Call your first, primary subroutine which breaks all of your feed items out into individuals.
parse_feed(raw_feed_text) of me

-- And here's your usable value for population in the gui table.
set usable_record_list to result

-- Subroutines below.
on parse_feed(some_text)
	set feed_item_list to {}
	set AppleScript's text item delimiters to "<item>"
	set no_header to (text items 2 through -1 of some_text)
	set AppleScript's text item delimiters to "<item>"
	set no_header2 to (every item in no_header) as string
	set some_text_rev1 to "<item>" & no_header2
	set AppleScript's text item delimiters to "</item>"
	set no_footer to (text items 1 through -2 of some_text_rev1)
	set AppleScript's text item delimiters to "</item>"
	set no_footer2 to (every text item in no_footer) as string
	set some_text_rev2 to (no_footer2 & "</item>")
	-- Strip out carriage returns. These aren't useful to us for this purpose.
	set AppleScript's text item delimiters to return
	set z to (every text item in some_text_rev2)
	set AppleScript's text item delimiters to ""
	set y_noreturns to (every item in z as string)
	set AppleScript's text item delimiters to "</item>"
	set cList to every text item in y_noreturns
	set AppleScript's text item delimiters to ""
	repeat with c in cList
		try
			set final_item to (text 7 through -1 of c)
			set end of feed_item_list to final_item
		end try
	end repeat
	set final_record_list to {}
	repeat with f in feed_item_list
		set AppleScript's text item delimiters to "</title>"
		set feedTitle to text 8 through -1 of (text item 1 of f)
		set feedLink_long to (text item 2 of f)
		set AppleScript's text item delimiters to "</link>"
		set feedLink_short to (text 8 through -1 of text item 1 of feedLink_long)
		set AppleScript's text item delimiters to ""
		set myFeedRecord to {title:feedTitle, link:feedLink_short}
		set end of final_record_list to myFeedRecord
	end repeat
	return final_record_list
end parse_feed
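
As for the grep idea, here’s a rough sketch of how it might look through ‘do shell script’. It runs against a made-up two-item sample rather than the live feed, and it assumes every title sits on one line with no nested tags. The names sample_feed, shell_cmd and headline_list are just for illustration.

```applescript
-- Hypothetical sample standing in for the downloaded feed text.
set sample_feed to "<item><title>First headline</title><link>http://example.com/1</link></item><item><title>Second headline</title><link>http://example.com/2</link></item>"

-- grep -o prints each <title>...</title> match on its own line;
-- sed then strips the tags, leaving only the headline text.
set shell_cmd to "echo " & quoted form of sample_feed & " | grep -o '<title>[^<]*</title>' | sed -e 's/<[^>]*>//g'"
set headline_text to do shell script shell_cmd

-- One headline per line of output.
set headline_list to paragraphs of headline_text
--> {"First headline", "Second headline"}
```

On the real feed, the channel’s own <title> would match too, so the first result would probably need to be dropped.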

The problem with the above solutions isn’t clunkiness, it’s extreme fragility. Meaning they’ll blow up horribly at the slightest excuse. They don’t deal with things like text encodings, XML character entities and whitespace cleanup, and make all sorts of completely unsafe assumptions about order of elements, format of tags, etc. If the OP’s requirements are extremely limited and unlikely to change, his data source completely predictable, and the high risk of faults and failures not a worry, then he can maybe get away with using one of these naive solutions.
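
To make just one of those points concrete: a feed that escapes its ampersands properly delivers links containing ‘&amp;’, which a naive extractor passes straight through. Here is a rough sketch of the extra decoding step in vanilla AS. It covers only XML’s five predefined entities (no numeric character references), and the handler names are purely illustrative.

```applescript
-- Replace every occurrence of find_text in source_text, using text item delimiters.
on replace_text(find_text, replacement_text, source_text)
	set old_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find_text
	set parts to text items of source_text
	set AppleScript's text item delimiters to replacement_text
	set new_text to parts as text
	set AppleScript's text item delimiters to old_delims
	return new_text
end replace_text

-- Decode the five predefined XML entities.
-- "&amp;" goes last so that "&amp;lt;" becomes "&lt;" rather than "<".
on decode_entities(some_text)
	set entity_pairs to {{"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}, {"&apos;", "'"}, {"&amp;", "&"}}
	repeat with pair_ref in entity_pairs
		set {find_text, replacement_text} to contents of pair_ref
		set some_text to my replace_text(find_text, replacement_text, some_text)
	end repeat
	return some_text
end decode_entities

decode_entities("http://www.controlbooth.com/modules.php?name=News&amp;file=article&amp;sid=45")
--> "http://www.controlbooth.com/modules.php?name=News&file=article&sid=45"
```

And that is one small cleanup step out of many.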

Otherwise he should use a real RSS parser; either one written in AppleScript (which he’d have to do himself [1], as nobody else has or is likely to), or one written in another language (Perl, Python, C, etc.) that can be called from AppleScript via ‘do shell script’, implemented as an OSAX or scriptable FBA, or bound into an ASS application (if that’s what he’s writing). Reliability and ease of implementation and use are far more important than who wrote it or what language they used.
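
A minimal sketch of the ‘do shell script’ route. It assumes a Python 3 interpreter at /usr/bin/python3 with the feedparser module installed for it, neither of which is a given on any particular Mac. The handler and variable names are hypothetical.

```applescript
-- ASSUMPTIONS (not from this thread): /usr/bin/python3 exists and can import feedparser.

-- Turn "title<TAB>link" lines into a list of {title, link} pairs.
on split_tab_lines(raw_output)
	set pair_list to {}
	set old_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab
	repeat with line_ref in paragraphs of raw_output
		set this_line to contents of line_ref
		-- Skip blank or malformed lines.
		if (count of text items of this_line) is 2 then
			set end of pair_list to {text item 1 of this_line, text item 2 of this_line}
		end if
	end repeat
	set AppleScript's text item delimiters to old_delims
	return pair_list
end split_tab_lines

-- Let feedparser do the real parsing; AppleScript only collects its output.
on fetch_headlines(feed_url)
	set py_code to "import sys, feedparser" & linefeed
	set py_code to py_code & "for e in feedparser.parse(sys.argv[1]).entries:" & linefeed
	set py_code to py_code & "    print(e.get('title', '') + '\\t' + e.get('link', ''))"
	set shell_cmd to "/usr/bin/python3 -c " & quoted form of py_code & " " & quoted form of feed_url
	set raw_output to do shell script shell_cmd
	return my split_tab_lines(raw_output)
end fetch_headlines

-- Usage: fetch_headlines("http://www.controlbooth.com/backend.php")
```

All the tolerance for malformed feeds then lives in the library, and the AppleScript side only has to split lines on tabs.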

Example: It took me about 5 minutes to throw together a proof-of-concept scriptable FBA that wraps feedparser and returns its results as nested records and lists, whereas writing even a very basic, unforgiving RSS parser in vanilla AppleScript would take me days or even weeks (since I’d need to implement the major supporting libraries from scratch as well) [2]. Simply not worth it when you can already get an excellent (i.e. robust, mature and very well documented) error-tolerant RSS parser completely for free.

HTH

has

[1] A non-trivial task requiring good knowledge of XML, RSS and parser design, and lots of determination and free time. Using something like the XML Tools OSAX to parse the RSS might help, though it’s probably a strict XML parser that’ll barf on malformed RSS such as that provided by the OP’s example feed (it doesn’t escape ampersands correctly).

[2] Being pure AS, it’d also run slow as molasses compared to the ‘impure’ alternatives; another area where pragmatism always beats purity.

All valid points, Has.

So, yes, caveat emptor for my AS-only solution.
If the feed data structure changes by even a character or two, it’ll likely break my parser, sure.

Take it for what it’s worth.

Hi, sorry I haven’t replied, but it took me a while to get my hands on a working copy of Xcode, and then the internet was shut down at my school, so I couldn’t get onto the site. T.J., I was wondering, how do I get the data to display in the data view? I’ve never really worked with one before, and I don’t completely understand how they work. So, your script gives me the variable “usable_record_list”. How do I make the list show up in the data view?