I just got an iPod and I’m trying to write AppleScripts for Audio Hijack that extract the Real Audio feeds of some of my favorite shows, so I can record them to iTunes at night and listen to them on my iPod. I’ve done fairly well with some scripts I found for NPR on the Audio Hijack forums:
However, I am now looking at a show whose URL includes the name of the show, not just the date. This makes it difficult, because the sample script I’m using works by the date.
The show is KCRW’s “Sounds Eclectic”, a weekly roundup of music played during the week on “Morning Becomes Eclectic”:
The links on Sounds Eclectic do not point directly to the ram files; they go to KCRW’s main page, which then redirects to an archive page holding a neatly compiled list of the files you are after. The URL of the archive page is “http://www.kcrw.com/archive.html”. The search string in the URLs is “/ram_wrap.cgi?/sc”, as you said, but there are href blocks containing the same string, so I test for both “/ram_wrap.cgi?/sc” and “http:” so that only fully qualified links are returned. I also included a property list to store previously downloaded URLs and avoid repeat downloads. The links are snagged off a Safari page; the rest could be done more efficiently with URL Scripting. I would have written the rest that way, but when the script gets to that part my Crapintosh tries to open the Classic URLScripting and I can’t get it to recognize the OS X version. Does anyone know how I can get that back?
property PriorDownloads : {} -- a property (not a plain variable) so the list survives between runs
tell application "Safari"
launch
make new document at the beginning of documents
set the URL of the front document to "http://www.kcrw.com/archive.html"
delay 3
end tell
tell application "Safari"
set HTMLCode to source of document 1
close window 1
end tell
set AppleScript's text item delimiters to ASCII character 34
set HTMLBlocks to text items of HTMLCode
set AppleScript's text item delimiters to ""
repeat with ThisBlock in HTMLBlocks
if ThisBlock contains "http:" and ThisBlock contains "/ram_wrap.cgi?/sc" then
-- ThisBlock is a reference to a list item; "contents of" yields the actual text
if (contents of ThisBlock) is not in PriorDownloads then
tell application "Safari"
make new document at the beginning of documents
set the URL of the front document to ThisBlock
delay 2
close window 1
end tell
set PriorDownloads to PriorDownloads & (contents of ThisBlock)
end if
end if
end repeat
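For anyone doing the later steps in the shell instead, the same "skip what was already downloaded" idea can be sketched with a small log file. This is only a sketch: the function name is_new_url and the SEEN_FILE path are made up for illustration and are not part of the script above.

```shell
#!/bin/sh
# Hypothetical log of URLs already handled; the default path is an assumption.
SEEN_FILE="${SEEN_FILE:-${TMPDIR:-/tmp}/kcrw_seen_urls.txt}"

# is_new_url URL
# Succeeds (exit 0) only the first time a given URL is seen, and records
# it so that later calls, and later runs, skip it.
is_new_url() {
    touch "$SEEN_FILE"
    if grep -qxF -- "$1" "$SEEN_FILE"; then
        return 1    # exact line already recorded
    fi
    printf '%s\n' "$1" >> "$SEEN_FILE"
    return 0
}
```

It would be used along the lines of: is_new_url "$url" && open "$url"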
Actually, there is a link on the main page, just below the one you clicked to go to the archives… where it says “click here to listen.”
I’ve run into a problem with the script. When I run it I get the following error:
Or something of the sort. I can’t reproduce it, because I only got the error the first time I ran the script. The second time I tried (to copy the error code), it actually worked fine. I’m not sure why that might be the case.
Thanks for your help. This will be great for a whole bunch of scripting purposes.
I will probably remove the “prior downloads” checking feature, since I’ll be running this from Audio Hijack which has its own scheduling features I can use to make sure the script is only run once a week.
Let me know if you figure out why the HTMLCode error pops up the first time the script is run.
I think I solved the problem with a longer delay. It may be that the page takes longer to load when there is no cache yet for that page.
Here is what I ended up with:
--Sounds Eclectic feed.
--Adapted by Luhmann from a script by sitcom
--http://bbs.applescript.net/message_send.php?id=6437&tid=13496
tell application "Safari"
activate
make new document at the beginning of documents
set the URL of the front document to "http://www.kcrw.com/archive.html"
delay 10
set HTMLCode to source of document 1
end tell
set AppleScript's text item delimiters to ASCII character 34
set HTMLBlocks to text items of HTMLCode
set AppleScript's text item delimiters to ""
repeat with ThisBlock in HTMLBlocks
if ThisBlock contains "http:" and ThisBlock contains "/ram_wrap.cgi?/sc" then
tell application "Safari"
open location ThisBlock
end tell
end if
end repeat
tell application "Audio Hijack"
activate
end tell
I see now why you had the handlers for previous downloads - because it is an archive page, and there seems to be more than one entry for the same show.
I’ve solved that by replacing the URL in the script with the link to the individual frame of the original page, which holds the links:
--Sounds Eclectic feed.
--Adapted by Luhmann from a script by sitcom
--http://bbs.applescript.net/message_send.php?id=6437&tid=13496
tell application "Safari"
activate
make new document at the beginning of documents
set the URL of the front document to "http://www.soundseclectic.com/cgi-bin/db/kcrw.pl?tmplt_type=se_home"
delay 10
set HTMLCode to source of document 1
end tell
set AppleScript's text item delimiters to ASCII character 34
set HTMLBlocks to text items of HTMLCode
set AppleScript's text item delimiters to ""
repeat with ThisBlock in HTMLBlocks
if ThisBlock contains "http:" and ThisBlock contains "/ram_wrap.cgi?/sc" then
tell application "Safari"
open location ThisBlock
end tell
end if
end repeat
tell application "Audio Hijack"
activate
end tell
Very nice. I wish I had had the original page, because it contains all the HTML code. And yes, the delay allows the page to load before the HTML source is read. The reason you didn’t get the error the second time is that the page was already in your cache. If you have a slow connection you will need to set the delay accordingly, as I see you have.
I don’t know if it’s true in all cases, but I have had great success in parsing HTML for links with the text item delimiters block, splitting the source on ASCII character 34, which is the double quote (").
Then each block can be parsed for conditions such as if ThisBlock contains “http:”
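The same split-on-quotes idea can be tried outside AppleScript as a rough shell sketch. The function name extract_stream_links is made up for illustration; it assumes the links sit between double quotes in the HTML, as they do in the scripts above.

```shell
#!/bin/sh
# extract_stream_links (hypothetical helper)
# Reads HTML on stdin, breaks it into chunks at every double quote, and
# prints only the chunks that contain both "http:" and the wrapper path.
extract_stream_links() {
    tr '"' '\n' | grep -F 'http:' | grep -F '/ram_wrap.cgi?/sc'
}
```

For example: curl "$page_url" | extract_stream_links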
Internet Explorer has a cool script feature, “ParseAnchor”. If the HTML code only contains the href data (“…/post.php?tid=13496”) and no fully qualified links, this command will combine the href with the server name “http://bbs.applescript.net/” to produce a fully qualified URL, “http://bbs.applescript.net/post.php?tid=13496”.
SC
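If Internet Explorer is not around, the combining step described above can be approximated by hand. This is a minimal sketch, not what ParseAnchor does internally: resolve_href is a made-up helper that just glues the href onto the scheme-and-host part of the base URL, and it does not handle "../" style paths.

```shell
#!/bin/sh
# resolve_href BASE HREF (hypothetical helper)
# Prints HREF unchanged if it is already fully qualified; otherwise joins
# it to the "scheme://host" part of BASE.
resolve_href() {
    base=$1
    href=$2
    case $href in
        http://*|https://*) printf '%s\n' "$href"; return 0 ;;
    esac
    # Keep "scheme://host" and drop any path from the base URL.
    origin=$(printf '%s\n' "$base" | sed -E 's#^([a-z]+://[^/]+).*#\1#')
    # Strip one leading slash from the href so the join has exactly one.
    printf '%s/%s\n' "$origin" "${href#/}"
}
```

Called as: resolve_href "http://bbs.applescript.net/" "post.php?tid=13496"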
First, use the JavaScript “document.readyState” status to check whether the page has loaded, rather than relying on an arbitrary delay:
set the_URL to "http://www.soundseclectic.com/cgi-bin/db/kcrw.pl?tmplt_type=se_home"
tell application "Safari"
make new document at beginning of documents
tell document 1
set URL to the_URL
if not my wait_to_finish() then return display dialog "\"" & the_URL & "\" failed to load." buttons {"OK"} default button 1 with icon 2 giving up after 10
set the_HTML to source
end tell
end tell
on wait_to_finish()
delay 2
tell application "Safari"
repeat with i from 1 to (2 * minutes)
if (do JavaScript "document.readyState" in document 1) = "complete" then
return true
else
delay 1
end if
end repeat
end tell
return false
end wait_to_finish
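The handler above is an instance of a general pattern: poll a condition at short intervals and give up after a fixed number of tries, instead of sleeping for one long guess. A shell sketch of the same pattern, with a made-up helper name wait_until, might look like this:

```shell
#!/bin/sh
# wait_until MAX_TRIES PAUSE_SECONDS COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND repeatedly until it succeeds. Returns 0 as soon as it does,
# or 1 after MAX_TRIES failed attempts.
wait_until() {
    max_tries=$1
    pause=$2
    shift 2
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$pause"
    done
    return 1
}
```

For example, wait_until 120 1 test -s /tmp/temp.ram would wait up to about two minutes for the file to exist and be non-empty.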
Second, why use Safari at all?
set the_URL to "http://www.soundseclectic.com/cgi-bin/db/kcrw.pl?tmplt_type=se_home"
set the_HTML to (do shell script "curl " & quoted form of the_URL)
The whole script can be simplified to:
set the_URL to "http://www.soundseclectic.com/cgi-bin/db/kcrw.pl?tmplt_type=se_home"
set the_URL to (do shell script "curl " & quoted form of the_URL & " | grep '/ram_wrap.cgi?/sc' | awk -F '\"' '{print $2}' | sed -e 's/ //g'")
do shell script "curl " & quoted form of the_URL & " >> /tmp/temp.ram; open /tmp/temp.ram"
tell application "Audio Hijack" to activate
Actually, there is an error in what I posted. There should be only a single angle bracket to redirect the output to the file, not two. Two would append the data to the file (if it exists) instead of overwriting it (as you want). So, it should be:
set the_URL to "http://www.soundseclectic.com/cgi-bin/db/kcrw.pl?tmplt_type=se_home"
set the_URL to (do shell script "curl " & quoted form of the_URL & " | grep '/ram_wrap.cgi?/sc' | awk -F '\"' '{print $2}' | sed -e 's/ //g'")
do shell script "curl " & quoted form of the_URL & " > /tmp/temp.ram; open /tmp/temp.ram"
tell application "Audio Hijack" to activate
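The difference between the single and double angle bracket is easy to check in Terminal with a throwaway file. A small sketch (the scratch file comes from mktemp, so nothing real is touched):

```shell
#!/bin/sh
demo_file=$(mktemp)               # scratch file for the demonstration

echo "first"  >  "$demo_file"     # > creates or overwrites the file
echo "second" >> "$demo_file"     # >> appends, so the file now has two lines
appended=$(wc -l < "$demo_file" | tr -d ' ')

echo "third"  >  "$demo_file"     # > again: old contents are replaced
overwritten=$(wc -l < "$demo_file" | tr -d ' ')

echo "lines after >>: $appended, lines after >: $overwritten"
rm -f "$demo_file"
```

With >>, a temp.ram left over from last week would keep growing with each run, which is why the single bracket is the one wanted here.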
And, assuming that “Audio Hijack” is in your Applications folder, here’s the (very long and bloated) one liner:
do shell script "URL=`curl 'http://www.soundseclectic.com/cgi-bin/db/kcrw.pl?tmplt_type=se_home' | grep '/ram_wrap.cgi?/sc' | awk -F '\"' '{print $2}' | sed -e 's/ //g'`; curl $URL > /tmp/temp.ram; open /tmp/temp.ram; open '/Applications/Audio Hijack.app'"
If the link were correctly coded like this example, then that’s all you would need. However, when I tested this script on the actual content, the HTML coding had an error. Instead of the code being: