Batch converting *.webloc files to PDF using Safari and cups-pdf

While I was enjoying the delicious food and sunny weather in Southern Spain, I got an email from a fellow Mac user asking whether it was possible to convert a huge collection of webloc files to PDFs.

And here is the delayed, but positive answer: Yes, it’s possible by simply combining AppleScript, Safari and the free cups-pdf package!

Here is the script that does the magic. As always, you can download it for free:

webloc2pdf - Conveniently batch convert *.webloc files to PDF (ca. 28.3 KB)

It was successfully tested on Mac OS X 10.5.2 with Safari 3.1.1, with cups-pdf installed and set as the current printer.

So how does this AppleScript droplet actually work? First, it iterates over the dropped *.webloc files and extracts the URL from each one using the following shell command:

strings weblocfilename.webloc/rsrc | grep http | sed '/^.http/s//http/' | head -1
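To unpack that pipeline: strings prints the runs of printable characters found in the resource fork, grep keeps the lines containing "http", sed drops a single stray character in front of "http" at the start of a line, and head keeps only the first hit. If you want to play with the idea without a real webloc file, here is a rough sketch of a variation. Note that it swaps strings and sed for grep -o, which prints only the matched URL. The first_url helper and the sample data are my own illustration and not part of the droplet.

```shell
#!/bin/sh
# Hypothetical helper (not part of the droplet): print the first
# http(s) URL found in a file that may contain binary data.
first_url() {
    # -a: treat binary input as text
    # -o: print only the matched text, so no stray leading character
    # LC_ALL=C: match byte-wise, so non-printable bytes end the match
    LC_ALL=C grep -a -o -E 'https?://[[:graph:]]+' "$1" | head -n 1
}

# Build a small sample file: a few bytes, a stray control character,
# a URL, a NUL byte and some trailing text.
sample=$(mktemp)
printf 'abc\001\022http://www.example.com/page.html\000tail' > "$sample"

first_url "$sample"   # prints http://www.example.com/page.html

rm -f "$sample"
```

On a Mac of that era you would point first_url at weblocfilename.webloc/rsrc instead of the sample file. I only tested the original strings pipeline with the droplet, so treat this variant as an assumption rather than a drop-in replacement.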

Afterwards, the script repeats the following procedure, Sisyphus-like, for every extracted URL:

  1. Create a new document/browser window in Safari
  2. Wait until the corresponding website has loaded completely
    (the script polls for up to about 13 seconds and skips printing otherwise)
  3. Print the website using the great «print without print dialog» command
    (Important: The script assumes that cups-pdf is the current printer!)
  4. Close the document

That’s all. You will end up with a bunch of PDF files in your cups-pdf folder.
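The "completely load" step is really a bounded polling loop: the script asks Safari for document.readyState about once a second and gives up after a fixed number of attempts. The same pattern, sketched roughly in shell, looks like this. The wait_until helper and the marker file are hypothetical stand-ins of mine; the droplet itself does this in AppleScript.

```shell
#!/bin/sh
# Hypothetical helper for illustration: run a check command up to
# $1 times, sleeping $2 seconds after each failed attempt.
# Returns 0 as soon as the check succeeds, 1 if it never does.
wait_until() {
    attempts=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    return 1
}

# Demonstration: the marker file stands in for "the page has finished
# loading". It already exists here, so the first check succeeds.
marker=$(mktemp)
if wait_until 10 1 test -e "$marker"; then
    echo "loaded, safe to print"
else
    echo "gave up, skipping this page"
fi
rm -f "$marker"
```

Capping the number of attempts is what keeps one slow or broken website from stalling the whole batch.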

Here is the complete source code of the script:


-- created: 23.04.2008
-- tested on/with:
-- • Mac OS X 10.5.2
-- • Safari 3.1.1
-- • Intel & PowerPC based Macs
-- requires:
-- • cups-pdf package
-- you can get cups-pdf for free right here:
-- >> http://www.codepoetry.net/projects/cups-pdf-for-mosx

-- This AppleScript droplet batch converts dropped webloc files to
-- PDF using the Safari browser and cups-pdf. The script assumes that
-- cups-pdf is your current default printer. If you need to use specific 
-- print settings, then please create a printer preset for cups-pdf.

property mytitle : "webloc2pdf"

-- I am called when the user opens the script with a double click
on run
	tell me
		activate
		display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of webloc files onto my icon to batch convert the corresponding websites to PDF using cups-pdf." buttons {"OK"} default button 1 with title mytitle with icon note
	end tell
end run

-- I am called when the user drops Finder items onto the script icon
on open droppeditems
	try
		set weblocpaths to {}
		-- did the user drop any *.webloc files onto the script?
		repeat with droppeditem in droppeditems
			if (droppeditem as Unicode text) ends with ".webloc" then
				set weblocpaths to weblocpaths & (droppeditem as Unicode text)
			end if
		end repeat
		-- no, he or she didn't!
		if weblocpaths is {} then
			set errmsg to "You did not drop any *.webloc files onto me."
			my dsperrmsg(errmsg, "--")
		else
			-- extracting the URLs from the single webloc files
			set weblocurls to {}
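			repeat with weblocpath in weblocpaths
				set weblocurl to my geturlfromwebloc(weblocpath)
				-- skip files from which no URL could be extracted,
				-- otherwise the empty-list check below can never trigger
				if weblocurl is not "" then
					set weblocurls to weblocurls & weblocurl
				end if
			end repeat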
			-- no URLs could be extracted :(
			if weblocurls is {} then
				set errmsg to "We could not extract any URLs from the webloc files."
				my dsperrmsg(errmsg, "--")
			else
				-- using Safari, cups-pdf and the «print without print dialog» command to batch
				-- convert the corresponding websites to PDF
				tell application "Safari"
					repeat with weblocurl in weblocurls
						make new document with properties {URL:weblocurl}
						delay 3
						set docloaded to false
						repeat 10 times
							delay 1
							set docstate to (do JavaScript "document.readyState" in document 1)
							if docstate is "complete" then
								set docloaded to true
								exit repeat
							end if
						end repeat
						if docloaded is true then
							print document 1 without print dialog
						end if
						close document 1
					end repeat
				end tell
			end if
		end if
		-- catching unexpected errors
	on error errmsg number errnum
		my dsperrmsg(errmsg, errnum)
	end try
end open

-- I am extracting the URL from a Web Internet Location file (*.webloc)
-- using the «strings» command
on geturlfromwebloc(weblocpath)
	set weblocpath to quoted form of ((POSIX path of weblocpath) & "/rsrc")
	set cmd to "strings " & weblocpath & " | grep http | sed '/^.http/s//http/' | head -1"
	set cmd to cmd as «class utf8»
	set weblocurl to do shell script cmd
	return weblocurl
end geturlfromwebloc

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg