Export Safari page from source with AppleScript

I’m trying to make a script that exports a Safari page from source as an HTML document to a given folder. Posts from the archives recommend using either a curl command or a System Events solution. The first gives pages that look less like the original than what you get from Safari’s manual export from source; the second is rather kludgy. Below are the scripts I have written so far, along with the Safari AppleScript dictionary entries for save and source that I’m trying to use. I’m getting errors, but I’m not sure what they mean. Any help would be appreciated!

These are scripts I’ve written so far:

tell application "Safari"
	set folderToSaveSafariWindowIn to "Q:Ø:"
	set pageToBeSaved to front window
	save document pageToBeSaved as source in alias folderToSaveSafariWindowIn
end tell

RESULT LOG:

tell application "Safari"
	get window 1
		--> window id 6017
	save document (window id 6017) as source in alias "Q:Ø:"
		--> error number -1700 from window id 6017 to integer

error “Safari got an error: Can’t make window id 6017 into type integer.” number -1700 from window id 6017 to integer

tell application "Safari"
    save source of document in "Q:Ø:"
end tell

RESULT:

error “Can’t get source of document.” number -1728 from «class conT» of document

These are entries from the AppleScript dictionary:

Model: macbook pro
AppleScript: NA
Browser: Safari 533.20.27
Operating System: Mac OS X (10.7)

try
	tell application "Safari"
		set x to URL of document 1
		set r to do shell script "echo " & quoted form of x & " | sed 's|/$||;s|:|%3A|g;s|/|%2F|g'"
		do shell script "curl " & x & " > " & quoted form of ((system attribute "HOME") & "/Desktop/" & r & ".html")
	end tell
end try

Thank you! I think there may be some sort of bug here, though; see below. Would the output be saved as HTML the same way as in the scripts with the curl command?

I get this output: " "

The event log says:

tell application "Safari"
	get URL of document 1
		--> "http://macscripter.net/post.php?tid=37175"
	do shell script "echo 'http://macscripter.net/post.php?tid=37175' | sed 's|/$||;s|:|%3A|g;s|/|%2F|g'"
		--> error number -10004
end tell
tell current application
	do shell script "echo 'http://macscripter.net/post.php?tid=37175' | sed 's|/$||;s|:|%3A|g;s|/|%2F|g'"
		--> "http%3A%2F%2Fmacscripter.net%2Fpost.php?tid=37175"
end tell
tell application "Safari"
	system attribute "HOME"
		--> error number -10004
end tell
tell current application
	system attribute "HOME"
		--> "/Users/ivindkulsrud"
end tell
tell application "Safari"
	do shell script "curl http://macscripter.net/post.php?tid=37175 > '/Users/ivindkulsrud/Desktop/http%3A%2F%2Fmacscripter.net%2Fpost.php?tid=37175.html'"
		--> error number -10004
end tell
tell current application
	do shell script "curl http://macscripter.net/post.php?tid=37175 > '/Users/ivindkulsrud/Desktop/http%3A%2F%2Fmacscripter.net%2Fpost.php?tid=37175.html'"
		--> ""
end tell
Result:
""

Hi,

In Safari, the keyword source is a property of the class document.
It cannot be used as a parameter of the save command.

To get the HTML content of a page, curl is indeed the best way.
The echo/sed part is not needed; quoted form is sufficient and much more reliable.


tell application "Safari" to set {theURL, theTitle} to {URL, name} of document 1
set {TID, text item delimiters} to {text item delimiters, "/"} -- exchange all slashes with underscores
set theTitle to text items of theTitle
set text item delimiters to "_"
set theTitle to theTitle as text
set text item delimiters to TID

do shell script "curl " & quoted form of theURL & " > " & quoted form of (POSIX path of (path to desktop) & theTitle & ".html")
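
A side note on the event log earlier in the thread: it shows each `do shell script` and `system attribute` call being sent to Safari first, failing with error -10004, and then being handled by the current application. The command still ran, and the empty result is simply curl's output being redirected into the file. The script above avoids those entries because only the property lookup is addressed to Safari. A minimal sketch of the same idea as reusable handlers follows; the handler names `curlCommandFor` and `downloadPage` are hypothetical and not part of the original scripts.

```applescript
-- Hypothetical helper: build the curl command line for a URL and a POSIX output path.
-- Both arguments are shell-quoted, as recommended above.
on curlCommandFor(pageURL, outputPath)
	return "curl " & quoted form of pageURL & " > " & quoted form of outputPath
end curlCommandFor

-- Hypothetical helper: download pageURL to outputPath.
-- Intended to be called from outside any "tell application" block,
-- so the shell command is not addressed to Safari first.
on downloadPage(pageURL, outputPath)
	do shell script curlCommandFor(pageURL, outputPath)
end downloadPage

-- Example use (assumes Safari has a document open):
-- tell application "Safari" to set pageURL to URL of document 1
-- downloadPage(pageURL, POSIX path of (path to desktop) & "page.html")
```

The example output name `page.html` is an assumption for illustration only.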


Got it, thank you, Stefan.

Or, since you are using Safari anyway:


tell application "Safari" to set {theSource, theTitle} to {source, name} of document 1
set {TID, text item delimiters} to {text item delimiters, "/"} -- exchange all slashes with underscores
set theTitle to text items of theTitle
set text item delimiters to "_"
set theTitle to theTitle as text
set text item delimiters to TID
set theFile to ((path to desktop as text) & theTitle & ".html")
try
	set fRef to open for access file theFile with write permission
	write theSource to fRef as «class utf8»
	close access fRef
on error
	try
		close access file theFile
	end try
end try


This avoids loading the same page twice.
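
Both scripts in this thread repeat the same text item delimiters steps to turn the page title into a file name. A minimal sketch of that step as a handler is shown here; the names `replaceText` and `safeFileName` are hypothetical. As an addition that is not in the original scripts, the sketch also replaces colons, on the assumption that a colon in the title would be read as a folder separator in the HFS-style path used by the second script.

```applescript
-- Hypothetical helper: replace every occurrence of findText in sourceText.
on replaceText(findText, replacementText, sourceText)
	-- Remember the current delimiters so they can be restored afterwards
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set textParts to text items of sourceText
	set AppleScript's text item delimiters to replacementText
	set resultText to textParts as text
	set AppleScript's text item delimiters to savedDelimiters
	return resultText
end replaceText

-- Hypothetical helper: make a page title usable as a file name.
-- Slashes are replaced as in the scripts above; the colon replacement is an added assumption.
on safeFileName(pageTitle)
	set cleanedTitle to replaceText("/", "_", pageTitle)
	set cleanedTitle to replaceText(":", "_", cleanedTitle)
	return cleanedTitle
end safeFileName
```

With these in place, the file path line could become something like `set theFile to ((path to desktop as text) & safeFileName(theTitle) & ".html")`.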