Parse cURL Download

Emacs · April 30, 2011, 11:29pm

I have a favorite blog about security, but sometimes I forget to check it for a day, and even worse, sometimes I don’t check it for a week. I made a script similar to this one, except that it mails the URL to one of my addresses. I put that script on my server and made it run around 11:50. On my client I have the mail app running; it checks the subject, and if it is Schneier, it downloads the HTML to my computer. The script below, when run, checks for blog posts dated today and downloads them to the desktop if there are any. It is a good lesson in parsing cURL output and downloading.

set todayNumber to the day of (current date) as integer
set todayMonth to the month of (current date) as text
set todayYear to the year of (current date) as text
set SchneierDate to todayMonth & " " & todayNumber & "," & " " & todayYear
--April 29, 2011 Format
do shell script "curl http://www.schneier.com/"
set SchneierHTML to result

--Get Number Of Current Blog Posts
set AppleScript's text item delimiters to SchneierDate

try
	repeat with i from 1 to 10
		set CountListing to text item i of SchneierHTML
	end repeat
on error
	set ListingsCount to (i - 2)
end try

--Get Number Of Current Blog Posts

--Get URL
repeat with i from 2 to (ListingsCount + 1)
	set AppleScript's text item delimiters to "<h2 class=" & quote & "entry"
	set URLPart1 to text item i of SchneierHTML
	set AppleScript's text item delimiters to "a href=" & quote
	set URLPart2 to text item 2 of URLPart1
	set AppleScript's text item delimiters to quote & ">"
	set URLPart3 to text item 1 of URLPart2
	--Get URL
	
	--Download HTML
	set the FileDestination to "Macintosh HD:Users:YOURUSERNAME:Desktop:Schneier " & (i - 1) & " " & SchneierDate & ".html"
	
	tell application "URL Access Scripting"
		try
			download URLPart3 to file FileDestination replacing yes
			
		end try
	end tell
end repeat
--Download HTML

StefanK · May 1, 2011, 2:39pm

Hi,

the repeat loop to get the number of current posts is not needed; counting the text items gives the same number.


tell (current date) to set SchneierDate to (its month as text) & " " & its day & "," & " " & its year
--April 29, 2011 Format
set SchneierHTML to do shell script "curl http://www.schneier.com/"

--Get Number Of Current Blog Posts
set {TID, text item delimiters} to {text item delimiters, SchneierDate}

set currentBlogPosts to text items of SchneierHTML
set numberOfCurrentBlogPosts to count currentBlogPosts
if numberOfCurrentBlogPosts > 1 then
	--Get URL
	repeat with i from 2 to numberOfCurrentBlogPosts
		set text item delimiters to "<h2 class=" & quote & "entry"
		set URLPart1 to text item i of SchneierHTML
		set text item delimiters to "a href=" & quote
		set URLPart2 to text item 2 of URLPart1
		set text item delimiters to quote & ">"
		set URLPart3 to text item 1 of URLPart2
		--Get URL
		
		--Download HTML
		set the FileDestination to (path to desktop as text) & "Schneier " & (i - 1) & " " & SchneierDate & ".html"
		
		tell application "URL Access Scripting"
			try
				download URLPart3 to file FileDestination replacing yes
			end try
		end tell
	end repeat
end if
--Download HTML
set text item delimiters to TID
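
As a minimal sketch of why counting the text items can replace the repeat loop: splitting text on a delimiter always yields one more piece than the delimiter has occurrences. The sample text and the variable names below are made up for illustration and are not taken from the blog page.

```applescript
-- Hypothetical sample standing in for the downloaded HTML (assumption, not real page content)
set sampleHTML to "intro April 29, 2011 first post April 29, 2011 second post"

-- Save the current delimiters, then split on the date string
set TID to AppleScript's text item delimiters
set AppleScript's text item delimiters to "April 29, 2011"
set pieces to text items of sampleHTML

-- Restore the delimiters before doing anything else
set AppleScript's text item delimiters to TID

-- The date occurs twice, so there are three pieces and postCount is 2
set postCount to (count pieces) - 1
```

If the date does not occur at all, the split returns a single piece and the count is 0, which is the case the `if numberOfCurrentBlogPosts > 1` test above guards against.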

Hans-Gerd_Classen · September 17, 2011, 1:11pm

Hello Stefan,

I just upgraded to Lion …

Do you know of an equivalent to “URL Access Scripting” in Lion?
I used it for downloading picture files from the web … it’s been so easy!!

Hans-Gerd Claßen

StefanK · September 17, 2011, 1:44pm

Grüezi Hans-Gerd,

cURL is as easy as URL Access Scripting.

In the example above, replace


tell application "URL Access Scripting"
	try
		download URLPart3 to file FileDestination replacing yes
	end try
end tell

with


try
	do shell script "curl -o " & quoted form of POSIX path of FileDestination & " " & quoted form of URLPart3
end try
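
The same curl call can also be wrapped in a small handler that reports whether the download worked. This is only a sketch: the handler name `downloadWithCurl` and the test file name are hypothetical, and the extra curl options (`-L` to follow redirects, `-f` to treat HTTP errors as failures, `-s` to silence the progress meter) are an addition that the line above does not use.

```applescript
-- Hypothetical helper, not part of the scripts in this thread.
-- theURL is text; destHFSPath is an HFS path as text, e.g. (path to desktop as text) & "page.html"
on downloadWithCurl(theURL, destHFSPath)
	try
		-- do shell script raises an error when curl exits with a non-zero status
		do shell script "curl -L -f -s -o " & quoted form of POSIX path of destHFSPath & " " & quoted form of theURL
		return true
	on error
		return false
	end try
end downloadWithCurl

-- Example call with a placeholder file name
set downloadOK to downloadWithCurl("http://www.schneier.com/", (path to desktop as text) & "Schneier test.html")
```

Returning true or false instead of silently swallowing the error lets the calling loop decide whether to skip a post or stop.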

Hans-Gerd_Classen · September 17, 2011, 1:58pm

Great

Thx Stefan!