Download multiple files from a page one by one

dreadnought · March 27, 2009, 6:55pm

Hi,

I’m a switcher, so I’m quite new to AppleScript. I’m familiar with Java, C and C++, but a complete newbie with AppleScript. I’d like to automate downloads from a file-sharing portal using its free option, which allows only one download at a time (I can’t download multiple files simultaneously). I have a page that contains links to the download pages. Each link has to be transformed by text replacement, e.g. “http://path.to.download.server.com/123456/download_file.zip.html” → “http://dl.server.com/0/123456/download_file.zip”, so I have to transform the URLs first and then download them one by one into /Users/me/Downloads/myDir.

I would really appreciate it if you could show me a complete script (ideally with comments, so I can learn from it and not just use it).

Thanks in advance!

Best regards,

Dreadnought

StefanK · March 27, 2009, 7:41pm

Hi,

I couldn’t test anything, because I can’t simulate the page with the links, but try this.
It loads the page in Safari, retrieves the links using JavaScript, transforms them and downloads the files one by one.
The property lines at the top contain the URL portions to find and replace.


property site_url : "http://path.to.download.server.com/123456/" -- page containing the links
property findText : "path.to.download.server.com" -- portion of the path to replace
property replaceText : "dl.server.com/0" -- replace text

set downloadFolder to POSIX path of (path to downloads folder) & "myDir/" -- destination folder (the folder myDir must already exist inside your Downloads folder)

tell application "Safari"
	activate
	open location site_url -- open page
end tell
-- wait until page loaded
if page_loaded(20) is false then return
-- get number of links
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
repeat with i from 0 to (num_links - 1)
	-- get the target URL (href) of each link; .text would return only the visible link text
	tell application "Safari" to set this_link to do JavaScript "document.links[" & i & "].href" in document 1
	-- if the link begins with the URL site_url download the file (cut also the extension .html)
	if this_link starts with site_url then downloadLink(text 1 thru -6 of this_link)
end repeat

-- handler to transform one link and download the file it points to
on downloadLink(theLink)
	global downloadFolder -- set at the top level of the script; handlers cannot see it without this declaration
	-- replace findText ...
	set {TID, text item delimiters} to {text item delimiters, findText}
	set theLink to text items of theLink
	-- ... with replaceText
	set text item delimiters to replaceText
	set theSource to theLink as text
	set text item delimiters to "/"
	-- extract file name
	set fileName to last text item of theSource
	set text item delimiters to TID
	-- use cURL to download the file
	do shell script "curl -o " & quoted form of (downloadFolder & fileName) & space & quoted form of theSource
end downloadLink

-- handler to wait until the page is loaded completely
on page_loaded(timeout_value)
	delay 2
	repeat with i from 1 to the timeout_value
		tell application "Safari"
			if (do JavaScript "document.readyState" in document 1) is "complete" then
				return true
			else if i is the timeout_value then
				return false
			else
				delay 1
			end if
		end tell
	end repeat
	return false
end page_loaded
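
To try the text-replacement step on its own, without Safari or a real download, something like the following may help. It is only a sketch: the handler names swapText and lastPathComponent are made up for this example and are not part of the script above, and the sample link is the one from the question.

```applescript
-- Hypothetical helper (not part of the script above): replace every
-- occurrence of searchString in sourceText with replacementString.
on swapText(sourceText, searchString, replacementString)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchString
	set theParts to text items of sourceText
	set AppleScript's text item delimiters to replacementString
	set resultText to theParts as text
	-- always put the delimiters back, so later code is not affected
	set AppleScript's text item delimiters to savedDelimiters
	return resultText
end swapText

-- Hypothetical helper: return the part of a URL after its last "/".
on lastPathComponent(theURL)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "/"
	set theName to last text item of theURL
	set AppleScript's text item delimiters to savedDelimiters
	return theName
end lastPathComponent

-- example link taken from the question at the top of the thread
set pageLink to "http://path.to.download.server.com/123456/download_file.zip.html"
-- drop the trailing ".html" (5 characters), then swap the host portion
set fileURL to swapText(text 1 thru -6 of pageLink, "path.to.download.server.com", "dl.server.com/0")
-- fileURL is now "http://dl.server.com/0/123456/download_file.zip"
set fileName to lastPathComponent(fileURL)
-- fileName is now "download_file.zip"
```

Both helpers restore the previous text item delimiters before returning, so the setting does not leak into the rest of a script.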

dreadnought · March 31, 2009, 2:36pm

Thank you!

I will try it out as soon as my tiny daughter gives me some spare time after work.

Best regards,

dreadnought

dreadnought · May 2, 2009, 5:56pm

Dear StefanK,

It almost works. I modified it because what I actually need is to navigate to each replaced link and then wait for the download to finish; since the bandwidth is fixed, 1800 seconds is enough. The first file downloads successfully, but after that the script cannot find the this_link variable.

Can you help me to solve it?

My modified script is:

property site_url : "file:///Users/me/Documents/mylinks.html" -- page containing the links
property findText : "http://server/get/" -- portion of the path to replace
property replaceText : "http://dl.server/get/0/" -- replace text

set downloadFolder to "/Users/me/Downloads/" -- destination folder (must exist)

tell application "Safari"
	activate
	open location site_url -- open page
end tell
-- wait until page loaded
if page_loaded(20) is false then return
-- get number of links
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
repeat with i from 0 to (num_links - 1)
	-- get each link
	tell application "Safari" to set this_link to (do JavaScript "document.links[" & i & "].text" in document 1)
	set {TID, text item delimiters} to {text item delimiters, findText}
	set theLink to text items of this_link
	-- join the pieces again with replaceText
	set text item delimiters to replaceText
	set theSource to theLink as text
	set text item delimiters to "/"
	
	--tell application "Safari" to do JavaScript "window.alert('" & text 1 thru -6 of theSource & "')" in document 1
	-- if the link begins with the URL site_url download the file (cut also the extension .html)
	-- if this_link starts with site_url then 
	tell application "Safari" to do JavaScript "window.open('" & text 1 thru -6 of theSource & "')" in document 1
	delay 1800
end repeat

-- handler to wait until the page is loaded completely
on page_loaded(timeout_value)
	delay 2
	repeat with i from 1 to the timeout_value
		tell application "Safari"
			if (do JavaScript "document.readyState" in document 1) is "complete" then
				return true
			else if i is the timeout_value then
				return false
			else
				delay 1
			end if
		end tell
	end repeat
	return false
end page_loaded

Best regards,

Dreadnought