Automating Downloads From Embedded Links

I need some help writing an AppleScript that will automatically download the PDF behind every “View PDF” button on this Chrome page:

https://www.lakewoodnj.gov/agendas

Essentially, the problem is that these links are not direct downloads where I can just right-click and save the PDF files. Instead they open the native Chrome viewer, which means I have to manually click every “View PDF” button in order to download the associated file.

I can temporarily set the Chrome preference to download the files instead of displaying the viewer first, but even then it still takes multiple steps each time to download a single PDF, multiplied by hundreds of buttons on dozens of pages from this site.

I am looking to automate the process so that it cycles through every instance of these “View PDF” buttons and downloads each file without any further input from me, essentially simulating clicking the first “View PDF” button, performing the download, closing the window, then moving to the next “View PDF” button on the page and repeating the process until all the downloads are completed for that page.

You can grab the source from that URL and find the relative URLs of the PDFs.

e.g.
data-src="/images/db/u-1-2711-6-18-24-Public-Hearing.pdf"

But you need the root of that URL to download directly. If you open one of the PDFs, you can grab the page source again and find the root.

e.g.
"https://www.lakewoodnj.gov/images/db/"

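Now you can construct the full URLs for the PDFs. Since each data-src value already begins with /images/db/, only the site address (https://www.lakewoodnj.gov) has to go in front of it.

e.g. "https://www.lakewoodnj.gov/images/db/u-1-2711-6-18-24-Public-Hearing.pdf"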
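
For anyone who wants to try the idea outside AppleScript first, here is a rough Python sketch of the steps above: pull every data-src value ending in .pdf out of the page source and put the site address in front of it. The function name extract_pdf_urls, the SITE_ROOT constant, and the sample markup are my own assumptions for illustration; only the data-src value itself comes from the page.

```python
import re
from urllib.parse import urljoin

# Assumption: the site address used to resolve the relative data-src paths.
SITE_ROOT = "https://www.lakewoodnj.gov"


def extract_pdf_urls(html, root=SITE_ROOT):
    """Return a full URL for every data-src attribute that points at a PDF."""
    # [^"]* stops at the closing quote, so each attribute is matched on its own
    # even when several of them sit on the same line of source.
    paths = re.findall(r'data-src="([^"]*\.pdf)"', html, flags=re.IGNORECASE)
    return [urljoin(root, path) for path in paths]


# Hypothetical snippet of page source; the real markup around data-src may differ.
sample = '<a class="btn" data-src="/images/db/u-1-2711-6-18-24-Public-Hearing.pdf">View PDF</a>'
print(extract_pdf_urls(sample))
# ['https://www.lakewoodnj.gov/images/db/u-1-2711-6-18-24-Public-Hearing.pdf']
```

In real use the html argument would be the page source fetched from the agendas URL, for example the text returned by curl.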

HTH


I was curious how the approach suggested by Paul would work and roughed out a script that opens all of the PDFs in Safari. This worked well on my Sonoma computer. Pavilion would want to edit the script to save each PDF individually instead of opening it, though.

use framework "Foundation"
use scripting additions

set theURL to "https://www.lakewoodnj.gov/agendas"
set urlOne to "https://www.lakewoodnj.gov"
set sourceCode to (do shell script "curl -s " & quoted form of theURL) -- do shell script waits for curl to finish, so no delay is needed
set pdfURLs to getMatchingStrings(sourceCode)

set urlList to {}
repeat with aURL in pdfURLs
	set urlTwo to (aURL's stringByReplacingOccurrencesOfString:"data-src=\"" withString:"") as text
	set end of urlList to urlOne & (urlTwo as text)
end repeat

display dialog "A total of " & (count urlList) & " PDFs were found and will be opened in Safari"
tell application "Safari"
	activate
	tell window 1
		repeat with aURL in urlList
			set current tab to (make new tab with properties {URL:aURL})
		end repeat
	end tell
end tell

on getMatchingStrings(theString)
	set theString to current application's NSString's stringWithString:theString
	set thePattern to "data-src=\"[^\"]*\\.pdf" -- stop at the closing quote so two PDFs on one line are not merged into one match
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range")
	set theMatches to current application's NSMutableArray's new()
	repeat with aRange in theRanges
		(theMatches's addObject:(theString's substringWithRange:aRange))
	end repeat
	return theMatches
end getMatchingStrings

Here’s my version which doesn’t use a web browser.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

on run
	local mySource, aFile, tid
	set mySource to do shell script "curl https://www.lakewoodnj.gov/agendas"
	set tid to text item delimiters
	set text item delimiters to {"data-src=\""}
	set mySource to text items of mySource
	set text item delimiters to {"\""}
	repeat with aFile in mySource
		set aFile to contents of aFile
		if (text item 1 of aFile) ends with ".pdf" then -- test only the data-src value, not the markup that follows it
			do shell script "cd ~ ; curl -O " & quoted form of ("https://www.lakewoodnj.gov" & text item 1 of aFile)
		end if
	end repeat
	set text item delimiters to tid
end run

The “~” will make the script save to the user’s home directory. You can change this to a hard-coded POSIX path if you wish.
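
If you would rather pick the destination folder in one place, the same save step can be sketched in Python. This is only a rough equivalent under my own assumptions: the helper names local_path_for and download_all and the ~/Downloads/agendas folder are hypothetical, and the file name is simply taken from the last part of the URL, much as curl -O does.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def local_path_for(url, dest_dir):
    """Build the save path: dest_dir plus the last component of the URL path."""
    name = os.path.basename(urlsplit(url).path)
    return os.path.join(os.path.expanduser(dest_dir), name)


def download_all(urls, dest_dir="~/Downloads/agendas"):
    """Download each URL into dest_dir, skipping files that are already there."""
    target = os.path.expanduser(dest_dir)
    os.makedirs(target, exist_ok=True)  # hypothetical folder, created on demand
    for url in urls:
        path = local_path_for(url, target)
        if not os.path.exists(path):
            urlretrieve(url, path)
```

Calling download_all with the list of full PDF URLs would then fetch them one at a time; skipping existing files means the script can be re-run after an interruption without downloading everything again.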

This line (set aFile to contents of aFile) isn’t necessary but can be surprisingly costly depending on what’s getting dereferenced.

I’ve always found the opposite to be true.
When you use a referenced variable, it gets dereferenced temporarily each time the variable is used.
That line dereferences it permanently, for the lifetime of its contents.