Search MacScripter topics for indicated app, save as webarchives

The following script finds and saves all MacScripter topics related to an application specified by the user.

It works, but I have a suspicion that it can be improved.

I would be interested to hear practical suggestions for improving the script from users of our site.

For example, I could not figure out how to determine the number of pages returned by a search (1,2,3,4 … NEXT). I solved this problem clumsily by simply loading each next possible page blindly and checking that its links hadn’t been found before.

Other suggestions are welcome too.


use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"
use framework "AppKit"
use framework "WebKit"
property thePath : missing value
property login : "KniazidisR" -- set your login name here
property |password| : "myPassword" -- set your password here
property searchTopicsForApplication : "BBEdit" -- set the application name here



-- make folders (if needed)
set desktopFolderHFS to (path to desktop folder) as text
tell application "Finder" to if not (exists folder (desktopFolderHFS & "MacScripter Topics:")) then make new folder at folder desktopFolderHFS with properties {name:"MacScripter Topics"}
set macscripterFolder to alias (desktopFolderHFS & "MacScripter Topics:")
set destinationFolderHFS to (macscripterFolder as text) & searchTopicsForApplication
tell application "Finder" to if not (exists folder destinationFolderHFS) then make new folder at macscripterFolder with properties {name:searchTopicsForApplication}
set destinationFolderPath to POSIX path of (alias destinationFolderHFS)

-- we will search for the following expression
set searchExpression to "tell application \"" & searchTopicsForApplication & "\""

-- login to MacScripter
my loginToMacScripter(login, |password|)
my waitSafariWebPageLoading(10) -- wait maximum 10 seconds

-- search MacScripter topics for the indicated search expression (by keywords)
set searchExpression to my encodeSearchExpression(searchExpression)

-- get URL indexes of topics from every page
set URLIndexes to my getURLIndexes(searchExpression)

-- save the topics found for the indicated application as webarchives
my saveTopicsAsWebarchive(destinationFolderPath, URLIndexes)




------------------------------------------- HANDLERS --------------------------------------------

on archivePage:thePageURL toPath:aPath
	set my thePath to aPath -- store path for use later
	my performSelectorOnMainThread:"setUpWebViewForPage:" withObject:thePageURL waitUntilDone:true
end archivePage:toPath:

on setUpWebViewForPage:thePageURL
	-- needs to be done on the main thread
	-- make a WebView
	set theView to current application's WebView's alloc()'s initWithFrame:{origin:{x:0, y:0}, |size|:{width:100, height:100}}
	-- tell it to call delegate methods on me
	theView's setFrameLoadDelegate:me
	-- load the page
	theView's setMainFrameURL:thePageURL
end setUpWebViewForPage:

-- called when the WebView loads a frame
on WebView:aWebView didFinishLoadForFrame:webFrame
	-- the main frame is our interest
	if webFrame = aWebView's mainFrame() then
		-- get the data and write it to file
		set theArchiveData to webFrame's dataSource()'s webArchive()'s |data|()
		theArchiveData's writeToFile:thePath atomically:true
		-- display notification "The webarchive was saved"
	end if
end WebView:didFinishLoadForFrame:

on WebView:aWebView didFailLoadWithError:theError forFrame:webFrame
	-- got an error, bail
	aWebView's stopLoading:me
	display notification "The webarchive was not saved"
end WebView:didFailLoadWithError:forFrame:

on stringOffsetBeginOrEnd:theString withOffset:theChar withOption:theOption
	set aString to theString
	set x to the offset of theChar in aString
	if (theOption = "begin") then
		return (text from x to 1 of aString)
	else if (theOption = "end") then
		return (text from (x + 2) to -1 of aString)
	else
		error number -128
	end if
end stringOffsetBeginOrEnd:withOffset:withOption:

on saveTopicsAsWebarchive(destinationFolderPath, URLIndexes)
	repeat with URLIndex in URLIndexes
		-- open location
		tell application "Safari" to make new document with properties ¬
			{URL:("https://www.macscripter.net/viewtopic.php?id=" & URLIndex)}
		set isLoaded to my waitSafariWebPageLoading(10) -- check page full loading
		if isLoaded then
			if document "Info / MacScripter" of application "Safari" exists then
				-- badTopic URL : "The link you followed is incorrect or outdated."
				close document "Info / MacScripter" of application "Safari" saving no
			else
				-- Tell Safari to get the name and URL of document 1
				tell application "Safari" to tell document 1 to set {theName, theURL} to {name, URL}
				-- Correct the name string.
				set theName to (my stringOffsetBeginOrEnd:theName withOffset:"/" withOption:"end")
				-- Output path of the file.
				set thePath to destinationFolderPath & URLIndex & ". " & theName & " .webarchive"
				-- Run the handler.
				(my archivePage:theURL toPath:thePath)
				-- The following 2 lines are optional. Remove them to speed up the process
				display notification "Webpage saved"
				delay 7 -- this gives the notification time to switch
			end if
			-- close current document
			tell application "Safari" to close documents saving no
		end if
	end repeat
end saveTopicsAsWebarchive

on getURLIndexes(searchExpression)
	set URLIndexes to {}
	repeat with pageNumber from 1 to 1000
		set searchURL to "https://macscripter.net/search.php?action=search&keywords=" & searchExpression & "&author=&forum=-1&sort_by=5&sort_dir=DESC&show_as=topics&search=Submit&p=" & pageNumber
		tell application "Safari" to open location (searchURL)
		my waitSafariWebPageLoading(10) -- wait maximum 10 seconds
		set pageURLIndexes to my getPageURLIndexes()
		if URLIndexes contains pageURLIndexes then exit repeat
		set URLIndexes to URLIndexes & pageURLIndexes
		tell application "Safari" to close front window
	end repeat
	return URLIndexes
end getURLIndexes

on getPageURLIndexes()
	set pageURLIndexes to {}
	set i to -1
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {"viewtopic.php?id=", quote}}
	repeat
		try
			set i to i + 1
			tell application "Safari" to set end of pageURLIndexes to text item 3 of (do JavaScript "document.getElementsByClassName('tclcon')[" & i & "].innerHTML" in document 1)
		on error
			exit repeat
		end try
	end repeat
	set AppleScript's text item delimiters to ATID
	return pageURLIndexes
end getPageURLIndexes

on loginToMacScripter(login, |password|)
	tell application "Safari"
		activate
		open location ("http://macscripter.net/login.php")
		my waitSafariWebPageLoading(10) -- wait maximum 10 seconds
		do JavaScript "login.req_username.value = \"" & login & "\"" in document 1
		do JavaScript "login.req_password.value = \"" & |password| & "\"" in document 1
		do JavaScript "login.login.click()" in document 1
	end tell
end loginToMacScripter

on encodeSearchExpression(searchExpression)
	set ATID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to space
	set searchExpression to text items of searchExpression
	set AppleScript's text item delimiters to "+"
	set searchExpression to searchExpression as text
	set AppleScript's text item delimiters to "\""
	set searchExpression to text items of searchExpression
	set AppleScript's text item delimiters to "%22"
	set searchExpression to searchExpression as text
	set AppleScript's text item delimiters to ATID
	return searchExpression
end encodeSearchExpression

on waitSafariWebPageLoading(loadingWaitMaximumSeconds as integer)
	set lineChangingChars to {linefeed, return, character id 11, character id 12, character id 133, character id 8232, character id 8233}
	set {ATID, htmlEnding} to {AppleScript's text item delimiters, ""}
	tell application "Safari"
		repeat 10 * loadingWaitMaximumSeconds times -- 10 checks per second, 0.1 s apart
			delay 0.1
			set AppleScript's text item delimiters to {"<", ">"}
			try
				copy text item -2 of (get source of front document) to htmlEnding
			end try
			set AppleScript's text item delimiters to lineChangingChars
			set htmlEnding to text items of htmlEnding
			set AppleScript's text item delimiters to ""
			set htmlEnding to "<" & htmlEnding & ">"
			if htmlEnding is "</html>" then exit repeat
		end repeat
	end tell
	set AppleScript's text item delimiters to ATID
	if htmlEnding is "</html>" then return true
	display notification "The webpage loading failed"
	return false
end waitSafariWebPageLoading
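
One possible refinement, offered only as a sketch: encodeSearchExpression replaces just spaces and double quotes, so an application name containing other reserved characters (for example "&") would produce a broken search URL. Since the script already loads Foundation, the encoding could be delegated to NSString. The handler name percentEncodeSearchExpression is hypothetical, and I am assuming the site's search accepts %20 in place of "+" for spaces.

```applescript
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

-- Hypothetical replacement for encodeSearchExpression.
-- Percent-encodes every character except ASCII letters and digits.
on percentEncodeSearchExpression(searchExpression)
	set theString to current application's NSString's stringWithString:searchExpression
	-- Non-ASCII members of the allowed set are ignored by Foundation,
	-- so accented characters are always percent-encoded.
	set allowedSet to current application's NSCharacterSet's alphanumericCharacterSet()
	set encoded to theString's stringByAddingPercentEncodingWithAllowedCharacters:allowedSet
	return encoded as text
end percentEncodeSearchExpression
```

With this handler, the expression tell application "BBEdit" becomes tell%20application%20%22BBEdit%22.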

Unfortunately, I cannot get your script to work for me, but I presume that this is due to my running Sierra.

On the matter of determining the number of result pages —assuming I understand correctly— could you not grab the collection of page links after the string "Pages:: " (all contained within: #punsearch > div.linkst > div > p.pagelink)?

You should end up with a list of URLs, each ending with the page number (no more than nine links). Then there would be a couple of ways to determine the last page, e.g. if the text of the last link is ‘NEXT’ then get the text of the second-to-last link, otherwise get the text of the last link; or throw the page numbers of all the links into a list and find the highest number.
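
The second approach (collect the page numbers and take the highest) might look like the sketch below. It assumes the search results page is open in Safari's front document and that the pager really lives under the selector quoted above, which I have not verified against the live site. The handler name getLastPageNumber is hypothetical.

```applescript
-- Hypothetical handler: returns the highest page number found in the pager
-- of the search results page shown in Safari's front document.
-- Returns 1 when no pager is found (assumed to mean a single page of results).
on getLastPageNumber()
	set js to "(function () {" & ¬
		"var el = document.querySelector('#punsearch > div.linkst > div > p.pagelink');" & ¬
		"if (!el) return 1;" & ¬
		"var nums = el.textContent.match(/[0-9]+/g) || [];" & ¬
		"var last = 1;" & ¬
		"for (var i = 0; i < nums.length; i++) { var n = parseInt(nums[i], 10); if (n > last) last = n; }" & ¬
		"return last; })();"
	tell application "Safari" to set lastPage to do JavaScript js in document 1
	return lastPage as integer
end getLastPageNumber
```

In getURLIndexes, this could be called once after the first results page has loaded, and the loop could then run from 1 to that number instead of from 1 to 1000. Reading the numbers from the pager text rather than from the links means the current page is counted even if it is not rendered as a link.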

Thank you for your interest.

But what exactly doesn’t work on Sierra?

I draw your attention to the fact that the script uses Safari, and you also need to specify your real account (name) and correct password in the script.

Interesting, but today it is working for me (I just tried it again with a different search and a result set of 9, and all pages were saved). FWIW, today when I ran it with the original search, it didn’t begin well: no pages were captured until the 33rd result. The window would open but remain blank and then close quickly, typically in under three seconds.

Perhaps it was affected by a vestige of the outage we have experienced around these parts the past few days. Service is still spotty. When I tried it yesterday, it would open windows and then close them; it never actually saved a page.

One issue is that the saved webarchives all have a space before the ‘.webarchive’.
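
The space comes from the " .webarchive" literal used when the output path is built in saveTopicsAsWebarchive. A small helper along the following lines could build the file name instead. This is only a sketch: the handler name makeArchiveFileName is hypothetical, and replacing "/" and ":" is my own precaution on the assumption that a topic title may contain characters that are unsafe in a path.

```applescript
-- Hypothetical helper: builds "<id>. <title>.webarchive" without the stray space.
on makeArchiveFileName(URLIndex, theName)
	set ATID to AppleScript's text item delimiters
	-- Split the title on characters that are unsafe in file paths...
	set AppleScript's text item delimiters to {"/", ":"}
	set nameParts to text items of theName
	-- ...and join the pieces back together with a harmless character.
	set AppleScript's text item delimiters to "-"
	set cleanName to nameParts as text
	set AppleScript's text item delimiters to ATID
	return (URLIndex as text) & ". " & cleanName & ".webarchive"
end makeArchiveFileName
```

The path line in the script would then read: set thePath to destinationFolderPath & my makeArchiveFileName(URLIndex, theName)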

KniazidisR, your script works quite well! I find it to be very useful for searching for MacScripter topics and archiving them.

How would you modify your script to search for topics within the last 3 years?
In my attempt to search for topics within the last 3 years, I added the text “after:2019”, using Google’s search syntax, to your searchExpression variable. This however resulted in failure.

Do you have any suggestions on a method to find topics limited to the past three years?

@akim,

I don’t feel like filtering by year. But I will show you the way to write working code yourself.

First, you must add a new filter (by year) to the saveTopicsAsWebarchive handler, which opens the topics one by one. You should start by finding the publication date of the current topic:


-- get the creation year of the MacScripter topic currently open in Safari
tell application "Safari"
	set topicCreationDate to do JavaScript "document.getElementsByClassName('blockpost rowodd firstpost')[0].getElementsByTagName('a')[0].innerHTML" in document 1
	set topicCreationYear to text 1 thru 4 of topicCreationDate
end tell

Then, as soon as the script reaches the first topic that is older than 3 years, you should exit the repeat loop (exit repeat command).

NOTE: you can run the snippet above as is on this current topic to get its creation year. :slight_smile:
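
The age check itself could be sketched as follows. The handler name isTopicTooOld is hypothetical. I am also assuming that the date text may sometimes start with a word rather than a year (a relative date such as "Today"), in which case the topic is treated as recent.

```applescript
-- Hypothetical helper: true when topicCreationYear (text such as "2019")
-- is more than maxAgeInYears before the current year.
on isTopicTooOld(topicCreationYear, maxAgeInYears)
	set currentYear to year of (current date)
	try
		set topicYear to topicCreationYear as integer
	on error
		-- Not a number (assumed: a relative date such as "Today"): treat as recent.
		return false
	end try
	return (currentYear - topicYear) > maxAgeInYears
end isTopicTooOld
```

Inside the repeat loop, the test would be: if my isTopicTooOld(topicCreationYear, 3) then exit repeat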

Thank you KniazidisR,
I liked your idea of using getElementsByClassName. I modified your idea a bit as follows:

on getPageURLIndexes()
	set pageURLIndexes to {}
	set Year3Prior to (current date) - 3 * 52 * weeks --3 years
	set ATID to AppleScript's text item delimiters
	set i to -1
	repeat
		try
			
			set i to i + 1
			tell application "Safari"
				
				set TopicIDJS to "document.getElementsByClassName('tclcon')[" & i & "].innerHTML"
				set TopicIDInnerHtml to (do JavaScript TopicIDJS in document 1)
				set j to i + 1 --assign j to i plus 1, as the index in the Last Post column of any row does not match the index of the Scripting Forums column
				set TopicDateLastPostedInnerTextJS to "document.getElementsByClassName('tcr')[" & j & "].innerText"
				set TopicDateLastPostedInnerText to (do JavaScript TopicDateLastPostedInnerTextJS in document 1)
				
			end tell
			
			set {y, m, d} to words 1 through 3 of TopicDateLastPostedInnerText
			set AppleScript's text item delimiters to "-"
			set TopicDateLastPosted to date ({m, d, y} as string)
			if TopicDateLastPosted < Year3Prior then -- exit repeat if TopicDateLastPosted is earlier than three years from current date
				exit repeat
			end if
			set AppleScript's text item delimiters to {"viewtopic.php?id=", quote}
			set TopicID to text item 3 of TopicIDInnerHtml
			set end of pageURLIndexes to TopicID
			
		on error
			exit repeat
		end try
	end repeat
	set AppleScript's text item delimiters to ATID
	return pageURLIndexes
end getPageURLIndexes
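
Two details in the handler above may be worth a second look: 3 * 52 * weeks is a few days short of three calendar years, and date ({m, d, y} as string) depends on the date format set in System Preferences. A sketch of helpers that avoid both follows. The handler names makeDate and dateYearsAgo are hypothetical, and the y, m, d values are assumed to be numeric text such as "2019", "03", "21".

```applescript
-- Hypothetical helper: builds a date from numeric year, month and day
-- without parsing a locale-dependent date string.
on makeDate(y, m, d)
	set theDate to current date
	set time of theDate to 0
	set day of theDate to 1 -- avoids month overflow while the fields are set
	set year of theDate to y as integer
	set month of theDate to m as integer
	set day of theDate to d as integer
	return theDate
end makeDate

-- Hypothetical helper: the current date moved back by n calendar years.
on dateYearsAgo(n)
	set theDate to current date
	set year of theDate to (year of theDate) - n
	return theDate
end dateYearsAgo
```

The comparison in the loop would then become: if (my makeDate(y, m, d)) < (my dateYearsAgo(3)) then exit repeat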

There’s an issue with this script that will cause it to fail if a user doesn’t have Safari as their default browser.

The issue is here:


on getURLIndexes(searchExpression)
   set URLIndexes to {}
   repeat with pageNumber from 1 to 1000
       set searchURL to "https://macscripter.net/search.php?action=search&keywords=" & searchExpression & "&author=&forum=-1&sort_by=5&sort_dir=DESC&show_as=topics&search=Submit&p=" & pageNumber
       open location (searchURL)
       my waitSafariWebPageLoading(10) -- wait maximum 10 seconds
       set pageURLIndexes to my getPageURLIndexes()
       if URLIndexes contains pageURLIndexes then exit repeat
       set URLIndexes to URLIndexes & pageURLIndexes
       tell application "Safari" to close front window
   end repeat
   return URLIndexes
end getURLIndexes

The “open location” command is not addressed to Safari, so it opens the default browser on the user’s Mac.

Easy enough fix. Just replace this line:

       open location (searchURL)

with this:

       tell application "Safari" to open location (searchURL)

When I tried to run this, Safari would open, then Google Chrome, which would open a bunch of tabs, and Safari would open a new window and close it immediately.

Nothing was saved. The fix above took care of it.

Now I need to figure out what to do with the hundreds of webarchives it produces. I’m thinking of a Spotlight search through them.
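
A Spotlight search restricted to the archive folder could be sketched with the mdfind command-line tool. The search term below is only an example, the folder is the one the script creates on the Desktop, and I am assuming Spotlight has indexed the contents of the saved webarchives on your machine.

```applescript
use scripting additions

-- Folder created by the script on the Desktop.
set archiveFolder to (POSIX path of (path to desktop folder)) & "MacScripter Topics"
-- Example query (an assumption): any text you expect inside the saved topics.
set searchTerm to "text item delimiters"
-- mdfind prints one matching path per line; no matches gives an empty list.
set foundPaths to paragraphs of (do shell script "mdfind -onlyin " & quoted form of archiveFolder & " " & quoted form of searchTerm)
```

foundPaths then holds the POSIX paths of the matching files, which could be opened or revealed in the Finder.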

Your remark is very helpful. I have updated the script due to its relevance. Thank you.