Parsing data from a specific website

Hi all,

My HR department is using an online tool to process our requests.

So I would like to help them with speeding up their searches there.

That tool (a webpage) contains a lot of tables, but the Cmd+F key combination does not work in any browser, so we cannot search for specific text strings.

Thus we have to manually search the entries there.

We have a spreadsheet with a list of employee names that we would like to search for in the online tool.

So, is there any way to parse all the contents that appear on the screen with AppleScript?

I have already tried to parse the HTML code, but unfortunately it did not work. I also found a lot of JavaScript in the source code.
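
For reference, the lookup described above can be sketched once the page text is available by some means. This is only a minimal sketch: the names and the sample pageText below are made up for illustration, and in practice the names would come from the spreadsheet.

```applescript
-- Hypothetical sample data: replace with the names from the spreadsheet
-- and with the text actually read from the page.
set employeeNames to {"Jane Doe", "John Smith", "Maria Rossi"}
set pageText to "17 Jane Doe Finance" & linefeed & "42 Maria Rossi Legal"

set foundNames to {}
set missingNames to {}
repeat with nameRef in employeeNames
	set thisName to contents of nameRef
	-- "contains" ignores case by default in AppleScript
	if pageText contains thisName then
		set end of foundNames to thisName
	else
		set end of missingNames to thisName
	end if
end repeat

return {foundNames, missingNames}
```

With the sample data, the script returns the two names that occur in the text in the first list and "John Smith" in the second.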

Do you mean like this?

tell application "Safari"
	tell its window 1
		tell its tab 1
			set pageText to its text
		end tell
	end tell
end tell

Yes, exactly.

I have tried this as well, but unfortunately it returns nothing except the header of the page.

Moreover, not even the Cmd+F key combination works.

According to the source code, the table is created via java queries, but nothing in it is selectable.

So we are not able to grab any data. There is only a search field on the top of the column (like the filters in Excel) where we can type the query we want.
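
If the table is built by scripts after the page loads, the downloaded source will not contain it, but the rendered page might. A possible approach, sketched below, is to ask Safari for the rendered text with do JavaScript. It assumes "Allow JavaScript from Apple Events" is enabled (in the Develop menu or the Developer settings, depending on the Safari version) and that the table is in the top-level document rather than inside an iframe; neither is confirmed for this portal.

```applescript
-- Assumption: "Allow JavaScript from Apple Events" is enabled in Safari.
-- Assumption: the table is part of the top-level document (no iframe).
tell application "Safari"
	-- innerText returns the text as rendered, including script-generated content
	set pageText to do JavaScript "document.body.innerText" in current tab of window 1
end tell

return pageText
```

If the result is still only the page header, the table may live in a frame or be drawn as an image or canvas, in which case this will not help.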

Without an example of the HTML text of the page we won’t be able to diagnose this.

Also, is the returned table a picture? That would explain not being able to select the text.

Try this:


use framework "Foundation"

tell application "Safari"
	tell current tab of window 1
		set URLStr to URL
	end tell
end tell

set fileText to getUrlSource(URLStr)

on getUrlSource(URLStr)
	set theURL to current application's class "NSURL"'s URLWithString:URLStr
	set theData to current application's NSData's dataWithContentsOfURL:theURL
	set theString to current application's NSString's alloc()'s initWithData:theData encoding:(current application's NSUTF8StringEncoding)
	set theString to theString as text
	return theString
end getUrlSource

If that doesn’t work, there are a few other tricks we can try.

I had tried this as well.

This is what I get:

You know, I thought this looked familiar. This is a script I wrote about a year ago and use every day from the Scripts menu:

Open your page in Safari then run this script. If you don’t have BBEdit then change that to TextEdit.

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

 
tell application "Safari"
	tell current tab of window 1
		set URLStr to URL
		set fileName to the name
	end tell
end tell

set fileName to SlugifyText(fileName)
set fileName to fileName & ".html"
set fileText to getUrlSource(URLStr)
set newFile to ((path to desktop) as text) & fileName

set textWritten to my WriteToFile(newFile, fileText)

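tell application "Safari"
	-- Open the saved local copy, then read the rendered text of the front tab
	open file newFile
	delay 1 -- give the local page a moment to load; adjust as needed
	tell current tab of window 1
		set pageText to its text
	end tell
end tell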

tell application "BBEdit"
	make new window at beginning
	set text of window 1 to pageText
	activate
end tell

on getUrlSource(URLStr)
	set theURL to current application's class "NSURL"'s URLWithString:URLStr
	set theData to current application's NSData's dataWithContentsOfURL:theURL
	set theString to current application's NSString's alloc()'s initWithData:theData encoding:(current application's NSUTF8StringEncoding)
	set theString to theString as text
	return theString
end getUrlSource

on WriteToFile(myFile, dataToWrite)
	--(alias, text; list; record, etc.)
	try
		set openFile to open for access myFile with write permission
	on error errMsg number errNum
		try
			close access myFile
			set openFile to open for access myFile with write permission
		on error errMsg number errNum
			return {"Error:", errMsg, errNum}
		end try
	end try
	set eof of openFile to 1
	write dataToWrite to openFile
	close access openFile
	return myFile
end WriteToFile

on ReplaceAllInText(findString, replaceString, textToFix)
	
	set saveTID to AppleScript's text item delimiters
	repeat
		set AppleScript's text item delimiters to findString as list
		set textToFix to every text item of textToFix
		if (count of textToFix) = 1 then
			set textToFix to textToFix as text
			exit repeat
		end if
		set AppleScript's text item delimiters to {replaceString}
		set textToFix to textToFix as text
		if replaceString is in {findString} then exit repeat -- exits after one pass to avoid infinite loop
	end repeat
	set AppleScript's text item delimiters to saveTID
	return textToFix as text
end ReplaceAllInText

on SlugifyText(textToSlugify)
	set newText to {}
	set textToSlugify to ReplaceAllInText({"'"}, {""}, textToSlugify)
	set textToSlugify to ReplaceAllInText({"."}, {""}, textToSlugify)
	set textToSlugify to ReplaceAllInText({","}, {"-"}, textToSlugify)
	set textToSlugify to ReplaceAllInText({"!"}, {""}, textToSlugify)
	set textToSlugify to ReplaceAllInText({"?"}, {""}, textToSlugify)
	
	set saveTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {""}
	set textToSlugify to text items of textToSlugify
	repeat with thisitem in textToSlugify
		if (thisitem as text) is in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890-_" then
			set the end of newText to thisitem as text
		else
			set the end of newText to "-"
		end if
	end repeat
	if newText is not {} then
		set newText to newText as text
		
		set newText to my ReplaceAllInText("_", "-", newText)
		set newText to my ReplaceAllInText("--", "-", newText)
	else
		set newText to "slug"
	end if
	set AppleScript's text item delimiters to saveTID
	return newText
end SlugifyText


FYI, here’s a page I use that script on, where you cannot select the text, but it is text, not an image:

https://tv.apple.com/us/show/prehistoric-planet/umc.cmc.4lh4bmztauvkooqz400akxav

Unfortunately, in BBEdit only one line is created, with the text “index.html”.

When the local page opens does the table display?

No, nothing is loading.

It is just a plain HTML file with only a link, “index.html”, which does not do anything.

What happens if you save it to the reading list?

It behaves exactly like a normal bookmark.

Nothing is saved locally and when clicked, the portal loads.

So, when you say “when clicked, the portal loads”, you mean you clicked a link, which opens other content.

Where? On the new webpage in the browser, or on your Mac server location? Or, it opens in some special installed application? It is very unclear.

Thank you for your interest and sorry for not being clear.

So when I add the portal to the reading list, it is actually saved as a plain bookmark.

Nothing is saved locally, no content at all, only the web address, which loads in the Safari tab when clicked.

So you are trying to parse a webpage that doesn’t contain any table, only one link, “index.html”, which does only one thing:

When clicked, it creates a new bookmark in “Bookmarks.plist” in the Safari folder of the Library folder of your Home directory.

You can:

  1. Click the “index.html” link somehow (manually or using do JavaScript) to save this bookmark in “Bookmarks.plist”.
  2. Read the URLString key of this bookmark from “Bookmarks.plist” into the variable theURL.
  3. Open this URL using

open location theURL
  4. Now parse the text content of the newly loaded webpage, which should contain the table contents.
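
The last two steps above could be sketched roughly as follows. This is only an illustration: theURL is a hypothetical placeholder, the script assumes "Allow JavaScript from Apple Events" is enabled in Safari, and it assumes the newly opened page ends up in the current tab of the front window.

```applescript
set theURL to "https://example.com/portal" -- hypothetical placeholder URL

tell application "Safari"
	activate
	open location theURL
	-- Poll for up to about 30 seconds until the document reports it has loaded
	repeat 60 times
		delay 0.5
		try
			set loadState to do JavaScript "document.readyState" in current tab of window 1
			if loadState is "complete" then exit repeat
		end try
	end repeat
	-- Read the rendered text, including anything scripts have added so far
	set pageText to do JavaScript "document.body.innerText" in current tab of window 1
end tell

return pageText
```

Note that a readyState of "complete" does not guarantee that a script-generated table has finished filling in, so an extra delay may still be needed.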

I think you can simply open the .webloc location file, which you call the “saved address”, to open the webpage that contains the table(s).

Simple example, which opens the .webloc location file on my Mac:


tell application "Finder" to open file "Apple HD:Users:123:Desktop:MacScripter.webloc"

Create an NSXMLDocument from the URL.
Get the NSXMLNodes you want by performing an XPath query on the document.
Go through each of the nodes and get the data from those nodes.

http://preserve.mactech.com/articles/mactech/Vol.21/21.06/XMLParser/index.html
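
A minimal AppleScriptObjC sketch of that NSXMLDocument and XPath approach is shown below. The HTML string is a made-up sample standing in for the real page, and the "//td" query assumes the data sits in ordinary table cells, which has not been confirmed for the portal in question.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample markup; the real page source would go here
set htmlText to "<html><body><table><tr><td>Jane Doe</td><td>Finance</td></tr><tr><td>John Smith</td><td>Legal</td></tr></table></body></html>"

-- Build the document; the TidyHTML option lets NSXMLDocument accept HTML that is not strict XML
set {theDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:htmlText options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
if theDoc is missing value then error (theError's localizedDescription() as text)

-- Run the XPath query to collect every table cell node
set {theNodes, theError} to theDoc's nodesForXPath:"//td" |error|:(reference)
if theNodes is missing value then error (theError's localizedDescription() as text)

-- Pull the text out of each node
set cellTexts to (theNodes's valueForKey:"stringValue") as list
return cellTexts
```

This only finds cells that are present in the markup it is given, so for a table generated by scripts the markup would have to come from the rendered page rather than from the downloaded source.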

Interesting stuff!

I have to dig more into it.

Thank you all, though, so far for your replies.