Friday, August 12, 2022

#1 2022-06-09 09:33:38 am

epaminos
Member
From: Greece
Registered: 2019-10-18
Posts: 94

Parsing data from specific website

Hi all,

My HR department is using an online tool to process our requests.

I would like to help them speed up their searches.

The tool (a web page) contains many tables, but the Cmd+F shortcut does not find text in them in any browser.

So we have to search the entries manually.

We have a spreadsheet with a list of employee names that we would like to look up in the online tool.

Is there any way to parse everything that appears on screen with AppleScript?

I have already tried parsing the HTML source, but unfortunately it didn't work. The source code contains a lot of JavaScript.


#2 2022-06-09 10:04:41 am

estockly
Member
Registered: 2009-01-03
Posts: 80

Re: Parsing data from specific website

Do you mean like this?

Applescript:

tell application "Safari"
   tell its window 1
       tell its tab 1
           set pageText to its text
       end tell
   end tell
end tell
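
Once the page text is available, the original goal (looking up the employee names from the spreadsheet) comes down to a containment test for each name. A minimal sketch: the names and the sample page text below are hypothetical placeholders, and in practice the list would come from the spreadsheet (for example, pasted in or read from an exported text file).

Applescript:

-- Hypothetical sample data: replace with the names from your spreadsheet
set namesToFind to {"Maria Papadopoulou", "John Smith"}
-- Hypothetical stand-in for the pageText returned by Safari in the script above
set pageText to "Name   Department" & linefeed & "Maria Papadopoulou   Finance"

set foundNames to my namesFoundInText(namesToFind, pageText)
-- result: {"Maria Papadopoulou"}

-- Returns the names from theNames that occur anywhere in theText.
-- "contains" ignores case by default.
on namesFoundInText(theNames, theText)
   set matches to {}
   repeat with aName in theNames
       set thisName to aName as text -- dereference the list item
       if theText contains thisName then set end of matches to thisName
   end repeat
   return matches
end namesFoundInText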

Last edited by estockly (2022-06-09 10:05:20 am)


#3 2022-06-09 10:49:31 am

epaminos
Member
From: Greece
Registered: 2019-10-18
Posts: 94

Re: Parsing data from specific website

Yes, exactly.

I have tried this as well, but unfortunately nothing is returned except the page header.

Moreover, Cmd+F doesn't work either.

According to the source code, the table is built with JavaScript (jQuery) queries, and nothing in it is selectable.

So we can't grab any data. The only option is a search field at the top of each column (like the filters in Excel) where we can type a query.
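
Because the table is assembled by JavaScript after the page loads, the HTML source will not contain it, and the tab's text property apparently doesn't pick it up either. One thing that may be worth trying (an untested suggestion, not a confirmed fix) is to have Safari evaluate JavaScript in the loaded tab with do JavaScript and return document.body.innerText, which reflects what is actually rendered, plus the text of any same-origin iframes. It assumes Develop > Allow JavaScript from Apple Events is turned on in Safari. If the portal draws its tables into a canvas or as images, or inside a frame served from another domain, this will still come back empty.

Applescript:

-- Assumes Develop > Allow JavaScript from Apple Events is enabled in Safari.
-- Same-origin iframes can be read; cross-origin ones throw and are skipped.
set js to "(function () {
  var parts = [document.body.innerText];
  var frames = document.querySelectorAll('iframe');
  for (var i = 0; i < frames.length; i++) {
    try { parts.push(frames[i].contentDocument.body.innerText); } catch (e) {}
  }
  return parts.join('\\n');
})();"

tell application "Safari"
   set renderedText to do JavaScript js in current tab of window 1
end tell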


#4 2022-06-09 10:52:19 am

robertfern
Member
Registered: 2011-11-29
Posts: 172

Re: Parsing data from specific website

Without an example of the page's HTML, we won't be able to diagnose this.

Also, is the table rendered as a picture? That would explain why you can't select the text.

Last edited by robertfern (2022-06-09 10:53:19 am)


#5 2022-06-09 12:11:06 pm

estockly
Member
Registered: 2009-01-03
Posts: 80

Re: Parsing data from specific website

Try this:

Applescript:


use framework "Foundation"

tell application "Safari"
   tell current tab of window 1
       set URLStr to URL
   end tell
end tell

set fileText to getUrlSource(URLStr)

on getUrlSource(URLStr)
   set theURL to current application's class "NSURL"'s URLWithString:URLStr
   set theData to current application's NSData's dataWithContentsOfURL:theURL
   set theString to current application's NSString's alloc()'s initWithData:theData encoding:(current application's NSUTF8StringEncoding)
   set theString to theString as text
   return theString
end getUrlSource

If that doesn't work, there are a few other tricks we can try.

Last edited by estockly (2022-06-09 12:30:59 pm)


#6 2022-06-09 12:41:50 pm

epaminos
Member
From: Greece
Registered: 2019-10-18
Posts: 94

Re: Parsing data from specific website

I had tried this as well.

This is what I get:

"<!DOCTYPE html>
<html xmlns:ng=\"\" xmlns:tb=\"\">
  <head>
    <meta charset=\"UTF-8\">
    <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">
    <meta name=\"viewport\" content=\"initial-scale=1, maximum-scale=2, width=device-width, height=device-height, viewport-fit=cover\">
    <meta name=\"format-detection\" content=\"telephone=no\">
  <meta name=\"vizportal-config\" data-buildId=\"2022_1_134_5k3d1n9tc\" data-staticAssetsUrlPrefix=\"\"><link href=\"vendors-vizportal.css?6d14c6ba7ab54bd\" rel=\"stylesheet\"><link href=\"vizportal.css?6d14c6ba7ab54bd\" rel=\"stylesheet\"><script type=\"text/javascript\" src=\"/javascripts/api/tableau-2.min.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"jquery.min.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"rsa.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"underscore-min.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"q.min.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"canvas-to-blob.min.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"js.cookie.min.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"mousetrap.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"core.min.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"vendors-vizportal.js?6d14c6ba7ab54bd\"></script><script type=\"text/javascript\" src=\"vizportal.js?6d14c6ba7ab54bd\"></script></head>
  <body class=\"tb-body\">
    <div class=\"tb-app\" id=\"app-root\" ng-app=\"VizPortalRun\" id=\"ng-app\" tb-window-resize>
    <tb:app></tb:app><a href=\"index.html\">index.html</a>
      <script type=\"text/ng-template\" id=\"inline_stackedElement.html\">
        <div class=\"tb-absolute\" tb-window-resize tb-left=\"left\" tb-top=\"top\" tb-right=\"right\" tb-bottom=\"bottom\" tb-visible=\"visible\" tb-overflow-y=\"overflowY\"></div>
      </script>
      <tb:stacked-elements></tb:stacked-elements>
    </div>
    <script src=\"add_status_button.js\"></script>
  </body>
</html>
"


#7 2022-06-09 02:03:39 pm

estockly
Member
Registered: 2009-01-03
Posts: 80

Re: Parsing data from specific website

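You know, I thought this looked familiar. This is a script I wrote about a year ago and use every day from the Scripts menu:

Open your page in Safari, then run this script. If you don't have BBEdit, change that part to TextEdit (in TextEdit, replace the make new window and set text lines with make new document with properties {text:pageText}).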

Applescript:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"


tell application "Safari"
   tell current tab of window 1
       set URLStr to URL
       set fileName to the name
   end tell
end tell

set fileName to SlugifyText(fileName)
set fileName to fileName & ".html"
set fileText to getUrlSource(URLStr)
set newFile to ((path to desktop) as text) & fileName

set textWritten to my WriteToFile(newFile, fileText)

tell application "Safari"
   open file newFile
   delay 2 -- give Safari a moment to load the local copy; adjust as needed
   tell current tab of window 1
       set pageText to its text
   end tell
end tell

tell application "BBEdit"
   make new window at beginning
   set text of window 1 to pageText
   activate
end tell
on getUrlSource(URLStr)
   set theURL to current application's class "NSURL"'s URLWithString:URLStr
   set theData to current application's NSData's dataWithContentsOfURL:theURL
   set theString to current application's NSString's alloc()'s initWithData:theData encoding:(current application's NSUTF8StringEncoding)
   set theString to theString as text
   return theString
end getUrlSource

on WriteToFile(myFile, dataToWrite)
   --(alias, text; list; record, etc.)
   try
       set openFile to open for access myFile with write permission
   on error errMsg number errNum
       try
           close access myFile
           set openFile to open for access myFile with write permission
       on error errMsg number errNum
           return {"Error:", errMsg, errNum}
       end try
   end try
   set eof of openFile to 0 -- truncate any previous contents of the file
   write dataToWrite to openFile as «class utf8» -- the page source is UTF-8 text
   close access openFile
   return myFile
end WriteToFile

on ReplaceAllInText(findString, replaceString, textToFix)
   
   set saveTID to AppleScript's text item delimiters
   repeat
       set AppleScript's text item delimiters to findString as list
       set textToFix to every text item of textToFix
       if (count of textToFix) = 1 then
           set textToFix to textToFix as text
           exit repeat
       end if
       set AppleScript's text item delimiters to {replaceString}
       set textToFix to textToFix as text
       if replaceString is in {findString} then exit repeat -- exits after one pass to avoid infinite loop
   end repeat
   set AppleScript's text item delimiters to saveTID
   return textToFix as text
end ReplaceAllInText

on SlugifyText(textToSlugify)
   set newText to {}
   set textToSlugify to ReplaceAllInText({"'"}, {""}, textToSlugify)
   set textToSlugify to ReplaceAllInText({"."}, {""}, textToSlugify)
   set textToSlugify to ReplaceAllInText({","}, {"-"}, textToSlugify)
   set textToSlugify to ReplaceAllInText({"!"}, {""}, textToSlugify)
   set textToSlugify to ReplaceAllInText({"?"}, {""}, textToSlugify)
   
   set saveTID to AppleScript's text item delimiters
   set AppleScript's text item delimiters to {""}
   set textToSlugify to text items of textToSlugify
   repeat with thisitem in textToSlugify
       if (thisitem as text) is in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890-_" then
           set the end of newText to thisitem as text
       else
           set the end of newText to "-"
       end if
   end repeat
   if newText is not {} then
       set newText to newText as text
       
       set newText to my ReplaceAllInText("_", "-", newText)
       set newText to my ReplaceAllInText("--", "-", newText)
   else
       set newText to "slug"
   end if
   set AppleScript's text item delimiters to saveTID
   return newText
end SlugifyText

Last edited by estockly (2022-06-09 02:18:20 pm)


#8 2022-06-09 03:11:15 pm

estockly
Member
Registered: 2009-01-03
Posts: 80

Re: Parsing data from specific website

robertfern wrote:

Also is the table returned a picture? Which would explain not be able to select the text


FYI, here's a page I use that script on, where you can't select the text, but it is text, not an image:

https://tv.apple.com/us/show/prehistori … qz400akxav

Last edited by estockly (2022-06-09 03:11:59 pm)


#9 2022-06-09 04:12:53 pm

epaminos
Member
From: Greece
Registered: 2019-10-18
Posts: 94

Re: Parsing data from specific website

Unfortunately, BBEdit shows only one line, with the text "index.html".


#10 2022-06-09 06:43:12 pm

estockly
Member
Registered: 2009-01-03
Posts: 80

Re: Parsing data from specific website

When the local page opens, does the table display?


#11 2022-06-10 12:11:55 am

epaminos
Member
From:: Greece
Registered: 2019-10-18
Posts: 94

Re: Parsing data from specific website

No, nothing is loading.

It's just a plain HTML file containing only an “index.html” link, which does nothing.


#12 2022-06-10 12:43:09 am

estockly
Member
Registered: 2009-01-03
Posts: 80

Re: Parsing data from specific website

What happens if you save it to the Reading List?


#13 2022-06-10 01:05:41 am

epaminos
Member
From: Greece
Registered: 2019-10-18
Posts: 94

Re: Parsing data from specific website

It behaves exactly like a normal bookmark.

Nothing is saved locally and when clicked, the portal loads.


#14 2022-06-12 04:54:22 am

KniazidisR
Member
From: Greece
Registered: 2019-03-03
Posts: 2447

Re: Parsing data from specific website

epaminos wrote:

It behaves exactly like a normal bookmark.

Nothing is saved locally and when clicked, the portal loads.


So, when you say that clicking this <normal bookmark> loads the portal, you have clicked a link that opens other content.

Where does it open? In a new web page in the browser, at a server location on your Mac, or in some special installed application? It is very unclear.

Last edited by KniazidisR (2022-06-12 04:57:18 am)


Model: MacBook Pro
OS X: Catalina 10.15.7
Web Browser: Safari 14.1
Ram: 4 GB


#15 2022-06-12 05:56:16 am

epaminos
Member
From: Greece
Registered: 2019-10-18
Posts: 94

Re: Parsing data from specific website

Thank you for your interest and sorry for not being clear.

When I add the portal to the Reading List, it is saved as a plain bookmark.

Nothing is saved locally: no content at all, only the web address, which loads in the Safari tab when clicked.

Offline

 

#16 2022-06-12 07:28:48 am

KniazidisR
Member
From: Greece
Registered: 2019-03-03
Posts: 2447

Re: Parsing data from specific website

So, you are trying to parse a web page that doesn't contain any table, only a single "index.html" link, which does only one thing:

When clicked, it creates a new bookmark in "Bookmarks.plist" in the ~/Library/Safari folder.

You can:

1) click the "index.html" link somehow (manually or using do JavaScript) to save this bookmark in "Bookmarks.plist".
2) read the URLString key of this bookmark from "Bookmarks.plist" into the variable theURL.
3) open this URL using

Applescript:


open location theURL

4) now parse the text content of the newly loaded web page, which actually contains the table contents.

epaminos wrote:


Nothing is saved locally: no content at all, only the web address, which loads in the Safari tab when clicked.


I think you can simply open the .webloc location file, which you call the "saved address", to open the web page that contains the table(s).

A simple example that opens a .webloc location file on my Mac:

Applescript:


tell application "Finder" to open file "Apple HD:Users:123:Desktop:MacScripter.webloc"
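
If you want the address itself (for example, to pass it to open location as in the steps above), note that a modern .webloc file is a property list whose URL key holds the address, so System Events can read it without opening the file. A minimal sketch; the file name and its location on the Desktop are hypothetical placeholders.

Applescript:

-- Hypothetical location: a .webloc file named MacScripter.webloc on the Desktop
set weblocPath to (POSIX path of (path to desktop)) & "MacScripter.webloc"

tell application "System Events"
   -- The "URL" key of the property list holds the saved address
   set theURL to value of property list item "URL" of property list file weblocPath
end tell

open location theURL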

Last edited by KniazidisR (2022-06-12 07:57:10 am)




#17 2022-06-17 06:31:45 am

technomorph
Member
Registered: 2017-12-14
Posts: 255

Re: Parsing data from specific website

Create an NSXMLDocument from the URL.
Get the NSXMLNodes you want by performing an XPath query on the document.
Go through each of the nodes and get the data from those nodes.

http://preserve.mactech.com/articles/ma … index.html
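
A minimal AppleScriptObjC sketch of those steps, assuming the data sits in <td> elements of the HTML the server returns. NSXMLDocumentTidyHTML asks NSXMLDocument to tidy HTML that isn't well-formed XML before parsing. Note that the source posted in #6 contains no table at all (it is filled in by JavaScript later), so for this particular portal the query may well return an empty list; the URL and the "//td" query below are placeholders.

Applescript:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

-- Placeholder URL: replace with the page you want to parse
set URLStr to "https://example.com/"
set cellTexts to my textOfNodes(URLStr, "//td")

-- Returns the text content of every node matching xpathQuery
on textOfNodes(URLStr, xpathQuery)
   set theURL to current application's NSURL's URLWithString:URLStr
   -- Tidy the HTML so that non-XHTML pages can still be parsed as XML
   set {theDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
   if theDoc is missing value then error (theError's localizedDescription() as text)
   set {theNodes, theError} to theDoc's nodesForXPath:xpathQuery |error|:(reference)
   if theNodes is missing value then error (theError's localizedDescription() as text)
   -- stringValue of each node is its text content
   return (theNodes's valueForKey:"stringValue") as list
end textOfNodes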


#18 2022-06-17 07:02:48 am

epaminos
Member
From:: Greece
Registered: 2019-10-18
Posts: 94

Re: Parsing data from specific website

Interesting stuff!

I have to dig more into it.

Thank you all, though, so far for your replies.

