Hi everyone,
I’m fairly new to AppleScript and I’m trying to make a program that may be a bit over my head, but so far I’ve made some good headway (my script looks very amateur as I haven’t coded in almost 4 years).
Ultimately I am trying to take a string of text (a software title) and return the company name for a database I’m compiling. You can probably get the gist of what I’m doing from what I’ve written thus far, which is listed below. I’ve used TextWrangler and BBEdit in several instances due to their extensive library of commands.
The problem I’m having now is that I’m trying to select just the company name, or at the very worst the lines containing the company name, in the footer of almost all websites (e.g. see below: in “© Copyright 2002–2005 Rickard Andersson”, I would just like to select “Rickard Andersson”).
Any help would be greatly appreciated. P.S. the Automator script I run simply views the source of the Safari webpage and copies it to the clipboard.
Thanks!
Peter Leenhouts
tell application "TextWrangler"
    activate
    set countNum to {count lines of window 1}
    repeat with i from 1 to countNum
        tell window 1
            select line i
        end tell
        copy selection
        set textEntry to selection
        set theURL to "http://www.google.com/search?hl=en&q=" & textEntry & "&btnI=I%27m+Feeling+Lucky"
        tell application "Safari"
            activate
            make new document with properties {URL:theURL}
            delay 10
            tell application "Automator Launcher"
                set workflow to "/Users/pleenhou/Desktop/Coding Project/view_source.workflow"
                set macWorkflow to POSIX file workflow as text
                open macWorkflow
                delay 2
            end tell
            tell application "BBEdit"
                activate
                paste clipboard in window 1
                find "©" searching in text 1 of window 1 options {search mode:literal, starting at top:false, wrap around:false, backwards:true, case sensitive:false, match words:true, extend selection:false} selecting match 1
                set countChar to contents of selection
                if (count countChar) ≥ 1 then
                    add suffix selection suffix "
"
                    copy selection
                else
                    find "copyright" searching in text 1 of window 1 options {search mode:literal, starting at top:false, wrap around:false, backwards:true, case sensitive:false, match words:true, extend selection:false} selecting match 1
                    set countChar2 to contents of selection
                    if (count countChar2) ≥ 1 then
                        add suffix selection suffix "
"
                        set offset selection
                        copy selection
                    else
                        find "©" searching in text 1 of window 1 options {search mode:literal, starting at top:false, wrap around:false, backwards:true, case sensitive:false, match words:true, extend selection:false} selecting match 1
                        add suffix selection suffix "
"
                        copy selection
                    end if
                end if
                tell application "BBEdit"
                    activate
                    set countLine to {count lines of window 1}
                    select lines 1 thru countLine of window 1
                    delete selection
                    paste clipboard in window 2
                end tell
                delay 2
                set countLine to {count lines of window 1}
                select lines 1 thru countLine of window 1
                delete selection
            end tell
            tell application "Safari"
                close window 1
            end tell
        end tell
    end repeat
end tell
Model: Macbook Pro C2D 2.16
AppleScript: 1.10.7
Browser: Safari 419.3
Operating System: Mac OS X (10.4)
If I understand what you’re trying to do, you can do most of it within a script without resorting to all of the external apps and Automator actions. Can you give an example of an application you might be searching for and what the answer might look like, so I don’t have to sort out what the Automator action portion of your script does? Perhaps the best illustration would be what you would do without the script.
The Automator portion of the code is used to view the source code of the loaded page and then copy that to the clipboard. I wasn’t exactly sure how to do that in AppleScript, but found an easy solution in Automator.
Hi Peter,
First, welcome to MacScripter.
As Adam mentioned, it would be very helpful to get an example of a real search term and the expected result.
Here is an example of a search term that would be used: “Pinnacle Instant Copy”
(Obviously, the company name is included in this one, but occasionally it is not.)
This term is then used in a Google “I’m Feeling Lucky” search.
For this term, the search lands on
http://www.pinnaclesys.com/PublicSite/us/Home/
and the desired result is “Pinnacle Systems, Inc.”, which appears after the text “©2007” at the bottom of that page.
Here’s how to get the HTML text of your URL as text in a variable called tHTML. After that you have to search it, but in my trials, that’s not easy because some sites use the word “copyright” more than once, some use the © symbol, etc.
set Ghead to "http://www.google.com/search?hl=en&q="
set Gtail to "&btnI=I%27m+Feeling+Lucky"
set tEntry to "NetNewsWire" -- you would put your grabbed entry here: perhaps "set tEntry to the clipboard".
tell application "Safari"
    open location Ghead & tEntry & Gtail
    delay 2
    repeat with t from 1 to 5 -- this is to wait for the page to be complete.
        if (do JavaScript "document.readyState" in document 1) is "complete" then
            exit repeat
        else
            delay 1
        end if
    end repeat
    set tHTML to source of front document
end tell
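For the search step, here is a minimal sketch of one way to do it in plain AppleScript. It assumes the copyright notice is usually the last line of the page that mentions “©” or “copyright”, which will not hold for every site. The handler name lastCopyrightLine and the sample text are made up for illustration; with the script above you would pass it tHTML.

```applescript
-- Hypothetical helper (not part of the script above): return the last
-- paragraph of pageText that mentions "©" or "copyright", or "" if none does.
on lastCopyrightLine(pageText)
    set theLines to paragraphs of pageText
    -- Walk from the bottom up, since the notice normally sits in the footer.
    repeat with i from (count theLines) to 1 by -1
        set thisLine to item i of theLines
        -- "contains" ignores case by default, so "Copyright" matches too.
        if thisLine contains "©" or thisLine contains "copyright" then
            return thisLine
        end if
    end repeat
    return ""
end lastCopyrightLine

-- Made-up sample input; in practice use: lastCopyrightLine(tHTML)
set sampleText to "<p>Some product page</p>" & return & "<div>© 2007 Pinnacle Systems, Inc.</div>"
lastCopyrightLine(sampleText)
```

Because this works on the page source, the returned line still contains any HTML tags around the notice; those would have to be stripped separately.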
Thanks Adam!
That definitely helps speed up the process and now I don’t have to sit and let that Automator workflow run.
For searching the documents for “copyright”, ©, etc., I have been doing a reverse search starting at the bottom and taking the first instance as that seems to be where most occur.
The only problem now is how do I select the company name or at the very least the line it occurs on?
Thanks again for your help!
Hi,
here is a similar approach, which keeps only the lines containing either “©” or “Copyright”.
The problem is that each company could write its copyright line in a different way.
set copyRightLines to ""
tell application "TextWrangler"
    -- activate
    set countNum to count lines of window 1
    repeat with i from 1 to countNum
        -- take line i of the front window as the search term
        tell window 1
            select line i
        end tell
        copy selection
        set textEntry to selection
        set theURL to "http://www.google.com/search?hl=en&q=" & textEntry & "&btnI=I%27m+Feeling+Lucky"
        tell application "Safari"
            open location theURL
            my page_loaded(20)
            set s to text of document 1
        end tell
        -- keep only the lines of the page text that mention "Copyright" or "©"
        set p to paragraphs of (do shell script "echo " & quoted form of s & " | grep 'Copyright\\|©'")
        if (count p) > 1 then
            -- several candidates: take the first one that seems to contain a year
            repeat with j in p
                if j contains "20" then
                    set copyRightLines to copyRightLines & contents of j & return
                    exit repeat
                end if
            end repeat
        else
            set copyRightLines to copyRightLines & item 1 of p & return
        end if
        tell application "Safari" to close window 1
    end repeat
end tell
display dialog copyRightLines

-- wait until Safari reports the page as loaded, or give up after timeout_value tries
on page_loaded(timeout_value)
    delay 2
    repeat with i from 1 to timeout_value
        tell application "Safari"
            if (do JavaScript "document.readyState" in document 1) is "complete" then
                return true
            else if i is timeout_value then
                return false
            else
                delay 1
            end if
        end tell
    end repeat
    return false
end page_loaded
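To go from a copyright line to just the company name (for example from “© Copyright 2002–2005 Rickard Andersson” to “Rickard Andersson”), one possible approach is to peel off the leading markers, years and punctuation. This is only a sketch under that assumption: the handler names companyFromCopyrightLine and dropLeading are hypothetical, and the lists of markers and characters to skip are guesses that you would adjust to the sites you actually hit.

```applescript
-- Hypothetical helper: strip leading copyright markers, years and punctuation
-- from a copyright line and return what is left (ideally the company name).
on companyFromCopyrightLine(theLine)
    set t to theLine
    set markers to {"©", "(c)", "copyright"}
    set skipChars to "0123456789 -–,.:"
    repeat
        set changed to false
        -- Drop a leading marker such as "©" or "Copyright" (comparison ignores case).
        repeat with m in markers
            set markerText to contents of m
            if t begins with markerText then
                set t to dropLeading(t, count markerText)
                set changed to true
            end if
        end repeat
        -- Drop leading digits, spaces, dashes and punctuation (the year part).
        repeat while t is not "" and skipChars contains (character 1 of t)
            set t to dropLeading(t, 1)
            set changed to true
        end repeat
        if not changed then exit repeat
    end repeat
    return t
end companyFromCopyrightLine

-- Return t without its first n characters ("" if nothing would remain).
on dropLeading(t, n)
    if n ≥ (count t) then return ""
    return text (n + 1) thru -1 of t
end dropLeading

-- Example from this thread; this should give "Pinnacle Systems, Inc."
companyFromCopyrightLine("©2007 Pinnacle Systems, Inc.")
```

The sketch only trims the front of the line, so trailing text such as “All rights reserved.” is left in place, and a company whose name starts with a digit would lose that digit.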
Stefan you are a miracle worker!!
This is amazing! Only one thing. Is it possible to take the results and put them into TextEdit or something to create a list?
Thank you everyone so much! I really appreciate your help.
Never mind, I got it to work!
Again thank you so much!
Peter
Replace the display dialog line with this;
it writes the result to a file named copyright.txt on your desktop:
set ff to open for access file ((path to desktop as Unicode text) & "copyright.txt") with write permission
set eof of ff to 0 -- empty the file first so leftovers from an earlier, longer run don't remain
write copyRightLines to ff
close access ff
One other problem I’m having is that occasionally there are websites that do not contain any company information. Is there a way I can include a blank line for those sites?
I guess you get an error message if there is no copyright information.
Try this:
set copyRightLines to ""
tell application "TextWrangler"
    -- activate
    set countNum to count lines of window 1
    repeat with i from 1 to countNum
        -- take line i of the front window as the search term
        tell window 1
            select line i
        end tell
        copy selection
        set textEntry to selection
        set theURL to "http://www.google.com/search?hl=en&q=" & textEntry & "&btnI=I%27m+Feeling+Lucky"
        tell application "Safari"
            open location theURL
            my page_loaded(20)
            set s to text of document 1
        end tell
        try
            -- grep exits with a non-zero status when nothing matches, which raises an error here
            set p to paragraphs of (do shell script "echo " & quoted form of s & " | grep 'Copyright\\|©'")
            if (count p) > 1 then
                -- several candidates: take the first one that seems to contain a year
                repeat with j in p
                    if j contains "20" then
                        set copyRightLines to copyRightLines & contents of j & return
                        exit repeat
                    end if
                end repeat
            else
                set copyRightLines to copyRightLines & item 1 of p & return
            end if
        on error
            -- no match: record a placeholder line so every title still gets an entry
            set copyRightLines to copyRightLines & textEntry & ": no copyright information" & return
        end try
        tell application "Safari" to close window 1
    end repeat
end tell
set ff to open for access file ((path to desktop as Unicode text) & "copyright.txt") with write permission
set eof of ff to 0 -- empty the file first so leftovers from an earlier, longer run don't remain
write copyRightLines to ff
close access ff

-- wait until Safari reports the page as loaded, or give up after timeout_value tries
on page_loaded(timeout_value)
    delay 2
    repeat with i from 1 to timeout_value
        tell application "Safari"
            if (do JavaScript "document.readyState" in document 1) is "complete" then
                return true
            else if i is timeout_value then
                return false
            else
                delay 1
            end if
        end tell
    end repeat
    return false
end page_loaded
Alternatively:
tell application "TextWrangler"
    set theseItems to contents of lines of front text document
end tell
set copyrightLines to {}
repeat with thisItem in theseItems
    try
        do shell script "/usr/bin/python -c 'import sys, urllib; print urllib.quote(unicode(sys.argv[1], \"utf8\"))' " & quoted form of thisItem -- encode query
        do shell script "/usr/bin/curl --silent --show-error --location --user-agent '' " & quoted form of ("http://www.google.com/search?hl=en&q=" & result & "&btnI=I%27m+Feeling+Lucky") & ¬
            " | /usr/bin/ruby -e 'print $stdin.read.gsub(/<br ?\\/?>/, \"\\n\").gsub(%r{</?[^>]+?>}, \"\")' | /usr/bin/grep 'Copyright\\|©'"
        set end of copyrightLines to last paragraph of result
    on error errMsg number errNum
        if errMsg is "The command exited with a non-zero status." then
            set end of copyrightLines to thisItem & ": no copyright information"
        else
            error errMsg number errNum
        end if
    end try
end repeat
set ASTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 10
set copyrightLines to "" & copyrightLines
set AppleScript's text item delimiters to ASTID
writeFile from copyrightLines into ((path to desktop as Unicode text) & "copyright.txt") without appending

on writeFile from someData into someFile given appending:appending
    try
        open for access someFile with write permission
        set fileRef to result
        if not appending then set eof of fileRef to 0
        write someData to fileRef as (class of someData) starting at eof
        close access fileRef
        return true
    on error errMsg number errNum
        try
            close access fileRef
        end try
        error errMsg number errNum
    end try
end writeFile
I’ve been fooling around trying to get the scripts to return just one line per search, so that the results match up with the titles when I line them up in Excel. However, some titles have been returning 2 or 3 lines of results. Is there a way to limit the results to a single line?
Thanks,
Peter
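One thing that might be worth trying, sketched here under the assumption that the extra rows come from line breaks carried inside a string before it is appended (for example a trailing line ending on the title taken from TextWrangler): flatten every entry to a single line first. The handler name oneLine is hypothetical and not part of the scripts above.

```applescript
-- Hypothetical helper: replace every line break and tab in t with a space,
-- so that the entry always occupies exactly one line in the output file.
on oneLine(t)
    set LF to ASCII character 10
    set savedTIDs to AppleScript's text item delimiters
    -- Handle CR+LF pairs first so they become one space rather than two.
    repeat with breakChar in {return & LF, return, LF, tab}
        set AppleScript's text item delimiters to contents of breakChar
        set parts to text items of t
        set AppleScript's text item delimiters to space
        set t to parts as Unicode text
    end repeat
    set AppleScript's text item delimiters to savedTIDs
    return t
end oneLine

-- Example: a title that still carries its line ending
oneLine("Pinnacle Instant Copy" & return & ": no copyright information")
```

In the scripts above it would be applied to each entry just before it is added to the results, for instance `my oneLine(textEntry & ": no copyright information")` inside the tell block. If the extra lines have a different cause, this will not help, so it is worth checking one of the problem titles by hand first.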