I have PDF files which are print-outs of web pages (made with cups-pdf). (If that means nothing to you, just assume I have saved my web pages as PDF files.)
In the footer (bottom) of each page of such a PDF, I can see the URL of the source web page (e.g. http://macscripter.net/viewtopic.php?id=16145).
Now I want to extract the URL from each such PDF and open it in Firefox. (I am prepared to use UI scripting to deal with Firefox.)
I can’t work out how to refer to the footer of a page to get the URL. Though I use Skim for PDFs, I don’t mind using Preview for this task; I just want the URLs. Since there are a large number of PDF files (about 50), a script that processes the files selected in Finder would be better. Please get me started with the URL part; I can probably figure out the rest from there.
set thefile to choose file
tell application "Finder" to set filename to name of thefile

tell application "Adobe Acrobat Professional"
	activate
	open thefile
	-- select all the text in the document and copy it to the clipboard
	execute menu item "SelectAll" of menu "Edit"
	if enabled of menu item "Copy" of menu "Edit" then
		execute menu item "Copy" of menu "Edit"
		set theText to the clipboard
	else
		set theText to "No text found in PDF"
	end if
	close document 1 saving no
end tell

-- scan every paragraph of the copied text for a URL
set url_list to {}
repeat with i from 1 to the count of paragraphs of theText
	set this_para to paragraph i of theText
	if this_para contains "http://" then
		set oldDels to AppleScript's text item delimiters
		-- everything after the first "http://" ...
		set AppleScript's text item delimiters to "http://"
		set a to text item 2 of this_para
		-- ... up to the next space is the URL
		set AppleScript's text item delimiters to " "
		set the_url to text item 1 of a
		set AppleScript's text item delimiters to oldDels
		copy "http://" & the_url to end of url_list
	end if
end repeat

choose from list url_list
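Since you mentioned wanting a script that processes the files selected in Finder, here is a rough, untested sketch of how that outer loop might look (extract_url is just a hypothetical stand-in for whichever extraction method you end up using):

set url_list to {}
tell application "Finder" to set theSelection to selection as alias list
repeat with thefile in theSelection
	-- extract_url is a placeholder: put the PDF-to-URL logic above in here
	set end of url_list to my extract_url(thefile)
end repeat

on extract_url(thefile)
	-- hypothetical handler: return the URL found in thefile's footer
	return "http://..."
end extract_url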
Thanks a lot, Blend3
I will modify it for the Skim PDF reader. I get the idea of what you are doing. I guess, if we searched paragraph beginnings (instead of using paragraph “contains”), it would make sense to check only the last paragraph, and only on page 1 of each PDF file, because all pages carry the same link. Would it be possible to do so?
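Something along these lines is what I have in mind (untested, and assuming Skim’s “get text for” keeps the paragraph layout intact):

tell application "Skim"
	open thefile
	-- thefile chosen earlier, as in your script; look at page 1 only
	set pageText to get text for page 1 of document 1
end tell
-- the footer URL should be the last paragraph of the page
set lastPara to last paragraph of pageText
if lastPara starts with "http://" then set the_url to lastPara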
Hi Chris,
I totally agree with your logic, but Skim doesn’t appear to retain the formatting of the PDF when you get every paragraph or text, so I came up with this:
set thefile to choose file

tell application "Skim"
	open thefile
	-- grab the text of the first page only
	set theText to get text for page 1 of document 1
	set oldDels to AppleScript's text item delimiters
	-- everything after the first "http://" ...
	set AppleScript's text item delimiters to "http://"
	set a to text item 2 of theText
	-- ... up to the next space is the URL
	set AppleScript's text item delimiters to " "
	set b to text item 1 of a
	set AppleScript's text item delimiters to oldDels
	set the_url to "http://" & b
end tell

the_url
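And to cover the Firefox part of your first post, something like this should work (untested; “open location” is standard AppleScript, so UI scripting may not even be needed):

-- hand the extracted URL to Firefox; dropping the tell block
-- would open it in your default browser instead
tell application "Firefox" to open location the_url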
To blend3:
Thanks, blend3, for writing the script. (I had some problems using it, which I will be able to fix if I have to use your script.)
To Jacques:
Many thanks once again. Your script is exactly what I had asked for when I made my first post in this thread.
I used Safari for six months and had problems with the way it saved web pages, so I took to saving web pages as PDFs instead. But that had several drawbacks which I could no longer live with. It took me quite some time to find the perfect way to save web pages; I won’t name the Firefox plugin I settled on, since I don’t want anyone to think that I am unduly publicizing it.
I can’t tell you how astonished I am with the way the script worked.
I set up a hotkey for your script and converted all 150+ of my PDF files (scattered across several different places) into web pages within 3 minutes, because the script also works in a Finder window showing Spotlight results.
I still can’t believe how quickly it all happened.
I wish I knew what this code meant:
Can you (or anyone) tell me what in bash I should refer to in order to understand it? (I know nothing about bash.)