Back when I worked for an electroplating company, I needed a script that extracted part of a PDF document (a patent) and saved it to a new file to be managed in a special database. I wrote a complicated Python script to do this, but Skim - an excellent, free and highly scriptable PDF viewer - now supports this functionality out of the box:
tell application "Skim"
	tell document 1
		set tiffdata to grab page 1 for {0, 300, 300, 0} with TIFF format
		set pdfdata to grab page 1 for {0, 300, 300, 0} without TIFF format
		-- add your handler to save the data to a file
	end tell
end tell
I found this to be extremely helpful.
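For the "save the data to a file" step, here is a minimal sketch using the standard file read/write commands. The handler name writedatatofile and the path argument are my own placeholders, and it assumes the value returned by grab is raw data that the write command accepts as is:

```applescript
-- Hypothetical helper (not part of Skim): writes raw data to a file at a POSIX path
on writedatatofile(thedata, posixpath)
	set fileref to open for access (POSIX file posixpath) with write permission
	try
		set eof of fileref to 0 -- empty the file if it already exists
		write thedata to fileref
		close access fileref
	on error errmsg number errnum
		-- always close the file before passing the error on
		close access fileref
		error errmsg number errnum
	end try
end writedatatofile
```

It could then be called after the tell block, for example as my writedatatofile(pdfdata, "/path/to/output.pdf"), where the path is only a placeholder.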
I found your post really interesting. I’m trying to extract a page from a multipage PDF file and save it to a new (single-page) PDF file.
I want to do this through AppleScript or Automator, but I really can’t figure out how. After searching the forum (and Google) a bit, I found that the Preview application isn’t really scriptable, so I installed Skim thinking it could do the job easily, but I can’t find out how to “paste” the selected page into a new document (and save it).
Thanks in advance
Giovanni
Hi Giovanni,
I have written a small Foundation tool for you that extracts a given page number from a PDF input file and writes it to a new PDF output file. I named it getpage, and you can have a look at the short source code here.
To show how to use it in combination with AppleScript, I built a sample droplet named pagegetter for you, which can be downloaded for free here. It also contains the ready-to-use command line tool.
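As a rough sketch, calling the tool directly from AppleScript could look like this. The -i, -p and -o options are the ones the droplet script passes to getpage; all three paths are placeholders you would need to replace:

```applescript
-- Placeholder paths: adjust them to where the tool and your PDF actually live
set toolpath to "/path/to/getpage"
set inputpath to "/path/to/test.pdf"
set outputpath to "/path/to/test_15.pdf"

-- -i: input PDF, -p: page number to extract, -o: output PDF
set command to (quoted form of toolpath) & " -i " & (quoted form of inputpath) & " -p 15 -o " & (quoted form of outputpath)
do shell script command
```

Using quoted form keeps paths containing spaces intact when they are passed to the shell.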
If you drop a bunch of PDF files onto the sample script, it will ask you for the page number to extract (e.g. 15) and then process all dropped files, extracting this page number and writing it to a new file.
The new files are saved in the same location as the original files, but their naming scheme is a bit different:
Original file name: test.pdf
Page number to extract: 15
New file name: test_15.pdf
Currently, an existing file path will never be overwritten by the script.
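The naming scheme can be sketched in a few lines. The handler name outputnamefor is just for illustration, and it assumes the file name ends with a four-character ".pdf" suffix:

```applescript
-- Hypothetical helper: builds the output file name from the original name and the page number
on outputnamefor(pdffilename, pagenumber)
	-- "text 1 thru -5" drops the trailing ".pdf"
	return (text 1 thru -5 of pdffilename) & "_" & pagenumber & ".pdf"
end outputnamefor

outputnamefor("test.pdf", 15) -- "test_15.pdf"
```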
The command line tool getpage will run on Mac OS X 10.5 or higher; I tested the sample script on 10.6.
I hope you can actually make good use of this.
Best regards from snowy Berlin,
Martin
property mytitle : "pagegetter"
-- I am called when the user opens the script with a double click
on run
	tell me
		activate
		display dialog "Please drop a bunch of PDF files onto my icon to extract a certain page number from each of them into a new file." buttons {"OK"} default button 1 with icon note with title mytitle
	end tell
end run
-- I am called when the user drops Finder items onto the script's icon
on open finderitems
	try
		-- searching for PDF files
		set pdffiles to {}
		repeat with finderitem in finderitems
			set finderiteminfo to info for finderitem
			if (not folder of finderiteminfo) and (name of finderiteminfo ends with ".pdf") then
				-- store the item itself rather than the loop reference
				set end of pdffiles to contents of finderitem
			end if
		end repeat
		-- no PDF files found :(
		if pdffiles is {} then
			set errmsg to "Could not find any PDF documents in the dropped Finder items."
			my dsperrmsg(errmsg, "--")
			return
		end if
		-- which page number should be extracted?
		set pagenumber to my askforpagenumber()
		-- askforpagenumber returns missing value when the user cancels the dialog
		if pagenumber is missing value then return
		-- locating the command line tool inside my bundle...
		set toolpath to ((path to me) as text) & "Contents:Resources:getpage"
		set qtdtoolpath to quoted form of POSIX path of toolpath
		-- processing the found PDF files
		repeat with pdffile in pdffiles
			set pdffileinfo to info for pdffile
			set pdffilename to (name of pdffileinfo) as text
			-- "text 1 thru -5" strips the ".pdf" suffix regardless of the current text item delimiters
			set outputpdffilename to ((text 1 thru -5 of pdffilename) & "_" & pagenumber & ".pdf") as text
			set outputpdffilepath to (my getparentfolderpath((pdffile as text)) & outputpdffilename) as text
			-- existing files are never overwritten
			if not my itempathexists(outputpdffilepath) then
				set command to qtdtoolpath & " -i " & quoted form of POSIX path of pdffile & " -p " & pagenumber & " -o " & quoted form of POSIX path of outputpdffilepath
				try
					do shell script command
				on error errmsg number errnum
					my dsperrmsg(errmsg, errnum)
				end try
			end if
		end repeat
	on error errmsg number errnum
		-- -128 means the user canceled, which is not an error worth reporting
		if errnum is not equal to -128 then
			my dsperrmsg(errmsg, errnum)
		end if
	end try
end open
-- I am asking the user to choose the page number to be extracted from the PDF files
-- (I return missing value if the user cancels the dialog)
on askforpagenumber()
	try
		tell me
			activate
			display dialog "Which page number should be extracted from the PDF files?" default answer "" buttons {"Cancel", "Enter"} default button 2 with icon note with title mytitle
			set dlgresult to result
		end tell
		set answer to text returned of dlgresult
		if answer is "" then
			-- empty input: ask again
			return my askforpagenumber()
		else
			try
				set pagenumber to answer as integer
				if pagenumber < 1 then
					-- page numbers start at 1: ask again
					return my askforpagenumber()
				else
					-- no more calls: we have a winner!
					return pagenumber
				end if
			on error
				-- input was not a whole number: ask again
				return my askforpagenumber()
			end try
		end if
	on error
		return missing value
	end try
end askforpagenumber
-- I am indicating if a given item path already exists
on itempathexists(itempath)
	try
		set itemalias to itempath as alias
		return true
	on error
		return false
	end try
end itempathexists
-- I am returning the parent folder path of a given item path
on getparentfolderpath(itempath)
	set olddelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ":"
	set itemcount to (count text items of itempath)
	set lastitem to the last text item of itempath
	if lastitem = "" then
		set itemcount to itemcount - 2 -- folder path
	else
		set itemcount to itemcount - 1 -- file path
	end if
	set parentfolderpath to text 1 thru text item itemcount of itempath & ":"
	set AppleScript's text item delimiters to olddelims
	return parentfolderpath
end getparentfolderpath
-- I am displaying error messages to the user
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"OK"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg
Sorry for the late answer! Wow, it looks great! Thank you so much. I’ve been working on something similar and came up with a script that can be used to send each page of a PDF to Word in PDF format.
It lets the user choose the PDF’s size on the Word page and add a custom caption. I then modified it to be used as a custom Service (through Automator).
Take a look at http://giovannimedici.altervista.org/