Tuesday, February 9, 2010

#1 2008-05-04 09:20:00 am

Martin Michel
Administrator
From: Berlin, Germany

Conveniently convert PDF documents to JPG images

Recently I received an e-mail from someone asking whether it was possible to convert all pages of a PDF file to JPG images using AppleScript.

I knew about Dinu C. Gherman's pdf2tiff script, but had never heard of a similar solution for JPG conversion. Moreover, pdf2tiff currently converts only a single page of a PDF file to TIFF, not all pages.

Then there is the versatile sips command, but it converts only the first page of a PDF file to JPG:

sips -s format jpeg /Users/martin/Desktop/sample.pdf -out /Users/martin/Desktop/sample.jpg


But the request kept me thinking, so I finally sat down and quickly modified Gherman's Python script to convert all pages of a PDF document to JPG. In addition, I wrote a convenient AppleScript droplet that runs this Python script on every PDF file dropped onto it.
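Stripped of its dialogs, the droplet does two things per drop: it keeps only files with a pdf extension (its getpdfpaths handler) and runs the Python script via do shell script, wrapping each path in `quoted form of` so spaces and apostrophes survive. A rough Python equivalent, as a sketch only: `pdf_paths` and `pdf2jpg_command` are hypothetical names, and `shlex.quote` stands in for `quoted form of` (it produces a different but equally valid quoting).

```python
import os
import shlex


def pdf_paths(paths):
    """Keep only regular files with a .pdf extension, like getpdfpaths.

    The extension test ignores case because AppleScript's `is` comparison
    ignores case by default.
    """
    return [p for p in paths
            if os.path.isfile(p) and os.path.splitext(p)[1].lower() == ".pdf"]


def pdf2jpg_command(pyscriptpath, pdfpath, resolution):
    """Build a shell command of the same shape as the droplet's pdf2jpg handler.

    shlex.quote plays the role of AppleScript's `quoted form of`, so paths
    containing spaces or apostrophes reach the Python script unchanged.
    """
    return " ".join(["python", shlex.quote(pyscriptpath), "pdf2jpg",
                     shlex.quote(pdfpath), str(int(resolution))])
```

When calling from Python itself, one would normally skip the shell and hand `subprocess` a list of arguments; the string form is shown here only because it mirrors what do shell script receives.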

I invite you to download the result right here; maybe you can make good use of the script, too:



The script was tested on Mac OS X 10.5.2 and does not run on Mac OS X 10.4 or earlier incarnations of our beloved operating system. It works on both Intel and PowerPC-based Macs and asks you to choose the resolution used for the image conversion.
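How the chosen resolution translates into image size is not spelled out here. Assuming the usual PDF convention of 72 units (points) per inch, a page's pixel size is its size in points multiplied by resolution / 72, so 72 yields one pixel per point. A small helper (the name `pixel_size` is hypothetical) makes the arithmetic concrete:

```python
def pixel_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rendered at `dpi` (PDF units are 1/72 inch)."""
    scale = dpi / 72.0
    return int(round(width_pt * scale)), int(round(height_pt * scale))


# US Letter (612 x 792 pt) at 300 dpi gives (2550, 3300);
# A4 (595 x 842 pt) at 150 dpi gives (1240, 1754).
```

Since pixel count grows with the square of the resolution, a 600 dpi page holds 64 times as many pixels as the same page at 75 dpi, which is worth keeping in mind for long documents.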

In the future it would be nice to let the user choose individual page numbers (or a range of pages) to be converted to JPG, as well as set the JPG compression factor, but currently I have no time to implement this.
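Page selection is not implemented, but the parsing part is small. A hypothetical sketch, assuming a familiar print-dialog style input such as "1,3,5-7" (the input format and the name `parse_pages` are assumptions, not part of the script):

```python
def parse_pages(spec, pagecount):
    """Turn a page selection like "1,3,5-7" into a sorted list of page numbers.

    Hypothetical helper for the not-yet-implemented page selection; raises
    ValueError for malformed input or pages outside 1..pagecount.
    """
    pages = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
        else:
            first = last = int(part)
        if not 1 <= first <= last <= pagecount:
            raise ValueError("invalid page range: %r" % part)
        pages.update(range(first, last + 1))
    return sorted(pages)
```

Returning a sorted, de-duplicated list means overlapping ranges such as "2-3,3" convert each page only once.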

This is the modified Python script responsible for the PDF to JPG conversion step.
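Judging only from how the droplet calls it, the Python script accepts an action (getpagecount or pdf2jpg), an unquoted POSIX path and, for pdf2jpg, a resolution. The sketch below covers just that command-line handling plus an output-naming step; the actual page rendering is left out, and the function names and the `<name>-<page>.jpg` naming scheme are assumptions, not taken from the real pdflib.py.

```python
import os

ACTIONS = ("getpagecount", "pdf2jpg")


def parse_command(argv):
    """Validate arguments of the form used by the droplet's do shell script calls:

        pdflib.py getpagecount <pdfpath>
        pdflib.py pdf2jpg <pdfpath> <resolution>

    Returns (action, pdfpath, resolution); resolution is None for getpagecount.
    """
    if len(argv) < 3 or argv[1] not in ACTIONS:
        raise SystemExit("usage: pdflib.py getpagecount|pdf2jpg <pdfpath> [resolution]")
    action, pdfpath = argv[1], argv[2]
    if action == "getpagecount":
        if len(argv) != 3:
            raise SystemExit("getpagecount takes exactly one PDF path")
        return action, pdfpath, None
    if len(argv) != 4 or not argv[3].isdigit():
        raise SystemExit("pdf2jpg needs a PDF path and an integer resolution")
    return action, pdfpath, int(argv[3])


def jpg_targets(pdfpath, pagecount):
    """Return (page, jpgpath) pairs for pages whose JPG does not exist yet.

    Assumes a hypothetical "<name>-<page>.jpg" scheme in the PDF's folder;
    existing JPGs are skipped, matching the droplet's "won't be replaced" note.
    """
    folder = os.path.dirname(pdfpath)
    base = os.path.splitext(os.path.basename(pdfpath))[0]
    targets = []
    for page in range(1, pagecount + 1):  # PDF pages are numbered from 1
        jpgpath = os.path.join(folder, "%s-%d.jpg" % (base, page))
        if not os.path.exists(jpgpath):
            targets.append((page, jpgpath))
    return targets
```

In a real script, `parse_command(sys.argv)` would run first and the rendering step would then loop over `jpg_targets(...)`.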

This is the AppleScript droplet code that executes the above Python script on every dropped PDF document:

Applescript:


-- created: 01.04.2008
-- version: 0.1
-- tested on:
-- • Mac OS X 10.5.2
-- • Intel & PowerPC based Macs

-- This script will convert dropped PDF files to JPG images.
-- The JPG images are saved in the same folder as the
-- PDF source files. If a JPG file already exists,
-- it won't be replaced. The PDF source files are
-- not modified. Currently, all pages of a PDF file
-- are converted to JPG. It would be a nice feature
-- if the user could also choose only certain page
-- numbers to be converted. Future?

property mytitle : "PDF2JPG"
property batchresolution : missing value

-- I am called when the user drops Finder items onto the script icon
on open droppeditems
   my main(droppeditems)
end open

-- I am called when the user double clicks the script icon
on run
   set infomsg to "I am a hungry AppleScript droplet, so please drop a bunch of PDF files onto my icon to convert them to JPG images."
   my dspinfomsg(infomsg)
   return
end run

-- I am the main function controlling the script flow
on main(droppeditems)
   try
       -- initializing important script properties
       set batchresolution to missing value
       -- searching the dropped items for PDF files
       set pdfpaths to my getpdfpaths(droppeditems)
       -- no PDF files found :(
       if pdfpaths is {} then
           set errmsg to "You did not drop any PDF documents onto the script."
           my dsperrmsg(errmsg, "--")
           return
       end if
       -- processing the PDF files
       repeat with pdfpath in pdfpaths
           -- getting the image resolution to be used for the PDF2JPG conversion
           if batchresolution is missing value then
               set resolution to my askforresolution(pdfpath)
           else
               set resolution to batchresolution
           end if
           -- did the user provide a resolution?
           if resolution is not missing value then
               -- yes, so let's convert the PDF to JPG
               my pdf2jpg(pdfpath, resolution)
           end if
       end repeat
       -- catching unexpected errors
   on error errmsg number errnum
       my dsperrmsg(errmsg, errnum)
   end try
end main

-- I am searching the dropped items for PDF files
-- and return a list of unquoted Posix file paths
on getpdfpaths(droppeditems)
   set pdfpaths to {}
   repeat with droppeditem in droppeditems
       set iteminfo to info for droppeditem
       if folder of iteminfo is false and name extension of iteminfo is "pdf" then
           set pdfpaths to pdfpaths & (POSIX path of (droppeditem as Unicode text))
       end if
   end repeat
   return pdfpaths
end getpdfpaths

-- I am returning the Posix path to the Python script
-- responsible for the PDF manipulation, which is
-- located in the application bundle
on getpyscriptpath()
   set pyscriptpath to ((path to me) as Unicode text) & "Contents:Resources:pdflib.py"
   return (POSIX path of pyscriptpath)
end getpyscriptpath

-- I am returning the total page count of a given PDF file
-- [PDF file path must be passed as an unquoted Posix path]
on getpagecount(pdfpath)
   set action to "getpagecount"
   set cmd to "python" & space & quoted form of (my getpyscriptpath()) & space & action & space & quoted form of pdfpath
   set cmd to cmd as «class utf8»
   set pagecount to (do shell script cmd) as integer
   return pagecount
end getpagecount

-- I am converting a given PDF file to JPG
-- [PDF file path must be passed as an unquoted Posix path]
on pdf2jpg(pdfpath, resolution)
   set action to "pdf2jpg"
   set cmd to "python" & space & quoted form of (my getpyscriptpath()) & space & action & space & quoted form of pdfpath & space & resolution
   set cmd to cmd as «class utf8»
   do shell script cmd
end pdf2jpg

-- I am asking the user to provide a value for the resolution
on askforresolution(pdfpath)
   set msg to "Please enter the resolution to be used for the JPG conversion of the following PDF file (72-600):" & return & return & (contents of pdfpath)
   try
       tell me
           display dialog msg default answer "72" buttons {"Use value for batch", "Cancel", "Enter"} default button 3 with title mytitle
           set dlgresult to result
       end tell
   on error errmsg number errnum
       -- user hit 'Cancel' button :(
       if errnum is equal to -128 then
           return missing value
       end if
       -- any other error is passed on to the calling handler
       error errmsg number errnum
   end try
   set resolution to text returned of dlgresult
   -- empty input...asking again :)
   -- (every repeated request must be returned, otherwise its result is lost)
   if resolution is "" then
       return my askforresolution(pdfpath)
   end if
   try
       -- can the input be coerced to an integer?
       set resolution to resolution as integer
   on error
       -- no, it can't...
       set errmsg to "The entered resolution is not a number."
       my dsperrmsg(errmsg, "--")
       return my askforresolution(pdfpath)
   end try
   -- is the given resolution valid?
   if resolution > 600 then
       -- no, it's too high...
       set errmsg to "The entered resolution (" & resolution & ") exceeds the maximum value (600)."
       my dsperrmsg(errmsg, "--")
       return my askforresolution(pdfpath)
   else if resolution < 72 then
       -- no, it's too low...
       set errmsg to "The entered resolution (" & resolution & ") is below the minimum value (72)."
       my dsperrmsg(errmsg, "--")
       return my askforresolution(pdfpath)
   end if
   -- finally...
   if button returned of dlgresult is "Use value for batch" then
       set batchresolution to resolution
   end if
   return resolution
end askforresolution

-- I am displaying info messages
on dspinfomsg(infomsg)
   tell me
       activate
       display dialog infomsg buttons {"OK"} default button 1 with icon note with title mytitle
   end tell
end dspinfomsg

-- I am displaying error messages, hopefully rather seldom :)
on dsperrmsg(errmsg, errnum)
   set msg to "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")"
   tell me
       activate
       display dialog msg buttons {"OK"} default button 1 with icon stop with title mytitle
   end tell
end dsperrmsg
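
The askforresolution handler re-prompts by calling itself, which only works if the result of every repeated call is returned to its caller. A loop avoids that pitfall altogether. Below is the same validation flow as a Python sketch; `check_resolution` and `ask_resolution` are hypothetical names, and the prompt is passed in as a function (returning None on cancel, like missing value) so the flow can be exercised without a dialog.

```python
def check_resolution(text, minimum=72, maximum=600):
    """Return (resolution, None) if text is a valid resolution, else (None, message).

    Bounds follow the 72-600 range named in the dialog prompt. Unlike
    AppleScript's `as integer`, int() rejects decimal input such as "72.5".
    """
    text = text.strip()
    if not text:
        return None, "Please enter a resolution."
    try:
        value = int(text)
    except ValueError:
        return None, "The entered resolution is not a number."
    if value > maximum:
        return None, "The entered resolution (%d) exceeds the maximum value (%d)." % (value, maximum)
    if value < minimum:
        return None, "The entered resolution (%d) is below the minimum value (%d)." % (value, minimum)
    return value, None


def ask_resolution(prompt):
    """Keep asking until prompt() yields a valid resolution.

    prompt is any callable returning the user's text, or None when the user
    cancels (the counterpart of AppleScript's missing value).
    """
    while True:
        answer = prompt("Resolution for the JPG conversion (72-600): ")
        if answer is None:
            return None  # user cancelled
        value, error = check_resolution(answer)
        if error is None:
            return value
        print(error)  # a real droplet would show an error dialog instead
```

With `prompt=input` this runs interactively in a terminal; in tests, a function feeding canned answers does the job.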


Sal Soghoian is my role model.

Filed under: PDF, conversion, JPG, Python
