Conveniently convert PDF documents to JPG images

Recently I received an email from someone asking whether it is possible to convert all pages of a PDF file to JPG images using AppleScript.

I knew about the pdf2tiff script from Dinu C. Gherman, but had never heard of a similar solution for JPG conversion. Moreover, pdf2tiff currently converts just one page of a PDF file to TIFF, not all pages.

And then there is the versatile sips command, but this only converts the first page of a PDF file to JPG:

sips -s format jpeg /Users/martin/Desktop/sample.pdf -out /Users/martin/Desktop/sample.jpg

But the automation request kept me thinking, so I finally sat down and quickly modified Gherman’s Python script to convert all pages of a PDF document to JPG. In addition, I wrote a convenient AppleScript droplet that executes this Python script on dropped PDF files.

I invite you to download the result right here. Maybe you can make good use of the script, too:

PDF2JPG - Convert all pages of PDF files to JPG images (ca. 38.1 KB)

The script was tested on Mac OS X 10.5.2 and does not run on Mac OS X 10.4 or earlier incarnations of our beloved operating system. It works on Intel & PowerPC based Macs and also asks you to choose a resolution to be used for the image conversion.

In the future it would be nice to let the user choose single page numbers (or a range of page numbers) to be converted to JPG, as well as set the image compression factor, but currently I have no time to implement this.
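The page selection mentioned above could start from something like the following. This is only a rough Python sketch, not part of the published script: the function name `parse_page_spec` and the input syntax ("1,3-5,8") are assumptions made for illustration.

```python
def parse_page_spec(spec, page_count):
    """Return a sorted list of 1-based page numbers selected by spec.

    spec is a hypothetical user input such as "1,3-5,8";
    page_count is the total number of pages in the PDF.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            first, last = int(first), int(last)
        else:
            first = last = int(part)
        if first < 1 or last > page_count or first > last:
            raise ValueError("invalid page range: %r" % part)
        pages.update(range(first, last + 1))
    return sorted(pages)


print(parse_page_spec("1,3-5,8", 10))  # [1, 3, 4, 5, 8]
```

Non-numeric input raises a ValueError from int(), so a caller could catch that single exception type and ask the user again.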

This is the modified Python script responsible for the PDF to JPG conversion step.

This is the AppleScript droplet code that executes the above Python script on every dropped PDF document:


-- created: 01.04.2008
-- version: 0.1
-- tested on:
-- • Mac OS X 10.5.2
-- • Intel & PowerPC based Macs

-- This script will convert dropped PDF files to JPG images.
-- The JPG images are saved in the same folder as the
-- PDF source files. If a JPG file already exists,
-- it won't be replaced. The PDF source files are
-- not modified. Until now, all pages of a PDF file
-- are converted to JPG. It would be a nice feature,
-- if the user could also choose only certain page
-- numbers to be converted. Future?

property mytitle : "PDF2JPG"
property batchresolution : missing value

-- I am called when the user drops Finder items onto the script icon
on open droppeditems
	my main(droppeditems)
end open

-- I am called when the user double clicks the script icon
on run
	set infomsg to "I am a hungry AppleScript droplet, so please drop a bunch of PDF files onto my icon to convert them to JPG images."
	my dspinfomsg(infomsg)
	return
end run

-- I am the main function controlling the script flow
on main(droppeditems)
	try
		-- initializing important script properties
		set batchresolution to missing value
		-- searching the dropped items for PDF files
		set pdfpaths to my getpdfpaths(droppeditems)
		-- no PDF files found :(
		if pdfpaths is {} then
			set errmsg to "You did not drop any PDF documents onto the script."
			my dsperrmsg(errmsg, "--")
			return
		end if
		-- processing the PDF files
		repeat with pdfpath in pdfpaths
			-- getting the image resolution to be used for the PDF2JPG conversion
			if batchresolution is missing value then
				set resolution to my askforresolution(pdfpath)
			else
				set resolution to batchresolution
			end if
			-- did the user provide a resolution?
			if resolution is not missing value then
				-- yes, so let's convert the PDF to JPG
				my pdf2jpg(pdfpath, resolution)
			end if
		end repeat
		-- catching unexpected errors
	on error errmsg number errnum
		my dsperrmsg(errmsg, errnum)
	end try
end main

-- I am searching the dropped items for PDF files
-- and return a list of unquoted Posix file paths
on getpdfpaths(droppeditems)
	set pdfpaths to {}
	repeat with droppeditem in droppeditems
		set iteminfo to info for droppeditem
		if folder of iteminfo is false and name extension of iteminfo is "pdf" then
			set pdfpaths to pdfpaths & (POSIX path of (droppeditem as Unicode text))
		end if
	end repeat
	return pdfpaths
end getpdfpaths

-- I am returning the Posix path to the Python script
-- responsible for the PDF manipulation, which is
-- located in the application bundle
on getpyscriptpath()
	set pyscriptpath to ((path to me) as Unicode text) & "Contents:Resources:pdflib.py"
	return (POSIX path of pyscriptpath)
end getpyscriptpath

-- I am returning the total page count of a given PDF file
-- [PDF file path must be passed as an unquoted Posix path]
on getpagecount(pdfpath)
	set action to "getpagecount"
	set cmd to "python" & space & quoted form of (my getpyscriptpath()) & space & action & space & quoted form of pdfpath
	set cmd to cmd as «class utf8»
	set pagecount to (do shell script cmd) as integer
	return pagecount
end getpagecount

-- I am converting a given PDF file to JPG
-- [PDF file path must be passed as an unquoted Posix path]
on pdf2jpg(pdfpath, resolution)
	set action to "pdf2jpg"
	set cmd to "python" & space & quoted form of (my getpyscriptpath()) & space & action & space & quoted form of pdfpath & space & resolution
	set cmd to cmd as «class utf8»
	do shell script cmd
end pdf2jpg

-- I am asking the user to provide a value for the resolution
on askforresolution(pdfpath)
	set msg to "Please enter a resolution used for the JPG conversion of the following PDF file (72-600):"
	try
		tell me
			display dialog msg default answer "72" buttons {"Use value for batch", "Cancel", "Enter"} default button 3 with title mytitle
			set dlgresult to result
		end tell
	on error errmsg number errnum
		-- user hit 'Cancel' button :(
		if errnum is equal to -128 then
			return missing value
		end if
	end try
	set resolution to text returned of dlgresult
	-- empty input...asking again :)
	-- (the result of every repeated call is returned to the caller)
	if resolution is "" then
		return my askforresolution(pdfpath)
	else
		try
			-- can the input be coerced to an integer?
			set resolution to resolution as integer
		on error
			-- no, it can't...
			set errmsg to "The entered resolution is not a number."
			my dsperrmsg(errmsg, "--")
			return my askforresolution(pdfpath)
		end try
		-- is the given resolution valid?
		if resolution > 600 then
			-- no, it's too high...
			set errmsg to "The entered resolution (" & resolution & ") exceeds the maximum value (600)."
			my dsperrmsg(errmsg, "--")
			return my askforresolution(pdfpath)
		else if resolution < 0 then
			-- no, it's too low...
			set errmsg to "The entered resolution (" & resolution & ") is a negative value."
			my dsperrmsg(errmsg, "--")
			return my askforresolution(pdfpath)
		else
			-- finally...
			if button returned of dlgresult is "Use value for batch" then
				set batchresolution to resolution
			end if
			return resolution
		end if
	end if
end askforresolution

-- I am displaying info messages
on dspinfomsg(infomsg)
	tell me
		activate
		display dialog infomsg buttons {"OK"} default button 1 with icon note with title mytitle
	end tell
end dspinfomsg

-- I am displaying error messages, hopefully rather seldom :)
on dsperrmsg(errmsg, errnum)
	set msg to "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")"
	tell me
		activate
		display dialog msg buttons {"OK"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg

Hello,

This script is great, it is just what I needed…except…
How difficult is it to change 2 things:

1-Have it choose a folder of PDFs, rather than drop individual files onto it
2-Have it place the JPGs, when done, into a folder called “Babs JPEG Folder” on the desktop, keeping the same name other than adding the jpg extension. My PDFs will always be one page only, so no need to add the “-n”

thanks!!
babs

Hi babs,

I have modified the above version a bit to fit your needs. You can find the updated script right here.

If you start the script with a double click it will now ask you to choose a folder and process the PDF files found therein.

Moreover it will try to save all JPG files in a folder named “Babs JPEG Folder” located on the desktop. The created JPG files don’t contain the page number extension.
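For anyone adapting the Python side themselves, the file naming described here could be sketched roughly as follows. The helper name `jpg_path` and the exact "-n" suffix format are assumptions for illustration; the real script may build its names differently.

```python
import posixpath


def jpg_path(pdfpath, outdir, page=None):
    """Return the JPG path for a PDF, optionally with a page suffix.

    With page=None the name is just "<name>.jpg"; otherwise the
    (assumed) multi-page form "<name>-<page>.jpg" is used.
    """
    base = posixpath.splitext(posixpath.basename(pdfpath))[0]
    if page is None:
        name = base + ".jpg"
    else:
        name = "%s-%d.jpg" % (base, page)
    return posixpath.join(outdir, name)


print(jpg_path("/Users/babs/in/flyer.pdf",
               "/Users/babs/Desktop/Babs JPEG Folder"))
# /Users/babs/Desktop/Babs JPEG Folder/flyer.jpg
```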

Hope it helps.

Best regards from busy Berlin

Martin

Hello Martin from Busy Berlin :wink:

Many thanks!!! Works great!

Thank you so much!
babs :smiley:

Martin, this script is excellent. We are trying to adapt it in our production department so that it works in an automated workflow named Odystar from EskoArtwork. But to work, it must be scripted in a specific way. The Odystar workflow system has an AppleScript “activity” that you link a script to, and when a PDF hits that activity, it calls the script and uses any script-specific parameters entered in the activity dialog. Here is what the Odystar manual says about specific scripting parameters:
The scripts have to contain at least three functions:
• “SetupOdystarScript”,
• “FinalizeOdystarScript”,
• “DoOdystarAction”.
The first two functions don’t use arguments, while the “DoOdystarAction”
function has three arguments:
• an input file specification,
• an output file specification,
• a “general purpose parameter”.
As an example, the simplest script is:
on DoOdystarAction(inInputFile, inOutputFile, inParameter)
	return "Success"
end DoOdystarAction

on SetupOdystarScript()
	return "Success"
end SetupOdystarScript

on FinalizeOdystarScript()
	return "Success"
end FinalizeOdystarScript
The specified script will be compiled the first time a Job needs to be
processed in the Process Folder. After compilation, the function
“SetupOdystarScript” will be called.
Then, for each file that needs to be processed, the function “DoOdystarAction”
will be called, and the three parameters will be filled in. The script
can perform whatever it needs to do, and return a status string.
When the “Run AppleScript” Gateway quits, the procedure “FinalizeOdystarScript”
will be called.
As long as the Gateway remains active, and the script is not modified
by the user, the script will remain in memory, and the state between
two calls to “DoOdystarAction” will be unmodified.
This means for example that the state of the script at the first call of
“DoOdystarAction” will be the same as at the end of the “SetupOdystarScript”
procedure, and that the state when “FinalizeOdystarScript”
is called will be the same as the last call to
“DoOdystarAction”.
When a script is modified by the user, it will be reloaded when it is
changed on disk. This means changes to a script will indeed be
loaded if you edit the script and save it to disk.
Each of the above procedures has to return a status string, and the
status string should have one of the following values:
• “Success” tells the Gateway that the script code was executed
successfully. The Job will be sent to the success path.
• “Problem” tells the Gateway that the script code was executed,
but some (non fatal) problems were detected. The Job will be
sent to the problem path.
• “Error” (or any other value different from “Success” or “Problem”)
tells the Gateway that the script code was executed with
an error. The Job will be sent to the error path.
2.2. Additional Commands
On top of this basic functionality, the following functions can be
used to access and manipulate the content of the Job ticket. These
commands must be sent to the “Run AppleScript” application.
• “current parameter block” returns a “handle” to the parameters
of the current Process Folder, which is a Process Folder derived
from the “Run AppleScript” Gateway.
• “parameter block of Process Folder ” will get the
parameter block associated with the specified Process Folder,
and it will copy those parameters in the current ticket, and then
it returns a handle to the inserted parameter block.
• “close parameter block ” will “close” the specified parameter
block, and the data will be saved if that parameter
block was modified.
• “get item of ” returns a “handle” to the
item specified by (which is string) in the hierarchy
starting from the location specified by .
• “set value of ” will change the value of the
specified parameter to the specified value.
• “get value of ” will return the value of the specified
item as a string. This will return an error if the current item
is a group.
An example:
tell application "Run AppleScript"
	-- first we need to get the parameters of the trapper.
	set trapParameters to parameter block of Process Folder "Trap"
	-- now we have a reference to the parameter
	-- block of a trapping process folder. now we
	-- need to get the actual parameter.
	-- we first need the "Trapping" section
	set group to item "Trapping" of trapParameters
	-- and from the trapping group, we need the distance parameter
	set param to item "Distance" of group
	-- and now that we have a reference to the parameter,
	-- we can change the value of the parameter
	set value of param to "2.0"
end tell

I can send you the info if you are interested in trying to adapt your script for our use. It would be extremely helpful. Let me know what you think. Thanks so much!!

Model: MacPro
Browser: Safari 533.19.4
Operating System: Mac OS X (10.6)

Hi everybody,

Finally I found some time to write a native command line tool, which does the PDF page to JPEG image conversion on Mac OS X 10.4/5/6 and hopefully also on the upcoming Mac OS X 10.7 :smiley:

The AppleScript now also allows you to specify certain page numbers to be processed and to choose a folder in which to save the produced JPEG files. Plus, the conversion process is a lot faster.

If this is good news for you, you can download the AppleScript for free right here:

Download for Mac OS X 10.4
Download for Mac OS X 10.5 and higher

Happy scripting!

Hi Martin,

I have just downloaded your PDF2JPEG.app and tested it:
it is very quick, comfortable, and effective in creating presentable results.

Only my HD is complaining: 15.6 MB of PDFs become 298 MB of JPGs.

Thank you so much

Peter

The Script works GREAT! Ever since I Downloaded Lion I have been looking for a substitute for Port Peg, and I believe I found it. I just have the same request as someone previously did, and that is being able to drop folders into the script. If this is easy for you to do then I would greatly appreciate it, if not I still greatly appreciate what you have already done.

Hello,
The script works very well. Could you add an option to convert color images to grayscale?

Hi,

This script is very fast. Unfortunately the resolution doesn’t work at all. Whatever value I set, I always get a 72 dpi JPG :frowning:
It passes the resolution value to the .py script, and in the .py I can see these lines:

def pdf2jpg(pdfpath, resolution=72):

    rf = resolution / 72.0
    pdfimage.setSize_((width*rf, height*rf))

but I can’t tell where the problem is, since I don’t know Python at all. Even changing the default value to another one makes no difference.
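To make the arithmetic in those two lines easier to follow, here is a small standalone Python sketch. It only reproduces the scale factor; the function name `scaled_pixel_size` is made up for illustration and nothing here calls the real PDF code. If the script works this way, the pixel dimensions grow with the chosen resolution, while the dpi value stored in the JPG might still read 72. That last point is an assumption, not something checked against the script.

```python
def scaled_pixel_size(width_pt, height_pt, resolution=72):
    """Pixel dimensions of a page given in points (1 pt = 1/72 inch)."""
    rf = resolution / 72.0  # same scale factor as in the quoted lines
    return (int(round(width_pt * rf)), int(round(height_pt * rf)))


# A US Letter page measures 612 x 792 points.
print(scaled_pixel_size(612, 792))       # (612, 792)
print(scaled_pixel_size(612, 792, 150))  # (1275, 1650)
print(scaled_pixel_size(612, 792, 300))  # (2550, 3300)
```

So comparing the pixel dimensions of two output files, rather than the dpi figure shown by an image viewer, may be the more reliable check.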

So the sips command could be an option for one-page PDFs. But the correct command is the following:

sips "in.pdf" -s format jpeg --out "out.jpg"
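If you would rather call sips from Python than from the shell, the command above could be wrapped roughly like this. It is a sketch only: the helper names `sips_command` and `convert_first_page` are made up, and the conversion itself only works on a Mac, where sips is installed. Passing the arguments as a list avoids quoting problems with file names that contain spaces.

```python
import subprocess
import sys


def sips_command(src, dst):
    """Build the argument list for the sips call shown above."""
    return ["sips", src, "-s", "format", "jpeg", "--out", dst]


def convert_first_page(src, dst):
    """Run sips on src and write a JPEG to dst (macOS only)."""
    if sys.platform != "darwin":
        raise RuntimeError("sips is only available on macOS")
    subprocess.run(sips_command(src, dst), check=True)


print(sips_command("My Report.pdf", "My Report.jpg"))
```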

UPDATE:
You can find a more recent script here:

http://www.macionette.com/blog/?page_id=5