Count pages in pdf files

Hi…
I'm looking for a script that can count pages in PDF files.
I have tried many scripts posted here, but none of them quite do the trick or do it right.

My PDF files come from customers, in as many variants as you would expect from customers. :slight_smile:
Therefore, scripts that rely on, for example, set AppleScript's text item delimiters to "/Page /" and so on don't work reliably, as there seem to be too many ways to mark this information in PDF files.

Also, my files are stored on servers, so scripts that rely on Spotlight info via mdls don't work, as I don't have indexed servers.

Are there any good ideas for counting pages without Acrobat software?

I have tried to get Image Events to count them, but so far no luck with that either…

It seems like a simple task, and it's frustrating not to get this to work properly :frowning:

Any help or tips on this one ??

Thanks

Karl Sigvart

Hello,

Maybe try reading the file's metadata attributes:

set p2File to "/path/to/my/portable document file.pdf"
try
	set pageCount to (do shell script "/usr/bin/mdls " & quoted form of p2File & " | /usr/bin/awk '/kMDItemNumberOfPages/{print $3}'") as integer
end try

Hi,

I wrote a small Foundation CLI to do this. The usage is

do shell script "/path/to/PDFPageCounter /path/to/file.pdf"

the source code is very simple

[code]#import <Foundation/Foundation.h>
#import <Quartz/Quartz.h>

int main (int argc, const char * argv[]) {

if (argc != 2){
	printf("Usage: PDFPageCounter path\n");
	return 1;
}
int returnValue = 0;
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
NSString *filePath = [[[NSProcessInfo processInfo] arguments] objectAtIndex:1];
NSURL *docURL = [[NSURL alloc] initFileURLWithPath:filePath];
PDFDocument *doc = [[PDFDocument alloc] initWithURL:docURL];
if (doc) {
	printf("%lu\n", (unsigned long)[doc pageCount]); // pageCount is an NSUInteger
	[doc release];
}
else {
	printf("no valid PDF document\n");
	returnValue = 2;
}
[docURL release];
[pool drain];
return returnValue;

}[/code]
You can download it here: PDFPageCounter

Hi,

I also wrote a foundation tool, which can be downloaded here (source code).

Usage is as follows, works for Mac OS X 10.4 and higher:


set toolpath to ((path to desktop) as text) & "pagenums"
set qtdtoolpath to quoted form of POSIX path of toolpath
set pdffilepath to "Macintosh HD:Users:martin:Desktop:Test.pdf"
set qtdpdffilepath to quoted form of POSIX path of pdffilepath

set command to qtdtoolpath & " -file " & qtdpdffilepath
set output to do shell script command

if output contains "Error" then
	error output
else
	set outputwords to words of output
	set pagecount to last item of outputwords
	display dialog "PDF: " & return & pdffilepath & return & return & "Pages: " & pagecount
end if

Happy counting!

From working with this in the past, the difficulty is that Adobe keeps changing the way page breaks are indicated from one version of the format to the next, so something that works on a version 1 PDF might not work on a version 2. And since most (all?) versions of the PDF format do not have a page count in the header of the document, you have to account for every variation you might encounter. I have also noticed that PDFs created by non-Adobe applications can cause problems as well.

I will have to look at Martin and Stefan's solutions, thanks for posting them.
-Jerry

The mdls way is probably a tad quicker, because the actual calculation has already been done by Spotlight:

	set theResult to (do shell script ("mdls -name kMDItemNumberOfPages " & quoted form of aFilePosix))

might be, but.

:wink:

Ah, right…

Are you opposed to opening the file in Acrobat to count the pages? If not, the script can be pretty simple. And you can open it with “invisible” so the user doesn’t see the document opening. I tested this with Acrobat 9 (CS4) but it should work in CS3 as well (change the tell line to say “Adobe Acrobat Professional”).

set theFile to (path to desktop as text) & "Acrobat_JS_guide.pdf"
tell application "Adobe Acrobat Pro"
	open theFile with invisible
	set numPages to number of pages of document 1
	close document 1
end tell

numPages

Edit: Just re-read your original post - “Any good ideas that can count pages without Acrobat software”.
Never mind…

Model: iMacintel
AppleScript: xCode 3.0
Browser: Safari 525.20.1
Operating System: Mac OS X (10.5)

If I were you, I'd copy the files to a Spotlight-indexed drive, such as a thumb drive or an external hard drive, then use mdls to generate a report and delete the duplicated files once the report has been generated successfully.

Best Regards

McUsr

Here’s a routine that I put together that combines all the advice in this post:


--NOTE THE SHELL SCRIPT MDLS RELIES ON SPOTLIGHT SO THE FILE MUST BE ON THE LOCAL HARD DRIVE FOR THIS SCRIPT TO WORK. SO IF THE USER SELECTS A FILE ON A SERVER, IT IS COPIED TO THE DESKTOP TEMPORARILY, THE PAGES ARE COUNTED AND THEN THE COPY IS DELETED.

set MyDesktop to path to desktop
set MyPDF to choose file with prompt "Please select a PDF file."

--CHECK IF THE PDF IS ON THE LOCAL DRIVE OR EXTERNALLY
if GetRoot(MyDesktop, 1, ":") is not GetRoot(MyPDF, 1, ":") then
	tell application "Finder"
		set MyPDFcopy to (duplicate MyPDF to MyDesktop) as alias --COERCE THE FINDER REFERENCE SO THE HANDLER CAN BUILD A POSIX PATH
		delay 2 --PAUSE TO ALLOW SPOTLIGHT TO INDEX THE PDF
		set PDF_Pages to my GetPDFNoPages(MyPDFcopy)
		delete MyPDFcopy
	end tell
else
	set PDF_Pages to my GetPDFNoPages(MyPDF)
end if

display dialog "There are " & PDF_Pages & " pages in this document."

on GetRoot(MyAlias, N, SP)
	set AppleScript's text item delimiters to SP
	set MyRoot to item N of (text items of (MyAlias as text))
	set AppleScript's text item delimiters to ""
	return MyRoot
end GetRoot

on GetPDFNoPages(MyPDF)
	set This_PDF to quoted form of POSIX path of (MyPDF as text)
	set PDF_Pages to do shell script "/usr/bin/mdls -name kMDItemNumberOfPages" & space & This_PDF
	set PDF_Pages to (GetRoot(PDF_Pages, 2, "kMDItemNumberOfPages = ")) as number
	return PDF_Pages
end GetPDFNoPages
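
One weak point of the routine above is the fixed delay: if Spotlight has not indexed the copy yet, the text splitting and the coercion to a number will fail with an unhelpful error. Here is a hedged sketch of a stricter variant of the handler. The name GetPDFNoPagesChecked is hypothetical, and it assumes that mdls with the -raw option prints (null) when the attribute is missing.

```applescript
on GetPDFNoPagesChecked(MyPDF)
	set This_PDF to quoted form of POSIX path of (MyPDF as text)
	-- -raw asks mdls to print only the attribute value, so no text splitting is needed
	set rawValue to do shell script "/usr/bin/mdls -raw -name kMDItemNumberOfPages" & space & This_PDF
	-- Assumption: an unindexed file yields the literal text (null)
	if rawValue is "(null)" then
		error "Spotlight has no page count for this file (not indexed yet?)."
	end if
	return rawValue as integer
end GetPDFNoPagesChecked
```

The caller could catch that error in a try block and wait a little longer before asking again, rather than relying on a single 2 second pause.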

Here is a piece of code which doesn’t rely upon Spotlight.

use AppleScript version "2.4" # Requires OS X 10.10 or higher
use scripting additions
use framework "Foundation"
use framework "Quartz"

set thePDF to (choose file of type {"pdf"} with prompt "Choose your PDF file:" without multiple selections allowed)

its countPages:(POSIX path of thePDF)

on countPages:posixPath
	set inNSURL to current application's class "NSURL"'s fileURLWithPath:posixPath
	-- make PDF document from the URL
	set theDoc to current application's PDFDocument's alloc()'s initWithURL:inNSURL
	return theDoc's pageCount()
end countPages:

Yvan KOENIG running Sierra 10.12.0 in French (VALLAURIS, France) mardi 25 octobre 2016 12:56:20
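
Since the original question involved many customer files, here is a rough sketch of how the same PDFKit call might be used on several files at once. The handler name safeCountPages: and the report format are made up for illustration. The missing value check is an addition: initWithURL: returns nothing usable when a file cannot be read as a PDF, and asking that for its page count would raise an error.

```applescript
use AppleScript version "2.4" -- OS X 10.10 or higher
use scripting additions
use framework "Foundation"
use framework "Quartz"

set thePDFs to (choose file of type {"pdf"} with prompt "Choose your PDF files:" with multiple selections allowed)

-- Build one "path: count" line per chosen file
set theReport to {}
repeat with aPDF in thePDFs
	set thePath to POSIX path of (contents of aPDF)
	set theCount to its safeCountPages:thePath
	set end of theReport to thePath & ": " & theCount
end repeat

set AppleScript's text item delimiters to linefeed
set theText to theReport as text
set AppleScript's text item delimiters to ""
return theText

on safeCountPages:posixPath
	set theURL to current application's class "NSURL"'s fileURLWithPath:posixPath
	set theDoc to current application's PDFDocument's alloc()'s initWithURL:theURL
	-- A damaged or non-PDF file gives missing value instead of a document
	if theDoc is missing value then return "not a readable PDF"
	return (theDoc's pageCount()) as integer
end safeCountPages:
```

Because nothing here depends on Spotlight, the files can stay on the server volume as long as it is mounted.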

Hi Yvan,

Thanks for this. I've been using this code for a while and put it into a script to run from within InDesign. It generally works, but I often get the following error:

Error Number: -10000
Error String: -[PDFDocument PageCount]: unrecognized selector sent to instance 0x618000418bf0

If I run it from Script Debugger or Script Editor it doesn’t have a problem. Any ideas?
David

You might notice that it says PageCount rather than pageCount. AppleScriptObjC is case-sensitive.

One of the things about InDesign scripting is that all scripts are run in the same AppleScript component instance, which means they all share the same pool of variable names. So you can’t have one script using PageCount and another using pageCount.

So my guess is that you have another script that uses PageCount, and that the problem occurs if you run this script before the one that requires pageCount. The easiest solution is to change the variable in that other script.

That’s true, I’ve used “PageCount” all over the place in different scripts! It’s such a convenient choice of variable name. So “pageCount” seems to be a reserved name in AppleScript ObjC when it comes to PDFs. I’ll have to be careful with this, thanks :slight_smile:

It’s not really a reserved word – you can still use it as a variable – but you must match the case.

As a simple rule, you reduce the chance of problems drastically if you begin variable names with a lowercase letter.

Or use snake_case for AppleScript variables and camelCase for ASObjC selectors.

Yep, that will work – as long as you avoid trailing underscores.

And if the OP has lots of scripts they don’t want to change at this stage, they can also use pipes for the ASObjC selector.
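
A minimal sketch of that last suggestion, reusing the countPages: handler posted earlier in the thread. The pipes around pageCount are meant to stop the compiler from recasing the selector to match a PageCount variable that another script compiled earlier in the same component instance. This has not been tested inside InDesign, and the missing value check is an added assumption about how a failed load should be reported.

```applescript
use AppleScript version "2.4" -- OS X 10.10 or higher
use scripting additions
use framework "Foundation"
use framework "Quartz"

on countPages:posixPath
	set inNSURL to current application's class "NSURL"'s fileURLWithPath:posixPath
	set theDoc to current application's PDFDocument's alloc()'s initWithURL:inNSURL
	if theDoc is missing value then error "Could not open as PDF: " & posixPath
	-- Piped selector: intended to keep the exact case pageCount
	return (theDoc's |pageCount|()) as integer
end countPages:
```

Renaming the clashing variables is still the cleaner fix in the long run; the pipes are more of a stopgap when many existing scripts already use PageCount.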