Open document in Pages, and export as PDF?

I’m surely making a beginner’s error here, but I’m completely baffled by what should be easy.

I can open a DOCX file in Pages by opening the file’s alias, like this:

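set tmpPosix to POSIX path of (path to temporary items) -- assumed: the DOCX is in the temporary-items folder
set tmpDocPath to (POSIX file (tmpPosix & "temporary.docx")) as string
set tmpDocAlias to tmpDocPath as alias
tell application "Pages"
	open tmpDocAlias
	delay 5
end tell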

(The delay is in there because Pages takes time to convert a DOCX file.)

But I can’t figure out how to export the file as PDF. Everything I’ve tried produces error messages. I thought this should work, but it doesn’t:

export document 1 to POSIX file "/Users/edward/Desktop/output.pdf" as PDF

The error is:

What obvious mistake am I making here?

My understanding is that the error message is misleading.
The problem is that Pages is unable to execute POSIX file “/Users/edward/Desktop/output.pdf” itself, so the file object has to be built outside the tell application "Pages" block.

Try this:

set tmpPosix to POSIX path of (path to desktop)
set tmpDocFile to (POSIX file (tmpPosix & "en intervention.pages")) # this object is accepted by Pages
set PDFFile to POSIX file (tmpPosix & "output.pdf") # this object is accepted by Pages

tell application "Pages"
	open tmpDocFile # I don't have a docx file available
	delay 5
	export document 1 to PDFFile as PDF
end tell

Yvan KOENIG running High Sierra 10.13.5 in French (VALLAURIS, France) Saturday 16 June 2018 17:59:34

Thank you, as always, Yvan.

I seem to have solved this, with your help. One oddity is that Pages seems to be unable to use a POSIX path to open a DOCX file in the temporary-items folder; I can only open it by specifying an alias.

The other oddity is that I seem to need to activate Pages before it will export a PDF. I’ll ask in another thread about running Pages invisibly.

That’s sandboxing in action. Sandboxed apps can’t do anything with paths – they need file objects like aliases.

Out of curiosity, I used LibreOffice to create a docx file.
Here is a script that waits only as long as Pages actually needs to finish opening the document.

set fileName to "temporary.docx"
set tmpPosix to POSIX path of (path to desktop)
set tmpDocFile to (POSIX file (tmpPosix & fileName)) # this object is accepted by Pages
set PDFFile to POSIX file (tmpPosix & "output.pdf") # this object is accepted by Pages
set bareName to my supprime(fileName, {".docx", ".doc", ".txt", ".rtf"}) # remove the extension (with its dot) to get the name of the opened document
tell application "Pages"
	open tmpDocFile
	repeat # loop waiting until the document is really open
		if name of documents contains bareName then exit repeat
		delay 0.5
	end repeat
	export document 1 to PDFFile as PDF
end tell


#=====
(*
removes every occurrence of d in text t
*)
on supprime(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to ""
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end supprime

#=====
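A note on supprime: it removes every occurrence of each listed string, not just a trailing extension, so a name such as "notes.txt.docx" comes back as "notes" rather than "notes.txt". If only the last extension should go, a small plain-AppleScript handler like the hypothetical stripExtension below (not part of Yvan's script) splits on "." and drops the last piece. This is a sketch; a name that starts with a dot and has no other dot would come back empty.

```applescript
-- Hypothetical helper (not from the thread): remove only the last ".ext" from a file name.
on stripExtension(fileName)
	set oTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set parts to text items of fileName
	if (count of parts) > 1 then
		-- rejoin everything except the last piece, using "." again as the glue
		set baseName to (items 1 thru -2 of parts) as text
	else
		-- no dot at all: nothing to strip
		set baseName to fileName
	end if
	-- always restore the caller's delimiters
	set AppleScript's text item delimiters to oTIDs
	return baseName
end stripExtension
```

In Yvan's script, set bareName to my stripExtension(fileName) would take the place of the supprime call. In a script that already loads Foundation, NSString's stringByDeletingPathExtension(), as used in Shane's script, does the same job.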

I suspect only Shane could give you code that doesn't need to activate Pages.
It would require an ASObjC method that extracts the contents of a docx file.

Yvan KOENIG running High Sierra 10.13.5 in French (VALLAURIS, France) Sunday 17 June 2018 13:19:23
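The repeat loop in Yvan's script never gives up: if Pages never lists the document (for example because it is stuck on a dialog), the script waits forever. Below is a sketch of the same polling with an upper bound. The handler name waitForPagesDocument and the half-second polling step are illustrative choices, not anything from the thread, and the limit is whatever the caller passes in.

```applescript
-- Hypothetical helper: wait until Pages lists a document named docName.
-- Returns true as soon as it appears, or false after roughly maxSeconds.
on waitForPagesDocument(docName, maxSeconds)
	set waited to 0
	repeat
		tell application "Pages"
			-- same test as Yvan's loop: is docName among the open documents' names?
			if (name of documents) contains docName then return true
		end tell
		if waited ≥ maxSeconds then return false
		delay 0.5
		set waited to waited + 0.5
	end repeat
end waitForPagesDocument
```

For example, after open tmpDocFile, the export line can be wrapped in if my waitForPagesDocument(bareName, 60) then … end if, with some error reporting in an else branch.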

Getting the contents of the .docx file isn’t so hard, but producing a PDF is more complex:

use AppleScript version "2.4"
use framework "Foundation"
use framework "AppKit"

-- classes, constants, and enums used
property NSAutoPagination : a reference to 0
property NSClipPagination : a reference to 2
property NSThread : a reference to current application's NSThread
property NSPrintJobSavingURL : a reference to current application's NSPrintJobSavingURL
property NSPrintOperation : a reference to current application's NSPrintOperation
property NSPrintSaveJob : a reference to current application's NSPrintSaveJob
property NSURL : a reference to current application's NSURL
property NSString : a reference to current application's NSString
property NSTextView : a reference to current application's NSTextView
property NSPrintInfo : a reference to current application's NSPrintInfo
property NSAttributedString : a reference to current application's NSAttributedString
property NSData : a reference to current application's NSData
property NSUUID : a reference to current application's NSUUID
property NSDictionary : a reference to current application's NSDictionary

property theResult : false -- whether it succeeded or not

using terms from scripting additions
	set thePath to POSIX path of (choose file)
end using terms from

set thePath to NSString's stringWithString:thePath
set newPath to thePath's stringByDeletingPathExtension()'s stringByAppendingPathExtension:"pdf"
set theURL to NSURL's fileURLWithPath:thePath
set attStr to NSAttributedString's alloc()'s initWithURL:theURL options:(NSDictionary's dictionary()) documentAttributes:(missing value) |error|:(missing value)
my saveStyledText:attStr asPDFToFile:newPath

on saveStyledText:styledText asPDFToFile:newPath
	-- create print info for saving to file
	set destURL to NSURL's fileURLWithPath:newPath
	set printInfo to NSPrintInfo's alloc()'s initWithDictionary:(NSDictionary's dictionaryWithObject:destURL forKey:(NSPrintJobSavingURL)) -- sets destination
	printInfo's setJobDisposition:NSPrintSaveJob -- save to file job
	printInfo's setHorizontalPagination:NSClipPagination
	printInfo's setVerticalPagination:NSAutoPagination
	-- get page size and margins
	set pageSize to printInfo's paperSize()
	set theLeft to printInfo's leftMargin()
	set theRight to printInfo's rightMargin()
	set theTop to printInfo's topMargin()
	-- make a very deep text view
	set theView to NSTextView's alloc()'s initWithFrame:{{0, 0}, {(pageSize's width) - theLeft - theRight, 3.0E+38}}
	theView's setHorizontallyResizable:false
	-- put in the text
	theView's textStorage()'s setAttributedString:styledText
	-- size to fit; must be done on the main thread
	if NSThread's isMainThread() then
		theView's sizeToFit()
	else
		theView's performSelectorOnMainThread:"sizeToFit" withObject:(missing value) waitUntilDone:true
	end if
	-- create print operation and run it
	set printOp to NSPrintOperation's printOperationWithView:theView printInfo:printInfo
	printOp's setShowsPrintPanel:false
	printOp's setShowsProgressPanel:false
	if NSThread's isMainThread() then
		set my theResult to printOp's runOperation()
	else
		my performSelectorOnMainThread:"runPrintOperation:" withObject:printOp waitUntilDone:true
	end if
end saveStyledText:asPDFToFile:

on runPrintOperation:printOp -- on main thread
	set my theResult to printOp's runOperation()
end runPrintOperation:

Thanks Shane.

It seems my memory failed me: I thought you had already posted code somewhere to save styled text as PDF.

Surprising detail: in my tests, your script generated a 55,686-byte file encoded with Mac OS X 10.13.5 Quartz PDFContext, while the script using Pages generated only 36,579 bytes with the same encoder.

Yvan KOENIG running High Sierra 10.13.5 in French (VALLAURIS, France) Sunday 17 June 2018 15:41:33

Shane’s script quickly produced a pdf, but with garbled text, such as

How do I convert this back to readable text?

@akim

No idea; here the generated PDF is perfectly readable.

Yvan KOENIG running High Sierra 10.13.5 in French (VALLAURIS, France) Sunday 17 June 2018 18:26:09

Following up on your experience, Yvan, I removed the password protection on the docx. After that, the gibberish resolved, and most of the original docx text came through in the pdf. Thanks.

What ASObjC method would I use to remove Microsoft Word’s password protection?

I previously relied on Microsoft Word’s AppleScript commands, both to open a password-protected docx and then to save it as a PDF. It would be nice to use an ASObjC method instead.

Your memory’s fine – most of that was in a script to convert from html to styled text, and then to PDF, I posted on Apple’s mailing list some time back.

I can only speculate that Pages is setting up the NSPrintInfo differently.

I suspect that’s something proprietary to Microsoft.

One difference between a PDF generated by Shane’s terrific script or by Microsoft Word’s export, on the one hand, and a PDF generated by Pages, on the other, is this:

Footnote numbers in PDFs generated by Pages are live links between the footnote number in the text and the note itself.

Footnote numbers in PDFs generated by Word aren’t live. Shane’s script strips out the footnotes (as does TextEdit, if I remember correctly).

Yes, I’d expect the script to follow TextEdit’s behavior in most, if not all, respects.

That is, a sandboxed app works only with existing files. Thanks for the important information. So we will not use POSIX files that don’t exist yet.

Let’s try it ourselves.
Pages.app: Batch export doc or docx file to different formats:


set DOCXfile to (choose file of type {"doc", "docx"})
set destinationFolder to (path to desktop) as text

tell application "Pages"
	(open DOCXfile)
	set docName to name of document 1
	set EPUBExportFileName to destinationFolder & docName & "_Exported:" & docName & ".epub"
	set TXTExportFileName to destinationFolder & docName & "_Exported:" & docName & ".txt"
	set PDFExportFileName to destinationFolder & docName & "_Exported:" & docName & ".pdf"
	set WORDExportFileName to destinationFolder & docName & "_Exported:" & docName & ".docx"
	set PagesExportFileName to destinationFolder & docName & "_Exported:" & docName & ".pages"
	set RichTXTExportFileName to destinationFolder & docName & "_Exported:" & docName & ".rtf"
end tell

tell application "Finder"
	try
		set theFolder to make new folder at destinationFolder with properties {name:(docName & "_Exported")}
	on error
		set theFolder to folder (destinationFolder & docName & "_Exported")
	end try
	-- the most important block: create each empty target file, one try per file,
	-- so that an already existing file doesn't prevent the remaining ones from being created
	repeat with ext in {".epub", ".txt", ".pdf", ".docx", ".pages", ".rtf"}
		try
			make new file at theFolder with properties {name:(docName & (contents of ext))}
		end try
	end repeat
end tell

with timeout of 600 seconds -- maximum 10 minutes
	tell application "Pages"
		export document 1 to file EPUBExportFileName as EPUB
		export document 1 to file TXTExportFileName as unformatted text
		export document 1 to file PDFExportFileName as PDF
		export document 1 to file WORDExportFileName as Microsoft Word
		export document 1 to file PagesExportFileName as Pages 09
		export document 1 to file RichTXTExportFileName as formatted text
		close document 1 saving no
		quit
	end tell
end timeout
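If, as reported in this thread, sandboxed Pages exports reliably only into a file that already exists, the placeholder files don't have to come from Finder. Standard Additions' open for access creates a file that doesn't exist yet; the hypothetical makeEmptyFile handler below wraps that and returns a file object of the kind Pages accepted in Yvan's scripts. A sketch only, not tested against every Pages version.

```applescript
-- Hypothetical helper: make sure a file exists at posixPath and return it as a file object.
-- An existing file's contents are left untouched (there is no "set eof" here).
on makeEmptyFile(posixPath)
	set fileRef to open for access (POSIX file posixPath) with write permission
	close access fileRef
	return POSIX file posixPath
end makeEmptyFile
```

Usage would look like set PDFFile to makeEmptyFile((POSIX path of (path to desktop)) & "output.pdf"), done outside the tell application "Pages" block, followed by export document 1 to PDFFile as PDF inside it.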

Pages (and presumably other word processors) can open password-protected Word documents. According to Apple, Pages 11 supports passwords in AppleScript. I’m running an older version of Pages (7), so when I open a docx, either manually or with an AppleScript open command, I get a password prompt, which I don’t believe is scriptable.

What’s new in Pages for Mac: Learn about the new features in Pages 11.0 for Mac
https://support.apple.com/en-us/HT207243