PDF Selection

Outside of GUI scripting, how can I capture selected text in a PDF?

CUPS is a printing system installed on macOS that can be used to convert text to PDF. Because of my extremely limited knowledge of CUPS, the only customization I can offer is whether the output is in portrait or landscape mode. That having been said, the following is a quick-and-dirty text-to-PDF converter that may or may not be sufficient for your needs:


-- It's not clear from the post how the selected text is being captured; the following assumes that it has been saved to the clipboard:

set inputText to the clipboard

-- The output PDF file will be saved to the desktop; this can be modified to any desired destination

set outputPath to ((path to desktop as text) & "MyPDFFile.pdf")'s POSIX path

-- The "do shell script" "echo" command writes the text to a temporary text file, which "cupsfilter" transforms and writes to the output PDF file; the "-D" option causes the temporary text file to be deleted after it is read:

do shell script "" & ¬
	"echo " & inputText's quoted form & " >/tmp/MyTempFile.txt" & linefeed & ¬
	"cupsfilter -D /tmp/MyTempFile.txt >" & outputPath's quoted form

-- For the output PDF file to be in landscape rather than portrait mode, simply add the "-o landscape" option:

do shell script "" & ¬
	"echo " & inputText's quoted form & " >/tmp/MyTempFile.txt" & linefeed & ¬
	"cupsfilter -D -o landscape /tmp/MyTempFile.txt >" & outputPath's quoted form


Any other customization of the PDF output (for instance, font name and size, page size, and margins) will have to come from someone with more extensive knowledge of CUPS.
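One detail of the script above is independent of CUPS: it writes the text with echo to a fixed path in /tmp. Depending on the shell, echo may interpret backslash sequences in the text, and a fixed file name can collide if two copies of the script run at once. The following is only a sketch of the shell side that uses printf and mktemp instead; the function name text_to_tempfile and the pdftext prefix are made up for illustration, and the commented-out cupsfilter line assumes that naming the input type with "-i text/plain" is enough when the file has no .txt extension.

```shell
#!/bin/sh
# Hypothetical helper: write the given text to a unique temporary file
# and print that file's path on standard output.
text_to_tempfile() {
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/pdftext.XXXXXX") || return 1
    # printf '%s\n' writes the argument literally, backslashes included
    printf '%s\n' "$1" > "$tmpfile"
    printf '%s\n' "$tmpfile"
}

# Usage sketch; the cupsfilter call mirrors the one in the script above.
# src=$(text_to_tempfile 'Some text with a \n backslash sequence')
# cupsfilter -D -i text/plain "$src" > "$HOME/Desktop/MyPDFFile.pdf"
```

From AppleScript, the same commands could be passed to do shell script, with the text still supplied through inputText's quoted form.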

I found this helpful resource for more CUPS command options. The options are listed with the lp command but seem to work just as well with the cupsfilter command. Multiple “-o [option value]” expressions can be included in the command to combine their effects. Just be sure that there is exactly one space character between any text entries.

To set the media size to Legal (default = Letter):

cupsfilter ... -o media=Legal ...

To set the orientation to landscape (default = portrait):

cupsfilter ... -o landscape ...

To set the number of characters per inch to 16/inch (default = 10):

cupsfilter ... -o cpi=16 ...

To set the number of lines per inch to 8/inch (default = 6):

cupsfilter ... -o lpi=8 ...

To set the number of text columns to 2 columns (default = 1):

cupsfilter ... -o columns=2 ...

To set the page margins, use one or more options as desired; replace [value] by the margin in points, where 1 point = 1/72 inch:

cupsfilter ... -o page-left=[value] -o page-right=[value] -o page-top=[value] -o page-bottom=[value] ...

For example, to set 0.5 inch left and right page margins, and 1.5 inch top and bottom margins:

cupsfilter ... -o page-left=36 -o page-right=36 -o page-top=108 -o page-bottom=108 ...
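
The margin values are just inches multiplied by 72. As a rough sketch of that arithmetic, the two shell functions below (inches_to_points and margin_options are hypothetical names, not part of CUPS) turn inch values into the option string used in the example above:

```shell
#!/bin/sh
# Hypothetical helper (not part of CUPS): convert inches to points,
# where 1 point = 1/72 inch, rounded to the nearest whole point.
inches_to_points() {
    awk -v inches="$1" 'BEGIN { printf "%.0f\n", inches * 72 }'
}

# Hypothetical helper: build the four margin options from inch values
# given in the order left, right, top, bottom.
margin_options() {
    l=$(inches_to_points "$1")
    r=$(inches_to_points "$2")
    t=$(inches_to_points "$3")
    b=$(inches_to_points "$4")
    printf '%s\n' "-o page-left=$l -o page-right=$r -o page-top=$t -o page-bottom=$b"
}

# 0.5 inch left and right, 1.5 inch top and bottom
margin_options 0.5 0.5 1.5 1.5
```

The printed string can be pasted into the cupsfilter command in place of the hand-computed values.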

Putting it all together:


set inputText to the clipboard
set outputPath to ((path to desktop as text) & "MyPDFFile.pdf")'s POSIX path
do shell script "" & ¬
	"echo " & inputText's quoted form & " >/tmp/MyTempFile.txt" & linefeed & ¬
	"cupsfilter -D -o media=Legal -o landscape -o cpi=16 -o lpi=8 -o columns=2 -o page-left=36 -o page-right=36 -o page-top=108 -o page-bottom=108 /tmp/MyTempFile.txt >" & outputPath's quoted form

Selected where?

In an active PDF document, I want to capture the text of the selection and use it as data for an AppleScript. For example, if in one PDF document I select a textual reference to a page number in another PDF document, I can then run an AppleScript to go to that page of the second PDF document.
I have been able to script this with System Events GUI scripting, but I would like to create a script that does not rely on the GUI, either to copy the selected text or to find a page number or text in another PDF document.

Sorry, I misunderstood your request. I thought you meant that the PDF was the target in which to store the selected text. Now I realize that the PDF is the source of the selected text.

How you do it (or whether you can) is going to depend on which application you’re using to view the PDF. Selections are an application thing, not a file thing.

This is an old thread, but bmose’s detailed posts on cupsfilter were a great help on a script I just finished, and I wanted to say thanks.

You’re welcome, peavine. As is evident from my posts, much was learned on the fly. cupsfilter does offer a convenient way to create pdf documents with the coding terseness characteristic of many Unix tools.

Use “Skim” PDF viewer.
https://skim-app.sourceforge.io


-- Created by: Takaaki Naganoya
-- Created on: 2019/09/16

-- Copyright © 2019 Piyomaru Software, All Rights Reserved
-- http://piyocast.com/as/archives/7387

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "AppKit" -- NSPasteboard lives in AppKit

tell application "Skim"
	tell front document
		set aSel to selection
		repeat with i in aSel
			set aCon to contents of i
			set rList to RTF of aCon
			
			-- Accumulate the plain text of each rich-text piece via the clipboard
			set sCon to ""
			repeat with ii in rList
				set the clipboard to ii
				set aText to getClipboardAsText() of me
				set sCon to sCon & aText
			end repeat
			
			-- Returns after the first item of the selection
			set aStr to textfy(sCon) of me
			return aStr
		end repeat
	end tell
end tell

-- Normalize Unicode Text in NFKC
on textfy(aText as string)
	set aStr to current application's NSString's stringWithString:aText
	set aNFKC to aStr's precomposedStringWithCompatibilityMapping()
	return aNFKC as string
end textfy

-- Get Clipboard contents as text
on getClipboardAsText()
	-- get the pasteboard items
	set theClip to current application's NSPasteboard's generalPasteboard()
	set pbItems to theClip's pasteboardItems()
	
	set theStrings to {}
	
	repeat with anItem in pbItems
		if (anItem's types()'s containsObject:(current application's NSPasteboardTypeString)) then
			set end of theStrings to (anItem's stringForType:(current application's NSPasteboardTypeString)) as text
		end if
	end repeat
	
	return theStrings as text
end getClipboardAsText

Model: MacBook Pro 2012
AppleScript: 2.7
Browser: Safari 12.1.2
Operating System: macOS 10.14

Skim.app has a selection property, so getting the selection is straightforward:


tell application "Skim"
	activate
	set aSelection to (selection of document 1) as text
end tell

Simple, plain AppleScript for applications that open PDFs but don’t have a selection property:


set the clipboard to ""

tell application id "com.apple.Preview" -- or "com.lymes.BookReaderPaddle" (BookReader.app),
-- "com.apple.Safari" (Safari.app), "com.google.Chrome" (Google Chrome.app); for Infix PDF Editor (a Windows exe program inside a CrossOver.app bottle), use the name "Infix PDF Editor" instead of its id
	set appName to its name
	activate
	tell application "System Events" to tell process appName
		set visible to true
		keystroke "c" using command down
	end tell
end tell

set {aSelection, maxTimeOut} to {"", 0}

repeat while aSelection = ""
	delay 0.1 -- wait so that maxTimeOut really counts seconds and the copy can reach the clipboard
	set maxTimeOut to maxTimeOut + 0.1
	if maxTimeOut > 1 then return display notification "NOTHING SELECTED." with title "Do selection and run the script again" sound name "Frog"
	set aSelection to (the clipboard) as text
end repeat