Export Text of PDF file as RTF file.

I tried 3 options:

  1. My own attempt used Preview, copy/paste and GUI scripting. It turned out to be the least successful, since a) it relies on GUI scripting, and b) TextEdit could not open the resulting RTF file. It could only open the TXT format. So I have scrapped my solution and won't post it here.

  2. The solution proposed by Shane Stanley turned out to be ideal in all respects, and many thanks to him for this. Here it is:


use AppleScript version "2.5"
use framework "Foundation"
use framework "Quartz"
use scripting additions

property |⌘| : a reference to current application
property NSDictionary : a reference to NSDictionary of |⌘|

set aPDF to choose file of type {"pdf"}
-- make source and destination URLs
set anURL to |⌘|'s |NSURL|'s fileURLWithPath:(POSIX path of aPDF)
set destinationURL to anURL's URLByDeletingPathExtension()'s URLByAppendingPathExtension:"rtf"
-- make PDF document
set aPDF to |⌘|'s PDFDocument's alloc()'s initWithURL:anURL
-- get the entire contents as a styled string
set attributedString to aPDF's selectionForEntireDocument()'s attributedString()
-- save it to .rtf
set aLength to attributedString's |length|()
set documentAttributes to NSDictionary's dictionaryWithObject:"NSRTF" forKey:"DocumentType"
set rtfData to attributedString's ¬
	RTFFromRange:(|⌘|'s NSMakeRange(0, aLength)) documentAttributes:documentAttributes
rtfData's writeToURL:destinationURL atomically:true

  3. The solution proposed by Fredrik71 uses the pdftotext utility. I didn’t really like it: although it avoids GUI scripting, it has the same drawback as my solution, namely that TextEdit cannot open the resulting RTF file.
    Purely for the curious, I post it here:

set aPDFpath to POSIX path of (choose file of type "pdf")
set destinationRTFpath to (text 1 thru -4 of aPDFpath) & "rtf"
do shell script "/usr/local/bin/pdftotext -layout -q " & quoted form of aPDFpath & " " & quoted form of destinationRTFpath
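
One likely reason TextEdit rejects that file: pdftotext writes plain text rather than RTF markup, so the output is really a text file with an .rtf name. A minimal variation, assuming the same /usr/local/bin/pdftotext install path as in the script above, gives the output a .txt extension instead (txtPathFor is a hypothetical helper name introduced here for illustration):

```applescript
-- txtPathFor is a hypothetical helper: swaps a trailing "pdf" for "txt"
on txtPathFor(pdfPath)
	return (text 1 thru -4 of pdfPath) & "txt"
end txtPathFor

set aPDFpath to POSIX path of (choose file of type "pdf")
-- assumes pdftotext is installed at /usr/local/bin, as in the script above
do shell script "/usr/local/bin/pdftotext -layout -q " & quoted form of aPDFpath & " " & quoted form of txtPathFor(aPDFpath)
```

This only changes the file name. The text is still unstyled, so it is no substitute for the RTF script in option 2.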

I wondered if someone might post a PDF-to-text version of Shane’s script. I tried replacing rtf with txt, but it didn’t work. Thanks.

use AppleScript version "2.5"
use framework "Foundation"
use framework "Quartz"
use scripting additions

set aPDF to choose file of type {"pdf"}
-- make source and destination URLs
set aURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of aPDF)
set destinationURL to aURL's URLByDeletingPathExtension()'s URLByAppendingPathExtension:"txt"
-- make PDF document
set aPDF to current application's PDFDocument's alloc()'s initWithURL:aURL
-- get the entire contents as plain text
set theString to aPDF's |string|()
-- save it to .txt as UTF-8
theString's writeToURL:destinationURL atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)

Thanks Shane. I ran the script and received the following error:

I double checked and I did copy and paste the entire script. Also, I tested the RTF script on the same PDF and it worked fine.

Should be fixed now.

That works great–thanks.

Shane’s PDF-to-Text script in post 5 works great, but it would help me if the spaces at the beginning of each line were removed. I modified Shane’s script to remove these spaces, and my version appears to work OK, but it involves file reads/writes that are probably unnecessary. Anyway, I wondered if Shane’s script could easily be changed to do this directly. Thanks.

A bit of regular expression work should do it:

use AppleScript version "2.5"
use framework "Foundation"
use framework "Quartz"
use scripting additions

set aPDF to choose file of type {"pdf"}

set aURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of aPDF)
set destinationURL to aURL's URLByDeletingPathExtension()'s URLByAppendingPathExtension:"txt"
set aPDF to current application's PDFDocument's alloc()'s initWithURL:aURL
set theString to aPDF's |string|()
set theString to theString's stringByReplacingOccurrencesOfString:"(\\R)\\h+" withString:"$1" options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()}
set theString to theString's stringByReplacingOccurrencesOfString:"^\\h+" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()}
theString's writeToURL:destinationURL atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)

That works great–thanks.
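
For anyone who wants to try the leading-space removal without a PDF at hand, here is a minimal sketch that runs on a made-up sample string. It is a variation rather than Shane’s exact code: it assumes the ICU inline flag (?m), which makes ^ match at the start of every line, so a single replacement covers both the first line and the lines after each line break. The handler name stripLeadingSpaces and the sample text are hypothetical.

```applescript
use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

-- stripLeadingSpaces is a hypothetical helper name, not part of Shane's script
on stripLeadingSpaces(someText)
	set theString to current application's NSString's stringWithString:someText
	-- (?m) turns on multiline mode, so ^ matches at the start of every line;
	-- \h+ matches one or more horizontal whitespace characters (spaces, tabs)
	set theString to theString's stringByReplacingOccurrencesOfString:"(?m)^\\h+" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()}
	return theString as text
end stripLeadingSpaces

-- made-up sample text standing in for a PDF's extracted string
set sampleText to "   first line" & linefeed & tab & "second line" & linefeed & "third line"
stripLeadingSpaces(sampleText)
```

In the PDF script, the same single replacement could stand in for the two stringByReplacingOccurrencesOfString calls, applied to the result of aPDF's |string|().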