The latest version of ABBYY FineReader for OS X now supports AppleScript. I'm trying to write a simple script that runs OCR on a given PDF document and saves the result as a searchable PDF. Unfortunately, I’m a beginner in AppleScript and can’t get it to work, and I couldn’t find any further documentation or samples for scripting ABBYY FineReader.
I managed to open FineReader's scripting dictionary. It has this command:
export to pdf v : Converts the current document to a PDF file. If FineReader is running in a Sandbox, the file will be saved to a temporary directory.
export to pdf file : NO_DESCRIPTION
[ocr languages enum language list type] : List of recognition languages that includes language identifiers and full language names.
[saving type save settings enum] : Specifies file creation settings for saving results.
[export mode pdf layout] : Specifies export mode.
[keep page numbers headers and footers boolean] : Keeps headers, footers and page numbers.
[page size page size enum] : Specifies paper size.
[keep pictures boolean] : Keeps pictures in recognized document.
[image quality image quality enum] : Specifies quality of pictures in output file.
[keep text and background colors boolean] : Keeps background and character colors.
[use mrc boolean] : Compresses the output file significantly while retaining high quality of text and images.
[make pdfa boolean] : Creates a searchable PDF document that is well suited for archiving.
[create outline boolean] : Creates a table of contents in a PDF file based on headings.
[enable pdf tagging boolean] : Enables PDF tags.
[embed fonts boolean] : Embeds fonts from the document in the e-book.
→ file :
I tried this script:
tell application "FineReader OCR Pro"
    export to pdf "<path to pdf>"
end tell
However, I get the output “missing value”. What is wrong?
Thanks,
Markus
I actually found out that the path passed to “export to pdf” is the output file. If I load a PDF for OCR in the FineReader user interface and then run the AppleScript, it works fine.
So you are right, “export to pdf” assumes that a current document is already loaded. However, I cannot find a way to load a document via AppleScript. It looks to me like this interface is incomplete.
If you look closely at the “export to pdf” definition, you will see an optional “from file” parameter. This is where you specify the path to the PDF file to OCR. So a very simple script that adds an OCR layer to an existing PDF file looks like this:
tell application "FineReader"
    export to pdf "/Path/to/filename/File_to_OCR.pdf" from file "/Path/to/filename/File_to_OCR.pdf"
end tell
Just tested this in AS editor with FineReader Pro 12.1.1, works flawlessly!
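For more than one file, the same call can be wrapped in a loop. This is only a sketch, assuming a build whose dictionary has the “from file” parameter; the paths are placeholders, and the 600-second value is an arbitrary choice to keep a long OCR run from hitting AppleScript's default two-minute Apple event timeout.

```applescript
set pdfPaths to {"/Path/to/first.pdf", "/Path/to/second.pdf"} -- placeholder paths

repeat with pdfPath in pdfPaths
    set p to contents of pdfPath -- dereference the loop item to a plain string
    try
        with timeout of 600 seconds -- arbitrary limit; the default is 120 seconds
            tell application "FineReader"
                -- same path for input and output, as in the example above
                export to pdf p from file p
            end tell
        end timeout
    on error errMsg number errNum
        -- report the failure and continue with the next file
        log "OCR failed for " & p & ": " & errMsg & " (" & errNum & ")"
    end try
end repeat
```

The try block keeps one failing file from aborting the whole batch; the messages show up in Script Editor's log.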
HTH,
Hi Stefan,
thanks for the answer. It makes sense, but it leaves me scratching my head.
I checked - I also have FineReader Pro 12.1.1 installed. But there is no “from file” parameter (compare the doc I attached to the first post).
If I try your code, it says ‘Expected end of line, etc. but found “from”’.
I reinstalled FineReader, but no change.
Can you post the doc you have of “export to pdf”?
Thanks
Markus
Thanks, this was the missing piece. I have the App Store version. I installed the dmg version instead, and the missing parameter is there. Apparently, 12.1.1 is not 12.1.1. I will contact support.
OK, ABBYY Support answered. I will try it out and post the results.
set testPath to "/Volumes/data/_FR/auto_test_appstore"
set fromFile to POSIX file (testPath & "/1/")
set appFile to POSIX file "/Applications/FineReader OCR Pro.app"
set exportDir to testPath & "/out/" -- the dir must exist before script run
set exportFileName to "res2.docx"
using terms from application "FineReader OCR Pro"
    set langList to {English, German}
    set saveType to single file
end using terms from
using terms from application "FineReader OCR Pro"
    set toFile to POSIX file (exportDir & exportFileName)
    set retainLayoutWordLayout to as editable copy
    set keepPageNumberHeadersAndFootersBoolean to yes
    set keepLineBreaksAndHyphenationBoolean to yes
    set keepPageBreaksBoolean to yes
    set pageSizePageSizeEnum to automatic
    set increasePaperSizeToFitContentBoolean to yes
    set keepImageBoolean to yes
    set imageOptionsImageQualityEnum to balanced quality
    set keepTextAndBackgroundColorsBoolean to yes
    set highlightUncertainSymbolsBoolean to yes
    set keepPageNumbersBoolean to yes
end using terms from
WaitWhileBusy()
tell application "FineReader OCR Pro"
    set hasdoc to has document
    if hasdoc then
        close document
    end if
end tell
WaitWhileBusy()
tell application "FineReader OCR Pro"
    set auto_read to auto read new pages false
end tell
tell application "Finder"
    open fromFile ¬
        using appFile
end tell
delay 5
WaitWhileBusy()
tell application "FineReader OCR Pro"
    export to docx toFile ¬
        ocr languages enum langList ¬
        saving type saveType ¬
        retain layout retainLayoutWordLayout ¬
        keep page numbers headers and footers keepPageNumberHeadersAndFootersBoolean ¬
        keep line breaks and hyphenation keepLineBreaksAndHyphenationBoolean ¬
        keep page breaks keepPageBreaksBoolean ¬
        page size pageSizePageSizeEnum ¬
        increase paper size to fit content increasePaperSizeToFitContentBoolean ¬
        keep pictures keepImageBoolean ¬
        image quality imageOptionsImageQualityEnum ¬
        keep text and background colors keepTextAndBackgroundColorsBoolean ¬
        highlight uncertain characters highlightUncertainSymbolsBoolean ¬
        keep line numbers keepPageNumbersBoolean
end tell
WaitWhileBusy()
-- moving exported file if FineReader is sandboxed --
tell application "FineReader OCR Pro"
    set sandb to is sandboxed
end tell
if sandb then
    tell application "FineReader OCR Pro"
        set outputDir to get output dir
    end tell
    --set POSIX_exportFile to ((outputDir as string) & exportFileName)
    set POSIX_exportDir to POSIX file exportDir
    tell application "Finder"
        set the_files to files of folder outputDir
        repeat with this_file in the_files
            duplicate this_file to POSIX_exportDir replacing yes
        end repeat
    end tell
end if
-- END moving exported file --
tell application "FineReader OCR Pro"
    auto read new pages auto_read
    close document
    quit
end tell
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
on WaitWhileBusy()
    repeat while IsMainApplicationBusy()
        delay 0.5 -- pause between polls instead of spinning at full CPU
    end repeat
end WaitWhileBusy
on IsMainApplicationBusy()
    tell application "FineReader OCR Pro"
        set resultBoolean to is busy
    end tell
    return resultBoolean
end IsMainApplicationBusy
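If FineReader ever hangs, the wait loop above never returns. A possible variant that gives up after a limit is sketched here; the handler name WaitWhileBusyWithTimeout, the error number 1000 and the half-second poll interval are my own choices, not part of the ABBYY script. It reuses IsMainApplicationBusy() from above.

```applescript
-- Hypothetical variant of WaitWhileBusy: raises an error after maxSeconds.
on WaitWhileBusyWithTimeout(maxSeconds)
    set startTime to current date
    repeat while IsMainApplicationBusy()
        -- subtracting two dates yields the elapsed time in seconds
        if ((current date) - startTime) > maxSeconds then
            error "FineReader still busy after " & maxSeconds & " seconds" number 1000
        end if
        delay 0.5 -- arbitrary poll interval
    end repeat
end WaitWhileBusyWithTimeout
```

Calling WaitWhileBusyWithTimeout(300) in place of WaitWhileBusy() would stop the script with an error after five minutes instead of waiting indefinitely.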