Scripting ABBYY FineReader with AppleScript

markusloffler · January 10, 2015, 10:57am

The latest version of ABBYY FineReader for OS X now supports AppleScript. I’m trying to write a simple script that runs OCR on a given PDF document and saves the result as a PDF to make it searchable. Unfortunately, I’m a beginner in AppleScript and can’t get it to work. I couldn’t find any documentation or samples for scripting ABBYY FineReader.

I managed to open FineReader’s scripting dictionary; it has this command:

export to pdf v : Converts the current document to a PDF file. If FineReader is running in a Sandbox, the file will be saved to a temporary directory.
export to pdf file : NO_DESCRIPTION
[ocr languages enum language list type] : List of recognition languages that includes language identifiers and full language names.
[saving type save settings enum] : Specifies file creation settings for saving results.
[export mode pdf layout] : Specifies export mode.
[keep page numbers headers and footers boolean] : Keeps headers, footers and page numbers.
[page size page size enum] : Specifies paper size.
[keep pictures boolean] : Keeps pictures in recognized document.
[image quality image quality enum] : Specifies quality of pictures in output file.
[keep text and background colors boolean] : Keeps background and character colors.
[use mrc boolean] : Compresses the output file significantly while retaining high quality of text and images.
[make pdfa boolean] : Creates a searchable PDF document that is well suited for archiving.
[create outline boolean] : Creates a table of contents in a PDF file based on headings.
[enable pdf tagging boolean] : Enables PDF tags.
[embed fonts boolean] : Embeds fonts from the document in the e-book.
→ file :

I tried this script:

tell application "FineReader OCR Pro"
    export to pdf "<path to pdf>"
end tell

However, I get the output “missing value”. What is wrong?
Thanks,
Markus

kel1 · January 11, 2015, 8:59pm

It looks like you don’t need the “<path to pdf>” argument, because the description says it exports the current document.

Edit: try looking for “current pdf” in the dictionary.

gl,
kel

markusloffler · January 12, 2015, 9:07pm

I actually found out that the argument is the output file. If I load a PDF for OCR in the FineReader user interface and then run the AppleScript, it works fine.

So you are right: it assumes that a document is already loaded when you use “export to pdf”. However, I cannot find a way to load a document via AppleScript. It looks to me like this interface is incomplete.

Thanks anyway!

MacScriptorius · January 14, 2015, 10:05am

Markus,

if you look at the “export to pdf” definition closely, you will see an optional “from file” parameter. That is where you specify the path of the PDF file to OCR. So a very simple script to add an OCR layer to an existing PDF file looks like this:

tell application "FineReader"
	export to pdf "/Path/to/filename/File_to_OCR.pdf" from file "/Path/to/filename/File_to_OCR.pdf"
end tell

Just tested this in AppleScript Editor with FineReader Pro 12.1.1; it works flawlessly!
HTH,

Stefan
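Because the script above passes the same path as both the direct parameter and “from file”, the OCR result is written over the original scan. If you would rather keep the original, a small pure-AppleScript helper can derive a separate output path first. This is only a sketch: the handler name `ocrOutputPath` and the `_ocr` suffix are hypothetical choices for illustration, not part of FineReader’s dictionary, and it only handles plain POSIX path strings.

```applescript
-- Hypothetical helper (not part of FineReader): turn "/dir/scan.pdf" into
-- "/dir/scan_ocr.pdf" so the OCR result does not overwrite the input file.
on ocrOutputPath(inPath)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set parts to text items of inPath
	set stem to inPath
	-- Only strip the last ".xxx" when it belongs to the file name,
	-- not when the only dot sits inside a folder name.
	if (count of parts) > 1 and (last item of parts) does not contain "/" then
		set stem to (items 1 thru -2 of parts) as text -- rejoined with "."
	end if
	set AppleScript's text item delimiters to oldDelims -- always restore
	return stem & "_ocr.pdf"
end ocrOutputPath
```

With the direct-sales version, you would call the helper before the `tell` block (`set outPath to ocrOutputPath(inPath)`) and then use `export to pdf outPath from file inPath` inside it; the App Store build lacks “from file”, as the rest of the thread shows.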

markusloffler · January 14, 2015, 10:39am

Hi Stefan,
thanks for the answer. It makes sense, but it leaves me scratching my head.
I checked: I also have FineReader Pro 12.1.1 installed, but there is no “from file” parameter (compare the dictionary entry in my first post).
If I try your code, it says ‘Expected end of line, etc. but found “from”’.
I reinstalled FineReader, but nothing changed.
Can you post your dictionary entry for “export to pdf”?
Thanks
Markus

MacScriptorius · February 2, 2015, 4:36pm

Strange.

This is how the entry looks in my installation:

export to pdf v : Saves the current document as a PDF file. If FineReader is running in a sandbox, the file will be saved to a temporary directory.
export to pdf file : NO_DESCRIPTION
[from file file] :
[ocr languages enum language list type] : The list of recognition languages includes language identifiers and full language names.
[saving type save settings enum] : Specifies the file creation settings for saving the results.
[export mode pdf layout] : Specifies the export mode.
[keep page numbers headers and footers boolean] : Headers, footers and page numbers are retained.
[page size page size enum] : Specifies the paper size.
[keep pictures boolean] : Retains pictures in the recognized document.
[image quality image quality enum] : Specifies the image quality in the output file.
[keep text and background colors boolean] : Retains the background and character colors.
[use mrc boolean] : Compresses the output file considerably while maintaining high quality of text and images.
[make pdfa boolean] : Creates a searchable PDF document that is well suited for archiving.
[create outline boolean] : Creates a table of contents in a PDF file based on the headings.
[enable pdf tagging boolean] : Enables PDF tags.
[embed fonts boolean] : Embeds the document’s fonts in the e-book.
→ file :

I have the direct sales version of FineReader.

Cheers,

Stefan

markusloffler · February 2, 2015, 8:13pm

Hi Stefan,

“I have the direct sales version of FineReader.”

Thanks, this was the missing piece. I have the App Store version. I installed the version from the DMG, and the missing parameter is there. Apparently, 12.1.1 is not always 12.1.1. I will contact support.

Have a nice week,
Markus

markusloffler · February 18, 2015, 5:07pm

OK, ABBYY Support answered with the following script. I will try it out and post the results.


set testPath to "/Volumes/data/_FR/auto_test_appstore"

set fromFile to POSIX file (testPath & "/1/") -- input to open in FineReader (here the subfolder "1")
set appFile to POSIX file "/Applications/FineReader OCR Pro.app"

set exportDir to testPath & "/out/" -- this folder must exist before the script runs
set exportFileName to "res2.docx"

-- compile FineReader's enumeration values outside a tell block
using terms from application "FineReader OCR Pro"
	set langList to {English, German}
	set saveType to single file
end using terms from

using terms from application "FineReader OCR Pro"
	set toFile to POSIX file (exportDir & exportFileName)
	set retainLayoutWordLayout to as editable copy
	set keepPageNumberHeadersAndFootersBoolean to yes
	set keepLineBreaksAndHyphenationBoolean to yes
	set keepPageBreaksBoolean to yes
	set pageSizePageSizeEnum to automatic
	set increasePaperSizeToFitContentBoolean to yes
	set keepImageBoolean to yes
	set imageOptionsImageQualityEnum to balanced quality
	set keepTextAndBackgroundColorsBoolean to yes
	set highlightUncertainSymbolsBoolean to yes
	set keepPageNumbersBoolean to yes
end using terms from

WaitWhileBusy()

-- close any document that is already open
tell application "FineReader OCR Pro"
	set hasdoc to has document
	if hasdoc then
		close document
	end if
end tell

WaitWhileBusy()

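-- disable automatic reading of new pages; the value returned here is
-- passed back to "auto read new pages" at the end of the script
tell application "FineReader OCR Pro"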
	set auto_read to auto read new pages false
end tell

-- load the input by having the Finder open it with FineReader
tell application "Finder"
	open fromFile ¬
		using appFile
end tell

delay 5 -- give FineReader time to start loading the pages

WaitWhileBusy()

-- export the recognized document to DOCX at toFile
tell application "FineReader OCR Pro"
	export to docx toFile ¬
		ocr languages enum langList ¬
		saving type saveType ¬
		retain layout retainLayoutWordLayout ¬
		keep page numbers headers and footers keepPageNumberHeadersAndFootersBoolean ¬
		keep line breaks and hyphenation keepLineBreaksAndHyphenationBoolean ¬
		keep page breaks keepPageBreaksBoolean ¬
		page size pageSizePageSizeEnum ¬
		increase paper size to fit content increasePaperSizeToFitContentBoolean ¬
		keep pictures keepImageBoolean ¬
		image quality imageOptionsImageQualityEnum ¬
		keep text and background colors keepTextAndBackgroundColorsBoolean ¬
		highlight uncertain characters highlightUncertainSymbolsBoolean ¬
		keep line numbers keepPageNumbersBoolean
	
end tell

WaitWhileBusy()

-- move the exported file if FineReader is sandboxed --

tell application "FineReader OCR Pro"
	set sandb to is sandboxed
end tell

if sandb then
	
	tell application "FineReader OCR Pro"
		set outputDir to get output dir -- temporary folder the sandboxed app writes to
	end tell
	
	--set POSIX_exportFile to ((outputDir as string) & exportFileName)
	set POSIX_exportDir to POSIX file exportDir
	
	tell application "Finder"
		set the_files to files of folder outputDir
		repeat with this_file in the_files
			duplicate this_file to POSIX_exportDir replacing yes
		end repeat
	end tell
	
end if

-- END moving exported file --

-- restore the previous auto-read setting, then close the document and quit
tell application "FineReader OCR Pro"
	auto read new pages auto_read
	close document
	quit
end tell

---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------

on WaitWhileBusy()
	repeat while IsMainApplicationBusy()
		delay 0.5 -- re-check twice a second instead of in a tight loop
	end repeat
end WaitWhileBusy

on IsMainApplicationBusy()
	tell application "FineReader OCR Pro"
		set resultBoolean to is busy
	end tell
	return resultBoolean
end IsMainApplicationBusy
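`WaitWhileBusy` keeps polling `is busy` until FineReader is idle and has no upper bound on how long it waits, so a stalled recognition job would hang the whole script. The following is a sketch of a variant with a poll interval and a timeout. To keep it testable without FineReader installed, the busy check is delegated to a script object; the names `waitUntilIdle` and `isBusy` are hypothetical, and in real use `isBusy()` would wrap the `is busy` command from the support script above.

```applescript
-- Sketch: poll checker's isBusy() until it returns false, pausing
-- pollSeconds between checks and giving up after timeoutSeconds.
on waitUntilIdle(checker, timeoutSeconds, pollSeconds)
	set startTime to current date
	repeat while checker's isBusy()
		-- subtracting two dates yields a whole number of seconds
		if ((current date) - startTime) > timeoutSeconds then
			error "Still busy after " & timeoutSeconds & " seconds" number -1712
		end if
		delay pollSeconds
	end repeat
	return true
end waitUntilIdle

-- Hypothetical FineReader checker (needs FineReader OCR Pro to compile):
-- script FineReaderChecker
-- 	on isBusy()
-- 		tell application "FineReader OCR Pro" to return is busy
-- 	end isBusy
-- end script
-- waitUntilIdle(FineReaderChecker, 600, 0.5)
```

Passing the check in as a script object keeps the timing logic separate from FineReader, so the same handler could also wait on other applications that expose a similar busy flag.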