Folder Action to auto-OCR files using Acrobat 9 Pro (CS4)

Hi

I hope this is an appropriate place to post this AppleScript, which I wrote yesterday. It appears to work, but I’d appreciate it if someone else would look it over. I searched for a way to have files auto-OCR’d using Acrobat and, surprisingly, couldn’t find anything. I did run across one script, but it didn’t quite work correctly or in the way I wanted, so I cobbled together the following.

To use it, compile it in Script Editor and put a copy in /Library/Scripts/Folder Action Scripts. Then right-click (control-click) a folder in Finder and attach the script to that folder as a Folder Action.

I haven’t figured out how to get my ScanSnap to use this without some help from the user. If I scan directly to this folder, the Folder Action takes over before the scanning is finished, so both the scanner and Acrobat become confused. I tried to get the action to wait until the scanner was done, but it seemed too “hacky” and didn’t seem reliable, so I turned that off.
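For anyone who wants to experiment with the waiting approach anyway, here is a minimal sketch of the idea: poll the file size until two consecutive readings match. The handler name wait_until_stable and its poll_seconds and max_polls parameters are my own placeholders, not part of the script below, and this is the same approach that did not seem reliable with the ScanSnap.

```applescript
-- Hypothetical helper (not part of the posted script): wait until a file's
-- size stops changing between two consecutive polls.
-- poll_seconds and max_polls are illustrative values, not tested settings.
on wait_until_stable(this_item, poll_seconds, max_polls)
	set old_size to -1
	repeat max_polls times
		set the_size to size of (info for this_item)
		-- Two identical readings in a row: assume the writer has finished
		if the_size is equal to old_size then return true
		set old_size to the_size
		delay poll_seconds
	end repeat
	return false -- gave up; the file was still changing
end wait_until_stable
```

It could be called as "my wait_until_stable(this_item, 10, 30)" before processing each item, skipping the item if the result is false. An unchanged size only suggests that the scanner is done; it does not prove it.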

====


(*
PDF - OCR with Acrobat Pro

Derived in part from  
	* Apple sample code, 
	* code from http://www.documentsnap.com/files/OCRIt-Acrobat-1.1.zip,
	* code at http://www.macsparky.com/blog/2009/5/24/pdfpen-ocr-folder-action-script.html

Author: Alan Harper, alan@alanharper.com.
Copyright (c) 2010 Alan Harper. All Rights reserved, except.

Permission to use or copy this script is freely given to all. Please remove my name as author if any changes are made. Give me credit if credit is due.

You need to set your default OCR parameters in Acrobat before running this. They should be sticky and be used every time you OCR a file using this script.
Note that a time-out (600 seconds) is used to allow Acrobat to finish the OCR and save. This might not be long enough for large documents.
*)

property done_foldername : "OCRd PDFs"
property originals_foldername : "Original PDFs"
property newimage_extension : "pdf"
property type_list : {"PDF "}
property extension_list : {"pdf"}


on adding folder items to this_folder after receiving these_items
	tell application "Finder"
		if not (exists folder done_foldername of this_folder) then
			make new folder at this_folder with properties {name:done_foldername}
		end if
		set the results_folder to (folder done_foldername of this_folder) as alias
		if not (exists folder originals_foldername of this_folder) then
			make new folder at this_folder with properties {name:originals_foldername}
			set current view of container window of this_folder to list view
		end if
		set the originals_folder to folder originals_foldername of this_folder
	end tell
	
	try
		repeat with i from 1 to number of items in these_items
			set this_item to item i of these_items
			-- 			(*
			-- 			   Can't OCR until scanning has stopped. Commented out because this code didn't seem to be reliable
			-- 			   Note that the ScanSnap doesn't seem to set the "busy" bit of a file, which is unfortunate
			-- 			 *)
			-- 			set old_size to -1
			-- 			set the_size to 0
			-- 			repeat until old_size = the_size
			-- 				delay 10
			-- 				set the item_info to the info for this_item
			-- 				set old_size to the_size
			-- 				set the_size to size of item_info
			-- 			end repeat
			-- 			delay 5
			set the item_info to the info for this_item
			-- "and" binds tighter than "or", so group the type/extension test explicitly
			if (alias of the item_info is false) and ((the file type of the item_info is in the type_list) or (the name extension of the item_info is in the extension_list)) then
				tell application "Finder"
					my resolve_conflicts(this_item, originals_folder, "")
					set the new_name to my resolve_conflicts(this_item, results_folder, newimage_extension)
					set this_item to (move this_item to the originals_folder with replacing) as alias
					set the source_file to (duplicate this_item to the results_folder with replacing) as alias
				end tell
				my process_item(source_file, results_folder)
			end if
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then
			tell application "Finder"
				activate
				display dialog error_message buttons {"Cancel"} default button 1 giving up after 120
			end tell
		end if
	end try
end adding folder items to

on resolve_conflicts(this_item, target_folder, new_extension)
	tell application "Finder"
		set the file_name to the name of this_item
		set file_extension to the name extension of this_item
		if the file_extension is "" then
			set the trimmed_name to the file_name
		else
			set the trimmed_name to text 1 thru -((length of file_extension) + 2) of the file_name
		end if
		if the new_extension is "" then
			set target_name to file_name
			set target_extension to file_extension
		else
			set target_extension to new_extension
			set target_name to (the trimmed_name & "." & target_extension) as string
		end if
		if (exists document file target_name of target_folder) then
			set the name_increment to 1
			repeat
				set the new_name to (the trimmed_name & "." & (name_increment as string) & "." & target_extension) as string
				if not (exists document file new_name of the target_folder) then
					-- rename the existing conflicting file so the incoming file can keep its name
					set the name of document file target_name of the target_folder to the new_name
					exit repeat
				else
					set the name_increment to the name_increment + 1
				end if
			end repeat
		end if
	end tell
	return the target_name
end resolve_conflicts

on process_item(source_file, results_folder)
	-- NOTE that the variable source_file is a file reference in alias format 
	try
		tell application id "com.adobe.Acrobat.Pro"
			activate
			open source_file
		end tell
		
		tell application "System Events"
			tell application process "Acrobat"
				click the menu item "Recognize Text Using OCR..." of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
				repeat 3 times
					if name of front window is not "Recognize Text" then
						delay 5
					end if
				end repeat
				try
					click radio button "All pages" of group "Pages" of window "Recognize Text"
					click button "OK" of window "Recognize Text"
				end try
			end tell
		end tell
		
		with timeout of 600 seconds
			tell application id "com.adobe.Acrobat.Pro"
				save the front document with linearize
				close the front document
			end tell
		end timeout
		
	on error error_message
		tell application "Finder"
			activate
			display dialog error_message buttons {"Cancel"} default button 1 giving up after 120
		end tell
	end try
end process_item

Hi,
thanks for this script.
Do you also have a version for Adobe Acrobat X Pro? That version no longer has the “Document” menu, so the script does not work with it.
A related question: how do you actually know what all the entries in the script should be? As far as I know, there is no documentation at all for scripting Adobe applications with AppleScript.
For example, your script addresses “menu bar 1” and so on. Is there a trick for finding out, while clicking these things interactively, where they are located, so that you can refer to them in AppleScript?
Recording the actions as AppleScript doesn’t work either.
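One approach that may help with this, sketched under assumptions: System Events can list the UI elements of a running process, and that is one way to discover names such as “menu bar 1”. The process name "Acrobat" is copied from the script above and may be different for Acrobat X, and the menu name "View" is taken from the description in this post.

```applescript
-- Sketch: list Acrobat's menu names via GUI scripting.
-- Requires "Enable access for assistive devices" in System Preferences.
-- ASSUMPTION: the process is named "Acrobat", as in the script above.
tell application "System Events"
	tell application process "Acrobat"
		-- Names of the top-level menus (File, Edit, View, ...)
		set menu_names to name of every menu bar item of menu bar 1
		-- Drill down one level, here into the View menu
		set view_items to name of every menu item of menu "View" of menu bar 1
	end tell
end tell
return {menu_names, view_items}
```

Run from Script Editor with Acrobat open, the result pane shows the names to use in "click menu item …" statements. For panels and dialogs, asking for "entire contents of window 1" inside the same tell block should list their buttons and groups in a similar way, although it can be slow.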

The new Adobe Acrobat X Pro has a “Document processing” entry, accessible from the “View > Tools” menu.
I’m attaching a screenshot to show this:
files.me.com/bjrnfrdnnd2/qqbn7q

Once you click this “Document processing” entry, a panel opens on the right-hand side. It contains the entry I actually want to click (“Optimize Scanned PDF”), which corresponds to “Recognize Text Using OCR…” in Acrobat 8/9.
Clicking “Optimize Scanned PDF” opens a window like this:
files.me.com/bjrnfrdnnd2/x2zuws

which corresponds closely enough to the “Recognize Text Using OCR…” window:
files.me.com/bjrnfrdnnd2/j70vmn

So I think I could adapt your script for Adobe Acrobat X Pro if I knew which commands are needed to open the “Document processing” panel.

Can you help me there?

Model: MacBook Pro 15″, early 2008
AppleScript: 2.1.2
Browser: Firefox 3.6.13
Operating System: Mac OS X (10.6)