Hi
I hope that this is an appropriate place to post this AppleScript I wrote yesterday. It appears to work, but I’d welcome someone else looking it over. I searched for a way to have files auto-OCR’d using Acrobat, and surprisingly, I couldn’t find anything. I ran across a script, but it didn’t quite work correctly or in the way I wanted it to. So I cobbled together the following.
To use it, compile it in Script Editor and put a copy in /Library/Scripts/Folder Action Scripts. Then right-click (Control-click) a folder in Finder and associate the script with that folder.
I haven’t figured out how to get my ScanSnap to use this without some help from the user. If I scan to this folder, the Folder Action takes over before the scanning is finished, so both the scanner and Acrobat become confused. I tried to get the action to wait until the scanner was done, but it seemed too “hacky” and I turned that off.
====
(*
PDF - OCR with Acrobat Pro
Derived in part from
* Apple sample code,
* code from http://www.documentsnap.com/files/OCRIt-Acrobat-1.1.zip,
* code at http://www.macsparky.com/blog/2009/5/24/pdfpen-ocr-folder-action-script.html
Author: Alan Harper, alan@alanharper.com.
Copyright (c) 2010 Alan Harper. All Rights reserved, except.
Permission to use or copy this script is freely given to all. Please remove my name as author if any changes are made. Give me credit if credit is due.
You need to set your default OCR parameters in Acrobat before running this. They should be sticky and be used every time you run OCR with this script.
Note that a time-out is used to allow Acrobat to finish the OCR. It might not be long enough for large documents.
*)
property done_foldername : "OCRd PDFs"
property originals_foldername : "Original PDFs"
property newimage_extension : "pdf"
property type_list : {"PDF "}
property extension_list : {"pdf"}
on adding folder items to this_folder after receiving these_items
tell application "Finder"
if not (exists folder done_foldername of this_folder) then
make new folder at this_folder with properties {name:done_foldername}
end if
set the results_folder to (folder done_foldername of this_folder) as alias
if not (exists folder originals_foldername of this_folder) then
make new folder at this_folder with properties {name:originals_foldername}
set current view of container window of this_folder to list view
end if
set the originals_folder to folder originals_foldername of this_folder
end tell
try
repeat with i from 1 to number of items in these_items
set this_item to item i of these_items
-- (*
-- Can't OCR until scanning has stopped. Commented out because this code didn't seem to be reliable
-- Note that the ScanSnap doesn't seem to set the "busy" bit of a file, which is unfortunate
-- *)
-- set old_size to -1
-- set the_size to 0
-- repeat until old_size = the_size
-- delay 10
-- set the item_info to the info for this_item
-- set old_size to the_size
-- set the_size to size of item_info
-- end repeat
-- delay 5
set the item_info to the info for this_item
-- Skip aliases; accept the item if either its file type or its extension marks it as a PDF
if (alias of the item_info is false and ((the file type of the item_info is in the type_list) or (the name extension of the item_info is in the extension_list))) then
tell application "Finder"
my resolve_conflicts(this_item, originals_folder, "")
set the new_name to my resolve_conflicts(this_item, results_folder, newimage_extension)
set this_item to (move this_item to the originals_folder with replacing) as alias
set the source_file to (duplicate this_item to the results_folder with replacing) as alias
end tell
my process_item(source_file, results_folder)
end if
end repeat
on error error_message number error_number
if the error_number is not -128 then
tell application "Finder"
activate
display dialog error_message buttons {"Cancel"} default button 1 giving up after 120
end tell
end if
end try
end adding folder items to
on resolve_conflicts(this_item, target_folder, new_extension)
tell application "Finder"
set the file_name to the name of this_item
set file_extension to the name extension of this_item
if the file_extension is "" then
set the trimmed_name to the file_name
else
set the trimmed_name to text 1 thru -((length of file_extension) + 2) of the file_name
end if
if the new_extension is "" then
set target_name to file_name
set target_extension to file_extension
else
set target_extension to new_extension
set target_name to (the trimmed_name & "." & target_extension) as string
end if
if (exists document file target_name of target_folder) then
set the name_increment to 1
repeat
set the new_name to (the trimmed_name & "." & (name_increment as string) & "." & target_extension) as string
if not (exists document file new_name of the target_folder) then
-- rename the existing (conflicting) file so the new one can keep target_name
set the name of document file target_name of the target_folder to the new_name
exit repeat
else
set the name_increment to the name_increment + 1
end if
end repeat
end if
end tell
return the target_name
end resolve_conflicts
on process_item(source_file, results_folder)
-- NOTE that the variable source_file is a file reference in alias format
try
tell application id "com.adobe.Acrobat.Pro"
activate
open source_file
end tell
tell application "System Events"
tell application process "Acrobat"
click the menu item "Recognize Text Using OCR..." of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
repeat 3 times
if name of front window is not "Recognize Text" then
delay 5
end if
end repeat
try
click radio button "All pages" of group "Pages" of window "Recognize Text"
click button "OK" of window "Recognize Text"
end try
end tell
end tell
with timeout of 600 seconds
tell application id "com.adobe.Acrobat.Pro"
save the front document with linearize
close the front document
end tell
end timeout
on error error_message
tell application "Finder"
activate
display dialog error_message buttons {"Cancel"} default button 1 giving up after 120
end tell
end try
end process_item
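====
The commented-out wait loop in the script could be pulled out into its own handler with an upper bound on how long it waits. The following is only a minimal sketch: the handler name wait_until_stable and its parameters poll_seconds and max_polls are hypothetical, and the values in the usage line are illustrative guesses, not tuned for the ScanSnap. It uses the same "info for" call as the script above. As noted in the post, watching the file size did not seem reliable, so treat this as a starting point rather than a fix.

```applescript
-- Hypothetical helper: poll a file until its size stops changing.
-- Returns true once two consecutive polls report the same non-zero size,
-- or false if the size is still changing after max_polls polls.
on wait_until_stable(this_item, poll_seconds, max_polls)
	set old_size to -1
	repeat max_polls times
		set the_size to size of (info for this_item)
		if the_size is equal to old_size and the_size > 0 then return true
		set old_size to the_size
		delay poll_seconds
	end repeat
	return false -- gave up; the file was still growing (or still empty)
end wait_until_stable
```

Inside the repeat loop of the folder action, it might be called as "if my wait_until_stable(this_item, 10, 30) then" before the item is moved and handed to process_item. A scanner that pauses for longer than poll_seconds between pages would still fool this check.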