I wrote a little Foundation Tool CLI, PDFMetadata, which prints out the metadata you requested.
Save it somewhere and use this syntax; you only have to adjust the paths:
set theMetadata to do shell script "'/path/to/PDFMetadata' '/path/to/document.pdf'"
If you'd rather not keep a separate utility or script around: I discovered that the metadata is human-readable in a hexdump of a PDF, which means grep can be used to search inside the file and parse out the metadata.
It looks like Adobe uses <pdf:metadatafieldname> and </pdf:metadatafieldname> to enclose keywords, but also stores them untagged, which might be more difficult to parse.
So, for example, if you had Keyword = “one, two, three”, you’d be looking for: <pdf:Keywords>one, two, three</pdf:Keywords>
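A minimal shell sketch of that kind of search, usable from Terminal or a do shell script call. The function name pdf_xmp_field is hypothetical, and it assumes the XMP packet is stored uncompressed (common, but not guaranteed) and that the value contains no "<" character.

```shell
# Hypothetical helper: print the text between <pdf:FIELD> and </pdf:FIELD>
# in a PDF's embedded XMP metadata, e.g. FIELD=Keywords.
# Assumes the XMP packet is stored uncompressed and the value contains no '<'.
pdf_xmp_field() {
  field="$1"
  file="$2"
  # -a: treat the binary PDF as text; -o: print only the matched element
  LC_ALL=C grep -a -o "<pdf:$field>[^<]*</pdf:$field>" "$file" \
    | head -n 1 \
    | sed -e "s|<pdf:$field>||" -e "s|</pdf:$field>||"
}
```

For a file tagged as in the example, pdf_xmp_field Keywords /path/to/document.pdf should print one, two, three, and nothing at all if the element isn't there.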
This assumes you are looking for one specific metadata field. If you need all of them, things might be different, and you’d have to experiment. You can use this to see what I mean (save it as a droplet):
--
-- Get Hexdump Info v4
-- by Kevin Quosig, 3/28/07
--
-- Used to drag-n-drop files to examine their contents/headers.
--
-- Most code segments courtesy of James Nierodzik of MacScripter
-- http://bbs.applescript.net/profile.php?id=8727
--
--
-- UTILITY HANDLER
--
-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
--save incoming TID state, set new TIDs
set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
--using the specified character as a break point to strip the delimiter out and break the string into items
set being_parsed to text items of parse_me
--switch the TIDs again (replace string)
set AppleScript's text item delimiters to {replace_with_me}
--coerce it back to a string with new delimiters
set parse_me to being_parsed as string
--restore incoming TID state
set AppleScript's text item delimiters to ATID
--return results
return parse_me
end searchNreplace
--
-- MAIN HANDLER
--
on open fileList
-- parse through files dropped onto droplet
repeat with i from 1 to number of items in fileList
set AppleScript's text item delimiters to {""} --reset delimiters
set this_item to item i of fileList as string ---pick item to work with
set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
--Improved hexdump script line by TheMouthofSauron at MacScripter
--http://bbs.applescript.net/viewtopic.php?pid=77811#p77811
--
--hexdump with the -C parameter formats the hexdump as columns of hex pairs
--and then a column with a human-readable "ASCII translation" delimited by a pipe
--character at the beginning and end of the ASCII column
--
--"awk" takes each -C formatted hexdump line ($0 = the entire input line)
--and filters out the hex pairs and the delimiting pipe characters
--(returning only the 16 characters starting at column 62)
--
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk '{print(substr($0,62,16))}'")
--remove carriage returns so output is one giant paragraph
--(allows for TextEdit searching for strings and manual scanning)
set hex_dump to searchNreplace(hex_dump, return, "")
--write to TextEdit window and rename window to file name to keep things straight
tell application "TextEdit"
make new document
set text of front document to hex_dump
set name of front window to doc_name
end tell
end repeat
end open
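For a quick look from Terminal without the droplet, here is a shell sketch that should give the same one-paragraph text as the hexdump -C | awk pipeline plus the return-stripping step: keep printable ASCII bytes (0x20 to 0x7E) and turn every other byte, line feeds included, into a period, which is what the ASCII column of hexdump -C shows. The function name ascii_view is hypothetical.

```shell
# Hypothetical helper: print a file as one long line of printable ASCII,
# replacing every non-printable byte (newlines included) with '.', like the
# ASCII column of hexdump -C with the line breaks removed.
ascii_view() {
  # LC_ALL=C makes [:print:] mean exactly the bytes 0x20-0x7E
  LC_ALL=C tr -c '[:print:]' '.' < "$1"
}
```

Piping ascii_view /path/to/document.pdf into less or grep gives roughly what the droplet puts in the TextEdit window.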
I was in a hurry earlier; below is a grep search routine that you could adapt.
It takes two inputs: the path to the file whose innards you want to grep, and a list of strings to look for. It was specifically designed to stop as soon as it finds one of the strings and return which one it found. You’d have to adapt it to actually pull out the data between two strings (PDF tags) or to look for all metadata generically, but it gives you some idea of how to call grep.
You’d use the routine I posted earlier to do the research on WHAT to look for, i.e. what grep is “seeing” during its searches.
It sounds messier than it ends up being. Oddly, I’ve found myself needing to parse file innards like this a lot.
StefanK’s approach is probably easier, but it means that anywhere you use the script, you’d have to make sure his add-on is available. I prefer to make all my apps stand-alone, since I can’t count on such things being available and I have to deploy to dozens of machines. (No slight against StefanK, just different methodologies. StefanK is my hero! :D)
-- revised GREP routine courtesy of
-- Bruce Phillips of MacScripter
-- http://bbs.applescript.net/viewtopic.php?pid=83871#p83871
--
on grepForString(path_to_grep, search_list)
repeat with current_grep_item in search_list
try -- grep exits with status 1 when it finds nothing, and do shell script treats any nonzero exit status as an error
do shell script "/usr/bin/grep --count " & quoted form of current_grep_item & " " & quoted form of POSIX path of path_to_grep
set grep_result to result
exit repeat
on error error_message number error_number
if error_message is "0" then -- grep didn't find anything
set grep_result to 0
else
-- pass on the error
error error_message number error_number
end if
end try
end repeat
return {grep_result, contents of current_grep_item}
end grepForString
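The same "first match wins" logic as a hypothetical shell function, grep_first_match, mainly to show where the AppleScript error comes from: grep exits with status 1 when no line matches (with --count it still prints 0), and do shell script turns any nonzero status into an error. Unlike the AppleScript routine, this sketch matches literal strings (grep -F) rather than regular expressions, and it returns 2 on a genuine grep error such as an unreadable file.

```shell
# Hypothetical shell counterpart of grepForString: try each search string in
# order and print "<count><TAB><string>" for the first one found in the file.
# Exit status: 0 = found, 1 = none found, 2 = grep error (e.g. unreadable file).
grep_first_match() {
  file="$1"
  shift
  for needle in "$@"; do
    # -c counts matching lines; -F matches the string literally; -a treats binary as text
    if count=$(LC_ALL=C grep -a -c -F -- "$needle" "$file"); then
      printf '%s\t%s\n' "$count" "$needle"
      return 0
    else
      status=$?
      # grep uses 1 for "no match"; anything higher is a real error
      if [ "$status" -gt 1 ]; then
        return "$status"
      fi
    fi
  done
  printf '0\t\n'
  return 1
}
```

For example, grep_first_match /path/to/document.pdf Quartz Distiller reports the first of the two strings that occurs in the file, together with its line count.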
set biglist to {}
set theListCommand to {"kMDItemFSName ", "kMDItemAuthors ", "kMDItemCreator ", "kMDItemTitle ", "kMDItemDescription ", "kMDItemContentCreationDate "}
set theList to {"File Name = ", "Author = ", "Creator = ", "Title = ", "Description = ", "Content Creation Date = "}
tell application "Finder"
set SiTem to selection
repeat with item_a from 1 to number of items in SiTem
set this_item to item item_a of SiTem as string
set this_item to POSIX path of this_item
repeat with item_b from 1 to number of items in theListCommand
set this_kMDItem to item item_b of theListCommand as string
set theResult to words of (do shell script "/usr/bin/mdls -name " & this_kMDItem & "-raw -nullMarker None " & quoted form of this_item)
set this_kMDItemResult to ""
repeat with item_c from 1 to number of items in theResult
set this_kMDItemResult to this_kMDItemResult & item item_c of theResult & space as string
end repeat
copy item item_b of theList & this_kMDItemResult & return to end of biglist
end repeat
set last item of biglist to return as string
end repeat
end tell
biglist as string
Yes, I called it "Creator = " in the script; the attribute name mdls uses is kMDItemCreator.
Use the script below on any file to get its metadata; it will show you what you can get.
(The script was originally posted on macosxhints forums in 2005)
Hi Mark, your first script does exactly what I need; the only thing missing is what your second script reports as “kMDItemEncodingApplications”. I get a result for that field in TextEdit, and I would like the first script to include it too. That’s all I need to make it perfect. Any ideas?
No, not that. If you open a PDF in Acrobat and choose File > Properties, there is a metadata field called “Application”, which shows the application that created the document (InDesign, etc.), and another called “PDF Producer”, which is the particular “engine” that created the PDF (Distiller, PDF export from within Adobe apps, Quartz when using OS X’s built-in PDF creation from the Print dialog, etc.). Each of these engines introduces its own unique “issues” when creating PDFs.
The kMDItemCreator item is closer to the “Application” field built into the actual PDF, and the more I’ve looked at the output of the various scripts shown here, the more it looks like kMDItemCreator doesn’t use the metadata tags built into the PDF (as shown in the hexdump). It seems more like the OS X file-based creator code or something similar, which is more volatile and can be stripped if the file passes through another OS at some point.
It appears the Spotlight method knows nothing about the PDF Producer, and kMDItemCreator is not the same as the […].
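For what it's worth, Acrobat's “Application” and “PDF Producer” fields correspond to the /Creator and /Producer entries of the PDF's document information dictionary, so they can often be read straight out of the file with grep, in the same spirit as the hexdump approach. A sketch, with the function name pdf_info_entry hypothetical; it assumes the dictionary is stored uncompressed and that the value is a plain (...) string without escaped parentheses or hex/Unicode encoding, none of which is guaranteed.

```shell
# Hypothetical helper: print a literal-string entry such as /Creator or
# /Producer from a PDF's document information dictionary.
# Assumes the dictionary is uncompressed and the value is a plain (...)
# string with no escaped or nested parentheses.
pdf_info_entry() {
  key="$1"
  file="$2"
  # Take the last occurrence, since incremental saves may append an
  # updated dictionary at the end of the file.
  LC_ALL=C grep -a -o "/$key *([^)]*)" "$file" \
    | tail -n 1 \
    | sed "s|^/$key *(\(.*\))\$|\1|"
}
```

For example, pdf_info_entry Producer /path/to/document.pdf should print the producer string when it is stored this way, and nothing otherwise.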
Thanks everyone for sharing these various ways of skinning the same cat, though - I’m learning lots.
Thanks, StefanK.
And stefcyr, it looks like you did find the bit you were looking for: "kMDItemEncodingApplications" would seem to be the PDF Producer, and "kMDItemCreator" is actually the content creator, as porkozone pointed out.
Updated Script
set biglist to {}
set theListCommand to {"kMDItemFSName ", "kMDItemAuthors ", "kMDItemEncodingApplications ", "kMDItemCreator ", "kMDItemTitle ", "kMDItemDescription ", "kMDItemContentCreationDate "}
set theList to {"File Name = ", "Author = ", "PDF Producer = ", "Content Creator = ", "Title = ", "Description = ", "Content Creation Date = "}
tell application "Finder"
set SiTem to selection
repeat with item_a from 1 to number of items in SiTem
set this_item to item item_a of SiTem as string
set this_item to POSIX path of this_item
repeat with item_b from 1 to number of items in theListCommand
set this_kMDItem to item item_b of theListCommand as string
set theResult to words of (do shell script "/usr/bin/mdls -name " & this_kMDItem & "-raw -nullMarker None " & quoted form of this_item)
set this_kMDItemResult to ""
repeat with item_c from 1 to number of items in theResult
set this_kMDItemResult to this_kMDItemResult & item item_c of theResult & space as string
end repeat
copy item item_b of theList & this_kMDItemResult & return to end of biglist
end repeat
set last item of biglist to return as string
end repeat
end tell
biglist as string
Thanks to all for your input! One thing: the last item in the list does not get pulled for some reason. In the examples above, the kMDItemContentCreationDate is missing from the result. If I put something else as the last item, that one is also missing.
The line set last item of biglist to return as string is what’s causing it. If I understand correctly, this line takes whatever the last item is and replaces it with a return, wiping out the last item in the process. I seem to have removed it successfully without causing other issues, but I’m curious whether there is some reason for this line that I’m not seeing.
StefanK,
Your little Foundation Tool CLI IS FANTASTIC!!!
Question.
Not that I actually need this functionality at this point in time, BUT do you have any plans to update the tool so it also reports the “PDF Version”? I only ask because it’s often useful to know whether a PDF’s transparency has been flattened, which a version of 1.3 indicates.
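In the meantime the version can be read directly: a PDF begins with a header line such as %PDF-1.3, and since transparency was introduced in PDF 1.4, a conforming 1.3 file has no live transparency. A minimal sketch with a hypothetical function name; note that from PDF 1.4 on, a /Version entry in the document catalog can override the header, which this does not check.

```shell
# Hypothetical helper: print the version from a PDF's header line (%PDF-1.x).
# Prints nothing if the file does not start with a PDF header.
# Does not check the catalog's /Version entry, which can override the header.
pdf_header_version() {
  head -c 16 "$1" | LC_ALL=C sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}
```

A script could then compare the result of pdf_header_version /path/to/document.pdf against 1.3 before accepting a file.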
In any event, I am using your tool to test for the presence of a particular keyword that we enter after we flightcheck our PDFs. If the keyword is detected, my script continues with the Save function; otherwise it alerts the operator.
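On machines where the tool isn't installed, a rough stand-alone version of that check might look like this. The function name is hypothetical; it assumes the keywords are stored uncompressed, either in the XMP <pdf:Keywords> element or as a plain (...) /Keywords string in the document information dictionary, and it matches substrings, so a keyword like OK would also match BOOK.

```shell
# Hypothetical gate: exit 0 if KEYWORD appears in the PDF's Keywords metadata
# (XMP <pdf:Keywords> or Info-dictionary /Keywords), non-zero otherwise.
# Assumes uncompressed metadata; matching is a plain substring test.
pdf_has_keyword() {
  file="$1"
  keyword="$2"
  LC_ALL=C grep -a -o -e '<pdf:Keywords>[^<]*</pdf:Keywords>' -e '/Keywords *([^)]*)' "$file" \
    | grep -q -F -- "$keyword"
}
```

Because do shell script raises an error on a nonzero exit status, calling this from AppleScript inside a try block would put the "alert the operator" branch in the on error handler.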