Script to get metadata of a PDF

Works great now! You guys are geniuses!

Thanks to all for your input! One thing: the last item in the list does not get pulled for some reason. In the examples above, the kMDItemContentCreationDate is missing from the result. If I put something else as the last item, that one is also missing.

It looks like this line

set last item of biglist to return as string

is what’s causing it. If I understand correctly, this line is taking whatever the last item is, and replacing it with a return, wiping out the last item in the process. I seem to have removed it successfully without causing other issues, but am curious if there is some reason for this line that I am not seeing?

The purpose is obviously to separate the entries with an additional return.
This should kill two birds with one stone:

set end of biglist to return

PS: Actually you can omit all the “as string” coercions

Oops…:rolleyes:

StefanK,
Your little Foundation Tool CLI IS FANTASTIC!!!

Question.
Not that I actually need this functionality at this point in time, but do you have any plans to update this tool so it also includes the “PDF Version”? I only ask because it’s often useful to know if the PDF’s transparency is flattened via a version of 1.3.

In any event, I am using your tool to test for the presence of a particular keyword we enter after we flightcheck our PDFs. If the keyword is detected, my script continues with the Save function; otherwise it alerts the operator.

I can’t thank you enough!

-Jeff

No problem, I added the document version and number of pages.
Same Link

Hi StefanK,
After downloading the new app and overwriting the old with the new in the same location, the script event log is not returning any of the document’s properties?

Replacing the old PDFMetadate file does yield correct results where the script reads the data.

On a side note, Acrobat is interesting. Let’s assume I have a document that has the text “Document has been flightchecked” as its Keyword in its Document Properties. If I were to then completely delete this Keyword from the Properties field, the document still retains the deleted string somehow, somewhere, even if I save the file under a new name. It’s not until I add a new character in the Keywords field that the old string gets replaced. This is a non-issue, but I still think it’s odd that Acrobat Pro 9 behaves this way.

As the project is quite old, Xcode updated some project settings.
I recompiled the CLI with a deployment target of 10.5 and a 32/64-bit universal architecture.
Always same link.

I have been struggling with this script for years. The whole idea is to get a file count of the PDFs in the selected folder, a count of the PDFs that start with the letter R & finally a count of the PDFs that contain the keyword correction. The script creates a text file with this info. The part I’m having problems with is getting a file count of the PDFs that contain the keyword correction. I’m getting a = (null) result.

set target_folder to choose folder with prompt "Choose target folders containing only PDFs to count files" with multiple selections allowed without invisibles
set results to ""

repeat with i from 1 to (count target_folder)
	set thisFolder to (POSIX path of item i of target_folder)
	
	--Find & count all PDFs in the folders selected that DON'T starts with letter R
	set fileCount to do shell script "find " & quoted form of thisFolder & " -type f  -iname *.pdf | wc -l"
	set results to (results & "" & thisFolder & "=" & tab & fileCount & tab)
	
	--Find & count all PDFs in the folders selected that PDF file name starts with letter R
	set fileCount to do shell script "find " & quoted form of thisFolder & " -type f -iname 'R[0-9-_]*.pdf' | wc -l"
	set results to (results & "" & tab & "RESENDS=" & tab & fileCount & tab)
	
--THIS IS THE PART I'M HAVING PROBLEMS
	--Find & count all PDFs in the folders selected that keyword is correction
	set fileCount to do shell script "mdls -name " & "kMDItemKeywords" & "-raw -nullMarker None " & quoted form of thisFolder --& " -type f  -iname *.pdf | wc -l"
	set results to (results & "" & tab & "CORRECTION=" & tab & fileCount & return)
	
end repeat


--write results to a txt file
set theFilePath to (path to desktop folder as string) & "PDF File Count.txt"
set theFile to open for access file theFilePath with write permission
try
	set eof of theFile to 0
	--write results to file theFilePath
	write results to theFile
	close access theFile
on error
	close access theFile
end try
display dialog "done" giving up after "1"

Model: iMac (Retina 5K, 27-inch, Late 2015)
AppleScript: 2.8.1 (183.1)
Browser: Safari 537.36
Operating System: Mac OS X (10.11.6)

I think what you want may look something like this:

--Find & count all PDFs in the folders selected that keyword is correction
set fileCount to do shell script "find " & quoted form of thisFolder & " -type f -iname '*.pdf' -print0 | xargs -0 mdls -name kMDItemKeywords | grep -i '\\bcorrection\\b' | wc -l"

find’s -print0 primary outputs character code 0 after each path instead of a linefeed. xargs’s -0 option makes it expect character code 0 as the separator instead of linefeeds and spaces. xargs in turn runs the mdls command on the paths it receives. The grep command case-insensitively matches the complete word “correction”.
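
As a rough illustration of the -print0 / -0 pairing described above, here is a minimal shell sketch. The function name pdf_names is made up for this example, and basename stands in for mdls so that it can be tried on any Unix-like system. The '*.pdf' pattern is quoted so that the shell hands it to find unexpanded.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the file name of every
# PDF under the folder given as $1, one per line.
pdf_names() {
    # -print0 ends each path with a NUL byte; xargs -0 splits on NUL only,
    # so spaces in folder or file names cannot break a path apart.
    # basename is a stand-in for mdls here; -n 1 passes one path per call.
    find "$1" -type f -iname '*.pdf' -print0 | xargs -0 -n 1 basename
}

# Throwaway demo folder with awkward names (all invented).
demo=$(mktemp -d)
mkdir -p "$demo/job 1"
touch "$demo/job 1/R123 proof.pdf" "$demo/cover.PDF" "$demo/notes.txt"

pdf_names "$demo" | sort
rm -rf "$demo"
```

The two PDF names should be listed and notes.txt skipped. Within AppleScript the same pipeline goes inside do shell script, with the folder path supplied via quoted form of.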

I made the modifications to the script & it didn’t work. I use Adobe Bridge to add metadata, and I can clearly see the word correction in the keywords. However, when I open the PDF in Acrobat & check the Keywords section of Document Properties, I see that Acrobat has added a literal semicolon followed by a space before the word correction. Example: ; correction

So if I delete the semicolon & the space & just leave the word correction, your code works, but that defeats the purpose of batching metadata in Bridge. I tried to modify the grep search to find ; followed by a space then correction, but it still fails. My point is that Acrobat is adding additional characters when metadata is applied using Bridge. This is totally out of my league & very advanced for me, but can you help me once again & tell me if what I added is correct? grep -i '\b\;\^correction\b'

set target_folder to choose folder with prompt "Choose target folders containing only PDFs to count files" with multiple selections allowed without invisibles
set results to ""

repeat with i from 1 to (count target_folder)
	set thisFolder to (POSIX path of item i of target_folder)
	--Find & count all PDFs in the folders selected that keyword is correction
	set fileCount to do shell script "find " & quoted form of thisFolder & " -type f  -iname *.pdf -print0 | xargs -0 mdls -name kMDItemKeywords | grep -i '\\b\\;\\^correction\\b' | wc -l"
	set results to (results & "" & tab & "CORRECTION=" & tab & fileCount & return)
	
end repeat

--write results to a txt file
set theFilePath to (path to desktop folder as string) & "PDF File Count.txt"
set theFile to open for access file theFilePath with write permission
try
	set eof of theFile to 0
	--write results to file theFilePath
	write results to theFile
	close access theFile
on error
	close access theFile
end try

This uses my MetadataLib script library:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use script "Metadata Lib" version "2.0.0"

set theFolders to choose folder with prompt "Choose target folders containing only PDFs to count files" with multiple selections allowed without invisibles

set theFiles to perform search in folders {theFolders} predicate string "kMDItemContentType == %@ AND kMDItemKeywords CONTAINS[cd] %@" search arguments {"com.adobe.pdf", "; correction"}

You can get the library here:

https://www.macosxautomation.com/applescript/apps/Script_Libs.html

I’m guessing this means that 0 was returned for the number of files having “correction” amongst their kMDItemKeywords. :slight_smile:

That’s a bit of a mystery. :confused: The grep code returns any line containing “correction” as a complete word (not bounded by other “word” characters such as letters, digits, or underscores). So the presence of a semicolon and a space should make no difference. My only guesses are that either there are invisible characters in the word “correction” you see on screen (maybe it’s wrongly encoded) or the space is actually some other character which grep considers to be a word character. It may be possible to find out by using this script:

set f to (choose file of type "com.adobe.pdf" with prompt "Choose a PDF file whose kMDItemKeywords you know contains the word \"correction\" …")
do shell script "mdls -name kMDItemKeywords " & quoted form of POSIX path of f
return id of result

Look out for any unusually high or unusually low numbers in the result.
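
To see the word-boundary behaviour described above in isolation, the grep expression can be fed a few sample lines directly in a shell. The lines below are invented stand-ins for mdls output, and count_whole_word is a hypothetical helper name, so treat this as a sketch only. In a plain shell the pattern needs single backslashes; the doubled ones are only required inside an AppleScript string.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on standard input that contain
# "correction" as a whole word, ignoring case.
count_whole_word() {
    # -c prints the number of matching lines; grep exits non-zero when that
    # number is 0, so "|| true" keeps the helper's exit status clean.
    grep -ic '\bcorrection\b' || true
}

# A leading "; " does not stop the match: ";" and " " are not word characters.
printf '%s\n' '    "; correction",' | count_whole_word

# Longer words that merely contain "correction" are not counted.
printf '%s\n' '    "corrections",' '    "autocorrection"' | count_whole_word
```

The first call should report one matching line and the second none.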

Otherwise it’s worth giving Shane’s library a try. His script returns a list of the paths to the matching files, which would then have to be counted. (But ‘search string’ should be ‘predicate string’ with version 2.0.0.)

Indeed it should. Thanks.

I’ve gone through all the PDFs I can find on my computer to see what mdls returns for their kMDItemKeywords. Only two (not created by me) have keywords beginning with "; ". In both cases, the space is a normal space and my grep code matches them when “correction” is replaced with the relevant text.

The PDF for Shane’s book “Everyday AppleScriptObjC” has a keyword which contains the copyright symbol “©”. This character is outside the normal “ASCII” range and is returned by mdls as “\U00a9”, so grep only recognises it if it’s searching for this. Shane’s script, on the other hand, only recognises it if it’s searching for the copyright symbol itself.

So if, by the merest chance, the space in "; " happens to be a no-break space (character id 160), it’s likely that mdls will render it as “\U00a0”, in which case my grep code won’t work. It’s likely too that Shane’s script won’t work either unless a no-break space is used in the search argument.

On the off-chance that no-break spaces are the problem, here’s a revised line. The egrep code matches both “correction” as a complete word and “\U00a0correction”.

--Find & count all PDFs in the folders selected that keyword is correction
set fileCount to do shell script "find " & quoted form of thisFolder & " -type f -iname '*.pdf' -print0 | xargs -0 mdls -name kMDItemKeywords | egrep -i '(\\\\U00a0|\\b)correction\\b' | wc -l"
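
The difference between the two patterns can be checked without any PDFs by piping sample text through them. The input line here is invented to mimic the “\U00a0” rendering described above, and count_plain and count_nbsp_aware are hypothetical names. Inside AppleScript’s do shell script each backslash has to be doubled, which is why the line above shows four where the shell itself sees two.

```shell
#!/bin/sh
# Hypothetical helpers: count matching lines on standard input.
# "|| true" hides grep's non-zero exit status when the count is 0.
count_plain() { grep -ic '\bcorrection\b' || true; }
count_nbsp_aware() { grep -Eic '(\\U00a0|\b)correction\b' || true; }

# Invented sample: a keyword preceded by an escaped no-break space.
sample='    "\U00a0correction"'

# The "0" that ends the escape is a word character, so there is no word
# boundary in front of "correction" and the plain pattern misses it.
printf '%s\n' "$sample" | count_plain

# The alternation accepts the literal text \U00a0 in place of the boundary.
printf '%s\n' "$sample" | count_nbsp_aware
```

The plain pattern should report no match for this sample, while the revised one reports one.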

Hmm. This is interesting. Searching files to which I added the keyword “Correction,” I can acceptably locate that metadata for file classes such as JPG and TIFF. My code below should also work for PDFs, but doesn’t—at least not for PDFs that were edited with CS3’s Bridge app. Metadata returned (by the mdls command for one PDF) lists the usual suspects, however, the keyword “Correction” fails to appear at all. Perhaps the XML is somehow mangled?

count (do shell script "mdfind -onlyin " & my (choose folder)'s POSIX path's quoted form & " kMDItemKeywords == 'Correction' ")'s paragraphs

edit: Nigel’s code in post #29 also returns the incorrect entry for my test PDFs—" 0"—while correctly returning " 2" for the JPEGs.

Hi Marc.

Thanks. It’s good to get the input of someone who actually has Bridge! :slight_smile:

When you say that mdls fails to register “Correction” at all with PDFs, is that just under kMDItemKeywords or under any heading? Your post has prompted me to wonder if it might actually be under some other heading. Just a thought. Clutching at straws. :slight_smile:

set f to (choose file of type "com.adobe.pdf" with prompt "Choose a PDF file whose kMDItemKeywords you know should contain the word \"correction\" …")
do shell script "mdls " & quoted form of POSIX path of f
--> All the metadata visible to mdls.

I can see that my entry was appended in the plain contents as viewed in TextEdit—but it doesn’t register under any heading that mdls reports as being metadata.

I suspect you’re dealing with Adobe’s XMP metadata, which isn’t searchable via Spotlight.
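
If the keyword really does live only in an XMP packet, one fallback is to scan the raw bytes of each PDF, much as TextEdit shows them. This is a crude sketch under stated assumptions: it assumes the XMP packet is stored uncompressed and in an encoding where the word appears as plain text, and it will also count any PDF whose uncompressed contents happen to contain the word elsewhere. The helper name count_pdfs_containing and the sample file contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count PDFs under folder $1 whose raw bytes contain
# the text $2 (case-insensitive, taken literally).
count_pdfs_containing() {
    # grep -a treats binary files as text, -q only sets the exit status,
    # -F takes the text literally. find prints a path only when grep matched.
    find "$1" -type f -iname '*.pdf' -exec grep -qaiF -- "$2" {} \; -print |
        wc -l | tr -d ' '
}

# Demo with fake "PDFs"; the tagged line is an invented stand-in for XMP.
demo=$(mktemp -d)
printf '%s\n' '%PDF-1.4' '<rdf:li>correction</rdf:li>' > "$demo/tagged.pdf"
printf '%s\n' '%PDF-1.4' 'no keywords here' > "$demo/plain.pdf"

count_pdfs_containing "$demo" 'correction'
rm -rf "$demo"
```

Only the tagged file should be counted. The function body could be dropped into a do shell script call in place of the mdls pipeline, with the usual doubling of backslashes and quoted form of for the folder path.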