Hello all. I hope someone can help me. I want to know if it’s possible to sort pdfs by their metadata information?
Hi Chuck,
I love everything PDF workflow, so I just couldn’t resist writing a small script that shows you how to do it.
My solution is based on a Python helper script that requires Mac OS X 10.5 Leopard. If you are using an earlier incarnation of our beloved operating system, you need to install PyObjC.
To test this script, please download the helper script to your desktop and then execute the AppleScript code below:
on run
    -- let the user pick one or more PDF files
    set pdffiles to choose file with prompt "Please choose PDF files only:" with multiple selections allowed
    -- build a space-separated list of shell-quoted POSIX paths
    set stringlist to ""
    set countpdffiles to length of pdffiles
    repeat with i from 1 to countpdffiles
        set pdffile to item i of pdffiles
        set pdffilepath to quoted form of (POSIX path of (pdffile as Unicode text))
        if i is equal to countpdffiles then
            set stringlist to stringlist & pdffilepath
        else
            set stringlist to stringlist & pdffilepath & space
        end if
    end repeat
    -- the helper script is expected on the desktop
    set pyscriptpath to quoted form of (POSIX path of (((path to desktop) as Unicode text) & "sortpdfs.py"))
    -- no trailing slash after the interpreter path, otherwise the shell cannot run it
    set command to "/usr/bin/python " & pyscriptpath & space & stringlist
    -- split the helper's output into one path per line
    set sortedpdfpaths to paragraphs of (do shell script command)
    return sortedpdfpaths
end run
It should return the paths to the chosen PDF documents, sorted by the author’s name given in the metadata. Of course, by modifying the Python script, you can also choose to sort by the creator, title, etc.
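In case the download is not at hand, here is a rough sketch of what such a helper could look like. This is not the original sortpdfs.py: it reads the author through «mdls» (attribute kMDItemAuthors) instead of PyObjC, the function names read_author and sort_paths are made up for illustration, and it is written for Python 3, so it would have to be called with a python3 interpreter rather than /usr/bin/python. The only thing it shares with the AppleScript above is the contract: paths come in as arguments, sorted paths go out one per line.

```python
import subprocess
import sys


def read_author(path):
    """Return the author string Spotlight knows for a file, or "" if there is none.

    Assumption: mdls is available (macOS only) and the file has been indexed.
    """
    try:
        raw = subprocess.run(
            ["mdls", "-name", "kMDItemAuthors", "-raw", path],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return ""
    if raw == "(null)":  # mdls prints this when the attribute is missing
        return ""
    # kMDItemAuthors is a list, printed as a parenthesised block;
    # drop the punctuation and collapse whitespace to get a plain sort string.
    return " ".join(raw.translate(str.maketrans("", "", '()"')).split())


def sort_paths(paths, read_value=read_author):
    """Sort paths case-insensitively by metadata value; files without a value go last."""
    def key(path):
        value = read_value(path)
        return (value == "", value.lower(), path)
    return sorted(paths, key=key)


if __name__ == "__main__":
    # One path per line, which is what "paragraphs of (do shell script ...)" splits on.
    print("\n".join(sort_paths(sys.argv[1:])))
```

To sort by something else, you would swap in a different reader function, for example one that asks «mdls» for kMDItemTitle.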
HTH!
Thanks for the help Martin. I have another question for you. Is it possible to sort the pdfs if they contain keywords?
Of course it is. Please study and download the modified Python helper script.
on run
    -- let the user pick one or more PDF files
    set pdffiles to choose file with prompt "Please choose PDF files only:" with multiple selections allowed
    -- build a space-separated list of shell-quoted POSIX paths
    set stringlist to ""
    set countpdffiles to length of pdffiles
    repeat with i from 1 to countpdffiles
        set pdffile to item i of pdffiles
        set pdffilepath to quoted form of (POSIX path of (pdffile as Unicode text))
        if i is equal to countpdffiles then
            set stringlist to stringlist & pdffilepath
        else
            set stringlist to stringlist & pdffilepath & space
        end if
    end repeat
    -- same as before, but calling the keyword-based helper script
    set pyscriptpath to quoted form of (POSIX path of (((path to desktop) as Unicode text) & "sortpdfskey.py"))
    -- no trailing slash after the interpreter path, otherwise the shell cannot run it
    set command to "/usr/bin/python " & pyscriptpath & space & stringlist
    -- split the helper's output into one path per line
    set sortedpdfpaths to paragraphs of (do shell script command)
    return sortedpdfpaths
end run
Do you have any suggestions that are purely native AppleScript, without the use of any secondary scripts?
You can also easily access the keywords of a PDF document by scripting the excellent and free PDF viewer Skim:
tell application "Skim"
    set pdfinfo to info of document 1
    -- not every PDF document has keywords...
    try
        set pdfkeywords to keywords of pdfinfo
        -- {"Yooooo!", "Sal Soghoian is my role model!"}
    end try
end tell
Moreover, you can use the «mdls» command to get the keywords of a PDF document:
set pdfpath to quoted form of "/Users/martin/Desktop/test.pdf"
set command to "mdls -name kMDItemKeywords -raw " & pdfpath
set output to do shell script command
The problem, however, is the sorting: AppleScript does not provide any convenient built-in sort functions, and it does not offer key/value dictionaries the way Python (and many other programming languages) do. So you will end up with several (nested?) lists that have to be sorted and compared. That’s why I do not like this approach. But it’s possible.
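For comparison, this is roughly what the dictionary approach looks like on the Python side. It is only a sketch under a few assumptions: that «mdls -raw» prints a keyword list as a parenthesised block with one item per line (quoted when the item contains spaces) and "(null)" when there are no keywords, and that sorting by the alphabetically ordered keyword list is what you want. The names parse_mdls_list and sort_by_keywords are invented for this example.

```python
def parse_mdls_list(raw):
    """Turn the raw list output of mdls into a Python list of strings.

    Assumed input shape: '(\n    "first item",\n    second\n)' or '(null)'.
    """
    raw = raw.strip()
    if not raw or raw == "(null)":
        return []
    if raw.startswith("(") and raw.endswith(")"):
        raw = raw[1:-1]
    items = []
    for line in raw.splitlines():
        # remove indentation, the trailing comma and surrounding quotes
        item = line.strip().rstrip(",").strip('"')
        if item:
            items.append(item)
    return items


def sort_by_keywords(keywords_by_path):
    """Sort the paths of a {path: [keywords]} dictionary.

    Paths with keywords come first, ordered by their lowercased,
    alphabetically sorted keyword list; paths without keywords go last.
    """
    def key(path):
        keywords = sorted(k.lower() for k in keywords_by_path[path])
        return (not keywords, keywords, path)
    return sorted(keywords_by_path, key=key)
```

The dictionary keeps each path together with its keywords, so a single call to sorted with a key function does the job that would need hand-written list comparisons in plain AppleScript.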