Extract XMP metadata from a PDF file

Hi,

I want to extract the XMP metadata from a PDF file.


set theFile to choose file without invisibles
set FileData to read theFile 

This is what I have so far (not much yet, I know).
My idea was to pull the XMP data out of this by keeping everything between


and


because these are the start and end tags of the XMP file.

But I can’t see them in the result of my script,
and I am sure there is an XMP file in the PDF.

I think it’s an encoding problem, but I am not sure.
(I tried adding

but then I just get Chinese characters.)

Does anybody know the solution?
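
One way to approach the idea above in plain AppleScript is sketched below. It assumes the markers meant are the "<?xpacket begin" and "<?xpacket end" processing instructions that wrap an XMP packet, and that the PDF stores the packet uncompressed; neither is confirmed here. The names startMarker, endMarker and xmpPacket are made up for illustration. Reading the file without an "as" clause keeps one character per byte, whereas reading it as Unicode text pairs the bytes into UTF-16 characters, which is a likely source of Chinese-looking output.

```applescript
-- Sketch only: assumes an uncompressed XMP packet wrapped in
-- "<?xpacket begin" ... "<?xpacket end ...?>" (assumed markers).
set theFile to choose file without invisibles
set fileData to read theFile -- no "as" clause: one character per byte

set startMarker to "<?xpacket begin"
set endMarker to "<?xpacket end"
set xmpPacket to ""

set startPos to offset of startMarker in fileData
if startPos > 0 then
	-- Keep everything from the start marker onwards
	set tailText to text startPos thru -1 of fileData
	set endPos to offset of endMarker in tailText
	if endPos > 0 then
		-- Find the "?>" that closes the end marker
		set closePos to offset of "?>" in (text endPos thru -1 of tailText)
		if closePos > 0 then
			set xmpPacket to text 1 thru (endPos + closePos) of tailText
		end if
	end if
end if

if xmpPacket is "" then
	display dialog "No XMP packet markers found in this file."
else
	return xmpPacket
end if
```

On large PDFs this can be slow, and a file that embeds images with their own metadata may contain more than one packet; the sketch returns only the first. Because the bytes are not decoded as UTF-8, non-ASCII characters inside the packet will look garbled.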

Hello,
You might consider Phil Harvey’s invaluable ExifTool-7.99 (a command-line utility & Perl library)
Supported file types
XMP Tags

After installation, a simple script should do the job…



set p2f to "/Posix/path/to/myDocument.pdf"
do shell script "/usr/bin/exiftool -XMP -b" & space & quoted form of p2f
-------------------------------------------------------------------------------------------
-- Sample Output :
(*
<?xpacket begin=\"€\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>
<x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39\">
	<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">
		<rdf:Description rdf:about=\"\"
				xmlns:dc=\"http://purl.org/dc/elements/1.1/\">
			<dc:format>application/pdf</dc:format>
			<dc:description>
				<rdf:Alt>
					<rdf:li xml:lang=\"x-default\">Extensible Metadata Platform (XMP) data and serialization model specification</rdf:li>
				</rdf:Alt>
			</dc:description>
			<dc:title>
				<rdf:Alt>
					<rdf:li xml:lang=\"x-default\">XMP Specification Part 1: Data and Serialization Models</rdf:li>
				</rdf:Alt>
			</dc:title>
			<dc:creator>
				<rdf:Seq>
					<rdf:li>Adobe Developer Technologies</rdf:li>
				</rdf:Seq>
			</dc:creator>
		</rdf:Description>
		<rdf:Description rdf:about=\"\"
				xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\">
			<pdf:Producer>Acrobat Distiller 8.1.0 (Windows)</pdf:Producer>
			<pdf:Keywords>XMP  metadata  schema XML RDF</pdf:Keywords>
			<pdf:Copyright>2008 Adobe Systems Inc.</pdf:Copyright>
		</rdf:Description>
		<rdf:Description rdf:about=\"\"
				xmlns:xap=\"http://ns.adobe.com/xap/1.0/\">
			<xap:CreatorTool>FrameMaker 7.2</xap:CreatorTool>
			<xap:ModifyDate>2008-01-29T10:55:40-08:00</xap:ModifyDate>
			<xap:CreateDate>2008-01-29T10:29:21Z</xap:CreateDate>
			<xap:MetadataDate>2008-01-29T10:55:40-08:00</xap:MetadataDate>
		</rdf:Description>
		<rdf:Description rdf:about=\"\"
				xmlns:pdfx=\"http://ns.adobe.com/pdfx/1.3/\">
			<pdfx:Copyright>2008 Adobe Systems Inc.</pdfx:Copyright>
		</rdf:Description>
		<rdf:Description rdf:about=\"\"
				xmlns:xapMM=\"http://ns.adobe.com/xap/1.0/mm/\">
			<xapMM:DocumentID>uuid:69eafaf2-984c-4abc-9b3b-b44e7d3c3446</xapMM:DocumentID>
			<xapMM:InstanceID>uuid:5a5bdade-ae97-401b-84d8-6ceb50a0cab1</xapMM:InstanceID>
		</rdf:Description>
	</rdf:RDF>
</x:xmpmeta>					
<?xpacket end=\"w\"?>
*)

That works perfectly.

But I still have a problem: it’s for a school project and I will have to demonstrate the script, and normally I won’t be able to install ExifTool there.

So is there another solution that uses only standard AppleScript and/or Terminal commands?

On the exiftool home page there are some downloadable example OS X scripts (droplets). The first droplet bundles exiftool as an application with the script, so you can use this as an example.

The following also works:

set my_PDF to POSIX path of (choose file)
do shell script "mdls " & quoted form of my_PDF

That’s the result I get:

kMDItemAttributeChangeDate = 2009-11-13 10:09:12 -0500
kMDItemAuthors = (scyr)
kMDItemContentCreationDate = 1903-12-31 19:00:00 -0500
kMDItemContentModificationDate = 2009-04-15 14:09:38 -0400
kMDItemContentType = "com.adobe.pdf"
kMDItemContentTypeTree = (
"com.adobe.pdf",
"public.data",
"public.item",
"public.composite-content",
"public.content"
)
kMDItemCreator = "Adobe InDesign CS4 (6.0.1)"
kMDItemDisplayName = "calsheet+IDEAlliance2.pdf"
kMDItemEncodingApplications = ("ApogeeX 4.0 Normalizer")
kMDItemFSContentChangeDate = 2009-04-15 14:09:38 -0400
kMDItemFSCreationDate = 2009-04-15 14:09:38 -0400
kMDItemFSCreatorCode = 0
kMDItemFSFinderFlags = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSLabel = 0
kMDItemFSName = "calsheet+IDEAlliance2.pdf"
kMDItemFSNodeCount = 0
kMDItemFSOwnerGroupID = 501
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 19831579
kMDItemFSTypeCode = 0
kMDItemID = 10201697
kMDItemKind = "Adobe PDF document"
kMDItemLastUsedDate = 2009-11-13 10:09:12 -0500
kMDItemNumberOfPages = 1
kMDItemPageHeight = 437.04
kMDItemPageWidth = 1224
kMDItemSecurityMethod = "None"
kMDItemTitle = "calsheet+IDEAlliance.ps, page 1 @ Normalize ( Untitled-4 )"
kMDItemUsedDates = (
2009-04-15 14:06:39 -0400,
2009-04-16 20:00:00 -0400,
2009-07-19 20:00:00 -0400,
2009-08-02 20:00:00 -0400,
2009-09-02 20:00:00 -0400,
2009-09-20 20:00:00 -0400,
2009-09-23 20:00:00 -0400,
2009-09-29 20:00:00 -0400,
2009-10-01 20:00:00 -0400,
2009-10-07 20:00:00 -0400,
2009-10-22 20:00:00 -0400,
2009-11-12 19:00:00 -0500
)
kMDItemVersion = "1.3"

@boardhead
I don’t think the first droplet bundles exiftool in the script. I think the droplet doesn’t use exiftool at all. Here is its code:


-- this script modified from an original script written by Brett Gross - PH

-- ------------------------------------------------------- GLOBAL VARIABLES
global extract
global theVers

on setup()
	try
		set mePath to path to me
		set mePathPOSIX to POSIX path of mePath
		set extract to (mePathPOSIX & "Contents/Resources/extract_preview") as string
		
		--display dialog
		
		-- Debug code to make sure that we've got things working alright
		set versionComm to " -ver"
		set theScript to (quoted form of extract & versionComm) as string
		--display dialog theScript
		
		set theVers to do shell script theScript
		
		display dialog theVers buttons {"OK"} default button 1 giving up after 2
		return theVers
	on error
		-- Abort
		return "Err"
	end try
end setup

on extractPreview(theImage)
	-- Pass an alias
	set theInfo to (info for theImage)
	set theName to name of theInfo
	set ppath to (POSIX path of theImage)
	set basePath to (characters 1 thru ((length of ppath) - (length of theName)) of ppath) as string
	log basePath
	set baseName to (characters 1 thru ((length of theName) - 4) of theName) as string
	log baseName
	set outFile to ((baseName & "_preview.jpg") as string)
	log outFile
	set theScript to ((quoted form of extract) & " " & (quoted form of POSIX path of theImage) & " '" & basePath & outFile & "'") as string
	set theRes to do shell script theScript
	log theScript
	return (basePath & outFile) as string
end extractPreview

on procFiles(theFiles)
	repeat with curFile in theFiles
		-- Go ahead and process the file
		extractPreview(curFile)
	end repeat
end procFiles

on run
	set theVers to my setup()
	if theVers is not "Err" then
		set theFiles to (choose file with multiple selections allowed) as list
		my procFiles(theFiles)
	end if
end run

on open (docList)
	set theVers to my setup()
	if theVers is not "Err" then
		my procFiles(docList)
		
	end if
end open

@stefcyr

That is metadata, but not XMP data.
XMP should be an XML-based format.

If you take the XMP out of this file: http://www.turboupload.com/lye5xx9k3bnc/sample_file.pdf.html
you get this XMP file: http://users.skynet.be/metalhammer500/sample_file.xmp

Guess I misread, sorry about that. My bad.

-- ...
on setup()
	-- ...
		set extract to (mePathPOSIX & "Contents/Resources/extract_preview") as string
	-- ...

Hi MetalHammer,

“extract_preview” is a Perl script that invokes … Image::ExifTool …
So yes, “ExifTool” is bundled within the applet.

The link http://users.skynet.be/metalhammer500/sample_file.pdf returns an “Errors.cgi” file.

I guess I didn’t look closely enough. Sorry about that. I’m going to try to bundle exiftool in my script tomorrow. If it works, it’ll be the perfect solution.

The link to the PDF works fine over here, but I’ll upload it somewhere else tomorrow.

I’ve made the script. Here it is:


on open (theFiles) --actions as droplet
	extractXMP(theFiles)
end open

on run --actions with normal use
	set theFiles to (choose file with multiple selections allowed without invisibles)
	extractXMP(theFiles)
end run

on extractXMP(theFileList)
	repeat with theFile in theFileList
		
		--save name and path
		set theFilePath to POSIX path of theFile
		set theInfo to info for theFile
		set theFileName to name of theInfo
		
		--save exiftool command and execute it
		set mePath to path to me
		set mePathPOSIX to POSIX path of mePath
		set exiftoolPath to (mePathPOSIX & "Contents/Resources/exiftool") as string
		set theShellCommand to exiftoolPath & " -XMP -b" & space & theFilePath
		set theXMPcontents to do shell script theShellCommand
		
		--change extension
		set theFileNameChopped to text 1 thru -4 of theFileName
		set theFileName to theFileNameChopped & "xmp"
		
		--write xmp file
		set textFile to open for access ((path to desktop folder as text) & theFileName) with write permission
		try
			set eof of textFile to 0
			write theXMPcontents to textFile
			close access textFile
		on error e
			close access textFile
			display dialog e
		end try
		
		
	end repeat
end extractXMP

But I still have two questions.
First, when I launch the file by double-clicking it, I get a warning asking whether I am sure I want to run it or quit.
Is there a way to get rid of this?

Second, when I make an XMP file with this script, its file kind is “BBEdit text document”. I have another XMP file with “Adobe XMP file” as its file kind. Can I change the file kind with AppleScript?

http://www.turboupload.com/fqoep73ayzwj/extract_xmp.app.zip.html

I’ve moved the AppleScript, and now I get error -1409.
Can somebody explain in plain English what this error means?
The line where I write the XMP file seems to be the problem.

Hi,

If you use a string path specifier to open a file for a read/write operation, you should add the keyword file,
and you can omit folder in path to desktop folder.


 set textFile to open for access file ((path to desktop as text) & theFileName) with write permission

I added the file keyword, but now the XMP file is created at the root of my disk and its name is the path where I wanted it to be placed.

There might be space characters in the file path, so use quoted form of to avoid those problems.


.
set theShellCommand to quoted form of exiftoolPath & " -XMP -b" & space & quoted form of theFilePath
.
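
To see what quoted form does, here is a small sketch with a made-up path (samplePath and safePath are illustrative names). It wraps the text in single quotes and escapes any single quote inside it, so the shell receives the path as one argument:

```applescript
-- Hypothetical path containing a space and an apostrophe
set samplePath to "/Users/me/My Docs/it's.pdf"
set safePath to quoted form of samplePath
-- safePath is now: '/Users/me/My Docs/it'\''s.pdf'

-- ls receives the whole path as a single argument;
-- do shell script raises an error if no such file exists
do shell script "ls -l " & safePath
```

Quoted form is only meant for text handed to do shell script; the quotes become part of the text itself, so it should not be used for paths passed to other AppleScript commands.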

When I use quoted form, the name of the XMP file has quotes in it. But the file is still created at the root of my disk, and its file name is still the path where I want the file to be.

PS: I think the problem is not the path in the exiftool command, as in your example, but the path used to write the file.

Here is the script as I have it now.
(The difference from the previously posted version is that I added a piece of code that places the XMP file in the same location as the PDF file it was extracted from.)


on open (theFiles) --actions when used as a droplet
	extractXMP(theFiles)
end open

on run --actions in normal use
	set theFiles to (choose file with multiple selections allowed without invisibles)
	extractXMP(theFiles)
end run

on extractXMP(theFileList)
	repeat with theFile in theFileList
		
		--save name and path
		set theFilePath to POSIX path of theFile
		set theInfo to info for theFile
		set theFileName to name of theInfo
		
		--build and run the exiftool command
		set mePath to path to me
		set mePathPOSIX to POSIX path of mePath
		set exiftoolPath to (mePathPOSIX & "Contents/Resources/exiftool") as string
		set theShellCommand to exiftoolPath & " -XMP -b" & space & quoted form of theFilePath
		set theXMPcontents to do shell script theShellCommand
		
		--change extension
		set theFileNameChopped to text 1 thru -4 of theFileName
		set theFileName to theFileNameChopped & "xmp"
		
		if theXMPcontents is not equal to "" then
			--build path
			set ASTID to AppleScript's text item delimiters
			set AppleScript's text item delimiters to "/"
			set stringlist to every text item of theFilePath
			set listnr to number of items in stringlist
			set stringlist to items 1 thru (listnr - 1) of stringlist
			set folderPath to stringlist as string
			set AppleScript's text item delimiters to ASTID
			
			set outputPath to folderPath & "/" & theFileName
			display dialog outputPath
			
			--write xmp file
			set textFile to open for access file (quoted form of outputPath) with write permission
			try
				set eof of textFile to 0
				write theXMPcontents to textFile
				close access textFile
			on error e
				close access textFile
				display dialog e
			end try
			
		else
			display dialog "There is no XMP file in " & theFileNameChopped
			
		end if
		
	end repeat
end extractXMP

This is the piece of code I added to the script:

			--build path
			set ASTID to AppleScript's text item delimiters
			set AppleScript's text item delimiters to "/"
			set stringlist to every text item of theFilePath
			set listnr to number of items in stringlist
			set stringlist to items 1 thru (listnr - 1) of stringlist
			set folderPath to stringlist as string
			set AppleScript's text item delimiters to ASTID
			
			set outputPath to folderPath & "/" & theFileName
			display dialog outputPath
			
			--write xmp file
			set textFile to open for access file (quoted form of outputPath) with write permission

Not the file name but the file path, and only in shell script lines.

If the short name of the current user contains a space character, or the app is placed in the Application Support folder, the shell script line will fail. You have to escape any space or special (non-alphanumeric) character in POSIX paths.

Sorry, I made some mistakes in my last post.

I said I used the quoted form of the file name, but I meant the file path.
And the script I posted was an old version (updated now).

There are no spaces or special characters in the path:
outputPath = /Users/jonasyde/School/Eindwerk/extract_xmp/sample_file.xmp
I tested with files on the desktop; that doesn’t work either.

So I think I made a mistake in this line (it used to work before I added the following pieces):

set textFile to open for access file (quoted form of outputPath) with write permission

or in the building of outputPath:

			set ASTID to AppleScript's text item delimiters
			set AppleScript's text item delimiters to "/"
			set stringlist to every text item of theFilePath
			set listnr to number of items in stringlist
			set stringlist to items 1 thru (listnr - 1) of stringlist
			set folderPath to stringlist as string
			set AppleScript's text item delimiters to ASTID
			
			set outputPath to folderPath & "/" & theFileName

And for completeness, the full script (latest version, double-checked):

on open (theFiles) --actions when used as a droplet
	extractXMP(theFiles)
end open

on run --actions in normal use
	set theFiles to (choose file with multiple selections allowed without invisibles)
	extractXMP(theFiles)
end run

on extractXMP(theFileList)
	repeat with theFile in theFileList
		
		--save name and path
		set theFilePath to POSIX path of theFile
		set theInfo to info for theFile
		set theFileName to name of theInfo
		
		--build and run the exiftool command
		set mePath to path to me
		set mePathPOSIX to POSIX path of mePath
		set exiftoolPath to (mePathPOSIX & "Contents/Resources/exiftool") as string
		set theShellCommand to exiftoolPath & " -XMP -b" & space & quoted form of theFilePath
		set theXMPcontents to do shell script theShellCommand
		
		--change extension
		set theFileNameChopped to text 1 thru -4 of theFileName
		set theFileName to theFileNameChopped & "xmp"
		
		if theXMPcontents is not equal to "" then
			--build path
			set ASTID to AppleScript's text item delimiters
			set AppleScript's text item delimiters to "/"
			set stringlist to every text item of theFilePath
			set listnr to number of items in stringlist
			set stringlist to items 1 thru (listnr - 1) of stringlist
			set folderPath to stringlist as string
			set AppleScript's text item delimiters to ASTID
			
			set outputPath to folderPath & "/" & theFileName
			display dialog outputPath
			
			--write xmp file
			set textFile to open for access file (quoted form of outputPath) with write permission
			try
				set eof of textFile to 0
				write theXMPcontents to textFile
				close access textFile
			on error e
				close access textFile
				display dialog e
			end try
			
		else
			display dialog "There is no XMP file in " & theFileNameChopped
			
		end if
		
	end repeat
end extractXMP

open for access (file) expects an HFS path (colon separated).

Actually, this syntax is correct; it should work:

set textFile to open for access file ((path to desktop as text) & theFileName) with write permission

If I use this line of code it works, but when I use the next code, it doesn’t.

			set ASTID to AppleScript's text item delimiters
			set AppleScript's text item delimiters to "/"
			set stringlist to every text item of theFilePath
			set listnr to number of items in stringlist
			set stringlist to items 1 thru (listnr - 1) of stringlist
			set folderPath to stringlist as string
			set AppleScript's text item delimiters to ASTID
			
			set outputPath to folderPath & "/" & theFileName
			display dialog outputPath
			
			--write xmp file
			set textFile to open for access file (quoted form of outputPath) with write permission

the format of outputPath is

/path/to/myFile.ext

for the open command you need

DiskName:path:to:myFile.ext
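
A minimal sketch of that conversion, assuming outputPath holds a POSIX path as in the script above: coercing it with POSIX file gives a file reference that open for access accepts, and no quoted form is involved. The handler name writeXMP is made up for illustration, and writing as UTF-8 is an assumption based on XMP packets usually being UTF-8 encoded.

```applescript
-- Hypothetical helper: write text to a file given by a POSIX path
on writeXMP(theText, outputPath)
	-- "/path/to/myFile.ext" becomes a file reference
	-- (equivalent to DiskName:path:to:myFile.ext)
	set outputFile to POSIX file outputPath
	set fileRef to open for access outputFile with write permission
	try
		set eof of fileRef to 0 -- empty the file if it already exists
		write theText to fileRef as «class utf8»
		close access fileRef
	on error errMsg
		close access fileRef
		display dialog errMsg
	end try
end writeXMP

-- Example call: writes sample_file.xmp to the current user's desktop
set desktopPath to POSIX path of (path to desktop)
writeXMP("sample text", desktopPath & "sample_file.xmp")
```

In the droplet, the call would take theXMPcontents and outputPath in place of the sample values, replacing the open for access line that uses quoted form.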