Using Spotlight in your AppleScripts

by Adam Bell

In my last tutorial article on using the shell in your AppleScripts, I introduced a Unix executable called play for playing music or sounds from a script without running iTunes. In that part of the tutorial, I used a shell process, mdfind, for locating the music to be played whether it was in an iTunes database or an MP3 in your Music folder. mdfind is a tool for accessing Spotlight metadata. This article focuses on some of the many other things this powerful set of tools makes possible. It will get you started using the power of Spotlight from your AppleScripts.

Just What Exactly is Spotlight?

Spotlight is an integrated system-wide service in Tiger (OS X 10.4.x) for harvesting, storing, indexing, and querying metadata. What’s metadata? In simplest terms metadata is data about data - information about the information stored in a file. More formally, "Metadata is structured, encoded data that describe (yes, “data” is plural) characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities", whew. In Mac OS X Tiger, the “described entities” are files with the broadest definition of “file” - nothing else - so only files, folders and volumes have metadata stored for them. You might think of Spotlight on your desktop as being the GUI front end to this technology, but the GUI, in its quest for simplicity, doesn’t do the technology behind it justice.

Why is metadata harvested and used? Because in this age of exploding data storage capacities, finding anything among millions of files is getting increasingly difficult, particularly since the organization of your files is entirely up to you. Metadata in OS X is a block of cache stored on each of your volumes (a partition is a volume) that contains both attributes of every indexed file (creation date, type, name, extension, etc.) and a file content index database. In unsophisticated terms, it works like this (if Spotlight is enabled, of course): You save or use a file and a metadata sniffer extracts the file’s metadata and informs fsevents that there has been a change. fsevents is a kernel-level watcher of file events and it in turn notifies the Spotlight Server to obtain the new data from the sniffer and update the databases in its cache. These are all plist files; one per file. Any program seeking to find a file (like the Finder’s Find window or the Spotlight dropdown or window) queries the metadata framework and the Spotlight Server finds the file data you seek and returns it to the querier.

One word of warning here: if the kernel is very busy (unpacking a very large file, for example), file system events (fsevents) can miss an event (just as folder actions do if the processor is busy with earlier files dropped on the folder), and if the fsevents buffer is full because a slow subscriber is holding it up, the metadata may fail to update. The only way this can be corrected is with a full search to find the new or changed files and update their metadata, and those searches only happen if the system is idle for a while. I’ve found that this occasional failing is most prevalent the higher up the file hierarchy you go - sometimes file ContentLastChanged time for a large folder is not up to date and for volumes, may not even be close. I speculate that that’s because a full volume scan would be required to fix it and if you shut your machine down or put it to sleep when you’re not using it, that never happens.

When you look for something in Spotlight or the Finder, this system is actually searching among Metadata Attributes or in the content index for the file - which is a hash of the words in it. All of this information is also available in the shell using a set of executables whose names begin with “md”: mdls, mdcheckschema, mdfind, mdutil, and mdimport. By using shell scripts for these in your AppleScripts, you can bypass a good deal of work that might be done otherwise by “middle men”, and access data that is not available to an AppleScript in so direct a manner. Of these, mdfind and mdls are the most useful from a scripter’s point of view and I’ll concentrate on those two in this article. Assuming that you have allowed Spotlight fairly free rein on your machine (have not declared big regions “private” in Spotlight’s preferences), there is a lot to be discovered, and that discovery process can become part of your AppleScripts when you have absorbed the examples in this article.

SIDE NOTE: Although it is not well documented by Apple as far as I can tell, it is possible to use Boolean expressions in the Spotlight GUI if you remember these rules:
AND is the default where words are separated by spaces, and both negating and grouping use parentheses. For OR and NOT, you construct the query as follows: Lenin|Trotsky(-Stalin), which is equivalent to "Lenin OR Trotsky NOT Stalin". Note that there are NO SPACES in that query, that the pipe "|" is OR, and that (-a_word), with the parentheses required, is NOT a_word. Other combinations are possible as well: (Lenin|Trotsky) Stalin (note that there is a space) is equivalent to "(Lenin OR Trotsky) AND Stalin". Finally, enclosing your text in quotes does not "glob" the text as it does in Google searches, for example; it confines the search to the names of files and ignores their content.
I didn't discover these rules by experiment; I first saw them in Ipse dixit (http://www.hiram.nl/ipsedixit/), but you will have to search for "Spotlight" in the site's search box to find them (the links to the specific pages are not reliable). Later, I found this macdevcenter.com article: "The Power of mdfind" (http://www.macdevcenter.com/pub/a/mac/2006/01/04/mdfind.html?page=1) and realized that the same rules could apply in the GUI. You should also be aware that a possible reason why this is undocumented by Apple, aside from the newness of Spotlight, could be that it doesn't always work without some fiddling; Boolean searches can be rather fussy in my experience.

Why Go There?

I was first exposed to mdfind when Cameron Hayne (a prolific contributor to and moderator of the MacOSXHints forums, and a freelance consultant on things Unix) posted a quick way to list all the applications on a hard disk. I was astonished at how fast that search was compared to any vanilla AppleScript I could think of and was inspired to explore the possibilities of using Spotlight for AppleScript searches. It was made more appealing when I discovered that these searches do not require opening the application to which a file belongs (as vanilla AppleScripts do) to access its Spotlight metadata.

The two most useful tools, mdls and mdfind, are described in very scant detail in these Apple references (one can only hope that Apple will gradually improve this documentation, but Spotlight is new): Introduction to Spotlight Metadata Attributes Reference within which there is a link to Query Expression Syntax. Unfortunately, none of Apple’s Spotlight references are even remotely targeted to an AppleScripter’s needs - they are really aimed at helping application developers, though the Syntax article does acknowledge the needs of command line scripters in a shell. I found that some experiments were very helpful in exploring what can be done and this article will deal with some of what I have discovered so far, but it is by no means a complete exposé of Spotlight as used from the shell.

So What Can These Tools Do?

For openers, mdfind can be used to find the POSIX path to any file or files containing the text indicated (as you would type it in the Spotlight drop down on the menubar and get a list under it). It does not return a list; it returns a string with one path per line, so ask for its paragraphs if you want a list. Because the shell splits its arguments at spaces, you should put the text in single quotes or use “quoted form of” on the search string if it contains more than one distinct word. Further, the syntax rules outlined in the side note above for use in the Spotlight GUI will also work in query expressions using mdfind. The pipe and the parentheses mean something to the shell as well, though, so such a query must always be quoted, e.g., do shell script "mdfind " & quoted form of "Lenin|Trotsky(-Stalin)", assuming you had any such documents.
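As a minimal sketch of why the quoting matters (the variable names theQuery and theCommand are mine, chosen only for illustration), it can help to build the command text first and look at it before handing it to the shell. The actual search is left commented out so the snippet can be run anywhere:

```applescript
-- Build the command text first so it can be inspected before it is run.
set theQuery to "Lenin|Trotsky(-Stalin)" -- the example query from the side note
set theCommand to "mdfind " & quoted form of theQuery
--> "mdfind 'Lenin|Trotsky(-Stalin)'"

-- Uncomment to run the search; mdfind returns one POSIX path per paragraph.
-- set theHits to paragraphs of (do shell script theCommand)
theCommand
```

With the single quotes in place, the shell passes the pipe and parentheses through to mdfind untouched instead of trying to interpret them itself.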

A quick example is that since the name of a file includes its extension, you can search for extensions this way, but be forewarned that this will also find references to scripts in other documents (such as this one on my machine) since the fragment is not being treated as a name extension but simply as a string of text. As far as Spotlight is concerned, content includes the name of the file. A feature of the mdfind approach, shared by the Finder’s Find, is that you can “aim” your search at particular folders or volumes and as we will see shortly, you can “aim” mdls at a particular attribute as well. I find that process rather awkward in the Finder and the details there are not scriptable except in the GUI scripting sense. Clearly, as we will see shortly, mdfind can be used to provide the file reference for mdls, i.e., find something and get the value of an attribute in one do shell script call.

Here’s a script example that will find the path to every document on your boot volume containing “.scpt” in its name or content. Clearly, there are lots of ways to define such a query, Q, in the script below (using boolean operators, for example), and an equally large number of ways to define where to search, but in POSIX terms the argument of -onlyin in the script is “/” which is the POSIX path to your boot volume. Using this, the search will ignore any other volume(s) on your hard disk(s) that are Spotlight-enabled (and will always ignore them anyway if you’ve made them “private” in Spotlight’s preferences). I should also mention that if a particular volume is mounted only briefly to add some files and then immediately dismounted (as in backing up to an external firewire disk or a memory stick, say), then Spotlight may not have updated the metadata for that disk and your query will get “old” data.


set Q to quoted form of ".scpt" -- to make clear what the query is, but could be in the shell script.
-- quoted form is not really necessary for ".scpt", but it's good practice in shell scripts and does no harm in a query argument.
set T to paragraphs of (do shell script "mdfind -onlyin / " & Q) -- mdfind returns a newLine delimited string

As mentioned above, this will also find files in which “.scpt” appears. We can, however, be more specific and “aim” our query at name extensions or even more specifically, at compiled scripts. There are more ways to get there later in the article, but in this instance, if you enclose your string in escaped quotes, \", (so they are included in the query), then Spotlight will search for and return only file names and will not search its content indices. Using this filter, we can find all our compiled scripts thus:


set Q to quoted form of "\".scpt\"" -- note the escaped "inner" quotes. The outer ones are for AppleScript's setting Q to the text.
set T to paragraphs of (do shell script "mdfind -onlyin / " & Q)

where T is a list of the POSIX paths to the scripts found anywhere on your boot volume. Leave out the -onlyin /, and you will find AppleScripts anywhere on any mounted, Spotlight-enabled volume. Try the script - it’s amazingly fast. Recall that the same “trick” works in the Spotlight dropdown. You can always find the POSIX paths to files that include a given string in their names this way and because content is not searched, it’s fast. If you want an HFS path, coerce a POSIX file to text: (POSIX file "/POSIX/path/to/file") as text.

File Metadata

To search our file system more methodically and with greater specificity we must use the metadata types stored with the files (just as Spotlight does). Our first step then is to explore the kinds of metadata that are typically stored for files so we can use it in an mdfind call. To explore a file’s metadata, we use mdls (think metadata list).


-- Pick the file to examine.
-- The shell uses POSIX paths and some file names have spaces  
-- so use quoted form & POSIX path for what would be an alias:
set P to quoted form of (POSIX path of (choose file without invisibles))
-- Collect the metadata for the chosen file.
set MetaData to (do shell script "mdls " & P) -- returns a newLine delimited string.
display dialog MetaData
MetaData

Note that this is formatted for a monospaced font in the Terminal using spaces to align the equal signs. This doesn’t align them in a display dialog’s font, however. Also note that if the data extends beyond the height of your dialog, you will have to look at the results pane of your Script Editor to see all of it. The last item, often kMDItemUsedDates, can run to many entries if you open or modify the file frequently.
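Since mdls hands back plain text, a small handler can pick one value out of it. What follows is only a sketch under stated assumptions: the handler name valueForKey is my own invention, it assumes each attribute sits on one line in the form "key = value", and it is tried here on a made-up two-line sample rather than on live mdls output.

```applescript
-- Return the text that follows " = " on the line whose first word is theKey,
-- or missing value if there is no such line.
-- Quotes around text values are kept; for a multi-line value such as
-- kMDItemContentTypeTree only the opening "(" is returned.
on valueForKey(theKey, mdlsText)
	repeat with aLine in paragraphs of mdlsText
		set thisLine to contents of aLine
		if (count words of thisLine) > 0 then
			if word 1 of thisLine is theKey then
				set eqOffset to offset of " = " in thisLine
				if eqOffset > 0 then return text (eqOffset + 3) thru -1 of thisLine
			end if
		end if
	end repeat
	return missing value
end valueForKey

-- A made-up sample in the same layout that mdls uses.
set sample to "kMDItemKind    = \"MPEG-4 Audio File\"" & return & "kMDItemFSSize  = 4245331"
set theSize to valueForKey("kMDItemFSSize", sample) --> "4245331"
```

Comparing the first word of each line, rather than testing whether the line merely starts with the key, keeps a short key from matching a longer one that begins with the same letters.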

To get the value of a particular metadata attribute you use the “-name” option followed by the attribute key you are interested in. Say we wanted the version number of iTunes in your Applications folder. You wouldn’t normally get it this way, since plain vanilla AppleScript can often do this faster (though that is not always true, as we will see shortly), but it makes an easy illustration with a ready comparison:


-- Using mdls in a shell script (as an illustration)
set AN to "kMDItemVersion" as Unicode text
set F to POSIX path of ((path to applications folder as text) & "iTunes.app")
set v to (do shell script "mdls -name " & AN & space & quoted form of F) -- quoted in case the path ever contains a space
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to AN & " = \""
set Vers to (text item 2 of v)
set AppleScript's text item delimiters to "\""
set kMD_Vers to text item 1 of Vers
set AppleScript's text item delimiters to tid
-- Or in AppleScript (vanilla)
-- using POSIX file rather than repeating the location, F, in HFS terms.
set AS_Vers to short version of (info for (POSIX file F))
{kMD_Vers, AS_Vers} --> {"7.0.2", "7.0.2"}

When you can get the data you want directly from “info for…”, however, there’s no point (or utility) in pursuing it from the shell. Where the shell technique shines is in getting information that is not available to an AppleScript; like the recent history of the file’s usage, for example, shown next.


set tFile to (choose file)
set tPath to quoted form of (POSIX path of tFile)
set tName to name of (info for tFile)
set MD to do shell script "mdls -name kMDItemUsedDates " & tPath
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "= ("
set tDates to text item -1 of MD
set AppleScript's text item delimiters to ")"
set tDates to (text item 1 of tDates) as text
set AppleScript's text item delimiters to tid
set Usage to {}
-- omit first paragraph which is blank (as is the last) and skip
-- the last entry which is always when you run this test because
-- that counts as a "use" to file system events.
repeat with k from 2 to (count paragraphs of tDates) - 3
	set Usage's end to (text 1 thru -8 of paragraph k of tDates) & return
end repeat
set Usage to Usage as text
display dialog "The file \"" & tName & "\" was used as follows:" & return & return & Usage with title "File Usage Dates Before Now"

As another example, we all know that the “Finder” can locate the original file corresponding to an alias in another Finder window and it’s not outrageously difficult to find them either:


tell application "Finder"
	set O to {}
	set A to (files in entire contents of (choose folder) where its kind is "Alias")
	repeat with f in A
		set O's end to original item of f
	end repeat
end tell

But there’s an alternative to using the Finder to find Alias files, though not to finding their originals: we can use Spotlight metadata. We can’t follow an alias to its original in UNIX because the UNIX API doesn’t include them - they are creatures of the Carbon and Cocoa APIs and we only get the hint from the metadata for an alias that it is resolvable, but not what it resolves to.


set W to quoted form of (POSIX path of (choose folder with prompt "Find Aliases and Targets Where?"))
set tA to paragraphs of (do shell script "mdfind -onlyin " & W & " 'kMDItemKind == \"Alias\"'")
set AliasToOrig to {}
set BrokenAlias to {}
repeat with Al in tA
	set hf to POSIX file (contents of Al)
	try
		tell application "Finder" to set AliasToOrig's end to {hf as alias, ((original item of hf) as alias)}
	on error
		set BrokenAlias's end to hf as alias
	end try
end repeat

It is natural to wonder which of these methods for finding aliases is faster; is it worth going to the shell to do the job with mdfind or is a filtered Finder or “info for” instruction better? The following test tells all. It asks for a folder; be warned not to choose something as large as your Documents folder, nor to increase the number of trials much beyond 20, unless you want to wait. I chose a folder containing 213 files in 28 internal folders (my master files for these tutorials, in fact) and among them there are only two aliases to files that are not in the chosen folder.


set Ratios to lotsa(20) -- (Nigel Garvey's test template as I use it.)

on lotsa(many)
	-- Any common values for test items here.
	set Fh to (choose folder)
	set Fp to quoted form of (POSIX path of Fh) -- quoted so that folder names containing spaces survive the shell
	-- Dummy loop to absorb a small observed
	-- time handicap in the first repeat of two tests.
	-- Without it, the results are not symmetric if you reverse the order.
	repeat many times
	end repeat
	
	-- Test 1.
	set t to GetMilliSec
	repeat many times
		-- First test code or handler call here.
		set tA to paragraphs of (do shell script "mdfind -onlyin " & Fp & " 'kMDItemKind == \"Alias\"'")
	end repeat
	set tSh to ((GetMilliSec) - t) / 1000 --> in seconds
	
	-- Test 2.
	set t to GetMilliSec
	repeat many times
		-- Second test code or handler call here.
		tell application "Finder" to set A to (files in entire contents of Fh where its kind is "Alias")
	end repeat
	set tF to ((GetMilliSec) - t) / 1000 --> in seconds
	
	-- Timings.
	-- Do it both ways unless you know for sure which is faster or don't care which form you get.
	return {tSh / tF, tF / tSh} --> {0.413200221852, 2.420134228188}
end lotsa

Bottom Line: the shell script version is about 2.4 times as fast as the Finder version so it’s worth it to accept the shell script latency (on my machine about 50 msec). It gets particularly “worth it” if you want all the aliases on your boot volume, for example.

Another example of an mdfind win is to get a list of all the applications on your startup disk. In this instance, we have four ways to go about it using metadata keys: ‘kMDItemContentTypeTree == "com.apple.application"c’ (where c is equivalent to ignoring case), ‘kMDItemKind == "Application"’, ‘kMDItemContentTypeTree == "com.apple.application-file"’ and ‘kMDItemContentTypeTree == "com.apple.application-bundle"’. Using the script below you can examine the differences between these forms for yourself - they won’t return the same number of file paths.


set AA to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemContentTypeTree == \"com.apple.application\"c'")
set AB to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemKind == \"Application\"'")
set AC to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemContentTypeTree == \"com.apple.application-bundle\"'")
set AD to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemContentTypeTree == \"com.apple.application-file\"'")
set C to {count AA, count AB, count AC, count AD} --> {1127, 1091, 780, 347} on my machine
-- List the differences between the forms
set D to {} -- difference between AA and AB
set E to {} -- difference between AA and AC
set F to {} -- difference between AA and AD
repeat with P in AA -- What's in AA that's not in AB
	tell contents of P to if it is not in AB then set D's end to it
end repeat
repeat with Q in AA -- What's in AA that's not in AC
	tell contents of Q to if it is not in AC then set E's end to it
end repeat
repeat with R in AA -- What's in AA that's not in AD
	tell contents of R to if it is not in AD then set F's end to it
end repeat

It takes 3.1 seconds on my machine to do all four searches of my entire root volume containing in excess of 600,000 files. There is no contest between these forms and the same task in an AppleScript using the Finder. It is left as an exercise for the reader (remember that from textbooks?) to discover the differences between the lists returned, i.e., what files these metadata attributes include.

As a final example before considering more complex attributes, the following script finds the HFS path to all of the scripts in either of my library folders.


-- Get them in POSIX form
-- The two mdfind calls are separated by a semicolon so the shell runs them one after the other.
set S to paragraphs of (do shell script "mdfind -onlyin /Library/Scripts 'kMDItemContentType == \"com.apple.applescript.script\"c'; mdfind -onlyin ~/Library/Scripts 'kMDItemContentType == \"com.apple.applescript.script\"c'")
-- Convert them in place to HFS form
repeat with F in S
	set contents of F to (POSIX file (contents of F)) as Unicode text
end repeat -- S is the path to every script

Apple’s System-Declared Uniform Type Identifiers (UTIs)

In the scripts above, we were finding applications and scripts on the basis of what Apple calls their UTIs. Apple publishes a long list of the system-declared UTIs, but developers are free to declare others of their own invention for their own files, and Apple’s list does not include those. For further examples of how some of these are used in a search, consider the next four scripts:


set AA to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemContentTypeTree == \"com.apple.resolvable\"'")
-- An alternative with the same results on my machine
set AC to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemContentTypeTree == \"com.apple.alias-file\"'")

will find all aliases in the specified range that the Alias Manager considers resolvable or that are of type alias-file.


set DI to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemContentTypeTree == \"public.disk-image\"'")

will find all the disk image (.dmg) files in the “-onlyin” directory you specify.


set PD to paragraphs of (do shell script "mdfind -onlyin / 'kMDItemContentTypeTree == \"com.adobe.pdf\"'")

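will find PDF files. Despite the name, com.adobe.pdf is simply the system-declared UTI for the PDF format; it says nothing about which application created the file or opens it.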


set PD to paragraphs of (do shell script "mdfind -onlyin ~/Documents/ 'kMDItemCreator == \"Windows NT 4.0\"'")

will find all files in your Documents folder whose recorded creator is “Windows NT 4.0”, i.e., files that originated on a Microsoft NT 4.0 server.

Combining Finding (mdfind) with Listing Values (mdls) of Metadata

Consider finding songs that are in an iTunes database. First, we need the available metadata, and since you don’t know the POSIX path to an individual song in the database, we’ll search for one (you’ll have to substitute a song title you know is in your own iTunes library). I’ve appended the full metadata string returned for one of mine so you can see what is returned from an iTunes library entry.


set F to do shell script "mdfind -onlyin ~/Music/iTunes 'When You Say Nothing At All'"
set MetaData to (do shell script "mdls " & quoted form of F)
--> (note that the first line is the path as the system "knows" it.)
(*
/Users/bellac/Music/iTunes/iTunes Music/Alison Krauss & Union Station/Live [Disc 2]/2-08 When You Say Nothing At All.m4a -------------
kMDItemAlbum                    = \"Live [Disc 2]\"
kMDItemAttributeChangeDate      = 2006-02-09 15:22:47 -0400
kMDItemAudioBitRate             = 127992
kMDItemAudioChannelCount        = 2
kMDItemAudioEncodingApplication = \"iTunes v6.0.2, QuickTime 7.0.4\"
kMDItemAudioTrackNumber         = 8
kMDItemAuthors                  = (\"Alison Krauss & Union Station\")
kMDItemCodecs                   = (AAC)
kMDItemComposer                 = \"Don Schlitz/Paul Overstreet\"
kMDItemContentCreationDate      = 2006-02-09 15:22:09 -0400
kMDItemContentModificationDate  = 2006-02-09 15:22:47 -0400
kMDItemContentType              = \"public.mpeg-4-audio\"
kMDItemContentTypeTree          = (
    \"public.mpeg-4-audio\", 
    \"public.audio\", 
    \"public.audiovisual-content\", 
    \"public.data\", 
    \"public.item\", 
    \"public.content\"
)
kMDItemDisplayName              = \"2-08 When You Say Nothing At All.m4a\"
kMDItemDurationSeconds          = 261.8978684807256
kMDItemFSContentChangeDate      = 2006-02-09 15:22:47 -0400
kMDItemFSCreationDate           = 2006-02-09 15:22:09 -0400
kMDItemFSCreatorCode            = 1752133483
kMDItemFSFinderFlags            = 0
kMDItemFSInvisible              = 0
kMDItemFSIsExtensionHidden      = 0
kMDItemFSLabel                  = 0
kMDItemFSName                   = \"2-08 When You Say Nothing At All.m4a\"
kMDItemFSNodeCount              = 0
kMDItemFSOwnerGroupID           = 20
kMDItemFSOwnerUserID            = 501
kMDItemFSSize                   = 4245331
kMDItemFSTypeCode               = 1295270176
kMDItemID                       = 1081807
kMDItemKind                     = \"MPEG-4 Audio File\"
kMDItemLastUsedDate             = 2006-02-09 15:22:09 -0400
kMDItemMediaTypes               = (Sound)
kMDItemMusicalGenre             = \"Country\"
kMDItemStreamable               = 0
kMDItemTitle                    = \"When You Say Nothing At All\"
kMDItemTotalBitRate             = 127992
kMDItemUsedDates                = (2006-02-09 15:22:09 -0400)
*)

As you can see, that is a lot of information. Now, suppose you wanted to know only the last time you listened to a song. You don’t have to get the full metadata listing for a chosen song to get it; use the attribute name as we did above for the iTunes version and the usage dates. Some further explanation is required before we go there, however, because we don’t want to use a do shell script “mdfind…” followed by do shell script “mdls…”; why make two shell calls, each a separate process to be spawned in its own thread, if one will do?

I wanted to use mdfind to find a song by title or some part of it, and then to use mdls to get the value of kMDItemLastUsedDate. mdls will not accept a pipe from mdfind - it requires an explicit path to a file in the instruction, or it errors. We can, however, combine the queries in the shell instruction itself as follows:

First, F=`mdfind -onlyin ~/Music/iTunes/ " & quoted form of Q & "` runs mdfind in the iTunes folder in my Music folder with the query supplied by the text returned from a dialog. Note that the marks preceding and following the mdfind instruction are “back ticks” (grave accents); they are NOT single quotes. The leading F=`mdfind...` sets the shell variable “F” to the results of the mdfind call in the back ticks: the path to a file if a unique one is found (we can’t deal with a list here). Then in the second half of the shell call, following a semicolon to terminate the first, we use the variable just defined, preceded by a dollar sign. This is the mdls -name kMDItemLastUsedDate "$F" part, with the $F in quotes (if in Terminal) and in escaped quotes for a “do shell script” call as shown in the script below.


set Q to (text returned of (display dialog "Enter a Unique Song Title" default answer "Song Title Here" with title "Find Last Played Date") as Unicode text)
try
	set LastPlayed to last paragraph of (do shell script "F=`mdfind -onlyin ~/Music/iTunes/ " & quoted form of Q & "`;mdls -name kMDItemLastUsedDate  \"$F\"")
on error
	display dialog "Title \"" & Q & "\" was not found or was not sufficiently unique to return a single result."
	return
end try
-- Now dig out the date and time we want.
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "kMDItemLastUsedDate = "
set E to text item 2 of LastPlayed
set AppleScript's text item delimiters to space
set F to text items of E
set AppleScript's text item delimiters to tid
set tDate to reverse of (words of item 1 of F)
set AppleScript's text item delimiters to "/"
set tDate to tDate as text
set AppleScript's text item delimiters to tid
set tTime to item 2 of F
display dialog "\"" & Q & "\"" & " was last played on" & return & tDate & " at " & tTime

In more general terms, the script that follows displays all of the metadata associated with any song in your iTunes library. It has no error checking, so it will error if there is no matching song or more than one, just as the script above would without its try block; I leave it to you to fix that with a try block of your own.


set tSong to quoted form of (text returned of (display dialog "Enter a unique part of a song title" with title "Find Song Data" default answer "Song Title Here")) as Unicode text
display dialog (do shell script "Song=`mdfind -onlyin ~/Music/iTunes/ " & tSong & "`;mdls \"$Song\"")

The Boolean Syntax of Shell Searches for Metadata Attributes

Spotlight shell searches can include boolean operations and some “globbing”, as described in Apple’s “Query Expression Syntax”. This page describes the rules for searching with Spotlight from the command line. In that document the syntax for combining metadata attributes is not the same as for a text entry in mdfind: AND is &&, OR is ||, equal to is ==, not equal to is !=, and some wild cards like “*” are possible, to name only a few of the rules. Further, there are key words for comparing dates and times. Using these we can target an mdfind query with conditions like these (from the reference):

(kMDItemAuthors == "Kevin"wc || kMDItemAuthors == "Steve"wc ) && (kMDItemContentType == "audio"wc || kMDItemContentType == "video"wc )
(The wc suffix makes each comparison word-based and case-insensitive.)
This reads in English: Author contains “Steve” or “Kevin” and content type contains “audio” or “video”.
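Queries of this shape are tedious to type with all their escaped quotes, so here is a hedged sketch of a helper that assembles one. The handler name anyOf and its three parameters are hypothetical, not anything Apple provides, and values that themselves contain quote marks are not handled.

```applescript
-- Test one attribute against several values, joined with || and wrapped in parentheses.
-- modifiers is the suffix placed after each quoted value, e.g. "wc".
on anyOf(attrName, valueList, modifiers)
	set parts to {}
	repeat with v in valueList
		set end of parts to attrName & " == \"" & (contents of v) & "\"" & modifiers
	end repeat
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " || "
	set joined to parts as text
	set AppleScript's text item delimiters to tid
	return "(" & joined & ")"
end anyOf

set theQuery to anyOf("kMDItemAuthors", {"Kevin", "Steve"}, "wc") & " && " & anyOf("kMDItemContentType", {"audio", "video"}, "wc")
--> (kMDItemAuthors == "Kevin"wc || kMDItemAuthors == "Steve"wc) && (kMDItemContentType == "audio"wc || kMDItemContentType == "video"wc)

-- The whole query goes to mdfind as one single-quoted argument.
set theCommand to "mdfind " & quoted form of theQuery
```

Keeping the query text in a variable of its own also makes it easy to log or display when a search returns nothing and you want to see exactly what was asked.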

How do we use this? Suppose you wanted to classify a huge folder of mixed-type images according to their dimensions and resolution, but wanted only the JPEG image files among them. The following script would return the POSIX paths to files that met the specifications. Pay attention to the parentheses inside the single quotes, too; they group the conditions:


set OKPic to (do shell script "mdfind -onlyin /Users/MyUser/Pictures/Photos/ 'kMDItemKind == \"JPEG document\" && ((kMDItemPixelWidth >= 1500 && kMDItemPixelHeight >= 1500) || kMDItemResolutionHeightDPI >= 300)'")

Running a repeat loop on the paragraphs of OKPic and using mdls, we could then quite efficiently sort out those we found in the search in which the width was greater than height to go to a “Hi-Res Landscape JPEGs” folder, with the rest going to a “Hi-Res Portrait JPEGs” folder. That script would be much faster, I believe, than telling Image Events to open every file in the folder whose name extension was in {“jpg”, “JPEG”} and testing for those properties.
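A rough sketch of the sorting step might look like the following. The handler names numberForKey and orientationOf are mine, the sample text is made up to mimic mdls output, and the loop that would call mdls on each found path is left as comments because it depends on what your own search returned.

```applescript
-- Pull the number that follows " = " on the line for one attribute.
on numberForKey(theKey, mdlsText)
	repeat with aLine in paragraphs of mdlsText
		set thisLine to contents of aLine
		if thisLine starts with theKey then
			set eqOffset to offset of " = " in thisLine
			if eqOffset > 0 then return (text (eqOffset + 3) thru -1 of thisLine) as number
		end if
	end repeat
	error "No value found for " & theKey
end numberForKey

-- Classify one image from its mdls report; square images count as portrait.
on orientationOf(mdlsText)
	set w to numberForKey("kMDItemPixelWidth", mdlsText)
	set h to numberForKey("kMDItemPixelHeight", mdlsText)
	if w > h then return "landscape"
	return "portrait"
end orientationOf

-- A made-up report in the layout mdls uses.
set report to "kMDItemPixelHeight = 1200" & return & "kMDItemPixelWidth  = 1600"
set theShape to orientationOf(report) --> "landscape"

-- Usage sketch, assuming OKPic holds the mdfind result from the script above:
-- repeat with p in paragraphs of OKPic
-- 	set report to do shell script "mdls -name kMDItemPixelWidth -name kMDItemPixelHeight " & quoted form of (contents of p)
-- 	if orientationOf(report) is "landscape" then
-- 		-- move the file to the landscape folder
-- 	end if
-- end repeat
```

Asking mdls for only the two attributes needed keeps each report to two short lines, which is what makes the simple line-by-line parsing adequate here.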

Alternatively, we could label the files that met our criteria in their Finder window:


set OKPic to paragraphs of (do shell script "mdfind -onlyin /Users/myUser/Pictures/ 'kMDItemPixelWidth >= 1500 && kMDItemPixelHeight >= 1500 || kMDItemResolutionHeightDPI >= 300'")
-- Now convert the paths and use the Finder to label the files.
repeat with p in OKPic
	set thePic to (POSIX file (contents of p)) as alias -- p is a reference to a list item, so take its contents first
	tell application "Finder" to set label index of thePic to 2
end repeat

As a final example using time tests, suppose we were interested in opening a clever script we had downloaded from bbs.applescript.com and filed in ~/Library/Scripts/ within the last few days. Since it was new, at least to us, we’d look for its kMDItemContentCreationDate some time back from yesterday. Because I lose my scripts rather too often by keeping them in nearly 30 folders inside my Script folder and forgetting what I called them, this is a handy way for me to find out where I put it. It finds all my Scripts that have been created during a selectable time back (in days) and presents me with a “choose from list” of their names. Choosing one (i.e., recognizing the name I couldn’t recall) opens it. All that happens in just two lines with one shell call.

set N to (text returned of (display dialog "How many days back should be considered?" default answer "7" with title "Open a Lost Script"))
open POSIX file (choose from list (paragraphs of (do shell script "mdfind -onlyin ~/Library/Scripts 'kMDItemContentCreationDate < $time.today && kMDItemContentCreationDate >= $time.today(-" & N & ")'")))
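
Because the text typed into the dialog is pasted straight into the shell command, it may be worth checking that it really is a whole number first. The following is a minimal sketch of that check done on the shell side; the function name recent_query is hypothetical, and the query text is simply the one used above.

```shell
# recent_query (hypothetical helper): prints the Spotlight query for items
# created in the last N days, or fails if N is not a whole number.
recent_query() {
  case "$1" in
    ''|*[!0-9]*)
      echo "days must be a whole number" >&2
      return 1
      ;;
  esac
  # Single quotes keep $time.today away from shell variable expansion.
  printf 'kMDItemContentCreationDate < $time.today && kMDItemContentCreationDate >= $time.today(-%s)\n' "$1"
}

recent_query 7
# prints: kMDItemContentCreationDate < $time.today && kMDItemContentCreationDate >= $time.today(-7)
```

On a Mac the result could then be handed to mdfind as a single quoted argument, for example mdfind -onlyin ~/Library/Scripts "$(recent_query 7)".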

Special Metadata Searching Provisions

We saw in the preamble to this article that Spotlight is file-based, but we also know that Safari Bookmarks, Address Book entries, and iCal calendars are not single-file based, i.e., an individual iCal event, A/B entry, or Safari Bookmark is not a single file; it’s a record in a database file. Safari’s Bookmarks (and similarly Camino’s) are stored in a single plist file (~/Library/Safari/Bookmarks.plist), Address Book uses a data file and two index files (~/Library/Application Support/AddressBook/AddressBook.data, ABPerson.skIndexInverted, and ABSubscribedPerson.skIndexInverted, where sk stands for Search Kit), and finally, iCal maintains a directory for each Calendar (~/Library/Application Support/iCal/Sources/UUID.calendar, where UUID identifies the user and calendar). How, then, can we search for these things if they’re not single files?

This is made possible because the system stores individual files for A/B entries, Safari Bookmarks, and iCal events in ~/Library/Caches/. You can see them with these instructions:


-- To see the Metadata Caches on your machine:
set MetadataCaches to do shell script "ls ~/Library/Caches/Metadata/"
-- on myMachine: Camino, Chronos Notes, Safari, Transmit, yojimbo, iCal

-- For Safari's uncomment the next line (remove the (* and *) lines only)
(*
set SB to do shell script "ls ~/Library/Caches/Metadata/Safari/"
*)
-- each with the name extension "webbookmark"

-- For the Address Book, A special Cache not in /Caches/Metadata
(*
set AB to do shell script "ls ~/Library/Caches/com.Apple.AddressBook/Metadata/"
*)
-- each name ending with "ABPerson.abcdp"

-- For iCal, uncomment the section below.
(*
set IC to do shell script "ls ~/Library/Caches/Metadata/iCal/"
-- Long hex-coded file names, one for each calendar.
-- Assuming you have at least one:
set C to paragraph 1 of IC
set aCal to do shell script "ls ~/Library/Caches/Metadata/iCal/" & C
-- a string of files ending in "-.icalevent"
*)

If you wanted to count your Safari (or Camino) Bookmarks, this would do it:


set BMs to paragraphs of (do shell script "mdfind 'kMDItemContentType == \"com.apple.safari.bookmark\"'")
set SBM to {}
repeat with abm in BMs
	if contents of abm contains "Safari" then set end of SBM to contents of abm
end repeat
set beginning of SBM to ((count SBM) as text) & " bookmark cache files listed"
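
If only the count is wanted, the filtering loop could in principle be folded into the shell call itself by piping the mdfind output through grep -c. The sketch below shows just the counting step on canned paths; the paths and the function name count_safari_bookmarks are hypothetical stand-ins for real mdfind output.

```shell
# count_safari_bookmarks (hypothetical helper): counts the lines on stdin
# that mention "Safari", mirroring the 'contains "Safari"' test above.
count_safari_bookmarks() {
  grep -c 'Safari'
}

# Made-up cache paths standing in for mdfind results.
printf '%s\n' \
  '/Users/me/Library/Caches/Metadata/Safari/Bookmarks/one.webbookmark' \
  '/Users/me/Library/Caches/Metadata/Camino/two.webbookmark' \
  '/Users/me/Library/Caches/Metadata/Safari/Bookmarks/three.webbookmark' |
  count_safari_bookmarks
# prints: 2
```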

iCal is searchable through its Metadata and doesn’t have to be running for the search to succeed. Here’s an example for finding my next dental appointment (obviously, you’ll have to put in an event name from any of your own calendars):


-- Find the calendar cache for the event containing the name (of my dentist, e.g.) after today
set D to paragraph -1 of (do shell script "mdfind 'kMDItemTitle == \"Dr. Penwell\" && kMDItemDueDate > $time.today'") as Unicode text
-- Get the value of Date and Time for this event (not combined to keep it clearer but recall they can be)
set DT to paragraph -1 of (do shell script "mdls -name kMDItemDueDate " & quoted form of D) -- quoted form protects paths containing spaces
-- Prepare a notification (in AppleScript rather than the shell)
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "kMDItemDueDate = " as Unicode text
set Apt to text item 2 of DT
set AppleScript's text item delimiters to "/"
set tDay to (reverse of (words 1 thru 3 of Apt)) as text
set AppleScript's text item delimiters to ":"
set tTime to (words 4 thru -3 of Apt) as text
set AppleScript's text item delimiters to tid
set msg to "See Dentist on " & tDay & " at " & tTime
--> "See Dentist on 17/04/2007 at 11:45"
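
For comparison, the same notification text could be prepared in the shell instead of with text item delimiters. This is a minimal sketch that assumes the mdls line has the shape "kMDItemDueDate = 2007-04-17 11:45:00 -0400", which is what the AppleScript parsing above implies; the function name format_due_date and the sample line are hypothetical.

```shell
# format_due_date (hypothetical helper): turns one mdls output line into
# the reminder text built by the AppleScript above.
format_due_date() {
  printf '%s\n' "$1" | awk '{
    split($3, d, "-")   # d[1]=year, d[2]=month, d[3]=day
    split($4, t, ":")   # t[1]=hour, t[2]=minute
    printf "See Dentist on %s/%s/%s at %s:%s\n", d[3], d[2], d[1], t[1], t[2]
  }'
}

format_due_date 'kMDItemDueDate = 2007-04-17 11:45:00 -0400'
# prints: See Dentist on 17/04/2007 at 11:45
```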

As a final example of searching these special files, here is how to find the telephone number or numbers for a name in your Address Book (without running Address Book.app):


set tName to text returned of (display dialog "Enter a known Address Book entry as it appears there" default answer "Cadabra Abra" with title "Look Up Phone Numbers") as Unicode text
-- defaults to the well-known genie and the entry in my Address Book that I use for testing. ;-)
try -- combined form for one call, errors for 0 or more than 1 value
	set dataAB to (do shell script "P=`mdfind 'kMDItemContentType == com.apple.addressbook.person && kMDItemDisplayName == \"" & tName & "\"'`;mdls -name kMDItemPhoneNumbers \"$P\"")
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "kMDItemPhoneNumbers ="
	set PN to text item 2 of dataAB
	set AppleScript's text item delimiters to tid
	set PhNums to text 3 thru -2 of PN
	display dialog PhNums
on error -- Either no such name found or too many found if the name appears more than once.
	display dialog "Sorry, no such name appears." & return & return & "Did you enter it as it is in your Address Book.app?" & return & return & "Normally: Last_Name First_Name" with icon 0
end try

Enough. If I haven’t put you to sleep, I’ll quote my Grandmother’s standard bedtime story ending: “… and I stepped on a twig; and the twig bended; and the story ended”, to which there was a resounding “Aww, one more Gramma”, but there never was. Hope you’ve enjoyed this semi-exposé as much as I enjoyed exploring it. Have fun exploring it yourself.

Can Spotlight be limited to search only the top level folders inside a folder? For example, I would be interested in searching the contents of a folder for any top level folders inside it that had the characteristic kMDItemFSLabel = 0 (no label color), but I don’t want Spotlight to return results that point to the sub folders in those unlabeled folders. Can Spotlight do that?

Thanks,

DS

Unfortunately not, DS.

The unix tool “find” understands depth but not label index, while mdfind understands label index but depth is not a metadata property.
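
The depth half of that statement can be sketched as follows. The example lists only the folders directly inside a directory; it says nothing about labels, which find cannot see. The function name top_level_folders and the scratch directory layout are hypothetical.

```shell
# top_level_folders (hypothetical helper): prints the directories that sit
# directly inside "$1", without descending into them.
top_level_folders() {
  find "$1" -mindepth 1 -maxdepth 1 -type d
}

# Build a small scratch tree so the sketch runs anywhere.
demo=$(mktemp -d "${TMPDIR:-/tmp}/depthdemo.XXXXXX")
mkdir -p "$demo/A/inner" "$demo/B"
top_level_folders "$demo" | sort   # lists .../A and .../B, but not .../A/inner
rm -rf "$demo"
```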

Fortunately, AppleScript can do what you want:

tell application "Finder"
	set F to folders of alias (choose folder) -- looks only at top level folders, not contents
	set no_label to {} -- placeholder
	repeat with I in F -- grind through the set
		if label index of I is 0 then set end of no_label to contents of I
	end repeat
end tell
no_label -- a list of paths to uncolored folders

If you want names, replace “contents” with “name”.

Thanks, I appreciate your answer. I’ll give your AppleScript a try and I’ll stick to a purely AS/Finder solution as long as I am confined by label index and depth. Thanks for the great article on Spotlight! Maybe I’ll get to incorporate it into something else.

Excellent post! Quick question:

Is it possible to script the Privacy setting in Spotlight? If monitoring of file changes (e.g., fsevents) is taking place as your AppleScript is batch processing lots of Finder items and this monitoring significantly slows down your AppleScript, is it possible to temporarily “privatize” the directories you are processing with your script and then “unprivatize” them after it completes?

It would be the equivalent of going to System Preferences > Spotlight > Privacy and choosing the directories you didn’t want checked while your AppleScript runs, then taking them off that list once the script finishes. Would that work? Is this a solution? I have scripts that process directories with over 500 files in them and check them against others etc. The scripts have slowed down significantly since the dawn of 10.5 because of this further file monitoring. I would love to be able to speed them up by using a temporary pause on this monitoring. Any suggestions?

Update: I just dragged the directory in which my AppleScript was batch processing all of the items to the Privacy tab in Spotlight. Just this change increased the speed of the script by 80%.

First off, thank you!. very enlightening post. now a slew of questions:

you say “wc means word-based with a case transition allowed.” is this syntax specific to mdfind? i was not able to find any documentation anywhere on apple’s File Metadata Query Expression Syntax web page in regards to this. in practice it appears to work but i’m just wondering where you learned this? and if this is currently undocumented are there other such expressions that are undocumented as well?

also, i could use a bit of help understanding how these statements are read. for instance, why is:

kMDItemContentType == "video"wc

not read simply as

kMDItemContentType == videowc

by the shell. and also, what would be the difference between including “wc” at the end of a search query and using the case modifier as illustrated on apple’s web page:

kMDItemTextContent ==[c] "Paris"

would this give the same result as:

kMDItemTextContent == "Paris"wc

i know that’s a lot of questions. any help you can offer would be greatly appreciated.

Glad you found it useful. Bear in mind as I have a stab at your questions that I wrote that nearly 5 years ago.

You can combine query flags, so w means “whole word” and avoids finding interior pieces. c, as you know is “any case”.

Because of the quotes. To make it read as videowc you’d have to concatenate the pieces.

I don’t think there’s any difference. Apple’s way is safer because you’ll never know if they’ll change any of the others.

Hope I’ve helped a bit.

thanks! that pretty much answers everything i asked about… the only lingering question is in regards to the “w” in “wc”. if i understand you correctly this is taken as two different arguments “c” for case, and “w” for word. but i still don’t see any mention of the “w” in the man page or on apple’s site. where are you getting this from?

I read an article years ago that explained some of the spotlight query terms. I don’t remember where now (5 years ago). My recollection is that it doesn’t actually mean “word”. Instead, I think it means “bounded by white space” which in most circumstances results in a word. There’s also a “-live” flag that leaves mdfind running and listing all new encounters of the search term. I’ve never used it. Unfortunately, Apple has never published a decent document on all the things metadata searches can do but you might find more by googling around. If you do, please let me know too.

You might find this useful: The Power of mdfind

Cameron Hayne over at The MacOSXHints Forums just added a neat wrinkle: using a bash function to make it possible to search for partial names. For example, this script:

set tScript to "function locatemdw { mdfind -onlyin  /Applications  \"kMDItemDisplayName == '*$@*'w\"; }
locatemdw Ling"
set tPaths to (do shell script tScript & "| grep '.app$'")
--> /Applications/Lingon 3.app

Note that adding a c following the w (so that '*$@*'w becomes '*$@*'wc) makes your search case-insensitive as well.

grep .app should be grep '.app$'. With the dollar sign at the end you tell grep that the line needs to end with the given expression, so files that end with .applescript are not included. The single quotes are needed so that bash does not try to substitute anything for the dollar sign.

Thank you. The script is corrected.
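
The effect of the anchor discussed above can be seen on canned input. This sketch goes one small step further than the thread and escapes the dot, so that it matches a literal period rather than any character; that tightening, the function name only_apps, and the second and third sample paths are my own assumptions for illustration.

```shell
# only_apps (hypothetical helper): keeps only lines that end in ".app".
only_apps() {
  grep '\.app$'
}

printf '%s\n' \
  '/Applications/Lingon 3.app' \
  '/Applications/Ling.applescript' \
  '/Applications/Lingo.app/Contents/Info.plist' |
  only_apps
# prints: /Applications/Lingon 3.app
```

Without the trailing $, all three sample lines would match, because each contains ".app" somewhere.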