Fastest way to get file list from folder

I’m hoping to speed up a script that uses the Finder to get a list of files of specific types in the entire contents of a folder. Is there a faster way than using the Finder, possibly a shell command or a method in Xcode that could be called from an AppleScript Studio application?
Current command:


set fileTypes to {"psd", "tif", "jpg", "png", "gif", "bmp", "eps", "ai", "pdf"}
tell application "Finder" to set ImageCount to (every file of entire contents of theItem whose name extension is in fileTypes)

The usual shell command is find. The options to find are really a small language, and learning how to put them together takes time. It may not always be obvious how to modify an existing list of options to add a new search criterion.

find can be faster than a search in Finder, but there are fundamental differences in how they work. find works at the POSIX level and takes no notice of file packages (directories of files that appear as a single item in Finder, e.g. most applications). Images inside packages will be skipped by the Finder code you were using, but find will ignore the package status of a directory and search inside it. Not all the metadata available to Finder is directly available to find (type, creator, UI positioning info, icon, Get Info comments, displayed name, etc.).

Great care must be taken in quoting the arguments to find so that the shell does not interpret any of them incorrectly. Also, since find’s output is mostly plain text, care must be taken to preserve filenames that include newlines or carriage returns (they are valid file name characters, though Finder usually does a good job of dissuading the user from employing them).
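To make the quoting and newline points concrete, here is a minimal shell sketch. The demo directory, the file names in it, and the count_images function are made up for illustration; -iname and -print0 are extensions offered by both the BSD find on the Mac and GNU find, not part of POSIX.

```shell
#!/bin/sh
# Hypothetical demo folder; point find at your own folder instead.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/Sample.app/Contents/Resources"
nl='
'
touch "$demo_dir/photo.JPG" "$demo_dir/notes.txt"
touch "$demo_dir/Sample.app/Contents/Resources/icon.png"   # inside a package
touch "$demo_dir/odd${nl}name.png"                          # newline in the name

# count_images DIR: count regular files ending in .jpg or .png, ignoring case.
# The patterns are single-quoted so the shell passes the * through to find,
# and the parentheses are escaped so the shell does not treat them as syntax.
# Each path ends in a NUL byte (-print0), so counting NULs counts files.
count_images() {
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.png' \) -print0 |
        tr -cd '\0' | wc -c | tr -d ' '
}

count_images "$demo_dir"    # prints 3

# Counting lines of the plain -print output is fooled by the newline:
find "$demo_dir" -type f \( -iname '*.jpg' -o -iname '*.png' \) -print |
    wc -l | tr -d ' '       # prints 4
```

Note that icon.png is counted even though it sits inside Sample.app, which Finder would treat as a single item.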

The following code uses find to search for files that have extensions in a specified list.

on run
	set fileTypes to {"psd", "tif", "jpg", "png", "gif", "bmp", "eps", "ai", "pdf"}
	set theItem to alias ((path to desktop as Unicode text) & "stuff:") -- 3496 files in 290 directories; 3248 items (47 packages) in 160 folders
	--set theItem to alias ((path to startup disk as Unicode text) & "Applications:Utilities:") -- 12574 files in 5185 directories; 43 items (34 packages) in 6 folders
	set t0 to current date
	with timeout of 600 seconds
		tell application "Finder" to set ImageCount to (every file of entire contents of theItem whose name extension is in fileTypes)
	end timeout
	set t1 to current date
	set foundFiles to findFiles(POSIX path of theItem, fileTypes)
	set t2 to current date
	{t1 - t0, length of ImageCount, t2 - t1, length of foundFiles}
	--> /Applications/Utilities: {0, 0, 12, 1134}: Finder takes less than 1 second to find 0 matching items, find takes 12 seconds to find 1134 matching files
	--> ~/Desktop/stuff: {213, 689, 4, 714}: Finder takes 213 seconds to find 689 matching items, find takes 4 seconds to find 714 matching files
end run

to findFiles(startDirectory, extensionList)
	if class of extensionList is not list then
		set extensionList to {extensionList}
	else
		(* Duplicate the list, because it will be destructively modified by buildFind0Command.
		* We should not stomp on our caller's parameters unless it is clearly documented (like convertPOSIXPathsToFiles and convertItemsToAliases do).
		*)
		copy extensionList to extensionList
	end if
	set find0Cmd to buildFind0Command(startDirectory, extensionList)
	-- use "true" to ignore the exit code of find, this is needed if there might be any unreadable directories under startDirectory
	set cmd to find0Cmd & ";true"
	set findResults to do shell script cmd without altering line endings
	set posixPaths to extractFind0Results(findResults)
	set theFiles to convertPOSIXPathsToFiles(posixPaths)
	convertItemsToAliases(theFiles) -- if desired
end findFiles

to buildFind0Command(startDirectory, extensionList)
	"find " & quoted form of startDirectory & " -type f " & buildExtensionsClause(extensionList) & " -print0"
end buildFind0Command

to buildExtensionsClause(extensionList)
	if length of extensionList is 0 then return ""
	repeat with e in extensionList
		-- use -iname to match the normal case insensitivity exhibited in Finder
		set contents of e to "-iname \\*." & quoted form of quoteForFindNamePredicate(contents of e)
	end repeat
	if length of extensionList is greater than 1 then
		set {otid, text item delimiters} to {text item delimiters, {" -or "}}
		try
			set extensionClause to "\\( " & (extensionList as Unicode text) & " \\)"
			set text item delimiters to otid
		on error m number n from o partial result r to t
			set text item delimiters to otid
			error m number n from o partial result r to t
		end try
	else
		set extensionClause to first item of extensionList
	end if
	return extensionClause
end buildExtensionsClause

to quoteForFindNamePredicate(nameString)
	-- all of "[]*?" should be quoted with a backslash
	repeat with c in "[]*?"
		set c to contents of c
		set nameString to switchText from nameString to ("\\" & c) instead of c
	end repeat
	nameString
end quoteForFindNamePredicate

(* switchText From: http://bbs.applescript.net/viewtopic.php?pid=41257#p41257
Credit: kai, Nigel Garvey*)
to switchText from t to r instead of s
	local d
	set d to text item delimiters
	try
		set text item delimiters to s
		set t to t's text items
		-- The text items will be of the same class (string/unicode text) as the original string.
		set text item delimiters to r
		-- Using the first text item (beginning) as the first part of the concatenation means we preserve the class of the original string in the edited string.
		tell t to set t to beginning & ({""} & rest)
		set text item delimiters to d
	on error m number n from o partial result r to t
		set text item delimiters to d
		error m number n from o partial result r to t
	end try
	t
end switchText

to extractFind0Results(findResults)
	-- break up the null-delimited result of "find -print0"
	set {otid, text item delimiters} to {text item delimiters, {ASCII character 0}}
	try
		set foundPaths to text items of findResults
		set text item delimiters to otid
	on error m number n from o partial result r to t
		set text item delimiters to otid
		error m number n from o partial result r to t
	end try
	-- the output from "find -print0" always ends in a null, so drop the final, always empty entry
	if length of foundPaths is greater than 1 then
		set foundPaths to items 1 through -2 of foundPaths
	else
		set foundPaths to {}
	end if
	foundPaths
end extractFind0Results

to convertPOSIXPathsToFiles(posixPaths)
	-- changes the contents of the list passed as posixPaths!
	repeat with p in posixPaths
		set contents of p to POSIX file (contents of p)
	end repeat
	posixPaths
end convertPOSIXPathsToFiles

to convertItemsToAliases(theItems)
	-- changes the contents of the list passed as theItems!
	repeat with i in theItems
		set contents of i to contents of i as alias
	end repeat
	theItems
end convertItemsToAliases

Edit History: Added “without altering line endings” to do shell script; otherwise, line feeds are changed to carriage returns. 2009-01-30: Corrected a bug involving inadvertent modification of the contents of the list passed to findFiles as extensionList. Thanks to Jerome for reporting the problem.
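The “;true” that findFiles appends can be tried on its own in a shell. This is only a sketch: the function name and the missing path are invented for the demonstration, and a missing start directory stands in for an unreadable one, since both make find exit with a non-zero status.

```shell
#!/bin/sh
# find_ignoring_errors ARGS...: run find, hide its complaints, always succeed.
# Without the trailing "true", a failure anywhere in the search would make
# the whole command fail, and do shell script would raise an error even
# though find had already printed the paths it could reach.
find_ignoring_errors() {
    find "$@" 2>/dev/null
    true
}

find /no/such/folder -type f 2>/dev/null
echo "plain find exit status: $?"      # non-zero

find_ignoring_errors /no/such/folder -type f
echo "wrapped exit status: $?"         # 0
```

The trade-off is that real failures are hidden too, so an empty result can mean either “nothing matched” or “the search could not run”.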

On Spotlight-enabled Macs you can also make use of the «mdfind» command to quickly search for specific files. To find out about the meta-data properties of a file, you can use «mdls» in the Terminal.

The script below is only an example. Because the item kind property is localized, it might be better to search for image and graphic files by file extension instead of item kind (e.g. “kMDItemFSName == '*.png' || kMDItemFSName == '*.pdf'”).


set searchpath to (path to documents folder)
set itemkinds to {"Portable Document Format (PDF)", "Portable Network Graphics image"}
set ikcode to ""
set countitemkinds to length of itemkinds
repeat with i from 1 to countitemkinds
	if i is not equal to countitemkinds then
		set ikcode to ikcode & "kMDItemKind == " & quoted form of (item i of itemkinds) & " || "
	else
		set ikcode to ikcode & "kMDItemKind == " & quoted form of (item i of itemkinds)
	end if
end repeat
set ikcode to "\"" & ikcode & "\""
set command to "mdfind -onlyin " & quoted form of POSIX path of searchpath & " " & ikcode
set output to paragraphs of (do shell script command)
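The extension-based query suggested above could be assembled along these lines. This is a hedged sketch: build_mdfind_query is a made-up helper, and the trailing c (case-insensitive match) and the * wildcard are written from my understanding of the Spotlight query syntax, so check them against mdfind on your own system. The function only builds the query text; the mdfind call itself is left as a comment because it needs a Spotlight-enabled Mac.

```shell
#!/bin/sh
# build_mdfind_query EXT...: join extensions into one Spotlight query string,
# one kMDItemFSName clause per extension, separated by "||".
build_mdfind_query() {
    query=""
    for ext in "$@"; do
        clause="kMDItemFSName == \"*.$ext\"c"
        if [ -z "$query" ]; then
            query=$clause
        else
            query="$query || $clause"
        fi
    done
    printf '%s\n' "$query"
}

build_mdfind_query png pdf
# prints: kMDItemFSName == "*.png"c || kMDItemFSName == "*.pdf"c

# On a Mac, the query would be passed to mdfind as a single quoted argument:
#   mdfind -onlyin "$HOME/Documents" "$(build_mdfind_query png pdf)"
```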

You might find this handler useful… something chrys provided me. It returns a list of the files and folders (items) at a specific hierarchy level (say you needed to know all the folders/files in a directory 2 levels down) but omits items named in a list of exceptions. It returns the items as a list of text paths. It also uses shell find, as suggested in this thread, and uses a few speed tricks with properties.

I freely admit I only barely understand how it works, but it works like a charm–I’ve used it in two different scripts already. :wink:

-- Get a file/folder list of all items at a certain level inside a given folder
--
-- with help from chrys of MacScripter
-- http://bbs.applescript.net/viewtopic.php?pid=91191#p91191
--
on listGetter(folder_to_scan, scan_level, folder_exceptions)
	--exceptions formatted for shell find
	copy folder_exceptions to folder_exceptions
	repeat with fe_ref in folder_exceptions
		set contents of fe_ref to quoted form of contents of fe_ref
	end repeat
	set ASTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " -or -name "
	set exclude_code to text 6 thru -1 of ("" & ({""} & folder_exceptions))
	set AppleScript's text item delimiters to ASTID
	--do shell find with exceptions
	do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( \\( " & exclude_code & " \\) -prune \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " -print0 ; true" without altering line endings
	set find0 to result
	set {ASTID, text item delimiters} to {text item delimiters, {ASCII character 0}}
	try
		set POSIX_pathnames to text items 1 through -2 of find0 -- Drop the last text item because it is always empty (find -print0 always prints a trailing null).
		set text item delimiters to ASTID
	on error m number n from o partial result r to t
		set text item delimiters to ASTID
		error m number n from o partial result r to t
	end try
	script speedHack
		property Mac_pathnames : {}
	end script
	repeat with P_pn in POSIX_pathnames
		set end of speedHack's Mac_pathnames to (POSIX file (contents of P_pn)) as text
	end repeat
	speedHack's Mac_pathnames
end listGetter

Thanks for the code! chrys, that really speeds things up. I’m almost amazed at how much faster it is, especially with the added code.

I knew there must be a shell command to do this, but wasn’t sure what it was. I admit I’m a bit ignorant on shell scripting, though I keep meaning to learn more about it. I need to find a good book on it; unfortunately, I haven’t found one titled “Shell Scripting for the Mac Wanna-Be-Geek”. I will read up more on the Unix find command.

Calvin, what are the parameter settings for the handler (folder_to_scan, scan_level, folder_exceptions)? folder_to_scan is a no-brainer, but the other two are not readily apparent.

Martin, I’m not sure if I want to rely on Spotlight. I think that there is always a trade-off in file finds. In the past I have used “Kind” and “Type”, which were pretty reliable on files built on a Mac in the pre-OS X days, but didn’t work if the file was missing its file resources. File extension seems to be pretty reliable today, since most files are saved with an extension even if it is not visible, but it won’t always pick up older Mac files that were not saved with extensions.

Again, thanks for your help.

FYI, I made a change to my script above:

Sorry for the omission!

Thanks Chrys, a quick test returns 2414 files in 4 seconds on an external USB drive, versus 64 seconds for the Finder method. A nice reduction in time using this method over the Finder.

folder_to_scan
The folder to run the handler on.

scan_level
How deep in the subfolder hierarchy to search.

folder_exceptions
Names of files and folders that the handler should skip.

So instead of returning all the files and folders in a given folder, or all the files, folders, and subfolders in a given folder, the script allows you to scan a specific level of folders. In this example:

FOLDER NAME
    Level 1a
        Level 2a-1
        Level 2b-1
        Level 2c-1
    Level 1b
    Level 1c
        Level 2a-2
        Level 2b-2
        Level 2c-2

It could return the paths to Level 1a, Level 1b, and Level 1c as a list, or just their contents at Level 2.
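The same idea can be tried directly in a shell. This is a sketch rather than chrys’ handler: list_level and the temporary folder are invented for the demonstration, the tree is a trimmed copy of the example above, and only a single exception name is supported. Setting -mindepth and -maxdepth to the same number is what restricts find to one level; both are extensions available in BSD and GNU find.

```shell
#!/bin/sh
# Rebuild part of the example tree in a hypothetical temporary folder.
root=$(mktemp -d)
mkdir -p "$root/Level 1a/Level 2a-1" "$root/Level 1a/Level 2b-1" \
         "$root/Level 1b" \
         "$root/Level 1c/Level 2a-2" "$root/Level 1c/Level 2b-2"

# list_level DIR DEPTH [SKIP]: list items exactly DEPTH levels below DIR,
# as paths relative to DIR. If SKIP is given, items at that level whose
# name matches it are left out (find treats SKIP as a glob pattern).
list_level() {
    (
        cd "$1" || exit 1
        if [ -n "${3:-}" ]; then
            find . -mindepth "$2" -maxdepth "$2" ! -name "$3"
        else
            find . -mindepth "$2" -maxdepth "$2"
        fi
    ) | sed 's|^\./||' | LC_ALL=C sort
}

list_level "$root" 1              # Level 1a, Level 1b, Level 1c
list_level "$root" 2              # the four "Level 2..." folders
list_level "$root" 1 'Level 1b'   # Level 1a, Level 1c
```

Because find applies no tests above the -mindepth level, the exception here only filters names at the level being listed; it does not hide the contents of a skipped parent folder.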

I found it useful for parsing user data on my server, which has a very rigid hierarchy but does have a few exceptions. Sometimes the folder/file list I need is 2 levels deep, sometimes only 1, but this lets me collect both sets together into one master set of user items to parse all at once (looking for old files/folders, for example). I also like that it gives me the Finder concept of “items” as in Tiger (since I don’t care if it’s a file or a folder, only that it’s at a given level of the hierarchy), while staying compatible with Leopard.

Esoteric I admit, but I like this little gem of chrys’… :wink: