Getting a List of All Files in a Directory and Its Subdirectories



set the_folder to choose folder
set all_Files_List to {}
get_All_Files_of_Folder(the_folder, all_Files_List)
all_Files_List

on get_All_Files_of_Folder(the_folder, all_Files_List)
	tell application "Finder"
		--Check each of the files in this disk/folder
		set files_list to (every file of the_folder)
		repeat with i from 1 to (count files_list)
			set end of all_Files_List to item i of files_list
		end repeat
		set sub_folders_list to folders of the_folder
		repeat with the_sub_folder_ref in sub_folders_list
			my get_All_Files_of_Folder(the_sub_folder_ref, all_Files_List)
		end repeat
	end tell
end get_All_Files_of_Folder

Thanks for sharing your recursive technique. If you specifically want to develop a recursive handler yourself to do a deep dive through a directory tree in (vanilla) AppleScript, you will be far better off in terms of speed using System Events rather than Finder. System Events is generally more capable and far more graceful at dealing with file system operations than Finder, and it doesn’t block the Finder UI while a script runs.

The few occasions where one must use Finder are for UI-related operations/properties/objects: reveal, insertion location, selection, update, and Finder window.

However, in case you aren’t aware, Finder has a built-in AppleScript property that performs a recursive enumeration of all files and folders within a directory tree. In its most basic form, it looks like this:

tell application "Finder" to get the entire contents of (choose folder)

This is faster than your present script (although your script could be optimised to outperform this Finder operation as it is written here).
To increase the speed and usability of the returning data, coerce the expression to an alias list:

tell application "Finder" to get the entire contents of (choose folder) as alias list

General rule of thumb: alias objects and alias list objects return much speedier results than Finder file objects, and are innately more useful in most scripts.

Another way to optimise the return of a recursive search is to specify the type of Finder object you want, e.g. document files, folders, files, application files, etc.:

tell application "Finder" to get every application file in the entire contents of (path to applications folder)

This enumerates the application bundles in the applications folder very quickly. Don’t try to enumerate that folder without specifying a Finder file class, because it will descend into the application bundles themselves, pull out every icon, localised string file, image, program file, etc., and you will make AppleScript poo its pants.

That already-speedy enumeration of application bundles can be made faster still by combining it with the earlier tip about coercing things to alias lists:

tell application "Finder" to get every application file in the entire contents of (path to applications folder) as alias list

One final tip: instead of coercing to an alias list, you can retrieve specific properties of each file, and this is almost always faster still:

tell application "Finder" to get the name of every application file in the entire contents of (path to applications folder)

The entire contents property is best reserved for fairly shallow directory trees, or for cases where you narrow the focus by specifying particular file classes. If you attempt to enumerate the Desktop, you’ll regret it (the Macintosh HD disk sits on the desktop for many people, so you’ll end up attempting to retrieve every single file on your computer). It is a fast operation when used correctly, but only relative to other Finder operations, which are very slow to begin with.

For a more universal deep dive, you’ll either want to use System Events and build a recursive subroutine like the one you’ve done for Finder (SE is, on my machine, approximately 8 to 11 times faster in generalised, cursory testing); or use AppleScriptObjC, which can do a dive of my entire Home folder in approximately zero seconds, which is infinity times faster than System Events.
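For anyone curious what the AppleScriptObjC route might look like, here is a minimal sketch, assuming macOS 10.10 or later and Foundation’s NSFileManager directory enumerator. The handler name allFilesOf is hypothetical; this is not how FileManagerLib is implemented, and I haven’t timed it against the methods discussed here.

```applescript
use AppleScript version "2.4" -- macOS 10.10 or later
use framework "Foundation"
use scripting additions

-- Hypothetical handler: returns the POSIX paths of every regular file below theFolder,
-- skipping hidden items (e.g. .DS_Store) and not descending into packages such as .app bundles
on allFilesOf(theFolder)
	set fileManager to current application's NSFileManager's defaultManager()
	set rootURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of theFolder)
	set isFileKey to current application's NSURLIsRegularFileKey
	set theOptions to ((current application's NSDirectoryEnumerationSkipsHiddenFiles) as integer) + ((current application's NSDirectoryEnumerationSkipsPackageDescendants) as integer)
	set theEnumerator to fileManager's enumeratorAtURL:rootURL includingPropertiesForKeys:{isFileKey} options:theOptions errorHandler:(missing value)
	-- Walk the whole tree in one call; each item is an NSURL
	set theURLs to theEnumerator's allObjects()
	set thePaths to current application's NSMutableArray's array()
	repeat with i from 0 to ((theURLs's |count|()) - 1)
		set aURL to (theURLs's objectAtIndex:i)
		-- Ask the URL whether it is a regular file (folders and package directories report false)
		set {gotValue, isFile} to (aURL's getResourceValue:(reference) forKey:isFileKey |error|:(missing value))
		if gotValue and (isFile as boolean) then (thePaths's addObject:(aURL's |path|()))
	end repeat
	-- Convert the NSArray of NSStrings back to an AppleScript list of text
	return thePaths as list
end allFilesOf

set theFiles to allFilesOf(path to desktop)
```

Because packages are directories rather than regular files, .app bundles and similar packages are left out of the result entirely with these options.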

I fear you flatter it. It also slows down more so when the hierarchy above the container is deep, and if you misjudge things, it has a habit of just going AWOL.

And if you’re not comfortable writing the latter, there’s always my FileManagerLib:

use scripting additions
use theLib : script "FileManagerLib" version "2.2.1"

set theDesktop to path to desktop
set theContents to objects of theDesktop result type files list with searching subfolders without include folders

Now let’s time each approach on my Desktop:

with timeout of 600 seconds
	set myFolder to path to desktop
	tell application "Finder" to get the entire contents of myFolder
end timeout

Result: 07 min 16 sec 87 msec
NOTE: This is not even a clean list of files; it contains a bunch of folders too!!!

set the_folder to path to desktop
set all_Files_List to {}
get_All_Files_of_Folder(the_folder, all_Files_List)
all_Files_List

on get_All_Files_of_Folder(the_folder, all_Files_List)
	tell application "Finder"
		--Check each of the files in this disk/folder
		set files_list to (every file of the_folder)
		repeat with i from 1 to (count files_list)
			set end of all_Files_List to item i of files_list
		end repeat
		set sub_folders_list to folders of the_folder
		repeat with the_sub_folder_ref in sub_folders_list
			my get_All_Files_of_Folder(the_sub_folder_ref, all_Files_List)
		end repeat
	end tell
end get_All_Files_of_Folder

Result: 4 min 39 sec 37 msec

set the_folder to path to desktop
set all_Files_List to {}
get_All_Files_of_Folder(the_folder, all_Files_List)
all_Files_List

on get_All_Files_of_Folder(the_folder, all_Files_List)
	tell application "System Events"
		--Check each of the files in this disk/folder
		set files_list to (every file of the_folder)
		repeat with i from 1 to (count files_list)
			set end of all_Files_List to item i of files_list
		end repeat
		set sub_folders_list to folders of the_folder
		repeat with the_sub_folder_ref in sub_folders_list
			my get_All_Files_of_Folder(the_sub_folder_ref, all_Files_List)
		end repeat
	end tell
end get_All_Files_of_Folder

Result: 00 min 00 sec 15 msec

use scripting additions
use theLib : script "FileManagerLib" version "2.2.1"

set theDesktop to path to desktop
set theContents to objects of theDesktop result type files list with searching subfolders without include folders

Result: 00 min 00 sec 05 msec

BRIEF CONCLUSION:

  1. Oscar goes to Stanley’s FileManagerLib
  2. KniazidisR’s System Events recursive subroutine (for those who don’t like Stanley…)
  3. KniazidisR’s Finder recursive subroutine -> to trash
  4. Finder’s get the entire contents -> to trash

There are also differences in what is returned. The System Events version includes invisible files, including .DS_Store files. The Finder version may or may not, depending on your current Finder setting. The lib version has an optional include invisible items parameter, which defaults to false.
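If you want the System Events approach without the invisible files, here is a minimal sketch of one way to leave them out, assuming the `visible` property that System Events’ dictionary lists for disk items. The handler name collectVisibleFiles is hypothetical, it collects HFS paths rather than references, and I haven’t timed it or checked how it treats symlinks or packages.

```applescript
-- Hypothetical variant of the System Events handler: collects the HFS paths of
-- visible files only, so .DS_Store and other invisible items are left out
on collectVisibleFiles(folderPath, fileList)
	tell application "System Events"
		-- Visible files directly inside this folder
		set filePaths to path of every file of folder folderPath whose visible is true
		-- Visible subfolders, which are recursed into below
		set subfolderPaths to path of every folder of folder folderPath whose visible is true
	end tell
	repeat with aPath in filePaths
		-- fileList is shared with the caller, so appending here updates the caller's list
		set end of fileList to contents of aPath
	end repeat
	repeat with aPath in subfolderPaths
		collectVisibleFiles(contents of aPath, fileList)
	end repeat
end collectVisibleFiles

set visibleFiles to {}
collectVisibleFiles((path to desktop) as text, visibleFiles)
visibleFiles
```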

The System Events version also tries to resolve any symlinks, rather than reporting them as files (even if the symlinks point to folders). In fact, I tried modifying your System Events handler to return POSIX paths, changing the relevant line to this:

set end of all_Files_List to POSIX path of item i of files_list

When that hits a symlink, the script errors:


Then…

If you’re going to test something, for future reference, don’t choose one of the two specific situations that I stated are not well suited to the method being tested.

Also, I would never use entire contents in the form you have here. I presented that as its most basic form, but then went on to list numerous ways to optimise its efficiency.

Yes, it’s called entire contents, and I also stated in my post that you can filter by class type, and gave examples of how to do this. Eliminating folders is easy and has the added benefit of increasing its speed.
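As a concrete illustration of filtering by class, here is a minimal sketch that asks Finder only for files (no folders) and coerces the result to an alias list, combining two of the tips above. The handler name filesOnlyIn is hypothetical, and the same caveats apply: keep it to fairly shallow trees and don’t point it at the Desktop.

```applescript
-- Hypothetical helper: every file (no folders) anywhere below theFolder, as aliases
on filesOnlyIn(theFolder)
	tell application "Finder"
		-- "every file" excludes folders; "as alias list" avoids returning Finder file references
		return (every file of entire contents of theFolder) as alias list
	end tell
end filesOnlyIn
```

For example, `set theFiles to filesOnlyIn(choose folder)`. Swapping `file` for `document file` or `application file` narrows the search further, as described earlier.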

Actually, the testing procedure belongs in the trash. None of your scripts were optimised, and you specifically chose the slowest form of one method in its least applicable environment. Seeing that System Events was timed at 15s, I believe you could shave another 5s off that. You can get a test to yield any results you wish if you don’t care about performing a reasonably thought out one (which can still be a cursory estimation: quick and rough shouldn’t mean it is unreliable).

It’s not a concern for me, at all, as I know when and where each method is most suitable. As with most things in every discipline, there is rarely ever going to be one method that will be universally the ideal/best/most efficient/most well-suited at performing a task. It’s about knowing what’s available to you and knowing how to use it, so you can pick the right tool, for the right job, at the right time. Bear in mind also that speed of execution isn’t and shouldn’t be your only consideration.

By all means, throw away a technique you may not have previously known about, and believe it has absolutely no use or value in any situation; but you will then have one less piece of artillery in your AppleScripting arsenal with which to solve a task.

To paraphrase what I wrote: “suited to shallow tree depths; not to be used to enumerate desktops; and it’s fast compared to other operations in Finder, but that still makes it potentially slow.” I think what might appear as flattery at first glance is actually me sounding too British when I write. I wonder if it sounded like I was selling entire contents as a triumph of recursive directory searching. In the context of a recursive handler operating in Finder, I was drawing attention to a feature that some people don’t know about, and that those who do often implement poorly. I do believe it gets more negative press than it warrants, but my post was endorsing System Events over Finder in the vanilla setting (although I note what you said about symlinks). There are situations, however, where enumerating the entire contents in Finder will be faster than any manually written (non-ASOC) handler; and if one considers not only execution time but also the programmer’s time, even ASOC can lose its appeal, and entire contents can become increasingly attractive to some for non-time-sensitive tasks.

I’m piqued by what you said about entire contents being slowed down by the depth of the tree above its starting point. Can you tell me more about this? It sounds like an important thing to know, and I don’t believe I was aware of it previously. It feels very counter-intuitive, but, of course, many things can at first. Do you know to what extent the “canopy” affects one’s descent? And do you know how or why this is (or where I should go to read about it)?

No, your “British-ness” wasn’t lost on me — but I fear my “Australian-ness” in the reply was lost. (I confess I don’t follow your point about the desktop — my hard disk sits on it, but it doesn’t get enumerated by any of these methods, as far as I can see.)

Honestly, my biggest gripe is that I’ve seen too many scripts fail from using entire contents. Mostly because it has been used outside some of the caveats you outlined, and in some cases because the assumptions about number of files or depth made when and where the script was written no longer apply. Environments change (someone changes their setting for whether invisible files show, for example, and the command behaves differently), and talk about programmer time should not overlook the possibility (probability?) of maintenance that ends up being a rewrite.

If there were no reasonable alternatives, or if the method had some other redeeming features, I could understand the appeal. But one of the reasons I wrote FileManagerLib was to provide a simple-to-use alternative without the drawbacks. (I’m also generally wary of System Events, simply because it’s been an inconsistent bug-fest over the years. It still doesn’t understand that something designed for AppleScript should return an HFS value for the name property.)

The issue with hierarchy is that AppleScript specifiers are built by traversing up the ownership hierarchy to the root, building parent specifiers as you go. There’s no container-level caching, which means the full hierarchy is traversed for every item. So building the specifiers for a bunch of items in a folder is going to take a whole lot longer if it’s buried many folders deep. I suspect using as alias list short-circuits much of this, at least to the point where it doesn’t seem noticeable.

Oh dear, I had no idea you are Australian. How awful that I just assume people are American by default.

Are you doubly sure that enumerating the entire contents of your desktop doesn’t descend through your entire hard drive? I just tested it using:

I didn’t let it continue to run beyond the 2-minute mark (in the past, it took almost 20 minutes on my underpowered machine). Once I remove my hard disk from the desktop, the desktop is enumerated in 0.38 seconds. If Apple changed this in Mojave, I think it’s a good thing that it now knows to omit attached devices that sit on the desktop.

Yes, I completely agree. It’s possibly one reason I felt it could be worthwhile writing a post about how to implement it in a usable fashion, but I wasn’t meaning to market it as an all-rounder that I or anyone should reach for in the first instance. Perhaps I ought to have clarified that it’s a tool for specific use cases, but determining those use cases is something the scripter will have to get a feel for.

Agreed. Robustness is (or should be) a priority.

I always wondered what might explain the benefits conferred by a coercion of this nature, and your explanation makes sense. Thanks.