entire contents is slow; can it be made faster?

I found out that getting every file of the entire contents of a folder is 8 to 9 times slower than getting only the names of the files. On a folder with 1837 files in several subfolders, getting the names took only 2 seconds, against 17 seconds for set theFilesPaths to every file of entire contents of theFolder.
Adding the modification date, creation date, etc. did not slow it down.
What else could I add to the list below to get the same (or more) information about the files?
Thanks


set theFolder to choose folder
tell application "Finder"
	--set theFilesPaths to every file of entire contents of theFolder
	set theFilesNames to name of every file of entire contents of theFolder
	set theFilesModDates to modification date of every file of entire contents of theFolder
	set theFilesCreationDates to creation date of every file of entire contents of theFolder
	set theFilesModSizes to size of every file of entire contents of theFolder
	set theFilesTypes to file type of every file of entire contents of theFolder
	set theFilesCreators to creator type of every file of entire contents of theFolder
end tell

Hi,

Indeed, your script is very slow, because calling entire contents six times is incredibly expensive.
Calling it just once is more efficient, but still quite slow:


set theFolder to choose folder

tell application "Finder"
	with timeout of 30 * minutes seconds
		set {name:theFilesNames, modification date:theFilesModDates, creation date:theFilesCreationDates, size:theFilesModSizes, file type:theFilesTypes, creator type:theFilesCreators} to every file of entire contents of theFolder
	end timeout
end tell

The fastest way is to find the files with the shell find command and build the lists in a repeat loop:


set theFilesNames to {}
set theFilesModDates to {}
set theFilesCreationDates to {}
set theFilesModSizes to {}
set theFilesTypes to {}
set theFilesCreators to {}

set theFolder to choose folder
set theFiles to paragraphs of (do shell script "find " & quoted form of POSIX path of theFolder & " -type f ! -name '.*'")
repeat with oneFile in theFiles
	tell (info for POSIX file oneFile as alias)
		set end of theFilesNames to name
		set end of theFilesModDates to modification date
		set end of theFilesCreationDates to creation date
		set end of theFilesModSizes to size
		set end of theFilesTypes to file type
		set end of theFilesCreators to file creator
	end tell
end repeat

Thank you Stefan.
I’ll test.
In the meantime: we are now asking for name, creation date, modification date, size, type and creator.
I wonder whether there are any other attributes that we can add to the list.

Hi Stefan,

I ran your script on the same folder and unfortunately got this error:
AppleScript Error
Can’t make file “Macintosh HD MBP:Users:ml:Documents:WordPower Documents:Fantail Digital: Fantail jpgs:Icon” into type alias.
The offending file “Icon” is an invisible file: it is the custom icon file of the folder containing the jpgs. I made all invisible files visible and ran the script again, but it gave the same error.
I deleted the icon file and the script ran smoothly.
I recreated the custom folder icon with viou (copy icon and create custom folder icon from previously copied icon) and it failed again.

Result of my tests:
I ran your script and mine on the same folder with 1837 files in several subfolders.
Your script took 5 seconds, my original one 1.
It looks like my theory above really is right:
"getting every file of the entire contents of a folder is much slower than getting the names, then the creation dates, then the modification dates, etc. of the files, one after the other."

Could you please test both of these and confirm, or tell me if I am under a misconception?
Thanks for your interest and input.

Your script


set theFolder to choose folder
set myDate to (current date)
set {ASTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return} -- save delimiter value and set the new one
-- set theFolder to choose folder
set theFilesNames to {}
set theFilesModDates to {}
set theFilesCreationDates to {}
set theFilesModSizes to {}
set theFilesTypes to {}
set theFilesCreators to {}
set theFiles to paragraphs of (do shell script "find " & quoted form of POSIX path of theFolder & " -type f ! -name '.*'")
repeat with oneFile in theFiles
	tell (info for POSIX file oneFile as alias)
		set end of theFilesNames to name
		set end of theFilesModDates to modification date
		set end of theFilesCreationDates to creation date
		set end of theFilesModSizes to size
		set end of theFilesTypes to file type
		set end of theFilesCreators to file creator
	end tell
end repeat
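set theTime to stTm(myDate)
-- build the report while the delimiter is still return, so the lists are joined one item per line
set theReport to "Lapse:" & theTime & return & return & theFilesNames & return & return & theFilesCreationDates & return & return & theFilesModDates & return & return & theFilesModSizes & return & return & theFilesTypes & return & return & theFilesCreators
set AppleScript's text item delimiters to ASTID -- restores the old value
return theReport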
on stTm(strTm)
	set lpsdScs to (current date) - strTm
	set hrs to lpsdScs div 3600
	set lpsdScs to lpsdScs - (hrs * 3600)
	set mts to lpsdScs div 60
	set lpsdScs to lpsdScs - (mts * 60)
	return ("Hrs:" & hrs & " Mts:" & mts & " Sec:" & lpsdScs)
end stTm

mine:

set theFolder to choose folder
set myDate to (current date)
set {ASTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return} -- save delimiter value and set the new one
-- set theFolder to choose folder
tell application "Finder"
	--set theFilesPaths to every file of entire contents of theFolder
	set theFilesNames to name of every file of entire contents of theFolder
	set theFilesModDates to modification date of every file of entire contents of theFolder
	set theFilesCreationDates to creation date of every file of entire contents of theFolder
	set theFilesModSizes to size of every file of entire contents of theFolder
	set theFilesTypes to file type of every file of entire contents of theFolder
	set theFilesCreators to creator type of every file of entire contents of theFolder
	-- set nameCheckList to theFilesNames
end tell
set theTime to stTm(myDate)
-- build the report while the delimiter is still return, so the lists are joined one item per line
set theReport to "Lapse:" & theTime & return & return & theFilesNames & return & return & theFilesCreationDates & return & return & theFilesModDates & return & return & theFilesModSizes & return & return & theFilesTypes & return & return & theFilesCreators
set AppleScript's text item delimiters to ASTID -- restores the old value
return theReport
on stTm(strTm)
	set lpsdScs to (current date) - strTm
	set hrs to lpsdScs div 3600
	set lpsdScs to lpsdScs - (hrs * 3600)
	set mts to lpsdScs div 60
	set lpsdScs to lpsdScs - (mts * 60)
	return ("Hrs:" & hrs & " Mts:" & mts & " Sec:" & lpsdScs)
end stTm

Hi.

The name of an “Icon” file is actually five characters, the last being a return:

"Icon
"

The return’s apparently omitted from shell script results, so changing ‘paragraphs of’ to something else wouldn’t be effective. You’d need to insert a line to look out specifically for such a file:


set theFilesNames to {}
set theFilesModDates to {}
set theFilesCreationDates to {}
set theFilesModSizes to {}
set theFilesTypes to {}
set theFilesCreators to {}

set theFolder to choose folder
set theFiles to paragraphs of (do shell script "find " & quoted form of POSIX path of theFolder & " -type f ! -name '.*'")
repeat with oneFile in theFiles
	if (oneFile ends with "Icon") then set oneFile to oneFile & return
	tell (info for (oneFile as POSIX file))
		set end of theFilesNames to name
		set end of theFilesModDates to modification date
		set end of theFilesCreationDates to creation date
		set end of theFilesModSizes to size
		set end of theFilesTypes to file type
		set end of theFilesCreators to file creator
	end tell
end repeat

Well spotted Nigel
Thanks for the correction.

Nitpick: info for is deprecated.

If the original poster (or other readers) care to fully handle these kinds of file names, it is possible to do so, but it is not as simple or as easy to read and understand as the paragraphs of method. I have previously posted at least one script that can properly handle such file names (it is not completely generalized, though the code is broken down into handlers that could be recomposed in other ways).

The core bits of the implementation are: use -print0 instead of -print in the find command line (-print is implied if no “actions” are given); use do shell script ... without altering line endings; split the returned string at each NUL (ASCII character 0, or character id 0 if pre-Leopard compatibility is of no concern); drop the last, empty part of the NUL-split.

-print0 establishes NUL as the filename terminating character instead of LF (NUL is a particularly nice choice since it is not a valid character for any part of pathnames on most modern systems; it can do inline delimiting without fear of also being present in the delimited data).
without altering line endings turns off the CR (ASCII character 13) and LF (ASCII character 10) munging that do shell script does by default. This is important if you want to handle filenames with LFs or CRLF combinations.
Split on the NUL character because that is what -print0 uses as a filename terminator.
Ignore the last item from the NUL-split (it will be the empty string) because the output from -print0 is NUL-terminated, not NUL-delimited.
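
For anyone who wants to see the difference at the shell level, here is a rough sketch (not one of the scripts posted in this thread). The helper names count_by_line and count_by_nul and the demo file are made up for illustration. It counts the files in a folder twice, once by output lines and once by NUL terminators:

```shell
#!/bin/sh
# Sketch only: count_by_line and count_by_nul are hypothetical helper names.

# Counts output lines, so a file name containing a linefeed is counted twice.
count_by_line() {
    find "$1" -type f ! -name '.*' | wc -l | tr -d ' '
}

# Counts NUL terminators, one per file, whatever characters the names contain.
count_by_nul() {
    find "$1" -type f ! -name '.*' -print0 | tr -cd '\000' | wc -c | tr -d ' '
}

# Demo: a single file whose name contains a linefeed.
demo_dir=$(mktemp -d)
touch "$demo_dir/two
lines.txt"
echo "line-based count: $(count_by_line "$demo_dir")"
echo "NUL-based count:  $(count_by_nul "$demo_dir")"
rm -rf "$demo_dir"
```

With the single demo file, the line-based count comes out as 2 and the NUL-based count as 1, which is the reason for splitting on NUL rather than on line endings.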

I know, Bruce, but it still works and it’s quite convenient

Typical Apple. :wink: mleonti’s folder contains 1837 files. On my G5 running Tiger, 1837 iterations of ‘info for myFile’ take 1.783 seconds, while the same number of System Events’s ‘properties of myFile’ commands take 32.66 seconds.

Thanks chrys. Here’s a committee version of Stefan’s script: :slight_smile:

set theFilesNames to {}
set theFilesModDates to {}
set theFilesCreationDates to {}
set theFilesModSizes to {}
set theFilesTypes to {}
set theFilesCreators to {}

set theFolder to (choose folder)
try
	modification date of (info for theFolder without size)
	set infoForWorks to true
on error
	set infoForWorks to false
end try

set theFiles to (do shell script ("find " & quoted form of POSIX path of theFolder & " -type f ! -name '.*' -print0") without altering line endings)
set astid to AppleScript's text item delimiters
if ((system attribute "ascv") div 256 mod 16 < 2) then
	set AppleScript's text item delimiters to ASCII character 0
else
	set AppleScript's text item delimiters to character id 0
end if
set theFiles to theFiles's text items 1 thru -2
set AppleScript's text item delimiters to astid

repeat with oneFile in theFiles
	set oneFile to POSIX file oneFile
	if (infoForWorks) then
		tell (info for oneFile)
			set end of theFilesNames to name
			set end of theFilesModDates to modification date
			set end of theFilesCreationDates to creation date
			set end of theFilesModSizes to size
			set end of theFilesTypes to file type
			set end of theFilesCreators to file creator
		end tell
	else
		tell application "System Events"
			tell (get properties of oneFile)
				set end of theFilesNames to name
				set end of theFilesModDates to modification date
				set end of theFilesCreationDates to creation date
				set end of theFilesModSizes to size
				set end of theFilesTypes to file type
				set end of theFilesCreators to creator type
			end tell
		end tell
	end if
end repeat

Deprecated doesn’t mean “dropped” but simply something like “frozen”: the feature will no longer evolve, and may be dropped one day.
If I remember correctly, the same goes for ASCII character.

But of course, it’s good practice to use the ‘modern’ feature.
Alas the stats are not really good.


set theFilesNames to {}
set theFilesModDates to {}
set theFilesCreationDates to {}
set theFilesModSizes to {}
set theFilesTypes to {}
set theFilesCreators to {}

set theFolder to choose folder
set debut to current date


set theFiles to paragraphs of (do shell script "find " & quoted form of POSIX path of theFolder & " -type f ! -name '.*'")
tell application "System Events"
	repeat with oneFile in theFiles
		if (oneFile ends with "Icon") then set oneFile to oneFile & return
		tell (get properties of file oneFile)
			set end of theFilesNames to name
			set end of theFilesModDates to modification date
			set end of theFilesCreationDates to creation date
			set end of theFilesModSizes to size
			set end of theFilesTypes to file type
			set end of theFilesCreators to creator type
		end tell
	end repeat
end tell
display dialog "done in " & ((current date) - debut) & " seconds"

This does the trick in 14 seconds on a folder that info for handles in 2 seconds.

Another drawback: it doesn’t directly handle an entry which is an alias to a folder (info for does that automatically).
So we have to add code to grab the alias’s parent.

Yvan KOENIG (from FRANCE, Monday 11 May 2009 17:20:13)

That’s why I still prefer info for :slight_smile:

Hello Stefan

I assume that the Standard Addition gets its information from System Events itself, so we may guess that there is a more efficient way to get it in a script than the one I used.

My goal was just to show a way to do the trick with tools whose future is better guaranteed than that of the one you used.

But, just as some features have been (NOT AVAILABLE YET) for years, we may assume that some others will stay “deprecated” for years too :wink:

Yvan KOENIG (from FRANCE, Monday 11 May 2009 18:55:50)

I would like to thank everyone who helped. I learned a lot and I completed my task.

Hi Stefan,
I tried what you suggested on a small folder and I got the following AppleScript error:
"Can’t get name of {document file "1" of folder "Quarantined viruses" of folder "Desktop" of folder "m" of folder "Users" of startup disk of application "Finder", document file "3" of folder "Quarantined viruses" of folder "Desktop" of folder "m" of folder "Users" of startup disk of application "Finder", document file "Find all files in folde2r.scpt" of folder "Quarantined viruses" of folder "Desktop"

Works well, thank you.
Is there any way to get the file path? I could not see it in the record that info for (oneFile as POSIX file) returned.
I tried:

set theFilesNames to {}
set theFilesModDates to {}
set theFilesCreationDates to {}
set theFilesModSizes to {}
set theFilesTypes to {}
set theFilesCreators to {}
set filePaths to {}

set theFolder to choose folder
set theFiles to paragraphs of (do shell script "find " & quoted form of POSIX path of theFolder & " -type f ! -name '.*'" without altering line endings)
repeat with oneFile in theFiles
	if (oneFile ends with "Icon") then set oneFile to oneFile & return
	tell (info for (oneFile as POSIX file as alias))
		
		set end of filePaths to path
		
		set end of theFilesNames to name
		set end of theFilesModDates to modification date
		set end of theFilesCreationDates to creation date
		set end of theFilesModSizes to size
		set end of theFilesTypes to file type
		set end of theFilesCreators to file creator
	end tell
end repeat

and I got this AppleScript error:
Can’t get path of {name:“1”, creation date:date “Thursday, 7 May 2009 9:55:05 PM”, modification date:date “Thursday, 7 May 2009 9:56:33 PM”, size:779, folder:false, alias:false, package folder:false, visible:true, extension hidden:false, name extension:missing value, displayed name:“1”, default application:alias “Macintosh HD:Applications:TextEdit.app:”, kind:“SimpleText Document”, file type:“TEXT”, file creator:“ttxt”, type identifier:“com.apple.traditional-mac-plain-text”, locked:false, busy status:false, short version:“”, long version:“”}.

oneFile itself represents the POSIX (slash separated) path

I bow to you, Stefan, while I slap myself for my naivety! :slight_smile:

Hi to all,
In my quest I found this Unix code by Jarno Elonen, with help from Leendert Meyer, Uriel and Patrick-Emil Zörner:

OUTF=rem-duplicates.sh; echo "#! /bin/sh" > $OUTF; find "$@" -type f -exec md5sum {} \; | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate | sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF; chmod a+x $OUTF; ls -l $OUTF

Can anybody make it work as a shell script?
It would be lovely if finding, sorting and listing only the duplicates could all be done with a one-liner :slight_smile:

This is what I tried:
I typed it into the Terminal and found that there are problems:
sed: illegal option -- r
usage: sed script [-Ealn] [-i extension] [file ...]
sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]
find: illegal option -- t
find: illegal option -- y
find: illegal option -- p
find: illegal option -- e
find: f: No such file or directory
uniq: illegal option -- w
usage: uniq [-c | -d | -u] [-i] [-f fields] [-s chars] [input [output]]
-rwxr-xr-x 1 ml staff 11 15 May 22:09 rem-duplicates.sh

I changed sed’s -r to -E, replaced find "$@" with find "/Users/ml/Desktop/Quarantined viruses/" (the path of the folder to search), and changed the uniq options to -c.

I ran:
MacBook-Pro:~ ml$ OUTF=rem-duplicates.sh; echo "#! /bin/sh" > $OUTF; find "/Users/ml/Desktop/Quarantined viruses/" -type f -exec md5sum {} \; | sort --key=1,32 | uniq -c | sed -E 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF; chmod a+x $OUTF; ls -l $OUTF
and this is the result:
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
find: md5sum: No such file or directory
-rwxr-xr-x 1 ml staff 11 15 May 22:14 rem-duplicates.sh

Not directly on a stock install of Mac OS X. Those options look like they are for GNU coreutils and GNU sed, neither of which is included in standard installations (at least not on Tiger). Both are available through MacPorts, and probably Fink, though.

-E may work as a replacement for -r on sed, but there does not seem to be a direct replacement for -w for uniq, which is where the core of the duplicate finding is happening in this shell script.
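
One way around the missing uniq -w, offered only as a sketch and not as a port of the script above, is to let awk compare the first 32 characters of each sorted line. The helper name list_duplicates is made up for this example. It assumes either md5sum or the md5 -r command found on Mac OS X is available, and it does not cope with file names containing linefeeds:

```shell
#!/bin/sh
# Sketch only: list_duplicates is a hypothetical helper, not from the thread.
# Prints the paths of files with identical MD5 digests, one path per line,
# with a blank line between groups of duplicates.
list_duplicates() {
    # Use md5sum where it exists (GNU systems); otherwise fall back to
    # "md5 -r", which also prints the digest before the path.
    if command -v md5sum >/dev/null 2>&1; then
        hash_cmd="md5sum"
    else
        hash_cmd="md5 -r"
    fi
    # $hash_cmd is left unquoted on purpose so "md5 -r" splits into two words.
    find "$1" -type f ! -name '.*' -exec $hash_cmd {} \; |
        sort |
        awk '
            {
                digest = substr($0, 1, 32)
                # Drop the separator after the digest (spaces, or " *" in
                # md5sum binary mode) to leave just the path.
                path = substr($0, 33)
                sub(/^[ *]+/, "", path)
                if (digest == prev) {
                    # First repeat of this digest: start a new group.
                    if (!shown) {
                        if (groups++) print ""
                        print prevpath
                        shown = 1
                    }
                    print path
                } else {
                    shown = 0
                }
                prev = digest
                prevpath = path
            }'
}
```

Calling list_duplicates "/Users/ml/Desktop/Quarantined viruses" would then print only the files that have at least one twin. Nothing is deleted; the output is just a list to check by hand.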

Very interesting Chrys,

I will investigate the possibilities. I already had a look at MacPorts and it sounds very promising.

Your version of Find and Sort is proving to be very fast indeed:
I pointed it at the Documents folder on a MacBook Pro connected to a 3.06 GHz iMac through a 100 Mbit Ethernet hub and it did 39231 files in 38 secs.
I tried it again on a partition of the iMac and it did 66280 files in 11 secs.
I repeated that just to be sure: it took only 8 secs.
I quit Script Editor to clear the RAM and it took 9, then 8 again.
So if we can assume 9 secs is a safe average, it did an impressive 7364 files a second (66280 / 9).
NB: my timing function is included in the elapsed time, and it does not handle ticks. (Was it nearly 66281, or just after 66280 files?)

I suppose the next step (and last) is to find a fast script to receive theFiles and deliver only the duplicates on separate lines with their path so they can easily be checked and the duplicates deleted.