Hi all,
I could use some help: I have one folder with a bunch of files in it. Any file that is exactly the same size as another needs to be deleted, even if it is not the same file; I just need to keep one file of each size at any given time. I am just looking for a ruthless, down-and-dirty script to help with this.
I believe the following will do what you want. You absolutely have to try this on a test folder before using it elsewhere. All deleted files are moved to the Trash.
set sourceFolder to choose folder
set fileSizes to {}
tell application "Finder"
set theFiles to every file in sourceFolder as alias list
repeat with aFile in theFiles
set the end of fileSizes to size of aFile
end repeat
end tell
set previousFileSizes to {item 1 of fileSizes} # as suggested by Yvan
set deleteList to {}
repeat with i from 2 to (count fileSizes)
set anItem to item i of fileSizes
if (previousFileSizes contains anItem) then set end of deleteList to item i of theFiles
set end of previousFileSizes to anItem
end repeat
set AppleScript's text item delimiters to ":"
repeat with anItem in deleteList
display dialog "Delete " & quote & text item -1 of (anItem as text) & quote default button 2
end repeat
set AppleScript's text item delimiters to ""
--An alternative dialog that prompts only once.
--display dialog "Delete " & (count deleteList) & " duplicate files." buttons {"Cancel", "OK"} default button 1
tell application "Finder" to delete deleteList
The procedure I used to find duplicates is adapted from a post by Nigel at:
Here, there is no need to coerce the list of file references to an alias list. This is enough:
set theFiles to every file in sourceFolder
There is no need for text item delimiters either. Also, the default button should always be the one with the less dangerous consequences:
repeat with anItem in deleteList
tell application "Finder" to display dialog "Delete the following file?" & ¬
return & return & name of anItem default button "Cancel"
end repeat
Otherwise, the script is good and simple enough to remove duplicates without searching in subfolders.
set sourceFolder to choose folder
set fileSizes to {}
tell application "Finder"
set theFiles to every file in sourceFolder
repeat with aFile in theFiles
set the end of fileSizes to size of aFile
end repeat
end tell
set previousFileSizes to {item 1 of fileSizes} # as suggested by Yvan
set deleteList to {}
repeat with i from 2 to (count fileSizes)
set anItem to item i of fileSizes
if (previousFileSizes contains anItem) then set end of deleteList to item i of theFiles
set end of previousFileSizes to anItem
end repeat
repeat with anItem in deleteList
tell application "Finder"
try
display dialog "Delete the following file?" & ¬
return & return & name of anItem default button "Cancel"
if button returned of result is "OK" then delete anItem
end try
end tell
end repeat
NOTE: the try block is needed so that the “User canceled” error raised by the Cancel button does not interrupt the process.
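The size-matching step that both scripts above share can be tried on plain numbers before letting it near real files. This is only a sketch: the handler name duplicatePositions and the sample sizes are made up for illustration, and the Finder is left out so nothing can be trashed. It returns the positions that would end up in deleteList, and unlike item 1 of fileSizes it does not fail on an empty folder.

```applescript
-- Return the positions of items whose size has already been seen earlier in the list.
on duplicatePositions(fileSizes)
	set seenSizes to {}
	set positions to {}
	repeat with i from 1 to (count fileSizes)
		set thisSize to item i of fileSizes
		if seenSizes contains thisSize then
			-- A file of this size is already being kept, so this one is a duplicate.
			set end of positions to i
		else
			set end of seenSizes to thisSize
		end if
	end repeat
	return positions
end duplicatePositions

duplicatePositions({120, 300, 120, 45, 300, 120}) -- {3, 5, 6}
```

The first file of each size is kept, and every later file of that size is reported, whatever order the Finder returns them in.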
This offers no warnings and may not be as efficient, but it’s a little simpler:
set thePath to "Macintosh HD:Users:shane:Desktop:Sizes"
tell application id "com.apple.finder" -- Finder
set theFiles to sort files of folder thePath by size
set theSize to size of item 1 of theFiles
repeat with aFile in rest of theFiles
set thisSize to size of aFile
if thisSize = theSize then
delete aFile
else
set theSize to thisSize
end if
end repeat
end tell
Yes, this is much simpler, and as far as I can see it should be much more efficient too. As for warnings, it is easy to add a similar display dialog before the delete aFile line.
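To see why sorting makes the script simpler, here is a Finder-free sketch of the same idea on a list of numbers. The handler name duplicatePositionsInSorted and the sample sizes are assumptions for illustration only. Once the sizes are sorted, equal sizes sit next to each other, so only the previous size has to be remembered rather than a growing list of every size seen.

```applescript
-- Assumes sortedSizes is already in ascending (or descending) order.
on duplicatePositionsInSorted(sortedSizes)
	set positions to {}
	if (count sortedSizes) < 2 then return positions
	set previousSize to item 1 of sortedSizes
	repeat with i from 2 to (count sortedSizes)
		set thisSize to item i of sortedSizes
		if thisSize = previousSize then
			-- Same size as its neighbour: this position would be deleted.
			set end of positions to i
		else
			set previousSize to thisSize
		end if
	end repeat
	return positions
end duplicatePositionsInSorted

duplicatePositionsInSorted({45, 120, 120, 120, 300, 300}) -- {3, 4, 6}
```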
Several knowledgeable forum members have reported that it takes less time for the Finder to create a list of files as an alias list than it does for the Finder to create a list of files using Finder’s own syntax. For example, see Marc Anthony’s post number 21 in the following thread.
Nigel appears to be reporting something similar when he wrote:
“One of the things which takes so long with the Finder is that it has to put together a long list of its own specifiers. As with System Events, if you can get it to return the results in some other form instead, it’s often quicker (that is, quicker than it otherwise would be)… The as alias list is a Finder speciality which works with the preceding specifier rather than coercing a returned list after the fact.”
Finally, I often find it necessary to process files outside a Finder tell statement and an alias list makes this simpler and often quicker. So, I would disagree with your statement.
You are right. With as alias list, the Finder gets results much faster.
I tested this with 45459 files in my home folder. Without alias list time was 920 seconds, and with alias list time was 385 seconds. The recursive test script I used was this:
set logTime to {}
tell application "Finder"
set theFolder to (path to home folder) -- the folder to scan recursively (my home folder in this test)
set startTime to my (current date)'s time -- "my" avoids a scripting addition error inside the Finder tell block
set allFiles to {}
my getAllFiles(theFolder, allFiles)
set logTime's end to (my ((current date)'s time)) - startTime
delay 0.5
set startTime to my (current date)'s time
set allFiles to {}
my getAllFiles2(theFolder, allFiles)
set logTime's end to (my ((current date)'s time)) - startTime
end tell
logTime
on getAllFiles(theFolder, allFiles)
tell application "Finder"
set fileList to files of theFolder
repeat with i from 1 to (count fileList)
set end of allFiles to item i of fileList
end repeat
set subFolders to folders of theFolder
repeat with subFolderRef in subFolders
my getAllFiles(subFolderRef, allFiles)
end repeat
end tell
end getAllFiles
on getAllFiles2(theFolder, allFiles)
tell application "Finder"
set fileList to files of theFolder as alias list
repeat with i from 1 to (count fileList)
set end of allFiles to item i of fileList
end repeat
set subFolders to folders of theFolder as alias list
repeat with subFolderRef in subFolders
my getAllFiles2(subFolderRef, allFiles)
end repeat
end tell
end getAllFiles2
I created a test folder of 300 text files, all of which were the same size, and then timed the scripts in this thread. I first modified each script to exclude the dialog prompt (except Shane’s, which doesn’t have one). The results were:
peavine - 1 to 2 seconds
Shane - 8 to 10 seconds
KniazidisR - 12 to 14 seconds
The difference would appear to be that my script bulk deletes the duplicate files, while the other scripts don’t. Also, Shane clearly stated that his script was written for simplicity not speed, and my test is worst-case in that it deletes 299 of 300 files.
BTW, I modified Shane’s script to bulk delete the duplicate files by moving the delete command out of the repeat loop, and the timing was 2 seconds. This may be the best script: simple and fast.
It seems that, according to the original post, the simpler code would be:
set thePath to (path to desktop as text) & "Sizes"
tell application id "com.apple.finder" -- Finder
set theFiles to sort files of folder thePath by size
delete (rest of theFiles)
end tell
Testing that, I discovered something.
When it is applied to the Desktop, the instruction set theFiles to sort files of folder thePath by size returns a list of aliases.
For any other folder it returns a list of document files.
Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Tuesday, 3 September 2019 17:31:34
I couldn’t resist finding a role for my Custom Iterative Ternary Merge Sort. This is a bit faster than Peavine’s posted script with my 6592-file test folder, but presumably the OP’s folder won’t have that many files!
use sorter : script "Custom Iterative Ternary Merge Sort" -- <https://macscripter.net/viewtopic.php?pid=194430#p194430>
use scripting additions
on main()
script o
property filePaths : missing value
property fileSizes : missing value
end script
set thePath to (choose folder) as text
-- Get corresponding lists of the folder's files' paths and sizes. System Events is faster than the Finder for this.
tell application "System Events" to set {o's filePaths, o's fileSizes} to {path, size} of files of folder thePath
-- Sort both lists on the file's paths (can be omitted), then (stably) on the sizes.
tell sorter to sort(o's filePaths, 1, -1, {slave:{o's fileSizes}}) -- If desired.
tell sorter to sort(o's fileSizes, 1, -1, {slave:{o's filePaths}})
-- Initialise a "current size" variable to some figure below the lowest size.
set currentSize to (beginning of o's fileSizes) - 1024
-- Work through the sizes. At each increase, replace the corresponding path with missing value.
repeat with i from 1 to (count o's fileSizes)
set thisSize to item i of o's fileSizes
if (thisSize > currentSize) then
set item i of o's filePaths to missing value
set currentSize to thisSize
end if
end repeat
-- Get a list containing only the unreplaced paths and bulk-delete those files.
-- The Finder accepts list of paths for this. System Events's dictionary says it does too, but it doesn't accept lists of anything on my machine.
set filesToDelete to o's filePaths's text
tell application "Finder" to delete filesToDelete
return
end main
main()
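The marking and filtering steps at the end of that script may be easier to follow on a tiny hand-made example. The paths and sizes below are hypothetical and already sorted by size, and the custom sorter and the Finder are left out. The first path at each new size is replaced with missing value, and asking the list for its text then returns only the paths that were left in place, which are the ones to delete.

```applescript
-- Hypothetical data: two parallel lists, already sorted by size.
set filePaths to {"Disk:a.txt", "Disk:b.txt", "Disk:c.txt", "Disk:d.txt"}
set fileSizes to {10, 10, 25, 25}

-- Start below the lowest size so that the first file is always kept.
set currentSize to (beginning of fileSizes) - 1
repeat with i from 1 to (count fileSizes)
	set thisSize to item i of fileSizes
	if thisSize > currentSize then
		-- First file of a new size: mark it as "keep" by blanking its path.
		set item i of filePaths to missing value
		set currentSize to thisSize
	end if
end repeat

-- Only the text items remain, i.e. the duplicates.
set filesToDelete to filePaths's text -- {"Disk:b.txt", "Disk:d.txt"}
```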
Here’s an option using ASObjC. The differences are (a) it ignores packages (and invisible files), which is probably reasonable in this situation, and (b) if the size of two files match it also checks that their contents match.
use AppleScript version "2.5" -- macOS 10.11 or later
use framework "Foundation"
use scripting additions
-- constants and enums used
property NSDirectoryEnumerationSkipsHiddenFiles : a reference to 4
property NSURLFileSizeKey : a reference to current application's NSURLFileSizeKey
set thePath to "/Users/shane/Desktop/Size test" --POSIX path of (choose folder with prompt "choose the folder")
set theFolder to current application's NSURL's fileURLWithPath:thePath
set fileManager to current application's NSFileManager's |defaultManager|()
set {theFiles, theError} to fileManager's contentsOfDirectoryAtURL:theFolder includingPropertiesForKeys:{NSURLFileSizeKey} options:NSDirectoryEnumerationSkipsHiddenFiles |error|:(reference)
set sizeInfo to current application's NSMutableDictionary's dictionary() -- keys will be size, objects will be array of URLs
repeat with aFile in theFiles
set {theResult, theSize} to (aFile's getResourceValue:(reference) forKey:NSURLFileSizeKey |error|:(missing value))
if theSize is not missing value then -- skip packages and folders
if (sizeInfo's allKeys()'s containsObject:theSize) as boolean then -- check if same size already found
set matchingFiles to (sizeInfo's objectForKey:theSize) -- get files that had same size
set matchFlag to false
repeat with aMatch in matchingFiles -- compare contents, delete if the same
if (fileManager's contentsEqualAtPath:(aMatch's |path|()) andPath:(aFile's |path|())) as boolean then
(fileManager's trashItemAtURL:aFile resultingItemURL:(missing value) |error|:(missing value))
set matchFlag to true
exit repeat
end if
end repeat
if not matchFlag then
(matchingFiles's addObject:aFile)
end if
else
(sizeInfo's setObject:(current application's NSMutableArray's arrayWithObject:aFile) forKey:theSize)
end if
end if
end repeat
I don’t know why, but your pure AppleScript variant runs faster: 32 milliseconds on my machine (after compiling). The AppleScriptObjC variant takes 602 milliseconds (after compiling).
Peavine’s script runs in 17 milliseconds on my machine (after compiling). I removed the warning dialog from his script to test.
It’s because the first script just compares file sizes. The second one first compares sizes, and if they match it compares the entire contents of the files to see if they match exactly. Not what was asked for, but safer.
A byte comparison, I believe. The documents say: “For files, this method checks to see if they’re the same file, then compares their size, and finally compares their contents.”
Thanks for the new version. I believe there is no need for matchFlag:
repeat with aMatch in matchingFiles -- compare contents, delete if the same
if not ((fileManager's contentsEqualAtPath:(aMatch's |path|()) andPath:(aFile's |path|())) as boolean) then
(matchingFiles's addObject:aFile)
else
(fileManager's trashItemAtURL:aFile resultingItemURL:(missing value) |error|:(missing value))
exit repeat
end if
end repeat
Also, I don’t know whether this comparison method stops the byte comparison at the first mismatch. If not, perhaps it has some additional option for faster behavior.
From documentation I read: “For files, this method checks to see if they’re the same file, then compares their size, and finally compares their contents. This method does not traverse symbolic links, but compares the links themselves.”
This means that the method has its own comparison order. So, checking the files for equal size before calling it does the same job twice.
No, you’re modifying the number of items in matchingFiles while you loop through it. You’re also risking adding the same file multiple times, and therefore ending up with the array containing items that have been trashed. In any event, it would be the tiniest of optimizations.
Yes, but the difference is that we’re storing the size for re-use, rather than having to read it from two files for each comparison.
Look at it this way. Suppose there are 10 files, all different. My script gets their size once each, and that’s all. If we just used contentsEqualAtPath:andPath:, you’d need to call it 9 times with the first item, 8 times with the second item, and so on.
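The arithmetic behind that example can be checked with a short sketch. The handler name pairwiseComparisons is made up for illustration, and it assumes the worst case described above, where no two files match and every file is compared against each file that follows it.

```applescript
-- Number of comparisons if each file is compared with every file after it.
on pairwiseComparisons(fileCount)
	set total to 0
	repeat with i from 1 to fileCount - 1
		-- File i is compared with the (fileCount - i) files that follow it.
		set total to total + (fileCount - i)
	end repeat
	return total
end pairwiseComparisons

pairwiseComparisons(10) -- 9 + 8 + ... + 1 = 45
```

So 10 different files would need 45 contents comparisons, against 10 size lookups when the sizes are stored and reused, and the gap widens quickly as the folder grows.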