The script included below identifies duplicate files with the specified extensions in a selected folder and its subfolders. Duplicates are identified by their SHA-1 checksums, and these values are included in the output, which is written to a text file on the user’s desktop. The script is fairly robust, although it can be slow if the number or size of the files being processed is large.
use framework "Foundation"
use scripting additions

getDuplicateFiles()

on getDuplicateFiles()
	set theExtensions to {"jpg", "jpeg"} --set to desired lowercase file extensions
	set theFolder to POSIX path of (choose folder)
	set theFiles to getFiles(theFolder, theExtensions)
	--build one "checksum path" line per file
	set theData to current application's NSMutableArray's new()
	repeat with aFile in theFiles
		set aLine to do shell script "sha1 -r " & quoted form of (aFile as text)
		(theData's addObject:aLine)
	end repeat
	--sort so that lines with the same checksum are adjacent
	(theData's sortUsingSelector:"compare:")
	--an ordered set keeps each duplicate line only once
	set theDuplicates to current application's NSMutableOrderedSet's new()
	set oldChecksum to current application's NSString's stringWithString:""
	set oldItem to current application's NSString's stringWithString:""
	repeat with anItem in theData
		set newChecksum to ((anItem's componentsSeparatedByString:space)'s objectAtIndex:0)
		if (newChecksum's isEqualToString:oldChecksum) is false then
			set oldChecksum to newChecksum
		else
			--same checksum as the previous line, so both lines are duplicates
			(theDuplicates's addObject:oldItem)
			(theDuplicates's addObject:anItem)
		end if
		set oldItem to anItem
	end repeat
	set theString to ((theDuplicates's array())'s componentsJoinedByString:linefeed)
	writeFile(theString)
end getDuplicateFiles

on getFiles(theFolder, fileExtensions)
	set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
	set fileManager to current application's NSFileManager's defaultManager()
	set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents
	set thePredicate to current application's NSPredicate's predicateWithFormat_("pathExtension.lowercaseString IN %@", fileExtensions)
	return (folderContents's filteredArrayUsingPredicate:thePredicate)'s valueForKey:"path"
end getFiles

on writeFile(theText)
	set theFile to (current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop")'s stringByAppendingPathComponent:"Duplicate Files.txt"
	(current application's NSString's stringWithString:theText)'s writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end writeFile
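
For anyone who wants to see the comparison step of getDuplicateFiles in isolation, here is a minimal sketch in plain AppleScript that runs the same adjacent-line test on a few made-up lines. The sample checksums and paths are hypothetical, and a plain list with a "contains" test stands in for the NSMutableOrderedSet used in the script, so treat it as an illustration of the idea rather than a replacement for the script.

```applescript
-- Hypothetical sample data: sorted "checksum path" lines like those sha1 -r returns
set sampleLines to {"aaa /tmp/a.jpg", "aaa /tmp/b.jpg", "bbb /tmp/c.jpg", "ccc /tmp/d.jpg", "ccc /tmp/e.jpg", "ccc /tmp/f.jpg"}

set theDuplicates to {}
set oldChecksum to ""
set oldItem to ""
repeat with anItem in sampleLines
	set anItem to contents of anItem
	-- the checksum is everything before the first space
	set newChecksum to text 1 thru ((offset of space in anItem) - 1) of anItem
	if newChecksum is oldChecksum then
		-- same checksum as the previous line: record both, the previous one only once
		if theDuplicates does not contain oldItem then set end of theDuplicates to oldItem
		set end of theDuplicates to anItem
	else
		set oldChecksum to newChecksum
	end if
	set oldItem to anItem
end repeat
return theDuplicates
--> {"aaa /tmp/a.jpg", "aaa /tmp/b.jpg", "ccc /tmp/d.jpg", "ccc /tmp/e.jpg", "ccc /tmp/f.jpg"}
```

The comparison only works because the lines are sorted first, which places files with the same checksum on adjacent lines.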
The following script is similar to the first, except that a regex is used to identify the files that are not duplicates, which are then removed from the full list. The regex pattern was provided by Nigel. The timing results for the two scripts are pretty much the same.
use framework "Foundation"
use scripting additions

getDuplicateFiles()

on getDuplicateFiles()
	set theExtensions to {"jpg", "jpeg"} --set to desired lowercase file extensions
	set theFolder to POSIX path of (choose folder)
	set theFiles to getFiles(theFolder, theExtensions)
	--build one "checksum path" line per file
	set theData to current application's NSMutableArray's new()
	repeat with aFile in theFiles
		set aLine to do shell script "sha1 -r " & quoted form of (aFile as text)
		(theData's addObject:aLine)
	end repeat
	--sort so that lines with the same checksum are adjacent
	(theData's sortUsingSelector:"compare:")
	set dataString to (theData's componentsJoinedByString:linefeed)
	--delete every run of lines that start with the same checksum, leaving the non-duplicates (1024 is NSRegularExpressionSearch)
	set dataNoDuplicates to (dataString's stringByReplacingOccurrencesOfString:"([^ ]++ ).++\\n(?:\\1.++(?:\\n|$))++" withString:"" options:1024 range:{0, dataString's |length|()})
	set dataNoDuplicates to (dataNoDuplicates's componentsSeparatedByString:linefeed)
	--remove the non-duplicates from the full list, leaving the duplicates
	theData's removeObjectsInArray:dataNoDuplicates
	set theString to theData's componentsJoinedByString:linefeed
	writeFile(theString)
end getDuplicateFiles

on getFiles(theFolder, fileExtensions)
	set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
	set fileManager to current application's NSFileManager's defaultManager()
	set folderContents to (fileManager's enumeratorAtURL:theFolder includingPropertiesForKeys:{} options:6 errorHandler:(missing value))'s allObjects() --option 6 skips hidden files and package contents
	set thePredicate to current application's NSPredicate's predicateWithFormat_("pathExtension.lowercaseString IN %@", fileExtensions)
	return (folderContents's filteredArrayUsingPredicate:thePredicate)'s valueForKey:"path"
end getFiles

on writeFile(theText)
	set theFile to (current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop")'s stringByAppendingPathComponent:"Duplicate Files.txt"
	(current application's NSString's stringWithString:theText)'s writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end writeFile
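
The regex step in the script above can be tried on its own with a few made-up lines. The following is only a sketch: the sample checksums and paths are hypothetical, and it assumes, as with real SHA-1 output, that every checksum has the same length. The pattern and the Foundation calls are the ones already used in the script.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical sample data: sorted "checksum path" lines
set sampleLines to {"aaa /tmp/a.jpg", "aaa /tmp/b.jpg", "bbb /tmp/c.jpg", "ccc /tmp/d.jpg", "ccc /tmp/e.jpg"}
set theData to current application's NSMutableArray's arrayWithArray:sampleLines
set dataString to (theData's componentsJoinedByString:linefeed)

-- Delete every run of two or more consecutive lines that begin with the same checksum
set thePattern to "([^ ]++ ).++\\n(?:\\1.++(?:\\n|$))++"
set dataNoDuplicates to (dataString's stringByReplacingOccurrencesOfString:thePattern withString:"" options:(current application's NSRegularExpressionSearch) range:{0, dataString's |length|()})

-- The text that survives holds only the non-duplicate lines; subtract them from the full list
theData's removeObjectsInArray:(dataNoDuplicates's componentsSeparatedByString:linefeed)
return theData as list
```

With this sample data the "bbb" line should be the only one the regex leaves behind, so it is the only line removed from theData, and the a, b, d and e lines remain as the duplicates.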