Get Duplicate Files with a Shortcut

A recent thread in the Code Exchange forum includes several scripts that identify every file in a selected folder that has a duplicate. The last script in the thread is by Nigel and is the one to use.

As a proof of concept, I wrote a shortcut that does the same thing. A few caveats:

  • The Get Contents of Folder action returns hidden files and package contents, which makes the shortcut unusable in some circumstances.

  • Occasionally, the shortcut appears to run without end. In some cases this may simply reflect how slow the shortcut is; in others it may be caused by hidden files and package contents. Of course, there may also be an error in the shortcut.

  • The Shortcuts app does not support sets or subtracting one list from another, so I had to employ a different approach to get the duplicate files. I think this works correctly, but further testing is needed.

  • The shortcut uses the MD5 hash algorithm, but this can be changed to one of three other options. I ran timing tests with all four hash algorithms and found no difference in run time.

  • Nigel’s script employs pre-filtering by file size, but I don’t believe this can be done with a shortcut, except by employing a shell command.
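For comparison, here is a rough Python sketch of the steps discussed in the list above: skipping hidden files and package contents, pre-filtering by file size, hashing, and grouping by hash with a dictionary instead of set operations. It is not a translation of the shortcut or of Nigel's script. The function names and the list of package extensions are my own assumptions, and the algorithm names are the ones Python's hashlib accepts, which may not match the options in the Shortcuts app.

```python
import hashlib
import os
from collections import defaultdict

# Assumption: a few common macOS package extensions; extend as needed.
PACKAGE_SUFFIXES = (".app", ".bundle", ".framework", ".photoslibrary")


def visible_files(folder):
    """Yield regular files under folder, skipping hidden items and packages."""
    for dirpath, dirnames, filenames in os.walk(folder):
        # Prune hidden directories and package directories in place so
        # os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not d.startswith(".") and not d.lower().endswith(PACKAGE_SUFFIXES)
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not name.startswith(".") and os.path.isfile(path):
                yield path


def file_hash(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder, algorithm="md5"):
    """Return a list of groups; each group lists paths with identical content."""
    # Pre-filter: only files that share a size can be duplicates.
    by_size = defaultdict(list)
    for path in visible_files(folder):
        by_size[os.path.getsize(path)].append(path)

    # Hash only the files whose size is shared with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_hash(path, algorithm))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling find_duplicates("/path/to/folder") returns one list per set of identical files. Passing algorithm="sha256" switches the hash. The size pre-filter means a file with a unique size is never read at all, which is where most of the time saving would come from in a large folder.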

I’ll work to see if I can address a few of the above issues. If anyone tests this shortcut, it’s best done with a folder that contains a relatively small number of files. The following screenshot shows only a portion of the shortcut.

Find Duplicate Files.shortcut (24.7 KB)

I worked a bit to optimize the above shortcut and got some unexpected results.

I moved the Generate Hash action out of the repeat loop (see screenshot below), thinking this might improve matters. The timing result after the Generate Hash action was 0.88 seconds and after the repeat loop was 2.72 seconds. Why the repeat loop would take 1.84 seconds to run is a mystery. I rewrote the repeat loop in every way I could think of, and the only change that made a difference was to include the Generate Hash action inside the repeat loop (as in my original shortcut), which reduced the timing result to 1.63 seconds.

The thought occurred that the Shortcuts app is doing some housekeeping after the repeat loop, which distorts the timing result. However, the roughly 1-second discrepancy persists when I run the timing tests on both shortcut versions to their ends at the Save File action. So, including the Generate Hash action inside the repeat loop seems the way to go.

To avoid the overhead of the Shortcuts editor, I ran the timing tests by way of the Shortcuts menu.
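To make the two structures concrete, here is a small Python sketch of both variants with a simple timer around each. It says nothing about why the Shortcuts app behaves as it does. It only shows the shape of the comparison, and all of the names and the in-memory sample data are hypothetical.

```python
import hashlib
import time
from collections import defaultdict


def hash_then_loop(items):
    """Variant 1: hash every item first, then loop over the finished hashes."""
    hashes = [hashlib.md5(data).hexdigest() for _, data in items]
    groups = defaultdict(list)
    for (name, _), digest in zip(items, hashes):
        groups[digest].append(name)
    return [names for names in groups.values() if len(names) > 1]


def hash_inside_loop(items):
    """Variant 2: compute each hash inside the loop that groups the items."""
    groups = defaultdict(list)
    for name, data in items:
        groups[hashlib.md5(data).hexdigest()].append(name)
    return [names for names in groups.values() if len(names) > 1]


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sample: (name, content) pairs standing in for files.
    sample = [("a", b"x" * 1000), ("b", b"y" * 1000), ("c", b"x" * 1000)]
    for variant in (hash_then_loop, hash_inside_loop):
        duplicates, seconds = timed(variant, sample)
        print(variant.__name__, duplicates, f"{seconds:.6f} s")
```

Both variants should report the same duplicate groups, so any timing gap between them comes from how the work is arranged rather than from what is computed.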