Nigel. Your version works great. Just as a test, I ran the script on a huge folder on an external drive, and it's surprising how fast it is (under a second).
FWIW, on my test folder (NAS drive, no packages) it comes up with lower numbers than Nigel's script and my ASObjC version for several extensions. It also (like mine) includes folders.
For a shell script solution, I'd go for Marc's rather than mine. It doesn't descend into packages and it does actually include them! On the other hand, it doesn't descend into folders with dots in their names either, presumably believing them to be packages.
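The pruning behaviour can be seen on a small throwaway tree. The sketch below is not Marc's exact command: `list_dotted` is a made-up helper, it replaces the `-E`/`-regex` test with `! -path '*/.*'`, and it runs from inside the folder with `-mindepth 1` (available in BSD and GNU find, but not POSIX) so that the start folder itself can never be pruned.

```shell
#!/bin/sh
# Hypothetical helper: list every visible item whose name contains a dot,
# pruning (not descending into) any such item that is a folder.
list_dotted() {
    ( cd "$1" && find . -mindepth 1 -name '*.*' -prune ! -path '*/.*' | LC_ALL=C sort )
}

# Build a small demo tree: one plain folder, one dotted folder, one package-like folder.
demo=$(mktemp -d "${TMPDIR:-/tmp}/extdemo.XXXXXX")
mkdir -p "$demo/plain" "$demo/v1.0" "$demo/Tool.app/Contents"
touch "$demo/plain/readme.md" "$demo/v1.0/notes.txt" "$demo/Tool.app/Contents/Info.plist"

list_dotted "$demo"
rm -rf "$demo"
```

In this tree `./Tool.app` and `./v1.0` are listed as items in their own right, `./plain/readme.md` is found, and `notes.txt` and `Info.plist` are never reached because their parent folders were pruned.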
'-iname' could perhaps be '-name', since there's no need for case insensitivity when looking for a dot. And "! -regex '.*/[.].*'" might be simplified to "! -path '*/.*'" or possibly "! -name '.*'", either of which would also make the -E option unnecessary. But I doubt any of these make much difference in practice. The 's' command in the 'sed' string doesn't need the 'g' because the regex used there initially includes everything in the line and then winds back looking for a dot, so the match is always to the last dot in the line anyway.
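The point about the 'g' flag is easy to check in isolation. This is a minimal sketch with made-up sample paths; `last_ext` is just an illustrative name for the substitution without 'g'.

```shell
#!/bin/sh
# Greedy matching: '.*[.]' first swallows the whole line, then backtracks to
# the LAST dot, so one substitution already leaves only the final extension.
last_ext() {
    sed 's/.*[.]//'
}

# Without the 'g' flag.
printf '%s\n' 'Backups/v1.0/archive.tar.gz' 'notes.TXT' | last_ext

# With the 'g' flag: same result, since nothing matchable is left after the first pass.
printf '%s\n' 'Backups/v1.0/archive.tar.gz' 'notes.TXT' | sed 's/.*[.]//g'
```

Both pipelines print `gz` and `TXT`, so the dots earlier in the path have no effect either way.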
The spaces are in fact an indent, in which the counts are right-aligned. But it has a fixed width.
I don't think any of the shell scripts above are totally bullet-proof when we don't know what's in the folder(s). Mine overlooks packages, Marc's overlooks items in folders which have dots in their names. Shane's ASObjC script is the best option in these respects, but includes extensions from folders which have dots in their names. (alear does specifically mention files.) While a CSV file is one of the options mentioned, we don't know what separator's considered the default on alear's system.
Here's a version of Shane's script which produces a text file similar in appearance to those of the shell scripts. Folder extensions aren't counted but file and package extensions are. Package contents are ignored, but folder contents aren't. The results are sorted by extension and are preceded on their lines by their counts, which are dynamically indented to the minimum extent required. The total is displayed at the bottom. It would also be possible to include a header indicating the folder to which the results pertain, but I haven't bothered here. Hopefully the script's compatible with Mavericks ...
use AppleScript version "2.3.1" -- macOS 10.9 (Mavericks) or later
use framework "Foundation"
use scripting additions
-- For testing:
--set theSourceFolder to (choose folder with prompt "Select an HDD or folder:")
--reportOnFolder(theSourceFolder)
on reportOnFolder(theSourceFolder)
    set theDestinationFile to (choose file name with prompt "Choose file name" default name "zCARPA.txt")
    set destinationURL to current application's class "NSURL"'s fileURLWithPath:(POSIX path of theDestinationFile)
    -- get all files
    set theSourceFolder to current application's |NSURL|'s fileURLWithPath:(POSIX path of theSourceFolder)
    set fileManager to current application's NSFileManager's defaultManager()
    set URLKeys to current application's class "NSArray"'s arrayWithArray:({current application's NSURLIsRegularFileKey, current application's NSURLIsPackageKey})
    set theOptions to (current application's NSDirectoryEnumerationSkipsPackageDescendants) + (get current application's NSDirectoryEnumerationSkipsHiddenFiles)
    set theFiles to (fileManager's enumeratorAtURL:theSourceFolder includingPropertiesForKeys:(URLKeys) options:theOptions errorHandler:(missing value))'s allObjects()
    -- remove items with no extensions
    set theFilter to current application's NSPredicate's predicateWithFormat:"pathExtension != ''"
    set theFiles to theFiles's filteredArrayUsingPredicate:theFilter
    -- Build a counted set containing the extensions of those items which aren't folders.
    set theSet to current application's NSCountedSet's new()
    repeat with thisItem in theFiles
        if (((thisItem's resourceValuesForKeys:(URLKeys) |error|:(missing value)) as record as list) contains true) then (theSet's addObject:(thisItem's pathExtension()))
    end repeat
    -- build array of records so we can sort
    set theResults to current application's NSMutableArray's array()
    set theSum to 0
    tell (space & space) to tell (it & it) to set eightSpaces to (it & it) -- MacScripter /displays/ the literal string as a single space.
    repeat with aValue in theSet's allObjects()
        set theCount to (theSet's countForObject:(aValue)) as integer
        set theSum to theSum + theCount
        -- The spaces at the beginning of theEntry are padding for an indent, whose size will be adjusted later.
        (theResults's addObject:{theValue:aValue, theEntry:(eightSpaces & theCount) & (space & aValue)})
    end repeat
    -- sort on the dictionaries' 'theValue' values.
    set sortDesc to current application's NSSortDescriptor's sortDescriptorWithKey:"theValue" ascending:true
    theResults's sortUsingDescriptors:{sortDesc}
    -- create the text with an entry for the total count at the end.
    set theSum to theSum as text
    theResults's addObject:({theEntry:linefeed & eightSpaces & theSum & " TOTAL"})
    set theText to (theResults's valueForKey:"theEntry")'s componentsJoinedByString:(linefeed)
    -- Adjust the width of the indent to the number of characters in the total.
    set theText to theText's stringByReplacingOccurrencesOfString:("(?m)^ +(?=[ \\d]{" & (count theSum) & "} )") withString:("") options:(current application's NSRegularExpressionSearch) range:({0, theText's |length|()})
    -- Write the text to the specified text file as UTF-8.
    theText's writeToURL:(destinationURL) atomically:(true) encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end reportOnFolder
Edit: Bug pointed out below by Shane fixed. String of eight spaces explicitly set in a variable to avoid confusion when viewed on MacScripter. Edit 2: Eight-space string set more efficiently for fun and to take up less room on the page.
The process that takes most of the time with Nigel's ASObjC version is the repeat loop that checks every file to see if it's a folder or package. If we make an assumption that no folder will be named with a valid file extension (not a bullet-proof assumption, but probably a reasonable one in most cases), we can speed things up a bit by changing this code:
set theSet to current application's NSCountedSet's new()
repeat with thisItem in theFiles
    set theExt to thisItem's pathExtension()
    if (((thisItem's resourceValuesForKeys:(URLKeys) |error|:(missing value)) as record as list) contains true) then (theSet's addObject:(thisItem's pathExtension()))
end repeat
To this:
set theSet to current application's NSCountedSet's new()
set sortDesc to current application's NSSortDescriptor's sortDescriptorWithKey:"pathExtension" ascending:true
set theFiles to theFiles's sortedArrayUsingDescriptors:{sortDesc}
repeat with thisItem in theFiles
    set theExt to thisItem's pathExtension()
    if (theSet's containsObject:theExt) or (((thisItem's resourceValuesForKeys:(URLKeys) |error|:(missing value)) as record as list) contains true) then (theSet's addObject:theExt)
end repeat
In my test, that knocks nearly 40% off the overall time.
Thanks, Shane. You're right. It should be theCount in the repeat (but theSum is right later on). Now fixed. :rolleyes:
There should be a string of eight spaces in front of both theCount and theSum in the lines you've quoted. They look like single spaces when viewed in MacScripter, but clicking the 'scriplet' link produces the correct number in Script Editor. Your quotes both have only single spaces in those positions. To avoid any confusion, I've also edited the script to set the eight-space string using the AppleScript space constant.
Thanks too for this suggestion, although your 'from' code's not quite what's in my script!
I don't see the point of sorting the URL array first. Does it make checking against the set faster? Otherwise you're sorting all the URLs instead of a potentially smaller number of dictionaries later.
I do see what's happening in your repeat. Checking if the extension's already in the set is (I imagine) faster than checking whether or not the URL represents a folder. If the extension is in the set, a file or package with that extension has already passed the test and the assumption is that the current URL is for another file or package of the same type and there's no need to check if it's a folder. It's not an assumption I'd want to make myself, though. Also, of course, if the extension's not already in the set, both tests have to be done, so any speed advantage depends on how frequently each extension occurs in relation to the number of URLs.
Neither can I :(. It got left there after I tried something else.
Actually, in the case of packages that's more than an assumption: if you name a folder with a package's extension, it's regarded, and treated, as a package. So it's really an assumption about files. (And it failed on my test case, because I'd deliberately named a folder that way for a test of something else I did ages ago.)
It's interesting to see just how much time it saves, but I'd never rely on it. It's code waiting to break.
True. But the extension test is much quicker, and I'm not sure one would be bothered counting at all unless one expected there to be multiple files with the same extension.
I wrote a script that adds a header and total file count to the output of Marc's command line. I did not change the line alignment in the text file.
I ran this on a backup folder on an external SSD that contained 28,631 files including many apps. For comparison purposes, I also ran Nigel's AppleScriptObjC solution from earlier in this thread. The extension and total-file counts were identical.
set theSourceFolder to (choose folder with prompt "Select an HDD or folder:")
set theSourceFolder to POSIX path of theSourceFolder
set textFile to (choose file name with prompt "Choose file name" default name "zCARPA")
set textFile to (textFile as text) & ".txt"
--Marc Anthony's command line that gets file-extension counts.
set extensionData to (do shell script "find -E " & quoted form of theSourceFolder & " -iname '*.*' -prune ! -regex '.*/[.].*' | sed 's/.*[.]//g' | sort | uniq -c")
set AppleScript's text item delimiters to linefeed
set extensionData to paragraphs of extensionData
set AppleScript's text item delimiters to ""
--Remove spaces from front of each line of file-extension data for file-count purposes.
set extensionCountData to {}
repeat with aLine in extensionData
    repeat until aLine does not start with " "
        set aLine to text 2 thru -1 of aLine
    end repeat
    set the end of extensionCountData to aLine
end repeat
--Get total file count.
set fileCount to 0
set AppleScript's text item delimiters to " "
repeat with anItem in extensionCountData
    try
        set fileCount to fileCount + ((text item 1 of anItem) as integer)
    end try
end repeat
set the end of extensionData to return & (fileCount as text) & " Total"
set AppleScript's text item delimiters to ""
--Add a header to text file.
set fileData to {"Recursive File-Extension Summary of " & theSourceFolder & return & return}
--Add extension and total file counts to text file.
set AppleScript's text item delimiters to linefeed
set fileData to (fileData & extensionData) as text
set AppleScript's text item delimiters to ""
--Save text file.
try
    set openedFile to open for access file textFile with write permission
    set eof of openedFile to 0
    write fileData to openedFile
    close access openedFile
on error
    try
        close access openedFile
    end try
end try
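As an aside, the total could probably also be computed on the shell side before the text reaches AppleScript. The following is only a sketch under that assumption: `add_total` is a hypothetical helper, not part of the script above, and it expects "count extension" lines of the kind `uniq -c` produces.

```shell
#!/bin/sh
# Hypothetical helper: pass "count extension" lines through unchanged, then
# append a blank line and the sum of the counts. awk's default field
# splitting ignores the leading padding that `uniq -c` adds.
add_total() {
    awk '{ print; total += $1 } END { printf "\n%d Total\n", total }'
}

# Sample input in the style of `uniq -c` output (values made up for the demo).
printf '%s\n' '   3 jpg' '  12 txt' '   1 zip' | add_total
```

The three sample lines come out untouched, followed by a blank line and `16 Total`. The padding of the count lines is left exactly as it arrived, which matches the decision not to change the line alignment.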
At the risk of beating this to death, the results differ here. There's just no obvious way of getting around the fact that find can't tell the difference between a directory and a package.
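A small illustration of that limitation, using a made-up tree: `dotted_dirs` is a hypothetical helper, and `Tool.app` here is only an empty folder with a package-style name. As far as find is concerned, it is `-type d` like any other folder.

```shell
#!/bin/sh
# Hypothetical helper: list folders whose names contain a dot. find reports
# package-like folders and ordinary dotted folders identically.
dotted_dirs() {
    ( cd "$1" && find . -mindepth 1 -type d -name '*.*' | LC_ALL=C sort )
}

# Demo tree: one package-style folder and one ordinary folder with a dot in its name.
demo=$(mktemp -d "${TMPDIR:-/tmp}/pkgdemo.XXXXXX")
mkdir -p "$demo/Tool.app/Contents" "$demo/release.2024"
touch "$demo/Tool.app/Contents/Info.plist" "$demo/release.2024/notes.txt"

dotted_dirs "$demo"
rm -rf "$demo"
```

Both `./Tool.app` and `./release.2024` are returned, with nothing in find's own tests to separate the package from the plain folder. The ASObjC scripts avoid the problem by asking for NSURLIsPackageKey instead.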
Shane. I understand the issue with the Find utility and packages, and I assume that the backup folder I used as a test contained no packages. I don't believe my post was misleading but if it was I apologize.
FWIW, I've included below the results that were returned with my script (with Marc's command line). Nigel's script did in fact return identical results.
It's not misleading, and it's a valid approach to the problem, as long as one is aware of its limitations and knows they don't apply to the particular application. My concern is that people often aren't aware of the limitations, because they're not obvious. Sometimes even I put reliability before speed :).
Shane. I agree with what you say, and so I decided to run some tests to determine which scripts return inaccurate results with packages. Thus far, my test folder contains one each of the following:
Hi, peavine. Your result from #35 isn't surprising. Due to the wildcarded prune statement, my code does not evaluate folders whose name contains a period; this necessarily includes packages.
Shane, regarding your discrepant results from post #21, I'm thinking that a culprit may possibly be a difference in resolving aliases/symbolic links. However, in a large test that I just ran, the ASObjC method returned items that were unexpected: it's finding files inside '.lproj' folders. Are these not packages?