Tuesday, November 19, 2019

#26 2019-10-17 06:08:10 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 6034

Re: Script to count files by extension

The process that takes most of the time with Nigel's ASObjC version is the repeat loop that checks every file to see if it's a folder or package. If we make an assumption that no folder will be named with a valid file extension — not a bullet-proof assumption, but probably a reasonable one in most cases — we can speed things up a bit by changing this code:

Applescript:

   set theSet to current application's NSCountedSet's new()
   repeat with thisItem in theFiles
       set theExt to thisItem's pathExtension()
       if (((thisItem's resourceValuesForKeys:(URLKeys) |error|:(missing value)) as record as list) contains true) then (theSet's addObject:(thisItem's pathExtension()))
   end repeat

To this:

Applescript:

   set theSet to current application's NSCountedSet's new()
   set sortDesc to current application's NSSortDescriptor's sortDescriptorWithKey:"pathExtension" ascending:true
   set theFiles to theFiles's sortedArrayUsingDescriptors:{sortDesc}
   repeat with thisItem in theFiles
       set theExt to thisItem's pathExtension()
       if (theSet's containsObject:theExt) or (((thisItem's resourceValuesForKeys:(URLKeys) |error|:(missing value)) as record as list) contains true) then (theSet's addObject:theExt)
   end repeat

In my test, that knocks nearly 40% off the overall time.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com


#27 2019-10-18 02:44:04 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5105

Re: Script to count files by extension

Shane Stanley wrote:

I think the line:

Applescript:

(theResults's addObject:{theValue:aValue, theEntry:(" " & theSum) & " " & aValue})

should be:

Applescript:

   (theResults's addObject:{theValue:aValue, theEntry:(" " & theCount) & " " & aValue})


Thanks, Shane. You're right. It should be theCount in the repeat (but theSum is right later on). Now fixed. :rolleyes:

Shane Stanley wrote:

I'm also not seeing the alignment happening.


There should be a string of eight spaces in front of both theCount and theSum in the lines you've quoted. They look like single spaces when viewed in MacScripter, but clicking the "scriplet" link produces the correct number in Script Editor. Your quotes both have only single spaces in those positions. To avoid any confusion, I've also edited the script to set the eight-space string using the AppleScript space constant.
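A minimal sketch of the padding idea, not the actual script from earlier in the thread: the eight-space string is built from the space constant so the count survives copying from the forum, and the number is then right-aligned in an eight-character field. The names eightSpaces, theCount and aValue, and their sample values, are assumptions for illustration.

```applescript
-- Build the pad from the space constant so the number of spaces is unambiguous in the source.
set eightSpaces to ""
repeat 8 times
	set eightSpaces to eightSpaces & space
end repeat

set theCount to 46 -- hypothetical count
set aValue to "html" -- hypothetical extension

-- Prepend the pad, then keep only the last eight characters so the count is right-aligned.
-- This assumes the count has no more than eight digits.
set theEntry to (text -8 thru -1 of (eightSpaces & theCount)) & space & aValue
return theEntry
```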


NG


#28 2019-10-18 04:23:28 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5105

Re: Script to count files by extension

Shane Stanley wrote:

we can speed things up a bit by changing this code:

Applescript:

   set theSet to current application's NSCountedSet's new()
   repeat with thisItem in theFiles
       set theExt to thisItem's pathExtension()
       if (((thisItem's resourceValuesForKeys:(URLKeys) |error|:(missing value)) as record as list) contains true) then (theSet's addObject:(thisItem's pathExtension()))
   end repeat

To this:

Applescript:

   set theSet to current application's NSCountedSet's new()
   set sortDesc to current application's NSSortDescriptor's sortDescriptorWithKey:"pathExtension" ascending:true
   set theFiles to theFiles's sortedArrayUsingDescriptors:{sortDesc}
   repeat with thisItem in theFiles
       set theExt to thisItem's pathExtension()
       if (theSet's containsObject:theExt) or (((thisItem's resourceValuesForKeys:(URLKeys) |error|:(missing value)) as record as list) contains true) then (theSet's addObject:theExt)
   end repeat


Thanks too for this suggestion, although your "from" code's not quite what's in my script! ;)

I don't see the point of sorting the URL array first. Does it make checking against the set faster? Otherwise you're sorting all the URLs instead of a potentially smaller number of dictionaries later.

I do see what's happening in your repeat. Checking if the extension's already in the set is (I imagine) faster than checking whether or not the URL represents a folder. If the extension is in the set, a file or package with that extension has already passed the test and the assumption is that the current URL is for another file or package of the same type and there's no need to check if it's a folder. It's not an assumption I'd want to make myself, though. Also, of course, if the extension's not already in the set, both tests have to be done, so any speed advantage depends on how frequently each extension occurs in relation to the number of URLs.


NG


#29 2019-10-18 05:00:43 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 6034

Re: Script to count files by extension

Nigel Garvey wrote:

I don't see the point of sorting the URL array first.



Neither can I. :( It got left there after I tried something else.

Nigel Garvey wrote:

the assumption is that the current URL is for another file or package of the same type



Actually, in the case of packages that's more than an assumption: if you name a folder with a package's extension, it's regarded, and treated, as a package. So it's really an assumption about files. (And it failed on my test case, because I'd deliberately named a folder that way for a test of something else I did ages ago.)

It's interesting to see just how much time it saves, but I'd never rely on it. It's code waiting to break.

Nigel Garvey wrote:

Also, of course, if the extension's not already in the set, both tests have to be done, so any speed advantage depends on how frequently each extension occurs in relation to the number of URLs.



True. But the extension test is much quicker, and I'm not sure one would be bothered counting at all unless one expected there to be multiple files with the same extension.
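For anyone wanting to check the point above about folders named with a package extension, here is a rough sketch. It asks the file system how a single item is classified, using the same resourceValuesForKeys: call as the loop under discussion. The path is hypothetical (it assumes you have created a plain folder and renamed it with a package extension), and theKeys and theValues are names made up for this example.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical path: a plain folder that has been renamed with a package extension.
set thePath to "/Users/Shared/Test.rtfd"
set theURL to current application's NSURL's fileURLWithPath:thePath

-- Ask for both the directory flag and the package flag in one call.
set theKeys to {current application's NSURLIsDirectoryKey, current application's NSURLIsPackageKey}
set theValues to theURL's resourceValuesForKeys:theKeys |error|:(missing value)

-- The call returns missing value if the item cannot be read, for example if the path does not exist.
if theValues is missing value then error "Could not read " & thePath

-- The result is a record holding a boolean for each key.
return theValues as record
```

If both values come back true for the renamed folder, that matches the behavior described above: the item is still a directory on disk but is treated as a package.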



#30 2019-10-18 10:25:43 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 219

Re: Script to count files by extension

I wrote a script that adds a header and total file count to the output of Marc's command line. I did not change the line alignment in the text file.

I ran this on a backup folder on an external SSD that contained 28,631 files including many apps. For comparison purposes, I also ran Nigel's AppleScriptObjC solution from earlier in this thread. The extension and total-file counts were identical.

Applescript:

set theSourceFolder to (choose folder with prompt "Select an HDD or folder:")
set theSourceFolder to POSIX path of theSourceFolder

set textFile to (choose file name with prompt "Choose file name" default name "zCARPA")
set textFile to (textFile as text) & ".txt"

--Marc Anthony's command line that gets file-extension counts.
set extensionData to (do shell script "find -E " & quoted form of theSourceFolder & " -iname '*.*' -prune ! -regex '.*/[.].*' | sed 's/.*[.]//g' | sort | uniq -c")

set AppleScript's text item delimiters to linefeed
set extensionData to paragraphs of extensionData
set AppleScript's text item delimiters to ""

--Remove spaces from front of each line of file-extension data for file-count purposes.
set extensionCountData to {}
repeat with aLine in extensionData
   repeat until aLine does not start with " "
       set aLine to text 2 thru -1 of aLine
   end repeat
   set the end of extensionCountData to aLine
end repeat

--Get total file count.
set fileCount to 0
set AppleScript's text item delimiters to " "
repeat with anItem in extensionCountData
   try
       set fileCount to fileCount + ((text item 1 of anItem) as integer)
   end try
end repeat
set the end of extensionData to linefeed & (fileCount as text) & " Total" -- linefeed, to match the line endings used when the lines are joined below
set AppleScript's text item delimiters to ""

--Add a header to text file.
set fileData to {"Recursive File-Extension Summary of " & theSourceFolder & linefeed & linefeed} -- linefeed, to match the rest of the file

--Add extension and total file counts to text file.
set AppleScript's text item delimiters to linefeed
set fileData to (fileData & extensionData) as text
set AppleScript's text item delimiters to ""

--Save text file.
try
   set openedFile to open for access file textFile with write permission
   set eof of openedFile to 0
   write fileData to openedFile
   close access openedFile
on error
   try
       close access openedFile
   end try
end try

Last edited by peavine (2019-10-18 02:56:23 pm)


2018 Mac mini - macOS Mojave


#31 2019-10-18 04:40:10 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 6034

Re: Script to count files by extension

peavine wrote:

The extension and total-file counts were identical.



At the risk of beating this to death, the results differ here. There's just no obvious way of getting around the fact that find can't tell the difference between a directory and a package.



#32 2019-10-18 05:37:09 pm

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 219

Re: Script to count files by extension

Shane Stanley wrote:
peavine wrote:

The extension and total-file counts were identical.



At the risk of beating this to death, the results differ here. There's just no obvious way of getting around the fact that find can't tell the difference between a directory and a package.


Shane, I understand the issue with the find utility and packages, and I assume that the backup folder I used as a test contained no packages. I don't believe my post was misleading, but if it was, I apologize.

FWIW, I've included below the results that were returned with my script (with Marc's command line). Nigel's script did in fact return identical results.

  631 app
    1 applescript
    1 dict
   46 html
    3 jpeg
   36 jpg
    1 md
   45 mscm
   12 nmbtemplate
  124 numbers
  108 ods
   33 pages
   11 pcalckeys
23954 pdf
    5 photoslibrary
   10 plist
  922 png
    1 rtf
    1 sb-57f3f37b-ccnjKL
 1465 scpt
   24 sh
    6 soulver
   10 tax2018
  700 txt
  486 xls
    2 xlsx
28638 Total



#33 2019-10-18 05:50:04 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 6034

Re: Script to count files by extension

There's certainly no need to apologize!

It's not misleading, and it's a valid approach to the problem -- as long as one is aware of its limitations and knows they don't apply to the particular application. My concern is that people often aren't aware of the limitations, because they're not obvious. Sometimes even I put reliability before speed. :)



#34 2019-10-19 01:57:00 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5105

Re: Script to count files by extension

Even excluding package contents is an assumption on our part. ;)


NG


#35 2019-10-19 09:55:36 am

peavine
Member
From:: Prescott, Arizona
Registered: 2018-09-04
Posts: 219

Re: Script to count files by extension

Shane Stanley wrote:

It's not misleading, and it's a valid approach to the problem -- as long as one is aware of its limitations and knows they don't apply to the particular application. My concern is that people often aren't aware of the limitations, because they're not obvious.


Shane. I agree with what you say, and so I decided to run some tests to determine which scripts return inaccurate results with packages. Thus far, my test folder contains one each of the following:

EXTENSION - KIND
app - Application
rtfd - RTF with Attachments
scptd - Script Bundle
wdgt - Widget
download - Safari Download

My script with Marc's command line and Nigel's AppleScriptObjC script return:

   1 app
   1 download
   1 rtfd
   1 scptd
   1 wdgt


Nigel's command-line script returns:

   1 car
  92 css
   1 dmg
  15 dylib
   3 html
   6 icns
  14 js
   2 map
   3 md
  16 nib
  13 plist
  39 png
   1 provisionprofile
   2 rtf
  11 sample
   1 scpt
  42 strings
   5 tiff
  22 ttf
   5 woff


I was surprised by the above, as I expected my script with Marc's command line to return the contents of the packages.

Last edited by peavine (2019-10-19 11:20:53 am)



#36 2019-10-19 07:45:10 pm

Marc Anthony
Member
From:: Dallas, TX
Registered: 2006-04-27
Posts: 907

Re: Script to count files by extension

Hi, peavine. Your result from #35 isn't surprising. Due to the wildcarded prune statement, my code does not evaluate folders whose name contains a period; this necessarily includes packages.

Shane, regarding your discrepant results from post #21, I think one culprit may be a difference in resolving aliases/symbolic links. However, in a large test that I just ran, the ASObjC method returned unexpected items: it's finding files inside ".lproj" folders. Are these not packages?
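To make the effect of the wildcarded prune easier to see, here is the same command line from post #30 split into its parts, with a comment on what each part appears to do. This is only an annotated sketch: the variable names pruneDotted, skipHidden and countExtensions are made up for illustration, and the assembled command is meant to be identical to the original.

```applescript
set theSourceFolder to POSIX path of (choose folder with prompt "Select a folder:")

-- Match every item whose name contains a period and, if it is a directory, do not descend into it.
set pruneDotted to " -iname '*.*' -prune"

-- Of those matches, drop any whose path has a component beginning with a period (hidden items).
set skipHidden to " ! -regex '.*/[.].*'"

-- Strip everything up to the last period, leaving the extension, then count each distinct extension.
set countExtensions to " | sed 's/.*[.]//g' | sort | uniq -c"

set extensionData to do shell script "find -E " & quoted form of theSourceFolder & pruneDotted & skipHidden & countExtensions
```

On this reading, any directory with a period in its name, package or not, is counted once under its own "extension" and its contents are never visited.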


#37 2019-10-19 08:29:57 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 6034

Re: Script to count files by extension

Marc,

No, .lproj folders are just directories. They normally live inside packages (unless you have Xcode projects).


