count occurrences of a string in all files of a folder

Hello all,

I’ve been struggling with this: one of the options in my app has to allow the user to count all the occurrences of a user-defined string within all the files of a folder.
It has to open each file one by one, and add the number of occurrences found to the count, and return the total number at the end.

I came up with something a while ago with the help of a friend, but it only processes one file at a time, and I’m having a hard time adapting the code so that it does what I mentioned above. Here’s the code:


on run
	set myFile to choose file with prompt "Choose a file: "
	set theText to readFile(myFile) 
	set theChar to text returned of (display dialog "What are you trying to count ?" default answer "a")
	set occNbr to findLr(theText, theChar) 
	
	display dialog "Number of Occurrences: " & occNbr
end run

on readFile(PathFile) -- we read the file
	try
		set foo to (open for access PathFile) 
		set theData to (read foo) 
		close access foo 
	on error 
		close access foo 
	end try
	return theData
end readFile

on findLr(the_string, charTF)
	set AppleScript's text item delimiters to charTF --use of the searched string to separate bits
	set the_list to text items of the_string --put those bits in a list
	set charNbr to ((count of the_list) - 1) --count those bits, minus 1
	set AppleScript's text item delimiters to "" 
	return charNbr 
end findLr
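findLr works by splitting the text on the search string: n occurrences produce n + 1 text items, hence the “minus 1”. Below is a sketch of the same technique that restores whatever text item delimiters were in effect before (instead of forcing them to "") and guards against an empty search string. The name countOccurrences is hypothetical, not something from the posts:

```applescript
-- Counts non-overlapping occurrences of searchText in theText by splitting
-- theText on searchText: n occurrences give n + 1 text items.
on countOccurrences(theText, searchText)
	-- an empty delimiter would split the text into single characters
	if searchText is "" then return 0
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchText
	set theCount to (count of text items of theText) - 1
	-- put back whatever delimiters were active before the call
	set AppleScript's text item delimiters to savedDelimiters
	return theCount
end countOccurrences
```

For example, countOccurrences("banana", "an") returns 2. Matches don’t overlap, so "aaaa" contains "aa" twice, not three times.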

So here I tried doing something like this :

--  ... my stuff
if state of button "countOccurrences" of drawer "drawer" of window "main" is 1 then
	set the item_list to list folder theFolder without invisibles
	set theFolder to theFolder as string
	repeat with i from 1 to (count of the item_list)
		set this_item to item i of the item_list
		set this_item to ((theFolder) & this_item as string) as alias
		set this_info to info for this_item
		set listChar to {}
		set theText to readFile(this_item)
		set theChar to contents of text field 1 of box 2 of tab view item 1 of tab view 1 of window 1
		set occNbr to findLr(theText, theChar)
	end repeat
	set contents of text field "occurrences" of drawer "drawer" of window "main" to occNbr
end if

on readFile(PathFile) -- we read the file
   try
       set foo to (open for access PathFile)
       set theData to (read foo)
       close access foo
   on error
       close access foo
   end try
   return theData
end readFile

on findLr(the_string, charTF)
   set AppleScript's text item delimiters to charTF --use of the searched string to separate bits
   set the_list to text items of the_string --put those bits in a list
   set charNbr to ((count of the_list) - 1) --count those bits, minus 1
   set AppleScript's text item delimiters to ""
   return charNbr
end findLr

But it says “the file wasn’t open”… what do you think I’m doing wrong?
I know I don’t have anything to add the number of occurrences to the total count yet, but I need to fix the problem of files not opening first …

Thanks in advance for any help
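One plausible source of the “wasn’t open” message is the error branch of readFile: if open for access fails (for instance because the item is a subfolder), the handler then calls close access on a file that was never opened, and that second error is the one reported. Below is a minimal sketch of a readFile that only closes what it actually opened and passes the original error on. It also returns "" for empty files, on the assumption that read raises an end-of-file error for them:

```applescript
-- Reads a whole file as text. The file is only closed if it was opened,
-- so a failure to open it is reported as-is instead of being replaced
-- by a "wasn't open" error from close access.
on readFile(PathFile)
	set fileRef to open for access PathFile -- errors out here if PathFile can't be opened
	try
		if (get eof fileRef) is 0 then
			set theData to "" -- read would fail with an end-of-file error on an empty file
		else
			set theData to read fileRef
		end if
	on error errMsg number errNum
		close access fileRef
		error errMsg number errNum -- re-raise the original error after cleaning up
	end try
	close access fileRef
	return theData
end readFile
```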

OK, I’ve managed to customize this script so that it opens all the files in a folder and counts ALL OCCURRENCES in ALL THE FILES.

It works fine, except if there are subfolders in the folder, in which case it just fails, saying “path_to_first_file_of_subfolder wasn’t open”.

Here is the code:

on run
	set theFolder to choose folder without invisibles
	set the item_list to list folder theFolder without invisibles
	set theFolder to theFolder as string
	set theChar to text returned of (display dialog "What are you trying to count ?" default answer "a")
	repeat with i from 1 to (count of the item_list)
		set this_item to item i of the item_list
		set this_item to ((theFolder) & this_item as string) as alias
		set listChar to {}
		set theText to readFile(this_item)
		set occNbr to findLr(theText, theChar)
	end repeat
	display dialog "Number of Occurrences in all files of your folder: " & occNbr
end run

on readFile(PathFile) -- we read the file
	try
		open for access (PathFile)
		set theData to (read PathFile)
		close access (PathFile)
	on error
		close access PathFile
	end try
	return theData
end readFile

on findLr(the_string, charTF)
	set AppleScript's text item delimiters to charTF --use of the searched string to separate bits
	set the_list to text items of the_string --put those bits in a list
	set charNbr to ((count of the_list) - 1) --count those bits, minus 1
	set AppleScript's text item delimiters to ""
	return charNbr
end findLr

Still not really what I’m looking for though, since I want it to display only the occurrences that my app processes … AND it fails if there are subfolders!

EDIT: I just realized that this script doesn’t actually work. It only displays the count of occurrences in the last file of the list, because I’m not storing the number of occurrences found in each file anywhere, so I end up with only the last file’s result.

So I need to store the number of occurrences somewhere after each loop: how could I do that?

I usually start an empty list and then copy data to the end of it. Sometimes the data is a record, sometimes it’s a string, and still other times I use a property declaration instead of setting an empty list each time… all depends on what I’m trying to do and how I’ll be using the results.


set finalResults to {}

repeat with x from 1 to 5
	copy ("result " & x as string) to end of finalResults
end repeat

finalResults

Yeah but how would I save the total number at each loop, so that it displays the TOTAL number at the end, not as a list but just a number ??

The total number of items in the list?

length of finalResults

The sum of numbers added in a loop?

set finalResult to 0
repeat with x from 1 to 5
	set finalResult to finalResult + x
end repeat

Sorry, I haven’t looked closely at what you’re specifically wanting to do in the grand scheme of your script; I just saw the question about getting the number of occurrences after a loop.
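Applied to the folder loop, the running total lives outside the repeat, and each file’s count is added to it instead of replacing it. Here is a sketch with a hypothetical totalOccurrences handler that takes the texts directly (so it can be tried without any files) and reuses findLr as posted above:

```applescript
-- Sums the per-text counts; in the folder script each text would be
-- the result of readFile(this_item).
on totalOccurrences(theTexts, theChar)
	set totalCount to 0 -- the running total starts before the loop
	repeat with aText in theTexts
		-- add this text's count instead of overwriting the previous one
		set totalCount to totalCount + findLr(contents of aText, theChar)
	end repeat
	return totalCount
end totalOccurrences

on findLr(the_string, charTF) -- same handler as in the posts above
	set AppleScript's text item delimiters to charTF
	set the_list to text items of the_string
	set charNbr to ((count of the_list) - 1)
	set AppleScript's text item delimiters to ""
	return charNbr
end findLr
```

In the run handler from the second post, the same change amounts to adding set occNbr to 0 before the repeat and writing set occNbr to occNbr + findLr(theText, theChar) inside it.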

Awesome.

Do you by any chance know how to NOT let the thing show an error if there are subfolders inside the folder that I choose?
It’s like it tries to open subfolders and does not know how to process them …

That problem starts where you define the contents of your source folder:

set the item_list to list folder theFolder without invisibles

That lists the name of everything in the folder, including folders. Easiest way around (tho more processor intensive) is to check whether or not an item is a folder with ‘info for’ in the repeat loop.


	set occNbr to 0
	repeat with i from 1 to (count of the item_list)
		set this_item to item i of the item_list
		set this_item to ((theFolder) & this_item as string) as alias
		if not folder of (info for this_item) then
			-- this_item is already an alias here, so it must not be coerced a second time
			set theText to readFile(this_item)
			set occNbr to occNbr + findLr(theText, theChar)
		end if
	end repeat

The other way around it is not to use list folder to get item_list, which returns only the names of the items in the folder, but to have the Finder get the contents of the source folder, which returns full references to the items, and to ask for files only. Tho it’s a “six of one, half a dozen of the other” situation, since having the Finder get the contents of a folder is slower than listing just the names of items in a folder, especially if ‘entire contents of’ is used on a folder full of many files and folders.


   tell application "Finder"
         --just get items in theFolder that are files
         set the item_list to every file of folder theFolder as alias list

         --if you want to also get files in subfolders of theFolder but not the subfolders themselves, then
         set the item_list to every file of the entire contents of folder theFolder as alias list
   end tell

   set occNbr to 0
   repeat with i from 1 to (count of the item_list)
       set this_item to item i of the item_list
       --set this_item to ((theFolder) & this_item as string) as alias
       --above not needed now that this_item is already an alias since the Finder got
       --the contents of theFolder as an alias list
       set theText to readFile(this_item)
       set occNbr to occNbr + findLr(theText, theChar)
   end repeat
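If you’d rather not ask the Finder, the list folder plus info for check can also be made recursive so that files in subfolders are included. This is a sketch; collectFiles is a hypothetical name, and both list folder and info for are marked deprecated in Standard Additions. As far as I know, packages such as .app bundles also report folder as true, so they would be descended into as well:

```applescript
-- Returns aliases to every file in folderAlias and in all of its subfolders.
on collectFiles(folderAlias)
	set folderPath to folderAlias as text -- HFS path of a folder, ends with ":"
	set itemNames to list folder folderAlias without invisibles
	set found to {}
	repeat with itemName in itemNames
		set itemAlias to (folderPath & (contents of itemName)) as alias
		if folder of (info for itemAlias) then
			set found to found & collectFiles(itemAlias) -- descend into the subfolder
		else
			set end of found to itemAlias
		end if
	end repeat
	return found
end collectFiles
```

The returned list can then go through the same readFile/findLr loop as the Finder version.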

Also, another advantage of getting a list of file names and checking each item for whether or not it’s a folder is that with one call (‘info for’) you can also check the file type at the same time, for example to make sure it’s the kind of file that your readFile(this_item) handler is able to handle. So it might go like this:

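set this_item to ((theFolder) & this_item as string) as alias
set this_item_info to (info for this_item)
if not folder of this_item_info then
	-- the same record also has kind, name extension, file type, etc.,
	-- so you can filter on any of those here ("txt" is just an example)
	if name extension of this_item_info is "txt" then
		set occNbr to occNbr + findLr(readFile(this_item), theChar)
	end if
end if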

There may be situations where other methods of folder recursion (or lack thereof) are appropriate, such as checking the end of a file path for “:” or “/”, depending on what kind of error checking is needed as strings, aliases and other references are passed around and coerced into different forms (items, folders, files, aliases, etc.).
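A sketch of the trailing-separator check, using a hypothetical isFolderPath handler: when an alias to a folder is coerced to text, the HFS path ends with “:”, and its POSIX path ends with “/”, so a path string can be tested for that. It only works for paths produced that way; a hand-built folder path without the trailing separator would be reported as a file.

```applescript
-- True when the path ends with a separator: ":" for HFS paths
-- (e.g. from someAlias as text), "/" for POSIX paths.
on isFolderPath(thePath)
	set thePath to thePath as text
	return (thePath ends with ":") or (thePath ends with "/")
end isFolderPath
```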

Alternatively:

display dialog "Count occurrences of this text:" default answer "a"
set pattern to text returned of result

choose folder with prompt "Search for text in files in this folder:"
set inputFolder to POSIX path of result

-- -o prints each match on its own line, -h leaves out file names,
-- -s suppresses messages about files grep can't read (such as subfolders),
-- -F matches the text literally; wc -l then counts all the occurrences
do shell script "/usr/bin/grep -o -h -s -F -e " & ¬
	quoted form of pattern & " " & ¬
	quoted form of inputFolder & "* " & ¬
	"| /usr/bin/wc -l | /usr/bin/tr -d ' '"
set matches to result as integer

Note that grep normally treats its pattern as a regular expression; the -F (--fixed-strings) option makes it match the text from the dialog literally instead. Whether or not regular-expression matching is desirable is another question.

This is MUCH faster than what I have… incredible.

But what does it mean when you say treated as a regular expression?

Grep uses regular expressions to match text; if the input text contains any characters that have a special meaning in regular expressions (such as . * [ or $), you likely won’t get the expected output. The input text could be escaped before using it, but I haven’t taken the time to try that yet.

Such quoting is easy in Perl, where you can use the \Q and \E escapes in string literals (including regexps) to start and stop quoting of metacharacters, or the quotemeta function. Python (re.escape) and Ruby (Regexp.escape) offer the same thing.

If you want to stick to forms more directly accessible to shell scripting, look at fgrep (or, equivalently, the -F option to grep). It treats its pattern arguments as string literals for normal substring matching (or whole-line matching with -x/--line-regexp, or “word” matching with -w/--word-regexp).

Hey again,

I like this :

copy ("Item " & i & ": " & occNbr & return as string) to end of totalNumber

because I get a list of all the occurrences changed in each file, separately.

I’d like to show these in a table view, which is located in a panel (invisible on launch), but I can’t manage to insert that into the table view. Does anyone have an idea how to do that?
I can’t find anything online about table views that corresponds to what I’m trying to do.

(The table view’s content is only displayed if a checkbox is checked, in which case I initialize that list…)

OK, I’ve managed to log the list of items processed by the app according to this scheme:

item number / path to the item / number of changes in the item

Here’s my code:

repeat with i from 1 to (count of items of filesList)
	log return & "Item " & i & ": " & ((thePath of item i of filesList) as string) & " : " & ((nbr of item i of filesList) as string) & " changes..."
end repeat

I can see it in the console, but is there any way to put that into a variable so that I can display it in a text view?
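Here is a minimal sketch of collecting those lines into a single text variable instead of logging them. buildReport is a hypothetical name, and it assumes the items of filesList are records with thePath and nbr labels, as in the loop above. The lines are joined with return through text item delimiters, which are restored afterwards:

```applescript
-- Builds the same lines that were logged, but returns them as one text.
on buildReport(filesList)
	set reportLines to {}
	repeat with i from 1 to (count of items of filesList)
		set end of reportLines to "Item " & i & ": " & ((thePath of item i of filesList) as string) & " : " & ((nbr of item i of filesList) as string) & " changes..."
	end repeat
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to return -- one report line per row
	set reportText to reportLines as text
	set AppleScript's text item delimiters to savedDelimiters
	return reportText
end buildReport
```

The result can then be assigned to the text view wherever your interface code sets its contents, or passed to display dialog for a quick check.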

_________ EDIT ________
Sorry, just realized I was kinda off-topic here … started another thread at http://bbs.macscripter.net/viewtopic.php?pid=109029#p109029