Sunday, November 27, 2022

#1 2022-11-21 10:53:59 pm

akim
Member
Registered: 2010-04-04
Posts: 152

Alternative to Grep?

I was able to run a lengthy grep command in Terminal

grep -nRHIi   'claim number\|control\|page\|patient name' '/Users/alan/Desktop/'| grep -iv 'packet\|explanation of\|refer\|detach\|external'| sort -n -t  - -k 2

with a very quick and successful response.

The shell equivalent in AppleScript failed and crashed Script Debugger.

Applescript:


   set grepSh to " grep -nRHIi 'claim number\\|control\\|page\\|patient name'" & quoted form of TargetDirectory & "| grep -iv 'packet\\|explanation of\\|refer\\|detach\\|external'| sort -n -t - -k 2"
   set GrepPageResults to do shell script grepSh

A shell equivalent of a smaller grep command succeeded without crashing Script Debugger.

Applescript:


   set grepSh to "grep -nRHIi " & quoted form of "page" & space & quoted form of TargetDirectory & "| grep -iv refer\\|detach " & " | sort -n -t - -k 2"
   set GrepPageResults to do shell script grepSh

It appears to me that my grep shell command has overloaded AppleScript. I looked at Shane's RegexAndStuffLib, but could not find a method to find multiple items, exclude others and then sort by tabs.

I would appreciate any idea on methods to find or grep for multiple words, while excluding others and then to sort the results.
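One option that stays with grep, offered as a minimal sketch rather than a tested fix for the Script Debugger problem: grep's -E (extended regular expression) mode accepts a bare | for alternation, so the patterns need no backslashes at any quoting level. The temporary folder, file names, and sample lines below are hypothetical stand-ins for the real Desktop files.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the real files.
demo_dir=$(mktemp -d)
printf 'Patient Name: Jane Doe\nPlease refer to page 2\nControl 0042\n' > "$demo_dir/a.txt"
printf 'Claim Number 7\nExternal control\nnothing here\n' > "$demo_dir/b.txt"

# -E: extended regular expressions, so alternation is a plain |.
# -n line numbers, -R recurse, -H print file names, -I skip binary files, -i ignore case.
# The second grep (-v) drops every line that contains an excluded phrase.
matches=$(grep -nRHIiE 'claim number|control|page|patient name' "$demo_dir" |
  grep -ivE 'packet|explanation of|refer|detach|external' |
  sort)

printf '%s\n' "$matches"
rm -rf "$demo_dir"
```

Note that the exclusion filter sees the whole output line, file path included, so a path that happens to contain an excluded word would also be dropped.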


#2 2022-11-22 03:28:58 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 5581

Re: Alternative to Grep?

Hi.

It's not clear what's in the text you're parsing. One thing you could try is to increase the level of backslash escaping in the shell script text. It can get rather complex with the need to escape the backslash in the string passed to the shell and to escape both of those backslashes in the AppleScript text representing the process! You may need to experiment:

Applescript:

   set grepSh to " grep -nRHIi 'claim number\\\\|control\\\\|page\\\\|patient name'" & quoted form of TargetDirectory & "| grep -iv 'packet\\\\|explanation of\\\\|refer\\\\|detach\\\\|external'| sort -n -t - -k 2"
   set GrepPageResults to do shell script grepSh


NG


#3 2022-11-22 08:09:17 am

akim
Member
Registered: 2010-04-04
Posts: 152

Re: Alternative to Grep?

Thanks, Nigel, for your help. My goal is to find files in a directory, in this case the Desktop directory, that contain any of a list of words and phrases, such as "claim number", "control", "page", and "patient name", while excluding other words and phrases, such as "packet", "explanation of", "refer", "detach", and "external". After finding the words case-insensitively, along with the lines in which they occur, I piped the results to the Unix sort command so that I could sort the matches by the files in which they were found.

Regarding the multiple backslashes, I initially loaded my AppleScript with only two backslashes "\\", which unfortunately expanded to "\\\\" when I uploaded it to MacScripter. I have re-uploaded the original script with only the two backslashes that AppleScript requires to produce the single backslash that the grep command needs when run from Terminal.

Applescript:

set grepSh to " grep -nRHIi 'claim number\\|control\\|page\\|patient name'" & quoted form of TargetDirectory & "| grep -iv 'packet\\|explanation of\\|refer\\|detach\\|external'| sort -n -t - -k 2"

I would like to find another method of finding multiple words in the files of a folder, while excluding other words, and then sorting the found data by the names of those files.
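For the sorting step, here is a small sketch under one assumption: that the lines being sorted have the colon-separated form that grep -nH prints (file:line:text). The command in this thread passes - to sort's -t option, which splits on hyphens instead. The sample lines below are hypothetical.

```shell
#!/bin/sh
# Hypothetical grep -nH output, deliberately out of order.
unsorted='b.txt:12:page 3
a.txt:10:control
a.txt:2:patient name
b.txt:1:claim number'

# -t : splits each line on colons. -k 1,1 sorts by file name first,
# then -k 2,2n sorts by line number numerically, so 2 comes before 10.
sorted=$(printf '%s\n' "$unsorted" | sort -t : -k 1,1 -k 2,2n)

printf '%s\n' "$sorted"
```

A file path that itself contains a colon would confuse this field splitting, so the sketch only suits paths without one.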


#4 2022-11-22 09:58:11 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 5581

Re: Alternative to Grep?

akim wrote:

Regarding the multiple backslashes, I initially loaded my AppleScript with only two backslashes "\\", which unfortunately expanded to "\\\\" when I uploaded it to MacScripter.


Hi akim.

It looks as if you'd need six backslashes before each vertical bar. What I was trying to get across above was that with 'do shell script', the text sent to grep is a string within a string within a string:

The string parameter sent to grep, here containing "\|"
The text of the shell script command which includes the grep string parameter. It seems to require both the backslash and the bar in the string to be escaped: "\\\|"
The AppleScript source code which produces the shell script text. In this, all three backslashes need to be escaped: "\\\\\\|"
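The middle layer can be checked directly in a shell. The sketch below uses printf only to reveal what a command such as grep would receive. It suggests that the extra escaping matters where the pattern sits outside single quotes, while a single-quoted pattern reaches grep unchanged. Each backslash shown here would still be doubled again in AppleScript source.

```shell
#!/bin/sh
# Inside single quotes the shell leaves backslashes alone,
# so the command receives the two characters \| unchanged.
quoted=$(printf '%s' 'refer\|detach')

# Outside quotes, \| is an escaped bar: the shell consumes the backslash.
unquoted=$(printf '%s' refer\|detach)

# Outside quotes, \\ yields one backslash and \| yields the bar.
escaped=$(printf '%s' refer\\\|detach)

printf '%s\n' "$quoted" "$unquoted" "$escaped"
```

On that reading, "\\|" in AppleScript source is enough for a pattern wrapped in single quotes, and "\\\\\\|" is needed only for an unquoted pattern.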


NG


#5 2022-11-22 10:12:30 am

Mockman
Member
From: Toronto
Registered: 2020-05-27
Posts: 266

Re: Alternative to Grep?

Separately, your first AppleScript (crashy) lacks a space before the directory, whereas your second AppleScript (non-crashy) has a space. What happens if you add a space there in the problematic script?

patient name'" & quoted form
--> patient name''/Users

page" & space & quoted form
--> page' '/Users

or:
patient name' " & quoted form
--> patient name' '/Users
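The effect of the missing space can be seen in a shell without running grep at all. In this minimal sketch, count_args is a hypothetical helper that only reports how many arguments it was given.

```shell
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Adjacent quoted strings with no space between them form a single word.
joined=$(count_args 'patient name''/Users/alan/Desktop/')

# A space between them yields two words: the pattern and the directory.
separate=$(count_args 'patient name' '/Users/alan/Desktop/')

echo "without space: $joined argument(s); with space: $separate argument(s)"
```

Without the space, grep would receive one long pattern and no directory at all. What it searches in that case depends on the grep implementation, which may account for the long wait, though that part is an assumption.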

Last edited by Mockman (2022-11-22 10:33:55 am)


#6 2022-11-22 01:23:36 pm

akim
Member
Registered: 2010-04-04
Posts: 152

Re: Alternative to Grep?

Nigel, Thanks for the clarification of the backslash additions in AppleScript. This AppleScript modification is good to know.

Mockman, Thanks for finding the extra space that I erroneously added. I deleted that space, but unfortunately the result did not change, with the grep shell script still causing Script Debugger to spin for a long time.

Peavine, Thanks for your Foundation framework script. It worked well to find those files that contained specific words and then sorted the files.

My goal is to analyze the matched lines of text for other words that might precede or follow the specified words.

Using the Foundation framework, how might I capture the entire line that contains the matched target words, similar to grep's -n option?


#7 2022-11-22 03:17:59 pm

Mockman
Member
From: Toronto
Registered: 2020-05-27
Posts: 266

Re: Alternative to Grep?

akim wrote:

Mockman, Thanks for finding the extra space that I erroneously added. I deleted that space, but unfortunately the result did not change, with the grep shell script still causing Script Debugger to spin for a long time.



The space needs to be there.


#8 2022-11-22 05:56:03 pm

peavine
Member
From: Prescott, Arizona
Registered: 2018-09-04
Posts: 1511

Re: Alternative to Grep?

The following script returns matching lines in a file but does not include line numbers. The timing result with a test file containing 1,651 paragraphs was 14 milliseconds.

Applescript:

use framework "Foundation"
use scripting additions

set theFile to POSIX path of (choose file of type "txt")
set matchingLines to getMatchingLines(theFile)

on getMatchingLines(theFile)
   set theString to current application's NSString's stringWithContentsOfFile:theFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
   set theDelimiters to (current application's NSCharacterSet's newlineCharacterSet())
   set theArray to (theString's componentsSeparatedByCharactersInSet:theDelimiters)

   set includeWords to "word one|word two" -- not case sensitive
   set excludeWords to "word three|word four|word five"
   set includePattern to "(?im)^.*(" & includeWords & ").*$"
   set excludePattern to "(?im)^.*(" & excludeWords & ").*$"

   set thePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", includePattern)
   set includeArray to (theArray's filteredArrayUsingPredicate:thePredicate)'s mutableCopy()
   set thePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", excludePattern)
   set excludeArray to theArray's filteredArrayUsingPredicate:thePredicate
   includeArray's removeObjectsInArray:excludeArray
   return ((includeArray's componentsJoinedByString:linefeed) as text)
end getMatchingLines

The following script returns matching lines with line numbers. The timing result was 60 milliseconds.

Applescript:

use framework "Foundation"
use scripting additions

set theFile to (choose file of type "txt")
set numberedLines to getNumberedLines(theFile)
set matchingLines to getMatchingLines(numberedLines)

on getNumberedLines(theFile)
   set theText to paragraphs of (read theFile)
   script o
       property theLines : theText
       property numberedLines : {}
   end script
   repeat with i from 1 to (count o's theLines)
       set end of o's numberedLines to (i as text) & ". " & (item i of o's theLines)
   end repeat
   return o's numberedLines
end getNumberedLines

on getMatchingLines(numberedLines)
   set includeWords to "word one|word two" -- not case sensitive
   set excludeWords to "word three|word four|word five"
   set includePattern to "(?im)^.*(" & includeWords & ").*$"
   set excludePattern to "(?im)^.*(" & excludeWords & ").*$"
   set numberedLines to (current application's NSArray's arrayWithArray:numberedLines)
   set thePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", includePattern)
   set includeArray to (numberedLines's filteredArrayUsingPredicate:thePredicate)'s mutableCopy()
   set thePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", excludePattern)
   set excludeArray to numberedLines's filteredArrayUsingPredicate:thePredicate
   includeArray's removeObjectsInArray:excludeArray
   return ((includeArray's componentsJoinedByString:linefeed) as text)
end getMatchingLines

The above scripts ignore word boundaries, but this can be changed by editing the scripts as follows:

Applescript:

set includePattern to "(?im)^.*\\b(" & includeWords & ")\\b.*$"
set excludePattern to "(?im)^.*\\b(" & excludeWords & ")\\b.*$"

The above scripts are not case sensitive. To change this, replace (?im) with (?m) in both patterns.
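For comparison on the command line, a rough analogue of the word-boundary behavior is grep's -w option, which accepts a match only when it forms a whole word. This is a sketch with hypothetical sample lines, not a translation of the script above.

```shell
#!/bin/sh
# Hypothetical sample lines.
sample='page 4
pages 4-6
front page'

# -c counts matching lines; -i ignores case; -E allows plain | alternation.
loose=$(printf '%s\n' "$sample" | grep -ciE 'page|control')

# -w additionally requires each match to be a whole word, so "pages" no longer counts.
whole=$(printf '%s\n' "$sample" | grep -ciwE 'page|control')

echo "substring matches: $loose, whole-word matches: $whole"
```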

Last edited by peavine (2022-11-24 08:06:27 am)


2018 Mac mini - macOS Monterey - Script Debugger 8


#9 2022-11-23 07:04:17 pm

akim
Member
Registered: 2010-04-04
Posts: 152

Re: Alternative to Grep?

Peavine, Thanks for the new AppleScript example. The Objective C lines of code have been very helpful.


#10 Yesterday 10:34:12 am

peavine
Member
From: Prescott, Arizona
Registered: 2018-09-04
Posts: 1511

Re: Alternative to Grep?

Just for the sake of completeness, I thought I would modify my first script in post 8 to work with all text files in a folder. The script returns a string containing each file's POSIX path (the files are sorted by name), immediately followed by that file's matching lines. The timing result with 100 files, each of which contained 125 paragraphs, was 105 milliseconds.

Applescript:

use framework "Foundation"
use scripting additions

set theFolder to POSIX path of (choose folder)
set theFiles to getFiles(theFolder)
set matchingData to getMatchingData(theFiles)

on getFiles(theFolder)
   set fileManager to current application's NSFileManager's defaultManager()
   set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
   set folderContents to fileManager's contentsOfDirectoryAtURL:(theFolder) includingPropertiesForKeys:{} options:4 |error|:(missing value)
   set thePredicate to current application's NSPredicate's predicateWithFormat:"pathExtension ==[c] 'txt'"
   set theFiles to (folderContents's filteredArrayUsingPredicate:thePredicate)'s valueForKey:"path"
   return (theFiles's sortedArrayUsingSelector:"localizedStandardCompare:")
end getFiles

on getMatchingData(theFiles)
   set includeWords to "word one|word two"
   set excludeWords to "word three|word four"
   set includePattern to "(?im)^.*(" & includeWords & ").*$"
   set excludePattern to "(?im)^.*(" & excludeWords & ").*$"
   set matchingData to current application's NSMutableArray's new()
   repeat with aFile in theFiles
       (matchingData's addObject:aFile)
       set matchingLines to getMatchingLines(aFile, includePattern, excludePattern)
       set matchingLines to (matchingLines's stringByAppendingString:linefeed)
       (matchingData's addObject:(matchingLines))
   end repeat
   return (matchingData's componentsJoinedByString:linefeed) as text
end getMatchingData

on getMatchingLines(theFile, includePattern, excludePattern)
   set theString to current application's NSString's stringWithContentsOfFile:theFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
   set theDelimiters to (current application's NSCharacterSet's newlineCharacterSet())
   set theArray to (theString's componentsSeparatedByCharactersInSet:theDelimiters)
   set includePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", includePattern)
   set includeArray to (theArray's filteredArrayUsingPredicate:includePredicate)'s mutableCopy()
   set excludePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", excludePattern)
   set excludeArray to theArray's filteredArrayUsingPredicate:excludePredicate
   includeArray's removeObjectsInArray:excludeArray
   return (includeArray's componentsJoinedByString:linefeed)
end getMatchingLines

Last edited by peavine (Yesterday 07:32:30 pm)




#11 Yesterday 08:58:47 pm

akim
Member
Registered: 2010-04-04
Posts: 152

Re: Alternative to Grep?

Thank you peavine for your wonderful example of how to write Foundation framework's AppleScript Objective C methods to find files containing specific words. I appreciate your sorting of the files, by file name, and have learned a lot from your example.

However, now that your script returns the found lines containing specific words, how might I parse through each file's found lines, with a loop or otherwise, to manipulate the found data file by file?

I have attempted to return every paragraph of the returned result, but that attempt split the found items into separate paragraphs and lost the association between each line and the path of the file in which it was found.

Is it possible to return the found items as an array or list that AppleScript could then parse file by file, so that each file's path and its matching lines could be further analyzed?

Expressed differently, is it possible to return an array of arrays rather than an array of lines separated by linefeeds? If so, is it possible to sort that array of arrays by file path name?

I am at a loss, and would appreciate more insight or direction. Thank you in advance.

Last edited by akim (Yesterday 09:08:42 pm)


#12 Today 07:45:43 am

peavine
Member
From: Prescott, Arizona
Registered: 2018-09-04
Posts: 1511

Re: Alternative to Grep?

akim. I've revised my script to return an array of arrays. The subarrays will be sorted by file name, and each subarray will contain two items--the file path and an array of matching lines.

A slight alternative to the above is to coerce the array of arrays to a list of lists, which can then be analyzed with basic AppleScript. I've included but commented out this alternative in my script.

Applescript:

use framework "Foundation"
use scripting additions

set theFolder to POSIX path of (choose folder)
set theFiles to getFiles(theFolder)
set matchingData to getMatchingData(theFiles)

on getFiles(theFolder)
   set fileManager to current application's NSFileManager's defaultManager()
   set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
   set folderContents to fileManager's contentsOfDirectoryAtURL:(theFolder) includingPropertiesForKeys:{} options:4 |error|:(missing value)
   set thePredicate to current application's NSPredicate's predicateWithFormat:"pathExtension ==[c] 'txt'"
   set theFiles to (folderContents's filteredArrayUsingPredicate:thePredicate)'s valueForKey:"path"
   return (theFiles's sortedArrayUsingSelector:"localizedStandardCompare:")
end getFiles

on getMatchingData(theFiles)
   set includeWords to "word one|word two"
   set excludeWords to "word three|word four"
   set includePattern to "(?im)^.*(" & includeWords & ").*$"
   set excludePattern to "(?im)^.*(" & excludeWords & ").*$"
   set matchingData to current application's NSMutableArray's new()
   repeat with aFile in theFiles
       set anArray to current application's NSMutableArray's new()
       (anArray's addObject:aFile)
       set matchingLines to getMatchingLines(aFile, includePattern, excludePattern)
       (anArray's addObject:matchingLines)
       (matchingData's addObject:anArray)
   end repeat
   return matchingData -- return an array of arrays
   -- return matchingData as list -- return a list of lists
end getMatchingData

on getMatchingLines(theFile, includePattern, excludePattern)
   set theString to current application's NSString's stringWithContentsOfFile:theFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
   set theDelimiters to (current application's NSCharacterSet's newlineCharacterSet())
   set theArray to (theString's componentsSeparatedByCharactersInSet:theDelimiters)
   set includePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", includePattern)
   set includeArray to (theArray's filteredArrayUsingPredicate:includePredicate)'s mutableCopy()
   set excludePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", excludePattern)
   set excludeArray to theArray's filteredArrayUsingPredicate:excludePredicate
   (includeArray's removeObjectsInArray:excludeArray)
   return includeArray
end getMatchingLines

If you coerce the array of arrays to a list of lists, you could do the analysis as follows:

Applescript:

repeat with aList in matchingData
   set aList to contents of aList
   set theFile to item 1 of aList
   set matchingLines to item 2 of aList
   repeat with aLine in matchingLines
       set aLine to contents of aLine
       -- analyze aLine
   end repeat
end repeat

Last edited by peavine (Today 09:07:55 am)


