Tuesday, September 26, 2017

#1 2016-12-31 09:23:24 am

ChangeAgent
Member
From: Scotland
Registered: 2008-03-07
Posts: 186

Word processing, how to create a list sorted by word or by frequency

I want to count how many times each word in a document appears in that document. For example, if the word Hogmanay appears 10 times and Christmas 4 times in a text of 5000 words, I'd like to see that in a list. Words like 'and' or 'the' can be included; I do not mind.
However, I want every word that is in the document to be included in the list. The list might be 100 or 500 words long or more, with some words scoring high while others score maybe 1 or 2.

Is there a way to do this using AppleScript, in Word or TextEdit or OpenOffice or Pages or TextWrangler or...
I know there is free software for Windows that does this, but I have no access to Windows at all. Is there free Mac software? Or is it built in somewhere? Or, as I said, can it be done via AppleScript or Automator?

Thanks


#2 2016-12-31 10:51:04 am

Yvan Koenig
Member
Registered: 2006-09-14
Posts: 3194

Re: Word processing, how to create a list sorted by word or by frequency

Quick and dirty answer:

Applescript:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

# Storing the lists in a script object speeds up access to their items
script o
   property theWords : {}
   property theCounts : {}
   property theFullList : {}
end script

on indexOf:aValue inList:theList
   set theArray to current application's NSArray's arrayWithArray:theList
   set theIndex to theArray's indexOfObject:aValue
   if theIndex = current application's NSNotFound then
       return 0
   else
       return (theIndex + 1)
   end if
end indexOf:inList:

tell application "TextEdit" to tell document 1
   set theText to its text
end tell

set o's theFullList to words of theText
repeat with aWord in o's theFullList
   set maybe to (its indexOf:(aWord as text) inList:(o's theWords))
   if maybe = 0 then
       set end of o's theWords to (aWord as text)
       set end of o's theCounts to 1
   else
       set item maybe of o's theCounts to (item maybe of o's theCounts) + 1
   end if
end repeat
repeat with i from 1 to count o's theWords
   set maybe to item i of o's theWords as text
   set item i of o's theWords to {maybe, (item i of o's theCounts as integer)}
end repeat
o's theWords

Yvan KOENIG running Sierra 10.12.2 in French (VALLAURIS, France) Saturday 31 December 2016 16:46:45


#3 2016-12-31 12:06:54 pm

Yvan Koenig
Member
Registered: 2006-09-14
Posts: 3194

Re: Word processing, how to create a list sorted by word or by frequency

Here is a work in progress.

Applescript:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

script o
   property theWords : {}
   property theCounts : {}
   property byWords : {}
   property byFrequency : {}
end script

tell application "TextEdit" to tell document 1
   set theText to its text
end tell

set o's theWords to words of theText
set ArrayOfWords to current application's NSMutableArray's arrayWithArray:{}
set ArrayOfCapitalizedWords to current application's NSMutableArray's arrayWithArray:{}
--set ArrayOfCounts to current application's NSMutableArray's arrayWithArray:{}
repeat with aWord in o's theWords
   set aWord to (current application's NSString's stringWithString:aWord)
   set oneWord to (aWord's uppercaseString()) -- as text
   set maybe to (ArrayOfCapitalizedWords's indexOfObject:oneWord)
   if maybe = current application's NSNotFound then
       (ArrayOfWords's addObject:aWord)
       (ArrayOfCapitalizedWords's addObject:oneWord)
       set end of o's theCounts to 1
   else
       # grab the old count and add 1
       set newCount to (item (maybe + 1) of o's theCounts) + 1
       set (item (maybe + 1) of o's theCounts) to newCount
   end if
end repeat
set o's theWords to ArrayOfWords as list
copy o's theWords to o's byWords
copy o's theWords to o's byFrequency

set space3 to space & space & space
set i to 0
repeat with aWord in o's theWords
   set aWord to aWord as text # Required
   set i to i + 1
   set item i of o's theWords to {aWord, (item i of o's theCounts as integer)}
   set item i of o's byWords to aWord & ", " & (item i of o's theCounts as integer)
   set item i of o's byFrequency to text -3 thru -1 of (space3 & (item i of o's theCounts as integer)) & ", " & aWord
end repeat

set theArray to current application's NSArray's arrayWithArray:(o's byWords)
set theArray to theArray's sortedArrayUsingSelector:"localizedStandardCompare:"
set o's byWords to theArray as list

set theArray to current application's NSArray's arrayWithArray:(o's byFrequency)
set theArray to theArray's sortedArrayUsingSelector:"localizedStandardCompare:"
set o's byFrequency to theArray as list

{o's theWords, o's byWords, o's byFrequency}

I wanted to use a mutable array for the counts, but I don't know the correct syntax for changing the value of an item in a mutable array, so the array that was supposed to store the occurrences of the words came out wrong.
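For reference, one way to change an item in place is NSMutableArray's replaceObjectAtIndex:withObject:, which takes a zero-based index. The following is a minimal sketch, not taken from any script in this thread; the handler name bumpCount and the sample values are made up for illustration:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- bumpCount is a hypothetical helper: it adds 1 to the number stored
-- at a zero-based index of an NSMutableArray.
on bumpCount(countsArray, zeroBasedIndex)
   set oldCount to (countsArray's objectAtIndex:zeroBasedIndex) as integer
   countsArray's replaceObjectAtIndex:zeroBasedIndex withObject:(oldCount + 1)
end bumpCount

set countsArray to current application's NSMutableArray's arrayWithArray:{1, 1, 1}
bumpCount(countsArray, 1) -- increment the second item
countsArray as list -- {1, 2, 1}
```

Because indexOfObject: also returns a zero-based index, its result could be passed straight to such a handler without the (maybe + 1) adjustment needed for an AppleScript list.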

CAUTION: Because indexOfObject: is case sensitive, the first version returned an incorrect list: in my test file, "the" and "The" were treated as different words.

Yvan KOENIG running Sierra 10.12.2 in French (VALLAURIS, France) Saturday 31 December 2016 18:06:11

Last edited by Yvan Koenig (2016-12-31 02:30:24 pm)


#4 2016-12-31 05:09:45 pm

Marc Anthony
Member
From: Dallas, TX
Registered: 2006-04-27
Posts: 765

Re: Word processing, how to create a list sorted by word or by frequency

Here is a relatively efficient non-ASOC way to do it.

Applescript:

set message to "I want to count how many times each word in a document appears in that document. Example, if the word Hogmanay is in there 10 times and Christmas 4 times in a text with 5000 words I like to see it in a list. Words like 'and' or 'the' can be included I do not mind.
However I want all of the 5000 words to be included in the list that are in the document. The list might be 100 or 500 words long or more, with some words scoring high while others score maybe 1 or 2.

Using AppleScript, is there a way to do so in Word or Text Edit or Open Office or Pages or TextWrangler or...
I know there is free software for Windows not this but I have no access to Windows at all. Is there free Mac software? Or is it build in somewhere? Or, as said via Apple Script or Automator?

Thanks"
's words

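# Build one record keyed by word. Concatenating records keeps the left-hand value for a duplicate key, so repeated words are not added twice.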
set freqRec to {}
repeat with focus from 1 to count message
   set counter to 0
   repeat with comparator from 1 to count message
       if my message's item comparator is my message's item focus then set counter to counter + 1
   end repeat
   set freqRec to my freqRec & (run script "{|" & my message's item focus & "|: " & counter & "}")
end repeat
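# Hack: force a coercion error, then parse the keys and counts out of the error message (this relies on the English wording of the message)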
try
   freqRec as text
on error err
end try
set AppleScript's text item delimiters to {"|", "Can’t make ", " into type text.", "{", "}"}
set freqRec to {err's text items}
set AppleScript's text item delimiters to {""}
set freqRec to freqRec as text
set AppleScript's text item delimiters to {linefeed, ", "}
set freqRec to freqRec's text items
set freqRec to freqRec as text
#sort
(do shell script "echo " & (freqRec)'s quoted form & " | sort -df") --'s paragraphs


#5 2016-12-31 06:44:53 pm

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 4383

Re: Word processing, how to create a list sorted by word or by frequency

Another ASObjC offering:

Applescript:

use AppleScript version "2.4"
use framework "Foundation"

tell application "TextEdit" to set wordList to words of text of document 1

-- Get same-case versions of all the words.
set lowercasedWords to (current application's class "NSArray"'s arrayWithArray:(wordList))'s valueForKey:("lowercaseString")
-- Use an NSCountedSet to count the number of each.
set countedWords to current application's class "NSCountedSet"'s setWithArray:(lowercasedWords)
-- Create an array of dictionaries, each containing a word and its count.
set resultArray to current application's class "NSMutableArray"'s new()
set wordEnumerator to countedWords's objectEnumerator()
repeat (countedWords's |count|()) times
   set thisWord to wordEnumerator's nextObject()
   set thisCount to countedWords's countForObject:(thisWord)
   tell resultArray to addObject:({|word|:thisWord, |count|:thisCount})
end repeat
-- Reverse-sort the array on the counts.
set sortOnCount to current application's class "NSSortDescriptor"'s sortDescriptorWithKey:("count") ascending:(false)
tell resultArray to sortUsingDescriptors:({sortOnCount})
-- Coerce back to list and return the result
return resultArray as list

That's definitely my last script this year. ;)


NG


#6 2016-12-31 08:42:08 pm

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 5136

Re: Word processing, how to create a list sorted by word or by frequency

Nigel Garvey wrote:

That's definitely my last script this year. ;)


And here are my first. :)

This is similar to Nigel's:

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theWords to words of (the clipboard)
-- make counted set of all words in lowercase
set theWords to (current application's NSArray's arrayWithArray:theWords)'s valueForKey:"lowercaseString"
set theCountedSet to current application's NSCountedSet's setWithArray:theWords
-- get array of unique words
set uniqueWords to theCountedSet's allObjects()
-- build array of dictionaries containing both the words and their counts
set theList to current application's NSMutableArray's array()
repeat with aWord in uniqueWords
   (theList's addObject:{theWord:aWord, theCount:(theCountedSet's countForObject:aWord)})
end repeat
-- sort the array first by the count and second by the word
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"theCount" ascending:false
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"theWord" ascending:true
theList's sortUsingDescriptors:{desc1, desc2}
-- convert to tab-delimited text in form <word><tab><count><linefeed>
set newList to {}
repeat with aDict in theList
   set end of newList to ((aDict's objectForKey:"theWord") as text) & tab & (aDict's objectForKey:"theCount") as text
end repeat
set saveTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to {linefeed}
set newList to newList as text
set AppleScript's text item delimiters to saveTID
return newList

This variation tries to retain the case of words that always appear in other than all-lowercase, like the OP's examples of Hogmanay and Christmas:

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theWords to words of (the clipboard)
-- make array of words and matching lowercase array
set theWords to current application's NSArray's arrayWithArray:theWords
set theWordsLower to theWords's valueForKey:"lowercaseString"
-- make counted set of words in lowercase
set theCountedSet to current application's NSCountedSet's setWithArray:theWordsLower
-- get array of unique words
set uniqueWords to theCountedSet's allObjects()
-- build array of dictionaries containing both the words and their counts
set theList to current application's NSMutableArray's array()
repeat with aWord in uniqueWords
   (theList's addObject:{theWord:aWord, theCount:(theCountedSet's countForObject:aWord)})
end repeat
-- sort the array first by the count and second by the word
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"theCount" ascending:false
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"theWord" ascending:true
theList's sortUsingDescriptors:{desc1, desc2}
-- convert to tab-delimited text in form <word><tab><count><linefeed>
set newList to {}
repeat with aDict in theList
   set oneWord to (aDict's objectForKey:"theWord")
   if (theWords's containsObject:oneWord) as boolean is false then
       -- the original list didn't contain the lowercase version, so look up original array
       set theIndex to (theWordsLower's indexOfObject:oneWord)
       set oneWord to (theWords's objectAtIndex:theIndex)
   end if
   set end of newList to (oneWord as text) & tab & (aDict's objectForKey:"theCount") as text
end repeat
set saveTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to {linefeed}
set newList to newList as text
set AppleScript's text item delimiters to saveTID
return newList


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/


#7 2017-01-01 03:43:56 am

ChangeAgent
Member
From: Scotland
Registered: 2008-03-07
Posts: 186

Re: Word processing, how to create a list sorted by word or by frequency

WOW!  First of all happy new year folks and thank you for posting all your replies.  Overwhelmed for choice!  They all work and I need to test which one fits my purpose best.

Again a thousand thanks.


#8 2017-01-01 07:22:42 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 4383

Re: Word processing, how to create a list sorted by word or by frequency

Shane Stanley wrote:

This is similar to Nigel's:


Exactly the same except for the word source, the style, the subsort on words, and the conversion to text at the end. ;)

This variation tries to retain the case of words that always appear in other than all-lowercase, like the OP's examples of Hogmanay and Christmas:


Things get difficult once you start down that road. :/ If the text contains something like "It is important to distinguish between saLT and SALT. I'm going to be talking exclusively about the latter," your solution only lists the former, with the combined count of both. (This can be got round by doing the case corrections before putting the words into the NSCountedSet.) And of course if the text also contains "salt", only it will be listed. If case is important, it may be better not to convert to lower case in the first place and to leave it to the user to interpret the results. Or perhaps to write something to specific requirements.

But given your variation, you can lose the second repeat and the TIDs by doing the substitutions in the first repeat and including a formatted string in each dictionary. I've reverted to my own style here:

Applescript:

use AppleScript version "2.4"
use framework "Foundation"

tell application "TextEdit" to set wordList to words of text of document 1

set |⌘| to current application
set originalWords to |⌘|'s class "NSArray"'s arrayWithArray:(wordList)
-- Get same-case versions of all the words.
set lowercasedWords to originalWords's valueForKey:("lowercaseString")
-- Use an NSCountedSet to count the number of each.
set countedWords to |⌘|'s class "NSCountedSet"'s setWithArray:(lowercasedWords)
-- Create an array of dictionaries, each containing a word, its count, and a formatted string containing both.
set resultArray to |⌘|'s class "NSMutableArray"'s new()
set wordEnumerator to countedWords's objectEnumerator()
set presentationFormat to |⌘|'s class "NSString"'s stringWithString:("%@" & tab & "%@")
repeat (countedWords's |count|()) times
   set thisWord to wordEnumerator's nextObject()
   set thisCount to countedWords's countForObject:(thisWord)
   -- If this word never appears entirely lower-cased in the text, substitute the (first!) original version.
   if not ((originalWords's containsObject:(thisWord)) as boolean) then
       set firstOriginalIndex to lowercasedWords's indexOfObject:(thisWord)
       set thisWord to originalWords's objectAtIndex:(firstOriginalIndex)
   end if
   set thisString to |⌘|'s class "NSString"'s stringWithFormat_(presentationFormat, thisWord, thisCount)
   tell resultArray to addObject:({|word|:thisWord, |count|:thisCount, |string|:thisString})
end repeat
-- Reverse-sort the array on the counts, subsorting forwards on the words.
set reverseSortOnCount to |⌘|'s class "NSSortDescriptor"'s sortDescriptorWithKey:("count") ascending:(false)
set forwardSortOnWord to |⌘|'s class "NSSortDescriptor"'s sortDescriptorWithKey:("word") ascending:(true) selector:("localizedCaseInsensitiveCompare:")
tell resultArray to sortUsingDescriptors:({reverseSortOnCount, forwardSortOnWord})
-- Extract the strings, join them with linefeeds, and return as Applescript text.
return ((resultArray's valueForKey:("string"))'s componentsJoinedByString:(linefeed)) as text


NG


#9 2017-01-01 07:43:56 am

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 5136

Re: Word processing, how to create a list sorted by word or by frequency

Nigel Garvey wrote:

Things get difficult once you start down that road.


More in theory than in practice, I suspect. I mean, the whole exercise has a certain lack of precision starting from the definition of words.

If someone really has used Christmas and christMas and in the reverse order, yes, they'll get an odd result. But I think that's really a case of GIGO.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/


#10 2017-01-01 09:12:56 am

ChangeAgent
Member
From: Scotland
Registered: 2008-03-07
Posts: 186

Re: Word processing, how to create a list sorted by word or by frequency

Thanks! So much choice! This one surely catches the differences in writing.


#11 2017-01-01 05:38:38 pm

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 5136

Re: Word processing, how to create a list sorted by word or by frequency

This version is modified to use some of Nigel's efficiencies, and changes the case logic slightly: mixed-case words will be added to the list in the case used only if they are cased consistently throughout; otherwise they will be added in lowercase.

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theWords to words of (the clipboard)
-- make array of words and matching lowercase array
set theWords to current application's NSArray's arrayWithArray:theWords
set theWordsLower to theWords's valueForKey:"lowercaseString"
-- make counted set of words in lowercase
set theCountedSet to current application's NSCountedSet's setWithArray:theWordsLower
set rawCountedSet to current application's NSCountedSet's setWithArray:theWords
-- get array of unique words
set uniqueWords to theCountedSet's allObjects()
-- build array of dictionaries containing both the words and their counts
set theList to current application's NSMutableArray's array()
repeat with aWord in uniqueWords
   set thisCount to (theCountedSet's countForObject:aWord)
   if (theWords's containsObject:aWord) as boolean is false then
       -- the original list didn't contain the lowercase version, so look up original array
       set theIndex to (theWordsLower's indexOfObject:aWord)
       set casedWord to (theWords's objectAtIndex:theIndex)
       -- check if all instances match this case
       if (rawCountedSet's countForObject:casedWord) = thisCount then set aWord to casedWord
   end if
   set thisString to current application's NSString's stringWithFormat_("%@    %@", aWord, thisCount)
   (theList's addObject:{theWord:aWord, theCount:thisCount, theString:thisString})
end repeat
-- sort the array first by the count and second by the word
set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"theCount" ascending:false
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"theWord" ascending:true selector:"localizedCaseInsensitiveCompare:"
theList's sortUsingDescriptors:{desc1, desc2}
return ((theList's valueForKey:"theString")'s componentsJoinedByString:(linefeed)) as text

Edited as per Nigel's comments below.

Last edited by Shane Stanley (2017-01-02 05:43:04 pm)


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/


#12 2017-01-02 03:34:30 am

ChangeAgent
Member
From: Scotland
Registered: 2008-03-07
Posts: 186

Re: Word processing, how to create a list sorted by word or by frequency

Thanks Shane!


#13 2017-01-02 07:34:05 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 4383

Re: Word processing, how to create a list sorted by word or by frequency

Shane Stanley wrote:

This version is modified to use some of Nigel's efficiencies, and changing the case logic slightly: mixed-case words will be added to the list in the case used only if they are cased consistently throughout, otherwise they will be added in lowercase.


Hi Shane.

It's been a bit difficult to follow the logic of what you've done as the code in this thread keeps switching back to your variable names (which I find rather cryptic) and style.

Your latest variation returns the lower-cased words sorted as expected followed by the upper- or mixed-cased words forward-sorted on the words. It turns out that this is because your theCount is being set to 0 for these words, which in turn is because they're not in your theCountedSet.
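That zero is easy to reproduce in isolation: NSCountedSet's countForObject: matches strings case-sensitively and returns 0 for anything that is not in the set. A minimal sketch with made-up sample words, not taken from any script above:

```applescript
use AppleScript version "2.4"
use framework "Foundation"

-- Sample words are invented for illustration.
set lowercasedWords to current application's NSArray's arrayWithArray:{"salt", "salt", "pepper"}
set countedWords to current application's NSCountedSet's setWithArray:lowercasedWords

-- The set only knows the lowercase form, so the cased form counts as 0.
set lowerCount to (countedWords's countForObject:"salt") as integer
set casedCount to (countedWords's countForObject:"SALT") as integer
{lowerCount, casedCount} -- {2, 0}
```

With a descending sort on the counts, every entry whose count is 0 ends up after all the entries with real counts, which matches the ordering described.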

Applescript:

       -- check if all instances match this case
       if (rawCountedSet's countForObject:casedWord) = thisCount then set aWord to casedWord -- aWord is now casedWord
   end if
   set thisString to current application's NSString's stringWithFormat_("%@    %@", aWord, thisCount) -- The string contains the lower-case count
   (theList's addObject:{theWord:aWord, theCount:(theCountedSet's countForObject:aWord), theString:thisString}) -- aWord isn't in theCountedSet when it's casedWord.

Edit: Since it's still only one manifestation of each word that's used, I think the cure is simply to use the lower-case counts as before:

Applescript:

       -- check if all instances match this case
       if (rawCountedSet's countForObject:casedWord) = thisCount then set aWord to casedWord
   end if
   set thisString to current application's NSString's stringWithFormat_("%@ %@", aWord, thisCount)
   (theList's addObject:{theWord:aWord, theCount:thisCount, theString:thisString}) -- Use thisCount (the total number of instances of the word in any case, from theCountedSet)
end repeat

Last edited by Nigel Garvey (2017-01-02 09:53:56 am)


NG


#14 2017-01-02 10:26:11 am

Yvan Koenig
Member
Registered: 2006-09-14
Posts: 3194

Re: Word processing, how to create a list sorted by word or by frequency

Just a question.

What is the parameter selector:"localizedCaseInsensitiveCompare:" needed for?

I tried with
selector:"localizedStandardCompare:"
selector:()
and even with the shorter syntax:
set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"theWord" ascending:true

and I got exactly the same results.

Yvan KOENIG running Sierra 10.12.2 in French (VALLAURIS, France) Monday 2 January 2017 16:26:06


#15 2017-01-02 11:35:43 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 4383

Re: Word processing, how to create a list sorted by word or by frequency

Hi Yvan.

It's just to ensure that mixed-case words get sorted case-insensitively, otherwise all words beginning with the lower-case form of a letter will be sorted after all words beginning with the upper-case form (and which occur the same number of times, in these scripts).

As you rightly say, selector:("localizedStandardCompare:") could be used instead, as this "Compares strings as sorted by the Finder" — although the documentation adds that: "The exact sorting behavior of this method is different under different locales and may be changed in future releases. This method uses the current locale."

selector:() compiles as selector:{} but doesn't seem to cause any problems. However, using it or the form without selector: results in a case-sensitive sort. So applying the scripts to the text "Anthony the aardvark":

With selector:("localizedCaseInsensitiveCompare:") or selector:("localizedStandardCompare:"):

"aardvark	1
Anthony	1
the	1"

With selector:{} or without selector: :

"Anthony	1
aardvark	1
the	1"
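The same difference can be seen without sort descriptors. The sketch below is only an illustration: it sorts a plain array of the three words with sortedArrayUsingSelector:, on the understanding that a sort descriptor created without a selector falls back to compare:.

```applescript
use AppleScript version "2.4"
use framework "Foundation"

set theWords to current application's NSArray's arrayWithArray:{"the", "Anthony", "aardvark"}

-- compare: is NSString's literal comparison, so "A" sorts before "a".
set caseSensitive to (theWords's sortedArrayUsingSelector:"compare:") as list
-- localizedCaseInsensitiveCompare: ignores case when ordering.
set caseInsensitive to (theWords's sortedArrayUsingSelector:"localizedCaseInsensitiveCompare:") as list

{caseSensitive, caseInsensitive}
-- {{"Anthony", "aardvark", "the"}, {"aardvark", "Anthony", "the"}}
```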

NG


#16 2017-01-02 11:43:39 am

Yvan Koenig
Member
Registered: 2006-09-14
Posts: 3194

Re: Word processing, how to create a list sorted by word or by frequency

Thanks Nigel

The selector changed nothing for me because Shane's script moved the words beginning with an uppercase letter to the very end of the list.
I haven't tested it, but it may well make a difference in the script as edited according to your latest proposal.

Yvan KOENIG running Sierra 10.12.2 in French (VALLAURIS, France) Monday 2 January 2017 17:43:23


#17 2017-01-02 05:45:03 pm

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 5136

Re: Word processing, how to create a list sorted by word or by frequency

Nigel Garvey wrote:

Your latest variation returns the lower-cased words sorted as expected followed by the upper- or mixed-cased words forward-sorted on the words. It turns out that this is because your theCount is being set to 0 for these words, which in turn is because they're not in your theCountedSet.


Mea culpa. Now fixed.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/


#18 2017-01-02 05:48:52 pm

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 5136

Re: Word processing, how to create a list sorted by word or by frequency

Nigel Garvey wrote:

It's been a bit difficult to follow the logic of what you've done


So it checks whether an entry in the original list is identical to its equivalent in the lowercase list. If so, there's nothing to do; if not, it checks if it appears the same number of times as in the lowercase version, which would mean it was always cased consistently.
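That check can be sketched as a standalone handler. The handler name displayForm and the sample words are hypothetical, chosen only for illustration; this is a simplified restatement of the logic described, not the script from post #11:

```applescript
use AppleScript version "2.4"
use framework "Foundation"

-- displayForm (hypothetical name) returns the cased form of a lowercased word
-- if every occurrence in wordList uses that same casing, else the lowercase form.
on displayForm(wordList, lowerWord)
   set originalWords to current application's NSArray's arrayWithArray:wordList
   set lowercasedWords to originalWords's valueForKey:"lowercaseString"
   set lowerCounts to current application's NSCountedSet's setWithArray:lowercasedWords
   set rawCounts to current application's NSCountedSet's setWithArray:originalWords
   set totalCount to lowerCounts's countForObject:lowerWord
   -- The word does not occur at all: nothing to look up.
   if totalCount = 0 then return lowerWord
   -- The all-lowercase form occurs in the text: keep it.
   if (originalWords's containsObject:lowerWord) as boolean then return lowerWord
   -- Otherwise take the first cased occurrence and compare its count with the total.
   set firstIndex to lowercasedWords's indexOfObject:lowerWord
   set casedWord to originalWords's objectAtIndex:firstIndex
   if (rawCounts's countForObject:casedWord) = totalCount then return casedWord as text
   return lowerWord
end displayForm

set sampleWords to {"Hogmanay", "SALT", "Hogmanay", "saLT", "the", "The"}
{displayForm(sampleWords, "hogmanay"), displayForm(sampleWords, "salt"), displayForm(sampleWords, "the")}
-- {"Hogmanay", "salt", "the"}
```

Rebuilding the arrays and counted sets on every call keeps the sketch self-contained; a real script would build them once outside the loop.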


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/


#19 2017-01-03 04:40:20 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 4383

Re: Word processing, how to create a list sorted by word or by frequency

Thanks, Shane.


NG
