Script That Counts Words & Characters

I’m not sure where I found this script; it’s been quite a while, so possibly here. It was originally intended to count words and characters in TextEdit documents. I was able to change it to work with MS Word documents, but I cannot figure out how to get it to work with Apple’s Pages documents. What am I missing? Thanks.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

tell application "Microsoft Word"
	set wc to count words of document 1
	set cc to count characters of document 1
	if wc is equal to 1 then
		set txt to " word, "
	else
		set txt to " words, "
	end if
	if cc is equal to 1 then
		set txtc to " character."
	else
		set txtc to " characters."
	end if
	
	set result to "This text comprises " & (wc as string) & txt & (cc as string) & txtc
	display dialog result with title "WordStats" buttons {"OK"} default button "OK"
end tell

You can try:

tell application "Pages"
	set wc to count words of body text of document 1
	set cc to count characters of body text of document 1
	if wc is equal to 1 then
		set txt to " word, "
	else
		set txt to " words, "
	end if
	if cc is equal to 1 then
		set txtc to " character."
	else
		set txtc to " characters."
	end if
	
	set result to "This text comprises " & (wc as string) & txt & (cc as string) & txtc
	display dialog result with title "WordStats" buttons {"OK"} default button "OK"
end tell

Thank you, works perfectly. I needed to compare my original with yours for a while before I spotted the additional “of body text” in the script. Subtle difference, but apparently important.

The clue is in the Dictionary. If you look at Pages’ definition of a document, you’ll see that it has no words or characters of its own. They’re part of the body text.

Note, also, that this only works for regular documents.
If your Pages document uses Page Layout mode (e.g. it’s a flyer or poster, not a book) then body text will be empty and you need to iterate through the text items of document 1, where a Pages ‘text item’ is really a text BOX. This is a major faux pas in my book, because in every version of AppleScript, in every app except Pages, ‘text items of’ means something very different: the pieces of a string split on AppleScript’s text item delimiters.

Indeed, if you want the text of a text box in your document, you need to ask for:

object text of text item 1 of document 1

In an ideal world, the brain of every experienced AppleScripter would have just exploded, but it’s just another ordinary day in the world of AppleScript, where rules are merely suggestions :)

Here’s the revised script that handles regular and layout-based documents:

tell application "Pages"
	set cc to 0
	set wc to 0
	tell document 1
		if document body is true then
			set wc to count words of body text
			set cc to count characters of body text
		else
			repeat with i from 1 to count text items
				set wc to wc + (count words of object text of text item i)
				set cc to cc + (count characters of object text of text item i)
			end repeat
		end if
		
		if wc is equal to 1 then
			set txt to " word, "
		else
			set txt to " words, "
		end if
		if cc is equal to 1 then
			set txtc to " character."
		else
			set txtc to " characters."
		end if
		set result to "This text comprises " & (wc as string) & txt & (cc as string) & txtc
		display dialog result with title "WordStats" buttons {"OK"} default button "OK"
	end tell
end tell

Pages 14.4 Document definition, for reference:

document

document (noun), pl documents: A document.

PROPERTIES
Property            Access   Type                Description
body text           get/set  rich text           The document body text.
current page        get      page                Current page of the document.
document body       get      boolean             Whether the document has body text.
document template   get      template            The template assigned to the document.
facing pages        get/set  boolean             Whether the document has facing pages.
file                get      file                Its location on disk, if it has one.
id                  get      text                Document ID.
modified            get      boolean             Has it been modified since the last save?
name                get      text                Its name.
password protected  get      boolean             Whether the document is password protected or not.
selection           get/set  list of iWork item  A list of the currently selected items.

Thank you for the very informative explanation, and the time I’m sure it took to write. For me, relatively new to AppleScript, explanations such as you provided, really help me to better understand where to look when I’m puzzled by a script.

You’re welcome.

The dictionary really is the first place to look for any given application, since it tells you how the application deals with its own objects, what commands it accepts, etc.

The challenge is being able to read the dictionary and understand what it’s trying to tell you. That only comes with time, I’m afraid. It’s a journey, for sure :)

Using the Shortcuts app, I created an automation that returns the character count, word count, sentence count, and line count of any text that you copy to your clipboard. With this method, there is no need to worry about application-specific AppleScript commands. It should work in every application, as long as the text you want analyzed is on your clipboard.

The following image is a screenshot of the actions I used in the Shortcuts app.

I pinned this shortcut, which I named “Text Counter”, in the menu bar. Now all I need to do is select text, copy it to my clipboard, and run the “Text Counter” shortcut from the Shortcuts menu in the menu bar, as demonstrated in the following animation.

Text_Counts

You can download the actual “Text Counter.shortcut” file directly from the following link.

Text Stats Shortcut


wch1zpink. Thanks for the shortcut, which works great. I had written a shortcut which counted characters, but your shortcut is way better.

I was a little surprised that you used an AppleScript instead of a shortcut dialog. Perhaps you wanted the icon, which is not available in the shortcut dialog. FWIW, I’ve included a screenshot below which shows both of them.


I’m trying to get used to the Shortcuts app because I’m in the process of automating my home and a bunch of accessories I purchased to control with the HomeKit app (which relies heavily on integrating shortcuts into those automations). However, AppleScript is still my first love and I try to sneak it in whenever I can lol. It’s a whole new learning curve.


You can refer to these ‘text containers’ as Shapes and never use the text item abomination.

object text of shape 1 of document 1

I downloaded your shortcut and tried it on a document I have saved.

Question: is the shortcut counting spaces as characters? I’ve included screenshots of the count from the shortcut as well as from the script (in post #1), and the difference is pretty significant.


I realized that without a document, my post is pretty much useless. This is a different document, but there are still some differences. Small, but still differences. I’m only posting because I would really like to switch from the script to the shortcut.
Sample.rtf.zip (3.6 KB)
Shortcut
AppleScript

@paulskinner That’s a fair workaround, and I suppose in this use case, if you’re trying to count the characters in your document, then text attached to actual shapes (lines, boxes, ellipses, etc.) in addition to text boxes should also be included.

That doesn’t absolve Apple Engineering for using ‘text item’ as an element of the document when it should be an AppleScript reserved word (it’s not on the official list, I know… just saying it should be).

No need to write your own script for this.

I wasn’t precise enough in my language and I didn’t offer clear examples.

The only things in a Pages document that can contain text, other than the body text property, are shapes, so you never need to refer to “Text Items”.

tell application "Pages"
	tell document 1
		return ({body text, object text of every shape})
		--In a Page layout format doc with body text...
		-->{"Body copy", {"Text in a rectangle", "Text in a ‘text box’"}}
		
		--In a Page layout format doc with 'missing value' body text property and no shapes...
		-->{missing value, {}} 
	end tell
end tell

Here is an AppleScript solution which analyzes the text that you have copied to your clipboard. You may find that its results compare more closely with the other counts. This approach also lets you tweak the results a bit further using text item delimiters.

You could also run this from the script menu in the system menu bar if you have it enabled or you could also create a shortcut using this code.

set sourceText to ""
copy (the clipboard) to sourceText

set paragraphCount to "Paragraphs : " & (count of paragraphs of sourceText)
set wordCount to "Words : " & (count of words of sourceText)
set characterCount to "Characters : " & (count of characters of sourceText)

activate
display dialog paragraphCount & linefeed & wordCount & ¬
	linefeed & characterCount buttons {"OK"} default button "OK" ¬
	with title "Paragraph / Word / Character Counter" with icon 2

Homer712. The question of what constitutes a word is not the simple matter it might seem.

The AppleScript Language Guide defines a word as quoted below, although macOS Sequoia does not appear to have word-break rules in the International preference pane.

A continuous series of characters, with word elements parsed according to the word-break rules set in the International preference pane.

I couldn’t find anything specific, but just in general macOS follows the ICU/Unicode standards in matters of this sort. This standard can be found at:

I tested some of the solutions suggested in this thread plus one of my own on your RTF file, and the results were as follows. Unless you’re required to use a specific standard/solution, and just as a matter of personal preference, I would decide on one approach and use it consistently.

wch1zpink’s shortcut - 976 words
wch1zpink’s AppleScript - 984 words
Peavine’s ASObjC (see below) - 971 words
The shell’s wc utility - 982 words
Homer712’s WordStats - 988 words (not tested by me)

It’s instructive to note that the wc utility defines a word as “a string of characters delimited by white space characters.”
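To make that definition concrete, here is a minimal sketch of whitespace-delimited counting. It is written in Python rather than AppleScript purely for illustration, and the function name and sample sentence are my own, not anything from wc itself. Python's notion of whitespace may also differ slightly from the shell's, so treat it as an approximation of the idea rather than a reimplementation of wc.

```python
def count_whitespace_words(text: str) -> int:
    """Count words as runs of characters separated by whitespace.

    Hypothetical helper: it mirrors the wc man page's description of a
    word, not wc's actual implementation.
    """
    # str.split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, so empty strings give 0.
    return len(text.split())


sample = "The v1.1.5 update doesn't break  anything."
print(count_whitespace_words(sample))  # prints 6
```

Under this rule "v1.1.5" and "doesn't" each count as one word, and the trailing period simply stays attached to "anything.", which is one reason this style of count can differ from AppleScript's.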

It would also be a fairly simple matter to write your own solution using a regex pattern with ASObjC or the Shortcuts app. There’s probably little reason to do this, though.

Anyways, my ASObjC suggestion:

use framework "AppKit"
use framework "Foundation"
use scripting additions

set theString to the clipboard
set theString to current application's NSString's stringWithString:theString
set wordCount to current application's NSSpellChecker's sharedSpellChecker()'s countWordsInString:theString language:(missing value) --language is current selection in Spelling panel's pop-up menu

Thanks to everyone who replied to my initial post. I’m asking myself what I have learned by reading through all the replies. It’s this: it seems that in AppleScript, a word is a word, unless it’s not. To prove this “a word is a word, unless it’s not” idea to myself, I took a saved document (in MS Word), copied it into a Pages document, and ran my original script as well as the Pages script that “gluebyte” provided. It seems that the idea extends to applications as well, as the results are pretty dramatically different. As someone suggested, I’ll use one application, most likely MS Word, when fooling with word and character counts. The initial purpose of counting words and characters came about because certain web sites limit the total number of words and characters you are allowed when posting a question or request. Again, thanks to all who contributed to this thread. Below, the first screenshot is MS Word, the second is Pages.


When you have a Pages (v14.4) word-processing format document with an actual shape inserted with added text, then follow that with a Text box with some text, asking Pages to:

use scripting additions

tell application "Pages"
     activate
     tell front document
          get object text of every shape of it
     end tell
end tell
return

results in a list that places the object text of the Text box before that of the Shape.

I thought creating a regex pattern to count words would be a simple matter, but that’s not the case. My first thought was to count contiguous word characters (i.e. \w+) but then don’t is considered to be 2 words and v1.1.5 is reported as 3 words (see screenshot). Matching consecutive white space characters (i.e. \s+) returns 968 words for Homer712’s RTF document and seems a marginally better approach. Anyways, there are many established solutions, so there’s no reason to use a regex unless a particular word-break specification is desired.

\w+ - contiguous word characters - 1007 words

\b - same as above but word boundaries divided by 2 - 1007 words

\s+ - contiguous whitespace characters - 968 words

((\b[^\s]+\b)((?<=.\w).)?) - from a Google search - 960 words

N/A - wordcounter.net site - 968 words
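
For anyone who wants to see why the first two patterns over-count, here is a small sketch in Python (chosen only because its re module makes the comparison short; the function names are hypothetical). Note one deliberate difference from the list above: instead of counting whitespace runs with \s+, it counts runs of non-whitespace with \S+, which I am assuming is an equivalent way to get at the same whitespace-delimited count without having to adjust for leading or trailing spaces.

```python
import re


def count_word_char_runs(text: str) -> int:
    # The \w+ approach: every run of letters, digits, or underscores
    # is a "word", so apostrophes and periods split words apart.
    return len(re.findall(r"\w+", text))


def count_non_space_runs(text: str) -> int:
    # The whitespace-delimited approach: every run of non-whitespace
    # characters is a "word", punctuation included.
    return len(re.findall(r"\S+", text))


for token in ("don't", "v1.1.5"):
    print(token, count_word_char_runs(token), count_non_space_runs(token))
# don't 2 1
# v1.1.5 3 1
```

Neither pattern is "right"; they just encode different word-break rules, which is the same reason the tools compared earlier in the thread disagree.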