AppleScript to delete any non-highlighted word in a Word doc

I’m still looking for a working solution to a common workflow problem. My work often involves texts that range from 5,000 to 10,000 words. I use different highlight colors to indicate different things and later have to get a word count for one of those colors (bright green).

Up until now, I was trying to get an AppleScript or VBA macro that would count the words for me, but finding a script that executes quickly has been tricky. I rethought the problem and figured that deleting all the non-highlighted text (or better yet, any text not highlighted in bright green) would be a viable solution, as I could then just use Word’s own word counter to get the count. I know that deleting any non-highlighted text can be done via Word’s Advanced Find & Replace (though it looks like you can’t specify a particular highlight color), but I’d like a solution that doesn’t require navigating menus and submenus.

Could someone write an AppleScript that would help me achieve this? It essentially needs to delete any text not highlighted in bright green in the current active Word doc.

Thanks in advance!

AppleScript is broken in Microsoft Word.

Just the command “words of document of window 1” won’t return a list as it should.

Oddly, if I use a whose clause, it will return a list, but very slowly. If the amount of text in the document is large, it will time out.

Like so:

words of document of window 1 whose highlight color index is yellow

Dang. I guess opening the document in something like Pages and executing the script there (assuming a scripting solution in Pages is possible) would take about as much time as navigating the Find & Replace menu in Word. If VBA and AppleScript are a no-go, I guess my last card would be a UI macro with Automator or an app like Keysmith.

It’s not broken, it’s just unique.

tell application "Microsoft Word"
	
	set tob to text object of document 1
	set ctob to content of tob
	words of ctob
	
end tell

Although your script works, it doesn’t return items of MS Word’s “word” class, which have properties such as highlight color index (what we need here). It returns the actual contents of each word (i.e., a string).

I finally got a script to find words with a certain highlight color working in another topic.

Look here…

It may not be fast, but it works and won’t time out on large files.

tell application "Microsoft Word"
	
	set tob to text object of document 1
	words of tob whose highlight color index is bright green
	
	-- to see individual words rather than references to them, preface with 'content of'
	content of words of tob whose highlight color index is bright green

end tell

In Word, word means something a bit different than it does in other applications — the same way that alias means something different to System Events (to piggyback upon another recent thread). It still isn’t broken. It does have a different logic though, which I think is to be expected given how complex the app is. Besides, I thought you were complaining about having to use a whose clause.
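To make the distinction concrete, here is a minimal sketch that uses only the forms already shown in this thread. The variable names (`firstWord`, `wordText`, `itsColor`) are just illustrative, and I’m assuming a document is open with at least one word in it.

```applescript
tell application "Microsoft Word"
	set tob to text object of document 1
	
	-- a reference to a range inside the document, not a string
	set firstWord to word 1 of tob
	
	-- the plain text of that range
	set wordText to content of firstWord
	
	-- formatting is only available on the reference, not on the string
	set itsColor to highlight color index of firstWord
end tell
```

The string in `wordText` has lost its connection to the document, which is why asking it for a highlight color isn’t possible.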

On another note, if you get the highlight color index of a paragraph, it will return one of three results:

	highlight color index of text object of paragraph 1 of tob
  1. if the entire paragraph is highlighted (single tint), it returns that colour
  2. if no words are highlighted, it returns no highlight
  3. if only some words are highlighted, it returns missing value

So it’s likely that you could reduce the running time by working through the document paragraph by paragraph: skip paragraphs with no highlights, count every word in paragraphs that are entirely highlighted, and save the grunt work for paragraphs with mixed highlighting.
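A rough, untimed sketch of that paragraph-first idea, assuming the three return values listed above hold for your version of Word. The names `targetColor`, `greenWords`, `paraRange` and `paraColor` are my own. Note that Word treats punctuation marks as words too, so the total will likely run higher than Word’s own word count unless you filter those out.

```applescript
tell application "Microsoft Word"
	set targetColor to bright green
	set greenWords to 0
	set pc to count (paragraphs of document 1)
	
	repeat with i from 1 to pc
		set paraRange to text object of paragraph i of document 1
		set paraColor to highlight color index of paraRange
		
		if paraColor is targetColor then
			-- whole paragraph is bright green: take all its words in one go
			set greenWords to greenWords + (count (words of paraRange))
		else if paraColor is missing value then
			-- mixed highlighting: only here do we check word by word
			repeat with j from 1 to (count (words of paraRange))
				if highlight color index of word j of paraRange is targetColor then
					set greenWords to greenWords + 1
				end if
			end repeat
		end if
		-- no highlight, or a single other colour: nothing to count
	end repeat
end tell

greenWords
```

How much this saves depends on how many paragraphs have no highlighting at all, since those cost only one call each.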

Most of what you gentlemen are saying is going right over my layman’s head, but I’m following this thread closely. :grinning:

Robert’s solution to count words highlighted in a specific color does work. As he mentioned, the process is pretty slow. I started this thread to see if an alternate approach (i.e. deleting all words not highlighted, or maybe even deleting words not highlighted in a specific color and using Word’s built-in counter for the rest) could yield better results. The first option, deleting words not highlighted (which is doable manually through Advanced Find & Replace), would already be a nice step forward. I can then just manually delete the text/colors I don’t need and use Word’s counter.

At any rate, thanks for all the help and input!

My pleasure. Word’s find functionality is scriptable, but it quickly becomes obscure, and none of Microsoft’s examples use a similar property. BTW, I’m very much a layman too, but I’m fairly familiar with Word and Excel, so I can occasionally figure out their scripting. If the find approach could be figured out, I think that would ultimately be your best solution, as it would run fast. Sans that, GUI scripting the dialogue would probably be faster than most approaches given your document sizes.

BTW, my script above (post 6) should generate a list of bright green words as long as the document isn’t long. Comment out the line beginning with words of tob and run it on a document that’s maybe a page long.

FWIW, I get an error when running the script, but I assume that’s because I still use Sierra and Word 2011. That said, I haven’t gone through the script, so I don’t yet know what’s causing the error.

error "Can’t divide 1.0 by zero." number -2701 from 1.0

In this line, the word psteps is highlighted:

if (i mod psteps) = 0 then
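That error is what `mod` raises when its right-hand operand is zero, so presumably psteps ended up as 0, which integer division (`div`) gives whenever the paragraph count is smaller than the divisor. A minimal guard might look like this; `progressStepSize` and its parameter names are hypothetical, not part of the script being discussed.

```applescript
-- Hypothetical helper: how many paragraphs to process per progress tick.
-- Never returns less than 1, so "i mod stepSize" cannot divide by zero.
on progressStepSize(paragraphCount, maxTicks)
	set stepSize to paragraphCount div maxTicks
	if stepSize < 1 then set stepSize to 1
	return stepSize
end progressStepSize

progressStepSize(30, 100) -- 30 div 100 is 0, so the guard returns 1
```

With a short document every paragraph then counts as one progress step, and longer documents still get roughly `maxTicks` updates.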

This line always returns 0:

set wc to count (words of (paragraph 1 of document 1))

And yes, it’s a full paragraph of over 30 words.

(again, Word sucks at AppleScript)

Actually, on further examination, a paragraph has no “words” elements.

** EDIT ** - Hold on I think I found a way

The line below doesn’t work for me…

It returns missing value just like line 3

my bad

forgot to add “text object of” (sometimes I’m ditzy)

OK

Here is a working version that goes by paragraphs

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

property hColors : {"Auto", "Black", "Blue", "Turquoise", "Bright Green", "Pink", "Red", "Yellow", "White", "Dark Blue", "Teal", "Green", "Violet", "Dark Red", "Dark Yellow", "Gray 50", "Gray 25", "unknown"}

on run
	local myDoc, myResult, i, pc, wc, wordList, WordColor, myIndex, nHighlightedWords, ws, wl, psteps, tsteps
	set myColor to choose from list hColors with title "Windows Highlight Colors" with prompt "Please choose a highlight color..."
	if class of myColor is boolean then return -- user chose 'Cancel'
	set WordColor to item 1 of myColor
	set myIndex to getIndexOfItemInList(WordColor, hColors)
	set nHighlightedWords to 0
	tell application "Microsoft Word"
		if not (exists window 1) then return
		set pc to count (paragraphs of document 1)
		set WordColor to item myIndex of {auto, black, blue, turquoise, bright green, pink, red, yellow, white, dark blue, teal, green, violet, dark red, dark yellow, gray50, gray25, no highlight}
		set my progress description to "Getting MS-Word highlight colors…"
		set my progress completed steps to 0
		if pc < 1000 then
			set psteps to pc div 100 + 1 --if psteps = 0 then set psteps to 1
			set tsteps to pc div psteps
		else
			set tsteps to 100
			set psteps to pc div tsteps
		end if
		set my progress additional description to "(Paragraph 0 of " & pc & ")"
		set my progress total steps to tsteps
		repeat with i from 1 to pc
			set myResult to highlight color index of text object of paragraph i of document 1
			set wc to count (words of text object of paragraph i of document 1)
			if myResult is not no highlight then
				if myResult is in {WordColor, missing value} then -- partial highlight
					repeat with j from 1 to wc
						set myResult to highlight color index of word j of text object of paragraph i of document 1
						if myResult = WordColor then
							set ws to my trimSpace(content of word j of text object of paragraph i of document 1)
							set wl to length of ws
							if wl = 1 then
								if ws is not in {".", ",", ";", ":", "!", "?", "«", "»", "$", "€", "%", "-", "+", "@", "#", "*", "^", "<", ">", "(", ")", "/", "\\", "~"} then
									set nHighlightedWords to nHighlightedWords + 1
								end if
							else if wl = 2 then
								if ws is not in {"« ", " »"} then
									set nHighlightedWords to nHighlightedWords + 1
								end if
							else
								set nHighlightedWords to nHighlightedWords + 1
							end if
						end if
					end repeat
				end if
			end if
			if (i mod psteps) = 0 then
				set my progress completed steps to (my progress completed steps) + 1
				set my progress additional description to "(Paragraph " & i & " of " & pc & ", # of words = " & wc & ") " & (myResult as text) & ", \"" & (content of word i of document 1) & "\""
			end if
			--delay 0.2
		end repeat
		set my progress completed steps to tsteps
		set my progress additional description to "(" & pc & " of " & pc & ") All Done!"
		delay 1
	end tell
	display alert "# of words with highlight color \"" & myColor & "\" is " & nHighlightedWords
	return nHighlightedWords
end run

on getIndexOfItemInList(theItem, theList)
	script L
		property aList : theList
	end script
	repeat with a from 1 to count of L's aList
		if item a of L's aList is theItem then return a
	end repeat
	return 0
end getIndexOfItemInList

on trimSpace(aString)
	local i
	repeat with i from length of aString to 1 by -1
		if text i of aString ≠ " " then
			exit repeat
		end if
	end repeat
	return text 1 thru i of aString
end trimSpace

WAY FASTER

** EDIT ** - FIXED
** EDIT ** - FIXED again

Thanks for continuing to work on this. I’m getting the following error message with this latest script:

“The variable wc is not defined.” number -2753 from “wc”

I’ve edited it a few times. Try again

Also I timed the new version.
It’s over 45 times faster

It returns the same error message.

Weird, it ran fine on 2 of my Macs.
What line is it crapping out on?

** EDIT ** found it will fix now

I removed “& wc” from this line and the script ran fine. I figured that it would only affect the progress reporting and would be safe to remove.

set my progress additional description to "(" & i & " of " & ") " & (myResult as text) & ", \"" & (content of word i of document 1) & "\""

For a one-page document with 536 words, it took ~22 seconds to complete and produced the dialogue below — much faster than my attempt.

# of words with highlight color "Bright Green" is 11

It’s working now and is much faster! Thanks so much!

I’d be curious how this code would perform on your huge documents.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

on run
	
	--get the color we want to look for
	set colorChoices to {"auto", "black", "blue", "bright green", "dark blue", "dark red", "dark yellow", "gray25", "gray50", "green", "pink", "red", "teal", "turquoise", "violet", "white", "yellow"}
	set lookForColor to choose from list colorChoices with prompt "Select color to count" with title "Count Highlighted Words"
	if class of lookForColor is boolean then return
	set lookForColor to item 1 of lookForColor
	
	--hide Word so we don't see the selection jumping around
	--because as far as I can tell, Word doesn't have an equivalent of
	--Excel's screen updating property that lets us turn off and on visual updates
	tell application "System Events"
		set visible of application process "Microsoft Word" to false
	end tell
	
	tell application "Microsoft Word"
		--start at the beginning of the document
		home key selection move unit a story extend by moving
		
		--set some properties on the find object
		set findObj to find object of selection
		tell findObj
			clear formatting
			
			set match all word forms to false
			set match byte to false
			set match case to false
			set match fuzzy to false
			set match sounds like to false
			set match wildcards to false
			
			--we want highlighted words, not runs of our lookForColor
			set match whole word to true
			
			set content to ""
			set forward to true
			set highlight to true
			set wrap to find stop
		end tell
		
		--store our find results
		set foundWords to {}
		
		--perform the find operation
		set foundHighlighting to true
		repeat while foundHighlighting
			tell findObj to set foundHighlighting to execute find
			
			if foundHighlighting then
				set foundRange to text object of selection
				--this is just in case you need more info on the found words
				--if not, feel free to strip it down or even 
				--replace it with a simple count
				tell foundRange
					set hh to (highlight color index as text)
					set ss to start of content
					set ee to end of content
				end tell
				if hh is lookForColor then
					set end of foundWords to {start:ss, |end|:ee, color:hh}
				end if
				--tell foundRange to collapse range direction collapse end
			end if
		end repeat
		
	end tell
	
	--show Word again now that we're done
	tell application "System Events"
		set visible of application process "Microsoft Word" to true
	end tell
	
	set howMany to count of foundWords
	log foundWords
	tell application "Microsoft Word"
		display dialog "Found " & howMany & " words highlighted in " & lookForColor
	end tell
	
end run

Took 20 seconds on an 18-page document (9672 words) to return this in a dialogue:

Found 14 words highlighted in bright green

I’ll try again later after I add a few more highlights (all are on the first page). I should note that the document also had 10 yellow highlights which were identified but excluded from the bright green count.

FWIW, it hid the document momentarily but it became visible again as it was noting each highlight. I’m running Word 2011.

Regarding screen updating… While it appears that Word once had this as a property of the application (Word 2004), it seems to have been removed.