AppleNumbers - count words - ready, but too slow

Hi.

I have the script below to count words in all sheets and tables of a Number’s file. Nothing special. But sometimes the files I get came from Excel, so there are A LOT of empty columns and rows and the script takes forever testing them too.

I couldn’t think of a way to make it, for example, go to column C and check: “hum, even if this column has a header, I can see that rows 2 to 5 are empty, so I will assume this column is empty and won’t work on it”. And same with rows: "hey, I got to an empty row. How, there is another one following it, and another one, and another one… hey, maybe there is nothing from here to the last row, so I will stop working on this".

Does it make sense?

My code is below.

Any suggestion?

Thank you,
Luiz

tell application "Numbers"
	
	-- Start the timer
	set startTime to current date
	
	
	set doc_name to the name of the front document
	
	activate
	
	set thisDocument to document 1
	set textoFinal to ""
	
	
	repeat with currentSheet in sheets of thisDocument
		set nomeM to name of currentSheet
		
		repeat with currentTable in tables of currentSheet
			set rowCount to row count of currentTable
			set columnCount to column count of currentTable
			set nWords to 0
			
			repeat with rowIndex from 1 to rowCount
				repeat with columnIndex from 1 to columnCount
					set columnLetter to character columnIndex of "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
					set cellRef to columnLetter & rowIndex
					set thisCell to cell cellRef of currentTable
					try
						set theString to the value of thisCell
						if theString is not missing value and theString is not "" then
							set nWords to nWords + (count words of (the theString))
						end if
					end try
				end repeat
			end repeat
			
			set textoFinal to textoFinal & nWords & tab & " words in sheet \"" & nomeM & "\"" & " of \"" & doc_name & "\"" & return as string
			
			
		end repeat
	end repeat
	
	
	set f to POSIX file "/Users/XXXX/Desktop/Word Count.txt"
	write textoFinal to f starting at eof
	
	
	-- End the timer
	set endTime to current date
	
	-- Calculate the elapsed time in seconds
	set elapsedTime to (endTime - startTime)
	
	-- Display the elapsed time
	display dialog "The script took " & elapsedTime & " seconds to run." giving up after 3
	
end tell

Luiz. I would be inclined to loop through each column and row of a table and return those columns and rows that contain no (or a limited number of) values. This would take some time, because the script would have to loop through every column and every row of every table and sheet. You might want to omit headers when counting words in a column, which is easily done. Regardless, you need to refine what your criteria are and what you want returned by the script.

The following is a simple example of this approach for columns in one table only.

set {ATID, text item delimiters} to {text item delimiters, space}
set theList to {}
set columnLetters to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
tell application "Numbers"
	tell table 1 of sheet 1 of document 1
		set columnCount to count columns
		repeat with i from 1 to columnCount
			set theValues to (value of every cell in column i whose value is greater than "")
			set wordCount to count (words of (theValues as text))
			set end of theList to "Column " & (item i of columnLetters) & " has " & wordCount & " words"
		end repeat
	end tell
end tell
set text item delimiters to ATID
return theList --> {"Column A has 4 words", "Column B has 3 words", "Column C has 2 words", "Column D has 0 words", "Column E has 0 words", "Column F has 0 words"}
1 Like

Here’s a development of peavine’s idea which works for every table of every sheet in the document. It uses the cells’ formatted values rather than their AppleScript-format values and does the “filtering” itself. It should be quite easy to omit empty columns, depending on what “empty” actually means.

set {ATID, text item delimiters} to {text item delimiters, space}
set theList to {}
set columnLetters to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
tell application "Numbers"
	tell document 1
		set everything to every sheet's tables's columns's cells's formatted value
	end tell
end tell
repeat with s from 1 to (count everything)
	set thisSheet to everything's item s
	set sheetID to "Sheet " & s
	repeat with t from 1 to (count thisSheet)
		set thisTable to thisSheet's item t
		set tableID to sheetID & "'s table " & t
		repeat with c from 1 to (count thisTable)
			set thisColumn to thisTable's item c
			set columnID to tableID & "'s column " & columnLetters's character c
			set theList's end to columnID & " has " & ¬
				(count (thisColumn's text as text)'s words) & " words"
		end repeat
	end repeat
end repeat
set text item delimiters to ATID
return theList
1 Like

@Nigel_Garvey and @peavine,

Thank you, both solutions worked very well and FAST! Big file in a matter of seconds. I was wrong in checking EACH cell’s string.

I’m working with @Nigel_Garvey idea because it goes to every sheet, table, column. I just have one question, Nigel: can I use the Sheet name instead of number? Because it put everything into “everything” together, I couldn’t make it work.

Thank you again!
Luiz

Hi Luiz.

Yes. That’s an easy modification.

set {ATID, text item delimiters} to {text item delimiters, space}
set theList to {}
set columnLetters to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
tell application "Numbers"
	tell document 1
		set everything to every sheet's tables's columns's cells's formatted value
		set sheetNames to every sheet's name -- Added.
	end tell
end tell
repeat with s from 1 to (count everything)
	set thisSheet to everything's item s
	set sheetID to "Sheet " & sheetNames's item s -- Changed.
	repeat with t from 1 to (count thisSheet)
		set thisTable to thisSheet's item t
		set tableID to sheetID & "'s table " & t
		repeat with c from 1 to (count thisTable)
			set thisColumn to thisTable's item c
			set columnID to tableID & "'s column " & columnLetters's character c
			set theList's end to columnID & " has " & ¬
				(count (thisColumn's text as text)'s words) & " words"
		end repeat
	end repeat
end repeat
set text item delimiters to ATID
return theList

Very interesting. Thank you again.

If you are curious, here is my final script. It goes thru everything in the file, count words, sum together and add to a text file.

Because sometimes I need to analyse several files, it has an option to sum what I have in the .TXT file. I mean, I run the script in 5 different XLS files, saying “no” to the dialog of “count all the words”. And in the 6th file I say “Proceed”, so it will sum what I have in the TXT file.

I know it looks a little crazy, “why not keep summing in the script?”. But sometimes I do this over an entire day, at different moments… Like one file in the morning, another one at 2pm, other 3 at 5pm…

Anyway, thank you again!

set {ATID, text item delimiters} to {text item delimiters, space}
set theList to {}
set columnLetters to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
set textoFinal to ""

tell application "Numbers"
	set doc_name to the name of the front document
	tell document 1
		set everything to every sheet's tables's columns's cells's formatted value
		set sheetNames to every sheet's name 
	end tell
end tell



repeat with s from 1 to (count everything)
	set thisSheet to everything's item s
	set sheetID to "Sheet " & sheetNames's item s 
	repeat with t from 1 to (count thisSheet)
		set thisTable to thisSheet's item t
		
		set colWords to 0
		
		repeat with c from 1 to (count thisTable)
			
			set thisColumn to thisTable's item c
			set colWords to colWords + (count (thisColumn's text as text)'s words)
			
		end repeat
		
		set textoFinal to textoFinal & colWords & tab & " words in \"" & sheetID & "\"" & " of \"" & doc_name & "\"" & return as string
		
	end repeat
	
end repeat



set filePath to POSIX path of (path to desktop folder as string) & "Word Count.txt"
do shell script "touch " & quoted form of filePath

set f to POSIX file filePath
write textoFinal to f starting at eof

set text item delimiters to ATID

----- and now count the words

-- Define the file path
set filePath to POSIX path of (path to desktop folder as string) & "Word Count.txt"

-- Ask the user if they want to proceed with counting
set userResponse to display dialog "Do you want to count the words?" buttons {"No", "Proceed"} default button "Proceed" giving up after 5

-- Check if the dialog timed out
if gave up of userResponse then
	set userChoice to "Proceed"
else
	set userChoice to button returned of userResponse
end if

if userChoice is "Proceed" then
	-- Read the file content
	set fileContent to ""
	set totalSum to 0
	
	try
		set fileRef to open for access POSIX file filePath 
		set fileContent to read fileRef
		close access fileRef
	on error errMsg number errNum
		display dialog "Error reading file: " & errMsg
		return
	end try
	
	-- Split the file content into lines
	set textItems to paragraphs of fileContent
	
	-- Iterate over each line to extract and sum the numbers
	repeat with textItem in textItems
		try
			set tabOffset to offset of tab in textItem
			if tabOffset > 0 then
				set numString to text 1 thru (tabOffset - 1) of textItem
				set numValue to numString as number
				set totalSum to totalSum + numValue
			end if
		end try
	end repeat
	
	-- Prepare the total sum line
	set totalLine to totalSum & " words in total"
	
	-- Append the total line to the file
	try
		set fileRef to open for access POSIX file filePath with write permission
		set eof fileRef to (get eof fileRef) -- Move to the end of the file
		write return & totalLine to fileRef starting at eof
		close access fileRef
	on error errMsg number errNum
		display dialog "Error writing to file: " & errMsg
	end try
else
	
end if


Hi Luiz.

A couple of points about the File Read/Write commands, if you don’t mind me mentioning them.

The main reason for using a try block is to make sure that if an error occurs while the file’s open for access, the script doesn’t just crash out but keeps going long enough to close the access channel. To this end, there should be a close access command in the on error section too.

Scripters disagree over the best way to do this, because in theory, a script (or more properly the application running it) can have several channels open to a file at the same time, although only one can have write permission. If the error that occurs is that the file can’t be opened with write permission because it’s already open with write permission, having close access fileRef in the on error section will simply cause another error, because fileRef won’t have been set. Using close access (POSIX file filePath) instead will close an access to the file, but not necessarily the one causing the problem.

Happily, accidentally leaving the file open for access should only occur during script development and quitting the application running the script will automatically close any access channels it may still happen to have open. My own preferred layout (using your variables) is:

set fileRef to (open for access (POSIX file filePath) with write permission)
try
	-- Do stuff with the file here.
	close access fileRef
on error errMsg number errNum
	close access fileRef
	display dialog "Error writing to file: " & errMsg
end try

The other point is that the first line here doesn’t actually achieve anything:

set eof fileRef to (get eof fileRef) -- Move to the end of the file
write return & totalLine to fileRef starting at eof

(get eof fileRef) returns the number of bytes in the file and the set eof fileRef part sets the number of bytes in the file to the result!

However, in the following line, starting at eof means "starting immediately after the last byte in the file. eof has to be thought of as the insertion point at the end of the file rather than the last byte.

I hope some of this makes sense! :smile:

Nigel. I tested your script and it worked great. I was curious about a few issues and ran timing tests. My test spreadsheet contained a single sheet with 10 tables. Each table contained 7 columns and 6 rows with alternating fully-populated and empty columns.

I was primarily interested in the impact of a filter, and adding one to your script increased the result from 18 to 21 milliseconds.

Using value instead of formatted value increased the timing result of the script with a filter to 31 milliseconds, although this did alter the results. For example, the former returned one word for a date and the latter returned four words.

An alternate approach was to simply return the number of cells with values (see below). This was reasonably quick at 19 milliseconds.

BTW, I liked your everything variable. It’s a great way to quickly get values in a spreadsheet.

set theList to {}
set columnLetters to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

tell application "Numbers" to tell document 1
	set everything to formatted value of every cell of every column of every table of every sheet whose formatted value is greater than ""
end tell

repeat with s from 1 to (count everything)
	set thisSheet to everything's item s
	set sheetID to "Sheet " & s
	repeat with t from 1 to (count thisSheet)
		set thisTable to thisSheet's item t
		set tableID to sheetID & "'s table " & t
		repeat with c from 1 to (count thisTable)
			set thisColumn to thisTable's item c
			set columnID to tableID & "'s column " & columnLetters's character c
			set theList's end to columnID & " has " & (count thisColumn) & " cells with values"
		end repeat
	end repeat
end repeat

set {TID, text item delimiters} to {text item delimiters, linefeed}
set theResults to theList as text
set text item delimiters to TID
return theResults
(*
Sheet 1's table 1's column A has 6 cells with values
Sheet 1's table 1's column B has 0 cells with values
Sheet 1's table 1's column C has 6 cells with values
Sheet 1's table 1's column D has 0 cells with values
Sheet 1's table 1's column E has 6 cells with values
Sheet 1's table 1's column F has 0 cells with values
...
*)

Sure, any comment/idea/criticism is welcome. Thank you for taking time on this and helping me.

And @peavine, about your comments below on @Nigel_Garvey script, if it helps:

  1. My original script was taking 12 seconds using “cell value”, and counted 11,555 words in a document.
  2. With the new script, it took nothing, and counted 11,624 words in the same document.

Yesterday I tested the new script running it on 68 documents. It counted 12 million words (yes, million) in seconds.

You too knows much better than I do about all this, I’m just sharing in the case it helps.

Thank you,
Luiz

I think I have to retract this. I edited Nigel’s script and changed formatted value to value. The script returned 0 words for a cell with a date, so formatted value apparently has to be used.

Hi peavine. Thanks for your tests. :+1:

The idea with my script’s ‘everything’ variable of course is that getting as many relevant values as possible from an application using only one command greatly reduces the communication time between the script and the application. Very often too, although not always, a script can pick what it needs from a returned list faster than an application can make up its own mind what matches a ‘whose’ condition. Applications generally respond to bulk data requests with list structures analogous to the specified object structures, so here we have a list (representing the document) of lists (the sheets) of lists (the tables) of lists (the columns) of formatted cell values (text or missing value).