Faster Search/Replace In Numbers Columns

Hi pavilion.

Here’s a slight reworking of your adaptation. It’s not been tested as my test document’s still set up for columns B, G, and H. But I’ve looked at it very hard. :wink:

Your doColumnF() and doColumnI() handlers now just fetch the values from their relevant columns and pass them to a common doPhoneStuff() handler. (Feel free to change the handler names if you don’t like them!) doColumnD()'s repeat only needs to span rowCount items (as opposed to doColumnsGAndH()'s rowCount * 2), so it’s a bit simpler.

By the way, I see you’re posting your AppleScript code indented by 4 spaces to get a window for it. If instead you put groups of three backticks on separate lines above and below it, the forum software will render it with a button that can be clicked to open it in people’s default script editors, eg.:

```
AppleScript code here.
```

Anyway. See how this goes:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on main()
	tell application "Numbers"
		activate
		set theTable to document 1's sheet 1's table 1
		set rowCount to theTable's row count
	end tell
	
	doPhoneColumn("F", theTable, rowCount)
	doPhoneColumn("I", theTable, rowCount)
	doColumnsGAndH(theTable, rowCount)
	doSingleColumn("D", "Monday", "Sunday", theTable, rowCount)
end main

on doPhoneColumn(columnLetter, theTable, rowCount)
	set cellRange to columnLetter & "1:" & columnLetter & rowCount
	set columnValues to selectAndGetValuesFromRange(theTable, cellRange)
	-- Handle phone column values as a single (tab & linefeed)-delimited text.
	-- (The tab's to make Numbers not take spaces for column separators.)
	set columnText to join(columnValues, tab & linefeed) & tab -- NEW: TAB AT THE END AS WELL.
	set columnText to current application's class "NSMutableString"'s stringWithString:(columnText)
	tell columnText to replaceOccurrencesOfString:("(?m)^(\\d{3})\\.?(\\d{3})\\.?(\\d{4})(?=\\t$)") withString:("($1) $2-$3") ¬
		options:(current application's NSRegularExpressionSearch) range:({0, its |length|()})
	tell columnText to replaceOccurrencesOfString:("missing value") withString:("") options:(0) range:({0, its |length|()})
	tell columnText to replaceOccurrencesOfString:space withString:(character id 160) options:(0) range:({0, its |length|()})
	pasteIntoNumbers(columnText as text)
end doPhoneColumn

on doColumnsGAndH(theTable, rowCount)
	script o
		property ghValues : missing value
	end script
	
	set o's ghValues to selectAndGetValuesFromRange(theTable, "G1:H" & rowCount)
	-- Edit the values individually.
	repeat with r from 1 to rowCount
		set {gValue, hValue} to o's ghValues's items (r + r - 1) thru (r + r)
		if (gValue is missing value) then
			set gValue to ""
		else if (gValue contains "Messaging") then
			set gValue to "Text"
		end if
		if (hValue is missing value) then
			set hValue to ""
		else if (hValue contains "in") then
			set hValue to "Incoming"
		else if (hValue contains "out") then
			set hValue to "Outgoing"
		end if
		-- Store each pair as a tab-joined text in a reused slot in o's ghValues.
		set o's ghValues's item r to (gValue & tab) & (hValue & tab) -- NEW: TAB AT THE END AS WELL.
	end repeat
	-- Coerce the stored texts to a single linefeed-delimited one and paste.
	set ghText to join(o's ghValues's items 1 thru rowCount, linefeed)
	pasteIntoNumbers(ghText)
end doColumnsGAndH

on doSingleColumn(columnLetter, searchText, replacementText, theTable, rowCount)
	script o
		property columnValues : missing value
	end script
	
	set cellRange to columnLetter & "1:" & columnLetter & rowCount
	set o's columnValues to selectAndGetValuesFromRange(theTable, cellRange)
	-- Edit the values individually.
	repeat with r from 1 to rowCount
		set cellValue to o's columnValues's item r
		if (cellValue is missing value) then
			set cellValue to ""
		else if (cellValue contains searchText) then
			set cellValue to replacementText
		end if
		-- Append a tab and store the item back in o's columnValues.
		set o's columnValues's item r to cellValue & tab
	end repeat
	-- Coerce the stored texts to a single linefeed-delimited one and paste.
	set columnText to join(o's columnValues, linefeed)
	pasteIntoNumbers(columnText)
end doSingleColumn

on join(lst, delim)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set txt to lst as text
	set AppleScript's text item delimiters to astid
	return txt
end join

on selectAndGetValuesFromRange(theTable, cellRange)
	tell application "Numbers"
		set theTable's selection range to theTable's range cellRange
		return theTable's selection range's cells's formatted value
	end tell
end selectAndGetValuesFromRange

on pasteIntoNumbers(txt)
	set the clipboard to txt
	tell application "System Events"
		set frontmost of application process "Numbers" to true
		keystroke "v" using {shift down, option down, command down}
	end tell
	delay 1 -- Allow time for the paste to complete. (Adjust if/as necessary.)
end pasteIntoNumbers

main()

Ok, wow, so everything works great (haven’t yet looked through the script, just ran it to test) but so far:

  1. The last values in Columns F & I are not getting reformatted.

UPDATE: This is another weird Numbers thing, because if I add a blank row underneath the last one, it formats those cells fine so it has to do with being the last record on the sheet. Still need a scripting solution that doesn’t leave me with that blank row but at least knowing this might help with that.

  1. The single column D code (set dValue to o’s ghValues’s item r) is showing this error: error “Can’t get item 1 of ghValues.” number -1728 from item 1 of ghValues.

    UPDATE: Fixed, just needed to change to dValues.

  2. Do we still need this (“The tab’s to make Numbers not take spaces for column separators.”) if we’re using the (character id 160)?

Ah. That was probably because the regex was looking out for the tab in the (tab & linefeed) inserted between the entries. There’s no tab at the end of the last line, of course, so the regex wasn’t matching it. I’ve edited the script in my post above to make the tab match optional. Hopefully that’ll fix the problem.

I should have seen that. ghValues should be dValues in that handler. Now also fixed.

It’s probably not necessary to have both. If the tab works to preserve the single column during pasting, I’d be inclined to ditch the (character id 160) line. It would be one line fewer in the code, the ordinary spaces in the entries would be preserved, and the tab’s consistent with the tabs inserted or appended in the other handlers. But that’s just my aesthetic sense. :slight_smile:

That fixed the formatting on the last line, thanks!

Column D now works great as well. I assume that for future single columns, I must still duplicate the entire handler and customize the search terms, with no way around that. I had envisioned a single handler where the search terms and column letters could be stored in the On doColumnX() code at the top, so that all the columns would use the same single code block, and adding future columns would only require an additional On doColumnX() and call, similar to the way you’ve done the phone code.

When I remove the (character id 160) line the formatting goes to the next column, so that needs to stay. Can/should we take out the tabs or just leave them as is?

UPDATE: Just re-read your thoughts on consistency for having the tabs in the other handlers, so I’ll leave them in as you suggest.

For single columns with just one text to replace, you could use this instead of the doColumnD() handler:

on doSingleColumn(columnLetter, searchText, replacementText, theTable, rowCount)
	script o
		property cellValues : missing value
	end script
	
	set cellRange to columnLetter & "1:" & columnLetter & rowCount
	set o's cellValues to selectAndGetValuesFromRange(theTable, cellRange)
	-- Edit the values individually.
	repeat with r from 1 to rowCount
		set cellValue to o's cellValues's item r
		if (cellValue is missing value) then
			set cellValue to ""
		else if (cellValue contains searchText) then
			set cellValue to replacementText
		end if
		-- Append a tab and store the item back in o's cellValues.
		set o's cellValues's item r to cellValue & tab
	end repeat
	-- Coerce the stored texts to a single linefeed-delimited one and paste.
	set columnText to join(o's cellValues, linefeed)
	pasteIntoNumbers(columnText)
end doSingleColumn

To call it for column “D”, you’d use:

doSingleColumn("D", "Monday", "Sunday", theTable, rowCount)

I’ve not been able to reproduce this. On my system, the presence of the inserted tabs is enough to prevent spaces in the cell values from being interpreted as field separators. However, I have noticed this morning that if the last cell in a phone number column is empty, it gets filled with the value from the first cell in that column. :flushed: I’m not able to explain this at all, but it’s fixed by putting a tab after the last value too, which also fixes the last-cell-not-edited bug without the need to make matching the tab optional in the regex. I’ve updated the script above.

Hi - I haven’t looked into your earlier single column comment/code yet because I wanted to button down the tab issue, so when I run the latest revised script (Message 41) as-is, I’m getting an error on this line:

set columnText to join(columnValues, tab & linefeed) & tab – NEW: TAB AT THE END AS WELL

Error: The variable columnValues is not defined." number -2753 from columnValues

Looks like that was a carry-over from the earlier rev when you had:
on doPhoneStuff(columnValues)

UPDATE: Fixed it myself by changing it to cellValues. I’m starting to get the hang of this!

I assume, once this is fixed, that I want to see what happens if I remove the (character id 160) line?

Yeah. Apologies. Too much rush to get this topic out of the way. That particular bug now fixed — hopefully!

Ok, so awesome news, the (character id 160) line can now be removed, as it works perfectly without it for the first time.

Now, onto the single column code, I’m really liking the potential of it, and it almost works, but what’s happening is, it’s not replacing the searched text only, but the entire cell. So if I want to replace ABC Company with ABC Inc., it’s putting “Inc.” into the entire cell.

UPDATE: My apologies, it looks like that’s the way the code has always been, as my previous replace values in the other columns were only for the entire cell. Can this be adjusted to search and replace for only partial contents, either at a word level or character level?

And can the doSingleColumn be loaded with more than one pair of search/replace terms, as it seems to only then require an additional nested “If” statement for each one. I suppose the alternative, less elegant, workaround would be to run it again with each new search/replace pair - though the way you now have it makes it so much easier :slight_smile:

Yes, by adjusting the regular expression. If you’re talking about a script that uses those – I completely lost track.

This thread has 48 posts now, and the similar one (Format Telephone Number in Numbers) even 121. But it seems that the problem (formatting a phone number in Numbers) has long been solved. Or hasn’t it?

Yes, the last rev in Message 41 replaces the entire cell using the search/replace values. Asking if that can be modified to replace just the occurrences of text as they are found, rather than replacing the entire cell.

if (cellValue contains searchText) then set cellValue to replacementText

Yes, the telephone formatting has been solved. Speaking here of the text search/replace.

This routine here seems to achieve that, but I have no idea how to incorporate it into the script:

set the message_text to "On Tuesday I told you to have the report ready by next Tuesday."
set the message_text to replace_chars(message_text, "Tuesday", "Friday")
--> Returns: "On Friday I told you to have the report ready by next Friday."

on replace_chars(this_text, search_string, replacement_string)
	set AppleScript's text item delimiters to the search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to the replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to ""
	return this_text
end replace_chars

The subject of this topic is actually “Faster Search/Replace In Numbers Columns”. That and a bit more been solved here too now. Anything further risks becoming “Please write the entire utility for me.” I think you should start a new topic for any other difficulties.

That said:

if (cellValue is missing value) then
	set cellValue to ""
else if (cellValue contains searchText) then
	set cellValue to replace_chars(cellValue, searchText, replacementText)
end if
1 Like

Amazing. Many thanks for your time and patience on this!

Hi NIgel,

I’m trying to expand this two-column range in your script by another column (“I”) so I expanded the range and added the iValue code but am getting an error on this line.

set {gValue, hValue, iValue} to o's ghValues's items (r + r - 1) thru (r + r)

Presumably, items (r + r - 1) thru (r + r) is causing the problem, but not sure what you’re doing here.

UPDATE: By shear luck, I got it to work by changing the line to this, adding another + r - 1 in the first () and another +r in the second. I’m sure it’s easy to follow if I understood what this notations do, presumably something about the columns but I can’t figure it out!

set {gValue, hValue, iValue} to o's ghValues's items (r + r - 1 + r - 1) thru (r + r + r)

on doColumnsGAndH(theTable, rowCount)
	script o
		property ghValues : missing value
	end script
	set o's ghValues to selectAndGetValuesFromRange(theTable, "G1:I" & rowCount)
	
	-- Edit the values individually.
	repeat with r from 1 to rowCount
		set {gValue, hValue, iValue} to o's ghValues's items (r + r - 1) thru (r + r)
		
		if (gValue is missing value) then
			set gValue to ""
		else if (gValue contains "Messaging") then
			set gValue to "Text"
		end if
		
		if (hValue is missing value) then
			set hValue to ""
		else if (hValue contains "in") then
			set hValue to "Incoming"
		else if (hValue contains "out") then
			set hValue to "Outgoing"
		end if
		
		if (iValue is missing value) then
			set iValue to ""
		else if (iValue contains "data") then
			set iValue to "DATA"
		else if (iValue contains "transfer") then
			set iValue to "TRANSFER"
		end if
		
		-- Store each pair as a tab-joined text in a reused slot in o's ghValues.
		set o's ghValues's item r to (gValue & tab) & (hValue & tab) -- NEW: TAB AT THE END AS WELL.
	end repeat
	-- Coerce the stored texts to a single linefeed-delimited one and paste.
	set ghText to join(o's ghValues's items 1 thru rowCount, linefeed)
	pasteIntoNumbers(ghText)
end doColumnsGAndH

Hi pavilion.

Yes. You’ve hit it. When the values are got from the cells in the range, they’re returned as a straight list, but are actually from consecutive slices through the range, ie. G1, H1, I1, G2, H2, I2, G3, H3, I3, etc.. My repeat, when it was just for columns G and H, was a bit of mathematical trickery to have the loop variable r represent a row number while the indices generated from it located where each pair of values from the row was in o’s ghValues.

r = 1: items (r + r - 1) thru (r + r) = items 1 thru 2
r = 2: items (r + r - 1) thru (r + r) = items 3 thru 4
r = 3: items (r + r - 1) thru (r + r) = items 5 thru 6
*etc.*

With three columns, you need items (r * 3 - 2) thru (r * 3) each time:

r = 1: items (r * 3 - 2) thru (r * 3) = items 1 thru 3
r = 2: items (r * 3 - 2) thru (r * 3) = items 4 thru 6
r = 3: items (r * 3 - 2) thru (r * 3) = items 7 thru 9
*etc.*

Wow that is really clever. I get it now!

UPDATE: spoke too soon :slight_smile: What about four columns? I would have assumed it increments, for example on line two, to r = 2: items (r * 4 - 2) thru (r * 4) = items 5 thru 8, but that doesn’t yield 5 thru 8 on the second row because while the second parens gets us to 8, the first parens needs to get to 5 , which is an odd number and not divisible by 2.

It’s a shame there isn’t a way to replicate the single column efficiency with column letter and search terms up top feeding a common set of variables so that the targets can be more easily updated without editing or adding more code, because when the target columns are adjacent the speed with which this code processes the cells is on an order of magnitude faster than running them column by column with the other code.

r = 2: items (r * 4 - 3) thru (r * 4) :slight_smile:

The subtracted number should be 1 less than the multiplier.

Oh man… I should have figured that out. Thanks so much for the explanation.

Hi Nigel,

I have been busy working on your script below to make the small improvements I have been looking to do to make it even better and hopefully help others as well. It has gone well so far, and I wanted to show you what I’ve done and ask for any corrections or improvements.

The ultimate objective is to have a search/replace script that performs on both single columns when they are separated in the sheet, and on adjacent columns when they are together (because this works much faster), but have it so that the customizable elements that need editing are placed in the script up top, without the need to dig into the code each time to change it. This is a great help to other users who are not as comfortable changing the actual script code that runs the routines.

(For those watching at home, this script is super helpful in doing something so obviously doable in Excel, but apparently not in Numbers, which is to search/replace only limited cell ranges, and not the entire sheet, which is the only way it works in the current version as far as I can tell. Thanks to @Nigel_Garvey with this script you can tailor your search requirements very nicely.)

So, the first part of my improvements was made to the doSingleColumn routine, where I changed it from replacing an entire cell’s contents to only replacing the characters being searched, which makes it much more versatile.

The second thing I added was a new set of search/replace containers that allowed me to have as many as I would like (in this case I have five per column, which is admittedly overkill :slight_smile: ). In order to get this to work, I had to switch around the order of the descriptors (what is the actual term for those container values between the parens?) so that the search terms are listed last, which then keeps the columnLetter, theTable and rowCount in a static place. And then I set up five sets of search/replace if statements that look for whatever is defined as real terms above, and obviously for the dummy “search2, replace2” and so one that remain empty.

Surprisingly, the whole thing works quite nicely, though I am not all that pleased with the aesthetic look of all the filler code needed to pull it together. And so I ask whether there’s any way to streamline this further, perhaps that wouldn’t require each doSingleColumn line to contain the dummy search/replace terms when not used, and also I imagine the script does marginally take longer to run since it has to perform those searches for naught, so anything you can think of to improve it and keep the speed would be great.

Now, as to the second part of the code - what I’ve renamed doMultipleColumns - I am having some trouble off the bat trying to recreate the same versatility as above, since this one in its current state requires me to define the column range and the search/replace terms inside the code instead of up top.

As you can see, I haven’t gotten very far, but the first order of business was to automate the insertion of the correct numbers in the items (r + r - 1) thru (r + r) line, as you patiently explained to me how to do the other day. So, my thought was to ultimately define the column range up top instead of below, and then set the variable columnCount just as you set the rowCount and use that instead of hard numbers in that line, but it isn’t working for the reason that the variable columnCount is not defined even as rowCount is, for some reason.

Once I have that working, I would then want to remove the column range from the code and define it above, and then set up a similar set of search/replace if statements that call search/replace terms above as well, in a similar way to how the single columns, if that is doable. Any assistance with this as well would be great!

UPDATE: I fixed the columnCount problem by declaring it globally on top (I assume that’s the best way to do it) and now the variable works fine in the items (r + r - 1) thru (r + r) line, and I also added an exit repeat statement at each extra search stage of the multiple if statements to check and see if the value of the next search term is present or not, in order to avoid processing any unnecessary searches, though I doubt it actually processes any faster than before.

I still could use help setting up the multiple search options on top with variables in the code, as in the single column version, because unlike the one unknown of search terms that I could easily assign to variables, here we are dealing with two unknowns: the search terms and the number of columns that would vary by the actual range identified on top.

What we don’t want to do is perform all the search/replace tasks on the entire range each time, as that added significant time to running the script in a previous version you posted a while back. Limiting each search to its specific column makes it super fast.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions
global columnCount

on main()
	tell application "Numbers"
		activate
		set theTable to document 1's sheet 1's table 1
		set rowCount to theTable's row count
		set columnCount to count of columns in range "G:H" of theTable
	end tell
	
	--doSingleColumn("G", theTable, rowCount, "Messaging", "Text", "Search2", "Replace2", "Search3", "Replace3", "Search4", "Replace4", "Search5", "Replace5")
	--doSingleColumn("H", theTable, rowCount, "in", "Incoming", "out", "Outgoing", "Search3", "Replace3", "Search4", "Replace4", "Search5", "Replace5")
	doMultipleColumns(theTable, rowCount, columnCount)
	
end main

on formatPhone(columnLetter, theTable, rowCount)
	set cellRange to columnLetter & "1:" & columnLetter & rowCount
	set cellValues to selectAndGetValuesFromRange(theTable, cellRange)
	set columnText to join(cellValues, tab & linefeed) & tab -- NEW: TAB AT THE END AS WELL.
	set columnText to current application's class "NSMutableString"'s stringWithString:(columnText)
	tell columnText to replaceOccurrencesOfString:("(?m)^(\\d{3})\\.?(\\d{3})\\.?(\\d{4})(?=\\t$)") withString:("($1) $2-$3") ¬
		options:(current application's NSRegularExpressionSearch) range:({0, its |length|()})
	tell columnText to replaceOccurrencesOfString:("missing value") withString:("") options:(0) range:({0, its |length|()})
	pasteIntoNumbers(columnText as text)
end formatPhone

on doSingleColumn(columnLetter, theTable, rowCount, Search1, Replace1, Search2, Replace2, Search3, Replace3, Search4, Replace4, Search5, Replace5)
	script o
		property cellValues : missing value
	end script
	
	set cellRange to columnLetter & "1:" & columnLetter & rowCount
	set o's cellValues to selectAndGetValuesFromRange(theTable, cellRange)
	-- Edit the values individually.
	repeat with r from 1 to rowCount
		set cellValue to o's cellValues's item r
		if (cellValue is missing value) then
			set cellValue to ""
		end if
		if (cellValue contains Search1) then
			set cellValue to replace_chars(cellValue, Search1, Replace1)
		end if
		if cellValue = "Search2" then exit repeat
		if (cellValue contains Search2) then
			set cellValue to replace_chars(cellValue, Search2, Replace2)
		end if
		if cellValue = "Search3" then exit repeat
		if (cellValue contains Search3) then
			set cellValue to replace_chars(cellValue, Search3, Replace3)
		end if
		if cellValue = "Search4" then exit repeat
		if (cellValue contains Search4) then
			set cellValue to replace_chars(cellValue, Search4, Replace4)
		end if
		if cellValue = "Search5" then exit repeat
		if (cellValue contains Search5) then
			set cellValue to replace_chars(cellValue, Search5, Replace5)
		end if
		-- Append a tab and store the item back in o's cellValues.
		set o's cellValues's item r to cellValue & tab
	end repeat
	-- Coerce the stored texts to a single linefeed-delimited one and paste.
	set columnText to join(o's cellValues, linefeed)
	pasteIntoNumbers(columnText)
end doSingleColumn

on doMultipleColumns(theTable, rowCount)
	script o
		property ghValues : missing value
	end script
	
	set o's ghValues to selectAndGetValuesFromRange(theTable, "G1:H" & rowCount)
	-- Edit the values individually.
	repeat with r from 1 to rowCount
		set {gValue, hValue} to o's ghValues's items (r * columnCount - 1) thru (r * columnCount)
		if (gValue is missing value) then
			set gValue to ""
		else if (gValue contains "Messaging") then
			set gValue to "Text"
		end if
		if (hValue is missing value) then
			set hValue to ""
		else if (hValue contains "in") then
			set hValue to "Incoming"
		else if (hValue contains "out") then
			set hValue to "Outgoing"
		end if
		-- Store each pair as a tab-joined text in a reused slot in o's ghValues.
		set o's ghValues's item r to (gValue & tab) & (hValue & tab) -- NEW: TAB AT THE END AS WELL.
	end repeat
	-- Coerce the stored texts to a single linefeed-delimited one and paste.
	set ghText to join(o's ghValues's items 1 thru rowCount, linefeed)
	pasteIntoNumbers(ghText)
end doMultipleColumns

on replace_chars(this_text, search_string, replacement_string)
	set AppleScript's text item delimiters to the search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to the replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to ""
	return this_text
end replace_chars

on join(lst, delim)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set txt to lst as text
	set AppleScript's text item delimiters to astid
	return txt
end join

on selectAndGetValuesFromRange(theTable, cellRange)
	tell application "Numbers"
		set theTable's selection range to theTable's range cellRange
		return theTable's selection range's cells's formatted value
	end tell
end selectAndGetValuesFromRange

on pasteIntoNumbers(txt)
	set the clipboard to txt
	tell application "System Events"
		set frontmost of application process "Numbers" to true
		keystroke "v" using {shift down, option down, command down}
	end tell
	delay 1 -- Allow time for the paste to complete. (Adjust if/as necessary.)
end pasteIntoNumbers

main()