Format Telephone Number in Numbers

chrillek · July 1, 2023, 10:34am

The pattern is crap, for several reasons (see @peavine’s dissection of it for more detail).

This “match phone number” string is supposedly intended for international phone numbers. While a leading + sign is now often used to introduce a country prefix, there are still regions using 00 for that. And there are quite a few countries having three digit prefixes.
Assuming only spaces, dots and dashes as group separators as in [\s.-] is overly narrow – what about slashes?
It uses no capturing groups, as already pointed out by @peavine. So, it does not help in reformatting, only in matching.
And it, of course, only works for US phone numbers. Apparently everybody assumes those are the only relevant ones

@peavine is correct in their statement that the RE (?m)^(\d{3}).?(\d{3}).?(\d{4})$ works ok. However, they’re missing a backslash in front of the dots, perhaps because Discourse requires two of those outside code. As it stands now, this RE will match anything between \d{3}, '\d{3}and\d{4}`. For example, 123456789011 would also match and be rewritten as (123) 567-9011.

Also: I posted a modified JavaScript solution in the other thread. It handles both formats in any number of columns (and has two glitches):

peavine · July 1, 2023, 1:12pm

chrillek. Thanks for making note of this. When I posted this yesterday the pattern did have backslashes before the dots but apparently the forum software deleted them. I’ve edited the post to display the pattern as a script, which retains the desired backslashes. I retested the pattern and it matches 1234567890 and 123.456.7890 but not 12345678901

peavine · July 1, 2023, 2:02pm

@pavilion. I reviewed and retested my script in post 67, and it works as expected under every circumstance I could envision. The only exception is when one of the cells in a target column had been merged with another cell. Sorry I couldn’t help.

pavilion · July 2, 2023, 3:22am

I’m kind of confused at this point as to what exact pattern I should be using. If you’ve updated the original script to this:

Screenshot 2023-07-01 at 11.22.45 PM

then it still doesn’t work for me, with the same issue on the 123.456.7890 format.

pavilion · July 2, 2023, 3:30am

Everything I’ve tried still leaves me with the same results as my earlier screenshots, with the data being truncated after the space into the next column. Can’t understand how the two of us are getting different results from the same formula. And moreover, if the issue is just the regex syntax, at worst wouldn’t it just not recognize the number and skip it, and not recognize the text and skip it? I thought the regex is just to be able to identify what cells get processed and what cells get ignored. But here, both the text and the 123.456.7890 are getting identified and just being mis-processed, no?

pavilion · July 2, 2023, 3:41am

OK, well here’s something to ponder and hopefully help me. I tried something different and now it works.

I copied 5 cells of each phone number column from the original sheet and pasted by themselves into a brand new sheet with no other data, and the script as you last posted it worked perfectly. But somehow when the script runs on the full spreadsheet it doesn’t.

Not sure why this is happening?

UPDATE: Not so fast. After I copied the entire columns over, the problem reappeared. I am going to see what there might be in that second column causing the issue and report back.

pavilion · July 2, 2023, 3:55am

Appreciate your input tremendously. After testing this again my spreadsheet the issue is not the 123.456.7890 number format at all because when I isolate all the numbers from the column it works as expected. The problem is the text, and a simply test will show it to you. Please run the script on a column of two cells with text that contains more than one word, e.g. A1- Text Message, A2 = Data Message and then you’ll see what I’m experiencing on the full column. But what made this harder to understand is that once the column contains both types, the numbers and the multi-word text, then ALL the entries are formatted badly, not just the text.

pavilion · July 2, 2023, 3:57am

OK, here’s the issue. the 123.456.7890 format in the second column gets processed perfectly by your script when there is not text in other cells (or more specifically, text with a space between words). Once there are those types of cells in the column, the whole thing gets formatted badly.

As a test, run your script on a column of cells with two words in them and you’ll see exactly what I mean.

UPDATE: That’s not exactly right either. I am uploading a very simple test spreadsheet, if you could kindly run the script on each of the columns and let me know if you can makes heads or tails of what’s going on!

Test Sheet.zip (265.8 KB)

You’ll see the first Column A of only numbers works fine, but the second two columns of only text and mixed text and numbers do not.

pavilion · July 2, 2023, 4:26am

Before:

After:

Nigel_Garvey · July 2, 2023, 7:55am

I’d been under the impression up till now that each column contained its own type of data and that column “B” only contained US-style telephone numbers.

If a cell has a space in it, the text is going to be split into separate cells when pasted back, in the same way the reformatted phone numbers were before the space in them was replaced with a non-break space. An extra line needs to be inserted into peavine’s script, just before the clipboard’s set, to replace every space in the entire text with a non-break space:

set theString to (theString's stringByReplacingOccurrencesOfString:space withString:(character id 160) options:1024 range:{0, theString's |length|()})

But I don’t know why normal spaces are confusing Numbers here.

chrillek · July 2, 2023, 8:10am

Just a wild guess: Numbers is tearing pasted data as it is a text file that is told to read. Therefore, it tries to find a „separator“ like comma, semicolon or tab. In their absence, it considers a space to be the separator. Adding a comma to the end of the first line fixes this behavior. I think.

Nigel_Garvey · July 2, 2023, 8:18am

Hi @chrillek.

Yes. I think you’re right. I’ve just been testing with my multi-column script here and it works perfectly if I change the non-break spaces back to ordinary spaces because it uses tabs as the column separators.

chrillek · July 2, 2023, 9:37am

One more thing: Appending a comma to the first line before setting the clipboard values seems to fix all the issues. And it’s bloody simple to fix this first line after pasting…

So, here’s my final take on this problem in JavaScript. It works just fine with the sample CSV posted by @pavilion here Format Telephone Number in Numbers - #91 by pavilion

The only grievance I have are the completely useless calls to display(). That is, they’d be useless if Apple’s stuff were working sanely.



/* Anonymous, autoexecuting function to do the work */
(() => {

/* Set up some constants:
  - the columns to search/replace in 'columnIndices' ('A' is 0, 'B' is 1 etc)
  - the 'app' constant (like the tell block in AS)
  - the table to work with 
  - the regular expression to find values that have to be modified
  - the regular expression to match the phone numbers
  - the replacement string to change the phone numbers
  - and the 'currentApplication' which is needed (?) to work with the clipboard
  */
  
const columnIndices = [0,2]; // Select columns A and C
const app = Application("Numbers");
const table = app.documents[0].sheets[0].tables[0]; //in 1st doc: 1st table's 1st sheet
const testRE =  /^[0-9.]{10,12}$/; //rows of 10 to 12 characters only containing digits and dots
const dateRE = /^(\d{3})\.?(\d{3})\.?(\d{4})$/; //phone numbers with and without dots
const replaceString = "($1) $2-$3"; // correctly formatted phone numbers
const ca = Application.currentApplication();
ca.includeStandardAdditions = true;

  /* Loop over the columns in 'columnIndices */
  columnIndices.forEach(c => {
    const currentCol = table.columns[c];
	/* Get all values for the current column. That is an 'Array' in JavaScript */
    const vals = currentCol.cells.value();
	/* Loop over the values and create a new 'Array' containing
	  - the unmodified originals for any undefined/empty values or those not matching the 'test' condition
	  - the corrected phone numbers 
	  */
    const newVals = vals.map(v => {
      if (!v || ! testRE.test(v)) return v;
      const nv = (v+"").replace(dateRE, replaceString);
  	return nv;
    });
	/* Set the clipboard to a newline separated string. Append a comma to the
	  first line to stop Numbers from treating the phone numbers as strings
	  */
    ca.setTheClipboardTo(newVals.join('\n').replace('\n',',\n'));
    app.activate();
	/* Set the selection for the table */
    table.selectionRange = currentCol;
	delay(0.3);
	/* Send the "paste without format" keystroke */
    const se = Application('System Events');
    se.keystroke('v',{using: ["shift down", "option down", "command down"]});
	delay(0.3);
	/* fix the first row */
	currentCol.cells[0].value = newVals[0];
  })
})()

I’m sure some kind soul can recreate this in AppleScript if they need it.

peavine · July 2, 2023, 2:53pm

@pavilion. Thanks for the test spreadsheet, which allowed me to reproduce the issue.

I tested Nigel’s suggestion, which seemed to fix the issue with my script. Also, I found a line of my script that could cause a problem (not related to the current issue), and I have fixed that also. So, my suggestion for testing is:

use framework "Foundation"
use scripting additions

-- formatColumn("A")
-- delay 5 -- test lower values
-- formatColumn("B")
-- delay 5 -- test different values
formatColumn("C")

on formatColumn(theColumn)
	tell application "Numbers" to tell table 1 of sheet 1 of document 1
		set format of column theColumn to text
		set theValues to value of every cell in column theColumn
		set selection range to range (theColumn & "1:" & theColumn & "1")
	end tell
	set theArray to current application's NSArray's arrayWithArray:theValues
	set theString to theArray's componentsJoinedByString:linefeed
	set thePattern to "(?m)^(\\d{3})\\.?(\\d{3})\\.?(\\d{4})$"
	set theString to (theString's stringByReplacingOccurrencesOfString:thePattern withString:("($1)" & character id 160 & "$2-$3") options:1024 range:{0, theString's |length|()})
	set thePattern to "(?m)^<null>$"
	set theString to (theString's stringByReplacingOccurrencesOfString:thePattern withString:"" options:1024 range:{0, theString's |length|()})
	set theString to (theString's stringByReplacingOccurrencesOfString:space withString:(character id 160) options:1024 range:{0, theString's |length|()})
	set the clipboard to (theString as text)
	delay 0.2 -- test different values
	tell application "System Events" to tell process "Numbers"
		set frontmost to true
		click menu item "Paste" of menu "Edit" of menu bar 1
	end tell
end formatColumn

pavilion · July 2, 2023, 3:28pm

Works like an absolute charm! So appreciative for all your help (as well as @chrillek) and hope others in need of the same solution will find this super useful.

p.s. I assume there’s nothing left to tweak in the script to make it any faster?

pavilion · July 2, 2023, 3:33pm

Sounds right, and was something I also saw earlier that when you add a comma after the first cell, it formats more reliably. Thanks for the great suggestion that got this working right!

pavilion · July 2, 2023, 3:36pm

Would doing this in @peavine’s script make it run any measurably faster, do you think? Can we try that out and see?

chrillek · July 2, 2023, 3:39pm

Why would adding more code make it run faster? And why is speed so important? Did you test the script I posted for speed? I have hunch that it might be faster than the AS stuff, but I’m not sure.

pavilion · July 2, 2023, 3:43pm

My sense from your comment was that adding the comma at the start and then removing it later would make the script more efficient and thus run faster, which is just something I assume is always better than slower, so I thought it might make sense to compare how the two versions run side by side.

peavine · July 2, 2023, 4:05pm

@pavilion. I don’t know of any way to make the code itself faster. If you are formatting multiple columns at once, you could try reducing or eliminating the 5-second delays. And, if the columns are adjacent, you could rewrite the script to format adjacent columns in one pass of the handler, although that might be more work than it’s worth.

BTW, I ran a timing test and it takes my script 350 milliseconds to format column C of your test spreadsheet. Of this amount, only 15 milliseconds is spent formatting data, and the rest is attributable to Numbers doing its work. Unless you take an entirely different approach, I don’t know of a way to make Numbers faster.