Is there a better RegEx for this?

To convert dates that occur in text in variable formats to a uniform standard, I found the following pair of regular expressions and wrote the script below around them, using Satimage Text Tools. Is there a better way than this two-step process?


(* 
Regular Expressions by James Sehrier on Keyboard Maestro list that I have adjusted from that format to Satimage regexp AppleScript format.
*)

set myDates to {"3/30/10", "9/11/2012", "1/1/2000"}

--> get dates to date that is (0)mm/(0)dd/(yy)yy format
set regEx_1 to "(\\d+)(?:/)(\\d+)(?:/)(\\d+)"
set Replace_1 to "0\\1/0\\2/\\3"

--> then fix for mm/dd/yy format
set regEx_2 to "(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d$)"
set Replace_2 to "\\1\\2\\3"

set newDates to {}

repeat with oneDate in myDates
	set T1 to change regEx_1 into Replace_1 in oneDate with regexp
	set T2 to change regEx_2 into Replace_2 in T1 with regexp
	set end of newDates to T2
end repeat

--> newDates = {"03/30/10", "09/11/12", "01/01/00"}

Hi, Adam.

I don’t know about reducing the number of steps, but it’s possible to do them all with one ‘change’ command:


(* 
Regular Expressions by James Sehrier on Keyboard Maestro list that I have adjusted from that format to Satimage regexp AppleScript format.
*)

set myDates to {"3/30/10", "9/11/2012", "1/1/2000"}

--> get dates to date that is (0)mm/(0)dd/(yy)yy format
set regEx_1 to "(\\d+)(?:/)(\\d+)(?:/)(\\d+)"
set Replace_1 to "0\\1/0\\2/\\3"

--> then fix for mm/dd/yy format
set regEx_2 to "(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d$)"
set Replace_2 to "\\1\\2\\3"

set newDates to {}

repeat with oneDate in myDates
	set end of newDates to (change {regEx_1, regEx_2} into {Replace_1, Replace_2} in oneDate with regexp)
end repeat

newDates --> {"03/30/10", "09/11/12", "01/01/00"}

How could this script be edited to keep four-digit years, which, from my point of view, is the best format?

Yvan KOENIG (VALLAURIS, France) Sunday 12 February 2012 18:55:04

@Nigel;

My version was simply nested:

repeat with oneDate in myDates
	set end of newDates to change regEx_2 into Replace_2 in (change regEx_1 into Replace_1 in oneDate with regexp) with regexp
end repeat

Yours is much cleaner – I didn’t know Text Tools would accommodate a list input.

@Yvan;

I’m a newbie at regex so I don’t know how to do that without a lot of struggling. I will try.

A simple way to patch it would be to make the value of Replace_2 "\\1\\220\\3", which inserts "20" in front of the two-digit year. There are undoubtedly better ways in terms of the overall process.

Thank you.

Now, I will try to decipher that.

Yvan KOENIG (VALLAURIS, France) Monday 13 February 2012 10:47:52

It’s the regex equivalent of a number of ‘text -2 thru -1 of (“0” & myNumber)’ operations. :slight_smile:

-- This is just to explain the regex. It doesn't run the substitutions.

set myDates to {"3/30/10", "9/11/2012", "1/1/2000"}

set regEx_1 to "(\\d+)(?:/)(\\d+)(?:/)(\\d+)" -- (Remember digits)(forget "/")(remember digits)(forget "/")(remember digits)
set Replace_1 to "0\\1/0\\2/\\3" -- Replace with "0(memory 1)/0(memory 2)/(memory 3)"
--> "03/030/10" or "09/011/2012" or "01/01/2000"

-- Then:
set regEx_2 to "(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d$)" -- (Forget digits)(but remember the two before "/" and the "/" itself)(forget digits)(but remember the two before "/" and the "/" itself)(forget digits)(but remember the two before the end of the string)
set Replace_2 to "\\1\\220\\3" -- Replace with "(memory 1)(memory 2)20(memory 3)"
--> "03/30/2010" or "09/11/2012" or "01/01/2000"

I’m only just learning this stuff myself!
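For anyone who wants to watch the two stages separately outside AppleScript, here is a minimal sed sketch of the same pad-then-trim idea. The function names `pad` and `trim` are made up for illustration, and the patterns are the ones above rewritten in sed's extended syntax (`[0-9]` instead of `\d`). It relies on sed reading `\220` in the replacement as back-reference 2 followed by the literal text "20".

```shell
#!/bin/sh
# Stage 1 (hypothetical helper): put a "0" in front of the month and the day.
pad() {
  printf '%s\n' "$1" | sed -E 's|([0-9]+)/([0-9]+)/([0-9]+)|0\1/0\2/\3|'
}

# Stage 2 (hypothetical helper): keep the last two digits of each field,
# then write the year as "20" followed by its last two digits.
trim() {
  printf '%s\n' "$1" | sed -E 's|[0-9]*([0-9]{2}/)[0-9]*([0-9]{2}/)[0-9]*([0-9]{2})$|\1\220\3|'
}

for d in 3/30/10 9/11/2012 1/1/2000; do
  padded=$(pad "$d")
  printf '%s -> %s -> %s\n' "$d" "$padded" "$(trim "$padded")"
done
# 3/30/10 -> 03/030/10 -> 03/30/2010
# 9/11/2012 -> 09/011/2012 -> 09/11/2012
# 1/1/2000 -> 01/01/2000 -> 01/01/2000
```

Because stage 2 only keeps the last two digits of the year and always prepends "20", a date such as "31/12/1943" comes out of this sketch as "31/12/2043".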

Thanks Nigel

Yvan KOENIG (VALLAURIS, France) Monday 13 February 2012 15:56:03

Hello

I’m glad because I found a way to get a better result.

The existing code replaced the year 1943 with the year 2043.
It was really annoying because I was born in 1943 :wink:

So I edited the code.


(* 
Regular Expressions by James Sehrier on Keyboard Maestro list that I have adjusted from that format to Satimage regexp AppleScript format.
*)

set myDates to {"3/30/10", "9/11/2012", "1/1/00", "31/12/43", "31/12/1943"}

--> get dates to date that is (0)mm/(0)dd/(20)(yy)yy format
set regEx_1 to "(\\d+)(?:/)(\\d+)(?:/)(\\d+)"
set Replace_1 to "0\\1/0\\2/20\\3"

--> then fix for mm/dd/yyyy format
set regEx_2 to "(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d\\d\\d$)"
set Replace_2 to "\\1\\2\\3"

set newDates to {}

repeat with oneDate in myDates
	(change regEx_1 into Replace_1 in oneDate with regexp)
	(change regEx_2 into Replace_2 in result with regexp)
	set end of newDates to result
	--set end of newDates to (change {regEx_1, regEx_2} into {Replace_1, Replace_2} in oneDate with regexp)
end repeat

newDates --> {"03/30/2010", "09/11/2012", "01/01/2000", "31/12/2043", "31/12/1943"}

Yvan KOENIG (VALLAURIS, France) Saturday 31 March 2012 22:11:49
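Note that a two-digit year such as "43" still becomes 2043 here, because "20" is always prepended. One common workaround is a pivot rule: two-digit years below some cutoff are read as 20yy and the rest as 19yy. The sketch below is only an illustration of that idea. It swaps the regex for awk field splitting, and both the pivot value (30) and the name `expand_year` are assumptions, not anything from the scripts above.

```shell
#!/bin/sh
# Hypothetical helper: zero-pad month and day, and expand one- or two-digit
# years with a pivot. Years below the pivot become 20yy, the rest 19yy.
# The pivot of 30 is an arbitrary assumption; pick one that suits your data.
expand_year() {
  printf '%s\n' "$1" | awk -F/ -v pivot=30 '{
    y = $3
    if (length(y) <= 2) y = (y + 0 < pivot ? 2000 : 1900) + y
    printf "%02d/%02d/%04d\n", $1, $2, y
  }'
}

expand_year 31/12/43    # 31/12/1943
expand_year 3/30/10     # 03/30/2010
expand_year 31/12/1943  # 31/12/1943 (four-digit years are left alone)
```

From AppleScript this could be called through `do shell script`, in the same way as any other shell command.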

Clever, Yvan;

I too have seen that Keyboard Maestro macro but didn’t think to translate it.

Hello Adam

In fact, I started from your script, or maybe from Nigel’s :wink:

Now I will try to convert it for Shane STANLEY’s ASObjC Runner, because I’m not sure that we will still be able to use OSAXen once sandboxing becomes widespread.

Yvan KOENIG (VALLAURIS, France) Sunday 1 April 2012 09:49:04

Hi, I got it.


(* 
Regular Expressions by James Sehrier on Keyboard Maestro list that I have adjusted from that format to Shane Stanley's ASObjC Runner.
*)

set myDates to {"3/30/10", "9/11/2012", "1/1/00", "31/12/1943", "31/12/43"}

--> get dates to date that is (0)mm/(0)dd/(20)(yy)yy format
set regEx_1 to "(\\d+)(?:/)(\\d+)(?:/)(\\d+)"
set Replace_1 to "0$1/0$2/20$3"

--> then fix for mm/dd/yyyy format
set regEx_2 to "(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d/)(?:\\d*)(\\d\\d\\d\\d$)"
set Replace_2 to "$1$2$3"

tell application "ASObjC Runner"
	(*
	look for regEx_1 in myDates replacing with Replace_1
	--> {"03/030/2010", "09/011/202012", "01/01/2000", "031/012/201943", "031/012/2043"}
	look for regEx_2 in result replacing with Replace_2
	--> {"03/30/2010", "09/11/2012", "01/01/2000", "31/12/1943", "31/12/2043"}
	set newDates to result
	*)
	set newDates to look for regEx_2 in (look for regEx_1 in myDates replacing with Replace_1) replacing with Replace_2
end tell
newDates --> {"03/30/2010", "09/11/2012", "01/01/2000", "31/12/1943", "31/12/2043"}

As you can see, like the Satimage version, it behaves the same for dd/mm/(yy)yy and mm/dd/(yy)yy.

Yvan KOENIG (VALLAURIS, France) Sunday 1 April 2012 10:27:00

Nice, Yvan. :slight_smile:

Just for good measure, here’s a shell-script version. I’ve included an extra stage to interpret single-digit years too, as in “2/2/2”.


(* 
Regular Expressions by James Sehrier on Keyboard Maestro list that Adam Bell has adjusted from that format to Satimage regexp AppleScript format. Modified by Yvan Koenig to return four-digit years. Adapted thence for sed by NG.
*)

set myDates to {"3/30/10", "9/11/2012", "1/1/00", "2/2/2", "31/12/43", "31/12/1943", "9/3/1949", "7/7/7"}

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
set myDates to myDates as text
set AppleScript's text item delimiters to astid

--> Prefix the month and day with "0" and the year with "20". Only the first digit of the year needs to be captured.
set regEx_1 to "([0-9]+/)([0-9]+/)([0-9])"
set Replace_1 to "0\\10\\220\\3"

-- Insert an extra "0" into any year which begins with "20" and only has 3 digits.
set regEx_1a to "/20([0-9]([^0-9]\\|$))"
set Replace_1a to "/200\\1"

--> then trim to mm/dd/yyyy format
set regEx_2 to "[0-9]*([0-9]{2}/)[0-9]*([0-9]{2}/)[0-9]*([0-9]{4})"
set Replace_2 to "\\1\\2\\3"

set newDates to paragraphs of (do shell script ("echo " & quoted form of myDates & " | sed -Ee 's|" & regEx_1 & "|" & Replace_1 & "|g' -e 's|" & regEx_1a & "|" & Replace_1a & "|g' -e 's|" & regEx_2 & "|" & Replace_2 & "|g'"))
--> {"03/30/2010", "09/11/2012", "01/01/2000", "02/02/2002", "31/12/2043", "31/12/1943", "09/03/1949", "07/07/2007"}

Hi Nigel

The first conversion was not mine. It was Adam Bell’s :slight_smile:

Of course, as your version requires no third-party tool, it’s my preferred one.

Yvan KOENIG (VALLAURIS, France) Sunday 1 April 2012 16:35:57

OK. I’ve redistributed my praise at the top of the script. I think it’s right now. :wink:

Thanks.

In French we say: “Il faut rendre à César ce qui est à César”.

In English it may be:
“Render therefore unto Caesar the things which are Caesar’s”

or

“give credit where credit is due”

Better not to offend the forum’s admin :wink:

Yvan KOENIG (VALLAURIS, France) Sunday 1 April 2012 18:29:52

Adamus Bellicosus Maximus Caesar, Imperator (sub Raio Barbero) Fororum MacScripteri!

One can’t help but tremble a little… :confused:

Get real, guys :smiley:

Since you’re using Satimage anyway, how about using the strftime function as follows?


set myDates to {"3/30/10", "9/11/2012", "1/1/2000"}

set newDates to {}
repeat with oneDate in myDates
	set end of newDates to strftime date oneDate into "%m/%d/%y"
end repeat

Hi.

Thanks for the suggestion, which I didn’t know about. It has a slight restriction in that it only works where the user’s short-date-order preference (“Formats” in the “Language & Text” preference pane) matches the dates in the list. Otherwise it errors on out-of-range numbers. (The “30” in the first date is out of range in my dd/mm/yyyy set-up.) However, this often won’t be a problem. The regex approach works with both dd/mm/yyyy and mm/dd/yyyy short dates, but not (obviously) with yyyy/mm/dd dates or (in the scripts above) with different separators.
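On the separator point, the patterns only need a character class in place of the literal "/". Here is a minimal sed sketch, assuming the separators of interest are "/", "-" and "." and that the output should always use "/". The function name `normalize_sep` is made up for illustration, and the two substitutions are the four-digit-year ones from earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: accept "/", "-" or "." between the fields, emit "/".
normalize_sep() {
  printf '%s\n' "$1" | sed -E \
    -e 's|([0-9]+)[/.-]([0-9]+)[/.-]([0-9]+)|0\1/0\2/20\3|' \
    -e 's|[0-9]*([0-9]{2}/)[0-9]*([0-9]{2}/)[0-9]*([0-9]{4})$|\1\2\3|'
}

normalize_sep 3-30-10    # 03/30/2010
normalize_sep 9.11.2012  # 09/11/2012
normalize_sep 1/1/00     # 01/01/2000
```

This sketch also accepts mixed separators such as "3/30-10", and year-first dates are still not handled.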