How to validate a date text string with flexible formats?

From the point of view of slotting these things into my script, it would be easier to have just one list containing different-language plug-ins for the regex in the last shell script. For English, the “plug-in” would be “[dw]|(to(day|morrow)|yesterday)” and you’d slot it into the shell script thus:

set ValidDueDate to (do shell script ("<<<" & quoted form of theDueDate & " sed -E 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ ; /^([-+]?[1-9][0-9]*" & plugin & ")$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean
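To see what the slotted-in regex accepts, here is a minimal shell sketch of the same sed program with the English plug-in. It assumes a POSIX sh and a sed that supports -E. The helper name validate_relative is hypothetical, and printf piped into sed stands in for the <<< here-string used in the AppleScript. The y command lower-cases the input. The first expression turns anything that doesn't match the anchored regex into "false", and the second turns any line not containing "false" into "true", which AppleScript then coerces to a boolean.

```shell
#!/bin/sh
# English plug-in from the post above.
plugin='[dw]|(to(day|morrow)|yesterday)'

# Hypothetical helper mirroring the do shell script call: prints "true" if the
# input is an optionally signed count followed by d/w, or one of the keywords.
validate_relative() {
  printf '%s\n' "$1" |
    sed -E 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/; /^([-+]?[1-9][0-9]*'"$plugin"')$/ !s/.*/false/; /false/ !s/.+/true/'
}

for input in 2d -3w +10d Tomorrow yesterday 0d 2x todays; do
  printf '%s -> %s\n' "$input" "$(validate_relative "$input")"
done
```

With these inputs it should print true for 2d, -3w, +10d, Tomorrow and yesterday, and false for 0d, 2x and todays: the count may not start with 0, and the keywords are matched as whole words.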

The German plug-in would be “[tw]|(heute|morgen|gestern)”, the French “[js]|(aujourd'\''hui|demain|hier)”, and so on. But you’d need to add more characters to the case-conversion alphabets in the shell script to cover possibilities like “mañana”.

To match on just the first three letters: “[dw]|(to[dm]|yes)[[:alpha:]]*”, “[tw]|(heu|mor|ges)[[:alpha:]]*”, “[js]|(auj|dem|hie)[[:alpha:]'\'']*”, etc.
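Here is a shell sketch of the French three-letter plug-in, slotted into the same sed program as the shell script above. It assumes a sed with -E, and validate_fr is a hypothetical name. It also shows what the '\'' sequence does. The sed program is single-quoted, so '\'' closes the quote, adds a literal apostrophe and reopens the quote, leaving [[:alpha:]'] in the regex.

```shell
#!/bin/sh
# French three-letter plug-in slotted into the same sed program.
# '\'' = end single quote, escaped apostrophe, reopen single quote.
validate_fr() {
  printf '%s\n' "$1" |
    sed -E 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/; /^([-+]?[1-9][0-9]*[js]|(auj|dem|hie)[[:alpha:]'\'']*)$/ !s/.*/false/; /false/ !s/.+/true/'
}

for input in "Aujourd'hui" auj demain hier 3j 2s lundi; do
  printf '%s -> %s\n' "$input" "$(validate_fr "$input")"
done
```

Only lundi is rejected here. A typographic apostrophe (’) is not in that class, so “aujourd’hui” typed with one would also be rejected.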

All languages should be accepted in PARALLEL; I guess I didn’t make that clear. I’ve now combined all languages in one plugin variable. Here’s the final script. Pretty awesome! Thanks a zillion for your support, Nigel!

on validateDateInput(theDueDate)
	-- A string of the short-date separators to be recognised. (Edit to taste.)
	-- These will be used in a regex class, so the hyphen must be first or last.
	set allowedSeparators to "-/."
	-- A list of short-date part regexes, in day, month, year order.
	set datePartRegices to {"(0?[1-9]|[12][0-9]|3[01])", "(0?[1-9]|1[0-2])", "([1-9][0-9])?[0-9]{2}"}
	set plugin to "[dw|tw|js|ds]|(yes|to[dm]|ges|heu|mor|hie|auj|dem|aye|hoy|man|mañ)[[:alpha:]]*"
	
	-- Get the local short-date string for 1st February 4003, strip out everything except the "1", the "2", and the "3", and turn these into a 3-digit integer. Use the digits to index the short-date part regexes and to arrange them into the equivalent order in a full short-date regex.
	set order to (do shell script ("<<<" & quoted form of short date string of («data isot343030332D30322D3031» as date) & " sed -E 's/[^123]//g'")) as integer
	set separatorClass to "[" & allowedSeparators & "]"
	tell datePartRegices to set shortDateRegex to item (order div 100) & separatorClass & item (order mod 100 div 10) & separatorClass & item (order mod 10)
	
	if ((do shell script ("<<<" & quoted form of theDueDate & " sed -E '/^" & shortDateRegex & "$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean) then
		-- If theDueDate is a valid date string in any of the allowed formats, test to see if it represents a valid date.
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to characters of allowedSeparators -- Requires Snow Leopard or later.
		tell theDueDate's text items
			set {item (order div 100), item (order mod 100 div 10), item (order mod 10)} to {beginning as integer, item 2 as integer, end as integer}
			set {d, m, y} to it
		end tell
		set AppleScript's text item delimiters to astid
		
		set ValidDueDate to ((d < 29) or (m is in {1, 3, 5, 7, 8, 10, 12}) or ((d < 31) and (m is in {4, 6, 9, 11})) or ((d is 29) and (m is 2) and (y mod 4 is 0) and (y mod 400 is not in {100, 200, 300})))
	else
		-- Otherwise test for any of the alternative valid inputs.
		set ValidDueDate to (do shell script ("<<<" & quoted form of theDueDate & " sed -E 'y/ABCDEFGHIJKLMNÑOPQRSTUVWXYZ/abcdefghijklmnñopqrstuvwxyz/ ; /^([-+]?[1-9][0-9]*" & plugin & ")$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean
	end if
	
	return ValidDueDate
end validateDateInput

validateDateInput("mañ")
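The locale trick behind order can also be tried outside AppleScript. The sketch below is a shell approximation that makes a few assumptions. The three sample strings are hypothetical German-, US- and ISO-style short dates for 1 February 4003; what short date string actually returns depends on the user's settings. The names part_regex, date_order, short_date_regex and is_short_date are illustrative. Because 01 supplies the 1, 02 the 2 and 4003 only the 3, stripping every other character leaves the order of day, month and year.

```shell
#!/bin/sh
# Date-part regexes in day, month, year order, as in datePartRegices.
part_regex() {
  case $1 in
    1) printf '%s' '(0?[1-9]|[12][0-9]|3[01])' ;;  # day
    2) printf '%s' '(0?[1-9]|1[0-2])' ;;           # month
    3) printf '%s' '([1-9][0-9])?[0-9]{2}' ;;      # year
  esac
}

# Keep only the digits 1, 2 and 3 of a short date for 1 February 4003.
date_order() {
  printf '%s\n' "$1" | sed -E 's/[^123]//g'
}

# Build the full regex; the digit positions mirror
# order div 100, order mod 100 div 10 and order mod 10 in the AppleScript.
short_date_regex() {
  sep='[-/.]'
  printf '%s%s%s%s%s' "$(part_regex $(($1 / 100)))" "$sep" \
    "$(part_regex $(($1 % 100 / 10)))" "$sep" "$(part_regex $(($1 % 10)))"
}

# Print "true" or "false". '#' is used as the address delimiter because
# some sed implementations end a /.../ address at the '/' inside [-/.].
is_short_date() {
  regex=$(short_date_regex "$1")
  printf '%s\n' "$2" | sed -E "\\#^$regex\$# !s/.*/false/; /false/ !s/.+/true/"
}

for sample in 01.02.4003 2/1/4003 4003-02-01; do
  echo "$sample -> $(date_order "$sample")"
done
```

It prints 123, 213 and 321 for the three samples. With order 123, 31.12.2024 passes and 12/31/24 fails (31 is not a valid month). With order 213, 12/31/24 passes.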

Glad you’ve got what you needed. 🙂

This optimisation of the above logic seems to work OK:

set ValidDueDate to ((d < 29) or (m is in {1, 3, 5, 7, 8, 10, 12}) or ((d < 31) and (m > 2)) or ((d is 29) and (y mod 4 is 0) and (y mod 400 is not in {100, 200, 300})))
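Here is why the shortcut holds. Every month after February has at least 30 days, so (d < 31) and (m > 2) accepts the 29th and 30th for all of them, and the 31-day months are handled by the list. January 29 is already covered by that list too, so February 29 is the only 29th that reaches the last clause, and the (m is 2) test can go. As a sanity check, here is a brute-force shell sketch comparing both versions with an ordinary days-in-month table. The function names and the sample years are assumptions; the years were chosen to include the century cases 1900, 2000, 2100 and 2400. Exit status 0 stands for a valid date.

```shell
#!/bin/sh
# Leap-year test as the AppleScript writes it:
# y mod 4 = 0 and y mod 400 not in {100, 200, 300}.
is_leap() {
  [ $(($1 % 4)) -eq 0 ] || return 1
  case $(($1 % 400)) in 100|200|300) return 1 ;; esac
  return 0
}

# Original check: $1 = day, $2 = month, $3 = year.
original() {
  [ "$1" -lt 29 ] && return 0
  case $2 in 1|3|5|7|8|10|12) return 0 ;; esac
  if [ "$1" -lt 31 ]; then case $2 in 4|6|9|11) return 0 ;; esac; fi
  [ "$1" -eq 29 ] && [ "$2" -eq 2 ] && is_leap "$3" && return 0
  return 1
}

# Optimised check from the post above.
optimised() {
  [ "$1" -lt 29 ] && return 0
  case $2 in 1|3|5|7|8|10|12) return 0 ;; esac
  [ "$1" -lt 31 ] && [ "$2" -gt 2 ] && return 0
  [ "$1" -eq 29 ] && is_leap "$3" && return 0
  return 1
}

# Reference: days in month using the usual Gregorian rule.
days_in_month() {
  case $1 in
    2) if [ $(( ($2 % 4 == 0 && $2 % 100 != 0) || $2 % 400 == 0 )) -eq 1 ]; then echo 29; else echo 28; fi ;;
    4|6|9|11) echo 30 ;;
    *) echo 31 ;;
  esac
}

mismatches=0
for year in 1900 2000 2023 2024 2100 2400; do
  month=1
  while [ "$month" -le 12 ]; do
    dim=$(days_in_month "$month" "$year")
    day=1
    while [ "$day" -le 31 ]; do
      if [ "$day" -le "$dim" ]; then expected=0; else expected=1; fi
      if original "$day" "$month" "$year"; then a=0; else a=1; fi
      if optimised "$day" "$month" "$year"; then b=0; else b=1; fi
      if [ "$a" -ne "$expected" ] || [ "$b" -ne "$expected" ]; then
        echo "mismatch: $day/$month/$year"
        mismatches=$((mismatches + 1))
      fi
      day=$((day + 1))
    done
    month=$((month + 1))
  done
done
echo "mismatches: $mismatches"   # prints "mismatches: 0"
```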

PS.

Because you’ve got those bars in the class at the beginning of the regex, a bar will be accepted as valid in the input, as in “2|”. Although it’s not as clear to read, the class should probably be shortened to “[dwtjs]”.
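The effect of the bars is easy to reproduce. This shell sketch runs the combined plug-in with and without the bars through the same sed program as the handler; the helper name check is hypothetical. The ñ alternatives are left out to keep it ASCII-only, so it doesn't depend on how the local sed handles multibyte characters.

```shell
#!/bin/sh
# Hypothetical helper: the handler's alternative-input test,
# with plug-in $1 applied to input $2.
check() {
  printf '%s\n' "$2" |
    sed -E 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/; /^([-+]?[1-9][0-9]*'"$1"')$/ !s/.*/false/; /false/ !s/.+/true/'
}

words='(yes|to[dm]|ges|heu|mor|hie|auj|dem|aye|hoy|man)[[:alpha:]]*'
with_bars="[dw|tw|js|ds]|$words"
without_bars="[dwtjs]|$words"

for input in '2|' 2d 3t Heute; do
  echo "$input: with bars $(check "$with_bars" "$input"), without bars $(check "$without_bars" "$input")"
done
```

2| comes out true with the bars and false without them; 2d, 3t and Heute are accepted either way.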

As I have multiple occurrences of those indicators within the script, my idea is to define your plugin string once, early in the script, as shown here:


set theYesterdayIndicators to {"yes", "ges", "hie", "aye"} -- for yesterday, gestern, hier, ayer
set theTodayIndicators to {"tod", "heu", "auj", "hoy"} -- for today, heute, aujourd'hui, hoy
set theTomorrowIndicators to {"tom", "mor", "dem", "mañ", "man"} -- for tomorrow, morgen, demain, mañana, manana
set theDayIndicators to {"d", "t", "j"} -- for day/día, Tag, jour
set theWeekIndicators to {"w", "s"} -- for week/Woche, semaine/semana

set text item delimiters to ""
set validateDayWeekIndicator to (theDayIndicators as string) & theWeekIndicators as string
set text item delimiters to "|"
set validateYesTodTomIndicator to ((theYesterdayIndicators as string) & "|" & theTodayIndicators as string) & "|" & theTomorrowIndicators as string
set plugin to "[" & validateDayWeekIndicator & "]|(" & validateYesTodTomIndicator & ")[[:alpha:]]*"
display dialog plugin buttons {"Looks exactly as it should"}

on validateDateInput(aDate)
	-- A string of the short-date separators to be recognised.
	-- These will be used in a regex class, so the hyphen must be first or last.
	set allowedSeparators to "-/."
	-- A list of short-date part regexes, in day, month, year order.
	set datePartRegices to {"(0?[1-9]|[12][0-9]|3[01])", "(0?[1-9]|1[0-2])", "([1-9][0-9])?[0-9]{2}"}
	
	-- Get the local short-date string for 1st February 4003, strip out everything except the "1", the "2", and the "3", and turn these into a 3-digit integer. Use the digits to index the short-date part regexes and to arrange them into the equivalent order in a full short-date regex.
	set order to (do shell script ("<<<" & quoted form of short date string of («data isot343030332D30322D3031» as date) & " sed -E 's/[^123]//g'")) as integer
	set separatorClass to "[" & allowedSeparators & "]"
	tell datePartRegices to set shortDateRegex to item (order div 100) & separatorClass & item (order mod 100 div 10) & separatorClass & item (order mod 10)
	
	if ((do shell script ("<<<" & quoted form of aDate & " sed -E '/^" & shortDateRegex & "$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean) then
		-- If aDate is a valid date string in any of the allowed formats, test to see if it represents a valid date.
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to characters of allowedSeparators -- Requires Snow Leopard or later.
		tell aDate's text items
			set {item (order div 100), item (order mod 100 div 10), item (order mod 10)} to {beginning as integer, item 2 as integer, end as integer}
			set {d, m, y} to it
		end tell
		set AppleScript's text item delimiters to astid
		
		set ValidDate to ((d < 29) or (m is in {1, 3, 5, 7, 8, 10, 12}) or ((d < 31) and (m > 2)) or ((d is 29) and (y mod 4 is 0) and (y mod 400 is not in {100, 200, 300})))
	else
		-- Otherwise test for any of the alternative valid inputs. (It might be necessary to add foreign characters here - both lower and upper case - when adding further languages)
		set ValidDate to (do shell script ("<<<" & quoted form of aDate & " sed -E 'y/ABCDEFGHIJKLMNÑOPQRSTUVWXYZ/abcdefghijklmnñopqrstuvwxyz/ ; /^([-+]?[1-9][0-9]*" & plugin & ")$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean
	end if
	
	return ValidDate
end validateDateInput

set theDueDate to "2d"
if validateDateInput(theDueDate) of me is true then
	say "true"
else
	say "false"
end if

The plugin variable ends up containing a string equivalent to “[dwtjs]|(yes|to[dm]|ges|heu|mor|hie|auj|dem|aye|hoy|man|mañ)[[:alpha:]]*”; however, the script ends with an error claiming that plugin isn’t defined (error number -2753 from “plugin”). What am I doing wrong?

UPDATE: I guess I’ve just solved the problem: “of me” was missing:

set ValidDate to (do shell script ("<<<" & quoted form of aDate & " sed -E 'y/ABCDEFGHIJKLMNÑOPQRSTUVWXYZ/abcdefghijklmnñopqrstuvwxyz/ ; /^([-+]?[1-9][0-9]*" & plugin of me & ")$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean

Yes. It’s a “scoping” issue. Variables in handlers are local unless declared otherwise, so the ‘plugin’ in the ‘validateDateInput’ handler isn’t the same ‘plugin’ as the one at the top of the script in the “implicit run handler”.

Locals are temporary, exist only while the handler in which they occur is executing, and can only be accessed within that handler.

Globals, properties, and run-handler variables are permanent and “belong” to the script. If the script’s run directly from a file, the values of its globals, properties, and/or run-handler variables are saved back to the file when it exits.

If you declare a variable global at the top of the script, it’ll be the same variable throughout the script (except in handlers where you’ve explicitly declared it local).

global plugin

set theYesterdayIndicators to …
etc.

If you instead make the declaration at the top of a handler, it’ll be the same variable as the one in the run handler (if there is such a variable there) and in any other handlers where it’s explicitly declared global.

Your ‘of me’ solution sets up a reference to the ‘plugin’ belonging to the script, which in this case is the one set in the implicit run handler.

The polite way to do things is to pass the string to the handler as an additional parameter.

validateDateInput(theDueDate, plugin)

on validateDateInput(aDate, plugin)

	-- Use 'plugin', not 'my plugin' here.

end validateDateInput

The two ‘plugin’ variables here contain the same value, but are different variables.

It’s considered very bad form to intersperse run-handler code with handler definitions, so your last six lines should be with the code at the top of the script (or it with them). However, I imagine they’ve just been added temporarily to test the rest of the script.

Pardon me if I ever use the wrong terminology (I’m a Kraut, not a native speaker :P)…

IMO, handler parameters are only worthwhile if the handler is called from multiple places within the script and/or with changing parameters. In my case the plugin parameter doesn’t change throughout the script (and wouldn’t be replaced by anything else in the handler), which is why “of me” simply looks better. Do you agree? Or does it make any difference to performance?

Indeed, only for testing purposes…

My script contains a bunch of handlers, which are all placed in a HANDLERS section at the very bottom of the script. What’s your opinion: in terms of script performance, should the handlers in that section be in the order in which they’re called, or doesn’t it matter?

Hi.

I’ve posted a combined validation and interpretation script in your other thread.