How to validate a date text string with flexible formats?

I have a DueDate which will be provided as a text string in one of the following formats:

  • as proper date, e.g. ‘12/09/2013’ or ‘12-9-13’
  • as number of days or weeks, e.g. ‘3d’ (in 3 days), ‘-10d’ (= 10 days ago), ‘2w’ (= in 2 weeks)
  • as ‘today’
  • as ‘tomorrow’

(The script later converts the #d/#w syntax/today/tomorrow into a date string, no questions on that part).

What I need is a code snippet that validates that the provided DueDate text string conforms to one of the accepted formats shown above, so that the overall script doesn’t fail if the provided DueDate text string is incomplete or contains typos. This is what I have in mind:

try
	date (theDueDate) -- Test #1. This line works and reliably tests whether the DueDate is readable as date
        
        -- but these three formats must be accepted as OK as well
	OR (any negative or positive integer number)(ANY quantity, so 0 to infinity, of accidental spaces in-between, which would be OK)(a SINGLE occurrence of the character d, t or w) -- Test #2. How?
        OR 'today' -- Test #3. How?
        OR 'tomorrow' -- Test #4. How?

	set ValidDueDate to true
on error
	set ValidDueDate to false
end try

How to combine all four tests, so that any of the four tests would lead to ‘true’? How to code tests #2, #3 and #4?

Model: MacBook Pro 13"
Browser: Safari 536.26.17
Operating System: Mac OS X (10.8)

Hello.

You haven’t found any AS library that does this, since you came here. Well, there is one open-source library written in Java that does this; there is also, as I recollect, an open-source project by Haas for scheduling and reminders, whose code you could look at.

Maybe you’ll be the one who creates the AS library for this? Or maybe you are in luck if you move your project into ASObjC. :slight_smile:

Well, I have to admit that I’m somewhere between a professional coder and, as I call it, a copy-&-paste coder, so I’m hoping for an expert to come up with a solution. I’m a creative, so my brain isn’t exactly made for coding :smiley:

Java of course is no option, not only because of the recent security issues. It should be entirely done in AS, which I’m pretty sure is possible.

Hello.

I meant that you could look at both the Java code and, I believe, the C code, and translate the parsing into AppleScript.

Since your requirements don’t go beyond what you specified, it should be fairly simple, even without a library.

What you need to do is to break the input string down into words: first into AppleScript words, then by delimiters, to get “-” and the like out of the way.

Once you have counted the numbers, or found a number with a letter, or just “today” or “tomorrow”, your problem is solved; the date functions and arithmetic you need can be found here.

I don’t see the need to remove the date delimiters. My approach is to test the given string against each of the 4 tests: if one of them succeeds, ValidDueDate is true; if all fail, it is false. The first test taken on its own already works (which is why I don’t see the need to remove the date delimiters).

What I don’t know is how to code the other tests (using AppleScript’s text delimiters?? Naaahh…) and how to combine all 4 within the try block. I just haven’t seen any examples of multiple OR criteria in a try block yet. Test #2 especially gives me the creeps (I’ve seen AppleScripts which call the Unix shell with a grep-like command, but I don’t recall where. No, I don’t have a clue about fuzzy grep searches…).

Maybe the pseudo code below can help you. :slight_smile:

It needs some refinement: after you have set the text item delimiters, you can check for items with contents.
A valid input must yield 3, 2 or 1 items with contents. (2 if there is a space between 10 and d, for instance, or between 2 and w. I haven’t incorporated that in the pseudocode below.)

You could start by counting the items with contents. From then on, 3 should be easy, as they must all be numbers.
The 1d and 2w types can come as one or two items; today and tomorrow come as a single item.

"split the text string up with text delimiters {"-","/"," "}

count the text items

if the count of text items with contents == three
   check if they are numbers, if they are: okdate!
else if count of items with contents == 1
   check if the item is tomorrow or today, if so, set due date accordingly.
   if not, strip out the first characters of the text item that are numbers, and coerce them to integer.
   if the last character is "w" or "d" then
      set date accordingly
   else
      date is invalid
   end if
else
   date is invalid
end if"
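For what it’s worth, the pseudocode above could be sketched in AppleScript roughly like this. The handler names looksLikeDueDate and isDigits are made up for illustration, the two-item case (“10 d”) is added as an assumption, and the sketch only checks the shape of the input, not whether the numbers form a real calendar date:

```applescript
-- Hypothetical handler: a sketch of the pseudocode, not a tested library routine.
on looksLikeDueDate(theText)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"-", "/", " "} -- multiple delimiters need Mac OS X 10.6 or later
	set rawItems to text items of theText
	set AppleScript's text item delimiters to astid
	
	-- Keep only the items that have contents.
	set parts to {}
	repeat with anItem in rawItems
		if (contents of anItem) is not "" then set end of parts to (contents of anItem)
	end repeat
	
	if (count parts) is 3 then
		-- Three items: all of them must be whole numbers.
		repeat with aPart in parts
			if not isDigits(contents of aPart) then return false
		end repeat
		return true
	else if (count parts) is 2 then
		-- Assumed extension: a number and a unit separated by a space, e.g. "10 d".
		return isDigits(item 1 of parts) and ((item 2 of parts) is in {"d", "w"})
	else if (count parts) is 1 then
		set onlyPart to item 1 of parts
		if onlyPart is in {"today", "tomorrow"} then return true
		if (count onlyPart) < 2 then return false
		return ((character -1 of onlyPart) is in "dw") and isDigits(text 1 thru -2 of onlyPart)
	end if
	return false
end looksLikeDueDate

-- Hypothetical helper: true if the text consists of digits only.
on isDigits(someText)
	if someText is "" then return false
	repeat with aChar in characters of someText
		if (contents of aChar) is not in "0123456789" then return false
	end repeat
	return true
end isDigits

looksLikeDueDate("2w")
```

Note that splitting on “-” throws away a leading minus sign, which is harmless for validation but means the sign has to be read from the original string later.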

This does not error on my machine:

set theDueDate to "3-44-2013"
date theDueDate

But the same string with slashes does.

Hello.

I think the date delimiters for a non-C locale get the delimiters from the C-like locale added, as “-” and “/” work fine for LC_ALL = no_NO.UTF-8.

Either non-US/English locales get more, or some locales do! :slight_smile:

Or was it “:” that worked?

Well, instead of date(DueDate), one could also check for ([0]1 to 31)(valid delimiter)([0]1 to 12)(valid delimiter)(YY or YYYY). That however leads to new problems:

  • local differences in the date syntax order (MM-DD vs. DD-MM) based on System Preferences > Language & Text > Formats (‘date(DueDate)’ captures that pretty well)
  • how many days a month has, incl. differences in leap years (again: ‘date(DueDate)’ captures that)

I’d stick with date(DueDate) for now, even if it is not 100% failsafe. (One could later further catch errors with a 5th test: MM-DD, where MM must not be > 12 and DD must not be > 31, whereas the question is how to ask the Mac for its local settings on the date format, MM-DD vs. DD-MM).

But, let’s return to my initial questions for now:

  • How to combine multiple try criteria in one try block? (I’m now considering nested try blocks)
  • How to try #2, #3 and #4? How to code that? Using AS’s ‘contains’? Using a Shell call?

Hello.

First of all, you know that the specified date is wrong when setting the date from your date string fails.

Nigel Garvey has put a universal date handler into Code Exchange that cures all for your part.

Nested try blocks and multiple tries should both be avoided. You can, however, test for several conditions under an ‘on error’ statement. Not nested, because a new stack is pulled up and the code is executed within that one before you resume executing from your normal stack when you are done with the try/catch, so any values you give variables there may be wiped out when you return to normal execution.

I normally code multiple tries sequentially, setting success to false if something went wrong and then skipping the remaining try blocks: 5 try blocks in a row, the last 4 of them embedded in if-tests.
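A minimal sketch of that sequential layout, turned around for the “any test may pass” case of this thread: the flag starts as false and each later test only runs while it is still false. The coercion-based test for “2w” is an assumption made for illustration and is looser than a proper pattern match:

```applescript
set theDueDate to "2w"
set ValidDueDate to false

-- Test #1: is the string readable as a date?
try
	date theDueDate
	set ValidDueDate to true
end try

-- Test #2: only tried if test #1 did not succeed.
if not ValidDueDate then
	try
		-- "2w" -> "2"; the coercion raises an error for non-numeric text such as "x".
		(text 1 thru -2 of theDueDate) as integer
		set ValidDueDate to ((character -1 of theDueDate) is in "dw")
	end try
end if

-- Tests #3 and #4: plain comparisons, no try needed.
if not ValidDueDate then
	set ValidDueDate to (theDueDate is in {"today", "tomorrow"})
end if

ValidDueDate
```

A try block without an ‘on error’ part simply swallows the error, which is what keeps each test independent of the others.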

Maybe you should read a little bit in the AppleScript Language Guide? It is short and well written.

Where an error means “this test has failed”, subsequent tests have to be in the ‘on error’ part of the ‘try’ statement.

try
	date theDueDate
	set ValidDueDate to true
on error
	-- Other tests here, setting ValidDueDate to true or false as appropriate.
end try
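As one way of filling in the ‘on error’ part without leaving AppleScript, here is a sketch for tests #2, #3 and #4. The handler name isRelativeOrKeyword is hypothetical, and removing all spaces before testing is an assumption based on the “accidental spaces” mentioned in the question:

```applescript
-- Hypothetical handler: true for inputs like "3d", "-10 d", "2w", "today", "tomorrow".
on isRelativeOrKeyword(theText)
	-- Remove any spaces.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " "
	set parts to text items of theText
	set AppleScript's text item delimiters to ""
	set compactText to parts as text
	set AppleScript's text item delimiters to astid
	
	-- Tests #3 and #4.
	if compactText is in {"today", "tomorrow"} then return true
	
	-- Test #2: optional sign, one or more digits, then a single d or w.
	if (count compactText) < 2 then return false
	if (character -1 of compactText) is not in "dw" then return false
	set numberPart to text 1 thru -2 of compactText
	if (character 1 of numberPart) is in "+-" then
		if (count numberPart) < 2 then return false
		set numberPart to text 2 thru -1 of numberPart
	end if
	repeat with aChar in characters of numberPart
		if (contents of aChar) is not in "0123456789" then return false
	end repeat
	return true
end isRelativeOrKeyword

set theDueDate to "-10 d"
try
	date theDueDate
	set ValidDueDate to true
on error
	set ValidDueDate to isRelativeOrKeyword(theDueDate)
end try
```

AppleScript text comparisons ignore case by default, so “Today” and “3D” pass as well.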

Here’s another approach I’ve been working on this afternoon:

set theDueDate to "2w"

if ((do shell script ("<<<" & quoted form of theDueDate & " sed -E '/^(0?[1-9]|[12][0-9]|3[01])[/-](0?[1-9]|1[0-2])[/-](20)?[0-9]{2}$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean) then
	-- If theDueDate is a valid date string in any of the allowed formats, test to see if it represents a valid date.
	set {d, m, y} to theDueDate's words
	set {d, m} to {d as integer, m as integer}
	set ValidDueDate to (d < 29) or (m is in {1, 3, 5, 7, 8, 10, 12}) or ((d < 31) and (m is in {4, 6, 9, 11})) or ((d is 29) and (m is 2) and (y mod 4 is 0) and (y mod 400 is not in {100, 200, 300}))
else
	-- Otherwise test for any of the alternative valid inputs.
	set ValidDueDate to (do shell script ("<<<" & quoted form of theDueDate & " sed -E '/^([-+]?[1-9][0-9]*[dw]|to(day|morrow))$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean
end if

A “valid date string” here contains slash and/or hyphen delimiters, the first part is a number between 1 and 31 (with/without a leading zero), the second part is a number between 1 and 12 (ditto), and the third is either a number between 0 and 99 or a number between 2000 and 2099. The number range thus limited, the rest of the applescript code above the ‘else’ line checks that the date itself is valid.

An “alternative valid input” optionally begins with “+” or “-”, then a number not equal to 0 and with no leading zeros, with any number of digits, then either “d” or “w”. Or it can be “today” or “tomorrow”.

:smiley:

I can’t wait to try that. There is only one more test to be had, and that is of course whether the due date is not before today.

That is quite simple to perform: take today (now) as a date object in seconds, add 86400, and subtract the due date as a date object in seconds; the result shouldn’t be negative.

You do amaze me Nigel.

Here we go … in the right direction! :stuck_out_tongue: I’ll give this a try. Perhaps you can describe the syntax of the individual lines in a bit more depth. (The shell script lines are a miracle to me, because I don’t know the commands and the syntax.)

One more thing:

Does this script accept the various regional date formats? Probably not…

Here’s the problem: In Germany it is DD.MM.YY(YY) whereas the typical delimiter is “.”, but the Mac (including the AS date cmd) also accepts “-” and “/”. In the US it is MM-DD-YY(YY), whereas the delimiter is “-” or “/” (but what about “.”?).

Of course it is impossible to tell from “04-06-13” whether that is German format for June or US format for April. Again: this depends on the settings in System Preferences > Language & Text > Formats. See here. Note that my script in a later step uses the AS date cmd to parse the actual date from the tested string.

Two solutions:

a) We say it’s the user’s responsibility to use the DD-MM or MM-DD order strictly according to that System Preference. The test really doesn’t have to take care of this. In that case, however, we must accept any number up to [31], both for days and months, not knowing which is which.

b) The ultimate challenge: the test code snippet figures out the local System Prefs Formats setting and depending on that tests for either [±]MM[-/]DD[-/]YY(YY) OR [±]DD[.-/]MM[.-/]YY(YY).


--This is how to call the Pane (the responsible setting(s) is/are under the 3rd tab) 
-- Any idea how to read them out?
tell application "System Preferences"
	activate
	set the current pane to pane id "com.apple.Localization"
end tell

On the other hand: perhaps the current date format setting can be figured out through a shell cmd as well!?

Nope. Of course the DueDate can also be before today. That’s what is simply called ‘overdue’. That’s why I want the script to accept ‘-14d’ (so 14 days ago), and dates 2 months ago (e.g. 11/30/2012).

Uhm, well… It seems that Nigel’s code actually DOES take this into account. With my German setting (DD-MM-YYYY) the date “31/12/2013” returns true, whereas “12/31/2013” leads to false. Excellent!

The only thing missing now is the acceptance of “.” as a date delimiter. I’ve changed all occurrences of [/-] to [/-.], but that doesn’t work. Why?

The shell scripts use ‘sed’ to edit the input, which is assumed to be just one line. The first one tests for a date string of the form described in the paragraph beneath the script. If the copy of the line in that script doesn’t match the date string format, it’s changed to “false”. Then, if it hasn’t been changed to “false”, it’s changed to “true”. The “true” or “false” result is coerced to a boolean after the return from the shell script and the boolean is used to control the AppleScript ‘if’ statement.

If the boolean is ‘true’, the date string is further analysed to see if it represents a valid calendar date. The shell script has only allowed the date string if the first part represents a number between 1 and 31, the second a number between 1 and 12, and the third a number between (20)00 and (20)99. The date’s therefore valid if the day’s less than 29, or the month has 31 days, or the day is less than 31 and the month has 30 days, or the day is 29 and the month is February and it’s a leap year.

If the boolean from the first shell script is ‘false’, the second shell script is executed to see if the input matches any of your other allowed formats. It performs the same “true” or “false” substitution as the first shell script.
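The “true”/“false” substitution described above can be wrapped in a small handler, sketched here under a few assumptions: the name matchesPattern is made up, the pattern must be a ‘sed -E’ (extended) regex, and it must not contain a “/”, since that character delimits the sed address:

```applescript
-- Hypothetical helper around the sed idiom; thePattern must not contain "/".
on matchesPattern(theText, thePattern)
	-- If the line doesn't match the anchored pattern, replace it with "false";
	-- then, if it hasn't become "false", replace it with "true".
	set sedScript to "/^(" & thePattern & ")$/ !s/.*/false/; /false/ !s/.+/true/"
	set theResult to do shell script ("<<<" & quoted form of theText & " sed -E " & quoted form of sedScript)
	return theResult as boolean
end matchesPattern

matchesPattern("2w", "[-+]?[1-9][0-9]*[dw]|to(day|morrow)")
```

With the pattern from the second shell script, this call should give true for “2w” and false for something like “2x”.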

Only the formats you specified in post #1. It also assumes a day-month-year short-date order, which I deduced to be the one you use. The script can be expanded to allow other separators, but needs changes to both the sed and AppleScript codes. Testing the local short-date order can be done either with AppleScript or (I think) a shell script. However, it’s 01:00 where I am as I write…

It’s 02:12 over here (Frankfurt, Germany) :smiley:

Oops… Stupid me, I didn’t notice that with your script my German settings (DD-MM-YYYY) must of course lead to true and others to false. But as I wrote in post #9: testing the syntax as text, rather than using the date cmd, leads to the localization problem, which means that

a) the code snippet either has to figure out the System Preferences or
b) we have to establish a changeable date format preference within the script, depending on which the code snippet tests either for one format (DD-MM-YYYY) or the other (MM-DD-YYYY).

Needless to say that I’ll mention you in the credits of the script (a pretty slick, full-featured Mail2Things script, which turns incoming emails into To Dos in Cultured Code’s Things). :wink:

On figuring out the System Preferences > Language & Text > Formats settings: Guess this should be the AppleScript relevant cmd, and this the one for the shell.

The cumbersome part now is to cover the various possible returned values… Anyone got a table showing all return values with their associated date formats at hand? Just saw that South Africa has YYYY-MM-DD … Holy cow! Shell> locale -a at least outputs a list with all possible locales.

OK. This version takes the local short-date order into account and can handle “.” separators (or any others you may wish to add) as well. I’ve changed the “year” regex to allow any four-digit year between 1000 and 9999 as well as two-digit years as before. Since this script uses simultaneous multiple delimiters, it needs to be run in Snow Leopard or later. (Or else the user must be forced to use the first of the allowed separators when entering a short date!)

on validateDateInput(theDueDate)
	-- A string of the short-date separators to be recognised. (Edit to taste.)
	-- These will be used in a regex class, so the hyphen must be first or last.
	set allowedSeparators to "-/."
	-- A list of short-date part regexes, in day, month, year order.
	set datePartRegices to {"(0?[1-9]|[12][0-9]|3[01])", "(0?[1-9]|1[0-2])", "([1-9][0-9])?[0-9]{2}"}
	
	-- Get the local short-date string for 1st February 4003, strip out everything except the "1", the "2", and the "3", and turn these into a 3-digit integer. Use the digits to index the short-date part regexes and to arrange them into the equivalent order in a full short-date regex.
	set order to (do shell script ("<<<" & quoted form of short date string of («data isot343030332D30322D3031» as date) & " sed -E 's/[^123]//g'")) as integer
	set separatorClass to "[" & allowedSeparators & "]"
	tell datePartRegices to set shortDateRegex to item (order div 100) & separatorClass & item (order mod 100 div 10) & separatorClass & item (order mod 10)
	
	if ((do shell script ("<<<" & quoted form of theDueDate & " sed -E '/^" & shortDateRegex & "$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean) then
		-- If theDueDate is a valid date string in any of the allowed formats, test to see if it represents a valid date.
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to characters of allowedSeparators -- Requires Snow Leopard or later.
		tell theDueDate's text items
			set {item (order div 100), item (order mod 100 div 10), item (order mod 10)} to {beginning as integer, item 2 as integer, end as integer}
			set {d, m, y} to it
		end tell
		set AppleScript's text item delimiters to astid
		
		set ValidDueDate to ((d < 29) or (m is in {1, 3, 5, 7, 8, 10, 12}) or ((d < 31) and (m > 2)) or ((d is 29) and (y mod 4 is 0) and (y mod 400 is not in {100, 200, 300})))
	else
		-- Otherwise test for any of the alternative valid inputs.
		set ValidDueDate to (do shell script ("<<<" & quoted form of theDueDate & " sed -E 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ ; /^([-+]?[1-9][0-9]*[dw]|(to(day|morrow)|yesterday))$/ !s/.*/false/; /false/ !s/.+/true/ ;'")) as boolean
	end if
	
	return ValidDueDate
end validateDateInput

validateDateInput("31.01.13")

Edits: Made the ‘alternative input form’ parsing effectively case-insensitive and added “yesterday” to the possibilities. Fixed a bug in the date-order handling and made it more efficient. Made a couple of optimisations in the date verification logic.

Freaking awesome!! :smiley: Kudos to you!

Final issue (I promise ;)):

The script will support any language which the Things.app itself does support. Based on a language variable (en, de, es etc.; which currently is set manually in the script, but which I’ll now simply retrieve from the system), the various text strings are set into the appropriate language, like shown here:


		if myLanguage is "de" then
			set theDueDateWords to {"Fällig: ", "vorgestern", "gestern", "Heute", "Morgen", "Übermorgen", "seit", "in", "Tagen"}
		else if myLanguage is "fr" then
			set theDueDateWords to {"Échéance: ", "avant-hier", "hier", "Aujourd'hui", "Demain", "Après-demain", "depuis", "dans", "jours"}
		else if myLanguage is "es" then
			set theDueDateWords to {"Vencimiento: ", "anteayer", "ayer", "Hoy", "Mañana", "Pasado mañana", "desde", "dentro de", "días"}
		else
			-- "en" for all other languages
			set theDueDateWords to {"Due: ", "the day before yesterday", "yesterday", "Today", "Tomorrow", "The day after tomorrow", "since", "in", "days"}
		end if

That means that the validDate handler should not check for “today” and “tomorrow”, but item 4 and item 5 of the list theDueDateWords.

Furthermore, I’d like to check only for the first 3 characters of those words. So instead of insisting on “tomorrow”, the validDate handler should be content with “tom”. Why is that? Simply to cover typos: “tomorrow” is a pretty long word when typed on a Smartphone keyboard, so there is a certain risk for typos like “tomorrwo” or “tomorow”. By checking only for the first 3 characters, this risk is minimized.

Today and tomorrow is what I currently support, yesterday is OK too. Absolutely no need to acknowledge “the day before yesterday” and “the day after tomorrow”.

Well, actually… I’m brainstorming here… Why shouldn’t the validDate handler accept item 3, item 4 and item 5 of the theDueDateWords list (see above) in any of the languages supported by the script (en, de, es, fr)? If a Spanish co-worker sends you an email which ends with “Ven(cimiento): Mañ(ana)” instead of “Due: Tom(orrow)”, it should be accepted as well, independent of your “locale” setting…

Consequently, the day and week identifiers should support the 4 languages as well

{"d", "t", "j"} -- for day/día, Tag, jour
{"w", "s"} -- for week/Woche, semaine/semana

Of course it’s the co-worker’s responsibility to use your “locale” format on full dates (DD-MM-YYYY etc.).

NOTE: There are no interferences between the foreign language translations of Yesterday, Tomorrow and Today, so any first 3-characters of those words are truly unique, no need to worry from that side.

NOTE: validDate still only has to care about the dueDate text string after "Due: " etc.; I’m separating the "Due: ", "Ven(cimiento): " from the actual DueDate in an earlier step.
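To make the idea concrete, a rough sketch of the three-character check in plain AppleScript. The handler name isDayKeyword is hypothetical; the word list is copied from items 3 to 5 of the four theDueDateWords lists above:

```applescript
-- Hypothetical handler: true if the first 3 characters of theText start one of the known day words.
on isDayKeyword(theText)
	set knownWords to {"yesterday", "today", "tomorrow", "gestern", "heute", "morgen", "hier", "aujourd'hui", "demain", "ayer", "hoy", "mañana"}
	if (count theText) < 3 then return false
	set thePrefix to text 1 thru 3 of theText
	repeat with aWord in knownWords
		if (contents of aWord) begins with thePrefix then return true
	end repeat
	return false
end isDayKeyword

isDayKeyword("tomorrwo")
```

Because only the prefix is compared, any other word starting with the same three letters (“tomato”, say) would be accepted too; that is the price of the typo tolerance.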