Monday, March 18, 2019

#1 2018-12-28 10:09:11 pm

davidhmorgan
Member
From: Sydney
Registered: 2014-08-20
Posts: 65

Finding a text pattern in a string

Using GREP, Regex or some other method, is it possible to isolate some text based on a pattern?

For example, in: "29/12/2018 06:59 PM EST", how can AppleScript return "29/12/2018"?

In GREP, it could be something like: [0-9]{2}/[0-9]{2}/[0-9]{4}
Is it possible to harness GREP using AppleScript?


#2 2018-12-29 12:33:05 am

Shane Stanley
Member
From: Australia
Registered: 2002-12-07
Posts: 5586

Re: Finding a text pattern in a string

You can access regular expressions using AppleScriptObjC:

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- classes, constants, and enums used
property NSRegularExpressionSearch : a reference to 1024

set theText to "29/12/2018 06:59 PM EST"
set theText to current application's NSString's stringWithString:theText
set theRange to theText's rangeOfString:"[0-9]{2}/[0-9]{2}/[0-9]{4}" options:NSRegularExpressionSearch
set dateString to (theText's substringWithRange:theRange) as text
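If the pattern isn't found, rangeOfString:options: returns a zero-length range, and asking for a substring at that range is likely to raise an error. A minimal sketch of the same approach with a guard follows; the handler name firstMatch and its parameter names are made up for illustration and are not part of the original script.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: returns the first regex match in theText,
-- or missing value when the pattern doesn't occur.
on firstMatch(thePattern, theText)
	set theString to current application's NSString's stringWithString:theText
	set theRange to theString's rangeOfString:thePattern options:(current application's NSRegularExpressionSearch)
	-- A zero-length range means nothing was found.
	if (theRange's |length|) = 0 then return missing value
	return (theString's substringWithRange:theRange) as text
end firstMatch

firstMatch("[0-9]{2}/[0-9]{2}/[0-9]{4}", "29/12/2018 06:59 PM EST") --> "29/12/2018"
```

Returning missing value rather than throwing is just one design choice; an explicit error, as in the data detector script, would work equally well.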

You can also search for dates, like this:

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- classes, constants, and enums used
property NSTextCheckingTypeDate : a reference to 8

set theText to "29/12/2018 06:59 PM EST"
set theText to current application's NSString's stringWithString:theText
set theDetector to current application's NSDataDetector's dataDetectorWithTypes:(NSTextCheckingTypeDate) |error|:(missing value)
-- find matches in string; returns an array of NSTextCheckingResult objects
set theMatches to theDetector's matchesInString:theText options:0 range:{0, theText's |length|()}
if theMatches's |count|() = 0 then error "No date found"
-- get the date property of the NSTextCheckingResults
return (theMatches's valueForKey:"date") as list

(although I see the "EST" throws out the value in the second case; I presume you mean AEST.)

Last edited by Shane Stanley (2018-12-29 12:34:40 am)


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com


#3 2018-12-29 08:44:44 am

Marc Anthony
Member
From: Dallas, TX
Registered: 2006-04-27
Posts: 855

Re: Finding a text pattern in a string

Hi. You can also use a shell script. This will place all instances into a list.

Applescript:

set theText to "29/12/2018 06:59 PM EST 1/1/19 12:01 AM EST"'s quoted form
(do shell script "echo " & theText & " | egrep -o '\\d+/\\d+/\\d{2,4}' ")'s paragraphs
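One thing to watch with the shell route: grep exits with a non-zero status when nothing matches, and do shell script reports that as an error. Here is a sketch of a more defensive variant, assuming you'd rather get an empty list in that case. It uses grep -E (equivalent to egrep) and the POSIX [0-9] class in place of \d, which not every grep understands.

```applescript
set theText to "29/12/2018 06:59 PM EST 1/1/19 12:01 AM EST"
try
	-- grep -o prints each match on its own line; paragraphs turns the lines into a list.
	set theDates to paragraphs of (do shell script "echo " & quoted form of theText & " | grep -Eo '[0-9]+/[0-9]+/[0-9]{2,4}'")
on error
	-- No match (or any other shell failure): fall back to an empty list.
	set theDates to {}
end try
theDates --> {"29/12/2018", "1/1/19"}
```

Note that the try block swallows every shell error, not only the "no match" case, so it may be too forgiving for some uses.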


#4 2018-12-29 04:16:52 pm

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 4823

Re: Finding a text pattern in a string

And if you want valid dates only: :)

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Test string. November only has 30 days. 2022 won't be a leap year.
set theText to "31/12/2018 06:59 PM EST, 1/1/19 12:01 AM EST
31/11/2018
(29/02/2000-29/2/2022)"


set theText to current application's class "NSString"'s stringWithString:theText

-- Regex for valid (d)d/(m)m/(yy)yy short dates in years 2000-2099.
-- set regexPattern to "\\b(?:31(?!/(?:0?[2469]|11))|(?:29|30)(?!/0?2/)|29(?=/0?2/(?:20)?(?:[02468][048]|[13579][26])\\b)|2[0-8]|[01]?[1-9]|10)/" & "(?:0?[1-9]|1[0-2])/" & "(?:20)?\\d{2}\\b"
set regexPattern to "\\b(?:29/0?2/(?:20)?(?:[02468][048]|[13579][26])|(?:(?:29|30)/(?:0?[13-9]|1[0-2])|31/(?:0?[13578]|1[02])|(?:2[0-8]|[01]?[1-9]|10)/(?:0?[1-9]|1[0-2]))/(?:20)?\\d{2})\\b"
set shortDateRegex to current application's class "NSRegularExpression"'s regularExpressionWithPattern:(regexPattern) options:(0) |error|:(missing value)
set matchRanges to (shortDateRegex's matchesInString:(theText) options:(0) range:({0, theText's |length|()}))'s valueForKey:("range")

set dateStrings to current application's class "NSMutableArray"'s new()
repeat with thisRange in matchRanges
   tell dateStrings to addObject:(theText's substringWithRange:(thisRange))
end repeat

return dateStrings as list --> {"31/12/2018", "1/1/19", "29/02/2000"}

Edit: Regex pattern modified to eliminate lookaheads and negative matching, hopefully thereby improving efficiency with more direct paths through the code.

Last edited by Nigel Garvey (2018-12-30 03:58:27 am)


NG


#5 2018-12-29 07:14:48 pm

davidhmorgan
Member
From: Sydney
Registered: 2014-08-20
Posts: 65

Re: Finding a text pattern in a string

What an abundance of Christmas presents for me! As is often the case with shell scripts and AppleScriptObjC, I've been given some great code which I scarcely understand but definitely can put to good use. Thanks guys!


#6 2018-12-31 03:54:57 pm

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 4823

Re: Finding a text pattern in a string

This version matches any valid d/m/y date in years 1 to 9999 AD of the proleptic Gregorian calendar. It's just for the fun of devising a regex solution to the specific query posed. Obviously it can't tell if something's meant to be a date, only that it matches the criteria and is valid if it is a date.

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Years represented by just one or two digits may be either absolute (except for 00 or 0) or abbreviations within the script's presumed period of use (between 2000 and 2099).
set testText to "407931/12/2018X, 31/12/2018, 1/1/19, 31/11/2018 (only 30 days in November), 29/02/2200 (2200 not a leap year), (29/02/1952-29/2/9996), 4/3/1, 17/4/541, 01/01/00, 25/12/0000 (no year 0)."
dmyDatesFromText(testText) -- {"31/12/2018", "1/1/19", "29/02/1952", "29/2/9996", "4/3/1", "17/4/541", "01/01/00"}

on dmyDatesFromText(theText)
   set theText to current application's class "NSString"'s stringWithString:theText
   
   -- Regex for valid (d)d/(m)m/(y)(y)(y)y short dates in years 1-9999 AD of the proleptic Gregorian calendar.
   set regexPattern to "(?<![^\\s(–-])(?:(?:(?:0?[1-9]|1\\d|2[0-8])/(?:0?[1-9]|1[0-2])|(?:29|30)/(?:0?[13-9]|1[0-2])|31/(?:0?[13578]|1[02]))/\\d{0,3}(?:[1-9]|(?<!/0{2,3})0)|29/0?2/\\d{0,2}(?:[13579][26]|[02468]?(?:[48]|(?<!(?:[13579][048]?|/[02468]?[26])0)0)))\\b(?!/)"
   set shortDateRegex to current application's class "NSRegularExpression"'s regularExpressionWithPattern:(regexPattern) options:(0) |error|:(missing value)
   
   set matchRanges to (shortDateRegex's matchesInString:(theText) options:(0) range:({0, theText's |length|()}))'s valueForKey:("range")
   
   set dateStrings to current application's class "NSMutableArray"'s new()
   repeat with thisRange in matchRanges
       tell dateStrings to addObject:(theText's substringWithRange:(thisRange))
   end repeat
   
   return dateStrings as list
end dmyDatesFromText

The regex pattern divides broadly into any date other than 29th February OR 29th February:

"(?<![^\\s(–-])(?:(?:(?:0?[1-9]|1\\d|2[0-8])/(?:0?[1-9]|1[0-2])|(?:29|30)/(?:0?[13-9]|1[0-2])|31/(?:0?[13578]|1[02]))/\\d{0,3}(?:[1-9]|(?<!/0{2,3})0)|29/0?2/\\d{0,2}(?:[13579][26]|[02468]?(?:[48]|(?<!(?:[13579][048]?|/[02468]?[26])0)0)))\\b(?!/)"


Leap year numbers end either with an odd digit followed by 2 or 6 or with an even digit followed by 4, 8, or 0; except that if they end with two 0s, these mustn't be preceded by an odd digit, by an odd digit and (0, 4, or 8), or by an even digit and (2 or 6). :)
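The digit rules above are a text-only way of expressing the usual Gregorian leap-year test: divisible by 4, but not by 100 unless also by 400. As an arithmetic cross-check, here is a short sketch in plain AppleScript; the handler name isLeapYear is hypothetical and isn't used anywhere in the scripts above.

```applescript
-- Hypothetical helper: the Gregorian rule that the regex encodes digit by digit.
on isLeapYear(y)
	return (((y mod 4) = 0) and ((y mod 100) is not 0)) or ((y mod 400) = 0)
end isLeapYear

{isLeapYear(2000), isLeapYear(2200), isLeapYear(1952), isLeapYear(2022)}
--> {true, false, true, false}
```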

Other dates are matched if (they begin with 1-28 and are followed by any month number OR they begin with 29 or 30 and are followed by any month number except 2 OR they begin with 31 and are followed by month number 1, 3, 5, 7, 8, 10, or 12) AND these are followed by a valid year number:

(?:(?:0?[1-9]|1\\d|2[0-8])/(?:0?[1-9]|1[0-2])|(?:29|30)/(?:0?[13-9]|1[0-2])|31/(?:0?[13578]|1[02]))/\\d{0,3}(?:[1-9]|(?<!/0{2,3})0)
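The three day/month alternatives in that part of the pattern can be written out as ordinary comparisons. This is only an illustrative sketch assuming integer inputs; the handler name isOrdinaryDate is made up here, and, like that part of the regex, it deliberately rejects 29th February, which the leap-day section handles.

```applescript
-- Hypothetical helper mirroring the three regex branches for dates other than 29th February.
on isOrdinaryDate(d, m)
	if m < 1 or m > 12 or d < 1 then return false
	if d <= 28 then return true -- days 1-28 occur in every month
	if d <= 30 then return (m is not 2) -- 29 and 30: any month except February
	if d = 31 then return (m is in {1, 3, 5, 7, 8, 10, 12}) -- 31: only the seven long months
	return false
end isOrdinaryDate

{isOrdinaryDate(31, 12), isOrdinaryDate(31, 11), isOrdinaryDate(30, 2), isOrdinaryDate(28, 2)}
--> {true, false, false, true}
```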


Edits: Regex partially optimised by putting the more likely occurrences first. Leap year bug fixed. Explanation rewritten to match. Opening "\\b" replaced with a look-behind excluding anything which isn't specifically white space, an opening parenthesis, an en dash, or a hyphen-minus. Trailing "\\b" qualified with a look-ahead excluding another slash separator.

An equivalent pattern for m/d/y dates would be:

"(?<![^\\s(–-])(?:(?:(?:0?[13-9]|1[0-2])/(?:0?[1-9]|[12]\\d|30)|0?2/(?:0?[1-9]|1\\d|2[0-8])|(?:0?[13578]|1[02])/31)/\\d{0,3}(?:[1-9]|(?<!/0{2,3})0)|0?2/29/\\d{0,2}(?:[13579][26]|[02468]?(?:[48]|(?<!(?:[13579][048]?|/[02468]?[26])0)0)))\\b(?!/)"


And for y/m/d:

"(?<![^\\s(–-])(?:\\d{0,3}(?:[1-9]|(?<!\\b0{2,3})0)/(?:(?:0?[13-9]|1[0-2])/(?:0?[1-9]|[12]\\d|30)|0?2/(?:0?[1-9]|1\\d|2[0-8])|(?:0?[13578]|1[02])/31)|\\d{0,2}(?:[13579][26]|[02468]?(?:[48]|(?<!(?:[13579][048]?|\\b[02468]?[26])0)0))/0?2/29)\\b(?!/)"


Although the likelihood of any particular date occurring is exactly the same in all three cases, the optimisation logic in the m/d/y and y/m/d patterns is different from that in the d/m/y one. When the day comes before the month, it makes sense to check first if it's one of the 28 days which occur in every month; then, failing that, if it's one of the 2 days which occur in every month except (normally) February; and failing these, the 1 day which occurs in only seven of the months. When the month comes before the day, it makes more sense to check first if it's one of the eleven months which contain at least 30 days, after which there remain 28 chances that the date is one of those in February and 7 that it's the last day of a month with 31 days. Only when none of the above produces a match is the leap day section tried.
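To put numbers on that ordering, the following sketch counts how many of the 365 non-leap-day dates each alternative covers, in the order the patterns try them. The variable names are only for illustration.

```applescript
-- d/m/y: days 1-28 in any month; 29-30 in the eleven months other than February; 31 in the seven long months.
set dmyBranches to {28 * 12, 2 * 11, 1 * 7}
-- m/d/y and y/m/d: days 1-30 of the eleven non-February months; days 1-28 of February; the 31st of the seven long months.
set mdyBranches to {11 * 30, 28, 7}

{dmyBranches, mdyBranches} --> {{336, 22, 7}, {330, 28, 7}}
```

Either way the first alternative accounts for over 90% of the possible dates, and both lists add up to 365.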

Last edited by Nigel Garvey (2019-01-04 07:53:13 am)


NG
