Check and Search and Replace with Regex

Hi,

I need to check if a filename contains a particular string.
The string is composed of:
Pipe | char
3 numbers
OTT string
Pipe | char

I have successfully done this with:

set filename to "Thailand |456OTT|.jpg"
try
	set checkString to do shell script "/bin/echo " & quoted form of filename & " | /usr/bin/grep -E '\\|[0-9]{3}OTT\\|'"
on error msg number errnum
	--display dialog msg
	set checkString to ""
end try

checkString

First, is all of the syntax correct?
And what if I need to remove this string? If the string is present, I also need to remove the space before it (if there is one).

Example:

Thailand |456OTT|.jpg —> Thailand.jpg
Los Angeles|456OTT|.jpg —> Los Angeles.jpg

Can some expert regex user help me?

Ame
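On the syntax question, it may help to see the check in plain shell. grep exits with status 1 when nothing matches, and `do shell script` reports any non-zero status as an error, which is why the `try` block above is needed. The following is only a sketch: `has_ott_tag` is a made-up helper name, and `printf` is used in place of `/bin/echo` purely as a portability habit.

```shell
#!/bin/sh
# has_ott_tag is a hypothetical helper name, not part of the original script.
# It succeeds (exit status 0) when the name contains |NNNOTT|,
# and fails (exit status 1) when it does not.
has_ott_tag() {
    # -q suppresses output; only the exit status matters here.
    printf '%s\n' "$1" | grep -q -E '\|[0-9]{3}OTT\|'
}

if has_ott_tag 'Thailand |456OTT|.jpg'; then
    echo "tag found"
else
    echo "no tag"
fi
```

In the AppleScript version, the "no tag" branch corresponds to the `on error` part of the `try` block.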

You can use sed: if the expression is present it removes it, otherwise it returns the original string unchanged.

set filename to "Thailand |456OTT|.jpg"
do shell script "sed -E 's/[ ]?\\|[0-9]{3}OTT\\|//' <<<" & quoted form of filename

Your expression is correct; I just added ‘[ ]?’ to remove the optional leading space.
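To see the effect outside AppleScript, the same substitution can be tried directly in a shell. This is a minimal sketch: `strip_ott_tag` is a hypothetical helper name, and the input is piped in with `printf` instead of the `<<<` here-string so that it also runs in shells without here-string support.

```shell
#!/bin/sh
# strip_ott_tag is a hypothetical helper name used only for illustration.
# It removes the first |NNNOTT| tag, plus one space directly before it if present.
strip_ott_tag() {
    printf '%s\n' "$1" | sed -E 's/[ ]?\|[0-9]{3}OTT\|//'
}

strip_ott_tag 'Thailand |456OTT|.jpg'      # prints Thailand.jpg
strip_ott_tag 'Los Angeles|456OTT|.jpg'    # prints Los Angeles.jpg
strip_ott_tag 'Paris.jpg'                  # no tag, prints Paris.jpg
```

Note that the space inside "Los Angeles" is left alone, because ‘[ ]?’ only matches a space that sits immediately before the opening pipe.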

If you’re using Mavericks, you should have at least one regex handler in one of your (ASObjC-based) libraries:

use framework "Foundation"

on replacePattern:thePattern inString:theString usingThis:theTemplate
	set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set theResult to theRegEx's stringByReplacingMatchesInString:theString options:0 |range|:{location:0, |length|:length of theString} withTemplate:theTemplate
	return theResult as text
end replacePattern:inString:usingThis:

Simple, fast…

Many thanks Shane!!!

:lol:

Not only that, it has Unicode support. You can specify a character by its Unicode code point, e.g. ‘\u000A’, and multi-byte characters are treated as single characters as well, of course with some performance cost. But let’s be realistic: you need a lot of data before a do shell script makes up for its relatively slow execution time.

Hello.

Huh! That was really quick to check off my todo list. :smiley:

Thank you Shane!

I’ll post a reference to the regex class here in case anyone else is interested in having a peek…
I think I’ll time the different solutions for the cases where speed is mandatory (and the texts are large).

It took me five minutes to get this working, and I’m not totally proficient with the capabilities of the handler yet, due to placeholders and such in the template.

Well, I saved Shane’s script in the Script Libraries folder as a script bundle with the name RegEx.scptd, and ticked the checkbox for ASObjC in the drawer. I did not add any further use clauses to it, and then I saved the script below, which works. :slight_smile:

use AppleScript version "2.3"
use scripting additions
use re : script "RegEx"
set mpat to "not a drill"
set tres to re's replacePattern:mpat inString:"This is not a drill" usingThis:"a test"
--> "This is a test"

Here is an example of using capture groups in the template. (Using regexes should be fairly easy; the full syntax can be found in the class reference for NSRegularExpression above.)

set mpat to "(This is not a)( drill)"
set tres to re's replacePattern:mpat inString:"This is not a drill" usingThis:"A$2 is good to have\\!"
--> "A drill is good to have!"

that long :stuck_out_tongue:

Yes, I haven’t used NSRegularExpression before, and it is implemented differently from how you do it directly with ICU.

By the way, I would have loved to hear you tell Linus Torvalds that you can learn C in one day. :stuck_out_tongue:

Comparatively speaking :wink:

Well this is with small text:

use theLib : script "<lib name>" 
use scripting additions

set filename to "Thailand |456OTT|.jpg"
log 1
repeat 10 times
	theLib's replacePattern:"[ ]?\\|[0-9]{3}OTT\\|" inString:filename usingThis:""
end repeat
log 2
repeat 10 times
	do shell script "sed -E 's/[ ]?\\|[0-9]{3}OTT\\|//' <<<" & quoted form of filename
end repeat
log 3
repeat 10 times
	set checkString to do shell script "/bin/echo " & quoted form of filename & " | /usr/bin/grep -E '[ ]?\\|[0-9]{3}OTT\\|'"
end repeat
log 4

And the log:

0000.001 (1)
0000.007 (2)
0000.076 (3)
0000.142 (4)

I would say to him that Learning C from Andrew S. Tanenbaum is harder than from K&R (joking of course).

Hehehe. I don’t have any bad C-books, I think.

Hahaha. But I once read an AppleScript tutorial where the author was infatuated with Julia Roberts. I bet there was a “Julia Roberts” in all the code examples. :smiley:

Some of us are much more discerning in who we put in the example scripts of our books…

Hello.

If I remember it right, then I think Julia Roberts was in every example, save one or two. :smiley: