The -n option stops sed from sending lines to the output automatically. When you use it, you have to specify explicitly which lines to print, for example with the p command or the p flag of the s command. For instance:
set txt to "
xxxx xxxx
xxxx xxxx
xxxxx xx x Name Name <emailad567ss@gmail.com> xxxx
xxxabcx xxx 58 xxx
xxxx xxxx
xxxx xxxx (xxxx-xx)
xxxxx Name Name <emailaddress@gmail.com> xxxx
xxx xxxxxxxxxxx xxx x xxxxxxx 6 xxx
"
do shell script "sed -En '/abc/s//y/p' <<< " & quoted form of txt
-- Or:
do shell script "sed -En '/abc/{ s//y/ ; p ; }' <<< " & quoted form of txt
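To try the same -n/p behaviour outside AppleScript, here is a minimal shell sketch. It feeds the text through a printf pipe rather than the here-string used above, and it drops -E because the pattern is a plain literal. The function names print_changed and print_changed_block are only illustrative.

```shell
#!/bin/sh
# With -n, sed prints nothing unless told to. The p flag on the s command
# prints only the lines on which a substitution actually happened.
print_changed() {
  sed -n '/abc/s//y/p'    # the empty regex in s// reuses the last regex, /abc/
}

# The braced form: on lines matching /abc/, substitute, then print.
print_changed_block() {
  sed -n '/abc/{ s//y/; p; }'
}

printf '%s\n' 'xxxx xxxx' 'xxxabcx xxx 58 xxx' 'xxxx xxxx' | print_changed
# prints: xxxyx xxx 58 xxx
```

Lines without "abc" are never printed, which is exactly why leaving out the p makes the output empty.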
@ Nigel: I was totally unaware that it is faster, which is quite astonishing, and also of the way you can use Satimage.osax. Maybe I had an old version, but the last time I remember using it, I had to more or less create a regexp object, (almost) compile it, and then definitely drag the matches out of it… This I have to look into!
@ Yvan:
This is how I do it, assuming input is piped into sed.
-E is not needed since you don’t use extended regular expressions. Neither is -n, since you are going to make the replacements inline, not suppressing any input.
You want to substitute every occurrence of abc with y, so an s command is prepended to the search pattern, and a g flag (for global) is appended at the end of the command, after the replacement.
sed 's/abc/y/g'
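A small sketch of what the g flag changes (the function names are hypothetical): without it, s replaces only the first match on each line.

```shell
#!/bin/sh
# Without g: only the first "abc" on each line is replaced.
first_only() { sed 's/abc/y/'; }
# With g: every "abc" on the line is replaced.
every_match() { sed 's/abc/y/g'; }

printf 'abc-abc\n' | first_only     # prints: y-abc
printf 'abc-abc\n' | every_match    # prints: y-y
```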
You may find some interesting material if you google for 'mcmahon sed' and 'sed tutorial'.
Sed also closely resembles the venerable ed. If you know how to use ed, you already know a lot about how sed works.
I was about to reply to Yvan’s question, but I see you’ve already done it.
With regard to the speeds of the various methods, I’m only saying that the particular combination of Satimage and vanilla I posted is faster than Shane’s handler and my shell script at achieving the required result with the given sample text. It doesn’t necessarily mean that Satimage is always faster than ASObjC Runner or a shell script.
To be honest, I’m not really after speed until it really matters; I’m after programmer speed. And I dare say that ASObjC Runner seems to deliver the goods I want.
But it is always interesting to have a diversity of tools available, so one can pick the one that fits a particular bill. The way you use Satimage.osax was totally new to me. Thanks for the insight!
I think it’s possible to combine the two ‘change’ commands:
tell application "TextWrangler" to set txt to text of front text window
set txt to (change "<[^>]+>|[^0-9 ]" into "" in txt with regexp) -- Zap e-mail addresses and anything else which isn't a digit or a space. (Uses Satimage OSAX.)
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "+"
set txt to txt's words as text
set AppleScript's text item delimiters to astid
run script txt
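For comparison only, here is a rough shell sketch of the same idea (zap the addresses and everything that isn’t a digit or a space, then add up what is left). It is an alternative pipeline, not the Satimage approach above; it uses two separate substitutions so the addresses are always removed before their digits can survive, and sum_numbers is a hypothetical name.

```shell
#!/bin/sh
sum_numbers() {
  # 1) delete <...> e-mail addresses first, so digits inside them are ignored
  # 2) delete anything that is not a digit or a space
  # 3) add up the remaining whitespace-separated numbers with awk
  sed -e 's/<[^>]*>//g' -e 's/[^0-9 ]//g' |
    awk '{ for (i = 1; i <= NF; i++) total += $i } END { print total + 0 }'
}

printf '%s\n' \
  'xxxxx xx x Name Name <emailad567ss@gmail.com> xxxx' \
  'xxxabcx xxx 58 xxx' \
  'xxx xxxxxxxxxxx xxx x xxxxxxx 6 xxx' | sum_numbers    # prints: 64
```

As with the AppleScript version, digits separated only by letters (e.g. "a5b6") end up merged into one number, so the sketch assumes the numbers are delimited by spaces.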
There are roughly three stages: send stuff from AS to the app or scripting addition, process it, and send it back. Depending on the nature of the data, the sending back and forth takes a lot of the total time. And that has to be done similarly with an app like Runner or a scripting addition like Satimage or do shell script. The differences in performance of the regex libraries each case calls are probably trivial in such a scheme.
As I scratched my bald head for a while before discovering this, here is a basic tip.
When calling sed to process a multiline text, it’s useful to double-check that the line endings are linefeeds, not returns.
I found a tutorial on the net but its examples were designed for the Terminal.
So I spent time trying to adapt them for do shell script.
Alas, the ones dedicated to multiline documents failed.
Replacing the returns by linefeeds solved the problem.
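The reason is that sed splits its input on linefeeds only. A minimal sketch (count_lines is an illustrative name) shows that text using returns reaches sed as one single long line, while the same text converted with tr has the expected number of lines:

```shell
#!/bin/sh
# '$=' prints the line number of the last line, i.e. the line count.
count_lines() { sed -n '$='; }

printf 'one\rtwo\rthree' | count_lines                  # prints: 1
printf 'one\rtwo\rthree' | tr '\r' '\n' | count_lines   # prints: 3
```

Any command that relies on line addresses or on N therefore sees only one "line" until the returns are converted.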
Yvan KOENIG (VALLAURIS, France) Saturday 20 October 2012 21:54:41
Oops: the returns were introduced not by Script Editor but by Pages, in which I had stored the tutorial borrowed from the net.
If you suspect your text has returns instead of linefeeds, you can (if you don’t want to use AppleScript’s text item delimiters) insert an additional “sed” call to replace them before doing the line-by-line editing:
Or, as I prefer for surety:
“sed” has a peculiarity whereby when a linefeed is a replacement character, it must be rendered as a backslash (escaped with another backslash in the AppleScript shell script text) followed by an actual linefeed character. As a character to be replaced, it would be rendered as an (escaped) backslash followed by the letter ‘n’!
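Here is a sketch of both conventions in plain shell, where only one level of backslashes is needed (inside an AppleScript string each backslash must be doubled again). The function names are hypothetical; cr_to_lf is one way to do the return-to-linefeed conversion with sed alone.

```shell
#!/bin/sh
cr=$(printf '\r')   # a literal carriage return character

# Linefeed as the REPLACEMENT: a backslash followed by a real newline.
# Inside double quotes, \\ becomes a single backslash for sed.
cr_to_lf() {
  sed "s/$cr/\\
/g"
}

# Linefeed as the text to REPLACE: written \n. N appends the next input
# line to the pattern space, joined by an embedded linefeed.
join_pairs() {
  sed 'N;s/\n/ /'
}

printf 'one\rtwo' | cr_to_lf    # prints "one" and "two" on separate lines
printf 'a\nb\n' | join_pairs    # prints: a b
```

join_pairs is only meant for a two-line demonstration; with an odd number of lines, GNU and BSD sed treat the final N differently.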
The linefeeds are turned into returns when ‘do shell script’ returns (!) unless you use the ‘altering line endings’ parameter:
set theText to do shell script "tr '\\r' '\\n' <<<" & quotedSource without altering line endings
With this version, any linefeeds will be preserved on the return to AppleScript, as will any trailing line endings.
Edit: As for the upper-casing of diacritical characters with “tr”, I’ve no idea why it doesn’t work with ‘do shell script’, but it does work in the Terminal, both when the instruction’s typed in directly and when it’s scripted:
Unfortunately, the result returned to AppleScript is just a reference to the Terminal tab in which the script was run, so you’d have to parse the tab’s ‘contents’ or ‘history’ to get the transformation.
Now I know why you used the parameter “without altering line endings”.
My old brain assumed it was useful when the shell script was receiving the source text.
What’s funny is that I decided to use sed to solve a problem that doesn’t match what the tool is designed for.
I was trying to enhance a script working on the index.xml file that describes the contents of iWork documents.
I realised just two hours ago that:
(a) although the file is organized as numerous paragraphs with my customized settings, for ordinary users it’s made of only two paragraphs:
one of about 50 characters and a huge one which may contain thousands of characters, so that it can’t be handled by shell scripts or even by text item delimiters.
So I had to read it using:
read file chemin_IndexXml using delimiter ">"
This gives me a huge list from which extracting the wanted data is really simple.
I just have to use ASObjC Runner to unescape the returned values.
But it’s not a problem.
I will continue to try to learn sed.
Yvan KOENIG (VALLAURIS, France) Sunday 21 October 2012 21:06:06
In BBEdit or TextWrangler you can write “UNIX” text filters/scripts and run them from the Text Filters palette. For example, if you save this script in ~/Library/Application Support/TextWrangler and open the palette from the Window menu, it will appear there and can be run on the front document either by double-clicking or with a key command that you set. This script prints the sum of all numbers in the document at the end of the document.
#!/usr/bin/perl
use strict;
use warnings;
my $sum = 0;
while (<>) {    # read the document line by line
    print;      # print the line as it is
    # add every run of digits found in the line to $sum
    $sum += $1 while /(\d+)/g;
}
print "\n\n$sum\n";    # print the sum at the end of the document
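For comparison, here is a sketch of the same filter written with awk and wrapped in a shell function (sum_filter is a hypothetical name). Like the Perl, it echoes every line unchanged and then appends the total; it counts runs of digits only, so signs and decimal points are ignored.

```shell
#!/bin/sh
sum_filter() {
  awk '
    {
      print                          # echo the line unchanged
      line = $0
      while (match(line, /[0-9]+/)) {  # find each run of digits
        sum += substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END { printf "\n\n%d\n", sum }   # total after the last line
  '
}

printf 'a 58 b\nc 6\n' | sum_filter   # ends with a line containing 64
```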