Sed Context Grep

kel1 · May 10, 2013, 7:00pm

Hi,

Still going through this tutorial on sed. Can someone tell me whether this sed script can do what grep does? I added “drain” to the text and it doesn’t work. It’s supposed to print the line before and the line after each line that matches the pattern:


set t to quoted form of "The
rain
drain
in
Spain
stays
mainly
in
the
plain."
set cmd to "echo " & t & " | sed -n '
/ain/ !{
x
d
}
/ain/ {
x
p
x
p
n
p
a\\
---
x
}'"
do shell script cmd

→ "The
rain
drain
---
in
Spain
stays
---
stays
mainly
in
---
the
plain."

Shouldn’t there be:

rain
drain
in
---

for when “drain” is found? It appears not to find “drain”.

Edited: I think I’ve got it. After it finds “rain”, it does the printing, then the final x puts “drain” into the hold space. Then it goes back to the beginning of the script and reads the next line, bypassing the “drain” line. Need to look at this again.

Edited: yes, the n had already read in the “drain” line. So how do you make the drain line the next line to be tested when it has already been read and placed in the hold space?

Thanks,
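The behaviour kel describes can be reproduced in isolation. This is a small sketch, assuming a POSIX shell and GNU or BSD sed; the function name `n_demo` is made up for illustration. After `/ain/` matches "rain", `n` replaces the pattern space with "drain", and nothing tests `/ain/` against that line again, so the next cycle simply starts with "in".

```shell
#!/bin/sh
# After /ain/ matches "rain", n loads "drain" into the pattern space.
# The /ain/ address is not evaluated again in this cycle, so "drain" is
# never treated as a match; the next cycle starts with "in".
n_demo() {
    printf '%s\n' rain drain in |
        sed -n '/ain/{
p
n
s/^/read by n: /p
}'
}

n_demo
```

The output is "rain" followed by "read by n: drain": the second line was consumed as context, not examined as a candidate match.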

McUsrII · May 10, 2013, 10:55pm

Hello.

It isn’t easy to cope with every situation, but I think you can nest a pattern inside the pattern block!

By the way, I recently learned that the regexp algorithms used in awk and sed can be up to a million times faster than those used in Perl, Ruby, Python and Java (when the patterns get complicated enough)!

It is good that there are some benefits to using the shell tools from AppleScript, so we can get some speed somewhere: the overhead of do shell script doesn’t count for much once the files or patterns are big. Here is a link to the article.

By the way, I have made a utility you can use for writing to a Terminal window from a do shell script a tiny bit more secure: the commands are executed with the rights of the owner of the Terminal window, so all I have done is turn off signal handling while the command is active. There is a link to it here, with source. I should mention that DJ Bazzie Wazzie brought it to my notice, and that you can read all about it here.

kel1 · May 10, 2013, 11:22pm

Hi McUsrII,

I haven’t read the article yet, but I know what you’re saying about the cost of calling ‘do shell script’. Once you make the call, if the work it does is worth it, then it’s OK. It depends on the situation.

It seems that AppleScript programmers have been doing their job, except for the result window in Script Editor showing results.

So I want to do as many things as I can in one call, if I do have to use ‘do shell script’. Only after trying both can I know which is better: AppleScript or ‘do shell script’.

In the meantime, I find sed fun for some reason. It kind of reminds me of the olden days with assembly language. I used to have the old Commodore with the two processors.

But I’m digressing.

BTW, I timed a ‘do shell script’ call against plain AppleScript, running each a thousand times. Not calling Unix was much faster: the shell script took about 6 seconds and the AppleScript about half a second. How much the overhead of calling ‘do shell script’ really matters depends on the situation.

Edited: actually vanilla AppleScript was over 100 times faster, because it took about .05 secs, I think, and the shell script took almost 6 seconds.

McUsrII · May 11, 2013, 12:46am

Hello.

I have actually gotten hold of some fast 30 lines of C, which I could have implemented in AppleScript, but I provide the link here should anybody feel inclined.

I use sed because it is fast and not bloated like Perl: it has a really small command set, but one that must be practiced now and then. It is really simple, just a loop, a pattern space, and a hold space. But the terseness and the level of abstraction combined make it a little bit tricky. I never fiddle with it in AppleScript before I have gotten it right in a Terminal, because the edit-test cycle is shorter there, even when it is in a sed script file.
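To make the "loop, pattern space, hold space" model concrete, here is a minimal sketch (the function name and the input are hypothetical). `h` at the end of the script copies every line into the hold space, so when the next line matches, `x` brings the previous line back so it can be printed. Like kel's scripts, it prints a line twice when two matches are consecutive, and an empty line if the very first line matches.

```shell
#!/bin/sh
# h (last command) copies every line into the hold space. When a line
# matches, x swaps in the previous line to print it, then x swaps back.
prev_line_demo() {
    printf '%s\n' a rain b c Spain |
        sed -n '/ain/{
x
p
x
p
}
h'
}

prev_line_demo
```

This prints "a", "rain", "c", "Spain": each match preceded by the line before it.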

kel1 · May 11, 2013, 1:11am

McUsr, your grasp of what’s going on is amazing. Tricky is good and it makes it fun. There are many ways to do things.

I never liked C in school. I liked Pascal better back then; the wording was more like a higher-level language. Maybe that’s why I never became a programmer.

I don’t know why I started reminiscing, but one day, back in about 1981, I was on the internet trying to connect to a game. When I first saw text appearing on my screen, it was great. The site was giving directions, all in text. Compared to now, it’s unreal, man.

Edited: no, it was about 1984.

DJ_Bazzie_Wazzie · May 11, 2013, 6:37am

Are you sure about that? Pascal is at the same level as C, and performance depends on the compiler. Pascal is fading away simply because Apple stopped developing their OS in Pascal in the early ’90s, so it is no longer an OS language. Back when every bit and cycle counted, Pascal proved itself to be as good as C.

McUsrII · May 11, 2013, 9:25am

Hello.

There is, as a matter of fact, a good free Pascal compiler out there, GNU Pascal, which supports almost every Pascal dialect there was. I haven’t had the time to check it out, but I’d look for it if you want to have a go with Pascal.

The problem with Pascal as I see it is the lack of standardized libraries, and that it is a kind of theoretical language. It is nicer than C, but the “clean” approach makes it (or made it, when I used it) much more limiting.

Yesterday I wrote a C program that goto’ed into the middle of a case statement, installed a longjmp handler, and goto’ed out again; the longjmp was later used from within a signal handler, for a signal issued when the program was woken up again after it had suspended itself. It worked! Try that in Pascal! (You probably can, but...)

DJ_Bazzie_Wazzie · May 11, 2013, 11:21am

It is possible in Pascal, but as in many programming languages (including C), a goto is considered a sign of poor design and bad software development, and should be the programmer’s last resort. So whether goto is easier or harder in Pascal than in C, it is still considered equally poor design. However, some programming languages don’t have a block structure like C or Pascal and use goto (to a line number) instead, like BBC BASIC.

McUsrII · May 11, 2013, 11:25am

Hello.

Sometimes a goto is the solution, sometimes a longjmp, but they should be avoided if possible and used sparingly. They are OK when they simplify things overall, and I guess that is why they are included. They are not to be used as in BBC BASIC, but only when there isn’t any good structured solution.

Nigel_Garvey · May 11, 2013, 11:35am

Hi.

I’m not sure this is the best way to do it, but it seems to work with the given text and variations. Not quite the same as grep -C though. I’ve left it uncommented.

set t to quoted form of "The
rain
drain
in
Spain
stays
mainly
in
the
plain"
set cmd to "echo " & t & " | sed -n '
1 h
2,$ {
	/ain/ !{
		x
		/ain/ {
			G
			p
			$ !s/^.*$/---/p
			c\\'$'\\n''
		}
	}
	/ain/ {
		x
		/ain/ {
			G
			p
			s/^[^[:cntrl:]]*\\n//
			h
			$ !s/^.*$/---/p
			$ c\\'$'\\n''
		}
		/---/ !{
			s/^[^[:cntrl:]]*\\n//
			G
			h
		}
	}
}
$ {
	g
	/ain/ p
}'"
do shell script cmd

McUsrII · May 11, 2013, 12:25pm

kel1 · May 12, 2013, 7:18am

I finally worked out how to make it loop at the right place in the script, so it prints the lines before and after the matching line(s). Not sure if this is how grep works when there are consecutive matching lines. I need to check out how Nigel’s results look.


set t to quoted form of "The
rain
train
in
Spain
stays
mainly
in
the
stain
plain."
set cmd to "echo " & t & " | sed -n '
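# not a match: swap it into the hold space, then delete it and start the next cycle
/ain/ !{
x
d
}
/ain/ {
x
# print the line before
p
# swap the matching line back into the pattern space
x
# print the matching line
p
:loop
# replace the pattern space with the next line (-n: nothing is printed)
n
# print next line
p
# branch if line contains pattern
/ain/ b loop
# queue the divider; it is printed at the end of the cycle
a\\
---
# leave the last line read in the hold space for the next match
x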
}'"
do shell script cmd

Thanks a lot,
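Following McUsr’s advice to get a script right in a Terminal first, here is kel’s loop script run straight from a shell, as a sketch. Outside the AppleScript string, `a\\` becomes `a\`; the function name `context_loop` is hypothetical. Each group ends with a `---` line, and boundary lines such as "in" and "stays" appear in two groups, which is where this differs from `grep -C1`. When the last line matches, `n` finds no more input and sed quits before queuing a divider.

```shell
#!/bin/sh
# kel's loop script outside AppleScript, fed the same eleven lines.
context_loop() {
    printf '%s\n' The rain train in Spain stays mainly in the stain plain. |
        sed -n '
/ain/ !{
x
d
}
/ain/ {
x
p
x
p
:loop
n
p
/ain/ b loop
a\
---
x
}'
}

context_loop
```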

Nigel_Garvey · May 12, 2013, 8:19am

Your results are closer to grep’s than mine. Note that grep merges overlapping contexts.

set t to quoted form of "The
rain
train
in
Spain
stays
mainly
in
the
stain
plain."
set cmd to "echo " & t & " | grep -C1 'ain'"
do shell script cmd
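The merging can be checked with two small inputs (the function names are hypothetical). `grep -C1` prints each line at most once: when two context windows overlap or touch they are merged into one group, and a `--` line separates groups that are not adjacent. With the eleven-line text above, every line lies within one line of a match, so grep prints the whole text as a single group.

```shell
#!/bin/sh
# Windows for "rain" (lines 1-3) and "Spain" (lines 5-7) do not touch,
# so grep prints them as two groups separated by "--".
separate_groups() {
    printf '%s\n' a rain b c d Spain e | grep -C1 'ain'
}

# Windows for "rain" (lines 1-3) and "Spain" (lines 3-5) overlap, so
# they are merged and line "b" is printed only once.
overlapping_groups() {
    printf '%s\n' a rain b Spain c | grep -C1 'ain'
}

separate_groups
echo '=='
overlapping_groups
```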

McUsrII · May 12, 2013, 9:28am

Hello.

That is very admirable, kel!

Here is a technique for using sed that I have been playing with, to make it more versatile (or to use as little of it as possible).

The idea here is to use sed in combination with other tools: you keep as much precomposed input as you can in separate files, and take advantage of the fact that you can specify stdin as either “-”, /dev/fd/0 or /dev/stdin (one of these normally works).

OK, so I have an HTML preamble, called file1, containing everything up to and including the body tag; a “postamble”, file2, containing the closing body tag and the rest; a list of, say, Greek letters spelled out in Latin; and the sed script below to make an unordered list:

#!/usr/bin/sed -nf
1i\
<ul>
s_^.*_<li>&</li>_p
$a\
</ul>

Then, in order to generate an HTML file with the result, I may use a command line like this:

cat alfabet |./ul.sed |cat file1 - file2 >alfa.html
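A self-contained sketch of the same pipeline, assuming the list script wraps each line in `<li>`…`</li>` and the whole list in `<ul>`…`</ul>`. The preamble and postamble contents, the temporary directory and the function name `make_list_page` are made up for illustration. The `-` argument tells `cat` to read sed’s output from stdin between the two files.

```shell
#!/bin/sh
# Precomposed parts live in files; cat splices the generated list in
# where "-" (stdin) appears. File contents here are hypothetical.
make_list_page() {
    dir=$(mktemp -d) || return 1
    printf '<html><body>\n' > "$dir/file1"     # preamble
    printf '</body></html>\n' > "$dir/file2"   # postamble
    printf '%s\n' alfa beta gamma |
        sed -n '1i\
<ul>
s_^.*_<li>&</li>_p
$a\
</ul>' |
        cat "$dir/file1" - "$dir/file2"
    rm -r "$dir"
}

make_list_page
```

Keeping the fixed HTML in files means the sed script only has to handle the part that actually varies.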

DJ_Bazzie_Wazzie · May 12, 2013, 6:36pm

In AWK it could look like this, with overlap:


do shell script "awk 'BEGIN{first=1}/ain/{
	if (first==0)
		print \"---\"
	print head
 	print $0
	getline
	print $0
	first=0 };{head=$0}' <<< " & t

or without overlap:


do shell script "awk 'BEGIN{first=1}/ain/{
	if (first==0)
		print \"---\"
	print head
 	print $0
	getline
	print $0
	getline
	first=0 };{head=$0}' <<< " & t

McUsrII · May 12, 2013, 8:02pm

One day soon, I’ll write one in C that mimics Nigel Garvey’s version in sed, but with filenames and line numbers.

cgrep is a tool I hopefully won’t use too much, but it is really handy when you are tracking a variable or something; it is as useful as diff is for detecting changes between two versions.

It feels kind of weird at first to just look at fragments of files, but it became a boon once I got used to it: I don’t have to navigate, and I only look at the parts that interest me, there and then.

kel1 · May 12, 2013, 8:22pm

Hi,

Nigel,

Your script was actually what I was thinking it should be, with the text as is. But when I add a matching pattern to the first line:


set t to quoted form of "The brain
rain
drain
in
Spain
stays
mainly
in
the
plain"
set cmd to "echo " & t & " | sed -n '
1 h
2,$ {
	/ain/ !{
		x
		/ain/ {
			G
			p
			$ !s/^.*$/---/p
			c\\'$'\\n''
		}
	}
	/ain/ {
		x
		/ain/ {
			G
			p
			s/^[^[:cntrl:]]*\\n//
			h
			$ !s/^.*$/---/p
			$ c\\'$'\\n''
		}
		/---/ !{
			s/^[^[:cntrl:]]*\\n//
			G
			h
		}
	}
}
$ {
	g
	/ain/ p
}'"
do shell script cmd

The second and third matches don’t come out right.

Hi DJ Bazzie Wazzie,

I have a new library: awk. I thought I’d start with sed first because I’ve tried to learn it before, and the tutorials on awk seemed complicated. Thanks for the intro.

Well, my biorhythm’s intellectual line is going up (just passed zero), so now is the time to learn this stuff.

kel1 · May 12, 2013, 9:11pm

Just remembered something I read on the internet somewhere: that grep cannot print the same line more than once. That’s why the grep output is incomplete and maybe has those double dividers sometimes.

Edited: I don’t know what the person meant by this. Grep seems to print the same line. Need to read that post again, if I can find it. I probably read it wrong.

Nigel_Garvey · May 12, 2013, 9:21pm

I could have sworn I tested with that before I posted. Obviously not. Sorry. I’ll try and fix it, but not tonight.

DJ_Bazzie_Wazzie · May 13, 2013, 12:15pm

You’re welcome.

It’s good to start with regular expressions first and then move on to AWK. AWK, a predecessor of Perl, is much more versatile, which can make it feel like a leap in the dark. When you’re already used to C-style syntax and write in a more decompressed style, AWK scripts are easier to read than sed IMO. The latest version of Kernighan’s AWK (Brian Kernighan also co-wrote The C Programming Language with Dennis Ritchie) is considered the one true AWK, and it is the standard awk on FreeBSD (and also Mac OS X). I would also recommend the book associated with this version of AWK, ISBN 0-201-07981-X (which is cited at the bottom of the AWK man page), which tells you everything you should know about AWK.
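As an illustration of the C-like, decompressed style DJ mentions, here is a sketch of a `grep -C1` work-alike in awk. It is not from the thread: it remembers the number of the last printed line, so overlapping contexts are merged and a `--` line separates non-adjacent groups, as grep does. The function name `awk_context` is hypothetical; the pattern is passed with `-v`, so backslashes in it are processed as escape sequences.

```shell
#!/bin/sh
# Sketch of grep -C1 in awk (not from the thread).
# last  = number of the last line printed (0 = nothing printed yet)
# after = 1 when the next line should be printed as trailing context
# prev  = the previous input line, kept for leading context
awk_context() {
    awk -v pat="$1" '
    function emit(n, text) {
        if (last > 0 && n > last + 1)
            print "--"               # gap since the last group
        print text
        last = n
    }
    $0 ~ pat {
        if (NR > 1 && last < NR - 1)
            emit(NR - 1, prev)       # line before, unless already printed
        emit(NR, $0)                 # the matching line itself
        after = 1
        prev = $0
        next
    }
    after {
        emit(NR, $0)                 # line after the last match
        after = 0
    }
    { prev = $0 }
    '
}

printf '%s\n' a rain b c d Spain e | awk_context 'ain'
```

Unlike the scripts earlier in the thread, no line is printed twice, because `last` records what has already gone out.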
