Sed Context Grep

DJ_Bazzie_Wazzie · May 11, 2013, 11:21am

It is possible in Pascal, but as in many programming languages (including C), a goto is considered a sign of poor design and bad software development, and should be the programmer’s last choice. So whether goto is easier or worse in Pascal than in C, it is still considered equally poor design. However, some programming languages don’t have a block structure like C or Pascal and use goto (to a line number) instead, as in BBC Basic.

McUsrII · May 11, 2013, 11:25am

Hello.

Sometimes a goto is the solution, sometimes a longjmp, but they should be avoided if possible and used sparingly. They are OK when they simplify things overall, and I guess that is why goto is included. It is not to be used as in BBC Basic, but only when there isn’t any good structured solution.

Nigel_Garvey · May 11, 2013, 11:35am

Hi.

I’m not sure this is the best way to do it, but it seems to work with the given text and variations. Not quite the same as grep -C though. I’ve left it uncommented.

set t to quoted form of "The
rain
drain
in
Spain
stays
mainly
in
the
plain"
set cmd to "echo " & t & " | sed -n '
1 h
2,$ {
	/ain/ !{
		x
		/ain/ {
			G
			p
			$ !s/^.*$/---/p
			c\\'$'\\n''
		}
	}
	/ain/ {
		x
		/ain/ {
			G
			p
			s/^[^[:cntrl:]]*\\n//
			h
			$ !s/^.*$/---/p
			$ c\\'$'\\n''
		}
		/---/ !{
			s/^[^[:cntrl:]]*\\n//
			G
			h
		}
	}
}
$ {
	g
	/ain/ p
}'"
do shell script cmd

McUsrII · May 11, 2013, 12:25pm

kel1 · May 12, 2013, 7:18am

I finally worked out how to make it loop at the right place in the script, so it can print the lines before and after the pattern(s). Not sure if this is how grep works when there are consecutive matching lines. I need to check out how Nigel’s results look.


set t to quoted form of "The
rain
train
in
Spain
stays
mainly
in
the
stain
plain."
set cmd to "echo " & t & " | sed -n '
/ain/ !{
x
d
}
/ain/ {
x
# print the line before
p
# get a copy of the matching line
x
# print the matching line
p
:loop
# clear pattern space and get next line
n
# print next line
p
# branch if line contains pattern
/ain/ b loop
# print divider
a\\
---
x
}'"
do shell script cmd

Thanks a lot,

Nigel_Garvey · May 12, 2013, 8:19am

Your results are closer to grep’s than mine. grep merges overlapping contexts.

set t to quoted form of "The
rain
train
in
Spain
stays
mainly
in
the
stain
plain."
set cmd to "echo " & t & " | grep -C1 'ain'"
do shell script cmd
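
To see the merging in isolation, here is a small sketch with made-up input (the sample lines and the helper name show_context are assumptions for illustration, not part of the scripts above). grep prints a line only once even when it falls inside two contexts, and it writes its separator, two hyphens, only where lines were actually skipped between groups.

```shell
#!/bin/sh
# show_context is a hypothetical helper name; it only wraps grep -C1.
show_context() {
	grep -C1 "$1"
}

# Two matches whose contexts leave a gap (line "c"): a separator is printed.
printf 'a\nmatch1\nb\nc\nd\nmatch2\ne\n' | show_context match

# Two consecutive matches: the contexts overlap and each line appears once.
printf 'x\nmatch1\nmatch2\ny\n' | show_context match
```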

McUsrII · May 12, 2013, 9:28am

Hello.

That is very admirable, kel!

Here is a technique for using sed that I have been playing with, in order to make it more versatile (or to use as little of it as possible).

The idea here is to use sed in combination with other tools, keeping as much of the input as you can precomposed in separate files, and leveraging the fact that you can specify stdin as either “-”, /dev/fd/0 or /dev/stdin (one of these normally works).

OK, so I have a preamble of HTML, called file1, containing everything up to and including the body tag, and I have a “postamble”, containing the closing body tag and the rest of it, and a list of, say, Greek letters spelled out in Latin, and the sed script below to make an unordered list:

[code]#!/usr/bin/sed -nf
1i\

s_^.*_

&

[/code]

Then, in order to generate an HTML file with the result, I may use a command line like this:

cat alfabet |./ul.sed |cat file1 - file2 >alfa.html
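
The same pipeline can be sketched as a small shell function. This is only an illustration under stated assumptions: wrap_list is a made-up name, and the sed substitution that wraps each line in li tags is an assumed stand-in for ul.sed, not the script itself.

```shell
#!/bin/sh
# wrap_list is a hypothetical name. $1 is the preamble file, $2 the postamble
# file; the list items arrive on stdin.
wrap_list() {
	# The sed command is an assumed stand-in for ul.sed: it wraps every input
	# line in <li> tags. The "-" makes cat read that piped output between the
	# two files.
	sed 's_^.*$_<li>&</li>_' | cat "$1" - "$2"
}
```

With the file names from above it would be called as `wrap_list file1 file2 <alfabet >alfa.html`.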

DJ_Bazzie_Wazzie · May 12, 2013, 6:36pm

In AWK it can be done like this, with overlap:


do shell script "awk 'BEGIN{first=1}/ain/{
	if (first==0)
		print \"---\"
	print head
 	print $0
	getline
	print $0
	first=0 };{head=$0}' <<< " & t

or without overlap:


do shell script "awk 'BEGIN{first=1}/ain/{
	if (first==0)
		print \"---\"
	print head
 	print $0
	getline
	print $0
	getline
	first=0 };{head=$0}' <<< " & t
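
Both versions read the following line with getline, so that line is printed but never tested against the pattern itself. As a comparison, here is a sketch that avoids getline and remembers the number of the last printed line instead, which lets it merge overlapping contexts much as grep -C1 does. The name context1 and the variable names are assumptions for illustration, and the separator is grep's two hyphens rather than three.

```shell
#!/bin/sh
# context1 is a hypothetical name: one line of context around each match of $1.
context1() {
	awk -v pat="$1" '
		$0 ~ pat {
			# Lines were skipped since the last printed one: start a new group.
			if (last > 0 && NR - 1 > last + 1) print "--"
			# Print the line before, unless it has been printed already.
			if (NR > 1 && last < NR - 1) print prev
			print
			last = NR; after = 1; prev = $0
			next
		}
		# Trailing context for the most recent match.
		after > 0 { print; last = NR; after-- }
		{ prev = $0 }
	'
}

printf 'a\nmatch1\nb\nc\nd\nmatch2\ne\n' | context1 match
```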

McUsrII · May 12, 2013, 8:02pm

One day soon, I’ll write one in C that mimics Nigel Garvey’s version in sed, but with filenames and line numbers.

cgrep is a tool I hopefully don’t use too much, but it is really handy when you are tracking a variable or something, and as useful as diff for detecting changes between two versions.

It feels kind of weird at first to just look at fragments of files, but I experienced that as a boon once I got used to it: I don’t have to navigate, and I only look at the parts that interest me, there and then.

kel1 · May 12, 2013, 8:22pm

Hi,

Nigel,

Your script was actually what I was thinking it should be with the text as is. When I add a matching pattern in the first line:


set t to quoted form of "The brain
rain
drain
in
Spain
stays
mainly
in
the
plain"
set cmd to "echo " & t & " | sed -n '
1 h
2,$ {
	/ain/ !{
		x
		/ain/ {
			G
			p
			$ !s/^.*$/---/p
			c\\'$'\\n''
		}
	}
	/ain/ {
		x
		/ain/ {
			G
			p
			s/^[^[:cntrl:]]*\\n//
			h
			$ !s/^.*$/---/p
			$ c\\'$'\\n''
		}
		/---/ !{
			s/^[^[:cntrl:]]*\\n//
			G
			h
		}
	}
}
$ {
	g
	/ain/ p
}'"
do shell script cmd

The second and third matches don’t come out right.

Hi DJ Bazzie Wazzie,

I have a new library; awk. Thought I’d start with sed first because I’ve tried to learn it before, but the tutorials on awk seemed complicated. Thanks for the intro.

Well, my biorhythm’s intellectual line is going up (just passed zero), so now is the time to learn this stuff.

kel1 · May 12, 2013, 9:11pm

Just remembered what I read on the internet somewhere, that grep cannot print the same lines more than once. That’s why the grep output is incomplete and maybe has those double dividers sometimes.

Edited: I don’t know what the person meant by this. Grep seems to print the same line. I need to read that post again, if I can find it. I probably read it wrong.

Nigel_Garvey · May 12, 2013, 9:21pm

I could have sworn I tested with that before I posted. Obviously not. Sorry. I’ll try and fix it, but not tonight.

DJ_Bazzie_Wazzie · May 13, 2013, 12:15pm

You’re welcome.

It’s good to start with regular expressions first and then move on to AWK. AWK, the predecessor of Perl, is much more versatile, which can make it feel like a leap in the dark. When you’re already used to C-style syntax and write in a less compressed style, AWK scripts are easier to read than sed, IMO. The latest version by Kernighan (one of AWK’s authors and co-author of The C Programming Language) is considered the “one true AWK”, and it is the standard version on FreeBSD (and also Mac OS X). I would also recommend the book associated with this version of AWK, ISBN 0-201-07981-X (it is listed at the bottom of the AWK man page), which tells you everything you should know about AWK.

Nigel_Garvey · May 13, 2013, 12:37pm

OK. A slight rethink wherein the hold space acts similarly to a three-line FIFO stack, except that the lines are all retrieved at once and only the last two are put back before another is added. The stack is output complete if its penultimate line contains “ain”, with adjustments at the last line. It seems to work.

set t to quoted form of "The brain
rain
drain
in
Spain
stays
mainly
in
the
Staines
plain
again
Jane

again"

set cmd to "echo " & t & " | sed -En '
# Move the first line to the \"stack\" (hold space)
1 h
# With each of the following lines in turn:
1 !{
	# Append the line to the stack and retrieve the stack contents.
	H
	g
	# If the penultimate retrieved line contains \"ain\", print them all and, if not at the last line of text, output a separator.
	/ain[^[:cntrl:]]*\\n[^[:cntrl:]]*$/ {
		p
		$ !i\\'$'\\n''---
	}
	# If three lines were taken from the stack, lose the first one and push the other two back.
	/^[^[:cntrl:]]*\\n([^[:cntrl:]]*\\n[^[:cntrl:]]*)$/ {
		s//\\1/
		h
	}
}
# At the last line of the text, after the above processes, the pattern space contains either the first line (if there is only one) or the last two (the edited stack contents).
$ {
	# If the line contains \"ain\":
	/ain[^[:cntrl:]]*$/ {
		# If there is a line before it also containing \"ain\", output another separator.
		/ain[^[:cntrl:]]*\\n/ i\\'$'\\n''---
		# Print the pattern space contents.
		p
	}
}'"

do shell script cmd

Edit: Comments revised.
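
For anyone who finds the hold-space juggling hard to follow, the three-line window idea can also be sketched in awk. This is not a translation of the sed script: the divider placement is simplified to one divider between printed blocks, and the names window3, older, middle and newest are assumptions for illustration. Like the sed version, it prints each window in full, so lines repeat when consecutive lines match.

```shell
#!/bin/sh
# window3 is a hypothetical name: print a three-line window whenever its
# middle line matches the pattern given as $1.
window3() {
	awk -v pat="$1" '
		function emit(block) {
			# Divider between blocks, but not before the first one.
			if (groups++ > 0) print "---"
			print block
		}
		{
			# Slide the window down by one line.
			older = middle; middle = newest; newest = $0
			# At line 2 the window holds only two lines.
			if (NR == 2 && middle ~ pat) emit(middle "\n" newest)
			if (NR >= 3 && middle ~ pat) emit(older "\n" middle "\n" newest)
		}
		END {
			# The last line never becomes a middle line, so test it here.
			if (NR == 1 && newest ~ pat) emit(newest)
			if (NR >= 2 && newest ~ pat) emit(middle "\n" newest)
		}
	'
}

printf 'a\nmatch1\nb\nc\nmatch2\n' | window3 match
```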

McUsrII · May 13, 2013, 1:19pm

The lines aren’t numbered. :lol:

Just kidding. It’s just Brilliant!

McUsrII · May 13, 2013, 2:55pm

Actually, I figured I could write what I wanted faster in sed, and implement it in a shell script, rather than write it in C. (The sole intent is to have a tool to trace a variable/function call throughout a source code file.)

What I want is a short context, with the context broken whenever a new match appears, and I want line numbers.

Given the latest “poem” from the post above, the output looks like this:

[code] 1 The brain

 2	rain

 3	drain
 4	in

 5	Spain
 6	stays

 7	mainly
 8	in

 9	the
10	Staines

11	plain

12	again
13	Jane

14	
15	again"[/code]

I implemented it as a shell script, which should be easy to use from a do shell script. The regexp is given as a parameter, quoted with double quotes on the command line, and the input must be redirected into it.

[code]#!/bin/bash
if [ $# -ne 1 ] ; then
echo "Usage: cgrep "pattern" <input

Prints 1 line before and after a match,
and adds three dashes to split the
sequences. Linenumbers for a match is
added.
Cgrep breaks the context if the next
line is a match by itself.
" >/dev/tty
exit 2
fi
nl -b a |/usr/bin/sed -n ’
1 {
/‘“$1”‘n/! h
/’“$1”’/ {
p
a
—\

n
}

}
:op
1! {
/‘“$1”’/! {
h
n
}
/‘“$1”’/ {
x
/‘“$1”’/ {
g
p
}
/‘“$1”’/! {
p
g
p
}
n
/‘“$1”’/ {
i
—\

		b op
	}
	/'"$1"'/! {
	p
	a\

—\

}
}

}
'[/code]

McUsrII · May 13, 2013, 8:40pm

Some stuff I made from ideas acquired in this thread!

Here is one little tool to wrap something into HTML, by piping stuff into it. (It’s called html_wrap.)

#!/bin/bash
echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"'
echo ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
echo '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
echo '<head>'
echo ' <meta http-equiv="content-type" content="text/html; charset=utf-8" />'
echo ' <title>Wrapper</title>'
echo '</head>'
echo '<body>'
cat -
echo '</body>'
echo '</html>'

Below is a script that shows your shell scripts and the like with qlmanage (Quick Look). I have called it “g”.


#!/bin/bash
if [ $# -ne 1 ] ; then
	echo "Usage: g file. shows a file probably script with quicklook." >/dev/tty
	exit 2
fi
if [ ! -e "$1" ] ; then
	echo "g: Can't find $1" >/dev/tty
	exit 1
fi
a=$(basename "$1")
(
	echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"'
	echo ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
	echo '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
	echo '<head>'
	echo ' <meta http-equiv="content-type" content="text/html; charset=utf-8" />'
	echo ' <title>Wrapper</title>'
	echo '</head>'
	echo '<body>'
	echo '<pre><code>'
	# Escape < and > so they show up as text in the HTML.
	cat "$1" |sed -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
	echo '</code></pre>'
	echo '</body>'
	echo '</html>'
) >"/tmp/$a.html"
qlmanage -px "/tmp/$a.html" >/dev/null 2>&1 &

You need to chmod u+x both to use them, of course, and store them somewhere in your bin.

The idea is of course to use “g” from a do shell script, so that you can look at the scripts from Finder.

Edit

I have made a minor change, removing the path when generating the HTML file, so it works properly outside the folder it is placed in.

Here is a small AppleScript that calls it with the first item of the current Finder selection as argument.

tell application "Finder"
    set a to its selection as alias list
    if a is not {} then
        -- Only the first selected item is shown.
        set a to POSIX path of (first item of a)
        do shell script "g " & quoted form of a & " >/dev/null 2>&1 &"
    end if
end tell

By the way, I saved the applescript above as an applet, found an icon for it, and dragged it onto the toolbar, just great!

kel1 · May 14, 2013, 4:19am

Wow, everybody’s been busy! Yeah, nice poem.

I tried to do this script with just the lowercase commands for now. Finally got the loop in the right place. Had to hold the matching line in the buffer, in case the next line had a match also. Had to use insert (i) instead of append (a), because append only appends after all lines are printed within the curly braces.


set t to quoted form of "The again
rain
train
in
Spain
stays
mainly
in
the
brain
stain
plain."
set cmd to "echo " & t & " | sed -n '
/ain/ !{
x
d
}
/ain/ {
:loop
x
# print the line before
p
# get a copy of the matching line
x
# print the matching line
p
# hold this line in case next line contains pattern
h
# clear pattern space and get next line
n
# print next line
p
# branch if line contains pattern
# insert divider here
i\\
---
/ain/ b loop
x
}'"
do shell script cmd

This method is limited, though, to one line before and after a match. I’ll have to learn Nigel’s method in the following sections with the capitalized commands for more versatility.

McUsr,

How did you know I was planning on learning html? You must be a psychic. Actually, I don’t really know what is going on in your scripts yet. I’ve been walking around in a daze, thinking about where to put the loop. Now I can reread the posts.

Edited: oops, posted the wrong script.


set t to quoted form of "The brain
rain
drain
in
Spain
stays
mainly
in
the
Staines
plain
again
Jane

again"
set cmd to "echo " & t & " | sed -n '
/ain/ !{
x
d
}
/ain/ {
:loop
x
# print the line before
p
# get a copy of the matching line
g
# print the matching line
p
# clear pattern space and get next line
n
# print next line
p
# insert divider here
i\\
---
# branch if line contains pattern
/ain/ b loop
x
}'"
do shell script cmd

Thanks a lot everybody,
It’s been fun.

McUsrII · May 14, 2013, 4:33am

Hello kel.

I’ll just tell you that the shell script works from an automator service as well, with a shell script action that looks like this:

g "$1"

Looks like you’ve got it when it comes to emulating grep -C 1. The final exercise would be to place hyphens and colons correctly (I’m not up for that).
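
For reference, this is what those hyphens and colons look like. With -n, grep puts a colon after the line number of a matching line and a hyphen after the line number of a context line. The sample input and the helper name numbered_context are made up for illustration.

```shell
#!/bin/sh
# numbered_context is a hypothetical helper name; it only wraps grep -n -C1.
numbered_context() {
	grep -n -C1 "$1"
}

# Prints "1-The", "2:rain" and "3-in"; "Spain" is outside the context.
printf 'The\nrain\nin\nSpain\n' | numbered_context 'rain'
```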

McUsrII · May 14, 2013, 10:39am

Hello.

I have added substitution of ‘<’ and ‘>’ with &lt; and &gt; respectively in the “g” shell script in post #24, to make them turn up as they should, along with any following/previous text, in the formatted HTML.
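
The escaping step can be sketched on its own as a small filter. The name html_escape is an assumption for illustration, and the extra substitution for ‘&’ is an addition that the “g” script does not have; it is worth having if the files you view contain things like && or entities of their own.

```shell
#!/bin/sh
# html_escape is a hypothetical helper name; it filters stdin to stdout.
html_escape() {
	# "&" must be replaced first, or the entities produced by the other two
	# substitutions would be escaped a second time.
	sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

printf 'if (a < b && c > d)\n' | html_escape
```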

Come to think about it, you can add your own font styles here, should you be dissatisfied with the size of the font Quick Look gives for whatever file you want to read. (You’ll lose the syntax coloring of course; I’m thinking of .c, .m and .h files here specifically.)

The pre/code style that is standard for HTML works perfectly for me. I lose the syntax colouring, but what is the point of having syntax colouring if you can’t read what is written?