jshort February 2016

Using sed to get the text between two key words (but not the key words themselves)

So I have found this sed expression for getting the text between two keywords, exclusive of the keywords:

cat example.txt | sed '/^KEYWORD1/,/^KEYWORD2/!d; //d'

where example.txt:

do
not
care
KEYWORD1
I
want
this
KEYWORD2
do
not
care

output:

I
want
this

However, I want to understand exactly what is going on in this expression. My understanding is that with a 'pattern range' (correct me if this is improper terminology), a bool is set when you hit the first match, and the command(s) following the pattern range are only executed while that bool is true.

Then there is the //d, where // is supposed to mean the last expression/regex that was matched. So is it correct that in this case, with a pattern range, the logic is as follows:

  • loop
  • Find /^KEYWORD1/, set bool to true, proceed with the !d command, which does not delete this line; then, since the last regex was /^KEYWORD1/, //d is effectively /^KEYWORD1/d, which deletes this line
  • bool is true, so it proceeds to not delete the next 3 lines, and /^KEYWORD1/ is not found on those lines, so nothing is deleted
  • Find /^KEYWORD2/, execute !d and then /^KEYWORD2/d, since this was the last regex used

So at this point I'm not sure how the lines before and after are not printed since it doesn't execute the commands (the !d) unless the pattern range flag is set to true.

Or does sed at least look at the command for every line and since the first command is an inverse delete, it somehow changes the logic to delete all other lines where the pattern range bool is false?

Any clarification on how this sed expression works will be appreciated. I've read this great

Answers


Benjamin W. February 2016

Your misunderstanding is this: /address/!d doesn't mean "if we match address, don't delete the line"; the ! is the negation of the address, i.e., "if we don't match address, then do delete the line."
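To see the negation on its own, here is a minimal sketch. The three-line input and the function name negation_demo are made up for illustration and are not part of the question:

```shell
# Hypothetical three-line input, not the asker's file.
# /^keep/!d deletes every line that does NOT start with "keep",
# so only the matching lines survive.
negation_demo() {
    printf 'keep 1\ndrop\nkeep 2\n' | sed '/^keep/!d'
}

negation_demo
```

This should print only the two "keep" lines, which is the opposite of reading !d as "don't delete".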

So the one-liner (better written without cat, by the way)

sed '/^KEYWORD1/,/^KEYWORD2/!d; //d' example.txt

does this:

  • /^KEYWORD1/,/^KEYWORD2/!d: for all the lines outside the range /^KEYWORD1/,/^KEYWORD2/, i.e.,

    do
    not
    care
    do
    not
    care
    

    delete them. d discards the current line and immediately starts the next cycle, so the rest of the script is skipped for these lines. This leaves us with

    KEYWORD1
    I
    want
    this
    KEYWORD2
    

    where we don't want to print KEYWORD1 and KEYWORD2.

  • For these lines, we fall through to //d. The empty regex // reuses the last regex that sed applied, so //d means "delete the line if it matches the most recently used regex".

    On the KEYWORD1 line, the last regex tested was /^KEYWORD1/ (to start the range), and it matched, so the line is deleted. On the next three lines, the last regex tested was /^KEYWORD2/ (to check whether the range ends); it doesn't match, so nothing is deleted. On the KEYWORD2 line, that same regex does match, so the line is deleted, leaving us with the lines between the two patterns.
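The two steps above can be checked by running them separately. The sketch below recreates the example.txt contents from the question in a temporary file. The function names stage1, stage2, and explicit are just labels for this illustration, and the explicit variant is an alternative spelling I'm adding for comparison, not something from the original one-liner:

```shell
# Recreate the example.txt from the question in a temporary file.
input=$(mktemp)
printf '%s\n' do not care KEYWORD1 I want this KEYWORD2 do not care > "$input"

# Stage 1: only the negated range. Lines outside the range are deleted,
# but both keyword lines are still printed.
stage1() { sed '/^KEYWORD1/,/^KEYWORD2/!d' "$input"; }

# Stage 2: the full one-liner. //d additionally removes the keyword lines.
stage2() { sed '/^KEYWORD1/,/^KEYWORD2/!d; //d' "$input"; }

# An explicit spelling that avoids the empty regex: -n suppresses
# automatic printing, and p prints whatever survives the two deletes.
explicit() {
    sed -n '/^KEYWORD1/,/^KEYWORD2/{
        /^KEYWORD1/d
        /^KEYWORD2/d
        p
    }' "$input"
}

stage1
echo ---
stage2
echo ---
explicit
```

Stage 1 should print the five lines from KEYWORD1 through KEYWORD2, while stage 2 and the explicit form should both print only I, want, this. The explicit form is longer but does not depend on knowing which regex sed used last.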
