awk : if >=4 lines in a row begin with + or - don't print record

Tags: bash, shell, awk, gawk

I'm trying to use awk to read a file and print only the groups in which lines beginning with + or - do not occur 4 or more times in a row. gawk would be fine too. Each grouping is separated by a blank line.

Here's a sample from the file, these are the lines I do not want printed:

+Host is up.
+Not shown: 95 closed ports, 3 filtered ports
+PORT     STATE SERVICE   VERSION
+23/tcp   open  telnet
+9100/tcp open  jetdirect

-Host is up.
-Not shown: 99 closed ports
-PORT     STATE SERVICE VERSION
-5900/tcp open  vnc

A sample from the file which I do want printed (not 4 or more in a row):

-Not shown: 76 closed ports, 18 filtered ports
+Not shown: 93 closed ports
PORT    STATE SERVICE VERSION
+514/tcp open  shell

I'm learning awk at the moment from O'Reilly's sed & awk, but I'm a little stumped by this problem. Also, if anyone cares to, I wouldn't mind seeing non-awk ways of solving it with a shell script.

Thanks!

jonschipp asked Jan 27 '26 08:01



1 Answer

If I understood your question correctly, the input file holds its records as paragraphs separated by blank lines. The following script assumes that.

Contents of script.awk:

BEGIN {
        ## Separate records by one or more blank lines.
        RS = ""

        ## Each line will be one field. Both for input and output.
        FS = OFS = "\n"
}

## For every paragraph...
{
        ## Flag to check if I will print the paragraph to output.
        ## If 1, print.
        ## If 0, don't print.
        output = 1

        ## Count how many consecutive rows have '+' or '-' as first
        ## character.
        j = 0

        ## Traverse all rows.
        for ( i = 1; i <= NF; i++ ) {
                ## A bracket expression avoids the unescaped leading '+'
                ## of /+|-/, which not every awk accepts.
                if ( $i ~ /^[+-]/ ) {
                        ++j;
                }
                else {
                        j = 0
                }

                if ( j >= 4 ) {
                        output = 0
                        break
                }
        }

        if ( output == 1 ) {
                print $0 "\n"
        }
}
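
The script above hard-codes the threshold of four. As a minimal sketch, assuming you want that number to be adjustable, the same per-paragraph check can be wrapped in a shell function that passes the threshold in with -v. The function name filter_runs and the variable limit are hypothetical and not part of the original script. Unlike script.awk, this variant prints a blank line only between records, not after the last one.

```shell
# Hypothetical wrapper (not from the original answer).
# Usage: filter_runs LIMIT FILE
# Prints each blank-line-separated record of FILE unless it contains
# LIMIT or more consecutive lines that start with + or -.
filter_runs() {
    awk -v limit="$1" '
        BEGIN { RS = ""; FS = "\n" }    # paragraph mode, one line per field
        {
            run = 0; keep = 1
            for (i = 1; i <= NF; i++) {
                # Extend the run on a +/- line, reset it on any other line.
                run = ($i ~ /^[+-]/) ? run + 1 : 0
                if (run >= limit) { keep = 0; break }
            }
            # Separate printed records with a single blank line.
            if (keep) printf "%s%s", (printed++ ? "\n" : ""), $0 "\n"
        }
    ' "$2"
}
```

Calling filter_runs 4 infile should behave like the script above for the sample input, apart from the missing blank line after the last record.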

Assuming the following test input file, named infile:

+Host is up. 
+Not shown: 95 closed ports, 3 filtered ports
+PORT     STATE SERVICE   VERSION

+Host is up. 
+Not shown: 95 closed ports, 3 filtered ports
+PORT     STATE SERVICE   VERSION
+23/tcp   open  telnet
+9100/tcp open  jetdirect

-Host is up. 
-Not shown: 99 closed ports
-PORT     STATE SERVICE VERSION
-5900/tcp open  vnc 

-Not shown: 76 closed ports, 18 filtered ports
+Not shown: 93 closed ports
PORT    STATE SERVICE VERSION
+514/tcp open  shell

Run the script like this:

awk -f script.awk infile

It yields the following output. The first record is printed because it has only three consecutive +/- lines, and the last record because a line without a + or - prefix interrupts the run:

+Host is up.
+Not shown: 95 closed ports, 3 filtered ports
+PORT     STATE SERVICE   VERSION

-Not shown: 76 closed ports, 18 filtered ports
+Not shown: 93 closed ports
PORT    STATE SERVICE VERSION
+514/tcp open  shell
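
The question also asks about non-awk approaches. Here is a minimal sketch in plain POSIX shell, assuming the same blank-line-separated layout. The function name filter_paragraphs is hypothetical, and the threshold of 4 is taken from the question. It buffers one paragraph at a time and prints it only if the run counter never reached 4.

```shell
# Hypothetical helper (not from the original answer).
# Usage: filter_paragraphs FILE
# The ( ... ) body runs in a subshell so its variables do not leak.
filter_paragraphs() (
    para='' run=0 keep=1 sep=''
    nl='
'
    # Append blank lines so the last paragraph is flushed as well.
    { cat -- "$1"; printf '\n\n'; } |
    while IFS= read -r line; do
        if [ -z "$line" ]; then
            # Blank line: the paragraph is complete, print it if allowed.
            if [ -n "$para" ] && [ "$keep" -eq 1 ]; then
                printf '%s%s' "$sep" "$para"
                sep=$nl
            fi
            para='' run=0 keep=1
            continue
        fi
        para="$para$line$nl"
        case $line in
            [+-]*) run=$((run + 1)) ;;   # extend the current run
            *)     run=0 ;;              # any other line resets it
        esac
        if [ "$run" -ge 4 ]; then keep=0; fi
    done
)
```

After sourcing the function, run it as filter_paragraphs infile. For large files the awk version is likely to be faster, since the shell loop reads the input line by line.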
Birei answered Jan 29 '26 22:01
