Python: Need to get unique errors from log file

What I have so far:

def unique_ips():
    f = open('logfile','r')
    ips = set()
    for line in f:
        ip = line.split()[0]
        print ip
        for date in ip:
            logdate = line.split()[3]
            print "\t", logdate
            for entry in logdate:
                info = line.split()[5:11]
                print "\t\t", info
        ips.add(ip)
unique_ips()

The part I am having trouble with is:

for entry in logdate:
    info = line.split()[5:11]
    print "\t\t", info
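A quick way to see where the repeated lines come from: iterating over a string in Python yields one character at a time, so for date in ip runs once per character of the IP, and for entry in logdate runs once per character of the timestamp. The Python 3 sketch below reuses one sample line from the log and keeps the question's loop shape, but counts iterations instead of printing.

```python
# One sample line from the log in the question (referer/agent trimmed).
line = ('199.21.99.83 - - [30/Jun/2013:07:16:13 -0500] '
        '"GET /searchme/index.php?f=being_fruitful HTTP/1.1" 200 7526 "-"')

ip = line.split()[0]       # '199.21.99.83' (12 characters)
logdate = line.split()[3]  # '[30/Jun/2013:07:16:13' (21 characters)

# Same nesting as the question's code, counting instead of printing.
date_prints = 0
info_prints = 0
for date in ip:            # one pass per character of the IP string
    date_prints += 1
    for entry in logdate:  # one pass per character of the timestamp string
        info_prints += 1

print(date_prints, info_prints)
```

This prints 12 252: for a single log line, the timestamp would be printed 12 times and the request fields 12 × 21 = 252 times.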

I have a log file that I need to sort first by IP, then by time, then by error.

The output should look like:

199.21.99.83
        [30/Jun/2013:07:18:30
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']

But instead I'm getting:

199.21.99.83
        [30/Jun/2013:07:18:30
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                 ...

I'm sure I'm running into some sort of syntax issue, but I'd appreciate the help!

The log file looks like:

99.21.99.83 - - [30/Jun/2013:07:15:50 -0500] "GET /lenny/index.php?f=13 HTTP/1.1" 200 11244 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:16:13 -0500] "GET /searchme/index.php?f=being_fruitful HTTP/1.1" 200 7526 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:16:45 -0500] "GET /searchme/index.php?f=comparing_themselves HTTP/1.1" 200 7369 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
66.249.73.40 - - [30/Jun/2013:07:16:56 -0500] "GET /espanol/displayAncient.cgi?ref=isa%2054:3 HTTP/1.1" 500 167 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
199.21.99.83 - - [30/Jun/2013:07:17:00 -0500] "GET /searchme/index.php?f=tribulation HTTP/1.1" 200 7060 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:17:15 -0500] "GET /searchme/index.php?f=proud HTTP/1.1" 200 7080 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:17:34 -0500] "GET /searchme/index.php?f=soul HTTP/1.1" 200 7063 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:17:38 -0500] "GET /searchme/index.php?f=the_flesh_lusteth HTTP/1.1" 200 6951 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.c
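The sample lines appear to follow the Apache/NCSA "combined" log layout: host, identity, user, [time], "request", status, size, "referer", "user-agent". Assuming that holds, a regular expression can pull out named fields instead of relying on split() positions, which shift if, for example, the request field is not the usual three words. This is a minimal Python 3 sketch; LOG_RE and parse_line are hypothetical names, and the referer/agent pair is optional so that a truncated line like the last one above still parses (with those two fields set to None).

```python
import re
from datetime import datetime

# Assumed layout (Apache/NCSA "combined" format):
# host ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # "30/Jun/2013:07:16:13 -0500"; %b assumes English month abbreviations.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields

sample = ('199.21.99.83 - - [30/Jun/2013:07:16:13 -0500] '
          '"GET /searchme/index.php?f=being_fruitful HTTP/1.1" 200 7526 "-" '
          '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"')
rec = parse_line(sample)
print(rec["ip"], rec["status"], rec["request"])
```

With the status parsed as an int, selecting only error responses becomes a plain comparison such as rec["status"] >= 400.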

asked Feb 19 '26 07:02 by JasonOrtiz


2 Answers

The question was a little confusing because of the sample output, but I'm pretty sure that you want something like this:

def unique_ips():
    ips = {}
    # This loop collects every line under the IP it came from
    with open('logfile', 'r') as f:
        for line in f:
            ip = line.split()[0]
            ips.setdefault(ip, []).append(line)

    # This loop goes through the collected IPs in sorted order
    # and prints all entries for each IP
    for ip, errors in sorted(ips.iteritems()):
        print ip
        # Sorting the raw lines orders them by timestamp as long as
        # they all fall in the same month
        errors.sort()
        for e in errors:
            fields = e.split()
            logdate = fields[3]
            print "\t", logdate

            info = fields[5:11]
            print "\t\t", info

unique_ips()

This produces the following output from your sample file:

199.21.99.83
    [30/Jun/2013:07:16:13
        ['"GET', '/searchme/index.php?f=being_fruitful', 'HTTP/1.1"', '200', '7526', '"-"']
    [30/Jun/2013:07:16:45
        ['"GET', '/searchme/index.php?f=comparing_themselves', 'HTTP/1.1"', '200', '7369', '"-"']
    [30/Jun/2013:07:17:00
        ['"GET', '/searchme/index.php?f=tribulation', 'HTTP/1.1"', '200', '7060', '"-"']
    [30/Jun/2013:07:17:15
        ['"GET', '/searchme/index.php?f=proud', 'HTTP/1.1"', '200', '7080', '"-"']
    [30/Jun/2013:07:17:34
        ['"GET', '/searchme/index.php?f=soul', 'HTTP/1.1"', '200', '7063', '"-"']
    [30/Jun/2013:07:17:38
        ['"GET', '/searchme/index.php?f=the_flesh_lusteth', 'HTTP/1.1"', '200', '6951', '"-"']
66.249.73.40
    [30/Jun/2013:07:16:56
        ['"GET', '/espanol/displayAncient.cgi?ref=isa%2054:3', 'HTTP/1.1"', '500', '167', '"-"']
99.21.99.83
    [30/Jun/2013:07:15:50
        ['"GET', '/lenny/index.php?f=13', 'HTTP/1.1"', '200', '11244', '"-"']
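The code above is Python 2 (print statements, dict.iteritems), and errors.sort() compares whole lines as strings, so entries only come out in time order while they all fall in the same month: "01/Jul/2013" would sort before "30/Jun/2013". The following Python 3 sketch, not part of the original answer, parses the timestamp with datetime.strptime and sorts on that instead. The names group_by_ip and report are hypothetical, and it assumes every line shares one UTC offset (the -0500 field is ignored).

```python
from collections import defaultdict
from datetime import datetime

def group_by_ip(lines):
    """Map each IP to its (timestamp, request fields) pairs, oldest first."""
    by_ip = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 11:
            continue  # skip blank lines or lines too short for parts[5:11]
        ip = parts[0]
        # parts[3] looks like "[30/Jun/2013:07:16:13"; strip the "[".
        # %b assumes English month abbreviations; the offset in parts[4]
        # is ignored, i.e. all lines are assumed to share one offset.
        when = datetime.strptime(parts[3].lstrip("["), "%d/%b/%Y:%H:%M:%S")
        by_ip[ip].append((when, parts[5:11]))
    for entries in by_ip.values():
        entries.sort(key=lambda entry: entry[0])  # by datetime, not by string
    return by_ip

def report(lines):
    """Print IPs in string order, each followed by its requests in time order."""
    by_ip = group_by_ip(lines)
    for ip in sorted(by_ip):
        print(ip)
        for when, info in by_ip[ip]:
            print("\t" + when.strftime("[%d/%b/%Y:%H:%M:%S"))
            print("\t\t" + str(info))
```

Call it with the open log file, e.g. with open('logfile') as f: report(f). sorted(by_ip) orders IPs as strings, which matches the output above; sorted(by_ip, key=ipaddress.ip_address) would give numeric order instead.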

answered Feb 20 '26 21:02 by mr2ert


You have too many loops. You don't need the for entry in logdate loop; you're already looping over each line.

Remove the for entry in logdate loop and outdent the info assignment and its print statement.

(The comments already mentioned this.)
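A minimal Python 3 sketch of the question's function with that change applied. Note that for date in ip has the same problem as for entry in logdate (it iterates over the characters of the IP string), so it is dropped as well. Taking the lines as a parameter, instead of opening 'logfile' inside the function, is an assumption made so it can be called on an open file or on a list of strings.

```python
def unique_ips(lines):
    """Print each line's IP, timestamp and request fields once; return the set of IPs."""
    ips = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ip = fields[0]
        print(ip)
        # No "for date in ip" / "for entry in logdate" loops: each log line
        # has exactly one timestamp and one request, so print them directly.
        logdate = fields[3]
        print("\t" + logdate)
        info = fields[5:11]
        print("\t\t" + str(info))
        ips.add(ip)
    return ips
```

Usage: with open('logfile') as f: seen = unique_ips(f). Each log line now produces exactly one IP line, one timestamp line and one request line.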

answered Feb 20 '26 19:02 by AnnaRaven


