So I'm trying to write an awk script that finds the three IP addresses with the most hits, in descending order. I'm working from an Apache web log that looks like this:
192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138 www.yahoo.com "http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229 www.yahoo.com "http://www.search.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997 www.yahoo.com "-" "Mozilla/4.0 JJohnJoJJJJJoJJoJJJJJoJJohJJJJJJJJJJJJohnJohJoJoJJJoJJ
To do this I do:
$1 ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ {
    hitCounter[$1]++
    notIndexed=1
    for(i in ips) {
        if (i==$1) { notIndexed=0 }
    }
    if(notIndexed==1) {
        ips[indexx]=$1
        indexx++
    }
}
This rule detects an IP and then increments its hit count in the "hitCounter" array, which is indexed by IP. After that I check the list of IPs, "ips", to see whether the hit IP is already in there. If not, the IP is added to the "ips" array and the index count is increased by one. In theory, each index in "ips" should then correlate with the indices in "hitCounter". Finally I have...
END {
    indexxx=0
    for (i in hitCounter) {
        if (i>hitCounter[firstIP])
            firstIP=ips[indexxx]
        else if (i>hitCounter[secondIP])
            secondIP=ips[indexxx]
        else
            thirdIP=ips[indexxx]
        indexxx++
    }
}
Here I go through the IP hit counts in "hitCounter" and compare them to the hits of the three high-hit variables. If the current IP's hit count is greater than that of one of the three, I set that variable to the current IP.
This seems like it should work to me and I should get "192.168.72.177 192.168.198.92" as the output but instead I get "192.168.198.92 192.168.198.92".
Why?
EDIT: Sorry, this is how I print the final results which is placed right after the "hitCounter" foreach loop...
print "The most hits were from "firstIP" "secondIP" "thirdIP
Instead of searching for the IP each time to see if it exists in the list of IP addresses, I'd do this:
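$1 ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ {
    hitCounter[$1]++
}
END {
    for (ip in hitCounter) {
        # While firstIP is unset, hitCounter[firstIP] evaluates to 0,
        # so the first IP visited always takes first place.
        if (hitCounter[ip] > hitCounter[firstIP]) {
            # New leader: shift the previous first and second down one place.
            thirdIP = secondIP
            secondIP = firstIP
            firstIP = ip
        } else if (hitCounter[ip] > hitCounter[secondIP]) {
            thirdIP = secondIP
            secondIP = ip
        } else if (hitCounter[ip] > hitCounter[thirdIP]) {
            thirdIP = ip
        }
    }
    print "The most hits were from " firstIP " " secondIP " " thirdIP
}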
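If the three-variable bookkeeping feels fragile, a different technique is to let awk do only the counting and hand the ranking to sort. The following is a minimal sketch, not the approach above: top_ips is a hypothetical helper name, the sample lines are made up and reduced to the IP plus a placeholder field, and the regex uses [0-9]+ on the assumption that some awk builds do not support {1,3} intervals.

```shell
# top_ips (hypothetical name) reads an access log on stdin and prints
# up to three "count ip" lines, highest count first.
top_ips() {
  awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { hits[$1]++ }
       END { for (ip in hits) print hits[ip], ip }' |
    sort -k1,1nr -k2,2 |
    head -n 3
}

# Made-up sample input: only the first field (the client IP) matters.
sample=$(printf '%s\n' \
  '192.168.198.92 x' '192.168.198.92 x' \
  '192.168.72.177 x' '192.168.72.177 x' '192.168.72.177 x' \
  '10.0.0.5 x')

result=$(printf '%s\n' "$sample" | top_ips)
printf '%s\n' "$result"
```

On a real log this would be called as top_ips < access.log, where access.log stands in for whatever the log file is actually named. Ties in the count are broken by the IP text, which is an arbitrary choice made here only to keep the output stable.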
I think part of your confusion was in thinking that i was the value rather than the key in for (i in hitCounter).
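To make the key/value distinction concrete, here is a small sketch with made-up input (the array name hits and the sample IPs are only for illustration). The loop variable receives each key, and the count has to be looked up through the array:

```shell
# Made-up three-line input; the first field plays the role of the client IP.
out=$(printf '%s\n' '10.0.0.1 a' '10.0.0.2 b' '10.0.0.1 c' |
  awk '{ hits[$1]++ }
       END { for (k in hits) print "key=" k, "value=" hits[k] }' |
  sort)
printf '%s\n' "$out"
```

The output is piped through sort because awk does not guarantee any particular order for for (k in array). That is also why pairing the loop with a separately incremented index into another array cannot be relied on to line up.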