I have a .txt file that contains request logs in the following format:
time_namelookup: 0,121668
time_connect: 0,460643
time_pretransfer: 0,460755
time_redirect: 0,000000
time_starttransfer: 0,811697
time_total: 0,811813
-------------
time_namelookup: 0,121665
time_connect: 0,460643
time_pretransfer: 0,460355
time_redirect: 0,000000
time_starttransfer: 0,813697
time_total: 0,811853
-------------
time_namelookup: 0,121558
time_connect: 0,463243
time_pretransfer: 0,460755
time_redirect: 0,000000
time_starttransfer: 0,911697
time_total: 0,811413
I want to create a list of values for each category, so I thought a regular expression could be relevant in this case.
import re
'''
In this example, I save only the 'time_namelookup' parameter.
The same logic is adapted for the other parameters.
'''
namelookup = []
with open('shaghai_if_config_test.txt', 'r') as fh:
    for line in fh.readlines():
        number_match = re.match('([+-]?([0-9]*[,])?[0-9]+)', line)
        namelookup_match = re.match('^time_namelookup:', line)
        if namelookup_match and number_match:
            num = number_match.group(0)
            namelookup.append(num)
            continue
I find this logic quite complicated, as I have to execute two regex matches. Moreover, number_match doesn't match the line, while ^time_namelookup: ([+-]?([0-9]*[,])?[0-9]+) works fine.
I am looking for experienced advice on the described procedure. Any advice is appreciated.
My guess is that you have designed a fine expression; we would maybe slightly modify it to:
(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)
re.findall:
import re
regex = r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)"
test_str = ("time_namelookup: 0,121668 \n"
"time_connect: 0,460643 \n"
"time_pretransfer: 0,460755 \n"
"time_redirect: 0,000000 \n"
"time_starttransfer: 0,811697 \n"
"time_total: 0,811813 \n")
print(re.findall(regex, test_str))
[('time_namelookup', '0,121668'), ('time_connect', '0,460643'), ('time_pretransfer', '0,460755'), ('time_redirect', '0,000000'), ('time_starttransfer', '0,811697'), ('time_total', '0,811813')]
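To get from these tuples to one list per category, which is what the question asks for, the pairs can be grouped in a dictionary. This is only a minimal sketch: it assumes the comma is a decimal separator and that the values should end up as floats, and the helper name `collect_timings` is made up for illustration.

```python
import re
from collections import defaultdict

# Same expression as above, split over two lines for readability.
TIMING_RE = re.compile(
    r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))"
    r"\s*:\s*([+-]?(?:\d*,)?\d+)"
)


def collect_timings(text):
    """Group every captured value by its category name (hypothetical helper)."""
    timings = defaultdict(list)
    for name, value in TIMING_RE.findall(text):
        # Assumption: "0,121668" uses a decimal comma, so swap it for a dot.
        timings[name].append(float(value.replace(",", ".")))
    return dict(timings)


sample = (
    "time_namelookup: 0,121668\n"
    "time_total: 0,811813\n"
    "-------------\n"
    "time_namelookup: 0,121665\n"
    "time_total: 0,811853\n"
)
result = collect_timings(sample)
print(result["time_namelookup"])  # [0.121668, 0.121665]
```

For the real log, the whole file can be passed in one go, e.g. `collect_timings(fh.read())` inside the `with open(...)` block from the question.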
re.finditer:
import re
regex = r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)"
test_str = ("time_namelookup: 0,121668 \n"
"time_connect: 0,460643 \n"
"time_pretransfer: 0,460755 \n"
"time_redirect: 0,000000 \n"
"time_starttransfer: 0,811697 \n"
"time_total: 0,811813 \n"
"-------------\n"
"time_namelookup: 0,121665 \n"
"time_connect: 0,460643 \n"
"time_pretransfer: 0,460355 \n"
"time_redirect: 0,000000 \n"
"time_starttransfer: 0,813697 \n"
"time_total: 0,811853 \n"
"-------------\n"
"time_namelookup: 0,121558 \n"
"time_connect: 0,463243 \n"
"time_pretransfer: 0,460755 \n"
"time_redirect: 0,000000 \n"
"time_starttransfer: 0,911697 \n"
"time_total: 0,811413 ")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
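As for why number_match in the question never matched: re.match only tries to match at the very start of the string, and every line starts with "time_", not with a digit. A small sketch of the difference, using one sample line from the log (the variable names here are only for illustration):

```python
import re

line = "time_namelookup: 0,121668"
number_pattern = r"[+-]?(?:[0-9]*,)?[0-9]+"

# re.match only tries position 0, where the line has "t", not a digit.
print(re.match(number_pattern, line))           # None

# re.search scans the whole line and finds the number.
print(re.search(number_pattern, line).group())  # 0,121668

# One pattern can do both jobs: check the label and capture the value.
m = re.match(r"time_namelookup:\s*([+-]?(?:[0-9]*,)?[0-9]+)", line)
print(m.group(1))                               # 0,121668
```

So the two separate matches can be replaced by a single pattern with a capture group, which is what the expression above does for all six labels at once.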
jex.im visualizes regular expressions:
