
Python regular expression for HTTP Request header

I have a question about Python regular expressions, which I don't know well. I am working with HTTP request messages and parsing them with a regex. An HTTP GET message looks like this:

GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: 10.2.0.12
Connection: Keep-Alive

I want to extract the method, URI, User-Agent, and Host fields from the message. My regex for this is:

re.compile(r'^({0})\s+(\S+)\s+[^\n]*$\n.*^User-Agent:\s*(\S+)[^\n]*$\n.*^Host:\s*(\S+)[^\n]*$\n'.format('|'.join(methods)), re.MULTILINE|re.DOTALL)

But when the message arrives with the headers in a different order, for example:

GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive

I cannot match it, because the positions of Host and User-Agent have changed. So I need a more general regex that captures all of these fields regardless of the order in which the headers appear.

asked Jan 27 '26 10:01 by barp


1 Answer

Readability Counts (The Zen of Python)

Use findall() for each subexpression you want to find. This way your regex will be short, readable, and independent of the location of the subexpression.

Define a simple, readable regex:

>>> import re
>>> user = re.compile("User-Agent: (.*?)\n")

Test it with two different HTTP requests:

>>> s1='''GET / HTTP/1.0
    Host: 10.2.0.12
    User-Agent: Wget/1.12 (linux-gnu)
    Accept: */*
    Connection: Keep-Alive'''
>>> s2='''GET / HTTP/1.0
    User-Agent: Wget/1.12 (linux-gnu)
    Accept: */*
    Host: 10.2.0.12
    Connection: Keep-Alive'''
>>> user.findall(s1)
['Wget/1.12 (linux-gnu)']
>>> user.findall(s2)
['Wget/1.12 (linux-gnu)']
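The same idea can be extended to the other fields from the question. The following is only a sketch under a few assumptions: the names REQUEST_LINE, HEADER, and parse_request are hypothetical, the method is taken to be any run of uppercase letters rather than the question's methods list, and the patterns are one possible choice, not a full HTTP parser. It matches the request line once, collects every "Name: value" line into a dict, and then looks up Host and User-Agent by name, so header order no longer matters.

```python
import re

# Hypothetical patterns for illustration; not a complete HTTP grammar.
# Request line: METHOD, URI, HTTP version, tolerating indentation and a trailing \r.
REQUEST_LINE = re.compile(
    r"^[ \t]*([A-Z]+)[ \t]+(\S+)[ \t]+HTTP/\d\.\d[ \t\r]*$", re.MULTILINE
)
# One header per line: a name, a colon, then the value up to the end of the line.
HEADER = re.compile(r"^[ \t]*([A-Za-z0-9-]+):[ \t]*(.*?)[ \t\r]*$", re.MULTILINE)


def parse_request(message):
    """Return method, URI, Host and User-Agent regardless of header order."""
    request_line = REQUEST_LINE.search(message)
    if request_line is None:
        raise ValueError("no HTTP request line found")
    method, uri = request_line.groups()
    # Header names are case-insensitive, so normalise them to lower case.
    headers = {name.lower(): value for name, value in HEADER.findall(message)}
    return {
        "method": method,
        "uri": uri,
        "host": headers.get("host"),          # None if the header is absent
        "user_agent": headers.get("user-agent"),
    }


example = "GET / HTTP/1.0\nHost: 10.2.0.12\nUser-Agent: Wget/1.12 (linux-gnu)\n"
print(parse_request(example))
# {'method': 'GET', 'uri': '/', 'host': '10.2.0.12', 'user_agent': 'Wget/1.12 (linux-gnu)'}
```

Because the headers are gathered into a dict first, adding another field is a matter of one more lookup rather than a longer regex. A missing header comes back as None instead of causing the whole match to fail.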
answered Jan 29 '26 00:01 by Adam Matan


