For educational purposes, I'm writing an HTTP server in C++.
When receiving a request, how do I know when the client has finished sending headers? Is there an obligation that all headers must be sent in one shot? What if a client sends G, then after 5 seconds E, then T...? Should I wait for a timeout and just close the connection if it takes too long? Should I start parsing as soon as I get the first bytes to know if the request is invalid?
I know there are a lot of libraries for this, I'm just reinventing the wheel to better understand how the Web works at different layers. And I can't find how they deal with exactly my question.
According to the HTTP/1.1 RFC (RFC 2616, section 4.1):
generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]
start-line      = Request-Line | Status-Line
There is an extra CRLF after the last message header, i.e. an empty line. So once you encounter the sequence CRLF CRLF, the headers are complete and the body (if any) starts.
Concerning the timeout: you can start parsing as soon as characters arrive (wait for a CRLF so you know a line is complete), and if the request takes longer than 5 seconds or so, send back a 408 Request Timeout.
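To make the CRLF CRLF rule concrete, here is a minimal sketch of a buffer that accepts bytes in whatever pieces the socket delivers them (even one at a time) and reports when the empty line has been seen. The class name HeaderAccumulator, its State enum, and the 8192-byte cap are assumptions made up for this illustration, not part of any library or of the RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from any library): collects bytes as they arrive
// and reports when the empty line that ends the headers has been seen.
class HeaderAccumulator {
public:
    enum class State { NeedMore, Complete, TooLarge };

    // max_bytes is an assumed safety cap so a client cannot grow the buffer forever.
    explicit HeaderAccumulator(std::size_t max_bytes = 8192) : max_bytes_(max_bytes) {
        assert(max_bytes > 0);
    }

    // Feed one chunk exactly as it came off the socket.
    State feed(std::string_view chunk) {
        if (header_end_ != 0) {  // headers already complete: extra bytes belong to the body
            buffer_.append(chunk.data(), chunk.size());
            return State::Complete;
        }
        // Re-scan only the last 3 old bytes: the terminator may straddle two chunks.
        const std::size_t from = buffer_.size() >= 3 ? buffer_.size() - 3 : 0;
        buffer_.append(chunk.data(), chunk.size());
        const std::size_t pos = buffer_.find("\r\n\r\n", from);
        if (pos != std::string::npos) {
            header_end_ = pos + 4;  // index of the first byte after the empty line
            return State::Complete;
        }
        return buffer_.size() > max_bytes_ ? State::TooLarge : State::NeedMore;
    }

    // Both views are only meaningful once feed() has returned Complete,
    // and they are invalidated by the next call to feed().
    std::string_view head() const { return std::string_view(buffer_).substr(0, header_end_); }
    std::string_view leftover() const { return std::string_view(buffer_).substr(header_end_); }

private:
    std::string buffer_;
    std::size_t header_end_ = 0;  // 0 means "terminator not found yet"
    std::size_t max_bytes_;
};
```

A server loop would call feed() after every successful read, keep reading on NeedMore, and reject the request on TooLarge. Anything returned by leftover() is the start of the body and should not be thrown away.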
There are two parts to this answer.
Firstly, the issue of delays and timeouts: you should indeed handle timeouts, as it's generally not possible to detect whether a TCP connection is broken. There is more on this topic in this question: TCP socket in Unix - notify server I am done sending
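One possible way to enforce such a timeout on a POSIX system is to give the whole header phase a single time budget and wait with poll() for whatever is left of it. The sketch below assumes a blocking file descriptor; the function name read_headers_with_deadline and the ReadResult enum are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <string>
#include <poll.h>
#include <unistd.h>

enum class ReadResult { HeadersComplete, TimedOut, Closed, Error };

// Hypothetical helper: read from fd into out until the empty line ("\r\n\r\n")
// shows up or the overall budget is used up, whichever comes first.
ReadResult read_headers_with_deadline(int fd, std::string& out,
                                      std::chrono::milliseconds budget) {
    assert(fd >= 0);
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + budget;
    char buf[4096];

    for (;;) {
        // The budget covers the whole header phase, so a client that trickles
        // one byte every few seconds still runs out of time.
        const auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - Clock::now());
        if (remaining.count() <= 0) return ReadResult::TimedOut;

        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLIN;
        const int ready = poll(&pfd, 1, static_cast<int>(remaining.count()));
        if (ready == 0) return ReadResult::TimedOut;
        if (ready < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return ReadResult::Error;
        }

        const ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0) return ReadResult::Closed;  // peer closed before finishing
        if (n < 0) {
            if (errno == EINTR) continue;
            return ReadResult::Error;
        }
        out.append(buf, static_cast<std::size_t>(n));
        if (out.find("\r\n\r\n") != std::string::npos) return ReadResult::HeadersComplete;
    }
}
```

On TimedOut the caller can answer with 408 Request Timeout and close the connection. An event-driven server would track one deadline per connection instead of blocking in a loop like this.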
Secondly, the format of an HTTP request is defined (in RFC 2616, section 5) as follows:
Request = Request-Line              ; Section 5.1
          *(( general-header        ; Section 4.5
           | request-header         ; Section 5.3
           | entity-header ) CRLF)  ; Section 7.1
          CRLF
          [ message-body ]          ; Section 4.3
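As a rough illustration of the grammar above, the following sketch splits a complete header block (everything up to and including the empty line) into the request line and a map of headers. The names RequestHead and parse_head are invented for this example. It deliberately ignores details a real server has to handle, such as repeated headers (here the last one wins) and obsolete folded header lines.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch.
struct RequestHead {
    std::string method, target, version;
    std::map<std::string, std::string> headers;  // names stored lower-cased
};

// Strip leading and trailing spaces and tabs.
static std::string_view trim(std::string_view s) {
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t')) s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t')) s.remove_suffix(1);
    return s;
}

// Returns std::nullopt if the text is malformed or the empty line is missing.
std::optional<RequestHead> parse_head(std::string_view head) {
    RequestHead req;
    bool on_request_line = true;
    for (;;) {
        const std::size_t eol = head.find("\r\n");
        if (eol == std::string_view::npos) return std::nullopt;  // ran out before the empty line
        const std::string_view line = head.substr(0, eol);
        head.remove_prefix(eol + 2);

        if (on_request_line) {
            // Request-Line = Method SP Request-URI SP HTTP-Version
            const std::size_t sp1 = line.find(' ');
            const std::size_t sp2 = line.rfind(' ');
            if (sp1 == std::string_view::npos || sp1 == sp2) return std::nullopt;
            req.method = std::string(line.substr(0, sp1));
            req.target = std::string(line.substr(sp1 + 1, sp2 - sp1 - 1));
            req.version = std::string(line.substr(sp2 + 1));
            if (req.method.empty() || req.target.empty() || req.version.empty())
                return std::nullopt;
            on_request_line = false;
            continue;
        }

        if (line.empty()) return req;  // the empty line: end of headers

        const std::size_t colon = line.find(':');
        if (colon == std::string_view::npos || colon == 0) return std::nullopt;
        std::string name(line.substr(0, colon));
        // Header names are case-insensitive, so normalise them.
        for (char& c : name) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        req.headers[name] = std::string(trim(line.substr(colon + 1)));
    }
}
```

Since the function only looks at text that is already in memory, it can also be run on partial input to reject an obviously invalid request line early.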
Essentially, you get the request line (for example GET /index.html HTTP/1.1), followed by multiple header lines (without empty lines). Then, the list of headers ends with an empty line. All ends of lines are represented with CRLF ("\r\n").
In addition to this, some requests also have a body (typically those using POST or PUT). If the request has a message body, its length will be given either by the Content-Length header or using delimiters via chunked transfer encoding.
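Once the headers are parsed, deciding how to read the body could look roughly like the sketch below. It assumes the headers are already in a map with lower-cased names; BodyFraming and body_framing are hypothetical names. Chunked encoding is checked first because RFC 2616 (section 4.4) says Content-Length must be ignored when a Transfer-Encoding is present. Rejecting every transfer coding other than chunked is a simplification of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <limits>
#include <map>
#include <string>

// Hypothetical description of how the request body is delimited.
struct BodyFraming {
    enum class Kind { None, ContentLength, Chunked, Invalid };
    Kind kind = Kind::None;
    std::size_t length = 0;  // only meaningful for Kind::ContentLength
};

// headers: header names lower-cased, values already trimmed (assumed).
BodyFraming body_framing(const std::map<std::string, std::string>& headers) {
    BodyFraming result;

    const auto te = headers.find("transfer-encoding");
    if (te != headers.end()) {
        std::string value = te->second;
        for (char& c : value) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        // Simplification: accept anything mentioning "chunked", reject the rest.
        result.kind = value.find("chunked") != std::string::npos
                          ? BodyFraming::Kind::Chunked
                          : BodyFraming::Kind::Invalid;
        return result;
    }

    const auto cl = headers.find("content-length");
    if (cl == headers.end()) return result;  // neither header: no body on a request

    const std::string& digits = cl->second;
    if (digits.empty()) {
        result.kind = BodyFraming::Kind::Invalid;
        return result;
    }
    std::size_t n = 0;
    for (const char c : digits) {
        if (c < '0' || c > '9') {  // signs, spaces, etc. are not allowed
            result.kind = BodyFraming::Kind::Invalid;
            return result;
        }
        const std::size_t d = static_cast<std::size_t>(c - '0');
        if (n > (std::numeric_limits<std::size_t>::max() - d) / 10) {  // would overflow
            result.kind = BodyFraming::Kind::Invalid;
            return result;
        }
        n = n * 10 + d;
    }
    result.kind = BodyFraming::Kind::ContentLength;
    result.length = n;
    return result;
}
```

With Kind::ContentLength the server reads exactly length more bytes; with Kind::Chunked it reads size-prefixed chunks until a chunk of size zero arrives.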