
Efficient algo / way to parse (without any framework) a multipart/form-data request without reading everything to memory?

My question is simple: I want to write to disk a big file upload as it is arriving. I have two big files being uploaded by the same multipart/form-data form. How do I detect the end of file, in other words, how do I detect the boundary ------WebKitFormBoundaryuFPBAbBHzPMrZn8g in the middle of the arriving bytes?

Knowing the length of each uploaded file would solve this problem completely, but the HTTP request does not provide it: it gives only the full Content-Length, not the length of the individual files being uploaded.

So what is the logic/strategy/algo to detect the boundary while I'm writing the bytes to disk? Of course I don't want to write the boundary to disk as if it were part of the file; I have to detect it and stop writing. Note that I cannot load the whole file into memory before I start writing to disk, which would make the problem much easier.

Here is the format of a multipart/form-data with two files:

POST / HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Content-Length: 362
Cache-Control: max-age=0
Origin: null
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryuFPBAbBHzPMrZn8g
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8,pt;q=0.6

------WebKitFormBoundaryuFPBAbBHzPMrZn8g
Content-Disposition: form-data; name="file1"; filename="binary.dat"
Content-Type: application/octet-stream

aωb
------WebKitFormBoundaryuFPBAbBHzPMrZn8g
Content-Disposition: form-data; name="file2"; filename="binary.dat"
Content-Type: application/octet-stream

aωb
------WebKitFormBoundaryuFPBAbBHzPMrZn8g--
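
For reference, the byte sequence to search for can be derived from the Content-Type header shown above: in the body, each boundary line is the header's boundary value prefixed with two extra dashes, and every boundary after the first is preceded by a CRLF that is not part of the file. A minimal Python sketch follows; the function name `delimiter_from_content_type` is hypothetical, and the naive split on `;` assumes the header contains no quoted semicolons.

```python
def delimiter_from_content_type(content_type: str) -> bytes:
    """Return the byte sequence that ends a part's data in the request body."""
    # Skip the media type itself and look at the parameters after it.
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "boundary":
            # The parameter value may be wrapped in double quotes.
            value = value.strip('"')
            # In the body the boundary is "--" + value, preceded by CRLF.
            return b"\r\n--" + value.encode("ascii")
    raise ValueError("no boundary parameter in Content-Type")


header = "multipart/form-data; boundary=----WebKitFormBoundaryuFPBAbBHzPMrZn8g"
delimiter = delimiter_from_content_type(header)
```

Searching for the CRLF together with the dashes keeps the line break in front of the boundary out of the saved file. The very first boundary in the body has no data before it, so it would have to be matched without the leading CRLF.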
TakeSoUp asked Oct 24 '25 00:10


1 Answer

A first, very simple approach, which may fit your needs and can be implemented with the standard memory functions for searching and moving data, is as follows:

Assume your boundary is N + 1 bytes long (in your case the boundary is 40 bytes, so N is 39). Allocate a buffer of any size larger than the boundary, do a first receive that fills the buffer, and then enter a loop that processes the data as described below until there is no more data to receive:

1 - Look for the boundary in the buffer. If you find it, you are done with the first file: write the bytes that precede the match and close the file. Then open the second file, move the bytes that follow the boundary (from the end of the match to the end of the buffer) to the start of the buffer, receive more bytes to refill the buffer, and continue the loop.

2 - If you don't find the boundary in the buffer, write the first sizeof(buffer) - N bytes to your file (everything up to and including the byte at buffer + sizeof(buffer) - N - 1). The last N bytes may be the start of a boundary that is still arriving, so move them to the start of the buffer, receive more bytes to fill up the buffer, and continue the loop.
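
The two steps of this first approach can be sketched in Python as follows. This is only an illustration under stated assumptions: `split_stream` and `open_next` are hypothetical names, the source is any object with a blocking `read()`, and a growing `bytearray` stands in for the fixed buffer that the answer moves data around in. It handles only the splitting at the boundary; parsing each part's headers and the final `--` is left out.

```python
import io
from typing import BinaryIO, Callable


def split_stream(src: BinaryIO, delimiter: bytes,
                 open_next: Callable[[], BinaryIO],
                 bufsize: int = 64 * 1024) -> int:
    """Copy src into successive sinks, switching sinks at each delimiter.

    Returns the number of sinks used. Closing them is left to the caller.
    """
    keep = len(delimiter) - 1          # the answer's N: longest partial match
    buf = bytearray()
    sink = open_next()
    parts = 1
    while True:
        chunk = src.read(max(bufsize, len(delimiter)))
        buf += chunk
        # Step 1: every complete delimiter in the buffer ends one part.
        while (pos := buf.find(delimiter)) >= 0:
            sink.write(buf[:pos])
            del buf[:pos + len(delimiter)]
            sink = open_next()
            parts += 1
        if not chunk:                  # end of input: the rest is plain data
            sink.write(buf)
            return parts
        # Step 2: flush all but the last N bytes, which may begin a delimiter.
        safe = len(buf) - keep
        if safe > 0:
            sink.write(buf[:safe])
            del buf[:safe]


# Usage with in-memory sinks and a tiny buffer so the delimiter straddles reads.
sinks = []

def new_sink() -> BinaryIO:
    sinks.append(io.BytesIO())
    return sinks[-1]

count = split_stream(io.BytesIO(b"AAAA--XBBBB--XCC"), b"--X", new_sink, bufsize=4)
```

Memory use stays bounded by roughly one read plus the delimiter length, regardless of how large each file is.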

A cleaner approach, which does not move data around but requires you to examine every byte, is as follows:

1 - Allocate a buffer of any size.

2 - Set a match counter to zero.

3 - Set up a boundingData array containing the bytes of your boundary.

4 - Enter a loop that does the following:

5 - Receive bytes into the buffer, up to the buffer size or until the end of the incoming data.

6 - Enter an inner loop that examines each received byte as follows:

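If the byte being examined is equal to boundingData[matchCounter], increment the counter and check whether it has reached the length of boundingData. If it has, close your file, open the next one and set matchCounter back to zero.

Otherwise, if matchCounter is not zero, the bytes matched so far were file data after all: write them with write(boundingData, matchCounter), reset matchCounter to zero, and then examine the current byte again, since it may itself begin a new match. If matchCounter is already zero, just write the examined byte to your file. Note that resetting to zero can still miss a boundary when the tail of the flushed bytes is itself the start of the boundary (for example, one extra '-' right before the boundary's run of dashes); handling that case needs a smarter fallback, such as the Knuth-Morris-Pratt failure table.

When you're done with the buffer, go back to step 5, until there is no more data to receive.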
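
The byte-by-byte approach can be sketched in Python as below. Note one deliberate change from the steps above: on a mismatch the counter falls back through a Knuth-Morris-Pratt failure table instead of simply going to zero, so a boundary preceded by bytes that look like its own prefix is still detected. The names `failure_table`, `split_bytewise` and `open_next` are hypothetical, and the source is assumed to be any object with a blocking `read()`.

```python
import io
from typing import BinaryIO, Callable, List


def failure_table(pattern: bytes) -> List[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the standard KMP table)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def split_bytewise(src: BinaryIO, delimiter: bytes,
                   open_next: Callable[[], BinaryIO],
                   bufsize: int = 64 * 1024) -> int:
    """Copy src into successive sinks, switching sinks at each delimiter."""
    fail = failure_table(delimiter)
    matched = 0                        # the answer's matchCounter
    sink = open_next()
    parts = 1
    while chunk := src.read(bufsize):
        out = bytearray()              # data confirmed to belong to the file
        for byte in chunk:
            while matched and byte != delimiter[matched]:
                shorter = fail[matched - 1]
                # The oldest pending bytes cannot start a delimiter: flush them.
                out += delimiter[:matched - shorter]
                matched = shorter
            if byte == delimiter[matched]:
                matched += 1
                if matched == len(delimiter):
                    sink.write(out)
                    out.clear()
                    sink = open_next()
                    parts += 1
                    matched = 0
            else:
                out.append(byte)
        sink.write(out)
    sink.write(delimiter[:matched])    # input ended mid-match: that was data
    return parts


# Usage: the extra '-' before the delimiter is kept as file data.
sinks = []

def new_sink() -> BinaryIO:
    sinks.append(io.BytesIO())
    return sinks[-1]

count = split_bytewise(io.BytesIO(b"A---XB-"), b"--X", new_sink, bufsize=1)
```

Because the match counter survives across reads, the only state carried between buffers is a single integer, and no bytes ever need to be moved.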

João Amaral answered Oct 26 '25 20:10


