
Is there a way to convert the string representation of a PDF into bytes in Python?

I'm trying to do something, and I don't know whether it's a good approach.

Problem:

I have a web client and a web server. The server (written in Python with Flask) processes a PDF file to extract some data; the client just sends the PDF file and waits for the response. The thing is that the client may have several PDF files to process, and I want to send all of them from the client to the server in a single request.

What I have planned to do:

I was thinking of converting the Blob of each PDF to a string and sending a POST request with a JSON body like this:

BODY:
{
    "content": [
        {"name": "pdf_name_1.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_2.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_3.pdf", "data": "some blob data converted to string"},
        ...
    ]
}

Then, on the server, I was thinking of converting the data back into a blob (bytes) so I can write the PDF to disk and start processing it.

My question:

Is there any way to convert the str representation of the PDF back to bytes so that I can write the PDF to disk with Python?

Thanks a lot. If someone comes up with another idea for sending a bunch of PDFs in a single request, please let me know.

P.S. I'm using Python 3.5 and Flask for the web server.

Kevin mendieta perez asked Jan 18 '26 21:01



1 Answer

In such cases, it is preferable to send the files themselves, passing them to requests.post with the files keyword, like so:

import requests


def send_pdf_data(filename_list):
    files = {}

    for index, filename in enumerate(filename_list):
        # One multipart field per PDF: (filename, file object, content type).
        files[f"pdf_name_{index}.pdf"] = (filename, open(filename, 'rb'), 'application/pdf')

    data = {}
    # *Put whatever you want in data dict*

    try:
        requests.post("http://yourserveradders", data=data, files=files)
    finally:
        # Close the file handles once the request has been sent.
        for _, file, _ in files.values():
            file.close()


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    send_pdf_data(filename_list)

if __name__ == '__main__':
    main()

However, if you really want to pass the data as JSON, you should use the base64 module, as @Mark Ransom mentioned.

You can implement it in this way:

import requests
import json
import base64


def encode(data: bytes) -> str:
    """
    Return base-64 encoded value of binary data as a str,
    so that it can be serialized to json.
    """
    return base64.b64encode(data).decode('ascii')


def decode(data: str) -> bytes:
    """
    Return decoded value of a base-64 encoded string.
    """
    return base64.b64decode(data.encode())


def get_pdf_data(filename):
    """
    Open pdf file in binary mode,
    return a string encoded in base-64.
    """
    with open(filename, 'rb') as file:
        return encode(file.read())


def send_pdf_data(filename_list, encoded_pdf_data):
    data = {}
    # *Put whatever you want in data dict*
    # Create content list: one {"name", "data"} dict per pdf.
    content = [{"name": filename, "data": pdf_data}
               for (filename, pdf_data) in zip(filename_list, encoded_pdf_data)]
    data["content"] = content

    data = json.dumps(data) # Convert it to json.
    # Declare the body as json so the server can parse it as such.
    requests.post("http://yourserveradders", data=data,
                  headers={"Content-Type": "application/json"})


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    pdf_blob_data = [get_pdf_data(filename) for filename
                     in filename_list]
    send_pdf_data(filename_list, pdf_blob_data)

if __name__ == '__main__':
    main()
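
On the server side, the reverse step might look roughly like the sketch below. It assumes the request body has the same shape as the JSON in the question and that each "data" value is a base-64 string. The function name save_pdfs_from_json and the out_dir parameter are hypothetical, made up for illustration; in a Flask view you would pass it the raw request body. It uses only the standard library, so it should also run on Python 3.5.

```python
import base64
import json
import os
import tempfile


def save_pdfs_from_json(body, out_dir):
    """Decode each base-64 "data" string and write it into out_dir.

    `body` is the raw JSON request body (a str); returns the paths written.
    Hypothetical helper, not part of Flask or requests.
    """
    payload = json.loads(body)
    written = []
    for item in payload["content"]:
        # Keep only the last path component so a client-supplied name
        # like "../x.pdf" cannot point outside out_dir.
        safe_name = os.path.basename(item["name"])
        pdf_bytes = base64.b64decode(item["data"])  # str -> bytes
        path = os.path.join(out_dir, safe_name)
        with open(path, "wb") as fh:  # binary mode: write the bytes unchanged
            fh.write(pdf_bytes)
        written.append(path)
    return written


if __name__ == "__main__":
    # Round trip with made-up bytes standing in for a real PDF.
    fake_pdf = b"%PDF-1.4\n\x00\xff binary payload\n%%EOF"
    body = json.dumps({"content": [
        {"name": "pdf_name_1.pdf",
         "data": base64.b64encode(fake_pdf).decode("ascii")},
    ]})
    with tempfile.TemporaryDirectory() as tmp:
        (path,) = save_pdfs_from_json(body, tmp)
        with open(path, "rb") as fh:
            assert fh.read() == fake_pdf
```

Keep in mind that base-64 makes the payload about a third larger than the raw files, which is one reason the multipart approach at the top of this answer is usually the better fit for several PDFs.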
Federico Rubbi answered Jan 21 '26 12:01



