I'm trying to download a .doc file using requests.get() (I've heard of other methods, but they all require saving the file too).
Is there any way to extract the text from it (or even convert it to .txt, for example) straight away, without saving it to a file?
I've tried passing request.raw into various converters (docx2txt.process(), for example), but I assume they all work with files, not streams.
While the script is running, the downloaded content is held in memory managed by the Python interpreter; if you save the content to a file, it is stored on disk instead. This article may be helpful to you.
Link: article
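As a rough illustration of keeping everything in memory, here is a minimal sketch that assumes the download is actually a .docx (a zip archive of XML), not a legacy binary .doc, which this approach cannot read. The function name extract_docx_text is made up for this example. It wraps the downloaded bytes in io.BytesIO so that zipfile gets the seekable file-like object it needs; a raw network stream is typically not seekable, which is a likely reason passing the raw response to a converter fails.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the paragraph text of a .docx held entirely in memory.

    Hypothetical helper for illustration; it only handles .docx, not .doc.
    """
    # BytesIO turns the bytes into a seekable file-like object for ZipFile.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

With requests, this could be called as extract_docx_text(requests.get(url).content), where url is a placeholder for the document's address. Since zipfile accepts file-like objects, wrapping the same bytes in io.BytesIO may also work with converters that open the document through zipfile, though that depends on the converter.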