How can I extract images/logos from a Word document using Python and store them in a folder? The following code converts the docx to HTML, but it doesn't extract the images from the HTML. Any pointers or suggestions would be a great help.
import mammoth

profile_path = <file path>
profile_html = <output file path>

# mammoth expects a binary file object, not a path string
with open(profile_path, 'rb') as f, open(profile_html, 'wb') as b:
    document = mammoth.convert_to_html(f)
    # document.value holds the generated HTML as a str
    b.write(document.value.encode('utf8'))
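To extract the source images from the docx (which is a variation on a zip file) without distortion or conversion, shell out to the OS and run:
tar -m -xf DocxWithImages.docx word/media
(This needs a tar that can read zip archives, such as the bsdtar that ships with recent Windows and macOS; GNU tar cannot.)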

You will find the source images (JPEG, PNG, WMF or others) extracted into a word/media folder. These are the unaltered embedded source files, without any scaling or cropping.
You may be surprised to find that the visible area is larger than the cropped version shown in the docx itself. Be aware that Word does not always discard the cropped-out parts of an image as you might expect (a source of embarrassing redaction failures).
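Since the question asks for Python, the same extraction can be sketched without shelling out, using the standard-library zipfile module in place of tar. This is only a minimal sketch: the function name extract_docx_images and its out_dir parameter are hypothetical names chosen for illustration, and it assumes the images sit under word/media/ as described above.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir):
    """Copy every file under word/media/ in the .docx into out_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # A .docx is a zip archive, so zipfile can open it directly.
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Keep only the embedded media files and skip directory entries.
            if name.startswith("word/media/") and not name.endswith("/"):
                # Use just the file name so nothing is written outside out_dir.
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

Calling extract_docx_images("DocxWithImages.docx", "images") should then write the original image files into an images folder, byte for byte as they are stored in the document.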