Extract images from word document using Python

Question

How can I extract images/logos from a Word document using Python and store them in a folder? The following code converts the docx to HTML, but it doesn't extract the images from the HTML. Any pointer or suggestion would be of great help.

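    import mammoth

    profile_path = "<file path>"         # input .docx (placeholder)
    profile_html = "<output html path>"  # output .html (placeholder; not defined in the original snippet)

    # Convert the document once and write the resulting HTML as UTF-8.
    with open(profile_path, 'rb') as f:
        document = mammoth.convert_to_html(f)
    with open(profile_html, 'wb') as b:
        b.write(document.value.encode('utf8'))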

K J · Accepted Answer

Native without any lib

To extract the source images from the docx (which is a variation on a zip file) without distortion or conversion, shell out to the OS and run:

    tar -m -xf DocxWithImages.docx word/media

You will find the source images (JPEG, PNG, WMF or others) extracted into a word/media folder. These are the unadulterated embedded originals, without any scaling or cropping applied.

You may be surprised to find that the visible area is larger than the cropped version shown in the docx itself. Be aware that Word does not always crop images as expected, which is a source of embarrassing redaction failures.
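
Since the question asks for Python, the same extraction can be sketched with the standard-library `zipfile` module instead of shelling out. This also avoids depending on which `tar` is installed: the bsdtar shipped with Windows and macOS reads zip archives, GNU tar does not. The function name `extract_docx_images` and its parameters are made up for illustration; the only thing taken from the answer above is that the images live under word/media inside the archive.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir):
    """Copy every file stored under word/media/ in the .docx into out_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip everything that is not an embedded media file.
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            # Use only the base name so nothing is written outside out_dir.
            target = out_dir / Path(name).name
            # The bytes are copied as stored, with no re-encoding or scaling.
            target.write_bytes(archive.read(name))
            saved.append(target)
    return saved
```

Calling `extract_docx_images("DocxWithImages.docx", "images")` should leave files such as image1.png in an images folder, assuming the document actually contains embedded pictures. As with the tar approach, these are the uncropped originals.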

Tags:

python

python-3.x

python-2.7

Softchamp
