
Extract all the images in a docx file using python

I have a docx file that contains 6-7 images, and I need to automate extracting the images from it. Is there a win32com MS Word API for this, or any library that can accurately extract all the images in the file?

This is what I have tried. It has two problems: it does not give me all the images, and it gives me many false positives, such as blank images, extremely small images, lines, etc. It also relies on MS Word being installed.

from pathlib import Path
from win32com.client import Dispatch

xls = Dispatch("Excel.Application")
doc = Dispatch("Word.Application")


def export_images(fp, prefix="img_", suffix="png"):
    """ export all of images(inlineShapes) in the word file.
    :param fp: path of word file.
    :param prefix: prefix of exported images.
    :param suffix: suffix of exported images.
    """

    fp = Path(fp)
    word = doc.Documents.Open(str(fp.resolve()))
    sh = xls.Workbooks.Add()
    for idx, s in enumerate(word.inlineShapes, 1):
        s.Range.CopyAsPicture()
        d = sh.ActiveSheet.ChartObjects().add(0, 0, s.width, s.height)
        d.Chart.Paste()
        d.Chart.Export(str(fp.parent / ("%s_%s.%s" % (prefix, idx, suffix))))
    sh.Close(False)
    word.Close(False)
export_images(r"C:\Users\HPO2KOR\Desktop\Work\venv\us2017010202.docx")

You can download the docx file here: https://drive.google.com/open?id=1xdw2MieI1n3ulXlkr_iJSKb3cbozdvWq

Himanshu Poddar asked Oct 21 '25 05:10



1 Answer

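A docx file is a ZIP archive, and the embedded images are stored under word/media/. You can extract them directly, filtering by file size first to skip the tiny or blank ones: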

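import zipfile

# Open the docx as a ZIP archive; it is closed automatically on exit.
with zipfile.ZipFile('file.docx') as archive:
    for file in archive.infolist():
        # Keep only media entries larger than about 300 KB.
        if file.filename.startswith('word/media/') and file.file_size > 300000:
            # Extracts into the current directory, keeping the word/media/ path.
            archive.extract(file)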

In your example, 5 images were found.
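
The same idea can be wrapped in a reusable function. What follows is only a minimal sketch, not a definitive implementation: the names `extract_images`, `IMAGE_SUFFIXES` and `min_bytes` are hypothetical and not part of any library, and the suffix list and the 300000-byte default are assumptions to tune for your own documents. Unlike `archive.extract`, it writes the files flat into one output folder instead of recreating `word/media/`.

```python
import zipfile
from pathlib import Path

# Assumed set of image extensions to accept; adjust for your documents.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff", ".emf", ".wmf"}


def extract_images(docx_path, out_dir, min_bytes=300_000):
    """Copy images stored under word/media/ in a docx file into out_dir.

    Entries smaller than min_bytes (uncompressed) are skipped.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            name = Path(info.filename)
            # Only look at entries inside the media folder.
            if not info.filename.startswith("word/media/"):
                continue
            # Skip anything that does not look like an image file.
            if name.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            # Skip small entries, which are often lines, icons or blanks.
            if info.file_size < min_bytes:
                continue
            target = out_dir / name.name
            target.write_bytes(archive.read(info))
            written.append(target)
    return written
```

Calling `extract_images('file.docx', 'images')` would write the matching files into an `images` folder. File size is only a rough proxy for "real" images, so a lower `min_bytes` may be needed if genuine figures are being dropped.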

Alderven answered Oct 23 '25 20:10



