Python: 'Page' object has no attribute 'getImageList' when I try to extract images from a PDF

Tags: python, image, pdf

I am trying to extract some images from my PDF file. I have tried several methods, but most of them were based on the fitz library (PyMuPDF).

import fitz 
import io
from PIL import Image


pdf_file = fitz.open("my_file_pdf.pdf")


for page_index in range(len(pdf_file)):
    # get the page itself
    page = pdf_file[page_index]
    image_list = page.getImageList()
    # printing number of images found in this page
    if image_list:
        print(f"[+] Found {len(image_list)} images in page {page_index}")
    else:
        print("[!] No images found on the given pdf page", page_index)
    for image_index, img in enumerate(image_list, start=1):
        print(img)
        print(image_index)
        # get the XREF of the image
        xref = img[0]
        # extract the image bytes
        base_image = pdf_file.extractImage(xref)
        image_bytes = base_image["image"]
        # get the image extension
        image_ext = base_image["ext"]
        # load it to PIL
        image = Image.open(io.BytesIO(image_bytes))
        # save it to local disk
        image.save(f"image{page_index+1}_{image_index}.{image_ext}")

This code gives me the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-e5b882e88684> in <module>
     11     # get the page itself
     12     page = pdf_file[page_index]
---> 13     image_list = page.getImageList()
     14     # printing number of images found in this page
     15     if image_list:

AttributeError: 'Page' object has no attribute 'getImageList'

However, according to the documentation, this is how the method should be called. Where could the problem come from?

user60005003 asked Jan 25 '26 14:01


2 Answers

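Take a look at the "Deprecated Names" page: https://pymupdf.readthedocs.io/en/latest/znames.html

It seems that getImageList is a deprecated name. That page lists the old camelCase names together with their new snake_case replacements (for example, Page.getImageList is now Page.get_images).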
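If the same script has to run on both older and newer PyMuPDF installations, a small duck-typed helper can pick whichever method name the page object actually has. This is a minimal sketch; get_page_images is a hypothetical helper name, not part of PyMuPDF.

```python
def get_page_images(page):
    """Return a PyMuPDF page's image list under either naming scheme.

    get_page_images is a hypothetical helper, not a PyMuPDF function.
    """
    # Current PyMuPDF releases provide the snake_case name.
    get_images = getattr(page, "get_images", None)
    if get_images is not None:
        return get_images()
    # Older releases that predate the rename only have the camelCase name.
    return page.getImageList()
```

In the question's loop, image_list = get_page_images(page) would then replace the failing page.getImageList() call.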

Walter Morales answered Jan 27 '26 04:01


Instead of page.getImageList(), try using page.get_images().

A list of the attributes and methods of the Page object is given at https://pymupdf.readthedocs.io/en/latest/page.html. getImageList() is not included there, but get_images() is.
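Putting both answers together, here is a minimal sketch of the question's loop rewritten with the snake_case names. It assumes a PyMuPDF release where Page.get_images() and Document.extract_image() exist; the latter replaces extractImage, which the question's code would hit next once getImageList is fixed. extract_all_images and out_dir are hypothetical names, and writing the bytes straight to disk instead of re-saving them through PIL is a simplification, not something the answers prescribe: extract_image() already returns the image in its stored format, so a plain copy needs no round-trip.

```python
import os


def extract_all_images(doc, out_dir="."):
    """Save every embedded image of an open PyMuPDF document to out_dir.

    doc is expected to be a fitz.Document (e.g. fitz.open("my_file_pdf.pdf")).
    extract_all_images is a hypothetical helper. Returns the paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for page_index, page in enumerate(doc):
        # snake_case replacement for the removed getImageList()
        image_list = page.get_images(full=True)
        if not image_list:
            print("[!] No images found on page", page_index)
            continue
        print(f"[+] Found {len(image_list)} images on page {page_index}")
        for image_index, img in enumerate(image_list, start=1):
            xref = img[0]  # first tuple entry is the image's XREF number
            # snake_case replacement for extractImage()
            base_image = doc.extract_image(xref)
            image_bytes = base_image["image"]  # encoded image data
            image_ext = base_image["ext"]      # e.g. "png" or "jpeg"
            name = f"image{page_index + 1}_{image_index}.{image_ext}"
            path = os.path.join(out_dir, name)
            # write the bytes directly; the with-block closes the file
            with open(path, "wb") as f:
                f.write(image_bytes)
            written.append(path)
    return written
```

With PyMuPDF installed, it could be called as extract_all_images(fitz.open("my_file_pdf.pdf"), "extracted_images"), where "extracted_images" is just an example directory name.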

Adam answered Jan 27 '26 02:01


