 

How to save web page as image using python

I am using python to create a "favorites" section of a website. Part of what I want to do is grab an image to put next to their link. So the process would be that the user puts in a URL and I go grab a screenshot of that page and display it next to the link. Easy enough?

I downloaded pywebshot, and it works great from the terminal on my local box. However, when I run it on the server, I get a segmentation fault with the following output:

/usr/lib/pymodules/python2.6/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
  warnings.warn(str(e), _gtk.Warning)
./pywebshot.py:16: Warning: invalid (NULL) pointer instance
  self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:16: Warning: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
  self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:49: GtkWarning: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_default_colormap: assertion `GDK_IS_SCREEN (screen)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_colormap_get_visual: assertion `GDK_IS_COLORMAP (colormap)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_window_new: assertion `GDK_IS_WINDOW (parent)' failed
  self.parent.show_all()
Segmentation fault

I know that some things can't run in a pts environment, but honestly that's a little beyond me right now. If I need to somehow make my pts connection look like a tty, I can try that. But at this point I'm not even sure what's going on, and I admit it's a bit over my head. Any help would be greatly appreciated.

Also, if there's a web service that I can pass a url and receive an image, that would work just as well. I am NOT married to the idea of pywebshot.

I do know that the server I'm on is running X and has all the necessary python modules installed.
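The first warning, "could not open display", means GTK could not connect to an X display. A server can be running X while a process started over SSH (a pts session) still has no DISPLAY environment variable, or has one pointing at a display it cannot reach. A minimal sketch of the first half of that check follows; display_available is a hypothetical helper, and it only inspects the variable, it does not try to open a connection:

```python
import os
from typing import Mapping, Optional


def display_available(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if an X display name is configured in the environment.

    Hypothetical helper: it only checks that DISPLAY is non-empty. It does
    not connect, so a stale or unreachable display still returns True.
    """
    if env is None:
        env = os.environ
    return bool(env.get("DISPLAY", "").strip())


if __name__ == "__main__":
    if display_available():
        print("DISPLAY is set:", os.environ["DISPLAY"])
    else:
        print("DISPLAY is not set; GTK-based tools are expected to fail here")
```

If DISPLAY turns out to be unset in the server session, the GTK window pywebshot creates has nowhere to be drawn, which matches the warnings above.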

Thanks in advance.

asked Jan 30 '26 10:01 by Chris


2 Answers

This is the code I used to get the screenshot of the whole scrolled webpage:

from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import logging
import os
import time

# Default download folder for Chrome (the screenshot is saved here too);
# use an absolute path, which the download.default_directory preference expects
videos_folder = os.path.abspath("./download")
os.makedirs(videos_folder, exist_ok=True)
prefs = {"download.default_directory": videos_folder}

def open_url(address):
    # SELENIUM SETUP
    logging.getLogger('WDM').setLevel(logging.WARNING)  # just to hide less relevant webdriver-manager messages
    chrome_options = Options()
    # Options.headless was deprecated and later removed in Selenium 4; pass the flag instead
    chrome_options.add_argument("--headless=new")
    chrome_options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    try:
        driver.implicitly_wait(1)
        driver.set_window_size(1920, 1080)  # to set the screenshot width
        driver.get(address)
        save_screenshot(driver, os.path.join(videos_folder, "Screenshot.png"))
    finally:
        driver.quit()  # close the browser even if taking the screenshot fails

def save_screenshot(driver, file_name):
    height, width = scroll_down(driver)
    driver.set_window_size(width, height)  # grow the window to the full page size
    img_binary = driver.get_screenshot_as_png()
    img = Image.open(BytesIO(img_binary))
    img.save(file_name)
    print("Screenshot saved to", file_name)

def scroll_down(driver):
    # Full page size and viewport size, as reported by the browser (CSS pixels)
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")

    # Split the page into viewport-sized rectangles (left, top, right, bottom)
    rectangles = []

    i = 0
    while i < total_height:
        ii = 0
        top_height = min(i + viewport_height, total_height)

        while ii < total_width:
            top_width = min(ii + viewport_width, total_width)
            rectangles.append((ii, i, top_width, top_height))
            ii += viewport_width

        i += viewport_height

    # Scroll to each rectangle in turn (the first one is already in view),
    # pausing so that content loaded on scroll has time to render
    for rectangle in rectangles[1:]:
        driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
        time.sleep(0.5)

    return total_height, total_width

open_url("https://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python")

Here is the resulting screenshot:

[Image: screenshot of the whole web page]
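The rectangle grid built in scroll_down() does not depend on the browser at all, so it can be pulled out into a pure function and checked on its own. A minimal sketch using range() steps instead of manual counters; viewport_tiles is a hypothetical name, and the tiles are (left, top, right, bottom) tuples in CSS pixels, as in the original loop:

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def viewport_tiles(total_width: int, total_height: int,
                   viewport_width: int, viewport_height: int) -> List[Rect]:
    """Split a page into viewport-sized tiles, row by row, clipped at the edges.

    Same tiling as the rectangle-building loop in scroll_down() above.
    """
    if viewport_width <= 0 or viewport_height <= 0:
        raise ValueError("viewport dimensions must be positive")
    tiles: List[Rect] = []
    for top in range(0, total_height, viewport_height):
        bottom = min(top + viewport_height, total_height)
        for left in range(0, total_width, viewport_width):
            right = min(left + viewport_width, total_width)
            tiles.append((left, top, right, bottom))
    return tiles


if __name__ == "__main__":
    # A 1920x2500 page seen through a 1920x1080 viewport needs three rows
    print(viewport_tiles(1920, 2500, 1920, 1080))
```

In scroll_down(), the scrolling loop could then iterate over viewport_tiles(total_width, total_height, viewport_width, viewport_height)[1:] and call window.scrollTo(left, top) for each tile; skipping the first tile matches the original, which never scrolls to the top-left rectangle. The explicit check also avoids the infinite loop the while-based version would hit if the browser ever reported a zero-sized viewport.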

IMPORTANT UPDATE #1:

As of 2024-06-04, the current stable release of ChromeDriver (114.0.5735.90) is not compatible with the current version of Chrome (125.0.6422.141), so the script above will not work as-is.
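A quick way to spot this mismatch is to compare major versions, since ChromeDriver refuses to start a session when its major version differs from Chrome's. A minimal sketch that parses version strings such as the output of google-chrome --version and chromedriver --version; the helper names are hypothetical, and a matching major version is necessary but not a guarantee that the pair works:

```python
import re


def extract_version(text: str) -> str:
    """Return the first dotted version number in text, e.g. from
    'Google Chrome 125.0.6422.141' or 'ChromeDriver 114.0.5735.90 (...)'."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return match.group(0)


def major_version(version: str) -> int:
    """Leading (major) component of a dotted version string."""
    return int(extract_version(version).split(".", 1)[0])


def majors_match(chrome: str, chromedriver: str) -> bool:
    """Coarse compatibility check: Chrome and ChromeDriver majors must be equal."""
    return major_version(chrome) == major_version(chromedriver)


if __name__ == "__main__":
    print(majors_match("125.0.6422.141", "114.0.5735.90"))  # False
```

With Selenium 4.6 or later, leaving out the service argument lets Selenium Manager try to resolve a matching driver by itself; whether that avoids the manual step below has not been tested here.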

For now, the fix is unfortunately manual: download the ChromeDriver build matching the current stable version of Chrome from here, as shown in the image below (for Chrome 125.0.6422.141):

[Image: ChromeDriver download entry for Chrome 125.0.6422.141]

Once the chromedriver-linux64.zip archive has been saved, extract it, rename the extracted folder to the matching Chrome version (125.0.6422.141) and move it to ~/.wdm/drivers/chromedriver/linux64/, so that the driver ends up at ~/.wdm/drivers/chromedriver/linux64/125.0.6422.141/chromedriver. Then, in the script, replace driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options) with driver = webdriver.Chrome(service=Service(os.path.expanduser("~/.wdm/drivers/chromedriver/linux64/125.0.6422.141/chromedriver")), options=chrome_options). The executable_path argument is no longer accepted by Selenium 4.10 and later, and the ~ has to be expanded explicitly.
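Since ~ is a shell convention and Selenium starts the driver through a subprocess, which does not appear to expand it, the path has to be expanded before it reaches Service (the script in update #2 does the same with os.path.expanduser). A small sketch, assuming the folder layout described above; cached_chromedriver_path is a hypothetical name:

```python
import os

# Default cache root taken from the steps above (the webdriver-manager layout on Linux)
DEFAULT_CACHE_ROOT = "~/.wdm/drivers/chromedriver/linux64"


def cached_chromedriver_path(chrome_version: str,
                             cache_root: str = DEFAULT_CACHE_ROOT) -> str:
    """Return <cache_root>/<chrome_version>/chromedriver with "~" expanded.

    Hypothetical helper: it only builds the path; it does not check that the
    file exists or that it matches the installed Chrome.
    """
    root = os.path.expanduser(cache_root)
    return os.path.join(root, chrome_version, "chromedriver")


if __name__ == "__main__":
    print(cached_chromedriver_path("125.0.6422.141"))
```

The result can then be passed as webdriver.Chrome(service=Service(cached_chromedriver_path("125.0.6422.141")), options=chrome_options).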

That's all!

IMPORTANT UPDATE #2 (2025/08/01):

Steps for the following configuration:

  1. My OS: Linux Fedora 40

  2. My Browser: Chrome 138.0.7204.183 (latest official 64-bit build for my OS)

  3. Visit the previously mentioned page, filter the JSON by your Chrome version (138.0.7204.183, in my case) and download the chromedriver build for your OS.

  4. Extract chromedriver-linux64.zip, rename the extracted folder to 138.0.7204.183 and move it to ~/.wdm/drivers/chromedriver/linux64/.

  5. Based on what was reported here, create a modified webpage_screenshot.py file as follows:

import argparse
import logging
import os
import time
from io import BytesIO
from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def parse_args():
    parser = argparse.ArgumentParser(description="Capture a screenshot of a web page.")
    parser.add_argument("-d", "--directory", type=str, default="./download",
                        help="Path of the directory in which to save the screenshot.")
    parser.add_argument("-u", "--url", type=str, required=True,
                        help="URL of the page to screenshot.")
    return parser.parse_args()

def open_url(address, download_folder):
    # Download folder settings
    download_folder = os.path.expanduser(download_folder)  # <<< expand ~
    os.makedirs(download_folder, exist_ok=True)
    prefs = {"download.default_directory": os.path.abspath(download_folder)}

    # SELENIUM SETUP
    logging.getLogger('WDM').setLevel(logging.WARNING)
    chrome_options = Options()
    chrome_options.add_argument("--headless=new")  # Options.headless no longer exists in recent Selenium 4 releases
    chrome_options.add_experimental_option("prefs", prefs)
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    try:
        driver.implicitly_wait(1)
        driver.set_window_size(1920, 1080)
        driver.get(address)

        output_path = os.path.join(download_folder, "Screenshot.png")
        save_screenshot(driver, output_path)
    finally:
        driver.quit()  # close the browser even if something above fails

def save_screenshot(driver, file_name):
    height, width = scroll_down(driver)
    driver.set_window_size(width, height)
    img_binary = driver.get_screenshot_as_png()
    img = Image.open(BytesIO(img_binary))
    img.save(file_name)
    print(f"Screenshot saved to: {file_name}")

def scroll_down(driver):
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")

    rectangles = []

    i = 0
    while i < total_height:
        ii = 0
        top_height = min(i + viewport_height, total_height)

        while ii < total_width:
            top_width = min(ii + viewport_width, total_width)
            rectangles.append((ii, i, top_width, top_height))
            ii += viewport_width

        i += viewport_height

    # Scroll through every rectangle except the first (already in view)
    for rectangle in rectangles[1:]:
        driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
        time.sleep(0.5)

    return total_height, total_width

if __name__ == "__main__":
    args = parse_args()
    open_url(args.url, args.directory)

  6. The script now accepts two options (-u/--url and -d/--directory) instead of having them hard-coded.

  7. Run the script with the following commands:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install selenium Pillow webdriver_manager
python webpage_screenshot.py -u "https://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python" -d "~/Desktop/Screenshot"

  8. The chosen save folder (~/Desktop/Screenshot) is created if it does not exist, and Screenshot.png is saved inside it.
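Steps 3 and 4 above (extracting the archive, renaming the folder after the Chrome version and moving it under ~/.wdm/drivers/chromedriver/linux64/) can also be done from Python with the standard library. A minimal sketch, assuming chromedriver-linux64.zip has already been downloaded and contains a single top-level folder; install_chromedriver_zip is a hypothetical helper, and it downloads nothing:

```python
import os
import shutil
import stat
import tempfile
import zipfile


def install_chromedriver_zip(zip_path: str, chrome_version: str,
                             cache_root: str = "~/.wdm/drivers/chromedriver/linux64") -> str:
    """Extract the archive into <cache_root>/<chrome_version>/ and return the binary path.

    Assumes one top-level folder (e.g. chromedriver-linux64/) holding chromedriver.
    """
    target_dir = os.path.join(os.path.expanduser(cache_root), chrome_version)
    if os.path.exists(target_dir):
        raise FileExistsError(target_dir)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        (top_level,) = os.listdir(tmp)  # raises ValueError if not exactly one entry
        os.makedirs(os.path.dirname(target_dir), exist_ok=True)
        # Step 4: the extracted folder becomes the version-named folder
        shutil.move(os.path.join(tmp, top_level), target_dir)
    binary = os.path.join(target_dir, "chromedriver")
    # zipfile does not restore Unix permission bits, so mark the driver executable
    mode = os.stat(binary).st_mode
    os.chmod(binary, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return binary
```

For example, install_chromedriver_zip("chromedriver-linux64.zip", "138.0.7204.183") should leave the driver at ~/.wdm/drivers/chromedriver/linux64/138.0.7204.183/chromedriver. Refusing to overwrite an existing folder keeps a previously installed driver from being silently replaced.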

Cheers

answered Jan 31 '26 22:01 by Riccardo Volpe


from selenium import webdriver
from xvfbwrapper import Xvfb

d = Xvfb(width=400, height=400)  # virtual X display for the browser to draw on
d.start()
try:
    browser = webdriver.Firefox()
    try:
        url = "https://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python"
        browser.get(url)
        destination = "screenshot_filename.png"  # Selenium writes PNG data, so use a .png name
        if browser.save_screenshot(destination):
            print("File saved in the destination filename")
    finally:
        browser.quit()
finally:
    d.stop()  # shut the virtual display down again

answered Jan 31 '26 22:01 by Kavitha


