 

Download GitHub Pull Request description images remotely or via the API

Background

When someone merges a pull request into a private GitHub repository, I want to show the details of the pull request, including the images in the description, in another location (Slack). Usually these are short videos or screenshots of what has changed, so it would be great to have a continuous stream of changes visible to everyone in Slack.

The problem

From what I can tell looking at the GitHub API Docs, there is no way to download these images via the API.

The images are stored at URLs like https://github.com/owner/project-name/assets/* that are not publicly accessible, so you have to be logged in in your browser to actually get access to the image.

When you do view an image in the browser, GitHub redirects you to a short-lived URL that looks like https://private-user-images.githubusercontent.com/123456/251885706-e74af325-a947-47f7-8dad-61129ad62f11.png?jwt=eyJ.... This URL is public, but again, I want to generate it without being logged into a browser, so that I can do this in response to a webhook.
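To make the shape of that short-lived URL concrete, the standard library can split it into the image file name and the signing token. This is just an illustration: the URL below is the example from this question, and the jwt value is a placeholder since the real token is elided above.

```python
from urllib.parse import urlparse, parse_qs

# The short-lived URL GitHub redirects to (the jwt value here is a placeholder)
signed_url = (
    "https://private-user-images.githubusercontent.com/123456/"
    "251885706-e74af325-a947-47f7-8dad-61129ad62f11.png?jwt=PLACEHOLDER"
)

parts = urlparse(signed_url)
filename = parts.path.split("/")[-1]     # image file name, query string excluded
token = parse_qs(parts.query)["jwt"][0]  # the signature that makes the URL temporarily public

print(filename)  # 251885706-e74af325-a947-47f7-8dad-61129ad62f11.png
```

The path carries the stable file name, while everything that makes the URL fetchable without a login lives in the jwt query parameter.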

Example

For example, the PR description might have something like this:

Did a bunch of cool stuff in this one...

## What it looks like
<img width="1238" alt="Screenshot 2023-07-07 at 6 28 14 PM" 
src="https://github.com/owner/project-name/assets/123456/e74af324-a944-47f4-8da4-61129ad62f14">

What I want to know is how to download the image located at https://github.com/owner/project-name/assets/123456/e74af324-a944-47f4-8da4-61129ad62f14 remotely with a script.
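As a first step, pulling the image URLs out of a PR body like the one above can be sketched with the standard library's HTML parser (a minimal illustration; the body string below is just the example from this question):

```python
from html.parser import HTMLParser


class ImgSrcExtractor(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.srcs.extend(v for k, v in attrs if k == "src" and v)


# Example PR body, as in the snippet above
body = (
    'Did a bunch of cool stuff in this one...\n\n'
    '## What it looks like\n'
    '<img width="1238" alt="Screenshot 2023-07-07 at 6 28 14 PM" '
    'src="https://github.com/owner/project-name/assets/123456/'
    'e74af324-a944-47f4-8da4-61129ad62f14">'
)

parser = ImgSrcExtractor()
parser.feed(body)
print(parser.srcs)
```

Fetching those URLs is the hard part, since they require authentication, as the answer below explains.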

plowman asked Sep 14 '25 21:09


1 Answer

Get your user_session cookie from your browser and provision a token to access the GitHub API.

export GH_TOKEN="<token>"
export GH_SESSION_COOKIE="<session_cookie>"
python download.py "<owner>/<repo>/pulls/<pr_number>"

Content of download.py

#!/usr/bin/env python3

import os
import sys
import urllib.request
import json
import re
from urllib.parse import urlparse


def main():
    # Read GH_TOKEN and GH_SESSION_COOKIE from environment variables
    gh_token = os.environ["GH_TOKEN"]
    gh_session_cookie = os.environ["GH_SESSION_COOKIE"]

    # Set pull request number & repo name
    path_segment = sys.argv[1]

    # Compile a regexp matching URLs in the body
    # (stop at whitespace, quotes, and common delimiters so bare links
    # in the text don't swallow trailing characters)
    url_regexp = re.compile(r"https?://[^\s\"'<>)]+")

    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {gh_token}",
        "X-GitHub-Api-Version": "2022-11-28"
    }

    # Download the pull request body
    req = urllib.request.Request(
        f"https://api.github.com/repos/{path_segment}", headers=headers)
    resp = urllib.request.urlopen(req)

    # Get all occurrences of URL like patterns using RegExp
    body = json.loads(resp.read().decode('utf-8'))['body']

    urls = url_regexp.findall(body)

    # Download each file, authenticating with the browser session cookie
    cookie_headers = {
        "Cookie": f"user_session={gh_session_cookie};"
    }
    for url in urls:
        req = urllib.request.Request(url, headers=cookie_headers)
        with urllib.request.urlopen(req) as u:

            # GitHub redirects to a signed private-user-images URL;
            # take the file name from the final (redirected) URL
            filename = urlparse(u.geturl()).path.split('/')[-1]

            with open(filename, 'wb') as f:
                f.write(u.read())


if __name__ == "__main__":
    main()
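Note that the script downloads every URL it finds in the PR body. If descriptions can also contain ordinary links, you may want to narrow the pattern to GitHub asset URLs only. A hedged sketch; the pattern is an assumption based on the URL shape shown in the question, not documented GitHub behaviour:

```python
import re

# Match only user-uploaded asset URLs of the form
# https://github.com/<owner>/<repo>/assets/<user_id>/<uuid>
asset_url_regexp = re.compile(
    r'https://github\.com/[^/"\s]+/[^/"\s]+/assets/\d+/[0-9a-f-]+'
)

body = (
    'See https://example.com/docs for details.\n'
    '<img src="https://github.com/owner/project-name/assets/123456/'
    'e74af324-a944-47f4-8da4-61129ad62f14">'
)
urls = asset_url_regexp.findall(body)
print(urls)
```

With this pattern, the documentation link is skipped and only the embedded image asset is downloaded.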

Caveat

The user_session cookie is only valid for two weeks. Take extra care to keep it secret, as the cookie allows anyone holding it to impersonate your GitHub account.

Paul Brit answered Sep 17 '25 01:09