
Puppeteer not actually downloading ZIP despite Clicking Link

I've been making incremental progress, but I'm fairly stumped at this point.

This is the site I'm trying to download from: https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp. I'm using Puppeteer because I can't find a supported API for this data (if there is one, I'm happy to try it). The link I need to click is "Download Raw Data".

My script runs to the end, but doesn't seem to actually download any files. I tried installing puppeteer-extra and setting the downloads path:

const puppeteer = require("puppeteer-extra");
const { executablePath } = require('puppeteer')

...

    var dir = "/home/ubuntu/AirlineStatsFetcher/downloads";
    console.log('dir to set for downloads', dir);
    puppeteer.use(
        require('puppeteer-extra-plugin-user-preferences')({
            userPrefs: {
                download: {
                    prompt_for_download: false,
                    open_pdf_in_system_reader: true,
                    default_directory: dir,
                },
                plugins: {
                    always_open_pdf_externally: true,
                },
            },
        })
    );

    const browser = await puppeteer.launch({
        headless: true, slowMo: 100, executablePath: executablePath()
    });

...
    // Doesn't seem to work
    await page.waitForSelector('table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)');
    console.log('Clicking on link to download CSV');
    await page.click('table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)');

After a while I figured I could build the full URL myself and do a GET request, but then I ran into other problems (UNABLE_TO_VERIFY_LEAF_SIGNATURE). Before going further down this route (which feels a little hacky), I wanted to ask for advice here.

Is there something I'm missing in terms of configuration to get it to download?

ObjectNameDisplay asked Sep 19 '25 13:09


1 Answer

Downloading files with Puppeteer seems to be a moving target and is not well supported today. For now (Puppeteer 19.2.2) I would extract the link's href with Puppeteer and fetch the file with https.get instead.

"use strict";

const fs = require("fs");
const https = require("https");
// Not sure why puppeteer-extra is used... maybe https://stackoverflow.com/a/73869616/1258111 solves the need in future.
const puppeteer = require("puppeteer-extra");
const { executablePath } = require("puppeteer");

(async () => {
  puppeteer.use(
    require("puppeteer-extra-plugin-user-preferences")({
      userPrefs: {
        download: {
          prompt_for_download: false,
          open_pdf_in_system_reader: false,
        },
        plugins: {
          always_open_pdf_externally: false,
        },
      },
    })
  );

  const browser = await puppeteer.launch({
    headless: true,
    slowMo: 100,
    executablePath: executablePath(),
  });

  const page = await browser.newPage();
  await page.goto(
    "https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp",
    {
      waitUntil: "networkidle2",
    }
  );

  const handle = await page.$(
    "table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)"
  );

  const relativeZipUrl = await page.evaluate(
    (anchor) => anchor.getAttribute("href"),
    handle
  );

  const url = "https://www.transtats.bts.gov/OT_Delay/".concat(relativeZipUrl);
  const encodedUrl = encodeURI(url);

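  // Disables TLS certificate verification for every HTTPS request in this process.
  // It works around UNABLE_TO_VERIFY_LEAF_SIGNATURE. Don't use in production.
  https.globalAgent.options.rejectUnauthorized = false;

  https
    .get(encodedUrl, (res) => {
      const path = `${__dirname}/download.zip`;
      const filePath = fs.createWriteStream(path);
      res.pipe(filePath);
      filePath.on("finish", () => {
        filePath.close();
        console.log("Download Completed");
      });
    })
    .on("error", (err) => {
      // Network and TLS failures surface here instead of crashing the process
      console.error("Download failed:", err.message);
    });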

  await browser.close();
})();
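
If you want to avoid switching off certificate verification globally, here is a rough sketch of a variation on the download step. It is untested against this particular site. UNABLE_TO_VERIFY_LEAF_SIGNATURE typically means Node could not build a chain to a trusted root, often because the server does not send an intermediate certificate, so the sketch assumes you have obtained that certificate as PEM text yourself. The helper names (resolveDownloadUrl, agentWithExtraCa, download) are made up for illustration. Only https.Agent, https.get, tls.rootCertificates and URL are real Node APIs.

```javascript
"use strict";

const fs = require("fs");
const https = require("https");
const tls = require("tls");

// Resolve the anchor's href against the page URL. The URL parser also
// percent-encodes characters such as spaces, so encodeURI is not needed.
function resolveDownloadUrl(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Hypothetical helper: an agent that trusts Node's bundled root CAs plus one
// extra certificate. Passing `ca` replaces the default trust store, which is
// why tls.rootCertificates is included explicitly.
function agentWithExtraCa(extraCaPem) {
  return new https.Agent({ ca: [...tls.rootCertificates, extraCaPem] });
}

// Download `url` to `destPath`. The promise resolves once the file has been
// written and rejects on a non-200 status or on a network/TLS/file error.
function download(url, destPath, agent) {
  return new Promise((resolve, reject) => {
    https
      .get(url, { agent }, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`Unexpected status ${res.statusCode}`));
          return;
        }
        const file = fs.createWriteStream(destPath);
        res.pipe(file);
        file.on("finish", () => resolve(destPath));
        file.on("error", reject);
      })
      .on("error", reject);
  });
}
```

Inside the async function above, this could be used roughly as `const url = resolveDownloadUrl(relativeZipUrl, page.url());` followed by `await download(url, `${__dirname}/download.zip`, agentWithExtraCa(fs.readFileSync("intermediate.pem", "utf8")));`, where intermediate.pem is a placeholder for wherever you saved the certificate. Because download returns a promise, the script can wait for the file to finish before logging success or exiting.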
stefan.seeland answered Sep 22 '25 04:09