Scrape download URL from a link generated by JavaScript

Question

I know this post will probably be closed, but I have to try because I am desperate. I am not looking for a solution but for a technique. I am trying to scrape some content from a public site (I am using Beautiful Soup in Python, but that doesn't matter). I have run into a problem getting a download link. Consider this:

<a href="/games/9380-beach-buggy-racing/download-filelocal-55787" class="onclick-download-ads app-btn cuprum" title="Скачать apk файл, размером 90.0 MB">
    <b class="btntext">Скачать</b>
    <span class="lcol">90.0 MB</span>
    <span class="rcol">(apk)</span>

</a>

When clicked, this link downloads a file from this address:

http://dl3.top-android.org/?a=eyJkYXRhIjp7ImluZm8iOiJXaW5kb3dzOkNocm9tZXw4OS4xMzguNTQuMjIwIiwiZGF0YSI6eyJpc19hdXRoZW50aWNhdGVkIjpmYWxzZSwiZmlsZW5hbWUiOiJhcHBsaWNhdGlvbnMvYmVhY2gtYnVnZ3ktcmFjaW5nLTEuMi4xMi5hcGsifX19%3A1df1Gg%3AbsYoAragbQaUlQ_hjhJHL3FEliI%3A1df1Gg%3ATE9B8n9tJuMAKwBuzd1hXZmMOaA

As you can see, this is not the href of the a tag. I want to obtain this link somehow.

I know the browser doesn't make any new requests (monitored it via developer tab) when I press this link.
I searched all the JS files and found nothing related to dl3.top...

Please help me understand what is going on. As I understand it, if no request is being made, then all the relevant information is already loaded in my browser.

Christos Lytras · Accepted Answer

"I know the browser doesn't make any new requests (monitored it via developer tab) when I press this link."

The browser actually does make a new request, and the server responds with HTTP/1.1 302 FOUND and a Location header containing the URL you seek.

[Screenshot: download link response headers]

Here is a simple script to scrape that Location header from this URL. You'll have to send User-Agent and Referer headers to get a valid response; otherwise the server answers with a 403 Forbidden error.

Python 3 code

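import http.client

# Plain HTTP connection to the site that hosts the download page
conn = http.client.HTTPConnection("top-android.org")
conn.debuglevel = 1  # print the raw request and response to stdout
conn.request("GET", "/games/1556-pou-tamago4i/download-filelocal-32318", headers={
    # Without both headers the server answers 403 Forbidden
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0',
    'Referer': 'http://top-android.org/games/1556-pou-tamago4i/'
})
r1 = conn.getresponse()

# http.client does not follow redirects, so the Location header of the 302 can be read here
print("\n\nURL: %s" % r1.getheader('Location'))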

It prints the wanted link:

Result

> python scrape_location.py
send: b'GET /games/1556-pou-tamago4i/download-filelocal-32318 HTTP/1.1\r\nHost: top-android.org\r\nAccept-Encoding: identity\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0\r\nReferer: http://top-android.org/games/1556-pou-tamago4i/\r\n\r\n'
reply: 'HTTP/1.1 302 FOUND\r\n'
header: Date header: Content-Type header: Transfer-Encoding header: Connection header: Set-Cookie header: Vary header: Location header: X-Frame-Options header: Server header: CF-RAY

URL: http://dl3.top-android.org/?a=eyJkYXRhIjp7ImluZm8iOiJXaW5kb3dzOkZpcmVmb3h8MmEwMjo1ODc6OWMyNzplNDAwOjg0OGQ6MmNhOmE4Mjk6MjExNyIsImRhdGEiOnsiaXNfYXV0aGVudGljYXRlZCI6ZmFsc2UsImZpbGVuYW1lIjoiYXBwbGljYXRpb25zL3BvdS10YW1hZ280aS0xLjQuNjYuYXBrIn19fQ%3A1dgoCA%3AFsWjvbE-s3Mqe9tZNS2CAbfUinw%3A1dgoCA%3A6rJ8th0GeOHsVtKeAPpnwNfqUa0

Just remember that the Referer header must be set to the URL of the page that hosts the link (window.location.href in the browser); otherwise the request results in a 403 error.
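If you start from the page HTML rather than a hard-coded path, the same request could be sketched roughly as below. This is only a sketch under assumptions: the onclick-download-ads class is taken from the markup in the question, the names DownloadLinkParser, find_download_urls and resolve_redirect are made up for illustration, and only plain http:// URLs are handled. It uses the standard library's html.parser instead of Beautiful Soup so that it has no dependencies.

```python
import http.client
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) "
              "Gecko/20100101 Firefox/54.0")


class DownloadLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries the download class."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Assumption: the class name seen in the question marks download buttons
        if tag == "a" and "onclick-download-ads" in classes and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_download_urls(page_url, html):
    """Return absolute URLs for the download links found in html."""
    parser = DownloadLinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


def resolve_redirect(download_url, page_url, timeout=10):
    """Request download_url without following redirects.

    Returns the Location header of a redirect response, or None otherwise.
    """
    parts = urlsplit(download_url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", target, headers={
            "User-Agent": USER_AGENT,
            "Referer": page_url,  # the page the link was scraped from
        })
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        conn.close()
```

Feed find_download_urls the URL and HTML of the page you scraped, then pass each result to resolve_redirect together with that same page URL as the Referer.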

EDIT

As rupps's comment explains, that URL contains base64-encoded JSON and binary data. In my case, the JSON data contains "is_authenticated":false:

{
    "data": {
        "info": "Windows:Firefox|xxxx:xxx:xxxx:xxxx:xxxx:xxx:xxxx:xxxx",
        "data": {
            "is_authenticated": false,
            "filename": "applications/pou-tamago4i-1.4.66.apk"
        }
    }
}

A link with "is_authenticated": false still downloads the file; it just sometimes fails with a strange 404 error. If I keep hitting the download button, it downloads the apk file!
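To look at that JSON yourself, the first colon-separated segment of the a parameter can be decoded with the standard library. A minimal sketch, assuming that segment is unpadded URL-safe base64 (which is what the sample URL above looks like); the name decode_download_token is made up for illustration, and the remaining segments are left undecoded.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# The redirect target printed by the script above
SAMPLE_URL = (
    "http://dl3.top-android.org/?a="
    "eyJkYXRhIjp7ImluZm8iOiJXaW5kb3dzOkZpcmVmb3h8MmEwMjo1ODc6OWMyNzplNDAw"
    "Ojg0OGQ6MmNhOmE4Mjk6MjExNyIsImRhdGEiOnsiaXNfYXV0aGVudGljYXRlZCI6ZmFs"
    "c2UsImZpbGVuYW1lIjoiYXBwbGljYXRpb25zL3BvdS10YW1hZ280aS0xLjQuNjYuYXBr"
    "In19fQ%3A1dgoCA%3AFsWjvbE-s3Mqe9tZNS2CAbfUinw%3A1dgoCA%3A6rJ8th0GeOHsVtKeAPpnwNfqUa0"
)


def decode_download_token(url):
    """Return the JSON payload embedded in the `a` query parameter of url."""
    # parse_qs also turns the %3A escapes back into ':' separators
    token = parse_qs(urlsplit(url).query)["a"][0]
    # Assumption: only the first ':'-separated segment is base64 JSON
    payload_b64 = token.split(":", 1)[0]
    # Restore the '=' padding that was stripped from the segment
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


payload = decode_download_token(SAMPLE_URL)
print(payload["data"]["data"]["filename"])
print(payload["data"]["data"]["is_authenticated"])
```

For the sample URL this prints the filename and the is_authenticated flag shown in the JSON above.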

Tags:

python

html

web-scraping

misha312
