I know this post will probably be closed, but I have to try because I am desperate. I am not looking for a solution but for a technique. I am trying to scrape some content from a public site (I am using Beautiful Soup in Python, but that doesn't matter). I have run into a problem getting a download link. Consider this:
<a href="/games/9380-beach-buggy-racing/download-filelocal-55787" class="onclick-download-ads app-btn cuprum" title="Скачать apk файл, размером 90.0 MB">
<b class="btntext">Скачать</b>
<span class="lcol">90.0 MB</span>
<span class="rcol">(apk)</span>
</a>
When clicked, this download link downloads a file from this address:
http://dl3.top-android.org/?a=eyJkYXRhIjp7ImluZm8iOiJXaW5kb3dzOkNocm9tZXw4OS4xMzguNTQuMjIwIiwiZGF0YSI6eyJpc19hdXRoZW50aWNhdGVkIjpmYWxzZSwiZmlsZW5hbWUiOiJhcHBsaWNhdGlvbnMvYmVhY2gtYnVnZ3ktcmFjaW5nLTEuMi4xMi5hcGsifX19%3A1df1Gg%3AbsYoAragbQaUlQ_hjhJHL3FEliI%3A1df1Gg%3ATE9B8n9tJuMAKwBuzd1hXZmMOaA
As you can see, this is not the href address in the a tag. I want to obtain this link somehow.
I know the browser doesn't make any new requests (monitored it via developer tab) when I press this link.
I searched all the JS files and found nothing related to dl3.top...
Please help me understand what is going on. As I understand it, if no request is being made, I already have all the relevant information loaded in my browser.
"I know the browser doesn't make any new requests (monitored it via developer tab) when I press this link."
The browser actually does make a new request, and the server responds with HTTP/1.1 302 FOUND and a Location header containing the URL you seek.
Here is a simple script to scrape that Location header from this URL. You'll have to add User-Agent and Referer headers in order to get a valid response; otherwise the response will be a 403 Forbidden error.
Python 3 code
import http.client

# http.client never follows redirects, so the 302 response itself is returned.
conn = http.client.HTTPConnection("top-android.org")
conn.debuglevel = 1  # print the raw request and the response headers
conn.request("GET", "/games/1556-pou-tamago4i/download-filelocal-32318", headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0',
    'Referer': 'http://top-android.org/games/1556-pou-tamago4i/'
})
r1 = conn.getresponse()
# The redirect target (the real download URL) is in the Location header.
print("\n\nURL: %s" % r1.getheader('Location'))
It prints the wanted link:
Result
> python scrape_location.py
send: b'GET /games/1556-pou-tamago4i/download-filelocal-32318 HTTP/1.1\r\nHost: top-android.org\r\nAccept-Encoding: identity\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0\r\nReferer: http://top-android.org/games/1556-pou-tamago4i/\r\n\r\n'
reply: 'HTTP/1.1 302 FOUND\r\n'
header: Date header: Content-Type header: Transfer-Encoding header: Connection header: Set-Cookie header: Vary header: Location header: X-Frame-Options header: Server header: CF-RAY
URL: http://dl3.top-android.org/?a=eyJkYXRhIjp7ImluZm8iOiJXaW5kb3dzOkZpcmVmb3h8MmEwMjo1ODc6OWMyNzplNDAwOjg0OGQ6MmNhOmE4Mjk6MjExNyIsImRhdGEiOnsiaXNfYXV0aGVudGljYXRlZCI6ZmFsc2UsImZpbGVuYW1lIjoiYXBwbGljYXRpb25zL3BvdS10YW1hZ280aS0xLjQuNjYuYXBrIn19fQ%3A1dgoCA%3AFsWjvbE-s3Mqe9tZNS2CAbfUinw%3A1dgoCA%3A6rJ8th0GeOHsVtKeAPpnwNfqUa0
Just remember, the Referer header must be set to the URL of the page that contains the link (what window.location.href would be in the browser); otherwise the request will result in a 403 error.
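To reuse this for other links, the same idea can be wrapped in two small functions. This is only a sketch under a few assumptions: the site is served over plain HTTP as in the script above, any realistic browser User-Agent string is accepted, and the names build_download_request and resolve_download_url are my own, not anything the site or a library provides.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Assumption: any realistic browser User-Agent is accepted by the site.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) "
              "Gecko/20100101 Firefox/54.0")


def build_download_request(page_url, href):
    """Return (host, path, headers) for a download link found on page_url."""
    target = urlsplit(urljoin(page_url, href))  # resolve the relative href
    headers = {
        "User-Agent": USER_AGENT,
        "Referer": page_url,  # the page that contains the link
    }
    return target.netloc, target.path, headers


def resolve_download_url(page_url, href):
    """Request the link without following the redirect.

    Returns the Location header, or None if the server did not redirect
    (for example a 403 when a required header is missing).
    """
    host, path, headers = build_download_request(page_url, href)
    conn = http.client.HTTPConnection(host, timeout=10)  # plain HTTP assumed
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        conn.close()
```

Calling resolve_download_url("http://top-android.org/games/1556-pou-tamago4i/", "/games/1556-pou-tamago4i/download-filelocal-32318") should then behave like the script above, with the href taken straight from the a tag you scraped.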
EDIT
As rupps's comment explains, that URL contains base64-encoded JSON and binary data. In my case, the JSON data contains "is_authenticated":false:
{
"data": {
"info": "Windows:Firefox|xxxx:xxx:xxxx:xxxx:xxxx:xxx:xxxx:xxxx",
"data": {
"is_authenticated": false,
"filename": "applications/pou-tamago4i-1.4.66.apk"
}
}
}
which downloads the file as well; it just sometimes fails with a strange 404 error. If I keep hitting the download button, it does download the apk file!
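If you want to inspect that JSON yourself, a short sketch like the following should do it. It assumes the a parameter is a colon-separated token whose first segment is URL-safe base64 with the padding stripped, which is what the URL above looks like; the remaining segments are left untouched, and decode_download_payload is just a name I made up.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# The redirect URL printed by the script above.
REDIRECT_URL = (
    "http://dl3.top-android.org/?a="
    "eyJkYXRhIjp7ImluZm8iOiJXaW5kb3dzOkZpcmVmb3h8MmEwMjo1ODc6OWMyNzplNDAw"
    "Ojg0OGQ6MmNhOmE4Mjk6MjExNyIsImRhdGEiOnsiaXNfYXV0aGVudGljYXRlZCI6ZmFs"
    "c2UsImZpbGVuYW1lIjoiYXBwbGljYXRpb25zL3BvdS10YW1hZ280aS0xLjQuNjYuYXBr"
    "In19fQ"
    "%3A1dgoCA%3AFsWjvbE-s3Mqe9tZNS2CAbfUinw%3A1dgoCA%3A6rJ8th0GeOHsVtKeAPpnwNfqUa0"
)


def decode_download_payload(url):
    """Return the JSON object carried in the first segment of the 'a' parameter."""
    token = parse_qs(urlsplit(url).query)["a"][0]  # parse_qs also turns %3A into ':'
    payload = token.split(":", 1)[0]               # assumption: the JSON part comes first
    payload += "=" * (-len(payload) % 4)           # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


info = decode_download_payload(REDIRECT_URL)
print(json.dumps(info, indent=2))
```

The other colon-separated parts are presumably the binary data rupps mentions; I did not try to decode them.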