
Any way to scrape a link that redirects?

Is there any way to make Python follow a link such as a bit.ly link and then scrape the page it redirects to? On the page I am scraping, the only link I can extract is one that redirects, and the information I need is located at its destination.

ColeWorld asked Sep 13 '25 06:09



1 Answer

There are three types of redirection:

  • HTTP - sent in the response headers with a 3xx status code (301, 302, etc.)
  • HTML - a <meta> tag in the HTML (wikipedia: Meta refresh)
  • JavaScript - code such as window.location = new_url

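requests follows HTTP redirects automatically and keeps the intermediate responses in r.history; the final URL is in r.url. It does not follow HTML or JavaScript redirects.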

import requests

r = requests.get('http://' + 'bit.ly/english-4-it')

print(r.history)  # intermediate redirect responses, oldest first
print(r.url)      # final URL after all redirects

Result:

[<Response [301]>, <Response [301]>]
http://helion.pl/ksiazki/english-4-it-praktyczny-kurs-jezyka-angielskiego-dla-specjalistow-it-i-nie-tylko-beata-blaszczyk,anginf.htm
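
To see which URL each hop came from, not just the status codes, the history can be flattened into a list. This is a minimal sketch: `redirect_chain` is a hypothetical helper name, and it only assumes the response object exposes `history`, `status_code` and `url`, as a requests response does.

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    # response.history holds the intermediate redirect responses, oldest first
    hops = [(step.status_code, step.url) for step in response.history]
    # the response itself is the last stop in the chain
    hops.append((response.status_code, response.url))
    return hops
```

With the example above, `redirect_chain(r)` would list the two 301 hops followed by the final page.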

BTW: SO doesn't allow bit.ly links in a post, so I built the URL by concatenation.
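
For the other two redirect types, the target has to be pulled out of the page body by hand. The following is only a rough sketch using the standard library: `MetaRefreshParser`, `JS_REDIRECT` and `find_client_side_redirect` are hypothetical names, and the JavaScript pattern covers only the simplest `window.location = "..."` form. Anything computed by a script would need a real browser engine.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content looks like "5; url=http://example.com/next"
        content = attrs.get("content") or ""
        match = re.search(r"url\s*=\s*['\"]?([^'\"]+)", content, re.I)
        if match:
            self.target = match.group(1).strip()


# Only matches a literal string assigned to window.location or window.location.href
JS_REDIRECT = re.compile(r"""window\.location(?:\.href)?\s*=\s*['"]([^'"]+)['"]""")


def find_client_side_redirect(html, base_url):
    """Return the absolute redirect target found in html, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    target = parser.target
    if target is None:
        match = JS_REDIRECT.search(html)
        target = match.group(1) if match else None
    # resolve relative targets against the URL of the page they came from
    return urljoin(base_url, target) if target else None
```

One way to use it: after `r = requests.get(url)`, call `find_client_side_redirect(r.text, r.url)` and, if it returns a URL, request that URL in turn.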

furas answered Sep 14 '25 20:09
