
Scraping Ajax using Python

I am trying to get the data in the table on this website, which is filled in via jQuery after the page loads (I have permission):

http://whichchart.com/

I currently use Selenium and BeautifulSoup to get data, but because this data is not present in the HTML source, I can't access it. I have tried PyQt4, but it likewise does not return the updated HTML source.

The values are visible in Firebug and Chrome Developer Tools, so are there any Python packages that can take advantage of this and feed the result to BeautifulSoup?

I'm not a massive techie, so ideally I would like a solution that works in Python or the next easiest tool.

I'm aware I can get it via proprietary "screen-scraper" software, but that is expensive.

eamon1234 asked May 26 '26 15:05



1 Answer

The page makes an AJAX call to http://whichchart.com/service.php?action=NewcastleCoal to get its data, and that URL returns the values as JSON. So you can do the following:

  • Fetch the data over HTTP with urllib
  • Parse the response with the json library's loads function
  • You now have a Python object to process
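
The steps above could be sketched roughly as follows, using only the Python 3 standard library. The helper names fetch_json and decode_json, the timeout value, and the UTF-8 fallback are assumptions made for illustration; the answer does not show the structure of the JSON the service returns, so the code does not assume any particular keys.

```python
import json
import urllib.request

# Endpoint named in the answer above. The shape of the JSON it returns
# is not shown there, so nothing below depends on specific keys.
SERVICE_URL = "http://whichchart.com/service.php?action=NewcastleCoal"


def decode_json(raw, charset="utf-8"):
    """Turn raw response bytes into Python objects (dicts, lists, ...)."""
    return json.loads(raw.decode(charset))


def fetch_json(url, timeout=10):
    """GET the URL and parse the response body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return decode_json(response.read(), charset)
```

Calling fetch_json(SERVICE_URL) should then give you an ordinary dict or list that you can loop over directly, with no need for Selenium or an HTML parser for this particular table.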

If you need to process HTML page content, I would suggest using a library like BeautifulSoup or Scrapy.

Maksym Kozlenko answered May 28 '26 05:05



