
Parsing a site with dynamic content

I'm using Nokogiri to parse TechCrunch with a specific search term.

http://techcrunch.com/search/education#stq=education&stp=1

The problem is that the site takes a few seconds to return the list of results for the search term, so the HTML that Nokogiri retrieves from that URL contains none of the relevant content.

The content appears to load dynamically after a couple of seconds; I'm guessing via JavaScript. Any ideas on how to retrieve the HTML after a slight delay?

Jonathan_W asked Nov 18 '25 20:11


1 Answer

Use Ruby's sleep method:

seconds_to_delay = 5
sleep seconds_to_delay

Edit 1: Dealing with divs that load some time after the document finishes loading

I hate this scenario. I had to deal with exactly the same situation, so here's how I solved it. You need to use something like the selenium-webdriver gem, which drives a real browser so the page's JavaScript actually runs.

require 'selenium-webdriver'
url = "http://techcrunch.com/search/education#stq=education&stp=1"

css_selector = ".tab-panel.active"

driver = Selenium::WebDriver.for :firefox
driver.get(url)
driver.switch_to.default_content
posts_text = driver.find_element(:css, css_selector).text
puts posts_text
driver.quit
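
Because the div appears some time after the document loads, a lookup made immediately after driver.get can run before the content exists. The following is a minimal sketch of polling for a condition instead of sleeping for a fixed time. The helper name wait_until and its timeout and interval parameters are made up for illustration, and the "page" is simulated with a counter so the snippet runs without a browser.

```ruby
# Poll a block until it returns a truthy value or the timeout expires.
# wait_until, timeout and interval are illustrative names, not part of any gem.
def wait_until(timeout: 10, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Simulated page: the content only becomes available on the third check.
attempts = 0
content = wait_until(timeout: 2, interval: 0.01) do
  attempts += 1
  attempts >= 3 ? "search results" : nil
end
puts content
```

With a real driver, the block could be something like driver.find_elements(:css, css_selector).first, which is nil while nothing matches. The selenium-webdriver gem also ships its own Selenium::WebDriver::Wait class for the same purpose.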

If you are running this on a virtual machine on Heroku, AWS EC2, DigitalOcean or similar, there is typically no display, so you can't launch a regular Firefox window. Instead, you need a headless browser like PhantomJS.

To use PhantomJS instead of Firefox, first install phantomjs on the VM. Then change the driver line to driver = Selenium::WebDriver.for :phantomjs.

You can use this gem that actually installs phantomjs for you.
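
If the same script has to run both on a desktop and on a VM, the choice between :firefox and :phantomjs could be made at runtime. This is only a sketch: browser_for is a hypothetical helper, and treating an empty DISPLAY environment variable as "no graphical session" is an assumption that fits typical Linux setups but is not a general rule.

```ruby
# Hypothetical helper: choose a browser symbol for Selenium::WebDriver.for.
# Assumption: an empty or missing DISPLAY variable means no graphical session.
def browser_for(env = ENV)
  env["DISPLAY"].to_s.empty? ? :phantomjs : :firefox
end

puts browser_for({ "DISPLAY" => ":0" })
puts browser_for({})
```

The result would then be passed along as driver = Selenium::WebDriver.for(browser_for).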


Second edit for question b)

require 'selenium-webdriver'
url = "http://techcrunch.com/search/education#stq=education&stp=1"

css_selector = ".tab-panel.active ul.river-compact.river-search li"

driver = Selenium::WebDriver.for :phantomjs
driver.get(url)
driver.switch_to.default_content
items = driver.find_elements(:css, css_selector)
# Print each item's visible text rather than the element object itself
items.each { |item| puts item.text }
driver.quit

Jason Kim answered Nov 21 '25 09:11


