I am currently using simple_html_dom to scrape a website. Everything comes back fine, except that it keeps repeating the same content for every single post it scrapes.
$page = (isset($_GET['p'])&&$_GET['p']!=0) ? (int) $_GET['p'] : '';  
$html = file_get_html('http://screenrant.com/movie-news/'.$page);
foreach($html->find('#site-top > div.site-wrapper > div.top-content > article > section > ul > li > div.info > h2 > a') as $element)
{
    print '<br><br>';
    echo $url = ''.$element->href;
    $html2 = file_get_html($url);
    $image = $html2->find('meta[property=og:image]',0);
    $news['image'] = $image->content;
    #print '<br><br>';
    // Ending The Featured Image
    #site-top > div.site-wrapper > div.top-content > article > section > ul > li:nth-child(2)
    $title = $html2->find('#site-top > div.site-wrapper > div.top-content > article > header.single-header > h1',0);
    $news['title'] = $title->plaintext;
    // Ending the titles
    print '<br>';
    #site-top > div.site-wrapper > div.top-content > article > div
    $articles = $html2->find('#site-top > div.site-wrapper > div.top-content > article > div > p');
    foreach ($articles as $article) {
    #echo "$article->plaintext<p>"; 
    $news['content'] = $news['content'] . $article->plaintext . "<p>";
    }
    print '<pre>';print_r($news);print '</pre>';
    print '<br><br>';
        // mysqli_query($DB,"INSERT INTO `wp_scraped_news` SET
             //                   `hash` = '".$news['title']."',
               //                 `title` = '".$news['title']."',
                 //               `image` = '".$news['image']."',
                   //             `content` = '".$news['content']."'");
         // print '<pre>';print_r($news);print '</pre>';
}
I have no idea where I am going wrong here, but I assume it's one of two things. I have experimented with both, with no luck.
1. I am doing something wrong with how my foreach loops are laid out.
2. The website is changing selectors for each new article.
In both cases I am probably wrong, but I've tinkered with them for about 2 hours now and am at the point of giving up. Any help is much appreciated.
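The problem is that you're not clearing out the old content from $news['content'] on each pass through the outer loop. So when you process the second article, you're appending its content to the content of the first article. The third article appends to this again, and so on. The title and image look correct only because they are assigned with = and therefore overwritten each time, whereas the content is concatenated onto whatever is already there.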
Put
$news['content'] = '';
before
foreach ($articles as $article) {
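To show the effect of the reset in isolation, here is a minimal sketch that runs without simple_html_dom or network access. The $posts array and its strings are hypothetical stand-ins for the scraped pages and for the values of $article->plaintext; only the loop structure mirrors the question's code.

```php
<?php
// Hypothetical stand-in for the scraped pages: each post maps a title
// to the list of paragraph strings that $article->plaintext would supply.
$posts = [
    'Post A' => ['First paragraph of A.', 'Second paragraph of A.'],
    'Post B' => ['Only paragraph of B.'],
];

$results = [];
foreach ($posts as $title => $paragraphs) {
    // Start every post from a fresh array so nothing carries over
    // from the previous iteration of the outer loop.
    $news = ['title' => $title, 'content' => ''];

    foreach ($paragraphs as $paragraph) {
        // Appending is now safe: content was emptied just above.
        $news['content'] .= $paragraph . "<p>";
    }

    $results[] = $news;
}

print_r($results);
```

Re-creating the whole $news array at the top of the outer loop, rather than resetting only the 'content' key, is one possible design choice: it also prevents a stale title or image from a previous post from surviving if a selector matches nothing on the next page.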