PHP web scraping

I'm using PHP for web scraping, and I want to get the price (3.65) for Sunday from the HTML code below:

     <tr class="odd">
       <td >
           <b>Sunday</b> Info
           <div class="test">test</div>
       </td>
       <td>
       &euro; 3.65 *

       </td>
    </tr>

But I can't find the right regex for this. This is the PHP code I'm using:

    <?php
        $data = file_get_contents('http://www.test.com/');

        preg_match('/<tr class="odd"><td ><b>Sunday</b> Info<div class="test">test<\/div><\/td><td>&euro; (.*) *<\/td><\/tr>/i', $data, $matches);
        $result = $matches[1];
    ?>

But I get no result. What's wrong with the regex? (I suspect it's the newlines/spaces.)

asked Jun 08 '26 10:06 by francisMi



1 Answer

Don't use regular expressions to parse HTML; HTML is not a regular language.

Instead, use a DOM tree parser such as PHP's DOMDocument. Its documentation in the PHP manual may help you.
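A minimal sketch of the DOM approach, assuming the row sits inside a table on the real page and that the price is the first number in the row's second cell. The helper name getSundayPrice and the XPath expression are illustrative, not taken from the original. DOMXPath selects the cell by structure instead of by exact whitespace, so the newlines and indentation in the page no longer matter:

```php
<?php
// Sketch: read the Sunday price with DOMDocument + DOMXPath instead of a regex.
// Assumes (hypothetically) the markup from the question: the first <td> of the
// row contains <b>Sunday</b>, and the second <td> holds "&euro; 3.65 *".
function getSundayPrice(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is often not perfectly valid; collect libxml warnings
    // internally instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Second cell of the row whose first cell has a <b> reading "Sunday".
    $cells = $xpath->query('//tr[td[1]/b[normalize-space(.)="Sunday"]]/td[2]');
    if ($cells === false || $cells->length === 0) {
        return null;
    }

    // textContent has tags stripped and entities decoded; keep the first number.
    if (preg_match('/\d+(?:[.,]\d+)?/', $cells->item(0)->textContent, $m)) {
        return $m[0];
    }
    return null;
}
```

In the question's script this would be called as getSundayPrice(file_get_contents('http://www.test.com/')), and for markup like the snippet above it should return "3.65". The row is matched by its content rather than by class="odd", since other days may well use a different row class.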

The /s modifier (which lets the dot metacharacter also match newlines) might help with your original regex, though I haven't tried it.
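For comparison, if you do stay with a regex: /s only changes what the dot matches, and the question's pattern fails mainly because its literal tags leave no room for the indentation and line breaks between them. Also note that a bare * is a quantifier (" *" means "zero or more spaces"), so a literal asterisk needs \*. A sketch that keeps the question's pattern but allows whitespace with \s*; the function name getSundayPriceRegex is hypothetical, and this remains brittle compared with the DOM approach:

```php
<?php
// Sketch: the question's pattern with \s* wherever the page has whitespace or
// newlines between tags, and the literal asterisk escaped as \*.
// Still brittle: any change to attributes, text, or tag order breaks it.
function getSundayPriceRegex(string $html): ?string
{
    // ~ as delimiter avoids escaping every "/" in closing tags.
    $pattern = '~<tr class="odd">\s*<td\s*>\s*<b>Sunday</b>\s*Info\s*'
             . '<div class="test">test</div>\s*</td>\s*<td>\s*'
             . '&euro;\s*([\d.]+)\s*\*\s*</td>\s*</tr>~i';
    // $m[1] holds the captured price when the pattern matches.
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}
```

No /s is needed here: \s already matches newlines, and the pattern contains no dot that has to cross a line break.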

answered Jun 11 '26 00:06 by Martin
