
Need a good HTML parser for PHP

Tags:

html

parsing

I found http://simplehtmldom.sourceforge.net/, but it failed on my test case.

I tried fetching the page http://php.net/manual/en/function.curl-setopt.php and parsing it to plain HTML; it failed and returned only a partial HTML page.

What I want to do is go to an HTML page and extract its components individually (the contents of all div and p elements, in their hierarchy). I like the features of simplehtmldom, so I need a similar parser that copes with all kinds of markup, from the best to the worst.

asked Oct 15 '25 20:10 by goutham



1 Answer

I often use DOMDocument::loadHTML, which works reasonably well in the general case, and I like querying the documents with XPath once they are loaded as a DOM.
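To make that concrete, here is a minimal sketch of loading markup with DOMDocument::loadHTML and pulling out every div and p with DOMXPath. The helper name `extractDivsAndParagraphs` and the sample markup are made up for illustration; in practice the string would be the body of the fetched page.

```php
<?php
// Hypothetical helper: returns [node path => text content] for every
// <div> and <p> in the document, in document order.
function extractDivsAndParagraphs(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    $result = [];

    // The union expression selects both element types in one query.
    foreach ($xpath->query('//div | //p') as $node) {
        // getNodePath() gives an XPath-style location, which exposes the hierarchy.
        $result[$node->getNodePath()] = trim($node->textContent);
    }

    return $result;
}

// Sample markup, assumed for this example only.
$sample = '<html><body><div><p>First</p><p>Second</p></div></body></html>';

foreach (extractDivsAndParagraphs($sample) as $path => $text) {
    echo $path, ' => ', $text, PHP_EOL;
}
```

Because each key is the node's path in the tree, the nesting of div and p elements can be read off the keys without a separate recursive walk.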

Unfortunately, I suppose that if the HTML page is really too badly formed, some parsing problems can occur. That's when you start to understand that respecting web standards is a great idea.
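For badly formed pages, one common approach is to let libxml collect its parse errors instead of emitting PHP warnings, then inspect them after loading. The sketch below assumes a made-up helper name, `loadLenient`, and a deliberately broken sample string; how many problems get reported depends on the markup and on the libxml version.

```php
<?php
// Hypothetical helper: parses possibly broken HTML and returns the
// document together with whatever problems libxml recorded.
function loadLenient(string $html): array
{
    // Collect parse errors internally rather than raising warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $errors = libxml_get_errors(); // array of LibXMLError objects
    libxml_clear_errors();

    // Restore the previous error-handling mode.
    libxml_use_internal_errors($previous);

    return [$doc, $errors];
}

// Deliberately malformed sample: the <p> and the outer <div> are never closed.
[$doc, $errors] = loadLenient('<div><p>Unclosed paragraph<div>Nested</div>');

foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), PHP_EOL;
}

// The parser still builds a tree, repairing the markup as best it can.
echo $doc->saveHTML();
```

The document returned here can be queried with XPath as usual; the error list is only a hint about how much repair the parser had to do.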

answered Oct 18 '25 09:10 by Pascal MARTIN



