I have made an application that extracts some specific information from a specific website. For that I have used a regular expression that gives me the desired output. Is there a better or more efficient approach than regex for such a simple crawler?
If a simple regex already solves your problem, then no, there is no more efficient solution. When it comes to crawling, the alternative would be to load the entire HTML page into memory as a DOM document and search it using XPath or even XQuery. But really, if the information can be extracted easily with a regex, then don't bother, especially if you are not familiar with XPath.
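To make the two options concrete, here is a minimal sketch that pulls the same values out of one page both ways. The page fragment, the `price` class and the function names are made up for illustration. It also assumes the markup is well-formed, because Python's standard `xml.etree.ElementTree` only parses valid XML and supports a limited XPath subset; real-world HTML usually needs a more lenient parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page fragment (assumption): well-formed so the stdlib XML parser accepts it.
PAGE = """<html><body>
<div class="product"><span class="price">19.99</span></div>
<div class="product"><span class="price">5.49</span></div>
</body></html>"""


def prices_with_regex(html: str) -> list[str]:
    # Scans the raw text directly; no document tree is built.
    return re.findall(r'<span class="price">([^<]+)</span>', html)


def prices_with_xpath(html: str) -> list[str]:
    # Parses the whole document into an in-memory tree, then queries it
    # with the XPath subset that ElementTree supports.
    root = ET.fromstring(html)
    return [el.text or "" for el in root.findall(".//span[@class='price']")]


print(prices_with_regex(PAGE))
print(prices_with_xpath(PAGE))
```

Both functions return the same list for this fragment. The regex version depends on the exact text of the tag, so it breaks if the site reorders attributes or changes whitespace, while the tree version costs a full parse up front.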
The power of XPath shows when you want to make complex searches, and it is more elegant than regex for this task (at least in the W3C's opinion). But if you want a quick solution, you have already found it, and it is more efficient in terms of RAM too, since no document tree has to be built.
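As an illustration of what a "complex search" might look like, the sketch below selects a price only when a sibling element says the item is in stock, which is awkward to express as a single regex. The markup, class names and the `in_stock_prices` helper are hypothetical. It again assumes well-formed markup and sticks to the XPath subset in Python's standard `xml.etree.ElementTree`; full XPath 1.0 would need a third-party library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (assumption for this sketch).
PAGE = """<html><body>
<div class="product"><h2>Widget</h2><span class="price">19.99</span><em>In stock</em></div>
<div class="product"><h2>Gadget</h2><span class="price">5.49</span><em>Sold out</em></div>
<div class="ad"><span class="price">0.99</span><em>In stock</em></div>
</body></html>"""


def in_stock_prices(html: str) -> list[str]:
    root = ET.fromstring(html)
    # Structural query: price spans inside product divs that also
    # contain an <em> child whose text is exactly "In stock".
    query = ".//div[@class='product'][em='In stock']/span[@class='price']"
    return [el.text or "" for el in root.findall(query)]


print(in_stock_prices(PAGE))
```

The query describes the relationship between elements rather than the character sequence around them, which is where a tree-based search tends to pay for its extra memory.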