I have created a very simple web crawler in PHP, where I crawl some soccer sites for match results.
But each page takes about 0.5 to 1 second to crawl, so if I have a lot of URLs to crawl, the whole run takes a long time.
This is how my code starts crawling a site:
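// Create an empty DOM document to hold the parsed page.
$doc = new DOMDocument();
// Download the page and parse it in one blocking call.
$doc->loadHTMLFile("http://resultater.dai-sport.dk/tms/Turneringer-og-resultater/Pulje-Stilling.aspx?PuljeId=229");
// Wrap the document so it can be queried with XPath expressions.
$xpath = new DOMXPath($doc);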
I wrote the crawler myself, so maybe there is a better or quicker way to do this? Or maybe my expectations about the speed are too high?
Take a look at this library for an asynchronous implementation of your crawler. It uses "yield" (generators), which was introduced in PHP 5.5: https://github.com/icicleio/Icicle
You will find usage examples among the library's examples.
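If adding a library is not an option, a similar effect can be sketched with PHP's built-in curl_multi functions, which download several pages at the same time so the total wait is closer to the slowest single request than to the sum of all of them. This is not the Icicle approach, just a minimal sketch of the same idea. The function names fetchAll() and parseHtml() and the 10 second timeout are assumptions made up for this example.

```php
<?php
// Hypothetical helper (not part of any library): download all URLs
// concurrently and return the response bodies, keyed like the input array.
// A failed request (no HTTP 200) is reported as null.
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for network activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $ok = curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200;
        $body = curl_multi_getcontent($ch);
        $bodies[$key] = ($ok && is_string($body)) ? $body : null;
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $bodies;
}

// Hypothetical helper: parse an already downloaded, non-empty HTML string
// and return an XPath object for querying it.
function parseHtml(string $html): DOMXPath
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect markup warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}
```

To use it, pass the list of result-page URLs to fetchAll(), skip the entries that come back as null, and hand each remaining body to parseHtml() to get the same kind of $xpath object as in the question. Parsing still happens one page at a time, but the downloads, which usually dominate the 0.5 to 1 second per page, overlap.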