I am interested in doing web crawling. I was looking at Solr. Does Solr do web crawling, or what are the steps to do web crawling?
UiPath is robotic process automation (RPA) software that can be used for free web scraping. It automates web and desktop data crawling for most third-party apps. It runs on Windows, where you can install it. UiPath can extract tabular and pattern-based data across multiple web pages.
Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.
Solr is a search engine at heart, but it is much more than that. It is a NoSQL database with transactional support. It is a document database that offers SQL support and executes it in a distributed manner.
Solr 5+ does in fact now do basic web crawling, through the `bin/post` tool that ships with it: http://lucene.apache.org/solr/
Older Solr versions do not crawl the web on their own: historically, Solr is a search server that provides full-text search capabilities, built on top of Lucene.
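To make the steps concrete, a crawler keeps a queue of URLs, downloads each page, extracts its links, and queues the ones it has not seen yet. The following is a minimal Python sketch of that loop using only the standard library. It is not how Nutch or any other real crawler is implemented, and the names `crawl`, `fetch_url` and `LinkExtractor` are hypothetical. It stays on the start URL's host and ignores robots.txt, politeness delays and retries, all of which a real crawler needs.

```python
# Hypothetical minimal breadth-first crawler sketch (not Nutch's or Solr's code).
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Download one page; no robots.txt check or rate limiting here."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url: str,
          fetch: Callable[[str], str] = fetch_url,
          max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl limited to start_url's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` as a parameter keeps the crawl loop separate from the network code, so it can be exercised against an in-memory dictionary of pages before pointing it at a real site.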
If you need to crawl web pages with a separate project and then feed the results to Solr, you have a number of options, including:
If you want to use the search facilities provided by Lucene or Solr, you'll need to build indexes from the web crawl results.
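A minimal sketch of that indexing step in Python, assuming a Solr instance at `http://localhost:8983` with a hypothetical collection named `webpages` whose schema accepts `id`, `title` and `content` fields (for example, a schemaless collection). It turns a `{url: html}` mapping, such as a crawler's output, into JSON documents and posts them to Solr's `/update` handler with `commit=true`. The helper names are hypothetical, and the text extraction is deliberately crude.

```python
# Hypothetical sketch: turn crawled HTML into Solr documents and index them.
# Assumes a collection "webpages" with id/title/content fields (an assumption).
import json
from html.parser import HTMLParser
from typing import Dict, List
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Pulls the <title> and visible text out of HTML, skipping script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: List[str] = []
        self._in_title = False
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.chunks.append(data.strip())


def build_solr_docs(pages: Dict[str, str]) -> List[dict]:
    """One Solr document per page, using the URL as the unique id."""
    docs = []
    for url, html in pages.items():
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        docs.append({
            "id": url,
            "title": parser.title.strip(),
            "content": " ".join(parser.chunks),
        })
    return docs


def index_in_solr(docs: List[dict],
                  solr_url: str = "http://localhost:8983/solr/webpages") -> dict:
    """POST a JSON array of documents to /update and commit immediately."""
    body = json.dumps(docs).encode("utf-8")
    req = Request(f"{solr_url}/update?commit=true", data=body,
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, `index_in_solr(build_solr_docs(pages))` sends every page in one request and returns Solr's JSON response. Committing on every request is convenient for a small test but slow for large crawls, where batching and less frequent commits are the usual choice.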
See also:
Lucene crawler (it needs to build a Lucene index)
Solr does not, in and of itself, have a web crawling feature.
Nutch is the "de facto" crawler (and then some) for Solr.