I'm trying to get crawl to work on two separate farms I have but can't get it to work on either one. They both have two WFE's with an additional WFE configured as an Index server. There is one more server dedicated for Query and two clustered SQL 2005 back end servers for the database. I have unsuccessfully tried at least 50 different websites that I found with solutions from a search engine. I have configured (extended) my Web App to use http://servername:12345 as the default zone and http://abc.companyname.com as the custom and intranet zones. When I enter each of those into the content source and then try to run a crawl, I get a couple of errors in the crawl log:
http://servername:12345 returns:
"Could not connect to the server. Please make sure the site is accessible."
http://abc.companyname.com returns:
"Deleted by the gatherer. (The start address or content source that contained this item was deleted and hence this item was deleted.)"
However, I can click both URL's and the page is accessible.
Any ideas?
More info:
I wiped the slate clean, so to speak, and ran another crawl to provide an updated sample.
My content sources are as such:
http://servername:33333
http://sharepoint.portal.fake.com
sps3://servername:33333
My current crawl log errors are:
sps3://servername:33333
Error in PortalCrawl Web Service.
http://servername:33333/mysites
Content for this URL is excluded by the server because a no-index attribute.
http://servername:33333/mysites
Crawled
sts3://servername:33333/contentdbid={62a647a...
Crawled
sts3://servername:33333
Crawled
http://servername:33333
Crawled
http://sharepoint.portal.fake.com
The Crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly.
I double checked for typos above and I don't see any so this should be an accurate reflection.
One thing to remember is that crawling SharePoint sites is different from crawling file shares or non-SharePoint websites.
A few other quick pointers:
Anyway, there's been a lot of back and forth in the responses, so I'm just shotgunning a bunch of suggestions out there, maybe one of them is on target.
I'm a little confused about your farm topology. A machine installed as a just a WFE cannot be an indexer. A machine installed as "complete" can be an indexer, query and/or a wfe...
Also, instead of changing the default content access account, you may want to add a crawl rule instead (once everything is up and running)
Can you see if anything helpful is in the %commonprogramfiles%/microsoft shared/web server extensions/12/logs on your indexer?
The log file may be a bit verbose, you can search for "started" or "full" and that will usually get you to the line in the log where your crawl started.
Also, on your sql machine, you may be able to get more information from the MSScrawlurlhistory table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With