Wednesday, April 20, 2011

Robots.txt prevents SharePoint 2010 search crawler from crawling sites

At our operations manager's request, I added a robots.txt to our preview server to prevent Google's and Bing's bots from indexing our clients' preview sites.

Original robots.txt:
User-agent: *
Disallow: /

This should stop every well-behaved bot from indexing the preview sites. However, it also stopped SharePoint 2010's own search crawler from crawling them: every full crawl returned 0 results.

The solution is to allow SharePoint 2010's search crawler while still disallowing every other bot. So how do we find the user agent string that SharePoint 2010's crawler uses?
If you open Registry Editor and navigate to:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager\UserAgent
you will find the user agent string used by SharePoint 2010's crawler.
So all I had to do was edit my robots.txt to allow that particular user agent to index our SharePoint 2010 sites.
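
If you'd rather read that value programmatically than click through Registry Editor, here is a minimal sketch using Python's standard winreg module, assuming you run it on the SharePoint server itself (the key path is the one above):

import winreg

KEY_PATH = r"SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager"

# Open the key under HKEY_LOCAL_MACHINE and read the UserAgent value.
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
    user_agent, _type = winreg.QueryValueEx(key, "UserAgent")

print(user_agent)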

Updated robots.txt:
User-agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)
Allow: /
User-agent: *
Disallow: /
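
To sanity-check a file like this, it helps to remember the common convention most crawlers follow: they obey the first User-agent group whose token is "*" or a case-insensitive substring of their own user agent string. The sketch below is only an illustration of that convention, not SharePoint's actual matching code, and may_crawl and RULES are made-up names:

# robots.txt groups from above, most specific first: (token, allowed)
RULES = [
    ("Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)", True),
    ("*", False),
]

def may_crawl(user_agent):
    # Obey the first group whose token matches this crawler.
    for token, allowed in RULES:
        if token == "*" or token.lower() in user_agent.lower():
            return allowed
    return True  # no matching group means crawling is allowed by default

print(may_crawl("Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)"))  # True
print(may_crawl("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # False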

Hope this helps :)

3 comments:

  1. How do you add a robots.txt file to SharePoint 2010?

  2. Thanks for sharing your info. I really appreciate your efforts and I will be waiting for your further write-ups. Thanks once again.

  3. Hi Shobs,

    Sorry about the delay. I didn't realise that you asked a question.

    It's easy: all you have to do is add it to the root folder of your site. So it will be something like:

    C:\inetpub\wwwroot\wss\VirtualDirectories\yoursite\robots.txt.
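
    If you want to confirm IIS is actually serving the file, here's a quick check (a sketch assuming "yoursite" is the same placeholder as in the path above and that the site answers over HTTP):

    import urllib.request

    # Fetch the robots.txt the crawler would see and print it.
    print(urllib.request.urlopen("http://yoursite/robots.txt").read().decode())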

    Hi SEO,

    Thanks for your comment.

    Cheers,
    Evan
