Wednesday, April 20, 2011

Robots.txt prevents SharePoint 2010 search crawler from crawling sites

At our operations manager's request, I added a robots.txt file to our preview server to prevent Google's and Bing's bots from indexing our clients' preview sites.

Original robots.txt:
User-agent: *
Disallow: /

This should stop all well-behaved bots from indexing the preview sites. However, it also stopped SharePoint 2010's own search crawler from crawling them: every full crawl returned 0 results.

The solution is to allow SharePoint 2010's search crawler while disallowing every other bot. So how do we find the user agent string that SharePoint 2010's crawler uses?
Open Registry Editor and navigate to:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager\UserAgent
There you will find the user agent string used by SharePoint 2010's crawler.
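
If you'd rather not click through Registry Editor, the same value can be read with a small script. The following is just a minimal sketch, assuming Python 3 is available on the crawl server; it queries the UserAgent value under the key above:

import winreg

KEY_PATH = r"SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager"

# Open the Gathering Manager key and read the UserAgent value the crawler sends.
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
    user_agent, _ = winreg.QueryValueEx(key, "UserAgent")

print(user_agent)
# e.g. Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)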
So all I had to do was edit my robots.txt to allow that particular user agent to crawl our SharePoint 2010 sites while keeping every other bot blocked.

Updated robots.txt:
User-agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)
Allow: /

User-agent: *
Disallow: /
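
In case you are wondering why this works: under the original robots.txt convention a crawler obeys the first record whose User-agent line matches it, and falls back to the * record otherwise. The snippet below is only a rough illustration of that idea using the two records above; it is not how SharePoint's crawler is actually implemented, and real bots apply their own matching rules:

SP_UA = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)"

ROBOTS = """User-agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)
Allow: /

User-agent: *
Disallow: /
"""

def first_matching_rule(user_agent, robots):
    # Records are separated by blank lines; the first record whose User-agent
    # matches the crawler wins, otherwise the "*" record applies.
    fallback = None
    for record in robots.strip().split("\n\n"):
        lines = [ln.strip() for ln in record.splitlines() if ln.strip()]
        agents = [ln.split(":", 1)[1].strip() for ln in lines if ln.lower().startswith("user-agent")]
        rules = [ln for ln in lines if not ln.lower().startswith("user-agent")]
        if any(a != "*" and a.lower() in user_agent.lower() for a in agents):
            return rules[0]
        if "*" in agents and fallback is None:
            fallback = rules[0]
    return fallback or "Allow: /"

print(first_matching_rule(SP_UA, ROBOTS))        # Allow: /    -> crawl permitted
print(first_matching_rule("Googlebot", ROBOTS))  # Disallow: / -> crawl blocked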

Hope this helps :)

Tuesday, April 19, 2011

Updating external content type for SharePoint 2010 Business Connectivity Service

Our client submitted a request to update their existing external content type used by Business Connectivity Services (BCS). The request seemed simple: rename the Address field to Department and add one extra field. I thought this would be easy.

So I updated the employee model and redeployed the solution, but for some reason the changes were not reflected in the content type. I tried everything: resetting IIS, cleaning the solution, rebuilding the solution, even copying the DLL directly into the GAC, all without result.

After further investigation I found that the type descriptors (which you can see in BDC Explorer or by opening the .bdcm file in Notepad) are not updated automatically when the model changes.
So that is the lesson: the type descriptors need to be updated manually.
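
If you ever need to make this kind of change outside BDC Explorer, the .bdcm file is plain XML and the TypeDescriptor elements can be edited directly. The sketch below is only an illustration of that manual step, assuming Python 3, a hypothetical model file named Employee.bdcm, and the Address-to-Department rename from our request; it matches elements by local tag name so the exact BDC schema namespace doesn't matter:

import xml.etree.ElementTree as ET

MODEL_FILE = "Employee.bdcm"  # hypothetical file name for this example

tree = ET.parse(MODEL_FILE)
renamed = 0
for element in tree.iter():
    # TypeDescriptor elements carry the field names the external content type exposes;
    # compare only the local part of the tag so the namespace prefix is irrelevant.
    if element.tag.endswith("TypeDescriptor") and element.get("Name") == "Address":
        element.set("Name", "Department")
        renamed += 1

tree.write(MODEL_FILE, xml_declaration=True, encoding="utf-8")
print("Renamed", renamed, "type descriptor(s)")

The extra field's TypeDescriptor would still have to be added, and the entity class returned by your methods has to expose matching property names, so for anything beyond a quick rename BDC Explorer in Visual Studio is probably the safer place to make the change.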