robots.txt transformations

Available in SCORE 2.1

Web site owners use the /robots.txt file to give instructions about their site to web robots. According to the specification, the robots.txt file must be in the top-level directory of the host, accessible through the appropriate protocol and port number. A static robots.txt file works fine if Sitecore hosts only one site, but a multi-tenant setup may require different rules for every site. This topic describes how you can achieve that with SCORE.

There are several steps to add dynamic behavior to the /robots.txt file:

  1. Switch the HTTP handler for the site
  2. Create a file template
  3. Configure the site settings

Turning on ASP.NET HTTP handler for /robots.txt

By default, IIS serves all *.txt files with the StaticFileHandler HTTP handler, which loads the file from disk and returns its content to the caller. To generate tenant-specific content, we need to register a custom HTTP handler instead.
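
To make the idea of tenant-specific content concrete, here is a minimal sketch in Python, used purely for illustration. It is not SCORE's RobotsTxtHandler, which is a .NET handler whose internals are not shown in this topic. The host names, the rule sets, and the robots_for_host function are all hypothetical.

```python
# Hypothetical per-tenant rules keyed by host name; not SCORE's data model.
ROBOTS_BY_HOST = {
    "www.tenant-a.example": "User-agent: *\nDisallow: /private/\n",
    "www.tenant-b.example": "User-agent: *\nDisallow:\n",
}

# Fallback body served when the requested host is not a known tenant.
DEFAULT_ROBOTS = "User-agent: *\nDisallow: /\n"


def robots_for_host(host: str) -> str:
    """Return the robots.txt body for the requested host name."""
    # Host names are case-insensitive and may carry a port suffix.
    # IPv6 literals are not handled in this sketch.
    name = host.strip().lower().split(":", 1)[0]
    return ROBOTS_BY_HOST.get(name, DEFAULT_ROBOTS)
```

A static file cannot vary its answer like this, which is why the request has to be routed to a handler that knows which site is being served.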

Add the item below to the end of the <customHandlers> section. This section is in the web.config file for Sitecore versions before 8.1 and in App_Config/Sitecore.config as of Sitecore 8.1.

<customHandlers>
   ...
   <handler trigger="robots.txt" handler="robots.ashx" />
</customHandlers>

Add the item below to the end of the <handlers> section:

<handlers>
   ...
   <add name="Score.Robots.Txt" verb="GET" path="robots.ashx" 
        type="Score.Custom.HttpHandlers.RobotsTxtHandler, Score.Custom" preCondition="runtimeVersionv4.0" />
</handlers>


Option #1 - Creating a robots.txt as a site setting content item

RobotsTxtHandler loads the robots.txt content from a special settings item located within the website root folder.

Step 1 - Go to Site settings and add an item based on the template /sitecore/templates/Score/Base/robotstxt File. Edit the file content in the Robots File Template field as necessary.

Step 2 - Add a robotsTxtItem attribute to the <site> definition in Sitecore:

<site patch:before="site[@name='website']" name="score" virtualFolder="/" physicalFolder="/"
 rootPath="/sitecore/content/scoretest" robotsTxtItem="/Settings Folder/robotstxt File"
 startItem="/home" database="web" domain="extranet"/> 

The robotsTxtItem setting specifies a path relative to the rootPath specified for the site.
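
As an illustration of how the relative path combines with rootPath, the sketch below joins the two values from the example site definition. It is written in Python for brevity and only mirrors the rule stated above. The function name resolve_robots_item_path is hypothetical and is not part of SCORE.

```python
def resolve_robots_item_path(root_path: str, robots_txt_item: str) -> str:
    """Join a site's rootPath with its relative robotsTxtItem path."""
    # Trim the slashes at the seam so exactly one separator remains.
    return root_path.rstrip("/") + "/" + robots_txt_item.lstrip("/")


full_path = resolve_robots_item_path(
    "/sitecore/content/scoretest",      # rootPath from the example above
    "/Settings Folder/robotstxt File",  # robotsTxtItem from the example above
)
```

With the values from the example, full_path is /sitecore/content/scoretest/Settings Folder/robotstxt File.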

Option #2 - Adding a file for the tenant as a robots.txt

Alternatively, you can create a tenant-specific file on the server's file system and "map" its filename so that it is returned in response to the robots.txt request.

Step 1 - Create a File

Name the file something other than robots.txt, for example tenantname-robots.txt, and save it to the root Website folder or to one of its subdirectories.

Step 2 - Add a robotsTxtFilename attribute to the <site> definition in Sitecore:

<site patch:before="site[@name='website']" name="score" virtualFolder="/" physicalFolder="/"
 rootPath="/sitecore/content/scoretest" robotsTxtFilename="myrobots\score-robots.txt"
 startItem="/home" database="web" domain="extranet"/> 
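
The sketch below shows one way such a filename could be mapped to a physical path. It assumes that robotsTxtFilename is interpreted relative to the Website folder, which the steps above imply but do not state outright. The folder C:\inetpub\wwwroot\Website and the function resolve_robots_file are hypothetical, and Python is used only for illustration.

```python
from pathlib import PureWindowsPath


def resolve_robots_file(website_root: str, robots_txt_filename: str) -> PureWindowsPath:
    """Map a relative robotsTxtFilename to a path under the Website folder."""
    relative = PureWindowsPath(robots_txt_filename)
    # Refuse rooted or drive-qualified names and parent-directory hops,
    # so the result cannot point outside the Website folder.
    if relative.drive or relative.root or ".." in relative.parts:
        raise ValueError(f"not a safe relative path: {robots_txt_filename!r}")
    return PureWindowsPath(website_root) / relative


path = resolve_robots_file(r"C:\inetpub\wwwroot\Website", r"myrobots\score-robots.txt")
```

The guard against rooted paths and ".." segments is a design choice of this sketch, not documented SCORE behavior.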

Template variables

At this time RobotsTxtHandler supports only one variable.

The <sitemap> text in the file template is replaced with the full URL of the site's sitemap.

The <sitemap> value is generated using the following pattern:

 xmlSitemapFilename = $"{scheme}://{targetHostName}/{xmlSitemapFilename}.gz";

All values are taken from the corresponding site attributes.

The sitemap URL assumes that the CompressXmlSitemapFiles compression processor is used and that the file is saved with the *.gz extension.
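
The substitution described in this section can be sketched as follows. This is an illustrative Python version of the pattern, not the handler's actual code. The function names and the sample values (https, www.example.com, sitemap.xml) are assumptions chosen for the example.

```python
SITEMAP_TOKEN = "<sitemap>"


def build_sitemap_url(scheme: str, target_host_name: str, xml_sitemap_filename: str) -> str:
    """Build the sitemap URL following the pattern shown above."""
    return f"{scheme}://{target_host_name}/{xml_sitemap_filename}.gz"


def render_robots_txt(template: str, scheme: str, target_host_name: str,
                      xml_sitemap_filename: str) -> str:
    """Replace every <sitemap> token in the template with the sitemap URL."""
    url = build_sitemap_url(scheme, target_host_name, xml_sitemap_filename)
    return template.replace(SITEMAP_TOKEN, url)


# Hypothetical template text and site attribute values.
template = "User-agent: *\nDisallow: /sitecore/\nSitemap: <sitemap>\n"
rendered = render_robots_txt(template, "https", "www.example.com", "sitemap.xml")
```

With these sample values, the last line of the rendered file reads Sitemap: https://www.example.com/sitemap.xml.gz.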