Optimizing Drupal search results using robots.txt
Mon, 06/29/2009 - 20:17
You may have a website packed with great information, but if search engine spiders can't crawl it properly, your pages may rank lower in the results listings or not appear at all. A properly configured robots.txt is one key to making sure your site is crawled correctly, and it can help boost your search engine ranking. Some module configurations can lead crawlers to index unwanted pages, index duplicate content, or loop endlessly.
In my experience, exposed Views filters caused a Google Search Appliance to loop over the same content: every combination of filter settings produced a different set of GET parameters, yet all of them returned the same results. The Drupal Calendar module causes a similar problem, because spiders will crawl endlessly through calendar pages many years into the future and the past, even when no nodes are associated with those time periods. By adding a few lines to robots.txt that disallow robots from visiting these endless sections of your site, search crawls finish more quickly, produce better results, and use fewer resources. For example:
User-agent: *
# Disallow all URL variables except for page
Disallow: /*?
Allow: /*?page=
Disallow: /*?page=*&*
Disallow: /*?page=0*
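Adding these lines to the robots.txt file in your site root allows the view's paged results to be indexed while preventing spiders from crawling every combination of exposed filter values. The last rule also blocks ?page=0, which Drupal's pager treats as the first page, so it would only duplicate the unpaged URL.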
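Not every crawler interprets wildcard rules like these the same way. Google's robots.txt documentation and RFC 9309 describe a longest-match rule: when several Allow/Disallow patterns match a URL, the pattern with the most characters wins, and Allow wins a tie. The sketch below assumes those semantics and checks which URLs the rules permit. It does its own matching because Python's standard urllib.robotparser does plain prefix matching and doesn't understand the * wildcard. The /events path and the type filter are hypothetical stand-ins for a view with exposed filters.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    allow: bool
    pattern: str

    def regex(self) -> re.Pattern:
        # Escape the literal text, then restore the two robots.txt wildcards:
        # '*' matches any run of characters, a trailing '$' anchors the end.
        anchored = self.pattern.endswith("$")
        body = self.pattern[:-1] if anchored else self.pattern
        expr = ".*".join(re.escape(part) for part in body.split("*"))
        return re.compile(expr + ("$" if anchored else ""))


def parse_rules(text: str) -> list[Rule]:
    """Collect Allow/Disallow lines, skipping comments and other fields.

    Simplification: assumes the text holds a single user-agent group."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        if field.lower() in ("allow", "disallow") and value:
            rules.append(Rule(field.lower() == "allow", value))
    return rules


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Longest matching pattern wins; on a tie, Allow wins; no match allows."""
    best = None
    for rule in rules:
        if rule.regex().match(path):  # match() anchors at the start of the path
            key = (len(rule.pattern), rule.allow)
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


ROBOTS_TXT = """
User-agent: *
# Disallow all URL variables except for page
Disallow: /*?
Allow: /*?page=
Disallow: /*?page=*&*
Disallow: /*?page=0*
"""

if __name__ == "__main__":
    rules = parse_rules(ROBOTS_TXT)
    # Hypothetical URLs for a view at /events with an exposed "type" filter.
    for path in [
        "/events",
        "/events?page=2",
        "/events?type=concert",
        "/events?page=2&type=concert",
        "/events?type=concert&page=2",
        "/events?page=0",
    ]:
        print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Under longest-match semantics the order of the lines doesn't matter. A crawler that instead applies the first matching rule would stop at Disallow: /*? and block the pager pages as well, so it is worth checking how your target crawler resolves conflicting rules.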