
February 23, 2009

Ask Is Going “Canonical”!

Many web sites serve the same content at several different URLs, usually without meaning to. For example, the following URLs usually lead to the same page (try it out):

1) http://www.mywebsearch.com
2) http://mywebsearch.com/
3) http://search.mywebsearch.com/mywebsearch/default.jhtml

Of course, sometimes web site designers need the URL to differ even when the content is the same. In e-commerce, for example, it is common to add a dynamic string called a "session ID" to the URL so that one user can be told apart from another for session tracking.
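If a site knows which query parameter carries its session ID, it can strip that parameter to get back one stable URL per page. The Python sketch below does this with the standard `urllib.parse` module. The parameter names `sessionid` and `sid`, and the `shop.example.com` URLs, are hypothetical; real sites use many different names, and a search engine generally cannot know which parameter on a given site is the session ID.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; every site chooses its own.
SESSION_PARAMS = {"sessionid", "sid"}

def strip_session_id(url: str) -> str:
    """Remove known session-ID query parameters, keeping all others in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

# Two shoppers, two session IDs, one product page.
print(strip_session_id("http://shop.example.com/item?id=42&sessionid=abc123"))
print(strip_session_id("http://shop.example.com/item?id=42&sessionid=xyz789"))
```

Both calls print `http://shop.example.com/item?id=42`.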

From a search engine's perspective, however, these are all different URLs, and each of them may be indexed. The result is a glut of duplicate content in the search index: the same web page under different URLs.
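To see why, here is a minimal Python sketch of generic, site-agnostic URL normalization, assuming a simple scheme: lowercase the scheme and host, treat an empty path as "/", and drop the fragment. Even after this normalization, the three example URLs listed earlier remain three distinct keys, because nothing in the URLs themselves says they are the same page. The `normalize` function is illustrative only, not how Ask or any other engine actually keys its index.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Generic normalization that knows nothing about any particular site."""
    parts = urlsplit(url)                 # urlsplit already lowercases the scheme
    host = parts.netloc.lower()           # host names are case-insensitive
    path = parts.path or "/"              # "http://host" and "http://host/" match
    # Fragments are never sent to the server, so they are dropped.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))

urls = [
    "http://www.mywebsearch.com",
    "http://mywebsearch.com/",
    "http://search.mywebsearch.com/mywebsearch/default.jhtml",
]
keys = {normalize(u) for u in urls}
print(len(keys))  # 3: still three separate index entries
```

Merging `www.mywebsearch.com` with `mywebsearch.com`, or either with the `default.jhtml` path, would need site-specific knowledge that a generic rule does not have.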

At Ask, we employ sophisticated, proprietary algorithms to detect duplicate content so that we can present searchers with as many useful results as possible. However, because each search engine uses its own proprietary algorithms, different engines may present different forms of the same URL to users, which leads to inconsistency.
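Ask's duplicate-detection algorithms are proprietary and are not described here. As a much simpler stand-in, the Python sketch below groups pages whose HTML is exactly identical by hashing it, then shows how two hypothetical engines with different tie-breaking rules (`pick_shortest` and `pick_www_first`, both invented for illustration) can choose different URLs from the same duplicate group. Exact-match hashing misses near-duplicates (pages that differ only in an ad or a timestamp), which is one reason real systems are far more elaborate.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit

def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose page bodies are byte-for-byte identical."""
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values()]

def pick_shortest(urls: list[str]) -> str:
    # Hypothetical engine A: prefer the shortest URL.
    return min(urls, key=len)

def pick_www_first(urls: list[str]) -> str:
    # Hypothetical engine B: prefer a "www." host, else the shortest URL.
    www = [u for u in urls if (urlsplit(u).hostname or "").startswith("www.")]
    return min(www or urls, key=len)

page = "<html><body>My Web Search</body></html>"
pages = {
    "http://www.mywebsearch.com": page,
    "http://mywebsearch.com/": page,
    "http://search.mywebsearch.com/mywebsearch/default.jhtml": page,
}
for group in group_duplicates(pages):
    print(pick_shortest(group), "vs", pick_www_first(group))
```

Both engines agree that the three URLs are one page, yet they show users `http://mywebsearch.com/` and `http://www.mywebsearch.com` respectively.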

To help meet this challenge, Ask is announcing today that we are joining forces with other major search engines to support a feature called the "canonical URL tag". This feature gives webmasters a way to set a "preferred" name (URL) for a page.

For example, a webmaster would include a tag such as this one (remembering the final '/' that closes the tag)…

<link rel="canonical" href="http://www.mywebsearch.com"/>

...into the <head> section of http://mywebsearch.com/ and http://search.mywebsearch.com/mywebsearch/default.jhtml.
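On the crawler side, reading this hint means finding a `<link>` element whose `rel` attribute includes `canonical` inside the page's `<head>`. The Python sketch below uses the standard-library `html.parser` module. The class and function names are hypothetical, and the sketch only recognizes an explicit `<head>` element; real-world HTML often calls for a more forgiving parser.

```python
from html.parser import HTMLParser
from typing import Optional

class CanonicalFinder(HTMLParser):
    """Records the href of the first rel="canonical" link inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            a = dict(attrs)
            rels = (a.get("rel") or "").lower().split()  # rel may hold several tokens
            if "canonical" in rels and a.get("href"):
                self.canonical = a["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical

html = ('<html><head><link rel="canonical" href="http://www.mywebsearch.com"/>'
        '</head><body>...</body></html>')
print(find_canonical(html))  # http://www.mywebsearch.com
```

A canonical link that appears in the page body rather than the head is ignored by this sketch.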

Ask will use this information to combine the characteristics of all three pages into one, and will attempt to use the URL in the canonical tag as the 'preferred' name for the page.

While the webmaster's preference will be strongly honored, other signals may also be used to determine the final form when multiple forms exist in the index. If the canonical URL returns a 404 error during crawling, or if it points to a URL on a different domain, the canonical tag will be ignored.
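The two rules above (ignore a canonical target that returns a 404, ignore one on a different domain) can be sketched as a small decision function. This is a hypothetical Python sketch, not Ask's actual logic. `fetch_status` stands in for the crawler's fetch result; when the hint is ignored, the sketch simply falls back to the page's own URL instead of weighing the other signals mentioned above. Because the example in this post points `search.mywebsearch.com` at `www.mywebsearch.com`, the sketch assumes "domain" means the registrable domain rather than the exact host, and `naive_domain` approximates that as the last two host labels, which is wrong for suffixes such as `.co.uk` (a real implementation would consult the Public Suffix List).

```python
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

def naive_domain(url: str) -> str:
    # Simplification: the last two labels of the host name.
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def preferred_url(page_url: str, canonical_href: Optional[str],
                  fetch_status: Callable[[str], int]) -> str:
    """Return the URL to index the page under, honoring the hint if it passes both checks."""
    if not canonical_href:
        return page_url
    target = urljoin(page_url, canonical_href)   # resolve relative hrefs
    if naive_domain(target) != naive_domain(page_url):
        return page_url                          # cross-domain hint: ignored
    if fetch_status(target) == 404:
        return page_url                          # target missing when crawled: ignored
    return target

print(preferred_url("http://search.mywebsearch.com/mywebsearch/default.jhtml",
                    "http://www.mywebsearch.com", lambda url: 200))
```

With a 200 response for the target, this prints `http://www.mywebsearch.com`; with a 404, or with a target on another domain, it returns the page's own URL.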

The "canonical" feature represents a timely, relevant, and positive partnership between major search engines. It is a step toward more consistent treatment of duplicates across all of the engines. It will also give site designers more control over how their sites are represented in the search indexes.

At Ask, we look forward to the adoption of "canonical" by more and more websites, and to working in conjunction with our search partners. It's an idea whose time has come, and we're excited to be a part of the effort to improve the user experience on the web!

Thanks for reading - and please post a note here if you have any questions at all.

Yufan Hu - VP, Core Search Systems, Ask.com

Posted by Ask.com Blog


Opinions expressed here and in any corresponding comments are the personal opinions of the original authors, not of IAC Search & Media and may not have been reviewed in advance.
