A few months ago at SemTech 2009 we announced that our questions and answers database – launched almost a year ago – had grown to more than 300 million high-quality Q&A pairs. "High-quality" means that we use our semantic and extraction capabilities to recognize the best answer from within the sea of information on relevant pages. Instead of 10 blue links, we deliver the best answer right at the top of the page.
This week we’ve achieved another significant milestone by reaching 400 million Q&A pairs, and I want to acknowledge the outstanding work of our engineering and product teams who have built one of the largest and most useful Q&A collections on the web.
I also want to share what we’re seeing from our users in response to our Q&A offerings, and to preview what’s next for Ask.
Our Q&A strategy has started to pay off. We see increasing loyalty among users who conduct question searches on Ask. At the same time, we've seen a pronounced increase in the percentage of Ask users who phrase their queries as questions: as a share of total queries, questions are now three times as common on our site as on our competitors' sites. And perhaps most rewarding for us: when we ask Internet users where they go for questions and answers online, they consistently rank Ask.com first, making us the #1 brand for questions and answers online.
Online search in the form of natural-language questions was the ingenious proposition of the original Ask Jeeves in 1996, and frankly, it’s the reason we’re still around today after so many other Internet brands didn’t survive.
After more than a decade as the leader in questions, we're certain of one thing: asking a question isn't the same as searching.
Our users tell us that their expectation when asking a question is different from their expectation when conducting a search. When asking a question, they have a specific need for a specific piece of information. When conducting a search, they’re browsing for information, sorting through results to unearth the answer they’re looking for.
Put another way, when asking a question, you expect the work to be done for you (much like when you ask a librarian for a book at the library). When conducting a search, you do the work yourself (skipping the librarian, and heading to the card catalog instead).
Further, with the advent of the social web, asking questions online is now more natural, as we have the ability to broadcast a question to real people, our friends, instead of hoping a computer can understand our inquiry.
I firmly believe that questions are the future of search, but search technologies as we know them today can’t deliver against this future.
And this brings me to what’s next for Ask.
We're focused on solving the two shortcomings of search as they relate to questions:
1. Traditional search signals don’t work well for answers to questions.
2. The answers to many questions are wrong or don’t exist online.
Let me explain what I mean.
When you're in the business of answering questions, the number of inbound links to a given web page – a long-accepted signal for ranking web sites – doesn't identify the site with the best answer to a user's question; it just tells you the most popular page with relevant information. Nor does another search technique, text matching, reliably identify the best answer, because the words in a question rarely appear in the best answer. The same goes for a newer but well-established technique, one actually pioneered at Ask, that uses click-through behavior to determine a site's relevance. Unlike a text snippet that merely describes a site alongside a link, presenting the actual answer requires no click-through to the site, so there is no click behavior to measure.
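To make the text-matching point concrete, here is a toy sketch. It is not how Ask ranks answers; the question, the two candidate passages, and the choice of Jaccard word overlap as the "text matching" score are all hypothetical, picked only for illustration. A page that merely echoes the question outscores the passage that actually answers it.

```python
import re

# Hypothetical question and two hypothetical candidate passages.
QUESTION = "How long should I boil an egg for a soft yolk?"
ANSWER = "Six minutes in simmering water gives a runny centre and a set white."
ECHO = "How long should I boil an egg? Readers ask how long to boil an egg all the time."


def tokens(text: str) -> set[str]:
    """Lowercased word tokens, with punctuation ignored."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(question: str, passage: str) -> float:
    """Jaccard similarity of the two word sets: a crude stand-in for text matching."""
    q, p = tokens(question), tokens(passage)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


if __name__ == "__main__":
    print(f"answer: {overlap(QUESTION, ANSWER):.2f}")  # answer: 0.05
    print(f"echo:   {overlap(QUESTION, ECHO):.2f}")    # echo:   0.41
```

The passage that answers the question shares only the word "a" with it, while the page that repeats the question shares seven words, so a ranking built on term overlap would favor the page with no answer at all.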
More importantly, no method that merely extracts answers from a published web page will ever be able to access the limitless number of answers that are unpublished on the Internet. Indeed, the information that is directly relevant to many questions most certainly exists; it’s just that it’s locked in people’s heads or captured in unpublished conversations, and therefore inaccessible by traditional search. Obviously, this is not a trivial deficiency in a world that is increasingly interconnected and clamoring for perspective, guidance, and shared knowledge at an interpersonal level online.
At Ask.com, we’re dedicating ourselves to solving these problems and we’re approaching the solution in two primary ways:
1. Extracting and ranking existing answers
2. Indexing sources of answers that have not yet been published
To extract and rank existing answers, as opposed to merely ranking web pages that contain information, we have developed, and are continuing to develop, a unique set of algorithms and technologies based on new relevance signals tuned specifically to questions and answers.
I’ve outlined a few of these below.
Here at Ask, we're focused on building a new Q&A relevance algorithm that draws on these signals, honing our ability to extract answers from the published Internet and allowing us to answer a vastly larger volume of questions than existing search technologies can.
But our work doesn't end with extracting and ranking existing, published answers. Where our vision really comes to life is in our efforts to index the sources of unpublished knowledge that can generate answers specifically in response to a question, at the moment it's asked. This is the long tail of questions that are nearly impossible for search engines to answer, but that create incredible value for users when they are answered.
Here are some examples:
As we accelerate our strategy to answer the world’s questions, these “tough questions” are where we see huge opportunity, and where we are also focusing our efforts. And as you’ve probably guessed by now, we will do this unconventionally, harnessing the equity of the Ask brand, and our loyal, question-loving users to build a community of answerers available through Ask.
We’ve learned at Ask that while the existing Web can solve many problems, when you’re in the pursuit of answering questions, relying on published information sources can really only get you part of the way there. There is an infinite volume of answers in people’s heads that isn’t being indexed by the search engines today, and that can’t be successfully deployed against questions until you unleash it, in real-time, in response to the unique needs expressed by the person asking the question.
This is the problem we're in the process of solving here at Ask: connecting our users' questions to the best possible answers on the planet, be they published or unpublished. And as we solve this problem, we believe the multi-billion-dollar value proposition of questions and answers will one day transcend search as we know it today.
I’m very passionate about this, and so is our team at Ask.com. You’ll be hearing much more from us on this in the coming months.