December 18, 2007

Google’s Norvig Is Down On Natural Language Search

Erick Schonfeld

Don’t expect to see natural-language search at Google anytime soon. Despite the buzz of startups like Powerset and, to a lesser degree, True Knowledge, Google’s head of research Peter Norvig pooh-poohs the notion that people are clamoring to write full sentences in search boxes. In a Q&A with Technology Review, he says:

We don’t think it’s a big advance to be able to type something as a question as opposed to keywords. Typing “What is the capital of France?” won’t get you better results than typing “capital of France.”

True, true. But he does acknowledge that there is some value in the technology:

We think what’s important about natural language is the mapping of words onto the concepts that users are looking for. . . . To give some examples, “New York” is different from “York,” but “Vegas” is the same as “Las Vegas,” and “Jersey” may or may not be the same as “New Jersey.” That’s a natural-language aspect that we’re focusing on. Most of what we do is at the word and phrase level; we’re not concentrating on the sentence. We think it’s important to get the right results rather than change the interface.

In other words, a natural-language approach is useful on the back-end to create better results, but it does not present a better user experience. Most people are too lazy to type in more than one or two words into a search box anyway. The folks at both Google and Yahoo know that is true for the majority of searchers. The natural-language search startups are going to find out about that the hard way. If Google doesn’t trounce them first.
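
The phrase-level mapping Norvig describes can be illustrated with a small sketch. This is not Google's method: the `CONCEPTS` table and the `map_to_concepts` function are hypothetical names made up for this example, and the entries simply mirror the examples in the quote. It tries the longest phrase first, so "New York" is kept distinct from "York" while "Vegas" resolves to "Las Vegas". "Jersey" is deliberately left out of the table, since Norvig notes it may or may not mean "New Jersey".

```python
# Hypothetical phrase-to-concept table; the entries mirror Norvig's examples
# and are illustrative only, not real search-engine data.
CONCEPTS = {
    ("new", "york"): "New York",
    ("york",): "York",
    ("las", "vegas"): "Las Vegas",
    ("vegas",): "Las Vegas",
    ("new", "jersey"): "New Jersey",
}
MAX_PHRASE_LEN = max(len(key) for key in CONCEPTS)


def map_to_concepts(query):
    """Greedy longest-match mapping of query words onto known concepts."""
    words = query.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so "new york" wins over "york".
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in CONCEPTS:
                out.append(CONCEPTS[key])
                i += n
                break
        else:
            # Unknown (or ambiguous) word: keep it as a plain keyword.
            out.append(words[i])
            i += 1
    return out
```

Calling `map_to_concepts("hotels in Vegas")` and `map_to_concepts("hotels in Las Vegas")` yields the same concept list, which is the point of working at the word and phrase level rather than the sentence level.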

Trackbacks/Pings

  1. Strategy Services @ Vicorp
  2. Powerset’s Dilemma: Go For It, Or Sell
  3. Ok, Now It’s Done. Microsoft To Acquire Powerset
  4. Microsoft Acquired Powerset for US$100 Million (G日报)

Comments

  1. Hustle Strategy

    That and the fact that people seem to have learned to think in google. I learned a long time ago that I needed to search the way the answer would be returned, not search with my question.

    Microsoft’s help has tried to encompass this natural language stuff for a long time; I generally end up going to Google and searching for MSDN and my search terms.

  2. Stephen

    “The natural-language search startups are going to find out about that the hard way. If Google doesn’t trounce them first.”

    As a local search startup, http://www.trustedword.com, thought I’d try to drum up some traffic to our newly launched beta:. Join, list and review service providers you like or dislike.

  3. Cory

    “I think there is a world market for maybe five computers.” - IBM Chairman, 1943

    I neither agree nor disagree about natural language search, but just because someone important from a big, important company says it, that does not make it true.

  4. Bernie Lomax

    I’ve been hearing about PowerSet for the past year now and have decided they’re smoooooooooooooooooooooke. The demos they conduct are more or less just tests in which they know the results their engine will provide.

    As #1 said, learn to search for the answer rather than posing a question and you’ll be far more productive.

  5. EH

    Natural Language Search is like speech recognition: perpetually lacking and perpetually funded.

  6. Matt Clark

    Does TechCrunch have a ManCrush on Google? I know I do, but it seems like I can’t go a day without another Google post….

    With regard to the post, isn’t a “natural language” with “better results” a “better user experience”…? I think they need to give the user the option to do either.

  7. Jeremy

    >>>Most people are too lazy to type in more than one or two words into a search box anyway.

    Definitely not true. Even in 2005, Hitwise data showed that 52% of search engine user queries were 3 words or more.

    And, according to David Krane (also from Google) 25% of searches, daily, they’ve never seen before…hard to type in new one or two word phrases all the time. :)

  8. Axel

    natural language parsing and natural language search will most likely have a greater influence on tools that search and mine information from conversational prose between people (blogs and mailing lists, for example) when indexing rather than providing a better interface to search directly.

  9. Chris R.

    If you’re reading, I want to thank whoever at Google sent my resume to Amazon in Seattle. I did a phone interview today but I bombed it pretty badly. I haven’t really used advanced SQL commands in a long, long time. I’m programming pretty exclusively in C++ and none of the questions grazed that. Thanks though :)
    I also got a written exam from another SF company well known to TC though, and I am really good at written C++ exams so hopefully that will work out ;) It was hard but I’m pretty good at C++.

    As for natural language search, I think it’s ok, but, like the researcher at Google said, it’s pretty useless.

  10. Amit Chowdhry

    No search engine algorithm is perfect. Each one has its own strategy and chooses whatever works for them. When I use Google, I don’t think to myself, “I’m going to use Google because PageRank works best for me!” I go to Google for its brand, simple search, and quick-delivering link results.

  11. Karter

    Natural language search is easier compared to speech recognition - because the problem of understanding whether someone said a phrase like “Austin” or “Boston” is not there - there is an unambiguous input mechanism.

    The example quoted in the article is a simple one where you are going to get results as accurate as the keyword search.

    But with a complex query such as “What is Paris famous for?”, one is better off asking the question and having NL technology interpret what “famous” could mean and which Paris you’re talking about, and organize the results appropriately, instead of learning to type a bunch of - and + after a few searches.

  12. The Fake Angela Hayden

    @9: Thanks for the incredibly interesting update on your life!

    Sincerely,
    The Fake Angela Hayden
    ART GODDESSSSSSSSS

  13. Karthik

    There’s the aspect of us being conditioned to use words and short phrases to describe what we are searching for. While I’m not certain, I suspect this has to do with the fact that it’s much easier/faster to type in a few words, scan the results, and iterate if the result set is not satisfactory.

    If a search engine can return the right answer every single time, I can imagine typing in the full question. But, would you even know what the right question is?

  14. Chris R.

    “Are you high?” - Erick Schonfeld

  15. Chad

    Glad to see someone finally point out the “would people actually use it” question. I agree with Norvig’s points about the backend being important.

  16. Jonathan Mendez

    Norvig is correct that disambiguation is a bigger problem to solve than natural language. The rise in mobile search will also ensure many more one and two word queries. Higher relevance with less information vs. higher relevance with more information? I think the first problem is much more interesting and brings many more benefits both to people and to whoever solves it first. It moves closer to AI which is where Norvig has always wanted Google to go…and where it will go.

  17. Ben Nadel

    I agree with “Hustle Strategy”; people have learned to talk in “keywords”. I think that’s from a familiarization with the computer world. I mean, it was only recently that my mom stopped putting “www.hotmail.com” into the Google search box.

    Of course, that kind of transitional generation will no longer be an issue as everyone being raised with computers learns to think in terms of tagging and categorization.

    I think better than natural language searching, Associative searching would be cooler (and I am sure they do this to some degree). For example, if I put in the phrase:

    how to html to Excel

    … Google would use the “how to” to know to only bring up results that are tutorials. Of course, the problem is that the more natural language that gets, the more ambiguous it becomes.

    I guess, that’s really what the problem is - natural language is ambiguous; but, tagging, keys, categorization is not (or less so).

  18. Elder

    “pooh-poohing the notion that people are clamoring to write full sentences in search boxes” might make good copy and catch the attention of the uninformed, but it does not actually speak to the power of NL search. The point of NL search is not really in the form of the query although Pell has hammered on that point a lot with his “end to keywordese” pitch. No, the real point is on the back end — indexing relations among words in a sentence or phrase not just keywords themselves enables more focussed and relevant retrieval. Norvig never addresses this point at all. But, let’s face it, Google is not interested in improving core search: they have too many other irons in the fire to worry about a really new search paradigm. It’s a classic case of Innovator’s Dilemma; if they did improve core search — by using NL, for example — it would be disastrous for them since the company has a huge investment in their 10-year old technology and upgrading would be hugely disruptive. Not to mention what would have to be overhauled to make their ad business work.

  19. NLPer

    This is akin to most in the IR (Information Retrieval) community playing down the importance of natural language search. They always stress the “people being lazy and type only a few keywords” argument, BUT what they fail to highlight are the following points:

    1. People are lazy and type in a few keywords because we were “conditioned” to do so by their algorithms, since typing longer phrases or sentences failed to yield good results. If we had been able to exploit NLP effectively from the get-go, I am sure things would have been very different.

    2. Disambiguation is part of NLP, so Google is doing Natural Language Processing as well, but at a smaller scale for obvious reasons outlined by others already.

    3. To say that Natural Language Processing won’t help is an irrational argument to make. If that were the case, I am not sure how Norvig and many others process information (perhaps they simply utter 2-3 words at the Googleplex). I think it is more reasonable to say that the current state of the art in NLP may not be good enough to handle web search per se. And there are many problems in Natural Language Processing that still need to be solved before we can effectively deliver a search engine that will handle full-fledged sentences. Having said that, regardless of how well Powerset is able to pull it off, NL is and should be the future direction for “automated search”.

    Also,

    “We don’t think it’s a big advance to be able to type something as a question as opposed to keywords. Typing “What is the capital of France?” won’t get you better results than typing “capital of France.””

    is just an example hand-picked to make his point. What one should ask is whether, as humans engaged in discourse, we could also abridge the question “what is the capital of France” to “capital of France” and still maintain the “semantics”. In this case yes, since it’s not at all ambiguous.

    Also, as someone else pointed out, the power of NL should come from the ability to index documents that contain full-fledged sentences describing the content of the pages, more so than from processing the query.

  20. Mike

    “Most people are too lazy to type in more than one or two words into a search box anyway…”

    That’s a pretty arrogant statement. How does he know what most people are? If this is the way Google looks at its users then shame on them.

  21. Bjoern Wilmsmann

    People seem to be stuck on the way search engines are used today. Keywords are claimed to be somehow the most natural way to search for things, while actually the most natural way to do so is using full-blown sentences.
    I mean, if you are lost somewhere, do you walk around yelling keywords at people or do you ask questions that model your request as precisely as possible? Certainly, a lot of human communication is made up of noise (e.g. politeness patterns, repetition, re-phrasing) that doesn’t matter in a human-machine interaction setting, which is why abstracting over sentences by using keywords is a very efficient and useful method in most of the search use cases thought of today.
    However, everyone can contrive requests (like the “What is Paris famous for?” example mentioned above) that neither can be easily expressed in keywords nor can be answered by conventional keyword-based search engines. You might say that these are just that: Contrived examples, but I think that we got used to keyword search so much that we cannot really think out-of-the-box anymore. There is plenty of space for further development of search engine technology, both UI- and backend-wise and this most certainly involves improved natural language understanding.

    Nevertheless, a serious issue that still persists for natural language search in the backend is scalability when compared to simple keyword-indexing algorithms. Keyword indexing can be done in linear or loglinear time, whereas naive parsing algorithms can take exponential time in the worst case, and even rather efficient parsing algorithms like the Earley parser often still need polynomial time (O(n^3) in this case). This is the actual reason why Powerset still has to prove to me that their particular approach of setting up a large-scale all-the-Web natural language search engine is computationally feasible at all and not just some expensive toy for investors.

    Maybe skipping the sentence level and concentrating on the text level is the way to go. Many of the elements that create textual coherence (and hence natural language understanding) operate beyond the scope of sentences. An engine that can create these kinds of links could be very useful, though it would be beyond the notion that we have of search engines nowadays.
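
To make the cost comparison above concrete, here is a minimal sketch of keyword indexing, assuming whitespace tokenization and an in-memory dictionary. The names `build_index` and `search` are hypothetical and chosen only for this illustration. Each token is visited once while building the index, and sorting the posting lists is what adds the logarithmic factor; no parsing of sentence structure is involved at any point.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # each token is touched exactly once
            index[word].add(doc_id)
    # Sorting the posting lists accounts for the "loglinear" part.
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query word (AND semantics)."""
    postings = [set(index.get(word, ())) for word in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []
```

A sentence-level engine would have to run a parser over every sentence before anything like `build_index` could store relations between words, which is where the cubic cost mentioned above would come in.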

  22. David Scott Lewis

    And Google will keep saying this until they buy one of the key Semantic Web players, like Radar Networks, Powerset, TK or Hakia.

    NLP sucks on Ask, but Ask isn’t about semantic processing. With semantic processing, NLP starts to make a lot more sense. They’re not necessarily causally related concepts, but they’re certainly correlated concepts.

  23. .rb

    Note to self: Never call my users lazy

    example of Google’s theory:

    query: pallet jack

    query: I want to buy a pallet jack

    The results are 30% overlapping.

    The keywords ‘pallet jack’ overlap - 34% / Google is good.

  24. stone

    This is precisely why Powerset is a joke and will be in the deadpool within 18 months. They are solving for a problem that doesn’t exist. Google’s engine works fine.

  25. NickeyD

    If you read between the lines correctly, the guy probably just wanted to say “Sergey will buy Powerset if the price is right.”

  26. randomwalker

    For once a post about Google that’s actually about search! Refreshing.

    Yes, this is the same line that Google has taken for a long time.

  27. andy

    Natural language search will work as soon as biological intelligence can be reproduced in some way. The models must change enough to leave a path to natural evolution. Maybe the next step in the evolution of life leaves DNA for silicon or some newer data technology. The fact is that current AI research is far from reaching a higher point, but the goal has been pointed out and many of us (including me) are creating new paths every day to lead to this target. Sooner or later, some of us (hopefully many) will succeed, and this will be the moment when the next step in evolution has been reached: collective intelligence will be the answer. Today’s internet gives a glimpse of it, getting better every day; information is being debugged and improved by millions of people every hour. Wikipedia is a good example: good things prevail, bad things disappear in the noise.

  28. Ben

    Norvig is a pretty smart guy, technically speaking. I bought his second book recently, but haven’t had a chance to get into it. He’s a big Lisp/AI fan and has a lot of interesting topics on his personal page.

  29. Mr. Crash

    … Vint Cerf - about 8 months ago spoke at my university…
    He said Semantic web was very important for the future.

    And I agree.
    I know how these things work, I don’t type sentences in search boxes…

    Many people I know do.
    And every so often, I’m not quite sure how to phrase something so that I get the results I want.

    But I’d still consider myself relatively proficient at “Google Grammar” as such, though I daresay the extension to the language allowed, and the considerably more directed results, can’t be all that bad for both advertisers and customers…

    Surely an advertiser will pay more to be the ad for a query “Where can I buy Pallet Jacks in Seattle?” than “Pallet Jack” - the directedness is probably quite useful.

    But chances are, if you’re viewing this website, you’re one of the more computer proficient individuals on the planet - and I guess such a change doesn’t mean as much to you or me as it might to my 40 year old mother, who gets cranky when results are irrelevant.

  30. roddy

    While it may be true that people have learned to think “in google”, that doesn’t preclude other, more efficient types of querying from taking root.

    With NLP search, for example, you might perform your queries with your voice, in natural language, into your cell phone, and hear the answer from a vocoded Sergey Brin.

  31. Bjoern Wilmsmann

    @Mr. Crash

    The concept of the Semantic Web actually doesn’t have very much to do with search technology. While the latter is about machines somehow automatically ‘understanding’ the content generated by human beings as it is, Semantic Web means more or less manually enriching the content in order to make it easier to understand for machines.

    This is why the Semantic Web, from my point of view, won’t work at large scale. I simply cannot imagine every content creator manually creating Semantic Web annotations just because Tim Berners-Lee has a vision. It might be justified in certain settings, where you have knowledge engineers dedicated to that kind of work, but don’t expect your average content provider to supply Semantic Web annotation any time soon.

  32. Omer

    You can’t expect Google to say ‘we’re kind of doing something completely different because what we’re currently doing is a dead end’!

    Some of the above comments are right. Google/Microsoft are gonna end up buying companies like Hakia, and until then what they (Google) are doing is the best.

    In fact, Google hasn’t adapted to us (humans, that is ;). Just like Eskimos at the poles and desert-dwelling natives adapting to their environment, we have adapted to Google.

    What he is talking about is pretty much HARD CODING certain search terms that are getting a lot of hits. They know what hasn’t been a good hit if people start going past page 1 or 2, and this becomes a question to solve and hard code into the system. So the next time it knows the difference between google bombed and google bombed…

    With all its might, it’s exciting to see that even Google can go the way of the original WebCrawler in a couple of years against something that suits our natural base better…

  33. Ben Toth

    NLP is unachievable and unnecessary… but it’s a marketing tool for some. I’ve always thought it would be interesting to test Autonomy (a company which makes big claims about its software’s ability to understand text) against Google. Load Medline into both and see what happens.

  34. Bjoern Wilmsmann

    Ben, you mean just as unachievable and unnecessary as more than 5 computers for the whole world or more than 640k of RAM? Information technology evolves and there is plenty of useful work going on in AI, NLP and related areas. Besides, I bet you are using NLP regularly in one way or another. NLP techniques have been around in many products for quite some time now, just think about speech recognition systems used by call centres or your mobile phone’s T9.

    Autonomy tries to address different problems than Google. While Google can build upon the densely linked structure of the Web and an enormous amount of documents, Autonomy is more about finding that specific piece of information you are looking for on your company’s intranet that usually doesn’t sport as extensively linked documents.
    For Google it doesn’t matter if they miss some particular document as there is plenty of information available to fill that gap while in an enterprise context finding that particular document can be essential.
    This is why one may outperform the other and vice versa: They are both specialised tools that work best in the field they were designed for.

  35. Adam Jusko

    Interesting the timing on this. I just wrote a piece for Alt Search Engines titled Semantic and Personalized Search Are Fool’s Gold that makes a similar case.

    I use the example of people searching for “calendar of events” and how it’s impossible to know exactly what that searcher wants. If a human can look at a search query and still not understand exactly what the searcher wants, how is a computer algorithm going to do it?

  36. Jason Hekl

    I agree with several of the comments on this thread. There are numerous applications for search technology beyond the web. And there are numerous scenarios where keyword search just doesn’t live up to user expectations.

    Consider customer service, for example. The content available through support portals and in call centers is not nearly as disparate as you might find on the web. The density and frequency of product, company and industry keywords is much greater, which means keyword-matching engines will return very long result lists that force the user to comb through the results to find the information they really need. What the user needs is an engine that can take context into account, and NLP is one of the best ways to capture the context of the search session, and the intent of the searcher.

    Look around and you will find numerous examples – across banking, insurance, telecommunications, retail electronics, and more - where keyword search engines like Google and others have been displaced by engines with NLP and other advanced features. It’s not too hard to find examples to compare and contrast keyword search experiences with NLP search experiences, even on .com sites. Try T-Mobile vs AT&T Wireless.

    Start with an end in mind (i.e. researching a cell phone or calling plan), then compare the search experiences on each site. Start with basic keyword searches, the type we all do every day to try to game the engine to get good results, and compare the experience against how well you find the information you need to make a purchase decision. Then compare the experiences with longer, more explicit queries that align closer to what you are actually thinking (i.e. what is coverage in NY, phones under $100, phones with a camera, rules for rollover minutes, online bill payment options, cheap phones). I think the differences are clear. Parity on shorter searches, superior experience through NLP with longer queries.

    I think keyword engines like Google and some of the other early pioneers were particularly effective in getting people to use search in the first place. But we are still early in terms of search innovation, and the technology continues to improve. To be dismissive of NLP is short-sighted.

    And one more point - to position keyword vs NLP as mutually exclusive is misleading. Most NLP engines are equally effective at keyword matching. It’s simply the case that the more words a user supplies, the more precise the answer.

  37. GopalaRao

    In my opinion people want to do things on the net. Google is the best tool available on the net if no specific site is known. The question is not whether a few keywords or an NLP query or phrase will solve the need. The question really is how we empower the user with a tool that will let him do whatever he wants (shop, talk to a friend, gather info…) in the shortest time.

  38. Jyothirmayee

    I would like to reinforce the point that Karthik brought above.

    “There’s the aspect of us being conditioned to use words and short phrases to describe what we are searching for. ”

    It’s not just this. I think the data that is being presented on the web today is tuned (read “dictated”) for Google - everywhere, instead of it being the other way round. Search engines should aim to build technology or processes to identify the best content or suitable results to present as a human would have.

    I bet more than 50% of websites out there would want to change their titles, heading tags or layout if they did not have to tune them so much for SEO. It is as if the websites themselves, rather than Google, are defining whether they appear in the search results. This is a serious threat and a severe restriction on the people who contribute content on the web.

  39. the*point*man

    Hello people - there are two pieces to the puzzle here:

    1) Speech to Text

    THEN

    2) NLP

    I for one, wouldn’t mind speaking to my computer or laptop and having it do things for me automatically. e.g. “Find me a recipe on making creme brulee” as opposed to typing that out. Why the hell would I type the entire thing out if I can just type, “recipe creme brulee”? It’s not about CONDITIONING - it’s about the INTERFACE (search box). I want to TYPE less because it’s a pain to type more than I need to.

    We DON’T TYPE in natural language - WE SPEAK IT. That’s why it’s called LANGUAGE. If we can get the speech to text right - then we need the machines to understand the text part — this is where NLP comes into the picture.

    Look at the BIG picture folks.

  40. Peter Fankhaenel

    I agree, people try to enter as few words as possible. But when the fact I’m looking for cannot be expressed with a few keywords, a whole sentence or more comes into play. For example, to find the “market value of knowledge management software in Europe” I have to sift through tons of documents Google gave me, only to find out that there is no document with the number I need. If Google were able to understand my question and tell me it doesn’t have an answer, it would save me a lot of time.
    Also, it would be nice if Google would tell me it had 10 million documents to answer my question and ask me for details on the query so it could “drill down”.
    Finally, it would be great if I could enter into a dialog with Google (or any other search engine) in trying to find the final solution to my problem at hand.
    So I think there is a lot of room for improvement on the current state of the technology.
    And yes, I too am working on trying to help with a different approach to artificial intelligence and natural language processing.