Will Powerset Pull a Google?
by Michael Arrington on October 5, 2006

Update: I received an email from a trusted source this morning that reads “…I’ve seen the demo (and spent a fair amount of time with those guys)…The technology is quite good and will certainly help on a non-trivial subset of web search queries. But they definitely have their work cut out for them. The indexing is far more expensive than traditional web search, and that means some real scaling and performance work to do…Can’t say much more without really pissing off the NDA gods, of course…(Oh, and those VentureBeat numbers are wrong about their funding.)”


I haven’t seen a demo of upcoming search engine Powerset yet, but reportedly many people who have are impressed, saying they’d never use Google again.

Powerset wants to let people use natural language when searching, including some words that search engines ignore today (what founder Barney Pell calls “stopwords”). Like most people, I’ve learned to change my language when addressing a search engine. I use important keywords and leave everything else out. Some of these search habits have even overflowed into my personal communications, and I find quick IM and email discussions often look like crazy, meaningless words thrown back and forth. Ten years ago I’d have no idea what those discussions meant. Today, they are effective in getting through fast-paced business conversations with people I’m already familiar with.

So while I believe that search is far from perfect today, I didn’t necessarily think that the solution was to create a better way for computers to understand what we meant. Frankly, I assumed that we’d continue to do a better job of talking in a way that computers understood, and advances would come in other areas (deep web, better algorithms, rich media search, etc.).

But Pell lays out a convincing argument that natural language search is important in order to communicate meaning and intent. He uses example searches to make his point - “book for children”, “book by children”, and “book about children” are all equivalent to “book children” to search engines today. His core argument is that there may be no way for us to properly express the query “books by children” without using natural language.
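
To make Pell's example concrete, here is a minimal Python sketch of how a keyword engine that drops stopwords would see the three queries. The stopword list and the whitespace tokenizer are assumptions made for illustration; they are not taken from Powerset or from any real search engine.

```python
# Hypothetical stopword list, for illustration only; real engines maintain their own.
STOPWORDS = {"for", "by", "about", "the", "a", "of"}

def keyword_query(query):
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return tuple(w for w in query.lower().split() if w not in STOPWORDS)

queries = ["book for children", "book by children", "book about children"]

# Map each original query to the keywords that survive stopword removal.
reduced = {q: keyword_query(q) for q in queries}

for q, keywords in reduced.items():
    print(q, "->", keywords)
```

Under these assumptions all three queries reduce to the same pair of keywords, so the engine has nothing left to tell them apart.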

Powerset is looking for big money to launch their new engine. Venture capitalists are always the best source of rumors (the best time to hit them up for information is right after they’ve passed on a deal, or have lost it to another firm). If they don’t have a financial interest in the company, loose lips abound. With respect to Powerset, the rumors are that the company is trying to raise $10 million on a $20 million pre-money valuation. That’s a lot of money, but if Powerset pulls a Google, no one will care.

Responses

  • Natural language search. Sounds very promising. Wonder who’s gonna be the “David Filo” & “Andy Bechtolsheim” in Powerset’s case!

  • That’s a big if, but worthwhile to watch. The only search engine I have found useful other than Google is http://www.clusty.com

    The other thing I wonder is if techies like us will be able to adjust backwards for natural language if this actually works the way they say it will.

  • Awesome marketing guy behind this. I loved the argument for why natural language is useful in a search engine. Dunno how good it will be, but I’ll be willing to give it a shot after that example.

  • It could be a great marketing strategy, or maybe they really are onto something revolutionary.

    Point to note: S.E innovation didn’t stop with AltaVista/AJ. Google too had its share of skeptics before gaining acceptance.

  • What they are saying makes sense to me. It is clear that the search user interface will be improved in the coming years.

    Encouraging users to make use of the stopwords and providing better results based on that usage is really interesting:

    books by Bradbury about carousels
    sales data for Apple iPod
    recipe for onion soup

    I wonder what proportion of searches will actually benefit from this enhancement? In many cases, the inclusion of the stopword and the added meaning that it brings is just not helpful (example: “recipe for onion soup” vs. “recipe onion soup”).

    Is this a great new feature to incrementally improve the best search engines, or is this truly a paradigm shift and a new type of search engine?

  • Can they really understand a meaning of a search query and return the most relevant result? What tech do they use?

  • If you put in quotes with the search, those will be different searches in google.. I guess my question is how will they find what I am looking for without just looking for the links that contain the words “Books by children”

    If that’s all it is doing.. then can’t Google already do that?

    Maybe I am just confused about what it is supposed to do.

  • Nick, the quotation point you raise is a good one. I often put queries in quotes, too, to find better results, and it works. Until I see Powerset first hand, though, I can’t compare.

  • I commented about this on Matt’s post about Powerset, and I’ll post the same here as well, since it’s relevant.

    But I’ll add about the quotes — yes, Google can do that. And in fact, several search engines will still take word order and proximity into account even if you don’t use the quotes. For example, in the books by children search, you can see at least one of the results comes up with those words in that exact order. Doing a phrase search is better, but phrase searching doesn’t always improve things. That’s one reason why AllTheWeb stopped doing it automatically several years ago (plus, it takes more computing power to perform).

    Honestly, it’s not that you want to figure out “concepts.” If it was just that, Excite would still be around. They pitched “intelligent concept extraction” back when they launched in 96 or so. It’s that you want to help people understand general topics related to an initial query, then help them narrow down. Clusty offers a great example of this — but it’s one that people simply haven’t taken to, for whatever reason.

    Here’s what I left with Matt:

    Wow, so natural language searching is going to be the killer knockout? The search space is littered with companies that have promised this was somehow going to be a great advance but never went anywhere.

    The reason is simple. You don’t need to do a lot of conceptual analysis when the typical search query is two to three words long.

    Take “books by children.” What exactly is the by doing conceptually? Nothing. When Google drops it, you’re getting stuff mainly about children’s books. Understanding by helps you know that you want not children’s books but books BY children. But I don’t need some big natural language engine to handle that. I just need to put all three words into quotes, and the “conceptual” query becomes clear.

    Heck, AllTheWeb used to kick in auto phrase searching like this in the past. And if I want to do concepts, I could shove things like Clusty’s topics up top.

    Honestly, I’ve heard the Powerset story so many times before, including the reputable investors or others that are supposed to make you sit up and take notice (think Accoona and Bill Clinton as spokesperson; that was only last year, and Accoona has gone nowhere).

    Maybe I’ll make the time to highlight some of the ones that pitched and disappeared over the years, because Powerset pretty much sits right within that model. It’s probably energy better spent other ways, however.

    As always, I could be wrong. Powerset might be the right tech and the right tech. But it’s such a well worn story. If they don’t realize how many others have made this pitch already, that especially makes me wary that they’re going to be successful at all.

  • I really hope that’s not their production domain name. “Powerset” is confusing in a not-fun sort of way (like Google or Yahoo) and too long.

  • Danny,

    This is the first time I’ve seen you commenting here, and you said it’s the second time you’ve commented on Powerset, negatively of course. What’s up?

  • “Frankly, I assumed that we’d continue to do a better job of talking in a way that computers understood”

    Highly suggest picking up a copy of “The Inmates are Running the Asylum” and giving it a read Mike….

    I think it’s important to remember that most of us are actually not typical users of any of these technologies. Sure, we can twist our language to entice Google to give us the results we “want”, but think of non-technical users who have no idea how queries work - they don’t really stand a chance.

  • What will kill Google is not better input, but better output. Ask Jeeves! And that needs better taxonomy than we have now, and intelligent sifting after searching based on ME.

    To do this generically is hard, I would expect to see valuable niches being colonised by better designed taxonomies than the Google model. Maybe even by Google, but they will probably just buy them.

    Let the metadata wars begin.

  • Don’t forget the bad cash-burn experience of “Albert Inc.”, a natural language search engine, three years ago.

    http://www.albert.com/en/

  • iPhrase Technologies ( which was acquired by IBM last Nov ) offered both natural language and guided search ( like Endeca ).

    Its primary use was to improve e-commerce sales, online service and support, and call center productivity.

    According to IBM, iPhrase’s technology is compliant with IBM’s Unstructured Information Management Architecture. It will be used to help customers and partners discover inherent meaning from a wide variety of data.

    Terms of the deal were not disclosed.

    While the iPhrase purchase seems to fit in with IBM’s information on-demand strategy, what’s less clear is what the future will be for iPhrase’s customer self-service applications.

    BTW, IBM and Google were purportedly teaming up to provide search capabilities for corporate databases but I haven’t heard much more on this for quite some time.

  • There’s a similar story over at http://www.lexxe.com, a Sydney based NLP search startup (not in stealth mode, you can use their alpha product).

  • See http://www.extate.com - natural language search applied to specific verticals. One to watch. London, UK based too, and from Imperial College’s Computer Science Dept.

  • As I understand it, the idea behind natural language search is that you don’t search on keywords per se, but on conceptually similar words.

    For instance, a Google search would take “books by children” and split it into the keywords: books AND children. “by” is a very common word and will appear in the “stop list”, so it’s not included in the search. Your results are then shown in the order where both keywords are mentioned the most times. Simple.

    In a natural language concept search, it’s looking at the individual word “books”, which is conceptually similar to book, newspaper, magazine, writing, pages, etc. along with the other word “children”, which is conceptually similar to child, kid, young adult, etc.

    So your initial search of “books by children”, is actually searching on so much more information that is *conceptually* similar. Thus your results are far wider reaching and you’ll get hits that you would not have got with a straightforward keyword search.

    Basically, this allows those that aren’t so hot on picking out keywords to get good results and not have to resort to horrendous Boolean type searches (+books AND +children OR “books by children”)

    It’s a novel idea, but if you’re already used to keyword searching and manipulating Google to bring back good results, you’ll find concept searching a bit of a step backwards and you’ll be back to Google before you know it.

    However, if it manages to combine a good boolean keyword search engine alongside conceptual searching, then it may just fly…
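
The difference between keyword matching and the "conceptually similar" matching described in this comment can be sketched in a few lines of Python. Everything here is a toy assumption for illustration: the `RELATED` table is hand-written, the three documents are invented, and matching is plain word overlap with no ranking. It is not a description of how Powerset or Google actually work.

```python
# Hypothetical hand-written table of conceptually related words (illustration only).
RELATED = {
    "books": {"book", "magazine", "newspaper", "writing"},
    "children": {"child", "kid", "kids"},
}
STOPWORDS = {"by", "for", "about"}

# Invented toy documents, keyed by id.
DOCS = {
    1: "a magazine written by kids",
    2: "books about gardening",
    3: "children and their books",
}

def keyword_terms(query):
    """Split the query into lowercase words and drop stop-list words."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

def matches_keywords(doc, terms):
    """Plain keyword search: every term must appear literally in the document."""
    words = set(doc.lower().split())
    return all(t in words for t in terms)

def matches_concepts(doc, terms):
    """Concept search: each term may be satisfied by itself or by a related word."""
    words = set(doc.lower().split())
    return all(words & ({t} | RELATED.get(t, set())) for t in terms)

terms = keyword_terms("books by children")
keyword_hits = [i for i, d in DOCS.items() if matches_keywords(d, terms)]
concept_hits = [i for i, d in DOCS.items() if matches_concepts(d, terms)]

print("keyword hits:", keyword_hits)
print("concept hits:", concept_hits)
```

In this toy setup the concept search also returns the "magazine written by kids" document that the literal keyword search misses, which is the wider recall the comment describes. Note that neither variant uses the word "by", so neither captures the authorship meaning on its own.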

  • The Register reported recently that Google only covers about a third of online information so the problem to address lies there and the complacency that googlemania brings rather than in the search process itself.

  • Let’s wait and “powerset” your search, no more GOOGLING it.

    But I think searching for something should more often point to something that you can’t get easily. Once, when I was doing my final-year assignment on a tech subject, the results were not data driven but more like money driven, because now everyone tends to cut the queue and try to make it to the no. 1 Google spots. That is something I don’t like. I tried to search for how a vending machine works, and the end result was “please buy this vending machine” or “earn 3K a day using this vending machine”. What I wanted was the blueprint of the machine.

    Can POWERSET help? What if I type in “do you have the blueprint of a vending machine?”

    CERTAINLY, it is not in stealth mode anymore, since POWERSET is creating so much buzz.

  • They’d better be dang fast if I’m ever to switch from Google.

    Of course, Google might just buy them.

  • I smell failure, save your bucks.
    95% of search queries are one word.

  • I guess they’re banking on some kind of semantics engine which will interpret a page about a kids’ book that may not necessarily contain the keywords book and kid to be highly relevant to a search for ‘books for kids’.

    I don’t believe a search engine should ever go all ‘natural’ though, as the ability to refine a keyword search is going to be much more accurate in some cases. Offering both as options should be the way forwards, I would imagine Google are looking at this closely.

  • The problem I see with this? Simple. Google have some of the smartest people in the world working for them, and what’s more, they could either buy this venture out if it even threatens them an inch or they could replicate it.

    Get a bunch of 20 PhD students in one room who have studied natural search characteristics and all of a sudden you can clone pretty much any “natural search engine”.

  • Google can’t “buy something out” if it isn’t for sale.

  • This depends on what they want to do.

    If they are simply taking stopwords and word order into account, this is an incremental change that is not very significant.

    If, on the other hand, they plan to do some level of semantic analysis to “understand the intent”, then that’s very significant. To do what they say… that is, understand the difference between these sentence fragments and map that to the underlying content that is being searched COULD be game changing. But again, it depends on what they plan to do. Will it simply be checking for synonyms, or will there be some level of semantic tagging of indexed content?

    Semantic tagging of the underlying content, when used in combination with natural language search request intent understanding is the game changer. But it is an enormous challenge on a Web scale… the Everest of the Web search world (for now).

  • I read some time back about ontology based search or clustered searching, which is much more effective than traditional search mechanisms. When I googled ;-) around for ontology based search, I found this engine:

    http://www.vivisimo.com/

    Vivisimo gives cluster-based results, which I think is much better than just simple natural language search. Worth giving a try.

    Can anyone throw more light on this? :-)

  • Hey Amadeus,

    > 95% of search queries are one word

    But are 95% of search queries one word because we have found out the hard way… when we first started using search engines… that anything more sophisticated than keyword search is useless? If something that really works comes along, will people change their search habits? It might take time… like changing your golf swing… but because of the benefits at the other end, I think it would catch on (in a viral way).

  • @Jessep - interesting comment: “Google can’t “buy something out” if it isn’t for sale.”

    Everything is for sale - it’s just at what cost someone is willing to sell it for. Let me illustrate via an easy example.

    You own and have created a search technology better than Google’s. Google offers you $100 million for your company. You decide that you would rather buy a $10 million yacht and bronze yourself in the Bahamas for the rest of your life and never have to worry about working again. The $90 million sits in the bank, and you earn roughly $6-7 million a year in interest, which builds you your condo penthouse.

    See my point?

  • I see Tcruncher2’s point, but it’s tough to tell how long a successful website owner should wait before selling. Remember the rumor of Facebook wanting to be sold for $750 mill, and now it could potentially sell for $1 bill:

    http://www.techcrunch.com/2006.....ype-dance/

    Between you and me, TCruncher2, if we were Zuckerberg and the rumor was true, we’d probably cash out and run with the $750 mill, but then we might regret it later knowing that we could have got more.

  • @ Amit

    Very true - it’s all a matter of how long you want to stay in the business, and whether you own all the business or whether your shareholders are requiring that you stay in the business so they get more cash.

    Facebook for example, I doubt Zuckerberg owns 75% of the company that would allow him to control and over-rule any decision. 51% gives you the majority of shares, but you need 75% to over-rule any decision made via a “special interest” vote.

    If I was Zuckerberg - the yacht is sounding pretty good :)

  • Sounds good in theory, but:

    * Will they be able to return results in fractions of a second? (considering how much more intensive it is)
    * Will the results be relevant?
    * Will the results be current, i.e. will their index be both broad and very up-to-date?

    Also, encouraging more words in search means that you get more variation in search queries (a longer long tail, if you wish), which changes the way advertisers specify what they want to target (they can’t advertise just on keywords).

    I am more interested in how much more computation this requires though - even having 2 words in a search query is hard enough, but figuring out which meaning of those words the user really wanted is even harder. These guys will need some sort of magical optimization technique for it to work well and scale

  • Vaporware!!!

    That’s great! I love hype and marketing crud like this, go get’em web 2.0! See how much capital they can raise and lose between now and the failure to launch!

  • I wonder if they will be as awesome as Dipsie!

  • “But Pell lays out a convincing argument that natural language search is important in order to communicate meaning and intent. He uses example searches to make his point - “book for children”, “book by children”, and “book about children” are all equivalent to “book children” to search engines today.”

    I don’t know what’s so “convincing” about this argument. Has anyone actually tried running these queries?
    http://www.google.com/search?h.....tnG=Search
    http://www.google.com/search?h.....tnG=Search
    http://www.google.com/search?h.....tnG=Search

    Doesn’t look that bad. Definitely not “equivalent” …

  • Sreen, thanks for the link to Vivisimo. Their search engine at clusty.com seems to be better than regular search engines for exploring an interest.

  • I’m with Danny Sullivan here.

    I’m surprised nobody mentioned the “to be or not to be” query, as it’s a common example of an all-stop-words query that….. Google handles well: http://www.google.com/search?q.....%20to%20be

    Powerset guys are surely going to offer more than just indexing stop-words. :)
    Natural Language Search typically means that you can enter “Natural Language Queries”, and not just keywords. For instance: “Who was the president of the U.S. during the first war with Iraq?” From what I know, this often boils down to stripping useless terms and constructing a query from the seminal terms. Of course, one can apply POS tagging to the original indexed text or to a query if it’s a proper sentence (but in which language!? ;)), throw in synonyms, and other common search engine smarts.

    I’d also agree that Google has a pile of people who dream about this stuff, so Powerset will have to try extra hard to beat that. But it IS doable. Remember AltaVista? It was super-good at its time, yet….

  • Average search term length has risen from 1.2 words in 1996 to 3.3 words in 2006 according to Yahoo!. Users are getting more sophisticated and the growing heterogeneous corpus we are indexing (i.e. the web) is better rewarding longer queries.

    You can see Yahoo!s data here (See http://blogs.zdnet.com/micro-m.....x.php?p=27 )

    And important to reprise this piece of academe:

    Query length is positively correlated with user satisfaction with the search…longer search queries result in increased search effectiveness in general, indicating that more words from the searcher describing the person’s information problem results in better interactive IR performance

    Let’s not fall into the trap of learned helplessness. There is always room for improvement. Think speech driven interfaces; specific entity oriented queries; cross-language queries. The world’s a big place and not everything that is to be invented has been invented.

    Disclosure: I’m a friend of PowerSet.

  • about the coolness of “powerset”:
    abc = {a, b, c, ab, ac, bc, abc}
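
For readers unfamiliar with the term, a power set is the set of all subsets of a given set. A minimal Python sketch using the standard `itertools` recipe is below; the function name `powerset` is just a label chosen for this example. Note that the full power set also includes the empty subset, so a three-element set has 2^3 = 8 subsets.

```python
from itertools import chain, combinations

def powerset(items):
    """Return every subset of items as a tuple, from size 0 up to len(items)."""
    s = list(items)
    return list(chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

# Join each subset's letters into a string so the result is easy to read.
subsets = ["".join(c) for c in powerset("abc")]
print(subsets)
```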

  • Umm. Real Language support is already included with Google searches. If you want the “stop words” included in the search, then wrap your query in quotes.

    Powerset doesn’t stand a chance if all they do is to try and reinvent the wheel.

  • bring it on! i’m getting tired of big brother google. we need new blood to keep the big boys on their toes.

  • This is a search engine that uses Yahoo’s Web services to pull in results. But it clusters them into groups and has related search terms and finds important keywords within the search results for deep search.

    It’s really a demo for TicTap Contextual Ads (http://www.tictap.com/tca), but it works.

    http://yws.tictap.com

  • What could an NLP search engine like Powerset do to improve the results of the query “synopsis of books about the civil war”? Barney Pell, founder of Powerset, uses this query as an example in his blog.

    Take a look at these results from Microsoft’s Live Search. Better than Google in my opinion.

    http://search.msn.com/results......;FORM=MSNH

    Today’s search engines do a great job of finding “books with a synopsis of the civil war”, but not a “synopsis of books…”. Understanding the query was not the problem. Aggregating the results and presenting them as a synopsis is what is not possible today. I don’t see how NLP search engines would improve the results. Am I missing something?

    I was formerly a director of engineering at AltaVista, so I have an ongoing interest in search issues. I wrote an in-depth blog post about this today. http://dondodge.typepad.com/th.....atura.html

  • big brother? but didn’t google promise not to be “evil”? for the record, that was just adorable.

  • Speaking of running software in this area, we have developed a natural language interface for search engines for Spanish, not for English (although it’s under development and will be out in the near future). You can test the Spanish version running against a complex and popular document collection in Spain, the BOE or Boletín Oficial del Estado.

    The demo is available at http://www.bitext.com, Demos section; or at http://demos.bitext.com/BOEBitext.

    This demo runs with (the outstanding search engine) dtSearch. We plan to release soon another demo where we integrate our NLP software with MSN index and API.

    For those of you with some competence in Spanish, please, try it and let us know.

    Now let us add our two cents to the discussion.

    1. For us a natural language interface should do at least three basic tasks

    -spell check the query (and the docs, if possible)
    -enrich the query with words that represent the same meaning as those used in the query, including but not only synonyms; in other words, expand recall
    -allow the user to express his/her information need naturally, including the interpretation of complex phenomena like negation (holidays in the UK but not in London), disjunction (holidays in Manchester or Liverpool)

    Besides, whenever possible, the same processing should be applied to the query and the indexed text.

    2. Natural language is not the holy grail of search engines. Incorporating NLP technology into a search engine adds an excellent complement that certainly provides very exciting advantages (usability and precision, mainly); however, it doesn’t make a totally different application, it continues to be a search engine. In other words, a good search engine benefits from a natural language interface, but a poor one would not change drastically.

    Thanks!
    Antonio
