Silicon Valley-based search engine startup Powerset has mostly been tight-lipped about their product. This makes sense given that they are gunning for the fastest growing Internet company in history, Google. But their excitement over an exclusive deal to license PARC search technology was too much to hold in - the company made what looks to be an exclusive announcement through VentureBeat today about the arrangement.
Powerset argues that the key to killing Google is in natural language search. We discussed this in detail in an earlier post about the company. While other search engines tend to ignore common words in search queries (words like “by”, “for”, “about”, “of”, and “in”), Powerset focuses on those words to try to determine meaning and context. Read CEO Barney Pell’s post about the natural search problem to get an idea of what they’re talking about. To get right to the point: Google treats the queries “book for children,” “book by children,” and “book about children” as equivalent to “book children.” Powerset, however, promises to serve results that are relevant to each query.
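The stop-word behavior described above can be illustrated with a toy sketch. This is not how Google actually processes queries; the stop-word list and the `keyword_query` helper are assumptions made up for illustration, using only the words the paragraph names.

```python
# Hypothetical stop-word list, taken from the words named in the article.
STOP_WORDS = {"by", "for", "about", "of", "in"}


def keyword_query(query: str) -> tuple:
    """Lowercase the query and drop stop words, keeping the remaining word order."""
    return tuple(word for word in query.lower().split() if word not in STOP_WORDS)


queries = ["book for children", "book by children", "book about children"]
reduced = {q: keyword_query(q) for q in queries}

for original, keywords in reduced.items():
    # Every query collapses to the same keywords once stop words are gone.
    print(f"{original!r} -> {keywords}")
```

Under this simplification all three queries reduce to the same two keywords, which is the loss of meaning Powerset says it wants to avoid.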
There are a lot of critics of Powerset, who note for starters that Google will treat each of the above queries differently if the user simply puts them in quotation marks. Search engine expert Danny Sullivan took the Powerset idea apart piece by piece in a long article last year, saying “natural language search makes a compelling pitch for those who really don’t know search or haven’t heard the natural language mantra before.” Search experts we’ve talked with about Powerset agree to varying degrees - the problem is very, very hard to solve. And users have largely learned to simply change the way they search to get what they are looking for.
It’s impossible to judge Powerset before we actually see the product. They are, however, trying to solve a very difficult problem for which there may or may not be much of a market (at least compared to simple, 2-3 word searches). And Google has “several teams focused on natural language and dozens of Googlers with a PhD in the field, including myself,” says Peter Norvig, Director of Research, Google. If we start to see some of those PhDs leave Google and join Powerset, that will be the first sign that the company is really on to something big.
Powerset is well funded, having raised $12.5 million from Foundation Capital, the Founders Fund and individuals, and boasts a $60 million plus valuation. Matt Marshall, author of VentureBeat, is clearly impressed after having seen a demo of the product - so much so that he’s largely become a cheerleader for the company. This is one product I’m looking forward to seeing, and hopefully they’ll return our emails soon. Maybe we’ll join the squad, too.

i’ll believe it when i see it.
me too!
I’ll see it when I believe it.
a search engine that further refines your sentences’ search
i don’t see myself using this right now - google more than suffices for me
i give the co. a so-so rating, at best
Putting words in quotes doesn’t impart the meaning, it’s just multiple words in an order. I don’t know if this is the engine to make language search work, but I do think it will happen, and I do think it’ll be fairly important.
I also just don’t think bettering Google’s results is out of reach of startups. It was amazing when Fast came along how quickly its results were on par with Google. And look at Gigablast, it’s pretty much one guy, and I doubt most people would notice the difference if they used it for a day.
This is bound to happen often. There are a few startups in the valley working on NLP and on other ideas. It is really, really hard to believe anybody can take on Google, but that is what people thought about Yahoo back in ‘99.
Lately Google has had a lot of pre-monopoly signs, time for spanky.
What you’ve written in the article isn’t actually true - [books for children] and [books by children] (without quotes) produce different results on google - the second link when searching for [books by children] is, sensibly, http://www.booksbychildren.com/ - incidentally this is one of the ugliest web sites ever.
That’s great Tom. Thanks for stopping by.
The complexity of NL search is that the combination of words means everything. If the English language has, I would say, 60000 words, then count the possibilities if we use combinations of only 5 different words out of those 60000, not to mention other languages with richer vocabularies. Another issue to look at is how to model data storage to hold such volumes of data when allowing searches by ‘phrases’. There are no software databases that can run any analysis on such amounts of data. In the current state of software, databases and storage, none of the algorithms for ‘organizing’ PHRASES (not single words) has been patented or so. I don’t believe in the ‘killing Google’ thing…
DATA VOLUMES increase TREMENDOUSLY with PHRASE versus WORD searches, etc.
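The count the commenter gestures at can be worked out directly. A rough sketch, taking the commenter's 60000-word estimate at face value and assuming a "combination" means a 5-word phrase; the three variable names below are illustrative only:

```python
import math

VOCABULARY = 60_000   # the commenter's rough estimate of English words
PHRASE_LENGTH = 5     # words per phrase

# Ordered phrases where a word may repeat: 60000^5
ordered_with_repeats = VOCABULARY ** PHRASE_LENGTH
# Ordered phrases of 5 different words
ordered_distinct = math.perm(VOCABULARY, PHRASE_LENGTH)
# Unordered sets of 5 different words
unordered_distinct = math.comb(VOCABULARY, PHRASE_LENGTH)

print(f"ordered, repeats allowed: {ordered_with_repeats:.3e}")
print(f"ordered, all different:   {ordered_distinct:.3e}")
print(f"unordered, all different: {unordered_distinct:.3e}")
```

The first figure is 60000^5 = 7.776 x 10^23. These numbers count every possible word sequence, meaningful or not, so they are an upper bound on the phrase space rather than a measure of what an index would actually need to store.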
If you want to beat Google you need to turn search on its head. Natural language search is not that. One of the projects I am working on is doing just that, turning search on its head. I wish I could share but it is one of “the secret sauce” projects I am working on for my company.
Now … I am not claiming that it works … I am claiming that I am working on it
The thing is that many of us have grown to type search queries as we believe the search engine will better serve us the results. More or less, we know how to communicate with these “primitive” search engines.
If now suddenly we’re told that we need to talk to a search engine like we were talking to a librarian, it may be hard to break the habit, although I look forward to it.
Any progress done on search is a great thing. The day SEO businesses close shop will be the day that search has gone to a next level. Whether it’s done by Powerset or by anyone else.
PS: I have nothing against (white hat) SEO techniques, but the fact that they exist implies that search is still too dependent on structure, rather than just content.
Fabian - With all due respect.. What the heck does “turning search on its head” mean?
If anyone wants to beat Google, all they have to do is produce better results. Google seems to have gotten better lately but too often when I search for product reviews, Google is littered with useless hits from CNET, Nextag, Bizrate, Pricegrabber, etc. These sites rarely have decent reviews and shouldn’t wind up on the 1st page of hits. Then there are actual questions, which Google doesn’t often produce useful results for.
I don’t know if Powerset will wind up being successful, but they sure are spending the VC’s money to live well. They are located in expensive SF, everybody gets Macbook Pros, huge monitors, free food for all, etc.
This from their careers page:
Inside Powerset
* We are in San Francisco, so get rid of that hour and half commute both ways
* We are located 1.5 blocks from the train station for those coming from the South
* Dogs are welcome and enjoyed by all
* Free food of course and Pete’s coffee
* MacBook Pros are standard (no more PCs)
* Huge monitors for everyone that pivot landscape or portrait
* Nobody has an office, everyone sits together
* Awesome loft style office with a balcony that has great views of the city
* Great restaurants nearby to enjoy the city life (no more cafeteria food)
* 1.5 blocks from the train station and public transportation in the city
* Traveling in to work - we will give you an EVDO card and train pass
* Fully outfitted game room with pool table, Xbox, Wii machine, PlayStation, 124 track digital audio and visual studio, bean bag chairs and pull out couch bed
Benefits Package
Powerset also offers all of the standard benefits for its employees:
* Medical, dental and vision plans with no premium for employees
* 401(k) roth plan
* 15 vacation days per year, plus 10 company holidays and 2 floating holidays
* Subsidized parking for the city
* $5000 signing bonus if you move within one mile of the office
I always worry about people using MACs to develop for Windows as programs translated from the MAC never seem to work exactly right on Windows machines (still 95% of the market guys).
And they seem to be looking for a lot of engineering help right now. I didn’t notice any sales help listing so I guess it will be some time until they go to market. The best product in the world doesn’t have much value until you can sell it to users.
RBA - in my case it has to do with how searches are conducted and what indexable content you go for. You can go for processing queries - Google et al., including Powerset - with ever more complex algorithms, or you can figure out a different search model that has a different set of semantics.
I know it sounds a bit cryptic but I cannot say more than that without giving away the secret.
Fabian- Very well. Thanks for the explanation and good luck!
Did they *license* the core tech or did they just bring over the brains from PARC search?
What happens with non-English queries? They fall back to the “traditional” TFIDF search?
Everything I read about Powerset so far (not much), and including this post, uses examples where *queries* are supposed to be interpreted differently (e.g. stop words aren’t getting dropped). While this may be the case, Powerset is probably trying to *index* the *content* differently, going beyond the “bag of keywords”++. Your examples should reflect that. Big difference!
natural language would indeed be a breakthrough but this raises a question about data mining and scalability. it would work if, for example, every web developer uses microformats, but we all know the web is not semantic.
and i have to say it is hard to believe that there is an algorithm that would automatically sort things out.
Michael,
Let’s be realistic here. Why didn’t they choose you to evaluate the latest version? Because you would have done the right thing and torn it apart. Powerset may as well skip right over the consumer offering and head into plan b. as you call it, the overcrowded nlp enterprise search space where they can go head to head with inxight who also has that magical xerox parc code. PS. Google has the most sophisticated NLP platform in the world today, many of these components are live in production already, they just take a backseat to some of the more noticeable indexing feats that google has achieved. You should do a feature called is Google useful? and analyze a day’s worth of search and let us know the results. In the meantime congrats to powerset for all of those lucrative yahoo hires, too bad they didn’t get lloyd and for the ultimate powerset test, I want all employees of powerset to powerset: skrenta google winner
if for some reason that does not work for you google it, and read the post.
Rock on Mike you are doing fine keep fighting the good fight and never get involved in the hype
We tried to use a kind of natural language question answering system for our webpage. It is not a “true” natural language system because we use templates and keywords for searching, but this is not the key point.
Only 1-3 percent of our users actually use more than 1 or 2 search terms. Nearly no one will be used to this kind of search method, writing a whole sentence in a query box. So the majority of users will stick to the old methods of Google. I am not sure, but I don’t think this kind of search engine will get a huge market…
Just my opinion….
The problem is, almost everyone that uses the internet has been “trained” to use keyword search engines… To put it quite simply, people might not know how to use a natural language search engine.
@Robert: It won’t be difficult to get trained. Google might have changed them a little to use just keywords, but still we are humans. We always tend back to our basics.
Once they know that they can search using natural English, it won’t be difficult to get people searching like that. It will still be their choice to use it or not, but I guess it should be a success.
I would love if it happens!!!!
The PARC technology is old news, and the public will not be able to tell the difference. That’s the key to displacing any contender: your Mom (not some NLP expert) has to be able to tell in ten seconds that the new engine is solving a real problem. Then they switch. This is not enough of a step.
The top searches will probably be what is Google? What is Yahoo? What is MySpace?
I am assuming the macbooks are only for developing html/web stuff.
No way you can do index work / nlp on a macbook (or any type of laptop).
We just had to up all our dev machines RAM to 8GB, and our indexing workload isn’t even in the same league as would be needed for a google competitor.
Could they be VMing onto servers for dev?
The problem any robot-based search engine is going to have is that it has to get people to look at it, try it and make a switch. The only way to make that happen is with a major partnership that makes the competing search engine a default engine at a big site. Become the default engine at MySpace or FaceBook or something of that caliber and then you’d have a chance.
Having better search isn’t enough—just ask most of the companies that have had better software than Microsoft.
“I always worry about people using MACs to develop for Windows as programs translated from the MAC never seem to work exactly right on Windows machines (still 95% of the market guys).”
Not an issue for powerset, 99% of the effort will have nothing to do with the UI or end-user platform. Anything other than a *nix OS for development would be unproductive.
“I am assuming the macbooks are only for developing html/web stuff.”
Why couldn’t they develop algorithms on their macbooks? No matter how powerful a desktop PC is, it’s not going to index much more than a laptop anyway, relative to the entire web. Any serious work will be done on serious servers.
@Adam,
Perhaps they could license Google indexes of the Internet to cut back on relying on their own robots.
I believe their challenges will come down to trust and user adoption. All search engines have faced these challenges and will continue to confront inquiries around trust and the security of search data.
“And users have largely learned to simply change the way they search to get what they are looking for.”
It’s true. Most users browse 1-2 pages deep, then modify their search query. This will be a hard habit to change.
a commendable venture but too late.
1. it is not entirely clear that powerset search will be better, sullivan is spot-on in his critique
2. assuming (1), even if powerset is better, it is only going to be marginally, perhaps infinitesimally better (since, as sullivan points out, most search queries are very short, no stopwords). and query processing is just one aspect of search. mundane aspects like ops, building server farms, etc all matter now too and are really only solvable by $$$$$$
3. what is the market for a search that is sometimes 1% better?
4. how do they address branding? it is going to be difficult to dislodge the top three engines unless you can also do the branding.
powerset is a great bubble indicator…people throwing gobs of money to get a 1% improvement over market leaders.
possible exits - selling to the military or other govt agencies who might see some juice in new NLP. i think it’s a given that PWST will never be a ticker in your portfolio.
what a great time to be dell or hp…so many pointless .coms all trying to do the same thing - SCALE. everyone is trying to do server farms.
Michael,
Google’s personalization of search results is the best way to get the most relevant results. Twisting your example a bit, if I am searching for “books on Indian History”, Google gets me the most relevant results. Because Google knows I am talking about India(ns) from my past searches.
Log out of Google and do the same search again, and I get results, but not very relevant to what I was looking for. To achieve these kinds of results, the search engine has to learn, and Google’s been doing just that for the past so many years.
The way search is evolving, any effective search engine has to have “attention metadata” to get a high percentage of relevancy. If Google continues to be at the top of this game for a few more years, it will further increase its dominance, simply based on this huge “attention metadata”, making it difficult for users to switch.
Just like an old Yahoo or Hotmail email address….which you hate to use but cannot make the switch.
I don’t think people understand how significant this will be *if* Powerset has really cracked the nut of natural language. Advanced users, like almost everybody here, know how to use Google to refine queries. Most don’t have a clue about that, so the results they’d get with a good natural search will be dramatically better than with Google.
But I like Mike’s title because it’s all hype right now. Let’s see some searches!
Consumer-quality NLP Search with Web-quality responsiveness is a very hard nut to crack. I won’t be dismissive of Powerset, but shall remain imprisoned by my skepticism until I know more.
One of the ways I’ve evaluated NLP in the past is the Negation Test and if Powerset can pull this off I’d be truly impressed.
Example of a Negation Test:
‘movies with Ben Affleck’ versus ‘movies with anyone but Ben Affleck’
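To see why the Negation Test is hard for keyword matching, here is a minimal sketch of a naive word-overlap ranker. It is a stand-in for illustration, not any real engine's ranking; the `tokens` and `rank` helpers and the two sample documents are assumptions.

```python
import re


def tokens(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank(query, docs):
    """Order documents by how many query words they share (naive keyword overlap)."""
    query_words = tokens(query)
    return sorted(docs, key=lambda doc: len(query_words & tokens(doc)), reverse=True)


# Two made-up index entries for illustration.
DOCS = [
    "Casablanca is one of the movies with Humphrey Bogart",
    "Good Will Hunting is one of the movies with Ben Affleck",
]

plain = rank("movies with Ben Affleck", DOCS)
negated = rank("movies with anyone but Ben Affleck", DOCS)

print(plain[0])
print(negated[0])
```

Both queries put the Ben Affleck entry first, because "anyone but" only adds more words to match on. Passing the test means understanding that those two words invert the request.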
“Why couldn’t they develop algorithms on their macbooks? No matter how powerful a desktop PC is, it’s not going to index much more than a laptop anyway, relative to the entire web. Any serious work will be done on serious servers. ”
Because when writing indexing and NLP algorithms you need a vast amount of lookup data. When writing NLP algorithms for example, you normally start from a conceptnet (commonsense database) - a modest one of these will be atleast 400MB. You can’t simple “create your algoritms” then “run them on the servers”.
Also because of the logorithmic nature of almost all indexing (read: relevance) algorithms, you need to run them against a significant set of data (usually 100,000 records or more - we have formulas to calculate minimum datasets). These days you have to run this stuff in RAM otherwise you will be waiting 5-10 minutes each time you run an app for debug.
ouch my spelling is bad
I think that it is short sighted to imagine that people have “learned to use Google” and figured out how to more quickly get the results they desire. Everyone here knows how to do that, but the TC group of early adopter types is a far cry from the common user.
I know a large number of people who astound me daily by how they use computers, including the fact that many of them still “ask” google questions or type in phrases and don’t understand why that often produces worse results (relative to what they are looking for) than if they just typed in a single topic.
Like most folks, I think that the surface of search has just been scratched. While there will be a tendency for Google to remain the big dog on the porch, there is plenty of room for other players, particularly those that can successfully deliver a different way of looking for information that produces satisfactory results for the user.
Another thing… as noted by Otis G, the objections about short queries etc. are an insufficient refutation of Powerset’s potential.
Search can also be substantially improved by using sophisticated NLP during the indexing process. With a sufficiently rich index (contextual meta-data, conceptual relationships, and that sort of thing) short queries can produce great results. And that’s not counting personalization.
A hybrid approach that uses deep NLP on the indexing side, and opportunistic, lightweight NLP (rules) for handling queries probably has the best chance for success.
Who cares if they have better search than google? They need a better interface and a better ad delivery system. That’s where google really won.
WEB 2.0? WHAT?!?!?What?? No way?!? $12.5 million from Foundation Capital for Powerset. You got to be kidding me. Natural Search Engine. Trust me, it’s a waste of venture capital’s money. None of the search engines can find the correct stock quote.
For example,
Type “y” in Hakia.com. — you get error
Type “y” in Google.com — you get “Yahoo! Mail - The best web-based email!”
Type “y” in Yahoo.com — you get Yahoo! (Nasdaq: YHOO)
Type “y” in Ask.com — NFL.com - NFL Players
Type “y” in live.com — Urban Dictionary: Browse Y
Type “y” in all the search engines. You get the wrong information.
My search… http://alt.active.ws
Type “y”. You get “Alleghany Corporation”. This is my “wow” search engine.
I started this tiny company for less than a penny.
Venture capital should’ve invested in a DNA search engine where scientists & researchers can find cures for Parkinson’s disease, AIDS, HIV, etc.
As promising as their natural language platform sounds, the greatest threat to Google’s growing hegemony in the search/paid search arenas…given that about 1/2 of all searches are known to be for products and services…may actually spring not from better search, but from patent pending (#11/250,908) paid match, which will target people’s actual demographic and psychographic traits and characteristics (keytraits) instead of just the words we all type into little search boxes.
Though paid match is not yet an operating system, our own US Dept of Labor does run a very popular service (over 500,000 users/month) which provides an enlightening and instructive peek at the potential that such a paid match search/ad platform possesses.
Called GovBenefits (available at govbenefits.gov), it utilizes a personal profile and a match engine to determine what government benefit programs people qualify for.
Were such a system populated with the 100’s of thousands to millions of products and services companies provide nation/worldwide instead of just the 400-odd government programs it includes now, one can only imagine what its public popularity would be…
…and with the world’s advertisers having the ability to pinpoint target and control; via bidding directly on those keytraits most relevant and applicable to their products and services, exactly who sees their ads (goodbye click fraud); one can also only imagine the deleterious effects that such an elegant and superior system/platform would have on a 95% PPC income dependent company like Google…
…and as for the million+ users such a system would ideally initially need to appeal to the largest number of advertisers?
…just one simple invite e-mail to their 100+ million e-mail users…along with one to their 50-100,000+ advertisers by Microsoft and/or Yahoo…and look…
… instant new billion+ dollar advertising marketplace.
jon - true, although let’s be honest, none of these bubble2 operations cares about monetization because their exit of choice, actually their only exit, is to be acquired. there is nothing else. none of these sites will ever grace any exchange save the amex or some other otc garbage dump…none of these will ever manifest revenues that multiply to the satisfaction of the stakeholders…acquisition is it for these firms.
“To put it quite simply, people might not know how to use a natural language search engine.”
Even google has pointers about using the search engine. I don’t think that this is an issue.
Having worked around some of the investors, I can honestly say that they probably wouldn’t throw money down for something that wouldn’t have a chance at working.
Note: Even if nobody creates a “google killer”, there’s still money to be made if you can shave off some percentage of their base.
Don’t you think G is working on this type of stuff? You cannot beat G. It is not even worth mentioning. This company is looking to sell the technology behind the service. If anything it may help Yahoo but G is just laughing because they understand search like no other.
It’s interesting that some people think that Google Search is infallible. Our natural or intuitive prompt for a search is a question rather than a theme, phrase, or keyword. In future, theme or keyword-related searches will be largely replaced by question-led searches. The relevance of search engine results and consequently, the quality of search engines can be more easily determined if one poses questions rather than use keyword-queries. At the moment, search engines encourage a few keywords just like “baby talk:” (:-)! I’d like to talk to a search engine like an adult. In my view, any startup search engine that facilitates “adult talk” will pose a serious threat to Google Search.
With regard to beating Google Search, the most effective strategy may be to attack Google Search at its weakest spot. Incidentally, the weakest spot of Google Search is an “open” secret: it is clearly visible but in the absence of an alternative, nearly everyone accepts what Google Search does. To paraphrase a recent comment by a CEO: “All these years, I was satisfied with using Google Search … until I saw your search engine!” I am still working on the development of this Google-beating search engine which works in a way that the brain naturally processes information. Regrettably, I have to be in stealth mode. It’s painful keeping this new search engine as a ’secret.’ But if that secret puts me in a position to better challenge Google Search, then the pain would certainly be worth it. For now, just take note of this space … I’ll be back!
Everyone is aiming for Google without taking out the small fries. They are building a natural language search, which to some people sounds wonderful, but I don’t think it will work the way they want it to work. The search engines remove the in, by, the, a just to limit searches. That is why this is done. They may find a way to limit this at the beginning but they will soon learn that optimizing will be a pain for those words. This should be entertaining though.
I’m sure Google is actively looking at how to enhance Search. Let’s face it, Search is their core IP, and certainly they understand there is plenty of room for improvement. I applaud those who are trying to compete, but Google has an army of smart engineers, 60% of whom work on Search, and I’m sure they’ll come out with something natural language-oriented.
David Berlind (over at ZDNet.com) recently pointed out that Sun is getting into the mix as well with a 3d Search concept (http://blogs.zdnet.com/Berlind/?p=219). Pretty interesting stuff.
So I personally think NLP does have a place in search but at this point in time you have to be realistic what it’s capable of.
Take the recall problem of finding things on your local machine first (Vista, WDS, GDS, YDS, etc.). The query “messages from tom about reviews” is a lot more natural to type than “kind:email from:tom reviews” and results in the same query if you enable Natural Language Search in Vista. The queries “music by rush” and “documents modified this week” also return what you’d expect (or at least I hope they do).
Add to this a good set of synonyms for everything and NLP can be great at building complex constraints over metadata heavy content. And it goes a long way towards replacing the Advanced Query Syntaxes (AQS) supported by most search engines.
On the discovery side (Google, Live, Yahoo, etc.) I think it’s a little less clear how current generation NLP adds value but it can certainly help on the filtering side of things at a minimum. The query “english pages about search created this month” seems pretty useful to me and easily formed…
-Steve
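The “messages from tom about reviews” rewrite mentioned in the comment above can be approximated with a handful of pattern rules. This is a rough sketch only, not how Vista does it: the rule table and the `to_structured` helper are hypothetical, and the `kind:music` and `artist:` properties are assumed for illustration rather than taken from the comment.

```python
import re

# Hypothetical rewrite rules, applied in order. Only "kind:email" and "from:"
# come from the comment's example; the rest are assumptions.
RULES = [
    (re.compile(r"\bmessages\b"), "kind:email"),
    (re.compile(r"\bmusic\b"), "kind:music"),
    (re.compile(r"\bfrom (\w+)"), r"from:\1"),
    (re.compile(r"\bby (\w+)"), r"artist:\1"),
    (re.compile(r"\babout\b"), ""),  # "about" carries no constraint here, so drop it
]


def to_structured(query):
    """Rewrite a natural-language query into a property:value style query."""
    result = query.lower()
    for pattern, replacement in RULES:
        result = pattern.sub(replacement, result)
    return " ".join(result.split())  # collapse leftover whitespace


print(to_structured("messages from tom about reviews"))
print(to_structured("music by rush"))
```

A table like this works because the target content is metadata heavy and the vocabulary is small; it says nothing about whether the same approach scales to open web search.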
12.5 Million, Ouch. As an average geek, I am so dependent on the array of Google services that search results alone would never woo me away. I think the same is for Yahoo! users and all of our MSN loving cousins in the UK.
“But we’re trying to make it better for average people to search using natural language” Average people don’t know how to change their homepage.
What’s the number one searched term on the number one search engine in the world? You know it … Yahoo!
When does this get started?
Danny Sullivan and commenters above make great points but I think we’re thinking a little too narrowly here. Why so much focus on the search box?
Sure, an NLP-powered search box might only marginally improve current day searching (Google) but what if the method of input wasn’t a search box at all? What if you are speaking into your phone, to your car/gps or a shopping kiosk? Powerset’s biggest challenge is not getting the NLP right but the mundane task of dealing with web-scale data sets. Peter Norvig has previously commented in this regard.
EW, you don’t really need NLP to support speech input. You might get a few extra terms in voice queries but most of the extra terms will just get dropped as noise words.
You need NLP if you want to understand the relationship of all the terms and even that understanding comes in various degrees. If you look at a lot of the calendaring apps out there that seem pretty smart they’re either doing keyword spotting or using Context-Free Grammars (CFGs) to do their magic. Both of which are pretty simple to implement and not very magical at all…
-Steve
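As a rough illustration of how little keyword spotting needs, here is a sketch of the kind of matching a calendaring app might do. The `spot` helper, the day list and the time pattern are all assumptions for illustration, not taken from any particular product.

```python
import re

# Hypothetical vocabulary the spotter looks for.
DAYS = ("today", "tomorrow", "monday", "tuesday", "wednesday",
        "thursday", "friday", "saturday", "sunday")
TIME = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b")


def spot(text):
    """Pick a day word and a clock time out of free text, ignoring everything else."""
    text = text.lower()
    day = next((d for d in DAYS if re.search(rf"\b{d}\b", text)), None)
    time = None
    match = TIME.search(text)
    if match:
        # Convert the 12-hour clock to 24-hour: 12am -> 0, 12pm -> 12, 3pm -> 15.
        hour = int(match.group(1)) % 12 + (12 if match.group(3) == "pm" else 0)
        minute = int(match.group(2) or 0)
        time = (hour, minute)
    return {"day": day, "time": time}


print(spot("Lunch with Tom tomorrow at 3pm"))
```

Nothing here understands the sentence. It only recognizes a few known words and one time format, which is why the result can look smart without being NLP in the sense described above.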