NetBase Offers Powerful Semantic Indexing Platform That Reads The Web
by Erick Schonfeld on April 22, 2009

Regular search engines such as Google and Yahoo use statistics to make sense of the Web. They count links, keywords, and other items on a page to determine its rank in search results. Semantic search engines try to actually understand the meaning of the words found on the Web and other documents to bring back the most relevant results to a query. Microsoft bought Powerset for $100 million to gain semantic search expertise, but so far all it can search is Wikipedia. Hakia, Textwise, and other startups are also working on semantic search. Now comes NetBase, which brings a slightly different approach that it says can scale to the entire Web.

NetBase has been around for a while. Originally called Accelovation, it has raised $9 million in two rounds of venture funding over the past four years, has 30 employees, and counts among its current customers P&G, Caterpillar, 3M, BP, Kraft, BASF, and Goodyear. It is now changing its name and offering its core semantic indexing technology as a platform for other companies to build their own products. Already, scientific publisher Elsevier uses NetBase to power its Illumin8 research tool for searching scientific articles, patents, and Websites.

NetBase takes a sophisticated linguistic approach, actually diagramming sentences to determine the relationship between words and phrases. It does particularly well with causal relationships, allowing it to tease out cause and effect from raw text. For instance, in the sentence, “The calcium, potassium and magnesium found in yogurt can help reduce your risk for hypertension often resulting from stress, obesity, and other factors” NetBase can identify that “stress” and “obesity” are causes of hypertension and that “calcium,” “potassium,” “magnesium,” and “yogurt” can be used to counter hypertension.
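
To make the yogurt example concrete, here is a minimal toy sketch in Python. It is not NetBase's method: the company describes full sentence diagramming, while this sketch swaps in two hand-written regular expressions that only fit this one sentence shape. The function names (`split_list`, `extract_relations`) and the relation labels ("counters", "causes") are assumptions made up for illustration.

```python
import re

# The example sentence from the article.
SENTENCE = (
    "The calcium, potassium and magnesium found in yogurt can help reduce "
    "your risk for hypertension often resulting from stress, obesity, and other factors"
)


def split_list(text):
    """Split a list such as 'a, b and c' or 'a, b, and c' into its items."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def extract_relations(sentence):
    """Return (subject, relation, object) triples found by two toy patterns."""
    relations = []

    # Toy pattern 1: "<agents> found in <source> can help reduce your risk for <condition>"
    m = re.search(
        r"The (?P<agents>.+?) found in (?P<source>\w+) can help reduce "
        r"your risk for (?P<condition>\w+)",
        sentence,
    )
    if m:
        for agent in split_list(m.group("agents")) + [m.group("source")]:
            relations.append((agent, "counters", m.group("condition")))

    # Toy pattern 2: "<condition> often resulting from <causes>"
    m = re.search(r"(?P<condition>\w+) often resulting from (?P<causes>.+)$", sentence)
    if m:
        for cause in split_list(m.group("causes")):
            if cause.startswith("other "):
                continue  # skip the vague catch-all "other factors"
            relations.append((cause, "causes", m.group("condition")))

    return relations


for triple in extract_relations(SENTENCE):
    print(triple)
```

A real parser would have to handle arbitrary sentence structures, which is exactly where hand-written patterns like these break down.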

The company has already indexed about 8 billion Web pages and processes 100 billion sentences a month through its semantic parsing. Once it identifies causes, effects, and other relationships, it can serve them up in search results along with top-ranked links. For instance, a health-related search could turn up a guide that includes related symptoms, causes, drugs, and treatments. The technology also lends itself to Q&A types of searches. You could ask, “What companies are developing semantic search technologies?” and it will return a list of companies along with the snippets that mention each company and semantic search.
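
One way to picture the Q&A lookup is as an index keyed by relationship rather than by keyword. The sketch below is purely hypothetical and says nothing about how NetBase actually stores its data: the `RelationIndex` class, its `add` and `ask` methods, and the "develops" relation label are all invented for illustration, and the snippets are lifted from this post.

```python
from collections import defaultdict


class RelationIndex:
    """Hypothetical store mapping (relation, object) to subjects and snippets."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add(self, subject, relation, obj, snippet):
        # Normalize the object so lookups are case-insensitive.
        self._by_key[(relation, obj.lower())].append((subject, snippet))

    def ask(self, relation, obj):
        # Return every (subject, snippet) pair recorded for this relation and object.
        return list(self._by_key.get((relation, obj.lower()), []))


index = RelationIndex()
index.add(
    "Hakia", "develops", "semantic search",
    "Hakia, Textwise, and other startups are also working on semantic search.",
)
index.add(
    "NetBase", "develops", "semantic search",
    "Now comes NetBase, which brings a slightly different approach.",
)

# "What companies are developing semantic search technologies?"
for company, snippet in index.ask("develops", "Semantic Search"):
    print(company, "-", snippet)
```

The point of the design is that the answer comes back as a list of entities with supporting snippets, rather than as a ranked list of pages.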

I’ve tried a few demo searches set up to do various things such as provide the pros and cons of a product, the companies in a particular market, or causes and effects of a medical problem. The results were impressive. On the whole, I’d say they were at least 70 percent relevant, compared to the much larger proportion of irrelevant links I get when I do a Google search. But it was slow. NetBase took 5 seconds or more to return results, something it says won’t be as big an issue in production versions of its technology.

NetBase is not building its own search engine, although it plans to create a health-related search engine around PubMed content as a proof of concept. Instead, it is targeting large publishers and companies that want to create their own vertical search tools, which combine data on the Web with their own databases of content. This is definitely an enterprise play. Licensing starts at about $100,000 and goes up from there.



Responses


  • Maybe it’s the future of searching, should Google try to learn to index the same?

    • I’m not so sure about that; unless they will be continuously updating the linguistic structures their program recognizes, and then re-processing all the data they’ve already processed. Otherwise it will be a search engine of limited use, and never really a replacement for something like goog.

    • Natural-language-based search has never outperformed numeric-based search engines in terms of retrieval recall and precision, and this is a fact. Until such semantic systems are shown to be superior to numeric-based algorithms, I think Google’s PageRank and other numeric-based algorithms will continue to dominate for a long time to come.

      I think this is why Google is slow in moving toward natural language: those approaches are inferior to numeric-based algorithms, and Google is right.

  • Search is seriously game!! :)

  • I will never stop wondering why these companies need so much money to come up with a product like that.

    We’re working on slightly similar concept, yet we’re completely self-funded and yes, we have a fully functional prototype.

    Sure, without money things can be frustrating at times (mainly slow progress) but heh, .. millions of dollars in funding? What for?

    Seriously, check us out: http://nrds.co..../new/Index.html and let us know what you think.

    Please bear in mind that this is still very much a work in progress. Anyone interested in seeing a live demo, give us a shout on info(at)nrds.co.uk.

    I’ll be hooking up a live demo and opening up an API at some point next week.

    Your feedback is much appreciated.

    Thanks,

    Jan

  • Great technology. More and more companies are looking into enterprise semantic search.
    Fast, Endeca, and Autonomy had better get ready, since one morning their search results will become irrelevant in comparison with what NetBase, Nstein, and other semantic search/text mining companies have up their sleeves.

    Cheers

    OlegR (twitter – @OlegR)

  • Scanning the web semantically for a limited use like health-related topics won’t ever replace google. But if searches like these get grouped by theme in one search engine, this will add value to the customers. The number of topics will just be limited with this approach.

    • I agree. If the initial search is relevant, then it is very helpful to have items grouped to a manageable set of results. I don’t know how much time I have wasted trying to filter through results.

  • So these guys get content and knowledge from the entire web and give nothing in return?

    As a web publisher, why shouldn’t I block these guys from visiting my website?

    • Because these guys will not only bring you more traffic so that you can sell more ads and make more money. These guys will bring you targeted traffic, which will let you command higher CPMs as a web publisher, generate more leads for your sponsors, as well as generate some vibrant discussion about your content and make your content more valuable with their ideas and thoughts.

      how about that?

      OlegR (twitter – @OlegR)

  • We stand alone in our focus on developing the perfect search engine. No search engine yet understands exactly what you mean and gives you back exactly what you want.

  • BTW, Textwise is not a ‘start-up’. They’ve been in business for 12 years or more. Don’t really know what they do, though they are located about a half mile from my office. Some semantic mumbo-jumbo.

  • What does ‘diagramming sentences’ mean? What you meant is ‘parsing’, is that right?

    Also, I wouldn’t dismiss the ‘statistical’ approach used in current search engines, as today’s advanced language processing methods (or as you put it, ‘sophisticated linguistic approaches’) use statistics a lot.

  • Thought it would be fun to try some of your queries out on Evri’s semantic search labs site:

    Causes of chest pain: http://is.gd/tVaU
    Causes of hypertension (aka high blood pressure): http://is.gd/tVef
    Causes of high fever: http://is.gd/tVhV

    Effects of chest pain: http://is.gd/tVoO
    Effects of hypertension: http://is.gd/tVq5
    Effects of high fever: http://is.gd/tVqM

    More on how to formulate these types of advanced queries at:
    http://blog.evr...ts-some-search/

    Kind regards,
    D.

  • Interesting project.

    At bioalma we have developed novoseek, a biomedical search engine based on semantic technology. We have indexed PubMed, PubMed Central and US grant information.

  • Sometimes I wonder if people even know what “semantic” or “semantic technology” might be. If you have a search implemented that crawls a set of sites, I don’t think you should call it semantic. Even if it’s a pretty good search.

    I think only explicit semantics can define clear semantics and there should be more efforts made to make people use explicit semantics (for example using RDFa).

  • Unrelated to the article, but I like today’s black background better than your usual color scheme. It’s probably because I stare at a screen all day and this feels easier on my eyes. Can you leave a “toggle theme” switch somewhere?

  • Semantics is the future of searching the web. People will get to the point that they will be sick and tired of scrolling through page after page of results to eventually get to what they are looking for. The goal of Cazoodle is to minimize that pain by using our semantic search technology in several domains like apartment searching, events, and even shopping. Using this type of technology properly will give the end user far more results that they actually care about in easy-to-use formats.

    http://www.cazoodle.com

  • As a user of Illumin8, I can attest to the incredible power of this tool. Using Illumin8 you can find and utilize the same in-depth research in ten minutes that would take hours or days of searching with Google. Google excels at returning the best first result and getting you where you know you want to go quickly, while Illumin8 gives you everything you could hope to find, all at once in a structured, organized fashion.

  • This looks like a very interesting product and I’m very excited to try it out. How do I get to do that? :)

    Keep in mind that it appears to me (without having done a thorough analysis) that they’re only extracting a relatively small number of relationships that are relevant to vertical areas. That’s probably very useful and interesting, but doesn’t mean that they’ll be able to create a generalized Web solution. What’s always been interesting to me about Powerset’s technology is that we can extract information about Hulk Hogan’s opponents, causes of cancer, or works written by Frege with equal facility, because we understand the underlying structure of the sentence (in addition to understanding semantics).

    Still, I’m excited about these guys and will definitely put them on my radar.

    -Mark Johnson, Powerset Program Manager

  • I think it could be the future, and Google will update its own indexing, but in my view Google is best.

  • Remember that Google already has semantic search; they just haven’t built it so that every query can use it, so far.

  • Sounds like Microsoft didn’t do its complete homework before buying?
