Sneak Peek At T2, Twine’s Semantic Search Engine
by Erick Schonfeld on September 18, 2009

Extracting meaning from the Web is a huge project that is very difficult to do at large scale. Keyword search only skims the surface of meaning locked in Web pages. Various semantic search technologies try to go deeper by adding structured data to web pages so that the Web can be treated more like a database. But adding semantic metadata to the Web is laborious and time-consuming. Just look at Twine. Its approach so far has been to add semantic data only to the Web pages members save to the service.

While it appeared that Twine was finally getting some traction earlier this year, it has since fallen by the wayside. Traffic is way down (see chart below), partly because it is no longer buying traffic with ads and partly because of changes to the way Google indexes the site. The bottom line is that beyond a hardcore following of about 250,000, Twine does not have broad appeal.

But CEO Nova Spivack and his team at Twine have been busy working on something else entirely, to the point that the current Twine service is pretty much on autopilot. In the video above, Spivack gives a sneak peek at what his team has been working on. Codenamed T2, it is a complete departure from the navel-gazing approach of Twine 1.0. It is a big step towards creating a semantic search engine that might eventually scale across the Web, exactly the kind of swing-for-the-fences idea we like to see at TechCrunch.

When T2 launches, hopefully by the end of the year, it will be a demonstration of what semantic search could be. T2 will have a semantic index of the top 50 to 100 sites across major categories such as food, health, sports, music, finance, television, politics, tech and movies. In those categories, T2 should provide really good guided search. If you search for “baseball” you will get a list of baseball players, along with categories on the side to refine the list such as by position or team name. When you type in “thai food,” you can select the Recipes tab and then filter by food site, rating, main ingredient, and so on. Or you can select the restaurant tab and drill down by city, hours of operation, etc.
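To make the idea of guided search concrete, here is a minimal sketch in Python. It is not Twine's implementation: the records, field names and the helper functions `facet_counts` and `refine` are all invented for illustration. It only shows how structured fields let a result list be counted by category and then narrowed.

```python
from collections import Counter

# Hypothetical, hand-written records standing in for semantically indexed recipe pages.
RECIPES = [
    {"title": "Green curry", "site": "site-a.example", "rating": 5, "main_ingredient": "chicken"},
    {"title": "Pad thai", "site": "site-b.example", "rating": 4, "main_ingredient": "shrimp"},
    {"title": "Tom kha gai", "site": "site-a.example", "rating": 4, "main_ingredient": "chicken"},
]


def facet_counts(records, field):
    """Count how many records carry each value of `field` (the sidebar categories)."""
    return Counter(r[field] for r in records if field in r)


def refine(records, **filters):
    """Keep only the records whose fields match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


if __name__ == "__main__":
    # Categories a searcher could click on, with result counts.
    print(facet_counts(RECIPES, "main_ingredient"))
    # Drill down: chicken recipes rated 4.
    print([r["title"] for r in refine(RECIPES, main_ingredient="chicken", rating=4)])
```

The point of the sketch is that none of this filtering is possible with keywords alone; it depends on each page already having fields such as a rating or a main ingredient attached to it.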

You’ll find this type of guided search on Bing, with the categories changing based on the initial search term. But Twine does things differently.

What Twine has done, basically, is speed up the rate at which it can look at a raw Web page and create semantic metadata for it. Bing sometimes does this via natural language processing, through the technology it bought with Powerset. That takes a lot of computation. It also employs other methods. Twine’s approach is more to create a set of semantic tags for each page.

There are already standards for doing this, such as RDF and OWL, but most Webmasters don’t bother adding such tags to their sites. If they happen to be there, Twine can read them, but it can also make a good guess as to what is on the page and assign its own tags to the page. In order to try to make it easier for Web developers to tag their sites, Twine is also working on developer tools such as an Ontologies Editor. This lets anyone with domain expertise define the different concepts and tags which would characterize a page about a particular topic, such as a recipe or a baseball player or a car. For example, a recipe might contain concepts such as ingredients, difficulty level, an author, and a date. There are literally millions of potential properties that can be matched to different concepts. The collection of all of these together for a specific topic is an ontology.
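As a rough illustration of what an ontology boils down to, the sketch below models a concept as a name plus a set of properties, then guesses which concept fits a page by property overlap. This is an assumption-laden toy, not Twine's Ontologies Editor or its tagging method: the `Concept` class, the `best_concept` function and the overlap heuristic are hypothetical, and only the property names are taken from the examples in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept in an ontology, e.g. "Recipe", with the properties that describe it."""
    name: str
    properties: set = field(default_factory=set)


# Property names follow the article's examples; the structure itself is an assumption.
RECIPE = Concept("Recipe", {"ingredients", "difficulty_level", "author", "date"})
BASEBALL_PLAYER = Concept("BaseballPlayer", {"position", "team"})


def best_concept(page_fields, concepts):
    """Guess which concept fits a page: the one sharing the most property names with it."""
    scored = [(len(c.properties & set(page_fields)), c.name) for c in concepts]
    score, name = max(scored)
    # No shared properties at all means no concept applies.
    return name if score > 0 else None


if __name__ == "__main__":
    page = {"ingredients": ["rice noodles", "ginger"], "author": "someone"}
    print(best_concept(page, [RECIPE, BASEBALL_PLAYER]))
```

A real system would express the same idea in RDF or OWL rather than Python sets, and would need far better evidence than matching field names, but the shape of the data is the same: concepts, their properties, and pages mapped onto them.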

There can literally be hundreds of thousands of ontologies for every conceivable topic. If Twine knows what ontology to apply to a given Web page, it can do a better job applying semantic tags to it and extracting data. Twine wants to create an open directory of these things, which will be like a SourceForge for ontologies where anyone can contribute and make them better. You can watch this video for more details.

All of this might seem a bit abstract, but if we could ever get to the point where the most important pages on the Web have semantic tags, it will be a lot easier for computers to know what they are about. And to the extent that data is locked in those pages, the Semantic Web will turn that data into something that can be computed. As these tags get applied to more and more information, they could eventually also help filter the stream data that everyone is increasingly drowning in.

It is a long shot that Twine will be the company to deliver any or all of this, but it is definitely something worth swinging for.

Comments

  • worth a shot, give them credit for shooting high

    • I think I will stick with Yauba or Wolfram for my semantic search needs

      • I hate Wolfram. Just when I think I have a good query for it, it fails with no results… or something completely unrelated to what I’m looking for.

        Example – I wanted to know the population of a small town.

        First, I have to remember the name of the service: Is it Alpha Wolfram? Nope. Wolfram Beta? Nope, not that either. Wolfram Alpha? Bingo!

        Second, I type in my query: “Population of Sidnaw, MI”

        I am then treated to a result that is nothing close to what I wanted – “Assuming Sydney, Australia”

        FAIL… and back to Google (type in “population Sidnaw, MI” and it’s the FIRST result).

  • Nova excellent video! I’m excited to start using Twine 2!

  • Jesus, when are these guys going to stop fooling themselves? They’ve been at it since 2001 or 2002 and still haven’t released a meaningful product.

  • Agree. If you’re gonna shoot, might as well be high. Big challenge will be getting the webmasters to create the semantic tags for their sites.

    • LOL. Shoney said TC loves swing for the fence ideas. TC50 attests to the opposite, you guys at TC love crappy rehashes of previous failures, and twitter. Very ridiculous that TC considers itself visionary.

  • People are not interested in this kind of tech. People want the same display format that was presented to them in 1995 by webcrawler, 2000 by yahoo, 2005 by google.

    Just concentrate on improving the quality of the search and you are onto a winner. lord knows Google is so spammy and getting less relevant each day.

  • My master’s thesis was on the semantic web. I know the challenges in the field. As pointed out in the article, it is indeed laborious and time-consuming to add semantic metadata to web pages. A centralized approach won’t work. The best way would be to provide incentives to the content publishers to do it. For example, easier crawling and higher ranking in search engines would be a good incentive.

  • How many people really search such generic terms like Japanese, Baseball, etc?

  • It is hard to see how Twine’s team of about 10 engineers can accomplish all of this and scale it appropriately. I am willing to give them the benefit of the doubt, so we shall see.

  • You can do any kind of query in T2. Including all the normal types of queries you would do in a typical keyword search engine. We combine keyword results from keyword search engines like BOSS (that already have enormous indices of the Web) with our semantic index. Our semantic index superimposes semantic information onto the keyword results. For example if we know more about a result, we annotate it automatically. In addition, we provide tabs for types of things we identify that relate to any given query. When you click one of those tabs you then go into a cut of our index showing a semantic view of those types. You can then continue to drill in on the various semantic facets we have identified. So this is a fusion of keyword and semantic search technologies. This way when you search you get complete results — just like in Google, Bing, or Yahoo but IN ADDITION you also get much richer semantic filtering abilities. Hope that helps.

  • I have coded a Google-like search engine and an entity extractor for semantic analysis in the past. I am quite sceptical about semantic search engines like this. My view is that they are trying to achieve too much and often lack the basics. They might be more relevant than Google for some niche but fail to add value for most use cases… This might be an interesting solution for some niche markets but I doubt it will ever become a mainstream search paradigm. Anyway I’ll watch this to see how it goes and would be happy to revisit my opinion ;-)

  • Please note this is a “marathon not a sprint.” Don’t expect that on day one we’ll have every page of the Web fully semantically indexed. We are proceeding by verticals — starting with food, health, shopping, entertainment, technology, and many other large verticals. But the key is that this is designed to be an open system. We will be providing hosted developer tools (see the second screencast linked in the article) as well as an API so that third party sites and developers can join with us in this initiative. Ultimately there is no way that a single company could do something this large on its own. This is why we’re actually designing this system to be a massive distributed collaboration. Webmasters and app developers all over the Web will be able to participate in building this index. It’s kind of an inverse of Google. In Google there is little insight into the secrets of their black box. With T2 we are going to be very open and transparent. The more sites participate in linking into this system, the more Twine will be able to index their content semantically. Also note that we are open to letting other services and even search engines partner with us to use our index in their own results. We are already in conversations about this with some of them in fact. But our main focus will be partnering with major online service providers, publishers, media companies, etc. We will be giving them access to our API and a set of tools so that they can provide this kind of precision semantic search over their own content and subject areas, within their own sites. Many of these companies have enormous amounts of structured data in multiple repositories. Traditional keyword search does not lend itself well to searching structured data repositories. This is where the Semantic Web shines, and it’s one of the keys to our go-to-market strategy for T2. 
    It’s also worth noting that we will be including many of the social, collaborative and sharing features of T1 in the T2 release, but things will roll in over time. Yes, we are a small team, but we’ve put over 62 engineering man-years into this platform. So we have a decent head start. Still, it is a huge mountain to climb. We are the first to admit that. Hopefully we’ll form terrific partnerships with larger companies as well as legions of individual developers who can help us get there faster after we launch.

  • I spoke F2F with some guys from Expert System last year and they revealed they were the suppliers of the NLP & semantic engine behind a new Web 2 company called Twine http://www.expe...ews.asp?idd=975 Following this I got in on the closed Beta. I would be interested to know what is behind Twine 2.

    For me, Twine is still scratching an itch I don’t have; unless Twine 2 really resonates with people’s needs I doubt it will have any more success.

    • Great points in your article. I think everyone can learn a lesson here. Read this guys post at his link above.

      P.S. I never thought I’d find anything but spam from a link in comments.

  • Also for anyone out there who might be prone to use terms like “Google Killer” — this is NOT a Google Killer. We’re not trying to be that. We are focusing on working with partner sites. There will be a way to access T2 directly as well, through Twine.com, and via our API. But we fully expect most of our usage to be via our API and partner sites. This will never replace standard keyword search engines like Google and Bing, for Web-wide search. However within publisher sites it certainly could replace them because it can deliver much more precise search and navigation over the many kinds of structured and unstructured data that such sites contain. It also saves consumers a lot of extra steps — if a consumer wants to find all the vegetarian kosher Thanksgiving side-dish recipes, they can do that in one place — instead of having to dig around in dozens or hundreds of different sites. Similarly when searching for a product to buy they will be able to search in one place — with specific parameters for exactly what they want — and then see all the relevant results from all relevant shopping sites in one click, instead of having to dig through each site one by one, searching again each time. This same example applies to every vertical area — music search, people search, resume search, product review search, investment research, health and travel etc. In keyword search engines users spend a lot of time searching and re-searching and then exploring various sites and searching within them to finally find what they want. This saves them all that trouble. It’s better for consumers, and better for content providers because it’s a lot more efficient for everyone. But we expect that consumers will eventually get to our data through sites like Google, Bing, Yahoo, and major publishers, media sites and social networks, rather than directly through Twine.

  • Nova and Erick := Dream Team!
    That nicely backs up my hypothesis of a future Web search:

    http://www.geld...e-aktualisiert/

    Unfortunately, it’s in german (yet still)

  • We need to find a way to teach computers to Understand language so the bots and crawlers could scout the internet. “Manual” filtering will always be insufficient.
    I’ll keep waiting for a semantic Google Search killer.

  • Swinging for the fences, with a tooth-pick. Spivack has piddled through 20 mill with a close to zero to show for it. Where’s the money coming for this tangent?

  • And of course, the only reason there was a scramble to release this sneak peak was because they were busted by a blogger who noticed Twine traffic is off its max by 85%.

    http://www.sema...n-freefall.html

    The response from the Twine team “ah yeah, we don’t care about the drop in traffic, you see that’s the 1.0 version, no big deal.”

  • Somebody had to say it: it’s sneak *peek*, not sneak peak.

  • Taking the thai food example shown in the video, I would have just typed into google “thai recipe ginger” and gotten it faster.
    I applaud Twine’s effort to reinvent itself after the original failure, and I doubly applaud efforts in the semantic space, but I fail to see the business model.
    You can’t outgoogle google, and the kind of semantic stuff they’re adding doesn’t solve the problem that much better.

  • auto-tagging services are needed .. people will not take the extra time

  • Aren’t we forgetting a few things ?

    1. after so many years, power-users have learnt how to make the right queries to Google (see comments above).

    2. sometimes, finding pages that are “on the edge” of what we were “officially” searching for is one of the biggest pleasures of the search action itself. That means the exact opposite of result-narrowing…

    3. those “semantic” systems ALSO work with keywords, whether those keywords are already tagged (titles, meta) or whether they extract them themselves (word frequency, etc.)

    This is not what we need.

  • I like twine their use of semantic can gender sites more at meaning the more is known at the semantic level the better.

    Knowing what a site is could benefit security: if they partner with, say, Norton, devious sites could be checked based on their content and the relation of links to links. There is more to it than just tags; it’s a completely new approach compared with what we knew from the first static websites.

    Semantics is pretty neat. Good job, Twine.

  • Sematicus Fallacious - September 19th, 2009 at 10:58 am PDT

    WHY PUBLISHERS SHOULD AVOID TWINE:

    If you allow TWINE to build, manage and host your niche category ontologies from w/in your site, you will have essentially handed the intelligence of your domain over to Twine.

    Once Twine has modeled your niche, they can essentially monetize your knowledge better than you can. I would be VERY suspicious of ANYONE who offered me an outsourced tool that sat atop my community and product set and then sold that data back to me. Why? B/c once they start combining and extending that data w/my competition, I’m essentially their slave.

    WHY TWINE IS BARKING UP THE WRONG TREE:

    Advertising is the wrong business model. eCommerce is where it’s at, guys. Forget Google. You need to focus on Amazon. Product ontologies in combination with unstructured data is the real way this is going to be monetized. No one cares about Nova’s examples. His navigation approach is cute, but shows absolutely no insight into how people USE content to make purchase decisions. His whole approach is bass-ackwards. What you want to do is approach this from the discrete transactional unit (e.g. a product for sale) and work your way out to the messy, unstructured web at large to understand what is used, considered, relied upon to drive purchase decisions. Why is this so damn hard for you NLP/Semantic web nerds to understand? I understand if the DARPA/NSA is paying for your research, but if you are hoping regular Joes are going to use this, I think you’d better return to the drawing board.

    If/when Amazon develops a semantic understanding of each of its top-selling categories, goodbye Google. By mapping that product/category/domain understanding to the top driving keywords and related content (reviews, video, product images, blogs, etc.), you’d have a much more meaningful AND MONETIZABLE structure that will not only drive search, but drive advertising as well.

    Twine? Why boil the ocean? Why not start where there’s already money changing hands TODAY and add value to that? Why not help someone buy Auto Insurance or make a financial investment? Something that has a real financial outcome attached?

    “Thai Ginger recipes” = FAIL.

    The NLP/AI/Semantic web folks are apparently failed academics who know absolutely nothing about how real people use information and language to make decisions. Instead, they approach this as a science experiment rather than from a series of business requirements. Business doesn’t need Twine 2. It needs better ways to reduce the cost of customer acquisition and cut SEM expenditures. To the extent you address that clearly and succinctly, you have our undivided attention. Until then, you are simply engaged in solipsistic, philosophical discourse.

    • Email me at sansom13 [at] gmail.com. This is exactly what we’re doing and I think you’d be interested in seeing what we’ve done. I’m also interested in hearing more about your perspective.

    • WHY PUBLISHERS SHOULD USE HOSTED SEARCH

      Your comment makes no sense at all. There are tens of thousands of companies already using Google as their hosted search provider.

      Hosted search is alive and well. It is safe, saves lots of time and money, and usually works much better than in-house search projects.

    • The poster above is most likely Twain Luu who was kicked out of Twine for breaking rules, harassing other users and threatening them.

      But that was like a year ago already. Get over it!

      Twain, you are mentally unstable and obsessed with revenge. You’re not a banker — you don’t have a job — you just sit at home and pretend to be an authority and mostly you just try to get revenge on all the people who wronged you blah blah blah. Seriously, get a life.

      Poor Twain Luu. Still can’t think about anything but revenge. Pathetic.

  • Why so many haters out there? I haven’t used twine until last night, and find it very interesting. I’m always looking for something new. Have grown bored of Google. Bing doesn’t improve things much, although I give them an A for effort. If Twine can deliver on this vertically, then I think there is promise. If I were Google or Bing, I would probably buy them and integrate this asap.

  • I see TC is up to its old tricks, editing or deleting important posts. I read a previous post about dangers with hosted semantic technologies such as Twine and then suddenly it disappears? It was the only business-focused and prospectively useful post on this thread. So who deleted it, Arrington?

    Spivack posts rebuttals longer than the original article, but when people actually counter, their posts are deleted?

    Pravda and TASS are alive and well in Silicon Valley.

  • That post above about the supposed risks of hosted search is totally wrong and misguided.

    How many sites use Google as their hosted search solution? Tens of thousands at least! Hosted search is alive and well.

    If Google can do it with keyword search for thousands of publishers then why can’t other companies do it with semantic search? It makes sense to me.

    It makes perfect sense for publishers to outsource their search. Just like they outsource ad sales, bandwidth, video storage, etc.

    Publishers are under increasing pressure to compete, while revenues are falling. It makes perfect sense for them to use software as a service approaches to improve their services while cutting costs.

    I don’t know if Twine or anyone else can do better than Google, but I definitely think there’s a market opportunity here, and a clear benefit to publishers.

  • Don’t get all the haters either – the filtering aspect is absolutely valuable and does go beyond what the current search engines offer (at least to regular users, agree that power searchers often have the queries at their disposal). Tho it’s easy for any of us entrepreneurs scraping by on much smaller – or zero – funding to be resentful, to be sure. I didn’t quite “get” the first (current) Twine, but this is a big step up.

    I’m not a semantic nerd but I have done quite a bit of research into this area, and my new startup http://www.ranker.com leverages the wonderfully vast Freebase dataset (as did the Powerset technology). Twine version 2 certainly has potential to have broader impact than any of the semantic plays we’ve seen thus far.

  • THIS ARTICLE BEGINS WITH:
    “Extracting meaning from the Web is huge project that is very difficult to do at large scale. Keyword search only skims the surface of meaning locked in Web pages.”

    IN THE 6TH PARAGRAPH IT STATES:
    What Twine has done, basically, is speed up the rate at which it can look at a raw Web page and create semantic metadata for it. Bing sometimes does this via NATURAL LANGUAGE PROCESSING, through the technology it bought with Powerset.

    ALL THOSE IN FAVOR OF THE “K-I-S-S/KEEP IT SIMPLE SILLY” PRINCIPLE… RAISE YOUR HAND!
    Demonizing Google makes no more sense than it did against Microsoft. (The REAL ENEMY of the people are the PHONE companies. They track and record ALL your communications! Phone calls, text messaging, voice mails, emails AND internet history!!) Nothing the first time out is perfect. The evolutionary process for every industry is “ongoing.” Minds AND personalities play a HUGE role in the ideologies of any industry… good, bad or ugly.

    I PREFER “NATURAL LANGUAGE PROCESSING” EVEN IF BING, THE SEARCH ENGINE OF MICROSOFT, IS THE 1ST TO DO IT THIS WAY:
    Why? If the common educated person has the capacity to understand and program (in THEIR language) the simple “key words” important to THEIR searching and needs and what they mean to THEM… all the better.

    REFERENCING PAGE 55 OF ONLINE BOOK TITLED:
    Exotic DBA’s Exploit Key Word & Business Search For All Toll Free & Paid Directory Assistance, Search Engines, Online Directories & Phone Books
    Preview online book at: http://www.lulu...content/6172635
    Author: Dale D. DeWalt with http://www.411maximizer.com
    Phone: 206-793-0000

    IN CONCLUSION: What Does The Future Hold? Semantic Web 3.0!

    “Trying to predict the future is like trying to drive down a country road at night with no lights on while looking out the back window.”
    Peter Drucker (NOVA SPIVACK’S GRAND FATHER)

    In a nutshell, the Semantic Web is about machines talking to machines. It’s about making the web more intelligent. For example; the word “transmission” can be associated with motor vehicles, diseases and radio frequencies. If a web page has the word “transmission” and words “flu” or “sick” the Semantic Web will in the future “more intelligently” only aggregate and pull other web pages with similar content. Words will have code attached and “relevance” and “similarity” of codes will help make searches more pertinent to a specific subject and return less extraneous information.

    “If you want to make minor incremental changes and improvements,
    work on practices, behavior or attitude. But if you want to make significant,
    quantum improvement, work on PARADIGMS.”
    Stephen R. Covey

    END OF PAGE 55

    CONTINUING:
    Nova Spivack’s grandfather, Peter Drucker, was a WISE man, leader and investor while he was on this planet. Hopefully, he is in heaven now. I wonder what he would say about current Semantic Web options and our choices. Only God knows.

    IN MEMORY OF PETER DRUCKER, PONDER 3 OTHER INSPIRING QUOTES FROM BOOK:

    From Page 5:
    “We now accept the fact that learning is a lifelong process of keeping abreast of change. And the most pressing task is to teach people how to learn.”
    Peter Drucker

    From Page 6:
    “What you have to do and the way you have to do it is incredibly SIMPLE. Whether you are willing to do it, that’s another matter.”
    Peter Drucker

    From Page 9:
    “The only thing we know about the future is that it will be different.”
    Peter Drucker

    IN CONCLUSION:
    Until we SIMPLIFY the process of sorting and locating information in the World Wide Web we will be forever “chained” to others to help us “find it” and benefit from it. Therefore it is my opinion that ANYTHING we do to COMPLICATE the process versus SIMPLIFYING the course of action is categorically 180 degrees OFF from what our PRIMARY GOAL is… communicating with others. Relying on others to help us READ or WRITE “proprietary code” is less helpful than if they create and apply a “language” process that is “natural” to us all.
    I myself will support, patronize and promote ANY method or company that “demystifies” such a basic element as the World Wide Web and helps to keep things SIMPLE. I vote we design and apply solutions… only if they emulate the KISS principle!

    • The person posting as “Twain” aka Twain Luu was kicked out of Twine for breaking rules, harassing other users and threatening them.

      But that was like a year ago already. Get over it!

      Twain, you are mentally unstable and obsessed with revenge. You’re not a banker — you don’t have a job — you just sit at home and pretend to be an authority and mostly you just try to get revenge on all the people who wronged you blah blah blah. Seriously, get a life.

      Poor Twain Luu. Still can’t think about anything but revenge. Pathetic.

    • Dale DeWalt = Twain Luu

      • Seriously that Twain character is wacked. Dude, don’t pay attention to her. She thinks that anyone cares about her personal saga??? Well maybe you do. But I know for a fact that everyone at Twine has long since forgotten about Twain — she is stuck in the past and all her information is based on stuff from a year ago. Totally not current. She is still upset about a video joke that someone made with Twine and rap song. No sense of humor. Go figure. As for her statements about usage, usability etc. Absurd. Everyone just ignores her at this point. She was indeed kicked out. Ignore.

  • “hardcore following of about 250,000″ ??
    I can’t really believe these numbers. I have been following Twine for about 1.5 years and find the activity among the power users lower than before. I don’t think Twine has a hardcore user base of 250,000 users.

    • Spivack has been a guest writer on TC, and is tight with some of the Crunchies, hence the going easy. Nah, they have nowhere near, nowhere near, 250K “hardcore” users.

      Oh, and to those wondering about all of the haters, maybe we’re just tired of all of the horse s**t. Tech Crunch needs to do a better job at forcing bluffers like Spivack to keep it real.

      • You are just a hater.

        Perhaps one of several who was kicked out of Twine in the past….????

        But nice try with your Chinese Propaganda approach to revisionist history.

        Everyone knows who you are.

  • i’m not so certain Twine should be in the business of managing ontologies. there needs to be some standardization of them though, considering the growing momentum of things like rdfa. since the w3c is in support of that semantic effort, why not create oversight there utilizing something like the w3c patent policy?

  • why Publishers should use hosted search?

  • Erick, I like the objective focus you brought to the first few paragraphs of the article.

    The rest however seems to be a cut & paste from an email from Twine. I would have appreciated a more objective analysis, given the hype we’ve witnessed from Twine before and the gap between what they say they’ll do and what they actually do. Entity extraction is no trivial task and I personally doubt they’ll pull this off on a scale sufficient to be successful. They’ll either have to specialize on some particular search e.g. food, which is already well structured out there, or scratch the surface of many domains without really delivering much value on any.

    One more thing, I would have appreciated a mention of your source regarding the fall in traffic. Your article came 4 days after my blog post on this (http://www.sema...n-freefall.html), which RWW did refer to. Next time please make the corresponding attributions, since this is not first-hand information.

  • That comment by Greg Boutin is so snarky….

    Ironically, Greg Boutin is not exactly known for “objective” journalism himself…

    Everyone knows he has a personal vendetta against several different CEOs, especially those of several semantic web companies that refused to hire him or take his advice.

    There is documentation of his attacks on them in threads in other places.

    The truth is that Greg Boutin was fired by a semantic startup a year or two ago on very bad terms, and since then has been attacking every semantic company he can find.

    He is far from “objective” when it comes to this topic.
