April 25, 2008

Is Keyword Search About To Hit Its Breaking Point?

Erick Schonfeld

[Image: slide by Nova Spivack, keyword-search-slide.png]

As the Web swells with more and more data, the predominant way of sifting through all of that data—keyword search—will one day break down in its ability to deliver the exact information we want at our fingertips. In fact, some argue that keyword search is already delivering diminishing returns—as the slide above by Nova Spivack implies. Spivack is the CEO and founder of semantic Web startup Radar Networks and is pushing his view that semantic search will help solve these problems. But anyone frustrated by the sense that it takes longer to find something on Google today than it did even a year ago knows there is some truth to his argument.

“Keyword search is okay,” he says, “but if the information explosion continues we need something better.” Today, there are about 1.3 billion people on the Web, and more than 100 million active Websites. As more people pile on, the amount of information on the Web keeps growing exponentially to accommodate all those seekers, and they themselves feel compelled to put their own personal and social information onto the Web as well.

At a certain point, with billions and billions of Web pages to sift through, keyword search just won’t cut it anymore. It’s a needle-in-the-haystack problem, with the haystacks just getting bigger and bigger every second.

Spivack explains:

Keyword search engines return haystacks, but what we really are looking for are the needles. The problem with keyword search such as Google’s approach is that only highly cited pages make it into the top results. You get a huge pile of results, but the page you want, the “needle” you are looking for, may not be highly cited by other pages and so it does not appear on the first page. This is because keyword search engines don’t understand your question; they just find pages that match the words in your question.
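To make the "haystack" complaint concrete, here is a minimal sketch of the behavior Spivack describes: match pages on the query words, then order them by how often they are cited. The `Page` type, the `inbound_links` field, and the three example pages are all made up for illustration; this is not Google's PageRank, only a crude stand-in for "highly cited pages come first."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str            # hypothetical URL, for illustration only
    text: str
    inbound_links: int  # assumed stand-in for "how highly cited" a page is


def keyword_search(pages, query):
    """Return pages that contain every query word, most-cited first."""
    terms = query.lower().split()
    matches = [
        p for p in pages
        if all(t in p.text.lower().split() for t in terms)
    ]
    # Ranking looks only at citations, never at what the searcher meant.
    return sorted(matches, key=lambda p: p.inbound_links, reverse=True)


PAGES = [
    Page("popular.example/flash-courses", "flash classes taught in a classroom near you", 900),
    Page("big.example/flash-training", "sign up for flash classes and training", 500),
    Page("small.example/as3-notes", "how to write flash classes in actionscript", 3),
]

results = keyword_search(PAGES, "flash classes")
ranked_urls = [p.url for p in results]
print(ranked_urls)
```

All three pages match the words, so all three come back, and the one page about programming (the "needle" for a developer) lands last because almost nobody links to it.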

So how do we get beyond keyword search and Google’s PageRank? There are many approaches being tried: social search, tagging, guided search, natural-language search, statistical methods, open search, semantic search, and (way out there) artificial intelligence. They all have their problems. Tags are too messy and inconsistent. Natural-language search requires too much computing power, is difficult to scale, and doesn’t deal with structured data well. Semantic search is perhaps the most promising, but it essentially requires every single Webpage to be re-written.

Spivack covered these issues during a presentation earlier this month at the Next Web conference in Amsterdam. It was one of the clearest explanations of the semantic Web I’ve heard so far (I’ve embedded his full slide show below). The semantic Web is nothing more than a set of standards that, if broadly adopted, would help computers extract meaning from the flood of data on the Web. But instead of a brute software approach, it puts intelligence into the data. “All you need to use that data is carried by the data itself,” says Spivack. Dumb software, smart data. That is an approach that scales no matter how many billions of Web pages are created.

The point, says Spivack, is:

To do for data what the Web did for documents.

You are turning the Web into a database, and your data becomes a part of it. Your data becomes part of the worldwide database. The semantic Web will let you move from data record to data record, just like you go from Web page to Web page.
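A rough way to picture "moving from data record to data record" is a set of subject-predicate-object statements that software can follow like links. The sketch below is only an illustration of that idea: the identifiers and predicate names (`founderOf`, `makes`) are invented here, and real semantic Web data would use standard vocabularies and URIs rather than these short strings.

```python
# Each statement is (subject, predicate, object). The predicate names are
# hypothetical; they are not taken from any real vocabulary.
TRIPLES = [
    ("person:nova_spivack", "founderOf", "org:radar_networks"),
    ("org:radar_networks", "makes", "product:twine"),
    ("product:twine", "type", "SemanticWebApp"),
]


def follow(subject, predicate, triples=TRIPLES):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def hop(start, path, triples=TRIPLES):
    """Walk from record to record along a list of predicates."""
    current = [start]
    for predicate in path:
        current = [o for node in current for o in follow(node, predicate, triples)]
    return current


result = hop("person:nova_spivack", ["founderOf", "makes"])
print(result)  # ['product:twine']
```

The software here is deliberately dumb: it only follows labeled links. Whatever "meaning" there is lives in the data, which is the point of the "dumb software, smart data" slogan.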

There are many obstacles to the adoption of the semantic Web, but its goals are something worth striving for. What is certain is that search needs to evolve, and Google and Yahoo and Microsoft with it. Of course, they can adopt whichever approach or combination proves most effective.

The question is: Will they, or are they too wedded to keyword search to move beyond it?

Comments

  1. Gmail USER

    Gmail.com has been down for a long time now. Even Google Talk doesn’t seem to be working anywhere in the USA.

  2. osman din

    Of course, that’s the decades-old classic AI problem. Forget about the web, you can’t even extract meaning out of a random unstructured text document. It’s obvious though that both Microsoft and Google have been thinking about this every single day, just like any student of computer science. In other words, there will always be skepticism about the term ‘semantic web’ unless there’s some sort of a breakthrough. Currently, we don’t have anything better than a database (manually classify and index) approach, and one doubts anything more intelligent would come along soon.

  3. reddknight

    I have also noticed I am finding difficulty getting good search results for what should be dead simple searches. For example, I was looking for a quick howto article on running a script on XP at boot time–I had to change keywords in 12 different ways and wade through a morass of invalid results, bad articles, and third party “suggestions” before I found the straightforward way it should be done.

  4. Chris

    Juggle.com may be working on this solution right now.

  5. Loudacris

    It’s good to know that web 2.0 will end in 2010. And why does tagging appear under web 3.0 like it’s some sort of awesome future technology that we should be looking forward to?

  6. Niraj

    I think the semantic web makes sense, but the whole idea of natural language search seems unnecessary. Keyword search generally works pretty well, and search engines don’t often return haystacks instead of needles unless you’ve failed to use related keywords and other parameters such as limiting to a specific website or timeframe.

  7. Morgan

    That graph is ridiculous. I’m going to go out on a limb and guess that presentation to the user will likely take the form of words for a while yet. And somewhere, some algorithm will decide what ads make the most sense to match with what is being presented on screen. If everyone feels better calling it a new name, go for it.

  8. Morgan

    I am an idiot and my last comment made no sense.

    However, I do think it’s going to be a lot longer than that graph that I’ll be searching by typing word queries. I’m sure someone will blow my mind with an interface that suddenly rings in the promise of sentient computing, but for now I’ll keep on with words.

  9. Phil

    I never have, and probably never will, understand why natural language search is “cutting edge”. It seems like a lot of work for nothing, keywords are just as good by my count.

  10. damon

    What a great chart….

    - my curve goes up, see

    - the competitor’s curve goes down, see

    ipso facto, google sucks and I don’t

  11. Charles Parker

    The first graph is crap.

    I’m using wikipedia about as often as I use google…. 3rd most used is delicious.

    Plus through RSS & “web 2.0” I’m getting a lot of stuff recommended/pushed to me, which cuts down my searching time.

  12. Ben Finkel

    This underscores the role that Q&A sites will play as time goes on.

    We’ve got natural-language searches working right now, and by relying on actual humans we can solve those tough AI problems that may or may not ever be solvable.

    I believe the long term challenge with search is the very assumption that search itself is the primary web tool of the future. As social networks widen and become more flexible and usable, I think a significant amount of data acquisition will move to these networks, and the role of search will diminish.

    Ben Finkel
    http://www.fluther.com

  13. Charles Parker

    Also, I still think search will utilise collective human-powered efforts. (No, Mahalo won’t scale, and no Facebook won’t have anything useful to add)

    I am yet to be convinced that the “semantic” web won’t be overtaken by spammers. Much like meta-tags were spammed early on in web 1.0

  14. Jim McNelis

    I like the Google Docs PPT widget. tits.

  15. Jim Bumgardner

    That first graph is horrible. It takes a whole lot of made up assumptions and predictions, and packages it to look as authoritative as possible, using a form normally used to present actual real world data.

    “See? This stuff I pulled out of my ass is real!”

  16. John Clarke Mills

    Loudacris,

    Tags are nothing new, that is for sure. But what if you could tag an object, or entity, with another object. So instead of tagging objects with strings, which falls back on a simple full-text search, you could tag something with an actual representation?

  17. Charles Parker

    Keyword search isn’t going away anytime soon. Keyword searching is ingrained into the internet habits of those 1.3 bn internet users. Too much cognitive dissonance involved with changing to another way of interacting with the web for the majority of them.

  18. Erick Schonfeld

    The top slide is conceptual, boys. But it does feel right to me. I’d love to see some real data to back it up. If anyone has that, please add to comments or send me a note.

  19. PressReleasePoint

    i use del.icio.us more often these days. zoominfo, wikipedia, krugle, technorati etc etc help me with my search. just google is not enough anymore. some times yahoo works better than Google. only 20% of keyword search returns what i am looking for. even those results are just more popular, they may not be the best. some times i never find what i want even though i know that it exists somewhere on the web. custom search engines like google co-op must become more widespread

  20. John Titor

    Erick, I sent you an email tomorrow with actual data from 2030.

  22. Yawn

    @ 19

    LOL :)

  23. TheDude

    I believe that Keywords search is not mutually exclusive with semantic search, but rather is part of it. People will still likely use keywords when searching for content. What will change though, is the ability to collect and organize relevant data based on your natural interests, your social groups and your location. Where semantics helps is when the service being used understands the contextual interactions users have on the web and proposes the proper actions based on the preferences and context. Many services such as Twine, Faves or Yokway are going in the right direction in making this new web a reality.

  24. Slavkoni

    Keyword search successfully fulfills all my requests. What tomorrow will bring is the question.

  25. Abhijeet

    Proper communication between human & computer is a key factor in achieving “semantic search”.
    I see the computer as a blind and deaf person with a very low IQ. It’s pretty hard to communicate in such a situation.
    I think “semantic” search can be a reality only if we can find better ways & ideas to communicate and express our thoughts to today’s computers.

  26. Nigel Tufnel

    Hey Radar, Toys R Us just called, and they want the backwards ‘R’ on your logo back. Hahahahaha. Ahh, I just slay myself… :-)

  27. Zach Weisman

    AI in 2020 is like Kubrick expecting HAL by 2001.

  28. gilltots

    one thing that is absolutely not going to work is relying on the content owners to put the semantic smart data into their pages. that would be no better, because it relies on people a) knowing what they’re doing, and more importantly b) not lying about it to boost their search rankings. effectively it’d just be pushing the keywords down into the metadata instead of the page content itself, unless i’m missing something.

    i’m totally with you though that keyword search sucks, especially for tech stuff. i search for some obscure thing and have to wade through 500 shitty blogs that don’t have anything to do with my problem but mention the same words.

  29. Daniel Tunkelang

    I’ve been participating in the Twine beta, and I don’t yet see the value. Presumably they are depending on scaling up long before 2030.

    But I note the glaring absence of adversarial information retrieval in their discussion. Until and unless someone can reinvent web information retrieval to not be a no-holds-barred competition among publishers (particularly spammers) for the attention of users, mitigated only by the secret relevancy computations of search engines, we’re not going to advance far beyond the present state.

    More discussion at http://thenoisychannel.blogspo.....ghal.html.

  30. Daniel Tunkelang

    URL in last post should be http://thenoisychannel.blogspo.....nghal.html

  31. David Hardtke

    Keyword search works effectively for a large fraction of searches and in these cases it is probably the most efficient way for the user to get what they want. As Charles Parker pointed out, any technology that slows down these easy searches by requiring extra user effort or new user behavior is doomed to failure.

    The approach we are taking at Surf Canyon is to augment the keyword search with an intelligent agent that observes the user’s interaction with the search results page to figure out the true meaning, in context, of the user’s query. By simply observing how the user interacts with the search results you can often figure out which results are relevant to that user at that time and which are noise.

    The ultimate search engine of the future will adapt to the user’s real-time needs (is the user looking for a specific web site? is this an open ended research query? Is this a keyword search or a natural language query?) and respond appropriately.

  32. Amir

    gilltots - I agree, I’m a bit sceptical about the semantic web in general for the reasons you gave. I think better search is going to be about personalization and the interface rather than anything fundamentally changing how keyword search is done.

    On your comment about finding tech stuff - you should try out WebMynd. I am biased since I helped create it, but I use it myself for collecting good tech resources while doing development.

    WebMynd doesn’t help you find stuff in the first place, but it saves and records all the pages you go to as you find them, so when you find something useful, it stays found. When I type ‘MochiKit’ or ‘python’ into Google I get all the resources that are actually useful in the WebMynd results rather than the Google results which are pretty general.

  33. Max

    Wait, wait… This Semantic Web, which is projected after the Social Web… Wasn’t that the talk of the town right BEFORE all this web 2.0 crap? Yes sir, it was. So either the Semantic Web is at 1.5 or something, or “they” decided to put it on hold and concentrate on 2.0 first.

    *sigh*

    This web stuff, it’s confusing.

  34. Joy

    This is the best article on TechCrunch so far. Simply awesome! Well written. Michael Arrogant, if you are reading this then learn how to write a great and useful article.

  35. MyMesh.com

    Some fresh air… ;-)

  36. Webomatica

    Just now I was searching for info on Flash classes - but the programming version of “classes.” All the Google search results were for sit-down courses on Flash. Not what I was looking for at all.

  37. Google Tutor

    nice article Erick. keyword search will be supplanted, or heavily supplemented, how quickly is a tough call… and would we even notice it happening?

  38. pj@ferodynamics.com

    You would be surprised how good Google is at making sense of keywords, especially when you consider all the new tech terms, slang and jargon emerging every day. Just a few days ago nobody knew what a truthbox was, but already Google knows “truthbox” is a Myspace app. It amazes me Google makes these deductions without human intervention.

    Also, it’s a fact search queries are much longer today and I often see queries now that are complete sentences, often in the form of a question.

  39. Tony

    @webomatica, try “flash class files”. Wouldn’t “actionscript class files” be more correct though?

  40. matt mcknight

    “There are many approaches being tried: social search, tagging, guided search, natural-language search, statistical methods, open search, semantic search, and (way out there) artificial intelligence. They all have their problems.”
    Except he doesn’t mention what the problem is with guided search or statistical methods. Maybe because there aren’t problems, it just sounded good to say everything has problems.

    This whole premise of semantic web improving search is BS. Saying that the semantic web is going to save search is like saying that meta tags help. The boost they gave to spammers helped open the door for Google and PageRank. The initial point of PageRank is that you can’t trust the data, so the semantic web approach only improves what you can’t trust. The semantic web crowd are just a bunch of XML jockeys / failed librarians who love telling other people how to organize their own stuff. Things like del.icio.us tags are vastly more effective, when there is a community of sufficient size to outweigh the spam.

    What exactly are people having trouble finding on the web? Guided navigation, even at the level that Google already provides, certainly helps you get the right kind of thing back (video, image, book, journal article, etc.) even if you don’t specify it up front. It’s really the attempts to categorize all the world’s information that end up failing; very few people now try to browse the web via interfaces like the old Yahoo hierarchy. It can be done at smaller scales, but not for everything.

    I always think people like commenter #3 sound whiney. I just did a google search for-> xp script startup <- (which was the first thing I tried) and got great results. Maybe you need some search training?

  41. John Mitchell

    All of those things but applied in vertical, niche search engines. The specialized search engines can do a lot better job than the general search engines. For example, http://krugle.org/ for code and http://markmail.org/ for email.

  42. Popular Wealth

    Keyword search is the NUMBER ONE way to search and always will be.

    There is no needle in a haystack problem, there will always be a #1 site, and a #2 site etc on Google.

    The problem webmasters face is increasing competition to rank #1 and that’s a GREAT thing for everyone.

    A semantic web isn’t the answer - proper SEO (that so many sites ignore) is.

    Did you know that you could cut the size of the blogosphere IN HALF right now if you got webmasters on board? Seriously, take a look at your average blog. Look at its source code. Notice that the head section is on average 250-300 lines long? Then look at an SEO’s blog like mine: the head section is under 20 lines long. Actually my blog contains more data, uses more tools and is STILL shorter than just the head section of most blogs. That’s part of SEO. That’s helping Google not drown in lines of code. That’s the answer.

    Bloated sites belong on page two or beyond… well coded sites that help google deserve a shot at #1. IMO of course.

  43. TomW

    I’m in the camp of the two not being mutually exclusive. Users will continue to seek relevant content using keywords. What will improve is the search engines’ ability to disambiguate their queries and present more intelligent organization and segmentation of results using semantic-web type processing. It’s highly unlikely, however, that content producers will be able to generate proper semantic tags on their own. A big growth area on the web will be technologies that can process unstructured content and create intelligent meta data which can be purposed in the future for new semantic signatures that arise. These won’t appear top down for a long time, as the coordination costs are too high. Early signs of how this might happen are Yahoo’s standards for meta data ingestion within Yahoo search. There will be many flavors of the semantic web for some time to come (and a host of Semantic Web agencies much in the way there are SEO agencies today).

  44. Youn

    When we talk semantics, don’t we also talk about keywords? After all, words are semantics, so keyword ads will not disappear so quickly, I guess.

  45. Monchster

    I have to agree with the comment made by gilltots, too much will rely on the human aspect of providing “valuable data”.

    Keywords are now, and will continue to be, the main means to “search” the internet, but they are not totally reliable. Being in the IT support field, I use Google constantly.

    1 out of every 5 searches results in me having to play with the keywords being used. Not every search gets me what I need, and sometimes I get data that is years old.

    Trial and error… we’ll see!

  46. Webomatica

    @Tony, yep, your suggestions worked for what I was looking for. But I think you get my point: I want Google to be able to read my mind :|

  47. Djilali

    A good concept we could extract from some comments here:

    What do you think of a mashup between google, wikipedia and delicious results with a proper algorithm?

    We can start with a simple weighting algorithm and statistically refine it.

    Too many projects in my hands and no time to play with it but let me know if someone gave it a try.

  48. Brandon Wirtz

    Djilali,

    I can tweak my results at Isayhello.com to do that pretty easily. But the problem is that if you use UGC to filter your results then they are too easy to manipulate. I can give something 500 Delicious bookmarks easily enough. I don’t publish how I filter my results for this very reason. I looked at using Digg, and I do use them if a search is detected as “Trendy,” meaning based on recent events. But that can be fooled by a game show like Are You Smarter Than a 5th Grader, which often creates “news”-like trends for dead people, like Which president was never married….

    Tweaking the results is not as simple as doing a mash-up. It is about refining the search before you process it, and returning results that are based on the type of search. I have started to address the results issue, and am using suggested searches to help users find the right questions for what they are seeking.

    As I grow I hope to handle both, but I started with one side of the equation, looking to add the other later.

  49. adrian chan

    Great post — the problem is real — the web’s rapidly filling up with speech, and speech as data is infinitely more tricky to analyze than the printed page/post (which is where web pages got started).

    Just getting context out of tweets, feeds and status messages, comments etc. is a massive headache unto itself, let alone relevance, rank, influence, etc. But that is where it’s headed.

  50. Martin Kelley

    The semantic web is a great idea but turning it into a widely used practice would take years of retraining the population. Look at CSS: an obviously superior technology but ten years after someone figured out how to hack a design using html code for spreadsheets, half of the internet is still built on tables.

    Anyone wanting to build a better search engine can’t begin by whining that the input data (the entire WWW) is built wrong. It’s the search engine’s job to embrace the chaos and find ways of making sense of the data. The folksonomic approach tries to build out on context to understand relationships and meanings that aren’t necessarily apparent. The most interesting websites now are combining multiple services and different types of data to achieve what I’ve been calling “folksonomic density.” On a daily basis I’m interacting with Techcrunch via RSS/Netvibes, Arrington’s Twitter feed, references I’m seeing on other tech blogs and even sometimes by visiting the site itself. Searches that can figure out these relationships in a sort of fuzzy logic will give better data. Some of it will be semantic, either by design or convention (e.g., Google would be smart to index Twitter’s emerging hash-tags). (Shameless self promotion alert: I wrote about folksonomic search and density in my recently published O’Reilly shortcut Web 2.0 Mashups and Niche Aggregators.)

    The talk in the comments about SEO brings up an interesting point: wasn’t metadata the first attempt at a semantic web? Professional scammers figured out how to abuse it and now search engines ignore the data (except for description). It’s much easier to game a system built on pure logic than it is to fake folksonomic context. The semantic web has a lot to offer but it’s never going to give the best unbiased data.

  51. Charles Knight

    Erick,

    Do you ever allow your posts to be republished?

    Please let me know.

    Thanks,

    Charles Knight, editor
    AltSearchEngines.com

  52. TravelsTooMuch

    Your spaghetti came off the wall. Go get a broom. Worse, it’s conjecture of the worst sort.

    The paradigm for correct parsers doesn’t exist. You can’t compile search indexing in such a way that it can be correctly queried by ‘natural language’ because natural language doesn’t exist. Go to Omniglot if you have any belief to the contrary.

    Instead: teach a little about what George Boole and Pascal thought. Oh, yeah, old dudes. They pre-date Berners-Lee yet still formed the logic upon which the entire computer industry is built. Understand basic Boolean logic and you can get results– good ones– almost every time from even ugly search engines.

    Try to get blood from a stone and you get tired and the stone won’t give it up. That’s because there’s none inside. Try to infer what’s not real and it’s still not real, and a semantic web is impossible because we have few real semantics.

  53. pj@ferodynamics.com

    All you noobs, listen to “Popular Wealth”; he’s the only one making sense here.

    This entire thread is like a fantasy that Google will go away easily like a bad dream. I don’t think so.

  54. David Scott Lewis

    As “Twinerians” (those in the Twine private beta) mostly know, I’ve been the most active Twinerian since the private beta launch last October, although I’m now in the number two slot after Nova. Also, one of my twines, the “Apps: On Semantic Web & Related Applications” twine, is the top twine that wasn’t created by a “Radarian” (a Radar Networks employee) with slightly north of 1,000 users since its launch six weeks ago. (Besides the three Twine help, feedback and news twines, it’s number two to the Radarian-created “Web 3.0 - Semantic Web” twine which was created on day one of the private beta several months ago.)

    So, you may expect me to rush in with a defense of Twine. Well, yes and no. I think it’s more of an attempt to clear up some misunderstandings. Also, what I’m saying are my views, not views that are necessarily held by Nova or Radar Networks. Just my 2 cents as a hyperactive Twinerian.

    Some additional disclaimers: I use other semweb tools, too. I’m in the Powerset Powerlabs beta, the True Knowledge beta, use Hakia, use the AdaptiveBlue BlueOrganizer Firefox add-on … I’m even moderating the “Rising Stars of the Semantic Web” keynote/panel at next month’s Semantic Technology Conference. (Nova will be on my panel, along with the CEOs of Powerset, Talis, AdaptiveBlue, a stealth company started by a Stanford semweb visionary, and perhaps a surprise guest panelist.) So, I’m biased in favor of semweb, but not necessarily 100% on Twine.

    First, separate semweb from AI, extraterrestrial intelligence, metaverses, “Commander Data” and transhumanism, and a lot of other flaky stuff. In fact, although I support R&D in (some of) these areas, they’re not ready for prime time, regardless of the fact that IBM is experimenting with some of this stuff (like Second Life). For IBM and DARPA (& Co.), go for it! Win another Nobel Prize, come up with a cute defense app, something the “Terminator” would be proud of. But for the real world, well, this flaky stuff isn’t ready for the real world.

    So, what about semweb? I guess it depends on what one is trying to accomplish. In general, however, I’d say that it is ready for prime time and I’ll provide an example.

    Second, Web 2.0 types may not be the best market for an application like Twine. The TC crowd already tags, has their established Yahoo or Google Groups or web-based discussion boards, uses a wiki, uses Typepad, uses Google Reader and maybe even a desktop reader. So, IMHO, the TechCrunch crowd may NOT be the right market for Twine. In fact, I believe the TechCrunch crowd is indeed a right market for Twine, but not the only (or even the best) market.

    But in the real world, people will not tag. That’s a fact. Just accept this as a fact and don’t dispute it. Your friends and TechCrunch readers do not comprise the entire universe of internet users. Twine’s semantic auto-tagging (i.e., “semtagging”) does the tagging … and this is great for the vast majority who will NEVER, EVER tag. Is it perfect? No. It still needs a lot of work. I find that it’s nearly perfect in some cases, off in left field in other cases. Somebody mentioned the problem with metadata and the potential for spam. Yes, I see this as a potential problem, too. And even some news sources (especially BBC) include so much BS metadata that Twine doesn’t effectively grab the right concepts/keywords/subjects/tags. Yet, Twine is doing a pretty good job with semtagging. B- at this point, but I’m expecting B+/A- performance by September/October.

    As a platform for online threaded discussions, it’s already better than Yahoo Groups or Google Groups in every way except one (and it’s a biggie): There’s no way to opt to have messages sent directly as individual messages to one’s designated e-mail account. Boy, this was a huge oversight by Radar Networks … and it greatly diminishes the utility of Twine as a platform for online discussions. I run a twine for the de facto expat Chamber of Commerce in Qingdao (China) and this is a constant complaint that I hear. Fortunately, Radar Networks will address this issue this summer. In every other way, however, Twine is superior to Yahoo Groups or Google Groups: Easier to share content of all different types, easier to discover related content. This is not a trivial issue: It really does make Twine superior to Yahoo Groups and Google Groups. BTW, messages for a Twine can still be read in one’s e-mail account via the Twine Digest (a daily digest), but it’s not as effective as having an option for receiving individual messages and having a way to reply to them in one’s e-mail account. Fact is, a lot of the world lives in Microsoft Outlook (not us TC readers, but a lot of others do), so Twine needs to play nicer with Outlook (and Gmail, for that matter).

    As a wiki, Twine isn’t there. I hear things, so let’s revisit this in about 12 months. Until then, Twine isn’t much of a wiki platform.

    RSS reader and blogging platform. Not there yet, either. However, semantic blogging (i.e., “semblogging”) is coming. I know when, but I’m pretty sure I can’t say. OK, you can hold your breath, though: It’s coming sooner rather than later.

    Playing nice with Facebook, del.icio.us, is also in the works. (I’m 100% sure about del.icio.us; whether it’s Facebook or LinkedIn or MySpace or Xing, I’m not so sure.) Expect this sooner rather than later.

    How about importing of RSS feeds? Yep, in the works. Forwarding of e-mail to Twine is a bit more problematic; Twine doesn’t have a good way of handling HTML e-mail messages. Forwarding of plain text messages already works, adds a semantic layer to the messages. But HTML messages become totally garbled on Twine. I do NOT expect a short term solution to this problem. Maybe late 2009/2010. But importing of RSS feeds to give (blog) posts a semantic layer: Coming soon!!

    At its worst, Twine is the best online discussions platform and best social bookmarking site. The real world won’t tag, so semtagging takes care of this. And it has been my experience that Twine’s semtagging is generally better than human tags on del.icio.us … and I think we can all guess why: Too much variation in human tagging. This is a HUGE advantage for Twine … and an advantage I’ve never even heard Nova mention.

    At its best — well, the sky is the limit. I’ll rephrase: At its best circa April next (to use Commonwealth parlance), Twine will also have some wiki capabilities, better and worse than something like Wetpaint, will have semblogging, and will be a feed reader.

    Let’s be honest: All online feed readers suck. So it wouldn’t take much for Twine to match Bloglines or Google Reader. And it certainly wouldn’t take much to beat FF 3 or IE 8. With semtagging, it will be much easier to find related content. Of course, Bloglines and Google Reader are not going to sit on their hands while Twine develops feed reading functionality, so it might be a couple/few years before Twine can match Bloglines or Google Reader in scope of features. In fact, Twine doesn’t need to match Bloglines or Google Reader in scope: 80% of the functionality + semtagging will give Twine an edge.

    Semblogging. This is something that may have zero interest to me, but may prove to be the killer app for bloggers. For active public bloggers, semblogging will be the best thing since RSS. Whether Twine can serve as the blogging platform or not remains to be seen. Twine’s UI needs a lot of work. Everyone feels the UI needs a helluva lot of work … some even think it’s downright dreadful. (If Nova had an “if I could do it all over again moment,” I wouldn’t be surprised if making a better UI would have been a higher priority before launching the private beta. Remember, though, it’s still in a private beta, not even a public beta.) But how difficult would it be to match what the majority of bloggers do with Blogger or Typepad? I suspect Twine could do this, but probably won’t have all the cute features of a Typepad for another 18 months. So, I wouldn’t be surprised to see blog posts created on a Twine and cross-posted to a “better” blogging platform.

    So I’ll conclude with an explanation of semweb that’s easy for everyone, including your parents and kids, to understand. Nova talks about the web being a file server and semweb being a database. Although true, this won’t play in Peoria. Best explanation: The web is like a book store with 200,000 books randomly scattered throughout the store. For the most part, it’s impossible to find anything. Enter search. Search helps. But search helps only if someone uses the exact keywords/subjects/tags to find something, e.g., using an author’s name (if they even know it), the exact words in the title or subject headings. Boo. This is primitive. And it doesn’t allow for browsing.

    Semweb brings organization to this disorganized book store. Now there are aisles, each clearly marked by subject categories. In a physical book store, within the categories the books may be placed by author, which is only marginally effective for browsing. But the aisles clearly noting subject categories make browsing fairly easy — and much, much, much easier than browsing 200,000 randomly placed books. In this sense, semweb is more like a library organized by the Dewey Decimal classification or Library of Congress Call Numbers. And this is much better than even a book store organized broadly by categories and then by author.

    With search, you can find something and then jump around … and jump around … and jump around. Those of us who are Google power searchers (e.g., we search by file type, such as filetype:pdf, by date, or by phrase) can do a lot better at searching. But the average searcher — even the average Google searcher — tosses in a bunch of keywords and hopes for the best.

    With semweb, the web has meaning and structure. I can browse much more efficiently, don’t have to use EXACT keywords. And I can find related information much more easily. I can jump around AND continue browsing. This is a very powerful concept, a major advantage for semweb.

    A key advantage I see with Twine is recommendations. I’ve found that Twine does an excellent job recommending twines (think of a twine as a blog or discussion group, although it could be as narrow as a grouping of tags). It does a pretty good job recommending related content. For example, I’m looking at a bookmark and it recommends other content: content that may not have ANY of the same tags in a del.icio.us universe. As far as recommending people, the jury is still out. It’s probably a critical mass issue: Twine needs more users and each user needs to add more content (e.g., bookmarks, documents, videos, podcasts), subscribe to more twines, add more comments, connect with more people. But these are growing pain problems that will likely get resolved.

    Semweb as Web 3.0 is a bit silly, but even I fall into this trap. The library/book store organization analogy is better, doesn’t really have anything to do with Web 2.0, 3.0, 50.0. And an organized web is a good thing. It’s not HAL, it’s not Commander Data. It’s not AI. It’s an organized web.

    Bottom line: Semweb (and Twine, in particular) brings a lot of Web 2.0 functionality to a world that hasn’t been exposed to Web 2.0, a world that will not tag, a world that wants recommendations pushed to them, a world that doesn’t want to search and search and search to find something (or isn’t willing to learn or use advanced syntax). For TechCrunch readers, Twine is better than a Yahoo or Google Group, better than any social bookmarking system sans Furl (which has full archiving). And semblogging might be the next best thing to look forward to in blogging … and Twine will lead the way.

  55. Bjorn Tipling

    That graph makes no sense at all. And who knew there was a whole cult-like culture of “alternate search engine” people?

  56. ididak

    Maybe I’m cynical, but I wonder how many of these semweb advocates (I mean people like Spivack, not really talking about real developers) actually understand precision/recall and ROC curves? How about the impact/cost of various misclassifications of subcomponents (manual or auto) on these curves? Many of the touted techniques have been used by Google and the like for years. If there is any breakthrough in search in the next 10 years, I’d put my bet on Google instead of these jokers.

    The problem is not only engineering (where Google has about a 3-year lead over the next competitor), but also that the fundamental algorithms for such a breakthrough are yet to be invented. Still, there is a massive amount of incremental improvement that can be made to keyword search as we know it. Read Udi Manber’s recent interview to learn more.

    Spivack’s curve is wrong. The current semweb techniques don’t scale and are only suited for small-to-medium sized, mostly closed data sets like digital libraries.
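
    For readers unfamiliar with the precision/recall terms used above, here is a minimal sketch of how the two numbers are computed for a single query. The function name, document IDs and result list are hypothetical, made up purely for illustration; nothing here reflects how Google, Twine or any other engine actually evaluates results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents truly answer the query,
# and the engine returns five results, two of which are relevant.
relevant = {"doc1", "doc2", "doc3", "doc4"}
keyword_results = ["doc1", "doc2", "doc7", "doc8", "doc9"]

p, r = precision_recall(keyword_results, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

    Returning a bigger "haystack" tends to raise recall while lowering precision, which is the trade-off any search approach, keyword or semantic, has to be measured against.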

  57. Jonsid

    >But anyone frustrated by the sense that it takes longer to find something >on Google today than it did even a year ago knows there is some truth >to his argument.

    I did not know this was the case. I’m curious to look at any studies that suggest this. There are some alternate hypotheses for this phenomenon too, if it is indeed the case:

    1. users google for a lot more stuff now than they did a while back. And the average user satisfaction at a query result might have gone down as a result.

    2. As the web grows, so does spam.

    I am not sure if the semantic web or anything else might help if this was the case. (again I am not sure if this is the case, these are just my hypotheses as to why a user might be less happy with google now than a year back.)

  58. Roald Sieberath

    There’s one area where it’s “easy” to improve on search (and judge what’s proposed here) : it’s within the enterprise.
    There you can rely on the expertise of your colleagues to help you find the right information. It’s partly semantic (tagging), and partly social (your ‘internal’ social network). Imagine a mix of delicious or mahalo (and other things), but by and for your colleagues.

    There’s a nice explanation at http://www.enterprisesocialsearch.com
    (it’s sponsored by a company offering a tool called “Knowledge Plaza”, but it makes the point clear)

  59. gianluca carrera

    I don’t think vertical search will help much… oh, well, it is helping quite a bit nowadays, but I am not convinced we will have many vertical search engines in the future… they will lack the scale to compete with the bigger search engines, and when going after that space will be worth the while, the big ones will just do it…

  60. dave

    dude, this is OLD news - where have you been? i interviewed laurie locklee back in 2004 in competitive intelligence magazine about this particular kind of issue, the future of web mining and intelligent automation of research…

    it’s online, has been for years, since 2004, enjoy:

    http://www.passingnotes.com/ar.....eb-mining/

  61. David Scott Lewis

    @ididak, have you tried Powerset? Far superior to Google for searching Wikipedia, especially for casual (re: the vast majority of) searchers. Can it scale? Now that’s a great question. But maybe it doesn’t have to. Maybe the semweb players can take a Mahalo approach to the best of the web, picking, for example, the top 1,000 news and reference sites (versus Mahalo choosing the top searches and using human indexing).

    And with something like Twine, you can also tie together content that is added by Twinerians.

  62. Workpost

    Do you ever get the feeling that none of this is going to turn out like we think it will?

  63. Abhishek

    Finding the exact information one wants is becoming harder and harder with the increasing amount of information on the internet. I feel it is the responsibility of the people who supply information to make it search-engine friendly.
    Abhishek http://www.dibugs.com

  64. pawel lubczonok

    Auto-tagging by noticing syntactic/semantic context in pages is definitely needed, but it is not so far removed from keyword search. It may in fact be isomorphic to the best keyword search engines. Thus, due to the effort needed and the insufficient benefit, unless the tags are generated automatically there will be insufficient takeup.
    What is actually needed (I believe) is a deep expression of semantics that is machine-understandable. RDF, OWL, etc. are far from being able to achieve this. The conceptual framework must be much deeper.
    We at ThoughtExpress have been working on such a conceptual framework and theory and can run enterprises on our semantic machine. In July we will launch an online semantic space (nothing like anybody is talking about so far). There is no IT in it :-) Anyway, it is good to see people having a passionate discourse.

  65. Sudoku Maniac

    either way.. I see Google adapting to the new changes..

  66. Jeff Wilson

    It’s the responsibility of the SEO community to preach the gospel of having the content ranking and sorting correctly. This should be a great opportunity to be the interface between the content providers and the search engines. Google and Yahoo are wholesalers after all.

    Jeff Wilson
    http://www.marketdrivemarketing.com

  67. Daniel Tunkelang

    David, I’ve played a little with Twine and a lot with Powerset (which I blogged about here). I have yet to find any value in Twine, though perhaps I simply haven’t been willing to make the necessary investment. As for Powerset, I don’t see why you think it is superior to Google for searching Wikipedia, especially for casual searchers. I find it at best comparable, and at worst more complex. Wikipedia is really easy to search–you don’t need NLP, semantic web tools, or any other rocket science for it. I think Wikipedia would benefit a lot from an exploratory search interface, but that’s another story entirely–and not one that any of these guys are going after. As Harper Reed has argued, what they need is guided navigation.
