March 19, 2008

The Semantic Hacker One-Million Dollar Challenge

Erick Schonfeld


Semantic startups and projects are hot right now. (See Radar Networks, Freebase, Blue Organizer, Hakia, even Yahoo). But what do you do if you are a little-known technology company in Rochester, New York with a powerful semantic-analysis engine on your hands that you want to turn into new businesses?

You offer a $1 million prize to anyone who can come up with the most profitable application for your technology and call it the Semantic Hacker challenge. The challenge starts today, and is being sponsored by TextWise, a private company backed by pension-fund adviser Bill Manning that has been around since 1994. TextWise uses natural-language processing and semantic analysis to automatically categorize Web pages and create contextual ads for them. But it wants to see what the crowd can do with its technology. It is opening up its APIs, much like semantic search engine Hakia did yesterday. But instead of merely licensing the technology, which it is willing to do, it hopes to generate actual business ideas that it can run with. CEO Connie Kenneally explains how the challenge will work:

The winners of the challenges would turn over rights to their idea. We would award them $100,000 immediately, we would likely make them or their team job offers, and we would build out whatever is required. Then they would receive 50% of the first year’s revenues, up to $1 million.

Not a bad deal for simply coming up with a killer semantic application. Before somebody else takes it, my idea is a semantic search engine that actually works at Web scale (please send the check to my home address). Don’t worry. There can be more than one winner.

Kenneally is hoping for specific suggestions to apply semantic analysis to different industries. Any idea is fair game, except for four works-in-progress TextWise is already developing: a browser plug-in that replaces ads with content related to the page you are on (foof), semantic bookmarks that bring up related content from the top 6,000 blogs on Technorati (Gyzork), a shopping discovery tool, and a Facebook app that automatically provides shareable links relevant to a given conversation (Festoon). That Gyzork idea is my favorite. Instead of saving bookmarks, you save concepts and the relevant links are added to the bookmark over time.

The way the technology works is that it creates “Semantic Signatures” from any text that is fed into the system. You put text in and it spits out categories it thinks the text fits under, as well as related Wikipedia articles. On the Semantic Hacker site you can try it out by cutting and pasting some text and seeing what it comes up with. For instance, I put in the text from a post I wrote about China blocking YouTube, and it generated a Semantic Signature with these categories:


Society/Issues/Territorial_Disputes/Tibet 68

…/Religion_and_Spirituality/Buddhism/Lineages/Tibetan/Dalai_Lama 48

Society/Religion_and_Spirituality/Falun_Dafa 22

Computers/Internet/Searching/Search_Engines/Google 17

Computers/Internet/Searching/Directories/Volunteer-Edited 17

That is pretty accurate. (The numbers weight the relevance of each category on a scale of 1 to 100). And it produced ten related Wikipedia articles about Tibet as well. “It is like decoding the DNA of the text—creating a semantic map of the text,” explains Kenneally. She says it can scale to hundreds of millions of Web pages, but for the challenge she is limiting applicants to 20,000 separate requests a day, and up to 100,000 characters per request. Unlike, say, the Netflix challenge to come up with a better recommendation engine, TextWise isn’t looking for someone to improve its algorithm, which it is keeping secret. It just wants to know what applications and markets to go after.
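For anyone who wants to work with a Semantic Signature programmatically, here is a minimal Python sketch. It assumes the signature arrives as plain text in the form shown above (a category path, a space, then a weight); the real API response format is not described here, so treat the parsing as a guess. The names `Category`, `parse_signature`, and `split_for_requests` are made up for illustration, and the splitter only reflects the 100,000-character request cap Kenneally mentions.

```python
from typing import List, NamedTuple


class Category(NamedTuple):
    path: List[str]  # category path segments, e.g. ["Society", "Issues", ...]
    weight: int      # relevance weight as printed next to the category


def parse_signature(text: str) -> List[Category]:
    """Parse lines of the form '<category path> <weight>' (assumed format)."""
    categories = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The weight is whatever follows the last space on the line.
        path, _, weight = line.rpartition(" ")
        categories.append(Category(path.split("/"), int(weight)))
    return categories


def split_for_requests(text: str, max_chars: int = 100_000) -> List[str]:
    """Cut text into pieces that each fit under the per-request character cap."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# The signature from the China/YouTube post, copied as displayed
# (the truncated "…" prefix is kept as-is, not reconstructed).
signature = """
Society/Issues/Territorial_Disputes/Tibet 68
…/Religion_and_Spirituality/Buddhism/Lineages/Tibetan/Dalai_Lama 48
Society/Religion_and_Spirituality/Falun_Dafa 22
Computers/Internet/Searching/Search_Engines/Google 17
Computers/Internet/Searching/Directories/Volunteer-Edited 17
"""

categories = parse_signature(signature)
top = max(categories, key=lambda c: c.weight)
print("/".join(top.path), top.weight)
```

Splitting on fixed character counts can cut a sentence in half, which may skew the categories for the pieces on either side; splitting on paragraph boundaries would probably be a better choice in a real application.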

If you were going to build a semantic startup, what would it be?


Comments


  1. Boring Market

    Semantic web is Web 3.0.

  2. diystartupnews.com

    That’s pretty cool. I am thinking a better Techmeme.

  3. YDrive

    Very interesting.

  4. Tarun

    This is pretty cool. Similar to the earlier challenge where they used a networked crazy toaster to hack - http://www.techbanyan.com/archives/104

  5. David

    How’s this compare to http://www.opencalais.com then? The API looks very similar, plus they have a ‘bounty’ too.

    Uh, I seem to be the ‘what about’ guy today :-)

  6. ididak

    I pasted some text (2K character limit) from this article: http://www.theonion.com/conten.....for_change

    Here is what I got: Society/Ethnicity/African/African-American/Online_Communities 57
    Society/Ethnicity/African/African-British 38
    Society/Politics/Nationalism/Black/Black_Panther_Party 22
    Society/Ethnicity/African/African-American/History 21
    Society/Ethnicity/African/African-American/Directories 20

    The semantic web stuff is a joke, that’s what it is. It’s a rehash of supervised training in machine learning/classification, which is a glorified curve fitting scheme. There is no real “semantics” to speak of. It’s a useful technique with limited use, as all major search engines (G & Y at least) have already found out, for at least 5 years.

  7. James

    this technology doesn’t seem very useful lol

    the algorithm also seems very simple… I guess they go through wikipedia, scan the pages and gamble with the results

  8. James

    I guess you really do deserve 1million$ if you make this stuff useful XD

  9. Lemon

    Indeed, semantic is the next web. But this is reserved for US citizens only. So for me (like others), it is either become a US citizen or build this in Europe (or the US) before the end of the contest…

  10. Mills Davis

    SemanticHacker technology is new. But, it’s also pretty mature. Research funding began years ago with the intelligence community. Semantic signatures provide a way to make sense of different kinds of content (unstructured information). What’s new is making it totally available for web developers, in a challenge where they can get funded to develop something cool. This kind of technology helps applications create web 3.0 functionality in mash-ups and personalize and contextualize the user experience.

  11. Gyorgy Fekete

    Please point out that this challenge is for US residents only.

  12. sky

    input text:
    Call us crazy, but we think there are some brilliant minds out there that can find some really amazing uses for this incredibly powerful and scalable technology. Think you’re up to the Challenge? We think you are!

    signature:
    Shopping/Health/Alternative/Hypnotherapy/Audio_and_Video 43
    Business/Telecommunications/Services/Wireless/Software 33
    Arts/Music/Bands_and_Artists/311/Tablature 28
    Computers/Internet/Consultants/Research 26
    Shopping/Health/Alternative/Meditation/Audio_and_Video 25

    wiki articles:
    1.) Inhibitory gnosis 2.) Hypnotica (Von Sydow, Eric) 3.) Highway hypnosis 4.) Hypnotic (disambiguation) 5.) Covert hypnosis 6.) Post-hypnotic suggestion 7.) Fractional relaxation 8.) Subconscious 9.) Hypnofetishism 10.) Nicholas Spanos

    Usefulness:
    0.

  13. Maheswaran

    Semantics and AI are the next big things..

  14. IdeaTagger

    I would develop two (potentially related) apps:

    1. A “People in this post” photo widget for blogs or general websites. We see such photos in magazine articles all the time. It would automatically detect names in the post and suggest photos from a database that the author can accept/reject. The author will of course be able to add new names and photos to the database. Each photo in the widget will link to a page on a destination site with links to other blog posts about that person. Over time a map of relationships can be built for each name in the database, i.e. people they were mentioned in posts with and degrees of (post) separation between people. So if I was mentioned in a post with Mike and you and Mike were mentioned in another, then you and I are two degrees from each other. This could potentially have some interesting applications. By the way, if publishers are worried about page load times, the widget could come in a link-only version too where the reader has to click through to see photos, plus publishers could restrict the number of photos displayed with a ‘more’ button for those interested.

    2. An application for finding solutions to problems or people who can help solve them. Have publishers tag their posts with a problem tag, i.e. this post solves or discusses this specific problem. I can then go to a destination website, type in a problem (e.g. I have a brilliant idea - really I do - but not the people/money resources to take it forward) and the site spits out web pages tagged relevantly. If linked with the other application above, perhaps it could suggest names of people that could help. I know this is half-baked at best but there is just a massive opportunity in helping people solve problems and regular search doesn’t always cut it.

    By the way, I am never quite sure how much technology there is supposed to be in the whole semantic web thing. On the one hand it appears as though it is meant to magically create structure out of unstructured data. On the other, reading some other articles suggests that if it is to be any good, web publishers have to do a lot of the structuring themselves, i.e. tagging, categorisation etc.

  15. Chris Peterson

    AI has always been the next big thing…

  16. jt

    interesting article, thank you

  17. Yo

    Everyone knows the best companies are the ones that have no idea what to do with the stuff they build.

  18. draq

    What about a less technological, more incremental, grassroots and loosely coupled approach towards a semantic future?

    E.g. simply decorate (annotate) regular information fragments with meta-data to be stored in a globally accessible context (e.g. a distributed knowledge base).

    The application layer (browser, browser plugin) is responsible for aggregating the ‘small pieces loosely joined’ together, while the distributed database engine provides the query capabilities. (Micro) ontologies may describe the meta-data but this is rather an optional aspect and not a sine qua non prerequisite.

    I already incarnated the described principles within two collaborative services (web annotations — content centered meta-data and soft~link — information connectivity) at http://aspectweb.org.

  19. Jason

    Finally getting some respect out of Rochester, NY :D

  20. Diego Basch

    This is similar to a tool we released a while back:

    http://tagger.flaptor.com

    Only we use simple tags instead of structured categories, because we believe tags are simpler and nicer. Compare the results for the current top headline from Google News:

    http://news.bbc.co.uk/2/hi/americas/7305023.stm

    Flaptor autotagger:
    iraq bush president bush iran war on terror iraq war new hampshire primary war terrorism politics

    Semantic Hacker:
    Society/Issues/Warfare_and_Conflict/Specific_Conflicts/Iraq 51
    Society/Issues/Warfare_and_Conflict/Specific_Conflicts/Iraq 37
    …/Issues/Warfare_and_Conflict/Specific_Conflicts/War_on_Terrorism 35
    …/Issues/Warfare_and_Conflict/Specific_Conflicts/Middle_East 30
    …/Warfare_and_Conflict/Specific_Conflicts/Iraq/Anti-War_Movement 24

  21. Robert

    Erick,

    Stick to writing posts related to business and financial matters rather than technology.

    It’s obvious that you don’t really understand the subject matter when it’s technical in nature.

  22. Olivier Lalonde

    Check out my submission ! Targeted ad in online chatrooms and eventually, instant messengers !

    http://dev.syskall.com/chat/

    Waiting for some feedback !

  23. Rich Skrenta

    Those look like dmoz categories… Pulling out entity references and coming up with a signature vector against a taxo like dmoz isn’t bad. I could see useful apps being built on top of this.

  24. Pierre Innocent

    Hi,

    Their technology seems strangely similar to ours ????????
    See http://www.tsert.com/white-papers/nlp.pdf, and some
    preliminary results at http://www.tsert.com/content-analysis.htm

    Pierre Innocent
    President
    Tsert.com

  25. Rich Rosen

    Submit a biz plan and give up all rights. Not such a great deal:

    “Upon entry into the Challenge, Entrant shall grant TextWise an irrevocable, royalty-free, perpetual, non-exclusive worldwide right and license under its intellectual property (including without limitation patent) rights to use the entry for all uses throughout the world.”

  26. sam

    tried out with wikipedia plugin - nice but I was expecting/hoping to be blown away by relevant suggestions and I wasn’t

  27. Stoicho

    It’s US ONLY!!!
    Who does that these days? I can understand Hulu, but why is THIS US only?

  28. fire-pixel.com

    LOL did I just win the UK lottery again? Sounds like an Amway scam to get revenue in the door.

    Yes, I’m running a contest too… get us web development accounts and I’ll cut you commission. Fire-Pixel.com.

    Here’s a better article for you to read.

    Top 10 Awesome Websites That Sell Cool Products You Probably Have Never Visited But Need To.

    http://www.comember.net/blogs/firepixel/

  29. Peter T - Webshop

    It seems like folks need a primer on the topic of “Semantic web”; there is an adequate attempt to define it on wikipedia.org.

  30. Et

    @20
    I tried your auto-tagger on the Montreal Canadiens’ page on Wikipedia and I got this:
    rangers hockey 101 in 1001 college basketball new york yankees nhl 101 things canucks nfl fantasy football

    I’m not impressed…

  31. Maheswaran

    Hey this is unfair.. only for US residents :(

  32. reanimated

    As Rich hinted at, there is a disconnect between the article here and the actual terms of the contest.

    The article says, “The winners of the challenges would turn over rights to their idea.” Which sounds reasonable enough. But the actual terms require every entrant to turn over everything (source code, biz plan, ideas) in exchange for nothing plus a loss of control. Why even bother to award a winner when they can just take the best ideas and software and run with it themselves?

    They even slipped this in:
    “Entrants agree to any additional written authorization, assignment or other documentation requested by the Sponsor in connection with the transfer and assignment of rights to their entry,…” That’s entrants, not winners.

    Maybe this is only an issue if you were planning to submit actual working software that showed greater value than just calling their APIs. It would appear to be very foolish to participate and work for three months under these terms.

  33. Too Bad

    I have an excellent idea that could easily use this technology to create $100M/year in sales.

    But I’m not from the US so I’ll have to do it myself.

  34. Ed Kasper

    I would be creative, very creative! I would get different people like kids and old people and put their brains to use, they are very creative! then! I would also get special needs people and have them help with creativity as well. creative, creative, creative