
Thomson Reuters’ OpenCalais, a service and open API that lets users incorporate semantic tagging in blogs, content management systems, or website content, has been upgraded to include social tagging, support for Spanish content, and improved linked data depth for companies. OpenCalais’ technology is powered by text analytics company ClearForest, which was acquired by Reuters in 2007. OpenCalais, which is free, uses natural language processing, machine learning and other methods to analyze a document and find the entities within it. CNET and Huffington Post are among the blogs and sites that use OpenCalais.
OpenCalais 4.1 (released today) and 4.2 (to be released in a few weeks) will include a new social component that emulates how a human might tag a document. While OpenCalais is a semantic data extraction engine, it doesn’t necessarily extract the kind of tags a human would put on an article. For example, in an article about luxury cars, OpenCalais would be able to pick out BMW and Porsche as tags but wouldn’t necessarily pick out descriptions like “sports cars” or “automobiles.” OpenCalais’s technology will now generate this sort of tag, called “Social Tags,” by analyzing content and mapping it to a knowledge base built on Wikipedia and other sites.
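To make the distinction concrete, here is a toy sketch of the difference between picking out entities and deriving broader, human-style tags from them. It is only an illustration: the `KNOWLEDGE_BASE` dictionary, the function names and the sample sentence are all made up for this example, and nothing here reflects the actual OpenCalais API or how its Social Tags are really computed.

```python
# Toy illustration only: the mapping and functions below are hypothetical
# and are NOT OpenCalais data, its API, or its real algorithm.
from collections import Counter

# Hypothetical mini knowledge base: entity -> broader topics a human might tag.
KNOWLEDGE_BASE = {
    "BMW": ["automobiles", "luxury cars"],
    "Porsche": ["automobiles", "sports cars", "luxury cars"],
}


def extract_entities(text):
    """Return the known entity names that literally appear in the text."""
    return [name for name in KNOWLEDGE_BASE if name in text]


def social_tags(text, top_n=3):
    """Rank the broader topics linked to the entities found in the text."""
    counts = Counter()
    for entity in extract_entities(text):
        counts.update(KNOWLEDGE_BASE[entity])
    return [topic for topic, _ in counts.most_common(top_n)]


article = "BMW and Porsche both reported strong demand for their high-end models."
print(extract_entities(article))  # entity-style tags
print(social_tags(article))       # broader, human-style tags
```

The entity step only finds names that appear in the text, while the second step looks those names up in an external knowledge base to surface topics the article never states outright.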
The new version will also extract tags from content written in Spanish. OpenCalais previously supported English and French. One of the neat things about OpenCalais is that it lets publishers combine their content with Linked Data assets from Wikipedia, IMDB and other databases. The new version has also upgraded the Linked Data features for company data, linking to new enterprise information sites like our own CrunchBase. And OpenCalais’s semantic entity database has been spruced up to include recession-relevant terms such as accounting changes, labor issues, layoffs, earnings restatements, delayed filings and more.
Another player in this space is Zemanta, which we’ve reviewed here.

I’m seeing this HTML snippet at the end of the post (Firefox):
div class=”cbw snap_nopreview”>
This is great news for site admins.
As these easy-to-use semantic tools gain ubiquity, content-driven sites such as this and those mentioned above will help users make richer and more relevant connections between topics. Pulling in external sources only amplifies this value.
Phase2Technology (http://phase2technology.com) has been doing some pretty cool things with Calais and Drupal lately. The company I work for has partnered with them on a few projects, so I’m interested to see this being integrated on sites moving forward.
great news:)
Yes =)
We are using OpenCalais services and API in our matching algo.
Go to http://twitter.com/twitbubble and sign up to see what this is all about.
What a terrible name for a website! Did they not notice the striking similarity between “Calais” and the ED medication which is one of the most spammed words in the English language?
It would take me all of about 0.5 seconds to mark a subject line “give Calais a try!” as spam. Maybe their next product will be called “OpenVigara”…
I tried to sign up with my Gmail address, and it says “The e-mail address ….. is not valid.” It seems it does not allow free email addresses for registration?
I work on this stuff in grad school… too bad the really interesting things happen secretly holed away inside industry.
@aditi
Perhaps you want to join us at Apache Mahout and work on this type of stuff in the open? Check out http://lucene.a...che.org/mahout/
Great News …!! More Social And Recession Friendly .!
please remove div class=”cbw snap_nopreview”> at end
Phase2’s Drupal CMS for publishing mentioned above, which uses OpenCalais, can be found here: http://www.open...com/openpublish
Check out AlchemyAPI for semantic tagging in 8 languages (English, German, French, Italian, Spanish, Swedish, Portuguese). Also outputs RDF, semantic DB links, text categorization, etc.
The AlchemyAPI looks interesting and does a pretty good job of identifying names (seems a little better at that than OpenCalais or ClearForest from my testing). The problem with all these tools is that the NLP is still very primitive. They can identify names and make a guess at some verbal relationships and anaphoric references but by and large they still get it mostly wrong. And that’s not taking into account sense disambiguation, etc. So there’s still an awful lot of work to do in getting unstructured web content organized. The “semantic web” isn’t very semantic just yet.
One more FOS opinion from a jaded wanna-be academic. Go try the tools out. Stop whining that Nirvana isn’t here. It doesn’t come all at once.
yes bro.
OpenCalais is great. I have been using it to find new daily topics from the Twitter stream in real time. The results are freely available to the public at http://web2express.org. It provides a more complete set of hot topics than Twitter’s trending topics do. It helps me quickly spot emerging events as they emerge on Twitter.
aj chen
This is great news for site admins.
+1 This is great news for site admins. =)