Create an API for any site with Dapper

A new service called Blotter from startup Dapper (dappit.com) is getting some good coverage around the blogosphere today. Blotter graphs Technorati data for any blog over time. Most exciting to me, though, is Dapper's basic service, which launched just this week. The company says it's effectively offering an easy way to create an API from any website. This might look like crass screen scraping on the surface, but the company aims to offer some legitimate, valuable services and to set up a means of respecting copyright. The site is clearly useful now.

Dapper provides a point-and-click GUI to extract data from any website; the extracted data can then be worked with and displayed via XML, HTML, RSS, email alerts, Google Maps, Google Gadgets, a JavaScript image loop, or JSON. The site could use a UI overhaul to make it easier for non-technical users, and copyright issues will have to be dealt with. That said, Dapper is pretty awesome.
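For developers, the payoff is that whatever you extract becomes an ordinary feed you can consume from code. Here's a minimal sketch of what that might look like, assuming a Dapper-style endpoint that returns JSON; the URL pattern and field names are placeholders for illustration, not Dapper's documented API:

```python
# Minimal sketch: consuming a Dapper-style feed as JSON.
# FEED_URL and the "items" key are hypothetical placeholders,
# not Dapper's documented endpoint or schema.
import json
import urllib.request

FEED_URL = "http://www.dappit.com/RunDapp?dappName=example&format=json"  # hypothetical

with urllib.request.urlopen(FEED_URL) as resp:
    data = json.load(resp)

# Print whatever fields the point-and-click extraction defined.
for item in data.get("items", []):
    print(item)
```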

Dapper is led by Jon Aizen, a Cornell CS graduate who has worked on the Alexa Archive and the Internet Archive, and CEO Eran Shir. Aizen says the company ultimately aims to offer a marketplace for content reuse through Dapper, allowing publishers to set the terms and prices for any creative reuse of their published content. That's the kind of thing that takes serious negotiation to pull off today, but Dapper has the potential to make such deals far easier for far more people. For developers, Dapper will simply save time, Aizen says.

Here’s how it works. Users identify a website they want to extract data from and view it through the Dapper virtual browser. Aizen showed me how to do it using Digg as an example. I clicked on a story headline, on the number of diggs, and on the via URL field. I went to another page on the same site and did the same thing so that Dapper could clearly identify the fields I was interested in. I then went through the various tools available on the site to set certain conditions and thresholds, and ended up with XML feeds I could do all kinds of things with: send me an email whenever there’s a TechCrunch story on the front page of Digg, for example, or whenever a search results page shows a TechCrunch story with more than 10 diggs. After I create an end product through the site, other users will be able (after a 24-hour period in which I can edit the project) to use my project either as is, altered to fit their needs, or, in the future, in combination with other projects.
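To make that walkthrough concrete, here's a hedged sketch of the alert condition described above, applied to the XML feed a project like this produces. The feed URL and the element names (headline, diggs, via) are assumptions based on the fields I selected, not Dapper's actual schema:

```python
# Sketch of the alert condition from the Digg walkthrough: a TechCrunch
# story with more than 10 diggs. The feed URL and element names are
# hypothetical, based on the fields selected above.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://www.dappit.com/RunDapp?dappName=DiggStories&format=xml"  # hypothetical

tree = ET.parse(urllib.request.urlopen(FEED_URL))
for item in tree.iter("item"):
    headline = item.findtext("headline", default="")
    diggs = int(item.findtext("diggs", default="0") or 0)
    via = item.findtext("via", default="")
    if "techcrunch.com" in via and diggs > 10:
        # In a real alert you'd send email here (e.g. via smtplib);
        # printing keeps the sketch self-contained.
        print(f"ALERT: {headline} ({diggs} diggs) via {via}")
```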

The alerts are of most interest to me, but data from other sites can be mapped on Google Maps, turned into an RSS feed for sites that don’t publish feeds, or turned into a slideshow if the data is in the form of images. Aizen says he has built a tool for himself that automatically runs feeds through Babel Fish and produces a translated feed. The possibilities are huge.
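The translated-feed trick is easy to picture in code. Below is a minimal sketch of the idea, assuming some translation backend; translate() here is just a hypothetical stand-in for whatever service you'd wire in (Babel Fish, in Aizen's version):

```python
# Sketch of the translated-feed idea: run each entry of an RSS feed
# through a translation step and emit a new feed. translate() is a
# hypothetical stand-in for a real translation backend.
import xml.etree.ElementTree as ET

def translate(text: str) -> str:
    # Placeholder: call your translation service of choice here.
    return text

def translate_feed(feed_xml: str) -> str:
    root = ET.fromstring(feed_xml)
    # Translate the human-readable parts of each entry in place.
    for tag in ("title", "description"):
        for el in root.iter(tag):
            el.text = translate(el.text or "")
    return ET.tostring(root, encoding="unicode")
```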

Privacy controls and licensing the technology to run on your own servers are both things the company is looking at for the future. Both are pretty key.

Though the company says the site is largely a proof of concept, they are also seeking seed funding, and the service is pretty usable already. Dapper says it’s aiming high: what GeoCities did for static web pages, they want to do for dynamic content reuse. If they can find a good way to manage the rights pitfalls around reused content, and I’d like to believe it’s possible, then we may start seeing a lot of dazzling new ways to interact with data, built via Dapper, popping up around the web.