A new service called Blotter from startup Dapper (dappit.com) is getting some good coverage around the blogosphere today. Blotter graphs Technorati data for any blog over time. Most exciting to me though is Dapper’s basic service, just launched this week. The company says it’s effectively offering an easy way to create an API from any website. This might look like crass screen scraping on the surface, but the company aims to offer some legitimate, valuable services and set up a means to respect copyright. The site is clearly useful now.
Dapper provides a point-and-click GUI to extract data from any web site; that data can then be worked with and displayed via XML, HTML, RSS, email alerts, Google Maps, Google Gadgets, a JavaScript image loop or JSON. The site could use a UI overhaul to make it easier for nontechnical users, and copyright issues will have to be dealt with. That said, Dapper is pretty awesome.
Dapper is led by CEO Eran Shir and Jon Aizen, a Cornell CS graduate who’s worked on the Alexa Archive and the Internet Archive. Aizen says the company aims ultimately to offer a marketplace for content reuse through Dapper, allowing publishers to set the terms and prices for any creative reuse of their published content. This is the kind of thing that takes serious negotiation to do today, but Dapper has the potential to make such deals far easier for far more people. For developers, Dapper will just save time, Aizen says.
Here’s how it works. Users identify a web site they are interested in extracting data from and view it through the Dapper virtual browser. Aizen showed me how to do it using Digg as an example. I clicked on a story headline, on the number of diggs and on the via URL field. I went to another page on the same site and did the same thing so that Dapper could clearly identify the fields I was interested in. I then went through the various tools available on the site to set certain conditions and thresholds and ended up with XML feeds I could do all kinds of things with, like sending me an email whenever there’s a TechCrunch story on the front page of Digg, or when a search results page shows a TechCrunch story with more than 10 diggs. After I create an end product through the site, other users will be able (after a 24 hour period in which I can edit the project) to use my project as is, altered to fit their needs or, in the future, in combination with other projects.
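To make the alert idea concrete, here is a minimal Python sketch of the last step: taking an XML feed of extracted stories and picking out the ones worth an email. The element names (`story`, `headline`, `diggs`, `via`), the sample data and the `stories_to_alert` helper are all assumptions made up for illustration; this is not Dapper's actual output format or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, loosely modeled on the three fields picked in the
# walkthrough (headline, digg count, via URL). Not Dapper's real schema.
SAMPLE_XML = """
<stories>
  <story>
    <headline>A popular TechCrunch story</headline>
    <diggs>42</diggs>
    <via>techcrunch.com</via>
  </story>
  <story>
    <headline>A popular story from elsewhere</headline>
    <diggs>250</diggs>
    <via>example.com</via>
  </story>
  <story>
    <headline>A quieter TechCrunch story</headline>
    <diggs>7</diggs>
    <via>techcrunch.com</via>
  </story>
</stories>
"""


def stories_to_alert(xml_text, source="techcrunch.com", min_diggs=10):
    """Return headlines from `source` that have more than `min_diggs` diggs."""
    root = ET.fromstring(xml_text.strip())
    alerts = []
    for story in root.findall("story"):
        via = (story.findtext("via") or "").strip().lower()
        diggs = int((story.findtext("diggs") or "0").strip())
        # Both conditions must hold: right source, and above the threshold.
        if via == source and diggs > min_diggs:
            alerts.append((story.findtext("headline") or "").strip())
    return alerts


if __name__ == "__main__":
    for headline in stories_to_alert(SAMPLE_XML):
        # A real alert would send an email here; printing keeps the sketch simple.
        print("ALERT:", headline)
```

The source and threshold are parameters, so the same filter could cover both cases described above: any front-page TechCrunch story (`min_diggs=0`) or only those with more than 10 diggs.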
The alerts are of most interest to me, but data from other sites can be mapped on Google Maps, turned into an RSS feed for sites that don’t publish feeds, or turned into a slideshow if the data is in the form of images. Aizen says he’s created a tool for himself that runs feeds through Babblefish automatically and produces a translated feed. The possibilities are huge.
Privacy and licensing the technology so it runs on your own servers are both things the company is looking at for the future. Both are pretty key.
Though the company says the site is largely a proof of concept, it’s pretty usable already, and they are also seeking seed funding. Dapper says it’s aiming high: what Geocities did for static web pages, they want to do for dynamic content reuse. If they can find a good way to manage the rights pitfalls around reused content, and I’d like to believe it’s possible, then we may start seeing a lot of dazzling new ways to interact with data built via Dapper and popping up around the web.

Brilliant concept, although I’m tired of seeing heavy use of “web 2.0 interfaces” or whatever. Can’t the new ventures combine old-school professional look with AJAX and other new tricks? I’m sick of big fonts, sick of pastel colors, but I do like the interaction AJAX brings to the browser.
Great product. The era of being able to consume and mix and match data on the web at will is looking very exciting. Bring it on!
Fascinating, and I definitely see value here, but I wonder if the real value lies in being able to extract data from sites that are currently data silos, like myspace (good to see facebook opening up, hopefully myspace follows). If history is any guide (singlestat.us), I doubt that myspace or other data silos will embrace a service like this. I’m not saying it wouldn’t be prudent to do so, but for all their success, myspace (and other silos) doesn’t seem to get it.
After all the hating in the comments from yesterday’s webmail.us post I’m glad to see that techcrunch is indeed still the #1 source for web 2.0 news. Now THIS is what I read TC for…innovative and useful web 2.0 apps.
I think dapper is great! This will allow people to do a lot of things that would have been very difficult before. My question is, who exactly is this product targeted to? I know that “anyone” can use it, but I would guess that only people who are motivated enough to build their own alerts/maps/mashups would get a lot of use out of it. It seems like it was designed for someone who owns a website to use dapper to organize the site’s information for external use and to use it to create mashups. I’m very curious to see what kind of traction dapper actually gains…it’s a great product with a small target market.
(Which brings up a side thought… does it seem like web 2.0 products are made to be B2C? I’m having a hard time thinking of any very successful web 2.0 B2B products. And by B2B, I’m including things like dapper or clickdensity which are more for small site owners. When I think of web 2.0 successes, I think of delicious, flickr, youtube, wikipedia, digg…all B2C. I know a big part of web 2.0 is open data and user contributed content. So does this mean that “web 2.0” isn’t applicable to the B2B world?)
I’m seriously wondering if this is legal or respectful of website owners’ rights. What are they doing differently that screen scrapers don’t do? If a site does not want its content used, shouldn’t that be respected? Granted, a lot of site owners haven’t thought about this yet, because screen scraping is relatively uncommon and frowned upon, but this definitely will be a hot topic in months to come as this kind of website scraping becomes more common.
Since not everyone who should have an API can afford the time to develop and maintain an API the same way they can maintain their main web site, you need something like Dapper to get into the “tail of APIs” with tools like these. With some luck, we can see 1000s more “unintended” entries in Programmableweb.
This is a terrible idea. It’s just the kind of thing you’d expect to be invented by an engineer with an itch accompanied by no product experience and no hands-on experience running a web services platform.
A web service is not just about xml over http. There’s a lot more to it than that.
Good points Jacob.
Dapper has every intention to be respectful of people’s content. In the future we will provide a simple mechanism for content owners to “just say no”, and we’ll promptly oblige. But we intend to do more than that. We intend to empower the content providers in brand new ways, and give them new abilities to leverage their content, for everyone’s benefit. I definitely agree that this will be a hot topic, and we will make an effort to educate content providers about newfound opportunities, but at the end of the day, we intend to be a law abiding service and will accommodate the owners’ requests.
Cheers,
Eran
I see some real problems with this. This is just like Google saying “we’ll give you the option to opt out” with their massive book scanning concept.
AOL engineers had good intentions when they released all that data, but the fact that so much of it CAN be used for nefarious purposes makes it a danger. To be fair, maybe I’m missing something, but this sounds like a way to bring scraping to the masses, which in my view is not a positive thing.
I’m just as idealistic as the next guy who supports the open source movement and data sharing, but I think this type of service needs to be initiated by the website owner, not the user. There are just too many copyright/misuse issues here.
@Joel: There were a few. I’m thinking of echosign or freshchalk in particular. And with your definition of B2B, why not throw in all of 37 Signals?
I think traditional wisdom would say there’s a very small market of willingly disruptive businesses.
@Jeffrey: Yeh, this is really a terrible idea. Trolls like you should have their IPs banned.
To concur with every single comment so far, and with Mike’s very own coverage: this is pretty damn great, and I may be making some use of it.
Paul, I think Jeffrey’s point was more than valid enough to take into consideration! Glad you like the service though, so do I.
This looks cool and likely useful. For example, right now I use a third-party scraper running on my myspace blog to turn it into an rss feed (myspace’s rss feeds only include a preview of the blog, not the full text), but it gets the feed backwards and the dates aren’t quite right; my feed reader sees the same old post as being new every time it hits it. So I pipe that feed into feedburner, which cleans it up nicely. Then a cronjob on my server calls my feed wordpress plugin to look at the feedburner feed, and in the end, everything from my myspace blog automatically ends up on my real blog within an hour or so, attributed to me and in the category myspace.
It’s not broke and I’m not fixing it, but if I’d had/known of a tool to do this with less futzing around in the first place, I probably would have had it done more simply and easily.
also, what did geocities do for static content? make it free/ugly?
(I know, coming from a myspace user that’s a pot calling a kettle free/ugly, but in my defense I do try to keep my profile legible.)
To be fair I should sign up and create some dapplets (is that what they’re called?). Then I can pass criticism on the utility and fairness of this thing. I’m sure for an end user it’s great: you just cruise by a website, see some stuff you could use on your website, make a scraper, and bam, you’re in business. But Eran’s answer certainly shows a high degree of respect for what the web is today and for ownership rules. Bravo! I hope it will all work out because gosh yes we need something like this, if it can be done while keeping everyone happy.
Eran I wonder if you could ping me at jyl at spammers die a horrible death mod3 really you deserve death dot com. I’d like to ask you some questions about usage etc.
Is Babblefish a Google product? I don’t think it is.
Thanks to everyone for your feedback and comments. It’s extremely valuable to us and we appreciate it.
Dapper has a long way to go, we’ll admit, but we think (and many of you who have tested it seem to agree) that we’re off to a good start. In particular, we have to address a variety of issues, including:
1. Content ownership. This, as has been said, will be a hot topic. As Eran indicated, we have plans to empower content providers to use Dapper as a means of distributing their content in new and easier ways. And clearly in the coming weeks and months, we’ll provide a simple mechanism for content providers to easily stop users from accessing their content via Dapper, if that’s what they want (though, I contend that it will be everyone’s loss – the content provider and the content user).
2. Interface improvements. Though we knew it needed work before, the traffic from the last couple days has really illustrated where our interface is lacking and requires improvement. We plan to make dramatic improvements to enhance usability in the near future. Your feedback in this area would be greatly appreciated. Much of the problem has to do not only with the lack of an intuitive GUI, but with the website and process not properly informing users of what a Dapp is and how it is used. In the meantime, we’re just about to push out a change: after you create a Dapp you’ll get immediate feedback about the quality of the results, allowing you to decide how to proceed.
We really encourage your interaction and feedback. We believe that the most valuable feedback will come directly from our users. I promise to respond as quickly as is possible. Please use the contact form on our website.
Thanks to everyone for testing Dapper over the last couple days.
Jon
Might just be me but I think your commenting form where the main text goes doesn’t word wrap properly on mobile devices. Sorry for the offtopic post but I’m on the train and really wanted to offer my opinion on Dapper. In short I think that this will be a highly contested application between unique content providers and those running sites dependent on others’ content.
Some might call Dapper a “business development 2.0” enabler. Yesterday Fred Wilson wrote that web 2.0 offers a new way to integrate with other web services without wasting time trying to negotiate deals. Some examples he gives are Indeed and Kayak.
http://avc.blog...ess_develo.html
Flickr’s Caterina Fake says it’s “much, much better this way”. And I’ll bet that’s exactly what the Dapper folks are thinking.
Oh the idea is great! But I just can’t imagine the Internet being clogged with RSS spam sites. I agree that copyright issues must be properly dealt with here.
Not bad.
Here’s an easy way to let content editors or webmasters control whether their content is parsed or not. Make it so that when a user wishes to pull off some content, dapper looks for a file called “Dapper.xml”. If Dapper.xml exists it allows the content to be copied; if “Dapper.xml” does not exist the user can’t parse the content.
Inside the Dapper.xml can be rules to what can and can’t be copied, similar to a robots.txt/htaccess file. So:
Allow All Users
Restrict /admin/
Allow /articles/
Restrict /ads/
A merger between htaccess and robots.txt, but notice that only if Dapper.xml is present can it be copied – doing it the other way is lame, and selfish. I mean the site is stealing bandwidth on each parse, and if needed I mean webmasters will get together and begin parsing dapper with tons of traffic so that its bandwidth is eaten in a similar way.
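A rough Python sketch of how such an opt-in check might behave, using the four example rules above. Everything here is an assumption for illustration: the plain-line rule syntax, treating “Allow All Users” as a site-wide default, the longest-prefix-wins tie-break, and the `parse_rules` and `may_copy` names. Nothing in it reflects a format Dapper actually supports.

```python
# Hypothetical policy text, copied from the example rules in the comment.
POLICY = """Allow All Users
Restrict /admin/
Allow /articles/
Restrict /ads/"""


def parse_rules(text):
    """Parse lines like 'Allow /articles/' into (path_prefix, allowed) pairs."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        verb, target = parts[0].lower(), parts[1]
        if verb not in ("allow", "restrict"):
            continue
        # Assumption: 'Allow All Users' sets a site-wide default (prefix "/").
        prefix = "/" if target.lower() == "all" else target
        rules.append((prefix, verb == "allow"))
    return rules


def may_copy(path, policy_text):
    """Opt-in check: no policy file means no copying at all.

    Otherwise the rule with the longest matching path prefix decides.
    """
    if policy_text is None:  # the site has no policy file
        return False
    best = None
    for prefix, allowed in parse_rules(policy_text):
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, allowed)
    return best[1] if best else False


if __name__ == "__main__":
    for path in ("/articles/1", "/admin/login", "/ads/banner", "/about"):
        print(path, may_copy(path, POLICY))
```

Passing `None` as the policy stands in for a site with no file at all, which is the case where nothing may be copied.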
Ubuntu is almost a very good operating system to run your applications on. Not to totally dismiss Debian and other linux distributions, but Ubuntu has made it easier for developers, end-users and enterprises as well.
This seems to be somewhat like MIT’s Piggy-Bank project (which is part of the larger SIMILE project).
http://simile.m...edu/piggy-bank/
Err, well to be more specific, this seems like a web-based version of Piggy Bank’s “screen scraper” functionality.
Dapper has great potential to change content consumption on the web. I hope they will be able to face and handle the huge obstacles that will arrive in the future.
Dapper is an Israeli company and you may find more Web 2.0 companies on http://www.thecoils.com (in Hebrew)
jonny
Isn’t this illegal under copyright laws? If you are mashing up information from a website that is not your own without permission of the owner, isn’t it causing material damage? You are causing them loss of advertising revenue, as people will not be going to their site, as well as the value of click-throughs. With the sale of Youtube we see how valuable that can be.
Anybody know the answers to this?
All you need is to have online database, and interface to access to it/add data.
http://www.mytaskhelper.com – it’s a free online database, using which you can create and share online databases.
Examples of forms is on the main page.
Thanks very much.