Attributor
by Erick Schonfeld on April 21, 2009

As newspapers and other publishers watch their revenues diminish, one common refrain among them is that maybe they should somehow go after Google or Yahoo for aiding and abetting the destruction of their businesses and sometimes the wholesale theft of their content. We’ve seen how the Associated Press wants to handle this: by aggressively going after anyone who even borrows a headline. Today, a consortium of other publishers including Reuters, the Magazine Publishers of America, and Politico are taking a more measured approach, but one which will no doubt still be controversial. They are forming the Fair Syndication Consortium, which is the brainchild of Attributor, the startup which tracks the reuse of text and images across the Web for many of these same publishers.

The Fair Syndication Consortium is initially trying to address a legitimate problem on the Web: the proliferation of splogs (spam blogs) and other sites which do nothing more than republish the entire feed of news sites and blogs, often without attribution or links. There are tens of thousands of these sites, perhaps more. Rather than go after these sites one at a time, the Fair Syndication Consortium wants to negotiate directly with the ad networks which serve ads on these sites: DoubleClick, Google’s AdSense, and Yahoo primarily. For any post or page which takes a full copy of a publisher’s work, the Fair Syndication Consortium thinks the ad networks should pay a portion of the ad revenues being generated by those sites.

I know a little bit about this because in January I was invited to a meeting at the A.P.’s headquarters with about two dozen other publishers, most of them from the print world, to discuss the formation of the consortium. TechCrunch has not joined at this time. Ironically, neither has the A.P., which has apparently decided to go its own way and fight the encroachments of the Web more aggressively (although, to my knowledge, it still uses Attributor’s technology).

by Erick Schonfeld on April 6, 2009

With its news syndication business under direct attack by the growing abundance of other news sources on the Internet, the Associated Press announced today that it will begin to police the Web and “develop a system to track content distributed online to determine if it is being legally used.” The A.P., it appears, wants to become the RIAA of the flailing newspaper industry—ferreting out information pirates and threatening lawsuits if they don’t turn over some of their Google gold.

The A.P. has a broad view of what constitutes its content. It is not just entire articles copied wholesale by spam blogs. The A.P. has problems with the unauthorized use of its headlines, even when they include links. Many of its policies ignore the concept of fair use. And even when it has cause to go after copyright violators, it sometimes relies on antiquated and tortuous legal theories. The A.P. is so backwards in its thinking that we’ve banned links to all of its stories on TechCrunch.

Now it wants to go after unauthorized use of its news articles across the Web. Forget for a moment that its notion of what constitutes unauthorized use may not hold up in a court of law. The A.P. is going directly after the search engines and news aggregators which often point traffic away from A.P. sources directly at the supposed infringers.

So how exactly does the A.P. plan on policing the Internet? Here I must rely on informed speculation, but I think I have a pretty good idea.

by Erick Schonfeld on August 26, 2008

Who is doing a better job of getting his message across on the Web: John McCain or Barack Obama? Conventional wisdom says that it is Obama, whose performance on the Web has been strong since the beginning. And conventional wisdom is still correct when it comes to blogs and social networks. But a new study by Attributor that is being released today shows that McCain is actually leading on mainstream news sites and catching up on blogs, especially as he and his supporters have been increasing their attacks on Obama.

GumGum Launches New Image Licensing Platform
by Michael Arrington on February 13, 2008

GumGum launches an ambitious new project today – a new platform and business model for licensing content on the Internet, beginning with images.

Image piracy runs rampant on the Internet, of course. Blogger Perez Hilton was sued for stealing images of celebrities, and we’ve had (ridiculous) charges leveled at us as well. And don’t forget the recent Lane Hartwell debacle.

Attributor, a Silicon Valley startup, helps content owners track their intellectual property to find examples of infringement. But until now, no one has really thought about a better way to license content on the Internet, so that both large and tiny publishers have an incentive to avoid simply stealing stuff.

That’s where GumGum comes in. Images today are generally licensed for a flat fee, exclusively or non-exclusively. GumGum founders Ophir Tanz and Ari Mir think a better way is to charge for impressions, or on an advertising-supported basis. But tracking image impressions isn’t trivial, so they first had to build a platform to do that.

GumGum allows any publisher to search for images (there are thousands available now via a number of photography agencies) – here’s an example search for “Britney.” Images can be licensed on a CPM basis (generally $0.20 or so, but determined by content owner), or for free with an advertisement.

GumGum requires images be published via a Flash object so that impressions can be tracked and billed properly. Flash also allows them to serve interactive advertisements, served via VideoEgg (we wrote about their Flash ad product here).

Here are two images, one based on CPM licensing, one based on advertising:

Any photographer can now upload images and sell them. And any publisher can create an account to license images. Down the road, GumGum says, they’ll be adding video, audio and text content for licensing as well.

Will This Work?

It’s certainly a pain for publishers to have to embed a Flash object to publish an image, but it is the only reasonable way that GumGum can track impressions and serve ads. Many small publishers will of course simply continue to steal images, or look for freely usable stuff on Flickr. But if there is a killer image that a lot of people will want to publish, GumGum is a great way to easily license it to an unlimited number of people. At the very least, it’s an interesting experiment.

GumGum raised $125k in a December seed round from friends and family. The founders, who sold a previous startup, Mojungle, to Shozu in 2007, also put $125k of their own capital into GumGum.

CondeNet Goes Beyond Being A Copyright Cop; Approaches Infringement As A Business Opportunity
by Erick Schonfeld on February 4, 2008

Digital media fingerprinting technologies are quickly becoming part of every media company’s arsenal when it comes to combating copyright infringement on the Web. So far, most media companies have used the technology primarily as an enforcement tool, in conjunction with their subpoena machines. But CondéNet, the online arm of Condé Nast magazines, is looking for ways to use digital fingerprinting technology beyond merely arming their copyright lawyers.

CondéNet is the latest media company to sign on as customer of Attributor (a startup I profiled earlier here) to see who across the Web is taking large chunks of text from CondéNet properties such as Epicurious.com, Style.com, Men.Style.com, and Concierge.com, without attribution or even so much as a link. Attributor is a reporting and tracking service that indexes a site’s content and finds copies of it on the Web.

CondéNet president Sarah Chubb signed on, she says, to get a better handle on how CondéNet content is being repurposed on the Web. In most cases, all she wants is a link back to the original site, and she is even considering using the tool to find new syndication and advertising opportunities. In an e-mail, she explains her motives:

– We would like to see what our unknown distribution is.

– We’re not intending any sort of legal action unless someone is using our content in a way that could be damaging to us

– We do intend to contact the sites that are using our content to ask them to do one of a few things. On the most basic level we want attribution and a link. There might be other opportunities, with larger-traffic sites, to do some sort of ad deal with a rev share. If we find very high quality sites with a particular affinity and audience that lines up with our own verticals we might discuss a closer ad deal, as we have with the blog Sartorialist and men.style.com.

So it is fairly open ended but starts with us understanding what is out there.

That is certainly a much more enlightened view than we’ve seen so far from most big media companies, who confine access to their digital fingerprinting technology to their lawyers. Give these tools to business folks instead, and they will find new ways to make money in the future instead of trying to protect revenues from the past. CondéNet has much more to gain from generating tens of thousands of inbound links to its sites than sending out tens of thousands of cease-and-desist letters from its team of expensive lawyers. As I’ve said before, the link is the currency of the Web, and media companies who understand that will do better than those who don’t.

Attack of the Splogs—One Of Our Posts Copied 152 Times Without Attribution
by Erick Schonfeld on November 9, 2007

Here at TechCrunch, there is nothing we love more than when one of our posts gets linked to and talked about. And like the majority of other blogs out there, we try to be good citizens by linking back to any source from which we excerpt. But there is a growing minority of spam blogs, or splogs, that indiscriminately take entire posts from other blogs and present them as their own.

For example, here is a screen shot from one random splog that just reposts TechCrunch’s entire feed with no links back to TechCrunch or even acknowledgement of the source:

tc-splog-2.png

Just for the record, taking any blog’s entire feed and republishing it as your own content is not okay. Notice that the only difference between this splog and TechCrunch is all the Google ads splattered everywhere.

We are not alone in this. Any blog that produces fresh content on a daily basis is an easy target. Google makes it economical to create such splogs through AdSense and then rewards them with traffic through its search engine. Google (and the other search engines) need to stop rewarding such behavior.

We knew the splog problem was bad, but we didn’t know how bad until earlier this week, when I did a post about Attributor (a new startup that can track who is copying your stuff all across the Web). I noted that Attributor found one TechCrunch post that had been copied in one way or another 572 times (not all of them bad).

Attributor catches all matches of blocks of text, so I asked them to break that number down. First, they threw out anything that was less than a five percent match, which left us with 467 matches. Of those, 315, or two thirds, linked back to the original post. So that is the good news. It appears that most bloggers are good citizens. But 152 of them, or fully one third, did not link back. And of those, 115 (about 25 percent of the 467 matches) were plastered with ads, making money off our work without so much as a link.
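
As a quick check of that arithmetic, the short sketch below recomputes the shares from the numbers quoted in this post. The variable and function names are illustrative only. Note that the "25 percent" figure lines up with 115 out of the 467 filtered matches; measured against all 572 copies it would be closer to 20 percent.

```python
# Figures quoted in the post; the names are illustrative, not Attributor's.
total_copies = 572       # every copy Attributor reported for the Hulu post
matches = 467            # what was left after dropping matches under five percent
linked = 315             # matches that linked back to the original post
unlinked = matches - linked
unlinked_with_ads = 115  # unlinked copies that also carried ads


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(unlinked)                                # 152
print(share(linked, matches))                  # 67.5 (about two thirds)
print(share(unlinked, matches))                # 32.5 (about one third)
print(share(unlinked_with_ads, matches))       # 24.6 (the "25 percent")
print(share(unlinked_with_ads, total_copies))  # 20.1
```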

Here is a screen shot of the original post, which covered the beta launch of Hulu:

hulu-tc.png

Now here is a screen shot of one of the splogs (notice the similarity?):

hulu-splog2.png

And another one (complete with a Jessica Alba cheese ad—although it arguably does give the headline an unintentionally different nuance):

hulu-splog-1a.png

You get the idea. Admittedly, this is completely anecdotal. It is only one post. But it does point to a larger problem. Other bloggers out there, have you been splogged today? Probably.

Attributor Launches Service to Track Copyright Infringement Across the Web
by Erick Schonfeld on November 4, 2007

Every media company on the planet knows that its articles, songs, photos, and videos are being copied and spread willy-nilly across the Web, but they don’t have a clue what to do about it. They are not even sure what to do about all of their stuff that is just on YouTube (should they let Google monitor itself or create some vague industry guidelines and hope that every site follows them?). A startup called Attributor in Redwood City, Calif. says it can monitor the Web for copied content no matter where it may be, help publishers and media companies track it all, and help them decide what to do about it.

Attributor was founded in 2005 and has raised $10 million from Sigma Partners, Selby Ventures, Draper Richards, First Round Capital and Amicus. The enterprise version of its service launches today, although it has been testing it with Reuters and AP for about six months. The enterprise service will cost anywhere from tens of thousands to hundreds of thousands of dollars per year (a more limited self-serve version for bloggers and smaller publishers could cost as little as $6 or $7 per month, and will launch in 2008). CEO Jim Brock gave me a demo of Attributor last week in the lobby of the Waldorf Astoria.

Attributor is already indexing 100 million Web pages a day (15 billion total so far), but it is not a keyword index. It looks for bigger blocks of content. Right now, it can handle only text. Images are in beta. And video matching will go into beta early next year. If you are a publisher that is a customer of Attributor, it ingests all your content and comes up with matches. Attributor splits up the world between sites that exhibit extensive copying (more than half of an article, for instance) and just some copying. It shows which sites have linked back to the original source and which have not. “Often, that’s all they want—a link,” says Brock. Below is a typical dashboard view of what a customer would see. In this case, the content from People.com is being analyzed (based on its feed). Of the 265,000 matches, 103,000 don’t link back to People.com.
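
This post does not describe how Attributor's matching actually works, so what follows is only a minimal sketch of one common way to look for copied blocks of text rather than keywords: break each page into overlapping runs of words (often called shingles) and measure what share of the original's runs show up in the candidate page. The function names, the five-word run length, and the sample strings are all assumptions made for illustration. Only the "more than half" cut-off for extensive copying comes from the description above.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word runs of length `size` in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, candidate, size=5):
    """Share of the original's word runs that also appear in the candidate."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(candidate, size)) / len(source)


def classify(original, candidate, size=5):
    """Label a candidate page as 'extensive', 'some', or 'none' (hypothetical labels)."""
    fraction = copied_fraction(original, candidate, size)
    if fraction > 0.5:  # "more than half of an article"
        return "extensive"
    if fraction > 0:
        return "some"
    return "none"


# Made-up sample text, not taken from any real post.
original = ("the quick brown fox jumps over the lazy dog "
            "near the quiet river bank at dawn")
full_copy = "Breaking: " + original + " Click our ads!"
excerpt = "a blogger wrote that the quick brown fox jumps over the fence"
unrelated = "completely different words about something else entirely here"

print(classify(original, full_copy))  # extensive
print(classify(original, excerpt))    # some
print(classify(original, unrelated))  # none
```

A real service would also have to cope with reworded passages, boilerplate, and an index of billions of pages, none of which this sketch attempts.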

attributordashboard.png

Attributor also shows which sites generate the most traffic, which are supported by ads, and which ad networks are making the most money off of your content across the Web. Of the sites that copy People.com extensively, for instance, 55,000 are supported by ads. “This becomes a billing engine at some level,” says Brock. But rather than go after each offending site, he thinks that Attributor’s data will give media companies leverage against Google and other ad networks. “If I am a big content producer,” reasons Brock, “and I can identify all the pages with Google AdSense, my conversations at that point is with Google.” They could ask Google to ban the offending sites from AdSense or, better yet, to cut them in on some of the advertising revenues associated with their content.

attributor-lyrics.png

Ultimately, though, it is all about the links. Links are the currency of the Web. They are the way attributions are made. In most cases, media companies would be better off if they could just get everyone who is copying their stuff to link back to them than by trying to extract licensing fees out of them or suing them. There is a lot less friction in asking for a link, and it doesn’t cost anything to give one out. Yet all of those links can turn into traffic, both directly and by imbuing the original source with higher search karma (i.e. a higher ranking on search engines).

A case in point is what is going on with music lyrics on the Web. The term “song lyrics” is one of the most popular searches online. In a study just released today (PDF here), Attributor scoured the Web for the lyrics of 14 of the songs at the top of the Billboard charts. It found 1,524 copies, mostly on lyrics sites, social networks, and blogs. The only site that has actually bothered to cut licensing deals with the record labels for these lyrics is Yahoo Music, yet in all Google searches (and even 81 percent of Yahoo searches) other sites outrank Yahoo Music when it comes to finding the lyrics for these 14 songs. Of those sites, 57 percent were supported by ads (mostly AdSense) for ring tones, concert tickets, and the like. A Google search for the lyrics to the Rihanna song Umbrella (pictured above) shows how much AdSense is powering the lyrics Websites.

It’s not just lyrics. In another study evaluating 215 recipes on Epicurious, Attributor found 3,959 copies, 65 percent of which did not link back to Epicurious, and 56 percent of which were ad-supported sites. More than half of the copycat sites ranked higher in searches than Epicurious itself. I asked Attributor to run a search on some of my TechCrunch posts. One reporting some early details of Google’s OpenSocial project (codenamed Maka-Maka) was the 15th most copied post on TechCrunch since June, when Attributor started monitoring our feeds. (This Hulu post was the most copied overall, being copied 572 times.)

For the Maka-Maka post, Attributor found 243 copies, with 200 of those taking more than 80 percent of the text. Fewer than 40 percent actually linked back to the original post (you swine!) and 79 percent had ads on the pages. And this is just for one post. I won’t actually link to the offending sites (you know who you are, so cough up those links), but here are some screen shots (highlighted portions are copied verbatim from TechCrunch; at least one takes our entire feed, reposts it with AdSense ads, strips out the names of the authors, and does not link back to TechCrunch):

just-a-random-blog-maka-maka.png

human-capital-maka-maka.png

webuy-maka-maka.png
