Attack of the Splogs—One Of Our Posts Copied 152 Times Without Attribution
Erick Schonfeld
Here at TechCrunch, there is nothing we love more than when one of our posts gets linked to and talked about. And like the majority of other blogs out there, we try to be good citizens by linking back to any source from which we excerpt. But there is a growing minority of spam blogs, or splogs, that indiscriminately take entire posts from other blogs and present them as their own.
For example, here is a screen shot from one random splog that just reposts TechCrunch’s entire feed with no links back to TechCrunch or even acknowledgement of the source:
Just for the record, taking any blog’s entire feed and republishing it as your own content is not okay. Notice that the only difference between this splog and TechCrunch is all the Google ads splattered everywhere.
We are not alone in this. Any blog that produces fresh content on a daily basis is an easy target. Google makes it economical to create such splogs through AdSense and then rewards them with traffic through its search engine. Google (and the other search engines) need to stop rewarding such behavior.
We knew the splog problem was bad, but we didn’t know how bad until earlier this week, when I did a post about Attributor (a new startup that can track who is copying your stuff all across the Web). I noted that Attributor found one TechCrunch post that had been copied in one way or another 572 times (not all of them bad).
Attributor catches all matches of blocks of text, so I asked them to break that number down. First, they threw out anything that was less than a five percent match, which left us with 467 matches. Of those, 315, or two thirds, linked back to the original post. So that is the good news. It appears that most bloggers are good citizens. But 152 of them, or fully one third, did not link back. And of those, 115 (about 25 percent of the 467 significant matches) were plastered with ads, making money off our work without so much as a link.
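For anyone who wants to check the math, here is a minimal sketch of the breakdown in Python. It uses only the counts quoted above; the variable names and the `share` helper are made up for illustration and have nothing to do with Attributor's own tooling.

```python
# Counts quoted in the post. Names are illustrative, not Attributor's.
total_matches = 572      # every match Attributor found
significant = 467        # matches left after dropping those under five percent
linked_back = 315        # copies that linked to the original post
no_link = 152            # copies with no link back
no_link_with_ads = 115   # unlinked copies that also carried ads


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two groups should account for every significant match.
assert linked_back + no_link == significant

breakdown = {
    "linked back": share(linked_back, significant),
    "no link": share(no_link, significant),
    "no link, with ads": share(no_link_with_ads, significant),
    "no link, with ads (of all 572)": share(no_link_with_ads, total_matches),
}

for label, pct in breakdown.items():
    print(f"{label}: {pct}%")
```

Measured against the 467 significant matches, the ad-covered copies come to roughly a quarter. Measured against all 572 matches, they are closer to a fifth.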
Here is a screen shot of the original post, which covered the beta launch of Hulu:
Now here is a screen shot of one of the splogs (notice the similarity?):
And another one (complete with a Jessica Alba cheese ad—although it arguably does give the headline an unintentionally different nuance):
You get the idea. Admittedly, this is completely anecdotal. It is only one post. But it does point to a larger problem. Other bloggers out there, have you been splogged today? Probably.









You’re starting to sound like Prince. It’s just content that you created…right?
would you be willing to pay for a solution to this illegal copying?
i have one. here’s how it works - instead of allowing people to get the content digitally, they have to “subscribe” to it, and every morning a kid on a bicycle will go around and throw a rolled-up paper containing all the newest blog entries on the front step of every subscriber’s house.
i should patent this, it’s going to be huge!
Wow, what those filthy splog slobs are doing is incredibly illegal. I hope you big dogs with popular blogs like TechCrunch, who have a bit more resources than us common folk, find a way to squash these pathetic criminals for everybody’s sake. Get on Google’s case, I say. Google has BILLIONS OF DOLLARS. You’d think they’d have the resources to pull a quick team of 3 or 4 guys and gals together who can write a quick algorithm to differentiate a legitimate Techcrunch post from a stolen one.
So is it illegal? Are you going to litigate against someone? If you’re not defending your work, then you’re part of the problem, right?
I think it’s sad that this is such a problem, but what really can be done about it? One thing that adds to it, which this article mentions, is the ease with which one can open an Adsense account. I have a younger brother who decided to start a website with just a template he downloaded. And within hours he was already approved by Adsense to place ads on it, even though he had no content.
It’s all probably those same guys who try to spam my comments boxes day after day after day after….
Jessica Alba is hot
More posts about how P2P trading is cool and the music publishers just don’t get it please
Hopefully Google will buy Attributor and stop working with these sites.
There’s an interesting side effect to all of this. Since Google doesn’t really see these as “bad” sites yet, all links in a post (say, to a startup) are multiplied 152 times. That really helps the startups being linked to.
There’re even Wordpress plugins to rip off other bloggers automatically. Check this out:
http://devthought.com/wp-o-mat.....greggator/
Michael, I hope you guys are looking at legal options to stop or minimize this. Nobody produces derivatives of NYTimes articles and claims them as their own because the consequences are dire. You guys step it up and I’m sure others will follow. There needs to be more done than just a simple slap on the wrist.
I am going to have to give a point to some of the snarkier commentators.
I think “University Update” along with about 2 dozen .info sites have been pulling my full feed and wrapping it in Adsense for about 6 months now. Unfortunately, it is just the landscape that we exist in.
As publishers, unless we want to pull a RIAA and try to litigate every .ru top-level domain that runs a splog-nest, then all we can do is innovate around it. Having taken a look at some of these splogs, they are really only picking up a pittance for their effort.
There is something to be said about the conversation that authors bring to the table. Content is cheap but personality can’t be as easily ripped off.
All I hope is that we don’t get so up in arms with the sploggers that we start sounding like the music industry. Wouldn’t that be ironic?
Automatt - i understand you are trying to do a pot calling the kettle black type statement. I have two responses.
first - copying music for your own use is different than taking content from others, saying it is your own and reselling it.
second - we aren’t going to start suing people for doing this. It would be an incredible waste of time, and it wouldn’t even slow the copying let alone kill it. The search engines need to evolve to deal with this kind of stuff, and effectively keep it invisible. But the problem is those google ads…Google makes money off these sites. A lot of it in the aggregate. So they have an economic incentive NOT to fix the problem.
Anyway, it’s interesting stuff and that’s why we posted.
I post some Techcrunch articles on my blog and I always include a link back to the original post here and I include your copyright info at the bottom of the post. I love reading your articles and use some on my blog with the hopes that my visitors will follow the link back here and see why I enjoy Techcrunch so much (and to provide them with useful info). My blog is mixed with articles I write and articles that I get from writers and blogs that allow the reprint as long as their links and personal information are kept intact. Many have a little “About the Author” block of text they want attached to their article if you reuse it. In my experience many writers of these articles are happy that their material is being spread around as long as all the links to their websites/blogs are included. This helps their websites gain traction on the search engines and provides a lot of incoming links to their sites and blogs. I’m sure that many people actually click through and visit these sites. I checked your site before I started posting articles I get from here and didn’t see any specifics about using/not using your articles. I figured it would be ok if I did it with respect to Techcrunch and the writer. I will stop using it if you’d rather I didn’t. By the way, I only use a small percentage of your articles and would never even consider using your entire feed.
You can pretty much thank Google for this phenomenon. They have made it attractive for people to do this, and I wouldn’t be surprised that 30% of their revenue comes from underhanded techniques such as this, and good ole fashioned click fraud.
In fact, I think google has done more to clutter up the internet (indirectly of course) with junk sites and ad farms than any other company in history.
so Prince is splogging now?
Do you consider this a splog: http://www.pop-secrets.com
I made it for my fiance so she could check all her gossip blogs from one page. I realized a bunch of people wanted this, but don’t understand RSS and feed readers, so I made it public and threw up a bunch of ads to recoup hosting costs.
I link to all the original sources and I made sure to only crawl sites that specifically say it is ok to use content from their RSS feeds with ads. Some sites have 2 RSS feeds, one that can be used with ads (usually is a summary with no pictures) and one that has the full data but can’t be used with ads.
Honestly, I think copying any article, whether you attribute the author or not, is pointless and a sign that you just can’t think of anything on your own. Of course we all see interesting posts and link to them, but mine usually accompany my own summary and thoughts rather than just a verbatim rehash.
I find that whenever I sort Google Blog search results by date, there are tons of splog posts clogging the results.
I know quite a few vloggers who were alarmed and disappointed when RSS made it so easy for sites to revlog their stuff without attribution, or with a lot of adverts, or failing to honor creative commons or other licensing terms. People were most excited about RSS being used to allow viewers to subscribe to their stuff, à la podcasts, and hadn’t always considered the other uses, such as sites using the feed to copy their stuff in a fully automated way. This problem hopefully peaked for vloggers during the crazy year of the video site megahype, when so many new video sites emerged and some were desperate to make it look like they had lots of content, and so took liberties.
Such matters ignited some discussions about exactly what the ‘non-commercial’ in certain cc licenses actually meant, and whether a few google ads counted. In the end, verbal backlash from people who felt ripped off was enough to get most, but not all, of the offending sites to mend their ways, but people trying to track this stuff got overwhelmed by the number of sites emerging, and it could often feel like a losing battle.
Personally, I didn’t want to see any tech used to try to limit such things, as it would also impinge on legitimate viewers, so I advocated the approach of trying to come to terms with this loss of control of distribution. The theory being that if you embed your brand & real site url in the content, then anybody watching/reading it on some ripoff site, who finds it worthwhile, will be likely to check out your original site, which will no doubt be better looking. It’s not like the splogs usually offer any advantage to people reading the stuff there; people usually get to them by accident via search results.
Jon is spot on, been around for an age and will be until there is a backlash against this. All we can hope for is the backlash and hope that Google can’t justify the revenue gain against the bad press.
That said Mike’s got a good point about the additional links for a new site.
Interesting on the real winners and losers
Erick - I agree there is a very large problem here, and it is fed by Google. Not only are there splogs, but there are also individuals who advertise on Adsense simply to direct to a spam page that is filled with Adsense ads. By bidding low they are able to make money by displaying higher cost ads.
Michael, thanks for the response. I understand your points but it seems like they are giving away your content for free, not reselling it.
Sure, they have google ads, but so do all of the .torrent sites.
So, P2P distribution, which I’ve seen your site endorse, shows ads to make money from other people’s content, same as these guys. It bothers you here because in this case it is your content.
I think it’s a little ironic that the splog pictured has fewer, less annoying ads than the actual TechCrunch site, and as a result probably loads faster too.
Of course, it probably doesn’t have the comments of the original site (yet).
The timing of this article is ironic, I just got my first post dugg yesterday, and the single thing I was blown away by was the number of blogs I found in the trackbacks that were literal copies of my post. Redonkulous.
It’s all about incentives, and since Google makes so much money on it, why should they stop until they have a good reason to? I’ve got a friend who makes upwards of 1000 dollars a month from a similar setup. That’s not bad money for passive income.
Regarding Pop-Secrets:
I think attribution is key. Most content creators didn’t want to be complete control freaks, or prevent all reuse of their stuff. Attribution turned up again and again as a crucial factor in how people reacted. For me it makes a big difference as to whether I label you a leech, or think your aggregation is of value to the world.
And as Pop-Secrets seems to include links to the original site’s permalink for the post, I don’t think you are a terrible offender. But it’s always nice to go the extra mile, such as the prominent ’sources’ box you have on the page.
automatt - my response:
- don’t confuse torrent sites with actual users, who download content for their own use.
- we don’t endorse file sharing, we just see it as inevitable and unstoppable.
- putting ads around content is reselling content.
Don’t take this disrespectfully, but I think you are being intellectually lazy here just to try to say we are being hypocritical. That’s fine, but it’s more interesting to be a real part of the conversation.
I’m with automatt. There’s no real difference between what these guys are doing and what the bittorrent trackers Duncan Riley’s always on about are up to. They’re both making money off other people’s content. The splogs cut and paste your stuff and get money from the ads. The torrent sites link to content they’ve found elsewhere and get money from the ads. But now, call the wahmbulance because it’s YOUR stuff getting ripped off, not some organisation everyone hates.
Hell, you’re probably not even losing a cent from this crap - maybe the other blogs are making a few bucks from your content but is it really going to make people visit them rather than you?
I guess you guys will need to add DRM to your blog now. Let’s start an iBlog store where we can plug in our computers to get the blogs.
You might try looking at copyscape.com to help you with these types of issues, they have a tool which can find copied content. When you make your living on the internet you definitely need to know who is out there scraping your sites. I would also recommend implementing a defense against evil spiders/robots, there are a few out there if you look.
Hope you get the splogs wiped out.
So by extension, if there was a site with all of the TechCrunch content on it, attributed to the proper authors, but without any ads, you’d be okay with that, because it wouldn’t be reselling your content?
Having also had my content ripped off and used to sell Google AdWords, I’m incredibly frustrated with Google’s (non-)response to this issue.
AdWords’s abuse reporting system is worthless. I have yet to ever get a response from them when I report obviously stolen content, or something like a spam post in my comments section linking to an AdWords-funded site.
“Do no evil” my ass. They’re profiting off of this more than the sploggers are. Until Google decides to stop funding these assholes, it’s going to remain a profitable enterprise.
“putting ads around content is reselling content” - and you call the other guy intellectually lazy? Your content is being taken and put elsewhere and someone else is making a few bucks from that content. Torrent sites link to the content someone else has made and make a large amount of bucks from that content. There are ads all over torrent sites. Have you looked at mininova lately? Checked the pages on thepiratebay? seen the explicit porn ads most of them carry?
Where’s the great difference? That torrent sites don’t actually host the content? Come on….
(And you may not personally endorse file sharing, but some of your writers certainly do.)
Ironically, I first read this on Yahoo’s new “reader” as a pop up on their incredibly slow/painful new my.yahoo homepage. It scrapes your RSS and presents it in their own window. It’s not just headlines anymore!
This is a very bothersome problem.
I have student class projects, for instance, that have this happen hundreds of times each week.
Nonprofit clients, people we’ve built sites for, have the same problem on a daily basis. They experience from dozens of daily instances to many hundreds of weekly instances.
Student personal blogs are flooded with the trackbacks from these thefts (and that’s what it is … theft, plain and simple).
Forgetting, for the moment, that it is simple theft and bad practice, consider the site maintenance required by all of these victimized sites on a daily basis. It is horrible.
The only thing that will stop the practice is an absence of financial reward. I agree with you. I’ll go ya’ one further…
Google, Yahoo!, MSN, et al. need to fix their broken models. They aren’t just cracked, or sort of broken … they are built (if only partially) upon this kind of fraud. And that’s what it is. Fraud.
RSS is a use-agnostic distribution system, for better or for worse. Would it help if Google killed off the splogs? Yea, it would, but in the intellectual property war content producers will always be on the losing side.
We can do things to minimize the -damage- that unattributed copying leads to, but getting into an arms race with these folks is just a form of ego stroking. What we should all be focused on is providing -value- that others can’t easily copy. It’s possible, that’s why few people actually go to these sites unless they see an accidental attribution.
I have gone back and forth with my own blog on the [more] option. Should I send out full feeds, or the first few paragraphs with a link to the full article? I go both ways on this. Certainly having the [more] solution will help with those sploggers, as at least they don’t get the full content.
Google does downgrade their sites as it finds their content in duplicate of other content and thus measures it as less relevant. That said, you are correct about who is really making money. Google with the adwords. This is why I only allow my ads to show via google search. NEVER on content sites. Just too much crap generating worthless clickthroughs.
I like the Alba pic. More please.
Move to Short Text RSS?
This is why my friend at http://www.spendmatters.com went to short text rss. It makes your readers click to get the full feed, but prevents the problem you describe (I think).
http://www.spendmatters.com/in.....t-Text-RSS
I had a similar problem. I used to get traffic based on a single image search. Some wise guy(s) hotlinked my image on their site and google de-indexed my site and indexed theirs instead. Now there are 3 or 4 search results based on this image I published initially. Totally weird. What did I do to need de-indexing?
i think the main thing we’ve learned here today is that TechCrunch needs more pictures of Jessica Alba.
and less hypocrisy
faceloop - defend that statement.
I think that deserves an “Amen!” brother.
Amen to the Jessica Alba post that is.
one of the best ways to defeat this is to occasionally swap the images in the with something fairly racy and upload new copies of the image to the post itself. While the readers see boring web 2.0 stuff, the splogs are showing man-on-goat love. However, do not do this before Google starts sending out updates because google caches the images and your man-on-goat love goes out for all the world to see, as we discovered a few months ago.
in the words of Arnold Schwarzenegger:
stop whining!!
http://youtube.com/watch?v=Rjy2RWO_C7s
Mike…
You have ripped the music industry (and musicians) over and over for attempting to protect their content. The industry (and select musicians) have been labeled the bad guys for complaining about it - while those who are stealing the content have been awarded white hats…or at least a pass.
Now, when we find out your content has been stolen, the thieves are somehow worse than those who steal the music. In this new instance, the thieves wear black hats.
Hypocrisy is the act of condemning another person (the music industry) for an act of which the critic is guilty.
You cannot rip on the record industry for whining about content thieves if you are going to do the same when it happens to you.
For the record…I think stealing content - music or blog posts - is bad.
They must be laughing at this - you just made them more money by posting this today, they got even more hits to their sites.
Just curious, how would Google’s code determine which is the original source?
This is an interesting debate.
Biel or Alba?
I vote Alba..
it’s simple: how can one say it is OK to copy someone else’s material or content (word for word) and put it up as your own?
Even supposing for one moment that it was not reselling or making money, it is wrong. I really don’t care how you twist and turn it. I’d be pissed if someone did that to me regardless of whether they were making money or not. To me it comes down more to ethics and integrity than anything else. It really shows the kind of person you are too.
No matter how you look at it, it’s still plagiarism. I find it very disturbing that people even entertain the idea that it might not be.
Like spam, it’s probably going to be a never-ending battle. It’s probably only a matter of time before Google does something about it. I’m pretty sure about that one, as they aren’t blind and oblivious to what’s going on.