Since its inception, one of the biggest problems with Digg has been that users often submit the same content over and over again. This makes it harder for cool content to become popular because some users digg one submitted story, while some digg another. Today, Digg is releasing “several major updates” to its duplicate (known as a “dupe”) detection system.
The solution sounds fairly intensive. “To better understand the nature of the problem, we analyzed the types of duplicate stories being submitted. Most common are the same stories from the same site, but with different URLs. Our R&D team came up with a solution that identifies these types of duplicates by using a document similarity algorithm,” Digg’s Director of Product Chris Howard writes in a blog post. He goes on to say that there will be a follow-up more technical post to explain a bit more about how this actually works, but says that it has proven to be a reliable system so far.

























See all


