Update: Overview videos of the new service and a white paper are here.
For almost a year, Digg has been talking up a new Recommendation Engine that intelligently suggests new stories to users. Well, a source says that the company will launch it soon, probably this week. We have some details and a couple of leaked screen shots, which may or may not be real (I’m betting on real).
An average of 16,000 stories a day are submitted to Digg. Most users don’t venture past the handful of popular stories that make the home page. A few brave souls, though, venture into the Upcoming section, where all the rest of the stories sit until they expire (most of them) or get promoted to the Popular section.
The Upcoming section allows users to sort by category, freshness, number of diggs and number of comments, but the sheer volume of stories means it’s nearly impossible to find interesting news. From what we hear, the new Recommendation Engine section will replace Upcoming entirely.
Recommendations will be made based on diggs from users who tend to vote the way you do: “The Recommendation Engine suggests upcoming stories by matching you with Diggers like you.” This is very similar to the way Netflix handles movie recommendations, although since every story on Digg has a shelf life of just 24 hours, it has to work in real time.
Based on the screen shots, users can see like-minded users in the right sidebar along with a compatibility percentage. If you click on one of the users you’ll see the overlap between your and his/her digg voting over the past 30 days (in this case, six stories), and see recommended stories from that user. Users will be able to sort recommended stories via most diggs, most matches, and most recent.
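Digg has not published how the matching works, so the following is only a minimal sketch of one way a compatibility percentage and a "most matches" ranking could be computed. It assumes compatibility is the Jaccard overlap of two users' digg sets; the function names, the `min_compat` threshold and the sample data are all hypothetical.

```python
def compatibility(diggs_a, diggs_b):
    """Jaccard overlap of two users' dugg-story sets, as a percentage.

    Assumption: this is an illustrative measure, not Digg's actual formula.
    """
    union = diggs_a | diggs_b
    if not union:
        return 0.0
    return 100.0 * len(diggs_a & diggs_b) / len(union)


def recommend(me, diggs_by_user, upcoming, min_compat=20.0):
    """Rank upcoming stories I have not dugg by how many matched users dugg them."""
    mine = diggs_by_user[me]
    scores = {}
    for user, theirs in diggs_by_user.items():
        # Skip myself and anyone below the (hypothetical) compatibility cutoff.
        if user == me or compatibility(mine, theirs) < min_compat:
            continue
        for story in (theirs & upcoming) - mine:
            scores[story] = scores.get(story, 0) + 1
    # Most matches first; ties broken by story id for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Made-up digg history over the past 30 days.
diggs_by_user = {
    "me":    {"s1", "s2", "s3"},
    "alice": {"s1", "s2", "s4", "s5"},
    "bob":   {"s2", "s3", "s5"},
    "carol": {"s9"},
}
upcoming = {"s4", "s5", "s9"}

print(compatibility(diggs_by_user["me"], diggs_by_user["alice"]))
print(recommend("me", diggs_by_user, upcoming))
```

In this toy data, the story dugg by both matched users ranks ahead of the story dugg by only one, and the user with no overlap contributes nothing.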
Presumably the users you are similar to will change over time, and there is a button at the top of the screen to remove any particular user if you don’t want to see their recommendations in the future.
My thoughts: it actually looks like a winner. If the algorithm works to properly match me to other users with similar interests, Digg can become the first place I check every day for interesting breaking news in a variety of categories, rather than the place I go when I have a few minutes to see just the massively popular stuff. It will be interesting to see if they get a pop in unique visitors and page views.
We also believe that the Recommendation Engine will be the official introduction of Lead Scientist Anton Kast, who joined the company in 2007 but has been very much behind the scenes until now. Kast reportedly led the project internally.

Interesting. I’m curious to see how well it works and what the Digg community thinks of it when released.
looks cool…
But hasn’t that already been done a number of times by many other websites?
One of them is the “Social DNA” feature in esnips.
Hope they execute it pretty well…
very neat..
can potentially set the standard for recommendation engines..
though as a user, what’s your value in submitting to digg? Isn’t the machine taking us over?
-A
http://www.pluggd.in
something has to happen, digg is totally inconsequential to my life, along with reddit, boingboing, metafilter … basically, my life and findings are more interesting to me than those of the people who populate those sites
buried for lack of lolcats.
Perhaps the recommendation engine should EXCLUDE friends - thereby forcing users to explore new prospects.
Most probably already check their friends lists
I think Digg just added another $100 million to their valuation.
Problem is everyone will digg the same stories located on the home page so you’ll get a whole host of ‘great matches’ but these people are digging stories you would have seen anyway on the homepage.
I think it’s impossible to get rid of the upcoming section too. There has to be a melting pot to begin with; what this is trying to do is to send you an edited version that will be of most interest to you. For it to work you’ll still need people digging the upcoming stories.
If they were to go down a purely recommended route, you’ll have gaming of Digg like never before…
Getting people to look at the 15k submissions a day is a tough task and this goes some way to easing it, but I don’t think it solves the problem. In a way it is adding to the noise, as this list of recommended stories will be yet another list to go through.
I started to DISTRUST DIGG long ago. I initially thought that Digg was a place where one could post some good stories and get a kick out of the story being read and appreciated by millions (if it was a good one). I even posted some good stories from Techcrunch on it. However, I never got past 3-4 Diggs!
I think the whole functioning of DIGG story popularisation is manipulated. If user ‘A’ posts a story, he may get 5 DIGGS, and if another user (someone who knows a lot of other DIGGers) posts the same story, he may get 500 DIGGS.
So what do you make of it? Is the story bad when user A posts it, and does it become good when user B posts it? Are the people who DIGG for user B really interested in the story that they DUGG, or have they just DUGG it because they knew him? The same distortions will carry over to their new development, I am sure.
I’ve left digg long ago not being able to keep up with all the crap on the front page, and I’ve been testing persai for a couple of months (much better than letting the dumb crowds do the rating), I guess persai won’t be pleased with this news
well this could help Digg get some users back because I know a lot of users who simply left it because it’s not always the best stories that get the good ratings.
The problem here, as I see it with many recommendation engines, is that the recommendations are based on what other people like, not what the user would like. If you really want recommendations and don’t want to rely on other users, you could try http://blern.com.
Interesting idea, but as Jim points out, I’m more interested in a service that personalizes news according to what I want and doesn’t rely on other readers. Findory was the gold standard a while ago, and the space is wide open for others to try to reproduce what Greg did as a one-man show. We take a lot of inspiration from Findory, and are working on something similar at NewsCred. Keep an eye out!
Jim said…
the recommendations are based on what other people like, not what the user would like.
It can do both; it depends on whether the recommendation vendor allows it. A recommendation engine can make recommendations based on the user’s own past tastes (personalization) or based on others’ tastes (collaboration).
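To make the distinction concrete, here is a minimal sketch of the two scoring styles. It is not any vendor's actual method: the function names, the topic tags and the sample data are hypothetical, and real engines use far richer signals.

```python
from collections import Counter


def personalized_score(story_topics, my_history_topics):
    """Personalization: score a story by how often I dugg its topics before."""
    counts = Counter(my_history_topics)
    return sum(counts[topic] for topic in story_topics)


def collaborative_score(story_id, diggs_by_neighbor):
    """Collaboration: score a story by how many like-minded users dugg it."""
    return sum(1 for diggs in diggs_by_neighbor.values() if story_id in diggs)


# Made-up data: topics of stories I dugg in the past, and my neighbors' diggs.
my_history_topics = ["linux", "linux", "apple", "science"]
diggs_by_neighbor = {"alice": {"s1", "s2"}, "bob": {"s2"}}

print(personalized_score({"linux", "science"}, my_history_topics))
print(collaborative_score("s2", diggs_by_neighbor))
```

The first score needs only my own history, so it works even if nobody else has seen the story; the second needs no knowledge of the story's content, only other people's votes.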
If Digg wants to make an intelligent product, how about changing the fact that you can digg an article without ever actually clicking through to the article or even reading it…
I don’t think that this will replace the existing system of hand picked top stories. No more than Techmeme will replace Techcrunch.
I’m amazed that Digg invented this. Is it algorithm based, or simply pre-programmed based upon the results of others?
Michael Arrington said…
…although since every story on Digg has a shelf life of just 24 hours, it has to work in real time.
Yeah, I agree that it has to work in real time although I am not a digg user.
The traditional method is to train the recommendation algorithm in one go (batch or off-line training, i.e., training the algorithm on past data before deploying the model live). That is fine for a non-realtime application where the data is not huge and does not change every second or minute, but that is not the situation Digg is facing. When data accumulates very fast, off-line training becomes infeasible in both memory and speed, because the whole training has to start again from the beginning on the old data from the previous run plus the new data that has just arrived. New data arrives every few seconds or minutes, so it is infeasible to re-train the algorithm each time.
The other form of training is to do it live, i.e., online training. The algorithm is trained initially on the past historical data and then left to run. When new data arrives, the algorithm continues training and updates itself in realtime. This avoids re-training from scratch even as new data keeps arriving, because the algorithm learns the data pattern in realtime. It is simply switched on at the beginning and then left to learn newly arrived data on the fly.
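The difference between the two styles can be sketched with a deliberately simple "model": a table counting how many stories each pair of users has dugg in common. This is only an illustration under that assumption, not Digg's algorithm; `batch_train`, `OnlineModel` and the event data are hypothetical names made up for the example.

```python
from collections import Counter
from itertools import combinations


def batch_train(events):
    """Off-line style: rebuild co-digg counts from scratch over the full history."""
    by_story = {}
    for user, story in events:
        by_story.setdefault(story, set()).add(user)
    counts = Counter()
    for users in by_story.values():
        for a, b in combinations(sorted(users), 2):
            counts[(a, b)] += 1
    return counts


class OnlineModel:
    """Online style: fold each new digg into the existing counts as it arrives."""

    def __init__(self):
        self.by_story = {}
        self.counts = Counter()

    def update(self, user, story):
        users = self.by_story.setdefault(story, set())
        if user in users:
            return  # duplicate digg, nothing new to learn
        for other in users:
            # Only pairs involving the new digger change; old data is not re-read.
            self.counts[tuple(sorted((user, other)))] += 1
        users.add(user)


events = [("ann", "s1"), ("bob", "s1"), ("ann", "s2"), ("cat", "s1"), ("bob", "s2")]

model = OnlineModel()
for user, story in events:
    model.update(user, story)

# Both routes end with the same counts; the online one never restarted.
print(model.counts == batch_train(events))
```

Each online update costs time proportional to the number of users who already dugg that one story, whereas the batch route re-reads the whole history every time it runs.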
If Dr. Anton Kast is reading this thread, you might find the following paper on such an online algorithm interesting for Digg’s realtime story tracking. The paper describes an online version of the popular NNMF (non-negative matrix factorisation) that trains itself in realtime, which suits realtime tracking at Digg. The publication is a collaboration between Microsoft Asia R&D division, Peking University, Hong Kong University and the University of Illinois at Urbana. There is already NNMF code available on the internet, mainly in Matlab, C and C++, but it is all off-line NNMF. The online NNMF I am quoting here (see below) is the first I have seen in which NNMF is updated online (i.e., in realtime). Here is the abstract:
Abstract:
———–
Detecting and tracking latent factors from temporal data is an important task. Most existing algorithms for latent topic detection such as Non-negative Matrix Factorization (NMF) have been designed for static data. These algorithms are unable to capture the dynamic nature of temporally changing data streams. In this paper, we put forward an online NMF (ONMF) algorithm to detect latent factors and track their evolution while the data evolve. By leveraging the already detected latent factors and the newly arriving data, the latent factors are automatically and incrementally updated to reflect the change of factors. Furthermore, by imposing orthogonality on the detected latent factors, we can not only guarantee the unique solution of NMF but also alleviate the partial-data problem, which may cause NMF to fail when the data are scarce or the distribution is incomplete. Experiments on both synthesized data and real data validate the efficiency and effectiveness of our ONMF algorithm.
The full paper (PDF) with the detail of the algorithm can be downloaded from here:
Detect and Track Latent Factors with Online Nonnegative Matrix Factorization
Actually, the algorithm can be used in anything that deals with multivariate analysis, such as search engines, online opinion mining, etc…
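For readers unfamiliar with NNMF, the standard off-line formulation can be written down briefly. This is the classic batch version with the Lee and Seung multiplicative updates, not the online algorithm from the paper above; the symbols V (e.g. a user-by-story digg matrix), W, H and the number of latent factors k are notation chosen here for illustration.

```latex
% Factor a non-negative m x n data matrix V into two non-negative factors.
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{m \times n},\;
W \in \mathbb{R}_{\ge 0}^{m \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times n}

% Objective: squared Frobenius reconstruction error.
\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2

% Multiplicative updates (\odot and the fractions are element-wise).
H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H}, \qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
```

Because every update uses the whole of V, new rows or columns force the iteration to be run again over all the data, which is the re-training cost described above.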
I never like any of the non-shopping recommendation engines
too many flaws
@17, a recommendation algorithm is pretty easy to implement.
There’s no need for an article on the subject.
Each user with similar reading habits is assigned points for new articles, and the articles that have the top points float to the top of your recommendation graph.
It’s not rocket science.
I am considering re-organizing the top friends order on the browse screen of my social network to display friends that other users with similar browse parameters have chosen based on points.
A “recommend new friends” feature on the browse tab of sitespaces.net.
Since the script is only going to be $199, it should sell it pretty easily.
Again, not rocket science to implement. I read the book “pagerank and beyond” and much of it was verbose overflow. Of course page rank is a circular recursive algorithm that has no conclusion because each new pagerank redefines the rank of all the others that link to it.
What digg is doing is far simpler than that.
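On the circularity point: in practice the circular definition is usually handled by repeating the rank update until the numbers stop changing, which with a damping factor settles after a finite number of passes. The sketch below is a textbook-style power iteration on a made-up three-page graph, not Google's implementation; the function name, tolerance and graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration. `links` maps each page to the list of pages it links to.

    Assumption: every link target also appears as a key in `links`.
    Returns (ranks, number_of_iterations_used).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for iteration in range(1, max_iter + 1):
        # Every page starts each pass with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            return rank, iteration
    return rank, max_iter


# Made-up graph: a links to b and c, b links to c, c links back to a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks, iterations = pagerank(links)
print(ranks, iterations)
```

Each pass redefines every rank from the previous pass, but the changes shrink geometrically, so the loop stops well before the iteration cap on this graph.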
It’s a lie! It’s all a lie……
Here is another useful publication. I haven’t read the full paper yet, but from the description in the abstract it does not look like an online (real-time) algorithm. This means any off-line NNMF version (there are lots of them on the internet) can be used.
Abstract:
This paper studies the online discussion participation prediction (OFPP). Online discussion is an application on the web that provides a cyberspace for users to exchange or share different information. Finding suitable online discussions on Internet becomes difficult as huge amount of information existed. This led to recommendation systems that provide advices to users. In this paper, a weighted non-negative matrix factorization method is used to discover latent user preferences of online discussions such that prediction of user’s participation can be obtained. Experimental results show that with the prediction of user’s preferences, suitable online discussions can be suggested to the user.
Note that “online discussion participation prediction”, as described in the abstract above, does not mean that the NNMF algorithm is online (realtime); it just means that the domain of application is online (web) analysis.
Online Discussion Participation Prediction Using Non-negative Matrix Factorization
The paper can be bought online or photocopied from a local University library.
Techclusive said…
I read the book “pagerank and beyond” and much of it was verbose overflow.
Yeah, I’ve read that one too. I have to say that Prof. Amy Langville (author of “pagerank and beyond”) is one of the top-notch researchers in the world today in search engine algorithms. It has been reported that her version of the PageRank solver algorithm was adopted by Google around 2003/2004, because it was more efficient/faster than the solver that Google had used.
I do check out Langville’s site on a regular basis to see if she’s got new research publications for download.
Techclusive said…
What digg is doing is far simpler than that.
So, you think that Digg’s announcement is all hype? I wouldn’t be surprised if it is, because most Web 2.0 startups out there are.
Finally Techclusive, what is the recall capability of your recommendation engine compared to algorithms such as SVD, NNMF, LDA, etc…? I am just curious, because all that matters in the end is whether the algorithm has a very low classification error, be it a complex algorithm or a simple one.
Well, I will definitely be giving this a try. If it has an RSS feed that would be great; I’ll never have to look for a story again. Cool, Digg, way to keep it innovative.
Actually Falafulu, I am Chris from SiteSpaces.net. Techclusive is my new Techcrunch clone network.
So with that said, I will explain briefly how I plan to implement friend recommendations on the browse tab of sitespaces.net
Each friend is an object of recommendation value.
That friendRecommendation object accumulates points for being active. Posting messages, blogs, updating their profile, participating in groups.
The more they are active, the higher the object’s point total is.
Each friend of the person hitting the browse tab will have their friends’ friends ordered by the points in their friendRecommendation objects.
Those with the highest points float to the top. They will get extra points for multi-inheritance, meaning they were friends of more than 1 friend of the person hitting the browse tab.
Additionally those not logged in can get recommendations based on who they browsed with the PHP session ID cookie.
I did not reduce this to a mathematical equation, because I am stressed for time to wrap this script up and get an installer going so I can sell it.
I will do the algorithm on the fly.
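The description above can be sketched in a few lines. This is only one reading of it, not the SiteSpaces script: the function name, the flat per-friend bonus for being reachable through more than one friend, and the sample data are all hypothetical.

```python
def recommend_friends(me, friends, activity_points, bonus_per_extra_link=10):
    """Order friends-of-friends by activity points plus a multi-friend bonus.

    friends: dict mapping a user to the set of that user's friends.
    activity_points: dict mapping a user to points earned for being active.
    bonus_per_extra_link: hypothetical bonus for each additional friend of
        mine who is also a friend of the candidate.
    """
    my_friends = friends.get(me, set())
    links = {}
    for friend in my_friends:
        for candidate in friends.get(friend, set()):
            # Do not recommend myself or people I am already friends with.
            if candidate == me or candidate in my_friends:
                continue
            links[candidate] = links.get(candidate, 0) + 1
    scores = {
        candidate: activity_points.get(candidate, 0)
        + bonus_per_extra_link * (count - 1)
        for candidate, count in links.items()
    }
    # Highest points float to the top; ties broken by name for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Made-up network: cat is a friend of both of my friends.
friends = {
    "me": {"ann", "bob"},
    "ann": {"me", "cat", "dan"},
    "bob": {"me", "cat", "eve"},
}
activity_points = {"cat": 5, "dan": 12, "eve": 3}

print(recommend_friends("me", friends, activity_points))
```

In this toy data the bonus lifts the less active but doubly connected candidate above the more active one, which is the effect the multi-inheritance rule is meant to have.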
I am going to BK to eat breakfast now. It’s going to be a hot day in LA!
As my name suggests, I’ve tired of the juvenile theatrics of Digg commenters. Where I now go for the latest news that also provides me with related articles by RSS is http://www.gyzork.com – which just appeared in bzzAgents frogpond. Seems that it already does what Digg is going to attempt to do with its “recommendation” engine - but Gyzork does it without the bias of “digs” and other clutter. The Gyzork UI isn’t great, but it’s simple, straightforward and gives me what I want effortlessly.
So, have you had any luck with selling your script, Chris? Are you selling it to bloggers, corporations, Facebook users, Google or Microsoft? Had any success yet, i.e., lots of buyers?
Does BK stand for Burger King? We have Burger King here in New Zealand, and people frequently use BK as shorthand for it.
Hey Chris, you might as well explore benchmarking your algorithm, such as its retrieval capability (precision & recall, etc…) if you want to approach Google, Microsoft and the likes. These are the metrics that they want to look at, if they’re interested in your software. Just a tip.
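For anyone who has not met these metrics, precision is the fraction of recommended items that were actually relevant, and recall is the fraction of relevant items that got recommended. A minimal sketch, with a hypothetical function name and made-up story ids:

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one user's recommendation list."""
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four stories recommended, three the user really liked.
recommended = ["s1", "s2", "s3", "s4"]
relevant = {"s2", "s4", "s7"}

precision, recall = precision_recall(recommended, relevant)
print(precision, recall)
```

Here two of the four recommendations were hits, and two of the three liked stories were found. Averaging these numbers over many users on held-out data is a common way to compare a simple algorithm with a complex one.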
@26,
“if you want to approach Google, Microsoft and the likes. ”
No, this script is aimed at those that are really frugal, but want a social network anyway, and they do not want to give their database to Ning.
People that know the value of data, i.e., webmasters from webmaster forums, etc.
http://www.webhostingtalk.com/ This is a gigantahuge market.
My script will import phpBB posts and users, and also vBulletin posts and users and make their avatar the default picture turning every bulletin board on the internet into a social network paying my royalty of $199 a pop.
BK does stand for Burger King, I worked there when I was in high school and community college.
“I’ve tired of the juvenile theatrics of Digg commenters.”
That’s why digg is popular. Kevin Rose used to host a show on TechTV before Comcast took over and moved TechTV to Canada. The bratty little kids of that era are the digg commenters today. They are the reason digg acquired value.
The power of a referral or recommendation is huge, especially when an element of personalized referral (to your user history) is added into the equation. People want to look at material that is relevant to the material they consciously came to the site for. Referral is an awesome form of discovery and will one day supplant search engines like we have today…. look!!…. http://www.readtheanswer.com/index.php?RTA=web2
digg needs to stop with all the socializing crap and get serious about cleaning up the garbage that pollutes the site. I used to spend plenty of time in the Upcoming every day and took pride in finding great submissions and getting them promoted, now I check the front page maybe once or twice a day because of the sheer volume of spam, dupes, and marketing garbage. The real sad part is this garbage is making its way to the front page now since digg doesn’t care about banning users that just recripo-digg each other. It’s a sad state of affairs on what was once the best site for news on the web.
digg needs to go on a diet, not keep taking steroids.
On another note, Digg quietly killed off their Cloud View feature over the weekend. Maybe cuz the recommendation engine is considered as the successor?
Digg is “Lost” of Web2, with no direction, false innovation, lack of vision. Digg is crawling on knees and will die upon acquisition. But Digg does have good friends like Mike to always give great reviews.
Noovo attempts to solve the problem of relevant recommendations with (also) deep graph analysis and text-mining algorithms. Beta will be out soon. Falafulu Fisi , let’s get in touch, seems you are interested in solving similar challenges. nabergoj at noovo dot com.
Digg has never been the same since they branched out beyond being a technology aggregator. They’ve turned Digg into Netscape 4.7.
This is a very neat tool, hopefully it works well!
Joe - http://www.leetwebmasters.com
Not clear how I can recommend a story in digg.