Speeding Up RSS
by Erick Schonfeld on July 9, 2009

I’m sorry, but RSS feeds are way too slow. I know this first-hand. As part of my job here at TechCrunch, I monitor a lot of RSS feeds for breaking news. We also produce our own feed and I can see how quickly it propagates to various feed readers and feed-powered news aggregation services. The lag time between posting a story and seeing it pop up in the RSS feed is usually a few minutes, and then it can take another 10 to 15 minutes or so for it to appear in something like Google Reader. And the TechCrunch feed is probably checked more frequently for updates than most other feeds. In our business, every second counts and RSS just isn’t cutting it.

While there is an argument to be made that RSS is dying, being replaced by more instantaneous forms of content delivery such as Twitter and other real time streams, many people aren’t quite yet ready to give up on it. Instead, they want to save it by speeding it up. Tomorrow, at our Real Time Stream CrunchUp, we will see three demos of projects that do just that in slightly different ways.

Google engineers Brad Fitzpatrick and Brett Slatkin will show a demo of a new push protocol called pubsubhubbub, Netvibes CEO Freddy Mini will demo his similar RSS Instant Update Hub, and WordPress engineer Andy Skelton will show off a Jabber client which uses the XMPP protocol to push blog headlines into an IM-like environment faster than RSS.

The pubsubhubbub and Netvibes technologies create RSS hubs, which push out feeds as soon as they are available. This approach is in contrast to the polling method which is the foundation of RSS. The polling method sucks because it requires the server acting on behalf of the RSS subscriber to constantly ping the server where the RSS feed is published to ask if there is anything new yet. Depending on how often this happens, you end up with a lag. As communications become more real-time, this lag is becoming more noticeable.
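To put rough numbers on that lag, here is a minimal sketch in Python. It assumes a reader that polls on a fixed schedule (at t = 0, then one interval later, and so on). The function names and the 15-minute interval used in the note below are illustrative assumptions, not measurements of any real feed reader.

```python
import math


def polling_lag(publish_time: float, poll_interval: float) -> float:
    """Seconds between publication and the next poll that picks the item up.

    Assumption: polls fire at t = 0, poll_interval, 2 * poll_interval, ...
    """
    next_poll = math.ceil(publish_time / poll_interval) * poll_interval
    return next_poll - publish_time


def average_lag(poll_interval: float, samples: int = 1000) -> float:
    """Average lag for items published evenly across one poll interval."""
    # Sample publish times at the midpoints of equal slices of the interval.
    times = [(i + 0.5) * poll_interval / samples for i in range(samples)]
    return sum(polling_lag(t, poll_interval) for t in times) / samples
```

Under these assumptions, a story posted one minute after a poll waits `polling_lag(60, 900)` seconds for a reader that checks every 15 minutes, and the average wait is half the polling interval. With polling alone, the only way to shrink the lag is to poll more often.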

The way pubsubhubbub fixes this is by putting an RSS Hub in the middle which more efficiently pushes out the feeds to the servers subscribing to them. It is an open protocol and can be applied to any existing RSS or Atom feed, as well as other real time streams. If you think about the Twitter firehose that everyone wants access to, this approach lets anyone create their own firehose for different types of data streams. It is more of a federated approach. You can think of these RSS Hubs as a content delivery network of sorts for RSS feeds, similar in concept to what Akamai does for video streams.
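The hub idea can be sketched as a small in-memory publish/subscribe class. This is only an illustration of the fan-out pattern, not the pubsubhubbub protocol itself. The `Hub` class, its method names, and the callback signature are hypothetical, and a real hub would deliver updates to subscriber servers over the network rather than by calling local functions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber callback receives the feed (topic) URL and the new entry.
Callback = Callable[[str, str], None]


class Hub:
    """Hypothetical in-memory hub: publishers push once, the hub fans out."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register interest in a feed; no polling is needed afterwards."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, entry: str) -> int:
        """Push a new entry to every subscriber of the feed.

        Returns the number of subscribers that were notified.
        """
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, entry)
        return len(callbacks)
```

The design point is that the publisher tells the hub about a new item once, and the hub notifies every subscriber right away, instead of each subscriber repeatedly asking the publisher whether anything has changed.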

Netvibes is creating its own proprietary version of this for its own service, which it is developing independently. It is called the RSS Instant Update Hub. All of those widgets on your Netvibes page today take forever to load because they each have to fetch the underlying feeds of data. The Instant Update Hub will cache and push these feeds automatically so that the widgets load faster and they update continuously without requiring a refresh. Any data stream that is supported by a Netvibes widget today, which goes well beyond RSS, will be pushed through the Instant Update Hub. It will also form the basis for a new stream reader which Netvibes will introduce later this year as an alternative to its current widget grid and magazine-style layouts (see screenshot below).

The WordPress Jabber client uses a different push technology, XMPP, to speed up RSS. The effect is that headlines pop up like instant messages. Jabber is mostly used for IM clients such as Gtalk, but WordPress is using it as a feed reader and micro-blog publisher. The great thing about it is that it is two-way. In the demo, Skelton will show how the Jabber client can be used as an interface to post directly to your blog. Feed reading and blog posting can all be done from the same place in a more real-time fashion. The Jabber client can also be used as a blog commenting system and embedded as a widget directly into a Web page, turning comments into more of a chat room.

What we are seeing is the worlds of publishing and IM colliding. The faster we can close the loop between publish and response, the more we are going to see real-time data streams take on the look and feel of public IM systems. Twitter is asynchronous, but it often feels immediate, with back-and-forth conversations sometimes happening almost as fast as private IM chats. That is just a taste of things to come, as all publishing platforms get up to speed.

(Photo credit: Flickr/joiseyshowwa)

(Screenshot: new_feedreader_listview02)

Here’s the video from PubSubHubbub’s demo at the Real-Time event:


Responses


  • This article demonstrates lack of tech understanding regarding how web data exchanges work. You would expect more from a tech news blog. When you query the Twitter API you basically use an URL to get back JSON or XML data. This is essentially the same thing as asking for an RSS data feed by passing a URL with few params. The difference is in the middle-tier caching layer – how frequently the data is fetched again from a db – and again, both web apps and twitter use caching to balance load on the core db and data freshness.

  • RSS feed might be slow but I don’t think its completely dead yet. I still use it and haven’t had any problems with it. http://bit.ly/1ooIS

  • This is very similar to something we proposed a couple of years ago called GetPingd.com

  • Gretchen Morgan - July 9th, 2009 at 11:34 am PDT

    RSS Feeds are slow but they are not dead yet, in my opinion. I’m still using them with good results. http://bit.ly/1ooIS

  • Pardon me if I’ve read this incorrectly, but isn’t it similar to what Apple are doing with Push notifications?

    They’re aggregating lots of feeds of different types, meaning the client only has to keep a connection/state with 1 server, just like Apple.

    If so then it’s a good idea to a point, but you’re reliant on a single party for all your stuff.

    • Sean, the SPIF (Single Point of Failure) argument sounds good because it passes the logical test, but it fails the reasonable test.

      Do you just use Google Analytics for site stats? Single point of failure for user behavior tracking.

      Do you have redundant servers at different co-locs? If not, you’re reliant on a single party for your entire business.

      The cost and effort of spreading out dependencies often eclipses the cost incurred if a SPIF temporarily fails.

      • Eric, I didn’t mean to say it’s a bad thing, but it’s just something to note. I’m also surprised this is being touted as something new, when as I said it’s just another aggregation tool.

  • Still using RSS Feeds and loving it – http://bit.ly/1ooIS

  • I agree! RSS feeds are going down the flusher! But the big question is, when they do become “extinct,” what will replace them?

  • Interesting to hear about this new upcoming protocol.

  • In other words push-based technologies are inferior to a real-time asynchronous messaging based technologies for real-time applications.

    Today I also noticed that the sky is blue.

  • I think so many people are focused on Twitter that they either do not have time for RSS or they prefer Twitter for its more advanced features. Feedburner, for example, seriously needs an upgrade to much better features. Did Google abandon Feedburner?

  • For people who are using WordPress, I wrote a plugin to notify the world (via PubSubHubbub) when your blog is updated. This makes it possible for people to get up to the second updates when you publish new content. Check it out:

    http://wordpres...s/pubsubhubbub/

  • I am using RSS and am very happy with the technology as it stands today.

    Very rarely am I caught up in my RSS reader (google reader). I have no problem if a story takes up to 15 minutes to reach my reader.

    Clearly, there are some applications (my business depends on this information) where speed is super important.

  • All of this is an interesting example of incrementally reinventing the wheel (badly) because most people are unfamiliar with the history and literature of these types of systems, particularly with respect to scaling. It amuses me to see design choices that are oblivious to known pathologies.

    There is nothing new here, capability-wise, but the software community has always been resistant to learning the lessons from past implementations. This time will be different, doncha know.

  • I hate to see the wheel getting reinvented all the time.

    The RSS specification actually includes support for ‘clouds’ of clients that get notified when feeds are updated. Obviously firewalls are a problem here, but that could be worked around via some kind of middle-man solution.

    Also there are so many notification schemes around, we don’t need more. We need to standardise on them. Growl is a nice one for Mac. Obviously here we’re talking about some way for RSS clients themselves to get updated, but I fear the solutions above are not going to feed into RSS clients at all, but rather invent some new promoting scheme to feed through.

    Lastly, I’m not sure there is a problem here for the majority of people. It’s well known that task switching via interruptions greatly reduces productivity, and it certainly does for me – which is why I check my 100 or so RSS feeds about twice a day. If someone has a full time job working in news and needs instant notice obviously real-time updates could be an issue, but that’s a minority use case, and if it’s so important setting the RSS refresh time to 2 minutes would solve that.

    • Chris, I think the problem is not much on the “individual” consumer side, but more on the “producer” (which gets polled all the time for the same information) and for the “service” consumer which needs to poll all the time for hypothetical information as well.

    • Chris, your concerns are understood. That’s why we have a wiki page that discusses how RSS Cloud and weblogs.com fall short of the full solution:

      http://code.goo...b/wiki/PriorArt

      • What’s stopping you building on top of the RSS cloud? I haven’t studied your spec in any detail, but it looks like your solution is a superset of what RSS clouds already do and hence basically doing what I mentioned in my original post (“middle-man” to get past the firewall), but not building on what’s already there. I’d suggest it’d be much better for standards if you worked to make your hub plug into RSS clouds, and specced just the sub protocol part – then put that forward for inclusion in a new RSS standard.
        That said, you do seem to be talking about Atom instead on your site, so maybe that’s why.

    • The <cloud> tag may be in the RSS spec, but it’s rarely seen in the wild. I think the reason for this is that the complicated part isn’t putting the tag in the RSS, it’s designing the hub that handles all that data. PubSubHubbub does a nice job of defining both parts of it, not just one.

  • Strangely this article does not get specific as to the delivery method for the RSS hubs. PubSubHubbub uses webhooks.

    More info on webhooks:
    http://vimeo.com/4537957
    http://www.slid...ow-presentation

  • Arg… (comments lost?)

    Erick, I am surprised you didn’t mention http://superfeedr.com since this is exactly what we do!

    We even called that “Turning Feeds into Streams” (vs. Speeding up RSS) :
    http://blog.sup...lding-realtime/

    I will hopefully catch you tomorrow and have enough time to pitch you!

  • I’m surprised that they don’t mention SUP (http://code.goo...updateprotocol/), which is supported by FriendFeed. I wrote an article yesterday talking about Push vs. Pull and I agree that Push is the core of the real time web.

    You can read the article here: bit.ly/Hum6L

  • The Open Mobile Alliance (OMA) has completed the specifications for the Dynamic Content Delivery (DCD) enabler (http://www.open...m/dcd_v1-0.aspx), which addresses the specific objectives of “speeding up” RSS and any syndication-based service, through use of configurable (per channel/feed) definition of content delivery using both OMA Push and pull (HTTP) technologies. DCD provides additional optional features such as capability/context awareness, service discovery, subscription, dynamic selection of delivery bearer/methods, and external interfaces supporting attachment of content providers and device applications to the DCD-based services.

  • In other news only 2% of people know WTF RSS is but 98% of people have shared something via email. I think the focus on realtime for general news like TC is a red herring. Not everyone is sitting at their computers hitting refresh and waiting for the latest piece of news. And in all honesty does it really matter that you read about something a few *hours* after it has happened? In 99.99% of cases, no. I can see some uses for financial data, etc. but for general news and the things that RSS was designed for? No way.

    • As Julien has commented a number of times already, this is not about individual users hitting refresh on their newsreaders. This is about FriendFeed needing to query Flickr three million times a day for five thousand updates by forty thousand users.

      RSS is the gateway drug of APIs. Long before most companies set up an API for their data they make it available via RSS. The moment that happens, other services start consuming that data, often for entirely invisible uses.

  • Speeding up RSS is a great idea, and needs to be done. BUT, the biggest problem is not the lack of speed of data delivery, but the ability for humans to consume that data. We need better interfaces to be able to query data sources on demand for structured data that is fresh and relevant, and better interfaces to have dashboards of data auto refreshing. Simply having a constant stream of fresh data pushed at you leads to incredible time wasting and stress.

    BTW, I LOVE Netvibes, and even teach it in my WeekendU class, http://bit.ly/17zlx6 so I am very much looking forward to it being even more responsive.

  • Real-time is WAY overrated! Most people don’t give a damn about delays.

    Quality > Speed

  • RSS is not slow. If TC’s feeds take a few minutes to sync and you don’t like it, then the caching needs to be updated so that when a new post is made it’s expired. As for the client side, use a client that has faster refresh intervals. Saying that twitter or an HTML page is faster than RSS doesn’t make any sense. RSS is a document format. The “speed” is controlled by caching on the server side and refresh frequency on the client side. You control the former and your users can control the latter depending on the client they use.

  • I find this pretty much like someone regifted me a 30 year old fruit cake. Isn’t pinging a service after posting pretty old hat? My blog is set to ping 12 different services like this after every post. Feedburner has a ping service and look how speedy that is.

    Personally I think that I would focus on the ability to break feeds down to specific posts. That way you could remove a large bulk of processing by only having to parse the posts you don’t have. Do this at the source rather than at some middle man and you’d make me happy. I tuned my RSS grabbing engine to stop parsing a feed if it hit an article it already had and it was able to process three times the number of feeds in the same amount of time. Usenet had a great feature, NEWNEWS, and posts would aggregate pretty quickly across thousands of news servers way back in the late 80’s.

    • PubSubHubbub specifically aims to only transfer the differences in feeds between publishers and subscribers (our reference Hub does this “diffing” right now). There’s also nothing stopping publishers from running/being their own hubs (we expect and encourage this). In this way, you can have direct pushing of feeds that’s very efficient. So I think Hubbub solves exactly what you’re asking for.

  • As far as realtime information vs delayed…there is something to be said for researched stories vs rumors or unverified information instantly tweeted or posted to a blog. For example, a little research would show that a document format does not have a speed setting and that maybe the premise of this article was retarded.

  • It’s an interesting article and hits lots of issues. Obviously RSS has a place in being one option and approach to deliver data.

    Companies use the Gnip Integration Platform (http://www.gnip.com) to integrate APIs and RSS based feeds, and besides supporting a traditional polling approach we also provide a PUSH based approach.

    In fact, TechCrunch just posted on our new service that lets any company use Gnip to add PUSH to their existing infrastructure.

    see: http://www.tech...-business-data/

    Cheers,
    Shane
    VP, Products
    Gnip, Inc.

  • If we’re going to focus on RSS, the best mechanism for speeding it up was devised in SUP (Simple Update Protocol) http://code.goo...updateprotocol/ . Dirt simple model for publishing a change-log and only doing heavier polls for the stuff that actually changed. It’s nice to see some SUP pickup via YouTube of late; traction is good.

    The bigger issue, as several people point out is that general “polling” is the problem. The topic’s been beaten to death (http://blog.gni...-gnip-platform/ , http://code.goo...WhyPollingSucks , http://one.vale...e-tease-me.html and many others). PubSubHubBub provides a general framework for resolution and moving to a more hybrid polling/event driven model which is great.

    When Gnip (which I’m a part of) started, we were at a crossroads of whether or not to go champion the pubsubhubbub (distributed) model, or to go put rubber on road and try to bring publishers and consumers to the table to get event driven data distribution into the mix. We went with the latter and have had some great success thus far. I’m stoked that pubsubhubbub has since formalized the distributed model, and Gnip will likely be a node in that network soon (motherhood and apple pie).

    We’re going to start seeing publishers like TC publish more content/posts like these now that “real-time” web stuff is starting to kick into high-gear. The need for less latency in the system is obviously high… we just need to leverage existing standards/platforms/frameworks to speed things up.

    Two selfish plugs:
    - We (Gnip) have a chapter titled “Real Time Data Portability” in the O’Reilly “Beautiful Data” book coming out later this month. http://oreilly..../9780596157111/ If you’re interested in this thread, you’ll appreciate that chapter.
    - There’s a conference coming up called “Real Time Web ’09” that you should check out (Gnip’s a Sponsor) – http://rtw09.com/

  • Just for completeness sake, I’d also like to make sure FriendFeed’s Simple Update Protocol (SUP) gets mentioned as another competing “standard” to speed up feed delivery.

    http://blog.fri...ch-updates.html

  • RSS is really just the encoding here. If there is an event push infrastructure in place then RSS can be in real time.

    All of Spinn3r’s customers benefit from this now as the messages / events that Spinn3r receives (with spam prevention) cause any of our 30M feeds to be updated in real time.

    This includes all of our social media, etc.

    Of course to process this much content you have to have a serious infrastructure setup to handle the throughput…

    There probably is lag within Google Reader because they are not listening to the ping stream (or they haven’t tuned their application to push the data in real time).

  • Interesting mention of publishing and IM colliding, we’re working in this space with our IM and micro-blogging client. http://ximpp.com/

  • I’m sorry, but didn’t we go through all this in the mid to late 90’s? With Channels, push content and proprietary (and open?) clients…

    While it may have been ahead of its time then, I suspect that there have always been various push technologies and services (e.g. stock/exchange market info, news tickers, etc.) that have persisted since they have a specific need for “real-time”.

    But given that both RSS and HTTP have support for how frequently they should be polled/expired, which, honoured, should take care of most situations, internal processing times aside…

    Just my 2c.

  • This is not a surprising opinion for a technologist who’s lost touch with the average user.

    As a gainfully employed software product manager, I check into Twitter/Seesmic a few times a day, if I’m lucky. Later in the evening, I open up Google Reader to selectively read through all my feeds. By the time I pull up my day’s reading, I have 3-400 new items.

    Whether or not TechCrunch’s very latest article is in today’s collection of new items, or tomorrow’s, makes no difference whatsoever to me.

  • Well, rss is far from dying out. There are limits to any real time update service, twitter has word limits.

  • pubsubhub is web developers trying to reinvent messaging over HTTP. COME ON! Messaging was pretty much solved a long time ago. There are excellent extensions to XMPP for doing this.

    • Kalle,

      I’m a huge XMPP fan. I even wrote an open source XMPP server, djabberd.

      That said, XMPP isn’t catching on too quickly, hence Brett and I recognizing that HTTP is the baseline that people today actually use and have access to, thus our making PubSubHubbub use HTTP instead of XMPP.

      Trust me, we’re well aware of XEP-0060. :-)

  • Reality check: some people, for some purposes, need up-to-the-minute information, but most of us don’t. I’ve been using Twitter for 20 months and have a fair amount of discretionary time but I don’t care to swim constantly in the Twitter stream. Blogs are far more valuable to me, and RSS readers do a pretty good job of organizing them. And the news feeds are timely enough.

  • Mix RSS with server sent events(http://dev.w3.org/html5/eventsource/), that way you will instantaneously get the new records.

    Although, I’m not sure how server sent events will affect the performance of today’s web servers, as this will definitely keep an incredible number of connections open on busy servers.

  • I personally hate twitter and love RSS feeds. I like having all my news come to me and not have to go to the news/blog sites to read the articles. With twitter, you still have to click on links, wasting time. With RSS feeds, many sites include the whole article, saving clicks and saving time.

  • Panos Kougiouris - July 10th, 2009 at 9:08 am PDT

    I also write to report that I love RSS. I can see why an editor and a trader cares about every second but most of us have a real job and a lot of work to do and we usually catch up with reading our news once or twice a day. For me RSS is a great aggregator that collects all the news I care about so they are in one place when I have time to go read them.

    I also do not get the real time trend for news. I strongly believe that for most of us, most of the time real time information is not an issue. Now I am not naive, I understand why the media and the celebrities want to make us believe that we need to know the latest news (e.g. the gossip about a celebrity)—this way they can feed us any ad they want any time they want—but I do not think so; almost all the news can wait…

  • We may or may not need faster RSS but we do in fact need to significantly improve all of the interfaces we use to interact and engage with information. One of the reasons we have the costly problem of Information Overload (which costs the U.S. economy $900 billion per annum in lost productivity) can be traced directly back to the tools we use.

    Information Overload Awareness Day (www.InformationOverloadDay.com) is but one of many undertakings in the fight against Information Overload; being aware of how much Information Overload costs you (www.basex.com/io) is yet another.

    Regards/Mit freundlichen Grüßen
    /s/ Jonathan Spira
