August 1, 2006

Bloglines wants to block private feeds from search

Marshall Kirkpatrick


“Everything you blog goes on your permanent record!” How many times have we heard that lately? From employment to family situations, many people have been frustrated to find out that things they intended to write for a personal audience are now discoverable by anyone in the world via search engines. Bloglines proposed a new standard tonight to change that.

You can have private pages in places like Flickr and MySpace, but your page’s RSS feed can still be discovered by search engines. That’s what this new standard aims to change.

The proposed standard will allow XML/RSS/Atom feed publishers to keep their feeds out of search engines and unavailable for discovery by adding an access:restriction tag to the top of their feeds. Bloglines and Ask now support this tag and will keep feeds tagged as restricted out of their search and subscription results.
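As a rough illustration of what honoring the tag could look like on the aggregator side, here is a minimal Python sketch. The namespace URI and the `relationship="deny"` attribute are assumptions recalled from the Bloglines proposal rather than details given in this post, and `is_restricted` and `SAMPLE_FEED` are hypothetical names made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: this namespace URI and the relationship="allow|deny" attribute
# are recalled from the Bloglines proposal; neither is quoted in this post.
ACCESS_NS = "http://www.bloglines.com/about/specs/fac-1.0"

# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:access="http://www.bloglines.com/about/specs/fac-1.0">
  <channel>
    <title>Family news</title>
    <access:restriction relationship="deny" />
    <item><title>Holiday photos</title></item>
  </channel>
</rss>"""


def is_restricted(feed_xml: str) -> bool:
    """Return True if any access:restriction element in the feed says 'deny'."""
    root = ET.fromstring(feed_xml)
    for element in root.iter(f"{{{ACCESS_NS}}}restriction"):
        if element.get("relationship", "").lower() == "deny":
            return True
    return False


if __name__ == "__main__":
    # A cooperating search engine would skip indexing when this is True.
    print(is_restricted(SAMPLE_FEED))
```

A check like this only works if the crawler chooses to run it; the tag is a request, not an access control.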

You’ll be able to pass a private feed URL to a friend you want to subscribe, but your prospective employer will not find it in participating search engines if you have a private account.

The Robots.txt protocol that tells search engines not to index web pages was agreed upon in 1994, but that’s just for HTML web pages. A growing number of search engines are now indexing the more dynamic XML/RSS/Atom feeds first. This new standard is an important part of the whole story around limiting distribution of our private accounts online.

I just talked to Robyn DeuPree, Senior Product Manager at Bloglines, and Paul Querna, Senior Software Engineer, and they told me that the company hopes that both content publishers (Flickr, MySpace) and search engines (Google, Google Blogsearch, Technorati, Icerocket) will get on board and make this feature available to users who want the content they create to be undiscoverable by search engines.

No formal agreements have been made yet with any other company, but it’s hard to know why they wouldn’t accept the idea with enthusiasm. Many feed readers don’t support formally authenticated feeds (where passwords are required) but this should be easy to implement.

Will other feed readers respect this proposed standard? I sure hope they do - this is a great idea whose time has come. Goodness knows I’ll start a whole new MySpace account if I know that its feed will be kept outside of search!



Comments


  1. Brian

    I don’t understand why people want to put private stuff on the Internet and expect it not to be found. I don’t take out ad time on local television stations and talk about my love life because I know someone is watching it somewhere. Why would I post stuff I didn’t want revealed on the Internet? It’s been at least a decade since we learned this lesson I think.

  2. Thad

    This is great! I hope that Flickr, t’rati and others adopt this too. Congrats Bloglines, great move!

  3. Adam Kalsey

    Robots.txt isn’t just for web pages — it’s supposed to tell bots which files on your server you don’t want them to access. There’s no reason why you couldn’t use this if all you wanted to do was tell Bloglines not to access your feed at all.

    But if you want to allow Bloglines to access your feed so people can subscribe to it, but at the same time ask Bloglines and other services not to redisplay it publicly, robots.txt can’t help you there. That’s where this comes in.

    There are some interesting issues about feed contents, public web services, and content redistribution. One is the blog scraping that’s often used to create spam blogs. But there’s others as well, some of which I wrote about a couple of years ago: http://kalsey.com/2005/01/feed.....copyright/

  4. Duncan

    I can’t help but wonder, though, why Ask/Bloglines has just gone ahead and done this without consulting its industry peers first… after all, it’s pretty useless unless they all agree to it.

  5. Ezra

    I have a case where this is not so much about “putting private stuff on the internet and expecting it not to be found” (even though this article makes it sound that way). The website I work on offers custom RSS feeds based on keywords for registered users. That’s great if you have an RSS reader that runs on your desktop, but in Bloglines, it has the unintended consequence that all these custom feeds become discoverable. It’s not so much a privacy problem, it’s more a gunking up the bloglines feed search problem.

  6. Simran

    This is a fantastic idea. I just don’t see how RSS feeds can be completely private. It’s just the search engines who will block out those feeds.

  7. RBA

    #6 Adding a tag for crawlers doesn’t make them private at all. I’m sure we all can see that.

    However, RSS feeds can be username/password protected.

  8. 007

    This is a great addition to the problem of feed bleed but unfortunately it can do nothing to stop the weakest security link (humans). You can set the flag and share your feed with friends but there is nothing to prevent your friends from posting the content somewhere else. Your friends today may not be your friends tomorrow. Please be smart and don’t post something you wouldn’t want others to see. It’s just common sense.

  9. Chris Pirillo

    …MySpace has feeds? :)

  10. Techwarrior

    I agree with Duncan, they should have partnered with some other content sites and engines. But overall, good thinking on Ask’s part for trying to give people what they want, because that’s what it’s all about.

  11. Stewart Butterfield

    I doubt this matters to you Marshall, but while part of this sentence is true in the strictest literal sense, it is totally misleading:

    “You can have private accounts in places like Flickr and MySpace, but the contents of your account’s RSS feed can still be discovered by search engines.”

    Yes, your account’s feed can be discovered by search engines, but it never contains any private data. So, the two halves of the sentence are basically unrelated (cf., “You can believe what you read on TechCrunch, but you might be woefully uninformed.”)

    (Also, there are no “private accounts” on Flickr, just private photos.)

  12. Alex Barnett

    Have posted why I disagree this is a good idea here: http://blogs.msdn.com/alexbarn.....86111.aspx

  13. Danny

    This extension *doesn’t* provide access control in the way authorization would, it’s only optional, so I’d suggest it’s wide open to misinterpretation (as in the first paragraph of your post). Longer version:
    http://dannyayers.com/2006/08/02/in-band-robots

  14. Rogers Cadenhead

    One issue I have with the access element is that it doesn’t work cooperatively with the creativeCommons tag that’s already in wide use.

    http://www.feedforall.com/creativecommons.htm

    If I’m using creativeCommons to allow non-commercial redistribution of a feed, that’s a more fine-grained reuse policy than what this access element permits.

  15. Scott Johnson of FuzzyBlog, Ookles, Formerly of Feedster

    I really see this only causing problems and making people think that their stuff is secure when it really isn’t. Unfortunately this is a social problem that requires education. It’s not really a technical problem imho — people simply need to understand that once something is on the net it WILL be indexed and it WILL be found.

    More in this post and podcast:
    http://fuzzyblog.com/archives/.....s-filling/

  16. LostInBrittany

    I don’t get it. If you blog about something in the net, you know that complete strangers can read it. If you publish a feed, you know that complete strangers will receive it. Not ?

  17. Chris Scott

    I think Ezra in comment #5 hit on the reasoning. If you have an RSS URL that requires HTTP authentication, that feed isn’t accessible to others, including search engines. However, once a user adds that to their Bloglines (or any other feed reader that lets others search your feeds) and it isn’t marked Private then the content is accessible to others. This tag would give the feed publisher more control, so they would not have to rely on the user to mark it Private.

  18. Randy Charles Morin

    IMHO, this solves a concrete problem. As we know, people have complained about republishing RSS.
    http://www.schwimmerlegal.com/....._aske.html
    This gives Bloglines a viable means of responding to these complaints. Problem solved. Well done.

  19. Marshall Kirkpatrick

    Ok, this was a complicated story to write but I did my best to get it right - we’ll see how things unfold in the future around this. I won’t claim to understand every drop of the minutiae, but I think I got the big picture right here. Nice to see so many topic experts here weighing in, even if many disagree with my take on it :)

    A few thoughts:

    1. The fact that this is voluntary doesn’t seem like a big deal, that’s the case with many standards. If Google or Technorati gets on board then marked feeds won’t show up in Google and Technorati searches. That’s good. Malicious search engines and sploggers may not respect this standard but that’s the case with all voluntary standards.

    2. On expecting privacy online: many services offer an option to limit access to your content and this is just one more move in that direction. SixApart, for example, often says that the future of blogging is in blogs intended only for a small group of readers like friends and family - not for global distribution.

    3. On the efficacy/need for such a technical step - I’d say the need is there and the efficacy will remain to be seen. Providing a unique URL for personal feeds leaves those feeds open to discovery and redistribution. This is a way to stop that.

    4. Chris - MySpace does offer feeds - they look like this http://blog.myspace.com/blog/r.....D=591XXXXX

    Ok! We’ll see how this plays out in implementation.

  20. Michael Martine

    I don’t understand why people want to put private stuff on the Internet and expect it not to be found.

    What do you care? If people want it then they want it. Give it to them. We don’t have to agree with or even understand their reasons.

  21. kogent

    LiveJournal has a level of security that is cookie-based and applied per post, and users can set a specific security level as the default. Users can set permissions on who can read which posts (i.e., friends only, specific groups of friends, etc.). It requires people to log in to LiveJournal’s site, then they can view their friends’ protected blogs. I’m pretty sure this authentication applies to feeds as well.

  22. lost in space

    well done Ask/Bloglines!

    This is the second cool/useful RSS move they made in the past two weeks.

    Last week they launched RSS Smart Answers. Put the content where people will see it.

    They even use TechCrunch as an example on their blog.
    http://blog.ask.com/2006/07/introducing_rss.html

    I also noticed a day or so ago that in some cases the blogs author or publisher will trigger. For example:
    http://www.ask.com/web?q=micha.....&l=dir

  23. Gordon Weakliem

    Randy: this does not address Mr. Schwimmer’s complaint. He objected to BL’s plan (stated plan of 1/2005, not necessarily current) to display context sensitive ads next to his feed. I don’t recall him objecting to his content showing up in search results.
    Bloglines says in their announcement “In Bloglines, we’ll use this to prevent the display of the feed information or posts in search results or any other public venue.” I take this to mean that Bloglines users will be able to subscribe to a feed, but BL will not admit to the existence of that feed, either in search results for feeds, for content, or in a user’s subscription list - possibly other places. I wonder if they will disallow the “clip/blog this” feature for posts as well.
    My issue is that the spec isn’t especially clear on what a publisher can expect aggregators to do about this element. BL gives a concrete use case which would help clarify the spec, if it were included.
    I would also like to note that some server-based aggregators already exclude a feed from search results if the feed is password protected, so for authenticated feeds, the allow tag would be superfluous.

  24. Kevin Marks

    As Stewart said, this makes little sense - if private content is showing up in unauthenticated feeds, adding a ‘don’t index me’ flag is not solving the problem.
    MySpace (like pretty much every other blogging platform) already allows you to set up a space as private, and thus opaque to search indexers.
    Now, as Bloglines is a feed-reader, it needs to cache your authentication information to show you the contents of such private feeds. If Bloglines are feeding the contents into their indexing engine, this is where the security breach occurs. Asking publishers to change their feeds to stop them doing this seems a bit odd.

  25. Lea

    This is what comes of using these public programs - you don’t have full control of your data.
    If you are hosting your feed yourself, you don’t have a problem:
    - drop the feed in a separate directory, eg. /nocrawl/
    - add the exclusion line to your robots.txt file for that directory
    - tell your friends, family and lovers where the feed is found (possibly not all those people should read the same feed…. ;))
    and you’re done.
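    The steps above can be sketched with Python's standard `urllib.robotparser`, which applies the same rules a well-behaved crawler would. The `/nocrawl/` directory comes from the comment; the robots.txt text, the `ExampleBot` user agent, and the `may_fetch` helper are hypothetical and only illustrate the idea. As with any robots.txt rule, this only affects crawlers that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a self-hosted site that keeps its
# private feed under /nocrawl/, as described in the steps above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /nocrawl/
"""


def may_fetch(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a crawler honoring ROBOTS_TXT may request this path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(may_fetch("/nocrawl/feed.xml"))  # feed inside the excluded directory
    print(may_fetch("/blog/feed.xml"))     # feed outside it
```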

  26. noindex

    Why not use
    ?

  27. Marshall Kirkpatrick

    Fwiw, Bloglines just posted a follow up on the original post about this proposed standard, with reference to this discussion. http://bloglines.com/about/news#116

  28. Panatotis Vryonis

    This is stupid. It brings to mind the “net neutrality” issue. If it’s public, it’s public. Users interested in keeping their content private should do so by protecting it (or their feed) with access controls.

    robots.txt is not supposed to control content distribution but to manage machine interaction: help bots not to crawl “useless” pages (e.g., where you just get a message that you have to log in first) or enter infinite loops. The fact that some search engines interpret robots.txt as indexing permissions is just their choice. For example (let’s say) I could buy Google’s index. The fact that robots.txt disallows my bot from crawling site X does not mean I am not allowed to use Google’s data to include site X pages in my index.
