Facebook’s Friends Data Has Already Left the Barn
Erick Schonfeld
How much are your friends worth? That is the question behind the big debate going on around social networks and data portability. In the last ten days, Facebook, Google, and MySpace have all announced ways to let people access their data (including friends lists) from other sites, except that what they are really trying to do is erect new walled gardens by positioning themselves as the primary repository of that personal and social data. This is valuable data and none of the big players want to cede any more of it than is necessary, which is why Facebook banned Google from tapping into its members’ social data.
But here’s a little secret. All of this data is already leaking out in ways that Facebook and other social networks can hardly control. Startups are finding ways around their official APIs to get the data consumers want into their own systems. For instance, Zude, a personalized Web page service, recently launched a feature called SocialMix that lets people import friends lists, photos, profile information, status updates, comments, and other data from Facebook, MySpace, Bebo, Orkut, and hi5. (See the screenshot below, which shows my Facebook friends on Zude.) “What we are doing is taking the information and normalizing it and making it available in any manner you want,” claims Zude CTO Steve Repetti. He was tired of waiting around for true data portability to arrive, so he figured out a hack to offer it on his own (and it doesn’t involve screen scraping).
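The post doesn’t say how SocialMix works beyond “normalizing” the data, so what follows is only a generic illustration of that idea, not Zude’s method. It’s a minimal Python sketch that assumes two made-up networks whose friend records use different field names (`uid`/`name` versus `friendId`/`first`/`last`); those layouts, and the `Friend`, `normalize` and `merge_friend_lists` names, are hypothetical and do not reflect any real Facebook or MySpace API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Friend:
    """Common, source-independent shape for one friend entry."""
    network: str
    user_id: str
    display_name: str
    status: str = ""


# Hypothetical per-network record layouts: these key names are invented
# for illustration and do NOT mirror any real social network's API.
def _from_network_a(raw: Dict[str, Any]) -> Friend:
    return Friend("network_a", str(raw["uid"]), raw["name"], raw.get("status", ""))


def _from_network_b(raw: Dict[str, Any]) -> Friend:
    full_name = f'{raw["first"]} {raw["last"]}'.strip()
    return Friend("network_b", str(raw["friendId"]), full_name, raw.get("mood", ""))


# One adapter per source; adding a network means adding one entry here.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Friend]] = {
    "network_a": _from_network_a,
    "network_b": _from_network_b,
}


def normalize(source: str, records: List[Dict[str, Any]]) -> List[Friend]:
    """Convert raw records from one source into the common Friend shape."""
    adapter = ADAPTERS[source]
    return [adapter(record) for record in records]


def merge_friend_lists(feeds: Dict[str, List[Dict[str, Any]]]) -> List[Friend]:
    """Normalize every source and return one list sorted by display name."""
    merged: List[Friend] = []
    for source, records in feeds.items():
        merged.extend(normalize(source, records))
    return sorted(merged, key=lambda friend: friend.display_name.lower())
```

With `feeds = {"network_a": [{"uid": 42, "name": "Alice"}], "network_b": [{"friendId": "b-7", "first": "Bob", "last": "Smith"}]}`, `merge_friend_lists(feeds)` returns Alice’s and Bob Smith’s entries in one list with the same fields. Keeping one small adapter per source means the merge logic never needs to know which network a record came from.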
Taking a different approach, Minggl has found a way to access your social data through a browser plug-in. And Media6° is placing cookies through the ads themselves on Facebook to collect social data for advertisers. If you click on an ad with one of its cookies, then the same ad will be shown to all of your friends, who are supposedly two to ten times more likely to click on the ad than other people. Media6° should also be able to target Facebook members as they wander across the Web (as long as a cookie has been placed in their browsers and they come across an ad with the Media6° JavaScript code embedded in it).
I’ve come across other startups who claim to be able to pull profile and friend data from Facebook. Facebook can go after them and shut them down, but it is rightly more concerned about Google gaining free and unfettered access to that data. Google is the bigger competitor and the bigger threat. But in the meantime, all of these little startups are finding ways to get at the same social data being so ferociously guarded by Facebook. In fact, they already have it, and Facebook is going to have a hell of a time trying to put it back in the barn.
(Photo by Larry Wilder).

haha … I like the title of this post
ultimately, facebook is a feature in gmail. so google’s move and facebook’s defense are both completely predictable.
Which methods for letting users import their data are working best for folks here?
I second Dave's question.
@Dave: the option that doesn’t require much thinking. One click with as many different applications that play nice. Open standards that are consistent with everyone + a simple user experience.
@Erick: good post. I wonder if these companies actually realise that storing a person’s data is not the best situation. Value comes through the application of that data. Dealing with storage costs distracts a company from innovating with useful ways of connecting data to generate information (which is what the consumer wants), because it is too busy dealing with the infrastructure to host data that will become commoditised over time. Not least, there is the social cost of hosting a person’s data, privacy being the one we are all aware of.
I use Facebook because it’s a rocking service. If they arrogantly think that because I’ve uploaded my data there it’s now their own, they seem to forget I am willing to remove my data line by line, re-enter it elsewhere, and bag the hell out of them if they make it any more difficult, just like I promoted the hell out of them when Facebook first became popular. Facebook grew quickly because people evangelised, but what goes up can come down: they can die quickly as well if they piss us off.
I still haven’t received my facebook friend connect beta. I applied some time ago. Wonder how long the waiting list is.
If you can reach it in a browser, you can scrape it. I have 100 million profiles from various sites on one of my servers. I was building a social search engine called Peeplr. It was put on pause when I found out that many of the social data portability and search tools, such as Spock or Wink, did not make any money.
I am not here doing this for my health.
If you want the 100M profiles I got metadata for and the IP from Peeplr, call me. I would be interested in selling as I no longer have time for that.
I should mention that the profiles have all had their metadata extracted, and you can use the robot to go ahead and extract a billion more if you like.
/sbin/service/peeplrd start cluster ip1, ip2, ip3 -p password
It will leave Mozilla as the user agent and they will suspect nothing. I was able to parse metadata for 100M profiles in 2 months, with nothing firewalled, from a single IP address.
Control is also a value promoted by dataportability.org –
“Connect. Control. Share. Remix.” as their slogan has it.
Hence, there’s no reason to denounce Facebook as unfriendly to data portability when they’ve hurried to make progress toward “Facebook Control”, following “Facebook Connect” and “Google Friend Connect”.
As such, Facebook and Google have each achieved 1/4 of the data portability values, though Facebook is (undeniably) leading the way.
We just have to be a little more patient and wait for:
a) Google Friend Control
b) Facebook Share
c) Google Friend Share
d) Facebook Remix
e) Google Friend Remix
That said, nothing prevents Google from moving faster (if they can, that is).
Zude’s SocialMix application is really based on fundamental Data Portability concepts. For us, it was a logical extension of our already extensive support of DP (OpenSocial, XML, JSON, hCard/vCard, etc.: more than 60 combinations of data/formats/feeds).
SocialMix does not store data (though it includes a volatile caching component to enhance performance); rather, it normalizes information, regardless of source or format, so that applications can use the information in the optimal form.
The application-side of SocialMix uses this information to make it easy for users to drag-and-drop the normalized data, or even its underlying source, in ways that become immediately useful.
Nevertheless, Zude’s SocialMix, as well as the Data Portability initiative itself, is an ongoing work in progress – technological, social, and political…
More at: http://www.zude.com/socialmix
Oops, sorry, I didn’t mean to bold the whole article.
I must have missed the ending slash when typing the /b turn-off tag…
We need comment editability on TechCrunch.
That said, apologies for the bold…
Erick, why are you sharing your full birth date with us? I wouldn’t.
In the realm of security software, it is now standard to recognize that the most secure systems are those that operate correctly even when all of the source code for the system is open. Security through obscurity is, at best, only temporarily useful. No matter how well you obscure your code and algorithms, eventually someone will figure them out.
People building “social sites,” and those investing in them, should recognize that the most sustainable sites are those that have business models that work even if all their “data” becomes visible (or if they don’t rely on “owning” any data at all!). Walled gardens are simply not sustainable. No matter what restrictions are put on data portability, eventually someone will find a way to make it happen.
bob wyman
flickr.com/photos/8606487@N03/sets/72157605112575732/
It does profile metadata extraction on 30 of the top social networks on the web including facebook.
Here is a close up of what it looks like from the crawler daemon:
http://www.flickr.com/photos/8.....112575732/
another
flickr.com/photos/8606487@N03/2501331858/sizes/o/in/set-72157605112575732/
All the IP, the logo and the Peeplr trademark (pending at the USPTO) are yours for 100k.
This project does not cover the scope of our business plan and so I want to sell it.
This will let you do what the guy that wrote this post is trying to explain.
Parsing 30 large social networks is hard. All the work is done for you, and I will throw in a dual-core Pentium 1.5TB server with a Gigabit adapter to run it. You can call me at the toll-free number on our website.
If anybody is interested but fears that this is a ruse, I can literally put gigs of data online for you to search through with the GUI front end. I can also get you the source code to the project so you can verify that the IP, the trademark, and the logos are sovereign and contain no third-party code. It’s “clean”.
I have no problem popping the hood for any prospective buyer.
We have no huge budget to run up something like Wink or Spock or even a data portability layer based on what the blog article talks about.
We have to be profitable now, not in 3 years. We’re going after Monster. So I hope that answers any questions.
Hey Erick, do you think TechCrunch can maybe slow down and wait for the news to catch up before you (TC in general, not you specifically) post another three essays about data portability in as many days? I love the news here, but I don’t think “people are getting data off of facebook” is worth another few pages of pontification.
They have released an API, so developers should be able to do what they want. Facebook could only stop that by limiting its API. But what’s the point? Release it, then later realize “Oh, it’s our data… blah blah blah” and pull back?
@11 Steve Repetti — until Facebook blocks you, ha ha..
[let's hope they won't, but if they do, it will also raise your profile and get you onto the faster exit route... so, perhaps a necessary evil :P]
@18: it’s not only about the API; it’s about what you can do with the data extracted through the API.
Chris: stop spamming the TechCrunch comments. Do you happen to be the same lame ass Chris “beerco” ? Also, why does it take you 5+ posts to get your point across?
How valuable is any of the data on Facebook anyway? Friends lists have gone from close friends to acquaintances people often hardly know. As these lists are now social contacts at best, the next iteration will be less about spreading garbage friend lists around and more about tiering them to share relevant information with the right sets of folks. Garbage in= garbage out.
Rather than passing around increasingly bloated feeds of connections (and their updates) in need of spring cleaning, the data being shared needs to stop filling the barn. The interesting tools will be those that help us cut through the clutter.
I guess we will soon arrive at a new period of the Internet when all these questions disappear, because data will be portable anyway.
But for now, let’s make money: we block like this, we block like that (for as long as we can).
I imagine a different future:
independent sites, with portable data and portable features.
Hopefully in the coming years, creativity will be the value in “personal profiles” or sites.
(By the way, for a long time now nobody has been stupid enough to give real information in online profiles, except for one of their e-mail addresses.)
Erick:
“I’ve come across other startups who claim to be able to pull profile and friend data from Facebook.” - your post today -
“I came across developers recently who had the info they mined with data they pulled down from FB and other social nets - and trying to peddle their wares calling it analytics” - comment no. 53 by G - May 15, 2008
http://www.techcrunch.com/2008...../#comments
Glad you are reading the comments - just one question - what took you guys so long - been going on for months now - and I even commented on this months ago - and no one bothered.
So let me ask again - as I did on May 15th - “Michael - how about you call out Zuckerberg on this one - and see if you get a clear answer on data retention controls” - or even better - How about a discussion on the Gillmor Gang to talk specifically about “data retention” - would be a great podcast.
I’ve created a petition to unban Google Friend Connect from Facebook. Oh, and I’ve created it on Facebook. Please join it.
Technically, you’re allowed to grab the friend data as long as you play within the API rules (caching it for no more than 24 hours, making only user-initiated API calls, etc.).
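As a rough illustration of the two rules mentioned in that comment, here is a minimal Python sketch of a friend-list cache that fetches only when asked and never serves or keeps an entry older than 24 hours. The 24-hour window is taken from the comment, not from Facebook’s actual terms, and `FriendCache` and its `fetch` callback (standing in for a user-initiated API call) are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

# Assumed retention window, based on the commenter's summary of the rules.
MAX_AGE_SECONDS = 24 * 60 * 60


class FriendCache:
    """Per-user friend-list cache that never serves entries older than 24 hours."""

    def __init__(self, fetch: Callable[[str], List[str]],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch  # hypothetical: performs a user-initiated API call
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, user_id: str) -> List[str]:
        """Return a fresh cached list, or fetch and cache a new one."""
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < MAX_AGE_SECONDS:
            return entry[1]
        friends = self._fetch(user_id)
        self._entries[user_id] = (now, friends)
        return friends

    def purge_expired(self) -> int:
        """Delete entries at or past the window; return how many were removed."""
        now = self._clock()
        stale = [uid for uid, (stored_at, _) in self._entries.items()
                 if now - stored_at >= MAX_AGE_SECONDS]
        for uid in stale:
            del self._entries[uid]
        return len(stale)
```

Passing a fake `clock` makes the expiry testable without waiting a day; in a real service, `purge_expired()` would run on a schedule so stale data is actually deleted rather than merely ignored.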
The real question is, who will pull it all together in an integrated way that makes the aggregation more functional for the end user than the disparate sum of parts.
I think Facebook still has a cash-flow problem, and allowing other companies to use the API for free can hurt them more than it helps. Exposing private data to external, unknown companies will hurt the integrity and reputation of Facebook. If the point of allowing external API access is to make Facebook fun and interesting, then that should be something the Facebook team creates internally.
I’m not sure why one should even advocate data portability. Both Google and Facebook make money off advertising, and people choose them despite knowing that Facebook has privacy issues and is full of junk and spammers. Google, on the other hand, even searches your desktop, so they are no different. Consumers have no voice, and shouldn’t have one, when something is free. Welcome to capitalism!
Facebook will lose a lot of people with data portability if one of these networks become popular.
OK, once data portability is there, what happens?
I haven’t seen mobile operators go bankrupt after regulators forced number portability.
The best way to get people coming back is by offering a unique and good service. Very simple…
Did you try Copenda?
You can easily import your social profile into our social people search engine.
Check it here:
http://www.copenda.com
One way that a lot of companies (like FriendFeed) are importing your friend data is by just having an application that you add which exports your data to their servers. I wonder if Facebook would block a Google Friend Connect application.
You can also try http://www.leapways.com.
How exactly does Media6 work its magic without an app install? You’re saying JavaScript in the ad itself can scrape the social graph while you’re surfing your buddies’ profiles?
If that’s the case, it’s a very very brilliant hack…
I tried to give Google and Yahoo a solution to these issues over 3 years ago, but the time wasn’t right. Gutted now. More info at http://webdesignfromscratch.co.....s-data.cfm
As CEO of Media6Degrees, I thought I would step in and make a significant point of clarification: Media6Degrees does not, and has no plans to, scrape ANY content off ANY pages. Our R&D is 100% focused on helping brand marketers message more confidently across the entire spectrum of user-generated content. Our principal data source is the marketer’s own ad-serving logs.
I am hoping someone comes along and offers to host profile information and friend data independently and somewhere USERS can choose, with a login to the destination social networking sites then borrowing that data from us to provide a user experience.
Who will step up to the bar first…?
Ian Hendry
WeCanDo.BIZ
http://www.wecando.biz
Anyone else find it amusing that Erick lists himself in that screenshot as “married” as well as “looking for women”?
ecirkit.com has had data portability since January 1st; they call it “the social penetrator”. I saw it featured on G4’s Attack of the Show a few months back. Facebook, Flickr, eBay, YouTube: you can grab it all, and it works really well.
As for the Google and Facebook wars, it’s really stupid, because it’s the Internet, for Christ’s sake. Share, people, SHARE!!!!
Before you jump to conclusions, please hear me out
#######################################
OK, I just received a C&D from Facebook.
I’d better explain:
“It does profile metadata extraction on 30 of the top social networks on the web including facebook.”
We created 25-30 modules to extract PUBLIC data from the web: the same data Google has when it creates its indexes, the same data any search engine has, except that it’s organized for social searches.
Yes, we created a module to grab Facebook data, but NO, we never used it to get any quantity of profile data from the Facebook website. It was created but disabled.
We do not violate any social network site policy, such as Facebook’s or Orkut’s.
People who read this thread jumped to conclusions, and now I have an email box full of false and hurtful allegations, some of them from Orrick, Herrington & Sutcliffe and other law firms, claiming without basis or grounds that we violated user agreements.
We wish for this to stop immediately. If this goes further, I will take action on behalf of Orjii LLC to recover any losses that are incurred.
Chris - Orjii LLC