
The following guest post was written by Dan Birdwhistell, founder of people directory Bigsight (reviewed here) and creator of Hacking Facebook, a website that teaches developers how to pull user data out of Facebook.
There’s one thing about Facebook that most people still seem to get wrong: they think it’s a walled garden. Quite the contrary: the Platform allows for full data portability and has since its inception. It actually isn’t a walled garden at all.
The problem is that this knowledge is buried deep within the FB documentation, a place few developers have wandered. For whatever strange reason, legal documents are like amusement parks for me, so I’m now fairly well acquainted with the ins and outs of porting data (and users) out of FB. So that’s what this whole post is about: To show you how it’s done.
Background
Once we got our heads around the Platform back in October 2007, we hacked together FriendCSV as a demonstration: an app that allows you to export your full social graph (and all friend data) to your hard drive. This is all done in accordance with FB policies. After people got comfortable with this, we took it a step further by allowing users to instantly port their own personal data into bigsight to create a new profile and account. Test out our importer here.
Why Facebook and the Platform are important
We believe FB is architecting the next version of the web. This is a bold claim, no doubt, but here’s the thinking:
- FB has the users: 80mm and growing, with huge international membership and no age bias.
- Users enter their real information: Users enter their real name and affiliations. This moves the web away from (and makes users comfortable with abandoning) aliases.
- Users express themselves by connecting to entities that are “outside”: Users articulate their identity by claiming lasting elements like cities, companies, schools, and groups (or pages) that exist outside of FB.
- These entities are increasingly moving “in”: These groups are connecting to the same users and establishing broad footprints through ads, Pages, and Applications.
- The Platform and FB Connect are building the “between”: All the nice-happy-fun going on between Users and entities inside FB will start to extend back out into the web as developers learn how to build data/interaction bridges with the Platform and Connect.
The result is a web based on users and not content, with an individual’s FB ID ultimately serving as his chief tour guide, passport, and keymaster (but not like Vinz Clortho) around the rest of the web. So if I am right, FB will become king – not as a social network, but as the architect, owner, and manager of the next version of the web. So the point: you need to know how FB works and how you can leverage the Platform to grow your site or business. So here we go…
Understanding how FB Data is structured
Before you go messing around in the pool house, you’ll need to get your head around how everything is structured. It’s best to focus first on non-user data, since these are the permanent structures users “claim”. Each of these elements has a unique ID, and entry fields typically auto-complete to ensure data alignment.
- Location: There are ~540 regional networks and ~24,000 city/state/country listings. Cities in the US are expressed as “City, State abv.” while cities in other countries are expressed as “City, Country Name”. Regional networks outside of the US, Canada, and the UK are typically expressed just as a country. Users claim locations through networks, current city, hometown, work cities, groups, pages, events, and photo albums.
- High Schools: There are ~23,000 high schools worldwide in FB. Users can enter up to two high schools, with a graduation year for one of them. High school name and year are shown on the profile.
- Colleges and Universities: FB recognizes ~5,000 institutions. To streamline search during data entry, FB allows for multiple aliases for the same school. For instance, a user can search/find/select “UCLA” or “University of California, Los Angeles”. Whichever one is selected displays on the profile, though both are linked to the same ID. This makes data integration a bit dicey, but there’s a fix we’ll get to later. Users can enter up to five schools and can ascribe graduation year, type, concentration, and degree type (if it is a grad school).
- Companies: You’ll find ~25,000 different companies. FB allows for multiple aliases during search, but it filters them out to the same display name across all profiles. We’re clueless as to why they did this for companies but not schools. Users can enter up to 15 jobs and can ascribe position, description, location, and duration.
So exactly how much data can you export?
Stated simply, you can touch basically everything but a user’s contact information. So here’s the list, including how the data is structured in its output. We’ll address friend lists and data in a moment.
| Data Element | Export Format |
|---|---|
| UID | Permanent |
| First name | Free form (ff) |
| Last name | ff |
| About me | ff |
| Activities | ff |
| Birthday | Day, month, year (1900-2008) |
| Books | ff |
| Colleges | Up to five: name, type, degree, concentration, grad year |
| Hometown | “City, State” or “City, Country” if outside the US |
| High school | Up to two: name, grad year |
| Interests | ff |
| “interest sex” | Male or female |
| “interest meeting” | Friendship, dating, relationship, or networking |
| Location | “City, State” or “City, Country” if outside the US |
| Movies | ff |
| Music | ff |
| Number of notes | Count |
| Number of wall posts | Count |
| Networks | Up to four: region, high school, college, work |
| Photo albums | All pictures + tags, titles, etc. |
| Pictures | Misc. pictures + tags, etc. |
| Political affiliation | Party name |
| Profile pictures | 50×50, 50×150, 100×300, or 200×600 |
| Profile update time | Date, time |
| Quotes | ff |
| Relationship status | Single, in a relationship, engaged, married, it’s complicated, open relationship |
| Sex | Male or female |
| ID of significant other | UID |
| Status message | ff + date/time |
| Timezone | Offset from GMT in hours: “-6” for Nashville, for instance |
| TV shows | ff |
| Work history | Up to 15 companies: name, position, description, location, duration |
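Hometown and Location come out as a single string, so an importer has to split them before matching against its own city data. Below is a minimal sketch in Ruby. It assumes the formats described above: US cities appear as “City, State abv.”, other cities as “City, Country”, and some non-US regional networks as just a country. `parse_location` and the keys it returns are hypothetical names.

```ruby
require "set"

# Two-letter US state abbreviations plus DC, following the "City, State abv."
# convention FB uses for US locations.
US_STATES = %w[
  AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO
  MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY
].to_set.freeze

# Split an exported Hometown/Location value into city, state, and country.
# The last comma is treated as the separator between city and region.
def parse_location(value)
  city, sep, region = value.rpartition(",")
  # No comma: regional networks outside the US are often just a country name.
  return { city: nil, state: nil, country: value.strip } if sep.empty?

  city = city.strip
  region = region.strip
  if US_STATES.include?(region)
    { city: city, state: region, country: "United States" }
  else
    { city: city, state: nil, country: region }
  end
end

p parse_location("Nashville, TN")
p parse_location("London, United Kingdom")
```

Treating anything that is not a known state abbreviation as a country name keeps the sketch simple. A real importer would still want to check the result against its own city table.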
In addition to these core profile elements, you can also make calls for and then export huge amounts of data through:
- Events: Title, location, date (duration), picture, type, members, etc.
- Pages: Name, type, location, hours, members, etc.
- Groups: Name, type, description, location, members, etc.
Now about friend lists: As you’ll see when you use FriendCSV, you can not only access all of the above for a single user, but you can also access the same data from their friends. Pretty crazy, right? This means that by touching one user you can instantly touch thousands more. But hold on now…time to talk Privacy.
Understanding FB Privacy, Terms of Service, and Platform Documentation
There are five key documents that come into play re: data portability on FB. Taken alone, each is hard enough to understand – taken together, it’s downright labyrinthine. As a developer, though, there are really only four things you need to know:
- The Onus of Privacy is on the User: While FB puts restrictions on how you can access and store information, they ultimately put the onus on the user when he interacts with an application. This means that users interact with apps at their own risk. From the Privacy Policy:
“If you, your friends, or members of your network use any third-party applications developed using the Facebook Platform, those Platform Applications may access and share certain information about you with others in accordance with your privacy settings…
…in addition, third party developers…may also have access to your personal information (excluding your contact information) if you permit Platform Applications to access your data.”
- The 24-hour Clause: Most of you have heard of this. It basically states that you can suck out any data, but you can’t store it for more than 24 hours; however, there are two key things that people overlook: 1) There are some elements that can be stored indefinitely and 2) if there is a disclaimer on the application, the developer can do almost anything with the data.
- The “Storable Indefinitely” Properties: FB allows us to store User ID, Network ID, Event ID, Group ID, and Photo ID.
- The Gold in the Mountain, or “Full Disclosure Opt-Ins”: As a clear extension of FB putting the onus on the user, they have included a clause in their documentation that says developers can do almost anything with the data they touch if they have full disclosure. Taken from 2.A.6 of the TOS:
“You may retain copies of Exportable Facebook Properties for such period of time (if any) as the Applicable Facebook User for such Exportable Facebook Properties may approve, if (and only if) such Applicable Facebook user expressly approves your doing so pursuant to an affirmative “opt-in” after receiving a prominent disclosure of a) the uses you intend to make of such Exportable Facebook Properties, b) the duration for which you will retain copies of such Exportable Facebook Properties, and c) any terms and conditions governing your use of such Exportable Facebook Properties (a “Full Disclosure Opt-In”).”
This is a bit wordy, so we’ll translate: If you outline which data you’ll use, how you’ll use it, for how long, what other terms the User might be subject to, and get User consent, then you can keep and use profile information for as long as you want.
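Taken together, these rules amount to a retention check an importer could run over its cache. Below is a minimal Ruby sketch of that check as this post reads the policies; the comments below dispute that reading. The property symbols, the `OptIn` struct, and `retainable?` are hypothetical and are not part of any Facebook library.

```ruby
# Properties the post lists as storable indefinitely.
STORABLE_FOREVER = %i[user_id network_id event_id group_id photo_id].freeze
# The 24-hour caching limit, in seconds.
CACHE_LIMIT = 24 * 60 * 60

# Hypothetical record of a Full Disclosure Opt-In: the properties the user
# approved and the time until which they agreed you may keep them.
OptIn = Struct.new(:properties, :expires_at)

# Returns true if a property fetched at `fetched_at` may still be held at `now`.
def retainable?(property, fetched_at, now, opt_in = nil)
  return true if STORABLE_FOREVER.include?(property)
  return now <= opt_in.expires_at if opt_in && opt_in.properties.include?(property)

  now - fetched_at <= CACHE_LIMIT
end

fetched = Time.utc(2008, 6, 1)
two_days_later = fetched + 2 * 24 * 60 * 60
p retainable?(:user_id, fetched, two_days_later)   # true
p retainable?(:interests, fetched, two_days_later) # false
consent = OptIn.new([:interests], fetched + 365 * 24 * 60 * 60)
p retainable?(:interests, fetched, two_days_later, consent) # true
```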
So the main lesson here is that you shouldn’t be afraid of the various policies and documents, because they are written to help you rather than restrict you. But again, a note about friends’ data: FB has been incredibly aggressive in policing how developers access and use these data, and rightfully so. Last week they shut down the Top Friends app for allowing too much data access, and earlier this year they canned Google Friend Connect because it didn’t operate in accordance with their policies.
I’ll say again that they were right to do this and when thinking through how to port users, you should be mindful not just that FB might shut you down, but that a secondary friend who doesn’t opt-in to your site probably should be left alone. More than likely, he doesn’t want what you’re selling. Of course, there are ways around this if you want to brute force it, but we’ll just keep that to ourselves. So let’s keep going…
Setting up the Application(s) and managing the exports
Your importer can be inside FB as part of an application or it can exist as a standalone. We do it both ways. With FriendCSV, users install the app and we then direct them to their new profile as an add-on; meanwhile, out in the ether, we have a dedicated portal at http://fb.bigsight.org that directs users to FB for initial authentication, but then kicks them right back to our web app. If you already own a great app with lots of traffic, start there. If not, it’s probably best to set up your porter out on the web. Exporting the key data for a single user doesn’t take too long, so you can typically create a new page/account for them instantly. However, if you plan on exporting an element like friends lists (careful, hoss) or photos, you’ll need to batch up FQL requests when possible and also be open to allowing some processes to happen in the background.
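Batching mostly means asking for many users in a single FQL query rather than one query per friend. The rough Ruby sketch below only builds the query strings. It assumes FQL’s `user` table and its `uid IN (...)` filter. `batched_fql` and the batch size are hypothetical, and the sketch leaves out both sending the queries (through Facebooker or otherwise) and running them in a background job.

```ruby
# Arbitrary batch size chosen for illustration, not a documented FB limit.
BATCH_SIZE = 100

# Build one FQL query per batch of UIDs instead of one query per friend.
def batched_fql(uids, fields, batch_size = BATCH_SIZE)
  uids.each_slice(batch_size).map do |slice|
    # Integer() raises on anything non-numeric, keeping the IN list safe.
    ids = slice.map { |uid| Integer(uid) }.join(", ")
    "SELECT #{fields.join(', ')} FROM user WHERE uid IN (#{ids})"
  end
end

queries = batched_fql((1..250).to_a, %w[uid name education_history])
puts queries.size # 3
```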
The FB API is “REST-like,” which means it can be used by anything that handles standard HTTP requests. Libraries exist for PHP, Java, Ruby, and other languages that make the API easier to use. The following example code is for Ruby on Rails and the Facebooker library, as that’s what we use at bigsight. No matter which language you choose, writing FB applications to extract data is surprisingly easy. One line of code will tell your application to authenticate with FB. Simply add “ensure_authenticated_to_facebook” to your Rails controller and it will send your user to the FB login page if needed, and return them to your application. From that point on you have full access to the FB user and all exportable data. Here’s one example of how to extract educational history:
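def gather_schools
  # Facebooker exposes the logged-in Facebook user through facebook_session
  @fb_user ||= facebook_session.user
  # Create a local copy of the Facebook user
  @user = User.create(:name => @fb_user.name, :fb_uid => @fb_user.uid)
  # Load the user's schools from their education history
  @fb_user.education_history.each do |fb_school|
    School.create(:name => fb_school.name, :user_id => @user.id)
  end
end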
For a full view of the FQL queries, check out this page in the documentation.
Integrating FB Data into an Existing Third Party Site
OK, so now that you know what the data look like and how to access them, you need to think through a few things to figure out how to integrate everything with your site or widget. These are the questions to ask:
- What are the basic data elements you need for a user to interact with your site? Start by isolating the variables you need to a) successfully port a user to your site and b) give them enough active features that they instantly get a taste for your offering. Design your integration so that it is as simple (though complete) as possible. You might also consider including an “instant remove” link so that a user can quickly exit and take back his data.
- What deep database elements do you need to align? This might take a bit of work depending on what types of information you need. For instance, we suck out and integrate city, company, and school data. This sounds easy enough, but it gets dicey: There are quite often many names for the same entity. So if you want to align these elements, you need to: a) figure out what FB calls them and then b) use that naming system or make it line up with yours so that your importer can identify multiple aliases.
- How can you enrich user data in a novel way? There’s tons of win to be had if you can figure out a way to enrich a user’s data. We do this in two ways on bigsight:
- We match their school data against our own database and add the school logo to their profile pages. Furthermore, our school links go to pages that instantly show them people they may know. Here’s my alma mater, for instance: http://bigsight.org/school/wlu
- We built an algorithm that constructs full biographies based on a user’s profile data. This is fully dynamic and can have up to 140 different combinations depending on which school, company, and city data the user has and how he has structured it.
Basically, get creative. It’s almost silly how many cool things can be done here.
- Is there any way to leverage group, page, or event data? Check this out: http://bigsight.org/city/nashville_tn/events. This is a display of the events that I RSVP’d to in Nashville over the past year. Sucking out this data is fully legit. It doesn’t take long to realize how entirely new sites can now be built based on even one or two User imports.
- How can you set up a User account? You might have to get creative when it comes to getting information (namely email) that isn’t directly available, though often needed to set up a working account. We ask for a user’s email up front and assign them a temporary login and pw based on this.
- Are you going to store their raw data output? We highly recommend discarding their original raw data, even if you have a full disclosure opt-in. It’s just better for everyone involved, including the user and the web at large. Remember that you can keep the User ID, and if you codify the information in some way, you’re in the clear.
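Several of these questions come down to a few small transformations in the importer. The minimal Ruby sketch below assumes a hypothetical alias table, a hypothetical profile-hash shape, and hypothetical helper names; none of it is bigsight’s actual code or a Facebook API. It does four things:

- maps school aliases to your own IDs
- builds a simple bio from whichever fields exist
- issues temporary credentials from the email the user supplies
- keeps only the UID plus codified IDs rather than the raw data

```ruby
require "securerandom"

# Hypothetical alias table: names FB may emit, mapped to your own school IDs.
SCHOOL_ALIASES = {
  "ucla" => 1,
  "university of california, los angeles" => 1,
  "vanderbilt" => 2,
  "vanderbilt university" => 2
}.freeze

# Normalize case and whitespace so trivial variations still match an alias.
def canonical_school_id(name)
  SCHOOL_ALIASES[name.downcase.strip.squeeze(" ")]
end

# Build a one-line bio from whichever pieces the profile has; skipping
# missing pieces is what multiplies the possible sentence shapes.
def build_bio(name, school: nil, company: nil, city: nil)
  clauses = []
  clauses << "studied at #{school}" if school
  clauses << "works at #{company}" if company
  clauses << "lives in #{city}" if city
  return "#{name}." if clauses.empty?
  return "#{name} #{clauses.first}." if clauses.size == 1

  "#{name} #{clauses[0..-2].join(', ')} and #{clauses.last}."
end

# Email is not exportable, so ask for it up front and issue a random
# temporary password (in practice, store only a hash of it).
def temporary_credentials(email)
  { login: email.downcase.strip, password: SecureRandom.urlsafe_base64(12) }
end

# Keep the UID and your own codified IDs; the raw hash is not persisted.
def codify(raw_profile)
  {
    fb_uid: raw_profile.fetch(:uid),
    school_ids: raw_profile.fetch(:schools, []).map { |s| canonical_school_id(s) }.compact.uniq
  }
end

raw = { uid: 42, schools: ["UCLA", "University of California, Los Angeles"], interests: "..." }
p codify(raw)[:school_ids] # prints [1]
```

Both aliases collapse to the same internal ID, and fields like `interests` never reach storage.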
Conclusion
Like I said above, we believe that FB is on the path to doing something amazing with the web, and we believe that everyone in the industry needs to know how to not just adapt to it, but also thrive from (and alongside) it. It should be an interesting summer re: the web as Facebook Connect launches and more and more people begin leveraging this and the Platform for utility rather than blind user engagement.
Our opinion is that while FB Connect will offer some amazing functionality with regard to quick user integration and syncing, it likely won’t be as powerful as the Platform in terms of data access. Either way, these developments will not only change how users interact with third party sites, but they will also raise the bar for user experience as individuals accustomed to the FB UI begin to demand increased alignment. Soon we’ll likely see businesses start to build sites on the back of FB rather than a) going out on their own or b) doing what could prove to be complicated integration. We’ll probably also find resolutions to a few ongoing discussions and questions, such as who owns a friends list and how what FB calls “dynamic privacy” actually works out in the wild.
It’s all pretty interesting stuff to think through and incredibly fun to see it all come together so quickly. Creative destruction all around, you know. Lots of warriors in the arena. ARE YOU NOT ENTERTAINED?

Here’s an article discussing why the Facebook Platform will continue to slow down, based on current usage trends reported by O’Reilly:
http://zooie.wordpress.com/
FOLLOW THIS GUIDE AT YOUR PERIL!! It provides completely erroneous information regarding the legal right to store FB data beyond the 24-hour period.
Read comment #6 by Sanjay Vakil. He is 100% correct.
“Exportable Facebook Properties” is a DEFINED TERM in the documentation, and Facebook has currently defined this term as an empty set:
http://wiki.developers.faceboo.....Properties
Further, Facebook has specifically identified the particular types of data that can be stored beyond the 24 hour period. That list is provided here:
http://developers.facebook.com.....p;doc=misc
Facebook specifically prohibits all other data from being stored beyond 24 hours.
So, this guide is essentially completely wrong in its basic thesis. The author has made a serious error in his reading of the legal documents. He is so wrong, in fact, that I would recommend TechCrunch consider retracting the article. As it stands, all this article is doing is spreading information about how to breach your contract with Facebook and illicitly gain access to people’s private FB information.
Can someone validate whether comment 52 is accurate please?
Sorry… I’ve been busy poring through all of the new illicitly gained data users have been actively sending us since yesterday.
So here’s the thing: Both Sanjay and this VCReader guy are correct about what the documentation on the wiki states. It expressly says that there are currently no properties that FB deems exportable. It says that plain as day. However, a few pages after that, it then identifies eight properties that actually can be exported. Ok. Quite often much of the information on the documentation wiki doesn’t line up with the more formal TOS, Privacy Policies, and Developer Guidelines. Furthermore (and perhaps more importantly) it doesn’t line up with how FB is actually operating with developers out in the wild.
For instance, we launched FriendCSV back in October. The entire application is SOLELY about exporting massive quantities of information. When it launched, ~50 users sent in TOS complaints; however, when we discussed this with the FB employee handling these complaints, we received confirmation that we were well within the current policies. Furthermore, the application was approved into the directory without a hitch. Next, our importer to bigsight has been out there for six months and has even been used by a few FB employees. Of course, these facts aren’t enough to say that we’re 100% in the right and that FB won’t enforce this Exportable Properties tidbit at some point. But they aren’t enforcing it now, and with good reason.
The current environment seems to be that FB sticks primarily to what is outlined in the core policy documents and otherwise adheres to the following principle: “Just don’t be a dick.” So as long as developers handle user data with care and offer users services in which they have an active interest (without violating their privacy settings or those of their friends), that’s what matters most, and developers are in the clear to experiment with and build on user data portability.
It could turn out that FB decides to start actively enforcing that one definition and our entire exporting/importing operation could be shut down tomorrow. This would take a couple thousand other applications with it. But I just don’t see that happening. So if you are a developer, yes, know that you are operating at your own risk when you build data portability mechanisms, but that’s no less the case with many other applications. It’s still the wide open frontier out there and as developers on the Platform, it’s largely up to us to a) understand users’ interests and then b) build appropriate services that meet them. If we do this well, then that will shape (or re-shape) any policy elements that are currently ambiguous.
Dan,
So if I read your comment #54 correctly, it sounds like your position is:
“Yes it is technically a violation of the FB docs to export user information and store it for longer than 24 hours. However, right now, FB isn’t enforcing that provision too carefully or maybe even at all. And, so, as long as you’re not a dick about how you use the data, you should be able to get away with it.”
I know these are not your exact words, but it’s how I interpreted your comment.
If that’s your point, I can understand it, and you may even be right. And any FB app developer who wants to proceed that way and assume any associated risk is free to do so.
The problem, however, is that is NOT what your article says and it is NOT what your website says at: http://bigsight.org/hfb/
Your “guide” essentially says that the FB documents PERMIT you to export and store this info beyond the 24 hour period — and that just simply isn’t the case at all. That is factually wrong. The docs say the very opposite.
And so, rather than helping app developers by providing them useful information, you are providing them misinformation, and by doing so, exposing them to a potential risk, without even telling them about it.
Every FB app developer should be aware that the FB documentation clearly prohibits the storage of user information beyond the 24 hour period, even with an opt-in, except for the 8 pieces of data listed here: http://developers.facebook.com.....p;doc=misc.
If a developer chooses to ignore these provisions, they should do so with full knowledge that they are violating the terms of their Developer Agreement with Facebook.
Now, I am not in a position to speak authoritatively regarding Dan’s point as to whether or not FB is, or is not, enforcing these provisions. That is a risk decision that each developer will ultimately have to make for themselves. However, even if FB is not enforcing them now – there is no guarantee that they won’t start tomorrow. And, personally, I take no comfort from the line of reasoning that goes: “It’s ok, I know it’s wrong, but a bunch of people are doing it, and Facebook isn’t doing anything about it…”
Right, so Google was being a dick? *rolls eyes*
Unfortunately, until Facebook allows a full export of all data (email, phone numbers and screen names being key), it still *IS* a walled garden to some extent, at least to its users. Facebook developers may care about the other fields, of course, and with good reason, but the walled garden keeps a tight grip on the ripest fruit and greatly limits the value a user can actually get from the FB offering.
I use Facebook extensively, and the one feature I wish it had above all others is the ability to sync just the information listed above with Outlook. Frankly, I’d pay good money just to get one solid sync of my 150-ish friends’ contact info, most of whom are people I’ve lost touch with over the years and would love to reconnect with a little more easily.
For the record, you can sync pictures and contacts with Outlook using the fields provided… you just won’t get any contact info.
I agree with @52. From what I read, you cannot pull out FB data other than the few items that the second link in @52’s post lists. App designers then use the users’ FB ID numbers as a link into their own RDBMS systems for storage of newly generated user data (i.e. not-FB data such as Vampires deployed or whatever the freak the app does).
I think there may be some ambiguity here. The FriendCSV app may be perceived by FB as a tool for a single user to export their data for personal consumption on their PC, and not really as an export app for importing elsewhere. They may be a little lenient on it given the data portability initiatives over the last several months. However, you are still in violation as far as I can tell. Also, didn’t FriendCSV get into a shitstorm a handful of months ago, or was that someone else?
Finally, FTA, the author states “…legal documents are like amusement parks for me, so I’m now fairly well acquainted…” It was irresponsible to set the tone as if you were some legal guru who knows his shit.
Harry “and I thought I was ballsy at times” Wang
Yo Harry,
Actually FriendCSV encountered no storms of any kind…and neither did our importer when we launched it six months ago. The main thing is that there are a good number of ambiguities here that FB doesn’t seem eager to try to clarify, and probably with good reason.
For instance, I could easily make the argument that we’re not even touching what they call Exportable Properties, both because they are undefined and because we’re actually not exporting any raw user data. We codify almost all user data the instant we touch it inside FB, and then we integrate it into bigsight.
So again, lots of gray here, but I continue to believe that developers who provide positive opt-ins and offer users services that they want should have no problem. They all know that they are operating their apps at their own risk.
VCReader,
For some reason your comment just came through now, so I didn’t see it before. You have a good point that the guide/article should have been more clear that at the moment, FB’s formal documentation says that there are no “Exportable Properties”, as they define them (or…er…don’t define them) on the wiki. I am adding a note about this to the PDFs on the site to identify the ambiguities there and to stress that developers are operating at their own risk.
The Facebook Platform is powerful, and they do seem to be trying to become the controller of the “online ID” moving forward, but can they pull it off? Yes, there are 100mm or so people on Facebook, and it will keep growing, but that is nowhere near the number of people online. What about MySpace’s users (200mm)? Many of them don’t use FB; do you think they will just switch? It seems that for FB to “own the new web” they would essentially need to run all the other SNs out of business, and I don’t see that happening.
What about sites that are bringing all your social graphs together? Here is a blog about one start-up that is doing just that: http://www.gruvie.blogspot.com. Aren’t these sites positioned even better?