Yet Another Update: AOL: “This was a screw up”
Further Update: Sometime after 7 pm the download link went down as well, but there is at least one mirror site. AOL is in damage control mode. The fact that they took the data down shows that someone there had the sense to realize how destructive this was, but it is also an admission of wrongdoing of sorts. Either way, the data is now out there for anyone who wants to use (or abuse) it.
Update: Sometime around 7 pm PST on Sunday, the AOL site referred to below was taken down. The direct link to the data is still live. A cached copy of the page is here.
AOL must have missed the uproar over the DOJ’s demand for “anonymized” search data last year that caused all sorts of pain for Microsoft and Google. That’s the only way to explain their release of data that includes 20 million web queries from 650,000 AOL users.
The data includes all searches from those users for a three-month period this year, as well as whether they clicked on a result, what that result was and where it appeared on the result page. It’s a 439 MB compressed download that expands to just over 2 GB. The data is available here (this link is directly to the file) and the output is in ten tab-delimited text files.
The utter stupidity of this is staggering. AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often let people easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.
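The reason a random ID offers so little protection is that the same ID is attached to every query a user made over the three months. A minimal Python sketch, assuming each tab-delimited line starts with the anonymous ID followed by the query text and then the click details (the column names and sample rows below are hypothetical, not taken from the actual files), shows how trivially the rows fold back into one search history per person:

```python
import csv
import io
from collections import defaultdict


def group_queries_by_user(lines):
    """Collect every query string under the anonymous ID that issued it.

    Assumed layout (hypothetical): ID, query, timestamp, clicked rank, clicked URL,
    separated by tabs, with a header row first.
    """
    sessions = defaultdict(list)
    # QUOTE_NONE so a stray quote character inside a query is kept as-is
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        user_id, query = row[0], row[1]
        sessions[user_id].append(query)
    return dict(sessions)


# Hypothetical sample rows; the empty fields mean no result was clicked.
sample = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1001\tweather tomorrow\t2006-03-01 07:17:12\t\t\n"
    "1001\tpizza delivery\t2006-03-02 09:00:00\t2\thttp://www.example.com\n"
    "2002\thome loans\t2006-03-03 10:00:00\t\t\n"
)
result = group_queries_by_user(io.StringIO(sample))
print(result)
```

Nothing in the sketch is clever: once the rows are grouped, each ID becomes a readable profile, and a single ego search or home address in that profile can be enough to put a name to it.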
The most serious problem is the fact that many people search on their own name, or those of their friends and family, to see what information is available about them on the net. Combine these ego searches with porn queries and you have a serious embarrassment. Combine them with “buy ecstasy” and you have evidence of a crime. Combine them with an address, social security number, etc., and you have an identity theft waiting to happen. The possibilities are endless.
Marketers are going nuts over the possibilities, users are calling for a boycott of AOL, and others are just enraged:
User 491577 searches for “florida cna pca lakeland tampa”, “emt school training florida”, “low calorie meals”, “infant seat”, and “fisher price roller blades”. Among user 39509’s hundreds of searches are: “ford 352”, “oklahoma disciplined pastors”, “oklahoma disciplined doctors”, “home loans”, and some other personally identifying and illegal stuff I’m going to leave out of here. Among user 545605’s searches are “shore hills park mays landing nj”, “frank william sindoni md”, “ceramic ashtrays”, “transfer money to china”, and “capital gains on sale of house”. Compared to some of the data, these examples are on the safe side. I’m leaving out the worst of it: searches for names of specific people, addresses, telephone numbers, illegal drugs, and more. There is no question that law enforcement, employers, or friends could figure out who some of these people are.
There is some really scary stuff in this data.
I am assuming that AOL will take this page and the data down soon, but as of the time of this post it has been downloaded 809 times already. People I’ve spoken with are already building a web interface to the data. If you are an AOL customer, I feel sorry for you.
Note that Microsoft has proposed releasing similar data to researchers, although with an important difference: the data is not associated with a user. Excite released data very similar to what AOL has done here, with user associations, in 1999.
AOL is hitting bottom when it comes to brand image. This story comes on the heels of the recorded customer service call disaster, as well as a just-in story about a woman who, nine months after her father’s death, is still unable to cancel his AOL account.

Mike,
Good post. I agree this is terrible, but does it make things any better to supply the link to the file? Aren’t you just propagating the privacy infringement even further this way? Unless this is your attempt at taking the file down with a traffic overload.
Hey Noam, the data is out there and blogs are buzzing. Too late. If I found it first I would have called AOL and strongly suggested they take it down.
This will be like dropping a nuke on the affiliate marketing world. Watch the spammers go nuts.
Hey, it’s not like there’s been a lot of furor over the last year or two about data privacy…
Gone. Anyone with a mirror please post.
Microsoft was offering researchers similar query data earlier this year.
I was just giving you a softball Mike. See how much better your response comment made you look?
Madness. A “what were they thinking?” moment if ever I’ve seen one.
They weren’t. Thinking.
I’m having trouble getting the whole file. It looks like their web server is having as much trouble as their legal team is about to.
Danger: You’re not alone. I’m sure a couple of people have already successfully downloaded the entire file; it’s just a matter of time before a torrent pops up, and the file spreads like wildfire.
You’d think a company the size of AOL had more common sense.
AOL, being a mammoth beast, has been attempting to cut red tape within its ranks. This very effort might have enabled the tech/research team to publish such data without obtaining clearance through the legal/executive team. Big ‘whoopsie’ here, which is going to cost AOL a bomb.
I can see a couple of heads rolling already.
It’s not gone. I’m still downloading.
I’m still downloading too…
Yip, the download is stalling. The intro page has been wiped too (not surprising, since it featured some pretty irate customer comments). The story is now on digg and getting some traction, so I suspect there won’t be any hope of getting the file until someone sets up a torrent.
On another note, I’ve been observing TechCrunch’s FeedBurner stats. Just a week or two ago, TC’s feed had 90k+ readers, but this number has been steadily declining ever since and now stands at 78k. What’s the deal, Michael?
Wow this is insane.
Wow. Just in time for Search Engine Strategies in SJ this week. Should be fun to visit their booth.
Mike - a good time for the shorts to come piling in?
I personally have never used AOL but I find a large number of people hate it and its services. I wonder what kind of uproar this will cause.
Joe Banix: This is pretty common with FeedBurner’s TC stats. They have to guess at a lot of things to measure active readers. Many more people check TC during the work week, so by Sunday evening, FeedBurner shows a lower number of subscribers. By Tuesday, you should see it shoot back up.
It’s just a matter of time before this data is ripped to CDs, filtered in multiple ways, and sold on eBay.
Can someone please publish a torrent?
I have downloaded the file, and put it up online.
Blog post: http://www.gregsadetsky.com/?p=22
Direct link: http://www.gregsadetsky.com/aol-data/
Was speechless. So I made a video about it. “AOL PRIVACY CAM.”
http://nalts.wordpress.com/200.....vate-data/
AOL is blatantly irresponsible
Many innocent lives are going to be ruined over this. I am so glad that I don’t use AOL.
They took it down…link anyone?
Thank god for Cox!
Ahhh. Importing all the data from text into MySQL now. It’s gonna be fun querying this data instead of working tomorrow.
heh… it’s funny that they recommend Firefox… instead of their own browser
http://www.gregsadetsky.com/aol-data/ mirror
Was just on Drudge’s radio show, breaking HUGE.
Just heard about this on the radio from Drudge. Everyone is saying this was a mistake; I’m just wondering who paid how much for this to happen…
upload onto my ftp MUAHAHAH
http://ftp.pininxweb.com
username: aol@pininxweb.com
Password: aolaol
Did anyone else notice that AOL actually misspelled the word “anonymized” as “ananomized”?
http://texturbation.com/blog/aoldata.jpg
If they can’t even spell it…
I’ve been using AOL to find gay lovers since 1994. How can I hide this from my wife?!?!?
It’s obvious that AOL is trying to destroy itself and close up shop. Why do you think they decided to make AOL “free”? So they could slowly get out and dissolve. This is another effort to speed up the process.
Here’s an example of what this means.
User 761231 (from file #7) searches for:
who owns the property at 1133 thieriot avenue in the bronx
messenger service in woodbury new york
lotion that beyonce uses
how do i register to become a senator
exposed nerve in my tooth
edwin gould services for children and families
incontinence
who are the political leaders in the bronx
as well as at least one search that leads me to believe I could identify this user, which I won’t post here.
Unbelievable.
What searches are they showing results for?
Google? Just using the AOL search bar? Websites visited?
I would guess this is a result of AOL trying to get “with it” by being more transparent, although being transparent with customer data isn’t exactly what people have in mind. From the outside, it seems they encouraged an environment where a couple of engineers could set up a cool wiki and just do stuff with minimal bureaucracy. They wanted to prove that they aren’t an old-school company like Time Warner. But if you think about it, this is exactly the type of dumb mistake a company like that would have made when first getting involved with the “interweb” back in the dot-com boom.
I think that we are missing the greater point here. True, AOL just did about the dumbest thing imaginable, but let us not forget that Google, Yahoo and MSN users are susceptible to the same kind of catastrophe. It just takes one idiot to post a link to a file, and there we have it again. In these huge institutions there are numerous people who have access to this kind of data, and no matter how tight the controls, chances are pretty good that this will happen again.
Anyone who performs searches while logged into these search services is likely being recorded, perhaps even when not logged in. Therefore, we need some serious laws and enforceable standards to mitigate this threat.
I am no whiz kid, but looking at that sample, I was able to determine that this person is a woman who is active with the American Cancer Society’s program “Tell a Friend”, and that she lives in Texas. She also likes the Gaithers’ gospel music, and was looking into going to Disneyland and a possible cruise with the Gaither band.
oh well…..
I have AOL, and searched for “buy child porn marijuana free cocaine anal fetish bdsm hack FBI KKK fight the power lawyers” recently. Do you think I’m in trouble?
Looking over that MS RFP, it looks as if they stripped out any personally identifiable information from the queries they’d provide. A session ID is just a random number used for uniqueness, and everything else provided is just arbitrary.
As for AOL’s mishap, geesh, that is a big mess if I ever saw one.
I tried to download the file and it stopped at 111 MB. As a result, I don’t know what is in the file, so I have no firsthand knowledge. I’m left with opinions.
My initial reaction was holy sh*t. I’m now rethinking this incident. I’m not an AOL member, but I looked at their privacy policy on http://www.aol.com. I don’t see a section that was violated.
I also have to think the research arm was more familiar with the privacy policy than I am, and that this release would have had to pass internal review. I worked with an external vendor 10 years ago, and my take is that this had to have had some sort of review.
In addition, people have talked about being able to determine who the actual users were, but I’m not certain that is easy to do on our end. How do you know the search for “Bob Jones” in “Anytown, USA” is for someone doing an “ego search” versus someone doing a reference check on a resume?
Do I think there will be people who can benefit from the file? Absolutely. Certainly, there will be marketers who will do keyword analysis much the way they do with WordTracker, KeyWord Discovery, AdWords Accelerator or other tools. And yes, I’m one of the people who thought the DOJ would’ve been better off buying a subscription to these services than requesting info from the various search engines.
I don’t want to dismiss user privacy. I actually write an annual report for my users on how I use their data. Personally, I wouldn’t release this type of info. It’s not that it violated my privacy agreement, but more that I didn’t explicitly state in detail I would. I also don’t pass my data in aggregate to other people.
The last point I’d like to mention is even if people determine that AOL violated their privacy agreement, at least they have one. It makes them accountable. But I wonder how many webmasters who have come to a conclusion, one way or another, don’t have one. In my mind that is also a travesty.
OMG, this is a gold mine, I could probably get new clients just showing them this list…
Anne you’re correct. I happen to run one of the companies which will benefit and OMG, that’s all I can say. OMG…OMG
I am just waiting to see instant messenger conversations uploaded next. Then emails… that would be sweet.
This fiasco should drive home a point to the webtopians: use of online apps like Writely, Basecamp, and Google Spreadsheets is just as secure as being a search-happy member of AOL. When your data is online, you don’t own it. Period.
When it comes to data like documents, most ordinary users and businesses already get this, and won’t go near online desktop apps.
More at http://web2dot5.blogspot.com/
Is this data only current AOL customers or previous as well? (I’m sorry to say I was a proud dial up customer circa 1996).
This article presents many extremely weak arguments…
First of all, most people on earth have tens, hundreds, or thousands of people who share their name. Type your name into a search engine, and you’ll almost always find many people who AREN’T you.
Second, how does a query “buy ecstasy” show evidence of a crime?… That just doesn’t make sense. Evidence of a crime is evidence of an actual transaction involving the drug… If any random person types in “buy ecstasy”, did that person just break a law?
In my opinion, this decision by AOL was horrible, but this article is just trying to scare people and shoot down AOL. Looks like most people are buying it too.