AOL is apologizing in the aftermath of yesterday’s story about their voluntary release of search data on 650,000 users.
In addition to removing the data, AOL spokesperson Andrew Weinstein has posted a statement in comments on blogs, including ours, and sent it to the press in general:
All –
This was a screw up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant.
Although there was no personally-identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.
Here was what was mistakenly released:
* Search data for roughly 658,000 anonymized users over a three month period from March to May.
* There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information.
* According to comScore Media Metrix, the AOL search network had 42.7 million unique visitors in May, so the total data set covered roughly 1.5% of May search users.
* Roughly 20 million search records over that period, so the data included roughly 1/3 of one percent of the total searches conducted through the AOL network over that period.
* The searches included as part of this data only included U.S. searches conducted within the AOL client software.
We apologize again for the release.
Andrew Weinstein
AOL Spokesman
Blogs are certainly buzzing about this. One of the more interesting posts: AOL employee Jason Calacanis is suggesting that they stop keeping search logs altogether (great idea, won’t happen).








You think Google does not share search queries with anyone?
Think again and do some research and you will find their key pre-IPO funder (name cut), specifically one partner’s obvious interconnections with NSA, as well as a transfer at that time by a key infrastructure persona (name cut) from NSA to Google.
Regardless if you use Google in logged-in mode or not, someone is building a deep relational interest file on YOU and remember NSA operates above the gov. with an excuse “for national security”.
Some day someone may make this public info through a mistake or to capitalize on information collected.
Keep your dark desires in the dark because it is very easy to track them back to you.
Jeez, are they making no effort to halt the people who are now illegally redistributing this data? Without that, their apology means nothing.
Liam, the data is freely distributable under the terms of the license, which prohibits commercial use. There are no other legal constraints on the distribution of this data by those, including me, who’ve downloaded it. Expect a web interface on it soon.
Hi,
I asked on the other thread, but is this data *only* from using AOL’s search engine, or does it include ALL searches done by AOL members (meaning searching using Google, Yahoo, etc) while logged in to AOL?
@jane
[quote of statement above]
* The searches included as part of this data only included U.S. searches conducted within the AOL client software.
[quote]
If this is true (no one can tell you the truth apart from AOL) it means all data which can be intercepted by this client (I don’t know it) and so it’s independent from the search engine they use.
If they changed the engine it wouldn’t make any difference.
What? A mistake?? No s#!t… Not only do they need to punish the people involved, they need to address the fact that they have a corporate culture that would even allow AOL employees to think that this might be OK.
Either their researchers are clueless, or data privacy is just not taken very seriously at AOL. I mean, how can a bunch of bright people be sitting around, have someone propose this and it’s not immediately shot down or at the very least checked out with Legal? I get that researchers might not be the most practical of people but really…
They are, of course, aware of all the other public data from search queries that is still available on http://research.aol.com/pmwiki.....ollections or aren’t they? Granted, it’s much less in magnitude but browsing the data is at least equally amusing.
My, my, what a mess. I wonder what might have happened differently had AOL had a public-facing corporate blog. Quite possibly the damage control would be easier to manage.
Jason Calacanis’s suggestion to not keep search logs is something I too wish could happen, but in all likelihood won’t.
Now that the file has been downloaded, it will be extremely difficult, if not impossible, to contain its spread. There are already a number of mirror downloads on the Net.
I am very, very thankful I am NOT an AO Hell customer, and even more glad I never used their slow excuse for a browser!!
Re: “the data is freely distributable under the terms of the license”
Thanks for the clarification. But surely their legal dept could come up with some loophole or provision of law that would enable them to yank it back. (The vast majority of downloaders hitting mirror sites are clearly commercially motivated.) They’ve got to at least try, given the likelihood of lawsuits stemming from this…
I tried to find out whether my data was included by asking. This is what I got.
9:49:14 PM System Alex stated the question or problem as: Hello. I wish to know whether my searches are among those now publically available on the internet.
9:50:14 PM AOLTechNPL Could you provide me with additional information which searches you are referring to, Alex?
9:50:46 PM You http://www.gregsadetsky.com/aol-data/
9:52:04 PM AOLTechNPL Please hold on as i check it before I provide you with any information.
9:52:15 PM You Holding on.
9:55:44 PM AOLTechNPL With your concern, there was no personal member data was released. The search data only had random numbers assigned to each search account, so there is no personal information from AOL linked to any of the data.
9:56:55 PM You My searches, would I believe, clearly ID the searcher to any interested party
9:58:24 PM AOLTechNPL You can be assured that there are no personal information linked to any data.
9:59:09 PM You Unless of course, someone searched for their name or other unique combinations - but that never happens, right?
10:00:24 PM AOLTechNPL The searches included as part of this data only included U.S. searches conducted within the AOL client software.
10:01:04 PM You I am well aware of that. However, I performed those searches.
10:03:14 PM AOLTechNPL I understand your concern. While there is no personal information linked to any data, this is also being checked to make sure that this type of thing never happens again.
10:03:57 PM You But no-one within AOL can respond directly to my concern as to whether my data was included, correct? I have to find that out for myself?
10:05:34 PM AOLTechNPL We are unable to verify if any of your search data is included since only a small percentage of the searches were included.
10:07:24 PM AOLTechNPL I understand how you may feel about it, but what I can assure you is that your personal information was not compromised.
10:08:11 PM You With all due respect, since I know that none of this was your choice, you cannot assure me of that and you should not deceive others.
10:08:37 PM You You can only say that my name was not on top of a list- not whether that list compromises personal information
10:09:42 PM You I sense this is fruitless- I’ll try to contact the company another way and probably end up only harassing another customer service rep.
@Liam: Actually, spreading the collection may very well be illegal. The only way it would not be illegal is if passing it along is successfully declared “non-commercial research” because that is all the license allows. Anything else is copyright infringement. The data is not in the public domain; only the copyright owner can give it away. And the license clearly states “Copyright (2006) AOL”.
Also, re. the guy who set up a web site with Google ads where you can (so he says, didn’t work for me) search the data - that’s decidedly not “non-commercial research”.
Manual trackback
In all the current discussion about AOL’s sharing of the query-log
data, there has been little emphasis on the importance of such data to
research on information retrieval. In addition to the real privacy
concerns, a key point that must be considered is the fact that if
useable data is not made available to the wider research community,
only the big search companies will be able to analyze that data. We
academic researchers are increasingly dependent upon industry for this
sort of data to do research; the sort of small-scale data that can be
gathered in a university-based setting is simply insufficient for
obtaining reliable experimental results.
Should companies be prevented from sharing data with the research
community (either by law or public outcry), research progress will be
greatly reduced, as it will be impossible to compare different studies
with one another, since each study’s data will be proprietary, and
thus no one will be able to trust any research result from another
lab. All non-industrial research in this area will more-or-less dry
up, and search technology will tend more and more to be developed in
“closed-shop” efforts within the large firms; innovative startups and
open-source hacking will not exist, since the research projects that
serve as launching pads for such technological innovation will not
exist. This prospect should disturb us all, as search technology
(broadly construed) is more and more the vehicle that people use to
gain information about their society and the world.
All of this is not meant to ignore the real privacy issues that can be
involved in the preparation and release of such data. It appears to
me that there was little real privacy risk in the data released by
AOL, but it is clear that policies and practices need to be debated
and developed that accomplish two essential goals: (a) to protect the
privacy of individuals in any sharing of research data, and (b) to
ensure that as much useful data can be shared by companies with the
greater research community. In this effort researchers and privacy
experts must collaborate to ensure that all sides of these important
issues are properly addressed.
Shlomo Argamon, Associate Professor
Department of Computer Science
Illinois Institute of Technology
Chicago, IL 60616
Maybe what AOL did is just a new Web 2.0 notion.
I’ve made a cartoon:
http://geekandpoke.blogspot.co.....ntion.html
Bye,
Oliver
A site where you can search this data is here:
http://www.datablunder.com/logitems/query/
Shlomo, another way researchers could get such data would be an opt-in proxy service. You could then monitor all traffic, not just what AOL sees. Though this data is nice in that you see what sites people end up on too. http://www.seosleuth.com/site/
Everything about me and my children was released by AOL. I have a son who died and I am going through a nasty custody battle, and all my searches, personal and private, were released.