AOL Data: First Web Interface Up
Michael Arrington
It was only a matter of time before someone put up a simple web interface to the 20 million search queries published by AOL last week (see AOL Proudly Releases Massive Amounts of Private Data, and AOL: “This was a screw up”).
A commenter on our original post wrote:
Here’s something you guys might like. I whipped this up to help those of you who don’t feel like grepping your way through 2 gigs of files. it’s a searchable mySQL database of these searches (most of them, anyway, I’m not done indexing yet) with all redundancies removed, searchable by categories. Hopefully this should make for a few hours of fun.
They even picked out a suitable domain name and took the time to wrap it in Google ads. See AOLSearchDatabase.com to search the data via user ID, search terms, date of search or website results. A screen shot of the results is below.
I also spoke on the phone today with Andrew Weinstein, the AOL spokesperson who released the apology. He is genuinely pissed off that this happened, and said he was as surprised as everyone else when he found out about the debacle late Sunday evening. As an interesting side note, he said he first heard the news on TechMeme, one of my favorite sites, and contacted the AOL employee who posted the data and the exec team immediately to deal with the problem. After the call the first thought that came to mind was, why don’t they find someone like Andrew to run AOL customer service?






Nice work, except the results can’t be sorted by date.
It is this type of turnaround speed and distribution that makes me wonder how life existed without the web…….excellent!
It would be really useful if you could sort by date (as mentioned above) and also click on the results to then filter by user ID etc. But nice work to get this up and running so quickly!!
well, the site is down for the moment -> that is, no data being returned.
i’m sure the guy was just putting it up as a research tool. i didn’t see the ‘research only’ disclaimer, though. weird, that.
I wonder if they’re keeping a LOG of the searches people do ON the log of the searches people do!?!!
Ah… The wonders of technology…
This is bad karma IMHO..
AOL admitted they made a mistake, the readme says not to use this data for commercial purposes (e.g. AdSense), and it’s not very respectful of the people who “accidentally” typed in their personal details.
/rant off
In very bad taste I guess, I agree with numlock.
stalk away. data isnt completely loaded up yet, but 30% is.
Michael, can you please get over this and go back to web 2.0?
the search site doesn’t seem to be working?
Yeah probably isn’t working. I’m sure AOL contacted Mr. Zhao.
Wait till the first database appears on Ebay
I expect it in 2 days.
“Wait till the first database appears on Ebay
I expect it in 2 days.”
And wait til it gets removed, I expect 1 hour. Ebay is aware of stuff like this, and they do a damn fine job of policing their auctions.
Not working any more. It was interesting while it was up.
Never check ebay for things like this… always newsgroups; it’ll be free there!
I think this was up first, actually: http://aol.6brand.com/
Don’t quote me on that, but I think I saw it yesterday or the day before.
I took some time and built a database from the data that shows number of times certain things were searched for…possibly good for Search Engine Optimization (or just some entertainment)?
You can find it here: http://aoldb.unwieldy.net (not very pretty…just bandwidth-efficient)
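For readers curious how a count like this can be produced, here is a minimal sketch in Python. It is not the commenter's code. The tab-separated layout (user ID, query, timestamp) and the sample rows are assumptions made up for illustration, and lower-casing the query before counting is one possible normalization choice.

```python
from collections import Counter

# Hypothetical sample rows: user ID, query, timestamp, separated by tabs.
# The layout and the values are assumptions, not taken from the real files.
SAMPLE = """\
1001\tcheap flights\t2006-03-01 10:00:00
1001\tcheap flights\t2006-03-01 10:05:00
1002\tCheap Flights\t2006-03-02 09:30:00
1003\tweather\t2006-03-02 11:00:00
"""

def query_counts(lines):
    """Count how many times each (lower-cased, trimmed) query appears."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        counts[fields[1].strip().lower()] += 1
    return counts

counts = query_counts(SAMPLE.splitlines())
for query, n in counts.most_common(10):
    print(n, query)
```

For a multi-gigabyte file, the same function can be fed an open file handle line by line instead of an in-memory string.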
Hey all.. Since this original one didnt seem to work, I got a tool up and running to analyze the AOL data. It works almost exactly like Overture (but this is top 1000). Additionally, you can click the keyword/phrase and get the average rank and list of sites clicked..
http://dontdelete.com
The domain is still replicating.. so you can get to it from the IP as well..
http://63.212.167.185
For all of you who can’t download the whole 2 gig database and search it.
We’ll be adding other queries soon..
FYI.. it’s still slow because it’s 36 million records. The more specific your search the faster it’ll run.
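The "average rank and list of sites clicked" view described above can be sketched roughly like this in Python. The row shape (query, rank, clicked URL), the sample values and the `summarize` helper are all hypothetical; how the actual tool computes its numbers is not known.

```python
from collections import defaultdict

# Hypothetical (query, rank, clicked_url) rows; every value here is made up.
ROWS = [
    ("cheap flights", 1, "http://example.com"),
    ("cheap flights", 3, "http://example.org"),
    ("cheap flights", 2, "http://example.com"),
    ("weather", 1, "http://example.net"),
]

def summarize(rows):
    """Return {query: (average rank, sorted list of distinct clicked URLs)}."""
    ranks = defaultdict(list)
    sites = defaultdict(set)
    for query, rank, url in rows:
        ranks[query].append(rank)
        sites[query].add(url)
    return {q: (sum(r) / len(r), sorted(sites[q])) for q, r in ranks.items()}

summary = summarize(ROWS)
print(summary["cheap flights"])
```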
I agree about the bad Karma.
For those exploiting this at the expense of other people’s privacy, I wonder how long it will take for those webmasters’ personal info to get dug up and splashed all over the ‘net…a taste of their own medicine.
P.S. I’m not talking about blogs discussing this news, like this one, but people who set up the searchable databases. Karma is a b*tch.
not working, there are no results, darn missed the opportunity.
I also agree about the bad karma, especially for putting adsense on the interfaces. It’s also definitely illegal because at MOST, anyone only has a license that grants usage of the data for NON-COMMERCIAL purposes.
I’ve been looking at the data myself to see how easy it actually is to find out personal information. You can read about my findings at http://erinye.com/u500k/ if you like. Turns out there really isn’t much to say until someone accidentally stumbles over a user ID with useful data. It’s definitely not as bad as the media and blogs are making it seem (i.e. of course it’s bad for AOL, but not so much for the users that were recorded.) I’ll be updating that page as I find out more, but I don’t expect any miracles.
Here is another one-
http://aol.yogurtrat.com/
Lets you Vote for the best one and has a top 100 list. It is not done yet, they still are entering a lot of data into the db
@Danny
“(i.e. of course it’s bad for AOL, but not so much for the users that were recorded.)”
How about for the users who entered in their credit card information and their social security numbers?
I would be very worried if this happened to me and I knew I had purchased anything online thru the AOL service or used my social security number.
Fortunately, I am not a big enough idiot to pay for an overpriced service.
Here is another one too…
http://data.aolsearchlogs.com/search/index.cgi
hey kids… THERE’S NO SUCH THING AS KARMA.
@Mike: I’m not saying it’s all completely harmless and credit card numbers are very probably what could cause the most harm. Had you followed the link I posted, you’d know I had a perl script look through the data and search for valid credit card numbers (i.e. those with a valid checksum). It turned out there are less than 10 unique searches for credit card numbers and of those maybe a handful are actual searches from users for their own number. I’m thinking maybe I should just report those numbers to the respective issuers and that would settle the whole matter, at least, regarding credit cards.
As for social security, social security numbers have no checksum. Almost all 9 digit numbers could be social security numbers. It is nigh impossible to automatically locate them in the data set unless you have access to the central database. I only found four SSN type numbers that were in a query that explicitly included the term “ssn” or similar. Maybe there are more SSNs in the logs, but statistically, there shouldn’t be, say, more than an order of magnitude more.
I looked at the other searches from the users with the credit card numbers and the SSNs and it seems almost impossible to find out any more personal details from them.
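The "valid checksum" test for card numbers mentioned above is normally the Luhn algorithm. The original was a Perl script that is not shown here, so what follows is only a Python sketch of the same idea. The function names and the 13 to 19 digit length range are assumptions, and the sample number is the widely published test value 4111111111111111, not real data.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Return True if a string of decimal digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0

def card_like_numbers(text: str):
    """Return the 13-19 digit runs in text that pass the Luhn check."""
    runs = re.findall(r"(?<!\d)\d{13,19}(?!\d)", text)
    return [run for run in runs if luhn_valid(run)]

found = card_like_numbers("order 4111111111111111 ref 4111111111111112")
print(found)
```

Passing the check does not prove a string is a card number: roughly one in ten random digit strings of the right length passes too, so matches still need a manual look.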
I can’t seem to get it to work; whatever I put in returns nothing. But the other 2 in the comments work, thanks for the laugh!
“john - August 8th, 2006 at 10:23 am - hey kids… THERE’S NO SUCH THING AS KARMA.”
“Karma” is a Sanskrit term that translates loosely to “doing” in English. Basically, if you rob someone and get caught and thrown in jail, this is your “doing”, or your “karma”. Karma does not refer to some mystical force keeping a tally on your deeds, but is in truth a lot more literal. The consequences of any action, be they positive or negative, are your own “doing” or “karma”.
In this case, the karma that AOL is feeling is the result of their “mistaken release” of this information. There’s nothing mystical about it: they messed up, they’re facing the consequences. That is Karma, which, by literal definition, does exist. It is semantically no different than the idea of “consequences”.
How about an AJAX one? I stripped out the user data from the results because there are some good uses for this data that aren’t creepy. I am still loading in the data, so it will be a while before it becomes useful.
http://websearchdata.com/
To Mike Payne regarding “karma”: Thanks for educating john, and all of us.
AOL was completely wrong to release this data. It was also completely wrong for others to set up online searchable databases. Two wrongs do not make a right. Those that set up online searchable databases are just as morally culpable as AOL. And from a legal perspective, they would probably be named in any impending lawsuits right alongside AOL. Trial lawyers take a shotgun approach to naming defendants and anyone that builds and sets up an online searchable database is certainly within shotgun range. Karma.
Luckily we’re not all in the United States of A. And uehm.. you cant invade every country you’ve got a horn in the side with..
Also.. the mentality of suing every living thing because things didnt go your way.. is .. the american way of life. Not the rest of the worlds.
This is great, maybe companies will start paying more attention to privacy.
There’s some sick stuff on there
If you’re brave, search the searches made by user “17635136” (no brackets)
WARNING: Not for the faint of heart
Wonder if steps can be taken to stop guys like this
Here’s another sicko…
user: 9636476
You can tell the following about him by reading his searches:
1) He seems he was molested by his mother
He searched for “tongue warts” ewwwww i don’t want to know why
2) He is looking for a girl to do him in the a**
3) He lives in Bradenton Florida
4) He appears to be a contractor who deals in landscape lighting
5) He wants to buy a willys jeep 49 stationwagon
6) He really likes 9 or 11 year olds, for some reason his searches are always about 9 or 11 year olds if an age is entered.
7) He has researched sex offender laws in Florida and Louisiana
9) Is a previously convicted felon
I’m sure the FBI could find out what sex offender this is. How many felon sex offender landscapers can there be in Bradenton, FL? This may have been private info at one point but it’s open domain now. Somebody go bust him.
@jordi: Actually, everyone can find out about sex offenders in Florida. There are 216 registered sex offenders in Bradenton, FL. Some of them are currently in detention and probably not searching for 9 and 11 year olds on the internet. 131 of them are in “released” or “parole” status. All of them have online accessible flyers complete with high resolution portrait, age, weight and other personal data.
So it seems registered sex offenders are really fucked, pardon the pun, when it comes to identifying them from search data, simply because it is trivial to get a list of possible identities. The next steps would be matching all 216 of them to the search history.
Keep in mind though that sex offenders’ data is really public in the US already, so you won’t discover anything new.
@ http://aol.yogurtrat.com/
Ha. Ha.
The Top 100 is *classic*, cannot wait until everything is all loaded.
Not to mention it is the best AJAX interface of them all, just need to be able to search for other items besides just queries.
So, I wondered if the AOL user community ever searched for TechCrunch. Detailed analysis can be found here:
http://www.brianberliner.com/2.....arch-data/
Enjoy!
-Brian
Spent two minutes on the site and I think I have seen enough. I’ll lose faith in humanity if I pry any deeper.
Have a look at 19254818 and you might understand why.
Don’t really know what to think of all this. I support privacy laws since everyone deserves a right to live their lives without fear of unjust treatment for doing things or believing in something that doesn’t fit the status quo. But some people are just seriously messed up.
Some of the best ones collected on this website… will be adding more as often as possible.
http://www.dearaol.net
aolsearchdatabase.com updated their site, it says “Indexing 15 million searches”, pretty sweet thanks guys
Here’s a more interactive search tool with the data - Splunk - you can click on terms in the results to update your search - http://apophis.rhombic.net:8000/.
@Mike, Matt: Remember the license doesn’t allow you to use the data for commercial purposes, such as web sites with ads.
…and while all the web interfaces still exist, a really scary one I encountered: 16623147
if i were a writer and struggling with character development, this would be a dream come true.
Absolutely brilliant. Had the data not been retracted so quickly, it would never have reached such levels of popularity as it will now. That guy that released it should be in marketing.
There’s a website to analyze and discuss particular AOL users: http://aol.zanoza.lv/
“My neighbour is killing cats”: http://aol.zanoza.lv/user/723190
“ways to kill yourself”: http://aol.zanoza.lv/user/9486162
“wife killer”: http://aol.zanoza.lv/user/17556639
A Face Is Exposed for AOL Searcher No. 4417749: http://aol.zanoza.lv/user/4417749
Another site where you can search this data is here:
http://www.datablunder.com/logitems/query/
Anyone care to discuss methods of splitting the AOL table data into smaller chunks for optimization (using MySQL 5.x)?
I’ve been tasked with making (yet another) search system for an internal application - nothing that hasn’t already been created 10x over.
However, i’m getting stuck on trying to split out the Query column into a unique table (17+hr query!). Anyone? chad edge at g mail dot-com
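One common way to split a text column out into its own table is a set-based `INSERT ... SELECT DISTINCT` followed by a join on an indexed column. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL 5.x, so the syntax is close but not identical. The table and column names (`searches`, `queries`, `searches_norm`) are hypothetical, and whether this addresses the 17-hour query described above is not something the example can confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical flat table standing in for the imported log.
cur.execute("CREATE TABLE searches (user_id INTEGER, query TEXT)")
cur.executemany(
    "INSERT INTO searches VALUES (?, ?)",
    [(1, "cheap flights"), (2, "cheap flights"), (2, "weather"),
     (3, "weather"), (3, "news")],
)

# 1) One row per distinct query text. The UNIQUE constraint also creates
#    an index on the text column, which the join in step 2 can use.
cur.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, text TEXT UNIQUE)")
cur.execute("INSERT INTO queries (text) SELECT DISTINCT query FROM searches")

# 2) Build a slimmer table that refers to each query by id, in one pass.
cur.execute(
    "CREATE TABLE searches_norm AS "
    "SELECT s.user_id, q.id AS query_id "
    "FROM searches AS s JOIN queries AS q ON q.text = s.query"
)
conn.commit()

distinct_queries = cur.execute("SELECT COUNT(*) FROM queries").fetchone()[0]
normalized_rows = cur.execute("SELECT COUNT(*) FROM searches_norm").fetchone()[0]
print(distinct_queries, normalized_rows)
```

The design choice here is to let the database do the deduplication and the id lookup in two bulk statements rather than row by row from application code.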