December 28, 2006

Google Top Searches: Based on Everything and Nothing

Michael Arrington

55 comments »

Google issued a clarification today about their much-criticized list of the most popular queries in their search engine. The list raised eyebrows because it included some fairly unlikely terms. “Bebo”, for example, was the number one term on the list.

In the original announcement, Google said: “Google today announced its annual Zeitgeist, featuring lists and charts of the most popular and fastest-rising global search terms that people have typed into Google.com.” It turns out that while that statement was literally accurate, it was still somewhat misleading.

In the clarification today, Google said: “we do not simply retrieve the most frequently-searched terms for the period — the truth is, they don’t change that much from year to year… Instead, we looked for those searches that were very popular in 2006 but were not as popular in 2005 — the explosive queries, the topics that everyone obsessed over. To come up with this list, we looked at several thousand of 2006’s most popular searches, and ranked them based on how much their popularity increased compared to 2005. (“Bebo”, for example, had very little traffic in 2005.) We also gave a bit higher score to searches with more traffic. Similarly, our “what is” and “who is” lists are not necessarily the absolute most frequent searches, but rather those that best represent the passing year.”
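To make that description concrete, here is a minimal Python sketch of one way such a ranking could work. Google has not published its formula, so everything here is an assumption: the function name `rank_rising_queries`, the smoothing constant, the log-based volume boost, and the query counts, which are invented purely for illustration.

```python
import math


def rank_rising_queries(counts_prev, counts_curr, top_n=3, smoothing=1.0):
    """Rank queries by year-over-year growth, with a mild boost for volume.

    Hypothetical scoring, not Google's actual method:
      growth = (current + smoothing) / (previous + smoothing)
      boost  = log10(current + 10)   # "a bit higher score" for more traffic
    """
    scored = []
    for query, curr in counts_curr.items():
        prev = counts_prev.get(query, 0)
        growth = (curr + smoothing) / (prev + smoothing)
        volume_boost = math.log10(curr + 10)
        scored.append((growth * volume_boost, query))
    scored.sort(reverse=True)
    return [query for _, query in scored[:top_n]]


# Made-up counts for illustration only; these are not real search volumes.
counts_2005 = {"weather": 9_000_000, "maps": 8_000_000,
               "bebo": 50_000, "world cup": 400_000}
counts_2006 = {"weather": 9_200_000, "maps": 8_500_000,
               "bebo": 5_000_000, "world cup": 4_000_000}

rising = rank_rising_queries(counts_2005, counts_2006)
print(rising)
```

With numbers like these, a term that grew a hundredfold outranks perennial heavyweights even though its absolute volume is lower, which is consistent with how “Bebo” could end up at number one.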

I agree with Liz Gannes at GigaOm. If at the end of the day Google is taking the top few thousand searches, subjectively picking a handful that are interesting and then redetermining the order based on velocity of growth rather than overall rankings, we end up with a list that is, in the end, completely meaningless.

Comments

I don’t think Google intended Zeitgeist to be taken so seriously. I certainly wouldn’t say it’s meaningless.

 

Haha. Google sucks. It’s about time people realized it.

 

I think that the biggest problem is that the title above the top 10 is misleading and inaccurate. Instead of being called “Top Searches of 2006” it should have been “Hottest Searches of 2006” (as per their explanation). “Top” to me implies the most searched for - which in fact this list isn’t.

 

Google rules.

 

Yup! This habit of Google’s to mask things is across the board, and with time people are noticing it more and more. The Google search product is great, but that does not give them the right to say things in whatever way they like. Say it right or don’t say it.

 

The list does have some importance, because out of BILLIONS of searches done during the period of a year, these terms reveal the most active GLOBAL TRENDS.

So perhaps the words Search TRENDS should be used in Google Zeitgeist to offer a better perspective.

 

It’s quite scary, I bank on gmail for all my mail. What are they talking about “personal backup”?? I guess the only way to backup is to enable pop and periodically back that up.

If google deleted all my mail I’d be pretty pissed!

 

Seriously though. Do we really want to see the same search terms such as “sex”, “porn”, “britney”, “paris hilton”, etc.? I think what Google is doing makes sense in the context of a Zeitgeist, and it certainly makes the list more interesting than the filth they would have to post if they just showed the top 10 largest search terms for the year. I can guarantee sex, poker & ringtones would be in there every year.

 

I don’t believe they manually extracted the words, my understanding is that their formulation was very similar to Yahoo’s Buzz index.

 

just to remind google guys … “Don’t be evil …”

 

Metacafe? Yahoo was rumored to buy metacafe. It must be a prank by Google to inflate acquisition price.

 

> subjectively picking a handful that are “interesting”

Yep.
And not only in the Zeitgeist.

See: http://www.idcide.com/affair/

 

The funny thing is: search for Bebo in Google and the page contains a link to a Live Search on “Bebo” :-). When did Google start indexing other search engines’ search URLs?

 

Er, Mike? Where, in the statement you quoted, does Google say they picked the most interesting terms? Oh, that’s it… they DIDN’T. Yet you placed that word in quotes… and you’re taking them to task for accuracy and clarity????

 

I don’t think the list is completely meaningless, quite the contrary.

 

rick - that’s a fair point. I didn’t mean it to be a quote in that sense, but I see what you mean.

Smaran - ok, what did you learn by reading the list?

 

The list isn’t *completely* meaningless, but it’s definitely meaningless for any kind of serious research, statistical analysis or a comparison with other services. And that’s what such a list should be used for.

This way, it’s pretty much as scientific as if they did a list of the top 10 albums for 2006.

 

Hmmm: nice Google-bashing but not really a very accurate story: The list is clearly not meaningless, and the representation of ‘buzz’ rather than straight quantity alone is a very common and highly regarded methodology online now, used everywhere. So really you’re just saying they didn’t label the stats clearly enough for someone not to be able to find fault if they look REALLY HARD. Funny how you use the same licence with your own article headline in my RSS reader to get my attention by saying “Google Top Searches: Based on Nothing” - is that really accurate?

Definitely it’s a worthy story to help better understand how these figures are arrived at, but I guess it’s a shame you couldn’t comment on that without sensationalizing the story. “SCOOP! GOOGLE SUCKS AND IS REALLY EVIL! READ MORE NOW…”

Sorry to be harsh but your audience is not stupid is it?

SCOOP! TECHCRUNCH READ BY IDIOTS! READ MORE NOW…

:-)

 

I have no problem with the list. It is more interesting to me than what the actual list would be.

1. Yellow Pages
2. Hot Babes
3. Dictionary
4. Maps
5. Animal Sex
6. White Pages
7. Boobs
8. Free Stuff
9. More Porn
10. TechCrunch

 

I couldn’t care less for the real top 10. I do care for the list that was presented. It’s much more indicative of how search trends were different this year than the last. We all know that the top 10 is likely to be the same mundane stuff year after year and mostly uninteresting. So while they could have been more clear, Google should probably stick to this zeitgeist format.

 

I can’t believe “Rebelde” made it to the list.

Rebelde is the name of a TV series and band franchise. It started in latin america and it’s been sold to different TV channels across spanish speaking countries.

Is it important? Maybe not to you. But that goes to show how small the number of Latin American searches is… and I really can’t believe that’s the most popular one… You would have to consider:
1. Third world countries: Most people I know don’t even use IE favorites (not to mention FF). They google something, and if it came up on the first page, that’s the way they will find it again. So the lack of use of bookmarks is probably a main reason why that word is searched so much.
2. That franchise is aimed at 10 to 22 yr olds, mostly girls. I don’t know the exact numbers, but I’m pretty sure that is the main age group for internet users in Latin America. I don’t see the majority of older ppl using the internet, given the socio-economic reality of Latin America.

 

I agree with what others have said, this format certainly makes the most sense relative to the alternatives. I usually find your views fairly fleshed out, but how about recommending an alternative here? I don’t think there is a feasible one. As others stated, a rephrasing of what Google means by the zeitgeist might be a good idea, but I also think they do it in a fun way, not expecting this intense scrutiny of what they include.

 

A note about Rebelde - since one can’t go to the obvious rebelde.com to read about the show, I imagine the next step would be to turn to Google or some other search engine. That might have inflated its Google frequency over what one would expect.

Also, I can’t imagine that Britney, porn, sex, etc. were unpopular searches in 2005. Thus they wouldn’t show up on a list that favors new over old.

 

Google rules, and the results are quite interesting, related to phenomena that happened this year. It’s really a surprise that Rebelde, which everyone in Latin America knows is a pop band, has achieved such a top position in search terms.

 

I think the biggest problem with the Google list is not posting the actual algorithm used to produce the top lists.

According to dictionary.com, zeitgeist is “the spirit of the time; general trend of thought or feeling characteristic of a particular period of time.” With that definition, I wouldn’t expect to see anything on that list that had been popular before; instead I would expect to see terms that were unique, yet popular, to the year in question.

Is there value in zeitgeist information? Sure, the enterprising could take this list and produce more quality content related to those terms. People are looking for them, why not produce more content driven to what people are looking for? I know that when I look at a list like this, my curiosity kicks in and I start learning about things I may not have known before.

Is there value in seeing the actual top 10? You bet. Knowing the top 10 searches, regardless of year, can speak volumes about society.

In my opinion, the lists they produce don’t go far enough. With the vast amount of data collected on search terms, I think Google could do a lot of fun things with that information at the end of the year. Like why can’t we see trend graphs for not only the zeitgeist, but the most popular search terms? Why can’t we see searches by region? Why can’t we see most popular sites that get clicked for those searches? If anything, I think they just stopped short of what they could do, and didn’t explain what they did enough.

 

I don’t understand. I don’t see anything wrong with the terms or the relevance of the terms. Bebo and MySpace are social networking sites, and the results are consistent with the social networking trends. Metacafe is a video sharing site, and the other terms are ones I think we all know.

 

The Google Zeitgeist is based on global traffic, and some of the comparisons and commentary are looking at US based traffic and wondering why they don’t see some of the popular search terms. Bebo isn’t very well known in the USA, but it is big in the UK and growing very fast, World Cup is a much bigger deal outside the USA, Rebelde is a fast growing brand name in Spanish, and so on. Please be really clear when looking at Hitwise, Comscore, Alexa etc. numbers whether they are US market focussed or worldwide.

 

No problem - google doesn’t suck- You don’t have so much money that the SEC wants to make you call yourself a bank… for no reason -

- No problem; They stated it was a list including the terms with the most velocity.

- and to CHRIS: you wouldn’t understand the formula they used. (by the way they probably didn’t use any)

thanks, Richard bowles

 

So Google’s going to buy Bebo so they can have a social networking site that is in English? Awesome.

 

Yeah, to say the list is meaningless goes too far. Rather, it’s more just poorly titled since “Top Searches” implies frequency rather than trend. Moreover, it didn’t sound so much like Google subjectively picked whatever they fancied for the Top 10, but based it largely on the percentage gains of keywords from the previous year. If you’re looking for shady behavior, this isn’t it.

 

This tendency to list what folks are searching for gives me pause. Usually I think, “Who gives a rats ass what others are searching.” Occasionally the thought is they have to fill the screen with something. Zoe……I’ll stick with the former.

What do you use a search engine for? Perhaps to find info, data, comparisons? If the tater needs help in search subjects mebbe it should stick to the couch.

;-)

-Marty

 

Meaningless or worse (misinformation?). Bebo does get a lot of traffic (http://snapshot.compete.com/bebo.com), but top of mind?? Maybe for tweeners or x-ers, but not for my world. A few geeks deciding the Zeitgeist for the net over Wolfgang Puck food at the Googleplex ain’t news!

 

“top of mind?? Maybe for tweeners or x-ers”

It may not be what you (or I) are searching for, but if we are to believe the Zeitgeist accurate, it is a big chunk of what’s happening now (that wasn’t last year), and can be an interesting piece of data.

 

I’ve written quite a bit about this on my blog the last few days (it’s fun to scoop Technorati on something once in a while), in case anyone’s interested.

 

Whatever Google says is Gospel, right? Google knows best what you should see for any given search, eh? I get a little fed up when it comes to Google’s holier-than-thou attitude, but they are the only real game in town right now. FYI, building links to my site, a Google pr6, I have been using LinkMachine and this software has been super in all respects. These comments were made for free but the link to LinkMachine is my affiliate link, cheers - MarkH

 

I think it’s still a useful zeitgeist, though I would like to see those static results as well.

 

This looks a lot like TF-IDF which is commonly used in data mining to uncover terms that help to best differentiate one “document” from another, which seems to me to be what something like Zeitgeist should be going for.

If you have a collection of documents (or popular search terms for different years), TF-IDF gives you the terms that help to best differentiate one document from the rest. If something like “bebo” was never searched for before this year and then reasonably high in the top searches this year, TF-IDF would rank it very highly.
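The idea in this comment can be sketched in a few lines of Python. This is only an illustration of textbook TF-IDF applied to per-year query counts, not anything Google has said it uses. The function name `tf_idf`, the choice to treat each year as one “document”, and all of the counts are assumptions made up for the example.

```python
import math


def tf_idf(docs):
    """Compute plain TF-IDF scores.

    docs: {doc_name: {term: count}}
    returns: {doc_name: {term: score}}
    tf  = term count / total count in that document
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = {}
    for counts in docs.values():
        for term in counts:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    scores = {}
    for name, counts in docs.items():
        total = sum(counts.values())
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical per-year query counts; each year plays the role of a document.
years = {
    "2004": {"weather": 900, "maps": 700},
    "2005": {"weather": 950, "maps": 800},
    "2006": {"weather": 960, "maps": 850, "bebo": 300},
}

scores = tf_idf(years)
top_2006 = max(scores["2006"], key=scores["2006"].get)
print(top_2006)
```

Terms that show up every year get an IDF of zero, so only the newcomer scores. One caveat: plain TF-IDF only looks at whether a term appears in a document at all, so a query with even a trickle of traffic in earlier years would be discounted more than a growth-based ranking would discount it.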

 

Huh. Wonder how high “china censorship” got?

 

what? you didn’t search for ‘rebelde’ every day in 2006?

2006 was also the year when I stopped believing that everything that Google made worked and made sense. Maybe it’s time to get back to being a search engine.

 

Bebo is also the nickname of a popular young Indian film actress.

 

Google’s hand-chosen list of “most popular queries” reminds me of The New York Times bestseller list: It’s interesting, but it isn’t a list of bestsellers. The NYT list arbitrarily excludes categories that include most of the books sold in the US (among them the Bible, perennially #1) because they’re too middlebrow to interest NYT book editors. My job requires me to spend a lot of time looking at Web search logs for another large search engine, and unless Googlers are somehow unique, I can tell you that their list of “most popular queries” bears almost no resemblance to what users actually enter most frequently in their browsers. It’s a carefully pruned and sanitized list designed to make Google look good, not to represent actual user behavior.

 

I agree with the methodology Google used.
I created a daily updated mashup using the ClearForest semantic web services and the CNN news feed, with the aim of showing the “product/person/city/country” of the day, etc. I started with absolute daily counts to determine the newsmakers of the day, but I always got the same thing - bush, USA, new york, etc.
So then I used the same algorithm as Google: I computed a daily average for each entity, and looked at which entity had the most articles above its average on each day.
Now I get more exciting results that really reflect what’s new & important for the day. I continued calling them “product/city/country of the day,” as they’re still a top result for the day, just by a different measure.
I’d say Google had the same thought process.
Info on my mashup here- http://imagine-it.org/sblog/in.....215-190257
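The approach described in this comment can be sketched as follows. This is a guess at the logic based only on the description above, not the commenter’s actual code. The function name `entity_of_the_day` and the article counts are hypothetical.

```python
def entity_of_the_day(daily_counts):
    """Pick, for each day, the entity furthest above its own daily average.

    daily_counts: {day: {entity: article_count}}
    returns: {day: entity}
    """
    n_days = len(daily_counts)

    # Average article count per entity across all days.
    totals = {}
    for counts in daily_counts.values():
        for entity, count in counts.items():
            totals[entity] = totals.get(entity, 0) + count
    averages = {entity: total / n_days for entity, total in totals.items()}

    # For each day, choose the entity with the largest excess over its average.
    picks = {}
    for day, counts in daily_counts.items():
        picks[day] = max(counts, key=lambda e: counts[e] - averages[e])
    return picks


# Invented counts for illustration; not real news data.
sample = {
    "mon": {"bush": 44, "new york": 30, "somalia": 2},
    "tue": {"bush": 42, "new york": 28, "somalia": 25},
    "wed": {"bush": 41, "new york": 35, "somalia": 3},
}

picks = entity_of_the_day(sample)
print(picks)
```

By absolute count, “bush” would win every day in this sample. Measured against each entity’s own average, a different entity can surface on the day it spikes.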

 

I am so happy and excited to know that Rebelde is at the top. Just to think that it all started as a Mexican novela (soap opera) and now they are the hottest music group in the world. Everyone knows them, not just Mexico.

 

I like the methodology and ideology of Google.

 

Google really needs help with a lot of their methods of search engine placement. You search for something and the results show websites that are 2 - 3 years old.

 

It is amazing how a name such as bebo which had little traffic in 2005 can be the top searched word in 2006!

That should give us all some encouragement!

We may go from being a nobody one day to standing on top of the world the next.

Hard work does pay off in the end!

 
