

It’s no secret how bad most of the analytics firms are at gathering statistically relevant data about Internet traffic. All of them, Quantcast, Comscore, Hitwise, Compete, Alexa, etc., are flawed in various ways and to various degrees.
But today’s blog post by Hitwise shows just how bad their data really is. They say that Craigslist is now the top searched term on the Internet, taking that honor from MySpace. Facebook is third.
But the real data is out there for the taking. Google Trends shows Google search data, and since Google commands such a large lead in search in most countries, presumably the data is accurate. Google Trends shows exactly the opposite of what Hitwise reports: Facebook is by far the most queried term, followed by MySpace and then Craigslist.
I’m putting my money on Google when it comes to accurate search trends. And if I were Hitwise, I’d make very sure my search data conformed to whatever Google was saying.









Here’s your problem: Google only tracks Google (and unlike what you and the other fanboys think, Google isn’t the only search engine). Hitwise tracks all searches. Plus, Google removes many terms they deem unsuitable from their list.
I’m not sure Hitwise is really tracking much of anything.
Clasilistados.org grew 350% in one month
Good job http://www.Clasilistados.com !! I liked your site
I’ve spent a good hour on the phone with Hitwise staff asking why they show my site has 50% less “share” (they do everything in terms of “share”) when Google and Quantcast (direct measured both) show my site AHEAD of competitors’ self-published (presumably inflated and optimistic) numbers.
They claim all they’re publishing is “share” based on their own sample, which, god forbid anyone confuse with reality. Yet they have no problem pimping their crap data via a constant stream of industry rankings press releases that purport to show which sites are leading.
Publicizing this faulty, conflated data makes it hard for a site to get the attention of prospects for bizdev, advertising, and M&A… EVEN WHEN YOU’RE WINNING.
However Hitwise is getting data, it’s crap. It makes it impossible to compare a site that gets 99% organic traffic against a site that is 50% organic, 50% PPC. For whatever reason, Hitwise’s sample favors the PPC buyer.
They deserve a good public flogging and whatever harm comes their way– they’ve cost plenty of other sites far more than they’ll likely suffer.
Google isn’t the only search engine? Get out, I didn’t know that.
You sir are an idiot. They are the largest, most popular and not by a little bit, but by a wide margin. If Craigslist was the top search term on MSN and Yahoo, it wouldn’t mean much if it’s not the top term on Google.
I do feel you are a die-hard Google fan. Go out; the Internet is more than just Google. Google Trends cannot simply be translated into Internet trends. I think you should look beyond it, at the 40% of search traffic outside Google.
I do love Google when it comes to search, but Google is not God. Sometimes it’s just a hollow effect: people don’t want to switch.
Google has 80% market share, not 60%. Nothing else matters besides Google. They’ll be at 90% in 2 years.
Well, you should take into consideration differences in the demographic and psychographic make-up of users who turn to Google, Yahoo, and other search engines. It’s been demonstrated by independent (offline, nationally representative samples) sources that Google search users are skewed towards more educated, liberal, and affluent people. Interestingly, they are more likely to be pet owners, too. Yahoo users are more likely to be younger and live in suburbs. MSN users are also affluent and educated, but more conservative. AOL users are older, less affluent, and more into traditional media like magazines and newspapers.
Besides, Google Trends is in beta, and all they show is a graph that has no significant data other than the trend of a “search volume index”. At least Hitwise shows you all the details.
Hitwho?!
I trust Google data as well.
All the best,
Mike
Hitwise’s post is talking about US data. Your link is showing worldwide data.
Why are you comparing two different data sets?
The same shows for US data, fool. You can filter that with Google Trends.
I know this is where Google is headed, as you alluded to: what is the point of these companies if Google publishes the data freely?
Is Google Trends US only? Because Hitwise clearly states it is a US-only report.
Don’t be so smug, or so brief, in dismissing their data. Craigslist LISTINGS are up 75% year over year. Result of the recession. Just saying…
http://twitter.com/A_F
S-p-e-l-l when you type. Just a thought!
Hitwise: “There are three principle ways to measure Internet usage. A panel of users can be measured at their computers with installed software (user-centric), marketers can monitor how visitors interact with a specific website (site-centric), or data can be collected directly from ISP networks (network-centric).
The network-centric methodology employed by Hitwise enables the most efficient way of monitoring of how more people visit more websites than any other way of measuring Internet usage.”
So… Our ISPs are reporting our search terms?
Hitwise in Australia only has access to a few small ISPs, and the biggest ISPs like Telstra BigPond don’t participate.
So, with data coming from possibly less than 25% of the market, their data is garbage.
Michael,
While your assertion is still correct, it would be helpful if you compared apples with apples. The Hitwise blog specifically mentions US data. The more appropriate comparison would be to Google’s US stats rather than worldwide, which is this URL:
http://www.goog...=all&sort=0
This shows a very different picture to the global stats.
John, that graph you linked is right on the money. This shows how little Arrington knows about Internet metrics. This is a non-story, and Arrington should retract it based on that graph. Not that he will, of course.
And if you add another term http://google.c...=all&sort=0 it gets even more interesting.
Statistics is wrong 75% of the time
Hitwise is incorrect... even Alexa doesn’t support it.
This is the funniest comment I ever read! Alexa doesn’t support it! That’s a good one.
If you look at US-Only data, the race is a little closer – but Facebook still wins.
http://www.goog...eo=usa&sa=N
But I wonder if a Yahoo, MSN, or Ask user is more likely to search for “Craigslist”. Would be interesting data.
Good point on the US-only data. As you say, results are the same although closer.
Why don’t you update your post then? This is why blogs “should” be better than newspapers?
Considering that GOOG is 63% of searches in the US, you can’t even infer anything from Google search trends.
So for once, Michael, please amend your post and acknowledge your mistake.
Although I agree that most panel-based data firms have a hard time getting the right story (and Google data is probably the best you can get), it’s true that in order to compare Hitwise results you need to look just at US traffic and search data.
The US data is very different. Internationally, Facebook is beating MySpace by far, but in the US the competition is a lot closer.
By the way, I would guess that one of the reasons for that is MySpace Music, which, if I’m not mistaken, is not open to users in all countries.
Michael, imo, the “secret” is to plug your site into their systems and compare with your actual log files. Then you can extrapolate the skew of each system.
And apply that skew percentage to any site.
Let us know what you come up with!
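A minimal sketch of the calibration idea in the comment above: compare what a panel reports for your own site with your log files, then reuse that ratio on another site's panel figure. The function names and all numbers are hypothetical, and the sketch assumes the skew is the same for every site, which other commenters in this thread dispute (for example, organic versus PPC traffic).

```python
def skew_factor(panel_estimate: float, log_visits: float) -> float:
    """Ratio of a panel's estimate to the visits recorded in your own logs."""
    if log_visits <= 0:
        raise ValueError("log_visits must be positive")
    return panel_estimate / log_visits


def corrected_estimate(panel_estimate: float, skew: float) -> float:
    """Rescale a panel's estimate for another site by a known skew factor."""
    if skew <= 0:
        raise ValueError("skew must be positive")
    return panel_estimate / skew


# Hypothetical numbers: the panel reports 50,000 visits, our logs show 100,000.
skew = skew_factor(50_000, 100_000)

# Apply the same skew to a competitor's panel figure (an assumption, not a fact).
other_site = corrected_estimate(30_000, skew)
```

With these made-up inputs the panel under-reports by half, so the competitor's figure is doubled. A single calibration site gives only one data point; several sites with known logs would show how stable the skew really is.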
Thanks!!
A note for anyone who doubts what Google is doing here. Google’s analytics is so advanced that they’re using sophisticated techniques in their system. For example, it has been reported on the Internet that Google now uses LSI (latent semantic indexing). There are variants of LSI available today, some more robust than others, but I wouldn’t be surprised if Google is using a superior (perhaps proprietary) LSI.
So what is the relevance of this LSI thing to accuracy? Well, standard LSI uses a matrix decomposition algorithm called SVD (singular value decomposition), and this algorithm can eliminate noise (polysemy, a single word with multiple meanings, or synonymy, multiple words with a single meaning) from the original dataset by reducing the dimensions of the user-by-keyword matrix, which acts as a form of feature selection. Well, a keyword metric that involves keyword feature selection is more accurate than a metric that doesn’t.
I don’t know what type of analytics Hitwise is using, but judging by what they describe on their site, one can conclude that they’re just using simple univariate statistical methods, in which noise removal is not available.
Google researchers have recently published some new algorithms in dimensionality reduction (DR), which is the foundation of LSI, so one can infer that these dimensionality reduction techniques must have been implemented in their analytics products.
Google doesn’t say what they’re using, and neither does Hitwise, but one cannot escape the fact that Google researchers have published new algorithms in the domain of LSI (which are applicable to keyword noise removal or keyword feature selection). One then infers that their analytics must be more sophisticated than Hitwise’s.
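For readers unfamiliar with the technique named above, here is a small sketch of truncated SVD, the dimension-reduction step behind standard LSI. It uses NumPy on a made-up user-by-keyword matrix; it only illustrates the textbook method and says nothing about what Google or Hitwise actually run.

```python
import numpy as np

# Hypothetical user-by-keyword count matrix (rows: users, columns: keywords).
# Columns 0-1 stand for one topic, columns 2-3 for another.
counts = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])


def low_rank_approximation(matrix: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest singular values (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


# Reduce four keyword dimensions to two latent "topics".
reduced = low_rank_approximation(counts, k=2)
```

In the rank-2 result, the two keywords within each topic end up with identical values for each user, which is the sense in which the decomposition smooths over differences between near-synonymous terms.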
How do you track people such as me who block every tracker we know about?
Um, with trackers you don’t know about?
Hitwise do traffic sniffing at your ISP. You can’t prevent them from monitoring your traffic. You can obfuscate what you’re doing by using HTTPS where available.
Mike makes the same myopic “the US is the world” mistake with alarming regularity — at least once per year. It would be only mildly amusing if he wasn’t so arrogant and flippant about it, as if his word was beyond reproach.
Next time you want to claim someone’s stats are wrong Mike, I suggest (again, for the fifth time or so) that you check your own flawed assertions first.
Better yet, maybe you should just stop posting stories like this, since you get them wrong more often than not, and it undermines your own credibility.
As a former statistician I say they’re all flawed, all for their own reasons.
Anyone who puts their money on any one source as a basis for making real world business decisions is either delusional or they’re delusional and part of the old school belief that statistical extrapolation is a valid concept, which it is not given the global reach of the web combined with the flawed foundation in each statistical solution.
Google doesn’t count people who have javascript disabled.
Hitwise bases their info on extrapolation and if/then guessing.
Alexa relies on its core users – people who have the Alexa toolbar – and they pretend that if X number of their users go to a site, and if their user base is Y percent of the world of users, then Z must be how many people are really surfing to those sites.
They all spend a good amount of money on marketing spin to fake out people who think such stats are so important.
It’s just as bad with the sites themselves. We have X users. Well out of X users, how many sign up or visit once and never return? How many open a browser window, only to open another browser window or another program and leave the browser open in the background, while not actually being active on that page?
It’s all bogus and at most can help with generalized trending but only in limited ways.
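The toolbar-panel extrapolation described above (X panel users, a panel that is Y percent of all users, therefore Z total visitors) is simple arithmetic. The sketch below uses hypothetical numbers and a hypothetical function name; it is not any vendor's actual formula.

```python
def extrapolate_visitors(panel_visitors: int, panel_share: float) -> float:
    """Scale a panel count up to the whole population: Z = X / Y."""
    if not 0 < panel_share <= 1:
        raise ValueError("panel_share must be in (0, 1]")
    return panel_visitors / panel_share


# Hypothetical: 200 panel users visited the site, and the panel is assumed
# to be 1% of all Internet users.
estimate = extrapolate_visitors(200, 0.01)

# If the panel is really 2% of users, the same count implies half as many.
alternative = extrapolate_visitors(200, 0.02)
```

The estimate is only as good as the assumed panel share and the assumption that panel users browse like everyone else, which is the objection the comment raises.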
No matter what the statistics say, I vote for Google. Although Google may not count people who have disabled their JavaScript, I believe their “Trends” is what truly means “trends”, unlike Hitwise, which sells its analytics to paid users only. I believe what they count as data is what Google has already collected and what Google Analytics extracted first.
Trends on google. Yes.
Nice post! Also check out my blog/podcast at http://macmaniapodcast.com.
How can Hitwise make money if they’re supplying false data?
I would benchmark with Google too, possibly an opportunity for them on an enterprise level to take these companies on.
Why are we comparing apples to oranges and making a statement here?
1. The period of comparison seems fine, but what about region? Isn’t the Google Trends data heavily skewed towards the US, and is the representation here US or global?
2. If Hitwise’s claimed method of tracking through ISP logs is correct, then there is no way they can be wrong.
3. In certain countries search engine use itself is not more than 60% of Internet users, and Google may not be the leader in all countries.
4. Is Hitwise reporting only searches through search engines or all kinds of search?
The biggest issue around all this data accuracy is basically the difference in methodologies. We always end up comparing apples with oranges and then complain that they are not the same.
In fact, it seems to me that Google Trends doesn’t explain its methodology anywhere. For that matter, Google doesn’t explain most of the things they do… and we still believe that they are always correct. Definitely great brand building.
In this world there will always be people who like oranges and people who like apples… so all these analytics tools will have their own fan following.
Does anyone know which ISPs sell Hitwise their data? I doubt the telecom or cable guys allow this. Who’s left? Only Netzero?
EVERYONE IS MISSING ONE MAJOR THING.
In countries that are “new” to the internet, or users who are new to the internet, they use Google to direct to new sites.
For example, “most” US people will type http://www.facebook.com to visit facebook. Firefox will autopopulate and you hit enter.
Newer internet users (like my mom) will go to Google and type facebook, then click the Facebook result.
In my opinion, this adds uncertainty to Google’s data and makes it skew international.
It seems to me that one explanation is the types of users each search engine attracts.
Largely, Google users are slightly more sophisticated and therefore able to type “Craigslist.com” and “Facebook.com” into their browsers without performing a search.
People who use MSN as their homepage, or Live… generally have it set that way by default. My mom, for instance.
And when she wants to go to Craigslist? She searches for Craigslist in MSN and then clicks the link.
Heck, my grandmother has Live as her homepage and searches Live for “Google” – and then Googles things.
Just a thought.
I was an econometrician from MIT.
Mike’s right. Any collected data needs to be tempered by judgement, experience.
Don’t get lost in the details. It’s easy to let that happen.
Hitwise should make a public apology before losing global credibility.
Idiot.
Google is only trash
The guy writing this is doing his job. He says things that are obviously incorrect to generate comments to correct him. The comments make this website appear more active than it actually is, and then generates more page impressions, which ultimately leads to revenue.
Your article makes sense, but there is a possible exception: if Google is gaming results through a partnership with one firm or another to list them first.
Not suggesting this is happening, but it’s not beyond comprehension, for reasons of the same business logic you mention that Hitwise should apply to their results.
The idea of gaining reliable metrics on a decentralized platform like the web seems to me impossible.
I think it is worth bearing in mind that Hitwise only collects data from a selection of ISPs, so the data is going to be skewed slightly. A great example of this is that the Hitwise data very accurately matches our findings on some of our men’s lifestyle sites when compared to our internal analytics data, whereas the women’s lifestyle data for a specialist title of ours is out by quite a bit. This indicates that the market for the latter is probably not as well represented as the men’s lifestyle audience. This doesn’t mean that Hitwise is crap, but it could be indicative of the geography or behaviour of users at the ISPs Hitwise is using. It would certainly be great if subscribers had more transparency on which ISPs Hitwise was using (or even what the geographic split was!).
I would also be cautious about using the Google trending data as fact, since Google isn’t necessarily the most accurate either. If anything, Google’s methodology is similar to Alexa’s but on a much larger scale. Many a time I have had to gain insight on that same niche women’s lifestyle title and Google has been unable to provide sufficient data. Comscore is survey based instead of behaviour based, which is dangerous as it relies on how users perceive themselves rather than on actual behaviour. As I always tell clients and colleagues, this data is useful for fact finding and helping to provide insight into the market and behaviour off-site. The most accurate data one can use, and therefore rely on as fact, comes from an analytics tool (provided it is correctly implemented). Be careful slating these tools, as you will invariably need them to build your case in future for online advertising or media spend, and the data provided by these tools is far more insightful than anything available for television or outdoor display. The best thing to do is provide a footnote on the data source and methodology and use the intelligence as a starting point for further investigation or testing.
What nobody seems to have noticed is that when you chart using Trends or Insight you are not charting the term you type in, but every single term or phrase that includes the term or phrase you have typed in.
For instance, use Google Trends or Google Insight to compare the term: card, with the term: credit card.
Are this many people really searching for card? You think?
What you are actually comparing is the volume (Trends) or the share (Insight) of every search term with the word ‘card’ in it, versus every search term with the phrase ‘credit card’ in it. Because of the obvious overlap (i.e. card is also going to include all searches including the phrase: credit CARD), card is seen to have greater volume in Trends, and greater share in Insight.
Because of the way Google collects the data, you simply can’t make a comparison with data like Hitwise’s, which I have found useful at times, as it is charting the exact term you type in. In the example used here, Hitwise and Trends are charting totally different data.
If you then factor in that the Hitwise data presented here is US only and the Trends data is worldwide, what you have is a blog based on a type of lunacy rarely seen these days.
Everyone else, great debate!
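The exact-match versus containing-phrase distinction made in the comment above can be sketched on a toy query log. The log, the counts, and the function names are all hypothetical, and whether Google Trends or Hitwise count this way is the commenter's claim, not something confirmed here.

```python
from collections import Counter

# Hypothetical query log: query text mapped to number of searches.
query_log = Counter({
    "card": 10,
    "credit card": 60,
    "greeting card": 20,
    "sd card": 10,
})


def exact_volume(log: Counter, term: str) -> int:
    """Searches whose query is exactly the term."""
    return log[term]


def broad_volume(log: Counter, term: str) -> int:
    """Searches whose query contains the term as a whole-word phrase."""
    padded = f" {term} "
    return sum(n for q, n in log.items() if padded in f" {q} ")


card_exact = exact_volume(query_log, "card")
card_broad = broad_volume(query_log, "card")
credit_exact = exact_volume(query_log, "credit card")
credit_broad = broad_volume(query_log, "credit card")
```

On this toy log, "card" loses to "credit card" under exact matching but wins under broad matching, because every "credit card" search is also counted toward "card". Two tools using different matching rules can therefore rank the same terms in opposite orders.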
This is correct, the Hitwise data is looking at the single exact match terms.
Google Trends on the other hand looks at the “broad” range of terms within the main core term.
Hitwise is also tracking based on the number of “clicks” not impressions like Google Trends.
Did I mention that this is one of the worst researched articles I’ve had the displeasure of reading?
In February Compete had the right numbers for our site.
http://siteanal....com/?metric=uv
Maybe a one time happening.
You fail to mention that while all of these are flawed only Quantcast has taken the step of providing a tracking pixel so the measured data is accurate. Even LinkedIn now uses Quantcast.
Keep in mind, in the past what TechCrunch calls “Make Believe” turned out to be “spot on”:
http://www.tech...lieve-internet/
http://www.tech...gs-nothing-new/