Mozilla Stealth Data Project Could Be Just What The Internet Needs
by Michael Arrington on May 13, 2008

One of the most frustrating parts of my job is finding reliable traffic and other usage data about websites.

But today, Mozilla CEO John Lilly and VP Engineering Mike Schroepfer said they may fix that problem in the future, via the massive installed base of Firefox users.

The State of Analytics Today

There are three ways to measure web traffic.

The first is user-focused and based on software installed on user machines. Services like Alexa and Compete get users to install software on their computers and then track surfing habits to come up with best guesses on Internet-wide traffic. It works in theory, but getting enough users to get statistically relevant results has proven challenging. Alexa is famously flawed, and while Compete seems to be somewhat better, it only tracks U.S. users. Comscore is another user-focused metrics company that tends to work well for large sites, not well at all for newcomers (and it is very expensive to access their database).

A second way to determine site usage is to track traffic directly from websites. Quantcast combines user surveys with direct tracking on websites (when they can get it) to estimate traffic. Comscore also does this with certain sites.

The third way is to track surfing behaviors via records from ISPs. Hitwise uses this method to provide web analytics to clients.

None of these services are particularly accurate (as can be seen by the fact that they almost always disagree with each other). The problem is simply gathering enough data from enough users to be able to draw a picture-perfect image of actual Internet usage. That’s why I’ve called for Google to let users make their Google Analytics data publicly available. Would many people do it? Just the ones that want us to trust the user numbers and page views they claim.

How Firefox Could Fix The Problem

The product is still very early, say Lilly and Schroepfer. In fact, it doesn’t have a project name within Mozilla – they simply refer to it as “Data.” But the idea is fairly straightforward. Ask Firefox’s 170 million (and growing) user base if they would like to opt in to anonymous data collection on their surfing habits. Then take that anonymized data and create very statistically relevant analytics reports for all websites.

Only a small percentage of those 170 million users would have to agree to be tracked (Lilly said 1% is more than enough) to get useful data. There are Firefox users in every country, and the distribution is fairly attractive for worldwide analytics tracking. Only 29% of Firefox users are in the U.S. 13% are in Germany, 6% in France, 4% in the UK, and so on. Firefox is now available in 50 different languages.
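
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above and assumes, purely for illustration, that opt-in rates would be the same in every country. The function and variable names are made up for this example and are not anything Mozilla has described.

```python
# Figures quoted in the article. The 1% opt-in rate is the threshold
# Lilly called "more than enough", not a measured rate.
FIREFOX_USERS = 170_000_000
OPT_IN_RATE = 0.01

# Share of Firefox users by country as quoted above (other countries omitted).
COUNTRY_SHARE = {"US": 0.29, "Germany": 0.13, "France": 0.06, "UK": 0.04}


def panel_size(users: int, opt_in_rate: float) -> int:
    """Number of users who would end up in the opt-in panel."""
    return round(users * opt_in_rate)


def panel_by_country(users: int, opt_in_rate: float, shares: dict) -> dict:
    """Split the panel by country.

    ASSUMPTION: users opt in at the same rate everywhere, which may not hold.
    """
    total = panel_size(users, opt_in_rate)
    return {country: round(total * share) for country, share in shares.items()}


if __name__ == "__main__":
    print(f"Panel size: {panel_size(FIREFOX_USERS, OPT_IN_RATE):,}")
    for country, size in panel_by_country(
        FIREFOX_USERS, OPT_IN_RATE, COUNTRY_SHARE
    ).items():
        print(f"  {country}: {size:,}")
```

Under those assumptions, a 1% opt-in rate works out to about 1.7 million users worldwide, of whom roughly 493,000 would be in the U.S.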

Of course, this would track only Firefox users, not IE, Safari, Opera and other browsers. And Firefox users as a group may have different surfing habits than the Internet as a whole. But as Firefox usage grows more mainstream, this will become less and less of a problem. Mozilla estimates that they now have 18% market share across all browsers.

If and when this launches, it would likely be the most reliable public traffic and usage data available. Let’s hope they do launch it, and soon. I’ll be the first to sign up.

Responses

  • well stats from mozilla would be more accurate for sites with tech-savvy traffic, like techcrunch and digg. It definitely won’t work for general sites where more than half of traffic browses using IE.

  • Brilliant.

    However, there’s one method left:
    Operating system.

    I wonder when this will happen.

  • That would be a pretty good business model. Hitwise makes a lot from selling their data.

  • @David Whittle

    You forget the option of reading the users’ minds directly – to work out what sites they browse, and even what sites they think about.

    Only hurdle to this is tinfoil hats. Damn tinfoil hats.

  • This is another example of how the browser can become the “center of universe” for users and business alike.

    With all the talk about Google (with iGoogle) and Microsoft trying to create stuff similar to FriendFeed, etc – I think the browser is the best place to start.

    Let’s face it. The first thing you see when you want to use the Web IS your browser. The trick to make it work is to make the browser experience more user friendly and acceptable to users with all these “social/business” features.

  • Elvris is right …. We need Microsoft …. ha ha ha did I really just say that? How about Microsoft and Mozilla working together …. I’m killing myself … I’m so funny!

  • How is this any different from Alexa? You still get a self-selecting group of users providing the data and that means you’re vulnerable to webmasters manipulating the results by constantly visiting their own site.

  • Agree with Monty. This is not different from the first approach you have mentioned, the only positive is you might get a larger sample size. But then, is that sample representative of the entire internet population – is the big question. Alexa is “famously flawed” because of this reason.

  • I’m sure Firefox is more mainstream than the Alexa toolbar, I’d definitely opt-in. I also wouldn’t mind if my GA stats were publicly available.


    http://crunchlabz.com

  • I have yet to hear about a Fortune 500 company that allows its employees to use a browser other than IE.

    I’m starting to notice a little twitter-mac-firefox navel-gazing from TechCrunch, perhaps it’s time to remind the editors that an overwhelming majority of people still use IE on Windows (and have never heard of, and will never use, Twitter).

  • Most services I’ve seen are just basic. I mean that they still use univariate (descriptive) statistics to analyze web data. Perhaps the advanced analytics is just around the corner for those services mentioned in this article. There are other advanced tools for web analytics out there that use multivariate analysis (machine learning), such as ones from SPSS and SAS, but they’re quite expensive to buy a license from them, since they target enterprises.

    M said…
    “But then, is that sample representative of the entire internet population – is the big question. Alexa is “famously flawed” because of this reason.”

    One way to improve this if the dataset is small and not representative of the whole population is to use statistical bootstrapping (repetitive sampling with replacement).

  • I think it is great to have a reliable traffic usage data system, Thanks for sharing. Great post.

    thanks !

  • I hope the project goes forward, we do need a better traffic ranking system.

  • @ Original

    Many US Gov orgs let users install FF. Granted they have to request it, but it is a start. I use it at work and home. Done with IE forever. :-)

  • Why would people opt in to these statistics if they care at all about their privacy?

    IMHO the main reason would be promoting their websites. So, if you’re a small social site you’d ask all your users to opt in and if you’re a big newspaper you’d never do that. That way, the balance will shift.

    And FireFox is much better than IE in most aspects except compatibility with badly written websites.

  • Interesting indeed, browser manufacturers are the perfect candidates for traffic measurement and stats solutions, where BTW, Microsoft’s IE is still dominant by far and can easily beat this initiative if they really want to.

  • This would be immense. I tend to use Compete more than Alexa but both of them can be fairly flakey.

    Online advertising (particularly long tail advertising) depends heavily on the ability to track eyeballs. Making this more transparent would be better for everyone in the industry.

  • This is like saying FireFox could fix the problem by bundling Alexa by default.

    I also can’t bear to think what the opinions would have been had this been a Microsoft initiative.

  • google has another source you didn’t mention. They could simply add tracking to their browser toolbar like the others but have a much better spread of traffic.

    yahoo and google could basically compete with Compete and Alexa with ease and have more varied people using it.

  • it’s a very good idea, but how do they deal with the cheating?
    alexa is too easy to be broken…

  • “Let’s hope they do launch it, and soon. I’ll be the first to sign up.”

    Isn’t this exactly the reason it’s nothing different from Alexa? The statistics will still be drawn from a biased ‘techy’ userbase including people like you and me, but ruling out people like my dad and my aunt who have never heard of Firefox or who don’t believe it can be beneficial to them.

  • This would be an amazing source of data but unless an ordinary user has some reason to share their clickstream, it would have the exact same flaws as the Alexa toolbar now: heavy webmaster/developer/tech skew.

    That being said, Mozilla could make an incredible amount of money licensing that raw data.

  • What are the advantages to Mozilla of undertaking this initiative? Are they going to start charging for this statistical data? More important, why should the user allow such a plugin in their Firefox browser? Clearly there are no incentives.

  • Also, if this works out, MS will immediately do this on their IE which is more popular and more mainstream.

  • Techcrunch used to put a public Sitemeter on all of its pages and made the stats public – as of two years ago – you removed the stats from all the newer pages and have NOT made your Google analytics or other stats public

    How can anyone have the audacity to complain and call for public Google analytics – WHEN you HIDE your own traffic?????

    Being at number 2 on Technorati – you must have impressive stats

    http://www.site...s=s26techcrunch

    If you are getting this much traffic from two year old posts – imagine how extreme your traffic is now

  • So by “just what the internet needs” you mean you and advertisers?

    And can we trust Mozilla that this would ALWAYS be optional? In the same way as options for blocking 3rd party cookies, or the modest “awesomebar”?

  • Porn sites and security / privacy sites would be under-reported, as those users would be less likely to opt in.

  • Interesting news, but I can hardly believe it.

  • Great idea! However I am not sure how FF would be able to get something accurate up and running. There are just too many variables that play into data collection and I highly doubt that Mozilla has much experience in this field. (Just look at Alexa, which has been doing this for 10 years and still everybody is unhappy with it.) I am also not sure how many users would voluntarily opt in to submit their browsing data to a third party.

  • Sheet. Why didn’t I think of this? Oh, yeah. Opt-in always scares me as a business model but that group could pull it off. Also, the bandwidth and data storage for such a feat requires architecture that this lowly asian kid cannot afford at this time.

    Also, why doesn’t Google Analytics just anonymously aggregate its users’ data for statistical purposes? I am sure they are doing something like this internally already in some shape or form. They could easily compete with Compete, Comscore, etc. Or make it free! :-)

    Harry “if only I had more mulah (and time)” Wang

  • You might want to edit your post because it has a big error:
    Compete is using a mix of data sets.
    ISP data is one of the data sets and it has a big impact on the score.

  • my mind thinks advertising both needs this, and is afraid of it …. afraid, because what if the emperor really has no clothes, needs, because nothing is very quantifiable at the moment

    this ad game thing is going to be very interesting to watch with more data… my bet, collapse of business as we know it

  • Relying on a sample of self-selected users to provide an accurate picture of the entire population is very flawed, regardless of how it’s prettied up. There is no way that this is better than the ISP traffic monitoring route.

  • If somebody offers some money for our attention (user data is now technically called attention) many will be happy to share their data. For years I have been using Alexa and, for a while, A9, now Google History, and many other things I have come across, but so far no one has offered anything in return for my valuable data. Some of the concepts of attention are well discussed at http://www.attention.org

  • I would never use it and I use FF daily.

    My seven year old would never use it, he uses IE.

    My mom and dad would never use it, they use Safari.

    Would provide accuracy in a very narrow audience.

  • So if 1% of 170 million is accurate enough, i.e. 1.7 million, that is less than Hitwise, which has a sample of 25 million. Admittedly, they have less global representation, but I found it to be more accurate than anything else on the market. I can see a huge bias towards the web dev community, who would be using Mozilla instead of IE, so that would present its own problems, as surfing habits would be different. Time for Hitwise to open up offices all over the world!

  • We’ve noticed some flaws with Google Analytics with regards to reality every once in a while, so we have (on a few pages) two different analytics to compare and they don’t even always match. None of these are perfect.

  • By asking for permission, they’ve already skewed their data. But, this data will still be better than what we have today.

  • There is an opportunity here to aggregate all the different sources. Isn’t Google starting to show comparison stats? What if you took Compete, Alexa, Google (Urchin) and now FF and put them together into a metric that is more accurate than the individual parts?

  • I, too, wish Google Analytics would allow us to selectively publish public data. I’d do it in a heartbeat, and our business isn’t even traffic based!

    Compete’s data, at least in our corner of the universe, is pretty dang bad. They’re off by a pretty large multiple. We’re Quantified, so Quantcast’s data is remarkably close to Google’s – so you can see first hand just how off Compete is.

  • Agree with #35 and #40: self-selection skews the data. That’s statistics 101.

    What is the motivation of Mozilla? Is this supposed to become a revenue stream that makes them depend less on Google? How much would it cost? Can the revenue from such a service ever become more than 1% of what Google is paying Mozilla to include Google Search?

    If they are _not_ planning to make this a business: Why bother? The privacy aspects are going to hurt them immensely in Europe, in spite of opt-in. I already see the headlines about “backdoors” to start hidden tracking or spouses finding people’s porn statistics. IE is never going to have something like this built in (because it is not acceptable in an enterprise world), and would therefore be called “more privacy-friendly” than FF.

  • I see it as maybe an open platform (focusing on Firefox at the start) for data accumulation. Maybe they (or others) could then establish add-ons/plug-ins for other platforms (IE, mobile devices, etc.). Keeping it aggregate based, they could use opt-out instead of opt-in with a pop-up upon first load if it is installed along with the container software.

    Harry “that would be sweet but think of the web traffic implications” Wang

  • I would allow this as well, as long as I could then get access to the data. Imagine the business opportunities of aggregating this data and parsing into usable reports. Of course Hitwise and Comscore would be hurting, but…

  • Who would this be for? Website owners (who are reasonably well-served by web analytics tools like Google Analytics and Omniture)?

    You’re still dealing with non-probability sampling of a group (i.e. Firefox users who decide to opt-in to Mozilla’s panel) that may or may not be representative of the internet population at large.

    This could potentially represent a step forward but details seem pretty scarce at present.

    Making Google Analytics data publicly available is not a good idea at all IMO. It’s incredibly easy to inflate your page view totals by making minor changes to your Javascript tags (e.g. by tagging events as page views).

  • Well great, I already wondered what’s coming next to good old “GoogleFox”!

    Since Google strengthened their influence at Mozilla by hiring Firefox developers and funding the foundation, Firefox is going down.
    The only reason left to use it is AdBlockPlus!

    I think it’s really time for a fork: Firefox minus Google spyware!

  • Good stuff. I was an early contributor at Compete.com, so I know this space well. The most striking statement was Lilly saying, “1% is more than enough”.

    Just a heads up, but 1% is not enough. 1% of 170M gives you 1.7M. That’s 300K less than Compete’s 2M and about 500K more than comScore (don’t quote me on comScore as I haven’t followed comScore in awhile). Then take into account that Mozilla’s 1.7M is distributed all over the world vs. Compete’s 2M concentration in the U.S.

    Net result = Mozilla produces increasingly flawed stats, with less data, spread across more countries with an obvious browser bias. I suggest Mozilla concentrate on what it does well (i.e. build great browsers) and reach out to Compete, comScore, Hitwise and share its data to help improve existing solutions. (Hint: You’ll make more $$ licensing the data than creating an Alexa 1.5).

  • I’ve never seen you get anything so wrong Michael.

    Study after study, and anyone running analytics on a monetized site can confirm, Firefox users are completely different than the average (read: IE) surfer. This is just another heavily skewed, virtually useless data point.

    I would think you’d know better.

  • this is great news and all, and def a step in the right direction, but it still comes down to the fact that the numbers i would get from this have no depth behind it. The major players (comscore/nielsen) are very highly priced because they work to maintain a balanced panel of people (or at least so they say) – compete and alexa can all be ‘gamed’ to make your site look better (opt-in panel software). the key thing that i would love to see from this is if mozilla can somehow extrapolate demographical data from this data (cause we all know that 1 unique is not the same across all points of the internet).

    if anyone wants to have a further discussion about this, def hit me up thru email – i love talking about this
