If you’ve read this blog often enough, you’re aware that I’m a big fan of Quantcast. I’m also a fan of analytics software / providers and was excited to try out Compete’s new pay-per-usage analytics program. Compete now offers detailed data about websites or verticals – providing:

- Top Keywords, ranked by volume.
- Site Share, the percentage of all search referrals to a site that arrive via that keyword.
- Keyword Engagement, an index representing the average amount of time spent after entering the site via that keyword. 100 represents the term that resulted in the most time spent on the site.
- Keyword Effectiveness, an index that combines the total number of people referred by the keyword and the amount of time those people spent on the site. 100 represents the most effective term.
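To make the two indices concrete, here is a rough sketch of how a "100 = best term" index could be computed. Compete's actual formulas aren't described here, so treat everything below as an assumption: the `scale_to_index` helper, the sample numbers, and the choice of visitors multiplied by average time as the "combined" effectiveness measure are all hypothetical.

```python
def scale_to_index(values):
    """Scale raw per-keyword values so the largest one becomes 100."""
    top = max(values.values())
    if top <= 0:
        return {kw: 0.0 for kw in values}
    return {kw: 100.0 * v / top for kw, v in values.items()}


# Hypothetical sample: keyword -> (visitors referred, average seconds on site)
sample = {
    "brand term": (1000, 120.0),
    "niche term": (50, 300.0),
    "stray term": (200, 15.0),
}

# Engagement: average time on site only, scaled so the stickiest term is 100.
engagement = scale_to_index({kw: secs for kw, (_, secs) in sample.items()})

# Effectiveness (assumed): visitors * average time, scaled the same way.
effectiveness = scale_to_index(
    {kw: visitors * secs for kw, (visitors, secs) in sample.items()}
)

for kw in sample:
    print(f"{kw}: engagement={engagement[kw]:.1f}, effectiveness={effectiveness[kw]:.1f}")
```

Under these assumptions a low-volume term can top the engagement index while a high-volume brand term tops the effectiveness index, which is worth keeping in mind when reading the numbers further down.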

Conceptually, I love Compete’s model and would be willing to pay for usage… of course assuming that the data is accurate. I was skeptical at first considering that, unlike Quantcast and Google Analytics, Compete doesn’t have the most accurate form of data: embedded tracking on my site.

… My hypothesis was right. I pulled data on Techmeme and the results appeared fishy:

1. Techmeme (12.4% site share, 100% keyword effectiveness)… obviously
2. Fred Wilson Venture (2.6%, 0.14%)
3. Techmeme Mesh (2.5%, 17.5%)… obvious again
4. New coach signature faye wedge shoes slides sandals (2.4%, 3.4%)… wtf?
5. Techmeme.com (0.9%, 3.8%)

Would you pay for data that tells you, accurately enough, that your branded keywords are effective? Even more problematic is that the unbranded keywords (what you really want) are clearly way off. Fred Wilson is thematically relevant, but shouldn’t be the #2 keyword by volume (or any other measure) - if a blog really were the #2 keyword, it would be TechCrunch (which represents 8% of Techmeme headlines). The chart below not only shows the bizarre variety of keywords, it also shows the problem with including brand-related keywords in such data sets: they represent such a large portion of traffic that it becomes impossible to compare the remaining data.
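One way around the brand-keyword problem is to drop those terms and rescale what is left. Below is a minimal sketch using the five Techmeme site-share figures quoted above. The `non_brand_share` helper and the simple substring match on "techmeme" are assumptions for illustration, not anything Compete offers, and the rescaled shares are relative to this five-term sample only.

```python
def non_brand_share(shares, brand_terms):
    """Drop keywords containing a brand term, then rescale the rest to sum to 100."""
    kept = {
        kw: share
        for kw, share in shares.items()
        if not any(brand in kw.lower() for brand in brand_terms)
    }
    total = sum(kept.values())
    if total == 0:
        return {}
    return {kw: 100.0 * share / total for kw, share in kept.items()}


# Site share (%) for the top five Techmeme keywords listed in this post.
techmeme_shares = {
    "Techmeme": 12.4,
    "Fred Wilson Venture": 2.6,
    "Techmeme Mesh": 2.5,
    "New coach signature faye wedge shoes slides sandals": 2.4,
    "Techmeme.com": 0.9,
}

# Assumption: any keyword containing "techmeme" counts as branded.
unbranded = non_brand_share(techmeme_shares, brand_terms=["techmeme"])

for kw, share in unbranded.items():
    print(f"{kw}: {share:.1f}% of non-brand referrals in this sample")
```

With the brand terms removed, only two of the five keywords remain, and they can at least be compared against each other rather than being dwarfed by the brand term.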

I obviously don’t have access to Techmeme’s logs, so just to validate that this data is troublesome, I ran the same report on my blog. The results were better (probably because I have less traffic and data to work with) – but still raised some serious questions. I’ve charted out the top queries below and, for the most part, am impressed with the keyword portfolio (except for bizarre entries like “Ryan Spahn” and “gorilla convict blog”, #3 and #8) - but the keyword volumes are way off.

I applaud Compete for pushing metrics into new, potentially powerful fronts. Unfortunately, I’m not sure the data supports it yet. Additionally, Quantcast provides the same sort of keyword, audience and demographic data … with better accuracy… and for free. The first rule of thumb with analytics is that accuracy is critical. And if you’re charging for that data - it really needs to be accurate.