Only 52% of web analytics spend goes on staff

On average, only 52% of web analytics expenditure is spent on internal staff, a figure which has not changed since 2011. This is despite 40% of companies in 2011 having planned to increase their budget on staff to analyse web data, which highlights that finding the right people is proving a difficult challenge.

This craziness has to stop. For every $1 you spend on analytics tools, you should be spending $2 on staff. Minimum. Otherwise you might as well use free tools, despite their limitations.

What this stat shows is the effectiveness of analytics vendors' sales organisations. I've seen them in action and they're very impressive! One company I worked for spent 7 figures annually on its high-end analytics tool and had 1.5 analysts looking at the results and managing the implementation. Insanity!

The report brings up the skills shortage in our field, something I plan to blog about later. We need to get better, as an industry, at cross-training people. On that note, I'm currently hiring junior web developers to cross-train as web analytics specialists.

How can you reliably work out a new "visit" to your sites?

How can you reliably determine if a particular page view on which your analytics code is running is a new arrival at your site? This is a difficult problem and I can't see any easy solution, so the "Direct" visit source will always be overreported along with all the other sources of inflation for that source.

I'm currently using this logic:
  • URL doesn't have a campaign code
  • Referrer is either blank or is from a domain that isn't in your list of internal domains
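As a rough sketch, the two conditions above could be written like this. The function name, the "cid" campaign parameter and the domain list are all made up for illustration; substitute whatever your own campaign tracking and sites use. It implements only the two listed conditions and nothing more.

```javascript
// Sketch only: looksLikeNewArrival, campaignParam and internalDomains are
// hypothetical names, not part of any analytics library.
function looksLikeNewArrival(pageUrl, referrer, internalDomains, campaignParam) {
    // Condition 1: the URL doesn't have a campaign code.
    var hasCampaignCode = new URL(pageUrl).searchParams.has(campaignParam);
    if (hasCampaignCode) {
        return false;
    }
    // Condition 2a: the referrer is blank...
    if (!referrer) {
        return true;
    }
    // Condition 2b: ...or it comes from a domain that isn't in the internal list.
    var referrerHost = new URL(referrer).hostname.toLowerCase();
    return internalDomains.indexOf(referrerHost) === -1;
}
```

In the browser you'd call it with something like looksLikeNewArrival(location.href, document.referrer, ["www.example.com", "shop.example.com"], "cid").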

However, this fails for links between HTTPS and HTTP pages within your own sites. When someone goes from a secure page to a non-secure page, the referrer is wiped out.

The only alternative I can think of is setting a cookie, but that won't work in my case because we have to support multiple domains.

The gold standard would be to send your beacons to a third party, which could apply some kind of time-based test to "direct" traffic to determine if it's truly "direct". That won't work in my current architecture.

Any ideas?

Attribution is still the hot new thing

John Wanamaker is credited as saying:
"Half the money I spend on advertising is wasted; the trouble is I don't know which half."
The issue still keeps marketers up at night, and digital attribution rides in on a white horse promising to solve this problem. The basic idea is to follow an individual user throughout the journey to buying something, including every trackable marketing touchpoint. That is, track display impressions, clicks on ads, visits to the site, SEM and SEO visits, social media clicks.
From this barrage of information, you somehow work out a way to confidently say "this ad, in this location, to this segment is working". Most marketers are a long way from that.
A new Forrester report (funded by the Internet Advertising Bureau and a selection of attribution vendors, caveat emptor) surveys a bunch of "Marketing Executives" to see what the state of actual implementation is out in the real world.
There are some heartening results. If you thought you were behind the pack with simple last-click attribution, take comfort that 44% of the respondents aren't allocating any credit for conversions to any marketing channel! Terrifying eh?
It can be a daunting field to enter, and the temptation is to jump into the most complex approach first. That would be a mistake. You'll spend a lot of time, energy and money getting a complete implementation of one of the high-end tools, and the gains will only be incremental to what you can do yourself. If you're not already using at least last-click attribution to inform your media spending, how is a more complicated, harder to explain approach going to get you more airtime in those decisions?
The more advanced approaches use complex algorithms to decide how to allocate credit across all the different touchpoints. This is well worth exploring once you've exhausted everything you can get out of simpler models up to and including linear allocation. There's a lot of gold to be had in those discussions, and as your organisation learns to make more data-informed decisions, you'll find more scope to ramp up the complexity to make additional gains.
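To make the simpler models concrete, here's a minimal sketch of last-click and linear allocation for a single converting journey. The function names and the channel names are invented for illustration; this isn't how any particular tool implements it.

```javascript
// Sketch only: touchpoints is an ordered list of channel names for one
// converting user, oldest first. All names here are hypothetical.

// Last-click: the final touchpoint gets all the credit.
function lastClickCredit(touchpoints) {
    var credit = {};
    if (touchpoints.length > 0) {
        credit[touchpoints[touchpoints.length - 1]] = 1;
    }
    return credit;
}

// Linear: every touchpoint gets an equal share of the credit.
function linearCredit(touchpoints) {
    var credit = {};
    var share = 1 / touchpoints.length;
    touchpoints.forEach(function (channel) {
        credit[channel] = (credit[channel] || 0) + share;
    });
    return credit;
}

var journey = ["display", "search", "email", "search"];
console.log(lastClickCredit(journey)); // all credit to "search"
console.log(linearCredit(journey));    // a quarter of the credit per touchpoint
```

With this journey, last-click hides display and email entirely, while linear gives them a quarter each and search a half. That difference is the conversation starter.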
One of the more interesting quotes from the report comes here:
“There’s no attribution approach that is 99.9% right, and it’s not coming along. But an inability to measure everything is not an excuse for not trying. You can measure a lot even with basic [fractional] attribution, and there’s a lot of improvement you can make.”
The report is well worth a read. Make sure your boss reads it too.

I'll post something soon about the multiple methods of attribution you can implement right now, using nothing more than JavaScript and out-of-the-box Omniture SiteCatalyst.

I'm hiring: Online data analyst

I'm hiring for an analyst in my team at Vodafone. We're an Omniture installation with some big advantages:

  • A mandate to push through big improvements.
  • Supportive, data-focussed management hierarchy.
  • One of the cleanest, most consistent Omniture implementations I've seen.
  • And, of course, you get to report to a smart boss who intimately understands web analytics.

This role will initially be a bit of a report monkey job, but we'd love to automate away all the repetitive pieces so we can all focus on the more interesting work.

Check out the job description and apply on the Vodafone careers page.

Yes, this blog is back!

And yes, by the way, I'm back posting on my web analytics blog. I took a break for a while when working for Datalicious, as all my thoughts on web analytics went to the company blog. Now I'm again able to share my thoughts here.

I'm also planning to get Web Analytics Wednesday happening regularly in Sydney. If you'd like to sponsor it, please get in touch.

What's missing from Google Analytics?

While looking at the pricing of our analytics service, my boss asked why we couldn't use Google Analytics and I've had a bit of a think about it. On the surface there's not a huge amount between the two really. Google Analytics has a great user interface, is well understood by developers and does almost everything right. But there's this one thing that is a show stopper for most of the places I've ever worked.

The problem is to do with how Google Analytics sets the cookies that identify visitors. Most web analytics tools set this in the HTTP header of the response from the data collection server. In the diagram below we see a company that has two domains. The cookie is set in the "" domain as a third-party cookie in the response the first time the visitor sends a data collection beacon. The upshot is that when the visitor goes to the "", that cookie identifying the visitor is also sent to the data collection server.

By contrast, the way Google Analytics works is to set a first-party cookie in JavaScript on the current domain. That means when a visitor goes to another domain, that cookie isn't available and so the visitor identifier is different, as you see below.

Yes, Google and third parties provide a few workarounds for this. They either don't work in all browsers or rely on the visitor going between the domains by clicking a URL that embeds the visitor identifier. If you want to see the overlap between two only-slightly related domains, this approach just isn't going to work. And if you're unable to pass the user through via an HTTP GET instead of a POST, you're out of luck for the most used browsers.

This is a really strange limitation in GA. The only reason I can imagine for it is that it makes the data collection servers much simpler and thus more easily deployed in the Google server architecture. To collect data the way Omniture, WebTrends et al do it, they'd need to be setting and refreshing unique identifiers on every data collection. Not ridiculously complicated, but a specialised data collection server which I understand is hard to get deployed in the core Google infrastructure.

The scale of this problem is huge. I haven't worked for a single company doing web analytics where the company has only one web site. You end up having extra domains for historical reasons of someone working around the domain names gatekeepers, or extra brands the company owns or acquires, and all kinds of reasons. This even happens in relatively small companies.

It's a big deal and until Google fixes it, it's going to continue to be a major limitation of Google Analytics.

How widespread is IPv6?

I've recently been asked about our tool vendor, Omniture, and their support for IPv6. It seems they currently don't support it, but are working towards it.

Does anyone know the current proportion of mainstream traffic coming through IPv6?

There's a bunch of data you'd lose with people coming from IPv6 endpoints. Most critical would be any Geo-IP mapping you're doing. If you're using IP addresses for visitor identification (and please don't!) you'll also have problems.

Omniture: execute plugin code only on page views

Omniture uses the term "plugin" to refer to a few different things. In this example, I'm talking about the plugins placed in the s_doPlugins function in your s_code.js file. This area is run whenever Omniture code is called, whether by an s.t() page view call, a link call or something like ClickMap.

My problem was that I wanted to trigger some code to run only when there's a page view and not in any other circumstances. Useful, for example, if you want to fire off something to another analytics system only for page views, or you've got some code that only makes sense for pages.

The trick is to test for the value of s.eo, which contains the object associated with the link. So when you make a link call and pass in the linked item (usually with "this"), it'll become the value of s.eo. Page views don't have this link, and items that don't define it end up with a value of "0". That means this is an effective test to see if the current call is a page view.

Anyone know a better way to do this? Perhaps a way to definitively differentiate between the different types of call?


function s_doPlugins(s) {
        // Only run this code on page views: s.eo is only set for link-type calls
        if (s.eo === undefined) {
                // Your code goes here
        }
}


Video performance reporting

We were asked if there was some way to measure how users were experiencing video playback. That is, how long do people spend buffering, and how often does the player run out of video and have to rebuffer? We're using Omniture SiteCatalyst for this, and already do the standard video reporting.
It's important to get this kind of information from the client side. We can do server-side reporting, but all we'll know from that is how many times the files were requested. The approach we took gives us information direct from our customers, so we can quantify what they actually experience.
I started out by creating an eVar (conversion variable) for the video identifier and another for the video player. Handily this is also what we'll need to do to start using the SiteCatalyst 15 video solution, which saves us one task there. Another eVar captures the buffer time. I also created four new custom events to capture the different steps in video playback. These events are recorded alongside the eVars and so we can report by video identifier (and its classifications) or player.
When the user first requests the video, we send the "Video request" event. When the video has finished buffering, we send the "Video first frame" event, alongside the number of milliseconds spent buffering, rounded to the nearest 100 milliseconds.
If, during playback, the player runs out of video and has to start buffering, we send the "Buffer underrun" event. Again when the video starts playing again, the "Buffer underrun first frame" event is sent, alongside the rounded buffering time.
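Here's a rough sketch of that event sequence. The tracker object, its method names, and the send and now callbacks are all hypothetical; in practice send would wrap whatever call you use to fire the custom event and buffer-time eVar, and the player would call bufferStart and bufferEnd from its own buffering hooks.

```javascript
// Round a duration to the nearest 100 milliseconds.
function roundToNearest100(ms) {
    return Math.round(ms / 100) * 100;
}

// Sketch only: createBufferTracker, send and now are made-up names.
// send(eventName, bufferMs) stands in for the real analytics call;
// now() returns the current time in milliseconds.
function createBufferTracker(send, now) {
    var bufferingSince = null;
    var started = false; // true once the first frame has been shown
    return {
        // Called on the initial request, or when the player runs out of video.
        bufferStart: function () {
            bufferingSince = now();
            send(started ? "Buffer underrun" : "Video request", null);
        },
        // Called when playback starts or resumes.
        bufferEnd: function () {
            if (bufferingSince === null) {
                return;
            }
            var elapsed = roundToNearest100(now() - bufferingSince);
            send(started ? "Buffer underrun first frame" : "Video first frame", elapsed);
            started = true;
            bufferingSince = null;
        }
    };
}
```

Passing in now as a parameter is only there to make the sketch easy to exercise with a fake clock; in a page you'd pass something like function () { return new Date().getTime(); }.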

So now we've got our eVars for video ID, video player and buffer time, alongside these video events. We can report on any of these events happening with those eVars.
A simple report looks something like this using a simple "buffer underruns / first frame" calculated metric to show us the videos that people are finding most problematic.

The problem is that the "worst" videos according to this report are viewed by a minuscule number of people. Some poor sucker has been desperately trying to watch VOD:37367 on his dialup connection, with 521 buffer underruns. That doesn't sound like much fun, but it doesn't really help us optimise our service for the majority of people.
So I came up with some simple calculated metrics that insert a modifier to show us poorly performing videos, but weighted towards ones where the poor performance is widespread.

So now we can see there are a few videos that are problematic, and for a reasonable number of people. We can start optimising and seeing if we can improve this experience.

That calculated metric:
[Video buffer underrun] / [Video first frame] * ( [Video first frame] / [Total Video first frame] ) * 1000
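
For anyone who wants to sanity-check the numbers outside SiteCatalyst, here's the same calculation as a small sketch. The function name is made up, and apart from the 521 underruns mentioned above every figure in the example is invented.

```javascript
// Sketch only: weightedUnderrunScore is a hypothetical name.
// underruns and firstFrames are for one video; totalFirstFrames is across all videos.
function weightedUnderrunScore(underruns, firstFrames, totalFirstFrames) {
    if (firstFrames === 0 || totalFirstFrames === 0) {
        return 0; // nothing played, so nothing to score
    }
    var underrunRate = underruns / firstFrames;        // how bad the video is
    var shareOfPlays = firstFrames / totalFirstFrames; // how many people it affects
    return underrunRate * shareOfPlays * 1000;
}

// One viewer with 521 underruns, out of an invented 100,000 plays site-wide.
console.log(weightedUnderrunScore(521, 1, 100000));
// An invented popular video: 2,000 underruns over 10,000 plays.
console.log(weightedUnderrunScore(2000, 10000, 100000));
```

The lone dialup viewer scores roughly 5 while the popular video scores roughly 20, so the widely felt problem now ranks higher. Worth noting: the per-video first frame counts cancel out, so the score works out to that video's underruns per 1,000 first frames across all videos.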