tag:webanalyticsinpractice.com,2013:/posts Web analytics in practice 2017-03-05T20:59:57Z Simon Rumble tag:webanalyticsinpractice.com,2013:Post/1051556 2016-05-16T06:17:34Z 2016-05-16T06:17:34Z Uploading an SSL certificate to an AWS load balancer

So you've got an SSL certificate for the domain name you want to use to collect data and you want to use it. How do you do that?

  • Open the AWS console
  • In the top-left select Services > EC2
  • Click "Load balancers"
  • Select the load balancer and click [Actions] > Edit listeners
  • Add a listener for HTTPS (port 443)
  • Click Change under "SSL Certificate"
  • For Certificate Type select "Upload a new SSL certificate to AWS Identity and Access Management (IAM)"
  • Give the certificate a name
  • Add the pem-encoded Private Key, Public Key Certificate and then Certificate Chain.
    • The Private Key was created when you made the Certificate Signing Request you sent to the SSL certifier.
    • The Public Key Certificate is what your SSL certifier sends back to you.
    • Certificate Chain defines the signatures between your certifier upwards to the root certificates for SSL. Search your certifier's site for "intermediate certificate".
  • All these things need to be "pem encoded": plain text that starts with a "-----BEGIN" line. If your certificate doesn't look like that, convert it with the OpenSSL command-line tools.
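
A quick way to tell whether something is PEM encoded is to look for the BEGIN/END armour lines around a base64 body. The sketch below is a rough sanity check, not a validator, and the function name looksPemEncoded is made up for illustration. It will also reject encrypted private keys that carry extra header lines.

```javascript
// Hypothetical helper: returns true when the text has matching PEM armour
// lines, e.g. "-----BEGIN CERTIFICATE-----" ... "-----END CERTIFICATE-----".
// It does not verify that the base64 body is a valid certificate or key.
function looksPemEncoded(text) {
  var pattern = /^-----BEGIN ([A-Z0-9 ]+)-----\r?\n[A-Za-z0-9+\/=\r\n]+-----END \1-----$/;
  return pattern.test(text.trim());
}
```

If the check fails on a binary (DER) certificate, `openssl x509 -inform der -in cert.der -out cert.pem` is the usual conversion; the file names here are placeholders.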
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/880825 2015-07-14T02:22:23Z 2015-07-14T02:22:24Z How much faith should you put into Doubleclick data in GA?

(Hint: Not a lot)

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/680039 2014-04-21T22:31:51Z 2014-07-30T03:06:56Z What's worse than a pie chart?

Just got this in an email from Adobe.

Impressive. I guess the inner circle is Device Type, Middle is Device and outer is Operating System Version? Three layers of idiocy. Idiocy³ then?
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/676107 2014-04-11T05:16:52Z 2017-03-05T20:59:57Z beaconWatch: Prototype of automated testing for web analytics

I've finally got my tool for testing web analytics beacons running reasonably well.

What it does is open a browser via Selenium, spin up a proxy server and tell Selenium to use it, then browse to your URL. The proxy parses out the URLs of beacons according to some basic rules.
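
To give a feel for what "basic rules" can mean, here is a minimal sketch that matches request URLs against one pattern per vendor. These patterns are my own illustration and are not the rules beaconWatch actually uses; BEACON_RULES and findBeacons are hypothetical names.

```javascript
// Illustrative rules only; not the rules beaconWatch actually ships with.
var BEACON_RULES = [
  { vendor: 'Omniture SiteCatalyst', pattern: /\/b\/ss\// },
  { vendor: 'Google Analytics (classic)', pattern: /\/__utm\.gif\?/ },
  { vendor: 'Google Analytics (Universal)', pattern: /google-analytics\.com\/(r\/)?collect/ }
];

// Return every URL that matches a rule, tagged with the vendor name.
function findBeacons(urls) {
  var found = [];
  urls.forEach(function (url) {
    BEACON_RULES.forEach(function (rule) {
      if (rule.pattern.test(url)) {
        found.push({ vendor: rule.vendor, url: url });
      }
    });
  });
  return found;
}
```

Feeding it the list of URLs the proxy saw gives you something you can assert on, for example "exactly one Omniture beacon fired on this page".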

It's very rough, but I want to get the concept out there and hopefully better developers than I can help me make it suck less!


Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/664558 2014-03-17T05:45:17Z 2014-03-17T05:45:18Z Stripped referrer header on iOS Facebook app

I worked with Jethro recently to diagnose an odd issue we had with Facebook referrers across a couple of publications. One of the publications came through, as expected, with Facebook referrer mostly intact and our analytics tools picked it up as social referrals. The other came through with a huge chunk of "Direct", when clearly (judging from the campaign code) it wasn't direct. What was going on?

We only worked out the cause by hooking up an Android and iOS device to proxy through Charles on a wireless network and inspecting the traffic as it flowed through.

What we discovered was interesting. Facebook links that bounce through bit.ly will lose their referrer header on iOS clicks made from the Facebook application. Facebook actually goes to some lengths to pass through referrers without leaking identifiable information (try not to snigger at the line "As part of our continued efforts to protect users’ privacy" from Facebook) but it seems Safari doesn't cooperate when it's redirected through something like bit.ly.

Moral of the story: obviously, always use a campaign code and (for non-GA) remap your sources based on what is in the campaign code. Also, avoid things like bit.ly if you get a lot of traffic from the Facebook app on iOS.
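
Remapping sources from the campaign code might look something like the sketch below. Everything specific here is an assumption: the "cid" parameter name and the "channel-source-..." code convention are invented for the example, so substitute whatever your own campaign codes look like.

```javascript
// Hypothetical convention: campaign codes look like "soc-fb-2014-03-launch",
// where the first two segments name the channel and the source.
var CAMPAIGN_PARAM = 'cid'; // assumed parameter name

function getCampaignCode(pageUrl) {
  var match = new RegExp('[?&]' + CAMPAIGN_PARAM + '=([^&#]*)').exec(pageUrl);
  return match ? decodeURIComponent(match[1]) : null;
}

// Prefer the campaign code; fall back to the referrer; otherwise direct.
function resolveSource(pageUrl, referrer) {
  var code = getCampaignCode(pageUrl);
  if (code) {
    var parts = code.split('-');
    return { channel: parts[0], source: parts[1] || 'unknown', via: 'campaign code' };
  }
  if (referrer) {
    var host = /^https?:\/\/([^\/?#]+)/.exec(referrer);
    return { channel: 'referral', source: host ? host[1] : 'unknown', via: 'referrer' };
  }
  return { channel: 'direct', source: 'direct', via: 'none' };
}
```

With this in place, a click from the iOS Facebook app that arrives with no referrer still gets classified as social, as long as the link carried a campaign code.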

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/631171 2013-12-16T22:34:24Z 2013-12-16T22:34:24Z Data layer standard released

After over a year of work, the W3C Customer Experience Digital Data Community Group has released the Customer Experience Digital Data Layer specification document. This is about as close as we're likely to get to a standard in the web analytics space. It's been a long slog and much of the credit has to go to Viswanath Srikanth at IBM for herding the cats to get the job done.

This is an exciting time in web analytics. A data layer standard allows our industry to move to the next layer of abstraction. Once this gets implemented, we'll be able to focus on more interesting things than basic implementations. For example, with the common standard you shouldn't need to do anything special for regular things like an ecommerce implementation. Shopping cart software vendors will implement the data layer once, and you just pull what you need from there in your tag manager.

There's still much to be done, in particular the data layer helper library being built by Brian Kuhn at Google. The current data layer is static, rendered with the page at page load time echoing old school web analytics.

Brian's helper library makes the data layer dynamic, using the Google Analytics queue mechanism to enable changes to be made during the lifetime of a page, essential for modern web applications. The crucial change here is that while Google Analytics replaces the push() method of _gaq once loaded and handles everything itself, the helper library allows multiple listeners to register and be notified of any new updates to the data layer. So once an update occurs to the data layer, multiple analytics tools or tag managers are able to react to the change and do things. Very cool.
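
The multiple-listener idea can be sketched in a few lines. To be clear, this is not the helper library's code or API, just my own illustration of the concept; makeObservable and addListener are hypothetical names.

```javascript
// Hypothetical sketch; names are illustrative and not the helper library's API.
function makeObservable(dataLayer) {
  var listeners = [];
  var originalPush = dataLayer.push;

  // Replace push() so every new entry is stored and then announced
  // to every registered listener, not just one tool.
  dataLayer.push = function () {
    var entries = Array.prototype.slice.call(arguments);
    var result = originalPush.apply(dataLayer, entries);
    entries.forEach(function (entry) {
      listeners.forEach(function (listener) { listener(entry); });
    });
    return result;
  };

  return {
    // Register a listener; entries pushed before it registered are replayed.
    addListener: function (listener) {
      dataLayer.forEach(function (entry) { listener(entry); });
      listeners.push(listener);
    }
  };
}
```

Replaying earlier entries matters because tools load asynchronously: a tag that registers late still sees what was pushed before it arrived.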

It's a great day. Really excited and planning my first implementation right away. 

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/578057 2013-05-08T23:31:17Z 2013-10-08T17:25:11Z Data layer standards

I just listened to the latest episode of Rudi and Adam's Beyond Web Analytics podcast, all about data layer standards. This may sound like an esoteric subject but it's going to become really important in the future, and is key to our industry moving to the next level.

As technologies mature, there is always a tendency to standardize so that we can move to the next layer of abstraction. It means we've worked out the details of things that practitioners have embedded in their practices and we can move on to bigger and better things at higher levels.

So what is a data layer? In "traditional" web analytics implementations, you push information to your web analytics platform using its own platform-specific mechanisms. To record a "Newsletter signup" event in Google Analytics you might use:

_gaq.push(['_trackEvent', 'Newsletter', 'Subscribe', 'Customer list']);

in Omniture you might set:


If you wanted to switch between vendors or have something like an ad server conversion beacon inserted on the page, you have to write yet more platform-specific code. More code means more scope for error and for the events to fire on subtly different criteria, so your numbers never line up across systems.

A standardized data layer means instead you'll record things in a common manner and if you're using tag management you can very easily set up whatever analytics, ad server or other tools you want to fire on the same criteria. If well adopted we'll see platforms like Shopify, BigCommerce, Magento, CMSes and the like all supporting it and having turnkey web analytics implementations for 80-90% of use cases. Now that's a good thing so we practitioners can start working on more cool stuff and less tedious implementations and reimplementations.
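
As a rough illustration of "record once, map to each tool", here is a sketch with one vendor-neutral event and two small mapping functions. The event shape is invented for the example and is not taken from the W3C group's work. The Omniture event number and eVar are pure assumptions, since they depend on how a given report suite is set up.

```javascript
// One vendor-neutral record of what happened (hypothetical shape).
var signup = { category: 'Newsletter', action: 'Subscribe', label: 'Customer list' };

// Google Analytics (classic) command, in the same form as the _gaq example above.
function toGaCommand(evt) {
  return ['_trackEvent', evt.category, evt.action, evt.label];
}

// Omniture variables. "event1" and "eVar1" are assumptions: the real
// numbers depend entirely on the report suite configuration.
function toOmnitureVars(evt) {
  return {
    events: 'event1',
    eVar1: evt.category + ':' + evt.action + ':' + evt.label,
    linkTrackVars: 'events,eVar1',
    linkTrackEvents: 'event1'
  };
}
```

Both tools now fire from the same record on the same criteria, which is the whole point: the numbers have a fighting chance of lining up.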

This is a fantastic initiative and incredibly important. Check out the W3C community: http://www.w3.org/community/custexpdata/wiki/Main_Page

I'll be giving a talk on this next month at Web Analytics Wednesday Sydney.
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/442962 2013-04-15T06:47:23Z 2016-07-08T13:11:59Z The coming cookiepocalypse

Slides from my presentation to Web Analytics Wednesday last week.

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/442976 2013-02-13T22:47:00Z 2013-10-08T16:56:28Z 3D view of the inside of your ecommerce store

inside.tm is a pretty cool way to visualize traffic flows through your site. Very clever. Even a technophobe like Gerry Harvey would get this idea!

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/442989 2013-02-11T05:11:51Z 2013-10-08T16:56:28Z Business case for Tag Management
I'm working on a business case for tag management. If only this were an acceptable thing to present, it covers the lot really!

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443002 2013-01-07T01:22:00Z 2013-10-08T16:56:28Z Web analytics in the real world

(Gmail messed up the original post. Fixed manually.)

Over the break a news story bubbled up about Euclid's retail analytics product. These kinds of tools are pretty damn exciting, and scary too.

The premise is this: track individuals as they move through a retail store using the unique MAC address their smartphone's wireless gives out. Euclid explains that by tracking signal levels, they can triangulate the individual's location within the store. Visitors don't need to have their WiFi actually connected to the store's network, just have it switched on.

Another company doing this is Helsinki-based RapidBlue, who illustrate it with this diagram:
This is pretty amazing stuff and brings the kind of analytics and optimization we regularly do online into the retail environment. Conversion rates, dwell times, split tests, the whole lot.

How many individuals walked past your store? How many then went in? Then how many bought?

Thinking about this further, you could get even better at it:
  • Directional antennae to isolate specific areas of the store
  • Highly directional antennae pointed at the checkouts to record sales
  • Match sales using credit card number or loyalty cards, suddenly you've matched the MAC address to a CRM identifier
  • Give visitors an app with some kind of discount and you can automatically match MAC to CRM identifier
  • Free in-store WiFi and you can see what sites they're looking at as they browse through the store
I can think of some further ways retailers might track people beyond smartphone MACs:
  • Long-distance reads of RFID devices (public transport cards, contactless credit cards, security passes)
  • Partner with mobile telcos to bring mobile coverage into the store, in exchange for sharing anonymized identity information
  • Bluetooth MACs
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443007 2012-12-10T23:35:00Z 2013-10-08T16:56:29Z Web Analytics Wednesday in Sydney
Some of you have probably heard me talk about this for some time. Well I've finally got around to setting up a Web Analytics Wednesday in Sydney. In 2013, this will run every month on the second Wednesday of the month.

The first event will be:

Wednesday, January 09, 2013 at 6:30 PM
at the City Hotel on Kent Street in the Sydney CBD.

What's Web Analytics Wednesday?
It's a casual social meetup of like-minded web analytics, digital marketing and optimization types. We're a reasonably small community, so it's worth getting together and learning from each other.

What format will it take?
We'll have a couple of short presentations, but the focus is on networking. If you've got something you'd like to present, please get in touch with me!

What does it cost?
It's free! I'm looking for sponsors willing to cover food, drinks and venue hire. Please contact me.

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443022 2012-10-18T03:45:18Z 2013-10-08T16:56:29Z Iframeception
I'm integrating Omniture with a (very poorly done) web chat system. The solution involves their iframe pointing at their domain sitting on our site which creates a popup for the chat session. The popup contains an iframe pointing at our site with parameters passed into the querystring. My code reads the querystring parameters and inserts the analytics beacon. Iframeception. Apparently it's not possible to do this in any remotely sane way, according to the guy I'm dealing with anyway.

Amazingly it actually works. Even in IE. Without any frigging around with P3P headers. It seems IE sends the cookies in the header for third-party cookies if they were set earlier in the session, so it's all okay. I was surprised.
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443040 2012-10-15T01:54:03Z 2013-10-08T16:56:29Z Optimizely throws down the gauntlet to Adobe
This piece on TechCrunch talks about how Optimizely is rapidly catching up on industry leader Test&Target.


In the comments, an Adobe product mangler points to a blog where they discuss unnamed competitors. Optimizely follow up by throwing down a challenge: try Optimizely and Test&Target side by side and decide.

If you haven't tried the Optimizely demo, well worth a go. It's awesome. Love the product. Be warned though, the pricing isn't the Enterprise Grade you're used to ;)
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443042 2012-09-19T23:32:00Z 2013-10-08T16:56:29Z Real time: you probably don't need it, but you're going to have it anyway

A big buzz in web analytics for the last while has been real-time. It's one of those really cool features that everyone gets all tingly about. But you don't need it. Seriously, you don't.

Real-time is popular because managers have a Napoleon complex. We all see ourselves as generals sitting on top of the hill, directing our minions into battle. "Send reinforcements down the left flank." War happens in real-time and important decisions need to be made based on real-time outcomes. Marketing and business don't work this way.

The geeks among us all want to feel like we're in the control room of a nuclear power plant, or the War Room from Doctor Strangelove. Those real-time graphs tickle something deep in our consciousness, make us feel like we're alive with fresh data, completely up-to-date. But it's an illusion.

If you're making decisions based on real-time data, you're probably doing it wrong and spending a lot of time spinning your wheels. You really shouldn't be wasting time implementing the kinds of changes you'd make on real-time results. It's just a waste of time. Spend time doing quality analysis of a decent length of time's data, then make a decision.

That's not to say there are absolutely no valid use cases for this stuff. I'm sure in a digital newsroom it'd be great to see the results of a breaking story in real-time, tweaking headlines and allocating resources to popular stories. Tools like Chartbeat pretty much have that market nailed.

Google Analytics has a pretty half-arsed attempt at real-time. It's pretty crappy though. No conversions makes it a complete non-starter for any ecommerce business, and the design is completely wrong for the standard use case, sticking it up on a big screen hanging on the wall.

So there's plenty of reasons not to use real-time. You'll probably end up using it anyway. The pull of a real-time dashboard is just too great. Your boss's Napoleon complex will be stroked by a Big Board dashboard. It'll raise your team's profile. There'll be people standing in front of it, mesmerized, watching the data come in. Particularly when there's a big launch.

The challenge for us analysts is to ensure we do the minimum possible, and keep the real-time data as unactionable as possible. Reducing to a bare minimum the number of "why'd that go down in the last hour" questions we waste time answering.
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443067 2012-08-30T11:47:00Z 2013-10-08T16:56:29Z Omniture jumps the shark

3D pie chart. Check.
Gradient fill. Check.
Drop shadows. Check.

Oh dear. And it's not 1st April.

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443081 2012-08-23T02:05:00Z 2013-10-08T16:56:29Z Automated web analytics testing with Selenium WebDriver and Sauce Labs

I've been vaguely aware of Selenium and WebDriver for years through my mate Simon, but hadn't dipped my toe yet. Last night I had a little time to check it out.

For those who don't know, Selenium is a framework for automated testing of web applications. WebDriver is the component that allows you to drive individual browsers. There's now hosted services that will spin up browsers for your tests at your command, one of which is Sauce Labs, so you don't have to go through the pain of setting up and maintaining all the browsers and platforms you want to test.

There's two ways you can test analytics tagging in this environment. The ideal is to set up a proxy (see Steve Stojanovski's approach) and then inspect the beacons as they pass through. That would definitely be an ideal situation, but it introduces another element to deal with. My approach is a bit simpler and involves getting WebDriver to run a bit of JavaScript that finds and outputs the beacon URLs for us to then handle.

To run this script you'll need Ruby, a Sauce Labs account and the WebDriver Ruby bindings. Follow Sauce Labs' instructions to get set up. The Sauce Labs free account gives you plenty of minutes to get started. You'll need to replace the username and API key with your details. Then run the script.

This script is incredibly simplistic. It just loads the page, runs the JavaScript, and checks to see if there was anything in the output. I'm having trouble with using it in IE6, so it'll need some improving. At the moment this is just for Omniture beacons, but it wouldn't be difficult to get it looking into any other types of beacons.

So this allows a simple sanity check that your analytics beacons are at least firing. A good start. I'll start doing more soon, though I suppose I should learn some Ruby first.
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443092 2012-08-09T06:34:00Z 2013-10-08T16:56:30Z Omniture is so 1996

Training a new junior analyst with a web dev background I found myself cursing, yet again, how old fashioned Omniture implementation still is. JavaScript is a wonderfully expressive language with a beautiful object syntax for representing real-world behaviour. Omniture doesn't use any of this. Everything is strings with cryptic, archaic delimiters. Their JavaScript is Stringly Typed.

In case you don't know, the syntax of Omniture's product string looks like this:

 category;product name;quantity;totalprice[,category;product name;quantity;totalprice] 
For example:
The quantity and price are optional for any event except "purchase". Of course. Category is deprecated, so instead you have to always add a leading semicolon. Though the examples in the knowledgebase still helpfully include the deprecated category. Nice work.
Now try explaining that to a developer who won't read anything longer than a sentence, and expecting him to get it right. The number of times I've seen "REPLACE THIS WITH THE VALUE" showing up in my Omniture reports is insane.
How about this as a better idea?
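s.products = [];
// First product: the total price is given directly
s.products.push( {
    productname: 'lemonade (cans)',
    quantity: 3,
    totalprice: 3.30
});
// Second product: only the unit price is given
s.products.push( {
    productname: 'cola (bottles)',
    quantity: 2,
    unitprice: 1.23
});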
Any developer will understand this syntax immediately, no documentation required.
Now to convert that into Omniture's preferred Stringly Typed variable. Put this inside the s_doPlugins() block of your s_code file:
if (typeof(s.products) === 'object') {
    var outputString = '';
    for (var i = 0; i < s.products.length; i++) {
        // Derive the total price from the unit price when only that was given
        if (s.products[i].totalprice === undefined) {
            s.products[i].totalprice = s.products[i].unitprice * s.products[i].quantity;
        }
        if (i > 0) {
            outputString += ",";
        }
        // The leading semicolon stands in for the deprecated category
        outputString += ";" + s.products[i].productname;
        if (s.products[i].quantity && s.products[i].totalprice) {
            outputString += ";" + s.products[i].quantity + ";" + s.products[i].totalprice;
        }
    }
    s.products = outputString;
}
Note, this isn't really tested and won't handle some of the more esoteric uses of s.products.
So really Omniture, this shouldn't be so hard. Why can't you update your code mechanisms to the 21st Century and start using the full expressiveness of the language? It will make implementation easier, because it will make more sense to developers!
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443103 2012-07-27T00:18:00Z 2013-10-08T16:56:30Z SnowPlow web analytics

I'm really excited about this new open-source web analytics system. The loosely-coupled architecture is brilliant, meaning you can swap out components as needed.

One example I've already found is the limitation of using CloudFront as the data collector. It's very easy to set up, but limits you to GA-style first-party cookies and an inability to reliably track across domains. I'm already planning to write something in NodeJS to replace this component.

There's no GUI, but it's incredibly easy to set up data collection and just start collecting. Unlike ordinary web analytics tools, you're collecting everything. Tools like GA and Omniture throw away things like full URLs, full User-Agent strings and full referrer URLs, which means there's some types of questions you can't answer unless you were capturing the data the right way from the beginning.

As an example, I recently got asked to analyse our on-site search. Some of the searches are recorded in SiteCatalyst, but others aren't. Even though the keywords are included in the request URL every time, we're bang out of luck doing any keyword-level analysis on historical searches. If we had something like this recording data, the analysis would be tricky but possible using Hive.

Very cool stuff. If you're reading this on my blog site, you're already being recorded to it.

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443106 2012-07-23T02:18:00Z 2013-10-08T16:56:30Z Introduction to web analytics reading list
I've got a new person starting who's a bit of an information sponge. In the lead up to him starting, I thought I'd put together a bit of a reading list to get him up to speed on web analytics, digital marketing and all the other things we look at.

Since this is probably generally applicable, I'll share it here. I'll continue adding to this as I think of new things.

  • Omniture SiteCatalyst: the gold standard of enterprise analytics, though gold isn't supposed to tarnish right?
  • Mixpanel: a new tool that captures every data point, and does amazing things with it.
  • KISSmetrics: next generation web analytics tool.
  • QuBit OpenTag: container tag management platform, to make all the tags manageable. This one has a free offering so you can stick it on your own site and play with it.
  • Ghostery: (all browsers) find out what beacons are on any page
  • Collusion: (Firefox) who's tracking you online, and how do they overlap?
  • Omniture SiteCat debugger: (Chrome) what's being sent to SiteCat on the current page
  • Occam's Razor: Avinash Kaushik's always writing great stuff, great insight.
  • ClickZ: a broad look at the whole online marketing space.
  • Adam Greco: one of the gurus of SiteCatalyst implementation.
  • Andrew Chen: exciting stuff at the boundary of behavioural psychology and analytics. The new field of "Growth Hackers".
  • Which Test Won?: how good is your gut over an A/B test?
  • Nir and far: Another Growth Hacker with interesting things to say.


Quora topics
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443108 2012-07-06T03:29:39Z 2013-10-08T16:56:30Z Only 52% of web analytics spend goes on staff
On average, only 52% of web analytics expenditure is spent on internal staff, a figure which has not changed since 2011. This is despite 40% of companies in 2011 having planned to increase their budget on staff to analyse web data, which highlights that finding the right people is proving a difficult challenge


This craziness has to stop. For every $1 you spend on analytics tools, you should be spending $2 on staff. Minimum. Otherwise you might as well use free tools, despite their limitations.

What this stat shows is the effectiveness of analytics vendors' sales organisations. I've seen them in action and they're very impressive! One company I worked for spent 7 figures annually on its high-end analytics tool and had 1.5 analysts looking at the results and managing the implementation. Insanity!

The report brings up the skills shortage in our field. Something I plan to blog about later. We need to get better, as an industry, at cross-training people. On that note, I'm currently hiring junior web developers to cross-train as web analytics specialists.
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443121 2012-07-02T01:35:16Z 2013-10-08T16:56:30Z How can you reliably work out a new "visit" to your sites?
How can you reliably determine if a particular page view on which your analytics code is running is a new arrival at your site? This is a difficult problem and I can't see any easy solution, so the "Direct" visit source will always be overreported along with all the other sources of inflation for that source.

I'm currently using this logic:
  • URL doesn't have a campaign code
  • Referrer is either blank or is from a domain that isn't in your list of internal domains
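
Here is a minimal sketch that applies the two bullets literally. The domain list and the "cid" campaign parameter are placeholders, and the function names are hypothetical. Page views that do carry a campaign code are assumed to be handled separately.

```javascript
// Assumed values for illustration: swap in your own domains and parameter name.
var INTERNAL_DOMAINS = ['example.com', 'shop.example.net'];

function hostOf(url) {
  var match = /^https?:\/\/([^\/?#:]+)/i.exec(url || '');
  return match ? match[1].toLowerCase() : '';
}

// True for an internal domain itself or any of its subdomains.
function isInternal(host) {
  return INTERNAL_DOMAINS.some(function (domain) {
    return host === domain || host.slice(-(domain.length + 1)) === '.' + domain;
  });
}

// Both bullets must hold: no campaign code, and a blank or external referrer.
function looksLikeNewArrival(pageUrl, referrer) {
  var hasCampaignCode = /[?&]cid=[^&#]+/.test(pageUrl);
  var refHost = hostOf(referrer);
  var referrerBlankOrExternal = refHost === '' || !isInternal(refHost);
  return !hasCampaignCode && referrerBlankOrExternal;
}
```

Note that a blank referrer always passes the second test, so this check alone cannot tell a genuinely direct arrival from any other page view that turns up without a referrer.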

However this fails for links between HTTPS and HTTP pages within your own sites. When someone goes from a secure page to a non-secure page, the referrer is wiped out.

The only alternative I can think of is setting a cookie, but that won't work in my case because we have to support multiple domains.

The gold standard would be to send your beacons to a third party, which could apply some kind of time-based test to "direct" traffic to determine if it's truly "direct". That won't work in my current architecture.

Any ideas?
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443129 2012-06-28T22:50:00Z 2016-07-08T07:13:16Z Attribution is still the hot new thing
John Wanamaker is credited as saying:
"Half the money I spend on advertising is wasted; the trouble is I don't know which half."
The issue still keeps marketers up at night, and digital attribution rides in on a white horse promising to solve this problem. The basic idea is to follow an individual user throughout the journey to buying something, including every trackable marketing touchpoint. That is, track display impressions, clicks on ads, visits to the site, SEM and SEO visits, social media clicks.
From this barrage of information, you somehow work out a way to confidently say "this ad, in this location, to this segment is working". Most marketers are a long way from that.
A new Forrester report (funded by the Internet Advertising Bureau and a selection of attribution vendors, caveat emptor) surveys a bunch of "Marketing Executives" to see what the state of actual implementation is out in the real world.
There's some heartening results. If you thought you were behind the pack with simple last-click attribution, take comfort that 44% of the respondents aren't allocating any credit for conversions to any marketing channel! Terrifying eh?
It can be a daunting field to enter, and the temptation is to jump into the most complex approach first. That would be a mistake. You'll spend a lot of time, energy and money getting a complete implementation of one of the high-end tools, and the gains will only be incremental to what you can do yourself. If you're not already using at least last-click attribution to inform your media spending, how is a more complicated, harder to explain approach going to get you more airtime in those decisions?
The more advanced approaches use complex algorithms to decide how to allocate credit across all the different touchpoints. This is well worth exploring once you've exhausted everything you can get out of simpler models up to and including linear allocation. There's a lot of gold to be had in those discussions, and as your organisation learns to make more data-informed decisions, you'll find more scope to ramp up the complexity to make additional gains.
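
To make the two simplest models concrete, here is a small sketch of last-click and linear allocation over one customer journey. The channel names and function names are made up; the arithmetic is just "all credit to the final touchpoint" versus "equal shares to every touchpoint".

```javascript
// Touchpoints are in chronological order; the last one precedes the conversion.

// Last click: the final touchpoint gets all of the conversion value.
function lastClick(touchpoints, value) {
  var credit = {};
  if (touchpoints.length === 0) return credit;
  credit[touchpoints[touchpoints.length - 1]] = value;
  return credit;
}

// Linear: every touchpoint gets an equal share, summed per channel.
function linear(touchpoints, value) {
  var credit = {};
  var share = value / touchpoints.length;
  touchpoints.forEach(function (channel) {
    credit[channel] = (credit[channel] || 0) + share;
  });
  return credit;
}
```

For a $100 sale after the journey display, sem, email, sem, last-click gives sem the full $100, while linear gives display $25, sem $50 and email $25. Same data, quite different stories about where the media budget should go.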
One of the more interesting quotes from the report comes here:
“There’s no attribution approach that is 99.9% right, and it’s not coming along. But an inability to measure everything is not an excuse for not trying. You can measure a lot even with basic [fractional] attribution, and there’s a lot of improvement you can make.”
The report is well worth a read. Make sure your boss reads it too.

I'll post something soon about the multiple methods of attribution you can implement right now, using nothing more than JavaScript and out-of-the-box Omniture SiteCatalyst.

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443143 2012-05-07T00:26:00Z 2013-10-08T16:56:30Z I'm hiring: Online data analyst

I'm hiring for an analyst in my team at Vodafone. We're an Omniture installation with some big advantages:

  • A mandate to push through big improvements.
  • Supportive, data-focussed management hierarchy.
  • One of the cleanest, most consistent Omniture implementations I've seen.
  • And, of course, you get to report to a smart boss who intimately understands web analytics.

This role will initially be a bit of a report monkey job, but we'd love to automate away all the repetitive pieces so we can all focus on the more interesting work.

Check out the job description and apply on the Vodafone careers page.

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443144 2012-04-19T01:30:00Z 2013-10-08T16:56:30Z Yes, this blog is back!

And yes, by the way, I'm back posting on my web analytics blog. I took a break for a while when working for Datalicious, as all my thoughts on web analytics went to the company blog. Now I'm again able to share my thoughts here.

I'm also planning to get Web Analytics Wednesday happening regularly in Sydney. If you'd like to sponsor it, please get in touch.

Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443154 2012-04-19T01:14:00Z 2013-10-08T16:56:30Z What's missing from Google Analytics?

While looking at the pricing of our analytics service, my boss asked why we couldn't use Google Analytics and I've had a bit of a think about it. On the surface there's not a huge amount between the two really. Google Analytics has a great user interface, is well understood by developers and does almost everything right. But there's this one thing that is a show stopper for most of the places I've ever worked.

The problem is to do with how Google Analytics sets the cookies that identify visitors. Most web analytics tools set this in the HTTP header of the response from the data collection server. In the diagram below we see a company that has two domains. The cookie is set in the ".BigCompany.com" domain as a third-party cookie in the response the first time the visitor sends a data collection beacon. The upshot is that when the visitor goes to the "BigCompanyShop.com", that cookie identifying the visitor is also sent to the data collection server.

By contrast, the way Google Analytics works is to set a first-party cookie in JavaScript on the current domain. That means when a visitor goes to another domain, that cookie isn't available and so the visitor identifier is different, as you see below.

Yes, Google and third parties provide a few workarounds for this. They either don't work in all browsers or rely on the visitor going between the domains by clicking a URL that embeds the visitor identifier. If you want to see the overlap between two only-slightly related domains, this approach just isn't going to work. And if you're unable to pass the user through via an HTTP GET instead of a POST, you're out of luck for the most used browsers.

This is a really strange limitation in GA. The only reason I can imagine for it is that it makes the data collection servers much simpler and thus more easily deployed in the Google server architecture. To collect data the way Omniture, WebTrends et al do it, they'd need to be setting and refreshing unique identifiers on every data collection. Not ridiculously complicated, but a specialised data collection server which I understand is hard to get deployed in the core Google infrastructure.

The scale of this problem is huge. I haven't worked for a single company doing web analytics where the company has only one web site. You end up with extra domains for historical reasons, because someone worked around the domain name gatekeepers, because of extra brands the company owns or acquires, and for all kinds of other reasons. This even happens in relatively small companies.

It's a big deal and until Google fixes it, it's going to continue to be a major limitation of Google Analytics.
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443157 2011-05-17T04:02:00Z 2013-10-08T16:56:30Z How widespread is IPv6?

I've recently been asked about our tool vendor, Omniture, and their support for IPv6. It seems they currently don't support it, but are working towards it.

Does anyone know the current proportion of mainstream traffic coming through IPv6?

There's a bunch of data you'd lose with people coming from IPv6 endpoints. Most critical would be any Geo-IP mapping you're doing. If you're using IP addresses for visitor identification (and please don't!) you'll also have problems.
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443000 2011-05-10T03:27:00Z 2013-10-08T16:56:28Z Omniture: execute plugin code only on page views

Omniture uses the term "plugin" to refer to a few different things. In this example, I'm talking about the plugins placed in the s_doPlugins function in your s_code.js file. This area is run whenever Omniture code is called, whether it be by an s.t() page view call, an s.tl() link call or something like ClickMap.

My problem was that I wanted to trigger some code to run only when there's a page view and not in any other circumstances. Useful, for example, if you want to fire off something to another analytics system only for page views, or you've got some code that only makes sense for pages.

The trick is to test the value of s.eo, which contains the object associated with the link. So when you do an s.tl() and pass in the linked item (usually with "this") it'll become the value of s.eo. Page views don't have this link, and items that don't define it end up with a value of "0". That means this is an effective test to see if the current call is a page view.

Anyone know a better way to do this? Perhaps a way to definitively differentiate between the different types of call?


function s_doPlugins(s) {
        // Only run this code on page views
        if (s.eo === undefined) {
                // Your code goes here
        }
}
Simon Rumble
tag:webanalyticsinpractice.com,2013:Post/443030 2011-04-21T06:11:00Z 2013-10-08T16:56:29Z Video performance reporting
We were asked if there was some way to measure how users were experiencing video playback. That is, how long do people spend buffering, and how often does the player run out of video and have to rebuffer? We're using Omniture SiteCatalyst for this, and already do the standard video reporting.
It's important to get this kind of information from the client side. We can do server-side reporting, but all we'll know from that is how many times the files were requested. The approach we took gives us information direct from our customers, so we can quantify what they actually experience.
I started out by creating an eVar (conversion variable) for the video identifier and another for the video player. Handily this is also what we'll need to do to start using the SiteCatalyst 15 video solution, which saves us one task there. Another eVar captures the buffer time. I also created four new custom events to capture the different steps in video playback. These events are recorded alongside the eVars and so we can report by video identifier (and its classifications) or player.
When the user first requests the video, we send the "Video request" event. When the video has finished buffering, we send the "Video first frame" event, alongside the number of milliseconds spent buffering, rounded to the nearest 100 milliseconds.
If, during playback, the player runs out of video and has to start buffering, we send the "Buffer underrun" event. When the video starts playing again, the "Buffer underrun first frame" event is sent, again alongside the rounded buffering time.
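The event sequence can be sketched roughly like this. It's an illustration rather than our actual player code: createBufferTracker, its method names and the send callback are all hypothetical, and send stands in for whatever sets the events and eVars and makes the SiteCatalyst call. Only the four event names and the rounding to the nearest 100 milliseconds come from the description above.

```javascript
// Round a duration to the nearest 100 milliseconds.
function roundToNearest100(ms) {
  return Math.round(ms / 100) * 100;
}

// Hypothetical tracker. `send(eventName, bufferMs)` reports an event and
// `now()` returns the current time in milliseconds.
function createBufferTracker(send, now) {
  let bufferingSince = null; // when the current buffering period began
  let started = false;       // has the first frame been shown yet?

  return {
    // The user asks for the video: initial buffering begins.
    requested() {
      bufferingSince = now();
      send("Video request", null);
    },
    // The player ran out of video mid-playback.
    stalled() {
      bufferingSince = now();
      send("Buffer underrun", null);
    },
    // Playback (re)starts: report how long we spent buffering.
    playing() {
      if (bufferingSince === null) return; // wasn't buffering, nothing to report
      const bufferMs = roundToNearest100(now() - bufferingSince);
      send(started ? "Buffer underrun first frame" : "Video first frame", bufferMs);
      started = true;
      bufferingSince = null;
    },
  };
}

// Usage with a fake clock so the example is deterministic.
const sent = [];
let clock = 0;
const tracker = createBufferTracker((name, ms) => sent.push([name, ms]), () => clock);

tracker.requested();              // t = 0
clock = 1234; tracker.playing();  // buffered for 1234 ms, reported as 1200
clock = 5000; tracker.stalled();
clock = 5660; tracker.playing();  // rebuffered for 660 ms, reported as 700
console.log(sent);
```

Taking the clock as a parameter is just a convenience for testing; in a browser you'd pass something like Date.now.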

So now we've got our eVars for video ID, video player and buffer time, alongside these video events. We can report on any of these events happening with those eVars.
A simple report looks something like this, using a "buffer underruns / first frame" calculated metric to show us the videos that people are finding most problematic.

The problem is that the "worst" videos according to this report are viewed by a minuscule number of people. Some poor sucker has been desperately trying to watch VOD:37367 on his dialup connection, with 521 buffer underruns. That doesn't sound like much fun, but it doesn't really help us optimise our service for the majority of people.
So I came up with some simple calculated metrics that insert a modifier to show us poorly performing videos, but weighted towards ones where the poor performance is widespread.

So now we can see there are a few videos that are problematic, and for a reasonable number of people. We can start optimising and seeing if we can improve this experience.

That calculated metric:
[Video buffer underrun] / [Video first frame] * ( [Video first frame] / [Total Video first frame] ) * 1000
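Here's a small worked example of that metric. The function name and all the numbers are made up for illustration, apart from the 521 underruns mentioned earlier. I'm assuming that video had a single first frame and that there were 100,000 first frames across all videos.

```javascript
// Weighted underrun score, following the calculated metric above.
// The function and parameter names are hypothetical.
function weightedUnderrunScore(underruns, firstFrames, totalFirstFrames) {
  const underrunsPerPlay = underruns / firstFrames;       // the simple ratio
  const shareOfPlays = firstFrames / totalFirstFrames;    // the weighting
  return underrunsPerPlay * shareOfPlays * 1000;
}

const totalFirstFrames = 100000; // assumed site-wide total

// One viewer with a terrible connection: 521 underruns on a single play.
const rare = weightedUnderrunScore(521, 1, totalFirstFrames);

// A widely watched video with a milder but widespread problem.
const popular = weightedUnderrunScore(3000, 10000, totalFirstFrames);

console.log(rare.toFixed(2), popular.toFixed(2)); // 5.21 30.00
```

On the simple ratio the single-viewer video scores 521 against 0.3 for the popular one. With the weighting the popular video comes out on top. Taken literally, the per-video first frame count cancels out of the formula, so the score ends up proportional to each video's underruns against all first frames.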
Simon Rumble