What's missing from Google Analytics?

While looking at the pricing of our analytics service, my boss asked why we couldn't use Google Analytics, and I've had a bit of a think about it. On the surface there's not a huge amount between the two. Google Analytics has a great user interface, is well understood by developers and does almost everything right. But there's one thing that is a showstopper for most of the places I've ever worked.

The problem is how Google Analytics sets the cookies that identify visitors. Most web analytics tools set this cookie in the HTTP header of the response from the data collection server. In the diagram below we see a company that has two domains. The cookie is set on the ".BigCompany.com" domain as a third-party cookie in the response to the first data collection beacon the visitor sends. The upshot is that when the visitor goes to "BigCompanyShop.com", the cookie identifying the visitor is also sent to the data collection server.
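To make the server-side approach concrete, here is a minimal sketch in Python of what a collection server might do on each beacon. It is not any vendor's actual implementation: the cookie name `visitor_id`, the function `handle_beacon` and the one-year lifetime are all assumptions made for illustration.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Tuple

COOKIE_NAME = "visitor_id"         # hypothetical cookie name
COOKIE_DOMAIN = ".bigcompany.com"  # domain the collection server lives on
ONE_YEAR = 365 * 24 * 60 * 60      # assumed cookie lifetime, in seconds


def handle_beacon(cookie_header: str) -> Tuple[str, str]:
    """Return (visitor_id, Set-Cookie value) for one beacon request."""
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")

    if COOKIE_NAME in incoming:
        # Returning visitor: reuse the identifier, whichever site sent the beacon.
        visitor_id = incoming[COOKIE_NAME].value
    else:
        # First beacon from this browser: mint a new identifier.
        visitor_id = uuid.uuid4().hex

    # Set (or refresh) the cookie on every response so it does not expire.
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = visitor_id
    outgoing[COOKIE_NAME]["domain"] = COOKIE_DOMAIN
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["max-age"] = ONE_YEAR
    return visitor_id, outgoing[COOKIE_NAME].OutputString()
```

Because the browser attaches this cookie to every request it makes to the collection server, a beacon fired from a page on "BigCompanyShop.com" carries the same identifier as one fired from "BigCompany.com".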

By contrast, the way Google Analytics works is to set a first-party cookie in JavaScript on the current domain. That means when a visitor goes to another domain, that cookie isn't available and so the visitor identifier is different, as you see below.
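The effect is easy to see in a toy model. The Python sketch below only simulates how a browser scopes first-party cookies to the site being visited. The `Browser` class and its method are hypothetical and have nothing to do with Google's actual tracking code.

```python
import uuid


class Browser:
    """Toy model of a browser that keeps a separate cookie jar per site."""

    def __init__(self):
        self.jars = {}  # site domain -> {cookie name: value}

    def first_party_id(self, site: str) -> str:
        # A script running on `site` can only read and write that site's jar.
        jar = self.jars.setdefault(site, {})
        if "visitor_id" not in jar:
            jar["visitor_id"] = uuid.uuid4().hex
        return jar["visitor_id"]


browser = Browser()
main_id = browser.first_party_id("bigcompany.com")
shop_id = browser.first_party_id("bigcompanyshop.com")
# Same person, same browser, but two unrelated identifiers.
print(main_id == shop_id)
```

Repeat visits to the same site return the same identifier, but nothing links the two sites' identifiers to each other.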

Yes, Google and third parties provide a few workarounds for this. They either don't work in all browsers or rely on the visitor moving between the domains by clicking a link whose URL embeds the visitor identifier. If you want to see the overlap between two only slightly related domains, this approach just isn't going to work. And if the visitor is passed between domains with an HTTP POST rather than a GET, you're out of luck in the most widely used browsers.
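For a sense of what the URL-based workaround involves, here is a rough Python sketch of the general idea, not of Google's real linker code. The query parameter name `_vid` and both function names are made up for this example.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PARAM = "_vid"  # hypothetical query parameter carrying the visitor identifier


def decorate_link(url: str, visitor_id: str) -> str:
    """Return `url` with the visitor identifier added to its query string."""
    parts = urlsplit(url)
    # Drop any stale identifier, keep every other parameter as it was.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != PARAM]
    query.append((PARAM, visitor_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def extract_visitor_id(url: str) -> Optional[str]:
    """Return the identifier carried in `url`, or None if there isn't one."""
    for key, value in parse_qsl(urlsplit(url).query):
        if key == PARAM:
            return value
    return None


link = decorate_link("https://bigcompanyshop.com/cart?item=42", "abc123")
print(link)  # https://bigcompanyshop.com/cart?item=42&_vid=abc123
```

The weakness is plain from the code: the identifier only survives if every cross-domain link is decorated and the visitor actually clicks one. Someone who types the second domain into the address bar, or arrives from a search engine, gets a fresh identifier.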

This is a really strange limitation in GA. The only reason I can imagine for it is that it makes the data collection servers much simpler and thus more easily deployed in the Google server architecture. To collect data the way Omniture, WebTrends et al. do it, they'd need to set and refresh unique identifiers on every data collection request. That's not ridiculously complicated, but it does need a specialised data collection server, which I understand is hard to get deployed in the core Google infrastructure.

The scale of this problem is huge. I haven't worked for a single company doing web analytics where the company has only one web site. You end up with extra domains for historical reasons: someone working around the domain-name gatekeepers, extra brands the company owns or acquires, and all kinds of other reasons. This happens even in relatively small companies.

It's a big deal and until Google fixes it, it's going to continue to be a major limitation of Google Analytics.