Own web analytics for startups – Part V

Part V – we have talked about data analysis, but we haven't forgotten about data collection. We also introduce a piece of analytics functionality that we consider important and that is not hard to implement – Events.

In our last post we defined web analytics as data collection plus analysis, but talked only about analysis. Today we will talk about data collection. The short answer is that we do not care how the data are collected, as long as the web analytics answers all our questions.

There is only one place where analytics data end up – log files*. That said, there are two ways to get data into log files. First, a web server with standard or custom logging stores each and every request in its own access log. Second, JavaScript inside the web page sends an HTTP request to a web server from the client, triggered by a page load or some other event; the web server then saves that request in its own log file. JavaScript is the only way to collect web analytics data when you use an external service like GA or Clicky – essentially, the web analytics provider asks you to send them your web server logs this way. So you have to insert JS into all your pages (increasing their loading time), send the additional traffic to the web analytics provider, and are then compelled to study their API to upload your own log files back and to write the reports answering your questions. Think about it for a minute – do you see something illogical in this approach?
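
For reference, this is roughly what the client-side collection mechanism looks like. A minimal sketch, assuming a hypothetical /collect endpoint on our own server; the payload fields are illustrative assumptions, not any particular provider's actual API:

```typescript
// Minimal page-view beacon: runs in the browser, reports a hit to our own server.
// The /collect endpoint and field names are assumptions for illustration.
function reportPageView(): void {
  const payload = JSON.stringify({
    url: location.href,
    referrer: document.referrer,
    ts: Date.now(),
  });
  // sendBeacon queues the request without delaying navigation or page unload.
  if (navigator.sendBeacon) {
    navigator.sendBeacon("/collect", payload);
  } else {
    // Fallback: a plain GET that the web server records in its access log anyway.
    new Image().src = "/collect?u=" + encodeURIComponent(location.href);
  }
}

window.addEventListener("load", reportPageView);
```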

One more scenario where JavaScript is needed for data collection is when you need to generate lots of events that would otherwise leave no trail in the web server log files. This is what you need when you want to build all sorts of heatmaps – activity, attention – or to watch recordings of visitors' behaviour; we mean services such as ClickTale or CrazyEgg. We do not need such functionality now, and I doubt we will need it at all.

It turns out that it is much easier for us to customise logging on the web server than to rely on JavaScript. If we later want to register an on-page event that the web server would not otherwise record, we will add a bit of JS that generates an HTTP request to the web server (as sketched below). If you have anything to add about data collection in general, irrespective of implementation, we would be glad to hear it.
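
To make the server-side half concrete, here is a rough sketch of custom logging done by the application server itself, using only Node's built-in modules. The log fields, the file name and the /event endpoint are our own assumptions; in a real setup this could just as well be a custom log format in nginx or Apache plus a tiny handler for the extra JS-generated hits:

```typescript
import * as http from "http";
import * as fs from "fs";

// Append one tab-separated line per request to our own analytics log.
// Fields: timestamp, method, URL, referrer, user agent, session cookie (if any).
function logRequest(req: http.IncomingMessage): void {
  const line = [
    new Date().toISOString(),
    req.method,
    req.url,
    req.headers["referer"] ?? "-",
    req.headers["user-agent"] ?? "-",
    req.headers["cookie"] ?? "-",
  ].join("\t") + "\n";
  fs.appendFile("analytics.log", line, () => {});
}

http.createServer((req, res) => {
  logRequest(req);
  if (req.url?.startsWith("/event")) {
    // Endpoint for the extra JS-generated hits that would not appear otherwise.
    res.writeHead(204);
    res.end();
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("hello");
}).listen(8080);
```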

The problem with external web analytics systems is often that you do not know how clever they are at eliminating noise and whether they are able to tease out what really matters. Here is what Joshua Porter writes about it in his book ‘Designing for the Social Web’:

Use in-house metrics
If you set up your own data-collection system, you’ll know exactly what it is measuring. If you rely on a third-party system, you might get into guessing games about what the numbers mean, because you don’t know the particulars of how they work and what they track. Invariably, if you don’t control your own collection process, you won’t know all there is to know about what you are measuring.

Of course, there is a chance that by doing analytics ourselves we will screw it up, but psychologically it is much easier to accept and correct our own error than to keep paying for a service and staying in the dark.

I would like to underline that web analytics gives us a very approximate picture, which will always differ from reality. Therefore, we believe the goal of web analytics is to get as close to reality as possible.
Let me explain what I mean. One reason the collected data are inaccurate is that it is impossible to identify the same person who visited your site from different computers, or who cleared the cookies in his browser, until he has logged in. There is nothing we can do about that. However, if he visited some pages and logged in later, nothing prevents us from counting him as a user from the very first visit, as if he had been authorized immediately. Thus we reduce the number of page views attributed to anonymous visitors and increase the number attributed to users, bringing the picture closer to reality, don't we?
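
A minimal sketch of this retroactive attribution, assuming each log record carries a session (cookie) identifier and that requests made after login carry the user id; the record shape is our own assumption:

```typescript
// One parsed line from our access log (shape assumed for illustration).
interface Hit {
  sessionId: string;   // anonymous cookie set on the first visit
  userId?: string;     // known only for requests made after login
  url: string;
  ts: number;
}

// Attribute earlier anonymous hits of a session to the user who later
// logged in within that same session.
function attributeToUsers(hits: Hit[]): Hit[] {
  const sessionOwner = new Map<string, string>();
  for (const h of hits) {
    if (h.userId) sessionOwner.set(h.sessionId, h.userId);
  }
  return hits.map(h => ({
    ...h,
    userId: h.userId ?? sessionOwner.get(h.sessionId),
  }));
}
```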

Another example: with the arrival of tabbed browsers (speaking for myself, but I do not think I am alone here) and the ability to restore tabs between runs, I use tabs as ‘random access’ bookmarks. They usually hold pages I use every day as well as sites I might potentially need, such as logged-in sessions in various web services I would like to review. These tabs can hang around (in my case) for months, which means that every time I start my browser, someone with web analytics at the other end of the wire counts me as a visitor, even though I never even look at their page. Can web analytics recognize such behaviour as noise? Do you know whether the external service you use can?

ClickTale is working in this direction, but I think there are simpler ideas that neither Google nor ClickTale has had time to think about. The really smart thing would be to ignore repeated visits to a page when that page expects an action (such as moving to a different page) and no action follows. What does it mean if the same visitor lands on your home page and nothing further happens? If you are in the web application business – most likely, nothing. If you run a popular blog whose home page lists new posts' announcements – nothing caught your visitor's attention: they came to scan the posts and left without opening any of them. As you can see, there is no single true recipe here either: web applications expect an action, while content sites, on the contrary, expect the user to stay on the page.
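
A rough sketch of that filtering idea, under our own assumptions: a visit counts as noise if the page expects an action, the visitor has already seen it before, and the session contains no further hits. The page list and field names are hypothetical:

```typescript
// Pages that only make sense if the visitor acts on them (assumed list).
const ACTION_PAGES = new Set(["/", "/pricing", "/signup"]);

interface Visit {
  sessionId: string;
  url: string;
  seenBefore: boolean;   // the same visitor has already viewed this page
  furtherHits: number;   // hits later in the same session
}

// Treat a repeat visit to an "action" page with no follow-up as noise.
function isNoise(v: Visit): boolean {
  return ACTION_PAGES.has(v.url) && v.seenBefore && v.furtherHits === 0;
}

// Usage: visits.filter(v => !isNoise(v)) before counting visitors.
```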

While evaluating web analytics systems, and later while thinking about what we actually need, I caught myself noticing that none of the systems I saw had a way to add events. As usual, after I finished writing this post I discovered Annotations in Google Analytics. Well – we are not the true originators of the idea this time, but it is a good validation of our progress.

Examples of events:
- Redesigned the home page (before – link to a screenshot; after – another screenshot)
- A popular blog wrote about us
- Added a new feature to the product

All you need to create a new event is to enter a date, a short description, and a full description. Wouldn't you agree, though, that if you modify the site's design or change the product's interface, it would be nice to attach to the event not only the description ‘added synchrophasomometer interface on the thermotrometer management page’ but also screenshots from before and after the change? It could also be a diagram, if you changed not the UI but your application's internals. And it could hardly be easier to implement – make a folder available from your web server, upload the screenshots, and put the URLs into the event's full description.
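
For concreteness, a sketch of what such an event record could look like; the field names, dates and URLs are our own illustrative assumptions:

```typescript
// An analytics "event" as described above: a date, two descriptions,
// and optional links to before/after screenshots or diagrams.
interface AnalyticsEvent {
  date: string;              // ISO date, e.g. "2011-03-14"
  shortDescription: string;  // shown next to the graph
  fullDescription: string;   // shown on demand
  attachments: string[];     // URLs of screenshots or diagrams
}

const redesign: AnalyticsEvent = {
  date: "2011-03-14",
  shortDescription: "Redesigned home page",
  fullDescription: "New layout for the home page; see before/after screenshots.",
  attachments: [
    "/events/2011-03-14/home-before.png",
    "/events/2011-03-14/home-after.png",
  ],
};
```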

With events at hand in your web analytics, it is easier to repeat the ones that produce outcomes we are happy with. After all, how can you learn if you do not see an explicit link between cause (the event) and effect (traffic, conversion rate of visitors into registered users, etc.)?

If your web analytics cannot show you these events on demand, eyeballing the graphs is no different from visiting an art gallery.

On the other hand, we would also like the web analytics itself to try to locate events behind the spikes up or down on a graph and tell us about them. After all, there are events affecting us that we know nothing about – for example, someone somewhere has written about us or our product (good or bad). So, having spotted a spike using the standard deviation (don't worry, it is just a simple formula), the web analytics system can go through the HTTP referrers and tell us where most of the visitors forming that peak came from, or which visitors we stopped seeing during a dip. Then we can decide and create an Event linked to the spike, if the analytics' conclusion was correct. Yes, yes – we know about the landing page concept, but it only works when you yourself created the link pointing to it. Besides, why would you create landing pages absolutely everywhere?
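
A minimal sketch of that spike detection, assuming we already have a daily visit count and the referrers of each day's hits; the two-standard-deviation threshold and the data shape are our own arbitrary choices:

```typescript
interface DailyHits {
  day: string;               // e.g. "2011-03-14"
  visits: number;
  referrers: string[];       // HTTP referrers of that day's visits
}

// Flag days whose visit count deviates from the mean by more than 2 sigma,
// and report the most common referrers for each flagged day.
function findSpikes(days: DailyHits[]): { day: string; topReferrers: string[] }[] {
  const counts = days.map(d => d.visits);
  const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
  const sigma = Math.sqrt(
    counts.reduce((a, b) => a + (b - mean) ** 2, 0) / counts.length
  );
  return days
    .filter(d => Math.abs(d.visits - mean) > 2 * sigma)
    .map(d => {
      const freq = new Map<string, number>();
      for (const r of d.referrers) freq.set(r, (freq.get(r) ?? 0) + 1);
      const topReferrers = [...freq.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, 3)
        .map(([ref]) => ref);
      return { day: d.day, topReferrers };
    });
}
```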

Everything we said above we will implement over time. So, in the next post we will decide which components to use and how to make use of a session and a cookie. Then we will integrate the selected components, share how we did it, and describe how we embedded the web analytics data-collection code into them. After that we will add events. Finally, we will launch our product together with the surrounding systems we have been talking about for so long on these blog pages, and start writing the first SQL queries that answer the questions we raised. How's that for a plan?

* – Here we do not count systems that monitor what people say about us around the Internet. They are essentially specialized, search-based web analytics systems.

Own web analytics for startups – Part I | Part II | Part III | Part IV
