It’s not very often that I get to work on some software that has the potential to appeal to developers, testers, designers, and the marketing team all at once. And of course when I do get to work on something like that, it usually means there is a significant amount of pressure to get it done and done quickly. My work on dojox.analytics has been one of those rare instances when I’ve been able to work in peace on writing simple and useful code that can entertain a wide variety of use cases.
dojox.analytics is a small project, both in aspirations and in size. It has a simple goal of logging browser and application data to the server for review. This data can be used to monitor application performance, effectiveness, and quality, or it can be used for custom data collection to identify or monitor a business-specific use case.
The software is a tiny little logger that has a very loosely defined plugin system. Each plugin is an object that monitors some specific aspect of an application or its environment and pushes the data it collects to the main logger, which in turn pushes this data to a server at a configurable rate. Currently there are plugins for the console, window info, Dojo Toolkit info, mouse position sampling, and mouse click events. Not too complicated, not too difficult, but it opens up a world of utility.
None of this, of course, is a new idea; we build on what exists and what we can see as other uses for a utility. In this case there are a number of different products that do something similar, Google Analytics not least among them. There are also other products such as Firebug for iPhone that essentially do the same thing, but for an entirely different purpose. dojox.analytics is meant only to provide client side code that can enable these other projects, and do so in a way that is simple and will not get in the way of the loading or performance of an application.
What is in dojox.analytics?
The _base package of dojox.analytics defines a singleton, which is designed to be loaded at the beginning of an application but really only needs to be loaded before any of the plugins, so that they have a logger to attach to. The core code has basically one useful method, addData().
dojox.analytics.addData("SuperImportantModule",
"There was a very serious error here");
addData() takes an arbitrary number of parameters, packages them as a single data point, and adds them to a send queue. The configurable poller periodically processes the queue, sending any newly received data off to the server. The interval is defined by the “sendInterval” parameter, which is passed as a djConfig option. The data is sent to the URL defined by “analyticsUrl” using the method defined by “sendMethod”. The method is either “xhrPost” or “script”, with the default being script-based IO (JSONP style).
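The queue-and-poll flow can be sketched in a few lines of plain JavaScript. This is a hypothetical illustration, not the dojox.analytics source: createLogger, flush, and the send callback are names invented here, and treating the first argument as a source label is an assumption based on the example call.

```javascript
// Hypothetical sketch of a queue that a poller drains; not the real dojox.analytics code.
function createLogger(send) {
  const queue = [];
  return {
    // Package all arguments as one data point and queue it.
    addData: function (...args) {
      queue.push({ source: args[0], data: args.slice(1) });
    },
    // Called by the poller: hand off everything queued since the last flush.
    flush: function () {
      if (queue.length === 0) { return 0; }
      const batch = queue.splice(0, queue.length);
      send(batch);
      return batch.length;
    }
  };
}

// Usage: batches are collected in memory here; a real transport would
// deliver them to the URL configured as analyticsUrl.
const sent = [];
const logger = createLogger(function (batch) { sent.push(batch); });
logger.addData("SuperImportantModule", "There was a very serious error here");
logger.flush();
// In a page, the poller would be something like:
//   setInterval(logger.flush, sendInterval);
```

Keeping the transport behind a callback means the same queue could feed either an XHR post or a script-tag request.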
Given that you could potentially collect a lot of data quickly if you aren’t careful or are overzealous, a request can outgrow what a script-based request allows by exceeding the URL length limit imposed by browsers, which is most restrictive in IE. To avoid related problems, a “maxRequestSize” parameter can also be defined (defaults to 4000), which will not allow a request to be larger than the given size. IE’s size is also capped below this regardless of the “maxRequestSize” specified. Requests that are larger than the max size are automatically split up and delivered as multiple requests. No data is sent, regardless of what has been collected, until after the page load event has fired and the application has been given the opportunity to start up. We want to avoid disturbing the performance of the application at all costs. It’s better to lose the data than get in the way.
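One way to picture the splitting is a greedy pass that starts a new request whenever adding the next data point would push the serialized size over the limit. The sketch below is an assumption about the approach, not the actual implementation: splitIntoRequests is a hypothetical name, and it measures the JSON string length rather than the final URL-encoded length.

```javascript
// Hypothetical sketch: greedily group data points so each serialized
// group stays within maxRequestSize characters.
function splitIntoRequests(items, maxRequestSize) {
  const requests = [];
  let current = [];
  for (const item of items) {
    const candidate = current.concat([item]);
    if (current.length > 0 && JSON.stringify(candidate).length > maxRequestSize) {
      // The next item would not fit: close the current request, start a new one.
      requests.push(current);
      current = [item];
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) { requests.push(current); }
  return requests;
}

// Usage: with a 16-character limit, the first two items share a request
// and the third gets its own.
const parts = splitIntoRequests(["aaaa", "bbbb", "cccc"], 16);
```

Note that a single data point larger than the limit still ends up in a request of its own in this sketch; handling that case (truncating or dropping it) is left open.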
That’s pretty much all there is to the base; the really interesting work is in the plugins.
consoleMessages
The consoleMessages plugin connects to a definable set of methods on the console object and passes any of their parameters to the logger to be sent to the server. By default these methods are error, warn, info, and rlog. The plugin verifies the existence of the console object, creating one if necessary. It then attaches to each method or, if a method does not exist, creates it. The method name simply gets added to the server logs as part of the existing addData call. Internally, the plugin wraps addData using dojo.hitch. For example:
dojo.hitch(dojox.analytics, "addData", "consoleMessage", methodName, arguments);
Any methods that didn’t already exist on the console object are added to it, but calling them only logs data to the server. If the method already existed on the console object, then the call is logged to the console as normal and is also logged to the server. The “rlog” method is created by default so that an application developer can log specifically to the server without logging to the console, even when Firebug is enabled. As many of these methods can be added as needed to provide the exact granularity you want.
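The wrap-or-create behavior can be illustrated without Dojo. The following is a simplified sketch under stated assumptions: wrapConsole is a hypothetical helper, and it operates on a stand-in object rather than the real console so the example has no side effects.

```javascript
// Hypothetical sketch: forward console calls to a logger, keeping the
// original behavior for methods that already exist.
function wrapConsole(consoleObj, methodNames, addData) {
  methodNames.forEach(function (name) {
    const original = consoleObj[name];
    consoleObj[name] = function (...args) {
      // Always log to the server-side queue.
      addData("consoleMessage", name, args);
      // Only methods that existed before also log locally.
      if (typeof original === "function") {
        return original.apply(consoleObj, args);
      }
    };
  });
  return consoleObj;
}

// Usage with a stand-in console that only knows "warn".
const logged = [];
const shown = [];
const fakeConsole = { warn: function (msg) { shown.push(msg); } };
wrapConsole(fakeConsole, ["warn", "rlog"], function (...point) { logged.push(point); });
fakeConsole.warn("disk almost full"); // shown locally and logged
fakeConsole.rlog("server only");      // logged only
```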
dojo
The dojo plugin packages up the information that the Dojo Toolkit sniffs on load such as browser information, Dojo Toolkit version, etc.
window
The window plugin collects information from the window object and packages it up for collection.
idle
The idle plugin tracks whether a user has become idle and/or regained activity. The length of time before becoming idle is controlled via the “idleTime” parameter in djConfig.
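As a rough illustration of the idea, idle detection comes down to comparing the time of the last activity against idleTime. Everything in this sketch is an assumption made for illustration (createIdleTracker, the "isIdle" and "isActive" labels, and passing timestamps in explicitly); the real plugin would be driven by browser events and timers.

```javascript
// Hypothetical sketch: report transitions between active and idle.
function createIdleTracker(idleTime, addData) {
  let lastActivity = 0;
  let idle = false;
  return {
    // Call on any user activity (mouse move, key press, ...).
    activity: function (now) {
      if (idle) {
        idle = false;
        addData("idle", "isActive", now);
      }
      lastActivity = now;
    },
    // Call periodically; reports once when the idle threshold is crossed.
    check: function (now) {
      if (!idle && now - lastActivity >= idleTime) {
        idle = true;
        addData("idle", "isIdle", now);
      }
      return idle;
    }
  };
}

// Usage with explicit timestamps in milliseconds.
const idleEvents = [];
const tracker = createIdleTracker(1000, function (...point) { idleEvents.push(point); });
tracker.activity(0);
tracker.check(500);    // still active
tracker.check(1500);   // becomes idle
tracker.activity(1600); // active again
```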
mouseClick
The mouseClick plugin tracks any time the mouse is clicked in the window. It records information such as mouse position and target information. This can be used to track clicks on items that lead away from a site/page. Every attempt is made to send this data off to the server before the page is lost, and it usually succeeds, though not 100% of the time (sometimes the data doesn’t get logged because the browser moves on to the next page before the final log gets sent or at least before it has been completely sent). Any suggestions for improvement here are welcome!
mouseOver
The mouseOver plugin simply samples the mouse every X milliseconds, where X is defined by the “sampleDelay” parameter, which defaults to 2500. The data is similar to that of the mouseClick plugin, including the targets of the items the mouse is over and the mouse coordinates. There is also a “targetProps” parameter that allows you to define which properties of a target (target, originalTarget, explicitOriginalTarget) you want to track. Note that before messages are sent off to the server they are converted to JSON, so care needs to be taken not to include targetProps that would create infinite recursion in dojo.toJson().
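The recursion risk comes from properties that point back into the DOM tree, such as a parent that references its child. One defensive approach, sketched here as an assumption rather than what the plugin does, is to copy only primitive-valued properties before serializing. summarizeTarget is a hypothetical helper, and JSON.stringify stands in for dojo.toJson().

```javascript
// Hypothetical sketch: keep only properties whose values are safe to serialize.
function summarizeTarget(target, props) {
  const summary = {};
  props.forEach(function (prop) {
    const value = target[prop];
    const type = typeof value;
    if (value === null || type === "string" || type === "number" || type === "boolean") {
      summary[prop] = value;
    }
    // Objects (for example parent nodes) are skipped: they may be circular.
  });
  return summary;
}

// Usage: a stand-in node whose parent points back at it.
const node = { id: "save", className: "btn" };
node.parentNode = { firstChild: node };
const safe = summarizeTarget(node, ["id", "className", "parentNode"]);
const json = JSON.stringify(safe); // serializing `node` itself would throw
```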
All in all, it is a simple set of functions that collect data and log it to the server in an unobtrusive fashion. Of course this is all the easy part. The real work is in analyzing, making sense of, and using the data. With other applications such as Google Analytics, the logging goes to the service provider, which then does this data analysis for you. Google undoubtedly has data mining experience as well as a large amount of processing power. At the same time, others like to log data to their own servers. Maybe they don’t like to share their data with third parties or perhaps they simply want to combine this data with data from other sources such as their log files.
This data can clearly be used for marketing analysis to see if changes to the platform need to be made, but it is the other analysis that it can open up that I’m interested in.
- Testing and Quality Control: The system can be used to collect data from applications in beta or in testing or even to simply record serious errors for applications that are live in production.
- UI analysis: Information such as common use paths, heatmaps, and other such User Interface data can be collected and analyzed.
- Debugging: Simply logging all console information to the server can provide an easy “remote” debugging console for any browser, though it is particularly useful for debugging IE given all the scratches on my cornea from previous IE debugging sessions (though the recent IE8 has a Firebug-like utility that sounds promising).
Now you might be thinking, yeah, there are possibilities, but do I want to go through the trouble? Is it hard? Do I have to learn the Dojo Toolkit? Can I use it with Foo Package?
The answer is that it is easy and works with anything. If it doesn’t, that’s a bug in my mind. There are essentially two ways you might want to use the package. One is for Dojo Toolkit users. These lucky individuals with incredible foresight will only have to dojo.require() the package like anything else they are already doing, or even use the Dojo Toolkit build system to build dojox.analytics into their own custom build layers. Unfortunately there are plenty of people out there who don’t or can’t use the Dojo Toolkit, and they deserve the benefits just as much as I do. For them, we provide a custom Dojo Toolkit build with analytics automatically included. This is a single script tag that can be included in any arbitrary page. Configuration parameters are set as attributes on the <script> tag, just as they are with djConfig, and no other code or configuration is required. This can even be loaded cross-domain from the AOL CDN. Inclusion might look something like this:
<script type="text/javascript"
djConfig="sendInterval: 5000, sampleDelay: 10000" src="dojoxAnalytics.js">
</script>
I don’t think it can get much simpler than that.
What about the size of this package, you ask? While I’m aiming to make a custom minimized build of the dojo _base that will get included as part of the above build, I haven’t completed that yet. Currently the build is 26KB compressed and gzipped, about 1KB larger than the standard dojo.js. Realistically I think this can be shrunk a little bit further by removing things from base that aren’t needed for this project. The Google Analytics ga.js file is currently 19KB, so I don’t think this is doing all that badly for a first pass. This also assumes that _all_ plugins are used, which is likely not to be the case in many instances, allowing for further optimization.
Thoughts? Suggestions? Other use cases?