Web trackers (also called web bugs, beacons, pixels or tracking tags) are used to track visitor behaviour and activity on websites. In this post I explore the workflow of a web analytics tracker and point out potential pitfalls and issues to pay attention to when designing your own tracker.
Web analytics tracker workflow
The high-level workflow of a web analytics tracker is as follows:
- Browser starts loading a page, e.g. example.com.
- Tracker tag (a piece of JS code placed in the HTML code) executes, and:
  - it loads a tracker class from an external location (e.g. mytracker.com/tracker.js),
  - it creates an instance of the tracker object (e.g. myTracker), passing the URL of the server-side tracker and the id of the website,
  - it calls the tracker's main function (e.g. myTracker.trackPage()), optionally passing additional information (e.g. user demographics or other 1st party data).
- Tracker collects information:
  - it collects page-level information (e.g. current page URL, page title),
  - it collects browser-level information (e.g. enabled plugins, screen resolution, browser language),
  - it looks up the first party cookie; if it is not found, it generates a UUID and saves it in the cookie.
- Tracker makes a request to the server-side tracker:
  - Tracker creates a URL string pointing to the server-side tracker, encoding all collected and passed-in information in GET variables and adding a random parameter (to prevent caching),
  - Tracker makes the request, usually by creating a DOM Image or Script object and setting its source (location) to the URL from the step above (Note: the 3rd party cookie is managed by the server-side tracker).
- Server-side tracker responds with a 1×1 px transparent gif or an empty script.
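The client-side part of this workflow can be sketched roughly as below. This is only a minimal illustration, not the code of any real tracker: the function names, the `_mt_id` cookie name, the `/track` endpoint and the `idsite`, `uid` and `rand` parameter names are all my assumptions. The helpers take plain strings, so they run outside a browser too; in a page you would pass `document.cookie` as the cookie string.

```javascript
// Hypothetical helper names; none of them come from a real tracker library.

// Generate a UUID-v4-shaped id. Math.random() keeps the sketch portable;
// in a modern browser crypto.randomUUID() would be the better choice.
function generateVisitorId() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Look up the visitor id in a cookie string such as document.cookie;
// fall back to a freshly generated id when the cookie is missing.
function getOrCreateVisitorId(cookieString, cookieName = '_mt_id') {
  for (const part of cookieString.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name === cookieName && rest.length > 0) {
      return { id: decodeURIComponent(rest.join('=')), isNew: false };
    }
  }
  return { id: generateVisitorId(), isNew: true };
}

// Encode everything as GET parameters and append a random cache-buster.
function buildTrackingUrl(trackerUrl, siteId, info, rand = Math.random()) {
  const url = new URL(trackerUrl);
  url.searchParams.set('idsite', String(siteId));
  for (const [key, value] of Object.entries(info)) {
    url.searchParams.set(key, String(value));
  }
  url.searchParams.set('rand', String(Math.floor(rand * 1e9)));
  return url.toString();
}

const visitor = getOrCreateVisitorId('theme=dark; _mt_id=abc-123');
const trackingUrl = buildTrackingUrl('https://mytracker.com/track', 7, {
  url: 'http://example.com/',
  title: 'Home page',
  uid: visitor.id,
});
console.log(trackingUrl);
// In a browser the request would then be fired with:
//   new Image().src = trackingUrl;
```

When `isNew` is true, the tracker would also write the generated id back to the first party cookie, which is left out here.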
If JavaScript is disabled in the browser, the workflow is simpler:
- Browser starts loading a page, e.g. http://example.com/,
- <noscript> section loads pixel code from the server-side tracker location, passing the id of the website (and possibly other hardcoded parameters) in GET parameters.
- Server-side tracker responds with a 1×1 px transparent gif.
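On the server side, answering with the transparent pixel could look something like the following Node.js sketch. `handlePixelRequest` and the `idsite` and `url` parameter names are hypothetical. The function only builds the response description, so storing the hit and wiring it into an actual HTTP server are left out.

```javascript
// Server-side sketch for Node.js; handlePixelRequest is a hypothetical name.

// A widely used 42-byte 1x1 transparent GIF, base64-encoded.
const TRANSPARENT_GIF = Buffer.from(
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
  'base64'
);

function handlePixelRequest(requestUrl) {
  // The request URL is usually relative, so a dummy base is needed for parsing.
  const params = new URL(requestUrl, 'http://localhost').searchParams;
  const hit = { siteId: params.get('idsite'), pageUrl: params.get('url') };
  return {
    hit, // what a real tracker would validate and store
    status: 200,
    headers: {
      'Content-Type': 'image/gif',
      'Content-Length': String(TRANSPARENT_GIF.length),
      'Cache-Control': 'no-store', // ask browsers and proxies not to cache the pixel
    },
    body: TRANSPARENT_GIF,
  };
}

const response = handlePixelRequest('/track?idsite=7&url=http%3A%2F%2Fexample.com%2F');
console.log(response.hit, response.body.length);
```

Inside an `http.createServer` callback, the result could be sent with `res.writeHead(response.status, response.headers)` followed by `res.end(response.body)`.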
Of course, in real web trackers the logic is usually a bit more complex. For example, a web analytics tracker may hook into various elements on the page and browser events (e.g. onClick for tracking outgoing links and downloads), may be loaded asynchronously, or may perform so-called cookie respawning: when the cookie is not found (the user possibly deleted it), the tracker logic tries to restore it from alternative places where it saved the visitor profile id, such as Flash cookies or HTML5 local storage.
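To make the respawning logic concrete, here is a small sketch under my own assumptions: each storage location is wrapped in an object with `get()` and `set()`, and in-memory stand-ins replace the real cookie and local storage so the example runs anywhere. `makeMemoryStore` and `respawnVisitorId` are hypothetical names.

```javascript
// Hypothetical sketch: each store exposes get() and set() for the visitor id.
function makeMemoryStore(name, initial = null) {
  let value = initial;
  return { name, get: () => value, set: (v) => { value = v; } };
}

// Take the id from the first store that still has it (or create a new one),
// then copy it back into every store that lost it.
function respawnVisitorId(stores, createId) {
  const found = stores.find((s) => s.get() !== null);
  const id = found ? found.get() : createId();
  for (const s of stores) {
    if (s.get() !== id) s.set(id);
  }
  return { id, restoredFrom: found ? found.name : null };
}

const cookieStore = makeMemoryStore('cookie'); // cookie was deleted by the user
const localStore = makeMemoryStore('localStorage', 'visitor-42');
const result = respawnVisitorId([cookieStore, localStore], () => 'new-id');
console.log(result); // { id: 'visitor-42', restoredFrom: 'localStorage' }
console.log(cookieStore.get()); // visitor-42
```

The order of the `stores` array decides which location wins when several still hold an id.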
A couple more notes and things to pay attention to:
- web bugs frequently use fingerprinting instead of relying on cookies (this is the default setting we chose at Piwik.org), as it is a more reliable technique,
- in case the JS tracker requires data to be loaded from the server, look for a lightweight JSON-P library on GitHub,
- make sure you have very good data validation, as you will see a lot of strange/junk data coming to the tracker,
- when you need 100% reliability or you pass sensitive information such as affiliate commissions or conversions, use server-side tracking methods (called pingbacks or postbacks), because data passed to the tracker can always be altered by an attacker!
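As an illustration of the data validation point, a server-side check of incoming parameters might look like the sketch below. The parameter names (`idsite`, `url`, `uid`, `res`) and the limits are assumptions chosen for the example, not rules from any particular tracker.

```javascript
// Hypothetical validation rules for an incoming tracking hit.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Accept only reasonably short http(s) URLs.
function isValidPageUrl(value) {
  if (typeof value !== 'string' || value.length > 2048) return false;
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // not parseable as an absolute URL
  }
}

// Return the names of all parameters that failed validation.
function validateHit(params) {
  const errors = [];
  if (!/^[1-9]\d{0,9}$/.test(String(params.idsite))) errors.push('idsite');
  if (!isValidPageUrl(params.url)) errors.push('url');
  if (!UUID_RE.test(String(params.uid))) errors.push('uid');
  // Screen resolution is optional, but must look like "1920x1080" if present.
  if (params.res !== undefined && !/^\d{1,5}x\d{1,5}$/.test(String(params.res))) {
    errors.push('res');
  }
  return { ok: errors.length === 0, errors };
}

console.log(validateHit({
  idsite: '7',
  url: 'http://example.com/',
  uid: '3f2b8c1e-9d4a-4b6f-8a1c-5e7d9f0a2b3c',
  res: '1920x1080',
})); // { ok: true, errors: [] }

console.log(validateHit({ idsite: '-1', url: 'javascript:alert(1)', uid: 'x' }));
// { ok: false, errors: [ 'idsite', 'url', 'uid' ] }
```

Passing this check only means a hit is well formed. The values can still be forged by the client, which is why sensitive events belong in server-side postbacks.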
More about tracking in advertising.
This article is a continuation of a series of posts about tracking in advertising. More is coming, stay tuned!
References:
- What is the tracking flow for Yahoo! Web Analytics? http://help.yahoo.com/l/us/yahoo/ywa/faqs/tracking/firststeps/3520322.html
- Wikipedia: JSONP, Web_bug
Post illustration source: Wikimedia Commons