How to track documents as pages in Google Analytics


“But where are my downloads?” cried the website manager upon looking at her Google Analytics reports.

“They’re under Events,” replied the support guy.

“Oh,” said the confused manager.  “…What’s an event?”

People don’t like documents on the web or the intranet as a general rule. They prefer HTML pages. But sometimes they’ll go to the lengths of downloading a large, image-heavy PDF brochure or a 16-page, typeset application form.

Sound the trumpets! An event has occurred! Someone downloaded a document!

But hold on. This isn’t an event. It’s just someone reading your content, which happens to be in a different format to HTML. Why track it in a different section of your analytics? Shouldn’t you track it as part of your content?

If you track documents as part of your content, they become part of the content flow. They become page views and, as such, will have bounce rates, timings, referrers and unique views. This gives so much more information than a simple count of events.

To set up documents to appear as part of your content, you’ll need to change your tracking code and set up some filters in Google Analytics.

Don’t pass this point if you don’t want to get your hands dirty!

Set up the tracking code

The _trackEvent function is used in Google’s examples as a way to track document downloads. You need to change any existing tracking code for documents on your pages from the _trackEvent function to the _trackPageview function. See example code below.

If you don’t currently track documents, you can add a bit of jQuery to every page that will run when pages load, going through and adding the onClick event to the document download links, triggering the tracking code when the link is clicked. You’ll need to put this code within a SCRIPT tag on every page or include it as a separate .js file. Make sure that you have already included your regular Google Analytics tracking code.

Code available on jsfiddle

function gaTrackDownloadableFiles() {
    // Document types to track as page views
    var extensions = ['.pdf', '.csv', '.doc', '.ppt'];

    jQuery('a').each(function () {
        var href = this.href;
        for (var i = 0; i < extensions.length; i++) {
            if (href.indexOf(extensions[i]) !== -1) {
                // Record the document URL as a page view when the link is clicked
                jQuery(this).click(function () {
                    _gaq.push(['_trackPageview', href]);
                });
                break;
            }
        }
    });
    return true;
}

// Run once the page has loaded
jQuery(document).ready(gaTrackDownloadableFiles);
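
A substring test such as indexOf('.doc') also matches when that text appears elsewhere in the URL, for example in a host name or a query string. As a rough sketch only, and not part of the original tracking code, the helper below looks at just the file extension at the end of the path. The name isTrackableDocument and the extension list are assumptions made for illustration.

```javascript
// Hypothetical helper: returns true when a link points at a document type
// that should be tracked as a page view. The extension list is an assumption.
var TRACKED_EXTENSIONS = ['pdf', 'csv', 'doc', 'docx', 'ppt', 'pptx'];

function isTrackableDocument(href) {
    // Drop any fragment or query string, then isolate the file name
    var path = href.split('#')[0].split('?')[0];
    var fileName = path.substring(path.lastIndexOf('/') + 1);
    var lastDot = fileName.lastIndexOf('.');
    if (lastDot === -1) {
        return false; // no extension, so treat it as a regular page
    }
    var extension = fileName.substring(lastDot + 1).toLowerCase();
    return TRACKED_EXTENSIONS.indexOf(extension) !== -1;
}
```

A check like this could stand in for the indexOf tests in the loop, at the cost of having to list every extension explicitly.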

In Google Analytics reports, the domain name does not appear in the page URL for regular HTML pages; you’ll just see the initial forward slash followed by the page path. However, documents will appear with the full URL, including the domain name. This looks messy and is hard to read in your reports, when all you’re actually interested in is the document name, which appears at the very end of the URL.

To present this in a better way when running your reports, we can add some filters to the GA account which will prettify the incoming document data.

Configure Google Analytics

You’ll need to create a filter. In your Analytics account, go to the Admin section.

Google Analytics admin button

Then within your View panel, choose Filters.

Google Analytics Filters button

Add a new filter and call it “remove domain name”. Then choose Custom filter, followed by Advanced.

Add filter - step 1

In the next form, choose Request URI for both Field A and Output To.

Add filter - step 2

In Field A you’ll need to specify the URL pattern for your document folders. The URL pattern needs to be in a specific format, known as regular expressions. You need to work out what your common folder path is.

Here are some example folders:

Example 1

Example 2

In example 1, the common path is /docs/ since all documents are stored in this folder.

In example 2, the common path is /anything/downloads/ since all documents are stored in a downloads folder, somewhere within a hierarchy of folders. Note that anything could represent forward slashes in addition to other letters and characters.

When you have worked out your common folder path, add your domain URL to the start, then take off the http:// or https:// bit, so that you have something like:

Example 1: intranet.luke.co.uk/docs/
Example 2: intranet.luke.co.uk/anything/downloads/

Apply the following rules:

  • add a backslash in front of any dots \.
  • add a backslash in front of any forward slashes \/
  • replace anything with (.*)

Example 1: intranet\.luke\.co\.uk\/docs\/
Example 2: intranet\.luke\.co\.uk\/(.*)\/downloads\/

Finally, add (.*) to the end, giving you something like:

Example 1: intranet\.luke\.co\.uk\/docs\/(.*)
Example 2: intranet\.luke\.co\.uk\/(.*)\/downloads\/(.*)

Et voila! This is your regular expression. Add your expression to Field A.

Then count how many times (.*) occurs in your expression. Use this number in the Output To field, preceded by $A. For example 1 this gives $A1, and for example 2 it gives $A2.
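
If you want to check your expression before saving it, the rules above can be sketched in JavaScript. This is only an approximation: Google Analytics applies its own regular expression engine, and the helper names and the sample document URL below are made up for illustration.

```javascript
// Hypothetical helper that applies the rules above to a common folder path.
// The placeholder word "anything" is turned into a capture group.
function buildFilterPattern(domainAndPath) {
    var escaped = domainAndPath
        .replace(/\./g, '\\.')          // backslash in front of any dots
        .replace(/\//g, '\\/')          // backslash in front of any forward slashes
        .replace(/anything/g, '(.*)');  // replace "anything" with (.*)
    return escaped + '(.*)';            // capture whatever follows the folder path
}

// Count the (.*) groups to find the number to use after $A
function countGroups(pattern) {
    return pattern.split('(.*)').length - 1;
}

var pattern = buildFilterPattern('intranet.luke.co.uk/anything/downloads/');
var groupNumber = countGroups(pattern);

// Made-up document URL, used only to try the expression out
var match = new RegExp(pattern).exec(
    'http://intranet.luke.co.uk/hr/forms/downloads/leave-request.pdf'
);

console.log(pattern);            // the expression for Field A
console.log('$A' + groupNumber); // the value for Output To
console.log(match[groupNumber]); // the part of the URL that is kept
```

The last capture group is the one that holds the document name, which is why the Output To field uses the total count.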

Save the filter.

There’s one more thing that you can do to improve your reports. By default, the documents will be entered into Google Analytics with the page title of the HTML page where the click to download the document occurred. To include the document filename in the page title in addition to the title of the page being viewed when the link was clicked, it’s back to filters again.

Add a new filter and call it “page titles for documents”. Set up the filter as follows:

Field A: Request URI

Field B: Page title

Output To: Page title, with the value $A1 ($B1)

Page titles for documents filter

Save the filter.

In your list of filters, make sure that your page titles filter is below your remove domain name filter, otherwise your page titles will appear with the full URL. If necessary, use the assign filter order option to change the order.
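
To see why the order matters, here is a simplified model of the two filters in JavaScript. It is a sketch under assumptions, not how Google Analytics is implemented: the hit object, its field names and the sample document are invented, and the page titles filter is assumed to capture the whole of each field.

```javascript
// Mirrors the "remove domain name" filter for the /docs/ example
function removeDomainName(hit) {
    var match = /intranet\.luke\.co\.uk\/docs\/(.*)/.exec(hit.requestUri);
    if (!match) {
        return hit; // regular HTML pages are left untouched
    }
    return { requestUri: match[1], pageTitle: hit.pageTitle };
}

// Mirrors "$A1 ($B1)": request URI first, then the original page title
function pageTitlesForDocuments(hit) {
    return {
        requestUri: hit.requestUri,
        pageTitle: hit.requestUri + ' (' + hit.pageTitle + ')'
    };
}

// Made-up page view for a document link clicked on an HTML page
var hit = {
    requestUri: 'http://intranet.luke.co.uk/docs/expenses.pdf',
    pageTitle: 'Claiming expenses'
};

var correctOrder = pageTitlesForDocuments(removeDomainName(hit));
var wrongOrder = removeDomainName(pageTitlesForDocuments(hit));

console.log(correctOrder.pageTitle); // expenses.pdf (Claiming expenses)
console.log(wrongOrder.pageTitle);   // http://intranet.luke.co.uk/docs/expenses.pdf (Claiming expenses)
```

With the filters the wrong way round, the title is built before the domain has been stripped, so the full URL ends up in the page title.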


Now you should start to see documents appearing in your content reports, including Realtime:

Realtime analytics showing documents and source page

NOTE: changes to filters have a permanent effect on your data view, so make sure that you have a copy of your raw data in another view or profile.

Tracking offline comms channels in Google Analytics

When it comes to evaluating staff communication campaigns, we have to look at both online and offline channels. Using Google Analytics campaign tracking we can capture activity on the intranet as a result of clicks in emails and documents that are sent out to staff.

There are a few loopholes when tracking in Google Analytics (GA), for example linking directly to a PDF document or linking to other documents from within a document (the Russian doll effect). But GA can track most of the activity that happens on the intranet, including traffic from offline sources.

I blogged a few weeks ago about the overbearing transformation campaign in the workplace, but there are many smaller campaigns that go on as part of the daily business, which may or may not form part of the larger programme. Not having any platform for social media in our IT setup, email is still a big channel for staff communications. Most of the comms will link back to the intranet where the meaty content lives. From here, staff can get HTML pages, documents and the odd bit of video to support the communication. It is this core intranet content that should ideally be the final destination of any communications campaign.

For a long time, I have been campaigning to tag non-digital channels such as email and PDF documents with campaign tracking code that will feed back into GA. It’s nice to see that our corporate comms team are now doing this consistently with their campaigns. It has been hard to get these non-technical colleagues to add the bit of tracking code to the end of links in emails and documents. It’s hard enough getting the right URL. But Google have an online URL builder that makes the job easier and seems to be working wonders with the consistency of the campaign tracking.

The data from offline comms is now flowing through to GA and we can now realise the benefits of spending that little bit of time adding the tracking code. By tallying up with the original amount of mail sent, we can produce some statistics on click-through rates, visit duration, bounce rates etc. For campaigns that link back to intranet news stories, we can measure online votes against the different channels.

I’ve been reading a bit online recently about how relevant votes really are on intranet pages, and I agree to a point. Our news stories are written in the same format, in the same tone and all that changes is the content, both the written word and photographic content. I do believe that our system of “more like this” and “fewer like this” can help our editors in segmenting content themes and audience types (through campaign tracking), producing some concrete intelligence.

Of course we can’t evaluate whether behaviours or opinions have changed using GA. We need surveys and other means to do that. But we can see that people are visiting the core intranet content and how long they are spending there – as a result of the non-digital channels. Put this with data from direct intranet traffic and we can get a more-rounded picture of communications campaigns.

You need to be consistent with tags that you use for the campaign name, source and medium. If you stick to a plan then the results in GA will be more useful.
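
One way to keep tags consistent is to generate the links rather than type them by hand. The sketch below is a hypothetical helper, not Google’s URL builder: it appends the standard utm_source, utm_medium and utm_campaign parameters, and the lowercase, hyphenated naming convention and the sample link are assumptions.

```javascript
// Assumed house convention: trim, lowercase and hyphenate every tag value
function normaliseTag(value) {
    return String(value).trim().toLowerCase().replace(/\s+/g, '-');
}

// Hypothetical helper: appends Google Analytics campaign parameters to a link
function buildCampaignUrl(baseUrl, campaign) {
    var url = new URL(baseUrl);
    url.searchParams.set('utm_source', normaliseTag(campaign.source));
    url.searchParams.set('utm_medium', normaliseTag(campaign.medium));
    url.searchParams.set('utm_campaign', normaliseTag(campaign.name));
    return url.toString();
}

// Made-up news story link and campaign values
var tagged = buildCampaignUrl('http://intranet.luke.co.uk/news/pay-update/', {
    source: 'Staff Bulletin',
    medium: 'Email',
    name: 'Pay Update'
});

console.log(tagged);
// http://intranet.luke.co.uk/news/pay-update/?utm_source=staff-bulletin&utm_medium=email&utm_campaign=pay-update
```

Normalising the values means that “Email” and “email” end up as the same medium in the reports instead of two separate rows.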

Non-digital campaign tracking in Google Analytics

Interpreting the results is not foolproof. Going back to the news story example, it may be difficult to get good timings if staff are just being directed to a news story and there is nothing to do on the HTML page such as download a document or go to the core intranet content. GA will track a start time but will not track an end time and will consequently result in a bounce.

If a comms email contains a list of links to news stories then the second news story clicked will end the timing for the first story clicked, the third will end the second and so on. This means that the last story clicked will result in a bounce and won’t be included in timings. Communications and news stories should ideally link to the core intranet content where there is something to do, more information to find, a form to download or a supporting document to read.
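
The timing behaviour described above can be sketched as a difference between consecutive page view timestamps. This is a simplified model for illustration rather than GA’s actual calculation, and the pages and times (in seconds from the first click) are invented.

```javascript
// Each page's duration is the gap until the next page view in the list
function timeOnPages(pageViews) {
    return pageViews.map(function (view, index) {
        var next = pageViews[index + 1];
        return {
            page: view.page,
            // No later page view means no end time, so the duration is unknown
            seconds: next ? next.time - view.time : null
        };
    });
}

// Made-up sequence of three stories clicked from one email
var visit = [
    { page: '/news/story-one/', time: 0 },
    { page: '/news/story-two/', time: 90 },
    { page: '/news/story-three/', time: 150 }
];

console.log(timeOnPages(visit));
// story-one: 90 seconds, story-two: 60 seconds, story-three: null
```

The final story has nothing after it to mark an end time, which is why it drops out of the timings however long someone spent reading it.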

As I prepare to start my new job at Helpful Technology I leave our comms team thinking about using GovDelivery to send staff emails. I blogged previously on our success with GovDelivery email alerts and newsletter management and, since writing, the system has been constantly improved. Using it internally would also provide great reporting on campaigns. I won’t have the chance to post about the outcome of this but you may find more in due course at

Google Analytics Realtime intranet statistics

It was so satisfying, when I first caught wind of Google Analytics realtime, to just go to my analytics account, switch on the live statistics, and watch. No extra code; it just worked. Google have come up trumps yet again with this recent addition which is so useful, and free.

Only available in the “new” interface, realtime shows how many people are on the site right now. It shows a live ticker-graph of pageviews per minute and per second, the top ten pages in terms of active visitors (right now!), where they came from, what keywords they used and their location on a world map.


You can drill down and monitor segments of data and zone in on individual pages. We have used this technique, running GA on an iPad, during a recent webchat that we ran internally. The GA display showed how many people were tuning into the webchat and really helped in providing live evidence to stakeholders. Slightly amusing to note that at one point there were more people involved in “producing” and handholding than were actually tuning in to take part!

We have also used this to monitor and test our online communications channels, for example watching after we have published a corporate announcement to see how much uptake it gets.

In terms of intranet usage, you get a good idea of your killer content, which changes depending on the time of day. I’m half-thinking of looking into some kind of timer switch on our “most popular” panel boxes. Of course it would be great to provide live, up to the minute, “trending now” information. But until we have the sort of content that has a bit more “flow”, I suspect that the results wouldn’t change too much.

Proudly on display, in the heart of the digital team, we now have a monitor displaying live stats throughout the day. And when we’re not eating another Malteser each time the figures go up, this new weapon in our analysis and evidence arsenal should prove to be a good tool for highlighting the importance of intranet content and applications and how staff consume, use and interact with them.


Intranet information design must follow staff culture

Sometimes users don’t act as you’d expect. My recent analysis of search terms that staff use to find the online learning area of the intranet showed some unexpected results.

A while back, our Human Resources department introduced an online learning area to the intranet. They wanted to label it as “JusticeAcademy.” I argued that we shouldn’t use “marketing” language and to try something like “learning and development” or “online training” which would more accurately describe the content within.

After a fair bit of debate, and because all the communications and marketing material was already in process, we went with the “JusticeAcademy” label, although I managed to slip in some keywords to help staff searching for terms like learning and development, e-learning, CBT and online training.

I checked out our intranet analytics reports for the past year, splitting staff searches into 2 groups: “JusticeAcademy” and “learning and development and everything else”. There were 26,139 individual searches for this area of the intranet. And to my surprise, 91% of these searches were for “JusticeAcademy.”

Search terms used by staff for the online learning area of the intranet

Usability best practice tells us not to use flowery marketing names and instead to use plain language. So why are staff searching for a branded name instead of using regular plain language terminology as expected? The answer is that the HR department have marketed the online learning area of the intranet very effectively. All literature and communications have managed to get the new name across and staff know where to go for their training needs.

Ironically, we are soon launching the newly rewritten HR area of the intranet because content ownership has moved to a different part of the organisation. This area will still incorporate the online learning section and the content owners want to call it “Learning and development”. So now I have to go back to them to say “You know, a few years back I wanted to call it learning and development, but, actually…”