Usability, content, search and analytics on the corporate intranet

How to track documents as pages in Google Analytics

23 January, 2014 – Luke Oatham

 

“But where are my downloads?” cried the website manager upon looking at her Google Analytics reports.

“They’re under Events,” replied the support guy.

“Oh,” said the confused manager.  “…What’s an event?”

People don’t like documents on the web or the intranet as a general rule. They prefer HTML pages. But sometimes they’ll go to the lengths of downloading a large, image-heavy PDF brochure or a 16-page, typeset application form.

Sound the trumpets! An event has occurred! Someone downloaded a document!

But hold on. This isn’t an event. It’s just someone reading your content, which happens to be in a different format to HTML. Why track it in a different section of your analytics? Shouldn’t you track it as part of your content?

If you track documents as part of your content, they become part of the content flow. They become page views and, as such, will have bounce rates, timings, referrers and unique views. This gives so much more information than a simple count of events.

To set up documents to appear as part of your content, you’ll need to change your tracking code and set up some filters in Google Analytics.

Don’t pass this point if you don’t want to get your hands dirty!

Set up the tracking code

Google’s examples use the _trackEvent function to track document downloads. You need to change any existing tracking code for documents on your pages from _trackEvent to _trackPageview. See the example code below.
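As a rough before-and-after sketch of that change, assuming the classic asynchronous ga.js snippet and its _gaq queue: the event category and action, the function names and the document path are all made up for illustration.

```javascript
// _gaq is the command queue created by the classic ga.js snippet.
// It is redeclared here only so that the sketch runs on its own.
var _gaq = _gaq || [];

// Hypothetical document path, for illustration only.
var documentUrl = '/docs/annual-leave-policy.doc';

// Before: the download is recorded as an event (category, action, label).
function trackDownloadAsEvent(url) {
  _gaq.push(['_trackEvent', 'Downloads', 'Click', url]);
}

// After: the download is recorded as a page view of the document itself.
function trackDownloadAsPageview(url) {
  _gaq.push(['_trackPageview', url]);
}

trackDownloadAsPageview(documentUrl);
```

Only the command name and its arguments change; the rest of your tracking snippet stays as it is.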

If you don’t currently track documents, you can add a bit of jQuery to every page. It runs when the page loads, goes through the document download links and adds a click handler to each one, which triggers the tracking code when the link is clicked. Put this code within a SCRIPT tag on every page or include it as a separate .js file, and make sure that you have already included your regular Google Analytics tracking code.

Code available on jsfiddle

// Record a page view whenever a link to a downloadable document is clicked.
// Assumes jQuery and the regular ga.js tracking code (which defines _gaq)
// have already been loaded.
function gaTrackDownloadableFiles() {
  // '.doc' and '.ppt' also catch .docx and .pptx links
  var extensions = ['.pdf', '.csv', '.doc', '.ppt'];

  jQuery('a').each(function () {
    var href = this.href; // full URL, including the domain name
    for (var i = 0; i < extensions.length; i++) {
      // indexOf returns the number -1 when the extension is not found
      if (href.indexOf(extensions[i]) !== -1) {
        jQuery(this).on('click', function () {
          _gaq.push(['_trackPageview', href]);
        });
        break; // one handler per link is enough
      }
    }
  });
}

// Run once the page has loaded
jQuery(document).ready(gaTrackDownloadableFiles);

In Google Analytics reports, the domain name does not appear in the page URL for regular HTML pages; you’ll just see the initial forward slash followed by the page path. However, documents will appear with the full URL, including the domain name. This looks messy and is hard to read in your reports, when all you’re actually interested in is the document name, which appears right at the end of the URL.

To present this in a better way when running your reports, we can add some filters to the GA account which will prettify the incoming document data.

Configure Google Analytics

You’ll need to create a filter. In your Analytics account, go to the Admin section.

Google Analytics admin button

Then within your View panel, choose Filters.

Google Analytics Filters button

Add a new filter and call it “remove domain name”. Then choose Custom filter, followed by Advanced.

Add filter - step 1

In the next form, choose Request URI for both Field A and Output To.

Add filter - step 2

In Field A you’ll need to specify the URL pattern for your document folders. The URL pattern needs to be written in a specific format, known as a regular expression. First, you need to work out what your common folder path is.

Here are some example folders:

Example 1
/docs/annual-leave-policy.doc
/docs/eye-test-voucher.pdf
/docs/staff-magazine-2012-09.pdf

Example 2
/hr/leave/policy/downloads/annual-leave-policy.doc
/health/dse/downloads/eye-test-voucher.pdf
/news/downloads/staff-magazine-2012-09.pdf

In example 1, the common path is /docs/ since all documents are stored in this folder.

In example 2, the common path is /anything/downloads/ since all documents are stored in a downloads folder, somewhere within a hierarchy of folders. Note that anything could represent forward slashes in addition to other letters and characters.

When you have worked out your common folder path, add your domain URL to the start, then take off the http:// or https:// bit, so that you have something like:

Example 1: intranet.luke.co.uk/docs/
Example 2: intranet.luke.co.uk/anything/downloads/

Apply the following rules:

  • add a backslash in front of any dots \.
  • add a backslash in front of any forward slashes \/
  • replace anything with (.*)

Example 1: intranet\.luke\.co\.uk\/docs\/
Example 2: intranet\.luke\.co\.uk\/(.*)\/downloads\/

Finally, add (.*) to the end, giving you something like:

Example 1: intranet\.luke\.co\.uk\/docs\/(.*)
Example 2: intranet\.luke\.co\.uk\/(.*)\/downloads\/(.*)

Et voila! This is your regular expression. Add your expression to Field A.
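If you’d rather not escape the expression by hand, the rules above can be sketched as a small helper. The function name buildFieldAPattern is hypothetical, and it assumes you write the literal word anything wherever the folder names vary, as in example 2.

```javascript
// Build a Field A pattern from a domain and a common folder path.
// Write the literal word "anything" in the path wherever folders vary.
function buildFieldAPattern(domain, commonPath) {
  var pattern = (domain + commonPath)
    .replace(/\./g, '\\.')          // backslash in front of any dots
    .replace(/\//g, '\\/')          // backslash in front of any forward slashes
    .replace(/anything/g, '(.*)');  // replace the placeholder with (.*)
  return pattern + '(.*)';          // finally, add (.*) to the end
}

console.log(buildFieldAPattern('intranet.luke.co.uk', '/docs/'));
console.log(buildFieldAPattern('intranet.luke.co.uk', '/anything/downloads/'));
```

The placeholder is replaced last so that the dot inside (.*) is not escaped along with the dots in the domain name.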

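Then count how many times (.*) occurs in your expression. Use this number in the Output To field, preceded by $A. Example 1 contains one (.*), so the Output To value is $A1. Example 2 contains two, so the value is $A2. This picks out the last (.*), which is the document name.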

Save the filter.
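To check your expression before relying on it, here is a minimal sketch that approximates what the filter does to an incoming document URL. It is only a stand-in for Google Analytics’ own processing, and the function name removeDomainName is made up for illustration.

```javascript
// Rough stand-in for the advanced filter: match Field A against the request
// URI and, on a match, overwrite the URI with the chosen capture group.
function removeDomainName(requestUri, fieldAPattern, groupNumber) {
  var match = new RegExp(fieldAPattern).exec(requestUri);
  // No match: the value is left untouched (regular HTML pages end up here).
  return match ? match[groupNumber] : requestUri;
}

var example1 = 'intranet\\.luke\\.co\\.uk\\/docs\\/(.*)';
var example2 = 'intranet\\.luke\\.co\\.uk\\/(.*)\\/downloads\\/(.*)';

// One (.*) in the expression, so Output To is $A1.
console.log(removeDomainName(
  'http://intranet.luke.co.uk/docs/eye-test-voucher.pdf', example1, 1));

// Two (.*) in the expression, so Output To is $A2.
console.log(removeDomainName(
  'http://intranet.luke.co.uk/health/dse/downloads/eye-test-voucher.pdf', example2, 2));
```

Both calls print eye-test-voucher.pdf, which is what should then show up in your content reports in place of the full URL.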

There’s one more thing that you can do to improve your reports. By default, documents are recorded in Google Analytics with the page title of the HTML page where the download link was clicked. To include the document filename in the page title as well as the title of that page, it’s back to filters again.

Add a new filter and call it “page titles for documents”. Set up the filter as follows:

Field A: Request URI
(.*\.doc|.*\.docx|.*\.xls|.*\.xlsx|.*\.pdf|.*\.ppt|.*\.pptx|.*\.jpg|.*\.gif|.*\.png|.*\.zip)

Field B: Page title
(.*)

Output To: Page title
$A1 ($B1)

Page titles for documents filter

Save the filter.
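The effect of this filter can be approximated with the sketch below. Again, this is an illustration rather than Google Analytics’ actual implementation; the function name titleForDocument and the page title “Health and safety” are made up.

```javascript
// Field A pattern, copied from the filter settings above.
var DOCUMENT_PATTERN = /(.*\.doc|.*\.docx|.*\.xls|.*\.xlsx|.*\.pdf|.*\.ppt|.*\.pptx|.*\.jpg|.*\.gif|.*\.png|.*\.zip)/;

// Combine the document name and the source page title as "$A1 ($B1)".
function titleForDocument(requestUri, pageTitle) {
  var a = DOCUMENT_PATTERN.exec(requestUri); // Field A: Request URI
  var b = /(.*)/.exec(pageTitle);            // Field B: Page title
  if (!a || !b) {
    return pageTitle;                        // not a document: title unchanged
  }
  return a[1] + ' (' + b[1] + ')';           // Output To: Page title
}

console.log(titleForDocument('eye-test-voucher.pdf', 'Health and safety'));
// eye-test-voucher.pdf (Health and safety)

console.log(titleForDocument('/health/dse/', 'Health and safety'));
// Health and safety
```

The first call assumes that the request URI has already been shortened to the document name by the “remove domain name” filter.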

In your filters list, make sure that your page titles filter is below your remove domain name filter, otherwise your page titles will appear with the full URL. If necessary, use the assign filter order option to change the order.

Filters

Now you should start to see documents appearing in your content reports, including Realtime:

Realtime analytics showing documents and source page


NOTE: changes to filters have a permanent effect on your data view, so make sure that you have a copy of your raw data in another view or profile.
