How we improved our intranet search experience

We use the Google Search Appliance (GSA) across our family of intranets. In 2009 we launched a new search experience to coincide with an upgrade to the GSA.

Analysis of original search experience

I so wish I had some screenshots of the original search results pages. If I manage to find some, I’ll post.

Back in 2008, a typical search from any one of the family of intranets produced results from all the intranets. I know it is considered best practice to include everything in the initial scope of a search and then allow users to winnow down the results. But in this case, because each intranet served a very specific part of the organisation, results from all intranets clouded the experience. Inadequate metadata and file names made the experience worse and the interface was generally busy, as if someone had got into the admin page and turned on every option. Just because you can do something, doesn’t mean you should.

Search interface redesign

For the interface, I decided to localise the results and narrow the scope of the initial search to the intranet that staff search from. I thought that it would improve orientation. So if I search from the HQ intranet then I just get results from the HQ intranet. Same goes for intranet 1, 2, 3…
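As a rough sketch of what scoping a search to one intranet can look like at the request level: the GSA search protocol accepts a `site` parameter that limits results to a named collection. The example below assumes each intranet has been set up as its own collection, which is only one way to achieve this. The host name and collection names are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical mapping from each intranet to a GSA collection name.
COLLECTIONS = {
    "hq": "hq_intranet",
    "intranet1": "intranet_1",
    "intranet2": "intranet_2",
}


def search_url(query, intranet, host="gsa.example.org"):
    """Build a search URL scoped to the intranet the user is searching from."""
    params = {
        "q": query,
        "site": COLLECTIONS[intranet],          # restrict results to one collection
        "client": "default_frontend",           # front end that handles the request
        "proxystylesheet": "default_frontend",  # stylesheet used to render results
        "output": "xml_no_dtd",
    }
    return f"http://{host}/search?{urlencode(params)}"


print(search_url("annual leave", "hq"))
```

Switching to results from a different intranet, as the drop-down menu does, would then amount to repeating the same query with a different collection name.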

For the results page I included the Google logo to psychologically increase confidence in the results. I placed a nice wide search box at the top, pre-filled with the original search phrase to allow staff to easily refine their query. I used a drop-down menu so that staff could switch to results from a different intranet. The presentation of the results themselves also followed Google’s public design with a title, snippet, URL and links to the cached version.

I decided to use icons for downloads instead of the existing [PDF] text that came by default with the GSA template. The template also included the amount of time it took to fetch and display the results, which I dropped in favour of simplicity. I changed the *Cached* link to *Text view* in the hope of encouraging staff to use this as a quick method of viewing PDF and Word files in a text version rather than waiting ages for the application to load.

Advanced search is available, but inconspicuous, since my experience of watching users try advanced searches is that they tend to overly restrict themselves. I also rewrote the help page.

The results display by relevance with the most relevant at the top. (I’ve never understood some websites which clutter the search results page with a relevance or significance scale against each result; surely the first result is ALWAYS the most relevant?) There’s also the option to sort by date.

GSA backend tweaks

In addition to the interface, I also had a play around with the backend configuration with the aim of improving search quality, relevance and general usability.

Date stamping

For each entry on the search results page there is a date. But what does the date mean? It is not always the date when the page or document was last amended. Consider a date-specific news article that is published and then gets amended several months later. Which date do we show in search results: the date of the article or the date it was last amended? This is just one question that I had to face while tweaking the interface. It’s possible to extract a date for inclusion in search results from the page or document title, filename, URL, from within the content itself, from metadata or the file datestamp. That’s a lot of choices. Our search result dates are contextual so that if you see a news story, the date reflects the initial publishing date. If you see minutes from a board meeting on a certain date, that’s the date we’ll show.
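To make the idea of a contextual date concrete, here is a minimal sketch of the decision order in Python. It is not how the GSA implements date extraction (that is configured in the appliance itself). The metadata field name `date_published` and the `/YYYY/MM/DD/` URL convention are assumptions made up for the example.

```python
import re
from datetime import date

# Hypothetical URL convention, e.g. /news/2008/03/14/some-story.html
URL_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def display_date(url, metadata, last_modified):
    """Pick the date to show for a result, most specific source first."""
    # 1. An explicit publication date in the metadata (hypothetical field name).
    published = metadata.get("date_published")
    if published:
        return date.fromisoformat(published)

    # 2. A date embedded in the URL path.
    match = URL_DATE.search(url)
    if match:
        year, month, day = map(int, match.groups())
        return date(year, month, day)

    # 3. Otherwise fall back to the file datestamp.
    return last_modified


# A news story amended in 2009 still shows its original publishing date.
print(display_date("/news/2008/03/14/story.html", {}, date(2009, 1, 5)))
```

The point of the ordering is that the last-amended datestamp is only used when nothing more meaningful is available.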

Manual biasing

You can manually control the ranking of documents or folders within the GSA. I use this functionality to manually demote date-specific content, such as news stories and meeting minutes, and to promote popular areas.

When it comes to configuring the backend, it really helps if you are working with a well organised folder structure and good file naming conventions. For example, we have a policy of putting date-specific information in correctly labelled folders. So the meeting minutes for 2007 are filed in the /whatever/2007/ folder. The same rule applies no matter whereabouts you are in the intranet structure. In GSA, we can then use regular expressions to specify any folder with /2007/ in the path and automatically apply the same rules across the intranet. In the example below, content in any /2007/ folder gets a medium decrease in result biasing, since we prefer old news stories, old meeting minutes etc. to have less importance in the rankings.

[Image: Manual result biasing]

Manual biasing also allows us to exert control over specific intranet areas and I use this, for example, when we are HiPPOed into creating sections that we’d rather did not show up and cloud search results 😉
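The folder-based rule described above can be illustrated with a short Python sketch. In the GSA itself the patterns and bias strengths are entered in the admin interface, so the numeric weights and the `/forms/` pattern here are purely hypothetical. Only the `/2007/` pattern comes from the example in the text.

```python
import re

# Hypothetical rules: (URL pattern, bias). Negative demotes, positive promotes.
BIAS_RULES = [
    (re.compile(r"/2007/"), -2),   # "medium decrease" for old, date-specific content
    (re.compile(r"/forms/"), 2),   # hypothetical popular area to promote
]


def bias_for(url):
    """Return the bias of the first rule whose pattern appears in the URL."""
    for pattern, bias in BIAS_RULES:
        if pattern.search(url):
            return bias
    return 0  # no rule matched: leave the ranking alone


for url in (
    "http://intranet/board/minutes/2007/january.pdf",
    "http://intranet/hr/forms/annual-leave.doc",
    "http://intranet/about/index.html",
):
    print(url, bias_for(url))
```

Because the pattern matches anywhere in the path, one rule covers every /2007/ folder regardless of where it sits in the intranet structure, which is why a consistent folder convention pays off.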

Date biasing

The GSA has functionality that automatically promotes more recent pages, across the board. But I don’t use this. I prefer to use the manual method of specifying folders to promote and demote because some of our pages and documents are old (long-standing), yet current. Example: the annual leave template is a popular search request but it was amended years ago. It’s still the correct template. But because it is old, GSA would demote it in the results page if we used date biasing. Since we have various forms, policies and guidance which are still current but old, I don’t use this built-in functionality.

Crawl frequency and freshness

I also use our folder structure to help Google know how often to crawl areas of content. I know that some areas of content, such as news stories and meeting minutes, once published to the intranet, do not often change. So I help Google by specifying these areas and instructing it not to waste so much time in attempting to re-index them.

Related queries

This is a manual method of promoting alternative or related queries. I don’t use it much since Google does a great job of serving results. I mainly use it where there is a problem with internal office language, with projects which insist on calling themselves one thing when everyone else knows them as something else. Clicking the alternative suggestion will perform another search.

Key matches

Another manual method of promoting results. Key matches differ from related queries because clicking the suggestion will take staff directly to the intranet page rather than performing another search. Again I don’t saturate staff with these but I do use it for staff who come to the HQ intranet expecting to find popular items on their local intranets.

Query expansion (synonyms)

The GSA comes with a default set of synonyms, in different languages. Unfortunately, the UK set comes with American spellings so I had to change all the -ize, -ized, -izing to -ise, -ised, -ising. Again, I use this functionality for dealing with problems with internal language. Query expansion differs from key matches and related queries in that the GSA will incorporate the expanded search terms into the initial query. Example: I search for *organogram* and GSA will also include *org chart* and *organisation chart* in my query and return any results containing those terms.
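Conceptually, query expansion behaves something like the sketch below. This is an illustration in Python, not the GSA's synonym file format or its actual matching logic. It also assumes the synonyms work in both directions, which may not be how a given rule is configured.

```python
# Synonym groups taken from the example above; treated as bidirectional,
# which is an assumption made for this sketch.
SYNONYMS = [
    {"organogram", "org chart", "organisation chart"},
]


def expand_query(query):
    """Return the search term together with any configured synonyms."""
    term = query.strip().lower()
    for group in SYNONYMS:
        if term in group:
            return sorted(group)
    return [term]  # no synonyms configured: search for the term as typed


print(expand_query("organogram"))
print(expand_query("annual leave"))
```

The person searching never sees a suggestion to click. The extra terms are simply folded into the original query, which is the difference from related queries and key matches.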

Content metadata

The remaining improvements to the search experience rely on the quality of the content that Google is crawling. I spent weeks going through a good proportion of the HTML, PDF, DOC, XLS and PPT files on the intranet, getting the metadata into shape.

I posted last month on the importance of quality metadata, with examples of what happens if you don’t think about metadata:

Intranet search engine optimisation common mistakes

Since the launch

As we publish more and more content on the intranet it is a constant battle trying to maintain the quality of search results. However, the importance of metadata is slowly but surely becoming embedded into our processes. We are still using an older version of the GSA (version 5.2) so we don’t have the advantages of any of the newer developments such as type-ahead search results and allowing staff to tag and rate results. But I believe it’s important to get the basics right before adding the frills.