Tags vs Keywords

Many of our clients use WordPress-driven websites and intranets. Tags are a core feature of WordPress. Keywords are not. But we do use them on intranets.

Keywords on the web

Keywords don’t really pack much punch on the web. They were initially designed to aid search engines in indexing content but due to abuse, search engines have long since ignored them.

Keywords on the intranet

Keywords on the intranet are a great way for the intranet search engine to index words that wouldn’t otherwise appear on a page. Employee restaurant, staff canteen, lunch menu, food and drink are all words that people might use to search for the lunch menu. Not all of these words will appear on that page. So we use keywords to help people that search using words that do not appear in the main content of a page. And although the search engine will read and index the keywords, these keywords will not appear on the front end pages.
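
As a rough illustration of the idea above, here is a minimal Python sketch of a page whose keywords are indexed but never displayed. The `Page` class and the function names are hypothetical, invented for this example; they are not part of WordPress or any search plugin.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    title: str
    body: str
    # Hidden keywords: indexed by search, never shown to readers.
    keywords: List[str] = field(default_factory=list)


def searchable_text(page):
    """Text the search engine indexes: title, body and hidden keywords."""
    return " ".join([page.title, page.body, *page.keywords]).lower()


def visible_text(page):
    """Text shown on the front end: keywords are left out."""
    return f"{page.title}\n{page.body}"


def search(pages, query):
    """Return pages whose indexed text contains every word of the query."""
    words = query.lower().split()
    return [p for p in pages if all(w in searchable_text(p) for w in words)]


lunch = Page(
    title="Lunch menu",
    body="This week's dishes in the employee restaurant.",
    keywords=["staff canteen", "food and drink"],
)

print(search([lunch], "staff canteen"))   # the lunch menu page is found
print("canteen" in visible_text(lunch))   # the keyword is not displayed
```

Searching for "staff canteen" finds the lunch menu even though neither word appears in what staff see on the page.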

You don’t need to repeat words in keywords if they already appear in the page title or body content. Keywords are not obligatory. It’s fine to leave them blank. I stress this because some publishers, if they feel they have to enter something, will enter any words they can, and this can damage search results.

Don’t include your company name as a keyword. It’s really not necessary. Unless the page is specifically about your company and this is the page that should appear in search results if someone, on your company intranet, searches for your company name.

When you add a keyword, think of it as an instruction to the search engine to include the page as a possible entry in search results when anyone uses the word in a search query. Be very specific.

Keywords are also a good way of promoting internal campaigns. Include the message Search for “funderpants” on an offline poster and make sure that the obscure, unique word is a keyword on one intranet page only. Then, when staff search for it, that page will appear all alone at the top of the search results.

Tags on the intranet

Tags are not an alternative to keywords. Tags are for grouping content. They are not designed solely for the search engine to index. And they are visible in the front end.

Ideally, you should apply a single tag to more than one page. For example, you can apply the tag “meeting” to pages for booking a meeting room, ordering catering for a meeting, collecting guests for a meeting and how to operate the projector.

When adding tags to a page, check that a similar tag doesn’t already exist to avoid creating different variations. If you create a new tag, check which other pages might also benefit from the same tag.

Tags on the web

The only benefit of the shotgun approach of adding lots of tags to a page is that the visible, front-end content will get indexed. On the web, this is a benefit as keywords are generally ignored.

Managing tags

I find, in practice, that publishers fall into two categories. Those who organise and manage their tags, and those who treat tags as keywords.

Those who manage their tags will build a controlled vocabulary of words and terms that can be used. On the DCMS intranet there are just 65 tags to cover all the content. You click a tag; you get a nice bunch of pages relating to that tag.

Those who treat tags as keywords add them in a less controlled manner. This can result in having hundreds of tags, which in itself is not necessarily a bad thing. It produces pages with lots of tags. But it also produces lots of single tags that apply to only one page. Groupings happen by chance. Variations on words start to appear, on the front end too. Tag clouds become very flat, as all the singleton tags appear at the same size and colour. You click a tag; you get one page, or a few if you’re lucky.
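
A quick way to spot this on your own intranet is to count how many pages each tag is applied to. The sketch below is a minimal Python example, assuming you can export a mapping of page titles to tags; the data and function names are made up for illustration.

```python
from collections import Counter


def tag_usage(pages):
    """Count how many pages each tag is applied to."""
    return Counter(tag for tags in pages.values() for tag in set(tags))


def singleton_tags(pages):
    """Tags that appear on exactly one page, in alphabetical order."""
    return sorted(tag for tag, n in tag_usage(pages).items() if n == 1)


# Hypothetical export: page title -> tags applied to that page.
pages = {
    "Book a meeting room": ["meeting"],
    "Order catering": ["meeting", "catering"],
    "Collect guests": ["meeting", "meetings"],
}

print(singleton_tags(pages))
```

Here "meetings" shows up as a singleton, a variation of "meeting" that groups nothing, and is a candidate for merging.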

On the intranet, it’s best to use tags to group content, and keywords to aid the search engine.

Relevanssi WordPress plugin: intranet search comparisons

As part of the forthcoming iteration of the GovIntranet WordPress theme, I’ve been testing search results. I compared search results on 2 client intranets, each using their own content, but different search implementations:

Both intranets use bespoke WordPress themes with the Relevanssi search plugin. Intranet A has some custom code to integrate documents into search results.

So, fasten your seatbelts for a bumper ride through 21 of my top intranet search queries, typical of any government departmental intranet. Screenshots show page 1 of the search results for the two different intranets. I’ve anonymised the results where appropriate.

1. Book a meeting room

Grab that room while it’s still vacant. The room booking facility is a top ranker for office-based staff. Or is it?

Intranet A: book a meeting room
Intranet B: book a meeting room

2. Eye test

One of my personal favourites. You fill out a form, send it to HR. They send you a voucher. You take it to the optician when you get your annual eye test. All paid for by Her Majesty’s ever-so generous Government.

Intranet A: eye test
Intranet B: eye test – goes direct to required page

3. Maternity leave

You’re pregnant. You’ve got a lot to think about and plan for. Wouldn’t it be nice if your intranet gave you the facts straight?

Intranet A: maternity leave
Intranet B: maternity leave

4. Guidelines on blogging

For review, before you boldly put finger to keyboard.

Intranet A: guidelines on blogging
Intranet B: guidelines on blogging

5. Replace my building pass

You went for that *just one drink* after work and you arrive at the office the next day knowing you’re in the shit.

Intranet A: replace my building pass
Intranet B: replace my building pass

6. Claim expenses

You’ve been for that glorious, 3-night stay in Sunningdale.

Intranet A: claim expenses
Intranet B: claim expenses

7. GPC

For those of you who don’t work in government or who don’t speak acronym, this is the Government Procurement Card. A credit card for the responsible people with a grand spending power. To be fair, it’s an acronym that is commonly used.

Intranet A: gpc
Intranet B: gpc

8. Rail tickets

Online booking please; it’s digital by default!

Intranet A: rail tickets
Intranet B: rail tickets

9. Induction for new staff

A must-have for every intranet. How do you welcome your new joiners?

Intranet A: induction for new staff
Intranet B: induction for new staff

10. Gifts and hospitality

Yup, even that bottle of Harvey’s Bristol Cream you got at Christmas from the agency you worked with, you gotta declare it.

Intranet A: gifts and hospitality
Intranet B: gifts and hospitality

11. Voicemail

How to set up your friendly recording to play to your beloved colleagues when you can’t be arsed to pick up.

Intranet A: voicemail
Intranet B: voicemail

12. Box times

Is this something to do with the diet coke advert? No, it’s the deadline for submitting something to a minister before they slope off for a nice Sancerre.

Intranet A: box times
Intranet B: box times

13. Wisleblowing

How well does your search engine deal with typos?

Intranet A: wisleblowing
Intranet B: wisleblowing

14. Written Ministerial Statements

Terribly exciting stuff, I know. Such is the life of a Civil Servant.

Intranet A: written ministerial statements
Intranet B: written ministerial statements

15. Risk register

For your Queen of Prince II or your agile scrum master, who’ll need to be armed with one of these.

Intranet A: risk register
Intranet B: risk register

16. Translate into Welsh

Intranet A: translate into welsh
Intranet B: translate into welsh – goes direct to required page

17. Whistleblowing

For those moments when you just have to do the right thing.

Intranet A: whistleblowing
Intranet B: whistleblowing – goes direct to required page

18. Payslip

The intranet, filled with topics closest to your heart.

Intranet A: payslip
Intranet B: payslip

19. Season ticket loan

Can’t stand queuing for your ticket at the station? Get an annual ticket and pay it off in 10 easy monthly payday deductions.

Intranet A: season ticket loan
Intranet B: season ticket loan

20. T&S

Because we all need a little T&S.

Intranet A: t&s
Intranet B: t&s

21. Lunch menu

A guy walks into a restaurant and says to the waiter, “Can I see what you had on your menu three weeks ago?”

Intranet A: lunch menu
Intranet B: lunch menu

Analysis

So which intranet performed best and why? And how could we make improvements?

Content quality

There is only so much that a search engine can do. The quality of search results largely depends on the quality of the content. Garbage in, garbage out. The quality of the content on intranet B shines through in the search results. Pages are written in plain English using active language. Page titles are concise.

Documents in search results

In Intranet A, we see a profusion of documents in the results, often containing hyphens and dot docs and brackets and dates. Staff will find these hard to sift through because the sheer number of documents returned tends to cloud the search results.

In 2002, I decided to turn off documents in search results on the London Underground intranet, subscribing to the belief that if a document is important enough then there will be an HTML page that mentions it. Search results improved dramatically.

The number of results returned

A basic usability rule is that less choice improves efficiency. So it follows that fewer entries in the search results page will make it easier and faster for staff to make a choice. Few people will go past page 1.

Intranet A: 548 total results, 6 out of 20 searches with just one page of results

Intranet B: 195 total results, 16 out of 20 searches with just one page of results

Direct hits

Even better than less choice is not having to make a choice at all. For those searches that produce a single result, Intranet B will take staff direct to the page in question, skipping the search results page. Skipping the time taken to scan a search results page, make a choice and click. Intranet A has 18 results for an eye test. Just how many pages do you need on your intranet about getting an eye test? Just one.

Social content

Intranet A includes social content generated by staff in forums and crowdsourcing areas.

Intranet B only includes core intranet pages by default. For social content, staff need to search from within the forums.

While there is no question that social, staff-generated content is good for all sorts of reasons, including it in search results by default can cloud the results. In the examples above, on Intranet A, it would appear that searching for any financial information returns a post about a sports day event, consistently in first position.

I still believe that there is a definite line between corporate, official content and social, staff-generated content. And that each has its uses.

Context and design

Intranet A uses breadcrumbs to give context to the search result title. While this can be useful, it becomes a problem when the page title refers to an item 9 levels down in the structured navigation.

Intranet A shows no clear date information telling you when the page was updated and there is no snippet text to help give you the scent of what you’ll get if you click.

Intranet B shows the type of content, category and contextual information where appropriate. Date information varies depending on the type of content, so for example, we’ll show the last modified date for tasks but the first published date for news stories.

Content housekeeping

It’s really up to content publishers to make sure that old information isn’t left lying around. Regular housekeeping and clear procedures can help to keep search results free from useless information. So if you publish a lunch menu each week, why not keep the URL the same so that staff can bookmark the page and always return to it? Why publish multiple versions of the document with different URLs and include them all in search results when they become redundant?

Configuration

While I would say that content is the main area that you can use to improve search results, there is a fair bit that you can do behind the scenes to configure how the search engine works. Do you promote more recent content? Do you provide synonyms for words? What about search suggestions on typos or demoting old news stories? How does using an AND search compare to using an OR search?
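
To make the last question concrete, here is a minimal Python sketch of the difference between an AND search and an OR search over page titles. It is a toy matcher written for this example, not how Relevanssi or any particular search engine implements it.

```python
def matches(text, query, mode="AND"):
    """AND: every query word must appear. OR: any one word is enough."""
    words = set(text.lower().split())
    terms = query.lower().split()
    if mode == "AND":
        return all(term in words for term in terms)
    return any(term in words for term in terms)


# Made-up page titles for illustration.
titles = [
    "Book a meeting room",
    "Meeting minutes 2007",
    "Room temperature policy",
]

and_hits = [t for t in titles if matches(t, "meeting room", "AND")]
or_hits = [t for t in titles if matches(t, "meeting room", "OR")]

print(and_hits)  # only the page containing both words
print(or_hits)   # every page containing either word
```

The AND search returns one page; the OR search returns all three, which is more choice for staff to sift through.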

Advanced search and filtering

Nice to note that neither intranet uses advanced search or filtering. In my experience of user testing, such options only add confusion. It is rare that you’ll get a member of staff wanting to do complex searches. The majority don’t need it, and including it only serves to provide yet more choice.

Conclusion

A budget search solution combined with tip-top content can produce very good search results, making it faster to find guidance and information and making staff more productive.

How do your top searches perform?

Related blogposts

Intranet show and tell *Search* event: 2 Dec 2010

*Intranet search* was the subject of the Intranet show and tell event that took place at the Ministry of Justice in London on 2 December 2010. The bad weather meant that a lot of people could not make the event, but we managed to get 17 people around the table.

There was a diverse set of talks ranging from working with intranet content and metadata to back-end coding of the search interfaces and the nuts and bolts of the underlying systems.

I kicked off the talks with a quick history of how we have improved the search experience for staff by paying attention to metadata within our HQ intranet content, using Google Search Appliance. I also showed some of our search analytics and infographics that we are starting to use for data visualisation.

Simon Thompson (@thompsonsimon) and Angel Brown (@angelbrownuk) organised the event. Simon showed us some bespoke programming of SharePoint to enable typeahead search results for both people and intranet content using jQuery.

Tyler Tate (@tylertate) managed to connect through Skype to do a remote session on *Dealing with diverse data* giving examples of search results page interfaces based on different data sources.

Tom Mortimer from Flax gave a very detailed presentation about restricting search results based on user credentials. And Rangi Robinson (@rangfu) from Framestore demo’d his staff intranet people search.

Great to see support from @Funnelback. And sorry we were unable to hear from Martin White from @IntranetFocus who was snowed in.

I noted a lack of talk about what I term *social search*, allowing staff to tag and rate search results. I would be interested to hear how this is being used by anyone.

Lastly, big thanks to @starhorseUK for adding the finishing touches to the room booking and for helping our guests into the building.

How we improved our intranet search experience

We use the Google Search Appliance (GSA) across our family of intranets. In 2009 we launched a new search experience to coincide with an upgrade to the GSA.

Analysis of original search experience

I so wish I had some screenshots of the original search results pages. If I manage to find some, I’ll post.

Back in 2008, a typical search from any one of the family of intranets produced results from all the intranets. I know it is considered best practice to include everything in the initial scope of a search and then allow users to winnow down the results. But in this case, because each intranet served a very specific part of the organisation, results from all intranets clouded the experience. Inadequate metadata and file names made the experience worse and the interface was generally busy, as if someone had got into the admin page and turned on every option. Just because you can do something, doesn’t mean you should.

Search interface redesign

For the interface, I decided to localise the results and narrow the scope of the initial search to the intranet that staff search from. I thought that it would improve orientation. So if I search from the HQ intranet then I just get results from the HQ intranet. Same goes for intranet 1, 2, 3…

For the results page I included the Google logo to psychologically increase confidence in the results. I placed a nice wide search box at the top, pre-filled with the original search phrase to allow staff to easily refine their query. I used a drop-down menu so that staff could switch to results from a different intranet. The presentation of the results themselves also followed Google’s public design with a title, snippet, URL and links to the cached version.

I decided to use icons for downloads instead of the existing [PDF] text that came by default with the GSA template. The template also included the amount of time it took to fetch and display the results, which I dropped in favour of simplicity. I changed the *Cached* link to *Text view* in the hope of encouraging staff to use this as a quick method of viewing PDF and Word files in a text version rather than waiting ages for the application to load.

Advanced search is available, but inconspicuous, since my experience of watching users try advanced searches is that they tend to overly restrict themselves. I also rewrote the help page.

The results display by relevance with the most relevant at the top (I’ve never understood some websites which clutter the search results page with a relevance or significance scale against each result; surely the first result is ALWAYS the most relevant?) There’s also the option to sort by date.

Example search results

GSA backend tweaks

In addition to the interface, I also had a play around with the backend configuration with the aim of improving search quality, relevance and general usability.

Date stamping

For each entry on the search results page there is a date. But what does the date mean? It is not always the date when the page or document was last amended. Consider a date-specific news article that is published and then gets amended several months later. Which date do we show in search results: the date of the article or the date it was last amended? This is just one question that I had to face while tweaking the interface. It’s possible to extract a date for inclusion in search results from the page or document title, filename, URL, from within the content itself, from metadata or the file datestamp. That’s a lot of choices. Our search result dates are contextual so that if you see a news story, the date reflects the initial publishing date. If you see minutes from a board meeting on a certain date, that’s the date we’ll show.

Configuring datestamps
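
The contextual rule described above can be sketched in a few lines of Python. This is only an illustration of the decision, not GSA configuration; the field names (`type`, `published`, `meeting_date`, `modified`) and the dates are invented for the example.

```python
from datetime import date


def display_date(item):
    """Pick the date to show in results, depending on the type of content."""
    if item["type"] == "news":
        return item["published"]       # initial publishing date
    if item["type"] == "minutes":
        return item["meeting_date"]    # date of the meeting itself
    return item["modified"]            # otherwise: when it was last amended


# A news story published in March and amended months later (made-up dates).
story = {
    "type": "news",
    "published": date(2009, 3, 2),
    "modified": date(2009, 9, 14),
}

print(display_date(story))
```

The story keeps its original March date in search results, even though the file was amended in September.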

Manual biasing

You can manually control the ranking of documents or folders within the GSA. I use this functionality to manually demote date-specific content, such as news stories and meeting minutes and to promote popular areas.

When it comes to configuring the backend, it really helps if you are working with a well organised folder structure and good file naming conventions. For example, we have a policy of putting date-specific information in correctly labelled folders. So the meeting minutes for 2007 are filed in the /whatever/2007/ folder. The same rule applies no matter whereabouts you are in the intranet structure. In GSA, we can then use regular expressions to specify any folder with /2007/ in the path and automatically apply the same rules across the intranet. In the example below, content in any /2007/ folder gets a medium decrease in result biasing, since we prefer old news stories, old meeting minutes etc. to have less importance in the rankings.

Manual result biasing

Manual biasing also allows us to exert control over specific intranet areas and I use this, for example, when we are HiPPOed into creating sections that we’d rather did not show up and cloud search results 😉
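
For readers without a GSA to hand, the principle of pattern-based biasing can be sketched in Python. This is not GSA syntax: the rule list, the numeric adjustments and the `/forms/` path are assumptions made up for the example, standing in for the "medium decrease" style settings in the admin screen.

```python
import re

# (pattern, adjustment) pairs. The numbers are invented for illustration.
BIAS_RULES = [
    (re.compile(r"/2007/"), -2),    # any date-specific 2007 folder: demote
    (re.compile(r"^/forms/"), 1),   # hypothetical popular area: promote
]


def bias_for(path):
    """Sum the adjustments of every rule whose pattern matches the path."""
    return sum(adj for pattern, adj in BIAS_RULES if pattern.search(path))


def rerank(results):
    """results: list of (path, base_score). Highest adjusted score first."""
    return sorted(results, key=lambda r: r[1] + bias_for(r[0]), reverse=True)


results = [
    ("/board/minutes/2007/march.html", 5),
    ("/forms/annual-leave.html", 4),
]

for path, score in rerank(results):
    print(path, score + bias_for(path))
```

One rule covers every /2007/ folder wherever it sits in the structure, which is why a consistent folder naming policy pays off.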

Date biasing

The GSA has functionality that automatically promotes more recent pages, across the board. But I don’t use this. I prefer to use the manual method of specifying folders to promote and demote because some of our pages and documents are old (long-standing), yet current. Example: the annual leave template is a popular search request but it was amended years ago. It’s still the correct template. But because it is old, GSA would demote it in the results page if we used date biasing. Since we have various forms, policies and guidance which are still current but old, I don’t use this built-in functionality.

Crawl frequency and freshness

I also use our folder structure to help Google know how often to crawl areas of content. I know that some areas of content, such as news stories and meeting minutes, once published to the intranet, do not often change. So I help Google by specifying these areas and instructing it not to waste so much time in attempting to re-index.

URL patterns to crawl infrequently

Related queries

This is a manual method of promoting alternative or related queries. I don’t use it much since Google does a great job of serving results. I mainly use it where there is a problem with internal office language, with projects which insist on calling themselves one thing when everyone else knows it as something else. Clicking the alternative suggestion will perform another search.

Related queries

Key matches

Another manual method of promoting results. Key matches differ from related queries because clicking the suggestion will take staff directly to the intranet page rather than performing another search. Again I don’t saturate staff with these but I do use it for staff who come to the HQ intranet expecting to find popular items on their local intranets.

Key matches

Query expansion (synonyms)

The GSA comes with a default set of synonyms, in different languages. Unfortunately, the UK set comes with American spellings so I had to change all the -ize, -ized, -izing to -ise, -ised, -ising. Again, I use this functionality for dealing with problems with internal language. Query expansion differs from key matches and related queries in that the GSA will incorporate the expanded search terms into the initial query. Example: I search for *organogram* and GSA will also include *org chart* and *organisation chart* into my query and return any results containing those terms.
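
The organogram example can be sketched as follows. This is a simplified Python illustration of query expansion in general, not the GSA synonym file format; the page titles and function names are made up.

```python
# Synonym table: the original term maps to the extra terms to search for.
SYNONYMS = {
    "organogram": ["org chart", "organisation chart"],
}


def expand(query):
    """Return the original query plus any synonyms defined for it."""
    query = query.lower()
    return [query] + SYNONYMS.get(query, [])


def search(titles, query):
    """Return titles containing the query or any of its expanded terms."""
    terms = expand(query)
    return [t for t in titles if any(term in t.lower() for term in terms)]


# Made-up page titles for illustration.
titles = [
    "Finance org chart",
    "Organisation chart for HQ",
    "Annual leave template",
]

print(expand("organogram"))
print(search(titles, "organogram"))
```

Staff type one word and get pages that use any of the three terms, without ever seeing a suggestion or clicking a second time.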

Content metadata

The remaining improvements to the search experience rely on the quality of the content that Google is crawling. I spent weeks going through a good proportion of the HTML, PDF, DOC, XLS and PPT files on the intranet, getting the metadata into shape.

I posted last month on the importance of quality metadata, with examples of what happens if you don’t think about metadata:

Since the launch

As we publish more and more content on the intranet it is a constant battle trying to maintain the quality of search results. However, the importance of metadata is slowly but surely becoming embedded into our processes. We are still using an older version of the GSA (version 5.2) so we don’t have the advantages of any of the newer developments such as type-ahead search results and allowing staff to tag and rate results. But I believe it’s important to get the basics right before adding the frills.

Intranet search engine optimisation (SEO) common mistakes

We have a “family” of intranets and we use Google Search Appliance for our search engine. The quality of search results varies widely across the different intranets.

To demonstrate the difference in quality, I tested a one-word search which was the name of a project I wanted to find out about. Let’s call it ProjectX.

Corporate HQ intranet search results
1. ProjectX homepage
2. ProjectX academy
3. How to use ProjectX
4. ProjectX success stories
5. ProjectX open day (London 6/08/10)

On our Corporate HQ intranet, the search results listed the project page as the top result, with various associated documents and news stories going down the list. They are all well labelled and it’s clear what you will get if you click any of them. Key words appear at the start of the title and each title stands out.

Now, while the HQ intranet hosts the main pages for this project, the other intranets should have some pages that mention it, since the project is about ways of working within the organisation. So here are some of the search result titles that the other intranets had to offer (company name and acronyms are made up; GOPIT and Pods are not the words that I searched for):

Intranet 1 search results
1. Acme Service Intranet – Our Organisation – Change Programme…
2. Acme Service Intranet – Our Organisation – Change Programme…
3. Acme Service Intranet – Our Organisation – Change Programme…
4. Acme Service Intranet – Our Organisation – Change Programme…
5. Acme Service Intranet – Our Organisation – Change Programme…

Intranet 1 is not well optimised for the search engine, to put it mildly. It highlights the common mistake of using breadcrumb information for your page title. The result is that all search results look the same. This mistake is common in CMS systems that output an automated page title based on a rigid menu structure and means that the content provider does not have to think about a meaningful page title. Staff will scan the first few words down the left-hand side of the list. If they are all the same then you are forcing them to work harder in scanning left to right along each search result, and in the case of Intranet 1, not giving them any benefit for their hard work. Google cuts off search results at 63 characters on our intranet.
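
You can check your own page titles for this problem with a few lines of Python. The sketch below assumes the 63-character cut-off mentioned above; the breadcrumb titles are made-up continuations of the truncated Intranet 1 results, and the function names are invented for the example.

```python
LIMIT = 63  # title cut-off on our results page


def truncate(title, limit=LIMIT):
    """Cut a title at the limit, as the results page would."""
    if len(title) <= limit:
        return title
    return title[:limit].rstrip() + "…"


def distinct_after_truncation(titles):
    """How many different titles staff actually see in the results."""
    return len({truncate(t) for t in titles})


# Hypothetical full titles built from breadcrumbs.
breadcrumb_titles = [
    "Acme Service Intranet – Our Organisation – Change Programme – ProjectX – Open day",
    "Acme Service Intranet – Our Organisation – Change Programme – ProjectX – Success stories",
]

# The same pages with the key words at the start of the title.
keyword_first_titles = [
    "ProjectX open day",
    "ProjectX success stories",
]

print(distinct_after_truncation(breadcrumb_titles))
print(distinct_after_truncation(keyword_first_titles))
```

The two breadcrumb titles collapse into one identical line once truncated, while the keyword-first titles stay distinct.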

Intranet 2 search results
1. recruitment authority form internal recruitment
2. recruitment authority form internal recruitment
3. Recruitment authority form internal recruitment
4. job advert template
5. GOPIT Trouble Shooters

Intranet 2 demonstrates how few content providers think about the search engine when publishing documents on the intranet (Microsoft Word, PowerPoint, Excel, Adobe PDF etc.). You can directly control the search engine results by adding metadata to documents. Google will read the title field in the document properties. Just remember, File, Properties.

Another common problem when search engines index CMS systems is that content providers may enter a meaningful document title within the CMS, but fail to do the same for the physical document. Search engines that do not read the CMS database but which index the resultant HTML and .DOC, .XLS, .PDF files, will use the metadata within the files. Intranet 2 obviously uses the same job vacancy template again and again for each vacancy, but fails to add appropriate metadata to each version of the document, i.e. what is the job title and where is the job based? I don’t know what GOPIT Trouble Shooters is (don’t use acronyms!), but it doesn’t sound like the project I was looking for.

Intranet 3 search results
1. filename.ppt
2. Pods Newsletter
3. Pods Newsletter
4. Pods Newsletter
5. Pods Newsletter

Poor old Intranet 3. The first title is caused by a PowerPoint presentation that contains loads of images but has no text AND zero metadata. And it would never pass accessibility. Loser! The final 4 results may contain news about the project, but I’m not tempted to click.

Intranet SEO

There are no excuses for poor metadata! It took me 22 seconds to press the necessary keys and create the metadata for a document as shown in this blog image. 22 seconds of my time saves lots and lots of seconds of staff time.