We have a “family” of intranets and we use Google Search Appliance for our search engine. The quality of search results varies widely across the different intranets.
To demonstrate the difference in quality, I tested a one-word search which was the name of a project I wanted to find out about. Let’s call it ProjectX.
Corporate HQ intranet search results
1. ProjectX homepage
2. ProjectX academy
3. How to use ProjectX
4. ProjectX success stories
5. ProjectX open day (London 6/08/10)
On our Corporate HQ intranet, the search results listed the project page as the top result, with various associated documents and news stories going down the list. They are all well labelled and it’s clear what you will get if you click any of them. Key words appear at the start of the title and each title stands out.
Now, while the HQ intranet hosts the main pages for this project, the other intranets should have some pages that mention it, since the project is about ways of working within the organisation. So here are some of the search result titles that the other intranets had to offer (the company name and acronyms are made up; neither GOPIT nor Pods is the word that I searched for):
Intranet 1 search results
1. Acme Service Intranet – Our Organisation – Change Programme…
2. Acme Service Intranet – Our Organisation – Change Programme…
3. Acme Service Intranet – Our Organisation – Change Programme…
4. Acme Service Intranet – Our Organisation – Change Programme…
5. Acme Service Intranet – Our Organisation – Change Programme…
Intranet 1 is not well optimised for the search engine, to put it mildly. It highlights the common mistake of using breadcrumb information for your page title, so that all the search results look the same. This mistake is common in content management systems that generate the page title automatically from a rigid menu structure, which means the content provider never has to think about a meaningful page title. Staff will scan the first few words down the left-hand side of the list. If they are all the same, you are forcing them to work harder by scanning left to right along each search result and, in the case of Intranet 1, giving them nothing for their hard work: Google cuts off search result titles at 63 characters on our intranet, so the part of the title that differs never appears.
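To see why breadcrumb-first titles fail, here is a minimal Python sketch of the cut-off. The 63-character limit is the figure from our intranet. The page titles and the helper names `truncate` and `distinct_after_truncation` are made up for illustration, and this only mimics the effect; it is not how the appliance itself works.

```python
# Illustrative only: mimics a results list that cuts titles at a fixed width.
TITLE_LIMIT = 63  # the cut-off observed on our intranet


def truncate(title: str, limit: int = TITLE_LIMIT) -> str:
    """Cut a title to `limit` characters, marking the cut with an ellipsis."""
    if len(title) <= limit:
        return title
    return title[:limit].rstrip() + "…"


def distinct_after_truncation(titles: list[str]) -> int:
    """Count how many different titles a reader would actually see."""
    return len({truncate(t) for t in titles})


# Hypothetical page titles: breadcrumb first versus keyword first.
breadcrumb_titles = [
    "Acme Service Intranet – Our Organisation – Change Programme – ProjectX – Homepage",
    "Acme Service Intranet – Our Organisation – Change Programme – ProjectX – Success stories",
]
keyword_titles = [
    "ProjectX homepage – Change Programme – Acme Service Intranet",
    "ProjectX success stories – Change Programme – Acme Service Intranet",
]

for title in breadcrumb_titles + keyword_titles:
    print(truncate(title))
```

With these made-up titles, both breadcrumb-first results are cut to the same text, while the keyword-first results stay distinct because the useful words come before the cut.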
Intranet 2 search results
1. recruitment authority form internal recruitment
2. recruitment authority form internal recruitment
3. Recruitment authority form internal recruitment
4. job advert template
5. GOPIT Trouble Shooters
Intranet 2 demonstrates how few content providers think about the search engine when publishing documents (Microsoft Word, PowerPoint, Excel, Adobe PDF, etc.) on the intranet. You can directly control the search engine results by adding metadata to documents. Google will read the title field in the document properties. Just remember: File, Properties.
Another common problem when search engines index a CMS is that content providers may enter a meaningful document title within the CMS, but fail to do the same for the document file itself. Search engines that do not read the CMS database, but instead index the resulting HTML and .DOC, .XLS and .PDF files, will use the metadata within the files. Intranet 2 obviously uses the same job vacancy template again and again for each vacancy, but fails to add appropriate metadata to each version of the document, i.e. what is the job title and where is the job based? I don’t know what GOPIT Trouble Shooters is (don’t use acronyms!), but it doesn’t sound like the project I was looking for.
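If you want to find the offending files before the search engine does, something like the following rough Python sketch could help. It rests on an assumption about formats: it only handles the newer .docx, .xlsx and .pptx files, which are zip packages that keep their title in `docProps/core.xml`. Older .DOC and .XLS files and PDFs store their properties differently and would need other tools. The function names `document_title` and `files_missing_titles` are my own, not part of any product.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Dublin Core namespace used by the <dc:title> element in docProps/core.xml.
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def document_title(source) -> str:
    """Return the title stored in the document properties, or "" if none is set.

    `source` may be a file path or a binary file object.
    """
    with zipfile.ZipFile(source) as package:
        try:
            core_xml = package.read("docProps/core.xml")
        except KeyError:
            return ""  # the file has no core properties part at all
    element = ET.fromstring(core_xml).find(DC_TITLE)
    if element is None or element.text is None:
        return ""
    return element.text.strip()


def files_missing_titles(folder: str) -> list[Path]:
    """List Office files under `folder` whose title property is empty."""
    missing = []
    for pattern in ("*.docx", "*.xlsx", "*.pptx"):
        for path in sorted(Path(folder).rglob(pattern)):
            try:
                if not document_title(path):
                    missing.append(path)
            except zipfile.BadZipFile:
                continue  # not a readable package (e.g. a temporary lock file)
    return missing
```

Calling `files_missing_titles("some/folder")` on a folder of documents returns the ones that would show up in search results without a meaningful title. Note that it cannot spot a title that is filled in but wrong, such as a template title left unchanged on every copy.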
Intranet 3 search results
2. Pods Newsletter
3. Pods Newsletter
4. Pods Newsletter
5. Pods Newsletter
Poor old Intranet 3. The first title is caused by a PowerPoint presentation that contains loads of images but has no text AND zero metadata. It would never pass an accessibility check either. Loser! The final four results may contain news about the project, but I’m not tempted to click.
There are no excuses for poor metadata! It took me 22 seconds to press the necessary keys and create the metadata for a document, as shown in this blog image. Those 22 seconds of my time save lots and lots of seconds of staff time.