Podcast 128. Optimising technical documentation for Search Engines

Our latest episode of the Cherryleaf Podcast looks at optimising technical documentation for Search Engines.



This is the Cherryleaf Podcast.

Hello and welcome to the podcast. My name is Ellis Pratt. I’m one of the directors at Cherryleaf.

In this episode, we’re going to be looking at search in the context of technical communication. It’s something we’ve touched on in some other episodes, specifically podcast Episode #62, but we’ve never done an episode specifically on search.

Another reason for looking at this is that there has been an episode about search in a more general context on one of the most popular podcasts there is. That is 99% Invisible, and they had an episode called “Search and ye might find”.

So let’s start by quoting from that episode.

And if I put on my best Roman Mars voice, let me quote from that.

“For the last decade, people have been grumbling about not being able to find things online, both in our private data and on the public web, despite ever-evolving algorithms. Ever since humans started writing stuff down, the struggle has been in how to organize it all so that its contents wouldn’t be lost in the stacks. Search has always been an attempt to fix that problem.

In the early 2000s Google started getting bogged down by monetization and people trying to game the system.

One way this played out was with SEO, or Search Engine Optimization, which is basically the set of tools for getting a web page ranked higher in specific search results. In the early years of SEO, webmasters would often stuff their webpages with keywords to get them to rank higher, regardless of actual relevance – and the strategies keep evolving.

As the internet expanded, more and more spammy or irrelevant web pages would come up in search results. Google’s AI helped filter those out by emphasizing high quality results. But spammers started to figure out workarounds, resulting in a kind of continual arms race between ill-intentioned actors and Google’s algorithms.

For more on this back and forth, and the problems that still persist, check out Why Search Sucks by Adam Rogers.

In the piece, he goes into another big problem with the “private” side of search: people, for instance, searching their own email. Without a wealth of data to comb through that’s publicly available, private search AI is slow to learn. And private searches are becoming more and more essential as more and more of our emails, photos, and memories get stashed away.”

The 99 PI episode looks at this in more detail, and there’s about 30 minutes of content if I remember correctly.

I will include a link to that episode in the show notes. It’s a great podcast, 99 PI, and it’s well worth listening to.

Now what it talks about with regards to search in the general sense is also true for technical communication and the online content that technical writers and technical authors provide. In fact, in 2019, CIDM asked technical communicators what were the biggest shortcomings of their content.

And 52% of the respondents said that they had so much content that customers couldn’t find the correct information to help them be successful.

In this episode, what we’re going to do is look at help topic search engine optimization. That is, how your help content can be found by Google, or found by users searching on Google. We’ll also look at that other aspect they discussed on the 99 PI episode: how to improve the results when it comes to the private side of search. That means the search that comes within a help file, which might be provided by a help authoring tool.

In doing this, I’m going to be repeating some of the content that we covered or provided in Episode 62, and specifically some of the advice that Jennifer Mcdaid from VMWare discussed. And she was at a conference called the evolution of TC in 2019, I think it was.

And she did a workshop that I attended about search engine optimization for technical communication, and specifically how VMWare had changed and adapted their technical documentation so that it was more available on the web and easier for users to find. So I’m grateful for the advice, information and experiences that Jennifer shared.

Let me start with the private side of search first and then look at the more general aspects of helping your content to be found by Google; you want to be presented as the most likely search result to an end user when they search on Google.

Sometimes there are grumbles that the search that comes with a help authoring tool isn’t as good as the search that is provided by Bing and Google.

And it’s a simple fact that Google is a multibillion-dollar company with a huge amount of resource put into making their search as good as possible.

And they have lots and lots of data that they can use, and the actions of users to consider and evaluate, so they can make the best search engine that’s possible.

Now, the help authoring tools, and the providers of search for websites and other platforms, don’t have that data, that money or that resource. So it’s inevitable that you will have a better search experience, a more sophisticated search engine, with Google and Bing than you would with another application.

But I would argue that the search you get with the help authoring tools is still good, and there are certain things that we can do to help users find the information that they need.

And we want to provide alternative routes to just relying on the search engine. One of the reasons why search is so popular is because on the Internet and World Wide Web, the information architecture, the information structure, isn’t very good. It isn’t consistent, because it’s all of the world’s knowledge, available effectively via the search engine.

As technical authors, as technical communicators, we have the ability within the world that we control to make the information structure a lot better.

That includes the navigation: the way in which people can move around to find and discover information, and the links that take them to relevant information or to more information. So we can create a good, rich linking structure to help people navigate via links.

We can also help the search engines by writing topics that match closely with the search terms that people put in.

And within some authoring tools, there’s the ability to add structured content in some form, or even metadata. In some of the help authoring tools, that enables the search engine to provide options for filtering the content, so that it searches only a certain document, or so that the content is filtered by geographical criteria, by the user’s skill set or by the user’s job title.
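As a rough illustration of that kind of metadata filtering, here is a minimal Python sketch. It is not how any particular help authoring tool works; the `Topic` class, the `filter_topics` function and the metadata keys (`region`, `role`) are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    document: str
    # Free-form metadata such as region or audience role (hypothetical keys).
    metadata: dict[str, str] = field(default_factory=dict)


def filter_topics(topics, document=None, **criteria):
    """Return topics in the given document whose metadata matches every criterion."""
    results = []
    for topic in topics:
        # Skip topics from other documents when a document filter is set.
        if document is not None and topic.document != document:
            continue
        # Keep the topic only if every requested metadata value matches.
        if all(topic.metadata.get(key) == value for key, value in criteria.items()):
            results.append(topic)
    return results


topics = [
    Topic("Install the agent", "Admin Guide", {"region": "EU", "role": "administrator"}),
    Topic("Install the agent", "Admin Guide", {"region": "US", "role": "administrator"}),
    Topic("Run a report", "User Guide", {"region": "EU", "role": "analyst"}),
]

# Search only the Admin Guide, restricted to content tagged for the EU.
eu_admin = filter_topics(topics, document="Admin Guide", region="EU")
print([t.title for t in eu_admin])
```

The point of the sketch is only that filtering depends on the metadata being there in the first place: a topic with no `region` value can never match a region filter.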

And the major search engines and the help documentation that they provide have advice and tips on making your content searchable.

Yes, the search isn’t going to be as good as Google and Bing’s.

But the users aren’t going to be searching a quantity of information that’s as vast as, say, all of your emails, or all of your company’s emails and content. So the scale of the problem isn’t necessarily as great as elsewhere.

Let’s look at the other aspect: how can we make our help topics more findable when users search via Google? Some of these techniques and approaches we can also apply to that private type of search, and improve the results there as well.

The first thing to consider is the page length, the length of your help topics. Google nowadays tends to prefer, and rank more highly in the search engine results, pages it judges to be information rich.

You do see different opinions on how much content there should be on a web page to make it optimised for Google.

Forbes indicates that an average of 600 to 700 words is optimal.

And according to HubSpot, from a study they did in 2021, the ideal blog post length for SEO purposes is 2100-2400 words. And at the conference where Jen Mcdaid spoke, she said that VMWare were aiming at around 7 pages of A4 text, which equates to roughly 1750 words.

So these numbers suggest there’s a sweet spot in terms of the amount of content that you need to have on a page or in a topic for it to rank highly in the search engines.
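If you want to see where a topic sits against those figures, a small Python sketch like the following would do it. The figures are the ones quoted above; the function names are made up for the example, and the word counting is deliberately crude.

```python
import re

# Word-count figures quoted in the episode; they are rough guides, not rules.
GUIDELINES = {
    "Forbes": (600, 700),
    "VMware": (1750, 1750),
    "HubSpot": (2100, 2400),
}


def count_words(text):
    """Count runs of letters, digits and straight apostrophes as words."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def compare_with_guidelines(text):
    """Report whether the text is below, within or above each quoted figure."""
    words = count_words(text)
    report = {}
    for source, (low, high) in GUIDELINES.items():
        if words < low:
            report[source] = "below"
        elif words > high:
            report[source] = "above"
        else:
            report[source] = "within"
    return words, report


# A stand-in topic of 650 words.
words, report = compare_with_guidelines("word " * 650)
print(words, report)
```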

Now, that may be different from the culture in your organisation.

And there has been a fashion for writing topic-based content and for some of the publishing tools to then publish those topics as individual help topics, individual web pages.

Whilst that provided a quick and dirty way of publishing, it didn’t create content that was particularly user-friendly, or SEO-friendly either.

It could mean a lot of clicking for the end users, navigating from one topic to the next.

And there was a presentation at MadWorld about the consequences of its impact on Google as well. I can’t remember the person who presented at MadWorld on this. What they did was they replaced the search engine that comes with MadCap Flare, and had on the top navigation bar, a search that was driven by Google search.

And what they found was that Google was excluding some of the topics from the search engine results, even though those topics were there, because they were so short. Google judged these topics to be poor quality, and therefore unlikely to be relevant to the end user.

So as a solution, this might lead some teams to create content that follows more of the “Every Page is Page One” approach to writing. And we did an episode with Mark Baker on the podcast about that. Essentially that is assuming that somebody is landing on the topic, and that this is the first topic they see. It is as if it’s the homepage, they’re first at Page One. And that approach can lead to topics that look more like Wikipedia-type entries, longer entries with more content down the page.

Now there may be a temptation to put the content into a publishable format such as PDF and put the PDF up onto your website.

The downside of that is that it appears that Google is not ranking PDFs particularly highly. It favours web pages over PDFs. So if you do that, then your content is less likely to be found.

Another consideration is keywords and what is called keyword stuffing. Keywords is another term for the search query. These are the words that people type into a search engine, and they affect what results come out. If there’s a match between the words that people put into a search engine and the words that are in your content, then that’s one of the ways that the search engine decides that your content will answer the query that the user has.

Now that has led to what is sometimes called the recipe problem, where if you search for a recipe on the web, you’ll get to a recipe and there is just so much content that you have to wade through until you actually get to the recipe itself.

And within that extra content, the keywords that people have been searching on appear time and time again.

And that is because, to an extent, but not exclusively, the search engines and the algorithms choose the topics or the pages that are going to appear based on the number of times the keyword appears, or the percentage of times the keyword appears within a page.

And so people write content that has these words repeating over and over again.
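To make the “number of times” and “percentage of times” idea concrete, here is a minimal Python sketch of a keyword count. It is only the simple measure described above, not how Google ranks pages, and it is more useful for spotting a page that repeats a word too often than for chasing a target. The function name and sample text are invented for the example.

```python
import re


def keyword_density(text, keyword):
    """Return (occurrences, percentage of all words) for a single-word keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0, 0.0
    occurrences = words.count(keyword.lower())
    return occurrences, 100.0 * occurrences / len(words)


sample = "Export the report. To export a report, open the report and choose Export."
count, percent = keyword_density(sample, "report")
print(count, round(percent, 1))
```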

So what they do is they end up writing content first for the search engine, and secondly for the end user.

And from a usability perspective, and if our objective is to help users do the thing they want to do, find the information they need, that’s not the right way round.

We should be writing for users first, not for search engines first.

And in the past, it was even worse. People would have text hidden white on white. They would put lots of keywords into the metadata, hoping that their pages would rank more highly within Google.

And so there is this arms race, this battle, between the companies that are trying to create search-engine-friendly content and Google, which is trying to provide the best real results for the end users: the pages that don’t game the system.

Now, as I said, we should be writing for the end users, not for the search engines. But technical communicators should still consider keywords.

Now, that can have an impact or affect what people write in a number of ways. So one is product names.

Product names change over time. And whilst you the organisation may have changed the product name, it may be the case that users tend to use the name that’s been around for many years previously.

So if your app was called SuperApp, and that’s what people have called it for years, and you change it to WonderApp, and all of your topics become WonderApp, then you may fall down the search engine rankings. Because although you’ve changed the name, people are still searching for SuperApp.

One of the approaches that technical authors have taken is to have the new name and the old name on the help pages.

Now that can be done a number of ways. One is you can write “previously known as” or “formerly known as”.

What VMWare do is they say “powered by”. And so they have the new word and the old word on the same page.
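One way to check for this is to look for pages that use the new name but never mention the old one. The Python sketch below assumes you can get your pages as plain text; the function name and the sample pages are hypothetical, using the SuperApp and WonderApp names from the example above.

```python
def pages_missing_old_name(pages, new_name, old_name):
    """Return titles of pages that mention the new name but never the old one."""
    missing = []
    for title, body in pages.items():
        text = body.lower()
        if new_name.lower() in text and old_name.lower() not in text:
            missing.append(title)
    return missing


pages = {
    "Getting started": "WonderApp (formerly known as SuperApp) helps you manage reports.",
    "Exporting data": "In WonderApp, choose File > Export.",
}

print(pages_missing_old_name(pages, "WonderApp", "SuperApp"))
```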

Another consideration, when it comes to keywords, is that sometimes we have content for version one and we have content for version two and version three. And so we can have pages which are almost identical on the website. And this is known as keyword cannibalization. You can end up with different pages within your documentation competing against each other.

And if there’s no one dominant page, just this noise of information, then Google can rank the individual pages lower than if there were just one single page.

To address this, you can consider combining content into one single, information-rich page.

That becomes your new landing page, and you have other pages linked from it. Or you can consider the keywords that you’re using: have the primary page with the keywords that you want, and then, for the secondary pages, maybe fewer instances of the keywords that people might search on. That’s a slightly unusual approach that can be done.

Or you can use things like tags to let Google know that some content is redundant. Or use redirects, so-called 301s, to redirect people to relevant pages that way.
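Here is a minimal Python sketch of both ideas. I am assuming that the “tag” in question is the `rel="canonical"` link element, which is one common way of pointing search engines at a preferred page. The URLs, the `REDIRECTS` table and the function names are all hypothetical; in practice the redirects would normally be configured in your web server or publishing platform.

```python
from html import escape

# Hypothetical URLs: older versioned pages that now point at one main page.
REDIRECTS = {
    "/docs/v1/install": "/docs/install",
    "/docs/v2/install": "/docs/install",
}


def resolve(path):
    """Return (status, headers) for a request path, using 301 for moved pages."""
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 means "moved permanently"; Location names the new address.
        return 301, {"Location": target}
    return 200, {}


def canonical_tag(url):
    """Build the link element that names the preferred version of a page."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(resolve("/docs/v1/install"))
print(canonical_tag("https://example.com/docs/install"))
```

The redirect suits pages you want to retire; the canonical tag suits near-duplicate pages that still need to exist, such as the documentation for an older version.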

One thing to do is to test and research. Use some of the tools that Google provides, like Google Search Console and the like, to see how Google values your content.

Look at the search engine results and the queries that are being typed in. And there are other tools that advise you on which keywords to use, and tell you which are the most popular keywords.

But it’s important to do some research and see whether you are inadvertently making your life more difficult by having content that’s competing against each other, or just not ranking very highly.

This can also help you identify any gaps in content. It may be that people are searching for certain content and it doesn’t exist, or that they’re using words and you don’t have those words at all. And the other aspect that we should consider is how our results are presented in the search engine.

That’s the description, or what’s called the SERP results description. And that is the summary that tells you what somebody will get to when they click on the link.

Sometimes, a page can appear highly in the search engine results, but users don’t decide to click on the link and go to that particular page, and that is because they don’t think that the information is relevant to them. It will not give them the answer they’re looking for.

Depending on the authoring tools that you use, you can write a summary and you can use metadata to control what that description is in the search engine results.

If you don’t have that, then what Google will do is it will generate its own description, and it will do that by generally taking the first few words in your page and using that as a summary. So if your summary at the top of the page isn’t particularly relevant, isn’t a good description, then that can affect whether people click through or not. If your description is too long, then Google may truncate that sentence and that can lead to misleading information as well.

So the general rule is to keep your titles relatively short, typically under, say, 50 characters.
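As a sketch of how you might check this before publishing, the Python below tests a title against the 50-character figure and builds a description element from a summary. The 155-character description limit is my own assumption for illustration, not a figure from Google, and the function names are made up.

```python
from html import escape

TITLE_LIMIT = 50         # figure suggested in the episode
DESCRIPTION_LIMIT = 155  # assumed limit for illustration; not from the episode


def shorten(text, limit):
    """Cut text back to the last space so that it fits within limit characters."""
    if len(text) <= limit:
        return text
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:.")


def description_tag(summary):
    """Build a meta description element from a topic summary."""
    content = escape(shorten(summary, DESCRIPTION_LIMIT), quote=True)
    return f'<meta name="description" content="{content}">'


def title_is_short(title):
    """True if the title is under the suggested character limit."""
    return len(title) < TITLE_LIMIT


print(title_is_short("Export a report to PDF"))
print(description_tag("How to export a report to PDF from the Reports page."))
```

Shortening the summary yourself, at a sensible place, avoids leaving it to the search engine to cut the sentence off part-way through.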

We can use usability testing and the information that Google provides to see what works as a good title.

But generally it’s good to describe the information accurately, and to match closely with the goal that the user has. Generally, people are searching for a reason. They have a problem that they want to solve.

They’re looking for information, and so if you have topics that match with or resonate with the intent that they have, then that can lead to more click-throughs as well.

Google is encouraging people to use structured content, such as the schemas from schema.org, and it is creating summary snippet boxes based on metadata and structured data.

And that could be another factor. If you’re within maybe the top 10 pages on Google, if you have structured content, if it’s conformant to schema.org, that might push you up to being a summary box; the top one or two results in the search engine. So having structured content can also be useful.
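To give a flavour of what schema.org structured data can look like, here is a minimal Python sketch that produces a JSON-LD block for a help topic. `TechArticle` is a schema.org type, and `headline`, `description` and `dateModified` are schema.org properties; the function name and the sample values are invented, and adding markup like this does not guarantee a summary box.

```python
import json


def tech_article_json_ld(headline, description, date_modified):
    """Return a script element carrying schema.org TechArticle data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": headline,
        "description": description,
        "dateModified": date_modified,
    }
    # Escape "</" so that text in the data cannot close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = tech_article_json_ld(
    "Export a report",
    "How to export a report to PDF.",
    "2022-01-15",
)
print(snippet)
```

Many authoring tools and publishing platforms can generate this kind of block from the topic’s metadata, so it is worth checking what yours supports before writing anything by hand.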

So those are really the main things as technical communicators that we can do.

And it’s in many ways sticking to the same principles: writing what the audience needs. Also doing research to check that what we’re producing is what users need. And also considering the intermediary between the user and our results, considering what Google is doing with our information and how it ranks it.

Using the consoles and feedback and data that Google provides to guide us in identifying content that might need improving.

It also means that we consider writing longer topics, and consider including some structured content to help Google organise the content that’s available.

Ultimately, the objective from Google and Bing is to give users information that answers their question and is useful to them.

So if we still keep to writing useful, usable content, then in the end, that should end up as being at the top of the search engine rankings. At least that’s the hope.

So that’s it for this episode. If you have your own approaches to how you do search, let us know.

You can contact us at info@cherryleaf.com. Have a look at the links in the show notes.

If you want to know more about what Cherryleaf does, the technical writing services and the training courses we provide, then hop along to cherryleaf.com.

And thank you for listening.



