High-quality documentation reduces support costs and drives product adoption, but traditional feedback methods often fail to capture the full user experience.
This episode explores how modern documentation teams are combining AI simulation with human research to understand how different users interact with their docs.
Key topics
- The problem with traditional feedback
- Understanding your user archetypes
- Traditional testing methods that still work
- AI user simulation: The new frontier
- AI limitations to remember
- The winning strategy: Hybrid approach
Resources mentioned
Reader-simulator by Casey Smith (open-source)
Impersonaid by Fabrizio Ferri
Cherryleaf course: Managing and Mastering Documentation Projects with AI
Transcript
Hello, and welcome to the Cherryleaf Podcast. I’m Ellis Pratt. We were at the TCUK25 conference earlier this week, and I spoke to a number of people who said that they listened to the podcast and that they enjoyed it. So that was great to know. It was also interesting that some were a few episodes behind and had a backlog or a list of episodes from different podcasts that they were going through slowly.
We’re going to be looking at new ways of discovering what users think of your documentation.
We know documentation reduces support costs and can drive product adoption, but only if it works, only if it serves its purpose.
And sadly, it’s often the case that technical communicators publish effectively into a void, and they only learn where the documentation is weak when support tickets roll in about questions they thought had already been answered.
Now one of the reasons for this is that many technical authors just don’t have access to real users. At the TCUK conference, one of the presenters was Beth Morgan from IBM, and she was saying that when they’re developing their documentation, they have access to subject matter experts, but they’re not actually allowed to talk to real users. That was a theme from other attendees at the conference as well.
So we’ve had this situation for years where teams have tried to gauge their success by having feedback loops and buttons on web pages that sometimes work, but often don’t. You might see the widget, the floating box at the bottom of a page saying, “Was this page helpful?” Thumbs up or thumbs down.
The problem with that type of data is that it often only captures emotional outliers: the few percent of people who are totally delighted or the few percent who are absolutely enraged. The rest just don’t bother clicking on that button. They’re unrepresented.
The interesting thing is that, thanks to AI, we’re moving to a world where we can do better than that.
So let’s start off by acknowledging that we’re writing for more than one type of person. There’s rarely a single homogeneous user.
And one of the challenges is that if you treat everyone the same, particularly with developer documentation, you can end up frustrating a large proportion of your audience.
There are users that come to documentation with profoundly different goals, different backgrounds, different skill sets, and different patience levels.
We can identify some common archetypes. Typically, there are five common archetypes that every technical writing team has to serve.
One is the novice and the novice is starting from scratch. They might be new to your product or maybe even new to the whole technology behind it. For them, the primary need is understanding. It’s conceptual clarity. It’s reassurance.
If they are reading content that’s full of jargon and acronyms, it’s going to be really difficult for them. They could well feel overwhelmed and assume that the product is just too complicated for them.
So that first impression for new users, particularly in a Software as a Service environment where they can stop paying almost immediately, becomes high stakes.
At the opposite end of the spectrum, you have the experts. It might be an expert that knows your product inside out. They’re typically not interested in the big picture. They are very efficient. They’re looking for one specific, sometimes really obscure piece of information, for example, a configuration variable.
And the moment they have to scroll through three paragraphs of “welcome to our product” just to find a single bullet point, you’ve lost them. You’ve lost their respect. So for the expert, it’s all about speed and scannability. They want dense information and clear headings. They don’t want the fluff.
The novice is reading linearly for understanding. The expert is jumping around non-linearly for reference.
And that brings us to our third type of user, the task-focused user, the person with a job to do.
They’re on a tight deadline in most cases. They don’t care about the underlying architecture. All they care about is: how do I connect A to B, or how do I do this thing? They don’t want the theory. They want the recipe. And if you’re documenting code, typically, they want runnable code samples they can just copy and paste directly. If that code is broken or needs five undocumented setup steps, the task fails and so does your documentation.
Another type of user is the person who is already in crisis, the troubleshooter. It’s two in the morning. They’re staring at an error log, just praying for a miracle copy and paste solution. And their frustration level, their tolerance is basically zero. Something’s broken. It’s probably in a production environment, so they’re searching for error messages, searching the FAQs.
And if the documentation doesn’t immediately validate their panic, if they don’t see the error code within a few lines, then you’re in trouble. They’re opening a support ticket. They’re in triage mode. So they need to be given the answer now, given the answer quickly.
And the fifth type of user that we might typically need to document for is the non-native English speaker. So what are their unique needs?
They want crystal-clear, simple language. Consistent terminology is essential, because even tiny variations in wording can destroy their comprehension. You need to get rid of cultural idioms. Visual aids, simple sentences, and structured paragraphs become indispensable.
So that gives us the players, the audience.
How do we analyse whether we’re meeting their needs or not? What methods can we use?
There are the tried and tested, human-driven methods for doing user testing and usability testing.
And in many ways, they’re still the gold standard, the best way for getting rich data. The problem is that they are resource intensive.
They do tell you the truth. But as I mentioned before with regard to IBM, you may not be able to do them because you may not have access to users. That may limit your choice in what you can do.
We can break them down into three areas.
We can start with data and analytics mining, that is, acquiring data that shows what users actually do.
So this can be page views. It can be more than that. We need to identify the key metrics that we want to track.
You can use search as a diagnostic tool. You can track search queries and especially zero-result searches. If lots of users are searching for the word “bill” and your documentation uses the word “invoice”, then you can see there’s a disconnect.
We can also look for repeated searches. Are people failing and then trying again?
If a user searches for something, clicks a link, and then immediately bounces back and searches again, what does that tell you? It probably means there’s a failure of clarity, that the title of the page was promising, but the answer wasn’t there or was buried or was unclear.
If it was hidden behind jargon or just wasn’t clear enough to find in a quick scan, that could be the reason why they’re bouncing.
They’re repeating a search because the documentation lied to them.
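As a rough illustration of what that mining can look like, here is a minimal Python sketch that flags zero-result queries and repeated searches from a search log export; the file name, column names, and log format are assumptions for the example, not a prescribed setup.

```python
# Minimal sketch: mining a search log for zero-result queries and repeated
# searches. The CSV columns (session_id, query, result_count) are assumed.
import csv
from collections import Counter, defaultdict

zero_results = Counter()
searches_by_session = defaultdict(list)

with open("search_log.csv") as f:  # hypothetical analytics export
    for row in csv.DictReader(f):
        query = row["query"].strip().lower()
        if int(row["result_count"]) == 0:
            zero_results[query] += 1
        searches_by_session[row["session_id"]].append(query)

print("Top zero-result queries:", zero_results.most_common(10))

# Sessions with more than one search hint that the first result
# didn't answer the question.
repeat_sessions = [s for s, queries in searches_by_session.items() if len(queries) > 1]
print(f"{len(repeat_sessions)} sessions contained repeated searches")
```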
So if a page has a low time on page at a high bounce rate, what’s the story? It’s the classic pogo sticking scenario. The user lands, realizes that the title is totally misleading, and they immediately bounce back to Google. Low time, high bounce equals bad signposting. But what about low time, low bounce?
That could actually be a success. It might mean that the user found exactly what they needed instantly and left because their task was done. So context is everything.
And that’s the type of nuance simple analytics, simple widgets can completely miss.
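As a simple sketch of that interpretation step, here is how the two signals might be read together; the thresholds are purely illustrative, not recommendations.

```python
# Illustrative sketch: reading time-on-page and bounce rate together.
# The thresholds below are arbitrary examples.
def diagnose_page(avg_time_seconds: float, bounce_rate: float) -> str:
    low_time = avg_time_seconds < 20
    high_bounce = bounce_rate > 0.7
    if low_time and high_bounce:
        return "Likely bad signposting: the title promises something the page doesn't deliver"
    if low_time and not high_bounce:
        return "Possibly a success: users found the answer quickly and moved on"
    return "Needs a closer look: check search terms and scroll depth for this page"

print(diagnose_page(avg_time_seconds=12, bounce_rate=0.85))
```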
There is also what you might call the treasure trove: support tickets and support ticket analysis. Every single support ticket about a question that should have been answered in the documentation is a recorded, documented failure.
Some teams use a technique called docs gap tagging.
But isn’t that just adding more manual work for an already busy support team?
It takes discipline, yes, but the value can be great. Support agents tag every ticket where the problem could have been solved by better documentation.
And this way, you’re not just treating the symptom, which is the ticket. You’re finding the root cause in the documentation itself.
It turns anecdotes into hard data, and that moves us from passive data to actually engaging with users via the support team.
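As an illustration of turning those tags into hard data, here is a small sketch that counts docs-gap tickets per topic; the help-desk export format and the tag name are assumptions for the example.

```python
# Illustrative sketch: counting support tickets tagged as documentation gaps,
# grouped by topic. The CSV columns and the "docs-gap" tag are assumed.
import csv
from collections import Counter

gaps_by_topic = Counter()
with open("tickets_export.csv") as f:  # hypothetical help-desk export
    for row in csv.DictReader(f):
        if "docs-gap" in row["tags"].split(";"):
            gaps_by_topic[row["topic"]] += 1

for topic, count in gaps_by_topic.most_common():
    print(f"{topic}: {count} tickets that better documentation might have prevented")
```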
Some technical communicators, not all, can do direct user interviews and can do usability testing.
Nothing replaces sitting down with a user. Surveys give you quick hits, as it were, but direct interviews reveal not just what confuses users, but why. You get rich context: the emotional impact of the frustration.
Usability testing is the gold standard of observation: watching the user succeed or fail in real time.
There are four key methods for doing that. One is paraphrase testing. You ask a user to read a short section, say, on setting up two-factor authentication, and then you just ask them to repeat the steps back to you in their own words. If they can’t, that suggests comprehension failed: the content is too dense or too complex.
A second method is what’s called plus-minus testing. Users read the text and put a plus sign next to the things they like (clear explanations, useful examples) and a minus sign next to the things they don’t (confusing terms, bad layout). This method captures little micro-reactions.
The third way is the ultimate proof, task based testing.
You ask them to do a realistic job or task.
For example, set up this API, make your first authenticated call, and you just watch.
Can they find the right page? Did they complete the task?
Did they want to throw their laptop across the room?
That observable action is just invaluable.
If you don’t have access to real users, you can try finding a colleague from a totally different department and asking them to complete these tasks.
And the fourth test, also something that you can do with colleagues, is the five-second test. You show them a page for exactly five seconds and then you hide it, and then you ask them two questions: what was the page about, and what did you want to click first? If they can’t answer those, then your visual hierarchy or information design probably has some failings. It’s probably broken.
So that’s a quick summary of all the common traditional methods. They’re good because they can capture human nuance, but they’re slow. They’re expensive. They may not be available to you. They don’t scale.
You may not be able to interview many people in a week. And for those reasons, people have been looking at other ways to do this using AI, specifically AI-based user simulation.
So instead of waiting weeks to recruit five different personas, you pressure-test your documentation against AI-simulated users before you even publish. And you use systems like Claude or ChatGPT or Gemini to do this.
Large language models can be prompted to role play specific user personas.
By simulating how different users navigate and comprehend your content, you’re able to catch major issues before real users even see the documentation.
The process is: firstly, create detailed personas. That is, define prompts based on representative personas, using user data, typical goals, and simulated technical backgrounds to flesh out and describe those different personas.
Number two, you run the simulation. That is, you feed in draft documentation, or provide a link to a developer portal, preceded by the persona prompt.
And thirdly, you get back the feedback.
You receive a critique from the AI system, typically in the voice of your simulated user. Some systems are more observational: they run a series of tests and report whether each passed or failed.
To get valuable feedback, you need to prime the AI with a specific identity. So it’s not asking, “Is this well written?”, but asking from the perspective of specific personas with different technical backgrounds, goals, and constraints.
So it relies on detailed role playing using specific persona prompts. You’re basically building a psychological profile for the AI system and then asking it to review your content. So you tell the AI system that the user is a novice. They know a little bit about programming and they’re trying to get the “Hello World” to work. They’re anxious about jargon. And then you ask the system to critique the documentation for that type of user.
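As a minimal sketch of what that priming can look like, here is one way to do it in Python with the OpenAI client; the persona wording, model name, and file name are placeholder assumptions, and the same pattern works with Claude or Gemini.

```python
# Minimal sketch of persona-based documentation review. The persona text,
# model name, and docs file are placeholders, not a prescribed setup.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

NOVICE_PERSONA = """You are Sam, a novice user. You know a little programming,
you are trying to get the 'Hello World' example working, and you get anxious
when you meet unexplained jargon or acronyms. Read the documentation below as
this person. In Sam's voice, list: (1) every point where you got stuck,
(2) every term that was not explained, and (3) whether you could complete the
task, and if not, what stopped you."""

with open("getting-started.md") as f:  # hypothetical draft page
    docs_excerpt = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # any capable model will do
    messages=[
        {"role": "system", "content": NOVICE_PERSONA},
        {"role": "user", "content": docs_excerpt},
    ],
)
print(response.choices[0].message.content)
```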
What you get back in terms of feedback can be surprisingly specific to that persona’s pain points.
For instance, the novice AI persona might flag something like, “The guide told me to instantiate the client, but it never actually explained where to find the API key.” It might also flag that the guide never explains what is meant by “instantiate”.
So conceptual gaps are something that AI systems can spot.
So you can get feedback in minutes, compared to, say, weeks if you’re using real people.
And you can simulate those high-pressure, specific scenarios, such as a developer trying to fix an error code at three in the morning.
One example of this is a tool called Reader Simulator. Casey Smith, or CT Smith, of Payabli has developed an AI-powered documentation testing tool.
The tool simulates different user personas navigating through documents to identify navigation issues and to measure success rates.
It’s available as open-source code. And what Reader Simulator does is recognise that different users don’t just prefer different content, they consume it in fundamentally different ways.
CT created four different personas. This is for developer documentation, I should say. The first is the confused beginner, who rapidly cycles through documents trying to find their bearings and understand basic concepts.
The second is the efficient developer, who jumps directly into API references and uses Ctrl+F to find specific information quickly. The third is the methodical learner, who reads documentation from start to finish, building understanding sequentially. And the fourth is what she called the desperate debugger, who searches frantically for an error message and immediate solutions to blocking problems.
The tool simulates how the different personas navigate and read. Beginners receive progressive disclosure: previews first, with full content revealed only when needed.
Experts get keyword extraction that simulates real-world Ctrl+F behaviour.
And methodical learners always receive complete content to support their linear approach.
The tool also has capabilities like prioritising links.
It weights navigation choices based on persona preferences.
Beginners gravitate towards tutorials. Experts tend to favour API reference content.
After each session, the tool evaluates whether the content format matched the persona’s preferences and whether users successfully completed their tasks. When users fail to complete tasks, the tool provides specific, actionable recommendations for improving the document structure and the content. And the configuration of the tool allows it to work across different documentation platforms.
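To make the idea concrete, here is a rough sketch of how those personas and their navigation preferences might be represented as data; this is an illustration only, not the actual Reader Simulator code, and all the field names are assumptions.

```python
# Illustrative sketch only: one way to model the four personas and how they
# consume content. This is not the actual Reader Simulator implementation.
from dataclasses import dataclass, field

@dataclass
class Persona:
    name: str
    reading_style: str                  # "progressive", "keyword_scan", "linear", "error_search"
    preferred_links: list[str] = field(default_factory=list)
    patience: int = 5                   # rough number of pages before giving up

PERSONAS = [
    Persona("confused beginner", "progressive", ["tutorial", "getting-started"], patience=8),
    Persona("efficient developer", "keyword_scan", ["api-reference"], patience=3),
    Persona("methodical learner", "linear", ["concepts", "tutorial"], patience=20),
    Persona("desperate debugger", "error_search", ["troubleshooting", "faq"], patience=2),
]

def weight_link(persona: Persona, link_text: str) -> float:
    """Weight a navigation choice more heavily when it matches the persona's preferences."""
    text = link_text.lower()
    return 2.0 if any(pref in text for pref in persona.preferred_links) else 1.0
```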
CT built the entire tool through what’s called vibe coding, that is, writing a prompt and asking an AI to create the code. She did it all through Claude.
To explore whether this approach could be replicated on other platforms, we, that is Cherryleaf, tried it out using some other AI tools. The first thing we tested was whether we could create a reader simulator using ChatGPT’s agent mode.
We used the same personas that CT had created and tested against the same documentation, to compare the results from ChatGPT’s agent mode with what she was getting back from Claude. And it worked. ChatGPT generated comparable insights to the original Claude implementation.
And we also investigated whether a reader simulator could be built using a no-code app platform. Now these platforms allow users to create applications through text prompts rather than writing code, and they provide a more polished, application-like interface compared to something where you’re just entering text into a prompt window.
You have fields. You have screens where you can enter the site that you want tested. You can pick from a list which type of persona you want to use in that particular test.
The good news is we were able to replicate the functionality again.
What’s nice about the no-code app versions is the visual impact. They’re much shinier, much more visually appealing, and much easier for other people to use, while still having that core functionality.
And what’s possible with that is you can extend it. You could incorporate a back-end database for keeping an historical record of all the different results. You could easily add extra personas or edit and amend the personas.
Somebody else who’s been looking at this is Fabrizio Ferri-Benedetti, who has developed a similar tool, which he called Impersonaid. He developed this about six to nine months ago. It lets you create custom personas following a schema. He said it provides a more emotional, pseudo-user response.
We then looked at whether we could recreate this app using a no-code system, and we were able to do that. Pointing it at the same sites as the reader simulator, the responses you get back are a more narrative type of feedback and recommendations from the system.
I should say that we have recently introduced a new course called Managing and Mastering Documentation Projects with AI. We teach people how to create no-code apps that can be used within technical writing for managing documentation projects, and these apps are covered in that course. You’ll find details on that course on the Cherryleaf website.
These tools mean that you can quickly find and identify problems with your documentation and get back specific solutions.
And that’s pretty compelling as a validation step in your documentation development process, prior to releasing your documentation. You can test it against a range of personas and implement fixes in very, very little time.
Now there are some limitations to AI.
The AI has no ego. It just points at the flaws.
It can simulate frustration based on patterns it’s learned, but fundamentally, it lacks real emotion. It can’t feel genuine rage. It can’t feel the stress of a deadline or the distraction of a noisy office.
And the AI only knows what’s in its training data or what you put into the prompt.
This means if your domain topic is highly technical, it might miss subtle inaccuracies that a human expert would spot in a second. So you can’t rely on it to catch those really niche errors.
The Nielsen Norman Group has looked into these types of simulated users. This was back in 2024, and it looked at whether they could simulate users for product or service design rather than documentation.
What they found, or how they described the responses, was that they were one-dimensional and too shallow to be useful for final decisions. The reason for this is they tended to be overly positive. The AI would say what it liked, but it wouldn’t indicate whether it would give up; it was predicting idealised behaviour rather than messy reality.
With the examples we’ve been talking about, in many ways we’ve been setting up the system to test whether there’s a success or a failure, so that is less likely to happen. But it’s still important to bear that risk in mind.
Where this ends up as an effective strategy is a hybrid model: we use AI simulation for rapid iteration, for testing hypotheses, and for catching the low-hanging fruit, as it were, the structural issues and the clarity problems, at the stage where you’re drafting the content, essentially catching the eighty percent of obvious stuff before a real person ever sees it.
And in addition to that, if you can, have it validated with real people to get that deep, nuanced understanding of why a user is frustrated.
And there should also be a culture of continuous improvement.
The best documentation teams know their work isn’t finished at publication. It’s a continuous improvement loop, so our documentation process could be:
We draft content.
We test it using an AI simulator.
We revise the content.
We test it with humans.
We iterate and improve.
And then we publish.
So the AI provides hypotheses about what users might experience and real users provide the truth.
So to wrap up: using AI, ideally AI and real users together, we can reduce the amount of guesswork in our understanding of what users think and how users behave with our documentation.
To make better documentation that enables users to get the job done, to make them happier and less frustrated.
So if you are interested in this, we’ll provide links to CT Smith’s tool, and we’ll also provide links to our training course on Managing and Mastering Documentation Projects with AI, where we can teach you how you can build this tool yourself along with other tools. And if you’d like more information about Cherryleaf in general or you’ve got questions, the website’s cherryleaf dot com. The email address is info at cherryleaf dot com. And as we usually sign off with every episode, thank you for listening.
