One of the main benefits of single sourcing is the ability to reuse existing content. Different departments can avoid duplicating work, which saves them time and money.
Unfortunately, it can be difficult to quantify these savings before you move to an authoring or content management system that enables you to single source. Analysing all the existing documents in a business can be overwhelming, so organisations often quantify the savings only after the single sourcing content management system has been implemented.
There are a few software applications that can help you analyse your existing content and determine how much duplication exists. You get a sense of how much time and effort was wasted in the past, which is a pretty good indication of how much waste you’d avoid in the future.
MadCap Analyzer can analyse your content and produce a list of:
- Segments of text in your project that are identical.
- Segments of text in your project that are similar, but not identical, to each other.
Analyzer will search segments of between 1 and 1,000 words, and will allow up to 10,000 character differences between segments. MadCap Analyzer is designed as an analysis tool for MadCap Flare projects, so you first need to import your existing content into MadCap Flare. Be aware that analysing content for similarities is a huge computational task, so it can take quite a while to calculate the results.
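To get a feel for what this kind of analysis involves, here is a minimal Python sketch that sorts segment pairs into identical and similar matches. It is not how Analyzer works internally. The function names, the sentence-based splitting and the 0.8 similarity threshold are all assumptions made for illustration, and the similarity score comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher
from itertools import combinations
import re


def split_segments(text):
    """Split text into sentence-like segments with normalised whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def find_duplicates(segments, threshold=0.8):
    """Return (identical, similar) lists of segment pairs.

    The threshold is an assumed cut-off, not a value taken from any tool.
    Every pair is compared, so the work grows quadratically with the
    number of segments.
    """
    identical, similar = [], []
    for a, b in combinations(segments, 2):
        if a == b:
            identical.append((a, b))
        elif SequenceMatcher(None, a, b).ratio() >= threshold:
            similar.append((a, b))
    return identical, similar


# Made-up sample content for illustration only.
docs = (
    "Press the power button to start the device. "
    "Hold the reset button for five seconds. "
    "Press the power button to start the device. "
    "Press the power button to restart the device."
)
segments = split_segments(docs)
identical, similar = find_duplicates(segments)

print(len(identical), "identical pair(s)")
print(len(similar), "similar pair(s)")
```

Because every segment is compared with every other segment, doubling the amount of content roughly quadruples the number of comparisons, which helps explain why a full analysis can take so long.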
Google’s approach to matching identical strings of text (“N-grams”) is to use large clusters of computers working in parallel. The good news is that Google’s MapReduce application is available for developers to use, and you only need to store your content in a single .TXT file. The bad news is that it will only identify exact matches, not similar matches.
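The idea behind exact N-gram matching can be sketched on a single machine in Python. This is not Google's implementation, only an illustration of the map and reduce steps under some assumptions: the function names are hypothetical, an N-gram is taken to be a run of five words, and matching is case-insensitive.

```python
from collections import Counter


def map_ngrams(line, n):
    """Map step: emit every n-word sequence found in one line of text."""
    words = line.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def reduce_counts(mapped):
    """Reduce step: total the occurrences of each n-gram."""
    counts = Counter()
    for ngrams in mapped:
        counts.update(ngrams)
    return counts


def repeated_ngrams(lines, n=5):
    """Return only the n-grams that occur more than once."""
    counts = reduce_counts(map_ngrams(line, n) for line in lines)
    return {gram: c for gram, c in counts.items() if c > 1}


# Made-up sample lines; in practice these would be the lines of your text file.
sample = [
    "Press the power button to start the device",
    "Hold the reset button for five seconds",
    "Then press the power button to start the device again",
]
repeats = repeated_ngrams(sample, n=5)

for gram, count in sorted(repeats.items()):
    print(count, gram)
```

To run it against real content, you could pass an open file in place of `sample`, since iterating over a file yields one line at a time. As with the approach described above, a single changed word breaks the match, so this only finds exact repeats.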
Translation Management software
A more circuitous approach is to use Translation Management software such as MadCap Lingo to spot exact and fuzzy matches in documents. These tools are designed to identify content that has been translated previously, so you'd be using the software for something other than its intended function. In this situation, your content will need to be converted to the SDLXLIFF, TTX or TXML file format.
If you’ve tackled the issue a different way, do let us know.