Transcript from What’s the deal about structured content? Part 2/2

Below is a transcript of the second half of our podcast episode What’s the deal about structured content?:

So what’s come about is lots of different standards for the ways in which we can do structured writing. And you know the saying about standards and toothbrushes? Everyone wants to use their own. No-one wants to use anybody else’s. And that is often the case with standards and structured writing standards.

So a couple I’ll talk about.

Probably the most successful is one called S1000D – ATA100. And it’s related to something called MILSPEC as well. Has anyone heard of that? This is the standard for military equipment. One of the reasons why it’s successful is because the Department of Defense in the States has said all the documentation for all the military equipment that we buy has to conform to S1000D. And if your documentation doesn’t conform to this we’re not going to buy it from you. So they’re in a position where they have a great deal of power, and they can set rules and everyone has to obey them. It’s mainly come about for military aircraft, but it is also used for other hardware equipment: boats and trains and submarines and all manner of physical equipment like that.

And it’s very, very strict. If you’re writing instructions for an aircraft, there’s the equivalent of that map. Every section has a number. Every manual they ever pick up will be in the same order, so they know where to go if they want to know how to taxi that aircraft from one point to the next.

And every section within that has a defined set of information types: crew, who should be doing this thing. So if we’re talking about changing a wheel of an aircraft: who should be doing that, the description of what to do, the procedure, what faults might come about, how you recognize what those faults are, diagrams, and so on.

And they’ve even taken this to the extent where, if we go back a slide, all those little chunks of content are actually stored as self-contained files in a database. So there’s a great big database of aircraft information, all labelled by a consistent structure and formats that all the aircraft manufacturers have to follow. And this is one of the reasons why we have great levels of safety when it comes to commercial aircraft and military aircraft.

XML is 20 years old this month. XML, when it began, was seen as the future. 10 years ago XML was seen as the future. Today, it’s seen as the future as well.

There is a whole set of different XML standards. I mentioned recipes. There is an XML standard for… there are three XML standards… for recipes. For laying out recipes and combining them automatically, being used in databases and websites. And more common ones that you’ll come across: there was one called DocBook, and you might come across that, and one called DITA.

A few hands.

Darwin Information Typing Architecture was developed by IBM and Nokia as a way to write technical documentation.

Robert Horn had these six Information Types, S1000D had about 10. In DITA, there are three main ones: concept, task, and reference. Task being how to do steps, reference being look-up information, concept being a general catch-all. There have been more. They’ve been added to this over time.

And I mentioned at the beginning, STOP and Information Mapping, because if you go to the website for DITA, and where it comes from, actually what they say is they’ve built their standard upon the principles from those, and a couple of other standards from the past.

So it’s XML. It’s got tags. It’s got semantic tags. So that where you’ve got the step, it is described as a step. The different sets of steps are within a tag called steps itself. You’ve got a tag called results. This is for a task topic.
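To make the idea of semantic tags concrete, here is a minimal sketch of a task topic being read by a program. The element names (task, taskbody, steps, step, cmd, result) are taken from DITA, where the outcome element is spelled result; the wheel-changing wording and the summarize_task helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small DITA-style task topic. The tag names are DITA's; the content is
# made up for this example.
TASK_XML = """\
<task id="change-wheel">
  <title>Changing a wheel</title>
  <taskbody>
    <steps>
      <step><cmd>Chock the remaining wheels.</cmd></step>
      <step><cmd>Jack up the axle.</cmd></step>
      <step><cmd>Remove and replace the wheel.</cmd></step>
    </steps>
    <result>The new wheel is fitted.</result>
  </taskbody>
</task>
"""


def summarize_task(xml_text):
    """Return (title, list of step commands, result) from a task topic."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    # Because each step is tagged as a step, we can pick the steps out
    # without guessing from layout or numbering.
    steps = [cmd.text for cmd in root.iter("cmd")]
    result = root.findtext("taskbody/result")
    return title, steps, result


if __name__ == "__main__":
    title, steps, result = summarize_task(TASK_XML)
    print(title)
    for number, step in enumerate(steps, start=1):
        print(f"{number}. {step}")
    print("Result:", result)
```

The point is only that the tags say what each piece of text is, so software can find the steps and the result without any knowledge of how they will eventually look on the page.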

This is a reference topic. In DITA, you’ve got basically a sort of, like a table: a couple of headings that go across, different property values like oil type, and oil brand, and then different values where you would use those particular oils. Again, lots of tags, different rules on where one can appear, what it has to precede and follow on. And there can also be attributes, like an ID, or ones that say which type of audience it’s for, or which product it relates to.
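As a rough sketch of how an audience label can be used, the snippet below filters a reference topic down to the rows meant for one audience. The properties, property, proptype, propvalue and propdesc elements and the audience attribute exist in DITA; the oil data, the audience names and the filter_for helper are assumptions made up for this example, and real DITA filtering is normally done by the publishing toolchain rather than by hand.

```python
import xml.etree.ElementTree as ET

# A DITA-style reference topic with a properties table. The oil names and
# audience values are invented.
REFERENCE_XML = """\
<reference id="oil-types">
  <title>Oil types</title>
  <refbody>
    <properties>
      <property audience="owner">
        <proptype>Standard oil</proptype>
        <propvalue>Brand A</propvalue>
        <propdesc>Routine top-ups.</propdesc>
      </property>
      <property audience="mechanic">
        <proptype>Flushing oil</proptype>
        <propvalue>Brand B</propvalue>
        <propdesc>Engine overhaul only.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
"""


def filter_for(xml_text, audience):
    """Drop every element labelled for an audience other than the one given."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            label = child.get("audience")
            # Unlabelled elements apply to everyone and are always kept.
            if label is not None and audience not in label.split():
                parent.remove(child)
    return root


if __name__ == "__main__":
    owner_view = filter_for(REFERENCE_XML, "owner")
    for row in owner_view.iter("property"):
        print(row.findtext("proptype"), "-", row.findtext("propdesc"))
```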

And then all these topics, those files, are stored. And you have a thing called a DITA map, which says: I want these different topics in my document. You can have maps within maps. And what happens is that it goes through a sausage machine, or transformation engine, that converts those text files into PDFs, or HTML, or chatbot text, in different ways.
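The map-and-sausage-machine idea can be sketched in a few lines. This is a toy model, not how any real DITA toolchain works: real maps are XML files that point at topic files, and real transformation engines do far more. The topic names, the nested-list representation of a map, and the resolve, to_html and to_text functions are all assumptions for illustration.

```python
from html import escape

# The stored topics, keyed by file name. Each is (title, body).
TOPICS = {
    "intro.dita": ("Introduction", "What this product does."),
    "install.dita": ("Installing", "How to install it."),
    "oil.dita": ("Oil types", "Which oil to use & when."),
}

# A map is an ordered list of topic names. A nested list is a map within a map.
MAINTENANCE_MAP = ["oil.dita"]
USER_GUIDE_MAP = ["intro.dita", "install.dita", MAINTENANCE_MAP]


def resolve(topic_map):
    """Flatten a map, including any nested maps, into topic names in order."""
    for entry in topic_map:
        if isinstance(entry, list):
            yield from resolve(entry)
        else:
            yield entry


def to_html(topic_map):
    """One output format: headings and paragraphs as HTML."""
    parts = []
    for name in resolve(topic_map):
        title, body = TOPICS[name]
        parts.append(f"<h2>{escape(title)}</h2>\n<p>{escape(body)}</p>")
    return "\n".join(parts)


def to_text(topic_map):
    """Another output format from the same topics: plain text."""
    parts = []
    for name in resolve(topic_map):
        title, body = TOPICS[name]
        parts.append(f"{title.upper()}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(to_html(USER_GUIDE_MAP))
    print()
    print(to_text(MAINTENANCE_MAP))
```

The same three topics feed both outputs, and the maintenance map reuses the oil topic without copying it.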

Does anyone use XML today? A couple of people. So IBM and Nokia have been pushing this for about 10 years now.

But meanwhile in Germany there’s this issue called translation that they had to deal with. And they dealt with that years and years before that. And the situation in Germany is that all the different manufacturers of the different tools you can get for writing content, content management systems for technical documentation, came up with their own structured writing standards.

They are XML standards for the way that they create content. And the idea of them all losing their unique selling position and going to a standard format, which meant anybody could move from one software platform to the next, didn’t really appeal. And if everyone did move all of their content from these proprietary systems to DITA, they wouldn’t get much benefit from actually doing that anyway. Because they’re using a structured writing standard today. They’ve got all the savings that they would have from topic reuse. The only potential benefit would be if you were sharing content with somebody from another system.

So what they’re doing in Germany is they’re saying: we’ll let people write to their own structures and standards. Between companies, what we’re worried about is how we can swap data between the two. And they’re coming up with a standard called iiRDS, or Intelligent Information Request and Delivery Standard, and saying: we’ll wrap our content around with metadata and set standard rules on what that metadata will be. And then we’ll be able to share the information from different people. And we’ll be able to combine them in two different ways.

From all of this you might have spotted there are a few problems in doing all of this.

It’s all a bit like communism in some ways. It’s all very good in theory, but in practice, people can let the whole thing down. And first off, it’s nearly always XML. XML is semantic markup. You have these things called tags. You have to have a beginning tag, and you have to have an ending tag. You have rules on the order in which things happen. And when you take this text, when you try and convert it into PDFs or HTML or other formats, if that tagging is incorrect, if there’s a tag missing or it’s in the wrong order, it will not generate the documents that you want.

If anyone’s dealt with XML, you may have come across this in the past.
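A small sketch of what that failure looks like in practice, using Python’s standard XML parser. The sample snippets and the is_well_formed helper are made up for illustration. Note this only checks that tags are opened and closed correctly; the extra rules a standard like DITA sets on which tag may follow which need a separate validation step that is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if the tagging is broken."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Every beginning tag has its ending tag, in the right order.
GOOD = "<steps><step>Open the panel.</step></steps>"

# The ending tag for step is missing.
MISSING_END_TAG = "<steps><step>Open the panel.</steps>"

# Both ending tags are there, but in the wrong order.
WRONG_ORDER = "<steps><step>Open the panel.</steps></step>"


if __name__ == "__main__":
    for label, sample in [
        ("good", GOOD),
        ("missing end tag", MISSING_END_TAG),
        ("wrong order", WRONG_ORDER),
    ]:
        print(label, "->", "ok" if is_well_formed(sample) else "parser refuses it")
```

A publishing pipeline typically stops at the first such error, which is why one missing tag can mean no PDF at all.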

And there’s the writer again. This constrained writing. A lot of people find that really difficult; particularly if they’re not convinced that the model they’re being told to use is appropriate to the type of content that they’re creating.

It’s very difficult if you’re dealing with it from a marketing perspective, where there’s the presentation layer and the movement of things, and this feeling that the presentation and the content are combined. That one is very difficult, and it creates reluctance to come to use it in that particular way.

And what do you do with all your existing content if you’ve been around for a few years? You’ve got this huge amount of content that exists, that’s unstructured.

Do you keep that as it is, and run two parallel systems? Or do you go through the painful exercise of converting and structuring all that information, so it can be put into one single system?

So, as a result, after 10 or so years of DITA being promoted as the next big thing, and XML in general being promoted as the next best thing, within my world of technical communication it’s about 9% of people in the UK and about 9% in the United States that actually use it as a standard.

A lot of what we’re doing is really back to the Information Mapping and STOP days – of doing restructuring of information. And labelling it, but not tagging it up with XML.

Well, we’ll do it, but if we were to give it back to somebody in an NHS Trust, they really would struggle to try and keep it up to date using XML. In that environment.

So there are attempts to try and make it more popular. There are tools that are coming out which hide all that tagging. They’re not necessarily cheap, and none of them are really cheap actually, these tools. And somebody was talking about Adobe earlier. Adobe have a tool that hides some of the tagging, and can make it work with Experience Manager. And Experience Manager is actually XML-based also.

And there are attempts to make DITA less scary. There are moves to make one called Lightweight DITA. So developers can write in Markdown, marketing people can write in Word or Experience Manager or whatever, and there’s a way of combining that information together. There are some vendors that will wrap a piece of text around a DITA topic. So there’s a vendor that’s got a tool called DITA Glass. So that you can manage these unstructured bits in your content that way.

So where do we go? What’s happening in the future?

Well, one alternative is to use forms. And rather than get people to write in tagging, in XML, give them a form to fill in. And they fill in the fields. And they write the content that way. And some of the tools do use this: BBC News, the journalists there do that in some ways. They’re given the form. They write the heading, they write the lede, write the body of the article. They add some metadata.

That’s one way to go. What happens to that information that is stored? Is it just stored to be presented in one way, or is it stored in a database and wrapped up semantically, so it can be reused in different ways? So does anyone use forms for writing and generating content properly?
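Here is a minimal sketch of the forms idea, under the assumption that the form has the three fields mentioned (heading, lede, body) plus some tags as metadata. The field names, the submit_form function and the two output functions are invented for illustration and do not describe any real newsroom system.

```python
import json

# The fields the writer must fill in. No tagging is ever typed by hand.
REQUIRED_FIELDS = ("heading", "lede", "body")


def submit_form(fields, tags=()):
    """Check the form is complete and return a structured record."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name, "").strip()]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    record = {name: fields[name].strip() for name in REQUIRED_FIELDS}
    record["metadata"] = {"tags": sorted(tags)}
    return record


def as_teaser(record):
    """One reuse of the stored record: a short teaser line."""
    return f"{record['heading']}: {record['lede']}"


def as_json(record):
    """Another reuse: the whole record, labelled field by field."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    article = submit_form(
        {
            "heading": "Storm warning",
            "lede": "Gales expected overnight.",
            "body": "Forecasters say winds will strengthen after dark.",
        },
        tags=["weather"],
    )
    print(as_teaser(article))
    print(as_json(article))
```

Because each field is stored under its own label rather than as one formatted page, the same record can be presented in more than one way.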

Another potential one is we’ve seen this rise of APIs. And APIs are used for data today. At the moment, for content like weather reports or football scores. Potentially, we can see documentation, instructional information, be stored in a repository and then delivered. People make queries to an API, and it delivers the different topics they want. To build a document that way. And you could have personalized documents created on the fly. That’s one way that it could go.

There’s a guy called Mark Baker in Canada. He has come up with an alternative where you make the authoring environment much simpler and much looser. You basically have some certain rules for certain information types and what’s in them. But you don’t really have these rules on the order in which things are. You don’t really have documentation maps. What you basically have, is a big bucket of information. And then when somebody asks for a piece of content, it generates a page on the fly. So it’s like a database query and a page is created in that way. It’s called SPFE, which stands for Synthesis, Presentation, Formatting, Encoding. It is based on metadata. Basically everything is driven by metadata.
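The bucket-plus-query idea can be sketched as follows. This is only an illustration of pages being assembled from metadata on request; it is not SPFE’s actual design or code. The Topic class, the metadata keys and the query and build_page functions are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    body: str
    metadata: dict = field(default_factory=dict)


# The "big bucket": no map, no fixed order, just topics with metadata.
BUCKET = [
    Topic("Checking the oil", "Use the dipstick.", {"type": "task", "product": "tractor"}),
    Topic("Oil grades", "Grades are listed by viscosity.", {"type": "reference", "product": "tractor"}),
    Topic("Charging the battery", "Plug in overnight.", {"type": "task", "product": "mower"}),
]


def query(bucket, **wanted):
    """Return the topics whose metadata matches every requested key and value."""
    return [
        topic
        for topic in bucket
        if all(topic.metadata.get(key) == value for key, value in wanted.items())
    ]


def build_page(bucket, **wanted):
    """Generate a page on the fly from whatever the query returns."""
    return "\n\n".join(f"{t.title}\n{t.body}" for t in query(bucket, **wanted))


if __name__ == "__main__":
    print(build_page(BUCKET, product="tractor"))
    print("---")
    print(build_page(BUCKET, type="task", product="mower"))
```

Nothing says in advance which topics belong together; the page exists only as the answer to a query.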

I think, from personal experience, particularly with dealing with people outside of the IT world, if we could just get them to write topics, chunks of text which talked about one thing and nothing else, and if we could get them labelled, even just using Word heading styles, that would go a long way forward in getting structure and consistency.

If those topics were stored individually as files, so this bit was stored as a file, maybe the next section was stored as a file as well, we would be able to, at that stage, move on to the next step. Of then taking those different files and combining and reusing them in different ways.

That’s not how Word works today, but one tool I’ve come across, there may be others, is called iA Writer. Which is available for Mac and I think for iOS, I think for Windows as well. It’s about £9. And every time you write a piece of text, it stores it as a separate file.

Another potential route: there’s something called AsciiDoc. Has anyone come across this? This is like a version of Markdown, but a bit better. This is based on DocBook. You still have this separation of the presentation from the content. So you’re just writing in a text file. But what you can do that you can’t do in Markdown is, you can have semantic markup. So you can say, this bit of text is about a role name, or this bit of text is for a certain audience. You can set different semantic rules in your document. So it can filter and navigate in different ways. You can include topics within topics. And you don’t have this tagging. You don’t have to tag. Heading 1 is just one equal sign. Heading 2 is two equal signs. So it’s easy to mark up without having to worry about XML tags.
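To show how little markup that is, here is a small sketch that reads the heading structure out of an AsciiDoc-style text file by counting equal signs. It handles only the heading convention described above and ignores the rest of AsciiDoc, so it is not a real AsciiDoc parser; the sample text and the outline function are made up for illustration.

```python
import re

# One or more equal signs, at least one space, then the heading text.
HEADING = re.compile(r"^(=+) +(.*\S)\s*$")

SAMPLE = """\
= Changing a wheel

Some introductory text.

== Before you start

Chock the remaining wheels.

== Removing the wheel

Jack up the axle.
"""


def outline(text):
    """Return (level, title) pairs, where level is the number of equal signs."""
    found = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            found.append((len(match.group(1)), match.group(2)))
    return found


if __name__ == "__main__":
    for level, title in outline(SAMPLE):
        print("  " * (level - 1) + title)
```

The writer types plain text, and the structure can still be recovered by a program.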

Just to summarize.

Structured writing is really down to whether we need to take content and process it automatically.

If we want to reuse content in more than one place, we want to deal with different variations, without having two versions of the same piece of information in two different places, we need to look at it.

If we want content to adapt and change intelligently, to provide personalized information. And particularly, if we want to translate our content into more than one language.

In many situations, the only people that do structured writing are those that have no choice. There’s no alternative but to do that, to solve the types of problems that they have: they cannot get their documentation out on time, quickly enough for the product releases, they can’t work collaboratively, it is taking too much money or taking up too much time, the documents are a mess, they’re poor quality… Those are the types of people that are looking at structured writing at the moment.

And now it’s time for questions.

OK. Thank you.

