Towards content lakes

One of the trends in both data and content management is the move away from silos. In data management circles, there is a trend towards the collection and aggregation of customer data into “data lakes”. According to Margaret Rouse, a data lake is:

A storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question…Like big data, the term data lake is sometimes disparaged as being simply a marketing label for a product that supports Hadoop. Increasingly, however, the term is being accepted as a way to describe any large data pool in which the schema and data requirements are not defined until the data is queried.

(Source: “What is a data lake?”)

“Content lake” isn’t a term that’s used in the content management or technical communication sectors yet. Whilst it seems unlikely that end-user content will grow at the same rate as other forms of data, there’s a fair chance the phrase could catch on.

A content lake is likely to have similar attributes to a data lake:

  • Content will be stored in a native format and then converted into other formats as needed.
  • It will use a flat architecture to store content.
  • Content will be stored in some type of structured format, perhaps XML, JSON or plain text (with AsciiDoc-like attributes assigned to certain sections). However, user documentation does not require the rigorous structure of other forms of content.
  • The content lake can be queried for relevant content, and a smaller set of information can then be extracted to help answer questions. This might not mean publishing content on the fly; it could instead mean generating PDFs, CHM files and web-based content from a single source.
  • Rather than content being simply archived, it will deliver the right information in very short timeframes.
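To make these attributes more concrete, here is a toy model in Python. It is purely hypothetical. The `Topic` and `ContentLake` names, the tag scheme and the two output functions are assumptions made for illustration, not a description of any real product. Each topic sits in a flat collection keyed by a unique identifier. Its body is kept in plain text and it carries a set of metadata tags. A query returns the smaller set of matching topics, and two different outputs are generated from that same single source.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str  # hypothetical: unique identifier for each content element
    body: str      # content kept in its native (plain text) format
    tags: set[str] = field(default_factory=set)  # extended metadata tags


class ContentLake:
    """A flat store: no folders or hierarchy, just IDs mapped to topics."""

    def __init__(self) -> None:
        self._topics: dict[str, Topic] = {}

    def add(self, topic: Topic) -> None:
        self._topics[topic.topic_id] = topic

    def query(self, *required_tags: str) -> list[Topic]:
        """Return the smaller set of topics that carry every requested tag."""
        wanted = set(required_tags)
        return [t for t in self._topics.values() if wanted <= t.tags]


def to_html(topics: list[Topic]) -> str:
    """One output format generated from the single source."""
    return "\n".join(
        f'<section id="{t.topic_id}"><p>{html.escape(t.body)}</p></section>'
        for t in topics
    )


def to_text(topics: list[Topic]) -> str:
    """A second output format generated from the same topics."""
    return "\n\n".join(t.body for t in topics)


if __name__ == "__main__":
    lake = ContentLake()
    lake.add(Topic("install-win", "Run the installer and follow the prompts.", {"install", "windows"}))
    lake.add(Topic("install-mac", "Drag the app to Applications.", {"install", "macos"}))
    lake.add(Topic("faq-licence", "Licences renew annually.", {"faq"}))

    # Query for a subset, then publish it in more than one format.
    print(to_text(lake.query("install", "windows")))  # Run the installer and follow the prompts.
```

A real content lake would more likely hold XML, JSON or AsciiDoc files and use a proper search index. The point of the sketch is only that there is no folder hierarchy: relevance comes from identifiers and tags at query time.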


Documentation as Code

Tom Johnson has written two interesting posts on his blog about the “Documentation as Code” concept.

Documentation as Code can be interpreted in a few ways. Tom describes it as being able to store the documentation with the code:

From a technical angle, Etter argues that one should embrace lightweight markup languages, use static site generators, and store content in version control repositories with engineering code.

Another interpretation is that documentation should be treated as a design problem. It is part of the product, much like the software code, rather than an add-on. If the documentation is stored with the code, the requirements for the documentation can be linked more closely to the code. When a requirement for a new feature is raised, a requirement for the related documentation can be raised at the same time. It also means that content embedded in the UI, presented as onboarding screens or presented as online Help can be considered as alternative solutions to each user need.

Documentation as Code is a topic we touch on in our advanced technical writing training course. It’s an approach that we may see growing in popularity.

General Data Protection Regulation – will this affect online Help?

Yesterday, I saw a presentation by Hazel Southwell on the EU’s General Data Protection Regulation (GDPR), which will apply from 25 May 2018. Its data privacy and protection rules seem likely to affect almost every website, and organisations that do not comply face the threat of hefty fines.

Organisations that provide personalised Help content by storing information in cookies, or that monitor the behaviour of users living in the EU by tracking their digital activities, will need to comply with the GDPR. In particular:

  • Businesses will have to adopt governance and accountability standards and meet their data privacy obligations.
  • Users must give clear and affirmative consent to the processing of their personal data, and the relevant information must be laid out in simple terms.
  • Organisations need to consider the risks of transferring data (such as the storing of cookies or IP addresses) to countries outside of the EU.

One solution is to require users to log in to see information, but many users may find this unpopular and impractical.

Training news

Just a quick update on some recent training-related news.

We’ve scheduled some new classroom courses.

We’re also continuing to add more courses to WriteLessons, our bundle of e-learning courses for technical communicators looking to expand their core skills. We’ve added two new courses: “Writing and designing embedded Help” and “Markdown”.

WriteLessons is a subscription service – a bit like Netflix. You pay for it for as long as you need it. You can stop when you want, and the subscription will finish at the end of that month. You have access to all of the courses, which you can take at your own pace.

We’re currently working on a module on post-writing and verification, focusing on editing and proofreading, which will be added to WriteLessons. You might also see a course on Cascading Style Sheets in the coming months.


Cherryleaf launches WriteLessons

WriteLessons, from Cherryleaf, provides you with access to a range of courses in technical communication. You have access to all of the courses contained within WriteLessons, which you can take at your own pace.


WriteLessons is currently in beta, and we’ll be adding extra courses over time. At launch, it contains:

  • DITA fundamentals
  • Single sourcing and content reuse training course
  • Introduction to content strategy
  • Documenting REST APIs
  • Managing technical documentation projects

You have access to all of the courses in the collection under a Netflix-style subscription plan.

See: WriteLessons

Awards

The Spring 2015 edition of Communicator magazine and its special supplement on the Value of Technical Communication was entered in both the IoIC (Institute of Internal Communications) Awards in 2015 and the APEX Awards in 2016. One of Ellis’ articles (“Creating videos: tips and tricks”) was part of that issue.

We’ve just learnt that this issue has won an APEX Grand Award, the first time Communicator has won a Grand Award. It also won an IoIC Award of Excellence in 2015.


From http://apexawards.com/apex2016grandawardcomments.htm:

“This clean, appealing layout offers attractive spreads, a crisp, legible type schedule, with effective use of callouts, sidebars and captions. Content is equally exceptional, with fully vetted, well written articles on a wide range of professional topics. And the supplement on the value of technical communication is an effective ‘selling tool’ for managements and other key audiences. This magazine is precisely the kind of first rate publication you’d expect from a professional association of scientific and technical communicators.”

Cherryleaf’s technical writing online training course has been re-accredited by the ISTC

Cherryleaf’s technical author basic/induction training course has been accredited by the Institute of Scientific and Technical Communicators since its launch. This accreditation has to be renewed every few years, which involves having the course re-assessed by the ISTC’s accreditors. Earlier this year, we submitted the course for renewed accreditation, and we’ve recently received an email informing us that the ISTC has approved the course again.

Americanisms and Britishisms

On some user documentation projects, we are asked to write in American English instead of British English, and generally this is a straightforward exercise for us. However, when I speak at conferences in the USA, delegates sometimes ask me afterwards what I meant by a particular expression. For example, I was recently asked what I meant by “round the houses” and “cheesed off”.

There are many subtle differences between the two varieties of English, and these have inspired a number of very interesting blogs. In particular, Dr. Lynne Murphy’s Separated by a common language and Professor Ben Yagoda’s Not One-Off Britishisms blogs provide a fascinating insight into how words and expressions gain popularity. The Language Log is another blog worth reading.

If a more conversational approach to technical writing grows in popularity, these differences may become a bigger factor when localis(z)ing content into American or British English.