Towards content lakes

One of the trends in both data and content management is the move away from silos. In data management circles, this shows up as the collection and aggregation of customer data into “data lakes”. According to Margaret Rouse, a data lake is:

A storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question…Like big data, the term data lake is sometimes disparaged as being simply a marketing label for a product that supports Hadoop. Increasingly, however, the term is being accepted as a way to describe any large data pool in which the schema and data requirements are not defined until the data is queried.

(source: what is a data lake?)
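
As a rough illustration of that definition, the Python sketch below models a lake as a flat dictionary keyed by unique identifiers, with each raw item carrying a set of metadata tags. The names `DataLake`, `put` and `query`, and the sample records, are hypothetical; they were invented for this example and do not refer to any real product.

```python
import uuid


class DataLake:
    """Toy flat store: no folders, just unique IDs mapped to raw items."""

    def __init__(self):
        self._items = {}  # item ID -> (raw payload, metadata tags)

    def put(self, raw, **tags):
        """Store a raw item in its native form and return its unique ID."""
        item_id = str(uuid.uuid4())
        self._items[item_id] = (raw, tags)
        return item_id

    def query(self, **wanted):
        """Return raw items whose tags match every requested key/value."""
        return [
            raw
            for raw, tags in self._items.values()
            if all(tags.get(k) == v for k, v in wanted.items())
        ]


# Hypothetical sample data in two different native formats.
lake = DataLake()
lake.put('{"customer": 17, "total": 42.5}', source="orders", format="json")
lake.put("17,2016-03-01,login", source="weblog", format="csv")
lake.put('{"customer": 23, "total": 9.0}', source="orders", format="json")

# Only when a question arises is the relevant subset pulled out.
orders = lake.query(source="orders")
```

Nothing in the store imposes a schema on the payloads. In this sketch, any parsing of the JSON or CSV strings would happen after the query, which is the "schema defined at query time" idea in the quote.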

“Content lake” isn’t a term that’s used in the content management or technical communication sectors yet. Whilst it seems unlikely that end user content will grow at the same rate as other forms of data, there’s a fair chance the phrase could catch on.

A content lake is likely to have similar attributes to a data lake:

  • Content will be stored in its native format and converted into other formats as needed.
  • It will use a flat architecture to store content.
  • Content will be stored in some type of structured format, perhaps XML, JSON or plain text (with AsciiDoc-like attributes assigned to certain sections). However, user documentation does not require the rigorous structure of other forms of content.
  • The content lake can be queried for relevant content, and a smaller set of information can then be extracted to help answer questions. This might not mean publishing content on-the-fly, but rather generating PDFs, CHM files and web-based content from a single source.
  • Rather than simply archiving content, the lake will deliver the right information in very short timeframes.
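
The attributes listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `TOPICS` store, its tags, and the `select`, `to_html` and `to_text` helpers are all hypothetical, and HTML and plain text stand in for the PDF, CHM and web outputs mentioned in the list.

```python
import html

# Hypothetical flat store: topic ID -> metadata tags plus plain-text source.
TOPICS = {
    "t1": {"tags": {"product": "widget", "type": "task"},
           "title": "Install the widget",
           "body": "Run the installer & restart."},
    "t2": {"tags": {"product": "widget", "type": "concept"},
           "title": "What is a widget?",
           "body": "A widget is a small tool."},
    "t3": {"tags": {"product": "gadget", "type": "task"},
           "title": "Install the gadget",
           "body": "Unpack the archive."},
}


def select(topics, **wanted):
    """Return topics whose tags match every requested key/value."""
    return [t for t in topics.values()
            if all(t["tags"].get(k) == v for k, v in wanted.items())]


def to_html(topic):
    """Render one topic as an HTML fragment, escaping special characters."""
    return "<h2>{}</h2>\n<p>{}</p>".format(
        html.escape(topic["title"]), html.escape(topic["body"]))


def to_text(topic):
    """Render the same topic as plain text with an underlined title."""
    return "{}\n{}\n{}".format(
        topic["title"], "=" * len(topic["title"]), topic["body"])


# Query the lake once, then publish the same subset in two formats.
widget_topics = select(TOPICS, product="widget")
web_page = "\n".join(to_html(t) for t in widget_topics)
text_file = "\n\n".join(to_text(t) for t in widget_topics)
```

The source topics are never rewritten for a particular output; each format is produced by its own rendering function applied to the queried subset, so adding a new output would mean adding a renderer rather than another copy of the content.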

Please share your comments below.
