Darwinian XML Musings


Posted on June 2016


It has been conference season for the last couple of months. Lavacon Dublin, the STC Summit, and the MIT Information, Communication and Technologies (ICT) Conference have me thinking about evolution and natural selection, and about how they apply to the unexciting but important issue of Metadata. I admit that I have not figured it out yet, but I have some thoughts.

Metadata Dimensions

Metadata is information about the content. It serves two main purposes. It is needed when an author searches for content that could be reused. It is also essential for conditional publishing, which lets the information consumer get just-in-time, just-enough content on the device of their choice. Metadata (sometimes called properties or attributes) can be applied at the level of a document, topic, section, image, or something else.
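To make the two purposes concrete, here is a minimal Python sketch. It is an illustration only, not the API of any DITA tool: the Topic class, the attribute names (audience, device), and both functions are hypothetical. It assumes that a topic with no value for an attribute applies to everyone, which is how conditional attributes are commonly treated.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A hypothetical unit of content with metadata attached."""
    title: str
    # Attribute name -> set of values, e.g. {"audience": {"admin"}}
    metadata: dict[str, set[str]] = field(default_factory=dict)


def find_for_reuse(topics, attribute, value):
    """Purpose 1: an author searches for topics explicitly tagged with a value."""
    return [t for t in topics if value in t.metadata.get(attribute, set())]


def publish_for(topics, **profile):
    """Purpose 2: conditional publishing.

    A topic is kept unless it sets one of the profiled attributes
    and the requested value is not among its values.
    """
    def keep(topic):
        for attribute, wanted in profile.items():
            values = topic.metadata.get(attribute)
            if values is not None and wanted not in values:
                return False
        return True

    return [t for t in topics if keep(t)]


# Made-up sample content for illustration.
topics = [
    Topic("Reset the router", {"audience": {"admin"}, "device": {"mobile", "desktop"}}),
    Topic("Warranty terms"),  # no metadata: applies to every audience and device
    Topic("Firmware internals", {"audience": {"engineer"}}),
]

reusable = find_for_reuse(topics, "audience", "admin")
published = publish_for(topics, audience="admin", device="mobile")
print([t.title for t in reusable])
print([t.title for t in published])
```

The same metadata drives both operations, but with different rules: search wants topics that carry the tag, while publishing only excludes topics that are tagged for someone else.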

Metadata Time Travel

Going back in time, during the 1980s, I worked at Computer Corporation of America, where organizations were putting numbers, records, and fields into database management systems (DBMS). There is a time-independent link to content management here because the idea was to gain control and efficiency by letting multiple computer programs access the same numbers, records, and fields. The switch from what had been "flat files" to a reusable database made a huge difference in speed and efficiency. In the process of moving to a DBMS, metadata was used to describe the nature of numbers, records, and fields. In some ways, metadata was as important as the data itself because it played such an important role in access.

Today's Tech Pubs

Today, Tech Pubs uses Metadata for the same purposes. The DITA standard has a well-developed system of standard Metadata along with the option of specialized Metadata. It works for technical publications, and there are many consultants who will, of course, help. At the recent Lavacon Conference, one of the industry gurus discussed the difficult problem of Metadata at the corporate level. She described significant efforts undertaken to define and establish a corporate Metadata taxonomy.

Here is My Issue

I just don't see how we can come close to pre-specifying a comprehensive Metadata taxonomy at the corporate level. I believe that the reuse of content is sometimes difficult because of both nuanced audience requirements and subtle author prejudices. I am inclined to think that after a broad Metadata taxonomy is specified, a useful Metadata taxonomy needs to evolve. And as Darwin asserted, evolution and natural selection happen but are not planned. It may be important and even critical for authors to understand the function, mindset, and prejudices that were behind the content available for reuse. How are we going to capture all possible variations of Metadata required for great content across the enterprise? Answer: I don't think we can do this at all, and never in a static structure.
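One way to picture a taxonomy that evolves instead of being fully pre-specified is sketched below in Python. This is my own illustration under stated assumptions, not a description of any product or of the DITA standard: the function name, the promote_at threshold, and the sample tags are all hypothetical. The assumption is that authors may apply free-form tags, and that usage decides which terms survive.

```python
from collections import Counter


def evolve_taxonomy(controlled, tag_usage, promote_at=3):
    """Return the next generation of a controlled vocabulary.

    controlled: set of currently approved terms
    tag_usage:  list of tag lists, one per piece of content authors tagged
    promote_at: hypothetical threshold; free-form tags used at least this
                often are promoted into the vocabulary
    """
    counts = Counter(tag for tags in tag_usage for tag in tags)
    # Selection pressure in: frequently used free-form tags are promoted.
    promoted = {t for t, n in counts.items() if n >= promote_at and t not in controlled}
    # Selection pressure out: approved terms nobody used are retired.
    retired = {t for t in controlled if counts[t] == 0}
    return (controlled | promoted) - retired


# Made-up sample data for illustration.
controlled = {"admin", "legacy"}
tag_usage = [
    ["admin", "field-tech"],
    ["field-tech"],
    ["field-tech", "admin"],
    ["trainee"],
]

next_generation = evolve_taxonomy(controlled, tag_usage)
print(sorted(next_generation))
```

A real system would need more than counts, since the nuance and author prejudice discussed above do not show up in a tally. The point of the sketch is only that the structure is recomputed from use, not fixed in advance.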

Smartest Guy in the Room at MIT's ICT

Needless to say, there were a lot of smart presenters, participants, and exhibitors hanging around the MIT ICT conference. The 3-D printing companies certainly have some great toys. Neil Gershenfeld, Director of MIT's Center for Bits and Atoms, was in a class by himself with his discussion of the impending emergence of a new computing architecture. After reviewing the work of a number of early computer scientists, he discussed how digital computing had pretty much replaced analog computing. He said that at first this shift had seemed totally logical and comprehensive. But he declared that there are a lot of computing applications where analog computing is superior to digital computing, and that declaring analog computing dead had been largely premature.

In response to a question about what is next, he went on to discuss an emerging computing paradigm that is more biological than digital. He said that computing structures and materials will soon evolve themselves, with internal intelligence, to become useful. Drawing us from computing to biology, he talked about the formation of the earth and how hydrogen and oxygen became water. Primitive cells emerged and evolved to become amoebas. Subsequently they evolved to become animals and plants and then, quite recently, humans. Some cells have specialized as arms or eyes. And so, he said, it would be in the next generation of computing. Computing systems would be smart enough to evolve without human planning or intervention. Most attendees realized they were in the room with a true genius. (More information on Neil Gershenfeld here: http://ng.cba.mit.edu/ )

Implication for Metadata Taxonomy

And so I've been thinking that this is exactly what needs to happen with Metadata at the enterprise level. I believe that content systems need to learn and evolve in the way Darwin described. Based on Simply XML's experience, the importance of structured content and structured markup is much more than an XML issue or a DITA issue. It is a usability and value issue. I will check with Neil (if I run into him at the coffee machine), but I believe that true content evolution will happen at the level of a cell of content, a level above the DNA of DITA or S1000D or XML in general.

Content Evolution

Evolution will improve the life of content with a focus on the following content structures:

Human Need | Information Type
What someone needs to do | Procedure or Task
When to do something | Principle, Policy, Rule
How something works | Process
Knowledge | Concepts, Facts, References, Images
Consistency | Templates, Fragments, Tables, Structures
Specialized or technical understanding | Industry or domain-specific content

Evolution will further improve the use of content when modern processes and a realistic XML architecture allow information consumers to get just enough of the right information at the right time on their device of choice.

Simply XML's Prediction

As they become even more granular and complicated to meet the niche requirements of technical publications supporting massively complex systems, XML architectures risk extinction unless they achieve simplicity. Lightweight DITA, including its attempt to keep DITA and XML in the background, offers hope for the enterprise. Now is a critical time for markup. As history has shown elsewhere, if markup cannot adapt and evolve to be relevant at the enterprise level in the new age of business 4.0, it will go the way of the dinosaurs.

Simply XML believes that products like Content Mapper, related technologies, and modern processes will emerge to force DITA and XML to the background. These modern tools will allow organizations to focus on true improvements to the content supply chain. Lightweight DITA has the potential to start meeting these evolutionary requirements.

And our advice remains: Keep It Simple, Smart-person.