modelling – Software Superglue

The purpose of a reference architecture is to identify the architectural principles that apply when creating a sustainable and scalable solution. As new additions are made to the solution, the reference architecture becomes the yardstick against which all solution proposals can be measured, enabling a fair comparison of ideas.

A key test of the principles used to create an architectural model is whether they display conceptual integrity. Proposed solutions and additions to the solution must respect that conceptual integrity, show where they deviate from those principles and explain why the deviation was necessary.

Deviating from these architectural principles is often justified, sometimes well justified, by the time and cost of implementing the solution. However, without an explicit reference architecture, one cannot measure the effect of these compromises, which generally result in higher maintenance costs. The larger the deviations from the reference architecture, the greater the risk of higher maintenance costs and the harder it becomes to continue developing and scaling up the solution in future.

Over time, the solution tends to diverge further and further from the model, until its conceptual integrity can no longer be discerned. How, then, do we preserve conceptual integrity? Through training and information dissemination, creating what the Agile world calls shared understanding. The development team must share the same picture of the reference architecture if it is to be maintained going forward; besides, a solution that adheres to architectural principles is much easier to explain than one that does not.

An essential part of a reference architecture is the creation of architectural artefacts, such as information models, state machines and process diagrams. Using standard modelling notations such as UML and BPMN reduces the risk of ambiguity and makes knowledge sharing that much easier.

Where to start? Try to identify what type of solution you are building. Does it fit a known pattern? Find the appropriate technical literature (books and online sources) that provides a frame of reference, including a vocabulary, and use it to create a reference architecture that can then be applied to the solution.

Growing

ACME are doing all right, but they want to grow the business faster, so they try some marketing. As before, they build a simple application to support this business function and, by examining the real world, design a data model for it.

The model contains just two entities: the Customer again, this time with different attributes, and a new entity called Contact Method.
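
As a minimal sketch, assuming Python 3.9+ and dataclasses, the marketing data model could be expressed as below. The original diagram is not reproduced here, so every attribute (`customer_id`, `name`, `kind`, `value`) and the `contact_by` helper are hypothetical; only the two entities, Customer and Contact Method, come from the text.

```python
from dataclasses import dataclass, field

# Hypothetical attributes: the article's diagram is not available,
# so these fields are assumptions chosen only for illustration.
@dataclass
class ContactMethod:
    kind: str   # e.g. "email" or "phone" (assumed values)
    value: str  # the address or number itself

@dataclass
class Customer:
    customer_id: int
    name: str
    contact_methods: list[ContactMethod] = field(default_factory=list)

    def contact_by(self, kind: str) -> list[str]:
        """Return every contact value of the given kind."""
        return [m.value for m in self.contact_methods if m.kind == kind]

alice = Customer(1, "Alice", [ContactMethod("email", "alice@example.com"),
                              ContactMethod("phone", "+1 555 0100")])
print(alice.contact_by("email"))  # ['alice@example.com']
```

Modelling contact methods as a separate entity lets a customer have any number of them without adding columns to Customer.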

Boom times

The marketing strategy is a success, and ACME soon have to expand their operations and develop their existing systems to handle the increased volume of customers and pencil orders.

But now it is becoming a hassle to create each customer in two systems. Wouldn’t it be great if every customer created in the ordering system were also added to the marketing system automatically?

This shouldn’t be a problem as long as the two systems have compatible data models, in other words, as long as a customer entity in the ordering system can be mapped to a customer entity in the marketing system. But if that is not possible, which system do we change? The ordering system is business-critical, so we may not want to tamper with it too much. ACME, however, are thinking long-term and realise that they need a more robust representation of reality, one that the company can grow into.
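
A direct, point-to-point mapping between the two systems might look like the sketch below. Both entity shapes (`OrderingCustomer`, `MarketingCustomer`) and the `to_marketing` function are hypothetical, since the article's diagrams are not reproduced; the code only illustrates that such a mapping is straightforward while the two models remain compatible.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OrderingCustomer:   # hypothetical ordering-system entity
    customer_id: int
    name: str
    email: str
    phone: Optional[str] = None

@dataclass
class ContactMethod:      # hypothetical marketing-system entity
    kind: str
    value: str

@dataclass
class MarketingCustomer:  # hypothetical marketing-system entity
    customer_id: int
    name: str
    contact_methods: list[ContactMethod] = field(default_factory=list)

def to_marketing(c: OrderingCustomer) -> MarketingCustomer:
    """Point-to-point mapping: works only while both models stay compatible."""
    methods = [ContactMethod("email", c.email)]
    if c.phone:
        methods.append(ContactMethod("phone", c.phone))
    return MarketingCustomer(c.customer_id, c.name, methods)

bob = to_marketing(OrderingCustomer(7, "Bob", "bob@example.com"))
print([(m.kind, m.value) for m in bob.contact_methods])  # [('email', 'bob@example.com')]
```

The catch is that every pair of systems exchanging data needs its own function like this, and each one may break when either data model changes.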

At this point they go back to their view of reality and create a model that is independent of any system, a reference model if you will. Such a model is called an information model, or, as Wikipedia explains:

An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity relationship models or XML schemas.

The information model now serves two purposes. Firstly, it aids future software design in creating robust data models, for example by supporting different customer address types. Secondly, it enforces a common terminology across the system landscape and in the documentation; for example, a mobile phone number is to be called “Mobile number” when writing user stories and test cases, defining class names and methods, creating database tables, and so on.
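
One way to put the common terminology to work is a small glossary derived from the information model, used to flag non-canonical terms in user stories and to derive consistent identifiers. This is a minimal sketch: the `GLOSSARY` contents, the synonym lists and both helper functions are hypothetical; only the canonical term “Mobile number” comes from the text.

```python
import re

# Hypothetical glossary derived from the information model:
# canonical term -> synonyms that should not be used.
GLOSSARY = {
    "Mobile number": ["cell phone", "mobile phone number", "cellphone"],
    "Customer": ["client", "buyer"],
}

def find_non_canonical(text: str) -> list[tuple[str, str]]:
    """Return (synonym found, canonical term) pairs for a piece of text."""
    hits = []
    for canonical, synonyms in GLOSSARY.items():
        for synonym in synonyms:
            # Whole-word, case-insensitive match of each discouraged synonym.
            if re.search(rf"\b{re.escape(synonym)}\b", text, re.IGNORECASE):
                hits.append((synonym, canonical))
    return hits

def as_identifiers(term: str) -> tuple[str, str]:
    """Derive snake_case (columns, methods) and PascalCase (classes) names."""
    words = term.lower().split()
    return "_".join(words), "".join(w.capitalize() for w in words)

print(find_non_canonical("As a client, I want to add my cell phone"))
# [('cell phone', 'Mobile number'), ('client', 'Customer')]
print(as_identifiers("Mobile number"))  # ('mobile_number', 'MobileNumber')
```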

For the Ordering system and the Marketing system to exchange information, each can map its data model to the information model. All the existing data models, like the information model, are modelling the same reality, so the differences really arise from how faithfully, and at what level of granularity, each data model represents that reality.
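
A rough sketch of mapping through the information model rather than directly between systems, assuming plain dictionaries for each system's records and a hypothetical `IMCustomer` entity. The field names, the two mapping functions and the assumption that the ordering system's single phone field holds a mobile number are all illustrative, not taken from the article.

```python
from dataclasses import dataclass, field

# Hypothetical information-model entity: system-independent and deliberately
# more granular than either system needs (e.g. several mobile numbers).
@dataclass
class IMCustomer:
    customer_id: int
    full_name: str
    email_addresses: list[str] = field(default_factory=list)
    mobile_numbers: list[str] = field(default_factory=list)

def ordering_to_im(row: dict) -> IMCustomer:
    """Ordering data model -> IM. Assumes the 'phone' field is a mobile number."""
    return IMCustomer(row["id"], row["name"], [row["email"]],
                      [row["phone"]] if row.get("phone") else [])

def marketing_from_im(c: IMCustomer) -> dict:
    """IM -> marketing data model (customer plus a list of contact methods)."""
    contacts = [("email", e) for e in c.email_addresses]
    contacts += [("mobile", m) for m in c.mobile_numbers]
    return {"customer_id": c.customer_id, "name": c.full_name,
            "contact_methods": contacts}

order_row = {"id": 42, "name": "Carol", "email": "carol@example.com",
             "phone": "+46 70 123 45 67"}
print(marketing_from_im(ordering_to_im(order_row)))
```

With this arrangement each system needs one mapping to and one from the information model, rather than a separate mapping for every partner system.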

An organisation can have many data models, usually one per system, but should only have one information model. Different parts of the organisation may only be interested in certain entities and relationships and may create an information model for the parts of reality they care about, but these partial information models are really all part of the same organisation-wide information model, even if a complete information model does not yet exist. In very large companies, however, a single information model may be neither practical nor desirable, especially where autonomy between divisions is encouraged.
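
The idea that partial information models are views of one organisation-wide model can be sketched by representing each view as a mapping from entity name to attribute names and merging them. The two views, their attributes and `merge_partial_models` are hypothetical illustrations, assuming Python 3.9+.

```python
# Hypothetical partial information models: entity name -> attribute names.
ordering_view = {"Customer": {"name", "email address"}, "Order": {"order date"}}
marketing_view = {"Customer": {"name", "mobile number"}, "Contact Method": {"kind"}}

def merge_partial_models(*views: dict[str, set[str]]) -> dict[str, set[str]]:
    """Combine partial IMs into one organisation-wide IM.

    The same entity may appear in several views; its attributes are unioned.
    The input views are left unmodified.
    """
    merged: dict[str, set[str]] = {}
    for view in views:
        for entity, attrs in view.items():
            merged.setdefault(entity, set()).update(attrs)
    return merged

merged = merge_partial_models(ordering_view, marketing_view)
print(sorted(merged["Customer"]))  # ['email address', 'mobile number', 'name']
```

A simple union like this only works because both views already use the same entity names; the shared terminology is what keeps two partial models from describing the same concept under different names.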

An information model is almost never implemented as-is in a system. Firstly, an information model will often contain more entities and attributes than any one system needs to implement. The reverse is also true: data models will contain application-specific artefacts, such as entities needed to handle many-to-many relationships. Secondly, data models are optimised for the specific system that uses them, meaning the developers have combined entities and attributes in ways that improve the performance of the database. Again, the information model should not constrain the implementation of the data model.
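
To illustrate an application-specific artefact, here is a sketch using Python's built-in `sqlite3` module. Conceptually, the information model might only say that a Customer can have many Addresses, each with an address type, and an address can be shared by several customers; the data model needs an extra `customer_address` join table to implement that. All table and column names are hypothetical.

```python
import sqlite3

# Hypothetical data model. The customer_address join table is an
# application-specific artefact: it implements the many-to-many relationship
# but is not an entity in the information model itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address  (id INTEGER PRIMARY KEY, street TEXT, city TEXT);
CREATE TABLE customer_address (
    customer_id  INTEGER REFERENCES customer(id),
    address_id   INTEGER REFERENCES address(id),
    address_type TEXT NOT NULL,  -- e.g. 'billing', 'delivery'
    PRIMARY KEY (customer_id, address_id, address_type)
);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO address  VALUES (10, 'Storgatan 1', 'Stockholm');
-- Two customers sharing one address: the many-to-many case.
INSERT INTO customer_address VALUES (1, 10, 'delivery'), (2, 10, 'billing');
""")

rows = conn.execute("""
    SELECT c.name, a.city, ca.address_type
    FROM customer c
    JOIN customer_address ca ON ca.customer_id = c.id
    JOIN address a ON a.id = ca.address_id
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 'Stockholm', 'delivery'), ('Bob', 'Stockholm', 'billing')]
```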

Going global

ACME have now decided to establish operations in Europe and have opened a sales and support office in Sweden. The company is now multilingual. While the reality of ordering, shipping and marketing goods is the same globally, each country uses its own language to describe it.

So when the Swedish sales office starts sending Requests for Change back to HQ, they use words like Kund for Customer and Beställning for Order. They are referring to the same things, but it is hard for the Swedish salespeople to discuss the required changes with the English-speaking developers.

The different language groups need to agree on a common terminology, and this can be neatly reflected in the information model, which, unlike a data model, does not expose implementation details.

We can generalise: if English is the lingua franca of programming and programming languages, then non-English-speaking countries will always need to agree on terminology in more than one language. Put another way, the information model provides a useful bridge between the technical and business sides of the organisation, which often use different languages. Unfortunately, while there are many tools that can be used to create information models, few support multiple languages in the same model.
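
Where the modelling tool lacks multilingual support, a glossary keyed on information-model entities can fill the gap. A minimal sketch, assuming a hypothetical `GLOSSARY` structure and helper functions; only the pairs Kund/Customer and Beställning/Order come from the text.

```python
# Hypothetical multilingual glossary attached to information-model entities.
# Only the Kund and Beställning translations come from the article.
GLOSSARY = {
    "Customer": {"en": "Customer", "sv": "Kund"},
    "Order": {"en": "Order", "sv": "Beställning"},
}

def to_canonical(term: str, lang: str) -> str:
    """Map a term in the given language to the IM's canonical entity name."""
    for entity, names in GLOSSARY.items():
        if names.get(lang, "").casefold() == term.casefold():
            return entity
    raise KeyError(f"{term!r} is not defined for language {lang!r}")

def translate(term: str, source: str, target: str) -> str:
    """Translate a term between languages via its canonical entity."""
    return GLOSSARY[to_canonical(term, source)][target]

print(to_canonical("Beställning", "sv"))  # Order
print(translate("Kund", "sv", "en"))      # Customer
```

Routing every translation through the canonical entity name means a Swedish change request and an English user story end up referring to the same model element.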

Conclusion

The difference between information models (IMs) and data models (DMs) can be summarised as follows:

IMs provide a formal description of the organisation’s view of reality.
There should only be one IM per organisation, but there can be many DMs, usually one per system.
IMs define the terminology that should be used in documentation and software development.
DMs are optimised for the application that needs them. IMs help future-proof the solution but should not constrain the DM.
IMs can support multilingual organisations where business units use a language other than English.

In future articles I hope to discuss how information models can be used in integration platforms to help define canonical data formats for data mapping, and to enforce data access controls. Information models are also very important in Master Data Management and in the use of Data Standards.

Information models are also a visualisation of the ubiquitous language, an important part of Domain-Driven Design (DDD) and Behaviour-Driven Development (BDD).
