uml – Software Superglue

As a software developer or architect you will probably have had at least one discussion about the difference between information models and data models. Why do we want to make this distinction? In practice drawing an information model is much the same as drawing a data model; both use the entity-relationship model for describing the world. ER-diagrams are easily transformed into the SQL used to create the table structure in relational databases (MySQL, MSSQL, etc.). So when do we need to create information models? Let’s look at an example.

Startup

ACME Trading has started a business selling pencils to its customers. They have set up a very basic ordering system to handle orders and ship goods to their customers. They designed a data model that will support the business software by examining the process (reality) of ordering goods and came up with the following:

The model has just two entities, one for the customer and one for the orders. These entities contain all the attributes needed to fulfil an order.

Growing

ACME is doing alright but they want to grow the business faster so they try doing some marketing. Again they build a simple application to support this business function. Examining the real world again they design the following data model.

The model contains just two entities; the customer again, this time with different attributes, and an entity called Contact Method.

Boom times

The marketing strategy is a success and ACME soon have to expand their operations and need to develop their existing systems to better handle the increased volume of customers and orders for pencils.

But now it’s becoming a hassle to have to create the customer in two systems and wouldn’t it be great if all customers created in the ordering system were also added to the marketing system automatically?

This shouldn’t be a problem as long as the two systems have compatible data models. In other words, a customer entity in the ordering system can map to a customer entity in the marketing system. But if it’s not possible, which system do we change? The ordering system is business critical so we may not want to mess with that one too much. However, ACME are thinking long-term and realise that they need a more robust representation of reality, one that the company can grow into.

At this point they go back to their view of reality and create a model that is independent of any system, a reference model if you will. This is called an information model. Or as Wikipedia explains:

An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity relationship models or XML schemas.

The information model now serves two purposes. First, to aid future software design in creating robust data models, for example by supporting different customer address types. Secondly, to enforce a common terminology across the system landscape and in the documentation, e.g. a mobile phone number is to be called “Mobile number” when writing user stories, test cases, defining class names and methods, creating database tables, etc.

In order for the Ordering system and the Marketing system to be able to exchange information, they can try to map their data models to the information model. All the existing data models and information models are modelling reality so the differences really arise from how faithful or granular the data model is compared to reality.

An organisation can have many data models, usually one per system, but should only have one information model. Different parts of the organisation may only be interested in certain entities and relationships and may create an information model for the parts of reality they are interested in, but these partial information models are really all part of the same organisation-wide information model, even if a complete information model does not yet exist. In very large companies this may not be practical or desirable especially where autonomy between divisions is encouraged.

An information model is almost never implemented as-is in a system. Firstly, an information model will often contain more entities and attributes than any one system needs to implement. The reverse is also true: data models will contain application-specific artefacts as well, as entities needed to handle many-to-many relationships for instance. Secondly, data models are optimised for the specific system that utilises them, meaning the developers have combined entities and attributes in ways that improve the performance of the database. Again, information models should not constrain the implementation of the data model.

Going global

ACME have now decided to establish operations in Europe and have opened a sales and support office in Sweden. The company is now multilingual. While the reality of ordering, shipping and marketing goods is the same globally, each country uses their own language to describe it.
So when the Swedish sales offices start sending Requests for Change back to HQ, they are using word like Kund for Customer and Beställning for Order. They are referring to the same thing but it is hard for the Swedish Sales people to discuss the changes needed with the English-speaking developers.

The different lingual groups need to agree on a common terminology, this can be neatly reflected in the information model (which also does not expose implementation details the way a data model does):

We can generalise and say that if English is the lingua franca of programming and programming languages, then there will always be a need to agree on the terminology in more than one language in non-English speaking countries. Put another way, the information model provides a useful bridge between the technical and business sides of the organisation which can often use different languages. While there are many tools that can be used to create information models, few have support for multiple languages in the same model unfortunately.

Conclusion

The difference between information models (IMs) and data models (DMs) can be summarised as follows:

IMs provide a formal description of the organisation’s view of reality.
There should only be one IM per organisation, but there can be many DMs, usually one per system.
IMs define the terminology that should be used in documentation and software development.
DMs are optimised for the application that needs them. IMs help future-proof the solution but should not constrain the DM.
IMs can support multilingual organisations where the business units are using another language than English.

In future articles I hope to discuss how information models can be used in integration platforms to aid the definition of canonical data formats when performing data mapping and also enforcing data access controls. Another area where information models are very important is Master Data Management and in the use of Data Standards.

Information models are also a visualisation of ubiquitous language which is an important part of Domain-driven design (DDD) and Behaviour Driven Development (BDD).

Tag: uml

Information model vs. data model

Startup

Growing

Boom times

Going global

Conclusion