The Agile factory

In the book The Phoenix Project there is a part where Bill is discussing lead times with Wes and Patty. Even though a certain task only takes 30 seconds, the lead time is still hours due to the time spent waiting for a resource to become available (Brent). Bill draws a graph to demonstrate the problem: if a resource is fully (100%) utilised then wait times become very long.

This statement threw me for a bit until I did some research. The graph definitely nails the problem with trying to maximise utilisation of resources. I mean, it is counter-intuitive for a manager to allow resources to idle. So what is the graph actually showing?

The graph shows that when the average utilisation goes over 80-90% then wait times become very long. So over time, a very high average utilisation will cause the queue to grow very long, increasing lead times dramatically. In other words, if a resource is very busy then it cannot cope with a workload that varies. The team must be allowed some slack so that lead times are still reasonable even when the workload is sometimes higher than average. In short, there is a trade-off between workload variability and resource utilisation.

This is well-understood in the manufacturing industry, but it is intrinsically true for software development. Everything that is built in software development is a one-off, unique. This creates huge variability in the job times of the development process; no two projects are ever the same. The challenge then is to reduce this variability so that we can create a more predictable workload and push up utilisation.

In Agile we use techniques such as storyboarding, MVPs and backlog grooming to manage variability and ensure that we maintain flow. WIP limits and Velocity are our KPIs that let us know how well we are succeeding in maintaining flow. Flow refers both to managing the variability of the arrival time of work (i.e. breaking down the work into smaller deliverables) and the execution time of the job (e.g. sizing of User Stories).

The science

Back to Bill’s graph. Where does it come from? It is actually based on Kingman’s formula, which comes from the domain of queueing theory. In layman’s terms the wait time is the product of three parts: a utilisation term, a variability term and the average job time.
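Written out, the approximation usually quoted as Kingman’s formula for a single work centre looks roughly like this. The symbol names are my own choice for illustration, not taken from the book.

```latex
\mathbb{E}[W_q] \approx
  \underbrace{\left(\frac{\rho}{1-\rho}\right)}_{\text{utilisation}}
  \cdot
  \underbrace{\left(\frac{c_a^2 + c_s^2}{2}\right)}_{\text{variability}}
  \cdot
  \underbrace{\tau}_{\text{job time}}
```

Here \rho is the utilisation of the resource, c_a and c_s are the coefficients of variation of the arrival times and the job times, and \tau is the average job time.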

So what Bill is actually saying is that, given a certain variability and a certain job time, the wait time is a function of the utilisation alone, which is what his graph plots. Bill wants to focus on utilisation, so he normalises the other parameters (variation and job time), leaving the wait time proportional to the share of time the resource is busy divided by the share of time it is idle.
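To get a feel for the shape of the curve, here is a minimal sketch that tabulates the normalised wait time for a few utilisation levels. It assumes the variability term and the job time are both set to 1, so only the busy/idle ratio remains; the function name is hypothetical.

```python
def normalised_wait(utilisation: float) -> float:
    """Relative wait time busy / idle, with variability and job time set to 1."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    return utilisation / (1 - utilisation)


# Wait time expressed as a multiple of the job time.
for percent in (50, 80, 90, 95, 99):
    wait = normalised_wait(percent / 100)
    print(f"{percent}% busy -> wait is {wait:.1f} x job time")
```

Going from 50% to 90% utilisation multiplies the wait by nine, and the last few percent before 100% are far more expensive still.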

For an excellent explanation of Kingman’s formula, have a look at EuroLEAN+’s YouTube tutorials.

Reducing variability

Variation is the norm in software development, so it has to be dealt with. There are several ways to mitigate variability. Reducing utilisation is one option, but we can do better than having our developers and testers idling.

Kingman’s formula shows that adding more work centres reduces sensitivity to variation (as well as increasing capacity, obviously). However, this is probably more feasible in manufacturing than in software development, because a work centre (in our case a development team) often has domain expertise, so no two teams have exactly the same capabilities. This approach may be more applicable in larger organisations.

We have already mentioned reducing variability using Agile techniques such as storyboarding, MVPs and backlog grooming, and this should be the primary focus of the team coach in creating and optimising flow.

Another option available to development teams is to have a technical backlog containing work that is lower priority that can be used to fill idle time and bring utilisation closer to 100%. The kind of tasks in this backlog should be small and independent. For example, it could involve refactoring, writing automated tests, learning about a new technology, and so on.

In summary, it is the combination of these techniques that allows development teams to be fully utilised. What Lean teaches us is that the same discipline and structure that is used to optimise manufacturing flows applies even more so to software development.

Tracking the team’s velocity gives us insights into both utilisation and the amount of planned work vs. unplanned work. We can also track the velocity of both Stories and Epics to see how good we are at sizing our MVPs. (An Epic is always an MVP in my book; this makes it clear what the definition of Done is for an Epic).

Skipping the queue

One of the early problems Bill had to deal with was departments trying to skip the queue. This is the result of a chronic failure of the development process. If lead times become unacceptably long (due to high utilisation, high variability or both), then eventually people will try to find shortcuts. This just makes a bad problem even worse, and represents a total breakdown in the chain of command. That kind of short circuit has to be dealt with before any other improvements have a chance of succeeding. Hence the need to start by visualising all of the work in process.


I was almost going to write that inventory doesn’t cost anything in software development, after all it is virtual. We don’t have to purchase raw materials and we don’t have to store anything in warehouses. (Yes, GitHub costs something, but it is a negligible cost in this context.)

But there is still inventory in software. The raw materials are just ideas, one-liners that take up virtually no space at all and until the team commits to building (analysing, developing and testing) something, the backlog can be reorganised and priorities changed as often as desired.

The rest of the inventory is in the queues between work centres, e.g. when handing over from development to test. This inventory does represent an investment in time and effort, e.g. breaking down the problem, defining an MVP and coding a solution. The cost of having this inventory is that the knowledge about the solution disappears over time; no amount of documentation can replace the shared understanding that existed when the team were working actively on the solution. Furthermore, time to market (TTM) is probably the single most important factor for success nowadays. So to sum up, a lot of inventory, or WIP, is bad in software development. Watch those WIP limits!

Cycle time

Here is an example of a good article describing Lead times and Cycle times. However, the difference between the two is not very clear in my opinion. A new Initiative will contain an unknown amount of work; that’s why we analyse it and break it down into reasonably sized chunks, reducing variability and minimising risk. A task (e.g. a User Story) is only added to the backlog when it is reasonably well-defined, and so the Lead time for every deliverable is a reasonably well-understood and managed parameter. Otherwise, Lead times just become guesswork and that is not so useful.

But the backlog doesn’t only contain planned work, it also contains unplanned work; bugs and outages which must be dealt with immediately. This increases the Lead time for planned work in ways that can be hard to manage. While unplanned work cannot be avoided completely, it can be mitigated using a small iterative release process, i.e. continuous delivery, continuous improvements to the delivery process, as well as detective and preventative security controls.

So ideally, Lead time only applies to tasks that are MVP-sized, and we should also have a WIP limit on new work to control Lead time. It does not make sense to fill up the backlog with Tasks that will be delivered years from now. Working this way, we gain an understanding of the team’s capacity, and long Lead times then indicate the need for an increase in team capacity, more teams, or a change in priorities.

My agile development team was involved in storyboarding and backlog grooming for all new tasks, not just development and testing. The team were constantly managing the flow of deliverables at all stages on the Kanban board, both when there were too few and too many tasks in a queue. So the difference between “Task created” and “Work started” was really very small, and therefore Cycle time should be uninteresting.

In queueing theory there is a formula known as Little’s Law which is used to calculate the Cycle time. So does this formula still have relevance even if we are not interested in Cycle time?

The term “Cycle time” is somewhat non-intuitive. But if you think about it, the cycle time is also the average time it will take to deliver everything that’s on your Kanban board right now. For example, if your WIP is 10 tasks and your average throughput is 2 tasks/day, then your cycle time is 10 / 2 = 5 days. Or put another way, the team can deliver everything on the Kanban board within the next 5 days. Now that’s a rather powerful statement. So Cycle time is also the Turnover time for all WIP.
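The arithmetic is simple enough to sketch in a few lines. This is just Little’s Law rearranged for cycle time, using the numbers from the example; the function and parameter names are hypothetical.

```python
def average_cycle_time(wip: float, throughput_per_day: float) -> float:
    """Little's Law rearranged: cycle time (days) = WIP / throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_day


# 10 tasks on the board, 2 tasks finished per day on average.
print(average_cycle_time(wip=10, throughput_per_day=2))  # 5.0
```

Note that the law holds for long-run averages, so the figure is only as trustworthy as the throughput average that goes into it.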

The better the team get at breaking down the backlog into equal-sized chunks (i.e. minimising variability), the more relevant the turnover figure becomes.

And so if Cycle times converge with Lead times, then we are much more sure of our commitments to the business side of the organisation. Roll-on Big Room Planning!

Integration bloat

Integration platforms create a useful abstraction layer and are a prerequisite for building a Service Oriented Architecture. The integration platform is often the domain of an “Integration team” which may reside in-house or be out-sourced.

When building new services, one of the first things that has to be done is to create the service specification, which defines how the integration platform will publish your service. For SOAP web services this is done using WSDL. The integration team is then responsible for translating messages between systems, mapping fields, etc.

In some cases the integration work involves packaging a specific functionality of an existing legacy service and publishing it as a more intuitive and lightweight service that can be more easily consumed by modern clients. If the clients are under development, then the scope for the integration team may not be 100% specified. To compensate, the integration team can include mappings that might be needed. This can result in a service that contains more functionality than is strictly necessary to create a working solution.

When end-to-end testing is performed any problems found will be fixed, but only for that portion of the new service that is actually used by the client. Furthermore, the integration team may not have tested all or indeed any of the features of the service they created, instead relying on the end-to-end testing to find problems.

The result is an integration service that fulfils the client’s requirements but includes features that are untested. The integration team document the entire service but have no idea how much of the service has actually been verified to work. This creates a maintenance headache when the service must be modified.

The presence of superfluous fields is an obvious problem. A more subtle issue is fields that support specific values (like enums) where clients use some values but not all. The service provider might allow values A, B, C, D, E and F, the integration documentation might only advertise A, B, C and D, and the client might only use A and B. In reality, the integration may allow all values if no validation is applied; however, all that has been tested are A and B. Since the integration team do not have in-depth knowledge of the client behaviours, they have no alternative but to rely on their own code and documentation to understand the scope of the service.
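The gap between what is allowed, what is documented and what is tested can be made explicit with a few set operations. This is only an illustration using the values from the example above; the variable names are made up.

```python
provider_allows = set("ABCDEF")        # values the back-end service accepts
integration_documents = set("ABCD")    # values the integration advertises
client_uses = set("AB")                # values exercised in end-to-end tests

# Only what the client actually sends has been verified.
tested = client_uses

documented_but_untested = integration_documents - tested
undocumented_passthrough = provider_allows - integration_documents

print(sorted(documented_but_untested))   # ['C', 'D']
print(sorted(undocumented_passthrough))  # ['E', 'F']
```

The first set is what the documentation promises without evidence; the second only gets through if the integration applies no validation.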

In conclusion, once a service has been created that is too big for purpose, it is difficult if not impossible to reduce its functionality. Ideally, the service should be built up incrementally in an agile way of working, which ensures that the client and the integration are fully meshed. This method may not be possible with out-sourced integration teams. Another alternative is for the integration team to create a mock client that verifies the whole service, even if no client actually exists that will use all of the service’s functionality. This at least would enforce a cost constraint on the integration team that will hinder the creation of services that are larger than necessary. Tools such as SoapUI and Postman can be used for this purpose.

The Delivery Storyboard

Storyboarding, or user story mapping, as described by Jeff Patton, is a central part of our Agile development process. We use it whenever we are doing feature discovery and it helps us structure our ideas without constraining the discussion. A lot of what we do involves integrations with other teams’ deliveries: usually two or more systems need to interact and will use a service layer as a communication broker. That is SOA in a nutshell.

Our team’s definition of Done is getting stuff into production. We do that quite well because we only put stuff in our backlog that we can deliver; our dependencies are managed elsewhere, usually on the backlog of the team we are dependent on.

But when it comes time to deliver the complete solution there are a lot of moving parts to keep track of and one of my roles is to coordinate amongst the teams and make sure that each team knows what actions they need to take in order for the roll-out of the entire solution to be successful. Often this process takes weeks to complete because there can be data migrations, third-party upgrades, etc. My primary focus is the order things need to happen in, and which things involve more than one team to make it work. Optimising the schedule of events comes after.

I immediately found it natural to extend user story mapping to planning product roll-outs. This gives us all the benefits of visualisation and discovery that happen when we do product discovery. All the teams can see how the roll-out will be done and where everyone is involved. I call this a Delivery Storyboard. The Delivery Storyboard is completely separate from the User Storyboard we use for product discovery.

The delivery storyboard features a backbone (blue post-its) and describes a flow from left-to-right as usual. The flow in this case is the flow of execution of the activities needed to complete the roll-out. Each backbone activity is broken down into tasks (yellow post-its) that are placed underneath. An example of a task could be “Deploy component X to server” or “Import data file to System A”.

Now for the cool part. Each column is independent of every other column, whereas everything in the column has to be executed more or less at the same time. In other words, we focus on executing all the tasks in one column until completion. The next column can be executed an arbitrary time later, but then all of the tasks in that column must be executed together as well. Repeat the process until the last column is executed and the roll-out is complete.

When all the tasks in any column are completed, the production environment should be left in a stable state and not dependent on other tasks in other columns for the time being. The challenge then is creating an execution flow that is flexible and does not have dependencies or hard time limits between the tasks in one column and the tasks in the next. Of course this can’t always be avoided, but one of the goals here is to visualise these types of constraints!

Another problem I have encountered is that some tasks in the middle of the flow need to be (or can be) executed first. Either the columns need to be reordered, or the storyboard is trying to meet more than one goal. In the latter case, try writing down the original goal on a post-it and see if all of the tasks on the board are needed for that goal. Then write another goal on another post-it for the remaining tasks, and so on. Each goal then deserves its own storyboard (big or small). This mirrors the concept of MVP (Minimum Viable Product) that Jeff Patton describes in his excellent book User Story Mapping.

And as always, we hold regular stand-ups with all the teams involved, usually with one representative from each team if there are many teams. Depending on where in the execution flow the roll-out is, not everyone needs to be at every stand-up. The teams walk through the delivery process, breaking down the work into concrete activities with clear responsibilities.

Each task is the responsibility of a specific team, and every task is tagged with a coloured sticker to indicate the team responsible. During product development, tasks, user stories, etc. are usually maintained in the team’s product backlog and this may still be so for some of the tasks on the Delivery Storyboard, but now they are duplicated here because we want to visualise dependencies to other teams, and where they will feature in the roll-out plan. If they have a JIRA issue number then write that on the post-it too.

A column with tasks that have different colour tags visually indicates where teams need to coordinate closely. That is pretty neat. Participants in the stand-ups can talk to each other about how they should collaborate to get the backbone item delivered successfully. During storyboarding sessions with the teams we can easily reorganise the tasks to minimise risk, reduce lead time and reduce downtime. The tasks in the column can also be ordered top-down to indicate the order of execution if meaningful.
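For anyone who wants to keep the board in a tool rather than on a wall, the structure can be sketched as a small data model. This is a rough illustration only: the class names, team colours, backbone activities and the smoke-test task are hypothetical, while the two deployment tasks are the examples mentioned earlier.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    team: str          # the coloured sticker on the post-it
    done: bool = False


@dataclass
class Column:
    activity: str      # the backbone (blue) post-it
    tasks: list[Task] = field(default_factory=list)

    def teams(self) -> set[str]:
        return {task.team for task in self.tasks}

    def needs_coordination(self) -> bool:
        """More than one sticker colour means teams must work together."""
        return len(self.teams()) > 1

    def is_complete(self) -> bool:
        return all(task.done for task in self.tasks)


def next_column(board: list[Column]) -> Column | None:
    """Columns run left to right; return the first one not yet finished."""
    for column in board:
        if not column.is_complete():
            return column
    return None


board = [
    Column("Prepare data", [Task("Import data file to System A", team="blue")]),
    Column("Release", [
        Task("Deploy component X to server", team="red"),
        Task("Run smoke tests", team="blue"),
    ]),
]

for column in board:
    print(column.activity, "-> coordinate:", column.needs_coordination())
```

Marking every task in a column as done moves `next_column` on to the following column, which matches the rule that a column is executed to completion before the next one starts.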

When a task is completed you should mark it somehow, for example crossing it out using a green marker. This provides a visual cue to focus on the remaining tasks as well as green being a positive colour.

The Delivery Storyboard can be complemented with dates for when certain columns and/or tasks are to be executed, which is useful for planning to meet deadlines. However, the main focus is on the sequence of events, who does what, and where teams need to coordinate their deliveries. Finally, the board should contain only tasks that will be executed; hopefully we are not doing product discovery at this late stage.

A place in Wikipedia

For years I have been reading and writing in Wikipedia. Some time ago I created a page for my home village Kilcloon. Village, or parish or maybe census town? I revisited the Wikipedia article numerous times and was keen to expand it. During my research about the history of Kilcloon it became obvious that Kilcloon could refer to many things, the most common of which is the parish of Kilcloon as stated at the beginning of the Wikipedia article.

There are other definitions, such as the postal town of Kilcloon which applies to some, but not all, of the parish. For me, growing up near the centre of the parish, the postal town was synonymous with the parish name, but apparently this is not so for everybody. Do people still identify themselves as living in Kilcloon if they have a different postal town in their address? Nowadays people moving into an area do not automatically associate themselves with the parish they are in. Parishes and parish boundaries are managed by the Catholic Church, not the state.

More definitions

So how does the Irish state define Kilcloon? This depends on which authority you ask, and the answers are many! The postal service is run by An Post and Kilcloon is the name of the postal town covering just part of the parish, as mentioned above. A direct question to An Post about which townlands were part of the Kilcloon postal town did not provide a very satisfactory answer, but all was not lost.

Ireland has recently introduced postal codes (eircodes), unique to each address, and these will replace the existing address system of townlands and postal towns. The two systems are aligned for the time being, since it is not yet mandatory to include an eircode when writing an address. It turns out that the areas covered by each of the eircode routing keys have been published on Google Maps. Kilcloon is now part of the A85 (Dunshaughlin) routing key and is a very distinct appendage to this routing key, as seen on the map. This, I believe, provides a definitive answer to what Kilcloon is from the postal service point of view.

Kilcloon also features in the Central Statistics Office (CSO) statistics as a “census town” or “settlement”. Kilcloon settlement can be seen clearly on the CSO Small Area Population (SAP) map. This can be compared to the Meath County Council’s definition of Kilcloon which is in the form of four physical signposts centred around Ballynare Crossroads. This is the geographically smallest definition of Kilcloon that exists and could be defined simply as the “village” of Kilcloon, which is much smaller than the census town and contains only a fraction of the people that consider themselves as living in “Kilcloon”.

Some history

And so back to the parish of Kilcloon. Historically, the parish of Kilcloon is a modern parish that comprises several smaller medieval parishes, one of which was called “Kilclone”. My research shows that the medieval parish was often referred to as “Kilcloon” and this was used to name the modern parish. Every medieval parish was comprised of townlands, one of which bore the same name as the parish, thus there exists a townland of Kilclone in the medieval parish of Kilclone. While the medieval parish names have disappeared the townlands prevail and are a central part of the postal address system mentioned above. The local post office is called Kilclone Post office precisely because it is in the townland of Kilclone for instance.

The townlands themselves have also been transformed through the ages and the modern townland boundaries differ to varying degrees from the boundaries as they were when the parishes were first formed. This is the subject of some amazing research and the results are available on the townlands website. It has also provided the inspiration to create the maps I would use to illustrate the multitude of definitions of the place known as Kilcloon.


Based on all of this research, there were five definitions of Kilcloon that I wanted to create maps for: the parish, the townland (Kilclone), the postal town, the census town and the village!

The townlands website uses the fantastic OpenStreetMap and Leaflet JavaScript library to create maps of all of the Irish townlands, baronies and much more! The data is publicly available and I could extract the coordinates from the web page to create unique maps for the Kilcloon Wikipedia article. These first maps showed which townlands the modern parish of Kilcloon included as well as which baronies the townlands were originally part of.

Medieval parishes and their associated baronies

The Routing Key map data could also be downloaded and used to render the Kilcloon postal area. Leaflet could overlay the A85 routing key onto the parish to see how they lined up!

Leaflet naturally allows points-of-interest to be displayed, so I created several maps showing the most important features of the parish. Finally, the trickiest maps to create were the parish and census town maps. The Kilcloon census town map is available on the CSO SAP map, but not the data. Still I managed to extract the data through visual inspection. The village is defined only by physical sign posts on the roads leading into Ballynare Crossroads, but I combined the positions of the signposts with property boundaries in the area to create a theoretical village boundary and add the coordinates to a Leaflet map.
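As an illustration of the data preparation step, a list of hand-collected coordinates can be turned into a GeoJSON polygon, which Leaflet can load through its GeoJSON layer. This is a sketch and not the code from the repository; the function name is hypothetical and the coordinates are an obviously fake unit square, not the real boundary.

```python
import json


def polygon_feature(name, latlon_points):
    """Build a GeoJSON Feature from (lat, lon) pairs.

    GeoJSON stores positions as [lon, lat], the reverse of the
    (lat, lon) order most people write coordinates in.
    """
    ring = [[lon, lat] for lat, lon in latlon_points]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # a polygon ring must be closed
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Placeholder corner points (lat, lon); replace with surveyed positions.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
feature = polygon_feature("Example boundary", corners)
print(json.dumps(feature, indent=2))
```

Keeping the boundary as data rather than hard-coding it in the rendering script makes it easier to correct a point later without touching the map code.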


Rendering the maps required only some straightforward JavaScript. I wanted the code to be open source since the maps must be maintained along with the Wikipedia page, so I added a simple index page to the code base that would render each map in turn and checked everything into GitHub.


Kilcloon on Wikipedia
Kilcloon maps on GitHub

Scalable Observer Pattern

When developers talk about publish-subscribe design patterns I immediately think of the newspaper analogy. As described in Head First Design Patterns:

  1. A newspaper goes into business and begins publishing newspapers.
  2. You subscribe to a particular publisher, and every time there’s a new edition it gets delivered to you. As long as you remain a subscriber you get new newspapers.
  3. You unsubscribe when you don’t want papers anymore, and they stop being delivered.
  4. While the publisher remains in business, people, hotels, airlines, and other businesses constantly subscribe and unsubscribe to the newspaper.

As a software design pattern, this is known as the Observer Pattern. In this pattern the publisher is called the Subject and the subscribers the Observers.
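The four steps of the newspaper analogy map quite directly onto code. Here is a minimal sketch of the Observer Pattern; the class and method names are my own and not taken from the book.

```python
class Newspaper:
    """The Subject: keeps direct references to its observers."""

    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        if observer not in self._observers:
            self._observers.append(observer)

    def unsubscribe(self, observer):
        if observer in self._observers:
            self._observers.remove(observer)

    def publish(self, edition):
        # Copy the list so observers may unsubscribe while being notified.
        for observer in list(self._observers):
            observer.update(edition)


class Reader:
    """An Observer: receives every edition while subscribed."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def update(self, edition):
        self.received.append(edition)


paper = Newspaper()
alice = Reader("Alice")

paper.subscribe(alice)
paper.publish("Monday edition")
paper.unsubscribe(alice)
paper.publish("Tuesday edition")

print(alice.received)  # ['Monday edition']
```

Notice that the Subject calls each Observer directly, so it has to know about every one of them.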

Comparison of Observer and Pub-Sub patterns

The Observer Pattern has some limitations such as scalability and hard-coupling. Unlike the physical world of newspapers it is possible to build an improved subscription service that does scale and is loosely-coupled. This improved pattern is called the Publish-Subscribe Pattern (or “pub-sub”).
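The loose coupling comes from putting something in between. In this minimal in-process sketch a broker sits between the two sides, so a publisher only knows a topic name and never holds a reference to a subscriber. The `Broker` class and the topic names are hypothetical; real systems would normally use a message broker product for this.

```python
from collections import defaultdict


class Broker:
    """Routes messages by topic; publishers and subscribers never meet."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver to whoever is listening on this topic, possibly nobody.
        for handler in list(self._handlers.get(topic, [])):
            handler(message)


broker = Broker()
inbox = []

broker.subscribe("news", inbox.append)
broker.publish("news", "Monday edition")
broker.publish("sport", "Nobody is listening to this")

print(inbox)  # ['Monday edition']
```

Because both sides depend only on the broker, subscribers can be added, removed or moved to another process without the publisher changing at all.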

Now you’re wondering: why name the pattern “publish-subscribe” when it does not behave like the newspaper analogy? This has caused a lot of consternation in my discussions with other system architects. Unless everyone is aware of the naming convention used for these patterns, one person can be talking about pub-sub while the other thinks they are talking about newspapers.

It would have been more intuitive to have called pub-sub something like the Scalable Observer Pattern.

Information model vs. data model

As a software developer or architect you will probably have had at least one discussion about the difference between information models and data models. Why do we want to make this distinction? In practice drawing an information model is much the same as drawing a data model; both use the entity-relationship model for describing the world. ER-diagrams are easily transformed into the SQL used to create the table structure in relational databases (MySQL, MSSQL, etc.). So when do we need to create information models? Let’s look at an example.


ACME Trading has started a business selling pencils to its customers. They have set up a very basic ordering system to handle orders and ship goods to their customers. They designed a data model that will support the business software by examining the process (reality) of ordering goods and came up with the following:

The model has just two entities, one for the customer and one for the orders. These entities contain all the attributes needed to fulfil an order.
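To show how directly such a model turns into tables, here is a sketch using an in-memory SQLite database. The column names and sample values are assumptions made up for illustration; the original diagram may well have used different attributes.

```python
import sqlite3

# Hypothetical attributes: one table per entity, linked by a foreign key.
SCHEMA = """
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    delivery_address TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    quantity    INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO customer (customer_id, name, delivery_address) "
    "VALUES (1, 'Example Ltd', '1 Example Street')"
)
conn.execute("INSERT INTO customer_order (customer_id, quantity) VALUES (1, 100)")

rows = conn.execute(
    "SELECT c.name, o.quantity FROM customer_order o "
    "JOIN customer c ON c.customer_id = o.customer_id"
).fetchall()
print(rows)  # [('Example Ltd', 100)]
```

The order table is called `customer_order` here because ORDER is a reserved word in SQL.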


ACME is doing alright but they want to grow the business faster so they try doing some marketing. Again they build a simple application to support this business function. Examining the real world again they design the following data model.

The model contains just two entities; the customer again, this time with different attributes, and an entity called Contact Method.

Boom times

The marketing strategy is a success and ACME soon have to expand their operations and need to develop their existing systems to better handle the increased volume of customers and orders for pencils.

But now it’s becoming a hassle to have to create the customer in two systems and wouldn’t it be great if all customers created in the ordering system were also added to the marketing system automatically?

This shouldn’t be a problem as long as the two systems have compatible data models. In other words, a customer entity in the ordering system can map to a customer entity in the marketing system. But if it’s not possible, which system do we change? The ordering system is business critical so we may not want to mess with that one too much. However, ACME are thinking long-term and realise that they need a more robust representation of reality, one that the company can grow into.

At this point they go back to their view of reality and create a model that is independent of any system, a reference model if you will. This is called an information model. Or as Wikipedia explains:

An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity relationship models or XML schemas.

The information model now serves two purposes. First, to aid future software design in creating robust data models, for example by supporting different customer address types. Secondly, to enforce a common terminology across the system landscape and in the documentation, e.g. a mobile phone number is to be called “Mobile number” when writing user stories, test cases, defining class names and methods, creating database tables, etc.

In order for the Ordering system and the Marketing system to be able to exchange information, they can try to map their data models to the information model. All the existing data models and information models are modelling reality so the differences really arise from how faithful or granular the data model is compared to reality.
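In code, such a mapping can be as simple as a lookup table per system, with the information model’s terms acting as the common vocabulary in the middle. The field names for both systems are invented for this sketch; only “Mobile number” comes from the example above.

```python
# Each system maps its own (hypothetical) field names to information-model terms.
ORDERING_TO_IM = {"cust_name": "Customer name", "mobile": "Mobile number"}
MARKETING_TO_IM = {"full_name": "Customer name", "cell_phone": "Mobile number"}


def to_information_model(record, mapping):
    """Translate a system-specific record into information-model terms."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_information_model(canonical, mapping):
    """Translate information-model terms back into one system's field names."""
    reverse = {term: local for local, term in mapping.items()}
    return {reverse[term]: value for term, value in canonical.items() if term in reverse}


ordering_record = {"cust_name": "Example Ltd", "mobile": "000-0000"}

canonical = to_information_model(ordering_record, ORDERING_TO_IM)
marketing_record = from_information_model(canonical, MARKETING_TO_IM)

print(canonical)         # {'Customer name': 'Example Ltd', 'Mobile number': '000-0000'}
print(marketing_record)  # {'full_name': 'Example Ltd', 'cell_phone': '000-0000'}
```

With two systems this looks like overkill, but each additional system only needs one new mapping to the information model instead of one mapping per existing system.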

An organisation can have many data models, usually one per system, but should only have one information model. Different parts of the organisation may only be interested in certain entities and relationships and may create an information model for the parts of reality they are interested in, but these partial information models are really all part of the same organisation-wide information model, even if a complete information model does not yet exist. In very large companies this may not be practical or desirable especially where autonomy between divisions is encouraged.

An information model is almost never implemented as-is in a system. Firstly, an information model will often contain more entities and attributes than any one system needs to implement. The reverse is also true: data models will contain application-specific artefacts as well, such as the entities needed to handle many-to-many relationships. Secondly, data models are optimised for the specific system that utilises them, meaning the developers have combined entities and attributes in ways that improve the performance of the database. Again, information models should not constrain the implementation of the data model.

Going global

ACME have now decided to establish operations in Europe and have opened a sales and support office in Sweden. The company is now multilingual. While the reality of ordering, shipping and marketing goods is the same globally, each country uses their own language to describe it.  
So when the Swedish sales office starts sending Requests for Change back to HQ, they are using words like Kund for Customer and Beställning for Order. They are referring to the same thing, but it is hard for the Swedish sales people to discuss the changes needed with the English-speaking developers.

The different language groups need to agree on a common terminology. This can be neatly reflected in the information model (which also does not expose implementation details the way a data model does):

We can generalise and say that if English is the lingua franca of programming and programming languages, then there will always be a need to agree on the terminology in more than one language in non-English speaking countries. Put another way, the information model provides a useful bridge between the technical and business sides of the organisation which can often use different languages. While there are many tools that can be used to create information models, few have support for multiple languages in the same model unfortunately.
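Where the modelling tool lacks multilingual support, even a small glossary kept next to the model can do the job. The sketch below uses the two Swedish terms mentioned earlier; the structure and function names are my own assumption of how such a glossary might look.

```python
# One entry per information-model entity, with a label per language.
GLOSSARY = {
    "Customer": {"en": "Customer", "sv": "Kund"},
    "Order": {"en": "Order", "sv": "Beställning"},
}


def canonical_term(word, language):
    """Find the information-model entity behind a local-language word."""
    for term, labels in GLOSSARY.items():
        if labels.get(language, "").casefold() == word.casefold():
            return term
    raise KeyError(f"no entity labelled {word!r} in language {language!r}")


def label(term, language):
    """Return the agreed label for an entity in the given language."""
    return GLOSSARY[term][language]


print(canonical_term("Kund", "sv"))  # Customer
print(label("Order", "sv"))          # Beställning
```

A Request for Change written in Swedish can then be traced to the same entity the developers see in their class and table names.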


The difference between information models (IMs) and data models (DMs) can be summarised as follows:

  • IMs provide a formal description of the organisation’s view of reality.
  • There should only be one IM per organisation, but there can be many DMs, usually one per system.
  • IMs define the terminology that should be used in documentation and software development.
  • DMs are optimised for the application that needs them. IMs help future-proof the solution but should not constrain the DM.
  • IMs can support multilingual organisations where the business units use a language other than English.

In future articles I hope to discuss how information models can be used in integration platforms to aid the definition of canonical data formats when performing data mapping and also enforcing data access controls. Another area where information models are very important is Master Data Management and in the use of Data Standards.

Information models are also a visualisation of ubiquitous language which is an important part of Domain-driven design (DDD) and Behaviour Driven Development (BDD).

Business Process Modelling with BPMN

Having moved away from software development and design and more towards management of IT processes and services, I have found that Business Process Modelling is more applicable than UML to describing the kinds of processes I am encountering. This is not surprising, as UML is more IT-centric and I needed more flexibility to capture the realities of how things work in real life. Yes, you can use a combination of UML diagrams to capture a real-world process, but this is not as intuitive to non-IT people, whom I now encounter more often.

My first attempts at modelling a business process used activity diagrams, sequence diagrams and use case models. The use case model defined all of the actors involved – both people and systems – and the sequence diagram showed the message flow between them.

Figure 1 – Use Case Model Diagram
Figure 2 – UML Activity Diagram

However, this was still too low-level and I needed something that would capture the “big picture”. After all, a high-level process (e.g. a sales process) can naturally be broken down into sub-processes. Each level of detail provides meaning to the different layers of the organization as appropriate. Of course, UML is still important for helping to formally describe the resulting IT systems implementation.

The nice thing about BPMN is that you can practice it all the time. With UML you generally want to be working on something IT related, but BPM can be applied to any process. For instance, how do people get something to eat for lunch? Do they eat out or have they brought a lunch box? This process can be described using BPMN.

Figure 3 – Process for eating lunch using BPMN

If BPM interests you and you are reading this article, the chances are that you are a pioneer in your organization. BPMN is an industry-standard notation, so if you are learning BPMN, the quicker you learn the rules and follow best practice, the more rewarding the result will be. I highly recommend the following two books:

Spending time formally documenting a process may seem like a waste of time in some ways. In the real world, situations change and people adapt or take shortcuts and the process model may be out-of-date in no time, but your BPMN model should not try to capture every detail or variation. More importantly, modelling a process using BPMN is an excellent aid to understanding how a given process currently works (even if it is dysfunctional). This process analysis can be much more complete when using a comprehensive notation like BPMN – if it can’t be modelled in BPMN then there is probably some wrong assumption or something hidden in the process that needs to be investigated. BPMN gives you the confidence to pursue a process analysis to its proper conclusion.

I will finish with an example of a process model I was grappling with recently. Systems integration is often done using messaging, typical of a Service Oriented Architecture. Files are transferred from one server to another and then imported into the recipient software system. (As this is an IT-centric problem I could of course have used UML to model this.) File transfer is either push or pull, in this case push. The sender places files on the recipient’s file system. The receiver checks for new files every few seconds and if it finds any it processes them.

In BPMN, a model of system interaction is called a collaboration. The collaboration is named after the process, in this case “File transfer”, and the lanes are named after the actors. The first thing I had to figure out was whether to use events to show that a message had arrived. At the same time the recipient is busy polling the directory looking for files, and will continue to do so as long as the service is available.

The sender and receiver are modelled as two separate processes. The sender sends the file using a message activity with a message flow symbol attached.

Figure 4 – File transfer using BPMN

The message is sent to the recipient’s polling subprocess which can generate a non-interrupting escalation event (ooh!) (the little arrow in the dotted circle) to trigger the next activity that processes the files. The subprocess is looped (the little circular arrow), so it will continue to run after the escalation occurs (forever in this case).

So how did I know how to use a non-interrupting escalation? Well, the non-interrupting part is just saying that the event does not interrupt the subprocess flow, i.e. polling will still continue when files have been found. The escalation part just means that the polling process has found files and needs someone else to deal with them, so it notifies the parent process (escalation).
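The same behaviour can be sketched in code: a polling loop that hands new files to a handler and then carries on polling. This is only an illustration of the pattern, not of any particular integration product. The function name `poll_directory`, the `on_files` callback and the `cycles` limit are all made up for the example, and the handler runs synchronously here, which is a simplification of the BPMN model where the parent processes the files independently.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, List, Set


def poll_directory(inbox: Path,
                   on_files: Callable[[List[Path]], None],
                   interval: float = 0.1,
                   cycles: int = 3) -> None:
    """Poll `inbox` repeatedly; hand new files to `on_files`, then keep polling."""
    seen: Set[Path] = set()
    # A real service would loop for as long as it is available;
    # `cycles` only exists so that this sketch terminates.
    for _ in range(cycles):
        new_files = sorted(p for p in inbox.iterdir()
                           if p.is_file() and p not in seen)
        if new_files:
            seen.update(new_files)
            # "Escalate" to the parent: the loop is not ended by this call.
            on_files(new_files)
        time.sleep(interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inbox = Path(tmp)
        # The sender pushes a file onto the recipient's file system.
        (inbox / "order-001.xml").write_text("<order/>")
        poll_directory(inbox,
                       lambda files: print("processing", [f.name for f in files]))
```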

The diagrams were produced using Visio Professional 2016 which includes a function to validate the diagram according to BPMN 2.0 (“Check diagram”).

The agile way to migrate from Gmail to Office 365

I was recently working on a migration from Google Apps to Office 365 and was not happy with the big bang approach for migrating email as suggested by Microsoft. This is just too big a risk since email is a critical service for communication within the company – and with customers. It also meant that everyone would start using Office 365 at the same time, which provided no opportunity to improve the migration process once it was set in motion.

So I worked out a way to do an agile migration, where users could be migrated in batches and the administrator could refine the migration process with each iteration Kaizen-style. I decided to publish a generalised procedure that hopefully could be of use to others looking for a better way. At the very least, it should provide some insights into how to plan your own Office 365 migration.

Thanks to Finn McCann for reviewing the document and providing valuable insights. Enjoy!

Retro games

So I bought a Raspberry Pi 3 and installed OpenELEC’s implementation of Kodi, the media centre application. This would finally replace my Windows Media Center (WMC) PC that I’d mothballed some time ago. Back then I had decided to convert my DVDs into ISOs in order to capture any extra stuff that came with the film, and (apart from WMC) Kodi was the only mainstream app I could find that could play back ISOs.

I have had a Synology DS412+ for a while now to back up files, photos and home videos, and I had also transferred my ISO collection to it. The Synology does have DLNA support and I can navigate the video/music libraries on it from my Samsung TV. However, the DS412+ with its four bays is more for business users, and has limited transcoding support compared to the Synology “Play” variants. But even the Play devices cannot compare to Kodi’s transcoding capabilities, and Synology cannot play back ISOs. Converting to some other container format seemed like the wrong way to solve the problem.


Once the OpenELEC bundle was installed on the SanDisk 32GB micro SD card and the Pi was connected to the TV, Kodi started up automatically. Kodi can be navigated using the TV’s remote control thanks to HDMI-CEC, eliminating the need for an extra remote control. The setup was fairly straightforward. I needed to do the following:

  1. Make my Synology media available in Kodi. There are some default sources set up in Kodi that point to the local filesystem. I edited these to point to the relevant folders on the NAS using the Synology’s NFS service.
  2. Get Kodi to fit properly on the screen. On larger screens Kodi can be too big but there is an option to resize it to fit the screen called Zoom. I set this to -4% which was perfect.
  3. Display the time and date correctly. Firstly, Kodi needs to be synced with an NTP server so that it displays the correct time and date. Then I also wanted it to display both the time and the date in the correct format. I navigated to System -> OpenELEC -> Network and added the standard three NTP servers to the list of Timeservers:

After that everything was set up and ready to play.

Arcade console

The Raspberry Pi is a general purpose computer and a media centre is just one of the uses it can be put to. I had played old 80’s arcade games on MAME about 15 years ago on my PC and thought why not use the Pi now.

There are a couple of ways to turn a Pi into an arcade game emulator. One way is to use RetroPie, a dedicated arcade game Linux setup. However, that would mean replacing OpenELEC, which I didn’t want to do for obvious reasons. The other option is to use RetroArch, which plugs nicely into Kodi. In fact RetroPie is built on RetroArch. RetroArch works as a launcher for many different emulators including MAME. The emulators are included in the RetroArch distribution but not the game ROMs themselves.


I installed RetroArch and tested the one game that was included (a Sega Genesis game) which worked fine. To start a game, go to Program -> Advanced Launcher -> Default and select an emulator and then a game to play. Before we go any further, I will explain the parts of the RetroArch filesystem that were most relevant to my setup:


This is where all of the many configuration options of RetroArch are stored. There is also a GUI (called RGUI) which can be used to edit these settings. More on that later.


This is where the ROMs go. In Kodi select the emulator you want to use to run the new game(s) and use the context menu to “Add items”. I use the option to scan for new items which are then automatically added to the list of games under the emulator. The scan will also remove items whose ROMs have been deleted.


Here is the list of emulators that ship with RetroArch. Only some of them are preconfigured in the Kodi Advanced Launcher menu. Set up more of these emulators in Kodi as needed.


Here is the reference configuration. This is a handy cheatsheet that explains what each setting in retroarch.cfg does, as well as showing you the default value.

First ROM: Hardhat

On the MAME website there are a few free ROMs to download. So I installed Hardhat in the ROMs directory using WinSCP. Then I added the game to “MAME / iMame4All” in Kodi and that ran fine too.

When RetroArch starts from Kodi, Kodi is replaced with the emulator and the TV remote control can no longer be used. So I plugged in a USB keyboard which was all I had available. RetroArch uses default bindings for keyboards out of the box. Here are the basics:

  • Right shift: Insert coins
  • Enter: Start game
  • Left/Right arrow keys: Move left/right
  • Space: Shoot

Once I could use the keyboard to play games, I started looking for a pair of SNES joypads to make the experience more authentic. These USB joypads were a small investment. Of course RetroArch can bind to all kinds of game controllers, but for most of the early arcade games the SNES joypads have sufficient functionality. I plugged the first one in and fired up Hardhat. RetroArch found the joypad but complained “controller not configured”. What to do?

RetroArch does of course have a (very large) configuration file which includes the settings for binding game controllers. RetroArch also provides a GUI (called RGUI) for editing the same settings. There is no obvious way to start RGUI from Kodi but I accidentally stumbled across it when I renamed the “” ROM to “” (Linux is case-sensitive). When Kodi tried to launch the emulator using “” it failed and RGUI started instead (which I assume is the default behaviour).

In RGUI I used the keyboard to navigate the menus. Here are the most relevant bindings:

  • Up/Down arrow keys: Move up and down the menus
  • Left/Right arrow keys: Hop up and down the menus
  • x: Enter submenu or edit value
  • z: Leave submenu or stop editing
  • Esc: Quit RGUI

SNES controller

So I navigated to Settings->Input->Input User 1 binds and bound the joypad to each control field. There were 10 in all: Left, Right, Up, Down, A, B, X, Y, Start and Select.

Super Nintendo controller

My plan was to have only the joypads plugged into the Pi; I wanted to avoid having a keyboard lying around just so I could press “Esc” to return to Kodi. This is where the RetroArch hotkeys come in. The SNES controller includes the “L” and “R” shoulder buttons, which are not needed for most early arcade games. So I bound “L” as the RetroArch hotkey enabler (Settings->Input->Input Hotkey Binds->Enable hotkeys) and “R” as the “Quit RetroArch” hotkey (…->Input Hotkey Binds->Quit RetroArch). So now when I press “L” and “R” together the game exits and Kodi is restored. Bye bye keyboard.

input_enable_hotkey_btn = "4"
input_exit_emulator_btn = "5"
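If you would rather script the change than click through RGUI, the two settings above can be written directly into the configuration text. The following is a rough sketch only. The helper name `set_option` is my own invention, it assumes the simple one-setting-per-line `key = "value"` layout shown above, and the button numbers 4 and 5 are simply what my joypads reported, so yours may differ.

```python
import re


def set_option(cfg_text: str, key: str, value: str) -> str:
    """Set `key = "value"` in the config text.

    Replaces the first existing line for `key`, or appends a new line
    if the key is not present. Other lines are left untouched.
    """
    line = f'{key} = "{value}"'
    pattern = re.compile(rf'^[ \t]*{re.escape(key)}[ \t]*=.*$', re.MULTILINE)
    if pattern.search(cfg_text):
        # A function is used as the replacement so that backslashes
        # in `value` are never interpreted as regex escapes.
        return pattern.sub(lambda _match: line, cfg_text, count=1)
    separator = "" if (not cfg_text or cfg_text.endswith("\n")) else "\n"
    return cfg_text + separator + line + "\n"


cfg = 'input_enable_hotkey_btn = "nul"\n'
cfg = set_option(cfg, "input_enable_hotkey_btn", "4")
cfg = set_option(cfg, "input_exit_emulator_btn", "5")
print(cfg)
```

Take a backup of the real configuration file before writing anything back to it.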

When I plugged in the second SNES joypad RetroArch automatically applied the same bindings to it which was nice.

The last problem was that the games themselves were too big for the TV screen. The top and bottom were not visible, which meant I couldn’t see vital information like the score and the number of lives left. RetroArch solved this too: I fixed it by changing the setting Settings->Video->Integer Scale to ON.
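As I understand it, integer scaling enlarges the picture only by whole-number multiples, so the image always fits inside the screen rather than being stretched past its edges. The arithmetic can be sketched as below. This is my own illustration and not RetroArch’s actual code; the function name `integer_scale` and the example resolutions are assumptions.

```python
from typing import Tuple


def integer_scale(native_w: int, native_h: int,
                  screen_w: int, screen_h: int) -> Tuple[int, int, int]:
    """Return (factor, width, height) for the largest whole-number
    scale of the native picture that still fits on the screen."""
    factor = min(screen_w // native_w, screen_h // native_h)
    factor = max(1, factor)  # never scale below the native size
    return factor, native_w * factor, native_h * factor


# Hypothetical example: a 256x224 game on a 1920x1080 TV.
print(integer_scale(256, 224, 1920, 1080))  # (4, 1024, 896)
```

In this example the height is the limiting dimension: five times 224 would be 1120 lines, which is more than the 1080 available, so the factor drops to four.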

Finally, in the Advanced Launcher settings I enabled the option Activate “Launching Application” notification. This is so that I could see that Kodi was responding even if it took a few seconds for RetroArch to warm up.


MAME is built for PCs, which means it expects the user to be seated in front of the keyboard and to be able to type in commands or use hotkeys. iMame4All is built on MAME (currently MAME version 0.37b) and is aimed at mobile phones and other touchscreen platforms, and is therefore better suited to a media centre platform like Kodi.

RetroArch ships with MAME, iMame4All and lots of other emulators but only a handful are preconfigured in Kodi. The “MAME / iMame4All” menu item is preconfigured to run the iMame4All emulator but can be changed to run one of the MAME emulators included with RetroArch if desired.

MAME 0.37b is a very old version of MAME from 2000, so finding ROMs that work with that version of the emulator via the normal ROM websites was not going to be easy. So I searched for “mame 0.37b5 roms download” instead.

Once I had a few games up and running, I added a thumbnail to each game, usually a screenshot, to give a visual clue about what type of game it is. Of course you can add more metadata to the Kodi menu items to aid filtering if you have a lot of ROMs.

And that’s it. Just got to find the time to play now.

Big Data and the new EU regulations

On Tuesday, the new EU regulations regarding Big Data went into force. This affects all companies and authorities who are registering and storing personal data. This replaces the patchwork of rules and regulations that exist today:

On 4 May 2016, the official texts of the Regulation and the Directive have been published in the EU Official Journal in all the official languages. While the Regulation will enter into force on 24 May 2016, it shall apply from 25 May 2018. The Directive enters into force on 5 May 2016 and EU Member States have to transpose it into their national law by 6 May 2018. (Read more)

The major points of the legislation are (source: Wikipedia):

  1. Responsibility and accountability: controllers have much more responsibility for the proper management of personal data.
  2. Consent: Valid consent must be explicit for data collected. Consent for children under 16 must be given by the child’s parent or custodian.
  3. Data Protection Officer: A person with expert knowledge of data protection law and practices should assist the controller.
  4. Data breaches: Controllers must report a breach to the Supervisory Authority as soon as they become aware of it.
  5. Right to erasure: The data subject has the right to request erasure of personal data relating to them.
  6. Data portability: A person shall be able to transfer their personal data from one electronic processing system to and into another.

Further reading: The EU Data Protection Reform and Big Data Factsheet (PDF)

With regard to exporting data outside the EU, the now invalid Safe Harbour agreement has been replaced with the new EU-U.S. Privacy Shield, which promises to improve the handling of EU citizens’ data by U.S. authorities and companies.

Further reading: EU-U.S. Privacy Shield (PDF)