wip – Software Superglue

Kanban continued

In an earlier post I showed how a team can get started with Kanban by visualising and organising their work using post-its, and concluded with a look at managing WIP (work-in-process). In this article I want to broaden the discussion to look at some of the other features of Kanban that can help the team manage their work.

WIP is a problem for most teams, especially when release cycles are long (1 month or longer), but there are other factors that can affect flow and that can also contribute to a lot of WIP.

Categorising work

The Kanban board represents what the team are expected to deliver; therefore the work must be well-defined with clear acceptance criteria (as distinct from the discovery phase where ideas are still being explored and the scope is unclear). There is usually a week’s worth (or a sprint’s worth) of tickets ready to be pulled from the “To do” column.

Something I encounter frequently is developers pre-assigning tickets to themselves that they know they will work on but haven’t started yet. When I asked why they did this, I learned that they wanted to know what was in their pipeline and what was coming up next. Assigning the issue to themselves was their way of labelling it as part of their domain, e.g. web development or back-end development.

We talked about other ways of tagging the issue. Jira provides the Components and Labels fields to help categorise issues, and on the Kanban board it is possible to create “Quick filters” that filter on these fields. For instance, the front-end developers were able to create a filter that only showed “web” components. This turned out to be a great idea, something the whole team could use to find certain types of work on the board.

Using filters in this way also has the advantage of highlighting all of the work related to a specific component or label. And whereas a ticket can only be assigned to one person, there can be many components and labels set on a ticket, allowing it to be filtered in different ways.
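To make the difference between a single assignee and many components or labels concrete, here is a minimal Python sketch. The `Ticket` class, its field names and the sample tickets are hypothetical; this is not Jira's data model or API, only an imitation of what a quick filter does.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical ticket model, not Jira's actual schema.
    key: str
    assignee: Optional[str] = None                 # at most one person
    components: set = field(default_factory=set)   # any number of components
    labels: set = field(default_factory=set)       # any number of labels


def quick_filter(tickets, component=None, label=None):
    """Return the tickets that match the given component and/or label."""
    result = []
    for ticket in tickets:
        if component is not None and component not in ticket.components:
            continue
        if label is not None and label not in ticket.labels:
            continue
        result.append(ticket)
    return result


board = [
    Ticket("APP-1", components={"web"}, labels={"checkout"}),
    Ticket("APP-2", components={"backend"}, labels={"checkout"}),
    Ticket("APP-3", components={"web", "backend"}),
]

web_keys = [t.key for t in quick_filter(board, component="web")]
checkout_keys = [t.key for t in quick_filter(board, label="checkout")]
print(web_keys)       # tickets touching the "web" component
print(checkout_keys)  # tickets labelled "checkout"
```

Note that none of the tickets needs an assignee to show up in a filter, so nobody has to claim work before they actually start it.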

Identifying bottlenecks

Even if the team manage to get WIP under control, bottlenecks will still occur. After all, every piece of software development is unique and brings its own challenges in delivery.

As a coach, I can talk to the team about what appear to be bottlenecks in the flow – with Kanban the problem is there for all to see. If the bottleneck is in testing, the developers may not see it as their problem to solve, until I point out that continuing to build more stuff that needs to be tested just adds to the bottleneck, and they will be waiting that much longer for the code to reach production. This usually gets the team thinking.

There are two things the developers can do in this case: either help out with testing, or do some technical improvement that does not require QA resources. In the best case, the developers use their skills to help automate some of the testing, a win-win for the whole team.

Flagging blockers

As I stated at the beginning, when a ticket is added to the board it should have a clear Definition of Done and clear acceptance criteria. Even so, hidden complexity sometimes surfaces and work on the ticket stops until the scope is clarified. What I see happen quite often is that the team moves stuck tickets to a separate “On hold/Blocked” column. This breaks the flow of work: now the team have to remember where the ticket was blocked. Was it during development? Testing?

A better approach is to flag the ticket in the column where it is stuck. For instance, Jira provides a convenient “Add flag” option to highlight tickets that are stuck without changing their status. When the blockage is removed, the flag can be removed and work continues from where it left off. The ticket is also visually striking when it is flagged; it demands attention, which it should.
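The idea can be sketched as a ticket that carries a separate "flagged" marker next to its status. This is a hypothetical model for illustration only, not how Jira stores flags internally.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical model: status is the board column, flagged marks a blocker.
    key: str
    status: str
    flagged: bool = False


def flag(ticket):
    """Mark the ticket as blocked; the status (column) is left untouched."""
    ticket.flagged = True


def unflag(ticket):
    """Remove the blocker mark; work resumes in the same column."""
    ticket.flagged = False


ticket = Ticket("APP-7", status="Testing")
flag(ticket)
status_while_blocked = ticket.status   # still "Testing", nothing to remember
unflag(ticket)
```

Because the status never changes, nobody has to remember which column the ticket came from once the blocker is resolved.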

Hidden complexity is just one reason why a ticket cannot move forward. There are other reasons, but they are all a result of the same thing: external dependencies. For instance, hidden complexity means going back to discovery with stakeholders, and stakeholders are not part of the team, they are an external dependency. This is not a bad thing, but it is important that the team understand their domain of control, and what can slow them down.

Understanding your domain of control

The essence of the Kanban board is that when a ticket is added to it, the team can say “Yes, we can deliver that”. The work is clearly defined, but more importantly all the resources needed to deliver that piece of work are in the team: designers, developers, testers, devops, etc.

If that is not the case, then the team are relying on third parties (external to the team and/or external to the company) to get the feature into production. Every time the team need help from a third party to move the ticket forward, they are essentially blocked because they have no control over that third party’s priorities. So while some people work closely with the team, they are still not part of the team, and they block the team because they answer to another master with other priorities.

If a lot of tickets are being blocked, the tendency is to start working on something else, rather than solving the blockers, which just adds to WIP. Instead the team must relentlessly focus on removing the blockers, whether it means adding the necessary resources to the team, doing more in-depth discovery, or doing a more radical re-evaluation of the team purpose.

Managing risk

Finally, the Kanban board can be used to manage risk. In a nutshell, a lot of WIP means longer lead times, which increases the risk that priorities change before the feature is shipped. The result is that features are abandoned halfway through development, which is an expensive way to run a business.

Regardless of what development process and release cycle the team uses, if there are a lot of tickets on the board it means that when something new is added, it will have to wait until all the work already on the board goes into production before it can be shipped.

Since the work on the board is supposed to be well-defined, it should be possible to make a ballpark estimate of how long it will take to deliver everything on the board. Let’s say your estimate is 3 months. Ask yourself, how confident are you that you will be able to deliver everything on the board before priorities change?
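One rough way to arrive at such a ballpark figure is to divide the number of tickets on the board by the number of tickets the team typically finishes per week. The sketch below assumes tickets of roughly equal size; the function name and the figures are made up for illustration.

```python
import math


def weeks_to_clear_board(tickets_on_board, tickets_done_per_week):
    """Ballpark number of weeks needed to ship everything on the board."""
    if tickets_done_per_week <= 0:
        raise ValueError("throughput must be positive")
    return math.ceil(tickets_on_board / tickets_done_per_week)


# Hypothetical figures: 39 tickets on the board, 3 finished per week.
weeks = weeks_to_clear_board(39, 3)
print(weeks)  # 13 weeks, i.e. roughly the 3 months used in the text
```

Anything added to the board today waits behind that whole queue, which is why the size of the board is itself a risk indicator.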

The goal of the Agile coach

Agile teams often use velocity as a metric to measure the team’s performance. This is a measure of the throughput of the team, how fast they are at delivering stuff. But this metric alone cannot be used to determine if the team are making customers happier or helping the company to make money.

To illustrate the problem let’s suppose we have three delivery teams in a chain. The first team pulls stories from the backlog and feeds them to the next team and so on until the feature is delivered to the customer. Throughput is low, so the company hires a couple of agile coaches to help the teams work more efficiently. Their goal is to:

Increase team velocity.

The coaches are good at their jobs, helping each team create INVEST-type stories, remove impediments, focus on delivering one thing at a time, adopt CI/CD, etc. They soon maximise each team’s efficiency and every team realises its full potential: velocity reaches 100%, hurrah! But wait, there is still a bottleneck; the capacity of the front-end (FE) team is limiting the flow of features to the customer. The queue of stories is also problematic, as the all-important shared understanding between teams is quickly lost when the wait-time between work centres becomes longer. What did the coaches miss?
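The effect is easy to reproduce with a toy simulation. The sketch below assumes a chain of Design, BE and FE teams, each pushing finished stories to the next; the team order and the weekly capacities are assumptions chosen for illustration, not figures from a real project.

```python
def simulate_chain(capacities, weeks):
    """Push-based chain: each team finishes up to its weekly capacity.

    capacities: stories per week for each team, in delivery order.
    Returns (delivered, queues), where queues[i] is the number of stories
    waiting in front of team i. The first team has an unlimited backlog.
    """
    queues = [0] * len(capacities)
    delivered = 0
    for _ in range(weeks):
        # Process downstream teams first so a story moves one step per week.
        for i in reversed(range(len(capacities))):
            if i == 0:
                done = capacities[0]          # the backlog never runs dry
            else:
                done = min(capacities[i], queues[i])
                queues[i] -= done
            if i + 1 < len(capacities):
                queues[i + 1] += done         # push to the next team's queue
            else:
                delivered += done             # last team ships to the customer
    return delivered, queues


# Assumed capacities: Design 5, BE 5, FE 3 stories per week.
delivered, queues = simulate_chain([5, 5, 3], weeks=10)
print(delivered, queues)
```

Every team runs at full speed, yet the customer only ever receives what the slowest team can deliver, and the queue in front of that team keeps growing week after week.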

What the scenario shows is that the Agile coaches cannot focus solely on the velocity of stories in the individual teams. It is still a useful measurement for planning team capacities, but the goal cannot be to maximise this value. If velocity can’t be used to solve the queue problem, then the coaches need another measurement that does. Observe that the wait-time between work centres delays the feature getting to the customer. In other words, wait-time adds to the total time needed to design, build and deliver features, i.e. it increases lead time. The coaches are given a new goal:

Increase team velocity while minimising the lead time for new features.

When should the coaches start measuring lead time? Probably from the time the company commits to delivering the feature. How do the coaches measure lead-time? Well, if the product owners are using a Kanban board for example, then they can just write the date on the card when they committed to building and delivering the feature. Then, when the feature begins its journey across the board the coaches can measure how long it took to reach the customer.
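As a minimal sketch, lead time is simply the difference between the two dates on the card. The function name and the dates below are hypothetical.

```python
from datetime import date


def lead_time_days(committed_on, delivered_on):
    """Days from the commitment date written on the card to delivery."""
    if delivered_on < committed_on:
        raise ValueError("delivery cannot precede commitment")
    return (delivered_on - committed_on).days


# Hypothetical card: committed on 1 March, delivered on 12 April.
days = lead_time_days(date(2024, 3, 1), date(2024, 4, 12))
print(days)
```

The clock starts at commitment, not when development starts, so any time a card spends waiting in a queue is included.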

The coaches now have two conflicting metrics. On the one hand the team want to maximise their story velocity (local efficiency), but on the other hand the company wants to reduce time-to-market (TTM), i.e. minimise lead time. (I am deliberately ignoring throughput, which is the average rate at which features are delivered. Even if throughput is high, it could still take months to deliver any one particular feature if the lead time is long. Thus, adapting to change becomes hard.)

Conclusion: the team must subordinate itself to the company’s goal. This means that if there is a downstream bottleneck (the FE team), then the BE team cannot keep pushing more work into their queue. In practice this means that the BE team cannot start a new story until the queue is cleared. The best way to manage this is using pull instead of push. If the BE team is finished with a story, then it is still included in their WIP limit (Work-In-Process rather than Work-In-Progress) preventing the team from starting a new story. When the FE team is ready to work on a new story they pull it from the BE team.
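The pull rule described above can be sketched in a few lines. The `PullTeam` class and the story names are hypothetical; the point is only that a finished story keeps occupying a WIP slot until the downstream team pulls it.

```python
class PullTeam:
    """A team whose finished-but-not-yet-pulled stories still count as WIP."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.in_progress = []   # stories being worked on
        self.finished = []      # done here, waiting for downstream to pull

    def wip(self):
        return len(self.in_progress) + len(self.finished)

    def start(self, story):
        """Start a story only if the WIP limit allows it."""
        if self.wip() >= self.wip_limit:
            return False
        self.in_progress.append(story)
        return True

    def finish(self, story):
        self.in_progress.remove(story)
        self.finished.append(story)

    def hand_over(self):
        """Called by the downstream team when it pulls a story."""
        return self.finished.pop(0) if self.finished else None


be = PullTeam("BE", wip_limit=2)
fe = PullTeam("FE", wip_limit=1)

be.start("S1")
be.start("S2")
be.finish("S1")
blocked = not be.start("S3")    # S1 is finished but still counts as WIP

story = be.hand_over()          # FE pulls S1 from BE
fe.start(story)
unblocked = be.start("S3")      # BE now has room for a new story
```

The BE team cannot flood the FE team's queue: the only way to free a slot is for the bottleneck to pull.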

Do the BE team go home and wait for the FE team to pull the story? Ideally yes, in reality, no; there is always work to be done. Software systems pay a perpetual rent, what can be called maintenance debt, that must be continuously paid off to prevent the slow glide into non-compliance and obsolescence. In practice, the team should maintain a technical backlog of work that they can do while waiting for the bottleneck to pull work.

A more realistic scenario would be that the Design team releases the story simultaneously to the Frontend and Backend teams to work on. This is what happens in loosely coupled architectures; the two teams can agree on a contract and then work independently of each other to deliver their respective parts. This would improve lead time, but there is still the problem of one team running faster than the other. What happens is that the bottleneck has moved to the end of the delivery process, when the feature is delivered to the customer.

What more can our Agile coaches do to reduce lead times? Cross-functional teams are considered a good thing in Agile, can they help with reducing lead times? Let’s illustrate that in a completely new diagram.

What’s that? I just drew a box around the old diagram you say? OK, yes, I did. Instead of three separate delivery teams, we now have one product team. It’s still the same mix of competencies and the same system architecture, so why would we expect the delivery process to behave any differently?

So how is using a cross-functional team better? Well, the delivery teams now have a common goal that is set by the Product Owner. (The PO must express the company’s goal in terms that are meaningful to the team.) Also, the potential for collaboration and innovation, and the ability to “build the right thing” can be fully exploited. What about velocity and lead time? Maximising velocity now also means reducing lead time. Since there is one WIP limit for the whole team, there are no queues, further reducing lead time. Well done coaches!

OK, let’s take a step back. The coaches are now using two metrics to achieve the goal of reducing lead times efficiently. The new cross-functional team is pumping out features faster than ever. Are the customers happy? Is the company making more money? Eh, still no idea. How do we measure customer happiness or the return-on-investment for all the features the team is delivering?

The team must find some way to measure the effect a feature has on customer growth or customer retention, or increase in revenue, or whatever is important to the company. However, the team’s velocity enables it to fire off lots of features in rapid succession, making it impossible to know which features are actually the ones that are helping the team achieve its goal, and which are just adding to system complexity and maintenance debt. Let’s call this the feature success rate, i.e. what percentage of features released move the team towards their goal.
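Expressed as a calculation, the feature success rate is a simple percentage. The sketch below assumes the team has already decided, feature by feature, whether it measurably moved them towards the goal; the feature names and outcomes are invented.

```python
def feature_success_rate(outcomes):
    """Percentage of released features that moved the team towards its goal.

    outcomes: mapping of feature name -> True if the feature measurably
    helped, False otherwise.
    """
    if not outcomes:
        raise ValueError("no released features to evaluate")
    successes = sum(1 for moved in outcomes.values() if moved)
    return 100.0 * successes / len(outcomes)


# Hypothetical release history.
released = {
    "one-click checkout": True,
    "dark mode": False,
    "saved searches": True,
    "animated logo": False,
}
rate = feature_success_rate(released)
print(f"{rate:.0f}%")
```

The hard part is not the arithmetic but filling in the True/False column, which is exactly why the features need to be evaluated one at a time.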

Once again there is a conflict between two metrics: velocity and feature success rate. The customers must be given the time to evaluate each feature in turn in order for the team to know if it was successful or not. So now the customer has become a bottleneck with a WIP limit = 1. How do we increase the customer velocity?

One way is to divide the customers into groups, so-called A/B testing with each group evaluating different features or different versions of the same feature. But this is the most expensive way for the team to find out if they have built the right thing. Instead the team should try to figure out as early as possible and as cheaply as possible if a feature will move the team towards their goal: customer surveys, impact mapping, wireframes, etc.; whatever it takes to validate assumptions while building as little as possible. Also, when choosing between features the team should pick those that have the biggest impact. For our intrepid Agile coaches the goal is finally expressed as:

Increase team velocity and minimise the lead time for new features while increasing feature success rate.

Summary

One surprising result of this analysis is that it is neither possible nor desirable for developers to spend 100% of their productive time developing features. This is something every Product Owner for a cross-functional team must be aware of. It is due to the capacity constraints of the different competencies in the team, the variance in the work itself, WIP limits and the bottleneck (wherever that happens to be in the flow).

Developers must maintain a technical backlog to work on when they are blocked by WIP limits or starved of new work. Just as they spend time on automated testing, developers must also spend time devising methods to measure the impact of whatever features they create. This holistic approach will also help the team members better understand the team’s goal.

The purpose of this analysis was to identify the goal for an Agile coach and to find the minimum number of metrics that the Agile coach needs, to know if they are moving towards their goal. My conclusion is that these metrics are:

Team velocity to aid capacity planning and measure efficiency
Lead time to shorten TTM and allow the team to adapt to change quickly
Feature success rate to minimise the number of features used to meet the team’s goal

In short, the Agile coaches are moving towards their goal if the team’s velocity is increasing, lead time is reducing and feature success rate is increasing.

References

The Goal by Eliyahu M. Goldratt
GOTO 2012 • Frankenbuilds; if Agile is so good, why are our Products so bad? with Gabrielle Benefield

An introduction to Agile

In this article I will discuss how to get started with Agile in the most hands-on way possible, with no discussion of frameworks and methodologies. I believe it is important to understand the essence of Agile first, as it is easy to be overwhelmed with all of the techniques and tools that have evolved from it (Scrum, XP, SAFe, etc.).

The goal then is to create an iterative software development process that can be improved upon continuously. The only tool you are going to need is a stack of post-its and some wall space or a whiteboard where the team can work together.

Start small, which means starting at the team level. Learning to work in an Agile way will also require some experimentation, as every team works differently. The point is that you will need to create some slack in the team’s schedule if you want to change the way they work. Finally, you or the team lead will take on the role of Agile Team Coach.

I should also mention that there’s lots of help out there: blogs, forums and books. One excellent resource is the Agile and Lean Software Development Group on LinkedIn. Now let’s get started!

Step 1: Visualisation

First the team should start by visualising their work. This is especially important in software development, which by nature is very abstract. By visualisation, I am not referring to traditional documentation which tries to capture an entire scope, such as requirements or test cases. What you want to visualise here is what the team is doing right now. For this you use post-its. Every team member writes down what they are working on, big or small, together with their initials in the corner, and sticks it onto a whiteboard or wall.

Now the team have an opportunity to discuss the work, make adjustments, and add or remove post-its. The team can try to group related activities, for instance. Spend about 5-10 minutes on this, no more; just enough time to smooth out the rough edges.

Step 2: Create flow

The next thing the team need to consider is what the definition of “Done” is for each post-it note. By “Done” we mean that the team is finished with the work item; it could be putting software in production, writing a manual, upgrading a database, etc. In reality, a lot of teams starting out with Agile do not have a clear definition of Done for their work items, so don’t sweat it too much yet. Fixing this will be part of the improvement process mentioned later on.

Create three columns on the whiteboard or wall and label them: Backlog, Doing, Done. Now each team member places each of their Post-its into one of the three columns. Finished? Great! You have now created your first Kanban board. (Here we introduce the most elementary and useful of Agile tools, the Kanban Board.)

So now the team has visualised their work and created a flow of work from left to right on the board. Congratulations!

Step 3: Reflection

The whole exercise above shouldn’t take more than an hour for a team of 10 people. Stand back and take a look at it. It may be obvious that some items on the board have unclear scope and some items are very large (or small). We’ll come back to these issues later.

One final exercise: sum the number of post-its in the Backlog and Doing columns and divide the total by the number of members on the team. This will give you some indication of how much multitasking is going on and how much overhead is being created due to context-switching.
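The arithmetic is trivial, but writing it down makes the indicator explicit. The function name and the board counts below are made up.

```python
def items_per_person(backlog_count, doing_count, team_size):
    """Post-its in Backlog and Doing divided by the number of team members."""
    if team_size <= 0:
        raise ValueError("team size must be positive")
    return (backlog_count + doing_count) / team_size


# Hypothetical board: 12 post-its in Backlog, 23 in Doing, team of 10.
ratio = items_per_person(12, 23, 10)
print(ratio)
```

A figure well above 1 suggests that people are juggling several items at once.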

Step 4: Focusing on the goal

OK, the team have taken the important first steps in becoming Agile. And they will continue taking small steps, applying well-proven techniques that will improve the flow of work. But let’s discuss the goal; where is the team trying to get to? In the book The Phoenix Project, Bill is inspired by Lean manufacturing techniques used on production lines. Bill’s goal becomes the creation of a factory production line for his IT Department. As stated in Agile Principle #8:

Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.

In other words the team shall create a process that they can reuse to build whatever software solutions the organisation or customers need now and in the future. The team can now take their new Kanban board and visualised work flow and use it to build a software factory!

Step 5: Breaking down the work

In our first iteration the team’s work items had unclear scope and different sizes. The team should deal with the scope problems first. This can be solved by breaking these work items into smaller items, each with a clear definition of “Done”. Spend 1-2 hours on this step, starting with the most important work items.

A classic problem is that a work item involves input from people outside the team. The Kanban board should not contain items that are assigned to “outsiders”. If this external work is a prerequisite for completing a team work item, then it should be added as a dependency to a team work item only. It is essential that the team have control over the work items on their board, even if they are currently blocked by external dependencies. The Kanban board should be used to focus on the team’s work!

Sizing of work items is about creating items that are of roughly equal size. As a rule-of-thumb, a work item should take about 2-3 days to complete, up to a maximum of two weeks. There are techniques for standardising work sizes, but for now I recommend a simple consensus from the team on whether an item is large or small or somewhere in between. Remember, if the definition of “Done” is software in production, then this must include coding, testing, etc.

In the worst case, a work item is so badly scoped and sized that it may not be possible to continue working on it in its current state and some more analysis (of requirements or architecture) is needed. If work stops altogether on such items then they should be moved into the backlog. This is one of the hardest things to do in Agile, but really knowing when a work item is ready for execution is one of the great benefits Agile brings.

A clear definition of Done for each item together with creating items of roughly equal size will build team confidence. By breaking down work items into smaller chunks and visualising them on the Kanban board it becomes possible for every team member (and stakeholders!) to understand what the team is going to deliver. And getting items to Done will make everyone happy.

Step 6: Limiting Work in process (WIP)

At this point the team have broken down the work into similar size chunks and this probably means that there are many more post-its on the board. (There are many tools available for creating digital Kanban boards, but this is still a low priority for now; wait 2-4 weeks before taking that step.) What the team needs to focus on next is WIP. This exercise should take about 30-60 minutes.

Earlier the team calculated how many work items were being done per team member. Ideally, each team member should be working on one item at a time, i.e. sequentially; so for a team of 10, the number of work items in the “Doing” column would be 10. In practice the figure is higher and the team need to think about what that number is.

In Agile terms, we are talking about the team capacity. We use this figure to set a work in progress (or process) limit (WIP limit). In other words, the team cannot start a new work item until they have finished a work item that is already in progress (unless an item is blocked). Remember, the team have a clear definition of Done for every work item, so they are supposed to be able to complete them before starting something new.

WIP limits are extremely important in creating flow. With the same capacity, a team that tries to complete 20 work items at the same time will take roughly twice as long to finish each one as a team working on just 10 items.
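This relationship is known as Little's Law: average lead time equals average WIP divided by average throughput. Here is a minimal sketch, assuming the team's throughput stays constant at a made-up 5 items per week.

```python
def average_lead_time(wip, throughput):
    """Little's Law: average lead time = average WIP / average throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Assumed throughput of 5 finished items per week in both cases.
lead_10 = average_lead_time(10, 5)   # weeks per item with 10 items in progress
lead_20 = average_lead_time(20, 5)   # weeks per item with 20 items in progress
print(lead_10, lead_20)
```

In practice the penalty tends to be worse than this, because context-switching between more items also lowers throughput.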

For now, there is just one column with work in progress (“Doing”). The team should try to estimate how many man-days of work are in that column. Anything more than 30-40 days (3-4 days x 10 people) worth of work should be moved to the Backlog, and this means prioritising what needs to get done first. Prioritising is the responsibility of the Product Owner or Business Manager responsible for the product being developed, so naturally they need to be involved. Agile creates visibility for both the team and stakeholders!
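The trimming exercise can be sketched as follows, assuming the Product Owner has already ordered the “Doing” column by priority. The item names, estimates and the 30-day capacity (3 days x 10 people) are illustrative.

```python
def split_by_capacity(doing, capacity_days):
    """Keep the top-priority items that fit; move the rest to the Backlog.

    doing: list of (name, estimated_days), already ordered by priority.
    Priority order is respected strictly: once an item does not fit,
    it and everything below it go back to the Backlog.
    """
    keep, move_to_backlog = [], []
    used = 0
    full = False
    for name, days in doing:
        if not full and used + days <= capacity_days:
            keep.append(name)
            used += days
        else:
            full = True
            move_to_backlog.append(name)
    return keep, move_to_backlog


# Hypothetical "Doing" column, highest priority first.
doing = [("A", 10), ("B", 12), ("C", 8), ("D", 5), ("E", 2)]
keep, move_to_backlog = split_by_capacity(doing, capacity_days=30)
print(keep, move_to_backlog)
```

The sketch deliberately does not let the small item E jump the queue ahead of D; whether to allow that is a prioritisation decision for the Product Owner, not for the arithmetic.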

Step 7: Daily stand-ups

Book 10-15 minutes with the team every morning for a stand-up in front of the Kanban board to discuss the day’s activities. The stand-up is for the team only, but guests can be invited on occasion. Longer discussions should be saved for break-out sessions with those involved. The focus of the stand-up is to make sure everybody knows what they are doing, if there are any blockers that need to be escalated, and to check that the Kanban board is up-to-date.

In case it’s not obvious, the Kanban board has now become the most important tool the team have for organising and visualising their work. Well done!

Conclusion

The team have made great progress! They have managed to visualise their work, create flow, size their work items and limit their work in progress. This demonstrates the concept of Continuous Improvement (“Kaizen”) as preached in Lean manufacturing, meaning that the team are constantly looking for ways to improve the flow of work.

In Agile we use Retrospectives to specifically discuss how well the flow of work is, well, working. All the team are involved in suggesting improvements, and then some or all of the team are responsible for implementing at least one improvement right away. Process automation (e.g. test automation) is a classic example of improving flow.

There are many, many other techniques that are used as part of Agile to create an iterative software development process, such as User Stories, Storyboarding, Minimum Viable Products (MVPs), backlog refinement and measuring velocity. Scrum is a subject unto itself. But these are topics for another article.
